
Introducing BENEATH as a data warehouse platform

Data warehousing makes multiple datasets more accessible, which increases the speed and efficiency of data analysis. Storing and categorising processed datasets in one specified location allows data scientists to query and analyse them quickly. It also facilitates advanced analyses such as predictive modelling and machine learning.

Here, I introduce BENEATH as a data warehouse platform for storing multiple datasets. Several features make this platform cool:

  1. User-friendly interface that makes it easy to deposit dataframes
  2. Ability to use the SQL programming language to run various kinds of data queries
  3. Python commands are adequate to import and export files, meaning that no background in other programming languages is required
  4. Easy to set permissions. One command allows other users to create or view dataframes. You also have the option to make your database public.
  5. Free of charge (unless your datasets are huge)
  6. Responsive community to accelerate troubleshooting
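
To give a flavour of the kind of SQL query mentioned in point 2, here is a minimal sketch that runs entirely on your own machine. It uses Python's built-in sqlite3 module as a stand-in and does not connect to BENEATH at all. The table name, the columns (arm, titre) and the rows are hypothetical and only serve as an illustration.

```python
import sqlite3

# Hypothetical rows standing in for a deposited dataset.
rows = [
    ("S01", "vaccine", 120.0),
    ("S02", "placebo", 15.0),
    ("S03", "vaccine", 98.5),
]

# Local in-memory database: a stand-in, not BENEATH.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (subject TEXT PRIMARY KEY, arm TEXT, titre REAL)"
)
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

# Average titre per study arm, sorted by arm name.
summary = conn.execute(
    "SELECT arm, AVG(titre) FROM test GROUP BY arm ORDER BY arm"
).fetchall()
conn.close()

print(summary)
```

The same style of SELECT ... GROUP BY statement is what I have in mind when I say that SQL can be used for data queries; the exact SQL dialect that BENEATH accepts may differ from SQLite.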

I will break down the individual steps involved in depositing a dataframe in BENEATH:

First, install BENEATH by executing the below command in the Jupyter Notebook (you only need to do this once):

pip install --upgrade beneath

Next, set up a BENEATH account. Thereafter, in the command line (Terminal), execute the command below and follow the subsequent instructions to authenticate your account.

beneath auth

Thus far, we have installed the required package and completed the authentication procedure. We are now ready to deposit any dataframe into BENEATH. In this example, we will import a .csv file (named test) into Python using the command below in a Jupyter Notebook. The dataframe is saved under the variable “df”.

import pandas as pd

# Read the .csv file into a dataframe
df = pd.read_csv('/Users/kuanrongchan/Desktop/test.csv')

The output is as follows:

[Image: preview of the dataframe df]

Next, we create a folder (a project) under your username (in my case, kuanrongchan) to store this dataframe. In this example, the folder is named vaccine_repository. Execute the following in the command line (Terminal):

beneath project create kuanrongchan/vaccine_repository

Finally, to deposit your dataset, execute the command below. The table_path directs the dataframe to the assigned folder. You can then set the index with the “key” argument. In this case, I have assigned subject as the index key, which will allow you to query subject IDs quickly in future. You can also add a description to include any other details about this dataset.

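import beneath
# table_path is username/folder/table; passing the dataframe as records=df is an assumption
await beneath.write_full(
    table_path="kuanrongchan/vaccine_repository/test", records=df,
    key=["subject"], description="my first test file"
)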
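
Since subject serves as the index key, it may be worth checking that every subject ID appears only once before depositing the file, as an index is most useful when each key value identifies a single row. Below is a minimal sketch of such a check using only Python's standard library. The helper name duplicated_keys and the sample rows are hypothetical, and I have not verified how BENEATH itself treats repeated keys.

```python
import csv
import io
from collections import Counter


def duplicated_keys(csv_file, key_column):
    """Return the sorted key values that occur more than once in a CSV file."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[key_column] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical file contents; for a real file, pass open("test.csv", newline="").
sample = io.StringIO("subject,titre\nS01,120\nS02,15\nS01,98\n")
duplicates = duplicated_keys(sample, "subject")
print(duplicates)
```

An empty list means that every subject ID is unique, so subject is a reasonable choice of key.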

Now, for the cool part: You can execute a simple command to quickly import this dataframe into Jupyter Notebook for data analysis:

df = await beneath.load_full("kuanrongchan/vaccine_repository/test")
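
Note that a bare await like the one above works in a Jupyter Notebook, but not at the top level of an ordinary Python script. In a script, the call has to sit inside a coroutine that is started with asyncio.run. The sketch below shows the pattern with a hypothetical stand-in function, load_table, so that it runs without a BENEATH account; in a real script you would await the BENEATH call in its place.

```python
import asyncio


async def load_table(table_path):
    """Hypothetical stand-in for an awaitable call such as beneath.load_full."""
    await asyncio.sleep(0)  # pretend to wait on the network
    return {"table_path": table_path, "rows": 3}


async def main():
    # Inside a coroutine, await works just as it does in the notebook.
    return await load_table("kuanrongchan/vaccine_repository/test")


# asyncio.run starts an event loop, runs main() to completion and returns its result.
result = asyncio.run(main())
print(result["table_path"])
```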

To share this folder with a friend, you can execute the command below. In this case, assume my friend's BENEATH username is abc.

beneath project update-permissions kuanrongchan/vaccine_repository abc --view --create

Multiple datasets can be easily deposited in the platform using the commands described above. However, a current limitation of BENEATH is the lack of data visualisation tools, which means that graphs have to be plotted in Jupyter Notebooks. The developers are currently working on this aspect, which should make BENEATH a great data warehouse platform for data scientists.
