Posted in Data visualisation, IPyWidgets, python

Making Jupyter notebooks interactive with IPyWidgets

There are many instances where scientists and data scientists will have to identify cut-offs before deeper data analyses. For instance, what is the fold change and adjusted p-value threshold that is most appropriate for identification of DEGs? For pre-rank pathway analysis, what is an appropriate cut-off for enrichment scores to be considered significant? For RNAseq, what is the minimum counts that we should filter out without trimming away too many transcripts? Executing these tasks can take a long time, even with coding because you will need multiple lines of coding to identify the most suitable cut-off.

Consider that you are unsure what is an appropriate cut-off for identification of DEGs. The usual way will be to probably do a query on the number of DEGs identified when fold-change is at least 1.5 with adjusted p-value < 0.05. Imagine you identified only 30 DEGs amongst 20,000 genes. This may indicate that your cutoff may be too stringent, and this time, you decide to adjust to fold-change of at least 1.2 with adjusted p-value < 0.05. Again, you will require another line of code. Imagine this time you found 1000 genes that satisfy the criteria and now you think this number is now too many for pathway analysis. Again, this means you might need another code to query fold-change of at least 1.3, and the cycle repeats until you find that suitable cutoff.

To circumvent this issue, I introduce IPython widgets (IPyWidgets), which turn Jupyter Notebooks from static documents into interactive dashboards, perfect for exploring and visualizing data. To install, simply type the command:

pip install ipywidgets

We first import the packages and load a dataset. Here, I load a dataset from Zak DE et al. from my BENEATH account. The study looks into the transcriptomic responses in human subjects across time, after taking the MRKAd5/HIV vaccination, and the details have been previously elaborated in my blog:

import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import beneath
df = await beneath.load_full("kuanrongchan/vaccine-repository/ad5-seroneg-daniel-2012")
df.info()
df.head(2)

The output for the header columns is as follows. fc indicate fold-change, pval indicate p-value with respect to day 0 and qval indicate adjusted p-value with respect to day 0. The time-points considered are 6hrs, 1, 3 and 7 days after vaccination.

genefc006hpval006hqval006hfc001dpval001dqval001dfc003dpval003dqval003dfc007dpval007dqval007d@meta.timestamp
0A1BG-1.134080.1577810.500370-1.037426.099120e-010.804213-1.139440.1430990.678669-1.048680.5899130.9975792021-07-16 00:24:39.972000+00:00
1A1CF-1.041990.3837500.6388801.074756.557900e-020.1701181.019200.6860740.929992-1.013690.7725000.9975792021-07-16 00:24:39.972000+00:00

We will use the following command to make an interactive slider. Assuming we are fixing adjusted p-value to be less than 0.05 and want to find out the number of DEGs based on a fold-change cut-off:

import ipywidgets as widgets
from ipywidgets import interact, interact_manual

@interact
def show_DEGs(column1=['fc001d','fc003d','fc007d'], x=(0,3,0.1),column2=['qval001d','qval003d','qval007d']):
    return len(df[(abs(df[column1]) > x) & (df[column2] < 0.05)].index)

Output is as follows:

With the @interact decorator, the IPyWidgets library automatically gives us a text box and a slider for choosing a column and number! In this case, we can move the slider from 0 to 3, and for the different time-points (6hrs, 1, 3 and 7 days post-vaccination) to see the number of upregulated DEGs for different fold-change cut-offs. The output shown has the slider moved to 1.3 and 1.5, which yielded 1510 and 905 DEGs respectively.

You can also use the @interact function with .loc function to call out the genes and expression values that satisfy the criteria designated by your slider:

Finally, you can also use the IPyWidgets to do multiple comparisons, especially using the dropdown widget. For example, if you want to quickly find out if the transcript expression at two time-points are correlated, you can use IPyWidgets to create the dropdown menus for multiple comparisons:

@interact
def correlations(column1=list(df.select_dtypes('number').columns),
                 column2=list(df.select_dtypes('number').columns)):
    print(f"Correlation:{df[column1].corr(df[column2])}")

The output is two dropdown lists, that would allow you to quickly evaluate the correlation of any 2 variables in the dataframe:

The potential of IPyWidgets is limitless, allowing you to scan through data very quickly, and brings some life to the static Jupyter Notebook. You can also use this for graph plotting, scanning through files etc. The possibilities are endless… 🙂