Posted in Data visualisation, python

Making interactive and executable notebooks/dashboards with Lux and Voilà packages

As described in my previous blog posts, pair plots and correlation matrices are useful for quickly analysing relationships between multiple variables. However, when analysing datasets with more than 10-20 variables, these methods have their limitations. One critical limitation is the difficulty in assessing the effects of two or more categorical variables; another is the difficulty in visualising a large number of variables plotted together in one figure.

Here, I introduce two Python packages that make graph plotting less tedious. The first is the Voilà package, which supports Jupyter interactive widgets, and the second is the Lux package, which integrates a Jupyter widget that lets users quickly browse large collections of data directly within their notebooks. Both packages display data interactively, which makes them highly versatile for fast experimentation. I will share the features of these two packages as I run through the procedures for using them:

First, we need to install the packages. To install Voilà, execute the following command:

pip install voila

Note that Voilà is also installed automatically as an extension if you have JupyterLab version 3 installed.
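If you prefer to serve a notebook as a standalone dashboard outside Jupyter, Voilà can also be launched from the terminal against a notebook file (the notebook name below is just a placeholder):

voila my_notebook.ipynb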

Next, to install Lux, enter the following command in terminal:

pip install lux-api

With these packages installed, you are ready to go. We will import all the necessary packages and use the iris dataset from Plotly Express as a proof of concept:

import csv
import lux
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

df = px.data.iris()
df.head()  # with lux imported, a Pandas/Lux toggle appears above the displayed dataframe

The output file is as follows:

   sepal_length  sepal_width  petal_length  petal_width  species  species_id
0           5.1          3.5           1.4          0.2   setosa           1
1           4.9          3.0           1.4          0.2   setosa           1
2           4.7          3.2           1.3          0.2   setosa           1
3           4.6          3.1           1.5          0.2   setosa           1
4           5.0          3.6           1.4          0.2   setosa           1

Next, launch Voilà from the Jupyter Notebook extension (indicated by the arrow):

This should allow you to toggle between Pandas and Lux freely (indicated by the arrow):

The Lux package automatically performs pairwise plots across all variables in your dataframe, allowing you to focus on the scatterplots that show correlated relationships. Under the “Distribution” tab, you can quickly visualise the distribution of the data:

For the correlation graph, you can click on the plots that you are interested in, and click the magnifying glass, which indicates “intent.” This will zoom into the graph further, allowing you to see how another variable (in this case, species) interacts with your scatterplot. This immediately provides you with a 3-dimensional view of the dataset.
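Besides clicking within the widget, Lux also lets you declare your intent programmatically before displaying the dataframe. Below is a minimal sketch based on the Lux intent syntax, using two of the iris columns loaded above (the column choices are just an example):

# Declare the columns of interest; Lux steers its recommendations towards them
df.intent = ["sepal_length", "sepal_width"]
df  # displaying the dataframe now shows visualisations focused on the declared intent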

As you can begin to appreciate, the Voilà package is a powerful tool for serving interactive widgets directly from a local address. You can even publish them online, allowing other users to explore the dataset according to their research intent. Combined with the Lux package, this allows users, even those without a bioinformatics background, to zoom into the interesting scatterplots for further analysis.

Posted in python

Executing Excel functions in Jupyter Notebooks using Mito


Data scientists often need to stratify their data using the “IF” function. For instance, how do immune responses differ between the young and the elderly after vaccination? In this case, you would filter for the young using a pre-defined age cut-off (less than 65 years old). Thereafter, the data can be sorted in descending order by fold change to determine the genes that are most differentially regulated. Another round of filtering can identify differentially expressed genes (DEGs), based on pre-determined fold-change and p-value cut-offs. Then another possibility may arise: you may wonder whether the dataset is influenced by gender differences, and how gender interacts with age to influence vaccine responses. This requires even more filtering of the dataset to probe these interesting biological questions…

As you can imagine, there are many instances of filtering and comparison to perform, especially when analysing a human dataset, due to large variations between individuals. Microsoft Excel is a useful tool for many of these filtering and aggregation functions. However, a major disadvantage is that this generates many Excel files and sheets, which can become difficult to manage. Python can perform these filtering functions easily, but usually requires long blocks of code, which can be time-consuming to write. Hence, I recommend a Python package named Mito that accelerates execution of these filtering functions.

To install the package, execute the following command in the terminal to download the installer:

python -m pip install mitoinstaller

Ensure that you have Python 3.6 or above to utilise this package. Next, run the installer in the terminal:

python -m mitoinstaller install

Finally, launch JupyterLab and restart your Notebook Kernel:

python -m jupyter lab

To run the package in Jupyter Lab:

import mitosheet
mitosheet.sheet()

The output is as follows:

The features of a mitosheet include:

1. Import datasets
2. Export datasets
3. Add columns (you can insert formulas here)
4. Delete unnecessary columns
5. Pivot tables, which allow you to transpose and query the dataframe differently
6. Merge datasets (possible if you have two Excel files with a shared column)
7. Basic graph features, using Plotly
8. Filter data
9. Multiple Excel sheets, with support for inserting new sheets

Most excitingly, Mito also generates the equivalent annotated Python code for each of these edits, meaning you can easily reuse the commands for further analysis in Python. Overall, Mito is a user-friendly tool that allows you to perform Excel functions within Python. This makes the workflow much more efficient, as we do not have to import multiple Excel files into Jupyter notebooks. Critically, typing long blocks of code is not required, as the Python code is automatically generated as you plough through the data.
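To give a flavour of what this generated code can look like, below is a hypothetical pandas sketch of the filtering steps described earlier; the file name and the column names (age, fold_change, adj_pval) are placeholders of my own, not actual Mito output:

import pandas as pd

df = pd.read_csv('vaccine_data.csv')  # hypothetical input file

# Stratify subjects by a pre-defined age cut-off
young = df[df['age'] < 65]

# Sort by fold change in descending order to surface the most regulated genes
young = young.sort_values('fold_change', ascending=False)

# Keep differentially expressed genes passing both cut-offs
degs = young[(abs(young['fold_change']) > 1.5) & (young['adj_pval'] < 0.05)]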

Posted in python

Saving Python figures for scientific publications

High-resolution image files are essential for scientific publications. I personally prefer to use Adobe Illustrator to make my scientific figures, as the files it creates are vector-based and hence resolution-independent. To export graphs made with Matplotlib, you can execute the code below after plotting your graph in Jupyter Notebooks:

plt.savefig('destination_path.eps', format='eps')
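If you need a little more control, a couple of optional arguments are worth knowing. The sketch below extends the line above: bbox_inches='tight' trims surrounding whitespace, and SVG is an alternative vector format that Illustrator also opens cleanly:

import matplotlib.pyplot as plt

plt.plot([0, 1], [0, 1])  # placeholder plot; assume your real figure is already drawn

# Trim surrounding whitespace on export
plt.savefig('destination_path.eps', format='eps', bbox_inches='tight')

# SVG is another vector format that imports cleanly into Adobe Illustrator
plt.savefig('destination_path.svg', format='svg', bbox_inches='tight')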

Saving the file directly in .eps format allows you to transfer your figure straight from Jupyter Notebooks into Adobe Illustrator. Moreover, you can manipulate the files easily within Adobe Illustrator without any loss in figure resolution. Hope you like this TGIF tip! 🙂

Posted in Data visualisation, Pathway analysis, python

Dot plots as a visualisation tool for pathway analysis

As described in my previous blog post, heatmaps allow quick visualisation of various measurements between samples. The magnitude differences are usually represented as changes in hue or colour intensity. However, if you want to include another parameter, this can be challenging. Imagine a scenario where you identified 5 Gene Ontology Biological Process (GOBP) pathways that are significantly different between infected and uninfected samples over a course of 3 days. To plot them on a graph, you can negative log-transform the adjusted p-values and then plot a heatmap as shown below:

However, if you want to also display the combined score from your EnrichR analysis, you will have to plot another heatmap:

As shown in the example above, you will need two figure panels to fully describe your pathway analysis. A more elegant way to display these results is to use a dot plot. In simple terms, dot plots are a form of data visualisation that plots data points as dots on a graph. The advantage of plotting data points as dots rather than the rectangles of heatmaps is that you can alter the size of the dots to add another dimension to your visualisation. For instance, in this specific example, you can make the dot size proportional to the p-values and let the hue of the dots represent the enrichment score. This also means you need only one graph to fully represent the pathway analysis!

Dot plots can be easily plotted in Python, using either the Seaborn package or the Plotly Express package. I personally prefer the Plotly Express package as the syntax is simpler and you can mouse over the dots to display the exact values. To plot the dot plot, we first load the standard packages:

import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px

We then load a ‘test’ dataset from my desktop, formatted so that the columns contain timepoints, pathway terms, negative logP values and combined scores. It is also a good habit to convert the timepoint to a “string” datatype so that the x-axis does not include interpolated time-points such as 1.5 and 2.5.

df = pd.read_csv('/Users/kuanrongchan/Desktop/test.csv')
df['timepoint'] = pd.Series(df['timepoint'], dtype="string")
df.head(5)

Output is as follows:

   timepoint                          Term  adjusted_pval  Combined_score   neg_logP
0          1     Defense Response To Virus   3.942000e-25          87.531  24.404283
1          2     Defense Response To Virus   3.940000e-27         875.310  26.404283
2          3     Defense Response To Virus   5.000000e-02           2.000   1.301030
3          1  Defense Response To Symbiont   2.256000e-25          95.555  24.646661
4          2  Defense Response To Symbiont   2.260000e-27         955.550  26.646661

Finally, the dot plot can be plotted using the following syntax:

fig = px.scatter(df, x="timepoint", y="Term",
                 color="Combined_score", size="neg_logP",
                 color_continuous_scale=px.colors.sequential.Reds)

fig.show()

The output is a dot plot where dot size is proportional to the negative log p-value and colour intensity reflects the combined score. You can also customise the colours using any of Plotly's other built-in colour scales.
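As a sketch of a few common tweaks (the size cap, the alternative colour scale and the axis label below are my own choices, not requirements):

fig = px.scatter(df, x="timepoint", y="Term",
                 color="Combined_score", size="neg_logP",
                 color_continuous_scale=px.colors.sequential.Viridis,  # alternative colour scale
                 size_max=20,  # cap the size of the largest dot
                 labels={"neg_logP": "-log10 adjusted p-value"})
fig.show()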

Because of the ability of dot plots to add another dimension of analysis, most pathway analyses are presented as dot plots. However, I am sure there are other scenarios where dot plots can be appropriately used. Next time, before you plot multiple heatmaps, do consider dot plots as an alternative data visualisation tool!

Posted in Data visualisation, pair plots, python

Visualising multidimensional data with pair plots

Pair plots can be effectively used to visualise relationships between multiple variables, allowing you to zoom into the interesting aspects of the dataset for further analysis.

As discussed in my previous blog entry, correlation coefficients measure the strength of the relationship between two variables. In general, a correlation coefficient of 0.5 and above is considered strong, 0.3 moderate and 0.1 weak. To quickly visualise the relatedness between multiple variables, a correlation matrix can be used. However, there are some limitations. For instance, the correlation matrix does not provide data on the steepness of the slope between variables, nor does it display the relationship if the variables are not linearly correlated. In addition, the correlation matrix cannot show how categorical variables potentially interact with the other continuous variables.

Let me provide an example to illustrate my case. Suppose you want to understand whether flower sepal length, sepal width, petal length and petal width are correlated. Plotting a correlation matrix for these 4 variables allows you to visualise the strength of the relationships between these continuous variables. However, if you have another, categorical variable, for example flower species, and you are interested in how flower species interacts with all these parameters, then merely plotting a correlation matrix is inadequate.

To get around this problem, pair plots can be used. Pair plots allow us to see the distributions of, and relationships between, multiple continuous variables. In Python, you can even use colours to visualise how a categorical variable interacts with the other continuous variables. Plotting pair plots in Python is surprisingly straightforward. Let’s use the iris dataset from Plotly Express as a proof of concept.

We first load the required packages, the iris dataset from Plotly Express, and display all the variables from the dataset with the head() function:

# Load packages
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Load iris dataset from plotlyexpress
df = px.data.iris()
df.head()

The output looks like this, with sepal length, sepal width, petal length and petal width as continuous variables and flower species as the categorical variable:

   sepal_length  sepal_width  petal_length  petal_width  species  species_id
0           5.1          3.5           1.4          0.2   setosa           1
1           4.9          3.0           1.4          0.2   setosa           1
2           4.7          3.2           1.3          0.2   setosa           1
3           4.6          3.1           1.5          0.2   setosa           1
4           5.0          3.6           1.4          0.2   setosa           1

To visualise the correlations between continuous variables, the pair plot can be quickly plotted with a single line of code. In addition, we can annotate the different flower species with different colours. Finally, rather than just plotting the same scatterplot on both sides of the matrix, we will plot the lower half of the matrix with a best-fit straight line so we can visualise the slope of each relationship. The code used is as follows:

g = sns.pairplot(df, vars=["sepal_length", "sepal_width", "petal_length", "petal_width"],
                 dropna=True, hue='species', diag_kind="kde")
g.map_lower(sns.regplot)

Output is as follows:

Isn’t it cool that you can immediately see that the variables are strongly correlated with each other, and that these relationships depend on the flower species? As you may already appreciate, the pair plot is a useful tool for visualising and analysing multi-dimensional datasets.
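As a closing sketch, if the mirrored upper half of the matrix feels redundant, seaborn's corner option draws only the lower triangle (everything else follows the code above):

# Plot only the lower triangle of the pair plot matrix
g = sns.pairplot(df, vars=["sepal_length", "sepal_width", "petal_length", "petal_width"],
                 hue='species', diag_kind="kde", corner=True)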

Posted in python

Converting strings to list in Python

Lists are used to store multiple items in a single variable. Grouping items in a single list tells Python that these variables should be analysed as a group, rather than as individual entries. However, converting a string of items to a single list may not be an easy task. In the past, my solution was to copy the item list into a new notebook and use the “find and replace” function to convert delimiters to “,”. Recently, I chanced upon an article describing how to do this elegantly in a single line of code. Imagine you have a list of genes from an EnrichR pathway analysis, and the genes are separated by semicolons (;). You can convert these genes to a list with a single command:

DEGs = "IFNG;IFNA14;CXCL10;RSAD2;ISG15"
Gene_list = DEGs.split(";")
print(Gene_list)

Output will then be the list of genes which you can use to query in other datasets:

['IFNG', 'IFNA14', 'CXCL10', 'RSAD2', 'ISG15']
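The reverse is just as compact: the join() method stitches a list back into a delimited string, which is handy when a tool expects a semicolon-separated gene list:

# Convert the list back into a semicolon-separated string
DEG_string = ";".join(Gene_list)
print(DEG_string)  # IFNG;IFNA14;CXCL10;RSAD2;ISG15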

Super cool!

Posted in Data visualisation, IPyWidgets, python

Making Jupyter notebooks interactive with IPyWidgets

There are many instances where scientists and data scientists have to identify cut-offs before deeper data analyses. For instance, what fold-change and adjusted p-value thresholds are most appropriate for identifying DEGs? For pre-ranked pathway analysis, what is an appropriate cut-off for enrichment scores to be considered significant? For RNA-seq, what is the minimum count we should filter out without trimming away too many transcripts? These tasks can take a long time even with coding, because finding the most suitable cut-off requires many repeated lines of code.

Consider that you are unsure of an appropriate cut-off for identifying DEGs. The usual approach is to query the number of DEGs at a fold-change of at least 1.5 with adjusted p-value < 0.05. Imagine you identified only 30 DEGs among 20,000 genes. This may indicate that your cut-off is too stringent, so this time you adjust to a fold-change of at least 1.2 with adjusted p-value < 0.05. Again, this requires another line of code. Imagine you now find 1,000 genes that satisfy the criteria, and you think this is too many for pathway analysis. This means yet another query at a fold-change of at least 1.3, and the cycle repeats until you find the suitable cut-off.
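In code, each round of this guessing game looks something like the line below, a sketch using the fold-change and adjusted p-value column names of the dataset loaded in the next step; only the hard-coded cut-off changes with every guess:

# One manual query per cut-off; edit 1.5 and re-run for every new guess
len(df[(abs(df['fc001d']) > 1.5) & (df['qval001d'] < 0.05)].index)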

To circumvent this issue, I introduce IPython widgets (IPyWidgets), which turn Jupyter Notebooks from static documents into interactive dashboards, perfect for exploring and visualising data. To install, simply type the command:

pip install ipywidgets

We first import the packages and load a dataset. Here, I load a dataset from Zak DE et al. from my BENEATH account. The study examined transcriptomic responses in human subjects over time after MRKAd5/HIV vaccination, and the details have been elaborated previously on my blog:

import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import beneath
df = await beneath.load_full("kuanrongchan/vaccine-repository/ad5-seroneg-daniel-2012")
df.info()
df.head(2)

The output for the header columns is as follows: fc indicates fold-change, pval indicates the p-value with respect to day 0, and qval indicates the adjusted p-value with respect to day 0. The time-points considered are 6 hours and 1, 3 and 7 days after vaccination.

  gene   fc006h  pval006h  qval006h   fc001d      pval001d  qval001d   fc003d  pval003d  qval003d   fc007d  pval007d  qval007d                   @meta.timestamp
0 A1BG -1.13408  0.157781  0.500370 -1.03742  6.099120e-01  0.804213 -1.13944  0.143099  0.678669 -1.04868  0.589913  0.997579  2021-07-16 00:24:39.972000+00:00
1 A1CF -1.04199  0.383750  0.638880  1.07475  6.557900e-02  0.170118  1.01920  0.686074  0.929992 -1.01369  0.772500  0.997579  2021-07-16 00:24:39.972000+00:00

We will use the following command to make an interactive slider. Assume we fix the adjusted p-value to be less than 0.05 and want to find the number of DEGs at a given fold-change cut-off:

import ipywidgets as widgets
from ipywidgets import interact, interact_manual

@interact
def show_DEGs(column1=['fc001d', 'fc003d', 'fc007d'], x=(0, 3, 0.1),
              column2=['qval001d', 'qval003d', 'qval007d']):
    # Count the genes passing the fold-change slider with adjusted p-value < 0.05
    return len(df[(abs(df[column1]) > x) & (df[column2] < 0.05)].index)

Output is as follows:

With the @interact decorator, the IPyWidgets library automatically gives us dropdown menus and a slider for choosing columns and a number! In this case, we can move the slider from 0 to 3 and switch between the different time-points (6 hours, 1, 3 and 7 days post-vaccination) to see the number of DEGs at different fold-change cut-offs. The output shown has the slider moved to 1.3 and 1.5, which yielded 1510 and 905 DEGs respectively.

You can also use the @interact decorator with the .loc function to call out the genes and expression values that satisfy the criteria designated by your slider.
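A minimal sketch of how this could look, reusing the slider and dropdowns from above (the choice of displayed columns is mine):

@interact
def show_DEG_table(column1=['fc001d', 'fc003d', 'fc007d'], x=(0, 3, 0.1),
                   column2=['qval001d', 'qval003d', 'qval007d']):
    # Return the genes and values that pass the chosen cut-offs
    return df.loc[(abs(df[column1]) > x) & (df[column2] < 0.05), ['gene', column1, column2]]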

Finally, you can also use IPyWidgets for multiple comparisons, especially with the dropdown widget. For example, if you want to quickly find out whether the transcript expression values at two time-points are correlated, you can use IPyWidgets to create dropdown menus for multiple comparisons:

@interact
def correlations(column1=list(df.select_dtypes('number').columns),
                 column2=list(df.select_dtypes('number').columns)):
    print(f"Correlation:{df[column1].corr(df[column2])}")

The output is two dropdown lists that allow you to quickly evaluate the correlation between any two variables in the dataframe:

The potential of IPyWidgets is limitless, allowing you to scan through data very quickly and bringing some life to the static Jupyter Notebook. You can also use it for graph plotting, scanning through files, etc. The possibilities are endless… 🙂

Posted in python, Research tools

Introducing BENEATH as a data warehouse platform

Data warehousing improves data accessibility across multiple datasets, increasing the speed and efficiency of data analysis. By storing and categorising multiple processed datasets in one specified location, it allows data scientists to quickly query and analyse multiple datasets. It also facilitates advanced analyses like predictive modelling and machine learning.

Here, I introduce BENEATH as a data warehouse platform to store multiple datasets. Several features make this platform cool:

  1. A user-friendly interface that makes depositing dataframes easy
  2. Ability to use the SQL programming language for various kinds of data query
  3. Python commands are adequate to import and export files, meaning that no background in other programming languages is required
  4. Easy permission management: one command allows other users to create or view dataframes, and you can also make your database public
  5. Free of charge (unless your datasets are huge)
  6. Responsive community to accelerate troubleshooting

I will break down the individual steps involved in depositing a dataframe in BENEATH:

First, install BENEATH by executing the below command in the Jupyter Notebook (you only need to do this once):

pip install --upgrade beneath

Next, set up an account at https://beneath.dev/. Thereafter, in the command line (Terminal), execute the command below and follow the subsequent instructions to authenticate your account:

beneath auth

Thus far, we have installed the important packages and completed the authentication procedures. We are now ready to deposit any dataframe into BENEATH. In this example, we import a .csv file (named test) into Python using the command below in a Jupyter Notebook. The dataframe is saved under the variable “df”:

import csv
import numpy as np
import pandas as pd
df = pd.read_csv('/Users/kuanrongchan/Desktop/test.csv')
df.head(2)

Output file is as follows:

   subject     ab30d     ab60d     ab90d   tcell7d
0        1  0.284720  0.080611  0.997133  0.151514
1        2  0.163844  0.818704  0.459975  0.143680

Depending on your username (mine is kuanrongchan), we can create a folder to store this dataframe. In this example, we create a folder named vaccine_repository. Execute the code below in the command line (Terminal):

beneath project create kuanrongchan/vaccine_repository

Finally, to deposit your dataset, execute the command below. The table_path directs the dataframe to the assigned folder. You can then set the index with the “key” argument; in this case, I have assigned subject as the index key, which will allow you to query subject IDs quickly in future. You can also add a description with any other details of this dataset.

import beneath

await beneath.write_full(
    table_path="kuanrongchan/vaccine_repository/test",
    records=df,
    key=["subject"],
    description="my first test file"
)

Now for the cool part: you can execute a simple command to quickly import this dataframe into a Jupyter Notebook for data analysis:

df = await beneath.load_full("kuanrongchan/vaccine-repository/test")

To share this folder with a friend, execute the command below. In this case, assume my friend's BENEATH username is abc:

beneath project update-permissions kuanrongchan/vaccine_repository abc --view --create

Multiple datasets can easily be deposited in the platform using the code described above. However, the current limitation of BENEATH is the lack of data visualisation tools, which means that graphs will have to be produced in Jupyter Notebooks. The developers are currently working on this aspect, which should make BENEATH a great data warehouse platform for data scientists.

Posted in correlation, Data visualisation, python

Correlation matrix – A useful tool to visualise relatedness between multiple variables

Correlations are a great way to show whether variables are associated with each other. The strength of correlation is often represented by the correlation coefficient, with values ranging between -1.0 and 1.0. A positive value indicates positive association, while a negative value indicates negative association. While there is no fixed definition, a correlation value larger than 0.7 or smaller than -0.7 is usually considered a strong correlation. To quickly visualise the correlations between multiple variables, correlation matrices can be used. In my opinion, the Python programming language allows you to plot these matrices most quickly, which is what I will elaborate on in this blog entry.

First, we load the dataset and packages into Python. I have assigned the headers as variables (ab30d, ab60d, ab90d and tcell7d) and the first column as the subject ID column. We will assign this dataframe to “test_1”:

# Import packages
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Import data with 10 subjects into python. In this case only the top 2 rows will be shown in the output
test_1 = pd.read_csv('/Users/kuanrongchan/Desktop/test.csv',index_col=0)
test_1.head(2)

Output file is as follows:

            ab30d     ab60d     ab90d   tcell7d
subject
1        0.284720  0.080611  0.997133  0.151514
2        0.163844  0.818704  0.459975  0.143680

Next, we can compute the correlation coefficients across all variables using the command:

corrM = test_1.corr()
corrM

Output file showing all the correlation coefficient values is as follows:

            ab30d     ab60d     ab90d   tcell7d
ab30d    1.000000  0.378926  0.042423 -0.324900
ab60d    0.378926  1.000000 -0.489996 -0.374259
ab90d    0.042423 -0.489996  1.000000 -0.392458
tcell7d -0.324900 -0.374259 -0.392458  1.000000

We can also determine the p-values of the correlations using the following commands:

from scipy.stats import pearsonr

pvals = pd.DataFrame([[pearsonr(test_1[c], test_1[y])[1] for y in test_1.columns] for c in test_1.columns],
                     columns=test_1.columns, index=test_1.columns)
pvals

Output file is as follows:

                ab30d     ab60d         ab90d   tcell7d
ab30d    6.646897e-64  0.280214  9.073675e-01  0.359673
ab60d    2.802137e-01  0.000000  1.505307e-01  0.286667
ab90d    9.073675e-01  0.150531  6.646897e-64  0.261956
tcell7d  3.596730e-01  0.286667  2.619556e-01  0.000000

Finally, to plot the correlation matrix with the correlation coefficient values within, we execute the following command:

corr = test_1.corr()
mask = np.zeros_like(corr)
mask[np.triu_indices_from(mask)] = True

# Define the diverging colour palette used below (cmap must be defined before it is called)
cmap = sns.diverging_palette(230, 20, as_cmap=True)

with sns.axes_style("white"):
    f, ax = plt.subplots(figsize=(7, 5))
    ax = sns.heatmap(corr, cmap=cmap, mask=mask, center=0, square=True,
                     linewidths=.5, annot=True, cbar_kws={"shrink": .5})

Output file is a beautiful correlation matrix:

At one glance, you can quickly tell which variables are positively correlated and which are negatively correlated. Depending on your correlation matrix size, you can tailor the size of the output diagram by changing the figsize dimensions within the code. Correlation matrices can also be complemented with pair plots, which I will elaborate on in a future blog entry.

Posted in Data visualisation, Heatmaps, python

The art of plotting heatmaps

Bar charts provide precise measurements with standard deviations, but visualising trends across multiple genes or proteins with many bar charts is difficult. In such cases, a heatmap can be a useful alternative for visualising changes in the expression of multiple genes or proteins across different conditions. In a heatmap, the variation in magnitude of expression is often visualised by changing the intensity or hue. For instance, higher expression can be depicted in red, and a greater increase in expression by a darker intensity. Conversely, downregulation can be depicted in blue (more colour-blind friendly than green), with a stronger intensity indicating greater downregulation. While plotting raw expression values in a heatmap is acceptable if the relative abundances of the genes and proteins are similar, data normalisation is required when the baseline gene or protein abundances are different.

For instance, consider plotting the expression levels of gene A and gene B in a single heatmap. Gene A has a baseline expression of 5 copies per cell, whereas gene B has a much higher baseline expression of 50 copies per cell. Assume that during a virus infection, gene A expression is reduced to 2.5 copies/cell while gene B expression is increased 2-fold to 100 copies/cell. In this case, due to the high baseline abundance of gene B, plotting the heatmap using gene abundance will mask the differential expression of gene A after infection.

To display the relative expression of both genes A and B in a heatmap, data normalisation is recommended. One of the most common strategies is to plot a heatmap based on log2 fold-change instead of abundance. In this specific case, the log2 fold-changes for gene A and gene B after virus infection are -1 and 1 respectively, and these values can be nicely presented in a heatmap to depict the phenomenon. Another common alternative is to perform a Z-score transformation and plot the Z-scores in the heatmap. The Z-score transformation scales each gene or protein across all conditions to mean = 0 and standard deviation = 1, which means that values above 0 are above average and values below 0 are below average. Hence, the Z-score transformation standardises the values between genes or proteins to the same mean and standard deviation, enabling users to easily compare expression values across genes and proteins. However, an important point to note is that Z-scores should only be applied to values obtained within a single experiment.
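To make the two normalisation options concrete, here is a small pandas sketch using the gene A and gene B numbers from above (seaborn's clustermap, shown later, can also perform the Z-scoring for you):

import numpy as np
import pandas as pd

expr = pd.DataFrame({"uninfected": [5.0, 50.0], "infected": [2.5, 100.0]},
                    index=["gene_A", "gene_B"])

# Option 1: log2 fold-change relative to the uninfected baseline
log2fc = np.log2(expr["infected"] / expr["uninfected"])  # gene_A: -1.0, gene_B: +1.0

# Option 2: Z-score each gene (row) across conditions: mean 0, standard deviation 1
z = expr.sub(expr.mean(axis=1), axis=0).div(expr.std(axis=1), axis=0)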

To plot a heatmap, Microsoft Excel (under the format cell column) or specialised software such as PRISM can be used. However, most of these tools do not perform clustering of heatmaps. From my experience, besides Clustergrammer, Python is one of the fastest and easiest ways to plot a heatmap or clustergram. After loading the data, the heatmap can be quickly plotted with a single line of code and various kinds of customisation. The step-by-step commands and code descriptions are as follows:

# Import packages
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Loading data into python as dataframe and check that the dataset is properly loaded
# Gene symbol is assigned as the index column
df = pd.read_csv('/Users/kuanrongchan/Desktop/Blog.csv',index_col=0)
df.head()

Output file:

             Control_1  Control_2  Control_3  Condition_1_1  Condition_1_2  Condition_1_3  Condition_2_1  Condition_2_2  Condition_2_3
Gene_Symbol
HERC5             2.35       2.18       1.90           3.41           2.99           3.25           2.67           2.55           2.46
IFIT1             4.50       4.42       4.58           6.31           6.26           6.23           6.08           5.52           5.21
IFIT2             4.34       4.84       4.76           5.91           5.96           5.95           5.56           5.36           4.97
IFNA4             1.42       1.52       1.44           1.73           2.05           2.35           1.14           1.54           1.61
IFNA7             0.88       1.22       0.56           1.46           1.23           1.70           1.23           0.71           0.64

# Plot clustergram
g = sns.clustermap(df, z_score=0, cmap="vlag", figsize=(10, 4), method='average', cbar_kws={"ticks":[-1.5,0,1.5]}) 

# Various customisations can be performed on the above code as follows:
# z_score=0 ; computes Z-scores across each row (gene) before plotting
# cmap="vlag" ; indicates a red-blue colour map
# figsize ; indicates figure size dimensions
# method='average' ; default linkage option
# robust=True ; can be added to compute the colour range from robust quantiles instead of the extreme values
# cbar_kws={"ticks":[-1.5,0,1.5]} ; defines the points to show on the colour bar
# g.cax.set_visible(False) ; can be added to remove the colour bar
# yticklabels=True ; can be added for big heatmaps to display all gene symbols

Output:

At a single glance, you can quickly tell that condition 2 behaves similarly to the control condition, whereas condition 1 shows a marked increase in the expression of IFNA4, IFNA7, HERC5, IFIT1 and IFIT2. The replicates are also clustered together. As you may already appreciate, this kind of data presentation is particularly useful for analysing big datasets.