A clustergram or a heatmap is one of several techniques that can directly visualise data without the need for dimensionality reduction. As clustergrams are easy to interpret, they are widely used to visualise biological data in print publications. Based on similarities and differences in gene expression patterns, clustergrams can also allow direct visualisation of clusters.
In this entry, I will introduce Clustergrammer, which is a user-friendly webtool for plotting clustergrams. The loading of the data into Clustergrammer can be summarised in 3 basic steps:
- Normalise the gene expression data by performing a Z score transformation. This ensures that the grand mean of each gene will be centralised at value of 0, with standard deviation of 1.
- Make sure that the samples are arranged in columns and the genes are arranged in rows. I recommend ordering the samples in the same way as how you would want your data to be published (e.g. controls on the extreme left and the other samples on the right), as proper ordering of the variables allows Clustergrammer to perform supervised clustering. Finally, if you have multiple conditions, you may assign the clusters beforehand by inserting additional rows at the top. You may also consider adding additional columns on the left to assign genes that perform similar functions (see detailed instructions within website).
- Save file in .txt format and upload file in Clustergrammer.
By default, Clustergrammer performs an unsupervised clustering on both rows and columns, and clusters can be visualised by the small arrowheads at the bottom and right of the heatmap. A single-click on the arrowhead reveals the genes within the cluster, allowing you to query their functions directly in Enrichr. A double-click allows you to zoom into the heatmap within the cluster. To further examine the expression levels at the individual level, you can move your mouse cursor within the heatmap and use the mouse scroll to zoom in or zoom out.
For supervised clustering, you can choose to arrange the rows and columns according to the sample order originally assigned. The sidebar is located at the top left hand side of the website. If you have pre-assigned your clusters by adding additional rows, you may choose to click on the category you have classified.
Finally, to determine the relatedness between the different conditions, Clustergrammer also plots the co-expression matrix. The applications of Clustergrammer are not just limited to analysing gene expression studies, but can be extended to proteomics, metabolomics, virus-host interactions and cyTOF analyses. The ease of use, interactive interface and the ability to directly visualise gene expression patterns makes Clustergrammer my top choice in analysing omics datasets.