
KEA3: A web-based tool to predict involvement of upstream kinases based on a list of gene or protein queries.

(Left) Top 10 kinases that could be involved in transcriptional responses to the yellow fever live-attenuated vaccine (YF-17D). (Right) Interactive map showing the relationship of the different kinases identified in the left panel. Query list is based on the upregulated differentially expressed genes at day 7 post YF-17D vaccination from Querec et al., Nature Immunology, 2009.

Protein kinases catalyze the transfer of a phosphate group from ATP to the threonine, serine, or tyrosine residues of other proteins. The addition of a phosphate group can influence the substrate protein's activity, stability, localization, and interactions with other molecules. While kinases can be readily targeted by drugs, characterizing the cell kinome is not easy, as intracellular staining with phospho-specific antibodies is required.

A plausible solution is to leverage transcriptomic data to down-select kinase candidates for further validation. Here, I introduce Kinase Enrichment Analysis 3 (KEA3), developed by MV Kuleshov et al., a web server application that predicts upstream kinases based on a list of gene or protein queries.

The KEA3 background database contains measured and predicted kinase-substrate interactions (KSIs), kinase-protein interactions (KPIs), and interactions supported by co-expression and co-occurrence data. By integrating KSIs and KPIs across data sources, KEA3 produces a composite ranking that improves the recovery of the expected kinases. In addition, the relationships between the top predicted kinases are displayed in an interactive map.
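To make the idea of a composite ranking concrete, here is a minimal sketch of one simple way to integrate rankings: averaging each kinase's rank across libraries. This illustrates the principle only and is not KEA3's own code. The library names and the rankings are made up, and skipping kinases that are absent from a library is an assumption of this sketch.

```python
from statistics import mean


def mean_rank(rankings):
    """Combine per-library kinase rankings into one composite ranking.

    rankings: dict mapping a library name to a list of kinases ordered
    from most to least enriched. A kinase missing from a library is
    skipped for that library (an assumption of this sketch).
    Returns (mean rank, kinase) tuples, best first.
    """
    ranks = {}
    for ordered in rankings.values():
        for position, kinase in enumerate(ordered, start=1):
            ranks.setdefault(kinase, []).append(position)
    # Lower mean rank = stronger overall support; ties are broken by name.
    return sorted((mean(r), kinase) for kinase, r in ranks.items())


# Hypothetical rankings from three made-up libraries.
example = {
    "library_A": ["JAK1", "EIF2AK2", "JAK2"],
    "library_B": ["EIF2AK2", "JAK1", "JAK2"],
    "library_C": ["JAK1", "JAK2", "EIF2AK2"],
}
composite = [kinase for _, kinase in mean_rank(example)]
print(composite)
```

A kinase that ranks well in several independent libraries rises to the top, even if no single library places it first every time.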

I tested the ability of KEA3 to identify the kinases that may be involved in the host transcriptomic responses to the YF-17D vaccine, using data published by Querec et al., Nature Immunology, 2009. Taking the upregulated differentially expressed genes at day 7 post YF-17D administration as the query list, the top 10 kinase hits are displayed in the figure at the top of this post. Notably, these kinases appear to be highly interconnected, and the predicted EIF2AK2-JAK1-JAK2-TYK axis suggests that these kinases are involved in triggering type-I interferon responses. This finding is consistent with previous studies showing that YF-17D induces strong interferon and antiviral responses.
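For readers who want to prepare a similar query list, the sketch below filters a table of differential expression results down to the upregulated genes. The cutoffs (log2 fold change of at least 1, adjusted p-value of at most 0.05), the field names, and the gene names are all placeholders for illustration, not the values used for the figure above. It also assumes the query is entered as one gene symbol per line.

```python
def upregulated_genes(results, min_log2fc=1.0, max_adj_p=0.05):
    """Return the genes that pass both (hypothetical) cutoffs, in input order."""
    return [
        row["gene"]
        for row in results
        if row["log2fc"] >= min_log2fc and row["adj_p"] <= max_adj_p
    ]


# Made-up differential expression results for four placeholder genes.
results = [
    {"gene": "GENE_A", "log2fc": 2.1, "adj_p": 0.001},   # up and significant
    {"gene": "GENE_B", "log2fc": -1.8, "adj_p": 0.002},  # downregulated
    {"gene": "GENE_C", "log2fc": 1.4, "adj_p": 0.200},   # not significant
    {"gene": "GENE_D", "log2fc": 1.2, "adj_p": 0.030},   # up and significant
]

query_list = upregulated_genes(results)
# One symbol per line, ready to paste into a query box.
print("\n".join(query_list))
```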

Overall, KEA3 is a user-friendly tool that allows users to quickly predict the upstream kinases involved, based on a list of proteins or genes. While experimental validation will be needed to confirm the involvement of these predicted kinases, the tool provides an informed prediction of the kinases involved that can guide future studies.


Genoppi: A new tool to plot volcano plots with annotated functions

Volcano plot plotted using Genoppi. Data is from Vahey et al., J Infect Dis., 2010, demonstrating the types of proteins that are most significantly induced in protected individuals receiving the RTS,S malaria vaccine. Functions are annotated from HGNC, ranked from greatest to lowest number of hits.

As elaborated in my previous post, volcano plots are a great way of visualising the differentially expressed genes (DEGs) that are most affected by a particular treatment (compared to untreated control). However, a limitation is that volcano plots do not directly show whether the most influential genes perform similar functions. Here, I introduce Genoppi, an open-source computational tool (with R package source code available) that allows plotting of interactive volcano plots with the corresponding gene functions derived from HGNC, GO or MSigDB (see example figure on top). In addition, by assigning a bait protein of interest, the tool is able to identify the interacting partners that are significant on the volcano plot. The interaction partners are compiled from InWeb_InBioMap, iRefIndex, or BioPlex, which together include data from >40,000 scientific articles. If you are interested in finding out whether SNPs (single-nucleotide polymorphisms) could play a role in your dataset, you may also assign a GWAS study from the NHGRI-EBI GWAS catalog to the dataset.
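As a reminder of what a volcano plot actually encodes, the sketch below computes the coordinates of one point and labels it. This is a generic illustration and not Genoppi's code: the function name and the cutoffs (absolute log2 fold change of at least 1, p-value below 0.05) are assumptions chosen for the example.

```python
import math


def volcano_point(log2fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Return (x, y, label) for one gene or protein on a volcano plot.

    x is the log2 fold change and y is -log10(p-value), so the most
    significant and most strongly changed hits sit in the top corners.
    """
    y = -math.log10(p_value)
    if p_value < p_cutoff and log2fc >= fc_cutoff:
        label = "up"
    elif p_value < p_cutoff and log2fc <= -fc_cutoff:
        label = "down"
    else:
        label = "ns"  # not significant under these cutoffs
    return log2fc, y, label


# Three made-up measurements.
for fc, p in [(2.0, 0.001), (-1.5, 0.01), (0.3, 0.5)]:
    print(volcano_point(fc, p))
```

Tools such as Genoppi then layer functional annotations on top of these coordinates, which is the part a plain volcano plot lacks.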

Genoppi offers several other functions that are not covered in this blog post. Interested users may refer to the publication or to the guide provided on the website for details. Overall, while Genoppi is likely to be most utilised by scientists interested in proteomics and interactome research, my experience suggests that it can also be applied more broadly to transcriptomics analyses.


Clustergrammer: A great online tool for plotting clustergrams and heatmaps

Clustergrammer is an online tool that can be used to visualise gene expression patterns. Red indicates increased expression whereas blue indicates reduced expression. Distinct gene clusters are depicted at the bottom of the heatmap. Source: Fernandez et al., Scientific Data, 2017.

A clustergram or a heatmap is one of several techniques that can directly visualise data without the need for dimensionality reduction. As clustergrams are easy to interpret, they are widely used to visualise biological data in print publications. Based on similarities and differences in gene expression patterns, clustergrams can also allow direct visualisation of clusters.

In this entry, I will introduce Clustergrammer, which is a user-friendly webtool for plotting clustergrams. The loading of the data into Clustergrammer can be summarised in 3 basic steps:

  1. Normalise the gene expression data by performing a Z-score transformation. This ensures that the mean of each gene is centred at 0, with a standard deviation of 1.
  2. Make sure that the samples are arranged in columns and the genes are arranged in rows. I recommend ordering the samples in the order in which you want your data to be published (e.g. controls on the extreme left and the other samples on the right), as proper ordering of the variables allows Clustergrammer to perform supervised clustering. Finally, if you have multiple conditions, you may assign the clusters beforehand by inserting additional rows at the top. You may also consider adding additional columns on the left to group genes that perform similar functions (see detailed instructions on the website).
  3. Save the file in .txt format and upload it to Clustergrammer.
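The normalisation and export steps above can be sketched as follows. This is a minimal example on made-up data, not an official Clustergrammer script. It assumes a tab-separated layout with genes in rows and samples in columns, uses the population standard deviation (the sample standard deviation is an equally common choice), and sets genes with no variation to 0. The gene and sample names are placeholders.

```python
import csv
import io
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Z-score each gene (row) across its samples."""
    scaled = {}
    for gene, values in matrix.items():
        mu, sigma = mean(values), pstdev(values)
        # A gene with zero variance cannot be scaled; report it as all zeros.
        scaled[gene] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return scaled


def to_tsv(samples, scaled):
    """Render the matrix as tab-separated text: samples in columns, genes in rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + samples)  # empty top-left cell above the gene names
    for gene, values in scaled.items():
        writer.writerow([gene] + [f"{v:.3f}" for v in values])
    return buffer.getvalue()


# Made-up expression values, controls first and treated samples after.
samples = ["control_1", "control_2", "treated_1", "treated_2"]
expression = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0],
    "GENE_B": [5.0, 5.0, 5.0, 5.0],
}

scaled = zscore_rows(expression)
tsv_text = to_tsv(samples, scaled)
print(tsv_text)
```

Writing `tsv_text` to a file ending in .txt gives a table in the layout described in step 2, without the optional category rows and columns.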

By default, Clustergrammer performs unsupervised clustering on both rows and columns, and the clusters are marked by the small arrowheads at the bottom and right of the heatmap. A single click on an arrowhead reveals the genes within the cluster, allowing you to query their functions directly in Enrichr. A double click zooms into the part of the heatmap covered by the cluster. To further examine the expression levels at the individual level, you can move your mouse cursor within the heatmap and use the mouse scroll to zoom in or out.

For supervised clustering, you can use the sidebar at the top left-hand side of the website to arrange the rows and columns according to the sample order originally assigned. If you have pre-assigned your clusters by adding additional rows, you may instead click on the category you have defined.

Finally, to determine the relatedness between the different conditions, Clustergrammer also plots the co-expression matrix. The applications of Clustergrammer are not limited to gene expression studies, but extend to proteomics, metabolomics, virus-host interactions and CyTOF analyses. The ease of use, the interactive interface and the ability to directly visualise gene expression patterns make Clustergrammer my top choice for analysing omics datasets.
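To give a feel for what a matrix of relatedness between conditions contains, the sketch below computes pairwise Pearson correlations between sample columns. Pearson correlation is my assumption for illustration; the similarity measure Clustergrammer actually uses may differ, and the sample names and values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of values."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def similarity_matrix(columns):
    """Return a nested dict holding the correlation for every pair of samples."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up expression values for three genes in three samples.
columns = {
    "control": [1.0, 2.0, 3.0],
    "treated_a": [2.0, 4.0, 6.5],
    "treated_b": [3.0, 2.0, 1.0],
}

matrix = similarity_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Samples with similar expression profiles score close to 1, while samples with opposite profiles score close to -1.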