As elaborated in my previous post, volcano plots are a great way of visualising the differentially expressed genes (DEGs) that are most affected by a particular treatment (compared to untreated control). However, a limitation is that volcano plots do not directly show whether the most affected genes perform similar functions. Here, I introduce Genoppi, an open-source computational tool (R package source code: github.com/lagelab/Genoppi) that plots interactive volcano plots annotated with the corresponding gene functions derived from HGNC, GO or MSigDB (see example figure on top). In addition, by assigning a bait protein of interest, the tool can identify its interacting partners that are significant on the volcano plot. The interaction partners are compiled from InWeb_InBioMap, iRefIndex or BioPlex, drawing on data from >40,000 scientific articles. If you are interested in finding out whether SNPs (single-nucleotide polymorphisms) could play a role in your dataset, you may also assign a GWAS study from the NHGRI-EBI GWAS catalog to the dataset.
Genoppi has several other functions that are not covered in this blog post. Interested users can find the details in the publication or in the guide provided on the website. Overall, while Genoppi is likely to be used most by scientists interested in proteomics and interactome research, my experience suggests that it can potentially be applied more broadly to transcriptomics analyses as well.
As described in my previous post, a list of differentially expressed genes (DEGs) on its own does not provide information on the magnitude or extent of gene changes. In contrast, a volcano plot, which is a scatterplot of -log10(adjusted p-value) against log2(fold change), allows visualisation of the distribution of DEGs and of the DEGs that are most differentially expressed. The genes with the greatest fold changes and significant p-values (p < 0.05) are also ideal targets for validation.
Constructing a volcano plot is a two-step procedure. First, the fold change is determined by taking the ratio of the gene abundance in the treatment group to that in the control group, followed by a log2 transformation to obtain a normal or near-normal distribution. Values > 0 indicate upregulated genes, whereas values < 0 indicate downregulated genes. Second, an adjusted p-value, corrected for multiple testing, is used to assess whether the gene expression changes between the treatment and control groups are statistically significant. This is followed by a -log10 transformation to give -log10(adjusted p-value), so that the most significant genes appear highest on the plot.
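The two steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the post does not say which multiple-testing correction is used, so the Benjamini-Hochberg procedure is assumed here, and the function names, gene names and numbers are all hypothetical.

```python
import math


def log2_fold_change(treatment_abundance, control_abundance):
    """Step 1: ratio of treatment to control, then a log2 transformation.

    Assumes both abundances are strictly positive.
    """
    return math.log2(treatment_abundance / control_abundance)


def benjamini_hochberg(p_values):
    """Step 2a: adjust raw p-values for multiple testing.

    Assumption: Benjamini-Hochberg is used; other corrections exist.
    Returns the adjusted p-values in the same order as the input.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_coordinates(genes):
    """Return (name, log2 fold change, -log10 adjusted p) per gene.

    genes: list of (name, treatment_abundance, control_abundance, raw_p).
    Assumes no adjusted p-value is exactly zero.
    """
    adjusted = benjamini_hochberg([raw_p for _, _, _, raw_p in genes])
    return [
        (name, log2_fold_change(treated, control), -math.log10(adj_p))
        for (name, treated, control, _), adj_p in zip(genes, adjusted)
    ]


# Hypothetical example data, purely for illustration.
example_genes = [
    ("GENE_A", 8.0, 2.0, 0.01),
    ("GENE_B", 1.0, 4.0, 0.04),
    ("GENE_C", 3.0, 3.0, 0.03),
    ("GENE_D", 10.0, 5.0, 0.005),
]

for name, x, y in volcano_coordinates(example_genes):
    direction = "up" if x > 0 else "down" if x < 0 else "unchanged"
    print(f"{name}: log2FC={x:.2f}, -log10(adj p)={y:.2f} ({direction})")
```

Each resulting pair is one point on the plot, with log2(fold change) on the x-axis and -log10(adjusted p-value) on the y-axis. In practice, differential expression packages usually report both values directly, so this sketch is mainly useful for seeing where the axes come from.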
Volcano plots can be plotted using Excel or more specialised biostatistics tools such as GraphPad Prism. However, manually annotating the genes with the largest fold changes and most significant p-values can be laborious. R scripts using the EnhancedVolcano R package can be used. Alternatively, I recommend VolcaNoseR, as it makes creating and labelling volcano plots easier. VolcaNoseR is a user-friendly open-source web app that allows one to quickly change the fold change or p-value thresholds, and to quickly annotate the genes with the greatest fold changes and most significant p-values. I have personally plotted volcano plots of 30,000 genes using VolcaNoseR without experiencing serious lag issues. However, a disadvantage of this tool is that the gene annotations may overlap if the gene names are too long.
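For readers who script their own plots, the labelling step can be approximated as below. This is a rough sketch and not how VolcaNoseR or EnhancedVolcano actually choose labels: the function name, the log2 fold change threshold of 1 and the ranking by absolute fold change are my own assumptions, while the p < 0.05 cut-off is the one mentioned earlier.

```python
def genes_to_label(points, fc_threshold=1.0, p_threshold=0.05, top_n=10):
    """Pick the genes worth labelling on a volcano plot.

    points: list of (name, log2_fold_change, adjusted_p).
    A gene qualifies if |log2 fold change| >= fc_threshold and
    adjusted_p < p_threshold. Qualifying genes are ranked by absolute
    log2 fold change (an assumed ranking; other rankings are possible)
    and the first top_n names are returned.
    """
    significant = [
        point for point in points
        if abs(point[1]) >= fc_threshold and point[2] < p_threshold
    ]
    significant.sort(key=lambda point: abs(point[1]), reverse=True)
    return [name for name, _, _ in significant[:top_n]]


# Hypothetical example data, purely for illustration.
example_points = [
    ("GENE_A", 2.5, 0.001),
    ("GENE_B", -3.0, 0.01),
    ("GENE_C", 4.0, 0.2),     # large change but not significant
    ("GENE_D", 0.5, 0.0001),  # significant but small change
    ("GENE_E", 1.2, 0.04),
]

print(genes_to_label(example_points, top_n=2))
```

Capping the number of labels with `top_n` is also one simple way to reduce overlapping annotations when gene names are long.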