As described in my previous post, a list of differentially expressed genes (DEGs) on its own does not show the magnitude of the gene expression changes. In contrast, a volcano plot, which is a scatterplot of -log10(adjusted p-value) against log2(fold change), shows both how the DEGs are distributed and which genes are most strongly differentially expressed. Genes with the largest fold changes and significant adjusted p-values (e.g. adjusted p < 0.05) are also ideal targets for validation.
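To make the two axes concrete, here is a minimal Python sketch (Python is my choice for illustration; the gene names and values are made up) that turns a gene's log2 fold change and adjusted p-value into volcano plot coordinates. It also shortlists validation candidates: genes with adjusted p < 0.05, ranked by the absolute size of their log2 fold change. The function names are hypothetical helpers, not part of any package.

```python
import math

def volcano_coordinates(log2_fc, adj_p):
    """Return the (x, y) position of one gene on a volcano plot."""
    # x-axis: log2 fold change; y-axis: -log10(adjusted p-value)
    return log2_fc, -math.log10(adj_p)

def validation_candidates(genes, alpha=0.05, top_n=3):
    """Shortlist genes for validation (hypothetical helper).

    genes maps a gene name to (log2_fc, adj_p). Only genes with
    adj_p < alpha are kept; they are ranked by |log2_fc|, largest first.
    """
    significant = [(name, fc) for name, (fc, p) in genes.items() if p < alpha]
    significant.sort(key=lambda g: abs(g[1]), reverse=True)
    return [name for name, _ in significant[:top_n]]

# Made-up example values, for illustration only
genes = {
    "GENE_A": (3.2, 0.001),
    "GENE_B": (-4.1, 0.02),
    "GENE_C": (5.0, 0.30),    # large change, but not significant
    "GENE_D": (0.4, 0.0001),  # very significant, but a small change
}
print(volcano_coordinates(*genes["GENE_A"]))
print(validation_candidates(genes, top_n=2))  # ['GENE_B', 'GENE_A']
```

GENE_C is excluded despite having the largest fold change, and GENE_D ranks last despite having the smallest p-value. This is why both axes matter when picking targets.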
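Constructing a volcano plot involves two steps. First, the fold change is calculated as the ratio of a gene's abundance in the treatment group to that in the control group, and then log2-transformed. The log2 transformation makes increases and decreases symmetric around zero (a 2-fold increase gives +1 and a 2-fold decrease gives -1) and brings the values closer to a normal distribution. Genes with values > 0 are considered upregulated, whereas those with values < 0 are downregulated. Second, a p-value is calculated to test whether the gene's expression differs significantly between the treatment and control groups. This p-value is then adjusted for multiple testing (for example, with the Benjamini-Hochberg procedure), because thousands of genes are tested at once. Finally, the adjusted p-value is -log10-transformed, so that smaller (more significant) p-values appear higher on the plot; for example, an adjusted p-value of 0.05 becomes about 1.3, and 0.001 becomes 3.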
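The two steps can be sketched in Python as follows. This is a toy illustration under stated assumptions. It takes per-gene mean abundances and raw p-values as given; in practice, tools such as DESeq2, edgeR or limma estimate fold changes and p-values with their own statistical models. It also uses the Benjamini-Hochberg procedure as one common choice of multiple-testing correction. The function names, the optional pseudocount and the data are all hypothetical.

```python
import math

def log2_fold_change(treatment_mean, control_mean, pseudocount=0.0):
    """log2 of the treatment/control abundance ratio.

    A small pseudocount (an assumption, off by default) can be added to
    avoid dividing by zero or taking log(0) for genes that are not detected.
    """
    return math.log2((treatment_mean + pseudocount) / (control_mean + pseudocount))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up group means and raw p-values for four genes
treatment = [16.0, 4.0, 10.0, 8.0]
control = [4.0, 16.0, 10.0, 4.0]
raw_p = [0.01, 0.04, 0.03, 0.20]

log2_fc = [log2_fold_change(t, c) for t, c in zip(treatment, control)]
adj_p = benjamini_hochberg(raw_p)
neg_log10_p = [-math.log10(p) for p in adj_p]

print(log2_fc)  # [2.0, -2.0, 0.0, 1.0]
print(adj_p)
print(neg_log10_p)
```

The second gene has a raw p-value of 0.04, but its adjusted p-value is about 0.053, so it no longer passes a 0.05 cut-off. This is the effect the multiple-testing correction is meant to have when thousands of genes are tested.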
Volcano plots can be made in Excel or in more specialised biostatistics tools such as GraphPad Prism. However, manually annotating the genes with the largest fold changes and smallest p-values can be laborious. R scripts using the EnhancedVolcano package can automate this. Alternatively, I recommend VolcaNoseR, as it makes creating and labelling volcano plots easier. VolcaNoseR is a user-friendly, open-source web app that lets you quickly change the fold change or p-value thresholds and quickly annotate the genes with the greatest fold changes and smallest p-values. I have personally plotted volcano plots of 30,000 genes in VolcaNoseR without experiencing serious lag. However, one disadvantage of this tool is that gene labels may overlap if the gene names are too long.
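Changing the thresholds in a tool like VolcaNoseR essentially re-classifies every gene. The Python sketch below mimics that idea; it is not VolcaNoseR's or EnhancedVolcano's actual code. A gene is called up- or downregulated when its adjusted p-value is below `p_threshold` and its absolute log2 fold change is at least `fc_threshold` (1.0 on the log2 scale corresponds to a 2-fold change). The most significant hits are then picked for labelling. The threshold defaults, helper names and gene values are assumptions for illustration.

```python
def classify(log2_fc, adj_p, fc_threshold=1.0, p_threshold=0.05):
    """Call a gene 'up', 'down' or 'not significant' (hypothetical helper)."""
    if adj_p < p_threshold and log2_fc >= fc_threshold:
        return "up"
    if adj_p < p_threshold and log2_fc <= -fc_threshold:
        return "down"
    return "not significant"

def genes_to_label(genes, fc_threshold=1.0, p_threshold=0.05, top_n=5):
    """Pick up to top_n significant genes to label, smallest adjusted p first."""
    hits = [
        (name, p)
        for name, (fc, p) in genes.items()
        if classify(fc, p, fc_threshold, p_threshold) != "not significant"
    ]
    hits.sort(key=lambda g: g[1])
    return [name for name, _ in hits[:top_n]]

# Made-up values: name -> (log2 fold change, adjusted p-value)
genes = {
    "GENE_A": (3.2, 0.001),
    "GENE_B": (-4.1, 0.02),
    "GENE_C": (5.0, 0.30),
    "GENE_D": (0.4, 0.0001),
}
print(genes_to_label(genes))                     # ['GENE_A', 'GENE_B']
print(genes_to_label(genes, fc_threshold=0.25))  # ['GENE_D', 'GENE_A', 'GENE_B']
```

Lowering `fc_threshold` from 1.0 to 0.25 brings GENE_D into the labelled set. This kind of quick what-if is exactly what is tedious to redo by hand in Excel.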