Differentially expressed genes (DEGs) can be identified based on a pre-determined fold change and adjusted p-value cutoff. Some scientific questions can include:
- What are the functional roles of different DEGs and in what cellular processes do they participate?
- How are the DEGs regulated? Do they interact with each other to perform a common function?
- Are the identified DEGs also seen in other similar studies?
There are several tools that we can employ to answer some or all of these questions. In this entry, I describe the use of Enrichr, which is an integrative web-based and mobile software application with many gene-set libraries. Using a list of DEGs as data input, the Enrichr tool will query this list of DEGs against the multiple gene-set libraries, broadly categorised as: Transcription pathways, Pathways, Ontologies, Diseases/Drugs, Cell types and Miscellaneous. The Reactome database under the “Pathways” tab and the Gene Ontology Biological Pathways under the “Ontology” tab has been most useful to me, as my research interest is on host responses to virus or vaccines. However, if one is interested in epigenetics, ChIP-seq datasets from the Roadmap Epigenomics project is available to query against. The humanCyc database provides insights into the interactions of the DEGs with metabolic pathways, although I would also recommend Metaboanalyst to have a better understanding of the metabolic pathways involved.
After identifying the top pathway hits in the respective gene-set libraries, the next important feature to look out for is whether the pathways are significantly enriched. The statistical tests provided are :
- The Fisher exact test that calculates the p-value or adjusted p-value to determine if the pathway is statistically enriched.
- Z-score that calculates the deviation from the expected rank by the Fisher exact test
- Combined score that multiplies the log of the p-value computed with the Fisher exact test by the z-score.
By default, an adjusted p-value of <0.05 is considered to be significantly enriched. However, in some cases where the DEGs input list is small, it is common to see that the adjusted p-values will not reach statistical significance. In this case, Enrichr may not be the best tool for analysing small datasets, although it may still be used to provide some insights into the pathways that these genes are involved in. Alternatively, gene-set enrichment analysis (GSEA) could be more informative. In my experience, ranking of pathways by the combined score and filtering by adjusted p-value < 0.05 typically provides a better representation of the top key biological pathways involved. This is likely because the combined score considers both Z-score and p-value, both of which are important parameters to consider for calculating pathway enrichment.
Overall, Enrichr is a state-of-the-art gene set enrichment analysis, and I would highly recommend using Enrichr to understand the biological functions of your DEGs list. For Enrichr to be most informative, it is important to also note that the transcripts in your datasets are preferably not biased to any biological pathways. As such, Enrichr is most suitable for microarray and RNAseq datasets.