Posted in Data visualisation, python, Venn diagram

Using Python code to study unique or shared elements in Venn diagrams

As elaborated in my previous post, Venn diagrams are useful for identifying unique and shared genes between different treatment conditions. While Venny remains one of my favourite tools for plotting Venn diagrams, inputting lists of genes can be cumbersome when the gene lists are long (e.g. >1,000 genes) or when they are provided as .csv or .txt files.

Here, I show how you can use Python sets to identify the unique and overlapping genes. The set operators & (intersection), | (union) and - (difference) do all the work. An example is shown below:

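# Treat each letter as an individual gene
# Treatment_A and Treatment_B hold the gene set for each treatment

Treatment_A = {"A", "B", "C", "D"}
Treatment_B = {"B", "C", "D", "E", "F"}

# sorted() returns a list in alphabetical order (sets themselves are unordered)
Total = sorted(Treatment_A | Treatment_B)             # genes in A or B (union)

Intersection = sorted(Treatment_A & Treatment_B)      # genes in both A and B

Treatment_A_no_B = sorted(Treatment_A - Treatment_B)  # genes unique to A

Treatment_B_no_A = sorted(Treatment_B - Treatment_A)  # genes unique to B

print(Total, Intersection, Treatment_A_no_B, Treatment_B_no_A, sep="\n")
# Output from the above commands is as follows:
['A', 'B', 'C', 'D', 'E', 'F']
['B', 'C', 'D']
['A']
['E', 'F']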

Based on these counts (1 gene unique to A, 2 genes unique to B and 3 shared genes), we can then plot an area-proportional Venn diagram using the following code:

from matplotlib_venn import venn2
from matplotlib import pyplot as plt

# subsets = (unique to A, unique to B, shared by A and B)
venn2(subsets = (1, 2, 3), set_labels = ('Treatment A', 'Treatment B'))
plt.title("Treatment A vs Treatment B")
plt.show()
Output file after command
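
The tuple (1, 2, 3) passed to venn2 follows the order (unique to A, unique to B, shared). Rather than counting by hand, the counts can be derived from the sets themselves. The snippet below is a minimal sketch of that idea; the helper name venn2_counts is my own and is not part of matplotlib_venn. As far as I know, venn2 also accepts a list of two sets directly, which avoids the counting step altogether.

```python
def venn2_counts(set_a, set_b):
    """Return region sizes as (unique to A, unique to B, shared by A and B)."""
    return (len(set_a - set_b), len(set_b - set_a), len(set_a & set_b))


Treatment_A = {"A", "B", "C", "D"}
Treatment_B = {"B", "C", "D", "E", "F"}

counts = venn2_counts(Treatment_A, Treatment_B)
print(counts)  # (1, 2, 3)
```

The resulting tuple can be passed as the subsets argument in the venn2 call above.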

Edited on 5 July 2021, with a contribution from Gabrielle 🙂 You can also plot a Venn diagram comparing three treatments using the code below:

import matplotlib.pyplot as plt
from matplotlib_venn import venn3

# One set of genes per treatment
set1 = {"A", "B", "C", "D"}
set2 = {"B", "C", "D", "E", "F"}
set3 = {"C", "D", "E", "G", "H", "J"}

venn3([set1, set2, set3], set_labels = ('Treatment_A', 'Treatment_B', 'Treatment_C'))
plt.show()
Output file after the command
# To obtain the elements shared by all treatments and unique to each treatment
Treatment_A = {"A", "B", "C", "D"}
Treatment_B = {"B", "C", "D", "E", "F"}
Treatment_C = {"C", "D", "E", "G", "H", "J"}

Total = sorted(Treatment_A | Treatment_B | Treatment_C)

Intersect_all = sorted(Treatment_A & Treatment_B & Treatment_C)

Treatment_A_no_BC = sorted(Treatment_A - (Treatment_B | Treatment_C))

Treatment_B_no_AC = sorted(Treatment_B - (Treatment_A | Treatment_C))

Treatment_C_no_AB = sorted(Treatment_C - (Treatment_A | Treatment_B))

print(Total, Intersect_all, Treatment_A_no_BC, Treatment_B_no_AC, Treatment_C_no_AB, sep="\n")
# Output from the above commands is as follows:
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J']
['C', 'D']
['A']
['F']
['G', 'H', 'J']
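
The commands above cover the centre of the diagram and the three outer segments, but a three-way Venn diagram has seven segments in total, including genes shared by exactly two treatments. The following is a minimal sketch that returns all seven; the function name venn3_regions and the dictionary keys are my own choices for illustration.

```python
def venn3_regions(a, b, c):
    """Split three sets into the seven segments of a three-way Venn diagram."""
    return {
        "A only": a - b - c,
        "B only": b - a - c,
        "C only": c - a - b,
        "A and B only": (a & b) - c,
        "A and C only": (a & c) - b,
        "B and C only": (b & c) - a,
        "A, B and C": a & b & c,
    }


Treatment_A = {"A", "B", "C", "D"}
Treatment_B = {"B", "C", "D", "E", "F"}
Treatment_C = {"C", "D", "E", "G", "H", "J"}

regions = venn3_regions(Treatment_A, Treatment_B, Treatment_C)
for name, genes in regions.items():
    print(f"{name}: {sorted(genes)}")
```

Because the seven segments do not overlap, their sizes should add up to the size of the union (9 here), which is a quick sanity check on real gene lists.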

To use these commands for your own application, simply replace the letters with your gene lists. Happy coding!
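
If your gene lists live in .txt or .csv files, they can be read straight into sets. The sketch below assumes one gene per row, no header row, and the gene symbol in the first column; the helper read_gene_set, the file names and the numeric values in the demo files are all made up for illustration. The demo writes two small temporary files so that it runs on its own.

```python
import csv
import tempfile
from pathlib import Path


def read_gene_set(path, column=0):
    """Read one gene per row from a .txt or .csv file and return a set.

    Blank rows are skipped and surrounding whitespace is stripped.
    `column` is the index of the CSV column holding the gene symbol.
    """
    genes = set()
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) > column and row[column].strip():
                genes.add(row[column].strip())
    return genes


# Demo: throwaway files standing in for real gene lists (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    file_a = Path(tmp) / "treatment_A.txt"  # hypothetical file name
    file_b = Path(tmp) / "treatment_B.csv"  # hypothetical file name
    file_a.write_text("IFIT1\nISG15\nMX1\n\n")
    file_b.write_text("ISG15,2.4\nMX1,1.9\nOAS1,3.1\n")

    treatment_a = read_gene_set(file_a)
    treatment_b = read_gene_set(file_b)

shared = sorted(treatment_a & treatment_b)
only_a = sorted(treatment_a - treatment_b)
only_b = sorted(treatment_b - treatment_a)
print(shared)  # ['ISG15', 'MX1']
print(only_a)  # ['IFIT1']
print(only_b)  # ['OAS1']
```

If your file has a header row, skip the first row before adding genes, or the header text will be treated as a gene.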

Posted in Data visualisation, Venn diagram

Venn diagrams: A basic method for comparing variables

Venn diagram showing the number of upregulated DEGs under ADE, HI-ADE and DENV conditions. Source: Chan et al., mSphere, 2019

Venn diagrams are a fun and easy way to visualise the unique and overlapping differentially expressed genes between conditions in a dataset. In my experience, Venny is the most useful tool for this. By copying and pasting gene lists from various comparisons, Venny allows quick identification of genes that are either shared or unique between comparisons.

Another way to draw Venn diagrams is to use the Venn diagram plotter, which allows drawing of correctly proportioned Venn diagrams. The size of the Venn diagram plotted will be proportional to the size of the gene list, allowing a visual representation of the extent of overlap between comparisons (see above for example).

Lastly, it is important to note that Venn diagrams do not rank genes, so I would recommend performing pathway enrichment analysis on the different segments of the Venn diagram to prioritise the genes of interest. Alternatively, you may consider Venn-diaNet for such analysis.