Posted in Data visualisation, python, Venn diagram

Using python codes to study unique or shared elements in Venn diagrams

As elaborated in my previous post, Venn diagrams are useful for identifying unique and shared genes between different treatment conditions. While Venny remains to be one of my favourite tools for plotting Venn diagrams, inputting lists of genes can be cumbersome when the gene lists are long (e.g. >1,000 genes) or when gene lists are provided as .csv or .txt files.

Here, I introduce how you can use python to identify the unique and overlapping genes. The operations, & for intersection, and | for union can be used. An example is shown below:

# Assume each alphabet as individual genes
# Treatment A and treatment B serve as variables

Treatment_A = {"A", "B", "C", "D"}
Treatment_B = {"B", "C", "D", "E", "F"}

Total = list(Treatment_A | Treatment_B)
print(sorted(Total))

Intersection = list(Treatment_A & Treatment_B)
print(sorted(Intersection))

Treatment_A_no_B = list(Treatment_A - (Treatment_A & Treatment_B))
print(sorted(Treatment_A_no_B))

Treatment_B_no_A = list(Treatment_B - Treatment_A & Treatment_B)
print(sorted(Treatment_B_no_A))
# Output from above commands are as follows:
['A', 'B', 'C', 'D', 'E', 'F']
['B', 'C', 'D']
['A']
['E', 'F']

Based on the counts, we can then plot a proportionate Venn diagram using the following code:

from matplotlib_venn import venn2
from matplotlib import pyplot as plt

venn2(subsets = (1, 2, 3), set_labels = ('Treatment A', 'Treatment B'))
plt.title("Treatment A vs Treatment B")
plt.show()

Edited on 5 July 2021, with contribution from Gabrielle 🙂 You can also plot a Venn diagram with 3 different comparisons as well using the command below:

import matplotlib.pyplot as plt
from matplotlib_venn import venn3

set1 = set(["A", "B", "C", "D"])
set2 = set(["B", "C", "D", "E", "F"])
set3 = set(["C", "D", "E","G","H","J"])

venn3([set1, set2, set3], ('Treatment_A', 'Treatment_B', 'Treatment_C'))

plt.show()
# To obtain the elements in the shared and unique treatments
Treatment_A = {"A", "B", "C", "D"}
Treatment_B = {"B", "C", "D", "E", "F"}
Treatment_C = {"C", "D", "E","G","H","J"}

Total = list(Treatment_A | Treatment_B | Treatment_C)
print(sorted(Total))

Intersect_all = Treatment_A & Treatment_B & Treatment_C
print(sorted(Intersect_all))

Treatment_A_no_BC = list(Treatment_A - (Treatment_B | Treatment_C))
print(sorted(Treatment_A_no_BC))

Treatment_B_no_AC = list(Treatment_B - (Treatment_A | Treatment_C))
print(sorted(Treatment_B_no_AC))

Treatment_C_no_AB = list(Treatment_C - (Treatment_A | Treatment_B))
print(sorted(Treatment_C_no_AB))
# Output file
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J']
['C', 'D']
['A']
['F']
['G', 'H', 'J']

To use these commands for your application, simply replace the alphabets to your gene lists. Happy coding! 