Posted in Data visualisation, python, Venn diagram

Using python codes to study unique or shared elements in Venn diagrams

As elaborated in my previous post, Venn diagrams are useful for identifying unique and shared genes between different treatment conditions. While Venny remains to be one of my favourite tools for plotting Venn diagrams, inputting lists of genes can be cumbersome when the gene lists are long (e.g. >1,000 genes) or when gene lists are provided as .csv or .txt files.

Here, I introduce how you can use python to identify the unique and overlapping genes. The operations, & for intersection, and | for union can be used. An example is shown below:

# Assume each alphabet as individual genes
# Treatment A and treatment B serve as variables

Treatment_A = {"A", "B", "C", "D"}
Treatment_B = {"B", "C", "D", "E", "F"}

Total = list(Treatment_A | Treatment_B)
print(sorted(Total))

Intersection = list(Treatment_A & Treatment_B)
print(sorted(Intersection))

Treatment_A_no_B = list(Treatment_A - (Treatment_A & Treatment_B))
print(sorted(Treatment_A_no_B))

Treatment_B_no_A = list(Treatment_B - Treatment_A & Treatment_B)
print(sorted(Treatment_B_no_A))
# Output from above commands are as follows:
['A', 'B', 'C', 'D', 'E', 'F']
['B', 'C', 'D']
['A']
['E', 'F']

Based on the counts, we can then plot a proportionate Venn diagram using the following code:

from matplotlib_venn import venn2 
from matplotlib import pyplot as plt

venn2(subsets = (1, 2, 3), set_labels = ('Treatment A', 'Treatment B'))
plt.title("Treatment A vs Treatment B")
plt.show()
Output file after plt.show() command

Edited on 5 July 2021, with contribution from Gabrielle 🙂 You can also plot a Venn diagram with 3 different comparisons as well using the command below:

import matplotlib.pyplot as plt
from matplotlib_venn import venn3

set1 = set(["A", "B", "C", "D"])
set2 = set(["B", "C", "D", "E", "F"])
set3 = set(["C", "D", "E","G","H","J"])

venn3([set1, set2, set3], ('Treatment_A', 'Treatment_B', 'Treatment_C'))

plt.show()
Output file after the plt.show() command
# To obtain the elements in the shared and unique treatments
Treatment_A = {"A", "B", "C", "D"}
Treatment_B = {"B", "C", "D", "E", "F"}
Treatment_C = {"C", "D", "E","G","H","J"}

Total = list(Treatment_A | Treatment_B | Treatment_C)
print(sorted(Total))

Intersect_all = Treatment_A & Treatment_B & Treatment_C
print(sorted(Intersect_all))

Treatment_A_no_BC = list(Treatment_A - (Treatment_B | Treatment_C))
print(sorted(Treatment_A_no_BC))

Treatment_B_no_AC = list(Treatment_B - (Treatment_A | Treatment_C))
print(sorted(Treatment_B_no_AC))

Treatment_C_no_AB = list(Treatment_C - (Treatment_A | Treatment_B))
print(sorted(Treatment_C_no_AB))
# Output file
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J']
['C', 'D']
['A']
['F']
['G', 'H', 'J']

To use these commands for your application, simply replace the alphabets to your gene lists. Happy coding!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s