Building interactive dashboards with Streamlit (Part I)

Iris flower dataset used to illustrate how we can use Streamlit to build data dashboards

Python is often used for back-end programming, which supplies the functionality behind web applications. For instance, Python can connect a website to a database so that users can query information. With these back-end components, a static webpage becomes a dynamic web application in which users interact with up-to-date data from a connected database. If enough data is available, developers can even use Python's machine learning tools to make predictions. Front-end development, the part that makes webpages attractive and interactive, is usually done with other languages such as HTML, CSS and JavaScript. So the question remains: do I need to become a full-stack developer and master both front-end and back-end web development? That can be time-consuming, since it means learning multiple languages.

This is where Streamlit, Django and Flask come to the rescue! These frameworks are built on Python and allow websites to be built with minimal knowledge of CSS and JavaScript. Among them, I personally find Flask and Streamlit easier to learn and implement. With more experience, however, I decided to focus on Streamlit, as its syntax is easier to understand and deployment is faster.

Rather than going through Streamlit commands one by one, I will illustrate what the different commands do using my favourite iris dataset, which is readily available in Plotly Express. The GitHub repository is publicly available, and the output of the code will be hosted here.

First, we will create a file called iris.py using Sublime Text or Visual Studio. In this blog entry, we will focus on obtaining basic statistics for individual columns and presenting them in a data dashboard for project showcasing. As with any Python script, we first import the required packages:

import streamlit as st
import numpy as np
import pandas as pd
import plotly.express as px
from wordcloud import WordCloud
from typing import Any, List, Tuple

After that, we add the commands needed to load the iris dataframe. The data is available in Plotly Express and can be loaded into the Streamlit app with the following commands. A title for the dashboard is also included:

st.title('Data analysis of iris dataset from PlotlyExpress')
df = px.data.iris()
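
Before building any widgets, it helps to know which columns are numeric and which are not, since the two groups are summarised differently. The snippet below is a minimal sketch of that split using pandas alone. The small dataframe is a hypothetical stand-in that only mimics the column names of px.data.iris(), and the variable names numeric_cols and non_numeric_cols are illustrative:

```python
import pandas as pd

# Hypothetical stand-in with the same column names as px.data.iris()
df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "sepal_width": [3.5, 3.2, 3.3],
    "petal_length": [1.4, 4.7, 6.0],
    "petal_width": [0.2, 1.4, 2.5],
    "species": ["setosa", "versicolor", "virginica"],
    "species_id": [1, 2, 3],
})

# Columns holding integers or floats: candidates for mean, max, min, std
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Everything else (here, the species labels): candidates for counts
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(non_numeric_cols)
```

Note that species_id is stored as an integer, so it lands in the numeric group even though it is really a category label. Using exclude="number" for the second group is one way to pick up text columns without depending on how pandas stores them.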

Next, basic stats such as count, mean, standard deviation, min, max and quartiles can be displayed using df.describe(). However, an advantage of Streamlit is the ability to add widgets, which keeps the dashboard from looking cluttered. In this example, we will create widgets based on the column names so that users can view the basic stats of a specific column:

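# Pick a numeric column to summarise (selection step assumed here; label text is illustrative)
num_cols = df.select_dtypes(include="number").columns
column = st.selectbox("Select column for basic statistics", num_cols)
column = df[column]

# Show the five statistics side by side
col1, col2, col3, col4, col5 = st.columns(5)
col1.metric("Mean", column.mean())
col2.metric("Max", column.max())
col3.metric("Min", column.min())
col4.metric("Std", column.std())
col5.metric("Count", int(column.count()))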

The output consists of 5 columns, showing the mean, max, min, standard deviation and count of the selected column. The widget can be visualised here.
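
To check that the five metrics agree with df.describe(), the calculation can be tried outside Streamlit first. This is only a sketch: column_summary is a hypothetical helper name, and the values in sepal_length are stand-ins rather than the full iris column:

```python
import pandas as pd


def column_summary(column: pd.Series) -> dict:
    """Return the five statistics shown in the metric widgets for one numeric column."""
    return {
        "Mean": float(column.mean()),
        "Max": float(column.max()),
        "Min": float(column.min()),
        "Std": float(column.std()),    # sample standard deviation (ddof=1), the pandas default
        "Count": int(column.count()),  # number of non-missing values
    }


# Hypothetical stand-in values; in iris.py this would be a column of px.data.iris()
sepal_length = pd.Series([5.1, 4.9, 4.7, 7.0, 6.4, 6.3], name="sepal_length")
summary = column_summary(sepal_length)

# describe() reports the same quantities under the labels mean, max, min, std and count
described = sepal_length.describe()

for label, value in summary.items():
    print(label, value)
```

In the dashboard, each entry of such a dictionary could be passed to a metric widget, for example formatted as f"{value:.2f}" so that long decimals do not crowd the columns.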

These statistics are appropriate for numeric or continuous variables. To visualise categorical variables (in this case, the flower species), we can use a word cloud, or a table showing the exact counts for each species, using the following commands:

# Build wordcloud
non_num_cols = df.select_dtypes(include=object).columns
column = st.selectbox("Select column to generate wordcloud", non_num_cols)
column = df[column]
wc = WordCloud(max_font_size=25, background_color="white", repeat=True, height=500, width=800).generate(' '.join(column.unique()))
st.image(wc.to_image())

# Build table
st.markdown('***Counts in selected column for generating WordCloud***')
unique_values = column.value_counts()
st.write(unique_values)
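
Because the word cloud above is built from column.unique(), each species contributes exactly once, so word size does not reflect how often a value occurs. As a possible variation (not what the code above does), the counts from value_counts() can be turned into a label-to-count mapping. The species values below are a hypothetical, deliberately unbalanced stand-in:

```python
import pandas as pd

# Hypothetical stand-in for a categorical column (unbalanced on purpose)
species = pd.Series(
    ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"],
    name="species",
)

# value_counts() returns one entry per category, sorted by descending count
counts = species.value_counts()

# A plain {label: count} mapping, convenient for tables or frequency-based plots
frequencies = counts.to_dict()

print(counts.index[0])
print(frequencies)
```

A mapping like this can be passed to WordCloud's generate_from_frequencies method so that more frequent categories are drawn larger. For the iris data itself the three species are equally represented, so the difference would only show up on less balanced datasets.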

The aim of Part I is to provide various ways to inspect the distribution of the categorical and continuous variables in your dataset. I will cover other topics, including data visualisation, machine learning and interactive graphical plots, in subsequent posts!
