Streamlit + DuckDB Tutorial
The beautiful thing about learning is that nobody can take it away from you.
B.B King
Motivation
In this era of Everything Web, we are basically compelled to build web interfaces for our various application to ease the access, hence, bring value.
As a Python developer , we have a lot modules & framework that helps us build web interfaces.
This is where Streamlit comes in, with it’s pythonic way of creating and web applications, it's the go to solution to build fast and with great ease web application.
We’ll use duckDB for it’s built-in OLAP capabilities and easy usage (the databases is a file, no complex infrastructure needed).
In this tutorial, you’ll learn how to build web apps using Streamlit and using duckDB as your database.
Table of content
What is Streamlit ?
Streamlit turns data scripts into shareable web apps in minutes.
All in pure Python. No front‑end experience required.
With Streamlit, we can easily create using python ONLY dynamic web application. It’s ideal for data apps as it natively has built-in function to present data in different ways (tables, charts, etc.).
You can read more here
What is duckDB ?
DuckDB is an in-process SQL OLAP database management system.
It is serverless comes with great features like (Direct Parquet & CSV querying, Vectorized Engine, Parallel query Processing). It is an alternative to SQLite purposely built for Analytics workload.
You can read more here.
Code
The full code can be access below.
https://github.com/mikekenneth/streamlit_duckdb
Data Generator
The below generates a 1000
rows data and loads to orders
table in duck.db
when the button
is clicked:
from generate_data import generate_dataset_orders, load_file
button = st.button(label="Generate or Refresh Data")
if button:
db = "duck.db"
destination_table_name = "orders"
filename = "orders.csv"
generate_dataset_orders(filename=filename, num_rows=1000)
load_file(db=db, infile_path=filename, table_name=destination_table_name)
Screenshot
I wrote a little helper to easily generate the orders data and load into duckDB. The code can be accessed here.
Extract data from DuckDB
I created a function execute_query
to easily get the data using only the desired SQL query. It also allows specifying the type of output needed. You can read more in the official documentation.
def execute_query(query: str, return_type: str = "df"):
with duckdb.connect("duck.db", read_only=True) as con:
if return_type == "df":
return con.execute(query).df()
elif return_type == "arrow":
return con.execute(query).arrow()
elif return_type == "list":
return con.execute(query).fetchall()
destination_table_name = "orders"
data = execute_query(f"select * from {destination_table_name}", return_type="df")
Display a sample of the data
With Streamlit we can easily display a Dataframe
like a table. You can read more here.
st.write("## Sample")
st.dataframe(data.head(10), height=300)
Screenshot
Visualization
Streamlit allow us to easily display charts. We will use it to display some bar charts. You can read more here.
st.write("## Visualization")
option = st.selectbox("Select a dimension", ["product_name", "customer_name", "status"], key="option")
if option:
st.write(f"### Bar Chart: {option} x Quantity")
st.bar_chart(data, x=option, y="quantity")
st.write(f"### Bar Chart: {option} x Amount")
st.bar_chart(data, x=option, y="amount")
st.write(f"### Bar Chart: {option} x Count")
st.bar_chart(data[option].value_counts())
Screenshot
Filtering Data
We can also filter out data directly from the web using the below. You can read more here.
st.write("## Filters (by Products Name)")
products_list = [
row[0]
for row in execute_query(
f"select distinct(product_name) from {destination_table_name}", db=db, return_type="list"
)
]
product_filter = st.selectbox(label="Select a Product", options=products_list, key="product_filter")
if product_filter != "--":
result = execute_query(
f"select * from {destination_table_name} where product_name = '{product_filter}'",
db=db,
return_type="df",
)
st.dataframe(result, height=400)
Screenshot
Download / Exporting Data
We can export data to csv using the Streamlit st.download_button
function.
@st.experimental_memo # An optimization wrapper to memoize the result of the function
def export_df(df):
return df.to_csv(index=False).encode("utf-8")
st.write("### Download Data")
st.download_button(
label="Press to Download",
data=export_df(result),
file_name=f"orders - product='{product_filter}'.csv",
mime="text/csv",
key="download-csv",
)
Screenshot
That is a wrap. I hope this helps you. 🙂
About Me
I am a Data Engineer with 2+ years of experience and more years as a Software Engineer (5+ years). I enjoy learning and teaching (mostly learning 😎).
Top comments (0)