Harnessing Langchain and Streamlit for CSV Data Visualization
Large language models (LLMs) have seen significant improvements in their capabilities. These models are versatile and can be employed for various tasks like generating text, translating languages, and answering inquiries.
What is Langchain?
In a prior article, I provided an overview of Langchain and its functionality; see my previous posts for an in-depth explanation of how Langchain operates.
In this guide, I will demonstrate how to utilize Langchain and Streamlit to analyze CSV files. We will use the OpenAI API for access to GPT-3 and Streamlit for developing the user interface. This setup will allow users to upload CSV files and ask questions about the data, with the system generating responses and visualizing the information through tables and charts.
Before We Begin!
If this topic interests you and you would like to support my work: 1. Give my article 50 claps; your support matters! 2. Follow me on Medium and subscribe for my latest articles. 3. For those who prefer video tutorials, consider subscribing to my YouTube channel, where I convert many of my articles into visual formats.
Let’s dive into coding!
1. Set Up Your Environment
Begin by creating a virtual environment on your local machine. Open your terminal and run:
python -m venv venv
Activate it with:
venv\Scripts\activate (on Windows) or source venv/bin/activate (on macOS/Linux)
You should now see (venv) in your terminal prompt.
Next, install the necessary dependencies:
pip install langchain==0.0.146 python-dotenv==1.0.0 streamlit==1.22.0 openai==0.27.7 tabulate==0.9.0
Finally, set an environment variable for your OpenAI API key:
set OPENAI_API_KEY=<YOUR_API_KEY>
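(On macOS/Linux, the equivalent command is export OPENAI_API_KEY=<YOUR_API_KEY>.) Since the code below calls load_dotenv(), you can also keep the key in a .env file at the project root instead of setting it in the shell. A minimal sketch, with an obviously fake placeholder value:

# .env  (project root; picked up later by load_dotenv())
OPENAI_API_KEY=sk-your-key-here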
Where Can I Locate My OpenAI API Key?
You can create and copy an API key from the API keys page in your account settings on the OpenAI platform (platform.openai.com). Keep it secret and never commit it to version control.
Now that we are set up, let's proceed!
Create a file named Talk_with_CSV.py, where we will define the functions for handling queries.
First, import the required libraries:
from langchain import OpenAI
from langchain.agents import create_pandas_dataframe_agent
import pandas as pd
from dotenv import load_dotenv
import os
import json
import streamlit as st
Read the environment variable:
load_dotenv()
openai_key = os.getenv("OPENAI_API_KEY")
Setting Up the Agent
Creating the agent is straightforward with the create_pandas_dataframe_agent from Langchain. An agent is a component designed to interact with a collection of tools and make decisions based on user input. Agents can be categorized into two main types: “Action Agents” and “Plan-and-Execute Agents.”
Action Agents decide on actions to take and execute them step by step. They can pull and manage data from various sources, including databases, APIs, and, in this case, a CSV file.
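To make the idea concrete, here is a minimal sketch of an action agent over a tiny, hypothetical in-memory DataFrame (the data and the question are placeholders, not the tutorial's CSV):

import pandas as pd
from langchain import OpenAI
from langchain.agents import create_pandas_dataframe_agent

# Toy data, purely for illustration.
df = pd.DataFrame({"Products": ["A", "B", "C"], "Orders": [25, 24, 10]})
agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)
# verbose=True prints the agent's intermediate reasoning and pandas actions.
agent.run("Which product has the most orders?")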
Define the csv_tool Function
The csv_tool function takes the path of a CSV file and returns a Pandas DataFrame agent backed by a large language model (LLM). It creates an OpenAI LLM object, reads the CSV into a Pandas DataFrame, and then wraps both in a Pandas DataFrame agent.
def csv_tool(filename: str):
    # Read the CSV into a DataFrame and wrap it in a Pandas DataFrame agent.
    df = pd.read_csv(filename)
    return create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)
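A quick usage sketch (the file name sales.csv is a placeholder, not part of the tutorial):

agent = csv_tool("sales.csv")   # hypothetical CSV file
agent.run("How many rows are in the file?")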
Define the ask_agent Function
The ask_agent function is the central component of this process. It takes a pandas_dataframe_agent and a query as input, returning the agent's response as a string. This function prepares a prompt for the agent, detailing the desired response format. The aim is to have the agent provide a string that can be converted into a dictionary, facilitating the program to display a graph, table, or a simple text response.
def ask_agent(agent, query):
    """
    Query an agent and return the response as a string.

    Args:
        agent: The agent to query.
        query: The query to ask the agent.

    Returns:
        The response from the agent as a string.
    """
    # The prompt tells the agent exactly which JSON-like formats it may answer in.
    prompt = (
        """
        Let's decode how to respond to the queries. The responses depend on the type of information requested in the query.

        If the query requires a table, format your answer like this:
        {"table": {"columns": ["column1", "column2", ...], "data": [[value1, value2, ...], [value1, value2, ...], ...]}}

        For a bar chart, respond like this:
        {"bar": {"columns": ["A", "B", "C", ...], "data": [25, 24, 10, ...]}}

        If a line chart is more appropriate, your reply should look like this:
        {"line": {"columns": ["A", "B", "C", ...], "data": [25, 24, 10, ...]}}

        For a plain question that doesn't need a chart or table, your response should be:
        {"answer": "Your answer goes here"}

        If the answer is unknown, respond with:
        {"answer": "I do not know."}

        Return all output as a string. Remember to encase all strings in the "columns" list and data list in double quotes.

        Now, let's tackle the query step by step. Here's the query for you to work on:
        """
        + query
    )
    # Run the agent and make sure the result comes back as a plain string.
    response = agent.run(prompt)
    return str(response)
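A quick usage sketch of the two pieces together (the file name and question are placeholders; the exact string the model returns will vary from run to run):

agent = csv_tool("sales.csv")   # hypothetical file
response = ask_agent(agent=agent, query="Which products have the highest orders?")
# response is a plain string such as '{"answer": "..."}' or '{"bar": {...}}',
# ready to be parsed by decode_response(), defined below.
print(response)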
In short, ask_agent builds a prompt that tells the agent exactly which answer formats we accept, runs the query, and returns the raw string so it can later be decoded into a dictionary for further processing.
Setting Up the Streamlit Interface
Streamlit is a free and open-source framework for quickly building and sharing appealing machine learning and data science web applications. It is a Python library designed for ease of use, allowing users to create interactive apps without needing knowledge of JavaScript or CSS.
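To see how little code a Streamlit app needs, here is a minimal, self-contained example that is unrelated to the CSV app (save it as, say, hello.py and run streamlit run hello.py):

import pandas as pd
import streamlit as st

st.title("Hello, Streamlit")
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 15]})
st.line_chart(df.set_index("x"))  # interactive chart, no JavaScript or CSS required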
def decode_response(response: str) -> dict:
    """Convert the string response from the model into a dictionary object.

    Args:
        response (str): Response from the model.

    Returns:
        dict: Dictionary with the response data.
    """
    return json.loads(response)
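For example, a table-style response decodes as follows (the string is a hand-written illustration, not actual model output):

sample = '{"table": {"columns": ["Products", "Orders"], "data": [["Product A", 25], ["Product B", 24]]}}'
result = decode_response(sample)
print(result["table"]["columns"])   # ['Products', 'Orders']

Note that json.loads is the reason the prompt insists on double quotes: a single-quoted response would raise a json.JSONDecodeError.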
The decode_response function transforms the agent's string response into a dictionary for easier manipulation.
def write_answer(response_dict: dict):
    """Display a response from an agent in a Streamlit app.

    Args:
        response_dict: The response from the agent.

    Returns:
        None.
    """
if "answer" in response_dict:
st.write(response_dict["answer"])if "bar" in response_dict:
data = response_dict["bar"]
try:
df_data = {col: [x[i] if isinstance(x, list) else x for x in data['data']] for i, col in enumerate(data['columns'])}
df = pd.DataFrame(df_data)
df.set_index("Products", inplace=True)
st.bar_chart(df)
except ValueError:
print(f"Couldn't create DataFrame from data: {data}")if "line" in response_dict:
data = response_dict["line"]
try:
df_data = {col: [x[i] for x in data['data']] for i, col in enumerate(data['columns'])}
df = pd.DataFrame(df_data)
df.set_index("Products", inplace=True)
st.line_chart(df)
except ValueError:
print(f"Couldn't create DataFrame from data: {data}")if "table" in response_dict:
data = response_dict["table"]
df = pd.DataFrame(data["data"], columns=data["columns"])
st.table(df)
This function accepts a response dictionary and displays the output on the Streamlit app, which could include answers, bar graphs, line graphs, or tables. The function checks the type of response and generates the corresponding visualization.
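You can exercise write_answer in isolation inside any Streamlit script with hand-built dictionaries (the values below are illustrative):

write_answer({"answer": "Product A has the highest number of orders."})
write_answer({"table": {"columns": ["Products", "Orders"], "data": [["Product A", 25], ["Product B", 24]]}})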
Now, let’s build the initial interface. Add the following lines:
st.set_page_config(page_title="Talk with your CSV")
st.title("Talk with your CSV")
st.write("Please upload your CSV file below.")
data = st.file_uploader("Upload a CSV", type="csv")
query = st.text_area("Send a Message")
if st.button("Submit Query", type="primary"):
agent = csv_tool(data)
response = ask_agent(agent=agent, query=query)
decoded_response = decode_response(response)
write_answer(decoded_response)
This code creates a Streamlit app that allows users to interact with their CSV files. Users can upload a CSV file and enter a query. When the "Submit Query" button is pressed, the app queries the agent and displays the answer.
The app relies on the following functions: - csv_tool(): Generates an agent from a CSV file. - ask_agent(): Queries the agent and returns its answer as a string. - decode_response(): Converts the agent's string response into a dictionary. - write_answer(): Displays the response in the Streamlit app.
Let’s Try It Out!
Run the application by entering streamlit run Talk_with_CSV.py in the terminal. This command launches the app in a new browser tab.
For this tutorial, I will be using data available on my GitHub repository. Feel free to upload your CSV file and start submitting your queries!
First Query: Which Products have the highest orders?
Second Query: Tabulate the first 5 Products. Include the Products and Order columns only.
Third Query: Create a line graph of the first 5 products. Use Products as columns and Orders as data values.
Fourth Query: Create a bar graph of the first 5 products. Use Products as columns and Orders as data values.
Conclusion
Langchain and Streamlit provide powerful tools that simplify the process of querying data using language models. This application allows users to visualize their data effectively, making it easier to interpret and understand.
If you have any questions, please feel free to reach out!
References
- https://python.langchain.com/en/latest/modules/agents/agents/examples/chat_conversation_agent.html?highlight=agent_chain
- https://docs.streamlit.io/
- https://www.datacamp.com/tutorial/streamlit
Stay tuned for more insights on trending AI applications and discussions on my personal blog. If you're not a Medium member yet and want unlimited access to articles, consider using my referral link to sign up—it's just $5 a month!
We are AI application experts! If you wish to collaborate on a project, feel free to inquire here, visit our website, or schedule a consultation with us.
Explore my other articles: - Talk To Your Website: How To Create A Chatbot For A Website Using LangChain And Streamlit - LangChain Tutorial: A Step-by-Step LangChain Python Crash Course