This project leverages an AI agent for automated data extraction and processing. The system lets users upload a CSV file or connect a Google Sheet, then interact with the data using natural-language queries. The agent generates Python code from each query, executes it to manipulate the data, and presents the result as a table, plot, or string; it can also scrape additional data from the web and append it to the file.
To run this project, you need Python 3.7 or later. The project uses several third-party libraries that can be installed via pip.
Clone the repository and create a virtual environment:

```bash
git clone https://github.com/UjjawalGusain/CheatSheet-Langchain-Project.git
cd Langchain-Web-Agent
python3 -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Set up the Google Sheets API: enable the Sheets API in a Google Cloud project and place the downloaded credentials file where the application can read it.
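This section doesn't show the Sheets integration code itself, so here is a minimal sketch of reading a sheet into a pandas DataFrame with gspread, assuming a service-account setup; the credentials filename and sheet URL are placeholders, not the project's actual values:

```python
import gspread
import pandas as pd

# Authenticate with a service-account key file (placeholder filename).
gc = gspread.service_account(filename="credentials.json")

# Open the sheet by URL and load the first worksheet into a DataFrame.
sheet = gc.open_by_url("https://docs.google.com/spreadsheets/d/<SHEET_ID>")
df = pd.DataFrame(sheet.sheet1.get_all_records())
print(df.head())
```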
Run the application:

```bash
streamlit run dashboard.py
```

After running this command, the application starts and you can access it through your browser (Streamlit prints the local URL, typically `http://localhost:8501`).
This project integrates multiple APIs to handle different operations, from data scraping to interaction with the language model:
Groq API:
Model: llama-3.1-70b-versatile
Purpose: The Groq API is used to interact with the large language model that generates responses, drives operations on the dataset, and handles complex queries. The llama-3.1-70b-versatile model provides efficient natural-language understanding and generation, turning user queries into actionable results.
Usage:
The API is called to process queries related to the data, including operations like extraction, filtering, and generating summaries. The response from the model helps in shaping the operations applied to the dataset.
The prompts used for the model are structured in a specific format to ensure the desired response and avoid errors during execution.
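The project's exact prompt isn't reproduced in this section, but a minimal sketch of such a call with the official groq Python client might look like the following (the system prompt and column list are illustrative assumptions):

```python
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Illustrative structured prompt: constrain the model to return pandas
# code only, so the response can be executed against the DataFrame `df`.
SYSTEM_PROMPT = (
    "You are a data assistant. Given a pandas DataFrame named `df` "
    "with columns {columns}, reply with Python code only, no prose."
)

def generate_code(query, columns):
    response = client.chat.completions.create(
        model="llama-3.1-70b-versatile",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT.format(columns=columns)},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

code = generate_code("Filter rows where sales > 1000", ["region", "sales"])
```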
Scraper API:
Purpose: The Scraper API is used to gather additional data from external sources and append it to the dataset.
Usage:
When a query asks for information that is not present in the dataset, the agent fetches it from the web through the Scraper API and merges the scraped values into the existing data.
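The scraping call itself isn't shown here; if the service in question is ScraperAPI, a minimal sketch would look like the following (the API key and target URL are placeholders):

```python
import os
import requests

def scrape(url):
    """Fetch a page through ScraperAPI's proxy endpoint."""
    response = requests.get(
        "http://api.scraperapi.com/",
        params={"api_key": os.environ["SCRAPER_API_KEY"], "url": url},
        timeout=60,
    )
    response.raise_for_status()
    return response.text  # raw HTML, parsed before appending to the dataset

html = scrape("https://example.com/some-page")
```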
Using the application:
1. Select a data source: from the sidebar, choose either Upload CSV or Connect Google Sheets.
2. Enter a Query: type a natural-language question or instruction about the data.
3. View Results: the agent runs the generated code and displays the output as a table, plot, or text, depending on the query.
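How the dashboard renders each result type isn't shown in this section; a minimal Streamlit sketch, assuming the agent returns either a DataFrame, a matplotlib figure, or a string, might be:

```python
import pandas as pd
import streamlit as st
from matplotlib.figure import Figure

def show_result(result):
    # Dispatch on the type returned by the agent's generated code.
    if isinstance(result, pd.DataFrame):
        st.dataframe(result)   # interactive table
    elif isinstance(result, Figure):
        st.pyplot(result)      # rendered plot
    else:
        st.write(str(result))  # plain-text answer
```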
Prompt Formatting and Complexity: A significant challenge was ensuring that the prompts passed to the model were correctly formatted and handled by the system. The model had to generate accurate responses regardless of the structure and complexity of a query, and the extracted information had to be presented to the user clearly and consistently, especially for complex queries.
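A typical mitigation, sketched here as an assumption rather than the project's actual code, is to pin down the response format in the prompt and then defensively normalize the model's output before executing it:

````python
import re

# Models sometimes wrap code in markdown fences despite instructions;
# strip them so the returned snippet is directly executable.
def extract_code(response_text):
    match = re.search(r"```(?:python)?\s*(.*?)```", response_text, re.DOTALL)
    return (match.group(1) if match else response_text).strip()

print(extract_code("```python\nresult = df[df['sales'] > 1000]\n```"))
````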
Managing Security Risks with LLMs: Leveraging large language models (LLMs) introduced security risks, particularly around data privacy and the handling of sensitive information. Ensuring that no confidential or private data was inadvertently exposed while interacting with the model was a critical aspect of development, so we implemented safeguards to minimize these risks when using LLMs to generate code and process data.
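One concrete safeguard, shown here as a sketch of the general technique rather than this project's actual implementation, is to execute the model's generated code in a namespace with no builtins and only whitelisted objects:

```python
import pandas as pd

def run_generated_code(code, df):
    # Execute LLM-generated code with empty __builtins__ and only
    # whitelisted names, so it cannot import modules or read files.
    namespace = {"__builtins__": {}, "df": df.copy(), "pd": pd}
    exec(code, namespace)
    return namespace.get("result")  # convention: generated code assigns here
```

An empty `__builtins__` blocks `import`, `open`, and `eval` inside the generated snippet, but it is not a hardened sandbox; for truly untrusted code, running it in a separate process or container is the safer design choice.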