The Solidity Smart-Contract Advisor is a cutting-edge tool designed to extend the capabilities of traditional language models in understanding and advising on Solidity smart contracts. At its core, the project integrates OpenAI's language models with a vast repository of verified smart contracts and comprehensive Solidity documentation, creating a specialized, context-aware system capable of delivering expert insights in the realm of smart contract development.
This project represents a significant stride in bridging the gap between general-purpose language models and the nuanced, specialized requirements of blockchain development. By grounding its answers in 5,000 verified smart contracts and extensive Solidity documentation, the advisor provides not just theoretical answers, but practical, real-world advice backed by verified code and documentation.
Success was measured by the system's ability to accurately parse, understand, and provide context-relevant advice on complex Solidity queries. This was achieved through a sophisticated blend of technologies, each playing a crucial role in enhancing the system's effectiveness.


Using etherscan.py, I downloaded 5,000 known and verified smart contracts through the Etherscan API. These form the backbone of the analysis, providing real-world smart contract code for embedding.
Alongside the smart contracts, I extracted the documentation as HTML files from the Solidity documentation website, ensuring a comprehensive base for the language model.
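As an illustration of the download step, here is a minimal sketch of how one request and its response could be handled. It is not the actual etherscan.py. It assumes Etherscan's classic `getsourcecode` endpoint and response shape, and the function names `build_source_url` and `extract_source` are hypothetical. The network call itself is left out so the sketch runs offline.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

# Assumption: the classic Etherscan endpoint; check the current API docs before use.
ETHERSCAN_API = "https://api.etherscan.io/api"


def build_source_url(address: str, api_key: str) -> str:
    """Build the request URL for one contract's verified source code."""
    query = urlencode({
        "module": "contract",
        "action": "getsourcecode",
        "address": address,
        "apikey": api_key,
    })
    return f"{ETHERSCAN_API}?{query}"


def extract_source(payload: dict) -> Optional[Tuple[str, str]]:
    """Return (contract_name, source_code), or None if the call failed
    or the contract has no verified source."""
    if payload.get("status") != "1":
        return None
    result = payload.get("result")
    # On errors, "result" may be a plain string rather than a list.
    if not isinstance(result, list) or not result:
        return None
    entry = result[0]
    source = entry.get("SourceCode", "")
    if not source:
        return None
    return entry.get("ContractName", "Unknown"), source
```

Each decoded JSON response can be passed to `extract_source`, and the returned source written to a file in the downloaded_contracts folder.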
A custom-built LangChain RetrievalQA chain processes the data above. I split the data into chunks of size 1000 with an overlap of 1000, which results in around 200,000 chunks to embed.
For smart contracts, a Solidity-specific parser breaks the contracts down into manageable chunks. For the HTML content (the official Solidity documentation), I use a standard text splitter. The parsed chunks are then embedded for further processing.
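The idea of overlapping chunks can be sketched in plain Python. This is a simplified character-based sliding window, not the splitter used by the project. The function name `chunk_text` and the default overlap of 200 are illustrative assumptions. This sketch requires the overlap to be smaller than the chunk size so that the window always advances.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into fixed-size chunks where consecutive chunks share
    `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a statement that straddles a chunk boundary visible in both neighbouring chunks, at the cost of storing more chunks.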
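The project's Solidity-specific parser is not shown here, so the following is only a naive sketch of the idea: split a source file at top-level `contract`, `interface`, and `library` declarations so that each chunk holds one logical unit. The function name `split_solidity` is hypothetical, and a regex like this can be fooled by declarations that appear inside comments or strings.

```python
import re

# Zero-width match at the start of any line that opens a top-level declaration.
_DECLARATION = re.compile(
    r"(?m)^(?=[ \t]*(?:abstract[ \t]+contract|contract|interface|library)[ \t]+\w+)"
)


def split_solidity(source: str) -> list:
    """Split Solidity source into one chunk per top-level declaration.
    Anything before the first declaration (pragma, imports) becomes its own chunk."""
    parts = _DECLARATION.split(source)
    return [part.strip() for part in parts if part.strip()]
```

Chunks produced this way can still be too long for embedding, in which case a size-based splitter can be applied to each chunk afterwards.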
In this project, the Weaviate Vector Database, equipped with the text2vec-contextionary module, plays a pivotal role in managing and querying vectorized data derived from Solidity smart contracts and documentation. Vector databases like Weaviate are designed to handle vectorized data – essentially, data points represented in multi-dimensional space. This is particularly useful in the realm of machine learning and natural language processing, where complex data can be vectorized for efficient and meaningful analysis.
Weaviate utilizes machine learning models to convert text data into these vectors, allowing for high-speed and semantically relevant searches. In the context of the Solidity Smart-Contract Advisor, Weaviate stores and manages the embeddings (vector representations) of Solidity contracts and documentation. These embeddings capture the nuanced meanings and contexts of the text, enabling the system to provide accurate, context-aware responses to user queries.
The text2vec-contextionary module specifically is critical for understanding the textual context. It allows the system to interpret the meaning behind words and phrases in the context of Solidity, enhancing the relevance and accuracy of search results and responses.
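To make the idea of a semantic search over vectors concrete, here is a minimal sketch using cosine similarity on toy three-dimensional vectors. Real embeddings have many more dimensions and Weaviate performs this search internally with an index. Which distance metric the project's Weaviate instance uses is not stated here, so cosine similarity is an assumption, as are the names `cosine_similarity`, `top_k`, and the sample data.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, store: dict, k: int = 2) -> list:
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy "embeddings" (made-up values) for three stored chunks.
store = {
    "reentrancy_guard": [0.9, 0.1, 0.0],
    "erc20_transfer": [0.1, 0.9, 0.0],
    "docs_events": [0.0, 0.2, 0.9],
}
nearest = top_k([1.0, 0.0, 0.0], store, k=2)
```

The chunks whose vectors point in nearly the same direction as the query vector are returned first, which is what lets a question about re-entrancy surface contracts and documentation pages on that topic.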
The role of Weaviate in this project is multifaceted:
For more information on Weaviate and its capabilities, see its official website and documentation.
The use of Weaviate in this project ensures that the Solidity Smart-Contract Advisor is not only powerful in its analytical capabilities but also efficient and scalable, making it a state-of-the-art tool for smart contract analysis and advice.

I adapted and significantly modified the UI from whuang214. This ReactJS and TypeScript-based interface resembles the ChatGPT interaction model, tailored to this project's use case.
This Python-based Flask API serves as a crucial bridge between user queries and the powerful backend. It efficiently handles interactions with the Weaviate vectors and leverages OpenAI's GPT-4-0613 model for generating responses. The choice of GPT-4-0613 is strategic, due to its less restrictive nature, which is particularly beneficial for providing more direct and unfiltered answers on sensitive topics, such as re-entrancy in smart contracts.
For those interested in exploring other OpenAI Language Models, a comprehensive list can be found at OpenAI's Model Overview.
Additionally, the Flask API integrates LangChain, a framework designed to enhance the capabilities of language models through agents and prompt engineering. LangChain's modular architecture allows for easy integration of different language models, making the system highly adaptable and extendable. With the same interface, various other language models could be seamlessly incorporated, offering flexibility and scope for future enhancements in response capabilities.
The combination of Flask, Weaviate, and LangChain provides a robust, scalable, and versatile backend infrastructure, capable of handling complex queries with precision and delivering contextually relevant responses in the domain of Solidity smart contracts.

LangChain represents a revolutionary approach in the utilization of language models, playing a vital role in our Solidity Smart-Contract Advisor project. It's a framework specifically designed to augment the capabilities of language models, such as those provided by OpenAI, through the use of agents and augmented retrievals.
In LangChain, agents are modular components that enable a variety of interactions with language models. They can be programmed to perform specific tasks, such as parsing text, generating queries, or handling responses. This modular approach allows for a high degree of customization and flexibility, enabling the creation of complex workflows that can process, analyze, and respond to user queries with unparalleled precision.
One of the key features of LangChain is its ability to perform augmented retrievals. This involves using language models to enhance the retrieval of information from databases or other sources. In the context of this project, LangChain's augmented retrieval capability allows for more sophisticated and contextually relevant searches within the Weaviate Vector Database. It ensures that the retrieved information is not only relevant but also tailored to the specific nuances of Solidity smart contracts.
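Conceptually, augmented retrieval ends with the retrieved chunks being placed into the prompt next to the user's question. The sketch below shows that step in plain Python. It is not LangChain's actual prompt template, and the function name `build_prompt`, the wording of the instructions, and the `max_chars` budget are all illustrative assumptions.

```python
def build_prompt(question: str, chunks: list, max_chars: int = 4000) -> str:
    """Combine retrieved chunks and the question into a single prompt,
    adding chunks in order until the character budget is used up."""
    context_parts = []
    used = 0
    for index, chunk in enumerate(chunks, start=1):
        block = f"[Source {index}]\n{chunk.strip()}"
        if used + len(block) > max_chars:
            break  # keep the prompt within the budget
        context_parts.append(block)
        used += len(block)

    context = "\n\n".join(context_parts)
    return (
        "Answer the Solidity question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the sources makes it possible to report which contracts or documentation pages an answer was based on.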
The integration of LangChain in this Flask API adds a layer of intelligence and adaptability. It empowers the system to handle complex, multifaceted queries that a standalone language model might struggle with. By leveraging both the raw computational power of language models and the strategic structuring offered by LangChain, the system can provide in-depth, accurate, and highly relevant responses to a wide range of queries related to Solidity smart contracts.
The use of LangChain essentially transforms the project into a more dynamic, intelligent, and responsive tool, capable of addressing the intricate and evolving needs of Solidity developers and enthusiasts.
For more information and a deeper dive into LangChain and its capabilities, see the LangChain documentation.
To interact with the Smart-Contract Advisor, users enter their queries through the UI. The system is designed to handle Solidity-specific questions and to give detailed, accurate responses. For instance, asking about re-entrancy yields direct and informative answers. You can also ask it to audit your code. The UI has almost the same capabilities as the ChatGPT interface, including registration and chat histories. In addition, the console shows which vectors from Weaviate the LLM is referencing. These can be a couple of verified smart contracts or pages from the official Solidity documentation. The model combines and examines these sources to reach its answer.

Setting up the Solidity Smart-Contract Advisor involves several steps, including configuring databases, environment variables, and running servers for both the backend and frontend. Below are the detailed instructions:
Starting MongoDB and Weaviate Services:
docker-compose up -d
Setting Environment Variables:
Create a .env file in the root directory of the Python project and include the following:
OPENAI_KEY=your_openai_key
ETHERSCAN_KEY=your_etherscan_key
MONGODB_DSN=your_mongodb_dsn
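Before starting the servers, it can help to confirm that all three keys are present. The following is a small sketch of such a check, written against the key names above. The helper names `parse_env` and `missing_keys` are hypothetical and not part of the project, and the parser only handles simple `KEY=value` lines.

```python
REQUIRED_KEYS = ("OPENAI_KEY", "ETHERSCAN_KEY", "MONGODB_DSN")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when the file is complete.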
Running the Backend and Client:
The project has two parts: backend and client. Each of these needs to be started separately. Go to the backend folder and run:
npm i
npm run dev
Then go to the client folder and run the same commands:
npm i
npm run dev
Downloading Smart Contracts:
The download step needs a contracts.csv file, which can be downloaded from Etherscan. Run the etherscan.py script to download 5000 verified smart contract source codes into the downloaded_contracts folder:
python etherscan.py

Run the weaviate_ingest.py script to chunk the data and insert it into the Weaviate Vector Database:
python weaviate_ingest.py

Ingesting HTML Files:
Run the weaviate_ingest_htmls.py script in a similar manner:
python weaviate_ingest_htmls.py
Alternative to Weaviate - Using Pinecone:
For those who prefer Pinecone, a pinecone_ingest.py script is provided:
python pinecone_ingest.py
After completing these steps, your Solidity Smart-Contract Advisor should be up and running, ready to provide insights and advice on Solidity smart contracts.
All contributions are welcome :)
The Solidity Smart-Contract Advisor, as it stands, is a robust and innovative tool. However, the realm of blockchain and smart contract technology is constantly evolving, presenting numerous opportunities for future enhancements and expansions of this project.
Deep Dive into Yul, the Intermediate Language: A key area for future development is the incorporation of Yul, Ethereum's intermediate language. By understanding and analyzing Yul, the Smart-Contract Advisor can offer insights not just at the Solidity level, but also at the more granular level of compiled contracts. This opens up new avenues for understanding how Solidity code translates into lower-level operations.
Opcode-Level Analysis and Gas Optimization: One of the most challenging aspects of smart contract development is optimizing for gas consumption. By expanding the advisor's capabilities to include opcode-level analysis, developers can receive suggestions for more efficient code patterns and potential gas optimizations. This would involve teaching the system to understand the subtleties of opcode execution costs and how they accumulate in different smart contract scenarios.
Convenient Access to LLM Support for Low-Level Coding: The advisor could be enhanced to provide convenient, user-friendly support for low-level coding practices. This could involve integrating specialized prompts or agents within LangChain that are adept at understanding and suggesting optimizations in Yul or directly in opcodes. The goal is to make gas optimization and low-level coding more accessible to developers who primarily work with high-level languages like Solidity.
Bridging High-Level and Low-Level Development: By bridging the gap between high-level Solidity code and low-level Yul or opcodes, the advisor can provide a holistic view of smart contract development. This includes not just writing efficient high-level code but also understanding its implications at the blockchain execution level.
These future enhancements around Yul and opcode optimization are aimed at giving developers deeper insight into the inner workings of their smart contracts. This will not only aid in writing more efficient and cost-effective code but also contribute to a better understanding of the Ethereum execution environment, ultimately leading to more robust and optimized smart contract development.
These enhancements aim to solidify the Solidity Smart-Contract Advisor as a cutting-edge tool, indispensable for developers in the blockchain space. The ultimate goal is to create a dynamic, intelligent system that not only responds to current needs but also anticipates and adapts to future trends in smart contract development.
This project is released under the MIT License.
Deniz Umut Dereli