LawMate Romania is a project focused on creating a Large Language Model (LLM) specialized in the Romanian legal domain. This model is designed to assist with various legal tasks by understanding and generating text based on Romanian legal documents. The project uses the Equall/Saul-7B-Instruct-v1 pre-trained model from Hugging Face's library, specifically fine-tuned on Romanian legal texts like the Constitution and the Education Law.
documents/: Contains text documents used for training the model, including the Romanian Constitution and the Education Law.
training_ds/: Contains the dataset files generated from the text documents for training purposes.
env_llm.txt: Lists the dependencies and environment settings required to run the project.
main.py: The main script for training and evaluating the Large Language Model (LLM).
.gitignore: Specifies files and directories to be ignored by Git to keep the repository clean.
LawMate Romania/: Includes the chatbot script and screenshots demonstrating example interactions.
Set Up the Environment:
pip install -r env_llm.txtPrepare the PDF Files:
documents/ directory.Fine-Tune the Model:
main.py to fine-tune the pre-trained LLM on the provided dataset.Evaluate and Save the Model: