Applications of RAG and LLM in Financial Q&A
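In an era when large language models (LLMs) are accelerating all kinds of technology, language-model development cycles keep getting shorter and model performance keeps getting stronger. With the advent of LLMs, the financial industry's vast and complex data is no longer an obstacle that keeps corpus retrieval from generalizing well; it is a problem that is gradually being solved. This challenge focuses on financial question answering and provides a rich database for participants to use. Participants must design mechanisms that improve retrieval accuracy, which includes the basic requirement of finding, in the provided corpus, the correct documents that fully answer each question. They must also apply the generative capability of LLMs to produce correct and complete answers.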
Download the Repo
git clone https://github.com/FanChiMao/Competition-2024-PyTorch-LLMRAG.git
cd Competition-2024-PyTorch-LLMRAG
git submodule update --init
Prepare the environment
❗ Note: Check your GPU and OS environment, then install PyTorch first by following the instructions on the PyTorch website.
conda create --name LLMRAG python=3.10 # Python 3.10 is required to reproduce the results
conda activate LLMRAG # install the packages below into the new environment
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # CUDA 11.8 shown as an example
pip install -r requirements.txt
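Before moving on, it can help to confirm that the PyTorch build installed above actually sees the GPU. The following is a minimal sketch, not part of this repository; the function name `describe_torch_install` is hypothetical, and only standard `torch` calls are used.

```python
import importlib.util


def describe_torch_install() -> str:
    """Return a one-line summary of the local PyTorch install (hypothetical helper)."""
    # Check for the package first so the script also works before PyTorch is installed.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."

    import torch

    if torch.cuda.is_available():
        # Device 0 is the default GPU when several are present.
        return f"PyTorch {torch.__version__}, GPU: {torch.cuda.get_device_name(0)}"
    return f"PyTorch {torch.__version__}, CUDA not available (CPU only)"


if __name__ == "__main__":
    print(describe_torch_install())
```

If the summary reports CPU only on a machine with a GPU, the installed wheel probably does not match the local CUDA driver, and the `--index-url` above may need a different CUDA version.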
Download the datasets from the official website. (Due to policy restrictions, we cannot provide the dataset.)
You can run the script directly:
cd scripts
1.download_preliminary_data.bat
or run ./datasets/download_preliminary_datasets.py:
cd datasets
python ./download_preliminary_datasets.py
Place the dataset in ./datasets.
To run the baseline code, you can run the script directly:
cd scripts
2.run_baseline_code.bat
or run ./main_baseline.py:
python ./main_baseline.py
After the baseline code finishes, it writes the JSON result to ./output/baseline.json
To reproduce our submitted results, run:
cd scripts
3.run_preliminary_results.bat
or run ./main_preliminary.py:
python ./main_preliminary.py
After the preliminary code finishes, it writes the JSON result to ./output/preliminary_results.json
To evaluate a result file against the ground truth, run:
python ./evaluation.py --gt [path of ground_truths_example.json] --rs [path of output json]
For example, to evaluate the baseline result:
python ./evaluation.py --gt ./datasets/preliminary/ground_truths_example.json --rs ./outputs/baseline.json
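The internals of `evaluation.py` are not shown here, so the following is only a rough sketch of how such a comparison could work. It assumes a precision@1 metric and a JSON layout in which each file holds a list of records with `qid` and `retrieve` fields under a top-level key. The metric, the field names, and the helper names `load_retrievals` and `precision_at_1` are all assumptions rather than the repository's confirmed behavior.

```python
import json
from pathlib import Path


def load_retrievals(path, key):
    """Map qid -> retrieved document id.

    Assumed layout: {key: [{"qid": ..., "retrieve": ...}, ...]}
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))[key]
    return {record["qid"]: record["retrieve"] for record in records}


def precision_at_1(truth, predicted):
    """Fraction of ground-truth questions whose predicted document matches."""
    if not truth:
        return 0.0
    # A question missing from the predictions counts as a miss.
    hits = sum(1 for qid, doc in truth.items() if predicted.get(qid) == doc)
    return hits / len(truth)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the competition dataset.
    truth = {1: 392, 2: 428, 3: 83}
    predicted = {1: 392, 2: 100, 3: 83}
    print(f"{precision_at_1(truth, predicted):.4f}")  # 0.6667
```

With real files, `load_retrievals` would be called once per file, passing whichever top-level key each file actually uses, and the two resulting dictionaries would go to `precision_at_1`.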