Applications of RAG and LLM in Financial Q&A
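In an era where large language models are accelerating all kinds of technologies, development cycles for language models keep getting shorter and their performance keeps getting stronger. With the advent of large language models, the vast and complex data of the financial industry is no longer an obstacle that keeps corpus retrieval from generalizing well, but a problem that is gradually being solved. This year's challenge focuses on financial question answering and provides a rich database for participants. Participants must design mechanisms that improve the accuracy of retrieval results. The basic requirements include finding, in the provided corpus, the correct documents that fully answer each question, and applying the generative capability of large language models to produce correct and complete answers.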
Download the Repo
git clone https://github.com/FanChiMao/Competition-2024-PyTorch-LLMRAG.git
cd Competition-2024-PyTorch-LLMRAG
git submodule update --init
Prepare the environment
❗ Note: Please check your GPU and OS environment, and install PyTorch first by following the instructions on the PyTorch website.
conda create --name LLMRAG python=3.10 # Python 3.10 is required to reproduce the results
conda activate LLMRAG
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # CUDA 11.8 shown as an example
pip install -r requirements.txt
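Before moving on, it can help to confirm that the active environment matches the setup above. The following is a minimal sketch, not part of the repository: the helper name check_environment is hypothetical, and it only reports the Python version, whether torch is importable, and whether CUDA is visible to PyTorch.

```python
import importlib.util
import sys


def check_environment(required=(3, 10)):
    """Report whether the interpreter and PyTorch match the README setup.

    `check_environment` is a hypothetical helper, not a script shipped
    with this repository.
    """
    report = {
        "python_version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_ok": sys.version_info[:2] == tuple(required),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": None,  # stays None when torch is not installed
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without torch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If python_ok is False or cuda_available is not True, revisit the conda and PyTorch installation steps before running the scripts.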
Download the datasets from the official website (due to the competition policy, we cannot provide them here).
You can run the script directly:
cd scripts
1.download_preliminary_data.bat
or run ./datasets/download_preliminary_datasets.py yourself:
cd datasets
python ./download_preliminary_datasets.py
Place the dataset in ./datasets.
To run the baseline code, you can run the script directly:
cd scripts
2.run_baseline_code.bat
or run ./main_baseline.py yourself:
python ./main_baseline.py
Running the baseline code generates the JSON result at ./output/baseline.json
To reproduce our submitted results, you can run
cd scripts
3.run_preliminary_results.bat
or run the snippet at ./main_preliminary.py
python ./preliminary_results.py
Running the preliminary code generates the JSON result at ./output/preliminary_results.json
To evaluate a result file against the ground truth, run:
python ./evaluation.py --gt [path of ground_truths_example.json] --rs [path of output json]
Taking the baseline result as an example:
python ./evaluation.py --gt ./datasets/preliminary/ground_truths_example.json --rs ./outputs/baseline.json
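To make the comparison concrete, here is a minimal sketch of how a retrieval result could be scored against the ground truth. It is not the repository's evaluation.py. It assumes, without confirmation from this README, that the ground-truth file holds a top-level "ground_truths" list of records with "qid", "retrieve", and "category" fields, that the result file holds a top-level "answers" list of records with "qid" and "retrieve", and that the metric is the fraction of questions whose retrieved document matches (Precision@1). The function names are hypothetical.

```python
import json
from collections import defaultdict


def precision_at_1(ground_truths, answers):
    """Return (overall, per_category) fraction of correctly retrieved questions.

    Assumed record shapes (not confirmed by the README):
      ground truth: {"qid": int, "retrieve": int, "category": str}
      answer:       {"qid": int, "retrieve": int}
    """
    predicted = {a["qid"]: a["retrieve"] for a in answers}
    hits, totals = defaultdict(int), defaultdict(int)
    for gt in ground_truths:
        category = gt.get("category", "unknown")
        totals[category] += 1
        # A question with no prediction counts as a miss.
        if predicted.get(gt["qid"]) == gt["retrieve"]:
            hits[category] += 1
    per_category = {c: hits[c] / totals[c] for c in totals}
    total = sum(totals.values())
    overall = sum(hits.values()) / total if total else 0.0
    return overall, per_category


def evaluate_files(gt_path, rs_path):
    """Load both JSON files and score them; the top-level keys are assumptions."""
    with open(gt_path, encoding="utf-8") as f:
        ground_truths = json.load(f)["ground_truths"]
    with open(rs_path, encoding="utf-8") as f:
        answers = json.load(f)["answers"]
    return precision_at_1(ground_truths, answers)
```

Reporting the score per category as well as overall makes it easier to see which part of the corpus the retrieval step handles poorly.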