Llama-2_Huggingface_4Bit_QLoRA
1.0.0
An updated version can be found in the new repo:
https://github.com/gmongaras/wizard_qlora_finetuning
A working example of finetuning Falcon/Llama 2 models in 4 bits with HuggingFace's QLoRA support.
To start finetuning, edit and run main.py.
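For orientation, here is a minimal sketch of what the 4-bit QLoRA finetuning step looks like with current HuggingFace APIs. The actual main.py may differ; the dataset, LoRA hyperparameters, and model path below are illustrative assumptions, not the repo's exact settings.

import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_path = "./llama-2"  # assumption: local weight directory (see download step below)

# Load the base model quantized to 4-bit NF4 with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=bnb_config, device_map="auto")
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; only these small low-rank matrices are trained
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token

# Illustrative dataset; replace with your own
data = load_dataset("imdb", split="train[:1%]")
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                batched=True, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    train_dataset=data,
    args=TrainingArguments(
        output_dir="./outputs",           # checkpoints land here
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        logging_steps=10,
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()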
Once finetuning is complete, you should have checkpoints in ./outputs. Before running inference, we can merge the LoRA weights with the original weights for faster inference and a smaller GPU footprint at inference time. To do this, run the merge_weights.py script with your own paths.
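A rough sketch of what the merge step does, using peft's merge_and_unload; the checkpoint and output paths here are hypothetical, so substitute your own:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "./llama-2"                     # original weights
adapter_path = "./outputs/checkpoint-1000"  # a LoRA checkpoint from finetuning (hypothetical)
merged_path = "./llama-2-merged"            # where the merged model is saved (hypothetical)

# Load the base model unquantized so the LoRA deltas can be folded into it
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_path)

# Fold the adapters into the base weights and drop the PEFT wrappers
model = model.merge_and_unload()
model.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(merged_path)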
Finally, you can run generation with generate.py, given the merged model.
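Generation with the merged model can look like the following sketch; the merged path and the sampling settings are illustrative, not the exact contents of generate.py:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_path = "./llama-2-merged"  # hypothetical output path from merge_weights.py

tokenizer = AutoTokenizer.from_pretrained(merged_path)
model = AutoModelForCausalLM.from_pretrained(
    merged_path, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "What is QLoRA?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))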
The Python requirements for running the scripts are located in requirements.txt.
You should also download the Falcon 7B weights from https://huggingface.co/tiiuae/falcon-7b and put the files in a directory named ./tiiuae/falcon-7b, or download the Llama-2 weights from https://huggingface.co/meta-llama/Llama-2-7b-hf and put them in a directory named ./llama-2.
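If you prefer to fetch the weights programmatically, huggingface_hub's snapshot_download can place them in the expected directories. Note that the Llama-2 repo is gated, so you may need to authenticate first (e.g. via huggingface-cli login):

from huggingface_hub import snapshot_download

snapshot_download(repo_id="tiiuae/falcon-7b", local_dir="./tiiuae/falcon-7b")
# or, for Llama 2:
snapshot_download(repo_id="meta-llama/Llama-2-7b-hf", local_dir="./llama-2")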
The script does not support multi-GPU finetuning in 4 bits. If I find a way to do this, I will update the script.
To get the latest versions of the required libraries, reinstall them from source:
python -m pip uninstall bitsandbytes transformers accelerate peft -y
python -m pip install git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/accelerate.git git+https://github.com/timdettmers/bitsandbytes.git -U
If you get the error "CUDA Setup failed despite GPU being available. Please run the following command to get more information", you need to build bitsandbytes from source and place it in the bitsandbytes site-packages directory, following https://github.com/oobabooga/text-generation-webui/issues/147.