Llama-2_Huggingface_4Bit_QLoRA
1.0.0
An updated version can be found in the new repo:
https://github.com/gmongaras/wizard_qlora_finetuning
A working example of finetuning Falcon/Llama 2 models in 4 bits with HuggingFace's QLoRA support.
To start finetuning, edit and run main.py.
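For orientation, here is a minimal sketch of what the 4-bit QLoRA finetuning step looks like with current HuggingFace APIs. The actual main.py may differ; the dataset, LoRA hyperparameters, and model path below are illustrative assumptions, not the repo's exact settings.

import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_path = "./llama-2"  # assumption: local weight directory (see download step below)

# Load the base model quantized to 4-bit NF4 with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=bnb_config, device_map="auto")
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; only these small low-rank matrices are trained
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token

# Illustrative dataset; replace with your own
data = load_dataset("imdb", split="train[:1%]")
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                batched=True, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    train_dataset=data,
    args=TrainingArguments(
        output_dir="./outputs",           # checkpoints land here
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        logging_steps=10,
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()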
Once finetuning is complete, you should have checkpoints in ./outputs. Before running inference, we can merge the LoRA weights with the original weights for faster inference and a smaller GPU footprint at inference time. To do this, run the merge_weights.py script with your own paths.
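A rough sketch of what the merge step does, using peft's merge_and_unload; the checkpoint and output paths here are hypothetical, so substitute your own:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "./llama-2"                     # original weights
adapter_path = "./outputs/checkpoint-1000"  # a LoRA checkpoint from finetuning (hypothetical)
merged_path = "./llama-2-merged"            # where the merged model is saved (hypothetical)

# Load the base model unquantized so the LoRA deltas can be folded into it
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_path)

# Fold the adapters into the base weights and drop the PEFT wrappers
model = model.merge_and_unload()
model.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(merged_path)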
Finally, you can run generation with generate.py, given the merged model.
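Generation with the merged model can look like the following sketch; the merged path and the sampling settings are illustrative, not the exact contents of generate.py:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_path = "./llama-2-merged"  # hypothetical output path from merge_weights.py

tokenizer = AutoTokenizer.from_pretrained(merged_path)
model = AutoModelForCausalLM.from_pretrained(
    merged_path, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "What is QLoRA?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))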
The Python requirements for running the scripts are located in requirements.txt.
You should also download the Falcon 7B weights from https://huggingface.co/tiiuae/falcon-7b and put the files in a directory named ./tiiuae/falcon-7b, or download the Llama-2 weights from https://huggingface.co/meta-llama/Llama-2-7b-hf and put them in a directory named ./llama-2.
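If you prefer to fetch the weights programmatically, huggingface_hub's snapshot_download can place them in the expected directories. Note that the Llama-2 repo is gated, so you may need to authenticate first (e.g. via huggingface-cli login):

from huggingface_hub import snapshot_download

snapshot_download(repo_id="tiiuae/falcon-7b", local_dir="./tiiuae/falcon-7b")
# or, for Llama 2:
snapshot_download(repo_id="meta-llama/Llama-2-7b-hf", local_dir="./llama-2")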
The script does not support multi-GPU finetuning in 4 bits. If I find a way to do this, I will update the script.
To get the latest versions of the required libraries, reinstall them from source:
python -m pip uninstall bitsandbytes transformers accelerate peft -y
python -m pip install git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/accelerate.git git+https://github.com/timdettmers/bitsandbytes.git -U
If you get the error "CUDA Setup failed despite GPU being available. Please run the following command to get more information", you need to build bitsandbytes from source and place it in the bitsandbytes site-packages directory, following https://github.com/oobabooga/text-generation-webui/issues/147.