Chinese LLaMA Alpaca Usage

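This project is a usage guide based on Chinese-LLaMA-Alpaca v3.1. Chinese-LLaMA-Alpaca pioneered Chinese-language extension and improvement of LLaMA: it extends the original LLaMA vocabulary with Chinese tokens and runs a second round of pre-training on Chinese data, which further improves the model's basic semantic understanding of Chinese.

Project structure: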

.
├── README.md # Usage instructions
├── SHA256.md # SHA256 values for checking the LLaMA model files
├── notebooks
│   ├── convert_and_quantize_chinese_alpaca_plus.ipynb
│   └── convert_and_quantize_chinese_llama.ipynb
├── requirements.txt # Dependencies
└── scripts
    ├── chinese_sp.model # Chinese vocabulary file
    ├── crawl_prompt.py # 1. Generate fine-tuning data with OpenAI models (e.g., ChatGPT, GPT-4)
    ├── inference_hf.py # 5. Run inference with the fine-tuned LoRA model and the original LLaMA model
    ├── merge_llama_with_chinese_lora.py # 4. Merge model weights
    ├── merge_tokenizers.py # 2. Extend the vocabulary
    └── run_clm_pt_with_peft.py # 3. Train or fine-tune the model

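1. Data preparation

Both pre-training and fine-tuning require data. There are two ways to prepare it:

(Public) If a public, standard dataset suits your fine-tuning or training task, you can skip this step.
(Generated) If you have no suitable fine-tuning or training data, you can generate it with scripts/crawl_prompt.py. The basic idea is to use ChatGPT or another capable OpenAI model to produce the data.

2. Prepare the LLaMA weights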

# tokenizer
wget https://agi.gpt4.org/llama/LLaMA/tokenizer.model -O ./tokenizer.model
wget https://agi.gpt4.org/llama/LLaMA/tokenizer_checklist.chk -O ./tokenizer_checklist.chk
# 7B
wget https://agi.gpt4.org/llama/LLaMA/7B/consolidated.00.pth -O ./7B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/7B/params.json -O ./7B/params.json
wget https://agi.gpt4.org/llama/LLaMA/7B/checklist.chk -O ./7B/checklist.chk
# 13B
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.00.pth -O ./13B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.01.pth -O ./13B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/params.json -O ./13B/params.json
wget https://agi.gpt4.org/llama/LLaMA/13B/checklist.chk -O ./13B/checklist.chk
# 30B
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.00.pth -O ./30B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.01.pth -O ./30B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.02.pth -O ./30B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.03.pth -O ./30B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/params.json -O ./30B/params.json
wget https://agi.gpt4.org/llama/LLaMA/30B/checklist.chk -O ./30B/checklist.chk
# 65B
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.00.pth -O ./65B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.01.pth -O ./65B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.02.pth -O ./65B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.03.pth -O ./65B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.04.pth -O ./65B/consolidated.04.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.05.pth -O ./65B/consolidated.05.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.06.pth -O ./65B/consolidated.06.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.07.pth -O ./65B/consolidated.07.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/params.json -O ./65B/params.json
wget https://agi.gpt4.org/llama/LLaMA/65B/checklist.chk -O ./65B/checklist.chk

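Download the LLaMA model weights for the parameter size you need. Larger models have larger weight files and tend to be more accurate, but fine-tuning and training take correspondingly longer. In practice, most people choose the 7B or 13B model.

Verify the integrity of the LLaMA base model and make sure its checksums match the values listed in SHA256.md. Otherwise the merge step cannot be performed.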
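The integrity check above can be scripted. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the expected digest is assumed to be copied by hand from SHA256.md.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("7B/consolidated.00.pth", value_from_sha256_md)` returns True only when the downloaded file is intact. Reading in chunks keeps memory use flat even for multi-gigabyte weight files.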

3. Convert the weights to HF format

# Install dependencies
pip install git+https://github.com/huggingface/transformers

# Convert the weights to HF format
python -m transformers.models.llama.convert_llama_weights_to_hf \
   --input_dir llama-weights \
   --model_size 7B \
   --output_dir llama-hf-weights

> python -m transformers.models.llama.convert_llama_weights_to_hf --input_dir ./ --model_size 7B --output_dir ./output/7B-hf

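If you would rather not convert the weights yourself, you can use LLaMA-HF models that others have already converted. pinkmanlove hosts converted LLaMA-HF weights on Hugging Face. If those are unavailable, search the Hugging Face model hub for other converted copies.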

4. Train and fine-tune the model

The full training and fine-tuning process consists of three stages:

Vocabulary extension;
Pre-training (optional);
Instruction fine-tuning.

4.1 Vocabulary extension

python scripts/merge_tokenizers.py \
  --llama_tokenizer_dir llama_tokenizer_dir \
  --chinese_sp_model_file chinese_sp_model_file

> python scripts/merge_tokenizers.py --llama_tokenizer_dir output/7B-hf --chinese_sp_model_file scripts/chinese_sp.model

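Parameter descriptions:

llama_tokenizer_dir: the directory where the original LLaMA tokenizer is stored.
chinese_sp_model_file: the Chinese vocabulary file (chinese_sp.model) trained with SentencePiece.

Note
There are two main ways to extend a vocabulary: (1) merge another vocabulary into the existing one; (2) start from a large vocabulary and delete the unneeded tokens.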
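Approach (1) can be illustrated with a toy example. This is only a sketch of the idea, not the logic of merge_tokenizers.py, which works on SentencePiece model files; the function `merge_vocabularies` and the sample pieces are hypothetical.

```python
def merge_vocabularies(base_pieces, extra_pieces):
    """Append pieces missing from base_pieces, keeping the existing ids unchanged."""
    merged = list(base_pieces)
    seen = set(base_pieces)
    for piece in extra_pieces:
        if piece not in seen:
            merged.append(piece)
            seen.add(piece)
    return merged


# Toy data only; real vocabularies hold tens of thousands of pieces.
llama_pieces = ["<unk>", "<s>", "</s>", "▁the", "▁model"]
chinese_pieces = ["<unk>", "▁the", "中国", "模型"]

merged = merge_vocabularies(llama_pieces, chinese_pieces)
print(len(merged))                                  # 7
print(merged[:len(llama_pieces)] == llama_pieces)   # True
```

Appending new pieces after the original ones keeps every existing token id stable, so the pretrained embedding rows stay valid and only the rows for the new tokens have to be learned.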

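4.2 Pre-training (optional)

In the pre-training stage, the model is further pre-trained on a general Chinese corpus, starting from the original LLaMA weights. The process has two stages:

Stage 1: freeze the transformer parameters and train only the embeddings, so that the newly added Chinese word vectors adapt while the original model is disturbed as little as possible.
Stage 2: use LoRA to add LoRA weights (adapters) to the model, and update the LoRA parameters while also training the embeddings.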
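As a rough illustration of what a LoRA adapter does in the second stage, the sketch below applies the standard LoRA formulation, W + (alpha / rank) * B A, to a toy matrix in pure Python. It is not code from this project; the function names and the tiny matrices are hypothetical, and real training applies this inside the attention and MLP projection layers.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a); weight itself is not modified."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy sizes: a 2x3 frozen weight with rank-1 adapter factors.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_b = [[1.0], [2.0]]       # shape (2, rank)
lora_a = [[0.5, 0.0, 1.0]]    # shape (rank, 3)

print(lora_effective_weight(weight, lora_b, lora_a, alpha=4, rank=1))
# [[3.0, 0.0, 4.0], [4.0, 1.0, 8.0]]
```

Only the two small factors are trained, which is why the adapter files are far smaller than the base model. With a rank of 8 and an alpha of 32, the scale factor alpha / rank is 4.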

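The model converges slowly in the first stage of pre-training. Unless you have ample time and compute resources, skipping that stage is recommended. The second stage of pre-training is run as follows (single machine, single GPU):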

######## Parameter settings ########
lr=2e-4
lora_rank=8
lora_alpha=32
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=path/to/hf/llama/dir
chinese_tokenizer_path=path/to/chinese/llama/tokenizer/dir
dataset_dir=path/to/pt/data/dir
data_cache=temp_data_cache_dir
per_device_train_batch_size=1
per_device_eval_batch_size=1
training_steps=100
gradient_accumulation_steps=1
output_dir=output_dir

deepspeed_config_file=ds_zero2_no_offload.json

######## Launch command ########
torchrun --nnodes 1 --nproc_per_node 1 run_clm_pt_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --data_cache_dir ${data_cache} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --seed $RANDOM \
    --fp16 \
    --max_steps ${training_steps} \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 500 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --block_size 512 \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False

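Parameter descriptions:

--model_name_or_path: the directory containing the original HF-format LLaMA model;
--tokenizer_name_or_path: the directory containing the Chinese-LLaMA tokenizer (the output of merge_tokenizers.py);
--dataset_dir: the directory of pre-training data, which may contain multiple plain-text files ending in .txt;
--data_cache_dir: the directory where data cache files are stored.

Multiple machines, multiple GPUs: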

torchrun \
  --nnodes ${num_nodes} \
  --nproc_per_node ${num_gpu_per_node} \
  --node_rank ${node_rank} \
  --master_addr ${master_addr} \
  --master_port ${master_port} \
  run_clm_pt_with_peft.py \
    ...
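When moving from one GPU to several machines, it helps to know how many processes the launcher starts and how many samples each optimizer step consumes. The sketch below assumes plain data-parallel training; the helper names are hypothetical and the cluster shape is made up for illustration.

```python
def world_size(num_nodes, num_gpu_per_node):
    """Total number of worker processes launched across all nodes."""
    if num_nodes < 1 or num_gpu_per_node < 1:
        raise ValueError("both counts must be at least 1")
    return num_nodes * num_gpu_per_node


def global_batch_size(per_device_batch, grad_accum_steps, num_nodes, num_gpu_per_node):
    """Samples consumed per optimizer step under plain data parallelism."""
    return per_device_batch * grad_accum_steps * world_size(num_nodes, num_gpu_per_node)


# Hypothetical cluster: 2 nodes with 4 GPUs each, batch size 1, no accumulation.
print(world_size(2, 4))               # 8
print(global_batch_size(1, 1, 2, 4))  # 8
```

Each node runs the same torchrun command with its own node_rank, numbered from 0 to num_nodes - 1, while master_addr and master_port are identical on every node.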

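The Chinese-LLaMA models extend the original vocabulary with Chinese tokens and were further pre-trained on general Chinese plain-text data. The authors provide two ways to download these pre-trained weights, so you do not have to spend resources training them yourself.

(1) Google Drive or Baidu NetDisk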

Model name	Training data	Base model required	Size	LoRA download
Chinese-LLaMA-7B	General 20G	Original LLaMA-7B	770M	[Baidu Netdisk] [Google Drive]
Chinese-LLaMA-Plus-7B	General 120G	Original LLaMA-7B	790M	[Baidu Netdisk] [Google Drive]
Chinese-LLaMA-13B	General 20G	Original LLaMA-13B	1G	[Baidu Netdisk] [Google Drive]
Chinese-LLaMA-Plus-13B	General 120G	Original LLaMA-13B	1G	[Baidu Netdisk] [Google Drive]

(2) All of the models above can also be downloaded from the model hub, and the Chinese-LLaMA models can be loaded with Transformers and PEFT. The model call name below is the model name passed to .from_pretrained().

Model name	Model call name	Link
Chinese-LLaMA-7B	ziqingyang/chinese-llama-lora-7b	Model Hub link
Chinese-LLaMA-Plus-7B	ziqingyang/chinese-llama-plus-lora-7b	Model Hub link
Chinese-LLaMA-13B	ziqingyang/chinese-llama-lora-13b	Model Hub link
Chinese-LLaMA-Plus-13B	ziqingyang/chinese-llama-plus-lora-13b	Model Hub link

4.3 Instruction fine-tuning

Instruction fine-tuning also uses LoRA for efficiency, and further increases the number of trainable parameters.

Single machine, single GPU:

######## Parameter settings ########
lr=1e-4
lora_rank=8
lora_alpha=32
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=path/to/hf/llama/or/merged/llama/dir/or/model_id
chinese_tokenizer_path=path/to/chinese/llama/tokenizer/dir
dataset_dir=path/to/sft/data/dir
per_device_train_batch_size=1
per_device_eval_batch_size=1
training_steps=100
gradient_accumulation_steps=1
output_dir=output_dir
peft_model=path/to/peft/model/dir
validation_file=validation_file_name

deepspeed_config_file=ds_zero2_no_offload.json

######## Launch command ########
torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --do_eval \
    --seed $RANDOM \
    --fp16 \
    --max_steps ${training_steps} \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.03 \
    --weight_decay 0 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --evaluation_strategy steps \
    --eval_steps 250 \
    --save_steps 500 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --max_seq_length 512 \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --validation_file ${validation_file} \
    --peft_path ${peft_model} \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False

Parameter descriptions:

--tokenizer_name_or_path: the directory containing the Chinese-Alpaca tokenizer (the output of merge_tokenizers.py);
--dataset_dir: the directory of instruction fine-tuning data, containing one or more files in Stanford Alpaca format that end in .json;
--validation_file: a single instruction fine-tuning file used as the validation set; it ends in .json and also follows the Stanford Alpaca format.

The Stanford Alpaca format looks like this:

[
  {"instruction": ...,
   "input": ...,
   "output": ...},
  ...
]

This data can be generated with the method described in section 1, Data preparation.
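Before training, it can be worth checking that a data file really has this shape. The sketch below is a hypothetical validator, not part of the project's scripts; it assumes that all three fields must be strings and that "input" may be empty, as in the Stanford Alpaca dataset.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")


def validate_alpaca_records(records):
    """Raise ValueError unless records is a list of dicts with string fields."""
    if not isinstance(records, list):
        raise ValueError("top-level JSON value must be a list")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} is not an object")
        for key in REQUIRED_KEYS:
            if not isinstance(record.get(key), str):
                raise ValueError(f"record {index}: '{key}' must be a string")
    return records


# Made-up sample records for illustration.
sample = [
    {"instruction": "Translate to English.", "input": "你好", "output": "Hello"},
    {"instruction": "Name a primary color.", "input": "", "output": "Red"},
]

# ensure_ascii=False keeps Chinese text readable in the saved file.
text = json.dumps(validate_alpaca_records(sample), ensure_ascii=False, indent=2)
print(text)
```

The resulting text can be written to a file ending in .json inside the directory passed as --dataset_dir.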

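Configuration notes:

To continue training the LoRA weights of a Chinese-Alpaca model:
- --model_name_or_path: the original HF-format LLaMA model (if you are continuing to train a non-Plus Alpaca model), or the Chinese-LLaMA model with Chinese-LLaMA-Plus-LoRA merged in (if you are continuing to train a Plus model);
- --peft_path: the directory of the Chinese-Alpaca LoRA weights.

You do not need to specify the --lora_rank, --lora_alpha, --lora_dropout, --trainable, and --modules_to_save parameters.

To fine-tune new LoRA weights on top of Chinese-LLaMA:
- --model_name_or_path: the Chinese-LLaMA model obtained after merging Chinese-LLaMA-LoRA (whether or not it is a Plus model);
- --peft_path: do not provide this parameter; remove --peft_path from the script.

You must specify the --lora_rank, --lora_alpha, --lora_dropout, --trainable, and --modules_to_save parameters.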
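The two scenarios differ only in which extra arguments are passed to the training script. The sketch below encodes that rule in a hypothetical helper, `sft_extra_args`, which is not part of the project; the flag names are the ones listed above.

```python
LORA_FLAGS = ("--lora_rank", "--lora_alpha", "--lora_dropout",
              "--trainable", "--modules_to_save")


def sft_extra_args(continue_from_peft, peft_path=None, lora_settings=None):
    """Return the extra CLI arguments for the two scenarios described above."""
    if continue_from_peft:
        # Continuing an existing LoRA: pass --peft_path and omit the LoRA flags.
        if not peft_path:
            raise ValueError("continuing training needs a LoRA weight directory")
        return ["--peft_path", peft_path]
    # Training a new LoRA: every LoRA flag is required and --peft_path is dropped.
    settings = lora_settings or {}
    missing = [flag for flag in LORA_FLAGS if flag not in settings]
    if missing:
        raise ValueError("missing LoRA settings: " + ", ".join(missing))
    args = []
    for flag in LORA_FLAGS:
        args += [flag, str(settings[flag])]
    return args


new_lora = {
    "--lora_rank": 8,
    "--lora_alpha": 32,
    "--lora_dropout": 0.05,
    "--trainable": "q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj",
    "--modules_to_save": "embed_tokens,lm_head",
}
print(sft_extra_args(True, peft_path="path/to/peft/model/dir"))
print(sft_extra_args(False, lora_settings=new_lora))
```

Failing early on a missing flag is a design choice of this sketch; it surfaces a misconfigured run before any GPU time is spent.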

Multiple machines, multiple GPUs:

torchrun \
  --nnodes ${num_nodes} \
  --nproc_per_node ${num_gpu_per_node} \
  --node_rank ${node_rank} \
  --master_addr ${master_addr} \
  --master_port ${master_port} \
  run_clm_sft_with_peft.py \
    ...

5. Merge the weights

5.1 Merging a single LoRA weight

Applies to Chinese-LLaMA, Chinese-LLaMA-Plus, and Chinese-Alpaca.

python scripts/merge_llama_with_chinese_lora.py \
    --base_model path_to_original_llama_hf_dir \
    --lora_model path_to_chinese_llama_or_alpaca_lora \
    --output_type [pth|huggingface] \
    --output_dir path_to_output_dir

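Parameter descriptions:

--base_model: the directory holding the LLaMA model weights and configuration files in HF format;
--lora_model: the directory where the Chinese LLaMA/Alpaca LoRA archive was extracted; a model hub model call name can also be used;
--output_type: the output format, either pth or huggingface. If omitted, the default is pth;
--output_dir: the directory where the full model weights are saved; the default is ./ ;
(Optional) --offload_dir: for users with limited memory, an offload cache path must be specified.

More on output_type:

.pth files can be used for quantization and deployment with the llama.cpp tool;
.bin files can be used with Transformers for inference and with text-generation-webui for building an interface.

Merging a single LoRA weight can be done online, with quantization performed at the same time.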

5.2 Merging multiple LoRA weights

Merging Chinese-Alpaca-Plus requires two LoRA weights: Chinese-LLaMA-Plus-LoRA and Chinese-Alpaca-Plus-LoRA.

python scripts/merge_llama_with_chinese_lora.py \
    --base_model path_to_original_llama_hf_dir \
    --lora_model path_to_chinese_llama_plus_lora,path_to_chinese_alpaca_plus_lora \
    --output_type [pth|huggingface] \
    --output_dir path_to_output_dir

The order of the two LoRA models matters and must not be reversed. List LLaMA-Plus-LoRA first, then Alpaca-Plus-LoRA.

Merging multiple LoRA weights can be done online, with quantization performed at the same time.
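Because the order of the LoRA paths is significant, it can help to build the merge command programmatically. The sketch below uses a hypothetical helper, `build_merge_command`, that assembles the argument list from the flags documented above; it does not run the script itself.

```python
def build_merge_command(base_model, lora_models, output_type, output_dir):
    """Build the argv list for the merge script, preserving the LoRA order given."""
    if output_type not in ("pth", "huggingface"):
        raise ValueError("output_type must be 'pth' or 'huggingface'")
    if not lora_models:
        raise ValueError("at least one LoRA path is required")
    return [
        "python", "scripts/merge_llama_with_chinese_lora.py",
        "--base_model", base_model,
        # Multiple LoRA paths are joined with commas and no spaces.
        "--lora_model", ",".join(lora_models),
        "--output_type", output_type,
        "--output_dir", output_dir,
    ]


command = build_merge_command(
    "path_to_original_llama_hf_dir",
    # LLaMA-Plus-LoRA first, then Alpaca-Plus-LoRA.
    ["path_to_chinese_llama_plus_lora", "path_to_chinese_alpaca_plus_lora"],
    "huggingface",
    "path_to_output_dir",
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the placeholder paths are replaced with real ones.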

6. Deploy and run the model

CUDA_VISIBLE_DEVICES=0 python scripts/inference_hf.py \
    --base_model path_to_original_llama_hf_dir \
    --lora_model path_to_chinese_llama_or_alpaca_lora \
    --with_prompt \
    --interactive

If you have already merged the LoRA weights with the merge_llama_with_chinese_lora_to_hf.py script, you do not need to specify --lora_model, and the launch command is simpler:

CUDA_VISIBLE_DEVICES=0 python scripts/inference_hf.py \
    --base_model path_to_merged_llama_or_alpaca_hf_dir \
    --with_prompt \
    --interactive

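If CUDA_VISIBLE_DEVICES=0 is removed, inference runs in CPU mode. You can also run and deploy the model in a web UI.

The models in this project mainly support the following quantization, inference, and deployment methods.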

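Inference and deployment method	Features	Platform	CPU	GPU	Quantized loading	Graphical interface	Tutorial
llama.cpp	Rich quantization options and efficient local inference	General	✅	✅	✅		Link
🤗 Transformers	Native Transformers inference interface	General	✅	✅	✅	✅	Link
text-generation-webui	A way to deploy a front-end web UI	General	✅	✅	✅	✅	Link
LlamaChat	Graphical interface for macOS (requires models in llama.cpp format)	macOS	✅		✅	✅	Link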