ChatGLM Tuning Download - ChatGLM Tuning Source code download

1.0.0

ChatGLM-Tuning

An affordable ChatGPT-style implementation, fine-tuned from Tsinghua's ChatGLM-6B with LoRA.

Dataset: alpaca

If you have access to Colab, you can try it there directly:

Official ptuning code

Demo

Open source version of Wenxin Yiyan

S1 Finetune

Prepare

GPU: at least 16 GB of video memory (24 GB or more is preferable)
Environment:
- python>=3.8
- cuda>=11.6, plus cupti, cuDNN, TensorRT and the other usual deep learning components
- pip3 install -r requirements.txt
  For the bitsandbytes package listed in requirements.txt, version 0.41.2.post2 is recommended. Earlier versions may fail with the error: bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
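The requirements above can be checked before starting a long run. The following is a minimal sketch using only the Python standard library; the helper names (check_python, installed_version, report) are made up for illustration and are not part of this repository. It does not inspect the GPU or CUDA installation.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)               # from the list above
RECOMMENDED_BNB = "0.41.2.post2"  # bitsandbytes version recommended above


def check_python(version_info=sys.version_info):
    """Return True if the interpreter is at least Python 3.8."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report():
    """Collect human-readable lines describing the local environment."""
    lines = [f"python >= 3.8: {check_python()}"]
    bnb = installed_version("bitsandbytes")
    if bnb is None:
        lines.append("bitsandbytes: not installed")
    else:
        note = "recommended" if bnb == RECOMMENDED_BNB else f"recommended is {RECOMMENDED_BNB}"
        lines.append(f"bitsandbytes: {bnb} ({note})")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```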

Data preprocessing

Convert alpaca dataset to jsonl

python cover_alpaca2jsonl.py \
    --data_path data/alpaca_data.json \
    --save_path data/alpaca_data.jsonl
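To make the conversion step concrete, here is a rough sketch of turning Alpaca-style records (instruction, input, output) into jsonl rows with 'context' and 'target' fields. The prompt template used to build 'context' is an assumption made for illustration; the actual layout produced by cover_alpaca2jsonl.py may differ, so check the script before relying on it.

```python
import json


def format_example(example):
    """Build one row with 'context' and 'target' fields.

    The "Instruction / Input / Answer" template is an assumed layout,
    not necessarily the one used by the repository's script.
    """
    context = f"Instruction: {example['instruction']}\n"
    if example.get("input"):
        context += f"Input: {example['input']}\n"
    context += "Answer: "
    return {"context": context, "target": example["output"]}


def convert(records):
    """Return one JSON string per record, ready to be written line by line."""
    return [json.dumps(format_example(r), ensure_ascii=False) for r in records]


# A made-up record in the Alpaca field layout.
sample = [{"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}]

for line in convert(sample):
    print(line)
```

Writing each returned string on its own line of a file gives the jsonl format that the tokenization step reads.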

Tokenization

python tokenize_dataset_rows.py \
    --jsonl_path data/alpaca_data.jsonl \
    --save_path data/alpaca \
    --max_seq_length 200 \
    --skip_overlength False \
    --chatglm_path model_path/chatglm \
    --version v1

--jsonl_path path to the fine-tuning data in jsonl format; the ['context'] and ['target'] fields of each row are encoded
--save_path output path
--max_seq_length maximum sample length
--chatglm_path path of the model to load (either a chatglm or a chatglm2 path)
--version model version (v1 refers to chatglm, v2 refers to chatglm2)
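The interaction between --max_seq_length and --skip_overlength can be pictured with a small sketch. It assumes that over-length samples are dropped when skipping is enabled and truncated otherwise, which is the usual reading of these flag names; the real behavior of tokenize_dataset_rows.py should be confirmed in the script. The whitespace tokenizer below is a stand-in, not the ChatGLM tokenizer.

```python
def toy_tokenize(text):
    """Stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()


def preprocess(rows, max_seq_length, skip_overlength):
    """Yield token lists, handling rows longer than max_seq_length.

    Assumed policy: drop the row if skip_overlength is True,
    otherwise keep only the first max_seq_length tokens.
    """
    for row in rows:
        tokens = toy_tokenize(row["context"]) + toy_tokenize(row["target"])
        if len(tokens) > max_seq_length:
            if skip_overlength:
                continue
            tokens = tokens[:max_seq_length]
        yield tokens


# Made-up rows: the first has 4 tokens, the second has 8.
rows = [
    {"context": "Instruction: greet Answer:", "target": "hello"},
    {"context": "Instruction: count to five Answer:", "target": "one two three"},
]

print(len(list(preprocess(rows, max_seq_length=5, skip_overlength=True))))
print([len(t) for t in preprocess(rows, max_seq_length=5, skip_overlength=False)])
```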

Train

python finetune.py \
    --dataset_path data/alpaca \
    --lora_rank 8 \
    --per_device_train_batch_size 6 \
    --gradient_accumulation_steps 1 \
    --max_steps 52000 \
    --save_steps 1000 \
    --save_total_limit 2 \
    --learning_rate 1e-4 \
    --fp16 \
    --remove_unused_columns false \
    --logging_steps 50 \
    --output_dir output \
    --chatglm_path model_path/chat_glm
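As a back-of-the-envelope check on these settings, the sketch below works out how many samples the run processes and how many parameters a rank-8 LoRA adapter adds to one weight matrix. It assumes a single GPU, a dataset of roughly 52,000 rows (the approximate size of the Alpaca dataset), and a hypothetical 4096 x 4096 projection matrix; none of these figures come from finetune.py itself.

```python
def samples_seen(per_device_batch, grad_accum, max_steps, num_devices=1):
    """Total training samples processed over the whole run."""
    return per_device_batch * grad_accum * max_steps * num_devices


def lora_params(d_in, d_out, rank):
    """Extra trainable parameters for one LoRA-adapted weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


seen = samples_seen(per_device_batch=6, grad_accum=1, max_steps=52000)
print(seen)            # 312000
print(seen / 52000)    # 6.0 passes over a dataset of about 52,000 rows

# Hypothetical 4096 x 4096 projection with --lora_rank 8.
print(lora_params(4096, 4096, rank=8))   # 65536
print(4096 * 4096)                       # 16777216 in the full matrix
```

Under these assumptions the command above corresponds to about six passes over the data, and each adapted matrix trains well under 1% of the parameters it would need for full fine-tuning.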