Chinese Llama 2 7b

A fully open-source, fully commercially usable Chinese Llama2 model, together with a Chinese-English SFT dataset. The input format strictly follows the Llama-2-Chat format, so the model is compatible with all optimizations built for the original Llama-2-Chat model.

Basic demo

Try it online

Talk is cheap, show you the demo.

Demo address / HuggingFace
Colab (FP16; requires a High-RAM runtime, so the free tier cannot be used)
Colab (INT4; requires a High-RAM runtime, so the free tier cannot be used)

Latest updates

October 26: Chinese Llama2 Chat Model made available
August 24: added a new model link for the Chinese Llama2 Chat Model
July 31: open-sourced LLaSM, a Chinese-English bilingual speech-text multimodal model
July 31: open-sourced Chinese-LLaVA, a Chinese-English bilingual vision-text multimodal model
July 26: open-sourced Chinese-Llama2-7B-GGML
July 23: updated the 7B model, added API deployment, and provided a 4-bit quantized model
July 22: released the SFT training and inference code
July 21: one-click Docker deployment made available
July 21: demo launched
July 21: open-sourced the Chinese-English bilingual SFT data
July 21: open-sourced the Chinese-Llama2-7B model

Resource downloads

Model downloads
- Shizhi AI: Chinese Llama2 Chat Model
- ModelScope: Chinese Llama2 Chat Model
- HuggingFace: Chinese Llama2 Chat Model
- Baidu Netdisk: 1.0 official version
- Baidu Netdisk: 1.1 preview version
4-bit quantization
- HuggingFace: Chinese Llama2 4-bit Chat Model
- Baidu Netdisk: Chinese Llama2 4-bit Chat Model
GGML Q4 models:
- https://huggingface.co/linksoul/chinese-llama-2-7b-ggml
- https://huggingface.co/rffx0/chinese-llama-2-7b-ggml-model-q4_0
- https://huggingface.co/soulteary/chinese-llama-2-7b-ggml-q4
- Baidu Netdisk: Chinese-Llama-2-7B-GGML

We used a Chinese-English SFT dataset with a data volume of 10 million samples.

Dataset: https://huggingface.co/datasets/linksoul/instruction_merge_set

Quick test

from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_path = "LinkSoul/Chinese-Llama-2-7b"

# Load the tokenizer and the model (FP16 on the GPU), and stream tokens as they are generated.
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Llama-2-Chat prompt template: system prompt inside <<SYS>> ... <</SYS>>, user message in {}.
instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

            If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""

prompt = instruction.format("用中文回答，When is the best time to visit Beijing, and do you have any suggestions for me?")
input_ids = tokenizer(prompt, return_tensors='pt').input_ids.cuda()
generate_ids = model.generate(input_ids, max_new_tokens=4096, streamer=streamer)
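
The prompt layout used in the quick test can be factored into a small helper. The following is a minimal sketch, assuming a single-turn conversation; `build_prompt` and the short placeholder system prompt are hypothetical names introduced here for illustration and are not part of the repository.

```python
# Hypothetical helper (not from the repository): wraps one user message in the
# single-turn Llama-2-Chat layout used by the quick-test snippet.
PLACEHOLDER_SYSTEM_PROMPT = "You are a helpful, respectful and honest assistant."


def build_prompt(user_message: str, system_prompt: str = PLACEHOLDER_SYSTEM_PROMPT) -> str:
    """Return '[INST] <<SYS>>\\n{system}\\n<</SYS>>\\n\\n{user} [/INST]'."""
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"


if __name__ == "__main__":
    # repr() makes the newline characters inside the template visible.
    print(repr(build_prompt("用中文回答，When is the best time to visit Beijing?")))
```

The resulting string can be passed to the tokenizer in place of `instruction.format(...)`; the full system prompt from the quick test can be supplied through `system_prompt`.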

Docker

You can use the Dockerfile in the repository to quickly build an image on top of NVIDIA's latest nvcr.io/nvidia/pytorch:23.06-py3 base image, and then run the model in a container anywhere.

docker build -t linksoul/chinese-llama2-chat .

After the image is built, run it with the following command:

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v `pwd`/LinkSoul:/app/LinkSoul -p 7860:7860 linksoul/chinese-llama2-chat

ggml/llama.cpp

Want to run the Llama2 model in a CPU environment? Use one of the following methods.

Use ggml/convert_to_ggml.py to perform the conversion. For details, see the CLI parameters supported by the script.
Alternatively, run docker pull soulteary/llama2:converter to download the model format conversion tool image, then run the following two commands inside the Docker container to complete the conversion (tutorial: building a MetaAI LLaMA2 Chinese large model).

python3 convert.py /app/LinkSoul/Chinese-Llama-2-7b/ --outfile /app/LinkSoul/Chinese-Llama-2-7b-ggml.bin
./quantize /app/LinkSoul/Chinese-Llama-2-7b-ggml.bin /app/LinkSoul/Chinese-Llama-2-7b-ggml-q4.bin q4_0

Quantization configuration definitions:

Reposted from: https://www.reddit.com/r/localllama/comments/139yt87/notable_differences_between_q4_2_and_q5_1/

Q4_0 = 32 numbers per chunk, 4 bits per weight, 1 scale value stored as a 32-bit float (5 bits per value on average). Each weight is given by the common scale * quantized value.
Q4_1 = 32 numbers per chunk, 4 bits per weight, 1 scale value and 1 bias value stored as 32-bit floats (6 bits per value on average). Each weight is given by the common scale * quantized value + common bias.
Q4_2 = same as Q4_0, but 16 numbers per chunk, 4 bits per weight, 1 scale value stored as a 16-bit float. Same size as Q4_0, but better because the chunks are smaller.
Q4_3 = already dead, but analogous: Q4_1 with 16 numbers per chunk, 4 bits per weight, a 16-bit scale value and a 16-bit bias. Same size as Q4_1, but better because the chunks are smaller.
Q5_0 = 32 numbers per chunk, 5 bits per weight, 1 scale value stored as a 16-bit float. Size is 5.5 bits per weight.
Q5_1 = 32 numbers per chunk, 5 bits per weight, 1 scale value stored as a 16-bit float and 1 bias value at 16 bits. Size is 6 bits per weight.
Q8_0 = same as Q4_0, except 8 bits per weight, 1 scale value at 32 bits, for a total of 9 bits per weight.
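
The "bits per weight" figures in the list above follow from one formula: weight bits plus the per-chunk metadata (scale and optional bias) spread over the chunk. The sketch below reproduces that arithmetic and the "scale * quantized value + bias" reconstruction. `QuantFormat`, `dequantize`, and `estimated_size_gb` are hypothetical names used only for illustration; this is not llama.cpp's actual storage layout, and the 7e9 parameter count is a round-number assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantFormat:
    block_size: int     # weights per chunk
    weight_bits: int    # bits stored per quantized weight
    scale_bits: int     # bits used by the per-chunk scale
    bias_bits: int = 0  # bits used by the per-chunk bias (0 if the format has none)

    def bits_per_weight(self) -> float:
        # Per-chunk metadata is amortized over every weight in the chunk.
        return self.weight_bits + (self.scale_bits + self.bias_bits) / self.block_size


# Parameters taken from the definitions listed above.
FORMATS = {
    "Q4_0": QuantFormat(32, 4, 32),
    "Q4_1": QuantFormat(32, 4, 32, 32),
    "Q4_2": QuantFormat(16, 4, 16),
    "Q4_3": QuantFormat(16, 4, 16, 16),
    "Q5_0": QuantFormat(32, 5, 16),
    "Q5_1": QuantFormat(32, 5, 16, 16),
    "Q8_0": QuantFormat(32, 8, 32),
}


def dequantize(q_values, scale, bias=0.0):
    """Reconstruct weights as scale * quantized value (+ bias for the *_1 formats)."""
    return [scale * q + bias for q in q_values]


def estimated_size_gb(n_params: int, fmt: QuantFormat) -> float:
    """Rough size if every parameter were stored in this format (an upper-level estimate)."""
    return n_params * fmt.bits_per_weight() / 8 / 1e9


if __name__ == "__main__":
    for name, fmt in FORMATS.items():
        print(f"{name}: {fmt.bits_per_weight()} bits/weight, "
              f"~{estimated_size_gb(7_000_000_000, fmt):.2f} GB for 7e9 parameters")
```

Real model files may differ from this estimate, since not every tensor is necessarily stored in the quantized format.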

API deployment

First install the dependencies with pip install fastapi uvicorn, then run api.py from the repository:

python api.py

By default the service listens on local port 8000 and is called with the POST method:

curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'

The returned value is:

{
  "response": "你好！我是一个人工智能语言模型，可以回答你的问题和进行对话。请问你有什么需要帮助的吗？",
  "history": [["<<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\n            If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n你好", "你好！我是一个人工智能语言模型，可以回答你的问题和进行对话。请问你有什么需要帮助的吗？"]],
  "status": 200,
  "time": "2023-08-01 09:22:16"
}
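
The same call can be made from Python. The following is a minimal sketch using only the standard library, assuming the service is running at the default address and returns the JSON fields shown above; `build_request` and `chat` are hypothetical helper names, not part of api.py.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000"  # default address described in this section


def build_request(prompt, history=None, url=API_URL):
    """Build the POST request with the same JSON body as the curl example."""
    payload = {"prompt": prompt, "history": history or []}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, history=None, url=API_URL, timeout=120):
    """Send one prompt and return (response, history) from the JSON reply."""
    request = build_request(prompt, history, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"], body["history"]


if __name__ == "__main__":
    # Requires `python api.py` to be running locally.
    reply, history = chat("你好")
    print(reply)
```

Since the reply carries a `history` field and the request accepts one, passing the returned history into the next `chat` call presumably continues the conversation; that behavior is an assumption and should be checked against api.py.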

How to train

DATASET="LinkSoul/instruction_merge_set"

DATA_CACHE_PATH="hf_datasets_cache"
MODEL_PATH="/PATH/TO/TRANSFORMERS/VERSION/LLAMA2"

output_dir="./checkpoints_llama2"

torchrun --nnodes=1 --node_rank=0 --nproc_per_node=8 \
    --master_port=25003 \
        train.py \
        --model_name_or_path ${MODEL_PATH} \
        --data_path ${DATASET} \
        --data_cache_path ${DATA_CACHE_PATH} \
        --bf16 True \
        --output_dir ${output_dir} \
        --num_train_epochs 1 \
        --per_device_train_batch_size 4 \
        --per_device_eval_batch_size 4 \
        --gradient_accumulation_steps 1 \
        --evaluation_strategy 'no' \
        --save_strategy 'steps' \
        --save_steps 1200 \
        --save_total_limit 5 \
        --learning_rate 2e-5 \
        --weight_decay 0. \
        --warmup_ratio 0.03 \
        --lr_scheduler_type cosine \
        --logging_steps 1 \
        --fsdp 'full_shard auto_wrap' \
        --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
        --tf32 True \
        --model_max_length 4096 \
        --gradient_checkpointing True
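
The flags above imply a global batch size and an approximate number of optimizer steps per epoch. The sketch below works through that arithmetic. It assumes the round figure of 10 million samples quoted earlier in this README and one training example per sample; the actual count after any filtering or packing in train.py may differ. The function names are hypothetical.

```python
import math


def global_batch_size(nnodes, nproc_per_node, per_device_batch, grad_accum_steps):
    """Samples consumed per optimizer step across all processes."""
    return nnodes * nproc_per_node * per_device_batch * grad_accum_steps


def steps_per_epoch(num_samples, batch):
    """Optimizer steps needed to see every sample once."""
    return math.ceil(num_samples / batch)


# Values taken from the torchrun command above.
batch = global_batch_size(nnodes=1, nproc_per_node=8, per_device_batch=4, grad_accum_steps=1)

# Assumption: roughly 10 million samples, the figure quoted for the SFT dataset.
steps = steps_per_epoch(10_000_000, batch)

# --save_steps 1200 writes a checkpoint every 1200 steps; --save_total_limit 5 keeps 5 at most.
checkpoints_written = steps // 1200
checkpoints_kept = min(checkpoints_written, 5)

if __name__ == "__main__":
    print("global batch size:", batch)
    print("steps per epoch:", steps)
    print("checkpoints written / kept:", checkpoints_written, "/", checkpoints_kept)
```

Changing the GPU count or `--gradient_accumulation_steps` changes the global batch size, so the learning rate and `--save_steps` may need to be revisited when running on different hardware.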