Download Chinese Llama 2 7b - Chinese Llama 2 7b source code download

Chinese Llama 2 7B

A fully open-source Chinese version of the Llama 2 model, together with a Chinese-English SFT dataset. The input format strictly follows the llama-2-chat format, so the model is compatible with all optimizations made for the original Llama 2 model.
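Because the input follows the llama-2-chat format, a prompt can be assembled with plain string formatting. The sketch below assumes the commonly used Llama 2 chat layout: the system prompt is wrapped in <<SYS>> tags inside the first [INST] block, and earlier turns are closed with </s><s>. The helper name build_llama2_prompt is hypothetical, and whether <s>/</s> should appear as literal text or be added by the tokenizer depends on your inference stack.

```python
from typing import List, Optional, Tuple

# Delimiters of the llama-2-chat format
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def build_llama2_prompt(system: str, history: List[Tuple[str, str]], user: str) -> str:
    """Hypothetical helper: assemble a llama-2-chat prompt string.

    history holds completed (user, assistant) turns; user is the new message.
    The system prompt is folded into the first user message.
    """
    turns: List[Tuple[str, Optional[str]]] = list(history) + [(user, None)]
    parts = []
    for i, (question, answer) in enumerate(turns):
        if i == 0 and system:
            question = B_SYS + system + E_SYS + question
        if answer is None:
            # Final turn: the model continues generating after [/INST]
            parts.append(f"{B_INST} {question.strip()} {E_INST}")
        else:
            # Completed turn, closed with end/begin-of-sequence markers written as text
            parts.append(f"{B_INST} {question.strip()} {E_INST} {answer.strip()} </s><s>")
    return "".join(parts)


print(build_llama2_prompt("You are a helpful assistant.", [], "Hello"))
```

With an empty history this yields the same single-turn shape as the template in the quick test; passing earlier (user, assistant) pairs extends it to multi-turn chat.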


Basic demo


Online demo

Talk is cheap, show you the demo.

Demo address / Hugging Face Spaces
Colab (FP16; requires High-RAM to be enabled, so the free tier cannot be used)
Colab (Int4; requires High-RAM to be enabled, so the free tier cannot be used)

Latest updates

October 26: a new download link was provided for the Chinese Llama2 chat model
August 24: added a ModelScope link for the Chinese Llama2 chat model
July 31: the Chinese speech-text multimodal model LLaSM was open-sourced
July 31: the Chinese image-text multimodal model Chinese-LLaVA was open-sourced
July 26: the Chinese-Llama-2-7B-GGML model was open-sourced
July 23: the 7B model was updated, an API was added, and a 4-bit quantized model was provided
July 22: SFT training/inference code released
July 21: one-click Docker deployment
July 21: demo released
July 21: Chinese-English bilingual SFT data open-sourced
July 21: Chinese-Llama-2-7B model open-sourced

Resource downloads

Model downloads
- Shizhi AI: Chinese Llama2 chat model
- ModelScope: Chinese Llama2 chat model
- Huggingface: Chinese Llama2 chat model
- Baidu NetDisk: version 1.0 (official release)
- Baidu NetDisk: version 1.1 (enhanced edition)
4-bit quantized models
- Huggingface: Chinese Llama2 chat model, 4-bit
- Baidu NetDisk: Chinese Llama2 chat model, 4-bit
GGML Q4 models:
- https://huggingface.co/Linksoul/Chinese-llama-2-7b-ggml
- https://huggingface.co/rffx0/chinese-llama-2-7b-ggml-model-q4_0
- https://huggingface.co/Soulteary/Chinese-llama-2-7b-ggml-q4
- Baidu NetDisk: Chinese-Llama-2-7B-GGML

We used a Chinese-English SFT dataset totaling 10 million samples.

Dataset: https://huggingface.co/datasets/LinkSoul/instruction_merge_set

Quick test

from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_path = "LinkSoul/Chinese-Llama-2-7b"

# Load the slow (SentencePiece-based) tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
# Load the weights, cast them to FP16 and move them to the GPU
model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()
# Print tokens as they are generated, without echoing the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# llama-2-chat template: system prompt inside <<SYS>> tags, user message substituted for {}
instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

            If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""

# "用中文回答" means "Answer in Chinese"
prompt = instruction.format("用中文回答，When is the best time to visit Beijing, and do you have any suggestions for me?")
generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(), max_new_tokens=4096, streamer=streamer)
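Prompt tokens share the context window with generated tokens. Llama 2 was trained with a 4,096-token window (the training command below also uses --model_max_length 4096), so with a non-empty prompt, max_new_tokens=4096 asks for more than that window. A minimal sketch of clamping the generation budget, assuming a 4,096-token window; generation_budget is a hypothetical helper, not part of transformers.

```python
def generation_budget(prompt_tokens: int, requested: int, context_window: int = 4096) -> int:
    """Hypothetical helper: how many new tokens fit after the prompt.

    Raises ValueError if the counts are negative or the prompt alone fills the window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt of {prompt_tokens} tokens does not fit the {context_window}-token window"
        )
    # Never ask for more than the caller requested or than the window allows
    return min(requested, remaining)
```

In the quick test you could compute input_ids = tokenizer(prompt, return_tensors='pt').input_ids and pass max_new_tokens=generation_budget(input_ids.shape[1], 4096) to model.generate.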

Docker

You can use the Dockerfile in the repository to quickly build a base image on top of NVIDIA's nvcr.io/nvidia/pytorch:23.06-py3 image (the latest release when this was written), and then use the container anywhere to run the Chinese Llama2 model application.

docker build -t linksoul/chinese-llama2-chat .

Once the image is built, start it with:

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v `pwd`/LinkSoul:/app/LinkSoul -p 7860:7860 linksoul/chinese-llama2-chat

GGML/llama.cpp

Want to run the Llama2 model on a CPU? Use one of the following methods.

Use ggml/convert_to_ggml.py to perform the conversion; for details, see the CLI parameters supported by the script.
Alternatively, use docker pull soulteary/llama2:converter to download the model-format conversion tool image, then run the following two commands inside the Docker container to complete the process (see the tutorial "Building a Chinese MetaAI Llama2 model that can run on the CPU"):

python3 convert.py /app/LinkSoul/Chinese-Llama-2-7b/ --outfile /app/LinkSoul/Chinese-Llama-2-7b-ggml.bin
./quantize /app/LinkSoul/Chinese-Llama-2-7b-ggml.bin /app/LinkSoul/Chinese-Llama-2-7b-ggml-q4.bin q4_0
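To plan disk space for these two steps, file size is roughly parameter count × average bits per weight ÷ 8. The sketch assumes about 6.74 billion parameters for Llama 2 7B, an F16 intermediate file from convert.py (the actual output type depends on the input weights and flags), and the 5 bits per weight that the Q4_0 definition in this section gives. It ignores file headers and any tensors kept at higher precision, so treat the numbers as estimates only.

```python
def estimate_size_bytes(n_params: float, bits_per_weight: float) -> float:
    """Rough file size if every weight is stored at the given average bit width."""
    return n_params * bits_per_weight / 8


# Assumption: Llama 2 7B has roughly 6.74 billion parameters
N_PARAMS = 6.74e9

for name, bpw in [("f16 (assumed convert.py output)", 16.0), ("q4_0", 5.0)]:
    gb = estimate_size_bytes(N_PARAMS, bpw) / 1e9
    print(f"{name}: ~{gb:.1f} GB")
```

The ratio between the two files (16 vs. 5 bits) is what makes the quantized model practical on a CPU machine with limited RAM.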

Quantization format definitions:

Reposted from: https://www.reddit.com/r/localllama/comments/139yt87/notable_differences_between_q4_2_and_q5_1/

Q4_0 = 32 numbers per block, 4 bits per weight, 1 scale value as a 32-bit float (5 bits per value on average); each weight is given by the shared scale * the quantized value.
Q4_1 = 32 numbers per block, 4 bits per weight, 1 scale value and 1 bias value as 32-bit floats (6 bits per value on average); each weight is given by the shared scale * the quantized value + the shared bias.
Q4_2 = like Q4_0, but with 16 numbers per block, 4 bits per weight, and 1 scale value as a 16-bit float; same size as Q4_0 but better because the blocks are smaller.
Q4_3 = already dead, but analogous: like Q4_1 but with 16 numbers per block, 4 bits per weight, a 16-bit float scale and a 16-bit bias; same size as Q4_1 but better because the blocks are smaller.
Q5_0 = 32 numbers per block, 5 bits per weight, 1 scale value as a 16-bit float; the size is 5.5 bits per weight.
Q5_1 = 32 numbers per block, 5 bits per weight, 1 scale value as a 16-bit float and 1 bias value as a 16-bit float; the size is 6 bits per weight.
Q8_0 = same as Q4_0, except 8 bits per weight, with 1 scale value at 32 bits, for a total of 9 bits per weight.
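The bit counts above follow from the block layout: (values per block × bits per value + bits spent on the shared scale and bias) ÷ values per block. The sketch below checks that arithmetic for every format listed, then quantizes one 32-value block in the spirit of Q4_0 (one shared scale, 4-bit codes, weight ≈ scale × code). The rounding scheme used here (scale = max|x| / 7, codes -7..7 stored with an offset of 8) is a simplified assumption for illustration, not ggml's exact implementation.

```python
from typing import List, Tuple

# (values per block, bits per value, bits for shared scale + bias), per the definitions above
FORMATS = {
    "Q4_0": (32, 4, 32),
    "Q4_1": (32, 4, 32 + 32),
    "Q4_2": (16, 4, 16),
    "Q4_3": (16, 4, 16 + 16),
    "Q5_0": (32, 5, 16),
    "Q5_1": (32, 5, 16 + 16),
    "Q8_0": (32, 8, 32),
}


def bits_per_weight(block: int, bits: int, overhead: int) -> float:
    """Average storage cost per weight, including the shared per-block values."""
    return (block * bits + overhead) / block


def quantize_q4_0_like(block: List[float]) -> Tuple[float, List[int]]:
    """Simplified Q4_0-style quantizer: one float scale plus 4-bit codes in 0..15."""
    amax = max(abs(x) for x in block)
    scale = amax / 7 if amax > 0 else 1.0
    # Round each value to the nearest multiple of scale, then shift into 0..15
    codes = [min(15, max(0, round(x / scale) + 8)) for x in block]
    return scale, codes


def dequantize_q4_0_like(scale: float, codes: List[int]) -> List[float]:
    """Each weight is recovered as shared scale * quantized value."""
    return [scale * (c - 8) for c in codes]


if __name__ == "__main__":
    for name, spec in FORMATS.items():
        print(name, bits_per_weight(*spec))
    block = [((i * 37) % 17 - 8) / 10 for i in range(32)]  # arbitrary sample weights
    scale, codes = quantize_q4_0_like(block)
    restored = dequantize_q4_0_like(scale, codes)
    print("max abs error:", max(abs(a - b) for a, b in zip(block, restored)))
```

Because every code is the nearest multiple of the scale, the reconstruction error per weight is at most half the scale; smaller blocks (as in Q4_2/Q4_3) get tighter scales, which is why they are described as better at the same size.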

API deployment

First install the extra packages with pip install fastapi uvicorn, then run api.py from the repository:

python api.py

By default the service listens on local port 8000 and is called with a POST request:

curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'

Example response:

{
  "response": "你好！我是一个人工智能语言模型，可以回答你的问题和进行对话。请问你有什么需要帮助的吗？",
  "history": [["<<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\n            If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n你好", "你好！我是一个人工智能语言模型，可以回答你的问题和进行对话。请问你有什么需要帮助的吗？"]],
  "status": 200,
  "time": "2023-08-01 09:22:16"
}
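A minimal Python client for this endpoint, using only the standard library. It assumes the request and response fields shown in the example (prompt, history, response) and the default address http://127.0.0.1:8000; the function names are hypothetical. The example reply's history already contains the system prompt inside the first user entry, which suggests it is meant to be sent back unchanged with the next prompt to keep multi-turn context; that is an inference from the example, not documented behavior.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_URL = "http://127.0.0.1:8000"  # default address used in the curl example


def build_payload(prompt: str, history: List[List[str]]) -> bytes:
    """Encode the JSON body with the same fields as the curl example."""
    body = {"prompt": prompt, "history": history}
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def parse_reply(raw: bytes) -> Dict[str, Any]:
    """Decode the server's JSON reply and check the fields this client relies on."""
    data = json.loads(raw.decode("utf-8"))
    if "response" not in data or "history" not in data:
        raise ValueError(f"unexpected reply: {data}")
    return data


def chat(prompt: str, history: List[List[str]], url: str = API_URL,
         timeout: float = 300.0) -> Dict[str, Any]:
    """POST one turn and return the parsed reply (requires a running api.py)."""
    req = urllib.request.Request(
        url,
        data=build_payload(prompt, history),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_reply(resp.read())
```

Typical use: start with history = [], call reply = chat("你好", history), print reply["response"], and then pass reply["history"] as the history for the next call.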

How to train

DATASET="LinkSoul/instruction_merge_set"

DATA_CACHE_PATH="hf_datasets_cache"
MODEL_PATH="/PATH/TO/TRANSFORMERS/VERSION/LLAMA2"

output_dir="./checkpoints_llama2"

torchrun --nnodes=1 --node_rank=0 --nproc_per_node=8 \
    --master_port=25003 \
        train.py \
        --model_name_or_path ${MODEL_PATH} \
        --data_path ${DATASET} \
        --data_cache_path ${DATA_CACHE_PATH} \
        --bf16 True \
        --output_dir ${output_dir} \
        --num_train_epochs 1 \
        --per_device_train_batch_size 4 \
        --per_device_eval_batch_size 4 \
        --gradient_accumulation_steps 1 \
        --evaluation_strategy 'no' \
        --save_strategy 'steps' \
        --save_steps 1200 \
        --save_total_limit 5 \
        --learning_rate 2e-5 \
        --weight_decay 0. \
        --warmup_ratio 0.03 \
        --lr_scheduler_type cosine \
        --logging_steps 1 \
        --fsdp 'full_shard auto_wrap' \
        --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
        --tf32 True \
        --model_max_length 4096 \
        --gradient_checkpointing True
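The flags above determine how many optimizer steps the run takes. As a worked example, assuming the roughly 10 million SFT samples mentioned earlier and one node with 8 GPUs: the effective batch is 8 × 4 × 1 = 32 sequences per step, one epoch is about 10,000,000 ÷ 32 = 312,500 steps, a 3% warmup is about 9,375 steps, and saving every 1,200 steps writes 260 checkpoints, of which --save_total_limit 5 keeps only the latest 5. The hypothetical helper below reproduces that arithmetic; the sample count is approximate, and the sketch ignores per-rank rounding and any filtering or packing that train.py may do.

```python
import math


def schedule(num_samples: int, nproc: int, per_device_batch: int, grad_accum: int,
             epochs: float, warmup_ratio: float, save_steps: int) -> dict:
    """Hypothetical helper: rough step arithmetic for a data-parallel run."""
    effective_batch = nproc * per_device_batch * grad_accum
    # Each optimizer step consumes one effective batch; a partial last batch still counts
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = math.ceil(steps_per_epoch * epochs)
    # Warmup length as a fraction of total steps, rounded up
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    # Number of times a checkpoint is written during training
    checkpoints_written = total_steps // save_steps
    return {
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
        "checkpoints_written": checkpoints_written,
    }


print(schedule(10_000_000, nproc=8, per_device_batch=4, grad_accum=1,
               epochs=1, warmup_ratio=0.03, save_steps=1200))
```

Changing --nproc_per_node or --gradient_accumulation_steps scales the effective batch, so --learning_rate and --save_steps may need to be revisited together.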