alpaca 7b chinese

Finetune LLaMA-7B with Chinese instruction datasets

For more LLM finetuning methods, see LLM-Finetune-Guide.

This repository is a tutorial for finetuning LLaMA-7B with Chinese datasets. I surveyed and combined datasets and methods for finetuning my own LLM on complex NLP tasks such as summarization, question answering, text generation, and custom data augmentation.

Because the original Stanford Alpaca-7B finetuning requires a lot of GPU resources, I focus on methods with low GPU consumption.

So here's how to reproduce:

Installation

Install requirements

$ pip install -r requirements.txt

Install a PyTorch version compatible with your CUDA version

$ pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
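
The `+cu116` suffix in the pinned versions marks a build compiled against CUDA 11.6. A minimal sketch for checking which build is actually installed is shown here; `parse_torch_build` is a hypothetical helper written for this illustration, not part of PyTorch or of this repository.

```python
import re


def parse_torch_build(version: str):
    """Split a version string such as '1.13.1+cu116' into (release, cuda).

    cuda is a string like '11.6', or None when the build tag is not a CUDA tag.
    """
    release, _, local = version.partition("+")
    # 'cu116' -> major '11', minor '6'; the last digit is treated as the minor version.
    match = re.fullmatch(r"cu(\d+)(\d)", local)
    cuda = f"{match.group(1)}.{match.group(2)}" if match else None
    return release, cuda


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        release, cuda = parse_torch_build(torch.__version__)
        print(f"torch {release}, CUDA build: {cuda}, "
              f"GPU visible: {torch.cuda.is_available()}")
```

If the reported CUDA build is `None` or no GPU is visible, the install command above probably needs a different `cuXXX` tag for your driver.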

Datasets

All datasets in this repository are combined using an English-instruction, Chinese-output construction:

alpaca_data.json: Original dataset from Stanford Alpaca
alpaca_data_cleansed.json: Cleaned version from gururise/AlpacaDataCleaned
alpaca-zhCN.json: Translated by carbonz0/alpaca-chinese-dataset
alpaca-zhTW.json: Converted to Traditional Chinese using OpenCC
alpaca-en-zh.json: English instruction/input combined with Chinese output, from ntunlplab/traditional-chinese-alpaca (Traditional Chinese dataset translated with the ChatGPT API (gpt-3.5-turbo); updated 2023.03.29)
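
The English-instruction, Chinese-output construction can be sketched as follows. Stanford Alpaca records carry the fields `instruction`, `input`, and `output`; the sketch assumes the English and Chinese files are aligned record by record, which is an assumption and not something this README states. `combine_en_zh` is a hypothetical helper, and the two sample records are illustrative only.

```python
import json


def combine_en_zh(english_records, chinese_records):
    """Pair each English instruction/input with the Chinese output at the same index."""
    if len(english_records) != len(chinese_records):
        raise ValueError("datasets must be aligned record by record")
    return [
        {
            "instruction": en["instruction"],  # keep the English instruction
            "input": en["input"],              # keep the English input
            "output": zh["output"],            # take the Chinese output
        }
        for en, zh in zip(english_records, chinese_records)
    ]


# Illustrative records in the Alpaca format (not copied from the data files).
english = [{"instruction": "Give three tips for staying healthy.",
            "input": "", "output": "1. Eat a balanced diet..."}]
chinese = [{"instruction": "給出三個保持健康的建議。",
            "input": "", "output": "1. 均衡飲食..."}]

combined = combine_en_zh(english, chinese)
print(json.dumps(combined, ensure_ascii=False, indent=2))
```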

Finetune

The finetuning method follows tloen/alpaca-lora.

Run on 1 GPU with Colab: https://colab.research.google.com/drive/1QvtrJpikkkNKSbwwG766SIGbBw2TQRd5?usp=sharing
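
For orientation, each dataset record is turned into a single training prompt before finetuning. The sketch below uses the prompt template published with Stanford Alpaca; whether `finetune.py` in this repository uses exactly this wording is an assumption, and `build_prompt` is a hypothetical helper name.

```python
# Stanford Alpaca prompt templates (with and without an input field).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record):
    """Render one Alpaca-style record; the model is trained to continue it with the output."""
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    return template.format(**record)


print(build_prompt({"instruction": "Translate to Chinese.", "input": "Hello", "output": "你好"}))
```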

LLaMA

$ cd finetune/
$ python finetune.py --base_model decapoda-research/llama-7b-hf --data_dir ../data/alpaca-en-zh.json --output_dir ../finetuned/llama-7b-hf_alpaca-en-zh --lora_target_modules '["q_proj", "v_proj"]'

BLOOM

$ cd finetune/
$ python finetune.py --base_model bigscience/bloomz-7b1-mt --data_dir ../data/alpaca-en-zh.json --output_dir ../finetuned/bloomz-7b1-mt_alpaca-en-zh --lora_target_modules '["query_key_value"]'

Use torchrun for distributed training on multiple GPUs

LLaMA

$ cd finetune/
$ torchrun --standalone --nnodes=1 --nproc_per_node=4 finetune.py --base_model decapoda-research/llama-7b-hf --data_dir ../data/alpaca-en-zh.json --output_dir ../finetuned/llama-7b-hf_alpaca-en-zh --lora_target_modules '["q_proj", "v_proj"]'

BLOOM

$ cd finetune/
$ torchrun --standalone --nnodes=1 --nproc_per_node=4 finetune.py --base_model bigscience/bloomz-7b1-mt --data_dir ../data/alpaca-en-zh.json --output_dir ../finetuned/bloomz-7b1-mt_alpaca-en-zh --lora_target_modules '["query_key_value"]'
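
The four commands above differ only in the launcher and in the LoRA target modules, which depend on the base model's attention layer names. A minimal sketch that assembles them from those two choices follows; `build_finetune_command` and `LORA_TARGETS` are hypothetical names introduced for illustration, and all flags are copied from the commands above.

```python
import json
import shlex

# LoRA target modules as used in the commands above, keyed by base model.
LORA_TARGETS = {
    "decapoda-research/llama-7b-hf": ["q_proj", "v_proj"],
    "bigscience/bloomz-7b1-mt": ["query_key_value"],
}


def build_finetune_command(base_model, dataset="alpaca-en-zh", num_gpus=1):
    """Return the argument list for a single-GPU or torchrun multi-GPU run."""
    model_name = base_model.split("/")[-1]
    if num_gpus == 1:
        launcher = ["python"]
    else:
        launcher = ["torchrun", "--standalone", "--nnodes=1",
                    f"--nproc_per_node={num_gpus}"]
    return launcher + [
        "finetune.py",
        "--base_model", base_model,
        "--data_dir", f"../data/{dataset}.json",
        "--output_dir", f"../finetuned/{model_name}_{dataset}",
        # The script expects the module list as a JSON-formatted string.
        "--lora_target_modules", json.dumps(LORA_TARGETS[base_model]),
    ]


cmd = build_finetune_command("bigscience/bloomz-7b1-mt", num_gpus=4)
print(shlex.join(cmd))  # shell-quoted command, to be run from finetune/
```

The argument list can also be passed directly to `subprocess.run(cmd, cwd="finetune")`, which avoids shell quoting altogether.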

Finetune Domain Tasks

I've collected tasks from different domains in my repository: instruction-finetune-datasets

Collaboration is welcome! Please contact me at: [email protected]. I'd like to try tasks from different domains such as investment, fraud, e-commerce, law, healthcare, ...

Model Serving

Serve your own model through an API and a simple web UI.

Model API
```
$ cd serve/
$ python api.py
```
Demo UI
```
$ cd serve/
$ python ui.py
```

Learn More

I have organized LLM finetuning methods in LLM-Finetune-Guide.

I have curated many methods for running large language models with fewer GPU resources:

PEFT
LoRA
FlexGen ...

See full list: chatgpt-alternatives

@misc{alpaca-7b-chinese,
  author = {JiunYi Yang},
  title = {Alpaca-7B Chinese: Finetune LLaMA-7B with Chinese instruction datasets},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/A-baoYang/alpaca-7b-chinese}},
}

Additional Information

Version 1.0.0
Type AI Source Code
Update Time 2025-09-03
size 18.02MB
From Github
