LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models

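This is the PyTorch code for the paper LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models (NAACL 2024 Outstanding Paper Award). The work was done by Chi Han, Qifan Wang, Hao Peng, Wenhan Xiong, Yu Chen, Heng Ji, and Sinong Wang.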

Introduction

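In this paper, the authors propose a simple method, called LM-Infinite, that improves the length generalization of large language models to an extreme length of 200M tokens without any additional training or parameter updates.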

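The work is motivated by three factors identified as underlying the length generalization failure of LLMs: (a) Factor 1: unseen distances between tokens cause attention logits to explode. (b) Factor 2: an unseen number of tokens causes attention entropy to grow beyond the training range as the length increases. (c) Factor 3: the first few tokens occupy a distinct feature region and should not be discarded.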
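Factor 2 can be illustrated with a toy calculation. The sketch below is not taken from the paper's code. It assumes the simplest possible case, where every key receives the same attention logit, so the softmax is uniform and the entropy equals ln(n) for n attended tokens. Real attention is not uniform, but the same upward trend with length is the concern.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_entropy(logits):
    """Shannon entropy (in nats) of the attention distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Toy assumption: all logits are equal, so attention is uniform over n keys.
for n in (2048, 4096, 32768):
    print(n, round(attention_entropy([0.0] * n), 3))
```

Under this assumption the entropy keeps rising as more tokens become visible, which is why bounding the number of attended tokens keeps it within the range seen in training.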

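The key idea is to use (1) a $\Lambda$-shaped attention pattern, so that each token attends only to the nearest $L_{pretrain}$ tokens plus a few starting tokens, and (2) a distance limit $L_{pretrain}$, so that the attention distance is capped at $L_{pretrain}$. The proposed method is compatible with multiple state-of-the-art language models, including but not limited to the LLaMA, Llama-2, GPT-J, and MPT-7B series. LM-Infinite is also computationally efficient, with only $O(n)$ time complexity.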
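The two ingredients can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation (the efficient version lives in models/lambda_attention.py). The function names `lambda_mask` and `capped_distance` and their parameters are hypothetical, and the dense n-by-n mask is built only for readability.

```python
def lambda_mask(n_tokens, n_local, n_global):
    """Return mask[i][j] == True when query i may attend to key j.

    A key is visible if it is causal (j <= i) and is either one of the
    first `n_global` tokens or within the `n_local` most recent tokens.
    """
    mask = []
    for i in range(n_tokens):
        row = []
        for j in range(n_tokens):
            causal = j <= i
            is_start = j < n_global        # global branch: starting tokens
            is_recent = (i - j) < n_local  # local branch: recent tokens
            row.append(causal and (is_start or is_recent))
        mask.append(row)
    return mask

def capped_distance(i, j, limit):
    """Relative distance between query i and key j, capped at `limit`."""
    return min(i - j, limit)

# Print the pattern for a tiny sequence; the two arms form the Lambda shape.
for row in lambda_mask(n_tokens=8, n_local=3, n_global=2):
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `n_local + n_global` visible keys, which is what keeps the per-token cost bounded regardless of sequence length.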

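Now, a drop-in replacement for HuggingFace Transformers!

We have implemented the LM-Infinite method as a drop-in replacement for HuggingFace Transformers. After loading a Transformers model, if it is a Llama, MPT, or GPT-J model, you can run the following code to enable LM-Infinite.

For Llama models:

from models.llama import convert_llama_model
model = convert_llama_model(model, 4096, 10)

For MPT models:

from models.mpt_7b import convert_mpt_model
model = convert_mpt_model(model, 4096, 10)

For GPT-J models:

from models.gpt_j import convert_gpt_j_model
model = convert_gpt_j_model(model, 4096, 10)

Then you can use the model as usual!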
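If the model family is not known in advance, the three cases can be folded into one helper. The sketch below is an assumption-laden convenience wrapper, not part of the repository: `pick_converter` and `enable_lm_infinite` are hypothetical names, the keys are the `config.model_type` strings that HuggingFace uses for these architectures, and the two positional arguments are assumed to be the local and global branch sizes (matching the `--local_branch` and `--global_branch` command-line flags). It must be run from the repository root so that the `models` package is importable.

```python
import importlib

# HuggingFace `config.model_type` -> (repository module, converter function).
CONVERTERS = {
    "llama": ("models.llama", "convert_llama_model"),
    "mpt": ("models.mpt_7b", "convert_mpt_model"),
    "gptj": ("models.gpt_j", "convert_gpt_j_model"),
}

def pick_converter(model_type):
    """Look up the converter for a model type, failing loudly if unsupported."""
    try:
        return CONVERTERS[model_type]
    except KeyError:
        raise ValueError(
            f"No LM-Infinite converter for model type {model_type!r}"
        ) from None

def enable_lm_infinite(model, local_branch=4096, global_branch=10):
    """Import the matching converter lazily and apply it to `model`.

    Assumption: the converters take (model, local_branch, global_branch).
    """
    module_name, func_name = pick_converter(model.config.model_type)
    convert = getattr(importlib.import_module(module_name), func_name)
    return convert(model, local_branch, global_branch)
```

With this helper, `model = enable_lm_infinite(model)` would replace the per-family import and call.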

Requirements

Python 3.11
PyTorch 2.0.1
Datasets 2.14.4
Tokenizers 0.13.3
Transformers 4.32.1
SentencePiece 0.1.99
Evaluate 0.4.0
Rouge-Score 0.1.2
Protobuf 3.20.3
Accelerate 0.22.0
DeepSpeed 0.10.2
Tqdm 4.66.1
Einops 0.6.1

A detailed list of Python packages from an Anaconda perspective can be found in requirements.txt. Some packages were installed by conda and some by pip. The commands I used to install the requirements in an Anaconda and pip environment are as follows:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c conda-forge sentencepiece einops cudatoolkit-dev tqdm ipython datasets evaluate rouge-score protobuf accelerate langchain openai
pip install transformers deepspeed
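Because the install commands above do not pin versions, it can help to compare the resulting environment against the listed requirements. The following sketch is one way to do that with the standard library. The distribution names in `PINNED` are assumptions about how each package is registered (for example `torch` for PyTorch and `rouge-score` for Rouge-Score), and `find_mismatches` is a hypothetical helper, not something shipped with the repository.

```python
from importlib import metadata

# Versions from the requirements list; distribution names are assumptions.
PINNED = {
    "torch": "2.0.1",
    "datasets": "2.14.4",
    "tokenizers": "0.13.3",
    "transformers": "4.32.1",
    "sentencepiece": "0.1.99",
    "evaluate": "0.4.0",
    "rouge-score": "0.1.2",
    "protobuf": "3.20.3",
    "accelerate": "0.22.0",
    "deepspeed": "0.10.2",
    "tqdm": "4.66.1",
    "einops": "0.6.1",
}

def installed_version(name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

def find_mismatches(pinned, lookup=installed_version):
    """Map package name -> (wanted, found) for every package that differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu118" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches

if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

A mismatch is not necessarily a problem; the list only records the versions the authors used.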

Directory Structure

├── LICENSE
├── README.md
├── requirements.txt
├── configs
│   └── zero3_efficient_config.json         # config for deepspeed acceleration
├── data
│   ├── generation_metrics.py
│   ├── get_data.py                         # dataset loading and preprocessing
│   ├── passkey_retrieval
│   │   ├── create_passkey_data.py
│   │   ├── create_passkey_data.sh
│   │   └── passkey_retrieval_accuracy.py
│   └── split_pile_file.py                  # split the Pile dataset into task-specific files
├── models
│   ├── constant.py                         # a constant function model
│   ├── get_llama2
│   │   ├── convert_llama_weights_to_hf.py  # convert llama-2 weights to huggingface format
│   │   └── download_llama2.sh
│   ├── get_model.py
│   ├── gpt_j.py
│   ├── lambda_attention.py                 # efficient implementation of lambda attention
│   ├── llama.py
│   ├── model_base.py
│   └── mpt_7b.py
├── scripts
│   ├── combine_evaluate_generation.py
│   ├── combine_results.py
│   ├── eval_downstream_tasks.py            # evaluate on passkey retrieval task
│   ├── eval_generation.py                  # evaluate generation metrics
│   └── eval_ppl_deepspeed.py               # evaluate perplexity
├── utils
│   ├── arguments.py
│   └── utils.py
└── visualization
    ├── plot_nll.py
    ├── position_pca.py
    └── relative_attention_explosion.py

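Usage

Data Preparation

For datasets, you need to prepare a corpus dataset. If you download the original Pile source (https://pile.eleuther.ai) to ${PILE_PATH}/test.jsonl.zst and ${PILE_PATH}/val.jsonl.zst, run the following commands to extract the compressed dataset.

cd ${PILE_PATH}
zstd -d ./test.jsonl.zst
zstd -d ./val.jsonl.zst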

Then run the following commands to split the dataset into task-specific files.

cd ${REPOSITORY_ROOT}
mkdir -p ${PILE_PATH}/val
mkdir -p ${PILE_PATH}/test
python data/split_pile_file.py ${PILE_PATH}/val.jsonl ${PILE_PATH}/val
python data/split_pile_file.py ${PILE_PATH}/test.jsonl ${PILE_PATH}/test
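For readers preparing their own corpus, the splitting step can be approximated as follows. This is a rough sketch of the idea, not a copy of data/split_pile_file.py. It assumes each JSONL record carries its subset label under `meta.pile_set_name`, as in the original Pile release, and the one-file-per-subset naming scheme is hypothetical.

```python
import json
import os

def split_jsonl_by_group(input_path, output_dir):
    """Write each record to a per-subset file and return record counts."""
    os.makedirs(output_dir, exist_ok=True)
    handles = {}
    counts = {}
    try:
        with open(input_path, encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                group = json.loads(line)["meta"]["pile_set_name"]
                if group not in handles:
                    # Hypothetical naming: "<subset>.jsonl", spaces replaced.
                    name = group.replace(" ", "_") + ".jsonl"
                    handles[group] = open(
                        os.path.join(output_dir, name), "w", encoding="utf-8"
                    )
                handles[group].write(line if line.endswith("\n") else line + "\n")
                counts[group] = counts.get(group, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Calling `split_jsonl_by_group("val.jsonl", "val")` on a Pile-formatted file would produce files such as `val/ArXiv.jsonl` under these assumptions.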

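However, the official Pile no longer seems to be available for download, so you may need to find another source (e.g., https://huggingface.co/datasets/arxiv_dataset or https://openwebtext2.creadt.readthedocs.io/en/latest/). Alternatively, you can use your own corpus. Both options require you to edit data/get_data.py.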

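Model Preparation

For backbone models, the paper uses Llama-2, LLaMA, GPT-J, and MPT-7B. The last three models are available directly from the HuggingFace Model Hub, so no action is needed beforehand. The Llama-2 download key needs to be requested through the Meta AI request form. Then run the following command

bash models/get_llama2/download_llama2.sh

and follow the prompts to download the checkpoints to ${PATH_TO_LLAMA2_CHECKPOINTS}. Then run

python models/get_llama2/convert_llama_weights_to_hf.py \
    --input_dir ${PATH_TO_LLAMA2_CHECKPOINTS} \
    --model_size 7B \
    --output_dir ${PATH_TO_LLAMA2_CHECKPOINTS}/llama-2-7b-hf

to convert the llama-2-7b checkpoints to HuggingFace format.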

Evaluation

The code requires a ${LOG_DIR} directory to store logs and results. Please choose a directory with enough space.

Perplexity

Evaluate the perplexity of the Llama-2 model on the ArXiv test set.

TRIAL=llama2-infinite-ArXiv
mkdir -p $LOG_DIR/$TRIAL
CUDA_VISIBLE_DEVICES=0
MASTER_PORT=$(shuf -i 29500-65535 -n 1)
DS_SKIP_CUDA_CHECK=1 PYTHONPATH=. deepspeed --include localhost:$CUDA_VISIBLE_DEVICES --master_port $MASTER_PORT scripts/eval_ppl_deepspeed.py \
    --deepspeed_config configs/zero3_efficient_config.json \
    --model ${PATH_TO_LLAMA2_CHECKPOINTS}/llama-2-7b-hf --tokenizer_path ${PATH_TO_LLAMA2_CHECKPOINTS} \
    --use_lambda_attention --local_branch 4096 --global_branch 100 --limit_distance 4096 \
    --dataset the_pile --dataset_group ArXiv --split test --dataset_dir ${PILE_PATH} \
    --max_length 32770 \
    --log_dir $LOG_DIR/$TRIAL

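A brief explanation of the arguments:

--model: the path or name of the model. Pass decapoda-research/llama-7b-hf to use LLaMA, mosaicml/mpt-7b to use MPT-7B, and EleutherAI/gpt-j-6b to use GPT-J-6B.
--tokenizer_path: the path to the tokenizer. Remove this argument if not using Llama-2.
--use_lambda_attention: use lambda attention. (Required for LM-Infinite)
--local_branch: the local branch size. Use 2048 for LLaMA, MPT-7B, and GPT-J. (Required for LM-Infinite)
--global_branch: the global branch size. Values in the range 10-100 give generally similar results. (Required for LM-Infinite)
--limit_distance: the distance limit. Use 2048 for LLaMA, MPT-7B, and GPT-J. (Required for LM-Infinite)
--dataset: the dataset name. See data/get_data.py to figure out how to use custom datasets.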
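Since `--local_branch` and `--limit_distance` both follow the backbone's pretraining length, the flag set can be generated per model family. The sketch below only encodes the values stated in this README (4096 for Llama-2 as in the example command, 2048 for LLaMA, MPT-7B, and GPT-J). The family labels and the `lm_infinite_flags` helper are hypothetical and exist only for illustration.

```python
# Window sizes from the argument notes: 4096 for Llama-2 (example command),
# 2048 for LLaMA, MPT-7B and GPT-J. The keys are illustrative labels.
PRETRAIN_LENGTH = {
    "llama-2": 4096,
    "llama": 2048,
    "mpt-7b": 2048,
    "gpt-j": 2048,
}

def lm_infinite_flags(family, global_branch=100):
    """Build the LM-Infinite command-line flags for a model family."""
    length = PRETRAIN_LENGTH[family]
    return [
        "--use_lambda_attention",
        "--local_branch", str(length),
        "--global_branch", str(global_branch),
        "--limit_distance", str(length),
    ]

print(" ".join(lm_infinite_flags("gpt-j")))
```

The returned list can be appended to the argument list of any of the evaluation commands in this README.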

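If you want to evaluate vanilla models without LM-Infinite, simply remove the --use_lambda_attention --local_branch 4096 --global_branch 100 --limit_distance 4096 argument set.

If you only want to evaluate on a subset of the test set, you can use the --start_data_from argument to specify the starting index in the test set, and/or --max_data_num to specify the number of examples after that index.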

Evaluating Perplexity at Extreme Lengths

TRIAL=llama2-infinite-ArXiv-extreme
CUDA_VISIBLE_DEVICES=0
MASTER_PORT=$(shuf -i 29500-65535 -n 1)
echo port: $MASTER_PORT
mkdir -p $LOG_DIR/$TRIAL
DS_SKIP_CUDA_CHECK=1 PYTHONPATH=. deepspeed --include localhost:$CUDA_VISIBLE_DEVICES --master_port $MASTER_PORT scripts/eval_infinite_ppl.py \
    --deepspeed_config configs/zero3_efficient_config.json \
    --model ${PATH_TO_LLAMA2_CHECKPOINTS}/llama-2-7b-hf --tokenizer_path ${PATH_TO_LLAMA2_CHECKPOINTS} \
    --use_lambda_attention --local_branch 4096 --global_branch 10 --limit_distance 4096 \
    --dataset the_pile --dataset_group ArXiv --split test --dataset_dir ${PILE_PATH} \
    --streaming_length 200000000 --max_length 128000 --start_data_from 2300 \
    --log_dir $LOG_DIR/$TRIAL
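At a streaming length of 200M tokens, per-token losses cannot all be held in memory at once, so perplexity has to be accumulated incrementally. The sketch below shows the standard definition, exp of the mean negative log-likelihood, computed chunk by chunk. It illustrates the metric only and makes no claim about how the evaluation script aggregates its results; `RunningPerplexity` is a hypothetical helper.

```python
import math

class RunningPerplexity:
    """Accumulate per-token negative log-likelihoods (in nats) over chunks."""

    def __init__(self):
        self.total_nll = 0.0
        self.total_tokens = 0

    def update(self, token_nlls):
        """Add one chunk of per-token NLL values."""
        self.total_nll += sum(token_nlls)
        self.total_tokens += len(token_nlls)

    @property
    def perplexity(self):
        """exp(mean NLL) over every token seen so far."""
        if self.total_tokens == 0:
            raise ValueError("no tokens seen yet")
        return math.exp(self.total_nll / self.total_tokens)

# Example: feed two chunks of made-up losses and read the running value.
meter = RunningPerplexity()
meter.update([2.1, 1.9, 2.0])
meter.update([2.2, 1.8])
print(meter.perplexity)
```

Only two scalars are stored, so memory use stays constant no matter how long the stream is.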

Generation

Evaluate generation from the Llama-2 model on the ArXiv test set.

TRIAL=llama2-infinite-generate-ArXiv
mkdir -p $LOG_DIR/$TRIAL
CUDA_VISIBLE_DEVICES=0
MASTER_PORT=$(shuf -i 29500-65535 -n 1)
DS_SKIP_CUDA_CHECK=1 PYTHONPATH=. deepspeed --include localhost:$CUDA_VISIBLE_DEVICES --master_port $MASTER_PORT scripts/eval_generation.py \
    --deepspeed_config configs/zero3_efficient_config.json \
    --model ${PATH_TO_LLAMA2_CHECKPOINTS}/llama-2-7b-hf --tokenizer_path ${PATH_TO_LLAMA2_CHECKPOINTS} \
    --use_lambda_attention --local_branch 4096 --global_branch 100 --limit_distance 4096 \
    --dataset the_pile --dataset_group ArXiv --split test --dataset_dir ${PILE_PATH} \
    --max_length 33000 \
    --max_generation_length 100 --evaluate_metrics --evaluate_positions 4096 8192 12288 16384 \
    --log_dir $LOG_DIR/$TRIAL

Evaluating on Downstream Tasks

Passkey Retrieval

First, we need to prepare the passkey retrieval dataset.

for MAX_LENGTH in 2048 3072 4096 5120 6144 7168 8192 10240 12288 14335 16384; do
    echo $MAX_LENGTH
    python data/passkey_retrieval/create_passkey_data.py \
        --token-length $MAX_LENGTH \
        --dump-file-path ${PASSKEY_DATA}/${MAX_LENGTH} \
        --tokenizer-path ${PATH_TO_LLAMA2_CHECKPOINTS} \
        --num-samples 1000
done
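For readers unfamiliar with the task, a passkey example hides a short number inside a long stretch of filler text and asks the model to repeat it. The sketch below shows the general shape of such data and a simple accuracy measure. It is an illustration under stated assumptions, not the repository's create_passkey_data.py: the prompt wording is made up, the length is controlled by a filler repetition count rather than a tokenizer-based token length, and both function names are hypothetical.

```python
import random

FILLER = ("The grass is green. The sky is blue. The sun is yellow. "
          "Here we go. There and back again. ")

def make_passkey_prompt(n_filler, rng):
    """Return (prompt, passkey) with the passkey at a random depth."""
    passkey = rng.randint(10000, 99999)
    insert_at = rng.randint(0, n_filler)  # filler repetitions before the key
    parts = [
        "There is an important piece of information hidden in a lot of "
        "irrelevant text. Find it and memorize it. ",
        FILLER * insert_at,
        f"The pass key is {passkey}. Remember it. {passkey} is the pass key. ",
        FILLER * (n_filler - insert_at),
        "What is the pass key? The pass key is",
    ]
    return "".join(parts), str(passkey)

def retrieval_accuracy(predictions, answers):
    """Fraction of generated strings that contain the correct passkey."""
    hits = sum(answer in prediction
               for prediction, answer in zip(predictions, answers))
    return hits / len(answers)

prompt, key = make_passkey_prompt(n_filler=50, rng=random.Random(0))
print(len(prompt), key)
```

Passing a seeded `random.Random` keeps the generated data reproducible across runs.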

Then, let us evaluate on the passkey retrieval task.

CUDA_VISIBLE_DEVICES=0
for MAX_LENGTH in 6144 8192 10240 12288 16384; do
    TRIAL=llama2-infinite-passkey-$MAX_LENGTH
    mkdir -p $LOG_DIR/$TRIAL
    MASTER_PORT=$(shuf -i 29500-65535 -n 1)
    DS_SKIP_CUDA_CHECK=1 PYTHONPATH=. deepspeed --master_port $MASTER_PORT --include localhost:$CUDA_VISIBLE_DEVICES scripts/eval_downstream_tasks.py \
        --deepspeed_config configs/zero3_efficient_config.json \
        --model ${PATH_TO_LLAMA2_CHECKPOINTS}/llama-2-7b-hf --tokenizer_path ${PATH_TO_LLAMA2_CHECKPOINTS} \
        --use_lambda_attention --local_branch 4096 --global_branch 10 --limit_distance 4096 --triangle_offset 0 \
        --top_k_attention 5 --top_k_from_layer 4 \
        --dataset passkey_retrieval --dataset_dir ${PASSKEY_DATA} --dataset_group ${MAX_LENGTH} \
        --max_generation_length 7 --evaluate_metrics \
        --log_dir $LOG_DIR/$TRIAL
done

Qasper

Running the Qasper task:

CUDA_VISIBLE_DEVICES=0
DATASET=qasper
TRIAL=llama2-infinite-$DATASET
mkdir -p $LOG_DIR/$TRIAL
MASTER_PORT=$(shuf -i 29500-65535 -n 1)
echo port: $MASTER_PORT
DS_SKIP_CUDA_CHECK=1 PYTHONPATH=. deepspeed --include localhost:$CUDA_VISIBLE_DEVICES --master_port $MASTER_PORT scripts/eval_downstream_tasks.py \
    --deepspeed_config configs/zero3_efficient_config_large.json \
    --model ${PATH_TO_LLAMA2_CHECKPOINTS}/llama-2-7b-hf --tokenizer_path ${PATH_TO_LLAMA2_CHECKPOINTS} \
    --use_lambda_attention --local_branch 4096 --global_branch 10 --limit_distance 4096 --triangle_offset 0 \
    --top_k_attention 5 --top_k_from_layer 4 \
    --dataset $DATASET --split test --evaluate_metrics \
    --max_length 6144 --truncation_side center \
    --log_dir $LOG_DIR/$TRIAL
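The command above combines `--max_length 6144` with `--truncation_side center`. One plausible reading, assumed here and not confirmed by the repository's documentation, is that over-long inputs keep their beginning and end while the middle is dropped. The helper below, with the hypothetical name `truncate_center`, sketches that behavior on a list of token ids.

```python
def truncate_center(token_ids, max_length):
    """Keep the head and tail of `token_ids`, dropping the middle.

    Assumed semantics: when the input is longer than `max_length`, keep the
    first ceil(max_length / 2) tokens and the last floor(max_length / 2).
    """
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    if len(token_ids) <= max_length:
        return list(token_ids)
    head = (max_length + 1) // 2
    tail = max_length - head
    # Slice from an explicit index so that tail == 0 yields an empty list.
    return list(token_ids[:head]) + list(token_ids[len(token_ids) - tail:])

print(truncate_center(list(range(10)), 4))
```

Keeping both ends preserves the start of the document and the question at the end of the prompt, which is the usual motivation for this style of truncation.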

Citation

@inproceedings{han2024lm,
  title={LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models},
  author={Han, Chi and Wang, Qifan and Peng, Hao and Xiong, Wenhan and Chen, Yu and Ji, Heng and Wang, Sinong},
  booktitle={Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
  pages={3991--4008},
  year={2024}
}
