staged training


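In our paper Staged Training for Transformer Language Models, we propose a staged training setup that begins with a small model and incrementally increases the amount of compute used for training by applying a "growth operator" to increase the model depth and width. Because each stage is initialized from the output of the previous one, the training process re-uses the compute already spent in earlier stages and becomes more efficient.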
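The staged setup described above can be pictured as a simple loop. The following is a minimal sketch, not the repository's code: `TrainingState`, `train`, `grow_depth`, `grow_width`, and the step counts and model sizes are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingState:
    depth: int  # number of transformer layers (illustrative)
    width: int  # hidden size (illustrative)
    step: int   # optimizer steps taken so far


def train(state, steps):
    # Placeholder for real training: only the step counter advances.
    return replace(state, step=state.step + steps)


def grow_depth(state):
    # Hypothetical depth operator: doubles the number of layers.
    return replace(state, depth=state.depth * 2)


def grow_width(state):
    # Hypothetical width operator: doubles the hidden size.
    return replace(state, width=state.width * 2)


def staged_training(initial, schedule):
    """Run each stage, then grow; every stage starts from the previous state."""
    state = initial
    for steps, grow in schedule:
        state = train(state, steps)
        if grow is not None:
            state = grow(state)
    return state


final = staged_training(
    TrainingState(depth=6, width=256, step=0),
    [(1000, grow_depth), (1000, grow_width), (2000, None)],
)
print(final)
```

The point of the sketch is that the step counter (and, in a real run, the parameters and optimizer state) carries over across stages instead of restarting from scratch.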

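We release the reproducible code for the growth operators and evaluation scripts here.

Setup

The scripts in this repository require Python 3.7 or newer. Once you have a suitable Python environment, first install PyTorch v1.9.0 according to the official instructions. Then run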

 pip install -r requirements.txt

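Growth Operator

Each of our growth operators (width/depth) takes the entire training state as input (including model parameters, optimizer state, learning rate schedule, etc.) and outputs a new training state from which training continues.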
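To make the idea of growing a model without changing what it computes more concrete, here is a small sketch of one possible width-doubling rule on a toy linear layer. It is an illustration under stated assumptions, not the operator implemented in this repository: `widen_linear`, `widen_vector`, and `matvec` are hypothetical helpers, and the 0.5 scaling is just one scheme that keeps the output unchanged when the input is duplicated. The sketch also ignores optimizer state and the learning rate schedule, which a full growth operator has to transform as well.

```python
def widen_linear(weight):
    """Map W to 0.5 * [[W, W], [W, W]] (doubles rows and columns)."""
    doubled_rows = [[0.5 * w for w in row] * 2 for row in weight]
    return doubled_rows + [list(row) for row in doubled_rows]


def widen_vector(x):
    """Duplicate a hidden vector: x -> [x, x]."""
    return list(x) * 2


def matvec(weight, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -1.0]

small_out = matvec(W, x)
big_out = matvec(widen_linear(W), widen_vector(x))

# The widened layer reproduces the small layer's output, duplicated.
print(small_out, big_out)
```

With this rule the widened layer applied to the duplicated input yields the original output twice, so a network built from such layers would start the next stage computing the same function as before.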

See scripts/cheatsheet.txt for more examples of how to use the corresponding scripts.

For example, you can apply the width operator with:

 CUDA_VISIBLE_DEVICES=0,1,2,3 python scripts/gpt_pretrain.py \
   --save_prefix final_gpt2_large_div2_width_check_bs512_lr0.0020_warmup3k_seqlen1024_debug \
   --gpu_count -1 \
   --model gpt2 \
   --tokenizer gpt2 \
   --batch_size 4 \
   --grad_accum 32 \
   --lr 0.002006911598778545 \
   --warmup_steps 3000 \
   --train_steps 250000 \
   --val_every 50 \
   --val_batches 50 \
   --fp16 \
   --seqlen 1024 \
   --log_rate 10 \
   --num_workers 4 \
   --size GPT2_large_div2_width \
   --random \
   --resume final_runs/final_gpt2_large_div2_width_check_bs512_lr0.0021_warmup3k_seqlen1024_debug/checkpoint-xxx.ckpt \
   --doubling weights

Or the depth operator:

 CUDA_VISIBLE_DEVICES=0,1,2,3 python scripts/gpt_pretrain.py \
   --save_prefix final_gpt2_large_div2_depthx2_check_bs512_lr0.0020_warmup3k_seqlen1024_debug \
   --gpu_count -1 \
   --model gpt2 \
   --tokenizer gpt2 \
   --batch_size 4 \
   --grad_accum 32 \
   --lr 0.002006911598778545 \
   --warmup_steps 3000 \
   --train_steps 250000 \
   --val_every 50 \
   --val_batches 50 \
   --fp16 \
   --seqlen 1024 \
   --log_rate 10 \
   --num_workers 4 \
   --size GPT2_large_div2_depth \
   --random \
   --resume final_runs/final_gpt2_large_div2_depth_check_bs512_lr0.0020_warmup3k_seqlen1024_debug/checkpoint-epoch=0-step=6499.ckpt \
   --doubling layers
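Both commands use the same batch settings, and the "bs512" in the save prefix appears to be the effective batch size. The arithmetic below is a small sketch of how the flags would combine, assuming that --batch_size is per GPU and that --gpu_count -1 means "use every visible device"; neither assumption is confirmed here.

```python
# Values copied from the example commands.
visible_gpus = len("0,1,2,3".split(","))  # CUDA_VISIBLE_DEVICES=0,1,2,3
batch_size = 4                            # --batch_size (assumed per GPU)
grad_accum = 32                           # --grad_accum
seqlen = 1024                             # --seqlen

# Sequences contributing to one optimizer step.
effective_batch_size = visible_gpus * batch_size * grad_accum

# Tokens contributing to one optimizer step.
tokens_per_step = effective_batch_size * seqlen

print(effective_batch_size, tokens_per_step)
```

Under these assumptions the result matches the 512 in the save prefix, so changing the number of GPUs would call for a matching change to --grad_accum to keep the same effective batch size.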

Evaluation

Use evaluation/eval_wikitext.py or evaluation/eval_lambada.py to evaluate GPT-2 on one of the supported datasets. For example:

python evaluation/eval_wikitext.py

Or using Docker:

docker build -t evaluation:latest .
docker run --rm --gpus all evaluation:latest evaluation/eval_wikitext.py
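The metrics reported by the evaluation scripts are not described here. Language-model evaluation on WikiText is commonly summarized as perplexity, so the following sketch shows how that number is usually derived from per-token log-probabilities. The `perplexity` helper is hypothetical and is not taken from the repository.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative log-likelihood (natural log) per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every token has perplexity close to 4.
uniform_log_probs = [math.log(0.25)] * 8
print(perplexity(uniform_log_probs))
```

Lower perplexity means the model assigns higher probability to the held-out text.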

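Reference

If you use staged training in your research or wish to refer to the baseline results published here, please use the following BibTeX entry.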

 @misc{shen2022staged,
    title={Staged Training for Transformer Language Models},
    author={Sheng Shen and Pete Walsh and Kurt Keutzer and Jesse Dodge and Matthew Peters and Iz Beltagy},
    year={2022},
    eprint={2203.06211},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Additional information

Version: 1.0.0
Type: AI source code
Updated: 2025-09-10
Size: 247KB
Source: GitHub
