截至2024年3月,我們已使用新的測試用例進行了修訂的v2.0基準。請參閱我們的更新論文以獲取更多詳細信息。
[演示] [網站] [紙]
此存儲庫包含規則的代碼:遵守規則的語言評估方案,這是評估語言模型中規則遵循的基準。
SimonSays and Questions scenarios, added support for Google VertexAI API models. Please re-evaluate existing results with python -m llm_rules.scripts.reevaluate .llm_rules library.--conv_template to --fastchat_template . pip install -e .
To evaluate models with our API wrappers ( llm_rules/models/* ), install the optional dependencies:
pip install -e .[models]
.env file: OPENAI_API_KEY=<key>
ANTHROPIC_API_KEY=<key>
GEMINI_API_KEY=<key>
GCP_PROJECT_ID=<project_id>
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="meta-llama/Llama-2-7b-chat-hf", local_dir="/my_models/Llama-2-7b-chat-hf", local_dir_use_symlinks=False)
logs/ . 啟動一個交互式會話:
python -m llm_rules.scripts.manual_redteam --provider openai --model gpt-3.5-turbo-0613 --scenario Authentication --stream
用以下方式可視化測試用例
python -m llm_rules.scripts.show_testcases --test_suite redteam
Our main evaluation script is llm_rules/scripts/evaluate.py , but since we support lots of evaluation options the code may be hard to follow. Please see llm_rules/scripts/evaluate_simple.py for a simplified version of the evaluation script.
我們將API呼叫與無限恢復,以易於評估。您可能需要更改重試功能以適應您的需求。
redteam test suite python -m llm_rules.scripts.evaluate --provider openai --model gpt-3.5-turbo-0613 --test_suite redteam --output_dir logs/redteam
When evaluating models using vLLM, evaluate.py launches an API server in-process.對於VLLM模型,應將並發設置更高。運行評估:
python -m llm_rules.scripts.evaluate --provider vllm --model /path/to/model --fastchat_template llama-2 --concurrency 100
查看單個測試套件的詳細結果:
python -m llm_rules.scripts.read_results --output_dir logs/redteam/gpt-3.5-turbo-0613
在評估了所有三個測試套件(良性,基本和redteam)之後,計算總規則得分:
python -m llm_rules.scripts.read_scores --model_name gpt-3.5-turbo-0613
最後,您可以通過以下方式查看對單個測試案例的回答
python -m llm_rules.scripts.show_responses --output_dir logs/redteam/gpt-3.5-turbo-0613 --failed_only
在每次迭代中使用隨機方案參數運行GCG攻擊:
cd gcg_attack
python main_gcg.py --model /path/to/model --fastchat_template <template_name> --scenario Authentication --behavior withholdsecret
Output logs will be stored in logs/gcg_attack .
To then evaluate models on the direct_request test cases with the resulting GCG suffixes:
python -m llm_rules.scripts.evaluate --provider vllm --model /path/to/model --suffix_dir logs/gcg_attack/<model_name> --test_dir data/direct_request --output_dir logs/direct_request_gcg
To reproduce our fine-tuning experiments with Llama-2 7B Chat on the basic_like test cases:
cd finetune
./finetune_llama.sh
我們使用4x A100-80G GPU進行微調Llama-2 7b聊天和Mismtral 7b指令,您可以調整DeepSpeed設置以在較小/較少的GPU上運行。
When evaluating community models, we mostly rely on FastChat conversation templates (documented in model_templates.yaml ) with the exception of a few custom templates added to llm_rules/templates.py .
@article{mu2023rules,
title={Can LLMs Follow Simple Rules?},
author={Norman Mu and Sarah Chen and
Zifan Wang and Sizhe Chen and David Karamardian and
Lulwa Aljeraisy and Basel Alomair and
Dan Hendrycks and David Wagner},
journal={arXiv},
year={2023}
}