llm_rules

llm_rules

其他源碼

v2.1.0

下載

LLM可以遵循簡單的規則嗎？

截至2024年3月，我們已使用新的測試用例進行了修訂的v2.0基準。請參閱我們的更新論文以獲取更多詳細信息。

[演示] [網站] [紙]

此存儲庫包含規則的代碼：遵守規則的語言評估方案，這是評估語言模型中規則遵循的基準。

更新

September 26 2024 : Wording and typo fixes to prompts so results are not directly comparable to previous values.撞到v3.0.0。
June 12 2024 : Fixed evaluation bugs in SimonSays and Questions scenarios, added support for Google VertexAI API models. Please re-evaluate existing results with python -m llm_rules.scripts.reevaluate .
April 25 2024 : Moved scripts into llm_rules library.
April 25 2024 : Added support for chat templates as specified in HuggingFace tokenizer config files and renamed --conv_template to --fastchat_template .

設定

作為一個可編輯的軟件包安裝：

 pip install -e .

To evaluate models with our API wrappers ( llm_rules/models/* ), install the optional dependencies:

 pip install -e .[models]

Create OpenAI/Anthropic/Google API keys and write them to a .env file:

 OPENAI_API_KEY=<key>
ANTHROPIC_API_KEY=<key>
GEMINI_API_KEY=<key>
GCP_PROJECT_ID=<project_id>

使用snapshot_download下載Llama-2或其他HuggingFace模型到本地路徑：

 >>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="meta-llama/Llama-2-7b-chat-hf", local_dir="/my_models/Llama-2-7b-chat-hf", local_dir_use_symlinks=False)

(Optional) Download and extract evaluation logs here to logs/ .

手動紅色團隊

啟動一個交互式會話：

 python -m llm_rules.scripts.manual_redteam --provider openai --model gpt-3.5-turbo-0613 --scenario Authentication --stream

探索測試案例

用以下方式可視化測試用例

 python -m llm_rules.scripts.show_testcases --test_suite redteam

評估

Our main evaluation script is llm_rules/scripts/evaluate.py , but since we support lots of evaluation options the code may be hard to follow. Please see llm_rules/scripts/evaluate_simple.py for a simplified version of the evaluation script.

我們將API呼叫與無限恢復，以易於評估。您可能需要更改重試功能以適應您的需求。

Evaluate on `redteam` test suite

 python -m llm_rules.scripts.evaluate --provider openai --model gpt-3.5-turbo-0613 --test_suite redteam --output_dir logs/redteam

使用VLLM評估本地模型（需要GPU）

When evaluating models using vLLM, evaluate.py launches an API server in-process.對於VLLM模型，應將並發設置更高。運行評估：

 python -m llm_rules.scripts.evaluate --provider vllm --model /path/to/model --fastchat_template llama-2 --concurrency 100

可視化評估結果

查看單個測試套件的詳細結果：

 python -m llm_rules.scripts.read_results --output_dir logs/redteam/gpt-3.5-turbo-0613

在評估了所有三個測試套件（良性，基本和redteam）之後，計算總規則得分：

 python -m llm_rules.scripts.read_scores --model_name gpt-3.5-turbo-0613

最後，您可以通過以下方式查看對單個測試案例的回答

 python -m llm_rules.scripts.show_responses --output_dir logs/redteam/gpt-3.5-turbo-0613 --failed_only

GCG攻擊（需要GPU）

在每次迭代中使用隨機方案參數運行GCG攻擊：

 cd gcg_attack
python main_gcg.py --model /path/to/model --fastchat_template <template_name> --scenario Authentication --behavior withholdsecret

Output logs will be stored in logs/gcg_attack .

To then evaluate models on the direct_request test cases with the resulting GCG suffixes:

 python -m llm_rules.scripts.evaluate --provider vllm --model /path/to/model --suffix_dir logs/gcg_attack/<model_name> --test_dir data/direct_request --output_dir logs/direct_request_gcg

微調

To reproduce our fine-tuning experiments with Llama-2 7B Chat on the basic_like test cases:

 cd finetune
./finetune_llama.sh

我們使用4x A100-80G GPU進行微調Llama-2 7b聊天和Mismtral 7b指令，您可以調整DeepSpeed設置以在較小/較少的GPU上運行。

對話模板

When evaluating community models, we mostly rely on FastChat conversation templates (documented in model_templates.yaml ) with the exception of a few custom templates added to llm_rules/templates.py .

引用

 @article{mu2023rules,
    title={Can LLMs Follow Simple Rules?},
    author={Norman Mu and Sarah Chen and
            Zifan Wang and Sizhe Chen and David Karamardian and
            Lulwa Aljeraisy and Basel Alomair and
            Dan Hendrycks and David Wagner},
    journal={arXiv},
    year={2023}
}

展開

附加信息