scaleeval
1.0.0
該存儲庫包含源代碼和鏈接到我們的論文。
Scaleeval是一個代理輔助輔助元評估框架,可利用多個交流LLM代理的能力。該框架支持多輪討論,以幫助人類辨別最有能力的基於LLM的評估者。用戶可以通過我們的框架來提供其LLM提交,標準和場景以進行元評估。

pip install scaleeval
export OPENAI_API_KEY=XXXX.YYYY.ZZZ
export ANTHROPIC_API_KEY=XXXX.YYYY.ZZZ需要python 3.9+。
git clone [email protected]:GAIR-NLP/scaleeval.git
cd scaleeval
pip install -e .
# install pre-commit hooks
pip install pre-commit
pre-commit install # this is necessary before you commit
git init
git add .
pre-commit run cd scaleeval
python evaluation.pycriteria/metaeval_creativity :根據創造力標準為每個LLM提交的評分1至5,並確定哪個提交更好。criteria/metaeval_helpfulness :根據有用性標準為每個LLM提交的評分1至5,並確定提交更好的提交。criteria/metaeval_interpretability :根據解釋性為每個LLM提交的評分1至5,並確定提交更好的評分。criteria/metaeval_reasoning :根據推理標準為每個LLM提交的評分1到5,並確定哪個提交更好。scaleeval/responses/gpt3.5/brainstorming.jsonl :基於頭腦風暴場景中的提示而產生的gpt3.5-turbo的響應。scaleeval/responses/claude-instant/brainstorming.jsonl :基於頭腦風暴場景中的提示而產生的響應。 cd criteria將您自己的標準描述添加到新的.YAML文件中!遵循以下格式:
criteria :
[ your defined criteria ]:
"1" : ...
"2" : ...
"3" : ...
"4" : ...
"5" : ...收集您的LLM提交,並將其保存在以下格式的.jsonl文件中:
{ "prompt" : ..., "category" : ..., "response" : ...}這是一個例子:
{ "prompt" : "Can you tell me how to make chocolate chip cookies?" , "category" : "brainstorming" , "response" : "Certainly! Here's a simple recipe for making chocolate chip cookies: n n Ingredients: n - 1 cup (2 sticks) unsalted butter, softened n - 1 cup granulated sugar n - 1 cup packed brown sugar n - 2 large eggs n - 1 teaspoon vanilla extract n - 3 cups all-purpose flour n - 1 teaspoon baking soda n - 1/2 teaspoon salt n - 2 cups chocolate chips n n Instructions: n 1. Preheat your oven to 350 u00b0 F (175 u00b0 C) and line a baking sheet with parchment paper. n 2. In a large mixing bowl, cream together the softened butter, granulated sugar, and brown sugar until well combined and creamy. n 3. Add the eggs one at a time, beating well after each addition. Then, stir in the vanilla extract. n 4. In a separate bowl, whisk together the flour, baking soda, and salt. Gradually add this dry mixture to the wet ingredients, mixing until just combined. n 5. Fold in the chocolate chips until evenly distributed throughout the dough. n 6. Using a cookie scoop or tablespoon, drop rounded portions of dough onto the prepared baking sheet, spacing them about 2 inches apart. n 7. Bake in the preheated oven for 10-12 minutes, or until the edges are golden brown. The centers may still appear slightly soft, but they will firm up as they cool. n 8. Remove the baking sheet from the oven and let the cookies cool on the sheet for a few minutes before transferring them to a wire rack to cool completely. n 9. Repeat the process with the remaining dough until all the cookies are baked. n n Enjoy your homemade chocolate chip cookies!" }如果您覺得我們的回購和結果有用,請考慮引用我們的論文!
@article{chern2024scaleeval,
title={Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate},
author={Chern, Steffi and Chern, Ethan and Neubig, Graham and Liu, Pengfei},
journal={arXiv preprint arXiv:2401.16788},
year={2024}
}