ScaleEval: Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate

This repository contains the source code and the link to our paper.

ScaleEval is an agent-debate-assisted meta-evaluation framework that leverages the capabilities of multiple communicative LLM agents. The framework supports multiple rounds of discussion to help humans discern the most capable LLM-based evaluators. Users can supply their own LLM submissions, criteria, and scenarios to the framework to run meta-evaluation.

[Figure: ScaleEval framework]

Getting Started

pip install scaleeval
export OPENAI_API_KEY=XXXX.YYYY.ZZZ
export ANTHROPIC_API_KEY=XXXX.YYYY.ZZZ

Python 3.9+ is required.
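
Before running anything, it can help to confirm the interpreter version and that both keys are visible to Python. The following is a minimal sketch, not part of the scaleeval package: the helper name `check_environment` is hypothetical, and it assumes your run needs both of the keys exported above.

```python
import os
import sys

# Environment variable names taken from the export commands above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def check_environment(env=None, version=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only; it is not a scaleeval API.
    """
    env = os.environ if env is None else env
    version = sys.version_info if version is None else version
    problems = []
    # The README states that Python 3.9+ is required.
    if tuple(version[:2]) < (3, 9):
        problems.append(f"Python 3.9+ is required, found {version[0]}.{version[1]}")
    # An unset or empty variable counts as missing.
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"{key} is not set")
    return problems
```

Calling `check_environment()` with no arguments inspects the current process; passing a dictionary and a version tuple makes the check easy to test in isolation.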

For Developers

Install as a developer

git clone git@github.com:GAIR-NLP/scaleeval.git
cd scaleeval
pip install -e .

# install pre-commit hooks
pip install pre-commit
pre-commit install

Run formatting

# this is necessary before you commit
git init
git add .
pre-commit run

Run meta-evaluation

cd scaleeval
python evaluation.py

Sample criteria

criteria/metaeval_creativity: scores each LLM submission from 1 to 5 on the creativity criteria and decides which submission is better.
criteria/metaeval_helpfulness: scores each LLM submission from 1 to 5 on the helpfulness criteria and decides which submission is better.
criteria/metaeval_interpretability: scores each LLM submission from 1 to 5 on the interpretability criteria and decides which submission is better.
criteria/metaeval_reasoning: scores each LLM submission from 1 to 5 on the reasoning criteria and decides which submission is better.

Sample prompts and LLM submissions

scaleeval/responses/gpt3.5/brainstorming.jsonl: responses generated by GPT3.5-TURBO for the prompts in the brainstorming scenario.
scaleeval/responses/claude-instant/brainstorming.jsonl: responses generated by Claude-Instant for the prompts in the brainstorming scenario.
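
To inspect the two sample files side by side, one option is to load each file and pair the responses that answer the same prompt. This is a minimal sketch under stated assumptions, not how ScaleEval itself reads the data: it assumes each line is a JSON object with "prompt", "category", and "response" fields (the submission format described in this README), that both files repeat prompts verbatim, and the helper names `load_jsonl` and `pair_by_prompt` are hypothetical.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line of a .jsonl file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def pair_by_prompt(records_a, records_b):
    """Pair responses from two models that answer the same prompt.

    Assumption: prompts match verbatim across files. Prompts that
    appear in only one file are skipped.
    """
    responses_b = {record["prompt"]: record["response"] for record in records_b}
    return [
        (record["prompt"], record["response"], responses_b[record["prompt"]])
        for record in records_a
        if record["prompt"] in responses_b
    ]
```

For example, `pair_by_prompt(load_jsonl("scaleeval/responses/gpt3.5/brainstorming.jsonl"), load_jsonl("scaleeval/responses/claude-instant/brainstorming.jsonl"))` would yield (prompt, GPT response, Claude response) tuples when run from the repository root.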

Provide your own criteria and scenario!

cd criteria

Add your own criteria descriptions to a new .yaml file! Follow the format below:

criteria:
    [ your defined criteria ]:
        "1": ...
        "2": ...
        "3": ...
        "4": ...
        "5": ...
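
A criteria file in this layout can also be generated from Python. The sketch below uses only the standard library; the function name `render_criteria_yaml` is hypothetical and not part of scaleeval. It assumes the criterion name is a simple identifier that needs no YAML quoting, and it quotes descriptions with `json.dumps`, relying on JSON double-quoted strings also being valid YAML double-quoted scalars.

```python
import json


def render_criteria_yaml(name, descriptions):
    """Render a criteria block in the layout shown above.

    `descriptions` maps each integer score 1..5 to its description text.
    Hypothetical helper for illustration only.
    """
    missing = [score for score in range(1, 6) if score not in descriptions]
    if missing:
        raise ValueError(f"missing descriptions for scores: {missing}")
    lines = ["criteria:", f"    {name}:"]
    for score in range(1, 6):
        # json.dumps adds the double quotes and escapes special characters.
        lines.append(f'        "{score}": {json.dumps(descriptions[score])}')
    return "\n".join(lines) + "\n"
```

Writing the returned text to a new file inside the criteria directory gives a file with the same shape as the template; requiring all five scores up front catches an incomplete rubric before it reaches an evaluation run.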

Supply your own LLM submissions!

Collect your LLM submissions and save them in a .jsonl file with the following format:

{ "prompt" : ..., "category" : ..., "response" : ...}

Here is an example:

{"prompt": "Can you tell me how to make chocolate chip cookies?", "category": "brainstorming", "response": "Certainly! Here's a simple recipe for making chocolate chip cookies:\n\nIngredients:\n- 1 cup (2 sticks) unsalted butter, softened\n- 1 cup granulated sugar\n- 1 cup packed brown sugar\n- 2 large eggs\n- 1 teaspoon vanilla extract\n- 3 cups all-purpose flour\n- 1 teaspoon baking soda\n- 1/2 teaspoon salt\n- 2 cups chocolate chips\n\nInstructions:\n1. Preheat your oven to 350\u00b0F (175\u00b0C) and line a baking sheet with parchment paper.\n2. In a large mixing bowl, cream together the softened butter, granulated sugar, and brown sugar until well combined and creamy.\n3. Add the eggs one at a time, beating well after each addition. Then, stir in the vanilla extract.\n4. In a separate bowl, whisk together the flour, baking soda, and salt. Gradually add this dry mixture to the wet ingredients, mixing until just combined.\n5. Fold in the chocolate chips until evenly distributed throughout the dough.\n6. Using a cookie scoop or tablespoon, drop rounded portions of dough onto the prepared baking sheet, spacing them about 2 inches apart.\n7. Bake in the preheated oven for 10-12 minutes, or until the edges are golden brown. The centers may still appear slightly soft, but they will firm up as they cool.\n8. Remove the baking sheet from the oven and let the cookies cool on the sheet for a few minutes before transferring them to a wire rack to cool completely.\n9. Repeat the process with the remaining dough until all the cookies are baked.\n\nEnjoy your homemade chocolate chip cookies!"}
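
Because each submission must occupy exactly one line, it is easiest to let a JSON serializer handle the escaping of newlines and quotes. The following is a minimal sketch, not part of scaleeval: the helper names are hypothetical, and the check that all three fields are strings is an assumption based on the example above.

```python
import json

# Field names taken from the format shown above.
REQUIRED_FIELDS = ("prompt", "category", "response")


def to_jsonl_line(prompt, category, response):
    """Serialize one submission as a single JSONL line (no trailing newline)."""
    record = {"prompt": prompt, "category": category, "response": response}
    # json.dumps escapes embedded newlines as \n, so the record stays on one line.
    return json.dumps(record, ensure_ascii=False)


def validate_jsonl_line(line):
    """Parse a line and check that the three fields exist and are strings."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    for field in REQUIRED_FIELDS:
        if not isinstance(record.get(field), str):
            raise ValueError(f"field {field!r} is missing or not a string")
    return record
```

Writing `to_jsonl_line(...) + "\n"` for each submission produces a file in the expected shape, and running `validate_jsonl_line` over an existing file surfaces malformed lines before they reach an evaluation run.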

Citation

Please consider citing our paper if you find our repository and results useful!

@article{chern2024scaleeval,
  title={Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate},
  author={Chern, Steffi and Chern, Ethan and Neubig, Graham and Liu, Pengfei},
  journal={arXiv preprint arXiv:2401.16788},
  year={2024}
}

Additional information

Version: 1.0.0
Type: Other source code
Last updated: 2025-03-08
Size: 5.4 MB
Source: GitHub