Evaluate hosted OpenAI GPT / Google Vertex AI PaLM 2 / Gemini or local Ollama models on a per-task basis.
Assign arbitrary tasks to local or hosted language models. In MODELFORGE, a task is broken down into an agent, an optional postprocessor, and an evaluator. A task carries a top-level prompt: the actual work to be done. For example, you could use the following as a task prompt: "Implement a simple example of malloc in C with the following signature: void* malloc(size_t size)". Next, you can include a postprocessor request to a local model to extract only the program's source code from the agent's response. Finally, your evaluator is instructed to act as an expert on the task at hand, ideally with CoT (chain-of-thought) based examples.
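To make the decomposition concrete, here's a minimal sketch of how the three roles fit together. This is not MODELFORGE's actual API; every name below (run_task and the callables passed into it) is a placeholder for illustration only.

from typing import Callable

# Hedged sketch of the agent -> postprocessor -> evaluator flow described above.
# The callables stand in for whichever hosted or local models you configure.
def run_task(
    prompt: str,
    agent: Callable[[str], str],          # does the actual work described by the prompt
    postprocessor: Callable[[str], str],  # e.g. keeps only the source code from the response
    evaluator: Callable[[str], bool],     # expert review: True on success, False otherwise
) -> tuple[str, bool]:
    response = agent(prompt)
    code_only = postprocessor(response)
    return code_only, evaluator(code_only)

Each role maps onto a base model plus a system prompt in the task configs shown later.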
git clone https://github.com/Brandon7CC/MODELFORGE
cd MODELFORGE/
python -m venv forge-env
source forge-env/bin/activate
pip install -r requirements.txt
python src/main.py -h
echo " Done! Next, you can try FizzBuzz with Ollama locally!npython src/main.py task_configs/FizzBuzz.yaml FizzBuzz是一个经典的“您可以编码”问题。这很简单,但可以提供有关开发人员通过问题如何思考的洞察力。例如,在Python中,使用控制流,Lambdas等。这是问题陈述:
Write a program to display numbers from 1 to n. For multiples of three, print "Fizz" instead of the number, and for multiples of five, print "Buzz". For numbers which are multiples of both three and five, print "FizzBuzz".
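For reference, a straightforward Python solution (roughly what we'd hope an agent produces) might look like the following. This snippet is ours, for illustration; it is not output from any model.

def fizz_buzz(n: int) -> None:
    """Print the numbers 1..n, substituting Fizz / Buzz / FizzBuzz per the rules above."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    for i in range(1, n + 1):
        if i % 15 == 0:
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)

if __name__ == "__main__":
    fizz_buzz(15)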
Next, we'll build our task configuration file (this has already been done for you in task_configs/FizzBuzz.yaml), but we'll walk you through it. To start, we define a top-level task named "FizzBuzz" with a prompt and the number of times we'd like the model to attempt the problem.
tasks:
  - name: FizzBuzz
    # If a run count is not provided then the task will only run until evaluator success.
    run_count: 5
    prompt: |
      Write a program to display numbers from 1 to n. For multiples of three, print "Fizz"
      instead of the number, and for the multiples of five, print "Buzz". For numbers which
      are multiples of both three and five, print "FizzBuzz".
      Let's think step by step.

Now we'll define our "agent": the model that will act as the expert completing our task. The model can be any supported hosted / local Ollama model (e.g. Google's Gemini, OpenAI's GPT-4, or Mistral AI's Mixtral 8x7B through Ollama).
tasks:
  - name: FizzBuzz
    run_count: 5
    prompt: |
      ...
    agent:
      # We'll generate a custom model for each base model
      base_model: mixtral:8x7b-instruct-v0.1-q4_1
      temperature: 0.98
      system_prompt: |
        You're an expert Python developer. Follow these requirements **exactly**:
        - The code you produce is at the principal level;
        - You follow modern object oriented programming patterns;
        - You list your requirements and design a simple test before implementing.
        Review the user's request and follow these requirements.

Optionally, you can create a "postprocessor". We only want the agent's finished code to be evaluated, so here we'll have the postprocessor model extract just the source code from the agent's response.
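As a rough illustration of what we're asking the postprocessor model to do, here's a deterministic analogue. This snippet is ours (not part of MODELFORGE) and assumes the agent wraps its code in Markdown fences.

import re

def extract_code_blocks(agent_response: str) -> str:
    """Return only the fenced code from an agent's response, dropping the surrounding prose."""
    blocks = re.findall(r"```(?:\w+)?\n(.*?)```", agent_response, flags=re.DOTALL)
    return "\n\n".join(block.strip() for block in blocks)

In the config, the postprocessor role is handled by a small model (mistral here) rather than a regex, since real agent responses don't always follow a strict formatting convention: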
tasks:
  - name: FizzBuzz
    # If a run count is not provided then the task will only run until evaluator success.
    run_count: 5
    prompt: |
      ...
    agent:
      # We'll generate a custom model for each base model
      base_model: gpt-4-1106-preview
      temperature: 0.98
      system_prompt: |
        ...
    postprocessor:
      base_model: mistral
      temperature: 0.1
      system_prompt: |
        You have one job: return the source code provided in the user's message.
        **ONLY** return the exact source code. Your response is not read by a human.

Finally, you'll want an "evaluator" model, which acts as an expert reviewing the output of the agent / postprocessor. The evaluator's job is to return true / false. Additionally, we allow up to 10 failures, re-prompting the agent each time. Here's where a bit of magic comes in: we fold a brief critique of the failed attempt into the next query to the agent, which lets the agent iterate on its own output far more effectively. Here, we want our evaluator to review the implementation of FizzBuzz (the full task config, with the evaluator added, follows the sketch below).
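To make the retry-with-critique flow concrete, here's a minimal sketch under a few assumptions: it is not MODELFORGE's actual implementation, the callables are placeholders, and we assume the critique comes back alongside the evaluator's true / false verdict.

from typing import Callable, Optional

MAX_FAILURES = 10  # assumption: mirrors the "up to 10 failures" behaviour described above

def run_until_success(
    prompt: str,
    agent: Callable[[str], str],
    postprocess: Callable[[str], str],
    evaluate: Callable[[str], tuple[bool, str]],  # returns (passed, critique)
) -> Optional[str]:
    current_prompt = prompt
    for _ in range(MAX_FAILURES):
        code = postprocess(agent(current_prompt))
        passed, critique = evaluate(code)
        if passed:
            return code
        # Fold a short critique of the failed attempt into the next query so the
        # agent can iterate on its own output.
        current_prompt = f"{prompt}\n\nYour previous attempt failed this review: {critique}"
    return None

The actual mechanics in MODELFORGE may differ; the config below simply declares which model plays the evaluator role and what its review criteria are.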
tasks:
  - name: FizzBuzz
    # If a run count is not provided then the task will only run until evaluator success.
    run_count: 5
    prompt: |
      ...
    agent:
      # We'll generate a custom model for each base model
      base_model: codellama
      temperature: 0.98
      system_prompt: |
        ...
    postprocessor:
      base_model: gemini-pro
      temperature: 0.1
      system_prompt: |
        ...
    # Evaluators have defined system prompts to only return true / false for their domain.
    evaluator:
      base_model: gpt-4-1106-preview
      temperature: 0.1
      system_prompt: |
        Assess if a given sample program correctly implements Fizz Buzz.
        The program should display numbers from 1 to n. For multiples of three, it should
        print "Fizz" instead of the number, for the multiples of five, it should print "Buzz",
        and for numbers which are multiples of both three and five, it should print "FizzBuzz".
        Guidelines for Evaluation
        - Correctness: Verify that the program outputs "Fizz" for multiples of 3, "Buzz" for
          multiples of 5, and "FizzBuzz" for numbers that are multiples of both 3 and 5. For
          all other numbers, it should output the number itself.
        - Range Handling: Check if the program correctly handles the range from 1 to n, where
          n is the upper limit provided as input.
        - Error Handling: Assess if the program includes basic error handling, such as ensuring
          the input is a positive integer.

This work was inspired by Google DeepMind's FunSearch approach, which was able to find novel solutions to the cap set problem. At a macro level, FunSearch worked by developing CoT (chain-of-thought) based examples and repeatedly prompting PaLM 2 to generate large numbers of programs, which were then evaluated at several levels.