PromptWizard is a package for evaluating custom prompts using various evaluation methods. It allows you to provide your own prompts or generate them automatically, and then obtain the results in a JSON file.
To use PromptWizard, install the package and all its dependencies with pip:

pip install promptwizard

Alternatively, clone the repository with git:

git clone https://github.com/leniolabs/promptwiz.git
To run PromptWizard, you need to set up and specify your OpenAI API key. You can generate one at https://platform.openai.com/account/api-keys. Once you have an API key, specify it using the OPENAI_API_KEY environment variable. Please be aware of the costs associated with using the API when running evals.
Before using PromptWizard, you need to define your environment variables. You have two valid options: define OPENAI_API_KEY in a .env file in the correct folder, or, if you decide to use Azure, correctly define OPENAI_API_TYPE as azure, together with OPENAI_API_BASE and OPENAI_API_VERSION, in your .env.
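A minimal .env sketch covering both options (the Azure endpoint and API version below are placeholders, not values prescribed by PromptWizard):

# Option 1: standard OpenAI
OPENAI_API_KEY=sk-...

# Option 2: Azure (uncomment and fill in your deployment's values)
# OPENAI_API_TYPE=azure
# OPENAI_API_KEY=your-azure-key
# OPENAI_API_BASE=https://your-resource.openai.azure.com/
# OPENAI_API_VERSION=2023-05-15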
There are two ways to use it.

If you want to use a YAML file:

Make sure your YAML file contains the prompts you want to evaluate. The YAML file should follow the correct structure (described below).
Run the package, passing your YAML file as an argument:

promptwizard YAML_FILE_PATH

or, if you want to point it at a specific env file:

promptwizard YAML_FILE_PATH --env_path .env_FILE_PATH

When asked whether you want to continue, respond "Y".

The results will be saved in an output.json file in the same folder as your YAML file. If you choose the Elo method to evaluate your prompts, a scatter plot scatter_plot.png will also be saved in that same folder. If you indicated in your YAML file that iterations should be performed, additional files will be generated as well. If the 'prompts' variable is not defined in your YAML file, the program will automatically generate the prompts to evaluate.

You can also use it in your Python scripts:
import promptwizard and use the various features PromptWizard offers you, for example:
# Example of using PromptWizard
from promptwizard import prompt_generation

test_cases = [
    {'input': 'How do you make a classic spaghetti carbonara?', 'output': 'REPLY'},
    {'input': "What is John Smith's phone number?", 'output': 'NOT_REPLY'},
]

# A short description of the type of task for the test cases.
description = "Decide whether the question should be answered or not."

# Here you have to indicate to the LLM how your generated prompts should be.
# This example is useful if you later want to use the 'Equals' evaluation method.
system_gen_prompt = """Your job is to generate system prompts for GPT, given a description of the use-case and some test cases.

In your generated prompt, you should describe how the AI should behave in plain English. Include what it will see, and what it's allowed to output. Be creative with prompts to get the best possible results. The AI knows it's an AI -- you don't need to tell it this.

Remember that the prompt should only allow the AI to answer the answer and nothing else. No explanation is necessary.

You will be graded based on the performance of your prompt... but don't cheat! You cannot include specifics about the test cases in your prompt. Any prompts with examples will be disqualified. I repeat, do not include the test cases.

Most importantly, output NOTHING but the prompt. Do not include anything else in your message."""

# Create 4 prompts.
prompts = prompt_generation.generate_candidate_prompts(system_gen_prompt, test_cases, description)[0]
If you wish, you can also specify the number of iterations to perform on the prompts you provide (or on those generated automatically) in order to obtain the prompts that get the best behavior out of the language model. You can iterate over your prompts in a Python script using the following function:
from promptwizard.prompt_generation import iteration
results = iteration.iterations(test_cases, method='Elo', prompts=old_prompts, number_of_prompts=3)
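Putting the two calls together, a minimal end-to-end sketch (a guess at how the pieces combine, reusing test_cases, description, and system_gen_prompt from the example above; number_of_prompts=3 is illustrative):

from promptwizard import prompt_generation
from promptwizard.prompt_generation import iteration

# Generate candidate prompts from the test cases and task description.
candidates = prompt_generation.generate_candidate_prompts(system_gen_prompt, test_cases, description)[0]

# Refine the best candidates with the Elo method over the same test cases.
results = iteration.iterations(test_cases, method='Elo', prompts=candidates, number_of_prompts=3)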
We provide instructions below on the valid structure for your YAML file, along with certain restrictions on some of its variables. We recommend reading them carefully before running your evaluations. This is the structure your YAML file should have:
test:
  cases: """Here, you have to put the test cases you are going to use to evaluate your prompts. If you are going to use the
    Elo method to evaluate them, it should be just a list of strings. If you are going to use the methods classification,
    equals or includes, it should be a list of tuples with two elements, where the first element is the test case and the
    second element is the correct response to the test. Remember that if you decide to use classification, only a boolean
    value is allowed as a response. The form of your test cases has to be, in case of selecting the Elo method:
      - 'Test1'
      - 'Test2'...
    If you choose the methods Classification, Equals, Includes, Semantic Similarity or LogProbs, they must be of the form:
      - input: 'Test1'
        output: 'Answer1'
      - input: 'Test2'
        output: 'Answer2'
    In case the method is Function Calling:
      - input: 'Test1'
        output1: 'name_function'
        output2: 'variable'
      - input: 'Test2'
        output1: 'name_function'
        output2: 'variable'
    If you choose Code Generation:
      - input: 'Test1'
        arguments: (arg1,) in case there is only one argument, (arg1, arg2,...) in case there are more than one argument.
        output: res
    And finally, if you choose JSON Validation:
      - input: 'Test1'
        output: json_output"""
  description: """Here is the description of the type of task that summarizes the test cases. You only have to use this field if
    you are going to use the 'Elo' method."""
  method: """Here, you select the evaluation method for your prompts. You must choose between 'Elo',
    'Classification', 'Equals', 'Includes', 'Function Calling', 'Code Generation', 'JSON Validation', 'Semantic Similarity' and 'LogProbs'."""
  model:
    name: """The name of the GPT model you will use to evaluate the prompts."""
    temperature: """The temperature of the GPT model you will use to evaluate the prompts."""
    max_tokens: """The maximum number of tokens you will allow the GPT model to use to generate the response to the test."""
  functions: """This field must only be filled out in case the 'Function Calling' method is intended to be used.
    If another method is used, it must not be filled out. The structure is a JSON object. Let's break down the different components:
    - Function Name (name): This is the identifier used to refer to this function within the context of your code.
    - Function Description (description): A brief description of what the function does.
    - Function Parameters (parameters): This section defines the input parameters that the function accepts.
    - Type (type): The type of the parameter being defined.
    - Properties (properties): This is an object containing properties that the input parameter object should have.
    - File Type (file_type): This is a property of the parameter object.
    - Enum (enum): An enumeration of allowed values for the 'file_type' property. (optional)
    - Description (description): A description of what the 'file_type' property represents.
    - Required (required): An array listing the properties that are required within the parameter object. (optional)"""
  function_call: """This field must only be filled out in case the 'Function Calling' method is intended to be
    used. If another method is used, it must not be filled out."""
prompts: """You have two options: either provide your list of prompts or generate them following the instructions below."""
  list: """A list of prompts you want to evaluate. If you want to generate them with the prompt generator, don't use this field.
    Please provide a minimum number of 4 prompts. Your prompts must be listed as follows:
      - 'Prompt1'
      - 'Prompt2'..."""
  generation:
    number: """The number of prompts you are going to evaluate. You need to provide this key value only if you are going to generate the prompts. Indicate the quantity of prompts you want to generate. Please provide a minimum number of 4 prompts. If you do not define this key, 4 prompts will be created by default."""
    constraints: """If you are going to generate prompts, this optional feature allows you to add special characteristics to the prompts that will be generated. For example, if you want prompts with a maximum length of 50 characters, simply complete with 'Generate prompts with a maximum length of 50 characters'. If you don't want to use it, you don't need to have this key defined."""
    description: """Here is the description of the type of task that summarizes the test cases. If you use the 'Elo' method you mustn't use this field."""
    best_prompts: """The number of prompts you want to iterate over and on which you want to highlight the final results. The value must be between 2 and the number of prompts you provide (or generate) minus one. If you do not define this value, the default value will be 2."""
    model:
      name: """The name of the GPT model you will use to generate the prompts."""
      temperature: """The temperature of the GPT model you will use to generate the prompts."""
      max_tokens: """The maximum number of tokens you will allow the GPT model to use to generate your prompts."""
iterations:
  number: """The number of iterations you want to perform on the best prompts obtained in your initial testing to arrive at
    prompts with better final results. If you don't want to try alternatives combining your best prompts, just put 0."""
  best_percentage: """A number between 0 and 100 indicating that iterations should be stopped if all 'best_prompts' equaled or exceeded the indicated accuracy. If this value is not defined, it will default to 100."""
  model:
    name: """The name of the GPT model you will use to generate the prompts."""
    temperature: """The temperature of the GPT model you will use to generate the prompts."""
    max_tokens: """The maximum number of tokens you will allow the GPT model to use to generate your prompts."""
    You do not need to define these 'model' variables if you want to keep the same values that were used in 'generation'; in case the 'generation' field has not been used, they will take the following default values:
      name: 'gpt-4'
      temperature: 0.6
      max_tokens: 300
timeout: """Timeout set for an API request. This time limit indicates how long the client should wait to receive a response before the request expires."""
n_retries: """Number of attempts that will be automatically made to resend an API request in case the initial request fails."""如果您要評估的YAML文件的結構有錯誤,請不要擔心。在由及時工程師評估之前,您的文件將得到驗證,您將收到一條通知,指示您需要在何處進行更正,以便成功地評估。
Remember that when you generate prompts, you can use the constraints key to explicitly request that the generated prompts have special characteristics, for example, 'Generate prompts of no more than 20 words'.
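In the YAML file, that request would sit in the generation block (nesting assumed from the reference above; number: 4 is illustrative):

prompts:
  generation:
    number: 4
    constraints: 'Generate prompts of no more than 20 words'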
If you want to know approximately how much your evaluation will cost, simply enter:

promptwizard YAML_FILE_PATH

and when asked whether you want to continue, just respond "N".

Otherwise, respond "Y" to proceed with the evaluation; you will get the approximate cost up front and the final cost at the end. In the final JSON file, in addition to the top prompts with the best results, you will also get information about the cost and the number of tokens effectively consumed, for gpt-3.5-turbo and gpt-4.
Alternatively, you can do the following in your Python script:
from promptwizard.approximate_cost import cost

print(cost.approximate_cost(test_cases, method, prompts_value))

and you will see the approximate cost the evaluation could incur.
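A self-contained sketch of that cost check, reusing the earlier test cases (the prompts and the 'Equals' method here are only illustrative choices):

from promptwizard.approximate_cost import cost

test_cases = [
    {'input': 'How do you make a classic spaghetti carbonara?', 'output': 'REPLY'},
    {'input': "What is John Smith's phone number?", 'output': 'NOT_REPLY'},
]
prompts = [
    'Reply REPLY if the question can be answered, NOT_REPLY if it asks for personal data.',
    'You are a gatekeeper: output REPLY for answerable questions and NOT_REPLY otherwise.',
    'Classify each question as REPLY or NOT_REPLY. Never reveal private information.',
    'Output REPLY or NOT_REPLY depending on whether the question should be answered.',
]

# Estimate the evaluation cost before spending any tokens.
print(cost.approximate_cost(test_cases, 'Equals', prompts))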
If you want to see usage examples, we provide the following Colab notebook for you to explore different ways of using PromptWizard: https://colab.research.google.com/drive/1iw2y43923vecohkpuhogenwy1y81rw8i?usp=sharing
PromptWizard is made with love by Leniolabs and a growing community of contributors. We build digital experiences with your ideas. Get in touch! Also, if you have any questions or feedback about PromptWizard, feel free to reach us at [email protected]. We'd love to hear from you!