Este repositório ajuda a coletar conversas sintéticas entre caráter e usuário. Este pipeline suporta diferentes provedores de modelos, incluindo OpenAI ou ISVC personalizado.
Estrela este repositório:
DataSets HuggingFace:
Se você fez um conjunto de dados com este projeto, fico feliz em adicionar o seu aqui.
O pipeline de geração de conjunto de dados sintético consiste em 2 peças principais:
Para gerar perfis de caracteres, podemos usar gpt-3.5-turbo do OpenAI. Como não vamos gerar nada de especial aqui, não podemos nos preocupar com a moderação (basta criar boas sementes).
Para executar o script, podemos fazer o seguinte:
cd experiments/character_profiles
python3 main.py --config_path ./experiments/topic_experts/romance/config.yamlComo resultado, podemos conseguir personagens como este:
{
"bot_name" : " Kiriko (quiet girl in class) " ,
"personalities" : " shy, honest, sweet, she is sure to comment on all things beautiful if she can get over her shyness " ,
"categories" : " romance, school, urban-grounded "
} Esta etapa do pipeline pode ser feita com qualquer provedor de nuvem com pequenas alterações. Os usuários de Chai preferem as gerações de Vicuña em vez de gpt-3.5-turbo do Openai. Mas você pode usar gpt-3.5-turbo ou gpt4 , dê uma olhada neste exemplo: link.
Usaremos um construtor de bot estendido. O código de exemplo pode ser assim:
import os
from role_play_synthetic . generator . base import Generator
from role_play_synthetic . models . chai_isvc import ChaiISVCModel
from role_play_synthetic . prompters . vicuna_v1 import VicunaV1Prompter
from role_play_synthetic . prompters . seed import Seed
from experiments . vicuna . config import (
seeds ,
description_template ,
first_message_template ,
user_message_template ,
character_message_template ,
)
ENDPOINT_URL = os . getenv ( "ENDPOINT_URL" )
DEFAULT_GENERATION_PARAMS = {
'temperature' : 0.9 ,
'top_p' : 1 ,
'top_k' : 40 ,
'frequency_penalty' : 0. ,
'presence_penalty' : 0.1
}
model = ChaiISVCModel ( endpoint_url = ENDPOINT_URL )
prompter = VicunaV1Prompter (
description_template = description_template ,
first_message_template = first_message_template ,
user_message_template = user_message_template ,
character_message_template = character_message_template ,
)
generator = Generator ( prompter = prompter , model = model )
inputs = Seed (
name = "Professor Quantum (Time Travelling Scientist)" ,
categories = [ 'sci-fi' , 'time-travel' , 'mystery' , 'role-play' ],
personalities = [ 'intelligent' , 'eccentric' , 'enthusiastic' , 'always carrying a pocket watch' , 'quirky' ],
is_input = True
)
character = generator . generate ( seeds = seeds , input_seed = inputs , generation_params = DEFAULT_GENERATION_PARAMS )
print ( character . to_dict ())Saída:
{
"name" : " Professor Quantum (Time Travelling Scientist) " ,
"categories" : [
" sci-fi " ,
" time-travel " ,
" mystery " ,
" role-play "
],
"personalities" : [
" intelligent " ,
" eccentric " ,
" enthusiastic " ,
" always carrying a pocket watch " ,
" quirky "
],
"description" : " Professor Quantum, the eccentric time traveler, has spent his life studying the mysteries of time and reality. His enthusiasm and intelligence shine through as he discusses the intricacies of his groundbreaking theories. Constantly carrying a pocket watch, he delights in the unexpected twists and turns that time travel brings, always eager to explore the unknown. " ,
"conversation" : [
{
"role" : " character " ,
"content" : " *Professor Quantum taps his pocket watch, a smile spreading across his face.* The past is a strange place... let's see where it takes us. "
},
{
"role" : " user " ,
"content" : " *I nod eagerly* Professor Quantum, lead the way! "
},
{
"role" : " character " ,
"content" : " *Professor Quantum pulls out a glowing blue orbs, and points it at the time and space.* Quantum Leap, activate! "
},
{
"role" : " user " ,
"content" : " *I feel a strange sensation as I am transported through time and space* Wow, is this really happening? "
},
{
"role" : " character " ,
"content" : " *The Professor nods, a mischievous twinkle in his eye.* It sure is! Now, let's see where we end up! "
},
{
"role" : " user " ,
"content" : " *I look around* Where are we? This doesn't look like any time or place I've ever seen. "
},
{
"role" : " character " ,
"content" : " *The Professor grins, his eyes sparkling.* That's the beauty of time travel! The possibilities are endless. Let's see what adventures await us in this new time and place. "
}
]
}Usamos modelos e sementes para operar com o Bot Builder. Todos os modelos e Prompters compartilham a mesma API, por isso é muito fácil de alterar (para o OpenAI, por exemplo) ou se estender com novos Prompters ou modelos. Dê uma olhada neste config.py.
Assim que preparamos sementes e modelos em config.py, estamos prontos para geração de início:
cd experiments/topic_experts
python3 main.py --config_path romantic/config.py --output_dataset_path AlekseyKorshuk/synthetic-romantic-characters