mutate

A library for synthesizing text datasets using large language models (LLMs). Mutate reads through the examples in a dataset and generates similar examples using automatically created few-shot prompts.
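
To make the few-shot idea concrete, here is a minimal sketch of how a prompt for one class could be assembled from seed examples. The function name `build_few_shot_prompt`, its parameters, and the prompt layout are assumptions made for illustration; they are not Mutate's actual template or API.

```python
import random


def build_few_shot_prompt(examples, task_desc, label, shot_count=5,
                          text_alias="Comment", label_alias="sentiment", seed=0):
    """Assemble a few-shot prompt for one class from seed examples.

    The layout used here (alias: value lines separated by blank lines) is an
    illustrative assumption, not the template Mutate uses internally.
    """
    rng = random.Random(seed)
    same_class = [ex for ex in examples if ex["label"] == label]
    # Never ask for more shots than the class actually has.
    shots = rng.sample(same_class, min(shot_count, len(same_class)))
    blocks = [f"{label_alias}: {ex['label']}\n{text_alias}: {ex['text']}" for ex in shots]
    # End with an open slot so a causal LM continues with a new example.
    blocks.append(f"{label_alias}: {label}\n{text_alias}:")
    return task_desc + "\n\n" + "\n\n".join(blocks)


task_desc = "Each item below contains a movie review and its sentiment."
seed_examples = [
    {"text": "A dull, lifeless plot.", "label": "neg"},
    {"text": "Loved every minute of it.", "label": "pos"},
    {"text": "The acting was wooden.", "label": "neg"},
]
prompt = build_few_shot_prompt(seed_examples, task_desc, "neg", shot_count=2)
print(prompt)
```

The prompt ends with an unfinished entry for the target class, which is what nudges a GPT-like model to write a new example of that class.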

1. Installation

 pip install mutate-nlp

or

 pip install git+https://github.com/infinitylogesh/mutate

2. Usage

2.1 Synthesize text data from local CSV files

from mutate import pipeline

pipe = pipeline("text-classification-synthesis",
                model="EleutherAI/gpt-neo-2.7B",
                device=1)

task_desc = "Each item in the following contains movie reviews and corresponding sentiments. Possible sentiments are neg and pos"

# returns a Python generator
text_synth_gen = pipe("csv",
                      data_files=["local/path/sentiment_classfication.csv"],
                      task_desc=task_desc,
                      text_column="text",
                      label_column="label",
                      text_column_alias="Comment",
                      label_column_alias="sentiment",
                      shot_count=5,
                      class_names=["pos", "neg"])

# Loop through the generator to synthesize examples by class
for synthesized_examples in text_synth_gen:
    print(synthesized_examples)

Output

{
    "text": ["The story was very dull and was a waste of my time. This was not a film I would ever watch. The acting was bad. I was bored. There were no surprises. They showed one dinosaur,",
             "I did not like this film. It was a slow and boring film, it didn't seem to have any plot, there was nothing to it. The only good part was the ending, I just felt that the film should have ended more abruptly."],
    "label": ["neg", "neg"]
}

{
    "text": ["The Bell witch is one of the most interesting, yet disturbing films of recent years. It’s an odd and unique look at a very real, but very dark issue. With its mixture of horror, fantasy and fantasy adventure, this film is as much a horror film as a fantasy film. And it’s worth your time. While the movie has its flaws, it is worth watching and if you are a fan of a good fantasy or horror story, you will not be disappointed."],
    "label": ["pos"]
}

# and so on .....
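
Since each item the generator yields is a dictionary with parallel "text" and "label" lists, the batches can be flattened into rows and saved. The sketch below uses only the standard library; the helper name `write_batches_to_csv` is hypothetical, and a hard-coded list stands in for the generator.

```python
import csv
import io


def write_batches_to_csv(batches, out_file):
    """Flatten {"text": [...], "label": [...]} batches into CSV rows."""
    writer = csv.writer(out_file)
    writer.writerow(["text", "label"])
    rows_written = 0
    for batch in batches:
        # Pair each generated text with the label at the same position.
        for text, label in zip(batch["text"], batch["label"]):
            writer.writerow([text.strip(), label])
            rows_written += 1
    return rows_written


# Stand-in for the generator returned by the pipeline.
example_batches = [
    {"text": ["I was bored.", "Nothing happens."], "label": ["neg", "neg"]},
    {"text": ["Worth your time."], "label": ["pos"]},
]
buffer = io.StringIO()
count = write_batches_to_csv(example_batches, buffer)
print(count)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("synthetic.csv", "w", newline="")` in place of the in-memory buffer.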

2.2 Synthesize text data from Hugging Face Datasets

Under the hood, Mutate uses the wonderful Hugging Face Datasets library for dataset processing, so it supports Hugging Face datasets out of the box.

from mutate import pipeline

pipe = pipeline("text-classification-synthesis",
                model="EleutherAI/gpt-neo-2.7B",
                device=1)

task_desc = "Each item in the following contains customer service queries expressing the mentioned intent"

synthesizerGen = pipe("banking77",
                      task_desc=task_desc,
                      text_column="text",
                      label_column="label",
                      # if the `text_column` doesn't have a meaningful value
                      text_column_alias="Queries",
                      label_column_alias="Intent",  # if the `label_column` doesn't have a meaningful value
                      shot_count=5,
                      dataset_args=["en"])

for exp in synthesizerGen:
    print(exp)

Output

{
    "text": ["How can i know if my account has been activated? (This is the one that I am confused about)",
             "Thanks! My card activated"],
    "label": ["activate_my_card",
              "activate_my_card"]
}

{
    "text": ["How do i activate this new one? Is it possible?",
             "what is the activation process for this card?"],
    "label": ["activate_my_card",
              "activate_my_card"]
}

# and so on .....
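
With many intent classes it can help to track how many examples have been generated per label while iterating. The following is a small sketch using `collections.Counter`; the helper name `count_labels` is hypothetical, and the second label in the stand-in data is illustrative only.

```python
from collections import Counter


def count_labels(batches):
    """Tally generated examples per label across all batches."""
    counts = Counter()
    for batch in batches:
        counts.update(batch["label"])
    return counts


# Stand-in for the batches yielded by the pipeline generator.
example_batches = [
    {"text": ["Thanks! My card activated", "How do i activate this new one?"],
     "label": ["activate_my_card", "activate_my_card"]},
    {"text": ["When will my card arrive?"],
     "label": ["card_arrival"]},  # illustrative label, not from the output above
]
label_counts = count_labels(example_batches)
print(label_counts.most_common())
```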

2.3 I am feeling lucky: loop through the dataset indefinitely to keep generating examples

CAUTION: Looping through the dataset indefinitely has a higher chance of generating duplicate examples.

from mutate import pipeline

pipe = pipeline("text-classification-synthesis",
                model="EleutherAI/gpt-neo-2.7B",
                device=1)

task_desc = "Each item in the following contains movie reviews and corresponding sentiments. Possible sentiments are neg and pos"

# returns a Python generator
text_synth_gen = pipe("csv",
                      data_files=["local/path/sentiment_classfication.csv"],
                      task_desc=task_desc,
                      text_column="text",
                      label_column="label",
                      text_column_alias="Comment",
                      label_column_alias="sentiment",
                      class_names=["pos", "neg"],
                      # Flag to generate examples indefinitely
                      infinite_loop=True)

# Infinite loop: this never ends on its own, so stop it manually
for exp in text_synth_gen:
    print(exp)
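
Because the infinite loop never ends on its own and tends to repeat itself, one option is to wrap the generator so that duplicates are skipped and the total is capped. This is a sketch under stated assumptions: `unique_examples` and `fake_infinite_batches` are hypothetical helpers, and the latter only stands in for a pipeline generator created with `infinite_loop=True`.

```python
from itertools import islice


def unique_examples(batches):
    """Yield (text, label) pairs, skipping texts that were already seen."""
    seen = set()
    for batch in batches:
        for text, label in zip(batch["text"], batch["label"]):
            key = " ".join(text.lower().split())  # normalize case and whitespace
            if key not in seen:
                seen.add(key)
                yield text, label


def fake_infinite_batches():
    """Stand-in for the pipeline generator with infinite_loop=True."""
    n = 0
    while True:
        # Only three distinct texts ever appear; the second is a near-duplicate.
        yield {"text": [f"review {n % 3}", f"Review  {n % 3}"], "label": ["neg", "neg"]}
        n += 1


# Cap the number of batches read (50) as well as the examples kept (3), so the
# loop still ends if the source stops producing new texts.
capped_batches = islice(fake_infinite_batches(), 50)
first_three = list(islice(unique_examples(capped_batches), 3))
print(first_three)
```

Capping the batches matters: without it, asking for more unique examples than the source can produce would loop forever.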

3. Support

3.1 Currently supported

Text classification dataset synthesis: few-shot synthesis of text data for text classification datasets using causal LLMs (GPT-like)

3.2 Roadmap:

Other types of text dataset synthesis: NER, sentence pairs, etc.
Support for fine-tuning for better-quality generation
Pseudo-labeling

4. Credit

EleutherAI for democratizing large LMs.
This library uses Hugging Face Datasets and Transformers for processing datasets and models.

5. References

The idea of generating examples from a large language model is inspired by the works below:

A Few More Examples May Be Worth Billions of Parameters by Yuval Kirstain, Patrick Lewis, Sebastian Riedel, Omer Levy
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation by Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park
Data Augmentation using Pre-trained Transformer Models by Varun Kumar, Ashutosh Choudhary, Eunah Cho

Additional information

Version: 1.0.0
Type: AI source code
Updated: 2025-09-11
Size: 132.95 KB
Source: GitHub
