Easy NLP Augmentation

Version 1.0.0

Easy Text Augmenter

Easy Text Augmenter is a Python package for augmenting text data directly in a pandas DataFrame using various NLP techniques. At present it provides three techniques:

augment_random_word
augment_random_character
augment_word_bert

Installation

!pip install easy-nlp-augmentation
import easy_text_augmenter
easy_text_augmenter.info()

Usage

augment_random_word

import pandas as pd
from easy_text_augmenter import augment_random_word

df = pd.DataFrame({
    'text': ['This is a test', 'Another test data ', 'Of course we need more data', 'Newton does not like apple', 'Hello world I am a human'],
    'label': ['A', 'A', 'B', 'B', 'A']
})
classes_to_augment = ['A', 'B']
augmented_df = augment_random_word(df, classes_to_augment, augmentation_percentage=0.8, text_column='text')
print(augmented_df)

Result:

                          text label
0               This is a test     A
1           Another test data      A
2  Of course we need more data     B
3   Newton does not like apple     B
4     Hello world I am a human     A
5             Th is is a te st     A
6                 Another data     A
7   Does not newton like apple     B
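
The output above suggests that the augmented rows are appended after the original rows with a fresh index. Assuming that layout, the short sketch below shows one way to isolate the added rows and check the class balance. The two DataFrames are copied by hand from the example above so that the snippet runs without the package installed; the variable names are illustrative only.

```python
import pandas as pd

# Original data, as defined in the usage example.
original = pd.DataFrame({
    'text': ['This is a test', 'Another test data ', 'Of course we need more data',
             'Newton does not like apple', 'Hello world I am a human'],
    'label': ['A', 'A', 'B', 'B', 'A'],
})

# Augmented data, copied from the printed result above.
augmented = pd.concat([
    original,
    pd.DataFrame({
        'text': ['Th is is a te st', 'Another data', 'Does not newton like apple'],
        'label': ['A', 'A', 'B'],
    }),
], ignore_index=True)

# Assumption: new rows sit after the originals, so slicing by position isolates them.
added = augmented.iloc[len(original):]
print(added)

# Class counts after augmentation.
print(augmented['label'].value_counts())
```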

augment_random_character

from easy_text_augmenter import augment_random_character

# Reuses df from the augment_random_word example above.
classes_to_augment = ['A', 'B']
augmented_df = augment_random_character(df, classes_to_augment, augmentation_percentage=0.8, text_column='text')
print(augmented_df)

Result:

                          text label
0               This is a test     A
1           Another test data      A
2  Of course we need more data     B
3   Newton does not like apple     B
4     Hello world I am a human     A
5               This is a estt     A
6            Another te8t data     A
7   Newtun d0e8 not like apple     B

augment_word_bert

from easy_text_augmenter import augment_word_bert

# Reuses df from the augment_random_word example above.
classes_to_augment = ['A', 'B']
augmented_df = augment_word_bert(df, classes_to_augment, augmentation_percentage=0.8, text_column='text', model_path='bert-base-uncased', random_state=70)
print(augmented_df)

Result:

                                          text label
0                               This is a test     A
1                           Another test data      A
2                  Of course we need more data     B
3                   Newton does not like apple     B
4                     Hello world I am a human     A
5                         another test of data     A
6                      this term is not a test     A
7  newton does absolutely not like every apple     B

Author

Contact me at:

[email protected]
shizuka.my.id

Documentation

augment_random_word

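Description:

The augment_random_word function augments a given percentage of the samples in the specified classes of a DataFrame by randomly applying one of three augmentation techniques (swap, delete, split) to the text column.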

augment_random_word(df, classes_to_augment, augmentation_percentage, text_column, random_state=42, weights=[0.5, 0.3, 0.2])

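Parameters:

df (pandas.DataFrame): Input DataFrame containing the text data and labels.
classes_to_augment (list): List of class labels to augment.
augmentation_percentage (float): Fraction of samples to augment from each specified class.
text_column (str): Name of the DataFrame column that contains the text data.
random_state (int, optional): Random seed used to select the rows to augment. Defaults to 42.
weights (list, optional): List of weights that determine the probability of choosing each augmentation type. Defaults to [0.5, 0.3, 0.2] for swap, delete, and split, respectively.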

Techniques covered by weights:

Swap: randomly swaps words in the text.
Delete: randomly deletes a word from the text.
Split: randomly splits a word in the text.
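
To make the three techniques concrete, here is a minimal standalone sketch of word-level swap, delete, and split with weighted selection. It is not the package's implementation: the helper names (swap_words, delete_word, split_word, augment_words) are hypothetical, and the exact rules the package uses (for example, which words are eligible) may differ.

```python
import random

def swap_words(words, rng):
    """Swap two randomly chosen words (no-op for fewer than two words)."""
    if len(words) < 2:
        return words[:]
    i, j = rng.sample(range(len(words)), 2)
    out = words[:]
    out[i], out[j] = out[j], out[i]
    return out

def delete_word(words, rng):
    """Drop one randomly chosen word, keeping at least one word."""
    if len(words) < 2:
        return words[:]
    i = rng.randrange(len(words))
    return words[:i] + words[i + 1:]

def split_word(words, rng):
    """Split one randomly chosen word of length >= 2 into two pieces."""
    candidates = [i for i, w in enumerate(words) if len(w) >= 2]
    if not candidates:
        return words[:]
    i = rng.choice(candidates)
    cut = rng.randrange(1, len(words[i]))
    return words[:i] + [words[i][:cut], words[i][cut:]] + words[i + 1:]

def augment_words(text, weights=(0.5, 0.3, 0.2), seed=42):
    """Pick one technique according to weights and apply it once."""
    rng = random.Random(seed)
    techniques = [swap_words, delete_word, split_word]
    technique = rng.choices(techniques, weights=weights, k=1)[0]
    return " ".join(technique(text.split(), rng))

print(augment_words("Newton does not like apple"))
```

Passing weights such as (0, 0, 1) to this sketch would always select the split technique, which is the intuition behind the weights parameter described above.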

Returns:

pandas.DataFrame: A new DataFrame with the augmented data appended to the original data.
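
The select-then-append behavior described above can be sketched with plain pandas. This is an illustration under stated assumptions, not the package's code: the function name augment_by_class, the transform argument, and the label_column parameter are hypothetical, and rounding the per-class sample count down is a guess that happens to match the example output (two rows added for class A, one for class B at 0.8).

```python
import pandas as pd

def augment_by_class(df, classes_to_augment, augmentation_percentage,
                     text_column, transform, label_column="label",
                     random_state=42):
    """Sample a fraction of rows per class, transform their text, append them."""
    new_rows = []
    for cls in classes_to_augment:
        subset = df[df[label_column] == cls]
        # Assumption: the number of rows to augment is rounded down.
        n = int(len(subset) * augmentation_percentage)
        picked = subset.sample(n=n, random_state=random_state).copy()
        picked[text_column] = picked[text_column].map(transform)
        new_rows.append(picked)
    # Original rows first, augmented rows after, with a fresh index.
    return pd.concat([df, *new_rows], ignore_index=True)

df = pd.DataFrame({
    'text': ['This is a test', 'Another test data ', 'Of course we need more data',
             'Newton does not like apple', 'Hello world I am a human'],
    'label': ['A', 'A', 'B', 'B', 'A'],
})

# str.upper stands in for a real augmentation function.
out = augment_by_class(df, ['A', 'B'], 0.8, 'text', str.upper)
print(out)
```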

augment_random_character

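Description:

The augment_random_character function performs random character-level augmentation on the specified classes of text data in a DataFrame. It uses several augmentation techniques to randomly alter characters in the text, increasing the diversity of the dataset.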

augment_random_character(df, classes_to_augment, augmentation_percentage, text_column, random_state=42, weights=[0.2, 0.2, 0.2, 0.2, 0.2])

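Parameters:

df (pd.DataFrame): Input DataFrame containing the text data and the corresponding labels.
classes_to_augment (list): List of class labels indicating which classes to augment.
augmentation_percentage (float): Fraction of samples to augment in each specified class.
text_column (str): Name of the DataFrame column that contains the text data to augment.
random_state (int, optional): Random seed used to select the rows to augment. Defaults to 42.
weights (list, optional): List of weights, one per augmentation technique, that determine the probability of choosing each technique. Defaults to [0.2, 0.2, 0.2, 0.2, 0.2].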
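
As a rough illustration of what character-level changes look like, the sketch below implements three generic operations: swapping adjacent characters, substituting a character, and deleting a character. These helpers are hypothetical and are not claimed to correspond to the package's five techniques; they only show the general idea of perturbing individual characters.

```python
import random
import string

def swap_adjacent(text, rng):
    """Swap one randomly chosen pair of neighboring characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def substitute_char(text, rng):
    """Replace one randomly chosen character with a random lowercase letter."""
    if not text:
        return text
    i = rng.randrange(len(text))
    return text[:i] + rng.choice(string.ascii_lowercase) + text[i + 1:]

def delete_char(text, rng):
    """Remove one randomly chosen character, keeping at least one."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text))
    return text[:i] + text[i + 1:]

rng = random.Random(42)
for technique in (swap_adjacent, substitute_char, delete_char):
    print(technique.__name__, '->', technique("This is a test", rng))
```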

Techniques covered by weights: