Easy NLP Augmentation Download - Easy NLP Augmentation Source Code Download

Easy NLP Augmentation

AI Source Code

1.0.0

Easy Text Augmenter

Easy Text Augmenter is a Python package that augments text data directly in a pandas DataFrame using various NLP techniques. It currently offers only three techniques:

augment_random_word
augment_random_character
augment_word_bert

Installation

# "!" runs pip from a Jupyter/Colab cell; in a terminal, run: pip install easy-nlp-augmentation
!pip install easy-nlp-augmentation
import easy_text_augmenter
easy_text_augmenter.info()

Usage

augment_random_word

import pandas as pd
from easy_text_augmenter import augment_random_word

df = pd.DataFrame({
    'text': ['This is a test', 'Another test data ', 'Of course we need more data',
             'Newton does not like apple', 'Hello world I am a human'],
    'label': ['A', 'A', 'B', 'B', 'A'],
})
classes_to_augment = ['A', 'B']
augmented_df = augment_random_word(df, classes_to_augment, augmentation_percentage=0.8, text_column='text')
print(augmented_df)

Output:

                          text label
0               This is a test     A
1           Another test data      A
2  Of course we need more data     B
3   Newton does not like apple     B
4     Hello world I am a human     A
5             Th is is a te st     A
6                 Another data     A
7   Does not newton like apple     B

augment_random_character

from easy_text_augmenter import augment_random_character

# df is the DataFrame built in the augment_random_word example
classes_to_augment = ['A', 'B']
augmented_df = augment_random_character(df, classes_to_augment, augmentation_percentage=0.8, text_column='text')
print(augmented_df)

Output:

                          text label
0               This is a test     A
1           Another test data      A
2  Of course we need more data     B
3   Newton does not like apple     B
4     Hello world I am a human     A
5               This is a estt     A
6            Another te8t data     A
7   Newtun d0e8 not like apple     B

augment_word_bert

from easy_text_augmenter import augment_word_bert

# df is the DataFrame built in the augment_random_word example
classes_to_augment = ['A', 'B']
augmented_df = augment_word_bert(df, classes_to_augment, augmentation_percentage=0.8, text_column='text',
                                 model_path='bert-base-uncased', random_state=70)
print(augmented_df)

Output:

                                          text label
0                               This is a test     A
1                           Another test data      A
2                  Of course we need more data     B
3                   Newton does not like apple     B
4                     Hello world I am a human     A
5                         another test of data     A
6                      this term is not a test     A
7  newton does absolutely not like every apple     B

Author

Contact me:

[email protected]
shizuka.my.id

Documentation

augment_random_word

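Description:

The augment_random_word function augments a specified percentage of the samples in a given DataFrame by randomly applying one of three augmentation techniques (swap, delete, split) to their text column.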

augment_random_word(df, classes_to_augment, augmentation_percentage, text_column, random_state=42, weights=[0.5, 0.3, 0.2])

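Parameters:

df (pandas.DataFrame): The input DataFrame containing the text data and labels.
classes_to_augment (list): List of class labels to augment.
augmentation_percentage (float): Fraction of samples to augment from each specified class (e.g., 0.8 for 80%).
text_column (str): Name of the DataFrame column that contains the text data.
random_state (int, optional): Random seed for reproducible augmentation. Defaults to 42.
weights (list, optional): Weights that determine the probability of selecting each augmentation type. Defaults to [0.5, 0.3, 0.2] for swap, delete, and split.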
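The README does not spell out how augmentation_percentage becomes a row count per class. The sketch below is one plausible reading, not the package's code. It assumes the labels live in a column named label (the signature takes no label-column argument, and the usage example uses label) and that the count is truncated with int(), which agrees with the example output: two new A rows from three, one new B row from two. The helper sample_rows_to_augment is hypothetical.

```python
import pandas as pd


def sample_rows_to_augment(df, classes_to_augment, augmentation_percentage,
                           label_column="label", random_state=42):
    """Pick the rows of each class that would be augmented (hypothetical helper)."""
    picked = []
    for cls in classes_to_augment:
        class_rows = df[df[label_column] == cls]
        # Assumption: the fraction is truncated to an int, matching the README
        # example (3 'A' rows * 0.8 -> 2, 2 'B' rows * 0.8 -> 1).
        n = int(len(class_rows) * augmentation_percentage)
        # random_state makes the choice of rows reproducible.
        picked.append(class_rows.sample(n=n, random_state=random_state))
    return pd.concat(picked)


df = pd.DataFrame({
    "text": ["This is a test", "Another test data ", "Of course we need more data",
             "Newton does not like apple", "Hello world I am a human"],
    "label": ["A", "A", "B", "B", "A"],
})
rows = sample_rows_to_augment(df, ["A", "B"], 0.8)
print(rows["label"].value_counts().to_dict())  # {'A': 2, 'B': 1}
```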

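Techniques selected by weights:

Swap: randomly swaps words in the text.
Delete: randomly deletes words from the text.
Split: randomly splits words in the text.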
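To make the three word-level operations concrete, here is a hypothetical pure-Python sketch, not the package's implementation. It picks swap, delete, or split with random.choices using the default weights and applies that operation once. The README output suggests the real split may touch several words at once (e.g. "Th is is a te st"); this sketch splits only one word.

```python
import random


def swap_words(words, rng):
    """Swap the words at two random positions."""
    if len(words) < 2:
        return words
    i, j = rng.sample(range(len(words)), 2)
    words = words[:]
    words[i], words[j] = words[j], words[i]
    return words


def delete_word(words, rng):
    """Drop one random word, keeping at least one word."""
    if len(words) < 2:
        return words
    k = rng.randrange(len(words))
    return words[:k] + words[k + 1:]


def split_word(words, rng):
    """Split one word of length >= 2 into two tokens at a random position."""
    candidates = [i for i, w in enumerate(words) if len(w) >= 2]
    if not candidates:
        return words
    k = rng.choice(candidates)
    cut = rng.randrange(1, len(words[k]))
    return words[:k] + [words[k][:cut], words[k][cut:]] + words[k + 1:]


def augment_words(text, weights=(0.5, 0.3, 0.2), seed=42):
    """Choose swap, delete, or split by weight and apply it once."""
    rng = random.Random(seed)
    op = rng.choices([swap_words, delete_word, split_word], weights=weights, k=1)[0]
    return " ".join(op(text.split(), rng))


print(augment_words("Newton does not like apple"))
```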

Returns:

pandas.DataFrame: A new DataFrame with the augmented data appended to the original data.
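Returning a new DataFrame with augmented rows appended can be done with pd.concat. In this sketch, append_augmented is a hypothetical helper, and the text and label column names are assumptions taken from the usage example. ignore_index=True numbers the appended rows after the originals, the same continuous layout the example outputs show, and the input df is left unchanged.

```python
import pandas as pd


def append_augmented(df, augmented_texts, labels, text_column="text", label_column="label"):
    """Return a new DataFrame with augmented rows appended after the originals."""
    new_rows = pd.DataFrame({text_column: augmented_texts, label_column: labels})
    # ignore_index=True renumbers rows 0..n-1 for originals, then n, n+1, ... for new ones
    return pd.concat([df, new_rows], ignore_index=True)


df = pd.DataFrame({
    "text": ["This is a test", "Newton does not like apple"],
    "label": ["A", "B"],
})
augmented_df = append_augmented(df, ["Th is is a te st", "Does not newton like apple"], ["A", "B"])
print(augmented_df)
```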

augment_random_character

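Description:

The augment_random_character function performs random character-level augmentation on the text data of the specified classes in a DataFrame. It uses several augmentation techniques to randomly alter characters in the text, increasing the diversity of the dataset.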

augment_random_character(df, classes_to_augment, augmentation_percentage, text_column, random_state=42, weights=[0.2, 0.2, 0.2, 0.2, 0.2])

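Parameters:

df (pd.DataFrame): The input DataFrame containing the text data and the corresponding labels.
classes_to_augment (list): List of class labels indicating which classes to augment.
augmentation_percentage (float): Fraction of samples in each class to augment.
text_column (str): Name of the DataFrame column that contains the text to augment.
random_state (int, optional): Random seed for reproducible augmentation. Defaults to 42.
weights (list, optional): Weights for the augmentation techniques, determining the probability of selecting each one. Defaults to [0.2, 0.2, 0.2, 0.2, 0.2].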

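Techniques selected by weights:

Aug_ocr: OCR-based augmentation (simulated OCR recognition errors).
Aug_keyboard: keyboard-typo simulation.
Aug_insert: random character insertion.
Aug_swap: random character swapping.
Aug_delete: random character deletion.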
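The five character-level techniques can be illustrated with the hypothetical, self-contained sketch below; it is not the package's code. The OCR and keyboard lookup tables are tiny made-up examples (the o-to-0 and s-to-8 substitutions echo the README output, e.g. "te8t" and "d0e8"), and each call applies one weighted technique to one randomly chosen word.

```python
import random
import string

# Hypothetical, deliberately tiny lookup tables (illustration only).
OCR_CONFUSIONS = {"o": "0", "s": "8", "l": "1", "i": "1"}
KEYBOARD_NEIGHBOURS = {"a": "qs", "e": "wr", "o": "ip", "t": "ry", "n": "bm"}


def _replace_at(word, i, char):
    return word[:i] + char + word[i + 1:]


def ocr_error(word, rng):
    """Replace one character with a look-alike from OCR_CONFUSIONS."""
    spots = [i for i, c in enumerate(word) if c in OCR_CONFUSIONS]
    if not spots:
        return word
    i = rng.choice(spots)
    return _replace_at(word, i, OCR_CONFUSIONS[word[i]])


def keyboard_typo(word, rng):
    """Replace one character with a neighbouring key from KEYBOARD_NEIGHBOURS."""
    spots = [i for i, c in enumerate(word) if c in KEYBOARD_NEIGHBOURS]
    if not spots:
        return word
    i = rng.choice(spots)
    return _replace_at(word, i, rng.choice(KEYBOARD_NEIGHBOURS[word[i]]))


def insert_char(word, rng):
    """Insert a random lowercase letter at a random position."""
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]


def swap_chars(word, rng):
    """Swap two adjacent characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def delete_char(word, rng):
    """Delete one character, keeping at least one."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


def augment_chars(text, weights=(0.2, 0.2, 0.2, 0.2, 0.2), seed=42):
    """Pick a technique by weight (ocr, keyboard, insert, swap, delete); apply it to one word."""
    rng = random.Random(seed)
    words = text.split()
    if not words:
        return text
    ops = [ocr_error, keyboard_typo, insert_char, swap_chars, delete_char]
    op = rng.choices(ops, weights=weights, k=1)[0]
    k = rng.randrange(len(words))
    words[k] = op(words[k], rng)
    return " ".join(words)


print(augment_chars("Another test data"))
```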

Returns:

pandas.DataFrame: A new DataFrame with the augmented data appended to the original data.

augment_word_bert

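Description:

The augment_word_bert function augments text data in a DataFrame using BERT-based word augmentation. For a given percentage of the samples in the specified classes, it inserts or substitutes words in the specified text column.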

augment_word_bert(df, classes_to_augment, augmentation_percentage, text_column, model_path, random_state=42, weights=[0.7, 0.3])

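Parameters:

df (pandas.DataFrame): The DataFrame containing the data to augment.
classes_to_augment (list): List of class labels indicating which classes to augment.
augmentation_percentage (float): Fraction of samples in each class to augment (e.g., 0.2 for 20%).
text_column (str): Name of the DataFrame column that contains the text to augment.
model_path (str): Path to the pretrained BERT model used for augmentation (the usage example passes 'bert-base-uncased').
random_state (int, optional): Random seed for reproducible augmentation. Defaults to 42.
weights (list, optional): Weights for choosing between the insert and substitute techniques. Defaults to [0.7, 0.3].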
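The insert-or-substitute choice can be sketched without downloading a model by passing the word predictor in as a function. In this hypothetical sketch, toy_fill_mask stands in for BERT's masked-word prediction and always returns the same word, and the weights are assumed to be ordered (insert, substitute) as the parameter description lists them. A real predictor could be built on, for example, a Hugging Face transformers fill-mask pipeline with 'bert-base-uncased', which is not shown here.

```python
import random


def toy_fill_mask(left_words, right_words):
    """Hypothetical stand-in for BERT's masked-word prediction.

    A real predictor would rank vocabulary words for the masked slot using the
    left and right context; this toy version ignores the context.
    """
    return "really"


def augment_with_mlm(text, fill_mask=toy_fill_mask, weights=(0.7, 0.3), seed=42):
    """Insert a predicted word or substitute an existing one, chosen by weight (insert, substitute)."""
    rng = random.Random(seed)
    words = text.split()
    if not words:
        return text
    action = rng.choices(["insert", "substitute"], weights=weights, k=1)[0]
    if action == "insert":
        # Open a new slot between two words (or at either end) and fill it.
        i = rng.randrange(len(words) + 1)
        return " ".join(words[:i] + [fill_mask(words[:i], words[i:])] + words[i:])
    # Mask an existing word and replace it with the prediction.
    i = rng.randrange(len(words))
    return " ".join(words[:i] + [fill_mask(words[:i], words[i + 1:])] + words[i + 1:])


print(augment_with_mlm("Newton does not like apple"))
```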

Returns:

pandas.DataFrame: The original DataFrame with the augmented samples appended.

Additional Information

Version: 1.0.0
Type: AI source code
Updated: 2025-08-30
Size: 13.36KB
Source: GitHub
