
Easy NLP Augmentation

AI source code

1.0.0


Easy Text Augmentation

Easy Text Augmenter is a Python package for augmenting text data directly on your pandas DataFrame using various NLP techniques. For now, only 3 techniques are available:

augment_random_word
augment_random_character
augment_word_bert

Installation

!pip install easy-nlp-augmentation
import easy_text_augmenter
easy_text_augmenter.info()

Usage

augment_random_word

import pandas as pd
from easy_text_augmenter import augment_random_word

df = pd.DataFrame({
    'text': ['This is a test', 'Another test data ', 'Of course we need more data', 'Newton does not like apple', 'Hello world I am a human'],
    'label': ['A', 'A', 'B', 'B', 'A']
})
classes_to_augment = ['A', 'B']
augmented_df = augment_random_word(df, classes_to_augment, augmentation_percentage=0.8, text_column='text')
print(augmented_df)

Output:

                          text label
0               This is a test     A
1           Another test data      A
2  Of course we need more data     B
3   Newton does not like apple     B
4     Hello world I am a human     A
5             Th is is a te st     A
6                 Another data     A
7   Does not newton like apple     B

augment_random_character

from easy_text_augmenter import augment_random_character

# Reuses df from the augment_random_word example above.
classes_to_augment = ['A', 'B']
augmented_df = augment_random_character(df, classes_to_augment, augmentation_percentage=0.8, text_column='text')
print(augmented_df)

Output:

                          text label
0               This is a test     A
1           Another test data      A
2  Of course we need more data     B
3   Newton does not like apple     B
4     Hello world I am a human     A
5               This is a estt     A
6            Another te8t data     A
7   Newtun d0e8 not like apple     B

augment_word_bert

from easy_text_augmenter import augment_word_bert

# Reuses df from the augment_random_word example above.
classes_to_augment = ['A', 'B']
augmented_df = augment_word_bert(df, classes_to_augment, augmentation_percentage=0.8, text_column='text', model_path='bert-base-uncased', random_state=70)
print(augmented_df)

Output:

                                          text label
0                               This is a test     A
1                           Another test data      A
2                  Of course we need more data     B
3                   Newton does not like apple     B
4                     Hello world I am a human     A
5                         another test of data     A
6                      this term is not a test     A
7  newton does absolutely not like every apple     B

Author

Contact me at:

[email protected]
shizuka.my.id

Documentation

augment_random_word

Description:

The augment_random_word function augments a specified percentage of the samples in the given classes of a DataFrame by randomly applying one of three augmentation techniques (swap, delete, split) to the text column.

augment_random_word(df, classes_to_augment, augmentation_percentage, text_column, random_state=42, weights=[0.5, 0.3, 0.2])

Parameters:

df (pandas.DataFrame): the input DataFrame containing the text data and labels.
classes_to_augment (list): the class labels to augment.
augmentation_percentage (float): the fraction of samples to augment from each specified class (e.g. 0.8 for 80%).
text_column (str): the name of the DataFrame column that contains the text data.
random_state (int, optional): the random seed used to select the rows to augment. Defaults to 42.
weights (list, optional): a list of weights that sets the probability of choosing each augmentation type. Defaults to [0.5, 0.3, 0.2] for swap, delete, and split, respectively.

Techniques covered by weights:

Swap: randomly swaps words in the text.
Delete: randomly deletes words from the text.
Split: randomly splits words in the text.
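
To make the role of weights concrete, here is a minimal pure-Python sketch of choosing one of the three word-level operations with the given probabilities. It is an illustration only, not the package's implementation: the helper names (swap_words, delete_word, split_word, augment_words) are hypothetical, and it applies a single operation once per text, whereas the real function may alter several words.

```python
import random

def swap_words(words, rng):
    """Exchange two adjacent words (illustrative 'swap')."""
    if len(words) < 2:
        return words[:]
    i = rng.randrange(len(words) - 1)
    out = words[:]
    out[i], out[i + 1] = out[i + 1], out[i]
    return out

def delete_word(words, rng):
    """Drop one word, keeping at least one (illustrative 'delete')."""
    if len(words) < 2:
        return words[:]
    i = rng.randrange(len(words))
    return words[:i] + words[i + 1:]

def split_word(words, rng):
    """Cut one word of two or more characters in two (illustrative 'split')."""
    candidates = [i for i, word in enumerate(words) if len(word) >= 2]
    if not candidates:
        return words[:]
    i = rng.choice(candidates)
    cut = rng.randrange(1, len(words[i]))
    return words[:i] + [words[i][:cut], words[i][cut:]] + words[i + 1:]

def augment_words(text, weights=(0.5, 0.3, 0.2), seed=42):
    """Pick swap, delete, or split according to weights and apply it once."""
    rng = random.Random(seed)
    operations = [swap_words, delete_word, split_word]
    operation = rng.choices(operations, weights=weights, k=1)[0]
    return ' '.join(operation(text.split(), rng))

print(augment_words('Newton does not like apple'))
```

Setting a weight to 0 disables that operation in this sketch; for example, weights=(0, 1, 0) always deletes a word.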

Returns:

pandas.DataFrame: a new DataFrame with the augmented rows appended to the original data.
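
The number of appended rows depends on augmentation_percentage and on the size of each class. The sketch below shows one way the per-class selection could work, using only the standard library. The function name pick_rows_to_augment is hypothetical, and rounding the count down is an assumption inferred from the usage example (3 rows of class A and 2 of class B at 0.8 yield 2 and 1 new rows), not taken from the package source.

```python
import math
import random

def pick_rows_to_augment(labels, classes_to_augment, augmentation_percentage, random_state=42):
    """Return the row positions chosen for augmentation, grouped by class."""
    rng = random.Random(random_state)
    chosen = {}
    for cls in classes_to_augment:
        # Positions of all rows that belong to this class.
        indices = [i for i, label in enumerate(labels) if label == cls]
        # Assumption: the number of rows to augment is rounded down.
        n_to_augment = math.floor(len(indices) * augmentation_percentage)
        chosen[cls] = sorted(rng.sample(indices, n_to_augment))
    return chosen

labels = ['A', 'A', 'B', 'B', 'A']
chosen = pick_rows_to_augment(labels, ['A', 'B'], 0.8)
print({cls: len(rows) for cls, rows in chosen.items()})  # {'A': 2, 'B': 1}
```

With these counts, the result would have 5 original rows plus 3 augmented ones, which matches the 8-row outputs shown in the usage section.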

augment_random_character

Description:

The augment_random_character function performs random character-level augmentation on specific classes of text data within a DataFrame. It applies several augmentation techniques that randomly alter characters in the text, increasing the diversity of the dataset.

augment_random_character(df, classes_to_augment, augmentation_percentage, text_column, random_state=42, weights=[0.2, 0.2, 0.2, 0.2, 0.2])

Parameters:

df (pandas.DataFrame): the input DataFrame containing the text data and the corresponding labels.
classes_to_augment (list): the class labels that specify which classes should be augmented.
augmentation_percentage (float): the fraction of samples in each class that should be augmented.
text_column (str): the name of the DataFrame column that contains the text data to augment.
random_state (int, optional): the random seed used to select the rows to augment. Defaults to 42.
weights (list, optional): a list of weights, one per augmentation technique, that sets the probability of choosing each technique. Defaults to [0.2, 0.2, 0.2, 0.2, 0.2].

Techniques covered by weights:

aug_ocr: OCR-style augmentation.
aug_keyboard: simulation of keyboard typing errors.
aug_insert: random character insertion.
aug_swap: random character swapping.
aug_delete: random character deletion.
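
As a rough picture of what these five techniques do, the following standard-library sketch applies one character-level edit chosen by weights. It is not the package's code: the lookup tables OCR_LOOKALIKES and KEYBOARD_NEIGHBOURS are tiny made-up samples, all function names are hypothetical, and mapping the weights to the techniques in the order listed above is an assumption.

```python
import random
import string

# Illustrative tables only; real OCR and keyboard error models are much larger.
OCR_LOOKALIKES = {'o': '0', 's': '8', 'l': '1'}
KEYBOARD_NEIGHBOURS = {'a': 'sq', 'e': 'wr', 'o': 'ip', 't': 'ry'}

def substitute(text, table, rng):
    """Replace one character that has an entry in table, if any."""
    positions = [i for i, ch in enumerate(text) if ch in table]
    if not positions:
        return text
    i = rng.choice(positions)
    return text[:i] + rng.choice(table[text[i]]) + text[i + 1:]

def insert_char(text, rng):
    """Insert a random lowercase letter at a random position."""
    i = rng.randrange(len(text) + 1)
    return text[:i] + rng.choice(string.ascii_lowercase) + text[i:]

def swap_chars(text, rng):
    """Exchange two adjacent characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def delete_char(text, rng):
    """Remove one character."""
    if not text:
        return text
    i = rng.randrange(len(text))
    return text[:i] + text[i + 1:]

def augment_characters(text, weights=(0.2, 0.2, 0.2, 0.2, 0.2), seed=42):
    """Pick one of the five edits according to weights and apply it once."""
    rng = random.Random(seed)
    operations = [
        lambda t: substitute(t, OCR_LOOKALIKES, rng),       # OCR-style
        lambda t: substitute(t, KEYBOARD_NEIGHBOURS, rng),  # keyboard error
        lambda t: insert_char(t, rng),
        lambda t: swap_chars(t, rng),
        lambda t: delete_char(t, rng),
    ]
    operation = rng.choices(operations, weights=weights, k=1)[0]
    return operation(text)

print(augment_characters('test', weights=(1, 0, 0, 0, 0)))  # te8t
```

The last line forces the OCR-style edit, which reproduces the kind of change seen in the usage output ("Another te8t data").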

Returns:

pandas.DataFrame: a new DataFrame with the augmented rows appended to the original data.

augment_word_bert

Description:

The augment_word_bert function augments text data in a DataFrame using BERT-based word augmentation. It inserts or substitutes words in the specified text column for a given percentage of the samples in the specified classes.

augment_word_bert(df, classes_to_augment, augmentation_percentage, text_column, model_path, random_state=42, weights=[0.7, 0.3])

Parameters:

df (pandas.DataFrame): the DataFrame containing the data to augment.
classes_to_augment (list): the class labels that specify which classes should be augmented.
augmentation_percentage (float): the fraction of samples within each class to augment (e.g. 0.2 for 20%).
text_column (str): the name of the DataFrame column that contains the text to augment.
model_path (str): the path to the pretrained BERT model used for augmentation (the usage example passes 'bert-base-uncased').
random_state (int, optional): the random seed used to select the rows to augment. Defaults to 42.
weights (list, optional): the weights for choosing between the insert and substitute augmentation techniques. Defaults to [0.7, 0.3].
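
The choice between insertion and substitution can be pictured with the sketch below. It is a simplified illustration under stated assumptions, not the package's implementation: bert_style_augment and toy_fill_mask are hypothetical names, the first weight is assumed to belong to insertion, and the masked language model is replaced by a stub so the example runs without downloading a model. In practice, the fill_mask callable might wrap a real masked language model such as the one named by model_path.

```python
import random

MASK = '[MASK]'

def bert_style_augment(text, fill_mask, weights=(0.7, 0.3), seed=42):
    """Insert or substitute one word, using fill_mask to predict it.

    fill_mask is a hypothetical callable that receives a list of tokens
    containing exactly one MASK entry and returns the predicted word.
    """
    rng = random.Random(seed)
    words = text.split()
    if not words:
        return text
    action = rng.choices(['insert', 'substitute'], weights=weights, k=1)[0]
    if action == 'insert':
        # Add a mask between existing words (or at either end).
        position = rng.randrange(len(words) + 1)
        masked = words[:position] + [MASK] + words[position:]
    else:
        # Replace one existing word with a mask.
        position = rng.randrange(len(words))
        masked = words[:position] + [MASK] + words[position + 1:]
    masked[masked.index(MASK)] = fill_mask(masked)
    return ' '.join(masked)

def toy_fill_mask(tokens):
    """Stand-in for a masked language model: always predicts the same word."""
    return 'really'

print(bert_style_augment('Newton does not like apple', toy_fill_mask))
```

Insertion lengthens the text by one word while substitution keeps its length, which is why the BERT outputs in the usage section include both longer sentences and same-length rewrites.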