Unsupervised Question Answering by Cloze Translation
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?
It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations
Overview
A Visual Survey of Data Augmentation in NLP
Task-independent data augmentation for NLP
Robust, Unbiased Natural Language Processing
Methods
General
Random insertion, random deletion, and word or sentence shuffling
Replacing words with synonyms
Replace words with entries from a dictionary of the same label
NER
Perturbations (letter, word, or sentence level)
noisemix
Language model
Contextual augmentation
Back translation
Machine translation
Round-trip translation
Paraphrasing
Low-resource parallel corpora
Chinese text error correction task
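The random insertion, deletion, and shuffling operations listed under General can be sketched in a few lines. This is a minimal illustration, not the method of any paper listed above: the function names, the `synonyms` lookup table, and the rule that deletion always keeps at least one token are all assumptions made for the example.

```python
import random

def random_deletion(tokens, p, rng):
    """Drop each token with probability p; keep one token if all were dropped."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, n_swaps, rng):
    """Word shuffling: swap two randomly chosen positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_insertion(tokens, n_inserts, rng, synonyms):
    """Insert a synonym of a randomly chosen token at a random position.

    `synonyms` is a hypothetical dict mapping a word to a list of synonyms.
    """
    out = list(tokens)
    candidates = [t for t in tokens if synonyms.get(t)]
    if not candidates:
        return out
    for _ in range(n_inserts):
        word = rng.choice(candidates)
        out.insert(rng.randint(0, len(out)), rng.choice(synonyms[word]))
    return out

def shuffle_sentences(sentences, rng):
    """Sentence shuffling: return the sentences in a random order."""
    out = list(sentences)
    rng.shuffle(out)
    return out

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    tokens = "the quick brown fox jumps".split()
    toy_synonyms = {"quick": ["fast", "rapid"], "jumps": ["leaps"]}
    print(random_deletion(tokens, 0.2, rng))
    print(random_swap(tokens, 1, rng))
    print(random_insertion(tokens, 1, rng, toy_synonyms))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing models trained on augmented data.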
Leverage External Data
Use external data derived from Wikipedia by linking Wikipedia articles to arbitrary input text. The idea is that if the input text were on Wikipedia, it would link to other Wikipedia articles that are semantically related and provide additional information.
Break the input text into n-grams.
Check whether each n-gram exists as a Wikipedia article title to create a set of "candidate links".
Prune the candidate links by computing the similarity between the input text and the abstract of each candidate.
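The three steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: `articles` is a hypothetical in-memory dict from lowercased article title to abstract, standing in for a real Wikipedia title index, and Jaccard token overlap is used as the similarity only because the notes do not name a measure. The `max_n` and `threshold` parameters are likewise made up for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, max_n):
    """Yield every n-gram (as a space-joined string) for n = 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def jaccard(a, b):
    """Jaccard overlap of two token collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidate_links(text, articles, max_n=3, threshold=0.2):
    """Return article titles linked to `text`, most similar first.

    `articles` is a hypothetical dict: lowercased title -> abstract.
    """
    tokens = tokenize(text)
    # Steps 1 and 2: n-grams that match an article title become candidates.
    candidates = {g for g in ngrams(tokens, max_n) if g in articles}
    # Step 3: prune candidates whose abstract is too dissimilar to the input.
    scored = {t: jaccard(tokens, tokenize(articles[t])) for t in candidates}
    kept = [t for t in scored if scored[t] >= threshold]
    return sorted(kept, key=lambda t: (-scored[t], t))

if __name__ == "__main__":
    toy_articles = {
        "machine translation": "Machine translation is the use of software "
                               "to translate text from one language to another.",
        "text": "In literary theory, a text is any object that can be read.",
    }
    sentence = ("Back translation uses machine translation to paraphrase "
                "text in another language.")
    print(candidate_links(sentence, toy_articles))
```

In practice the title lookup would go against a Wikipedia dump or title index, and the pruning step could use a stronger similarity such as TF-IDF or embedding cosine; the token-overlap version here only shows the shape of the pipeline.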