
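MULTI_TASK_NLP is a utility toolkit that lets NLP developers easily train, and run inference with, a single model for multiple tasks. We support various data formats for the majority of NLU tasks, and multiple transformer-based encoders (e.g. BERT, DistilBERT, ALBERT, RoBERTa, XLNet, etc.).
For the complete documentation of this library, please refer to the documentation.
Any conversational AI system involves building multiple components to perform various tasks, plus a pipeline to stitch all the components together. Given the recent effectiveness of transformer-based models in NLP, it is very common to build a transformer-based model to solve your use case. But running several such models together in one conversational AI system can lead to expensive resource consumption, increased prediction latency, and a system that is difficult to manage. This poses a real challenge for anyone who wants to build a conversational AI system in a simple way.
MULTI_TASK_NLP lets you define multiple tasks together and train a single model that learns all the defined tasks simultaneously. This means you can perform multiple tasks with latency and resource consumption comparable to a single task.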
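The idea of one model serving several tasks can be sketched in plain Python: a shared encoder computes features once per input, and several small task-specific heads reuse those features. This is an illustrative sketch under stated assumptions, not the library's implementation; `toy_encoder`, `MultiTaskModel` and the heads are hypothetical stand-ins for a transformer encoder and its trained task heads.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a transformer encoder such as BERT:
# maps a sentence to a tiny feature vector so the sketch runs without ML libraries.
def toy_encoder(sentence: str) -> List[float]:
    words = sentence.split()
    return [float(len(words)), float(sum(len(w) for w in words))]

class MultiTaskModel:
    """One shared encoder with one lightweight head per task (illustrative only)."""

    def __init__(self, encoder: Callable[[str], List[float]],
                 heads: Dict[str, Callable[[List[float]], str]]) -> None:
        self.encoder = encoder
        self.heads = heads
        self.encoder_calls = 0  # counts how often the (expensive) encoder runs

    def predict(self, sentence: str, tasks: List[str]) -> Dict[str, str]:
        # Encode once, then reuse the same features for every requested task.
        features = self.encoder(sentence)
        self.encoder_calls += 1
        return {task: self.heads[task](features) for task in tasks}

# Hypothetical heads; real heads would be small trained layers on top of the encoder.
heads = {
    "TaskA": lambda f: "long" if f[0] > 3 else "short",
    "TaskB": lambda f: "wordy" if f[1] > 15 else "terse",
}
model = MultiTaskModel(toy_encoder, heads)
print(model.predict("book a table for two", ["TaskA", "TaskB"]))
print(model.encoder_calls)
```

However many tasks are requested, the encoder runs once per input, which is where the latency and resource savings described above come from.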
To use multi-task-NLP, you can clone the repository to the desired location on your system with the following terminal commands.
$ cd /desired/location/
$ git clone https://github.com/hellohaptik/multi-task-NLP.git
$ cd multi-task-NLP
$ pip install -r requirements.txt
NOTE: The library is built and tested with Python 3.7.3. It is recommended to install the requirements in a virtual environment.
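A quick guide showing how to train a model on single or multiple NLU tasks in just 3 simple steps, with no coding required!
Follow these 3 simple steps to train your multi-task model!
The task file is a YAML file in which you add all the tasks you want your multi-task model to be trained on.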
TaskA:
    model_type: BERT
    config_name: bert-base-uncased
    dropout_prob: 0.05
    label_map_or_file:
    -   label1
    -   label2
    -   label3
    metrics:
    -   accuracy
    loss_type: CrossEntropyLoss
    task_type: SingleSenClassification
    file_names:
    -   taskA_train.tsv
    -   taskA_dev.tsv
    -   taskA_test.tsv

TaskB:
    model_type: BERT
    config_name: bert-base-uncased
    dropout_prob: 0.3
    label_map_or_file: data/taskB_train_label_map.joblib
    metrics:
    -   seq_f1
    -   seq_precision
    -   seq_recall
    loss_type: NERLoss
    task_type: NER
    file_names:
    -   taskB_train.tsv
    -   taskB_dev.tsv
    -   taskB_test.tsv
To learn about the task file parameters needed to write your own task file, refer to task file parameters.
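Before preparing data, it can help to sanity-check the parsed task file. Below is a minimal sketch that works on the dictionary `yaml.safe_load` (PyYAML) would return for the file above. It assumes that every key used in the sample is required and that `dropout_prob` should lie in [0, 1); `validate_task_file` and `REQUIRED_KEYS` are hypothetical names, not part of the library, whose own checks may differ.

```python
from typing import Any, Dict, List

# Keys used by every task in the sample task file. Treating all of them as
# required is an assumption made for this sketch.
REQUIRED_KEYS = {
    "model_type", "config_name", "dropout_prob", "label_map_or_file",
    "metrics", "loss_type", "task_type", "file_names",
}

def validate_task_file(tasks: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for name, cfg in tasks.items():
        missing = REQUIRED_KEYS - cfg.keys()
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
        labels = cfg.get("label_map_or_file")
        # Labels are given either inline as a list or as a path to a label-map file.
        if labels is not None and not isinstance(labels, (list, str)):
            problems.append(f"{name}: label_map_or_file must be a list or a file path")
        dropout = cfg.get("dropout_prob")
        if dropout is not None and not 0.0 <= dropout < 1.0:
            problems.append(f"{name}: dropout_prob should be in [0, 1)")
    return problems

# The structure yaml.safe_load would return for TaskA in the sample file.
sample = {
    "TaskA": {
        "model_type": "BERT", "config_name": "bert-base-uncased",
        "dropout_prob": 0.05, "label_map_or_file": ["label1", "label2", "label3"],
        "metrics": ["accuracy"], "loss_type": "CrossEntropyLoss",
        "task_type": "SingleSenClassification",
        "file_names": ["taskA_train.tsv", "taskA_dev.tsv", "taskA_test.tsv"],
    },
}
print(validate_task_file(sample))  # []
```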
Once you have defined the task file, run the following command to prepare the data.
$ python data_preparation.py \
    --task_file 'sample_task_file.yml' \
    --data_dir 'data' \
    --max_seq_len 50
To learn about the data_preparation.py script and its arguments, refer to running data preparation.
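The `--max_seq_len` argument bounds the length of every encoded input. The sketch below shows what fixing a sequence to that length typically means for transformer inputs: cut longer sequences, right-pad shorter ones, and build an attention mask. `pad_or_truncate` is a hypothetical helper; whether data_preparation.py pads exactly this way is an assumption. Pad id 0 matches `[PAD]` in the bert-base-uncased vocabulary, and real tokenizers usually keep the final `[SEP]` when truncating, which this simple cut does not.

```python
from typing import List, Tuple

def pad_or_truncate(token_ids: List[int], max_seq_len: int,
                    pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Cut token_ids to max_seq_len, or right-pad them with pad_id.

    Returns (ids, attention_mask); the mask is 1 for real tokens, 0 for padding.
    """
    ids = token_ids[:max_seq_len]
    mask = [1] * len(ids)
    padding = max_seq_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding

# Illustrative ids: 101 and 102 are [CLS] and [SEP] in bert-base-uncased.
print(pad_or_truncate([101, 7592, 102], 5))  # ([101, 7592, 102, 0, 0], [1, 1, 1, 0, 0])
print(pad_or_truncate(list(range(8)), 5))    # ([0, 1, 2, 3, 4], [1, 1, 1, 1, 1])
```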
Finally, you can start training with the following command.
$ python train.py \
    --data_dir 'data/bert-base-uncased_prepared_data' \
    --task_file 'sample_task_file.yml' \
    --out_dir 'sample_out' \
    --epochs 5 \
    --train_batch_size 4 \
    --eval_batch_size 8 \
    --grad_accumulation_steps 2 \
    --log_per_updates 25 \
    --save_per_updates 1000 \
    --eval_while_train True \
    --test_while_train True \
    --max_seq_len 50 \
    --silent True
To learn about the train.py script and its arguments, refer to running train.
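With `--train_batch_size 4` and `--grad_accumulation_steps 2`, gradients from 2 micro-batches of 4 samples are usually combined before each optimizer update, giving an effective batch of 8. The toy sketch below demonstrates that technique on a one-parameter linear model with mean squared error; `grad` and `train_with_accumulation` are hypothetical, and the sketch assumes (rather than confirms) that train.py follows this standard scheme.

```python
import math
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs

def grad(w: float, batch: Batch) -> float:
    """Gradient of mean squared error for the toy model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train_with_accumulation(w: float, micro_batches: List[Batch],
                            accum_steps: int, lr: float) -> Tuple[float, int]:
    """Sum scaled gradients over accum_steps micro-batches, then take one step.

    Micro-batches that do not fill a complete group are ignored in this sketch.
    """
    acc, updates = 0.0, 0
    for i, batch in enumerate(micro_batches, start=1):
        acc += grad(w, batch) / accum_steps  # scale so the sum is an average
        if i % accum_steps == 0:
            w -= lr * acc  # one optimizer update
            acc, updates = 0.0, updates + 1
    return w, updates

points = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 samples of y = 3x
micro_batches = [points[:4], points[4:]]             # train_batch_size = 4
w_acc, n_updates = train_with_accumulation(0.0, micro_batches, accum_steps=2, lr=0.01)
w_full = 0.0 - 0.01 * grad(0.0, points)              # one step on a batch of 8
print(n_updates, math.isclose(w_acc, w_full))        # 1 True
```

With equal-sized micro-batches, one accumulated update matches a single step on the full batch of 8, which is why accumulation is a common way to fit larger effective batches into limited GPU memory.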
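Once you have trained a multi-task model on your tasks, we provide a convenient and easy way to get predictions on samples through the inference pipeline.
To run inference on samples with a model trained on, say, TaskA, TaskB and TaskC, you can import the inferPipeline class and load the corresponding multi-task model by creating an object of this class.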
>>> from infer_pipeline import inferPipeline
>>> pipe = inferPipeline(modelPath = 'sample_out_dir/multi_task_model.pt', maxSeqLen = 50)
The infer function can then be called to get predictions on input samples for the mentioned tasks.
>>> samples = [ ['sample_sentence_1'], ['sample_sentence_2'] ]
>>> tasks = ['TaskA', 'TaskB']
>>> pipe.infer(samples, tasks)
To learn more about infer_pipeline, refer to infer pipeline.
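In the example above, each sample is a list of sentences. A small helper can build that nested shape from plain strings. `to_samples` is hypothetical, and representing a sentence-pair sample as a two-element list is an assumption for this sketch; check the infer pipeline documentation for the exact format each task type expects.

```python
from typing import List, Sequence, Tuple, Union

Item = Union[str, Tuple[str, str]]

def to_samples(items: Sequence[Item]) -> List[List[str]]:
    """Wrap each input in a list: [sentence] or [sentence_a, sentence_b]."""
    samples = []
    for item in items:
        if isinstance(item, str):
            samples.append([item])        # single-sentence tasks
        elif isinstance(item, tuple) and len(item) == 2:
            samples.append(list(item))    # sentence-pair tasks (assumed format)
        else:
            raise ValueError(f"unsupported sample: {item!r}")
    return samples

samples = to_samples(["sample_sentence_1", "sample_sentence_2"])
print(samples)  # [['sample_sentence_1'], ['sample_sentence_2']]
# With a trained model loaded as shown above: pipe.infer(samples, ['TaskA', 'TaskB'])
```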
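Here you can find various conversational AI tasks as examples; you can train multi-task models on them by following the simple steps mentioned in the notebooks.
Intent detection, NER, fragment detection (Setup: multi-task, Task type: multiple)
Intent detection (Task type: single sentence classification)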
Query: I need a reservation for a bar in bangladesh on feb the 11th 2032
Intent: BookRestaurant
NER (Task type: sequence labelling)
Query: ['book', 'a', 'spot', 'for', 'ten', 'at', 'a', 'top-rated', 'caucasian', 'restaurant', 'not', 'far', 'from', 'selmer']
NER tags: ['O', 'O', 'O', 'O', 'B-party_size_number', 'O', 'O', 'B-sort', 'B-cuisine', 'B-restaurant_type', 'B-spatial_relation', 'I-spatial_relation', 'O', 'B-city']
Fragment detection (Task type: single sentence classification)
Query: a reservation for
Label: fragment
Notebook: Intent_ner_fragment
Transform file: Transform_file_snips
Tasks file: tasks_file_snips
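The NER tags in this example use the BIO scheme: `B-` begins an entity and `I-` continues it. The sketch below groups tagged tokens back into entity spans, applied to the query above. `bio_to_spans` is a hypothetical helper, not a library function; it is lenient and also starts a new span when an `I-` tag follows `O` or a different entity type.

```python
from typing import List, Tuple

def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, List[str]]] = []
    prev_type = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prev_type = None
            continue
        prefix, ent_type = tag.split("-", 1)
        if prefix == "B" or ent_type != prev_type:
            spans.append((ent_type, [token]))  # start a new entity
        else:
            spans[-1][1].append(token)         # continue the current entity
        prev_type = ent_type
    return [(t, " ".join(words)) for t, words in spans]

tokens = ['book', 'a', 'spot', 'for', 'ten', 'at', 'a', 'top-rated', 'caucasian',
          'restaurant', 'not', 'far', 'from', 'selmer']
tags = ['O', 'O', 'O', 'O', 'B-party_size_number', 'O', 'O', 'B-sort', 'B-cuisine',
        'B-restaurant_type', 'B-spatial_relation', 'I-spatial_relation', 'O', 'B-city']
for span in bio_to_spans(tokens, tags):
    print(span)
```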
Recognising textual entailment (Setup: single-task, Task type: sentence pair classification)
Query1: An old man with a package poses in front of an advertisement.
Query2: A man poses in front of an ad.
Label: entailment
Query1: An old man with a package poses in front of an advertisement.
Query2: A man poses in front of an ad for beer.
Label: non-entailment
Notebook: Entailment_snli
Transform file: transform_file_snli
Tasks file: tasks_file_snli
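Sentence-pair tasks such as entailment feed both sentences to the encoder as one sequence. The sketch shows the BERT-style convention `[CLS] a [SEP] b [SEP]` with segment (token type) ids 0 and 1. `pack_pair` is hypothetical, whitespace splitting stands in for WordPiece tokenization, and whether the library packs inputs exactly like this for every encoder is an assumption (RoBERTa, for example, uses different special tokens).

```python
from typing import List, Tuple

def pack_pair(a: str, b: str) -> Tuple[List[str], List[int]]:
    """Pack two sentences as [CLS] a [SEP] b [SEP] with segment ids 0/1."""
    tokens_a, tokens_b = a.split(), b.split()  # whitespace split stands in for WordPiece
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence a and the first [SEP]; segment 1 covers the rest.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids

tokens, segment_ids = pack_pair("A man poses", "An ad")
print(tokens)       # ['[CLS]', 'A', 'man', 'poses', '[SEP]', 'An', 'ad', '[SEP]']
print(segment_ids)  # [0, 0, 0, 0, 0, 1, 1, 1]
```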
Answerability detection (Setup: single-task, Task type: sentence pair classification)
Query: how much money did evander holyfield make
Context: Evander Holyfield Net Worth. How much is Evander Holyfield Worth? Evander Holyfield Net Worth: Evander Holyfield is a retired American professional boxer who has a net worth of $500 thousand. A professional boxer, Evander Holyfield has fought at the Heavyweight, Cruiserweight, and Light-Heavyweight Divisions, and won a Bronze medal a the 1984 Olympic Games.
Label: answerable
Notebook: Answerability_detection_msmarco
Transform file: Transform_file_answerability
Tasks file: tasks_file_answerability
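Passages like the MS MARCO context above are often longer than `max_seq_len`, so the pair must be shortened. A common approach trims the longer sequence one token at a time, similar in spirit to the "longest_first" truncation strategy in Hugging Face tokenizers. `truncate_pair` is a hypothetical helper and the library's actual truncation rule is not stated here; three positions are reserved for `[CLS]`, `[SEP]`, `[SEP]`.

```python
from typing import List, Tuple

def truncate_pair(a: List[str], b: List[str],
                  max_seq_len: int) -> Tuple[List[str], List[str]]:
    """Trim the longer sequence one token at a time until the packed pair fits."""
    a, b = list(a), list(b)
    budget = max_seq_len - 3  # room left after [CLS], [SEP], [SEP]
    if budget < 0:
        raise ValueError("max_seq_len is too small for a sentence pair")
    while len(a) + len(b) > budget:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    return a, b

query = "how much money did evander holyfield make".split()
context = ("Evander Holyfield Net Worth. How much is Evander Holyfield Worth? "
           "Evander Holyfield Net Worth: Evander Holyfield is a retired "
           "American professional boxer").split()
q, c = truncate_pair(query, context, max_seq_len=20)
print(len(q), len(c))  # 7 10
```

The short query survives intact while the long context absorbs the cut, which keeps the question fully visible to the model.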
Query type detection (Setup: single-task, Task type: single sentence classification)
Query: what's the distance between destin florida and birmingham alabama?
Label: NUMERIC
Query: who is suing scott wolter
Label: PERSON
Notebook: query_type_detection
Transform file: Transform_file_queryType
Tasks file: tasks_file_queryType
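Classification heads predict integer class ids, so string labels such as NUMERIC and PERSON need a label map. When `label_map_or_file` is given as a list in the task file, a map could be built as below. Assigning ids by list position is an assumption for this sketch, the two labels shown are only those visible in the example (not the full label set), and `build_label_map` is hypothetical; the library can also load a saved map, such as the `.joblib` file used for TaskB.

```python
from typing import Dict, List

def build_label_map(labels: List[str]) -> Dict[str, int]:
    """Assign integer ids to labels in the order they are listed."""
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate labels in label list")
    return {label: idx for idx, label in enumerate(labels)}

label_map = build_label_map(["NUMERIC", "PERSON"])
id_to_label = {idx: label for label, idx in label_map.items()}  # decode predictions
print(label_map)       # {'NUMERIC': 0, 'PERSON': 1}
print(id_to_label[1])  # PERSON
```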
NER and POS tagging (Setup: multi-task, Task type: sequence labelling)
Query: ['Despite', 'winning', 'the', 'Asian', 'Games', 'title', 'two', 'years', 'ago', ',', 'Uzbekistan', 'are', 'in', 'the', 'finals', 'as', 'outsiders', '.']
NER tags: ['O', 'O', 'O', 'I-MISC', 'I-MISC', 'O', 'O', 'O', 'O', 'O', 'I-LOC', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
POS tags: ['I-PP', 'I-VP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'B-NP', 'I-NP', 'I-ADVP', 'O', 'I-NP', 'I-VP', 'I-PP', 'I-NP', 'I-NP', 'I-SBAR', 'I-NP', 'O']
Notebook: ner_pos_tagging_conll
Transform file: Transform_file_conll
Tasks file: tasks_file_conll
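Unlike the SNIPS example, these CoNLL tags follow the IOB1 scheme: an entity may start with `I-`, and `B-` appears only to separate two adjacent entities of the same type. If you want to combine such data with BIO (IOB2) data, a conversion like the sketch below is a common step. `iob1_to_iob2` is hypothetical, and whether the library requires this conversion is not stated in this document.

```python
from typing import List

def iob1_to_iob2(tags: List[str]) -> List[str]:
    """Convert IOB1 tags to IOB2 (BIO), where every entity starts with B-."""
    converted = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            ent_type = tag[2:]
            # An I- tag opens a new entity if the previous tag is O or of another type.
            if prev == "O" or prev[2:] != ent_type:
                tag = "B-" + ent_type
        converted.append(tag)
        prev = tag
    return converted

ner = ['O', 'O', 'O', 'I-MISC', 'I-MISC', 'O', 'O', 'O', 'O', 'O', 'I-LOC',
       'O', 'O', 'O', 'O', 'O', 'O', 'O']
print(iob1_to_iob2(ner)[3:5], iob1_to_iob2(ner)[10])  # ['B-MISC', 'I-MISC'] B-LOC
```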
Query correctness (Setup: single-task, Task type: single sentence classification)
Query: What places have the oligarchy government ?
Label: well-formed
Query: What day of Diwali in 1980 ?
Label: not well-formed
Notebook: query_correctness
Transform file: Transform_file_query_correctness
Tasks file: tasks_file_query_correctness
Query similarity (Setup: single-task, Task type: single sentence classification)
Query1: What is the most used word in Malayalam?
Query2: What is meaning of the Malayalam word "thumbatthu"?
Label: not similar
Query1: Which is the best compliment you have ever received?
Query2: What's the best compliment you've got?
Label: similar
Notebook: QUERY_SIMIRALITY
Transform file: Transform_file_qqp
Tasks file: tasks_file_qqp
Sentiment analysis (Setup: single-task, Task type: single sentence classification)
Review: What I enjoyed most in this film was the scenery of Corfu, being Greek I adore my country and I liked the flattering director's point of view. Based on a true story during the years when Greece was struggling to stand on her own two feet through war, Nazis and hardship. An Italian soldier and a Greek girl fall in love but the times are hard and they have a lot of sacrifices to make. Nicholas Cage looking great in a uniform gives a passionate account of this unfulfilled (in the beginning) love. I adored Christian Bale playing Mandras the heroine's husband-to-be, he looks very very good as a Greek, his personality matched the one of the Greek patriot! A true fighter in there, or what! One of the movies I would like to buy and keep it in my collection...for ever!
Label: positive
Notebook: imdb_sentiment_analysis
Transform file: Transform_file_imdb
Tasks file: tasks_file_imdb