
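multi_task_NLP is a utility toolkit that enables NLP developers to easily train and run inference with a single model for multiple tasks. We support various data formats for the majority of NLU tasks and multiple transformer-based encoders (e.g. BERT, DistilBERT, ALBERT, RoBERTa, XLNet, etc.).
For the complete documentation of this library, please refer to the documentation.
Any conversational AI system involves building multiple components to perform various tasks, plus a pipeline to stitch all the components together. Given the recent effectiveness of transformer-based models in NLP, it is very common to build a transformer-based model to solve your use case. But running several such models together in a conversational AI system can lead to expensive resource consumption, higher prediction latency, and a system that is difficult to manage. This poses a real challenge for anyone who wants to build a conversational AI system in a simple way.
multi_task_NLP lets you define multiple tasks together and train a single model that learns all the defined tasks simultaneously. This means you can perform multiple tasks with latency and resource consumption comparable to those of a single task.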
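A toy sketch of the idea, in plain Python so it runs without any ML dependencies: the encoder (a hypothetical stand-in for BERT and similar models) runs once per input and every task head reuses its output, which is why adding a task costs a small head rather than a whole model. `toy_encoder`, `ToyMultiTaskModel` and the two heads are illustrative names, not part of multi_task_NLP.

```python
from typing import Callable, Dict, List

Features = List[float]


def toy_encoder(sentence: str) -> Features:
    """Hypothetical stand-in for a transformer encoder: sentence -> feature vector."""
    words = sentence.split()
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return [float(len(words)), avg_word_len]


class ToyMultiTaskModel:
    """One shared encoder plus one lightweight head per task (illustration only)."""

    def __init__(self, encoder: Callable[[str], Features],
                 heads: Dict[str, Callable[[Features], object]]) -> None:
        self.encoder = encoder
        self.heads = heads
        self.encoder_calls = 0  # counts how often the expensive shared part runs

    def predict(self, sentence: str, tasks: List[str]) -> Dict[str, object]:
        features = self.encoder(sentence)  # shared work, done once per input
        self.encoder_calls += 1
        # Each requested task only pays for its own small head.
        return {task: self.heads[task](features) for task in tasks}


model = ToyMultiTaskModel(
    toy_encoder,
    heads={
        "TaskA": lambda f: "long" if f[0] > 5 else "short",  # hypothetical classifier head
        "TaskB": lambda f: round(f[1], 2),                   # hypothetical regression head
    },
)
```

Calling `model.predict("book a table for two", ["TaskA", "TaskB"])` returns `{"TaskA": "short", "TaskB": 3.2}` while running the encoder only once.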
To use multi-task-NLP, clone the repository to the desired location on your system with the following terminal commands.
$ cd /desired/location/
$ git clone https://github.com/hellohaptik/multi-task-NLP.git
$ cd multi-task-NLP
$ pip install -r requirements.txt
NOTE: The library is built and tested using Python 3.7.3. It is recommended to install the requirements in a virtual environment.
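To confirm which interpreter and environment you are in before installing, a small standard-library check like the one below can help. It is a sketch: comparing only the major and minor version with 3.7.3 is our own rule of thumb, not a requirement stated by the library. The environment itself can be created with the standard `python3 -m venv <dir>` command.

```python
import sys
from typing import Dict, Union

TESTED_VERSION = (3, 7, 3)  # version the README says the library was built and tested with


def in_virtualenv() -> bool:
    """True when running inside a venv or virtualenv environment."""
    # venv (PEP 405) points sys.prefix at the environment while sys.base_prefix
    # keeps the base interpreter; older virtualenv releases set sys.real_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix) or hasattr(sys, "real_prefix")


def environment_report() -> Dict[str, Union[str, bool]]:
    current = tuple(sys.version_info[:3])
    return {
        "python": ".".join(str(part) for part in current),
        "same_minor_as_tested": current[:2] == TESTED_VERSION[:2],
        "in_virtualenv": in_virtualenv(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```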
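A quick guide showing how to train on single or multiple NLU tasks in just 3 simple steps, with no code!
Follow these 3 simple steps to train your multi-task model!
The task file is a YAML file in which you add all the tasks you want your multi-task model to train on.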
TaskA:
  model_type: BERT
  config_name: bert-base-uncased
  dropout_prob: 0.05
  label_map_or_file:
    - label1
    - label2
    - label3
  metrics:
    - accuracy
  loss_type: CrossEntropyLoss
  task_type: SingleSenClassification
  file_names:
    - taskA_train.tsv
    - taskA_dev.tsv
    - taskA_test.tsv

TaskB:
  model_type: BERT
  config_name: bert-base-uncased
  dropout_prob: 0.3
  label_map_or_file: data/taskB_train_label_map.joblib
  metrics:
    - seq_f1
    - seq_precision
    - seq_recall
  loss_type: NERLoss
  task_type: NER
  file_names:
    - taskB_train.tsv
    - taskB_dev.tsv
    - taskB_test.tsv
To understand the parameters you can use in your task file, refer to task file parameters.
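Because a mis-indented or incomplete task file is easy to write by hand, it can help to sanity-check it before data preparation. The sketch below treats every key that appears in the sample above as required and checks a few values; that rule set is an assumption drawn from the sample, not the library's own validation (the task file parameters documentation is authoritative). It takes the dict that PyYAML's `yaml.safe_load` returns for the file.

```python
from typing import Any, Dict, List

# Every key used by TaskA/TaskB in the sample task file. Treating all of them
# as mandatory is an assumption made for this sketch.
REQUIRED_KEYS = {
    "model_type", "config_name", "dropout_prob", "label_map_or_file",
    "metrics", "loss_type", "task_type", "file_names",
}


def validate_task_config(tasks: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in a parsed task file (empty if none)."""
    problems: List[str] = []
    for name, cfg in tasks.items():
        missing = REQUIRED_KEYS - set(cfg)
        if missing:
            problems.append(f"{name}: missing keys {sorted(missing)}")
        prob = cfg.get("dropout_prob")
        if isinstance(prob, (int, float)) and not 0.0 <= prob < 1.0:
            problems.append(f"{name}: dropout_prob {prob} is outside [0, 1)")
        labels = cfg.get("label_map_or_file")
        # Either an inline list of labels or a path to a saved label map.
        if isinstance(labels, list) and len(set(labels)) != len(labels):
            problems.append(f"{name}: duplicate labels in label_map_or_file")
        elif labels is not None and not isinstance(labels, (list, str)):
            problems.append(f"{name}: label_map_or_file must be a list or a file path")
        if "file_names" in cfg and not cfg["file_names"]:
            problems.append(f"{name}: file_names is empty")
    return problems
```

With PyYAML installed, this could be run as `with open('sample_task_file.yml') as f: print(validate_task_config(yaml.safe_load(f)))`.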
Once you have defined the task file, run the following command to prepare the data.
$ python data_preparation.py \
    --task_file 'sample_task_file.yml' \
    --data_dir 'data' \
    --max_seq_len 50
To learn about the data_preparation.py script and its arguments, refer to running data preparation.
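`--max_seq_len` bounds how many tokens each example keeps. As a rough illustration for a single-sentence BERT input, the sketch below wraps already-tokenized ids in `[CLS]`/`[SEP]`, truncates the body so the special tokens still fit, and pads with an attention mask. The ids 101, 102 and 0 are `[CLS]`, `[SEP]` and `[PAD]` in the `bert-base-uncased` vocabulary; whether data_preparation.py truncates and pads exactly like this (rather than, say, padding later at batching time) is an assumption.

```python
from typing import List, Tuple

CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # bert-base-uncased vocabulary ids


def encode_single(token_ids: List[int], max_seq_len: int) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both exactly max_seq_len long."""
    if max_seq_len < 2:
        raise ValueError("max_seq_len must leave room for [CLS] and [SEP]")
    body = token_ids[: max_seq_len - 2]  # truncate, keeping room for 2 special tokens
    ids = [CLS_ID] + body + [SEP_ID]
    padding = max_seq_len - len(ids)
    mask = [1] * len(ids) + [0] * padding  # 1 = real token, 0 = padding
    return ids + [PAD_ID] * padding, mask
```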
Finally, you can start training with the following command.
$ python train.py \
    --data_dir 'data/bert-base-uncased_prepared_data' \
    --task_file 'sample_task_file.yml' \
    --out_dir 'sample_out' \
    --epochs 5 \
    --train_batch_size 4 \
    --eval_batch_size 8 \
    --grad_accumulation_steps 2 \
    --log_per_updates 25 \
    --save_per_updates 1000 \
    --eval_while_train True \
    --test_while_train True \
    --max_seq_len 50 \
    --silent True
To learn about the train.py script and its arguments, refer to running train.
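Several of these flags interact. With `--train_batch_size 4` and `--grad_accumulation_steps 2`, gradients from 2 batches are accumulated before each optimizer update, so each update effectively sees 4 × 2 = 8 examples. The helper below turns a hypothetical training-set size into updates, log lines and checkpoints. It assumes that an "update" is one optimizer step, that batches from all tasks are pooled into one count, and that a trailing partial accumulation still produces an update; these are assumptions about train.py, not documented behaviour.

```python
import math
from typing import Dict


def training_schedule(num_train_samples: int, epochs: int, train_batch_size: int,
                      grad_accumulation_steps: int, log_per_updates: int,
                      save_per_updates: int) -> Dict[str, int]:
    """Back-of-the-envelope counts for one training run (see assumptions above)."""
    batches_per_epoch = math.ceil(num_train_samples / train_batch_size)
    # One optimizer update per `grad_accumulation_steps` batches.
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accumulation_steps)
    total_updates = updates_per_epoch * epochs
    return {
        "effective_batch_size": train_batch_size * grad_accumulation_steps,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "log_lines": total_updates // log_per_updates,
        "checkpoints": total_updates // save_per_updates,
    }


# Flags from the command above, with a hypothetical 10,000 training examples.
schedule = training_schedule(10_000, epochs=5, train_batch_size=4,
                             grad_accumulation_steps=2, log_per_updates=25,
                             save_per_updates=1000)
```

For 10,000 examples this gives 1,250 updates per epoch and 6,250 in total, i.e. 250 log lines at one per 25 updates and 6 saves at one per 1,000 updates.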
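Once you have trained a multi-task model on your tasks, we provide a convenient and easy way to use it for getting predictions on samples through the inference pipeline.
To run inference on samples using a model trained on, for example, TaskA, TaskB and TaskC, import the inferPipeline class and load the corresponding multi-task model by creating an object of this class.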
>>> from infer_pipeline import inferPipeline
>>> pipe = inferPipeline(modelPath='sample_out_dir/multi_task_model.pt', maxSeqLen=50)
The infer function can then be called to get predictions on input samples for the tasks above.
>>> samples = [['sample_sentence_1'], ['sample_sentence_2']]
>>> tasks = ['TaskA', 'TaskB']
>>> pipe.infer(samples, tasks)
To learn about infer_pipeline, refer to infer.
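Here you can find various conversational AI tasks as examples; you can train multi-task models on them by following the simple steps given in the notebooks.
(Setup: multi-task, task type: multiple)
Intent detection (task type: single sentence classification)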
Query: I need a reservation for a bar in bangladesh on feb the 11th 2032
Intent: BookRestaurant
NER (task type: sequence labelling)
Query: ['book', 'a', 'spot', 'for', 'ten', 'at', 'a', 'top-rated', 'caucasian', 'restaurant', 'not', 'far', 'from', 'selmer']
NER tags: ['O', 'O', 'O', 'O', 'B-party_size_number', 'O', 'O', 'B-sort', 'B-cuisine', 'B-restaurant_type', 'B-spatial_relation', 'I-spatial_relation', 'O', 'B-city']
Fragment detection (task type: single sentence classification)
Query: a reservation for
Label: fragment
Notebook: intent_ner_fragment
Transform file: transform_file_snips
Tasks file: tasks_file_snips
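The NER tags in this example use the BIO scheme: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. Sequence-labelling metrics such as `seq_f1`, `seq_precision` and `seq_recall` are usually computed over whole entities rather than individual tags (as the seqeval package does). Assuming that is what these metrics mean here, the sketch below recovers entity spans from tags and scores predicted spans against gold ones; it is an illustration, not multi_task_NLP's metric code.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from BIO tags; a stray I- tag starts a new entity."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                spans.add((etype, start, i))
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, F1: a span counts only if type and boundaries match."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ['O', 'O', 'O', 'O', 'B-party_size_number', 'O', 'O', 'B-sort', 'B-cuisine',
        'B-restaurant_type', 'B-spatial_relation', 'I-spatial_relation', 'O', 'B-city']
pred = gold[:11] + ['O'] + gold[12:]  # hypothetical model output that misses "far"
```

Here the prediction gets 5 of the 6 entities exactly right, so precision, recall and F1 are all 5/6 (about 0.833); the half-found `not far` earns no partial credit.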
(Setup: single-task, task type: sentence pair classification)
Query1: An old man with a package poses in front of an advertisement.
Query2: A man poses in front of an ad.
Label: entailment
Query1: An old man with a package poses in front of an advertisement.
Query2: A man poses in front of an ad for beer.
Label: non-entailment
Notebook: entailment_snli
Transform file: transform_file_snli
Tasks file: tasks_file_snli
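For sentence pair classification, BERT-style encoders read both sentences as a single sequence, `[CLS] sentence A [SEP] sentence B [SEP]`, with segment (token type) ids of 0 for the first part and 1 for the second. The sketch shows that layout with whitespace splitting standing in for the real WordPiece tokenizer (a simplification). Other encoders such as RoBERTa use different separators and do not rely on segment ids, and this sketch does not show how multi_task_NLP builds pair inputs internally.

```python
from typing import List, Tuple


def pack_pair(tokens_a: List[str], tokens_b: List[str]) -> Tuple[List[str], List[int]]:
    """BERT-style pair layout: [CLS] A [SEP] B [SEP] plus segment ids."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], A and the first [SEP]; segment 1 covers B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


premise = "An old man with a package poses in front of an advertisement."
hypothesis = "A man poses in front of an ad."
# Whitespace split is a stand-in for a real subword tokenizer.
tokens, segment_ids = pack_pair(premise.lower().split(), hypothesis.lower().split())
```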
(Setup: single-task, task type: sentence pair classification)
Query: how much money did evander holyfield make
Context: Evander Holyfield Net Worth. How much is Evander Holyfield Worth? Evander Holyfield Net Worth: Evander Holyfield is a retired American professional boxer who has a net worth of $500 thousand. A professional boxer, Evander Holyfield has fought at the Heavyweight, Cruiserweight, and Light-Heavyweight Divisions, and won a Bronze medal a the 1984 Olympic Games.
Label: answerable
Notebook: answerability_detection_msmarco
Transform file: transform_file_answerability
Tasks file: tasks_file_answerability
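Answerability detection pairs a short query with a passage that is far longer than the `--max_seq_len 50` used in the quickstart, so something has to be cut. One common strategy, used for example by the `longest_first` truncation option of Hugging Face tokenizers, is to remove tokens one at a time from whichever sequence is currently longer until the pair fits. The sketch below implements that idea; that multi_task_NLP truncates pairs this way is an assumption.

```python
from typing import List, Tuple


def truncate_longest_first(tokens_a: List[str], tokens_b: List[str], max_seq_len: int,
                           num_special_tokens: int = 3) -> Tuple[List[str], List[str]]:
    """Trim the longer sequence one token at a time until the pair fits.

    num_special_tokens=3 reserves room for [CLS], [SEP], [SEP] (BERT pair layout).
    """
    budget = max_seq_len - num_special_tokens
    if budget < 0:
        raise ValueError("max_seq_len is too small for the special tokens")
    a, b = list(tokens_a), list(tokens_b)  # copy so the inputs stay untouched
    while len(a) + len(b) > budget:
        if len(a) > len(b):
            a.pop()
        else:  # on ties, trim the second sequence (the passage)
            b.pop()
    return a, b


query = "how much money did evander holyfield make".split()
passage = ("Evander Holyfield Net Worth. How much is Evander Holyfield Worth? "
           "Evander Holyfield Net Worth: Evander Holyfield is a retired American "
           "professional boxer who has a net worth of $500 thousand. A professional "
           "boxer, Evander Holyfield has fought at the Heavyweight, Cruiserweight, and "
           "Light-Heavyweight Divisions, and won a Bronze medal a the 1984 Olympic Games.").split()
q, p = truncate_longest_first(query, passage, max_seq_len=50)
```

With the passage from the example above, the query keeps all 7 of its tokens and the passage is cut to its first 40, filling the 47-token budget left after the three special tokens.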
(Setup: single-task, task type: single sentence classification)
Query: what's the distance between destin florida and birmingham alabama?
Label: NUMERIC
Query: who is suing scott wolter
Label: PERSON
Notebook: query_type_detection
Transform file: transform_file_queryType
Tasks file: tasks_file_queryType
(Setup: multi-task, task type: sequence labelling)
Query: ['Despite', 'winning', 'the', 'Asian', 'Games', 'title', 'two', 'years', 'ago', ',', 'Uzbekistan', 'are', 'in', 'the', 'finals', 'as', 'outsiders', '.']
NER tags: ['O', 'O', 'O', 'I-MISC', 'I-MISC', 'O', 'O', 'O', 'O', 'O', 'I-LOC', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
POS tags: ['I-PP', 'I-VP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'B-NP', 'I-NP', 'I-ADVP', 'O', 'I-NP', 'I-VP', 'I-PP', 'I-NP', 'I-NP', 'I-SBAR', 'I-NP', 'O']
Notebook: ner_pos_tagging_conll
Transform file: transform_file_conll
Tasks file: tasks_file_conll
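The tags in this CoNLL example appear to follow the IOB1 convention of the original CoNLL-2003 release: a chunk may start with `I-`, and `B-` is only used to separate two adjacent chunks of the same type (for example, `Asian Games` is tagged `I-MISC I-MISC`). If other data or tools in a pipeline expect the BIO (IOB2) scheme used in the SNIPS example, the tags can be converted as in this sketch; whether multi_task_NLP needs or performs such a conversion is not stated here.

```python
from typing import List, Optional


def iob1_to_iob2(tags: List[str]) -> List[str]:
    """Rewrite IOB1 tags so that every chunk starts with B- (IOB2/BIO)."""
    converted: List[str] = []
    prev_type: Optional[str] = None  # chunk type of the previous token, None after "O"
    for tag in tags:
        if tag == "O":
            converted.append(tag)
            prev_type = None
            continue
        prefix, chunk_type = tag.split("-", 1)
        # An I- tag that does not continue a chunk of the same type opens a new chunk.
        if prefix == "I" and chunk_type != prev_type:
            converted.append("B-" + chunk_type)
        else:
            converted.append(tag)
        prev_type = chunk_type
    return converted


ner_tags = ['O', 'O', 'O', 'I-MISC', 'I-MISC', 'O', 'O', 'O', 'O', 'O', 'I-LOC',
            'O', 'O', 'O', 'O', 'O', 'O', 'O']
# The "POS tags" row from the example above.
pos_tags = ['I-PP', 'I-VP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'B-NP', 'I-NP', 'I-ADVP',
            'O', 'I-NP', 'I-VP', 'I-PP', 'I-NP', 'I-NP', 'I-SBAR', 'I-NP', 'O']
```

`iob1_to_iob2(ner_tags)` turns `Asian Games` into `B-MISC I-MISC` and `Uzbekistan` into `B-LOC`; the existing `B-NP` at position 6 is kept, since in IOB1 it already marks a chunk boundary.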
(Setup: single-task, task type: single sentence classification)
Query: What places have the oligarchy government ?
Label: well-formed
Query: What day of Diwali in 1980 ?
Label: not well-formed
Notebook: query_correctness
Transform file: transform_file_query_correctness
Tasks file: tasks_file_query_correctness
(Setup: single-task, task type: sentence pair classification)
Query1: What is the most used word in Malayalam?
Query2: What is meaning of the Malayalam word "thumbatthu"?
Label: not similar
Query1: Which is the best compliment you have ever received?
Query2: What's the best compliment you've got?
Label: similar
Notebook: query_similarity
Transform file: transform_file_qqp
Tasks file: tasks_file_qqp
(Setup: single-task, task type: single sentence classification)
Review: What I enjoyed most in this film was the scenery of Corfu, being Greek I adore my country and I liked the flattering director's point of view. Based on a true story during the years when Greece was struggling to stand on her own two feet through war, Nazis and hardship. An Italian soldier and a Greek girl fall in love but the times are hard and they have a lot of sacrifices to make. Nicholas Cage looking great in a uniform gives a passionate account of this unfulfilled (in the beginning) love. I adored Christian Bale playing Mandras the heroine's husband-to-be, he looks very very good as a Greek, his personality matched the one of the Greek patriot! A true fighter in there, or what! One of the movies I would like to buy and keep it in my collection...for ever!
Label: positive
Notebook: imdb_sentiment_analysis
Transform file: transform_file_imdb
Tasks file: tasks_file_imdb