rasa_nlu_gq Download - rasa_nlu_gq Source Code Download

rasa_nlu_gq

Other source code

1.0.0

Download

Rasa NLU GQ

Rasa NLU (Natural Language Understanding) is a tool for understanding natural semantics. For example, the official website is as follows:

"I'm looking for a Mexican restaurant in the center of town"

And returning structured data like:

  intent: search_restaurant
  entities: 
    - cuisine : Mexican
    - location : center

Introduction

The original project is on branch 0.2.7 and can be switched freely. The modification of this version is based on the latest version of rasa. The original component in rasa_nlu_gao has been modified, and no new additions have been made. Moreover, the previous practices are a bit cumbersome and do not need to be modified in the rasa source code. You can directly load the original component as addon, inherit the latest version of rasa, and update it in real time.

New features

The new features currently added are as follows (please download the latest rasa-nlu-gao version) (edit at 2019.06.24):

A new model for entity recognition has been added, one is bilstm+crf and the other is idcnn+crf expansion convolution model. The corresponding yml file configuration is as follows:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CountVectorsFeaturizer"
    token_pattern: "(?u)bw+b"
  - name: "EmbeddingIntentClassifier"
  - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
    lr: 0.001
    char_dim: 100
    lstm_dim: 100
    batches_per_epoch: 10
    seg_dim: 20
    num_segs: 4
    batch_size: 200
    tag_schema: "iobes"
    model_type: "bilstm" # 模型支持两种idcnn膨胀卷积模型或bilstm双向lstm模型
    clip: 5
    optimizer: "adam"
    dropout_keep: 0.5
    steps_check: 100

A new module for jieba part-of-speech annotation can be added, which can easily identify the part-of-speech that jieba can support, such as names, place names, organization names, etc. The corresponding yml file configuration is as follows:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CRFEntityExtractor"
  - name: "rasa_nlu_gao.extractors.jieba_pseg_extractor.JiebaPsegExtractor"
    part_of_speech: ["nr", "ns", "nt"]
  - name: "CountVectorsFeaturizer"
    OOV_token: oov
    token_pattern: "(?u)bw+b"
  - name: "EmbeddingIntentClassifier"

A new reverse modification intention based on entity has been added, and the corresponding file configuration is as follows:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CRFEntityExtractor"
  - name: "JiebaPsegExtractor"
  - name: "CountVectorsFeaturizer"
    OOV_token: oov
    token_pattern: '(?u)bw+b'
  - name: "EmbeddingIntentClassifier"
  - name: "rasa_nlu_gao.classifiers.entity_edit_intent.EntityEditIntent"
    entity: ["nr"]
    intent: ["enter_data"]
    min_confidence: 0

A new feature of the word vector extracted by bet model has been added, and the corresponding configuration files are as follows:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
    ip: '127.0.0.1'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "EmbeddingIntentClassifier"
  - name: "CRFEntityExtractor"

New configurations for CPU and GPU utilization are added, mainly EmbeddingIntentClassifier and ner_bilstm_crf , two components that use tensorflow, are configured as follows (of course, config_proto can not be configured, and the default value will utilize all resources):

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CountVectorsFeaturizer"
    token_pattern: '(?u)bw+b'
  - name: "EmbeddingIntentClassifier"
    config_proto: {
      "device_count": 4,
      "inter_op_parallelism_threads": 0,
      "intra_op_parallelism_threads": 0,
      "allow_growth": True
    }
  - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
    config_proto: {
      "device_count": 4,
      "inter_op_parallelism_threads": 0,
      "intra_op_parallelism_threads": 0,
      "allow_growth": True
    }

An embedding_bert_intent_classifier classifier has been added, and the corresponding configuration files are as follows:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
    ip: '127.0.0.1'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "rasa_nlu_gao.classifiers.embedding_bert_intent_classifier.EmbeddingBertIntentClassifier"
  - name: "CRFEntityExtractor"

When the basic word vector uses bert, the backend classifier is completed using the tensorflow advanced API, tf.estimator,tf.data,tf.example,tf.saved_model intent_estimator_classifier_tensorflow_embedding_bert classifier, and the corresponding configuration file is as follows:

 language: "zh"

pipeline:
- name: "JiebaTokenizer"
- name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
  ip: '127.0.0.1'
  port: 5555
  port_out: 5556
  show_server_config: True
  timeout: 10000
- name: "rasa_nlu_gao.classifiers.embedding_bert_intent_estimator_classifier.EmbeddingBertIntentEstimatorClassifier"
- name: "SpacyNLP"
- name: "CRFEntityExtractor"