Easy-Bert

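easy-bert is a dead simple API for using Google's high quality BERT language model in Python and Java.

Currently, easy-bert is focused on getting embeddings from pre-trained BERT models in both Python and Java. Support for fine-tuning and pre-training in Python will be added in the future, as well as support for using easy-bert for tasks other than getting embeddings.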

Python

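How To Get It

easy-bert is available on PyPI. You can install it with pip install easybert, or with pip install git+https://github.com/robrua/easy-bert.git if you want the very latest version.

Usage

You can use easy-bert with pre-trained BERT models from TensorFlow Hub or with local models in the TensorFlow saved model format.

To create a BERT embedder from a TensorFlow Hub model, simply instantiate a Bert object with the target tf-hub URL: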

from easybert import Bert
bert = Bert("https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1")

You can also load a local model in TensorFlow's saved model format using Bert.load:

from easybert import Bert
bert = Bert.load("/path/to/your/model/")

Once you have a BERT model loaded, you can get sequence embeddings using bert.embed:

x = bert.embed("A sequence")
y = bert.embed(["Multiple", "Sequences"])

If you want per-token embeddings, you can set per_token=True:

x = bert.embed("A sequence", per_token=True)
y = bert.embed(["Multiple", "Sequences"], per_token=True)

easy-bert returns BERT embeddings as numpy arrays.
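Because the embeddings come back as plain numeric arrays, a common next step is comparing two of them. The sketch below is one way to do that with cosine similarity. The helper name cosine_similarity is hypothetical and not part of easy-bert, and it assumes that embedding a single sequence yields a one-dimensional vector. It uses only the standard library and accepts any equal-length sequences of numbers, so 1-D numpy arrays should work as well.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration; it is not part of easy-bert.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product and Euclidean norms, computed element by element.
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

With a loaded model, this could be called as cosine_similarity(bert.embed("A sequence"), bert.embed("Another sequence")), again assuming each call returns a single vector.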

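Every time you call bert.embed, a new TensorFlow session is created and used for the computation. If you're calling bert.embed many times in a row, you can speed up your code by sharing one TensorFlow session among those calls using a with statement:

with bert:
    x = bert.embed("A sequence", per_token=True)
    y = bert.embed(["Multiple", "Sequences"], per_token=True)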
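For larger inputs, the same session-sharing idea can be combined with batching. The following is a minimal sketch, not part of easy-bert: embed_in_batches and batch_size are hypothetical names, and the function only assumes that the object passed in supports the with statement and an embed method that accepts a list, as described above.

```python
from typing import Any, List, Sequence


def embed_in_batches(bert: Any, sequences: Sequence[str], batch_size: int = 32) -> List[Any]:
    """Embed sequences in fixed-size batches inside one `with bert:` block.

    Hypothetical helper for illustration; it is not part of easy-bert.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List[Any] = []
    with bert:  # enter the context once so every embed call below shares it
        for start in range(0, len(sequences), batch_size):
            batch = list(sequences[start:start + batch_size])
            # embed() on a list is expected to yield one embedding per sequence
            results.extend(bert.embed(batch))
    return results
```

The result is a list with one embedding per input sequence, in input order. A suitable batch_size depends on available memory, so the default of 32 is only a placeholder.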

You can save a BERT model using bert.save, then reload it later using Bert.load:

bert.save("/path/to/your/model/")
bert = Bert.load("/path/to/your/model/")
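Saving and loading can be combined into a simple local cache, so the TensorFlow Hub model is only fetched the first time. This is a sketch under stated assumptions: load_or_create is a hypothetical helper, not an easy-bert function, and it relies only on the Bert(url), bert.save and Bert.load calls shown above. The bert_cls parameter exists so that a stand-in class can be supplied for testing; when omitted, easybert's Bert is imported.

```python
import os
from typing import Any, Optional


def load_or_create(model_dir: str, hub_url: str, bert_cls: Optional[Any] = None) -> Any:
    """Load a saved model from model_dir, or build it from hub_url and save it.

    Hypothetical helper for illustration; it is not part of easy-bert.
    """
    if bert_cls is None:
        # Imported lazily so the helper can be exercised with a stand-in class.
        from easybert import Bert as bert_cls
    if os.path.isdir(model_dir):
        # A previous run already saved the model locally.
        return bert_cls.load(model_dir)
    bert = bert_cls(hub_url)
    # save() is only called when model_dir does not exist yet.
    bert.save(model_dir)
    return bert
```

A call such as load_or_create("/path/to/your/model/", "https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1") would then download on the first run and load from disk afterwards.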

CLI

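easy-bert also provides a CLI tool for conveniently doing one-off embeddings of sequences with BERT. It can also convert a TensorFlow Hub model to a saved model.

Run bert --help, bert embed --help or bert download --help to get details about the CLI tool.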

Docker

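easy-bert comes with a Docker build that can be used as a base image for applications that rely on BERT embeddings, or just to run the CLI tool without needing to install an environment.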

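Java

How To Get It

easy-bert is available on Maven Central. It is also distributed through the releases page.

To add the latest easy-bert release version to your Maven project, add the dependency to the dependencies section of your pom.xml: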

<dependencies>
  <dependency>
    <groupId>com.robrua.nlp</groupId>
    <artifactId>easy-bert</artifactId>
    <version>1.0.3</version>
  </dependency>
</dependencies>

Or, if you want the latest development version, add the Sonatype snapshot repository to your pom.xml as well:

<dependencies>
  <dependency>
    <groupId>com.robrua.nlp</groupId>
    <artifactId>easy-bert</artifactId>
    <version>1.0.4-SNAPSHOT</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>snapshots-repo</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

Usage

You can use easy-bert with pre-trained BERT models generated with easy-bert's Python tools. You can also use pre-generated models on Maven Central.

To load a model from your local filesystem, you can use:

try (Bert bert = Bert.load(new File("/path/to/your/model/"))) {
    // Embed some sequences
}

If the model is on your classpath (e.g. if you're pulling it in via Maven), you can use:

try (Bert bert = Bert.load("/resource/path/to/your/model")) {
    // Embed some sequences
}

Once you have a BERT model loaded, you can get sequence embeddings using bert.embedSequence or bert.embedSequences:

float[] embedding = bert.embedSequence("A sequence");
float[][] embeddings = bert.embedSequences("Multiple", "Sequences");

If you want per-token embeddings, you can use bert.embedTokens:

float[][] embedding = bert.embedTokens("A sequence");
float[][][] embeddings = bert.embedTokens("Multiple", "Sequences");

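Pre-Generated Maven Central Models

Various TensorFlow Hub BERT models are available in easy-bert format on Maven Central. To use one in your project, add the following to your pom.xml, substituting one of the Artifact IDs listed below in place of ARTIFACT-ID in the artifactId: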

<dependencies>
  <dependency>
    <groupId>com.robrua.nlp.models</groupId>
    <artifactId>ARTIFACT-ID</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>

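Once you've pulled in the dependency, you can load the model using this code. Substitute the appropriate Resource Path from the list below in place of RESOURCE-PATH, based on the model you added as a dependency:

try (Bert bert = Bert.load("RESOURCE-PATH")) {
    // Embed some sequences
}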

Available Models

Model	Languages	Layers	Embedding Size	Heads	Parameters	Artifact ID	Resource Path
BERT-Base, Uncased	English	12	768	12	110M	easy-bert-uncased-L-12-H-768-A-12	com/robrua/nlp/easy-bert/bert-uncased-L-12-H-768-A-12
BERT-Base, Cased	English	12	768	12	110M	easy-bert-cased-L-12-H-768-A-12	com/robrua/nlp/easy-bert/bert-cased-L-12-H-768-A-12
BERT-Base, Multilingual Cased	104 Languages	12	768	12	110M	easy-bert-multi-cased-L-12-H-768-A-12	com/robrua/nlp/easy-bert/bert-multi-cased-L-12-H-768-A-12
BERT-Base, Chinese	Chinese Simplified and Traditional	12	768	12	110M	easy-bert-chinese-L-12-H-768-A-12	com/robrua/nlp/easy-bert/bert-chinese-L-12-H-768-A-12