easy-bert

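easy-bert is a simple API for using Google's high-quality BERT language model in Python and Java.

Currently, easy-bert is focused on getting embeddings from pre-trained BERT models in both Python and Java. Support for fine-tuning and pre-training in Python will be added in the future, as well as support for using easy-bert for tasks other than getting embeddings.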

Python

How to get it

easy-bert is available on PyPI. You can install it with pip install easybert, or with pip install git+https://github.com/robrua/easy-bert.git if you want the very latest version.

Usage

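You can use easy-bert with pre-trained BERT models from TensorFlow Hub or with local models in the TensorFlow saved model format.

To create a BERT embedder from a TensorFlow Hub model, simply instantiate a Bert object with the target TF-Hub URL: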

from easybert import Bert
bert = Bert("https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1")

You can also load a local model in TensorFlow's saved model format using Bert.load:

from easybert import Bert
bert = Bert.load("/path/to/your/model/")

Once you have a BERT model loaded, you can get sequence embeddings using bert.embed:

x = bert.embed("A sequence")
y = bert.embed(["Multiple", "Sequences"])
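bert.embed accepts either a single string or a list of strings. For a large corpus you may prefer to send the list in fixed-size chunks rather than all at once. The sketch below does that with a hypothetical helper, embed_in_batches, that takes any embed-like callable (for example bert.embed) plus a batch size you choose; the helper name and its default batch size are assumptions for illustration, not part of easy-bert. A stand-in function replaces the real model so the sketch runs without TensorFlow.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def embed_in_batches(
    embed: Callable[[List[str]], Sequence[T]],
    sequences: Iterable[str],
    batch_size: int = 32,  # hypothetical default; tune for your memory budget
) -> List[T]:
    """Call `embed` on consecutive chunks of `sequences` and concatenate results.

    Hypothetical helper: `embed` is assumed to behave like bert.embed given a
    list, i.e. to return one embedding per input sequence, in order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List[T] = []
    batch: List[str] = []
    for sequence in sequences:
        batch.append(sequence)
        if len(batch) == batch_size:
            results.extend(embed(batch))
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        results.extend(embed(batch))
    return results


# Stand-in for bert.embed: "embeds" each sequence as its length and
# records how many sequences each call received.
calls: List[int] = []


def fake_embed(batch: List[str]) -> List[List[float]]:
    calls.append(len(batch))
    return [[float(len(s))] for s in batch]


out = embed_in_batches(fake_embed, ["a", "bb", "ccc", "dddd", "e"], batch_size=2)
print(out)    # [[1.0], [2.0], [3.0], [4.0], [1.0]]
print(calls)  # [2, 2, 1]
```

With a real model you would pass bert.embed as the callable and, if you make many calls, run them inside a with bert: block so they share one TensorFlow session.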

If you want per-token embeddings, you can set per_token=True:

x = bert.embed("A sequence", per_token=True)
y = bert.embed(["Multiple", "Sequences"], per_token=True)
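With per_token=True you get one vector per token instead of one per sequence. If you later want a single vector per sequence from those, one common option is mean pooling, which averages the token vectors component-wise. The sketch below shows that technique in plain Python; it is not necessarily what bert.embed computes when per_token is left unset. If the per-token output contains rows for padding positions you would want to exclude them, so the helper takes an optional num_tokens count; how you obtain that count is an assumption left to you.

```python
from typing import List, Optional, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              num_tokens: Optional[int] = None) -> List[float]:
    """Average per-token vectors component-wise into one sequence vector.

    token_vectors: rows of equal length, one row per token
        (a 2-D numpy array also works, since iterating it yields rows).
    num_tokens: if given, only the first num_tokens rows are averaged,
        a hypothetical way to skip trailing padding rows.
    """
    rows = list(token_vectors)
    if num_tokens is not None:
        rows = rows[:num_tokens]
    if not rows:
        raise ValueError("need at least one token vector")
    dim = len(rows[0])
    totals = [0.0] * dim
    for row in rows:
        if len(row) != dim:
            raise ValueError("all token vectors must have the same length")
        for i, value in enumerate(row):
            totals[i] += value
    return [total / len(rows) for total in totals]


tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]  # last row plays the role of padding
print(mean_pool(tokens))                # [1.3333333333333333, 2.0]
print(mean_pool(tokens, num_tokens=2))  # [2.0, 3.0]
```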

easy-bert returns BERT embeddings as numpy arrays.
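Because the results are ordinary vectors, a typical next step is comparing two sequence embeddings by cosine similarity. The sketch below uses only the standard library so it runs anywhere, and it accepts 1-D numpy arrays as well as plain lists; the function name cosine_similarity is our own and is not part of easy-bert.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # approximately 1.0 (same direction)
```

With easy-bert you might call it on the two rows returned by bert.embed(["Multiple", "Sequences"]), for example cosine_similarity(y[0], y[1]).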

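Every time you call bert.embed, a new TensorFlow session is created and used for the computation. If you call bert.embed many times in a row, you can speed up your code by sharing one TensorFlow session among those calls with a with statement: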

with bert:
    x = bert.embed("A sequence", per_token=True)
    y = bert.embed(["Multiple", "Sequences"], per_token=True)
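Why the with block helps can be seen in a small model of the pattern: outside a with block each call pays for its own setup, while inside one a single resource is opened in __enter__, reused by every call, and closed in __exit__. The classes below are hypothetical stand-ins written for illustration (a counter replaces the TensorFlow session); they are not easy-bert's actual implementation.

```python
from typing import List, Optional


class FakeSession:
    """Hypothetical stand-in for an expensive resource such as a TF session."""

    opened = 0  # class-level count of sessions created so far

    def __init__(self) -> None:
        FakeSession.opened += 1
        self.closed = False

    def run(self, sequences: List[str]) -> List[int]:
        return [len(s) for s in sequences]  # pretend "embedding"

    def close(self) -> None:
        self.closed = True


class SessionSharingEmbedder:
    """Sketch of the per-call vs. shared-session pattern."""

    def __init__(self) -> None:
        self._session: Optional[FakeSession] = None

    def __enter__(self) -> "SessionSharingEmbedder":
        self._session = FakeSession()  # open once for the whole with block
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        if self._session is not None:
            self._session.close()
            self._session = None  # later calls go back to per-call sessions

    def embed(self, sequences: List[str]) -> List[int]:
        if self._session is not None:
            return self._session.run(sequences)  # reuse the shared session
        session = FakeSession()  # no shared session: create one for this call
        try:
            return session.run(sequences)
        finally:
            session.close()


embedder = SessionSharingEmbedder()
embedder.embed(["a"])
embedder.embed(["b"])
print(FakeSession.opened)  # 2: one session per call

with embedder:
    embedder.embed(["a"])
    embedder.embed(["b"])
print(FakeSession.opened)  # 3: only one more for the whole with block
```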

You can save a BERT model using bert.save, then reload it later using Bert.load:

bert.save("/path/to/your/model/")
bert = Bert.load("/path/to/your/model/")

CLI

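easy-bert also provides a CLI tool for conveniently doing one-off embeddings of sequences with BERT. It can also convert a TensorFlow Hub model to a saved model.

Run bert --help, bert embed --help or bert download --help to get details about the CLI tool.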

Docker

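easy-bert comes with a Docker build that can be used as a base image for applications that rely on BERT embeddings, or to just run the CLI tool without needing to install an environment.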

Java

How to get it

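easy-bert is available on Maven Central. It is also distributed through the releases page.

To add the latest easy-bert release to your Maven project, add the dependency to the dependencies section of your pom.xml: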

<dependencies>
  <dependency>
    <groupId>com.robrua.nlp</groupId>
    <artifactId>easy-bert</artifactId>
    <version>1.0.3</version>
  </dependency>
</dependencies>

Or, if you want the latest development version, add the Sonatype snapshot repository to your pom.xml:

<dependencies>
  <dependency>
    <groupId>com.robrua.nlp</groupId>
    <artifactId>easy-bert</artifactId>
    <version>1.0.4-SNAPSHOT</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>snapshots-repo</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

Usage

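You can use easy-bert with pre-trained BERT models generated with easy-bert's Python tool. You can also use pre-generated models from Maven Central.

To load a model from your local filesystem, you can use: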

// Requires: import java.io.File;
try (Bert bert = Bert.load(new File("/path/to/your/model/"))) {
    // Embed some sequences
}

If the model is on your classpath (e.g. if you're pulling it in via Maven), you can use:

try (Bert bert = Bert.load("/resource/path/to/your/model")) {
    // Embed some sequences
}

Once you have a BERT model loaded, you can get sequence embeddings using bert.embedSequence or bert.embedSequences:

float[] embedding = bert.embedSequence("A sequence");
float[][] embeddings = bert.embedSequences("Multiple", "Sequences");

If you want per-token embeddings, you can use bert.embedTokens:

float[][] embedding = bert.embedTokens("A sequence");
float[][][] embeddings = bert.embedTokens("Multiple", "Sequences");

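Pre-generated Maven Central models

Various TensorFlow Hub BERT models are available in easy-bert format on Maven Central. To use one in your project, add the following to your pom.xml, substituting one of the artifact IDs listed below for ARTIFACT-ID in the artifactId element: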

<dependencies>
  <dependency>
    <groupId>com.robrua.nlp.models</groupId>
    <artifactId>ARTIFACT-ID</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>

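Once you've pulled in the dependency, you can load the model with this code. Substitute the appropriate RESOURCE-PATH from the list below, based on the model you added as a dependency: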

try (Bert bert = Bert.load("RESOURCE-PATH")) {
    // Embed some sequences
}

Available models

Model	Languages	Layers	Embedding size	Heads	Parameters	Artifact ID	Resource path
BERT-Base, Uncased	English	12	768	12	110M	easy-bert-uncased-L-12-H-768-A-12	com/robrua/nlp/easy-bert/bert-uncased-L-12-H-768-A-12
BERT-Base, Cased	English	12	768	12	110M	easy-bert-cased-L-12-H-768-A-12	com/robrua/nlp/easy-bert/bert-cased-L-12-H-768-A-12
BERT-Base, Multilingual Cased	104 languages	12	768	12	110M	easy-bert-multi-cased-L-12-H-768-A-12	com/robrua/nlp/easy-bert/bert-multi-cased-L-12-H-768-A-12
BERT-Base, Chinese	Chinese (Simplified and Traditional)	12	768	12	110M	easy-bert-chinese-L-12-H-768-A-12	com/robrua/nlp/easy-bert/bert-chinese-L-12-H-768-A-12
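The model names encode their architecture: in a suffix such as L-12-H-768-A-12, L is the number of layers, H the hidden (embedding) size and A the number of attention heads, matching the Layers, Embedding size and Heads columns above. The sketch below parses that suffix out of an artifact ID, a resource path, or a TF-Hub URL (which uses underscores, as in the Python example's bert_multi_cased_L-12_H-768_A-12). The function name, the BertShape type and the regular expression are our own, written against the names shown in this document rather than taken from easy-bert.

```python
import re
from typing import NamedTuple

# Matches "L-12-H-768-A-12" (Maven names) or "L-12_H-768_A-12" (TF-Hub URLs).
_ARCH = re.compile(r"L-(\d+)[-_]H-(\d+)[-_]A-(\d+)")


class BertShape(NamedTuple):
    layers: int
    hidden_size: int
    heads: int


def parse_bert_shape(name: str) -> BertShape:
    """Extract layers / hidden size / attention heads from a BERT model name."""
    match = _ARCH.search(name)
    if match is None:
        raise ValueError(f"no L-H-A suffix found in {name!r}")
    layers, hidden, heads = (int(g) for g in match.groups())
    return BertShape(layers, hidden, heads)


print(parse_bert_shape("easy-bert-uncased-L-12-H-768-A-12"))
# BertShape(layers=12, hidden_size=768, heads=12)
print(parse_bert_shape("https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1"))
# BertShape(layers=12, hidden_size=768, heads=12)
```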