prose

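prose is a natural language processing library for Go (English only at the moment). It supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction.

You can find a more detailed summary of the library's performance here: Introducing prose v2.0.0: Bringing NLP to Go.

Installation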

$ go get github.com/jdkato/prose/v2

Usage

Contents

Overview
Tokenizing
Segmenting
Tagging
NER

Overview

package main

import (
    "fmt"
    "log"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, err := prose.NewDocument("Go is an open-source programming language created at Google.")
    if err != nil {
        log.Fatal(err)
    }

    // Iterate over the doc's tokens:
    for _, tok := range doc.Tokens() {
        fmt.Println(tok.Text, tok.Tag, tok.Label)
        // Go NNP B-GPE
        // is VBZ O
        // an DT O
        // ...
    }

    // Iterate over the doc's named-entities:
    for _, ent := range doc.Entities() {
        fmt.Println(ent.Text, ent.Label)
        // Go GPE
        // Google GPE
    }

    // Iterate over the doc's sentences:
    for _, sent := range doc.Sentences() {
        fmt.Println(sent.Text)
        // Go is an open-source programming language created at Google.
    }
}

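The document-creation process follows this sequence of steps:

tokenization -> POS tagging -> NE extraction
            \
             segmentation

Each step can be disabled by passing the appropriate functional option, provided that no later step depends on it. To disable named-entity extraction, for example, you would do the following: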

doc, err := prose.NewDocument(
    "Go is an open-source programming language created at Google.",
    prose.WithExtraction(false))
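
The functional-option style used above can be sketched with the standard library alone. This is only an illustration of the pattern, not prose's implementation: the `pipeline` type, the lower-case `withExtraction` and `withTagging` helpers, and the rule that a disabled step also switches off the steps that depend on it are all assumptions made for this example.

```go
package main

import "fmt"

// pipeline is a hypothetical stand-in for a document's configuration;
// it is not prose's internal type.
type pipeline struct {
	tokenize, segment, tag, extract bool
}

// option mutates a pipeline; each with* helper returns one.
type option func(*pipeline)

// withExtraction mirrors the shape of prose.WithExtraction for illustration.
func withExtraction(on bool) option { return func(p *pipeline) { p.extract = on } }

// withTagging toggles POS tagging (hypothetical helper).
func withTagging(on bool) option { return func(p *pipeline) { p.tag = on } }

// newPipeline enables every step, applies the options in order, and then
// switches off any step whose prerequisite is disabled.
func newPipeline(opts ...option) pipeline {
	p := pipeline{tokenize: true, segment: true, tag: true, extract: true}
	for _, opt := range opts {
		opt(&p)
	}
	// In this sketch, tagging needs tokens and extraction needs tags.
	p.tag = p.tag && p.tokenize
	p.extract = p.extract && p.tag
	return p
}

func main() {
	fmt.Printf("%+v\n", newPipeline(withExtraction(false)))
	fmt.Printf("%+v\n", newPipeline(withTagging(false)))
}
```

Because every option is just a function, new switches can be added without changing the constructor's signature, which is the usual motivation for this pattern.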

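Tokenizing

prose includes a tokenizer capable of processing modern text, including the non-word character spans shown below.

Type	Example
Email addresses	`[email protected]`
Hashtags	`#trending`
Mentions	`@jdkato`
URLs	`https://github.com/jdkato/prose`
Emoticons	`:-)`, `>:(`, `o_0`, etc.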

package main

import (
    "fmt"
    "log"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, err := prose.NewDocument("@jdkato, go to http://example.com thanks :).")
    if err != nil {
        log.Fatal(err)
    }

    // Iterate over the doc's tokens:
    for _, tok := range doc.Tokens() {
        fmt.Println(tok.Text, tok.Tag)
        // @jdkato NN
        // , ,
        // go VB
        // to TO
        // http://example.com NN
        // thanks NNS
        // :) SYM
        // . .
    }
}
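
To make the idea of non-word spans concrete, here is a minimal, standard-library sketch that labels a whitespace-separated chunk as one of the span types from the table. The regular expressions and the `classify` helper are assumptions written for this example; prose's tokenizer uses its own rules and also handles cases this sketch ignores, such as punctuation attached to a span.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative patterns only; they are not taken from prose.
var spanPatterns = []struct {
	kind string
	re   *regexp.Regexp
}{
	{"url", regexp.MustCompile(`^https?://\S+$`)},
	{"email", regexp.MustCompile(`^[\w.+-]+@[\w-]+(\.[\w-]+)+$`)},
	{"hashtag", regexp.MustCompile(`^#\w+$`)},
	{"mention", regexp.MustCompile(`^@\w+$`)},
	{"emoticon", regexp.MustCompile(`^(:-?\)|>:\(|o_0)$`)},
}

// classify returns the kind of non-word span s looks like, or "word"
// if none of the patterns match.
func classify(s string) string {
	for _, p := range spanPatterns {
		if p.re.MatchString(s) {
			return p.kind
		}
	}
	return "word"
}

func main() {
	samples := []string{"@jdkato", "#trending", "https://github.com/jdkato/prose", ":-)", "thanks"}
	for _, s := range samples {
		fmt.Println(s, classify(s))
	}
}
```

Checking the patterns in a fixed order keeps the result deterministic when more than one could match.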

Segmenting

prose includes one of the most accurate sentence segmenters available, according to the Golden Rules (GRS) created by the developers of pragmatic_segmenter.

Name	Language	License	GRS (English)	GRS (Other)	Speed†
Pragmatic Segmenter	Ruby	MIT	98.08% (51/52)	100.00%	3.84 s
prose	Go	MIT	75.00% (39/52)	N/A	0.96 s
TactfulTokenizer	Ruby	GNU GPLv3	65.38% (34/52)	48.57%	46.32 s
OpenNLP	Java	APLv2	59.62% (31/52)	45.71%	1.27 s
Stanford CoreNLP	Java	GNU GPLv3	59.62% (31/52)	31.43%	0.92 s
Splitta	Python	APLv2	55.77% (29/52)	37.14%	N/A
Punkt	Python	APLv2	46.15% (24/52)	48.57%	1.79 s
SRX English	Ruby	GNU GPLv3	30.77% (16/52)	28.57%	6.19 s
Scapel	Ruby	GNU GPLv3	28.85% (15/52)	20.00%	0.13 s

† The original tests were performed on a MacBook Pro with a 3.7 GHz quad-core Intel Xeon E5 running 10.9.5, while prose was timed on a MacBook Pro with a 2.9 GHz Intel Core i7 running 10.13.3.

package main

import (
    "fmt"
    "strings"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, _ := prose.NewDocument(strings.Join([]string{
        "I can see Mt. Fuji from here.",
        "St. Michael's Church is on 5th st. near the light."}, " "))

    // Iterate over the doc's sentences:
    sents := doc.Sentences()
    fmt.Println(len(sents)) // 2
    for _, sent := range sents {
        fmt.Println(sent.Text)
        // I can see Mt. Fuji from here.
        // St. Michael's Church is on 5th st. near the light.
    }
}
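
The example text is tricky because "Mt." and "St." end in a period without ending a sentence. The sketch below shows one simple way to handle that, using a hand-picked abbreviation list; it is a toy written for this example, not the algorithm prose uses, and the `abbreviations` map and `splitSentences` helper are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// abbreviations is a tiny, hand-picked list for this sketch; a real
// segmenter needs far more rules than this.
var abbreviations = map[string]bool{"mt.": true, "st.": true}

// splitSentences ends a sentence after a word that finishes with '.', '!'
// or '?', unless that word is a known abbreviation.
func splitSentences(text string) []string {
	var sents []string
	var cur []string
	for _, w := range strings.Fields(text) {
		cur = append(cur, w)
		last := w[len(w)-1]
		isEnd := last == '.' || last == '!' || last == '?'
		if isEnd && !abbreviations[strings.ToLower(w)] {
			sents = append(sents, strings.Join(cur, " "))
			cur = nil
		}
	}
	// Keep any trailing words that never reached a terminator.
	if len(cur) > 0 {
		sents = append(sents, strings.Join(cur, " "))
	}
	return sents
}

func main() {
	text := "I can see Mt. Fuji from here. St. Michael's Church is on 5th st. near the light."
	for _, s := range splitSentences(text) {
		fmt.Println(s)
	}
}
```

A fixed list like this breaks as soon as an abbreviation really does end a sentence, which is one reason rule sets such as the Golden Rules are used to compare segmenters.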

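Tagging

prose includes a tagger based on TextBlob's "fast and accurate" POS tagger. Below is a comparison of its performance against NLTK's implementation of the same tagger on the Treebank corpus:

Library	Accuracy	5-Run Average (sec)
NLTK	0.893	7.224
`prose`	0.961	2.538

(See scripts/test_model.py for more information.)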
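
As a rough illustration of what an accuracy figure like those above usually means, the sketch below computes the fraction of tokens whose predicted tag matches a reference tag. It assumes a per-token definition of accuracy; the actual evaluation lives in scripts/test_model.py and is not reproduced here. The `accuracy` helper and the sample tags are made up for the example.

```go
package main

import "fmt"

// accuracy returns the fraction of positions where predicted and gold
// agree. It returns 0 if the slices differ in length or are empty.
func accuracy(predicted, gold []string) float64 {
	if len(predicted) != len(gold) || len(gold) == 0 {
		return 0
	}
	correct := 0
	for i := range gold {
		if predicted[i] == gold[i] {
			correct++
		}
	}
	return float64(correct) / float64(len(gold))
}

func main() {
	// Hand-written tags for illustration, not output from any tagger.
	gold := []string{"NNP", "VBZ", "DT", "NN"}
	pred := []string{"NNP", "VBZ", "DT", "JJ"}
	fmt.Printf("%.3f\n", accuracy(pred, gold)) // 0.750
}
```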

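The full list of supported POS tags is given below.

Tag	Description
`(`	left round bracket
`)`	right round bracket
`,`	comma
`:`	colon
`.`	period
`''`	closing quotation mark
``	opening quotation mark
`#`	number sign
`$`	currency
`CC`	conjunction, coordinating
`CD`	cardinal number
`DT`	determiner
`EX`	existential there
`FW`	foreign word
`IN`	conjunction, subordinating or preposition
`JJ`	adjective
`JJR`	adjective, comparative
`JJS`	adjective, superlative
`LS`	list item marker
`MD`	verb, modal auxiliary
`NN`	noun, singular or mass
`NNP`	noun, proper singular
`NNPS`	noun, proper plural
`NNS`	noun, plural
`PDT`	predeterminer
`POS`	possessive ending
`PRP`	pronoun, personal
`PRP$`	pronoun, possessive
`RB`	adverb
`RBR`	adverb, comparative
`RBS`	adverb, superlative
`RP`	adverb, particle
`SYM`	symbol
`TO`	infinitival to
`UH`	interjection
`VB`	verb, base form
`VBD`	verb, past tense
`VBG`	verb, gerund or present participle
`VBN`	verb, past participle
`VBP`	verb, non-3rd person singular present
`VBZ`	verb, 3rd person singular present
`WDT`	wh-determiner
`WP`	wh-pronoun, personal
`WP$`	wh-pronoun, possessive
`WRB`	wh-adverb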
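
Because related tags share a prefix (all four noun tags start with `NN`), filtering tokens by word class takes only a prefix check. The sketch below uses a local `token` struct as a stand-in for the `Text` and `Tag` fields seen in the earlier examples; the `nouns` helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a local stand-in holding only the two fields used here.
type token struct {
	Text, Tag string
}

// nouns returns the text of every token whose tag starts with "NN",
// which covers NN, NNS, NNP and NNPS.
func nouns(toks []token) []string {
	var out []string
	for _, t := range toks {
		if strings.HasPrefix(t.Tag, "NN") {
			out = append(out, t.Text)
		}
	}
	return out
}

func main() {
	toks := []token{{"Go", "NNP"}, {"is", "VBZ"}, {"an", "DT"}, {"language", "NN"}}
	fmt.Println(nouns(toks)) // [Go language]
}
```

The same check with `VB`, `JJ`, or `RB` selects verbs, adjectives, or adverbs.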

NER

prose v2.0.0 includes a much-improved version of v1.0.0's chunk package, which can identify people (PERSON) and geographical/political entities (GPE) by default.

package main

import (
    "fmt"

    "github.com/jdkato/prose/v2"
)

func main() {
    doc, _ := prose.NewDocument("Lebron James plays basketball in Los Angeles.")
    for _, ent := range doc.Entities() {
        fmt.Println(ent.Text, ent.Label)
        // Lebron James PERSON
        // Los Angeles GPE
    }
}
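
The token labels shown in the Overview (`B-GPE`, `O`) suggest a begin/inside/outside labeling scheme. Assuming that scheme, the sketch below shows how per-token labels could be merged into multi-word entities. The `token` and `entity` structs, the `groupEntities` helper, and the hand-written labels are illustrative assumptions, not prose's internals or its actual output.

```go
package main

import (
	"fmt"
	"strings"
)

// token and entity are local stand-ins for this sketch.
type token struct{ Text, Label string }
type entity struct{ Text, Label string }

// groupEntities merges runs of B-X, I-X, ... tokens into single entities.
func groupEntities(toks []token) []entity {
	var ents []entity
	open := false // whether the last entity in ents can still be extended
	for _, t := range toks {
		switch {
		case strings.HasPrefix(t.Label, "B-"):
			// A B- label always starts a new entity.
			ents = append(ents, entity{t.Text, t.Label[2:]})
			open = true
		case strings.HasPrefix(t.Label, "I-") && open && ents[len(ents)-1].Label == t.Label[2:]:
			// An I- label of the same type extends the current entity.
			ents[len(ents)-1].Text += " " + t.Text
		default:
			// "O" or a stray I- label closes the current entity.
			open = false
		}
	}
	return ents
}

func main() {
	// Hand-written labels for illustration.
	toks := []token{
		{"Lebron", "B-PERSON"}, {"James", "I-PERSON"}, {"plays", "O"},
		{"basketball", "O"}, {"in", "O"}, {"Los", "B-GPE"}, {"Angeles", "I-GPE"}, {".", "O"},
	}
	for _, e := range groupEntities(toks) {
		fmt.Println(e.Text, e.Label)
	}
}
```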

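However, to make this feature more useful, we've made it straightforward to train your own models for specific use cases. See Prodigy + prose: Radically efficient machine teaching in Go for a tutorial.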

Additional information

Version: v1.2.1
Type: Other source code
Updated: 2025-04-16
Size: 11.59 MB
Source: GitHub