whatlanggo
1.0.0
Natural language detection.
Installation:
go get -u github.com/abadojack/whatlanggo
Simple usage example:
package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	// Detect the language, script, and confidence of an Esperanto sentence.
	info := whatlanggo.Detect("Foje funkcias kaj foje ne funkcias")
	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script], "Confidence:", info.Confidence)
}

Blacklisting and whitelisting:

package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	// Blacklist: exclude Yiddish from the candidate languages.
	options := whatlanggo.Options{
		Blacklist: map[whatlanggo.Lang]bool{
			whatlanggo.Ydd: true,
		},
	}
	info := whatlanggo.DetectWithOptions("האקדמיה ללשון העברית", options)
	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script])

	// Whitelist: consider only Esperanto and Ukrainian.
	options1 := whatlanggo.Options{
		Whitelist: map[whatlanggo.Lang]bool{
			whatlanggo.Epo: true,
			whatlanggo.Ukr: true,
		},
	}
	info = whatlanggo.DetectWithOptions("Mi ne scias", options1)
	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script])
}

For more details, please check the documentation.
Requires Go 1.8 or higher.
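The algorithm is based on the trigram language model, which is a particular case of n-grams. To understand the idea, see the original paper by Cavnar and Trenkle (1994), "N-Gram-Based Text Categorization".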
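To make the trigram idea concrete, here is a minimal sketch of the ranking approach described by Cavnar and Trenkle: build a frequency-ranked trigram profile for a text, then compare it to a language profile with an "out-of-place" distance. This is not whatlanggo's implementation. The function names `trigramProfile` and `outOfPlace`, the toy training sentences, and the penalty value 300 are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// trigramProfile returns the distinct trigrams of text, ordered from most to
// least frequent. Ties are broken alphabetically so the result is deterministic.
// (Hypothetical helper, not part of whatlanggo.)
func trigramProfile(text string) []string {
	// Lowercase and pad with spaces so that word boundaries form trigrams too.
	runes := []rune(" " + strings.ToLower(text) + " ")
	counts := make(map[string]int)
	for i := 0; i+3 <= len(runes); i++ {
		counts[string(runes[i:i+3])]++
	}
	profile := make([]string, 0, len(counts))
	for t := range counts {
		profile = append(profile, t)
	}
	sort.Slice(profile, func(i, j int) bool {
		a, b := profile[i], profile[j]
		if counts[a] != counts[b] {
			return counts[a] > counts[b]
		}
		return a < b
	})
	return profile
}

// outOfPlace sums, for every trigram in doc, how far its rank is from its rank
// in lang. Trigrams missing from lang cost maxPenalty. A smaller total means
// the two profiles are more similar.
func outOfPlace(doc, lang []string, maxPenalty int) int {
	rank := make(map[string]int, len(lang))
	for i, t := range lang {
		rank[t] = i
	}
	total := 0
	for i, t := range doc {
		r, ok := rank[t]
		if !ok {
			total += maxPenalty
			continue
		}
		d := i - r
		if d < 0 {
			d = -d
		}
		total += d
	}
	return total
}

func main() {
	// Toy "training" sentences; real profiles are built from large corpora.
	english := trigramProfile("the quick brown fox jumps over the lazy dog and the cat")
	esperanto := trigramProfile("la rapida bruna vulpo saltas super la maldiligenta hundo kaj la kato")

	doc := trigramProfile("the dog and the fox")
	fmt.Println("distance to English:  ", outOfPlace(doc, english, 300))
	fmt.Println("distance to Esperanto:", outOfPlace(doc, esperanto, 300))
}
```

The language whose profile gives the smallest distance is the best guess. With profiles this small the numbers are only indicative, but the input shares most of its trigrams with the English sample and almost none with the Esperanto one.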
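Whether a detection is considered reliable depends on several factors, one of which is called rate in the code base. The result can therefore be viewed as a 2D space in which a threshold function separates the "reliable" and "unreliable" regions. That function is a hyperbola.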
For more details, see the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".
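The hyperbolic threshold can be sketched as follows. This is an illustration only and not the library's code. It assumes that the second axis of the 2D space is the number of unique trigrams in the text (the text above names only rate), and the constants `a` and `b` as well as the function names `threshold` and `isReliable` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// threshold returns the minimum rate needed for a detection to count as
// reliable, given the number of unique trigrams in the text. The curve
// a/n + b is a hyperbola: short texts need a much higher rate than long ones.
// (Hypothetical helper; a and b are illustrative parameters.)
func threshold(uniqueTrigrams int, a, b float64) float64 {
	if uniqueTrigrams <= 0 {
		// No trigrams at all: nothing can be reliable.
		return math.Inf(1)
	}
	return a/float64(uniqueTrigrams) + b
}

// isReliable reports whether the point (uniqueTrigrams, rate) lies above the
// threshold curve, that is, in the "reliable" region.
func isReliable(uniqueTrigrams int, rate, a, b float64) bool {
	return rate > threshold(uniqueTrigrams, a, b)
}

func main() {
	const a, b = 3.0, 0.015 // illustrative constants, assumed for this sketch

	for _, n := range []int{5, 10, 50, 200} {
		fmt.Printf("unique trigrams=%3d  threshold=%.3f  reliable at rate 0.5: %v\n",
			n, threshold(n, a, b), isReliable(n, 0.5, a, b))
	}
}
```

With these assumed constants the same rate of 0.5 falls in the unreliable region for a text with 5 unique trigrams and in the reliable region for a text with 10 or more, which is the behavior the hyperbola is meant to capture.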
License: MIT
whatlanggo is a derivative of Franc (JavaScript, MIT) by Titus Wormer.
Thanks to greyblake (Potapov Sergey) for creating whatlang-rs, from which I got the idea and algorithms.