luga
v0.2.7

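Luga is the Swahili word for language. fastText provides a blazing-fast language detection tool. Lamentably, fastText's API is beauty-less, and the documentation is a bit fuzzy. It is also funky that we have to download and load the model manually.
This is where Luga comes in. We abstract away the unnecessary steps and let you do precisely one thing: detect the language of a text.
Image: the relationships between Indo-European and Uralic languages, from Stand Still. Stay Silent by Minna Sundberg.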

python -m pip install -U luga

from luga import language
print(language("the world ended yesterday"))
# Language(name='en', score=0.98)

With a list of texts, we can create a mask for a filtering pipeline that can be used, for example, with DataFrames:
from luga import languages
import pandas as pd

examples = ["Jeg har ikke en rød reje", "Det blæser en halv pelican", "We are not robots yet"]

# Boolean mask: True where the detected language is English
languages(texts=examples, only_language=True, to_array=True) == "en"
# output
# array([False, False, True])

dataf = pd.DataFrame({"text": examples})
# Keep only the rows whose text is detected as English
dataf.loc[lambda d: languages(texts=d["text"].to_list(), only_language=True, to_array=True) == "en"]
# output
# 2 We are not robots yet
# Name: text, dtype: object

Download the model
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -O /tmp/lid.176.bin

Load and use
import fasttext

PATH_TO_MODEL = '/tmp/lid.176.bin'
fmodel = fasttext.load_model(PATH_TO_MODEL)
fmodel.predict(["the world has ended yesterday"])
# ([['__label__en']], [array([0.98046654], dtype=float32)])

poetry run pre-commit install

# assumes git push is completed
git tag -l # lists tags
git tag v*.*.* # Major.Minor.Fix
git push origin tag v*.*.*
# to delete tag:
git tag -d v*.*.* && git push origin tag -d v*.*.*
# change pyproject.toml and __init__.py to reflect new version

artifacts.py, line 111: casting to list[str] is causing issues
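
As a companion to the manual fastText snippet earlier in this README, here is a minimal sketch of the kind of post-processing needed to turn the raw prediction tuple into something shaped like `Language(name='en', score=0.98)`. It is not Luga's actual implementation: the `Language` class, the `parse_prediction` helper, and the rounding to two decimals are all assumptions made for illustration. The sample input is hard-coded to mirror the output shown above, so the sketch runs without fastText or the model file.

```python
from typing import List, NamedTuple, Sequence


class Language(NamedTuple):
    # Stand-in defined here for illustration; not imported from luga.
    name: str
    score: float


# fastText prefixes every predicted label with this marker.
LABEL_PREFIX = "__label__"


def parse_prediction(
    labels: Sequence[Sequence[str]],
    scores: Sequence[Sequence[float]],
) -> List[Language]:
    """Convert raw (labels, scores) rows into Language records.

    Hypothetical helper: takes the top label of each row, strips the
    prefix, and rounds the score to two decimals (an assumption).
    """
    results = []
    for label_row, score_row in zip(labels, scores):
        top_label = label_row[0]
        if top_label.startswith(LABEL_PREFIX):
            top_label = top_label[len(LABEL_PREFIX):]
        results.append(Language(name=top_label, score=round(float(score_row[0]), 2)))
    return results


# Hard-coded sample mirroring the fmodel.predict output shown earlier.
raw_labels = [["__label__en"]]
raw_scores = [[0.98046654]]
print(parse_prediction(raw_labels, raw_scores))
# [Language(name='en', score=0.98)]
```

With a loaded model, the same helper could be fed the two elements of the tuple returned by `fmodel.predict([...])`, since `float()` also accepts NumPy scalars.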