
Luga is a Swahili word for language. fastText provides a blazing-fast language detection tool. Lamentably, fastText's API is beauty-less, and the documentation is a bit fuzzy. It is also funky that we have to download and load the models manually.
Here is where luga comes in. We abstract unnecessary steps and allow you to do precisely one thing: detecting text language.
Stand Still. Stay Silent - The relationships between Indo-European and Uralic languages by Minna Sundberg.

python -m pip install -U luga

from luga import language
print(language("the world ended yesterday"))
# Language(name='en', score=0.98)

With a list of texts, we can create a mask for a filtering pipeline that can be used, for example, with DataFrames:
from luga import languages
import pandas as pd
examples = ["Jeg har ikke en rød reje", "Det blæser en halv pelican", "We are not robots yet"]
languages(texts=examples, only_language=True, to_array=True) == "en"
# output
# array([False, False, True])
dataf = pd.DataFrame({"text": examples})
dataf.loc[lambda d: languages(texts=d["text"].to_list(), only_language=True, to_array=True) == "en"]
# output
# 2 We are not robots yet
# Name: text, dtype: object

Download the model
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -O /tmp/lid.176.bin

Load and use
import fasttext
PATH_TO_MODEL = '/tmp/lid.176.bin'
fmodel = fasttext.load_model(PATH_TO_MODEL)
fmodel.predict(["the world has ended yesterday"])
# ([['__label__en']], [array([0.98046654], dtype=float32)])

poetry run pre-commit install

# assumes git push is completed
git tag -l # lists tags
git tag v*.*.* # Major.Minor.Fix
git push origin tag v*.*.*
# to delete tag:
git tag -d v*.*.* && git push origin tag -d v*.*.*
# change pyproject.toml and __init__.py to reflect new version

artifacts.py line 111: the cast to List[str] causes issues
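
For comparison with the manual fastText steps above, the raw `predict` output is a pair of nested sequences with `__label__`-prefixed names, whereas luga prints a compact `Language(name='en', score=0.98)`. The sketch below shows one way such a raw result could be turned into a name and score per text. `Detection` and `parse_predictions` are hypothetical names made up for this illustration, not luga's actual implementation, and plain lists stand in for the NumPy arrays that fastText returns.

```python
from typing import List, NamedTuple, Sequence

LABEL_PREFIX = "__label__"  # prefix visible in the raw fastText output above


class Detection(NamedTuple):
    """Hypothetical result type for this sketch (not luga's own class)."""
    name: str
    score: float


def parse_predictions(
    labels: Sequence[Sequence[str]],
    scores: Sequence[Sequence[float]],
) -> List[Detection]:
    """Convert a raw (labels, scores) pair into one Detection per input text."""
    results = []
    for text_labels, text_scores in zip(labels, scores):
        top_label = text_labels[0]  # first entry is the top-ranked label
        if top_label.startswith(LABEL_PREFIX):
            top_label = top_label[len(LABEL_PREFIX):]
        # float() also accepts NumPy scalars such as float32
        results.append(Detection(name=top_label, score=round(float(text_scores[0]), 2)))
    return results


# Same shape as the fastText output shown above, with lists instead of arrays.
raw_labels = [["__label__en"]]
raw_scores = [[0.98046654]]
print(parse_predictions(raw_labels, raw_scores))
# [Detection(name='en', score=0.98)]
```

Rounding to two decimals is only there to mirror the printed luga example; keep the full score if you plan to threshold on it.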