v2.0.5

Split-Lang


Split text by language by concatenating over-split substrings based on their detected language, powered by:

- Splitting: budoux and rule-based splitting
- Language detection: fast-langdetect and wordfreq

1. How it works

Stage 1: rule-based split (separates characters, punctuation and digits)

hello, how are you -> hello | , | how are you
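A rule-based split of this kind can be sketched with a single regular expression. This is an illustration only, not the package's implementation; the function name `rule_based_split` and the exact character classes are assumptions.

```python
import re

# Three alternatives: a run of digits, a run of punctuation
# (anything that is neither a word character nor whitespace),
# or a run of non-digit word characters and whitespace.
_TOKEN = re.compile(r"\d+|[^\w\s]+|(?:[^\W\d]|\s)+")


def rule_based_split(text):
    """Hypothetical helper: split text into text, punctuation and digit runs."""
    pieces = (m.group().strip() for m in _TOKEN.finditer(text))
    # Drop pieces that were whitespace only.
    return [p for p in pieces if p]


print(" | ".join(rule_based_split("hello, how are you")))
# hello | , | how are you
```

Unlike the package output shown further down, this sketch strips surrounding whitespace from each piece.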

Stage 2: over-split text into substrings, using budoux for Chinese mixed with Japanese, and " " (space) for languages that are not scripta continua

你喜欢看アニメ吗 -> 你 | 喜欢 | 看 | アニメ | 吗
昨天見た映画はとても感動的でした -> 昨天 | 見た | 映画 | は | とても | 感動 | 的 | で | した
how are you -> how | are | you
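The over-splitting step can be sketched as a dispatch on script: text containing kana or CJK ideographs goes to a segmenter, everything else is split on spaces. The names `over_split`, `has_cjk` and the `cjk_segmenter` parameter are hypothetical, and the character ranges are a simplification rather than what the package uses.

```python
import re

# Hiragana/Katakana and CJK Unified Ideographs (a simplified range, assumed here).
_CJK = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def has_cjk(text):
    """True if the text contains kana or CJK ideographs."""
    return _CJK.search(text) is not None


def over_split(text, cjk_segmenter):
    """Hypothetical helper: over-split text into small substrings.

    cjk_segmenter is any callable mapping a string to a list of chunks.
    """
    if has_cjk(text):
        return list(cjk_segmenter(text))
    # Space-delimited scripts: one word per chunk, keeping trailing spaces
    # so that the chunks can be concatenated again later.
    return re.findall(r"\S+\s*", text)


print(over_split("how are you", cjk_segmenter=list))
# ['how ', 'are ', 'you']
```

In practice the `parse` method of a budoux parser could be passed as `cjk_segmenter`; `list` is used above only so the sketch runs without extra dependencies.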

Stage 3: concatenate substrings based on their languages, using fast-langdetect, wordfreq and regex (rule-based)

你 | 喜欢 | 看 | アニメ | 吗 -> 你喜欢看 | アニメ | 吗
昨天 | 見た | 映画 | は | とても | 感動 | 的 | で | した -> 昨天 | 見た映画はとても感動的でした
how | are | you -> how are you
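The concatenation step can be pictured as merging adjacent substrings that carry the same language tag. The sketch below assumes the tags have already been produced by a detector; `concat_by_lang` is a hypothetical name, and the package's real merging rules are more involved than this.

```python
from itertools import groupby


def concat_by_lang(tagged):
    """Hypothetical helper: merge adjacent (lang, text) pairs with equal lang."""
    return [
        (lang, "".join(text for _, text in group))
        for lang, group in groupby(tagged, key=lambda pair: pair[0])
    ]


# Language tags here are supplied by hand for illustration.
tagged = [("zh", "你"), ("zh", "喜欢"), ("zh", "看"), ("ja", "アニメ"), ("zh", "吗")]
print(concat_by_lang(tagged))
# [('zh', '你喜欢看'), ('ja', 'アニメ'), ('zh', '吗')]
```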

More split examples

correct_substrings   : [ 'x|我是 ' , 'x|VGroupChatBot' , 'punctuation|，' , 'x|一个旨在支持多人通信的助手' , 'punctuation|，' , 'x|通过可视化消息来帮助团队成员更好地交流' , 'punctuation|。' , 'x|我可以帮助团队成员更好地整理和共享信息' , 'punctuation|，' , 'x|特别是在讨论' , 'punctuation|、' , 'x|会议和' , 'x|Brainstorming' , 'x|等情况下' , 'punctuation|。' , 'x|你好我的名字是' , 'x|西野くまです' , 'x|my name is bob' , 'x|很高兴认识你' , 'x|どうぞよろしくお願いいたします' , 'punctuation|「' , 'x|こんにちは' , 'punctuation|」' , 'x|是什么意思' , 'punctuation|。' ]
test_split_substrings : [ 'zh|我是 ' , 'en|VGroupChatBot' , 'punctuation|，' , 'zh|一个旨在支持多人通信的助手' , 'punctuation|，' , 'zh|通过可视化消息来帮助团队成员更好地交流' , 'punctuation|。' , 'zh|我可以帮助团队成员更好地整理和共享信息' , 'punctuation|，' , 'zh|特别是在讨论' , 'punctuation|、' , 'zh|会议和' , 'en|Brainstorming' , 'zh|等情况下' , 'punctuation|。' , 'zh|你好我的名字是' , 'ja|西野くまです' , 'en|my name is bob' , 'zh|很高兴认识你' , 'ja|どうぞよろしくお願いいたします' , 'punctuation|「' , 'ja|こんにちは' , 'punctuation|」' , 'zh|是什么意思' , 'punctuation|。' ]
acc                  : 25 / 25
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我的名字是' , 'x|西野くまです' , 'punctuation|。' , 'x|I am from Tokyo' , 'punctuation|, ' , 'x|日本の首都' , 'punctuation|。' , 'x|今天的天气非常好' ]
test_split_substrings : [ 'zh|我的名字是' , 'ja|西野くまです' , 'punctuation|。' , 'en|I am from Tokyo' , 'punctuation|, ' , 'ja|日本の首都' , 'punctuation|。' , 'zh|今天的天气非常好' ]
acc                  : 8 / 8
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|你好' , 'punctuation|，' , 'x|今日はどこへ行きますか' , 'punctuation|？' ]
test_split_substrings : [ 'zh|你好' , 'punctuation|，' , 'ja|今日はどこへ行きますか' , 'punctuation|？' ]
acc                  : 4 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|你好' , 'x|今日はどこへ行きますか' , 'punctuation|？' ]
test_split_substrings : [ 'zh|你好' , 'ja|今日はどこへ行きますか' , 'punctuation|？' ]
acc                  : 3 / 3
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我的名字是' , 'x|田中さんです' , 'punctuation|。' ]
test_split_substrings : [ 'zh|我的名字是田中' , 'ja|さんです' , 'punctuation|。' ]
acc                  : 1 / 3
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我喜欢吃寿司和拉面' , 'x|おいしいです' , 'punctuation|。' ]
test_split_substrings : [ 'zh|我喜欢吃寿司和拉面' , 'ja|おいしいです' , 'punctuation|。' ]
acc                  : 3 / 3
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|今天' , 'x|の天気はとてもいいですね' , 'punctuation|。' ]
test_split_substrings : [ 'zh|今天' , 'ja|の天気はとてもいいですね' , 'punctuation|。' ]
acc                  : 3 / 3
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我在学习' , 'x|日本語少し難しいです' , 'punctuation|。' ]
test_split_substrings : [ 'zh|我在学习日本語少' , 'ja|し難しいです' , 'punctuation|。' ]
acc                  : 1 / 3
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|日语真是' , 'x|おもしろい' , 'x|啊' ]
test_split_substrings : [ 'zh|日语真是' , 'ja|おもしろい' , 'zh|啊' ]
acc                  : 3 / 3
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|你喜欢看' , 'x|アニメ' , 'x|吗' , 'punctuation|？' ]
test_split_substrings : [ 'zh|你喜欢看' , 'ja|アニメ' , 'zh|吗' , 'punctuation|？' ]
acc                  : 4 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我想去日本旅行' , 'punctuation|、' , 'x|特に京都に行きたいです' , 'punctuation|。' ]
test_split_substrings : [ 'zh|我想去日本旅行' , 'punctuation|、' , 'ja|特に京都に行きたいです' , 'punctuation|。' ]
acc                  : 4 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|昨天' , 'x|見た映画はとても感動的でした' , 'punctuation|。' , 'x|我朋友是日本人' , 'x|彼はとても優しいです' , 'punctuation|。' ]
test_split_substrings : [ 'zh|昨天' , 'ja|見た映画はとても感動的でした' , 'punctuation|。' , 'zh|我朋友是日本人' , 'ja|彼はとても優しいです' , 'punctuation|。' ]
acc                  : 6 / 6
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我们一起去' , 'x|カラオケ' , 'x|吧' , 'punctuation|、' , 'x|楽しそうです' , 'punctuation|。' ]
test_split_substrings : [ 'zh|我们一起去' , 'ja|カラオケ' , 'zh|吧' , 'punctuation|、' , 'ja|楽しそうです' , 'punctuation|。' ]
acc                  : 6 / 6
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我的家在北京' , 'punctuation|、' , 'x|でも' , 'punctuation|、' , 'x|仕事で東京に住んでいます' , 'punctuation|。' ]
test_split_substrings : [ 'ja|我的家在北京' , 'punctuation|、' , 'ja|でも' , 'punctuation|、' , 'ja|仕事で東京に住んでいます' , 'punctuation|。' ]
acc                  : 6 / 6
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我在学做日本料理' , 'punctuation|、' , 'x|日本料理を作るのを習っています' , 'punctuation|。' ]
test_split_substrings : [ 'ja|我在学做日本料理' , 'punctuation|、' , 'ja|日本料理を作るのを習っています' , 'punctuation|。' ]
acc                  : 4 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|你会说几种语言' , 'punctuation|、' , 'x|何ヶ国語話せますか' , 'punctuation|？' ]
test_split_substrings : [ 'zh|你会说几种语言' , 'punctuation|、' , 'ja|何ヶ国語話せますか' , 'punctuation|？' ]
acc                  : 4 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我昨天看了一本书' , 'punctuation|、' , 'x|その本はとても面白かったです' , 'punctuation|。' ]
test_split_substrings : [ 'zh|我昨天看了一本书' , 'punctuation|、' , 'ja|その本はとても面白かったです' , 'punctuation|。' ]
acc                  : 4 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|你最近好吗' , 'punctuation|、' , 'x|最近どうですか' , 'punctuation|？' ]
test_split_substrings : [ 'zh|你最近好吗' , 'punctuation|、' , 'ja|最近どうですか' , 'punctuation|？' ]
acc                  : 4 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|你最近好吗' , 'x|最近どうですか' , 'punctuation|？' ]
test_split_substrings : [ 'zh|你最近好吗最近' , 'ja|どうですか' , 'punctuation|？' ]
acc                  : 1 / 3
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我在学做日本料理' , 'x|와 한국 요리' , 'punctuation|、' , 'x|日本料理を作るのを習っています' , 'punctuation|。' ]
test_split_substrings : [ 'ja|我在学做日本料理' , 'ko|와 한국 요리' , 'punctuation|、' , 'ja|日本料理を作るのを習っています' , 'punctuation|。' ]
acc                  : 5 / 5
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|你会说几种语言' , 'punctuation|、' , 'x|何ヶ国語話せますか' , 'punctuation|？' , 'x|몇 개 언어를 할 수 있어요' , 'punctuation|？' ]
test_split_substrings : [ 'zh|你会说几种语言' , 'punctuation|、' , 'ja|何ヶ国語話せますか' , 'punctuation|？' , 'ko|몇 개 언어를 할 수 있어요' , 'punctuation|？' ]
acc                  : 6 / 6
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我昨天看了一本书' , 'punctuation|、' , 'x|その本はとても面白かったです' , 'punctuation|。' , 'x|어제 책을 읽었는데' , 'punctuation|, ' , 'x|정말 재미있었어요' , 'punctuation|。' ]
test_split_substrings : [ 'zh|我昨天看了一本书' , 'punctuation|、' , 'ja|その本はとても面白かったです' , 'punctuation|。' , 'ko|어제 책을 읽었는데' , 'punctuation|, ' , 'ko|정말 재미있었어요' , 'punctuation|。' ]
acc                  : 8 / 8
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|我们一起去逛街' , 'x|와 쇼핑' , 'punctuation|、' , 'x|買い物に行きましょう' , 'punctuation|。' , 'x|쇼핑하러 가요' , 'punctuation|。' ]
test_split_substrings : [ 'zh|我们一起去逛街' , 'ko|와 쇼핑' , 'punctuation|、' , 'ja|買い物に行きましょう' , 'punctuation|。' , 'ko|쇼핑하러 가요' , 'punctuation|。' ]
acc                  : 7 / 7
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|你最近好吗' , 'punctuation|、' , 'x|最近どうですか' , 'punctuation|？' , 'x|요즘 어떻게 지내요' , 'punctuation|？' ]
test_split_substrings : [ 'zh|你最近好吗' , 'punctuation|、' , 'ja|最近どうですか' , 'punctuation|？' , 'ko|요즘 어떻게 지내요' , 'punctuation|？' ]
acc                  : 6 / 6
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|Bonjour' , 'punctuation|, ' , "x|wie geht's dir " , 'x|today' , 'punctuation|?' ]
test_split_substrings : [ 'fr|Bonjour' , 'punctuation|, ' , "de|wie geht's dir " , 'en|today' , 'punctuation|?' ]
acc                  : 5 / 5
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|Vielen Dank ' , 'x|merci beaucoup ' , 'x|for your help' , 'punctuation|.' ]
test_split_substrings : [ 'de|Vielen ' , 'fr|Dank merci beaucoup ' , 'en|for your help' , 'punctuation|.' ]
acc                  : 2 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|Ich bin müde ' , 'x|je suis fatigué ' , 'x|and I need some rest' , 'punctuation|.' ]
test_split_substrings : [ 'de|Ich ' , 'en|bin ' , 'de|müde ' , 'fr|je suis fatigué ' , 'en|and I need some rest' , 'punctuation|.' ]
acc                  : 3 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|Ich mag dieses Buch ' , 'x|ce livre est intéressant ' , 'x|and it has a great story' , 'punctuation|.' ]
test_split_substrings : [ 'de|Ich mag dieses Buch ' , 'fr|ce livre est intéressant ' , 'en|and it has a great story' , 'punctuation|.' ]
acc                  : 4 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|Ich mag dieses Buch' , 'punctuation|, ' , 'x|ce livre est intéressant' , 'punctuation|, ' , 'x|and it has a great story' , 'punctuation|.' ]
test_split_substrings : [ 'de|Ich mag dieses Buch' , 'punctuation|, ' , 'fr|ce livre est intéressant' , 'punctuation|, ' , 'en|and it has a great story' , 'punctuation|.' ]
acc                  : 6 / 6
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|The shirt is ' , 'x|9.15 ' , 'x|dollars' , 'punctuation|.' ]
test_split_substrings : [ 'en|The shirt is ' , 'digit|9' , 'punctuation|.' , 'digit|15 ' , 'en|dollars' , 'punctuation|.' ]
acc                  : 3 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|The shirt is ' , 'digit|233 ' , 'x|dollars' , 'punctuation|.' ]
test_split_substrings : [ 'en|The shirt is ' , 'digit|233 ' , 'en|dollars' , 'punctuation|.' ]
acc                  : 4 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|lang' , 'punctuation|-' , 'x|split' ]
test_split_substrings : [ 'en|lang' , 'punctuation|-' , 'en|split' ]
acc                  : 3 / 3
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|I have ' , 'digit|10' , 'punctuation|, ' , 'x|€' ]
test_split_substrings : [ 'en|I have ' , 'digit|10' , 'punctuation|, ' , 'fr|€' ]
acc                  : 4 / 4
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|日本のメディアでは' , 'punctuation|「' , 'x|匿名掲示板' , 'punctuation|」' , 'x|であると紹介されることが多いが' , 'punctuation|、' , 'x|2003年1月7日から全書き込みについて' , 'x|IP' , 'x|アドレスの記録・保存を始めており' , 'punctuation|、' , 'x|厳密には匿名掲示板ではなくなっていると' , 'x|CNET Japan' , 'x|は報じている' ]
test_split_substrings : [ 'ja|日本のメディアでは' , 'punctuation|「' , 'ja|匿名掲示板' , 'punctuation|」' , 'ja|であると紹介されることが多いが' , 'punctuation|、' , 'digit|2003' , 'ja|年' , 'digit|1' , 'ja|月' , 'digit|7' , 'ja|日から全書き込みについて' , 'en|IP' , 'ja|アドレスの記録・保存を始めており' , 'punctuation|、' , 'ja|厳密には匿名掲示板ではなくなっていると' , 'en|CNET Japan' , 'ja|は報じている' ]
acc                  : 12 / 13
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|日本語' , 'punctuation|（' , 'x|にほんご' , 'punctuation|、' , 'x|にっぽんご' , 'punctuation|）' , 'x|は' , 'punctuation|、' , 'x|日本国内や' , 'punctuation|、' , 'x|かつての日本領だった国' , 'punctuation|、' , 'x|そして国外移民や移住者を含む日本人同士の間で使用されている言語' , 'punctuation|。' , 'x|日本は法令によって公用語を規定していないが' , 'punctuation|、' , 'x|法令その他の公用文は全て日本語で記述され' , 'punctuation|、' , 'x|各種法令において日本語を用いることが規定され' , 'punctuation|、' , 'x|学校教育においては「国語」の教科として学習を行うなど' , 'punctuation|、' , 'x|事実上日本国内において唯一の公用語となっている' , 'punctuation|。' ]
test_split_substrings : [ 'ja|日本語' , 'punctuation|（' , 'ja|にほんご' , 'punctuation|、' , 'ja|にっぽんご' , 'punctuation|）' , 'ja|は' , 'punctuation|、' , 'ja|日本国内や' , 'punctuation|、' , 'ja|かつての日本領だった国' , 'punctuation|、' , 'ja|そして国外移民や移住者を含む日本人同士の間で使用されている言語' , 'punctuation|。' , 'ja|日本は法令によって公用語を規定していないが' , 'punctuation|、' , 'ja|法令その他の公用文は全て日本語で記述され' , 'punctuation|、' , 'ja|各種法令において日本語を用いることが規定され' , 'punctuation|、' , 'ja|学校教育においては' , 'punctuation|「' , 'ja|国語' , 'punctuation|」' , 'ja|の教科として学習を行うなど' , 'punctuation|、' , 'ja|事実上日本国内において唯一の公用語となっている' , 'punctuation|。' ]
acc                  : 23 / 24
- - - - - - - - - - - - - - - - - - - - - - - - - -
correct_substrings   : [ 'x|日语是日本通用语及事实上的官方语言' , 'punctuation|。' , 'x|没有精确的日语使用人口的统计' , 'punctuation|，' , 'x|如果计算日本人口以及居住在日本以外的日本人' , 'punctuation|、' , 'x|日侨和日裔' , 'punctuation|，' , 'x|日语使用者应超过一亿三千万人' , 'punctuation|。' ]
test_split_substrings : [ 'zh|日语是日本通用语及事实上的官方语言' , 'punctuation|。' , 'zh|没有精确的日语使用人口的统计' , 'punctuation|，' , 'zh|如果计算日本人口以及居住在日本以外的日本人' , 'punctuation|、' , 'zh|日侨和日裔' , 'punctuation|，' , 'zh|日语使用者应超过一亿三千万人' , 'punctuation|。' ]
acc                  : 10 / 10
- - - - - - - - - - - - - - - - - - - - - - - - - -
total substring num : 217
test total substring num : 230
text acc num : 205
precision : 0.9447004608294931
recall : 0.8913043478260869
F1 Score : 0.9172259507829977
time : 0.3573117256164551
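
The summary figures are consistent with the three counts above: the reported precision equals 205 / 217 and the reported recall equals 205 / 230. A minimal check, with variable names chosen here for illustration only:

```python
# Counts copied from the summary above.
total_reference = 217  # "total substring num"
total_test = 230       # "test total substring num"
matched = 205          # "text acc num"

precision = matched / total_reference
recall = matched / total_test
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# precision=0.9447 recall=0.8913 f1=0.9172
```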

2. Motivation

TTS (Text-To-Speech) models often fail at multilingual speech generation. There are two ways to address this:
- Train a single model that can pronounce multiple languages
- (This package) Split the sentence by language first, then use a different model for each language
Existing models in NLP toolkits (e.g. SpaCy, jieba) usually handle text in one language per model. This means multilingual text has to be preprocessed first, such as the texts below:

你喜欢看アニメ吗？
Vielen Dank merci beaucoup for your help.
你最近好吗、最近どうですか？요즘 어떻게 지내요？sky is clear and sunny。

1. How it works
2. Motivation
3. Usage
- 3.1. Installation
- 3.2. Basic
  - 3.2.1. split_by_lang
  - 3.2.2. merge_across_digit
- 3.3. Advanced
  - 3.3.1. Usage of lang_map and default_lang (for your languages)
4. Acknowledgement
5. Star History

3. Usage

3.1. Installation

You can install the package using pip:

pip install split-lang

3.2. Basic

3.2.1. `split_by_lang`

from split_lang import LangSplitter

lang_splitter = LangSplitter()
text = "你喜欢看アニメ吗"

# Split the text into substrings tagged with a language code
substr = lang_splitter.split_by_lang(
    text=text,
)
for index, item in enumerate(substr):
    print(f"{index}|{item.lang}:{item.text}")

0|zh:你喜欢看
1|ja:アニメ
2|zh:吗

import time

from split_lang import LangSplitter

# Allow substrings of the same language to be merged across punctuation
lang_splitter = LangSplitter(merge_across_punctuation=True)
texts = [
    "你喜欢看アニメ吗？我也喜欢看",
    "Please star this project on GitHub, Thanks you. I love you请加星这个项目，谢谢你。我爱你この項目をスターしてください、ありがとうございます！愛してる",
]
time1 = time.time()
for text in texts:
    substr = lang_splitter.split_by_lang(
        text=text,
    )
    for index, item in enumerate(substr):
        print(f"{index}|{item.lang}:{item.text}")
    print("----------------------")
time2 = time.time()
# Elapsed time in seconds for splitting all texts
print(time2 - time1)

0|zh:你喜欢看
1|ja:アニメ
2|zh:吗？我也喜欢看
----------------------
0|en:Please star this project on GitHub, Thanks you. I love you
1|zh:请加星这个项目，谢谢你。我爱你
2|ja:この項目をスターしてください、ありがとうございます！愛してる
----------------------
0.007998466491699219

3.2.2. `merge_across_digit`

# Keep digits as their own substrings instead of merging them into neighbors
lang_splitter.merge_across_digit = False
texts = [
    "衬衫的价格是9.15便士",
]
for text in texts:
    substr = lang_splitter.split_by_lang(
        text=text,
    )
    for index, item in enumerate(substr):
        print(f"{index}|{item.lang}:{item.text}")

0|zh:衬衫的价格是
1|digit:9.15
2|zh:便士

3.3. Advanced

3.3.1. Usage of `lang_map` and `default_lang` (for your languages)

Important

Add language codes for your use case if other languages are needed. See Support Language.

The default lang_map is shown below.
- If langua-py, fasttext, or any other language detector detects a language that is not included in lang_map, the language is set to default_lang
- If you set default_lang, or the value of a key:value pair in lang_map, to x, the substring is merged into a neighboring substring
  - zh | x | jp -> zh | jp (x has been merged into one side)
  - In the example below, zh-tw is set to x because characters in zh and jp are sometimes detected as Traditional Chinese
The default default_lang is x.

DEFAULT_LANG_MAP = {
    "zh": "zh",
    "yue": "zh",  # Cantonese
    "wuu": "zh",  # Wu Chinese
    "zh-cn": "zh",
    "zh-tw": "x",
    "ko": "ko",
    "ja": "ja",
    "de": "de",
    "fr": "fr",
    "en": "en",
    "hr": "en",
}
DEFAULT_LANG = "x"
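
The mapping and merging behavior described above can be illustrated in plain Python. This is a sketch under stated assumptions, not the package's code: `map_lang` and `merge_unknown` are hypothetical names, the map is abbreviated, and merging `x` into the preceding substring (or the following one when there is no predecessor) is an assumed choice of "one side".

```python
# Abbreviated copy of the default map, for illustration only.
LANG_MAP = {"zh": "zh", "yue": "zh", "zh-tw": "x", "ja": "ja", "en": "en"}
DEFAULT_LANG = "x"


def map_lang(detected, lang_map=LANG_MAP, default_lang=DEFAULT_LANG):
    """Map a detector's code through lang_map, falling back to default_lang."""
    return lang_map.get(detected, default_lang)


def merge_unknown(substrings):
    """Merge ("x", text) items into a neighboring (lang, text) item."""
    merged = []
    pending = ""  # text of leading "x" items waiting for a known neighbor
    for lang, text in substrings:
        if lang == "x" and merged:
            prev_lang, prev_text = merged[-1]
            merged[-1] = (prev_lang, prev_text + text)
        elif lang == "x":
            pending += text
        else:
            merged.append((lang, pending + text))
            pending = ""
    if pending:  # input contained only "x" items
        merged.append(("x", pending))
    return merged


print(map_lang("yue"), map_lang("zh-tw"), map_lang("it"))
# zh x x
print(merge_unknown([("zh", "你好"), ("x", "abc"), ("ja", "です")]))
# [('zh', '你好abc'), ('ja', 'です')]
```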

4. Acknowledgement

Inspired by LlmKira/fast-langdetect
Text segmentation depends on google/budoux
Language detection depends on zafercavdar/fasttext-langdetect and rspeer/wordfreq