Romanian Transformers

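This repo is meant as a space to centralize Romanian transformer models and to provide a uniform evaluation for them. Contributions are welcome.

We use HuggingFace's Transformers library, an excellent tool for NLP. What is BERT, you ask? The README points to a clear and condensed article about what BERT is and what it can do, and to a summary of the different transformer models.

What follows is the list of Romanian transformer models, both masked and conditional language models.

Feel free to open an issue and add your model or evaluation here!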

Masked Language Models (MLMs)

Model	Type	Size	Article/Citation/Source	Pre-trained / Fine-tuned	Release Date
dumitrescustefan/bert-base-romanian-cased-v1	BERT	124M	PDF / Cite	Pre-trained	Apr 2020
dumitrescustefan/bert-base-romanian-uncased-v1	BERT	124M	PDF / Cite	Pre-trained	Apr 2020
racai/distilbert-base-romanian-cased	DistilBERT	81M	-	Pre-trained	Apr 2021
readerbench/RoBERT-small	BERT	19M	PDF	Pre-trained	May 2021
readerbench/RoBERT-base	BERT	114M	PDF	Pre-trained	May 2021
readerbench/RoBERT-large	BERT	341M	PDF	Pre-trained	May 2021
dumitrescustefan/bert-base-romanian-ner	BERT	124M	HF Space	Named Entity Recognition on RONECv2	Jan 2022
snisioi/bert-legal-romanian-cased-v1	BERT	124M	-	Legal documents on MARCELLv2	Jan 2022
readerbench/jurBERT-base	BERT	111M	PDF	Legal documents	Oct 2021
readerbench/jurBERT-large	BERT	337M	PDF	Legal documents	Oct 2021

Conditional Language Models (CLMs)

Model	Type	Size	Article/Citation/Source	Pre-trained / Fine-tuned	Release Date
dumitrescustefan/gpt-neo-romanian-780m	GPT-Neo	780M	not yet / HF Space	Pre-trained	Sep 2022
readerbench/RoGPT2-base	GPT2	124M	PDF	Pre-trained	Jul 2021
readerbench/RoGPT2-medium	GPT2	354M	PDF	Pre-trained	Jul 2021
readerbench/RoGPT2-large	GPT2	774M	PDF	Pre-trained	Jul 2021

New: check out this HF Space to play with Romanian generative models: https://huggingface.co/spaces/dumitrescustefan/romanian-text-generation

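Model evaluation

Models are evaluated using the publicly available Colab script. All reported results are the average score of 5 runs using the same parameters. For larger models, where possible, a larger batch size was simulated by accumulating gradients, so that all models should have the same effective batch size. Only standard models (not fine-tuned for a specific task) that fit in 16GB of RAM are evaluated.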
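The gradient accumulation mentioned above can be illustrated with a minimal sketch. It is not the evaluation script itself: the one-parameter model, the toy data and the function names are assumptions made for illustration. It shows that summing micro-batch gradients, each scaled by the number of accumulation steps, gives the same gradient as one large batch, provided the micro-batches have equal size.

```python
# Minimal sketch (illustrative, not the actual evaluation code):
# gradient accumulation on a one-parameter model y = w * x with MSE loss.

def mse_gradient(w, batch):
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_gradient(w, data, micro_batch_size):
    """Accumulate micro-batch gradients, each scaled by 1 / number of steps."""
    micro_batches = [data[i:i + micro_batch_size]
                     for i in range(0, len(data), micro_batch_size)]
    steps = len(micro_batches)
    total = 0.0
    for micro_batch in micro_batches:
        total += mse_gradient(w, micro_batch) / steps
    return total

# Toy data, assumed for illustration only.
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5),
        (5.0, 9.5), (6.0, 12.5), (7.0, 13.5), (8.0, 16.5)]
w = 0.5

full = mse_gradient(w, data)                    # one batch of 8 examples
accumulated = accumulated_gradient(w, data, 2)  # 4 micro-batches of 2 examples
print(full, accumulated)
```

With a micro-batch size of 2 and 4 accumulation steps, the effective batch size is 8, which is the sense in which a smaller GPU can match the batch size used for smaller models.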

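The tests cover the following fields and, for brevity, we select a single metric from each field:

Named Entity Recognition: on RONECv2 we measure the test strict match metric. A model must correctly detect whether a word is an entity and tag it with the correct class.
Part-of-Speech Tagging: on ro-pos-tagger we measure the test UPOS F1 score. This test should reveal how well a model understands the structure of the language.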
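Semantic Textual Similarity: on RO-STS we measure the test Pearson correlation coefficient. Given two sentences, a model must predict a score for how similar they are in meaning. This test should highlight how well a model can embed the meaning of a sentence.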
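Emotion Detection: on REDv2 (emotion detection in Romanian tweets) we measure the test Hamming loss in the classification setting (lower is better). This test should show how well a model can "understand" emotions from short texts.
Perplexity: on the test split of wiki-ro we measure perplexity for CLM models only, with a stride of 512 and a batch size of 4.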
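Two of the metrics listed above are simple enough to sketch in plain Python. This is a rough illustration of the definitions, not the code used for the reported numbers. The function names are assumptions, and the Hamming loss version assumes labels are given as binary indicator vectors, one position per emotion.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def hamming_loss(gold, predicted):
    """Fraction of individual label positions that are wrong (lower is better)."""
    total = sum(len(row) for row in gold)
    wrong = sum(g != p
                for gold_row, pred_row in zip(gold, predicted)
                for g, p in zip(gold_row, pred_row))
    return wrong / total

# Hypothetical similarity scores and emotion labels, for illustration only.
gold_scores = [0.0, 2.5, 5.0, 3.0]
pred_scores = [0.5, 2.0, 4.5, 3.5]
gold_labels = [[1, 0, 0], [0, 1, 1]]
pred_labels = [[1, 0, 1], [0, 1, 1]]

print(pearson(gold_scores, pred_scores))
print(hamming_loss(gold_labels, pred_labels))
```

A Pearson coefficient close to 1 means the predicted similarity scores rise and fall with the gold scores, while a Hamming loss close to 0 means few label positions were predicted incorrectly.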

MLM model evaluation

Model	Type	Size	NER/EM_strict	RoSTS/Pearson	Ro-Pos-Tagger/UPOS F1	REDv2/hamming_loss
dumitrescustefan/bert-base-romanian-cased-v1	BERT	124M	0.8815	0.7966	0.982	0.1039
dumitrescustefan/bert-base-romanian-uncased-v1	BERT	124M	0.8572	0.8149	0.9826	0.1038
racai/distilbert-base-romanian-cased	DistilBERT	81M	0.8573	0.7285	0.9637	0.1119
readerbench/RoBERT-small	BERT	19M	0.8512	0.7827	0.9794	0.1085
readerbench/RoBERT-base	BERT	114M	0.8768	0.8102	0.9819	0.1041

CLM model evaluation

Model	Type	Size	NER/EM_strict	RoSTS/Pearson	Ro-Pos-Tagger/UPOS F1	REDv2/hamming_loss	Perplexity
readerbench/RoGPT2-base	GPT2	124M	0.6865	0.7963	0.9009	0.1068	52.34
readerbench/RoGPT2-medium	GPT2	354M	0.7123	0.7979	0.9098	0.114	31.26
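The perplexity column is measured with a stride of 512. As a rough sketch of what a strided evaluation usually looks like, the snippet below lays out overlapping windows over a token sequence and computes perplexity from per-token log-probabilities. The window size of 1024, the sequence length of 2000 and the function names are assumptions made for illustration. Only the stride of 512 comes from the text, and the exact evaluation script may differ.

```python
import math

def sliding_windows(seq_len, max_length, stride):
    """Return (begin, end, n_scored) for each evaluation window.

    Each window feeds tokens [begin, end) to the model, but only the last
    n_scored tokens count toward the loss, so no token is scored twice.
    """
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        n_scored = end - prev_end
        windows.append((begin, end, n_scored))
        prev_end = end
        if end == seq_len:
            break
    return windows

def perplexity(token_log_probs):
    """exp of the average negative log-likelihood per token (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Assumed values: 2000 tokens, a 1024-token context window, stride 512.
windows = sliding_windows(seq_len=2000, max_length=1024, stride=512)
print(windows)

# A model that assigns probability 0.25 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 4))
```

A smaller stride gives each scored token more left context, which tends to lower the measured perplexity at the cost of more forward passes, so perplexities are only comparable when the stride is the same.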

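What these models can do

Using HuggingFace's Transformers library, instantiate a model and replace the model name as necessary. Then use an appropriate model head depending on your task. Here are a few examples:

Get token embeddings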

from transformers import AutoTokenizer, AutoModel
import torch

# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("dumitrescustefan/bert-base-romanian-cased-v1")
model = AutoModel.from_pretrained("dumitrescustefan/bert-base-romanian-cased-v1")

# tokenize a sentence and run it through the model (no gradients needed for inference)
input_ids = tokenizer.encode("Acesta este un test.", add_special_tokens=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(input_ids)

# get encoding
last_hidden_states = outputs[0]  # the last hidden state is the first element of the output

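For dumitrescustefan/* models, remember to normalize the ș/ț variants before feeding text to the model. These models were trained only on the correct comma-below forms, so the cedilla forms ş and ţ should be replaced first:

text = text.replace("ţ", "ț").replace("ş", "ș").replace("Ţ", "Ț").replace("Ş", "Ș")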
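The same replacement can be wrapped in a small reusable helper. The sketch below is one possible way to do it with a translation table. The names `CEDILLA_TO_COMMA` and `normalize_diacritics` are made up for this example, and the characters are written as Unicode escapes so the cedilla and comma-below forms cannot be confused in an editor.

```python
# Translation table from cedilla forms to the comma-below forms.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
})

def normalize_diacritics(text):
    """Replace cedilla s/t with their comma-below equivalents."""
    return text.translate(CEDILLA_TO_COMMA)

# Sample text written with cedilla forms, for illustration only.
sample = "\u0162ara \u015fi ora\u015ful"
print(normalize_diacritics(sample))
```

Calling the helper on every input string before tokenization keeps training and inference text consistent. Text that already uses the comma-below forms passes through unchanged.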

Write text with a generative model

Give the generative model a prompt and let it write:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("dumitrescustefan/gpt-neo-romanian-125m")
model = AutoModelForCausalLM.from_pretrained("dumitrescustefan/gpt-neo-romanian-125m")

# encode the prompt as a tensor of token ids
input_ids = tokenizer.encode("Cine a fost Mihai Eminescu? A fost", return_tensors="pt")

# sample a continuation; max_length counts the prompt tokens as well
output_ids = model.generate(input_ids, max_length=128, do_sample=True, no_repeat_ngram_size=2, top_k=50, top_p=0.9, early_stopping=True)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

P.S. You can test all the generative models here: https://huggingface.co/spaces/dumitrescustefan/romanian-text-generation
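The generation call above passes top_k=50 and top_p=0.9. As a simplified sketch of what these two parameters do, the snippet below filters a toy next-token distribution, first to the k most likely tokens and then to the smallest set whose cumulative probability reaches p, and samples from what remains. The function name, the toy distribution and the use of plain probabilities are assumptions for illustration. The library itself works on logits over the full vocabulary.

```python
import random

def top_k_top_p_filter(probs, top_k, top_p):
    """Apply top-k, then top-p (nucleus) filtering, and renormalize.

    probs maps each candidate token to its probability.
    """
    # Keep only the top_k most likely tokens.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    k_total = sum(p for _, p in ranked)
    # Keep the smallest prefix whose renormalized cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / k_total
        if cumulative >= top_p:
            break
    kept_total = sum(p for _, p in kept)
    return {token: p / kept_total for token, p in kept}

# Hypothetical next-token distribution, for illustration only.
next_token_probs = {"poet": 0.5, "scriitor": 0.3, "jurnalist": 0.15, "pictor": 0.05}
filtered = top_k_top_p_filter(next_token_probs, top_k=3, top_p=0.8)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
choice = rng.choices(list(filtered), weights=list(filtered.values()))[0]
print(filtered, choice)
```

Lower values of top_k or top_p restrict sampling to fewer, more likely tokens, which makes the output more conservative. Higher values allow more varied text.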

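Final notes

This repo started in 2020 as the home of a single transformer model, with the stated hope that more models would quickly be added. It turned out that training a good model is not easy: it takes a lot of effort to curate data, and then access to sufficient compute power. I therefore feel that listing only a few models is no longer useful, and that listing all the Romanian-only models I can find that meet a minimum level of performance and documentation will have a larger impact. So here we go :)
This repo used to contain some code for downloading and cleaning Romanian corpora. I have removed that part, because OSCAR (new version) is now available on HuggingFace and the OPUS API no longer works (it now requires some manual filtering, not to mention that new resources are constantly being added), so maintaining this code is not feasible.
Please contribute to this repo with new Romanian models you find, or with citations or updates for the existing models.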
