PoliBERTweet
1.0.0
A transformer-based language model pre-trained on a large amount of politics-related Twitter data (83 million tweets). This repository is the official resource for the paper below.

The datasets for the evaluation tasks presented in our paper are provided below.

All models are uploaded to my Hugging Face account, so you can load a model with just three lines of code!

We tested the code with PyTorch v1.10.2 and Transformers v4.18.0.
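To quickly confirm that your environment matches the tested versions, you can print them in Python (a minimal check; reasonably close versions will likely also work):

import torch
import transformers

# Tested with torch 1.10.2 and transformers 4.18.0
print(torch.__version__, transformers.__version__)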
from transformers import AutoModel, AutoTokenizer, pipeline
import torch

# Choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Select model path here
pretrained_LM_path = "kornosk/polibertweet-mlm"

# Load model
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModel.from_pretrained(pretrained_LM_path)

# Fill mask
example = "Trump is the <mask> of USA"
fill_mask = pipeline('fill-mask', model=pretrained_LM_path, tokenizer=tokenizer)
outputs = fill_mask(example)
print(outputs)

# See embeddings
inputs = tokenizer(example, return_tensors="pt")
outputs = model(**inputs)
print(outputs)
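If you need one vector per tweet rather than per-token embeddings, a common recipe (our sketch, not an official API of this repository) is mean pooling over the non-padding tokens, continuing from the snippet above:

# Continuing from the snippet above: pool token embeddings into one vector.
mask = inputs["attention_mask"].unsqueeze(-1)           # (batch, seq_len, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)  # sum over non-padding tokens
sentence_embedding = summed / mask.sum(dim=1)           # (batch, hidden_size)
print(sentence_embedding.shape)                         # e.g. torch.Size([1, 768])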
Alternatively, you can use this model to train on your own downstream task; please consider citing our paper if you find it useful :) See the Hugging Face documentation for more details.
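As a rough, hypothetical sketch of that downstream use (the task, label set, and num_labels below are placeholders, not from the paper), you could load the checkpoint with a classification head and fine-tune it:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

pretrained_LM_path = "kornosk/polibertweet-mlm"
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)

# Hypothetical 3-class task (e.g. favor / against / neutral)
model = AutoModelForSequenceClassification.from_pretrained(pretrained_LM_path, num_labels=3)

texts = ["Some political tweet here"]  # replace with your own data
labels = torch.tensor([0])             # hypothetical labels

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # plug into your own optimizer or the Trainer API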
If you find our paper and resources useful, please consider citing our work!
@inproceedings{kawintiranon2022polibertweet,
    title = {{P}oli{BERT}weet: A Pre-trained Language Model for Analyzing Political Content on {T}witter},
    author = {Kawintiranon, Kornraphop and Singh, Lisa},
    booktitle = {Proceedings of the Language Resources and Evaluation Conference (LREC)},
    year = {2022},
    pages = {7360--7367},
    publisher = {European Language Resources Association},
    url = {https://aclanthology.org/2022.lrec-1.801}
}

If you have any problems loading the models or datasets, please create an issue here.