
PoliBERTweet: Language Model for Political Tweets

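Transformer-based language models pre-trained on a large amount of politics-related Twitter data (83M tweets). This repo is the official resource of the following paper.

PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter, LREC 2022.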

Datasets

The datasets for the evaluation tasks presented in our paper are available below.

Poli-Test and NonPoli-Test - [Download]
Stance Datasets - [Download] [Paper] [Github]

Pre-trained Models

All models are uploaded to my Hugging Face account, so you can load a model with just three lines of code!

PoliBERTweet (83M tweets) - fine-tune this one for downstream tasks
PoliBERTweet-small (5M tweets)

Usage

We tested with pytorch v1.10.2 and transformers v4.18.0.
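
To compare a local environment against the versions listed above, something like the following sketch may help. The helper names (`parse_version`, `installed_version`) and the `TESTED_VERSIONS` table are illustrative and not part of this repo; the check only reports differences and does not imply that other versions fail.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Versions this README reports as tested; the dict itself is an illustrative helper.
TESTED_VERSIONS = {"torch": "1.10.2", "transformers": "4.18.0"}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.10.2+cu113' into (1, 10, 2), ignoring local build suffixes."""
    core = version.split("+")[0]
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for package, tested in TESTED_VERSIONS.items():
    found = installed_version(package)
    if found is None:
        print(f"{package}: not installed (tested with {tested})")
    elif parse_version(found) < parse_version(tested):
        print(f"{package}: {found} is older than the tested {tested}")
    else:
        print(f"{package}: {found} (tested with {tested})")
```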

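To fine-tune our models on a specific task (e.g. stance detection), see the Hugging Face documentation.
For detailed usage, please visit the specific model pages above. Below is a sample use case.

1. Load the model and tokenizer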

from transformers import AutoModel, AutoTokenizer, pipeline
import torch

# Choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Select model path here
pretrained_LM_path = "kornosk/polibertweet-mlm"

# Load model
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModel.from_pretrained(pretrained_LM_path)

2. Predict the masked word

# Fill mask
example = "Trump is the <mask> of USA"
fill_mask = pipeline('fill-mask', model=pretrained_LM_path, tokenizer=tokenizer)

outputs = fill_mask(example)
print(outputs)
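
The fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str` and `sequence`. A small helper along these lines can make that output easier to read. The function name `top_predictions` is hypothetical, and `sample_outputs` is placeholder data that only mimics the output shape; it is not real model output.

```python
from typing import Dict, List, Tuple


def top_predictions(outputs: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(outputs, key=lambda item: item["score"], reverse=True)
    return [(item["token_str"].strip(), round(item["score"], 4)) for item in ranked[:k]]


# Placeholder values with the same structure as fill-mask output.
# These scores are made up for illustration and are NOT produced by PoliBERTweet.
sample_outputs = [
    {"score": 0.3, "token": 11, "token_str": "leader", "sequence": "Trump is the leader of USA"},
    {"score": 0.6, "token": 12, "token_str": "president", "sequence": "Trump is the president of USA"},
    {"score": 0.1, "token": 13, "token_str": "face", "sequence": "Trump is the face of USA"},
]

for token, score in top_predictions(sample_outputs, k=2):
    print(f"{token}\t{score}")
```

With the real pipeline, the call would be `top_predictions(fill_mask(example))`.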

3. See embeddings

# See embeddings
inputs = tokenizer(example, return_tensors="pt")
outputs = model(**inputs)
print(outputs)

# OR you can use this model to train on your downstream task!
# please consider citing our paper if you feel this is useful :)
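
The model output above contains per-token vectors in `outputs.last_hidden_state`. If a single vector per tweet is needed, one common option is masked mean pooling. This is an assumption made for illustration, not a pooling strategy prescribed by the paper, and `mean_pool` is a hypothetical helper. The sketch uses dummy tensors so it runs without downloading the model.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors over the sequence, ignoring padded positions.

    last_hidden_state: (batch, seq_len, hidden)
    attention_mask:    (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)                         # avoid division by zero
    return summed / counts


# Dummy input: one sequence of three tokens, the last of which is padding.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])

sentence_embedding = mean_pool(hidden, mask)
print(sentence_embedding.shape)
```

With the objects from step 3, the call would be `mean_pool(outputs.last_hidden_state, inputs["attention_mask"])`.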

4. Fine-tune on a downstream task such as stance detection

See the Hugging Face documentation for details.
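
As a starting point, the pre-trained weights can be loaded into a sequence classification model. The sketch below is a minimal setup under stated assumptions: the label set `STANCE_LABELS` and the helpers `label_maps` and `build_stance_classifier` are hypothetical, and the labels should be replaced with those of your own dataset. The classification head is newly initialized, so the returned model still needs training (for example with the Trainer API described in the Hugging Face documentation).

```python
from typing import Dict, List, Tuple

# Hypothetical label set for illustration; replace with the labels in your data.
STANCE_LABELS: List[str] = ["AGAINST", "FAVOR", "NONE"]


def label_maps(labels: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Build the label <-> id mappings stored in the model config."""
    label2id = {label: index for index, label in enumerate(labels)}
    id2label = {index: label for label, index in label2id.items()}
    return label2id, id2label


def build_stance_classifier(
    model_path: str = "kornosk/polibertweet-mlm",
    labels: List[str] = STANCE_LABELS,
):
    """Load the tokenizer and a classifier whose encoder starts from PoliBERTweet."""
    # Imported here so the label helpers above work without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    label2id, id2label = label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_path,
        num_labels=len(labels),
        label2id=label2id,
        id2label=id2label,
    )
    return tokenizer, model
```

Calling `tokenizer, model = build_stance_classifier()` downloads the weights on first use.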

Citation

If you find our paper and resources useful, please consider citing our work!

@inproceedings{kawintiranon2022polibertweet,
  title     = {{P}oli{BERT}weet: A Pre-trained Language Model for Analyzing Political Content on {T}witter},
  author    = {Kawintiranon, Kornraphop and Singh, Lisa},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference (LREC)},
  year      = {2022},
  pages     = {7360--7367},
  publisher = {European Language Resources Association},
  url       = {https://aclanthology.org/2022.lrec-1.801}
}