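PoliBERTweet: A Language Model for Political Tweets

Transformer-based language models pre-trained on a large amount of politics-related Twitter data (83M tweets). This repo is the official resource for the following paper.

PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter, LREC 2022.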

Datasets

The datasets for the evaluation tasks presented in the paper are available below.

Poli-Test and NonPoli-Test - [Download]
Stance datasets - [Download] [Paper] [GitHub]

Pre-trained Models

All models are uploaded to my Hugging Face page, so you can load a model with three lines of code.

PoliBERTweet (83M tweets) - feel free to fine-tune it for any downstream task.
PoliBERTweet-small (5M tweets)

Usage

We tested with pytorch v1.10.2 and transformers v4.18.0.
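Because the examples were only tested against those two versions, it can help to compare them with what is installed before running anything. The following is a minimal sketch and not part of the original repository: the helper names `parse_version` and `check_tested_versions` are hypothetical, and the check uses only the standard library's `importlib.metadata`.

```python
from importlib import metadata

# Versions the authors report testing with (taken from the README).
TESTED_VERSIONS = {"torch": "1.10.2", "transformers": "4.18.0"}


def parse_version(version: str) -> tuple:
    """Turn '1.10.2+cu113' into (1, 10, 2); stops at the first non-numeric piece."""
    numbers = []
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break
        numbers.append(int(piece))
    return tuple(numbers)


def check_tested_versions(tested=TESTED_VERSIONS) -> dict:
    """Return {package: (installed, tested, same_major_minor)} for each package."""
    report = {}
    for package, tested_version in tested.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            # Package is not installed at all.
            report[package] = (None, tested_version, False)
            continue
        same = parse_version(installed)[:2] == parse_version(tested_version)[:2]
        report[package] = (installed, tested_version, same)
    return report


if __name__ == "__main__":
    for package, (installed, tested, same) in check_tested_versions().items():
        status = "matches" if same else "differs from"
        print(f"{package}: installed {installed} {status} tested {tested}")
```

A mismatch does not necessarily mean the examples will fail; it only flags that the environment differs from the one the authors tested.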

To fine-tune the model for a specific task (e.g., stance detection), see the Hugging Face docs.
For more usage details, see the specific model pages listed above. Sample use cases are shown below.

1. Load the model and tokenizer

from transformers import AutoModel, AutoTokenizer, pipeline
import torch

# Choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Select model path here
pretrained_LM_path = "kornosk/polibertweet-mlm"

# Load model
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModel.from_pretrained(pretrained_LM_path)

2. Predict the masked word

# Fill mask
example = "Trump is the <mask> of USA"
fill_mask = pipeline('fill-mask', model=pretrained_LM_path, tokenizer=tokenizer)

outputs = fill_mask(example)
print(outputs)
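For an input with a single `<mask>`, the `fill-mask` pipeline returns a list of candidate fills, each a dictionary with `score`, `token`, `token_str`, and `sequence` keys. As a small illustrative sketch, the candidates can be reduced to readable (word, probability) pairs. The helper `top_predictions` and the sample values below are hypothetical and are not real model output.

```python
def top_predictions(outputs, k=3):
    """Return the k highest-scoring (token_str, score) pairs from fill-mask output."""
    ranked = sorted(outputs, key=lambda item: item["score"], reverse=True)
    return [(item["token_str"].strip(), round(item["score"], 4)) for item in ranked[:k]]


# Hypothetical stand-in for `fill_mask(example)`; real tokens and scores will differ.
sample_outputs = [
    {"score": 0.10, "token": 11, "token_str": " leader", "sequence": "Trump is the leader of USA"},
    {"score": 0.70, "token": 22, "token_str": " president", "sequence": "Trump is the president of USA"},
    {"score": 0.05, "token": 33, "token_str": " king", "sequence": "Trump is the king of USA"},
]

for word, score in top_predictions(sample_outputs, k=2):
    print(f"{word}: {score}")
```

With the real pipeline, the same helper would be called as `top_predictions(fill_mask(example))`.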

3. See embeddings

# See embeddings
inputs = tokenizer(example, return_tensors="pt")
outputs = model(**inputs)
print(outputs)

# OR you can use this model to train on your downstream task!
# please consider citing our paper if you feel this is useful :)
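Printing `outputs` shows token-level hidden states. If a single vector per tweet is needed, one common option (an assumption here, not something this README prescribes) is mean pooling over `last_hidden_state` that ignores padded positions. The sketch below uses a hypothetical helper `mean_pool` and small dummy tensors so it runs without downloading the model.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings per sequence, ignoring padded positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                          # avoid division by zero
    return summed / counts


# Dummy stand-ins for `outputs.last_hidden_state` and `inputs["attention_mask"]`.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])  # 1 sequence, 3 tokens, 2 dims
mask = torch.tensor([[1, 1, 0]])                                    # last token is padding

sentence_embedding = mean_pool(hidden, mask)
print(sentence_embedding.shape)
```

With the real model, the call would be `mean_pool(outputs.last_hidden_state, inputs["attention_mask"])`. Whether mean pooling or another strategy works best depends on the downstream task.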

4. Fine-tune on a downstream task such as stance detection

See the Hugging Face docs for details.
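As a rough starting point, the setup for a stance classifier might look like the sketch below. It is not the authors' training code: the three-way label set, the helper names `build_stance_classifier` and `encode_labels`, and the choice of `AutoModelForSequenceClassification` are assumptions made for illustration, and the classification head it creates is randomly initialised until trained.

```python
# Hypothetical label set; adapt it to the stance dataset you use.
STANCE_LABELS = ["AGAINST", "NEUTRAL", "FAVOR"]
label2id = {label: i for i, label in enumerate(STANCE_LABELS)}
id2label = {i: label for label, i in label2id.items()}


def encode_labels(labels):
    """Map string stance labels to the integer ids the classifier expects."""
    return [label2id[label] for label in labels]


def build_stance_classifier(pretrained_path="kornosk/polibertweet-mlm"):
    """Load the tokenizer and the pre-trained encoder with a fresh classification head."""
    # Imported lazily so the label helpers above work without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(pretrained_path)
    model = AutoModelForSequenceClassification.from_pretrained(
        pretrained_path,
        num_labels=len(STANCE_LABELS),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


print(encode_labels(["FAVOR", "AGAINST"]))
```

The returned tokenizer and model can then be passed to the `Trainer` API or a custom PyTorch training loop, as described in the Hugging Face docs.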

✏️ Citation

If you find our paper and resources useful, please consider citing our work!

@inproceedings{kawintiranon2022polibertweet,
  title     = {{P}oli{BERT}weet: A Pre-trained Language Model for Analyzing Political Content on {T}witter},
  author    = {Kawintiranon, Kornraphop and Singh, Lisa},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference (LREC)},
  year      = {2022},
  pages     = {7360--7367},
  publisher = {European Language Resources Association},
  url       = {https://aclanthology.org/2022.lrec-1.801}
}