
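TweetNLP

TweetNLP for all the NLP enthusiasts working on Twitter and social media! The Python library tweetnlp provides a collection of useful tools to analyze and understand tweets, such as sentiment analysis, emoji prediction, and named-entity recognition, powered by state-of-the-art language models specialized for social media.

News (December 2022): We presented the TweetNLP demo paper ("TweetNLP: Cutting-Edge Natural Language Processing for Social Media") at EMNLP 2022. The final version can be found here.

TweetNLP Hugging Face page: all the main TweetNLP models can be found on Hugging Face.

Resources:

Quick tour with Colab notebook:
Play with the TweetNLP online demo: link
EMNLP 2022 paper: link
Tutorial at the 2nd Cardiff NLP Summer Workshop:
Tutorial at the 2nd Cardiff NLP Summer Workshop (solutions):

Load Models and Datasets
Fine-tune Models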

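Get Started

Install TweetNLP via pip on your console.

pip install tweetnlp

Models and Datasets

In this section, you will learn how to get models and datasets with tweetnlp. The models follow the Hugging Face model format, and the datasets are provided in the Hugging Face datasets format. Easy introductions to Hugging Face models and datasets can be found on the Hugging Face website, so please check them if you are new to Hugging Face.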

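Tweet Classification

The classification module consists of seven different tasks (topic classification, sentiment analysis, irony detection, hate speech detection, offensive language detection, emoji prediction, and emotion analysis). In each example, the model is instantiated with tweetnlp.load_model("task-name"), and predictions are run by passing a text or a list of texts to the corresponding function.

Topic Classification: The aim of this task is, given a tweet, to assign topics related to its content. The task is formulated as a supervised multi-label classification problem in which each tweet is assigned one or more of the 19 available topics. The topics were carefully curated based on Twitter trends, with the aim of being broad and general, and include classes such as arts and culture, music, and sports. Our internally annotated dataset contains over 10K manually labeled tweets (check the paper here, or the Hugging Face dataset page).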
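A multi-label target like this is often stored as a multi-hot vector with one slot per topic. The sketch below illustrates that encoding only; the `label2id` mapping in it is hypothetical (built from the 19 topic names that appear in the multi-label model output in this section) and is not necessarily how TweetNLP or its dataset stores labels.

```python
from typing import Dict, List

# Topic names as they appear in the multi-label model output in this section.
TOPICS = [
    "arts_&_culture", "business_&_entrepreneurs", "celebrity_&_pop_culture",
    "diaries_&_daily_life", "family", "fashion_&_style", "film_tv_&_video",
    "fitness_&_health", "food_&_dining", "gaming", "learning_&_educational",
    "music", "news_&_social_concern", "other_hobbies", "relationships",
    "science_&_technology", "sports", "travel_&_adventure", "youth_&_student_life",
]

# Hypothetical label2id: topic name -> column index (assumption for illustration).
label2id: Dict[str, int] = {name: i for i, name in enumerate(TOPICS)}


def to_multi_hot(topics: List[str], label2id: Dict[str, int]) -> List[int]:
    """Encode a list of topic names as a 0/1 vector with one slot per topic."""
    vector = [0] * len(label2id)
    for topic in topics:
        vector[label2id[topic]] = 1
    return vector


def from_multi_hot(vector: List[int], label2id: Dict[str, int]) -> List[str]:
    """Decode a 0/1 vector back to topic names, in column order."""
    id2label = {i: name for name, i in label2id.items()}
    return [id2label[i] for i, flag in enumerate(vector) if flag]


encoded = to_multi_hot(["music", "celebrity_&_pop_culture"], label2id)
print(sum(encoded), len(encoded))         # 2 19
print(from_multi_hot(encoded, label2id))  # ['celebrity_&_pop_culture', 'music']
```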

import tweetnlp

# MULTI-LABEL MODEL
model = tweetnlp.load_model('topic_classification')  # Or `model = tweetnlp.TopicClassification()`
model.topic("Jacob Collier is a Grammy-awarded English artist from London.")  # Or `model.predict`
>>> {'label': ['celebrity_&_pop_culture', 'music']}
# Note: the probabilities of the multi-label model are sigmoid outputs of a binary prediction of whether each topic is positive or negative.
model.topic("Jacob Collier is a Grammy-awarded English artist from London.", return_probability=True)
>>> {'label': ['celebrity_&_pop_culture', 'music'],
 'probability': {'arts_&_culture': 0.037371691316366196,
  'business_&_entrepreneurs': 0.010188567452132702,
  'celebrity_&_pop_culture': 0.92448890209198,
  'diaries_&_daily_life': 0.03425711765885353,
  'family': 0.00796138122677803,
  'fashion_&_style': 0.020642118528485298,
  'film_tv_&_video': 0.08062587678432465,
  'fitness_&_health': 0.006343095097690821,
  'food_&_dining': 0.0042883665300905704,
  'gaming': 0.004327300935983658,
  'learning_&_educational': 0.010652057826519012,
  'music': 0.8291937112808228,
  'news_&_social_concern': 0.24688217043876648,
  'other_hobbies': 0.020671198144555092,
  'relationships': 0.020371075719594955,
  'science_&_technology': 0.0170074962079525,
  'sports': 0.014291072264313698,
  'travel_&_adventure': 0.010423899628221989,
  'youth_&_student_life': 0.008605164475739002}}

# SINGLE-LABEL MODEL
model = tweetnlp.load_model('topic_classification', multi_label=False)  # Or `model = tweetnlp.TopicClassification(multi_label=False)`
model.topic("Jacob Collier is a Grammy-awarded English artist from London.")
>>> {'label': 'pop_culture'}
# Note: the probabilities of the single-label model are a softmax over the labels.
model.topic("Jacob Collier is a Grammy-awarded English artist from London.", return_probability=True)
>>> {'label': 'pop_culture',
 'probability': {'arts_&_culture': 9.20625461731106e-05,
  'business_&_entrepreneurs': 6.916998972883448e-05,
  'pop_culture': 0.9995898604393005,
  'daily_life': 0.00011083036952186376,
  'sports_&_gaming': 8.668467489769682e-05,
  'science_&_technology': 5.152115045348182e-05}}

# GET DATASET
dataset_multi_label, label2id_multi_label = tweetnlp.load_dataset('topic_classification')
dataset_single_label, label2id_single_label = tweetnlp.load_dataset('topic_classification', multi_label=False)
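The two notes in the example above distinguish how the multi-label and single-label topic models turn scores into probabilities. The following is a minimal sketch of that difference, assuming raw per-label logits are available; the logits are hypothetical, and the 0.5 decision threshold for the multi-label case is an assumption (it is consistent with the output above, where only `celebrity_&_pop_culture` at 0.92 and `music` at 0.83 exceed it).

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Logistic function: maps one logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Normalise logits into a distribution over labels that sums to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - m) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict_multi_label(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label: keep every label whose sigmoid probability reaches the threshold."""
    return [label for label, z in logits.items() if sigmoid(z) >= threshold]


def predict_single_label(logits: Dict[str, float]) -> str:
    """Single-label: the label with the highest softmax probability."""
    probs = softmax(logits)
    return max(probs, key=probs.get)


# Hypothetical logits: sigmoid(2.5) ~ 0.92, sigmoid(1.6) ~ 0.83, sigmoid(-1.1) ~ 0.25.
multi_logits = {"celebrity_&_pop_culture": 2.5, "music": 1.6, "news_&_social_concern": -1.1}
print(predict_multi_label(multi_logits))  # ['celebrity_&_pop_culture', 'music']

single_logits = {"pop_culture": 4.0, "daily_life": 1.0, "sports_&_gaming": 0.5}
print(predict_single_label(single_logits))  # pop_culture
print(round(sum(softmax(single_logits).values()), 6))  # 1.0
```

Because each label gets its own sigmoid, several labels (or none) can pass the threshold, which is why the multi-label probabilities above do not sum to 1, while the single-label softmax probabilities always do.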

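Sentiment Analysis: The sentiment analysis task integrated in TweetNLP is a simplified version in which the goal is to predict the sentiment of a tweet with one of three labels: positive, neutral, or negative. The base dataset for English is the unified TweetEval version of the SemEval-2017 dataset from the task on Sentiment Analysis in Twitter (check the paper here).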

import tweetnlp

# ENGLISH MODEL
model = tweetnlp.load_model('sentiment')  # Or `model = tweetnlp.Sentiment()`
model.sentiment("Yes, including Medicare and social security saving?")  # Or `model.predict`
>>> {'label': 'positive'}
model.sentiment("Yes, including Medicare and social security saving?", return_probability=True)
>>> {'label': 'positive', 'probability': {'negative': 0.004584966693073511, 'neutral': 0.19360853731632233, 'positive': 0.8018065094947815}}

# MULTILINGUAL MODEL
model = tweetnlp.load_model('sentiment', multilingual=True)  # Or `model = tweetnlp.Sentiment(multilingual=True)`
model.sentiment("天気が良いとやっぱり気持ち良いなあ")
>>> {'label': 'positive'}
model.sentiment("天気が良いとやっぱり気持ち良いなあ", return_probability=True)
>>> {'label': 'positive', 'probability': {'negative': 0.028369612991809845, 'neutral': 0.08128828555345535, 'positive': 0.8903420567512512}}

# GET DATASET (ENGLISH)
dataset, label2id = tweetnlp.load_dataset('sentiment')
# GET DATASET (MULTILINGUAL)
for l in ['all', 'arabic', 'english', 'french', 'german', 'hindi', 'italian', 'portuguese', 'spanish']:
    dataset_multilingual, label2id_multilingual = tweetnlp.load_dataset('sentiment', multilingual=True, task_language=l)
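`load_dataset` returns the dataset together with a label-to-id mapping. Assuming that mapping is a plain `dict` from label name to integer id (the concrete ids below are hypothetical; check the mapping actually returned by `tweetnlp.load_dataset('sentiment')`), integer labels from the dataset can be turned back into names by inverting it:

```python
from collections import Counter
from typing import Dict, List

# Hypothetical label2id for the three-way sentiment task (assumption for illustration).
label2id: Dict[str, int] = {"negative": 0, "neutral": 1, "positive": 2}
id2label: Dict[int, str] = {i: name for name, i in label2id.items()}

# Hypothetical integer labels, standing in for the label column of a dataset split.
gold_ids: List[int] = [2, 1, 1, 0, 2, 2]

gold_names = [id2label[i] for i in gold_ids]
print(gold_names[:3])                      # ['positive', 'neutral', 'neutral']
print(Counter(gold_names).most_common(1))  # [('positive', 3)]
```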

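Irony Detection: This is a binary classification task in which, given a tweet, the goal is to detect whether it is ironic or not. It is based on the Irony Detection dataset from the SemEval 2018 task (check the paper here).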

import tweetnlp

# MODEL
model = tweetnlp.load_model('irony')  # Or `model = tweetnlp.Irony()`
model.irony('If you wanna look like a badass, have drama on social media')  # Or `model.predict`
>>> {'label': 'irony'}
model.irony('If you wanna look like a badass, have drama on social media', return_probability=True)
>>> {'label': 'irony', 'probability': {'non_irony': 0.08390884101390839, 'irony': 0.9160911440849304}}

# GET DATASET
dataset, label2id = tweetnlp.load_dataset('irony')
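For a two-class model such as irony (and likewise hate and offensive below), the two probabilities returned with `return_probability=True` sum to one (0.0839 + 0.9161 in the output above). Assuming they come from a softmax over two logits, the positive-class probability is simply a sigmoid of the logit difference. A small check with hypothetical logits, not taken from the real model:

```python
import math
from typing import Tuple


def softmax2(z0: float, z1: float) -> Tuple[float, float]:
    """Softmax over two logits, returned as (p0, p1)."""
    m = max(z0, z1)  # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical logits for (non_irony, irony).
z_non_irony, z_irony = -1.2, 1.19
p_non_irony, p_irony = softmax2(z_non_irony, z_irony)

# e^z1 / (e^z0 + e^z1) = 1 / (1 + e^(z0 - z1)) = sigmoid(z1 - z0)
print(math.isclose(p_irony, sigmoid(z_irony - z_non_irony)))  # True
print(math.isclose(p_non_irony + p_irony, 1.0))               # True
label = "irony" if p_irony > p_non_irony else "non_irony"
print(label)  # irony
```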

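Hate Speech Detection: The hate speech detection task consists of detecting whether a tweet is hateful towards a target community. The underlying model is based on a unified suite of hate speech detection datasets (see the reference paper).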

import tweetnlp

# MODEL
model = tweetnlp.load_model('hate')  # Or `model = tweetnlp.Hate()`
model.hate('Whoever just unfollowed me you a bitch')  # Or `model.predict`
>>> {'label': 'non-hate'}
model.hate('Whoever just unfollowed me you a bitch', return_probability=True)
>>> {'label': 'non-hate', 'probability': {'non-hate': 0.7263831496238708, 'hate': 0.27361682057380676}}

# GET DATASET
dataset, label2id = tweetnlp.load_dataset('hate')

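Offensive Language Identification: This task consists of identifying whether some form of offensive language is present in a tweet. For our benchmark we rely on the SemEval 2019 OffensEval dataset (check the paper here).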

import tweetnlp

# MODEL
model = tweetnlp.load_model('offensive')  # Or `model = tweetnlp.Offensive()`
model.offensive("All two of them taste like ass.")  # Or `model.predict`
>>> {'label': 'offensive'}
model.offensive("All two of them taste like ass.", return_probability=True)
>>> {'label': 'offensive', 'probability': {'non-offensive': 0.16420328617095947, 'offensive': 0.8357967734336853}}

# GET DATASET
dataset, label2id = tweetnlp.load_dataset('offensive')

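Emoji Prediction: The goal of emoji prediction is to predict the final emoji of a given tweet. The dataset used to fine-tune our models is the TweetEval adaptation of the SemEval 2018 task on Emoji Prediction (check the paper here), which includes 20 emojis as labels (❤, 😍, 😂, 💕, 🔥, 😊, 😎, ✨, 💙, 😘, 📷, 🇺🇸, ☀, 💜, 😉, 💯, 😁, 🎄, 📸, 😜).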

import tweetnlp

# MODEL
model = tweetnlp.load_model('emoji')  # Or `model = tweetnlp.Emoji()`
model.emoji('Beautiful sunset last night from the pontoon @TupperLakeNY')  # Or `model.predict`
>>> {'label': '📷'}
model.emoji('Beautiful sunset last night from the pontoon @TupperLakeNY', return_probability=True)
>>> {'label': '📷',
 'probability': {'❤': 0.13197319209575653,
  '😍': 0.11246423423290253,
  '😂': 0.008415069431066513,
  '💕': 0.04842926934361458,
  '🔥': 0.014528146013617516,
  '😊': 0.1509675830602646,
  '😎': 0.08625403046607971,
  '✨': 0.01616635173559189,
  '💙': 0.07396604865789413,
  '😘': 0.03033279813826084,
  '📷': 0.16525287926197052,
  '🇺🇸': 0.020336611196398735,
  '☀': 0.00799981877207756,
  '💜': 0.016111424192786217,
  '😉': 0.012984540313482285,
  '💯': 0.012557178735733032,
  '😁': 0.031386848539114,
  '🎄': 0.006829539313912392,
  '📸': 0.04188741743564606,
  '😜': 0.011156936176121235}}

# GET DATASET
dataset, label2id = tweetnlp.load_dataset('emoji')
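Since the emoji model spreads probability over 20 labels, the single top label can hide close runners-up: in the output above, 📷 at 0.165 is closely followed by 😊 at 0.151 and ❤ at 0.132. A minimal sketch for listing the top-k labels from a `probability` dict, using a rounded subset of the values shown above:

```python
import heapq
from typing import Dict, List, Tuple


def top_k(probs: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (label, probability) pairs with the highest probability."""
    return heapq.nlargest(k, probs.items(), key=lambda item: item[1])


# Rounded subset of the probabilities returned for the sunset tweet above.
probs = {"❤": 0.1320, "😍": 0.1125, "😊": 0.1510, "😎": 0.0863, "📷": 0.1653, "💙": 0.0740}
for label, p in top_k(probs, k=3):
    print(label, p)
# 📷 0.1653
# 😊 0.151
# ❤ 0.132
```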

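Emotion Recognition: Given a tweet, this task consists of associating it with its most appropriate emotion. As a reference dataset we use the SemEval 2018 task on Affect in Tweets (check the paper here). The latest multi-label model includes eleven emotion types.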

import tweetnlp

# MULTI-LABEL MODEL
model = tweetnlp.load_model('emotion')  # Or `model = tweetnlp.Emotion()`
model.emotion('I love swimming for the same reason I love meditating...the feeling of weightlessness.')  # Or `model.predict`
>>> {'label': 'joy'}
# Note: the probabilities of the multi-label model are sigmoid outputs of a binary prediction of whether each emotion is positive or negative.
model.emotion('I love swimming for the same reason I love meditating...the feeling of weightlessness.', return_probability=True)
>>> {'label': 'joy',
 'probability': {'anger': 0.00025800734874792397,
  'anticipation': 0.0005329723935574293,
  'disgust': 0.00026112011983059347,
  'fear': 0.00027552215033210814,
  'joy': 0.7721399068832397,
  'love': 0.1806265264749527,
  'optimism': 0.04208092764019966,
  'pessimism': 0.00025325192837044597,
  'sadness': 0.0006160663324408233,
  'surprise': 0.0005619609728455544,
  'trust': 0.002393839880824089}}

# SINGLE-LABEL MODEL
model = tweetnlp.load_model('emotion', multi_label=False)  # Or `model = tweetnlp.Emotion(multi_label=False)`
model.emotion('I love swimming for the same reason I love meditating...the feeling of weightlessness.')  # Or `model.predict`
>>> {'label': 'optimism'}
# Note: the probabilities of the single-label model are a softmax over the labels.
model.emotion('I love swimming for the same reason I love meditating...the feeling of weightlessness.', return_probability=True)
>>> {'label': 'optimism', 'probability': {'joy': 0.01367587223649025, 'optimism': 0.7345258593559265, 'anger': 0.1770714670419693, 'sadness': 0.07472680509090424}}

# GET DATASET
dataset, label2id = tweetnlp.load_dataset('emotion')

WARNING: The single-label and multi-label emotion models have different label sets (the single-label model has four classes: 'joy'/'optimism'/'anger'/'sadness', while the multi-label model has eleven: 'joy'/'optimism'/'anger'/'sadness'/'love'/'trust'/'fear'/'surprise'/'anticipation'/'disgust'/'pessimism').
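Because of this difference, outputs of the two emotion models are not directly comparable. A quick check with Python sets, using the label names from the warning, shows that the four single-label classes are a subset of the eleven multi-label ones. The last lines restrict a rounded excerpt of the multi-label probabilities shown above to the shared labels for side-by-side inspection; note these remain independent sigmoid outputs and are not renormalized.

```python
SINGLE_LABEL = {"joy", "optimism", "anger", "sadness"}
MULTI_LABEL = {
    "joy", "optimism", "anger", "sadness", "love", "trust",
    "fear", "surprise", "anticipation", "disgust", "pessimism",
}

print(SINGLE_LABEL <= MULTI_LABEL)        # True
print(sorted(MULTI_LABEL - SINGLE_LABEL))
# ['anticipation', 'disgust', 'fear', 'love', 'pessimism', 'surprise', 'trust']

# Rounded multi-label probabilities from the example above, restricted to shared labels.
multi_probs = {"joy": 0.7721, "love": 0.1806, "optimism": 0.0421, "anger": 0.0003, "sadness": 0.0006}
shared = {k: v for k, v in multi_probs.items() if k in SINGLE_LABEL}
print(shared)  # {'joy': 0.7721, 'optimism': 0.0421, 'anger': 0.0003, 'sadness': 0.0006}
```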

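Named Entity Recognition

This module consists of a named-entity recognition (NER) model specifically trained for tweets. The model is instantiated with tweetnlp.load_model("ner"), and predictions are run by passing a text or a list of texts to the ner function (check the paper here, or the Hugging Face dataset page).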

import tweetnlp

# MODEL
model = tweetnlp.load_model('ner')  # Or `model = tweetnlp.NER()`
model.ner('Jacob Collier is a Grammy-awarded English artist from London.')  # Or `model.predict`
>>> [{'type': 'person', 'entity': 'Jacob Collier'}, {'type': 'event', 'entity': ' Grammy'}, {'type': 'location', 'entity': ' London'}]
# Note: the probability of a predicted entity is the mean of the probabilities over the sub-tokens representing that entity.
model.ner('Jacob Collier is a Grammy-awarded English artist from London.', return_probability=True)  # Or `model.predict`
>>> [
  {'type': 'person', 'entity': 'Jacob Collier', 'probability': 0.9905318220456442},
  {'type': 'event', 'entity': ' Grammy', 'probability': 0.19164378941059113},
  {'type': 'location', 'entity': ' London', 'probability': 0.9607000350952148}
]

# GET DATASET
dataset, label2id = tweetnlp.load_dataset('ner')
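The note in the NER example says an entity's probability is the mean over the probabilities of its sub-tokens. The sketch below illustrates that aggregation, assuming a BIO tagging scheme (tags such as `B-person` / `I-person` / `O`); the sub-tokens, tags and scores are hypothetical and this is not TweetNLP's internal code.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# (sub-token text, BIO tag, probability of that tag): hypothetical model output.
SubToken = Tuple[str, str, float]


def _emit(entities: List[Dict[str, object]], etype: Optional[str],
          pieces: List[str], probs: List[float]) -> None:
    """Append the entity being built, if any, with its mean sub-token probability."""
    if etype is not None:
        entities.append({"type": etype, "entity": "".join(pieces), "probability": mean(probs)})


def group_entities(tokens: List[SubToken]) -> List[Dict[str, object]]:
    """Merge B-/I- sub-tokens into entity spans and average their probabilities."""
    entities: List[Dict[str, object]] = []
    etype: Optional[str] = None
    pieces: List[str] = []
    probs: List[float] = []
    for piece, tag, prob in tokens:
        if tag.startswith("I-") and tag[2:] == etype:
            # Continuation of the currently open entity.
            pieces.append(piece)
            probs.append(prob)
            continue
        _emit(entities, etype, pieces, probs)
        if tag == "O":
            etype, pieces, probs = None, [], []
        else:
            # "B-x" (or a stray "I-x") starts a new entity.
            etype, pieces, probs = tag[2:], [piece], [prob]
    _emit(entities, etype, pieces, probs)
    return entities


tokens = [
    ("Jacob", "B-person", 0.99), (" Coll", "I-person", 0.98), ("ier", "I-person", 0.97),
    (" is", "O", 0.99), (" from", "O", 0.99), (" London", "B-location", 0.96), (".", "O", 0.99),
]
for ent in group_entities(tokens):
    print(ent["type"], repr(ent["entity"]), round(ent["probability"], 3))
# person 'Jacob Collier' 0.98
# location ' London' 0.96
```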

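Question Answering

This module consists of a question answering model specifically trained for tweets. The model is instantiated with tweetnlp.load_model("question_answering"), and predictions are run by passing a question or a list of questions, along with a context or a list of contexts, to the question_answering function (check the paper here, or the Hugging Face dataset page).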

import tweetnlp

# MODEL
model = tweetnlp.load_model('question_answering')  # Or `model = tweetnlp.QuestionAnswering()`
model.question_answering(
    question='who created the post as we know it today?',
    context="'So much of The Post is Ben,' Mrs. Graham said in 1994, three years after Bradlee retired as editor. 'He created it as we know it today.'— Ed O'Keefe (@edatpost) October 21, 2014"
)  # Or `model.predict`
>>> {'generated_text': 'ben'}

# GET DATASET
dataset = tweetnlp.load_dataset('question_answering')

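Question Answer Generation

This module consists of a question-answer pair generation model specifically trained for tweets. The model is instantiated with tweetnlp.load_model("question_answer_generation"), and predictions are run by passing a context or a list of contexts to the question_answer_generation function (check the paper here, or the Hugging Face dataset page).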

import tweetnlp

# MODEL
model = tweetnlp.load_model('question_answer_generation')  # Or `model = tweetnlp.QuestionAnswerGeneration()`
model.question_answer_generation(
    text="'So much of The Post is Ben,' Mrs. Graham said in 1994, three years after Bradlee retired as editor. 'He created it as we know it today.'— Ed O'Keefe (@edatpost) October 21, 2014"
)  # Or `model.predict`
>>> [
    {'question': 'who created the post?', 'answer': 'ben'},
    {'question': 'what did ben do in 1994?', 'answer': 'he retired as editor'}
]

# GET DATASET
dataset = tweetnlp.load_dataset('question_answer_generation')

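Language Modeling

A masked language model predicts the masked token in a given sentence. It is instantiated with tweetnlp.load_model('language_model'), and predictions are run by passing a text or a list of texts to the mask_prediction function. Please make sure that each text contains a <mask> token, since that is the target the model will predict.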

import tweetnlp
model = tweetnlp.load_model('language_model')  # Or `model = tweetnlp.LanguageModel()`
model.mask_prediction("How many more <mask> until opening day? ?", best_n=2)  # Or `model.predict`
>>> {'best_tokens': ['days', 'hours'],
 'best_scores': [5.498564104033932e-11, 4.906026140893971e-10],
 'best_sentences': ['How many more days until opening day? ?',
  'How many more hours until opening day? ?']}
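In the masked-LM output, each candidate in `best_tokens` has a matching entry in `best_sentences` obtained by substituting it for `<mask>`. The helpers below are hypothetical (not part of TweetNLP): one checks the input before it is sent to the model, and the other rebuilds candidate sentences from predicted tokens. Requiring exactly one `<mask>` is an assumption of this sketch; the README only asks that each text contain the token.

```python
from typing import List

MASK = "<mask>"


def check_masked(text: str) -> str:
    """Raise if `text` does not contain exactly one <mask> token (assumed requirement)."""
    count = text.count(MASK)
    if count != 1:
        raise ValueError(f"expected exactly one {MASK} token, found {count}: {text!r}")
    return text


def fill_mask(text: str, tokens: List[str]) -> List[str]:
    """Substitute each candidate token for <mask>, mirroring `best_sentences`."""
    check_masked(text)
    return [text.replace(MASK, token) for token in tokens]


print(fill_mask("How many more <mask> until opening day?", ["days", "hours"]))
# ['How many more days until opening day?', 'How many more hours until opening day?']
```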

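Tweet Embedding

The tweet embedding model produces a fixed-length embedding for a tweet. The embedding represents the meaning of the tweet, so it can be used for semantic search over tweets by measuring the similarity between embeddings. The model is instantiated with tweetnlp.load_model('sentence_embedding'), and predictions are run by passing a text or a list of texts to the embedding function.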
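Similarity search compares embedding vectors. Cosine similarity is a common choice, but this README does not say which measure `model.similarity` uses, so treat the following as a sketch of the idea on toy 3-dimensional vectors (the embeddings from this model are 768-dimensional):

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query: Sequence[float], corpus: List[Sequence[float]], k: int = 2) -> List[Tuple[int, float]]:
    """Return (index, score) for the k corpus vectors most similar to the query."""
    scores = [(i, cosine(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


query = [1.0, 0.0, 1.0]
corpus = [[1.0, 0.0, 0.9], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
print(rank(query, corpus))  # index 0 (~0.999) first, then index 2 (~0.816)
```

Because cosine similarity ignores vector length, two tweets are scored as similar when their embeddings point in the same direction, regardless of magnitude.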

Get Embedding

import tweetnlp
model = tweetnlp.load_model('sentence_embedding')  # Or `model = tweetnlp.SentenceEmbedding()`

# Get sentence embedding
tweet = "I will never understand the decision making of the people of Alabama. Their new Senator is a definite downgrade. You have served with honor.  Well done."
vectors = model.embedding(tweet)
vectors.shape
>>> (768,)

# Get sentence embedding (multiple inputs)
tweet_corpus = [
    "Free, fair elections are the lifeblood of our democracy. Charges of unfairness are serious. But calling an election unfair does not make it so. Charges require specific allegations and then proof. We have neither here." ,
    "Trump appointed judge Stephanos Bibas " ,
    "If your members can go to Puerto Rico they can get their asses back in the classroom. @CTULocal1" ,
    "@PolitiBunny @CTULocal1 Political leverage, science said schools could reopen, teachers and unions protested to keep'em closed and made demands for higher wages and benefits, they're usin Covid as a crutch at the expense of life and education." ,
    "Congratulations to all the exporters on achieving record exports in Dec 2020 with a growth of 18 % over the previous year. Well done &amp; keep up this trend. A major pillar of our govt's economic policy is export enhancement &amp; we will provide full support to promote export culture." ,
    "@ImranKhanPTI Pakistan seems a worst country in term of exporting facilities. I am a small business man and if I have to export a t-shirt having worth of $5 to USA or Europe. Postal cost will be around $30. How can we grow as an exporting country if this situation prevails. Think about it. #PM" ,
    "The thing that doesn’t sit right with me about “nothing good happened in 2020” is that it ignores the largest protest movement in our history. The beautiful, powerful Black Lives Matter uprising reached every corner of the country and should be central to our look back at 2020." ,
    "@JoshuaPotash I kinda said that in the 2020 look back for @washingtonpost" ,
    "Is this a confirmation from Q that Lin is leaking declassified intelligence to the public? I believe so. If @realDonaldTrump didn’t approve of what @LLinWood is doing he would have let us know a lonnnnnng time ago. I’ve always wondered why Lin’s Twitter handle started with “LLin” https://t.co/0G7zClOmi2" ,
    "@ice_qued @realDonaldTrump @LLinWood Yeah 100%" ,
    "Tomorrow is my last day as Senator from Alabama.  I believe our opportunities are boundless when we find common ground. As we swear in a new Congress &amp; a new President, demand from them that they do just that &amp; build a stronger, more just society.  It’s been an honor to serve you." 
    "The mask cult can’t ever admit masks don’t work because their ideology is based on feeling like a “good person”  Wearing a mask makes them a “good person” &amp; anyone who disagrees w/them isn’t  They can’t tolerate any idea that makes them feel like their self-importance is unearned" ,
    "@ianmSC Beyond that, they put such huge confidence in masks so early with no strong evidence that they have any meaningful benefit, they don’t want to backtrack or admit they were wrong. They put the cart before the horse, now desperate to find any results that match their hypothesis." ,
]
vectors = model.embedding(tweet_corpus, batch_size=4)
vectors.shape
>>> (12, 768)

Similarity Search

sims = []
for n, i in enumerate(tweet_corpus):
    _sim = model.similarity(tweet, i)
    sims.append([n, _sim])
print(f'anchor tweet: {tweet}\n')
for m, (n, s) in enumerate(sorted(sims, key=lambda x: x[1], reverse=True)[:3]):
    print(f' - top {m}: {tweet_corpus[n]}\n - similarity: {s}\n')

>>> anchor tweet: I will never understand the decision making of the people of Alabama. Their new Senator is a definite downgrade. You have served with honor.  Well done.

 - top 0: Tomorrow is my last day as Senator from Alabama.  I believe our opportunities are boundless when we find common ground. As we swear in a new Congress &amp; a new President, demand from them that they do just that &amp; build a stronger, more just society.  It’s been an honor to serve you.The mask cult can’t ever admit masks don’t work because their ideology is based on feeling like a “good person”  Wearing a mask makes them a “good person” &amp; anyone who disagrees w/them isn’t  They can’t tolerate any idea that makes them feel like their self-importance is unearned
 - similarity: 0.7480925982953287

 - top 1: Trump appointed judge Stephanos Bibas 
 - similarity: 0.6289173306344258

 - top 2: Free, fair elections are the lifeblood of our democracy. Charges of unfairness are serious. But calling an election unfair does not make it so. Charges require specific allegations and then proof. We have neither here.
 - similarity: 0.6017154109745276

Resources and Custom Model Loading

Here is a table of the default model used for each task.

Task	Model	Dataset
Topic Classification (single-label)	cardiffnlp/twitter-roberta-base-dec2021-tweet-topic-single-all	cardiffnlp/tweet_topic_single
Topic Classification (multi-label)	cardiffnlp/twitter-roberta-base-dec2021-tweet-topic-multi-all	cardiffnlp/tweet_topic_multi
Sentiment Analysis (multilingual)	cardiffnlp/twitter-xlm-roberta-base-sentiment	cardiffnlp/tweet_sentiment_multilingual
Sentiment Analysis	cardiffnlp/twitter-roberta-base-sentiment-latest	tweet_eval
Irony Detection	cardiffnlp/twitter-roberta-base-irony	tweet_eval
Hate Detection	cardiffnlp/twitter-roberta-base-hate-latest	tweet_eval
Offensive Detection	cardiffnlp/twitter-roberta-base-offensive	tweet_eval
Emoji Prediction	cardiffnlp/twitter-roberta-base-emoji	tweet_eval
Emotion Analysis (single-label)	cardiffnlp/twitter-roberta-base-emotion	tweet_eval
Emotion Analysis (multi-label)	cardiffnlp/twitter-roberta-base-emotion-multilabel-latest	TBA
Named Entity Recognition	tner/roberta-large-tweetner7-all	tner/tweetner7
Question Answering	lmqg/t5-small-tweetqa-qa	lmqg/qg_tweetqa
Question Answer Generation	lmqg/t5-base-tweetqa-qag	lmqg/qag_tweetqa
Language Modeling	cardiffnlp/twitter-roberta-base-2021-124m	TBA
Tweet Embedding	cambridgeltl/tweet-roberta-base-embeddings-v1	TBA

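To use another model from a local path or the Hugging Face Model Hub, simply pass its path or alias to the load_model function through model_name. Below is an example of loading an NER model.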

import tweetnlp
tweetnlp.load_model('ner', model_name='tner/twitter-roberta-base-2019-90m-tweetner7-continuous')

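Model Fine-tuning

TweetNLP provides a simple interface for fine-tuning language models on the supported datasets, using Hugging Face for model hosting and fine-tuning and Ray Tune for hyperparameter search.

Supported tasks: sentiment, offensive, irony, hate, emotion, topic_classification

The results of experiments with the tweetnlp trainer are shown in the table below. The results are competitive and can be used as a baseline for each task. See the leaderboard page for more information about the results.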

Task	language_model	eval_f1	eval_f1_macro	eval_accuracy	link
emoji	cardiffnlp/twitter-roberta-base-2021-124m	0.46	0.35	0.46	cardiffnlp/twitter-roberta-base-2021-124m-emoji
sentiment	cardiffnlp/twitter-roberta-base-2021-124m	0.83	0.79	0.83	cardiffnlp/twitter-roberta-base-2021-124m-sentiment
hate	cardiffnlp/twitter-roberta-base-2021-124m	0.56	0.53	0.56	cardiffnlp/twitter-roberta-base-2021-124m-hate
irony	cardiffnlp/twitter-roberta-base-2021-124m	0.79	0.78	0.79	cardiffnlp/twitter-roberta-base-2021-124m-irony
offensive	cardiffnlp/twitter-roberta-base-2021-124m	0.86	0.82	0.86	cardiffnlp/twitter-roberta-base-2021-124m-offensive
emotion	cardiffnlp/twitter-roberta-base-2021-124m	0.71	0.72	0.71	cardiffnlp/twitter-roberta-base-2021-124m-emotion
topic_classification (single)	cardiffnlp/twitter-roberta-base-2021-124m	0.9	0.8	0.9	cardiffnlp/twitter-roberta-base-2021-124m-topic-single
topic_classification (multi)	cardiffnlp/twitter-roberta-base-2021-124m	0.75	0.56	0.54	cardiffnlp/twitter-roberta-base-2021-124m-topic-multi
sentiment (multilingual)	cardiffnlp/twitter-xlm-roberta-base	0.69	0.69	0.69	cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual
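The table reports both `eval_f1` and `eval_f1_macro`. In every single-label row `eval_f1` equals `eval_accuracy`, which is what happens when F1 is micro-averaged over classes; that is an inference from the numbers, not something the table states. A standard-library sketch of both averages on hypothetical predictions:

```python
from typing import Dict, List


def f1_scores(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Micro- and macro-averaged F1 for single-label predictions."""
    labels = sorted(set(gold) | set(pred))
    per_class: List[float] = []
    tp_sum = fp_sum = fn_sum = 0
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    return {
        # Micro: pool TP/FP/FN over all classes before computing F1.
        "f1_micro": 2 * tp_sum / micro_denom if micro_denom else 0.0,
        # Macro: unweighted mean of per-class F1, so rare classes count as much as frequent ones.
        "f1_macro": sum(per_class) / len(per_class) if per_class else 0.0,
    }


gold = ["irony", "irony", "non_irony", "non_irony", "non_irony"]
pred = ["irony", "non_irony", "non_irony", "non_irony", "irony"]
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
scores = f1_scores(gold, pred)
print(accuracy, round(scores["f1_micro"], 4), round(scores["f1_macro"], 4))  # 0.6 0.6 0.5833
```

In single-label classification every wrong prediction is at once a false positive for the predicted class and a false negative for the gold class, so micro-F1 reduces to accuracy. In the multi-label topic row the two differ.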

Example

The following example reproduces our irony model cardiffnlp/twitter-roberta-base-2021-124m-irony.

import logging
import tweetnlp

logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s', level=logging.INFO, datefmt='%Y-%m-%d %H:%M:%S')

# load dataset
dataset, label_to_id = tweetnlp.load_dataset("irony")
# load trainer class
trainer_class = tweetnlp.load_trainer("irony")
# set up trainer
trainer = trainer_class(
    language_model='cardiffnlp/twitter-roberta-base-2021-124m',  # language model to fine-tune
    dataset=dataset,
    label_to_id=label_to_id,
    max_length=128,
    split_test='test',
    split_train='train',
    split_validation='validation',
    output_dir='model_ckpt/irony'
)
# start model fine-tuning with parameter optimization
trainer.train(
    eval_step=50,  # every `eval_step` steps, the model is validated on the validation set
    n_trials=10,  # number of trials for parameter optimization
    search_range_lr=[1e-6, 1e-4],  # search space for the learning rate (min and max value)
    search_range_epoch=[1, 6],  # search space for the number of epochs (min and max value)
    search_list_batch=[4, 8, 16, 32, 64]  # search space for the batch size (list of integers to test)
)
# evaluate model on the test set
trainer.evaluate()
>>> {
  "eval_loss": 1.3228046894073486,
  "eval_f1": 0.7959183673469388,
  "eval_f1_macro": 0.791350632069195,
  "eval_accuracy": 0.7959183673469388,
  "eval_runtime": 2.2267,
  "eval_samples_per_second": 352.084,
  "eval_steps_per_second": 44.01
}
# save model locally (saved at `{output_dir}/best_model` by default)
trainer.save_model()
# run prediction
trainer.predict('If you wanna look like a badass, have drama on social media')
>>> {'label': 'irony'}
# push your model to the Hugging Face Hub
trainer.push_to_hub(hf_organization='cardiffnlp', model_alias='twitter-roberta-base-2021-124m-irony')

The saved checkpoint can be loaded as a custom model, as shown below.

import tweetnlp
model = tweetnlp.load_model('irony', model_name="model_ckpt/irony/best_model")

If split_validation is not given, the trainer performs a single run without parameter search.
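The `trainer.train` call above samples the learning rate, number of epochs and batch size from the given ranges over `n_trials` trials. As a rough illustration of such a loop only, here is a plain random search in the standard library. It is not Ray Tune and not TweetNLP's trainer; `toy_objective` is a hypothetical stand-in for fine-tuning plus validation, and the log-uniform learning-rate sampling and inclusive epoch bounds are assumptions.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Config = Dict[str, float]


def sample_config(rng: random.Random, lr_range: Sequence[float],
                  epoch_range: Sequence[int], batch_list: List[int]) -> Config:
    """Draw one hyperparameter configuration from the search space."""
    low, high = lr_range
    return {
        "lr": math.exp(rng.uniform(math.log(low), math.log(high))),  # log-uniform (assumption)
        "epoch": rng.randint(epoch_range[0], epoch_range[1]),        # inclusive bounds (assumption)
        "batch": rng.choice(batch_list),
    }


def random_search(objective: Callable[[Config], float], n_trials: int,
                  lr_range: Sequence[float], epoch_range: Sequence[int],
                  batch_list: List[int], seed: int = 0) -> Tuple[Optional[Config], float]:
    """Evaluate n_trials sampled configs and keep the one with the best score."""
    rng = random.Random(seed)
    best_config: Optional[Config] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng, lr_range, epoch_range, batch_list)
        score = objective(config)  # hypothetical: fine-tune with `config`, return validation F1
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config: Config) -> float:
    """Hypothetical stand-in for validation F1: prefers a learning rate near 1e-5."""
    return -abs(math.log10(config["lr"]) + 5)


best, score = random_search(toy_objective, n_trials=10, lr_range=[1e-6, 1e-4],
                            epoch_range=[1, 6], batch_list=[4, 8, 16, 32, 64])
print(best, score)
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude (1e-6 to 1e-4), which a uniform draw would not.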

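Reference Paper

For more details, please read the accompanying TweetNLP reference paper. If you use TweetNLP in your research, please cite the reference paper using the following BibTeX entry: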

@inproceedings{camacho-collados-etal-2022-tweetnlp,
    title={{T}weet{NLP}: {C}utting-{E}dge {N}atural {L}anguage {P}rocessing for {S}ocial {M}edia},
    author={Camacho-Collados, Jose and Rezaee, Kiamehr and Riahi, Talayeh and Ushio, Asahi and Loureiro, Daniel and Antypas, Dimosthenis and Boisson, Joanne and Espinosa-Anke, Luis and Liu, Fangyu and Mart{\'i}nez-C{\'a}mara, Eugenio and others},
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2022",
    address = "Abu Dhabi, U.A.E.",
    publisher = "Association for Computational Linguistics",
}
