FinBERT

***** June 2, 2022: More fine-tuned FinBERT models available *****

Visit Finbert.ai for more details on the recent development of FinBERT.

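We have fine-tuned the pretrained FinBERT model on several financial NLP tasks. In every case it outperforms traditional machine learning models, deep learning models, and fine-tuned BERT models. All fine-tuned FinBERT models are publicly hosted on Huggingface. Specifically, we provide the following: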

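FinBERT-Pretrained: the FinBERT model pretrained on large-scale financial text.
FinBERT-Sentiment: for the sentiment classification task.
FinBERT-ESG: for the ESG classification task.
FinBERT-FLS: for the forward-looking statement (FLS) classification task.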

In this GitHub repo,

finbert-demo.ipynb demonstrates how to apply the fine-tuned FinBERT models to specific NLP tasks.
Finetune.ipynb illustrates the process of fine-tuning FinBERT.

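Background:

FinBERT is a BERT model pre-trained on financial communication text. Its purpose is to advance financial NLP research and practice. It is trained on the following three financial communication corpora, with a total size of 4.9B tokens.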

Corporate Reports 10-K & 10-Q: 2.5B tokens
Earnings Call Transcripts: 1.3B tokens
Analyst Reports: 1.1B tokens
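
As a quick check of the figures above, the following minimal sketch sums the three corpus sizes and computes each corpus's share of the total. The sizes are taken from the list above; the variable names are illustrative only.

```python
# Corpus sizes in billions of tokens, as listed above.
corpus_tokens_b = {
    "Corporate Reports 10-K & 10-Q": 2.5,
    "Earnings Call Transcripts": 1.3,
    "Analyst Reports": 1.1,
}

# Round to one decimal to avoid floating-point noise in the sum.
total_b = round(sum(corpus_tokens_b.values()), 1)

# Share of each corpus in the total, as a percentage rounded to one decimal.
shares = {
    name: round(100 * size / total_b, 1)
    for name, size in corpus_tokens_b.items()
}

for name, size in corpus_tokens_b.items():
    print(f"{name}: {size}B tokens ({shares[name]}%)")
print(f"Total: {total_b}B tokens")
```

Corporate reports make up roughly half of the pre-training text, with earnings call transcripts and analyst reports splitting the remainder.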

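FinBERT achieves state-of-the-art performance on various financial NLP tasks, including sentiment analysis, ESG classification, and forward-looking statement (FLS) classification. With the release of FinBERT, we hope practitioners and researchers can use it for a wider range of applications where the prediction target goes beyond sentiment, such as financial outcomes including stock returns, stock volatilities, and corporate fraud.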

***** July 30, 2021: Migrated to Huggingface *****

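The FinBERT model fine-tuned for financial sentiment classification has been uploaded and integrated with Huggingface's transformers library. This model is fine-tuned on 10,000 manually annotated (positive, negative, neutral) sentences from analyst reports, and it achieves superior performance on the financial tone analysis task. If you are simply interested in using FinBERT for financial tone analysis, give it a try.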

from transformers import BertTokenizer, BertForSequenceClassification
import numpy as np

# Load the fine-tuned tone classifier and its tokenizer from the Hugging Face Hub
finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')

sentences = ["there is a shortage of capital, and we need extra financing",
             "growth is strong and we have plenty of liquidity",
             "there are doubts about our finances",
             "profits are flat"]

# Tokenize as one padded batch; the first element of the model output holds the logits
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
outputs = finbert(**inputs)[0]

# Map each class index to its label and print the top class per sentence
labels = {0: 'neutral', 1: 'positive', 2: 'negative'}
for idx, sent in enumerate(sentences):
    print(sent, '----', labels[np.argmax(outputs.detach().numpy()[idx])])

'''
there is a shortage of capital, and we need extra financing ---- negative
growth is strong and we have plenty of liquidity ---- positive
there are doubts about our finances ---- negative
profits are flat ---- neutral
'''
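
The snippet above reports only the top class for each sentence. If a confidence score is also useful, the raw logits can be turned into probabilities with a softmax. The sketch below is one way to do that in plain Python. The helper names `softmax` and `predict_with_confidence` and the example logits are hypothetical and not part of the FinBERT repo; the label mapping is the one used above.

```python
import math

# Index-to-label mapping used by the snippet above.
LABELS = {0: "neutral", 1: "positive", 2: "negative"}


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for one sentence, not real model output.
example_logits = [0.2, -1.5, 3.1]
label, prob = predict_with_confidence(example_logits)
print(label, round(prob, 3))
```

With the real model, the logits for sentence `idx` could be passed in as `outputs.detach().numpy()[idx].tolist()`.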

***** June 16, 2020: Pretrained FinBERT model released *****

We provide four versions of pre-trained FinBERT weights.

FinBERT-FinVocab-Uncased (recommended)
FinBERT-FinVocab-Cased
FinBERT-BaseVocab-Uncased
FinBERT-BaseVocab-Cased

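FinVocab is a new WordPiece vocabulary built on our financial corpora using the SentencePiece library. We produce both cased and uncased versions of FinVocab, with sizes of 28,573 and 30,873 tokens respectively. This is very similar to the 28,996 and 30,522 token sizes of the original BERT cased and uncased BaseVocab.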

FinVocab-Uncased
FinVocab-Cased
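
To illustrate why a domain-specific WordPiece vocabulary can matter, here is a minimal sketch of greedy longest-match-first WordPiece splitting for a single word. The two toy vocabularies are invented for illustration and are not taken from FinVocab or BaseVocab, and `wordpiece_tokenize` is a hypothetical helper, not the tokenizer shipped with FinBERT.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Toy vocabularies, invented for illustration only.
general_vocab = {"am", "##ort", "##ization", "liquid", "##ity"}
finance_vocab = general_vocab | {"amortization", "liquidity"}

for word in ["amortization", "liquidity"]:
    print(word,
          wordpiece_tokenize(word, general_vocab),
          wordpiece_tokenize(word, finance_vocab))
```

In this toy setup the general vocabulary splits each financial term into several pieces, while the finance vocabulary keeps it as a single token.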

Citation

@misc{yang2020finbert,
    title={FinBERT: A Pretrained Language Model for Financial Communications},
    author={Yi Yang and Mark Christopher Siy UY and Allen Huang},
    year={2020},
    eprint={2006.08097},
    archivePrefix={arXiv},
}