FinBERT is a BERT model pre-trained on financial communication text. Its purpose is to enhance financial NLP research and practice. It was trained on three financial communication corpora totaling 4.9B tokens: Corporate Reports (10-K & 10-Q), Earnings Call Transcripts, and Analyst Reports.
More technical details about FinBERT can be found in the FinBERT project repository.
The released finbert-tone model is the result of fine-tuning FinBERT on 10,000 manually annotated (positive, negative, neutral) sentences from analyst reports. This model achieves superior performance on the financial tone analysis task. If you simply want to use FinBERT for financial sentiment analysis, give it a try.
If you use the model in your academic work, please cite the following paper:
Huang, Allen H., Hui Wang, and Yi Yang. "FinBERT: A Large Language Model for Extracting Information from Financial Text." Contemporary Accounting Research (2022).
You can use this model with the Transformers `pipeline` for sentiment analysis.
```python
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline

finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')

nlp = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer)

sentences = ["there is a shortage of capital, and we need extra financing",
             "growth is strong and we have plenty of liquidity",
             "there are doubts about our finances",
             "profits are flat"]
results = nlp(sentences)
print(results)  # LABEL_0: neutral; LABEL_1: positive; LABEL_2: negative
```
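The pipeline returns a list of dicts with generic `LABEL_*` names and a confidence score. Below is a minimal sketch of mapping those raw outputs to readable tone names, using the label mapping stated in the comment above. The `summarize` helper is hypothetical, and `mock_results` is hand-written illustrative data, not an actual model run:

```python
# Map finbert-tone's generic label IDs to readable tone names
# (mapping taken from the comment in the snippet above).
LABELS = {"LABEL_0": "neutral", "LABEL_1": "positive", "LABEL_2": "negative"}

def summarize(sentences, results):
    """Pair each sentence with a readable tone label and a rounded score."""
    return [
        (s, LABELS.get(r["label"], r["label"]), round(r["score"], 3))
        for s, r in zip(sentences, results)
    ]

# Illustrative, hand-written pipeline output (not a real model run):
mock_results = [
    {"label": "LABEL_1", "score": 0.9981},
    {"label": "LABEL_2", "score": 0.9734},
]
summary = summarize(
    ["growth is strong and we have plenty of liquidity",
     "there are doubts about our finances"],
    mock_results,
)
print(summary)
```

The same helper works unchanged on the real `results` list returned by `nlp(sentences)`.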