Model:
Jean-Baptiste/roberta-large-financial-news-sentiment-en
This model was trained on the financial_news_sentiment_mixte_with_phrasebank_75 dataset. This is a customized version of the Financial PhraseBank dataset in which I kept only sentences validated by at least 75% of annotators. In addition, I added ~2,000 manually validated articles from Canadian financial news, so the model is more specifically trained for Canadian news. The final result is an F1 score of 93.25% overall and 83.6% on Canadian news.
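The 75% agreement rule described above can be sketched as a simple filter: a sentence is kept only if its majority label was chosen by at least 75% of annotators. This is a minimal illustration, not the author's preprocessing code; the record format, the example sentences and votes, and the helper `filter_by_agreement` are all hypothetical.

```python
from collections import Counter

# Hypothetical records: (sentence, labels given by individual annotators).
records = [
    ("Quarterly profit rose sharply.", ["positive", "positive", "positive", "neutral"]),
    ("The company will hold its meeting in May.", ["neutral", "neutral", "positive", "negative"]),
    ("Net sales decreased by 5%.", ["negative", "negative", "negative", "negative"]),
]


def filter_by_agreement(records, threshold=0.75):
    """Keep (sentence, majority_label) pairs whose majority label reaches `threshold` agreement."""
    kept = []
    for sentence, votes in records:
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= threshold:
            kept.append((sentence, label))
    return kept


print(filter_by_agreement(records))
```

Here the second sentence is dropped because its majority label only reaches 2 out of 4 votes (50%).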
Training data was labeled as follows:
class | Description |
---|---|
0 | negative |
1 | neutral |
2 | positive |
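To illustrate how the class indices in the table relate to a prediction, here is a minimal sketch that turns three raw logits into a label and a confidence score with a softmax and an argmax. It assumes the model's logits are ordered by the class indices above (0 = negative, 1 = neutral, 2 = positive); the `ID2LABEL` dict, the helper `logits_to_prediction`, and the example logits are hypothetical and not real model output.

```python
import math

# Class index -> label, following the table above (assumed to match the model's output order).
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def logits_to_prediction(logits):
    """Softmax over the class logits, then return the most probable label and its probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one sentence.
label, score = logits_to_prediction([2.1, 0.3, -1.2])
print(label, round(score, 3))
```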
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the tokenizer and the fine-tuned classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/roberta-large-financial-news-sentiment-en")
model = AutoModelForSequenceClassification.from_pretrained("Jean-Baptiste/roberta-large-financial-news-sentiment-en")

# Process a text sample (a financial news release)
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = pipe("Melcor REIT (TSX: MR.UN) today announced results for the third quarter ended September 30, 2022. Revenue was stable in the quarter and year-to-date. Net operating income was down 3% in the quarter at $11.61 million due to the timing of operating expenses and inflated costs including utilities like gas/heat and power")
print(result)
# [{'label': 'negative', 'score': 0.9399105906486511}]
```
Overall F1 score (macro average)
precision | recall | f1 |
---|---|---|
0.9355 | 0.9299 | 0.9325 |
By class
class | precision | recall | f1 |
---|---|---|---|
negative | 0.9605 | 0.9240 | 0.9419 |
neutral | 0.9538 | 0.9459 | 0.9498 |
positive | 0.8922 | 0.9200 | 0.9059 |
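As a consistency check between the two tables, the overall scores can be recomputed from the per-class scores as unweighted (macro) means, and each per-class F1 as the harmonic mean of its precision and recall. This is a small sketch over the rounded values copied from the tables; the helper names `f1` and `macro_average` are illustrative only.

```python
# Per-class (precision, recall, f1), copied from the table above.
per_class = {
    "negative": (0.9605, 0.9240, 0.9419),
    "neutral": (0.9538, 0.9459, 0.9498),
    "positive": (0.8922, 0.9200, 0.9059),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(scores):
    """Unweighted mean of each metric across classes (macro averaging)."""
    n = len(scores)
    p = sum(s[0] for s in scores.values()) / n
    r = sum(s[1] for s in scores.values()) / n
    f = sum(s[2] for s in scores.values()) / n
    return p, r, f


# Each reported per-class F1 should match the harmonic mean of its precision and recall.
for name, (p, r, reported) in per_class.items():
    print(f"{name}: computed f1={f1(p, r):.4f}, reported f1={reported:.4f}")

macro_p, macro_r, macro_f1 = macro_average(per_class)
print(f"macro precision={macro_p:.4f} recall={macro_r:.4f} f1={macro_f1:.4f}")
```

The macro recall comes out at 0.9300 rather than 0.9299, presumably because the per-class values in the table are themselves rounded. The reported overall F1 (0.9325) matches the mean of the per-class F1 scores; taking the harmonic mean of the macro precision and recall instead would give about 0.9327.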