Model:
hakonmh/topic-xdistil-uncased
Topic-xDistil is a model based on xtremedistil-l12-h384-uncased, fine-tuned to classify the topic of news headlines using a dataset annotated by ChatGPT (GPT-3.5). It was built, together with Sentiment-xDistil, as a tool for separating financial news headlines from other news and classifying their sentiment. The code used to train both models and build the dataset is found here.
Notes: The output labels are either Economics or Other. The model is intended for English text.
Here are the performance metrics for both models on the test set:
Model | Test Set Size | Accuracy | F1 Score |
---|---|---|---|
topic-xdistil-uncased | 32 799 | 94.44 % | 92.59 % |
sentiment-xdistil-uncased | 17 527 | 94.59 % | 93.44 % |
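For readers unfamiliar with the two metrics, here is a minimal sketch of how accuracy and F1 can be computed over Economics/Other labels. The card does not say whether its F1 score is binary or averaged across classes, so this sketch assumes binary F1 with Economics as the positive class; the toy labels below are made up and are not drawn from the test set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_f1(y_true, y_pred, positive="Economics"):
    """Harmonic mean of precision and recall for a single positive class.

    Assumption: F1 is treated as binary with `positive` as the positive class;
    the model card does not state which averaging was used.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not data from the actual test set.
y_true = ["Economics", "Other", "Economics", "Other", "Economics"]
y_pred = ["Economics", "Other", "Other", "Other", "Economics"]
print(accuracy(y_true, y_pred))   # 4 of 5 correct
print(binary_f1(y_true, y_pred))  # precision 1.0, recall 2/3
```

If the reported F1 is macro-averaged instead, the same function would be applied once per class and the results averaged.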
The training data consists of ~600k news headlines and tweets annotated by ChatGPT (GPT-3.5), which has been shown to outperform crowd workers on text annotation tasks.
The sentence labels are defined in the ChatGPT prompt as follows:
```python
"""
[...]
- Economic headlines generally cover topics such as financial markets, \
  business, financial assets, trade, employment, GDP, inflation, or fiscal \
  and monetary policy.
- Non-economic headlines might include sports, entertainment, politics, \
  science, weather, health, or other unrelated news events.
[...]
"""
```
Here's a simple example:
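```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("hakonmh/topic-xdistil-uncased")
tokenizer = AutoTokenizer.from_pretrained("hakonmh/topic-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"

# Tokenize the headline into PyTorch tensors
inputs = tokenizer(SENTENCE, return_tensors="pt")
# Forward pass; the logits have shape (batch_size, num_labels)
output = model(**inputs).logits
# Map the index of the highest-scoring class to its label name
predicted_label = model.config.id2label[output.argmax(-1).item()]
print(predicted_label)
```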
```
Economics
```
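The example above takes the argmax over raw logits, while the pipeline output further down also reports a score. For a classifier with two or more labels, the transformers text-classification pipeline by default converts logits to probabilities with a softmax, so the score is the softmax probability of the top label. Below is a dependency-free sketch of that step; the logits and the label order in `example_id2label` are hypothetical (the real mapping comes from `model.config.id2label`), and with actual PyTorch outputs you would call `output.softmax(-1)` instead.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    # Softmax is monotonic, so this is the same index as argmax over the logits
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical logits and label mapping, for illustration only.
example_logits = [3.1, -2.7]
example_id2label = {0: "Economics", 1: "Other"}
print(top_label(example_logits, example_id2label))
```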
Or, as a pipeline together with Sentiment-xDistil:
```python
from transformers import pipeline

# "text-classification" is the canonical task name ("sentiment-analysis" is an alias for it)
topic_classifier = pipeline("text-classification",
                            model="hakonmh/topic-xdistil-uncased",
                            tokenizer="hakonmh/topic-xdistil-uncased")
sentiment_classifier = pipeline("text-classification",
                                model="hakonmh/sentiment-xdistil-uncased",
                                tokenizer="hakonmh/sentiment-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
print(topic_classifier(SENTENCE))
print(sentiment_classifier(SENTENCE))
```
```
[{'label': 'Economics', 'score': 0.9970171451568604}]
[{'label': 'Positive', 'score': 0.9997037053108215}]
```
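Since the two models are meant to be used together to pick out economic headlines and then score their sentiment, the pipelines can be chained into a small filter. This is a sketch rather than the author's code: `filter_and_score`, the optional `min_score` threshold, and the keyword-based stub classifiers are all hypothetical. The stubs only mimic the output shape of a text-classification pipeline called on a list (one `{"label": ..., "score": ...}` dict per input) so the logic runs without downloading the models; in practice you would pass `topic_classifier` and `sentiment_classifier` instead.

```python
from typing import Callable, Dict, List

# Any callable mapping a list of texts to one {"label", "score"} dict per text.
Classifier = Callable[[List[str]], List[Dict[str, object]]]


def filter_and_score(headlines: List[str],
                     topic_clf: Classifier,
                     sentiment_clf: Classifier,
                     keep_label: str = "Economics",
                     min_score: float = 0.0) -> List[Dict[str, object]]:
    """Keep headlines tagged `keep_label`, then attach their sentiment.

    `min_score` is a hypothetical confidence threshold on the topic score.
    """
    topics = topic_clf(headlines)
    kept = [h for h, t in zip(headlines, topics)
            if t["label"] == keep_label and t["score"] >= min_score]
    if not kept:
        return []
    # Only the headlines that passed the topic filter reach the sentiment model
    sentiments = sentiment_clf(kept)
    return [{"headline": h, "sentiment": s["label"], "score": s["score"]}
            for h, s in zip(kept, sentiments)]


# Hypothetical stand-ins for the real pipelines, for illustration only.
def stub_topic(texts):
    return [{"label": "Economics" if "market" in t.lower() else "Other", "score": 0.99}
            for t in texts]


def stub_sentiment(texts):
    return [{"label": "Positive", "score": 0.95} for _ in texts]


headlines = ["Stock markets rally after rate cut", "Local team wins championship final"]
print(filter_and_score(headlines, stub_topic, stub_sentiment))
# [{'headline': 'Stock markets rally after rate cut', 'sentiment': 'Positive', 'score': 0.95}]
```

Passing the whole list in one call lets each pipeline handle the headlines together, and the sentiment model is never run on headlines labeled Other.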