hakonmh/topic-xdistil-uncased

Model:

hakonmh/topic-xdistil-uncased

Task:

Text classification

Libraries:

PyTorch Transformers Safetensors

Language:

English

Other:

bert finance topic-classification

Preprint:

arxiv:2303.15056

License:

mit

Topic-xDistil is a model based on xtremedistil-l12-h384-uncased, fine-tuned to classify the topic of news headlines on a dataset annotated by ChatGPT 3.5. It was built, together with Sentiment-xDistil, as a tool for identifying financial news headlines and classifying their sentiment. The code used to train both models and build the dataset is available here.

Note: The output label is either Economics or Other. The model is intended for English text.

Performance Results

Here are the performance metrics for both models, each evaluated on its own test set:

Model	Test Set Size	Accuracy	F1 Score
topic-xdistil-uncased	32 799	94.44 %	92.59 %
sentiment-xdistil-uncased	17 527	94.59 %	93.44 %
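
For readers who want to recompute such metrics on their own labelled headlines, here is a minimal sketch of accuracy and F1 in plain Python. It treats Economics as the positive class, which is an assumption: the card does not say whether the reported F1 is binary or averaged across classes. The toy labels are invented for illustration and are not taken from the test set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive="Economics"):
    """Binary F1 for one class (assumed positive class: "Economics")."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only
y_true = ["Economics", "Other", "Economics", "Other", "Economics"]
y_pred = ["Economics", "Other", "Other", "Other", "Economics"]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"f1       = {f1_score(y_true, y_pred):.2f}")
```

On an imbalanced test set the two numbers can diverge noticeably, which is one reason to report both.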

Data

The training data consists of ~600k news headlines and tweets, annotated by ChatGPT 3.5, which has been shown to outperform crowd-workers on text annotation tasks (arXiv:2303.15056).

The topic labels are defined in the ChatGPT prompt as follows:

"""
[...]
    - Economic headlines generally cover topics such as financial markets, \
 business, financial assets, trade, employment, GDP, inflation, or fiscal \
and monetary policy.
    - Non-economic headlines might include sports, entertainment, politics, \
science, weather, health, or other unrelated news events.
[...]
"""

Example Usage

Here's a simple example:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("hakonmh/topic-xdistil-uncased")
tokenizer = AutoTokenizer.from_pretrained("hakonmh/topic-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
inputs = tokenizer(SENTENCE, return_tensors="pt")
# Inference only, so gradient tracking is not needed
with torch.no_grad():
    output = model(**inputs).logits
# Map the highest-scoring class index to its label name
predicted_label = model.config.id2label[output.argmax(-1).item()]

print(predicted_label)

Economics
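
Taking the argmax discards how confident the model is. Applying a softmax to the logits gives class probabilities instead. The sketch below uses plain Python and hypothetical logit values rather than real model output. The label order in `ID2LABEL` is also an assumption; in practice, read it from `model.config.id2label`.

```python
import math

# Hypothetical label order; read the real mapping from model.config.id2label
ID2LABEL = {0: "Economics", 1: "Other"}


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one headline, not real model output
logits = [3.2, -2.6]
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)

print(ID2LABEL[best], round(probs[best], 4))
```

With PyTorch tensors, `torch.softmax(output, dim=-1)` performs the same computation on the logits directly.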

Alternatively, use it as a pipeline together with Sentiment-xDistil:

from transformers import pipeline

# "text-classification" is the generic task name; "sentiment-analysis" is an alias for it
topic_classifier = pipeline("text-classification",
                            model="hakonmh/topic-xdistil-uncased",
                            tokenizer="hakonmh/topic-xdistil-uncased")
sentiment_classifier = pipeline("text-classification",
                                model="hakonmh/sentiment-xdistil-uncased",
                                tokenizer="hakonmh/sentiment-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
print(topic_classifier(SENTENCE))
print(sentiment_classifier(SENTENCE))

[{'label': 'Economics', 'score': 0.9970171451568604}]
[{'label': 'Positive', 'score': 0.9997037053108215}]
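
The intended workflow, selecting economic headlines and then scoring their sentiment, can be sketched as a small helper. The function name `classify_economic_headlines` and the stub classifiers are hypothetical. The stubs only mimic the list-of-dicts output format shown above, so the example runs without downloading the models; in real use, pass the `topic_classifier` and `sentiment_classifier` pipelines instead.

```python
def classify_economic_headlines(headlines, topic_fn, sentiment_fn):
    """Keep headlines labelled "Economics" and attach a sentiment label.

    topic_fn and sentiment_fn each take one string and return a list with
    one dict holding 'label' and 'score', like the pipeline output above.
    """
    results = []
    for headline in headlines:
        topic = topic_fn(headline)[0]
        if topic["label"] != "Economics":
            continue  # drop non-economic headlines
        sentiment = sentiment_fn(headline)[0]
        results.append({
            "headline": headline,
            "sentiment": sentiment["label"],
            "score": sentiment["score"],
        })
    return results


# Stub classifiers (hypothetical) standing in for the real pipelines
def stub_topic(text):
    label = "Economics" if "growth" in text.lower() else "Other"
    return [{"label": label, "score": 1.0}]


def stub_sentiment(text):
    return [{"label": "Positive", "score": 1.0}]


headlines = [
    "Global Growth Surges as New Technologies Drive Innovation and Productivity!",
    "Local Team Wins Championship After Dramatic Final",
]
filtered = classify_economic_headlines(headlines, stub_topic, stub_sentiment)
print(filtered)
```

Passing the classifiers in as callables keeps the filtering logic independent of how the models are loaded, which also makes it easy to test.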

Author:

Håkon Magne Holmen

Dataset size:

255.52 MB