Dataset:
zeroshot/twitter-financial-news-topic
Read this BLOG to see how I fine-tuned a sparse transformer on this dataset.
The Twitter Financial News dataset is an English-language, annotated corpus of finance-related tweets. It is used to classify finance-related tweets by topic.

    topics = {
        "LABEL_0": "Analyst Update",
        "LABEL_1": "Fed | Central Banks",
        "LABEL_2": "Company | Product News",
        "LABEL_3": "Treasuries | Corporate Debt",
        "LABEL_4": "Dividend",
        "LABEL_5": "Earnings",
        "LABEL_6": "Energy | Oil",
        "LABEL_7": "Financials",
        "LABEL_8": "Currencies",
        "LABEL_9": "General News | Opinion",
        "LABEL_10": "Gold | Metals | Materials",
        "LABEL_11": "IPO",
        "LABEL_12": "Legal | Regulation",
        "LABEL_13": "M&A | Investments",
        "LABEL_14": "Macro",
        "LABEL_15": "Markets",
        "LABEL_16": "Politics",
        "LABEL_17": "Personnel Change",
        "LABEL_18": "Stock Commentary",
        "LABEL_19": "Stock Movement",
    }

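As a minimal sketch of how the mapping above might be used, the snippet below rebuilds it from an ordered list of topic names and looks up the readable topic for a predicted label. The helper `topic_from_label` is hypothetical. Accepting integer class ids rests on the assumption that the dataset's integer labels follow the same order as the `LABEL_n` suffixes.

```python
# Topic names in label order, copied from the mapping above.
TOPIC_NAMES = [
    "Analyst Update",
    "Fed | Central Banks",
    "Company | Product News",
    "Treasuries | Corporate Debt",
    "Dividend",
    "Earnings",
    "Energy | Oil",
    "Financials",
    "Currencies",
    "General News | Opinion",
    "Gold | Metals | Materials",
    "IPO",
    "Legal | Regulation",
    "M&A | Investments",
    "Macro",
    "Markets",
    "Politics",
    "Personnel Change",
    "Stock Commentary",
    "Stock Movement",
]

# Rebuild the "LABEL_n" -> topic name mapping.
topics = {f"LABEL_{i}": name for i, name in enumerate(TOPIC_NAMES)}


def topic_from_label(label):
    """Hypothetical helper: return the topic name for 'LABEL_5' or for 5.

    Assumption: integer class ids match the numeric suffix of LABEL_n.
    """
    key = f"LABEL_{label}" if isinstance(label, int) else label
    return topics[key]


print(topic_from_label("LABEL_5"))  # Earnings
print(topic_from_label(19))         # Stock Movement
```

An unknown label raises `KeyError`, which is usually preferable to silently returning a wrong topic.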
The data was collected using the Twitter API. The dataset currently supports multi-class classification.
There are 2 splits: train and validation. Below are the statistics:
Dataset Split | Number of Instances in Split |
---|---|
Train | 16,990 |
Validation | 4,118 |
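For a quick sense of the split ratio, the counts in the table can be turned into shares of the whole corpus. This is only arithmetic on the numbers above, and the variable names are illustrative.

```python
# Instance counts taken from the table above.
split_sizes = {"train": 16_990, "validation": 4_118}

total = sum(split_sizes.values())
shares = {name: count / total for name, count in split_sizes.items()}

for name, share in shares.items():
    print(f"{name}: {split_sizes[name]:,} instances ({share:.1%})")
print(f"total: {total:,} instances")
```

This works out to roughly an 80/20 split between train and validation.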
The Twitter Financial Dataset (topic) version 1.0.0 is released under the MIT License.