Dataset: ag_news
Tasks: text classification
Sub-tasks: topic-classification
Languages: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: found
Annotation creators: found
Source datasets: original
License: unknown

AG is a collection of more than 1 million news articles. The articles were gathered from more than 2,000 news sources by ComeToMyHead over more than 1 year of activity. ComeToMyHead is an academic news search engine that has been running since July 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity. For more information, please refer to http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html.
The AG's news topic classification dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
An example of 'train' looks as follows.
{ "label": 3, "text": "New iPad released Just like every other September, this one is no different. Apple is planning to release a bigger, heavier, fatter iPad that..." }
The data fields are the same among all splits.
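As a minimal sketch of working with a record shaped like the example above, the snippet below maps the integer `label` to a topic name. The names in `LABEL_NAMES` (World, Sports, Business, Sci/Tech for labels 0 to 3) are not listed in this text; they are an assumption based on how AG News labels are commonly documented. `AGNewsRecord` and `label_name` are hypothetical helper names used only for illustration.

```python
from typing import TypedDict

# Assumption: label ids 0..3 map to these topics in this order.
# The mapping is not stated in the text above.
LABEL_NAMES = ["World", "Sports", "Business", "Sci/Tech"]


class AGNewsRecord(TypedDict):
    """Shape of one record, following the 'train' example above."""
    text: str
    label: int


def label_name(record: AGNewsRecord) -> str:
    """Return the (assumed) topic name for a record's integer label."""
    label = record["label"]
    if not 0 <= label < len(LABEL_NAMES):
        raise ValueError(f"unexpected label: {label}")
    return LABEL_NAMES[label]


# The example record from above, with its text kept truncated.
example: AGNewsRecord = {
    "label": 3,
    "text": "New iPad released Just like every other September, ...",
}

print(label_name(example))  # Sci/Tech
```

Under the assumed mapping, the example's label 3 resolves to Sci/Tech, which is consistent with its text about a new iPad.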
name | train | test |
---|---|---|
default | 120000 | 7600 |
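The row counts in the split table above can serve as a quick sanity check after downloading the data. The following is a sketch only: `check_split_sizes` is a hypothetical helper, and the stand-in `range` objects exist so that the snippet runs offline.

```python
# Row counts copied from the split table above.
EXPECTED_ROWS = {"train": 120_000, "test": 7_600}


def check_split_sizes(splits):
    """Compare len() of each split with the expected row counts.

    `splits` is any mapping from split name to a sized object.
    Returns {split_name: (expected, actual)} for every mismatch,
    with actual=None when a split is missing entirely.
    """
    mismatches = {}
    for name, expected in EXPECTED_ROWS.items():
        actual = len(splits[name]) if name in splits else None
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


# Stand-in objects with the right lengths (no download needed).
fake_splits = {"train": range(120_000), "test": range(7_600)}
print(check_split_sizes(fake_splits))  # {}
```

If the Hugging Face `datasets` library is installed and the dataset id `ag_news` from the metadata above resolves on the Hub, the mapping returned by `datasets.load_dataset("ag_news")` could be passed to `check_split_sizes` in place of `fake_splits`.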
@inproceedings{Zhang2015CharacterlevelCN,
  title={Character-level Convolutional Networks for Text Classification},
  author={Xiang Zhang and Junbo Jake Zhao and Yann LeCun},
  booktitle={NIPS},
  year={2015}
}
Thanks to @jxmorris12, @thomwolf, @lhoestq, and @lewtun for adding this dataset.