数据集:
amazon_polarity
任务:
文本分类语言:
en计算机处理:
monolingual大小:
1M<n<10M语言创建人:
crowdsourced批注创建人:
crowdsourced源数据集:
original预印本库:
arxiv:1509.01626许可:
apache-2.0The Amazon reviews dataset consists of reviews from amazon. The data span a period of 18 years, including ~35 million reviews up to March 2013. Reviews include product and user information, ratings, and a plaintext review.
Mainly English.
A typical data point, comprises of a title, a content and the corresponding label.
An example from the AmazonPolarity test set looks as follows:
{ 'title':'Great CD', 'content':"My lovely Pat has one of the GREAT voices of her generation. I have listened to this CD for YEARS and I still LOVE IT. When I'm in a good mood it makes me feel better. A bad mood just evaporates like sugar in the rain. This CD just oozes LIFE. Vocals are jusat STUUNNING and lyrics just kill. One of life's hidden gems. This is a desert isle CD in my book. Why she never made it big is just beyond me. Everytime I play this, no matter black, white, young, old, male, female EVERYBODY says one thing ""Who was that singing ?""", 'label':1 }
The Amazon reviews polarity dataset is constructed by taking review score 1 and 2 as negative, and 4 and 5 as positive. Samples of score 3 is ignored. Each class has 1,800,000 training samples and 200,000 testing samples.
The Amazon reviews polarity dataset is constructed by Xiang Zhang ( xiang.zhang@nyu.edu ). It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
[Needs More Information]
Who are the source language producers?[Needs More Information]
[Needs More Information]
Who are the annotators?[Needs More Information]
[Needs More Information]
[Needs More Information]
[Needs More Information]
[Needs More Information]
[Needs More Information]
Apache License 2.0
McAuley, Julian, and Jure Leskovec. "Hidden factors and hidden topics: understanding rating dimensions with review text." In Proceedings of the 7th ACM conference on Recommender systems, pp. 165-172. 2013.
Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015)
Thanks to @hfawaz for adding this dataset.