Dataset:
ruanchaves/reli-sa
ReLi is a dataset created by Cláudia Freitas within the framework of the project "Semantic Annotators based on Active Learning" at PUC-Rio. It consists of 1,600 book reviews manually annotated for the presence of opinions about the reviewed book and for the polarity of those opinions. The reviews are written in Brazilian Portuguese and cover books by seven authors: Stephenie Meyer, Thalita Rebouças, Sidney Sheldon, Jorge Amado, George Orwell, José Saramago, and J.D. Salinger. The language ranges from highly informal, with slang, abbreviations, neologisms, and emoticons, to more formal reviews with a more elaborate vocabulary.
ReLi-SA is an adaptation of the original ReLi dataset for the sentiment analysis task. We attribute a sentiment polarity to each sentence according to the sentiment annotations of its individual tokens.
This dataset is in Brazilian Portuguese.
{
  'source': 'ReLi-Orwell.txt',
  'title': 'False',
  'book': '1984',
  'review_id': '0',
  'score': 5.0,
  'sentence_id': 102583,
  'unique_review_id': 'ReLi-Orwell_1984_0',
  'sentence': ' Um ótimo livro , além de ser um ótimo alerta para uma potencial distopia , em contraponto a utopia tão sonhada por os homens de o medievo e início de a modernidade .',
  'label': 'positive'
}
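The fields of the instance above can be written down as a typed record, which makes it easier to see which values are strings and which are numbers. The sketch below is illustrative only: the class name `ReliSaInstance` and the helpers `source_stem`, `author_key`, and `rebuilt_review_id` are hypothetical and not part of the dataset. The `ReLi-<Author>.txt` naming and the `<stem>_<book>_<review_id>` pattern are inferred from this single example and may not hold for every row.

```python
from typing import TypedDict


class ReliSaInstance(TypedDict):
    """Field names and types as they appear in the example instance."""
    source: str
    title: str             # a string ('False' in the example), not a bool
    book: str
    review_id: str         # a string ('0' in the example), not an int
    score: float
    sentence_id: int
    unique_review_id: str
    sentence: str
    label: str


example: ReliSaInstance = {
    "source": "ReLi-Orwell.txt",
    "title": "False",
    "book": "1984",
    "review_id": "0",
    "score": 5.0,
    "sentence_id": 102583,
    "unique_review_id": "ReLi-Orwell_1984_0",
    "sentence": " Um ótimo livro , além de ser um ótimo alerta para uma potencial distopia , em contraponto a utopia tão sonhada por os homens de o medievo e início de a modernidade .",
    "label": "positive",
}


def source_stem(source: str) -> str:
    """Drop the file extension: 'ReLi-Orwell.txt' -> 'ReLi-Orwell'."""
    return source.rsplit(".", 1)[0]


def author_key(source: str) -> str:
    """Assumed convention: the author is the text after the first '-' in the stem."""
    return source_stem(source).partition("-")[2]


def rebuilt_review_id(row: ReliSaInstance) -> str:
    """Rebuild the id as '<stem>_<book>_<review_id>' (pattern inferred from one example)."""
    return f"{source_stem(row['source'])}_{row['book']}_{row['review_id']}"


print(author_key(example["source"]))
print(rebuilt_review_id(example) == example["unique_review_id"])
```

Note that the sentence text is pre-tokenized: punctuation is separated by spaces and Portuguese contractions are split (for example "por os", "de o"), so it should not be re-tokenized as if it were raw text.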
The dataset is divided into three splits:
| | train | validation | test |
|---|---|---|---|
| Instances | 7,875 | 1,348 | 3,288 |
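As a quick sanity check on these counts, the total number of instances and each split's share can be computed directly. This is plain arithmetic on the three numbers in the table; the variable names are illustrative.

```python
# Split sizes copied from the table above.
split_sizes = {"train": 7875, "validation": 1348, "test": 3288}

total = sum(split_sizes.values())

# Percentage of all instances in each split, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in split_sizes.items()}

print(f"total instances: {total}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The shares come out to roughly 63% train, 11% validation, and 26% test.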
The splits are constructed so that reviews of a given author's books never appear in more than one split.
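The author-disjointness property can be checked mechanically. The following is a minimal sketch, assuming the author can be read from the `source` field (as in `ReLi-Orwell.txt`) and that each split is available as a list of row dicts. The function names are hypothetical, and the rows are toy data: apart from `ReLi-Orwell.txt`, the file names and the assignment of authors to splits are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def author_key(source: str) -> str:
    """Assumed convention: 'ReLi-<Author>.txt' -> '<Author>'."""
    return source.rsplit(".", 1)[0].partition("-")[2]


def overlapping_authors(splits):
    """Return {author: [split names]} for authors found in more than one split."""
    seen = defaultdict(set)  # author -> names of the splits it appears in
    for split_name, rows in splits.items():
        for row in rows:
            seen[author_key(row["source"])].add(split_name)
    return {a: sorted(s) for a, s in seen.items() if len(s) > 1}


# Toy rows; the file names other than ReLi-Orwell.txt are hypothetical.
disjoint = {
    "train": [{"source": "ReLi-Orwell.txt"}, {"source": "ReLi-Amado.txt"}],
    "validation": [{"source": "ReLi-Saramago.txt"}],
    "test": [{"source": "ReLi-Sheldon.txt"}],
}
leaky = {
    "train": [{"source": "ReLi-Orwell.txt"}],
    "test": [{"source": "ReLi-Orwell.txt"}],
}

print(overlapping_authors(disjoint))
print(overlapping_authors(leaky))
```

An empty result means no author is shared between splits. Separating splits by author matters because a model evaluated on sentences about authors it saw during training could score well by memorizing author-specific or book-specific vocabulary.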
If you use this dataset in your work, please cite the following publication:
@incollection{freitas2014sparkling,
  title={Sparkling Vampire... lol! Annotating Opinions in a Book Review Corpus},
  author={Freitas, Cl{\'a}udia and Motta, Eduardo and Milidi{\'u}, Ruy Luiz and C{\'e}sar, Juliana},
  booktitle={New Language Technologies and Linguistic Research: A Two-Way Road},
  editor={Alu{\'\i}sio, Sandra and Tagnin, Stella E. O.},
  year={2014},
  publisher={Cambridge Scholars Publishing},
  pages={128--146}
}
Thanks to @ruanchaves for adding this dataset.