Datasets: multi_booked
Tasks:
Multilinguality: monolingual
Size: n<1K
Language creators: found
Annotation creators: expert-generated
Source datasets: original
ArXiv: arxiv:1803.08614
License: CC-BY 3.0
MultiBooked is a corpus of Basque and Catalan hotel reviews annotated for aspect-level sentiment classification.
The corpora are compiled from hotel reviews taken mainly from booking.com. They are distributed in KAF/NAF format, an XML-style stand-off format that allows for multiple layers of annotation. Each review was sentence- and word-tokenized and lemmatized, using FreeLing for Catalan and ixa-pipes for Basque. Finally, for each language, two annotators marked the opinion holders, opinion targets, and opinion expressions in each review, following the guidelines set out in the OpeNER project.
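To make the stand-off layering concrete, the following is a minimal sketch of how the opinion layer might be read with Python's standard library. The sample document, its element and attribute names (`wf`, `term`, `opinion_target`, `opinion_expression`, `polarity`), and the helper `read_opinions` are assumptions modeled on common KAF conventions, not taken from the MultiBooked files themselves, so the real corpus may differ in detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical KAF fragment invented for illustration; actual MultiBooked
# files may use different attributes or contain additional layers.
SAMPLE_KAF = """
<KAF xml:lang="ca">
  <text>
    <wf wid="w1" sent="1">Hotel</wf>
    <wf wid="w2" sent="1">molt</wf>
    <wf wid="w3" sent="1">net</wf>
  </text>
  <terms>
    <term tid="t1" lemma="hotel"><span><target id="w1"/></span></term>
    <term tid="t2" lemma="molt"><span><target id="w2"/></span></term>
    <term tid="t3" lemma="net"><span><target id="w3"/></span></term>
  </terms>
  <opinions>
    <opinion oid="o1">
      <opinion_target><span><target id="t1"/></span></opinion_target>
      <opinion_expression polarity="Positive">
        <span><target id="t2"/><target id="t3"/></span>
      </opinion_expression>
    </opinion>
  </opinions>
</KAF>
"""


def read_opinions(kaf_xml):
    """Resolve each opinion's stand-off spans back to surface tokens."""
    root = ET.fromstring(kaf_xml)

    # Token layer: word id -> surface form.
    words = {wf.get("wid"): wf.text for wf in root.iter("wf")}

    # Term layer: term id -> ids of the words it covers.
    term_words = {
        term.get("tid"): [t.get("id") for t in term.findall("./span/target")]
        for term in root.iter("term")
    }

    def surface(element):
        # A missing layer element (e.g. no holder) yields None.
        if element is None:
            return None
        term_ids = [t.get("id") for t in element.findall("./span/target")]
        return " ".join(words[w] for tid in term_ids for w in term_words[tid])

    opinions = []
    for opinion in root.iter("opinion"):
        expression = opinion.find("opinion_expression")
        opinions.append({
            "holder": surface(opinion.find("opinion_holder")),
            "target": surface(opinion.find("opinion_target")),
            "expression": surface(expression),
            "polarity": None if expression is None else expression.get("polarity"),
        })
    return opinions


if __name__ == "__main__":
    for opinion in read_opinions(SAMPLE_KAF):
        print(opinion)
```

The opinion layer points at term ids, and the term layer points at word ids, so recovering the text of a target or expression takes two lookups. That indirection is what lets each annotation layer be added without modifying the layers beneath it.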
Each sub-dataset is monolingual, in either Basque or Catalan.
Who are the source language producers?
[More Information Needed]
Who are the annotators?
[More Information Needed]
The dataset is released under the CC-BY 3.0 license.
@inproceedings{Barnes2018multibooked,
author={Barnes, Jeremy and Lambert, Patrik and Badia, Toni},
title={MultiBooked: A corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification},
booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC'18)},
year = {2018},
month = {May},
date = {7-12},
address = {Miyazaki, Japan},
publisher = {European Language Resources Association (ELRA)},
language = {english}
}
Thanks to @albertvillanova for adding this dataset.