Dataset: multi_booked
Tasks: text classification
Multilinguality: monolingual
Size: n<1K
Language creators: found
Annotation creators: expert-generated
Source datasets: original
Preprint: arxiv:1803.08614
License: cc-by-3.0
MultiBooked is a corpus of Basque and Catalan hotel reviews annotated for aspect-level sentiment classification.
The corpora are compiled from hotel reviews taken mainly from booking.com. They are distributed in KAF/NAF format, an XML-style stand-off format that allows multiple layers of annotation. Each review was sentence- and word-tokenized and lemmatized, using FreeLing for Catalan and ixa-pipes for Basque. Finally, for each language, two annotators marked the opinion holders, opinion targets, and opinion expressions in each review, following the guidelines set out in the OpeNER project.
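As a rough illustration of how the stand-off layers fit together, the sketch below resolves opinion annotations back to surface text using only Python's standard library. The sample document is hand-written for this example and is not taken from the corpus; the element and attribute names (`wf`, `term`, `opinion`, `opinion_target`, `opinion_expression`, `polarity`) are assumed from the general KAF layout and should be checked against the actual files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written KAF-like document (NOT from the corpus).
# Element and attribute names are assumptions about the KAF layout.
SAMPLE_KAF = """<KAF xml:lang="ca">
  <text>
    <wf wid="w1" sent="1">Hotel</wf>
    <wf wid="w2" sent="1">molt</wf>
    <wf wid="w3" sent="1">net</wf>
  </text>
  <terms>
    <term tid="t1" lemma="hotel"><span><target id="w1"/></span></term>
    <term tid="t2" lemma="molt"><span><target id="w2"/></span></term>
    <term tid="t3" lemma="net"><span><target id="w3"/></span></term>
  </terms>
  <opinions>
    <opinion oid="o1">
      <opinion_target><span><target id="t1"/></span></opinion_target>
      <opinion_expression polarity="StrongPositive">
        <span><target id="t2"/><target id="t3"/></span>
      </opinion_expression>
    </opinion>
  </opinions>
</KAF>"""


def span_text(node, term_to_words, words):
    """Resolve an annotation node (term ids) to its surface tokens."""
    if node is None:
        return ""
    tokens = []
    for target in node.iter("target"):
        for wid in term_to_words[target.get("id")]:
            tokens.append(words[wid])
    return " ".join(tokens)


def read_opinions(kaf_xml):
    """Return one dict per opinion with holder, target, expression, polarity."""
    root = ET.fromstring(kaf_xml)
    # Token layer: word id -> surface form.
    words = {wf.get("wid"): wf.text for wf in root.iter("wf")}
    # Term layer: term id -> list of word ids it spans.
    term_to_words = {
        term.get("tid"): [t.get("id") for t in term.iter("target")]
        for term in root.iter("term")
    }
    opinions = []
    for op in root.iter("opinion"):
        expr = op.find("opinion_expression")
        opinions.append({
            "holder": span_text(op.find("opinion_holder"), term_to_words, words),
            "target": span_text(op.find("opinion_target"), term_to_words, words),
            "expression": span_text(expr, term_to_words, words),
            "polarity": expr.get("polarity") if expr is not None else None,
        })
    return opinions


if __name__ == "__main__":
    for opinion in read_opinions(SAMPLE_KAF):
        print(opinion)
```

The point of the two-step lookup (opinion to term, term to word) is that stand-off annotation never copies the text: each layer only points at ids in the layer beneath it, so several annotation layers can coexist over the same tokens.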
Each sub-dataset is monolingual, in either Basque or Catalan.
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
The dataset is released under the CC BY 3.0 license.
@inproceedings{Barnes2018multibooked,
  author = {Barnes, Jeremy and Lambert, Patrik and Badia, Toni},
  title = {MultiBooked: A corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC'18)},
  year = {2018},
  month = {May},
  date = {7-12},
  address = {Miyazaki, Japan},
  publisher = {European Language Resources Association (ELRA)},
  language = {english}
}
Thanks to @albertvillanova for adding this dataset.