Dataset: uit-nlp/vietnamese_students_feedback
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: no-annotation
Source datasets: original
Students’ feedback is a vital resource for interdisciplinary research that combines sentiment analysis and education.
The Vietnamese Students’ Feedback Corpus (UIT-VSFC) consists of over 16,000 sentences that are human-annotated for two tasks: sentiment-based and topic-based classification.
To assess the quality of the corpus, we measure inter-annotator agreement and evaluate classification performance on UIT-VSFC. The inter-annotator agreement is over 91% for sentiments and over 71% for topics. In addition, we built a baseline model with a Maximum Entropy classifier, which achieved an F1-score of approximately 88% for sentiment and over 84% for topic.
[More Information Needed]
The sentences in the dataset are in Vietnamese (vi).
An instance example:
{
'sentence': 'slide giáo trình đầy đủ .',
'sentiment': 2,
'topic': 1
}
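A minimal sketch of how a record like the one above might be checked, and how the corpus could be loaded. It assumes the Hugging Face `datasets` library is installed and that the Hub identifier `uit-nlp/vietnamese_students_feedback` resolves to this corpus. The helper names `is_valid_instance` and `load_uit_vsfc` are hypothetical and not part of any library.

```python
def is_valid_instance(example):
    """Return True if `example` has the three fields shown in the instance above."""
    return (
        isinstance(example.get("sentence"), str)
        and isinstance(example.get("sentiment"), int)
        and isinstance(example.get("topic"), int)
    )


def load_uit_vsfc():
    """Load all splits from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the validator above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("uit-nlp/vietnamese_students_feedback")


# The instance from the card, used here as a quick sanity check.
example = {
    "sentence": "slide giáo trình đầy đủ .",
    "sentiment": 2,
    "topic": 1,
}
print(is_valid_instance(example))
```

Calling `load_uit_vsfc()` would be expected to return a mapping of split names to datasets, so something like `load_uit_vsfc()["train"][0]` could then be passed to `is_valid_instance`.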
The dataset is split into train, validation, and test sets.
| | Train | Validation | Test |
|---|---|---|---|
| Number of examples | 11426 | 1583 | 3166 |
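As a quick arithmetic check on the split table above, the sketch below sums the three split sizes and prints each split's share of the corpus. The counts are copied from the table; the variable names are illustrative only.

```python
# Split sizes as listed in the table above.
split_sizes = {"train": 11426, "validation": 1583, "test": 3166}

total = sum(split_sizes.values())
fractions = {name: n / total for name, n in split_sizes.items()}

# Print one row per split, then the overall total.
for name, n in split_sizes.items():
    print(f"{name:<10} {n:>6} {fractions[name]:.1%}")
print(f"{'total':<10} {total:>6}")
```

The total comes to 16,175 sentences, consistent with the "over 16,000 sentences" stated in the description, and the split is roughly 71% / 10% / 20%.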
[More Information Needed]
[More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
Unknown.
@InProceedings{8573337,
author={Nguyen, Kiet Van and Nguyen, Vu Duc and Nguyen, Phu X. V. and Truong, Tham T. H. and Nguyen, Ngan Luu-Thuy},
booktitle={2018 10th International Conference on Knowledge and Systems Engineering (KSE)},
title={UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis},
year={2018},
volume={},
number={},
pages={19-24},
doi={10.1109/KSE.2018.8573337}
}
Thanks to @albertvillanova for adding this dataset.