Dataset:

uit-nlp/vietnamese_students_feedback

Tasks:

Text classification

Sub-tasks:

sentiment-classification topic-classification

Languages:

Vietnamese

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

found

Annotation creators:

no-annotation

Source datasets:

original

License:

license:unknown

Dataset Card for Vietnamese Students’ Feedback Corpus

Dataset Summary

Students’ feedback is a vital resource for interdisciplinary research that combines two fields: sentiment analysis and education.

Vietnamese Students’ Feedback Corpus (UIT-VSFC) is a resource consisting of over 16,000 sentences, each human-annotated for two tasks: sentiment-based and topic-based classification.

To assess the quality of our corpus, we measured inter-annotator agreement and classification performance on UIT-VSFC. We obtained inter-annotator agreement of more than 91% for sentiments and more than 71% for topics. In addition, we built a baseline model with a Maximum Entropy classifier, which achieved an F1-score of approximately 88% for sentiment and over 84% for topic.
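The card reports F1-scores but does not say how they were averaged across classes. As a rough illustration only, the sketch below computes per-class and macro-averaged F1 in plain Python. The toy labels are invented, macro averaging is an assumption, and the authors' Maximum Entropy baseline is not reproduced here.

```python
from collections import Counter


def per_class_f1(gold, pred, labels):
    """Return {label: F1} computed from parallel lists of gold and predicted labels."""
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    gold_count = Counter(gold)
    pred_count = Counter(pred)
    scores = {}
    for label in labels:
        tp = true_pos[label]
        precision = tp / pred_count[label] if pred_count[label] else 0.0
        recall = tp / gold_count[label] if gold_count[label] else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(gold, pred, labels):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = per_class_f1(gold, pred, labels)
    return sum(scores.values()) / len(labels)


# Toy sentiment labels (0 = negative, 1 = neutral, 2 = positive), invented for illustration.
gold = [2, 2, 0, 1, 0, 2]
pred = [2, 0, 0, 1, 0, 2]
scores = per_class_f1(gold, pred, labels=[0, 1, 2])
macro = macro_f1(gold, pred, labels=[0, 1, 2])
print(scores)
print(round(macro, 4))
```

Macro averaging weights every class equally, which matters when some classes are rare; a micro or weighted average could give a noticeably different number on the same predictions.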

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The sentences in the dataset are in Vietnamese (vi).

Dataset Structure

Data Instances

An instance example:

{
  'sentence': 'slide giáo trình đầy đủ .', 
  'sentiment': 2, 
  'topic': 1
}

Data Fields

sentence (str): Text sentence.
sentiment: Sentiment class, with values 0 (negative), 1 (neutral) and 2 (positive).
topic: Topic class, with values 0 (lecturer), 1 (training_program), 2 (facility) and 3 (others).
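The integer codes listed above can be mapped back to readable names with a small helper. The following is a minimal sketch: `decode_labels` and `load_split` are hypothetical helper names, and `load_split` assumes the `datasets` package is installed, the Hub is reachable, and the dataset identifier shown at the top of this card still resolves.

```python
# Label names copied from the "Data Fields" list; list index = integer class id.
SENTIMENT_NAMES = ["negative", "neutral", "positive"]
TOPIC_NAMES = ["lecturer", "training_program", "facility", "others"]


def decode_labels(example):
    """Return a copy of `example` with integer labels replaced by their names."""
    return {
        "sentence": example["sentence"],
        "sentiment": SENTIMENT_NAMES[example["sentiment"]],
        "topic": TOPIC_NAMES[example["topic"]],
    }


def load_split(split="train"):
    """Load one split from the Hugging Face Hub (hypothetical helper, not called here)."""
    # Imported lazily so the decoding helper works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset("uit-nlp/vietnamese_students_feedback", split=split)


# The instance shown under "Data Instances".
example = {"sentence": "slide giáo trình đầy đủ .", "sentiment": 2, "topic": 1}
decoded = decode_labels(example)
print(decoded)
```

Depending on the installed `datasets` version, loading may require extra arguments; the decoding step itself only relies on the field names and class ids documented in this card.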

Data Splits

The dataset is split into train, validation and test sets.

Split	Train	Validation	Test
Number of examples	11426	1583	3166
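As a quick sanity check, the split sizes can be summed and turned into proportions. This small sketch uses only the counts given above; the variable names are illustrative.

```python
# Example counts copied from the split table.
split_sizes = {"train": 11426, "validation": 1583, "test": 3166}

total = sum(split_sizes.values())
fractions = {name: size / total for name, size in split_sizes.items()}

print(f"total: {total}")
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

The total is consistent with the "over 16,000 sentences" stated in the summary, and the split is roughly 70/10/20.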

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

Unknown.

Citation Information

@InProceedings{8573337,
  author={Nguyen, Kiet Van and Nguyen, Vu Duc and Nguyen, Phu X. V. and Truong, Tham T. H. and Nguyen, Ngan Luu-Thuy},
  booktitle={2018 10th International Conference on Knowledge and Systems Engineering (KSE)},
  title={UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis},
  year={2018},
  volume={},
  number={},
  pages={19-24},
  doi={10.1109/KSE.2018.8573337}
}

Contributions

Thanks to @albertvillanova for adding this dataset.

Author:

uit-nlp

Dataset size:

10.68 KB