Dataset:

tyqiangz/multilingual-sentiments

Multilingual Sentiments Dataset

A collection of multilingual sentiment datasets grouped into 3 classes: positive, neutral, and negative.

Most multilingual sentiment datasets are either 2-class (positive or negative), 5-class ratings of product reviews (e.g. the Amazon multilingual dataset), or multi-class emotion labels. However, to an average person, the three classes positive, negative and neutral often suffice and are more straightforward to perceive and annotate. A positive/negative classification is also too naive, since most of the text in the world is actually neutral in sentiment. Furthermore, most multilingual sentiment datasets are dominated by Western languages (e.g. English, German) and don't include Asian languages (e.g. Malay, Indonesian).
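
To make the relationship between 5-class ratings and the 3-class scheme concrete, here is a minimal sketch of one way a star rating could be collapsed into positive, neutral, or negative. The function name `rating_to_sentiment` and the thresholds (1-2 negative, 3 neutral, 4-5 positive) are assumptions for illustration only; this card does not state how the labels in this dataset were actually derived.

```python
from typing import Literal

Sentiment = Literal["positive", "neutral", "negative"]


def rating_to_sentiment(stars: int) -> Sentiment:
    """Collapse a 1-5 star rating into one of three sentiment classes.

    Hypothetical mapping (an assumption, not this dataset's documented
    procedure): 1-2 stars -> negative, 3 stars -> neutral, 4-5 -> positive.
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"stars must be between 1 and 5, got {stars}")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


if __name__ == "__main__":
    # Show the label assigned to each possible rating.
    for stars in range(1, 6):
        print(stars, rating_to_sentiment(stars))
```

Other cut-offs are equally defensible (for example, treating 2 and 4 stars as neutral), so the thresholds should be chosen to match how annotators would actually perceive the text.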

Git repo: https://github.com/tyqiangz/multilingual-sentiment-datasets