Dataset:
cjvt/sentinews
SentiNews is a Slovenian sentiment classification dataset consisting of news articles, each manually annotated with its sentiment by two to six annotators. It is annotated at three granularities: document, paragraph, and sentence level.
Supported task: sentiment classification with three classes (negative, neutral, positive).
Language: Slovenian.
A sample instance from the sentence-level config:
{
'nid': 2,
'content': 'Vilo Prešeren je na dražbi ministrstva za obrambo kupilo nepremičninsko podjetje Condor Real s sedežem v Lescah.',
'sentiment': 'neutral',
'pid': 1,
'sid': 1
}
The data fields are the same across all three configs; only the ID fields differ.
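As a rough illustration of how an instance like the one above might be prepared for a classifier, the sketch below maps the string sentiment to an integer label. The label order in `LABEL2ID`, the helper names `encode` and `load_sentence_level`, and the config name `"sentence_level"` are assumptions made for this example and are not stated in this card; check the dataset page for the actual config names.

```python
# Hypothetical label order: the card only names the three classes.
LABEL2ID = {"negative": 0, "neutral": 1, "positive": 2}

# The sentence-level sample instance shown above.
sample = {
    "nid": 2,
    "content": "Vilo Prešeren je na dražbi ministrstva za obrambo kupilo "
               "nepremičninsko podjetje Condor Real s sedežem v Lescah.",
    "sentiment": "neutral",
    "pid": 1,
    "sid": 1,
}


def encode(example):
    """Return a copy of the example with an integer `label` field added."""
    return {**example, "label": LABEL2ID[example["sentiment"]]}


def load_sentence_level():
    """Load the sentence-level config from the Hugging Face Hub (needs network)."""
    # Imported lazily so the rest of the sketch runs without the library.
    from datasets import load_dataset
    # "sentence_level" is an assumed config name, not confirmed by this card.
    return load_dataset("cjvt/sentinews", "sentence_level")


encoded = encode(sample)
print(encoded["label"])  # prints 1 under the assumed label order
```

If the assumed config name is correct, something like `load_sentence_level().map(encode)` would add the `label` column to every split; depending on the installed `datasets` version, additional loading arguments may be needed.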
Dataset curators: Jože Bučar, Martin Žnidaršič, Janez Povh.
License: CC BY-SA 4.0
@article{buvcar2018annotated,
title={Annotated news corpora and a lexicon for sentiment analysis in Slovene},
author={Bu{\v{c}}ar, Jo{\v{z}}e and {\v{Z}}nidar{\v{s}}i{\v{c}}, Martin and Povh, Janez},
journal={Language Resources and Evaluation},
volume={52},
number={3},
pages={895--919},
year={2018},
publisher={Springer}
}
Thanks to @matejklemen for adding this dataset.