Dataset:
cjvt/sentinews
SentiNews is a Slovenian sentiment classification dataset consisting of news articles, each manually annotated for sentiment by two to six annotators. It is annotated at three granularities: document level, paragraph level, and sentence level.
Supported task: sentiment classification with three classes (negative, neutral, positive).
Language: Slovenian.
A sample instance from the sentence-level config:
{ 'nid': 2, 'content': 'Vilo Prešeren je na dražbi ministrstva za obrambo kupilo nepremičninsko podjetje Condor Real s sedežem v Lescah.', 'sentiment': 'neutral', 'pid': 1, 'sid': 1 }
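To make the shape of an instance explicit, the sample above can be loaded into a small typed record. This is a minimal sketch: the class name `SentiNewsRecord` is hypothetical, the meanings of `nid`, `pid`, and `sid` (article, paragraph, and sentence IDs) are inferred from the field names, and making `pid`/`sid` optional is an assumption about the coarser configs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SentiNewsRecord:
    """One instance; field names mirror the sentence-level sample."""
    nid: int                   # assumed: ID of the news article
    content: str               # the annotated text span
    sentiment: str             # "negative", "neutral", or "positive"
    pid: Optional[int] = None  # assumed: paragraph ID (optional is an assumption)
    sid: Optional[int] = None  # assumed: sentence ID (optional is an assumption)

    def __post_init__(self) -> None:
        # Reject labels outside the three documented classes.
        if self.sentiment not in {"negative", "neutral", "positive"}:
            raise ValueError(f"unexpected sentiment: {self.sentiment!r}")


sample = {
    "nid": 2,
    "content": "Vilo Prešeren je na dražbi ministrstva za obrambo kupilo "
               "nepremičninsko podjetje Condor Real s sedežem v Lescah.",
    "sentiment": "neutral",
    "pid": 1,
    "sid": 1,
}
record = SentiNewsRecord(**sample)
print(record.sentiment, record.nid, record.pid, record.sid)  # neutral 2 1 1
```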
The three configs share the same data fields and differ only in their ID fields.
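Because the IDs tie finer units to coarser ones, sentence-level rows can be regrouped by `(nid, pid)`. The sketch below assumes the IDs are nested (article > paragraph > sentence) and that `sid` gives the sentence order within a paragraph; both are inferences from the field names, not documented guarantees. The function name `group_sentences` and the toy rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_sentences(rows: Iterable[dict]) -> Dict[Tuple[int, int], List[str]]:
    """Group sentence-level rows by (nid, pid), ordering sentences by sid.

    Assumes nested IDs (article > paragraph > sentence); this is an
    inference from the field names, not a documented guarantee.
    """
    buckets: Dict[Tuple[int, int], List[Tuple[int, str]]] = defaultdict(list)
    for row in rows:
        buckets[(row["nid"], row["pid"])].append((row["sid"], row["content"]))
    # Sort each bucket by sid, then keep only the sentence text.
    return {key: [text for _, text in sorted(items)] for key, items in buckets.items()}


# Toy rows with made-up text, purely for illustration.
rows = [
    {"nid": 2, "pid": 1, "sid": 2, "content": "Second sentence.", "sentiment": "neutral"},
    {"nid": 2, "pid": 1, "sid": 1, "content": "First sentence.", "sentiment": "neutral"},
    {"nid": 2, "pid": 2, "sid": 1, "content": "Another paragraph.", "sentiment": "positive"},
]
paragraphs = group_sentences(rows)
print(" ".join(paragraphs[(2, 1)]))  # First sentence. Second sentence.
```

This only regroups text; it does not derive paragraph-level sentiment from sentence labels, since the paragraph-level config carries its own annotations.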
Dataset curators: Jože Bučar, Martin Žnidaršič, Janez Povh.
License: CC BY-SA 4.0
@article{buvcar2018annotated, title={Annotated news corpora and a lexicon for sentiment analysis in Slovene}, author={Bu{\v{c}}ar, Jo{\v{z}}e and {\v{Z}}nidar{\v{s}}i{\v{c}}, Martin and Povh, Janez}, journal={Language Resources and Evaluation}, volume={52}, number={3}, pages={895--919}, year={2018}, publisher={Springer} }
Thanks to @matejklemen for adding this dataset.