Dataset: xed_en_fi
Tasks: Text Classification
Multilinguality: multilingual
Language Creators: found
Annotations Creators: expert-generated
ArXiv: arxiv:2011.01612
License: cc-by-4.0

This is the XED dataset. It consists of emotion-annotated movie subtitles from OPUS, labeled with Plutchik's 8 core emotions. The data is multilabel: a sentence can carry more than one emotion. The original annotations were sourced mainly for English and Finnish. For the English data, we used Stanford NER (named entity recognition) (Finkel et al., 2005) to replace names and locations with the tags [PERSON] and [LOCATION], respectively. For the Finnish data, we replaced names and locations using the Turku NER corpus (Luoma et al., 2020).
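The name and location replacement described above can be pictured with a minimal sketch. It assumes an NER system has already produced character spans; the span format, the `PER`/`LOC` label names, the `mask_entities` helper, and the example sentence are all hypothetical and are not taken from the XED preprocessing code.

```python
from typing import List, Tuple

# Hypothetical NER output: (start, end, label) character spans.
Span = Tuple[int, int, str]

# Assumed mapping from entity labels to the tags used in XED.
TAGS = {"PER": "[PERSON]", "LOC": "[LOCATION]"}


def mask_entities(text: str, spans: List[Span]) -> str:
    """Replace each person/location span in `text` with its tag."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        # Skip labels we do not mask and spans overlapping an earlier one.
        if label not in TAGS or start < cursor:
            continue
        pieces.append(text[cursor:start])
        pieces.append(TAGS[label])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Made-up example sentence; spans are computed rather than hard-coded.
example = "A confession that you hired John in Paris."
john = example.index("John")
paris = example.index("Paris")
spans = [(john, john + len("John"), "PER"), (paris, paris + len("Paris"), "LOC")]
masked = mask_entities(example, spans)
print(masked)
```

Any NER tool that reports character offsets could feed a function like this; the actual XED pipeline may differ in how it tokenizes and substitutes.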
Sentiment Classification, Multilabel Classification, Intent Classification
English, Finnish
{
  "sentence": "A confession that you hired [PERSON] ... and are responsible for my father's murder.",
  "labels": [1, 6]  # anger, sadness
}
Each number in "labels" identifies an emotion, numbered in ascending alphabetical order: anger: 1, anticipation: 2, disgust: 3, fear: 4, joy: 5, sadness: 6, surprise: 7, trust: 8, with neutral: 0 where applicable.
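As an illustration of this numbering, the sketch below decodes a "labels" list into emotion names and builds a multi-hot vector suitable for multilabel training. The helper names `decode_labels` and `to_multi_hot` are hypothetical, and the index order simply follows the mapping stated on this card.

```python
from typing import List

# Index i holds the emotion whose id is i, per the mapping on this card.
EMOTIONS = [
    "neutral", "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
]


def decode_labels(labels: List[int]) -> List[str]:
    """Map numeric label ids to emotion names."""
    return [EMOTIONS[i] for i in labels]


def to_multi_hot(labels: List[int]) -> List[int]:
    """Encode a multilabel annotation as a 0/1 vector of length 9."""
    vec = [0] * len(EMOTIONS)
    for i in labels:
        vec[i] = 1
    return vec


print(decode_labels([1, 6]))
print(to_multi_hot([1, 6]))
```

A multi-hot vector keeps every annotated emotion, which a single class index could not represent.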
For English: Number of unique data points: 17528 ('en_annotated' config) + 9675 ('en_neutral' config). Number of emotions: 8 (+ neutral).
For Finnish: Number of unique data points: 14449 ('fi_annotated' config) + 10794 ('fi_neutral' config). Number of emotions: 8 (+ neutral).
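The four configs and their sizes can be summarized in a short sketch. The counts are the ones stated above; `load_xed` is a hypothetical helper that assumes the Hugging Face `datasets` library is installed and that the dataset is still reachable under the id `xed_en_fi` given in this card's metadata, which may not hold for every library version.

```python
# Number of unique data points per config, as stated on this card.
CONFIG_SIZES = {
    "en_annotated": 17528,
    "en_neutral": 9675,
    "fi_annotated": 14449,
    "fi_neutral": 10794,
}


def total_for_language(lang: str) -> int:
    """Sum annotated and neutral data points for 'en' or 'fi'."""
    prefix = lang + "_"
    return sum(n for name, n in CONFIG_SIZES.items() if name.startswith(prefix))


def load_xed(config: str):
    """Load one config; assumes the dataset id `xed_en_fi` resolves on the Hub."""
    if config not in CONFIG_SIZES:
        raise ValueError(f"unknown config: {config!r}")
    # Imported lazily so the size helpers work without the library.
    from datasets import load_dataset
    return load_dataset("xed_en_fi", config)


print(total_for_language("en"), total_for_language("fi"))
```

Calling `load_xed("en_annotated")` requires network access; the size helpers do not.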
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
License: Creative Commons Attribution 4.0 International License (CC-BY)
@inproceedings{ohman2020xed,
  title={XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection},
  author={{\"O}hman, Emily and P{\`a}mies, Marc and Kajava, Kaisla and Tiedemann, J{\"o}rg},
  booktitle={The 28th International Conference on Computational Linguistics (COLING 2020)},
  year={2020}
}
Thanks to @lhoestq and @harshalmittal4 for adding this dataset.