Dataset:

NbAiLab/norec_agg

Task:

Text classification

Subtask:

sentiment-classification

Languages:

Norwegian

Multilinguality:

monolingual

Size categories:

1K<n<10K

Language creators:

found

Annotation creators:

expert-generated

Source datasets:

original

arXiv:

arxiv:2011.02686

License:

cc-by-4.0


Dataset Card Creation Guide

Dataset Summary

Aggregated NoReC_fine: A Fine-grained Sentiment Dataset for Norwegian. This dataset was created by the Nordic Language Processing Laboratory by aggregating the fine-grained annotations in NoReC_fine into a single sentence-level polarity label and removing sentences with conflicting or no sentiment.
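The card does not spell out the aggregation procedure in detail. The sketch below shows one way to implement the rule it describes, under stated assumptions: each sentence arrives with the list of polarity values of its annotated opinions, spelled "Positive" and "Negative" (an assumption about the NoReC_fine format), and a sentence is kept only if all of its opinions agree. The function names `aggregate_sentence` and `aggregate` are hypothetical, and the toy sentences are placeholders, not real data.

```python
from typing import Optional

# Label ids as listed under "Data Fields": 0 = negative, 1 = positive.
# The "Positive"/"Negative" spellings are an assumption about NoReC_fine.
LABELS = {"Negative": 0, "Positive": 1}


def aggregate_sentence(polarities: list[str]) -> Optional[int]:
    """Collapse the opinion polarities of one sentence into one label.

    Returns None when the sentence should be dropped: it has no
    sentiment-bearing opinion, or its opinions disagree in polarity.
    Polarity strings not found in LABELS are ignored.
    """
    distinct = {p for p in polarities if p in LABELS}
    if len(distinct) != 1:
        # Zero polarities -> no sentiment; two polarities -> conflicting sentiment.
        return None
    return LABELS[distinct.pop()]


def aggregate(sentences: list[tuple[str, list[str]]]) -> list[dict]:
    """Build {'text', 'label'} records, skipping sentences that aggregate to None."""
    records = []
    for text, polarities in sentences:
        label = aggregate_sentence(polarities)
        if label is not None:
            records.append({"text": text, "label": label})
    return records


# Toy input with placeholder sentences (hypothetical, for illustration only).
toy = [
    ("sentence A", ["Negative"]),
    ("sentence B", ["Positive", "Positive"]),
    ("sentence C", ["Positive", "Negative"]),  # conflicting -> dropped
    ("sentence D", []),                        # no sentiment -> dropped
]
print(aggregate(toy))
# [{'text': 'sentence A', 'label': 0}, {'text': 'sentence B', 'label': 1}]
```

Using a set of distinct polarities makes repeated agreeing opinions harmless while still catching any disagreement.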

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The text in the dataset is in Norwegian.

Dataset Structure

Data Instances

Example of one instance in the dataset.

{'label': 0, 'text': 'Verre er det med slagsmålene .'}

Data Fields

id : index of the example
text : Text of a sentence
label : The sentiment label, where:
- 0 = negative
- 1 = positive
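A small sketch of decoding the integer label back to its name, applied to the instance shown under "Data Instances". The names `ID2LABEL`, `LABEL2ID`, and `describe` are illustrative, not part of the dataset. If you fine-tune a classifier with Hugging Face Transformers, two dicts of this shape are what the model config's `id2label` and `label2id` settings expect.

```python
# Label ids as documented under "Data Fields".
ID2LABEL = {0: "negative", 1: "positive"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def describe(example: dict) -> str:
    """Return 'label-name: text' for one instance; reject unknown label ids."""
    label_id = example["label"]
    if label_id not in ID2LABEL:
        raise ValueError(f"unexpected label id: {label_id!r}")
    return f"{ID2LABEL[label_id]}: {example['text']}"


instance = {"label": 0, "text": "Verre er det med slagsmålene ."}
print(describe(instance))  # negative: Verre er det med slagsmålene .
```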

Data Splits

The dataset is split into train, validation, and test splits with the following sizes:

Split	Train	Validation	Test
Number of examples	2675	516	417
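As a quick check on the table above, the split sizes add up to 3,608 sentences, divided roughly 74.1% / 14.3% / 11.6% between train, validation, and test. The snippet below only redoes that arithmetic on the numbers from the table.

```python
# Split sizes copied from the table above.
SPLITS = {"train": 2675, "validation": 516, "test": 417}

total = sum(SPLITS.values())
print(f"total: {total}")
for name, n in SPLITS.items():
    # Left-align the split name, right-align the count, show the share as a percentage.
    print(f"{name:<10} {n:>5}  {n / total:6.1%}")
```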

Dataset Creation

This dataset is based largely on the original data described in the paper "A Fine-Grained Sentiment Dataset for Norwegian" by L. Øvrelid, P. Mæhlum, J. Barnes, and E. Velldal, accepted at LREC 2020. However, we have since added annotations for another 3476 sentences, increasing the overall size and scope of the dataset.

Additional Information

Licensing Information

This work is licensed under a Creative Commons Attribution 4.0 International License.

Citation Information

@misc{sheng2020investigating,
      title={Investigating Societal Biases in a Poetry Composition System},
      author={Emily Sheng and David Uthus},
      year={2020},
      eprint={2011.02686},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Author:

NbAiLab

Dataset size:

10.56 KB