Dataset: ro_sent
Tasks: text classification
Language: ro
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotations creators: found
Source datasets: original
ArXiv: arxiv:2009.08712
License: unknown

This is a Romanian sentiment analysis dataset. It is provided in processed form, as used by the authors of Romanian Transformers in their examples, and is based on the original data available at this GitHub repository. The original data contains product and movie reviews in Romanian.
[More Information Needed]
The text in this dataset is in Romanian.
An instance from the train split:
{'id': '0', 'label': 1, 'original_id': '0', 'sentence': 'acest document mi-a deschis cu adevarat ochii la ceea ce oamenii din afara statelor unite s-au gandit la atacurile din 11 septembrie. acest film a fost construit in mod expert si prezinta acest dezastru ca fiind mai mult decat un atac asupra pamantului american. urmarile acestui dezastru sunt previzionate din multe tari si perspective diferite. cred ca acest film ar trebui sa fie mai bine distribuit pentru acest punct. de asemenea, el ajuta in procesul de vindecare sa vada in cele din urma altceva decat stirile despre atacurile teroriste. si unele dintre piese sunt de fapt amuzante, dar nu abuziv asa. acest film a fost extrem de recomandat pentru mine, si am trecut pe acelasi sentiment.'}
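The fields of the instance above can be checked with a short sketch like the following. The helper name `validate_instance` is hypothetical, the field types are inferred from the single example shown rather than from a documented schema, and the `sentence` value is shortened here for illustration.

```python
from typing import Any, Dict

# Field names and Python types as they appear in the example instance.
# This is an inferred schema, not one published by the dataset authors.
EXPECTED_FIELDS = {
    "id": str,
    "label": int,
    "original_id": str,
    "sentence": str,
}


def validate_instance(instance: Dict[str, Any]) -> bool:
    """Return True if the instance has exactly the expected fields and types."""
    if set(instance) != set(EXPECTED_FIELDS):
        return False
    return all(
        isinstance(instance[name], expected_type)
        for name, expected_type in EXPECTED_FIELDS.items()
    )


# Shortened copy of the train instance shown above (sentence truncated here).
example = {
    "id": "0",
    "label": 1,
    "original_id": "0",
    "sentence": "acest document mi-a deschis cu adevarat ochii",
}

print(validate_instance(example))
```

A check of this kind may be useful before feeding records to a tokenizer, since a missing `sentence` or a non-integer `label` would otherwise only surface later in training.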
This dataset has two splits: train with 17941 examples, and test with 11005 examples.
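As a rough illustration of the split sizes, the sketch below computes the share of examples in each split from the numbers reported above. The `load_ro_sent` helper is hypothetical: it assumes the Hugging Face `datasets` library is installed, that network access is available, and that the identifier `ro_sent` still resolves on the Hub, none of which is guaranteed by this card.

```python
from typing import Dict

# Split sizes as reported in this dataset card.
SPLIT_SIZES: Dict[str, int] = {"train": 17941, "test": 11005}


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_ro_sent():
    """Load the dataset with the Hugging Face `datasets` library.

    Assumption: the identifier "ro_sent" is taken from the card's dataset
    name; it may differ on the Hub or require a namespace prefix.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset("ro_sent")


total_examples = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)
print(total_examples, fractions)
```

Note that the test split is unusually large relative to the train split, so evaluation metrics are computed over a substantial portion of the data.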
[More Information Needed]
The source dataset is available at this GitHub repository and is based on product and movie reviews. The original source of the reviews is unknown.
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
Stefan Daniel Dumitrescu, Andrei-Marius Avram, Sampo Pyysalo, @katakonst
[More Information Needed]
@article{dumitrescu2020birth,
  title={The birth of Romanian BERT},
  author={Dumitrescu, Stefan Daniel and Avram, Andrei-Marius and Pyysalo, Sampo},
  journal={arXiv preprint arXiv:2009.08712},
  year={2020}
}
Thanks to @gchhablani and @iliemihai for adding this dataset.