Dataset: ro_sts_parallel
Tasks: translation
Multilinguality: multilingual
Size: 10K<n<100K
Language creators: crowdsourced
Annotations creators: crowdsourced
Source datasets: extended|other-sts-b
License: cc-by-4.0

We present RO-STS-Parallel, a parallel Romanian-English dataset obtained by translating the STS English dataset into Romanian. It contains 17,256 sentences in Romanian and English.
[Needs More Information]
The text in the dataset is in Romanian and English (ro, en).
An example looks like this:
{ 'translation': { 'ro': 'Problema e si mai simpla.', 'en': 'The problem is simpler than that.' } }
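As a minimal sketch of working with records in this shape, the snippet below pulls the Romanian and English sides out of the nested `translation` field. The helper name `to_pair` and the in-memory `records` list are illustrative assumptions, not part of the dataset's tooling; only the record layout is taken from the example above.

```python
from typing import Dict, List, Tuple

# One record copied from the example above; a real split would hold many of these.
records: List[Dict[str, Dict[str, str]]] = [
    {
        "translation": {
            "ro": "Problema e si mai simpla.",
            "en": "The problem is simpler than that.",
        }
    },
]


def to_pair(record: Dict[str, Dict[str, str]],
            src: str = "ro", tgt: str = "en") -> Tuple[str, str]:
    """Return (source, target) text from one record (hypothetical helper)."""
    translation = record["translation"]
    return translation[src], translation[tgt]


# Flatten the nested records into (Romanian, English) tuples.
pairs = [to_pair(r) for r in records]
for ro_text, en_text in pairs:
    print(f"{ro_text} -> {en_text}")
```

Swapping the `src` and `tgt` arguments yields English-to-Romanian pairs from the same records.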
The train/validation/test splits contain 11,498/3,000/2,758 sentence pairs, respectively.
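The split sizes can be sanity-checked with a few lines of arithmetic. This is only an illustrative sketch: the `split_sizes` dictionary is typed in by hand from the numbers above, not read from the dataset itself.

```python
# Split sizes as stated above (entered by hand, not loaded from the dataset).
split_sizes = {"train": 11_498, "validation": 3_000, "test": 2_758}

# Total number of sentence pairs across all splits.
total = sum(split_sizes.values())

# Fraction of the data that falls in each split.
shares = {name: size / total for name, size in split_sizes.items()}

print(f"total: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The three splits sum to 17,256, which agrees with the overall size quoted in the dataset description.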
*To construct the dataset, we first obtained automatic translations using Google's translation engine. These were then manually checked, corrected, and cross-validated by human volunteers.*
Who are the source language producers?
[Needs More Information]
CC BY-SA 4.0 License
@inproceedings{dumitrescu2021liro,
  title={Liro: Benchmark and leaderboard for romanian language tasks},
  author={Dumitrescu, Stefan Daniel and Rebeja, Petru and Lorincz, Beata and Gaman, Mihaela and Avram, Andrei and Ilie, Mihai and Pruteanu, Andrei and Stan, Adriana and Rosia, Lorena and Iacobescu, Cristina and others},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
  year={2021}
}
Thanks to @lorinczb for adding this dataset.