- Datasets: flores
- Tasks: Translation
- Multilinguality: translation
- Size: 1K<n<10K
- Language creators: found
- Annotation creators: found
- ArXiv: arxiv:1902.01382
- License: cc-by-4.0

Evaluation datasets for low-resource machine translation: Nepali-English and Sinhala-English.
neen

An example from the 'validation' split looks as follows.

This example was too long and was cropped: { "translation": "{\"en\": \"This is the wrong translation!\", \"ne\": \"यस वाहेक आगम पूजा, तारा पूजा, व्रत आदि पनि घरभित्र र वाहिर दुवै स्थानमा गरेको पा..." }

sien

An example from the 'validation' split looks as follows.

This example was too long and was cropped: { "translation": "{\"en\": \"This is the wrong translation!\", \"si\": \"එවැනි ආවරණයක් ලබාදීමට රක්ෂණ සපයන්නෙකු කැමති වුවත් ඒ සාමාන් යයෙන් බොහෝ රටවල පොදු ..." }
The data fields are the same among all splits.
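The examples above store each sentence pair under a single `translation` field keyed by language code (`en`, `ne`, `si`). The sketch below shows one way to turn such a record into a `(source, target)` pair. It assumes that shape only from the previews above; the helper name `to_pair` is hypothetical and not part of any library. Because the preview shows `translation` serialized as a JSON string, the helper accepts either a dict or a JSON-encoded string.

```python
import json
from typing import Dict, Tuple, Union


def to_pair(
    example: Dict[str, Union[str, Dict[str, str]]],
    src: str,
    tgt: str = "en",
) -> Tuple[str, str]:
    """Return (source, target) sentences from a record with a `translation` field.

    Hypothetical helper. Assumes `translation` maps language codes
    (e.g. "ne", "si", "en") to sentences, stored either as a dict or
    as a JSON-encoded string like the preview above.
    """
    translation = example["translation"]
    if isinstance(translation, str):
        # The preview shows the mapping serialized as JSON, so decode it first.
        translation = json.loads(translation)
    return translation[src], translation[tgt]


# Usage with a made-up record shaped like the Nepali-English preview.
record = {"translation": json.dumps({"en": "Hello", "ne": "नमस्ते"})}
print(to_pair(record, "ne"))
```

Accepting both forms keeps the helper usable whether the field arrives already decoded or as raw JSON text.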
| Config | validation (examples) | test (examples) |
|---|---|---|
| neen | 2560 | 2836 |
| sien | 2899 | 2767 |
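As a quick arithmetic check on the table above, the sketch below sums the split sizes per configuration. Each configuration has 5,396 (neen) or 5,666 (sien) examples, which fits the card's 1K<n<10K size tag. The combined total of 11,062 would exceed that range, so the sketch assumes the tag is counted per configuration. The `SPLITS` dict and `config_total` function are illustrative names, not part of the dataset's tooling.

```python
# Split sizes copied from the table above (number of examples per split).
SPLITS = {
    "neen": {"validation": 2560, "test": 2836},
    "sien": {"validation": 2899, "test": 2767},
}


def config_total(config: str) -> int:
    """Sum the validation and test sizes for one configuration."""
    return sum(SPLITS[config].values())


for name in SPLITS:
    total = config_total(name)
    # Assumption: the 1K<n<10K size tag applies per configuration.
    assert 1_000 < total < 10_000
    print(name, total)  # neen 5396, then sien 5666
```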
@misc{guzmn2019new,
  title={Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English},
  author={Francisco Guzman and Peng-Jen Chen and Myle Ott and Juan Pino and Guillaume Lample and Philipp Koehn and Vishrav Chaudhary and Marc'Aurelio Ranzato},
  year={2019},
  eprint={1902.01382},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Thanks to @thomwolf, @patrickvonplaten, and @lewtun for adding this dataset.