Dataset: flores

- Task: translation
- Size: 1K<n<10K
- Language creators: found
- Annotation creators: found
- arXiv: 1902.01382
Evaluation datasets for low-resource machine translation: Nepali-English and Sinhala-English.
neen: an example of 'validation' looks as follows. This example was too long and was cropped:

```
{
"translation": "{\"en\": \"This is the wrong translation!\", \"ne\": \"यस वाहेक आगम पूजा, तारा पूजा, व्रत आदि पनि घरभित्र र वाहिर दुवै स्थानमा गरेको पा..."
}
```
sien: an example of 'validation' looks as follows. This example was too long and was cropped:

```
{
"translation": "{\"en\": \"This is the wrong translation!\", \"si\": \"එවැනි ආවරණයක් ලබාදීමට රක්ෂණ සපයන්නෙකු කැමති වුවත් ඒ සාමාන් යයෙන් බොහෝ රටවල පොදු ..."
}
```
The data fields are the same among all splits.
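A minimal sketch of reading the `translation` field, assuming each loaded record exposes it as a dict keyed by language code (`en` plus `ne` or `si`), even though the cropped preview above prints it as an escaped string. The helper `to_pairs` and the placeholder records are hypothetical illustrations, not part of the dataset or of any library.

```python
from typing import Dict, List, Tuple

# Assumed record shape: {"translation": {"en": ..., "<src>": ...}}
Record = Dict[str, Dict[str, str]]


def to_pairs(records: List[Record], src_lang: str, tgt_lang: str = "en") -> List[Tuple[str, str]]:
    """Hypothetical helper: return (source, target) sentence pairs from flores-style records."""
    pairs = []
    for record in records:
        translation = record["translation"]
        pairs.append((translation[src_lang], translation[tgt_lang]))
    return pairs


# Placeholder strings, not real dataset content.
neen_records = [{"translation": {"en": "English sentence", "ne": "Nepali sentence"}}]
sien_records = [{"translation": {"en": "English sentence", "si": "Sinhala sentence"}}]

print(to_pairs(neen_records, "ne"))  # [('Nepali sentence', 'English sentence')]
print(to_pairs(sien_records, "si"))  # [('Sinhala sentence', 'English sentence')]
```

Because the fields are the same across splits, the same helper applies to both `validation` and `test`; only the source language code changes between the `neen` and `sien` configurations.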
Number of examples per split:

| name | validation | test |
|---|---|---|
| neen | 2560 | 2836 |
| sien | 2899 | 2767 |
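The split sizes can be tallied per configuration. This sketch only copies the counts from the table into a dict; `SPLIT_SIZES` and `total_examples` are illustrative names, not part of the dataset tooling.

```python
# Counts copied from the split table (examples per split).
SPLIT_SIZES = {
    "neen": {"validation": 2560, "test": 2836},
    "sien": {"validation": 2899, "test": 2767},
}


def total_examples(config: str) -> int:
    """Sum the example counts over all splits of one configuration."""
    return sum(SPLIT_SIZES[config].values())


for config in SPLIT_SIZES:
    print(config, total_examples(config))  # neen 5396, then sien 5666

# Combined over both configurations.
print(sum(total_examples(c) for c in SPLIT_SIZES))  # 11062
```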
```bibtex
@misc{guzmn2019new,
title={Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English},
author={Francisco Guzman and Peng-Jen Chen and Myle Ott and Juan Pino and Guillaume Lample and Philipp Koehn and Vishrav Chaudhary and Marc'Aurelio Ranzato},
year={2019},
eprint={1902.01382},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
Thanks to @thomwolf, @patrickvonplaten, and @lewtun for adding this dataset.