Dataset: iwslt2017
Task: translation
Size: 1M<n<10M
Language creators: expert-generated
Annotation creators: crowdsourced
Source datasets: original
License: Creative Commons BY-NC-ND
The IWSLT 2017 Multilingual Task addresses text translation, including zero-shot translation, using a single MT system for all directions among English, German, Dutch, Italian and Romanian. As an unofficial task, conventional bilingual text translation is offered between English and each of Arabic, French, Japanese, Chinese, German and Korean.
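The language sets above determine how many translation directions each task covers. The following is a minimal sketch that counts them. It assumes that "all directions" means every ordered pair of distinct languages. It also assumes that each unofficial bilingual pair is offered in both directions, which is what the paired `ar-en`/`en-ar` and `de-en`/`en-de` configurations listed below suggest. The ISO 639-1 codes and the function names are illustrative labels, not identifiers defined by the task.

```python
from itertools import permutations

# Multilingual task languages (ISO 639-1 codes are our labeling, not the task's).
MULTILINGUAL = ["en", "de", "nl", "it", "ro"]
# Languages paired with English in the unofficial bilingual task.
BILINGUAL_PARTNERS = ["ar", "fr", "ja", "zh", "de", "ko"]


def multilingual_directions(langs):
    """Every ordered (source, target) pair of distinct languages."""
    return list(permutations(langs, 2))


def bilingual_directions(partners, pivot="en"):
    """Pairs with the pivot language; ASSUMES both directions are offered."""
    pairs = []
    for lang in partners:
        pairs.append((pivot, lang))
        pairs.append((lang, pivot))
    return pairs


multi = multilingual_directions(MULTILINGUAL)
bi = bilingual_directions(BILINGUAL_PARTNERS)
print(len(multi), len(bi))  # 20 12
```

With five languages, one system serving every direction covers 5 × 4 = 20 directions. The zero-shot directions are the subset of these for which the task provides no parallel training data. This sketch does not encode which directions those are.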
iwslt2017-ar-en
An example of 'train' looks as follows.
This example was too long and was cropped:
{
"translation": "{\"ar\": \"لقد طرت في \\\"القوات الجوية \\\" لمدة ثمان سنوات. والآن أجد نفسي مضطرا لخلع حذائي قبل صعود الطائرة!\", \"en\": \"I flew on Air ..."
}
iwslt2017-de-en
An example of 'train' looks as follows.
{
"translation": {
"de": "Es ist mir wirklich eine Ehre, zweimal auf dieser Bühne stehen zu dürfen. Tausend Dank dafür.",
"en": "And it's truly a great honor to have the opportunity to come to this stage twice; I'm extremely grateful."
}
}
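Each example stores both sides of a sentence pair in one `translation` dictionary keyed by language code, and the configuration name encodes the direction. The following is a minimal sketch of turning a record into a (source, target) pair. `split_config` and `to_pair` are hypothetical helpers. Treating the first language code in the configuration name as the source is an assumption based on the names listed in this card.

```python
def split_config(name):
    """Parse a config name such as 'iwslt2017-de-en' into ('de', 'en').

    ASSUMPTION: the first code is the source language, the second the target.
    """
    prefix, src, tgt = name.split("-")
    if prefix != "iwslt2017":
        raise ValueError(f"unexpected config name: {name}")
    return src, tgt


def to_pair(example, config):
    """Return (source_text, target_text) for one example of `config`."""
    src, tgt = split_config(config)
    translation = example["translation"]
    return translation[src], translation[tgt]


# The 'train' example shown above for iwslt2017-de-en.
example = {
    "translation": {
        "de": "Es ist mir wirklich eine Ehre, zweimal auf dieser Bühne stehen zu dürfen. Tausend Dank dafür.",
        "en": "And it's truly a great honor to have the opportunity to come to this stage twice; I'm extremely grateful.",
    }
}
source, target = to_pair(example, "iwslt2017-de-en")
```

Because the record is a dictionary rather than an ordered pair, the same record yields the reversed pair when it is read under the `iwslt2017-en-de` name.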
iwslt2017-en-ar
An example of 'train' looks as follows.
This example was too long and was cropped:
{
"translation": "{\"ar\": \"لقد طرت في \\\"القوات الجوية \\\" لمدة ثمان سنوات. والآن أجد نفسي مضطرا لخلع حذائي قبل صعود الطائرة!\", \"en\": \"I flew on Air ..."
}
iwslt2017-en-de
An example of 'validation' looks as follows.
{
"translation": {
"de": "Die nächste Folie, die ich Ihnen zeige, ist eine Zeitrafferaufnahme was in den letzten 25 Jahren passiert ist.",
"en": "The next slide I show you will be a rapid fast-forward of what's happened over the last 25 years."
}
}
iwslt2017-en-fr
An example of 'validation' looks as follows.
{
"translation": {
"en": "But this understates the seriousness of this particular problem because it doesn't show the thickness of the ice.",
"fr": "Mais ceci tend à amoindrir le problème parce qu'on ne voit pas l'épaisseur de la glace."
}
}
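The data fields are the same among all splits. Each example has a single `translation` field that maps the two language codes of the configuration to their sentences.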
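Because every split shares this schema, one check can be run over train, validation and test records alike. The following is a minimal sketch that assumes the schema is exactly what the uncropped examples above show. `check_example` is a hypothetical helper and is not part of any library.

```python
def check_example(example, langs):
    """Return True if `example` has only a 'translation' field that maps
    exactly the language codes in `langs` to non-empty strings.

    ASSUMPTION: the schema is the one shown in this card's examples.
    """
    if set(example) != {"translation"}:
        return False
    translation = example["translation"]
    if not isinstance(translation, dict) or set(translation) != set(langs):
        return False
    return all(isinstance(text, str) and text.strip() != "" for text in translation.values())


# The 'validation' example shown above for iwslt2017-en-fr.
en_fr = {
    "translation": {
        "en": "But this understates the seriousness of this particular problem because it doesn't show the thickness of the ice.",
        "fr": "Mais ceci tend à amoindrir le problème parce qu'on ne voit pas l'épaisseur de la glace.",
    }
}
print(check_example(en_fr, ("en", "fr")))  # True
```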
| name | train | validation | test |
|---|---|---|---|
| iwslt2017-ar-en | 231713 | 888 | 8583 |
| iwslt2017-de-en | 206112 | 888 | 8079 |
| iwslt2017-en-ar | 231713 | 888 | 8583 |
| iwslt2017-en-de | 206112 | 888 | 8079 |
| iwslt2017-en-fr | 232825 | 890 | 8597 |
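The table lists split sizes per configuration. The following sketch totals each row and flags configurations whose reversed direction has identical split sizes. The counts are copied from the table. The helper names are hypothetical. If the reversed configurations match, that suggests (but does not establish) that they contain the same sentence pairs; the card does not state this.

```python
# Split sizes (train, validation, test) copied from the table above.
SPLITS = {
    "iwslt2017-ar-en": (231713, 888, 8583),
    "iwslt2017-de-en": (206112, 888, 8079),
    "iwslt2017-en-ar": (231713, 888, 8583),
    "iwslt2017-en-de": (206112, 888, 8079),
    "iwslt2017-en-fr": (232825, 890, 8597),
}


def total(name):
    """Total number of examples across all splits of one configuration."""
    return sum(SPLITS[name])


def reverse(name):
    """Config name for the opposite direction, e.g. en-fr -> fr-en."""
    prefix, src, tgt = name.split("-")
    return f"{prefix}-{tgt}-{src}"


for name in SPLITS:
    rev = reverse(name)
    same = rev in SPLITS and SPLITS[rev] == SPLITS[name]
    print(name, total(name), f"(same sizes as {rev})" if same else "")
```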
Creative Commons BY-NC-ND
See the [TED Talks Usage Policy](https://www.ted.com/about/our-organization/our-policies-terms/ted-talks-usage-policy).
@inproceedings{cettolo-etal-2017-overview,
title = "Overview of the {IWSLT} 2017 Evaluation Campaign",
author = {Cettolo, Mauro and
Federico, Marcello and
Bentivogli, Luisa and
Niehues, Jan and
St{\"u}ker, Sebastian and
Sudoh, Katsuhito and
Yoshino, Koichiro and
Federmann, Christian},
booktitle = "Proceedings of the 14th International Conference on Spoken Language Translation",
month = dec # " 14-15",
year = "2017",
address = "Tokyo, Japan",
publisher = "International Workshop on Spoken Language Translation",
url = "https://aclanthology.org/2017.iwslt-1.1",
pages = "2--14",
}