XNLI is a subset of a few thousand examples from MNLI that has been translated into 14 different languages (some of them relatively low-resource). As with MNLI, the goal is to predict textual entailment (does sentence A imply, contradict, or neither imply nor contradict sentence B?). This is a classification task: given two sentences, predict one of three labels.
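To make the three-way classification concrete, here is a minimal sketch that turns an integer label into a readable class name. The label order (0 = entailment, 1 = neutral, 2 = contradiction) is an assumption, since this card does not state it, and `decode_label` is a hypothetical helper rather than part of any library. The sample record reuses the `ar` validation example shown in this card.

```python
# Assumed label order (not stated in this card):
# 0 = entailment, 1 = neutral, 2 = contradiction.
LABEL_NAMES = ("entailment", "neutral", "contradiction")


def decode_label(label: int) -> str:
    """Map an integer class id to its (assumed) class name."""
    if not 0 <= label < len(LABEL_NAMES):
        raise ValueError(f"label must be in 0..{len(LABEL_NAMES) - 1}, got {label}")
    return LABEL_NAMES[label]


# The 'ar' validation example from this card.
example = {
    "premise": "وقال، ماما، لقد عدت للمنزل.",
    "hypothesis": "اتصل بأمه حالما أوصلته حافلة المدرسية.",
    "label": 1,
}

decoded = decode_label(example["label"])
```

If the mapping differs in the copy of the data you load, only `LABEL_NAMES` needs to change.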
all_languages
An example of 'train' looks as follows.
This example was too long and was cropped: { "hypothesis": "{\"language\": [\"ar\", \"bg\", \"de\", \"el\", \"en\", \"es\", \"fr\", \"hi\", \"ru\", \"sw\", \"th\", \"tr\", \"ur\", \"vi\", \"zh\"], \"translation\": [\"احد اع...", "label": 0, "premise": "{\"ar\": \"واحدة من رقابنا ستقوم بتنفيذ تعليماتك كلها بكل دقة\", \"bg\": \"един от нашите номера ще ви даде инструкции .\", \"de\": \"Eine ..." }
ar
An example of 'validation' looks as follows.
{ "hypothesis": "اتصل بأمه حالما أوصلته حافلة المدرسية.", "label": 1, "premise": "وقال، ماما، لقد عدت للمنزل." }
bg
An example of 'train' looks as follows.
This example was too long and was cropped: { "hypothesis": "\"губиш нещата на следното ниво , ако хората си припомнят .\"...", "label": 0, "premise": "\"по време на сезона и предполагам , че на твоето ниво ще ги загубиш на следващото ниво , ако те решат да си припомнят отбора на ..." }
de
An example of 'train' looks as follows.
This example was too long and was cropped: { "hypothesis": "Man verliert die Dinge auf die folgende Ebene , wenn sich die Leute erinnern .", "label": 0, "premise": "\"Du weißt , während der Saison und ich schätze , auf deiner Ebene verlierst du sie auf die nächste Ebene , wenn sie sich entschl..." }
el
An example of 'validation' looks as follows.
This example was too long and was cropped: { "hypothesis": "\"Τηλεφώνησε στη μαμά του μόλις το σχολικό λεωφορείο τον άφησε.\"...", "label": 1, "premise": "Και είπε, Μαμά, έφτασα στο σπίτι." }
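The multilingual example above packs every language into one record: `premise` maps language codes to text, while `hypothesis` holds parallel `language` and `translation` lists. The sketch below flattens such a record into one row per language. `flatten_all_languages` is a hypothetical helper, the toy record uses placeholder text rather than real data, and the JSON-string handling is an assumption based on how the cropped example is displayed here.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def _as_dict(field: Any) -> Dict[str, Any]:
    """Accept either a parsed dict or a JSON-encoded string (as displayed in the card)."""
    return json.loads(field) if isinstance(field, str) else field


def flatten_all_languages(record: Dict[str, Any]) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (language, premise, hypothesis, label) for each language in the record."""
    premises = _as_dict(record["premise"])      # language code -> premise text
    hypothesis = _as_dict(record["hypothesis"])  # parallel 'language' / 'translation' lists
    hypotheses = dict(zip(hypothesis["language"], hypothesis["translation"]))
    for lang, hyp_text in hypotheses.items():
        if lang in premises:  # skip languages missing a premise
            yield lang, premises[lang], hyp_text, record["label"]


# Toy record with placeholder text (not taken from the dataset).
toy_record = {
    "premise": {"en": "premise text (en)", "de": "premise text (de)"},
    "hypothesis": json.dumps(
        {
            "language": ["en", "de"],
            "translation": ["hypothesis text (en)", "hypothesis text (de)"],
        }
    ),
    "label": 0,
}

rows = list(flatten_all_languages(toy_record))
```

Each resulting row has the same shape as a single-language example such as the `ar` or `de` records, which makes it easier to reuse one preprocessing path for both kinds of configuration.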
The data fields are the same among all splits.
name | train | validation | test |
---|---|---|---|
all_languages | 392702 | 2490 | 5010 |
ar | 392702 | 2490 | 5010 |
bg | 392702 | 2490 | 5010 |
de | 392702 | 2490 | 5010 |
el | 392702 | 2490 | 5010 |
@InProceedings{conneau2018xnli,
  author = {Conneau, Alexis and Rinott, Ruty and Lample, Guillaume and Williams, Adina and Bowman, Samuel R. and Schwenk, Holger and Stoyanov, Veselin},
  title = {XNLI: Evaluating Cross-lingual Sentence Representations},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  year = {2018},
  publisher = {Association for Computational Linguistics},
  location = {Brussels, Belgium},
}
Thanks to @lewtun, @mariamabarham, @thomwolf, @lhoestq, @patrickvonplaten for adding this dataset.