Dataset:

xnli

Language:

English

Dataset Card for "xnli"

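Dataset Summary

XNLI is a subset of MNLI consisting of a few thousand examples that have been translated into 14 different languages (some of them relatively low-resource). As with MNLI, the goal is to predict textual entailment: does sentence A imply sentence B, contradict it, or neither? This is a classification task (given two sentences, predict one of three labels).

Supported Tasks and Leaderboards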

More Information Needed

Languages

More Information Needed

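Dataset Structure

Data Instances

all_languages

Size of downloaded dataset files: 483.96 MB
Size of the generated dataset: 1.61 GB
Total amount of disk used: 2.09 GB

An example of 'train' looks as follows.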

This example was too long and was cropped:

{
    "hypothesis": "{\"language\": [\"ar\", \"bg\", \"de\", \"el\", \"en\", \"es\", \"fr\", \"hi\", \"ru\", \"sw\", \"th\", \"tr\", \"ur\", \"vi\", \"zh\"], \"translation\": [\"احد اع...",
    "label": 0,
    "premise": "{\"ar\": \"واحدة من رقابنا ستقوم بتنفيذ تعليماتك كلها بكل دقة\", \"bg\": \"един от нашите номера ще ви даде инструкции .\", \"de\": \"Eine ..."
}
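
The all_languages record above packs every translation into a single example. The following is a minimal sketch of how such a record might be flattened into per-language pairs. It assumes the premise has already been decoded into a mapping from language code to text, and the hypothesis into parallel "language" and "translation" lists, as the cropped example suggests. The function name flatten_record and the toy record are hypothetical and are not part of the dataset or of any library.

```python
from typing import Dict, List, Tuple


def flatten_record(record: Dict) -> List[Tuple[str, str, str, int]]:
    """Return (language, premise, hypothesis, label) tuples for one record.

    Assumed layout (inferred from the cropped example, not confirmed):
      record["premise"]    -> {"ar": "...", "bg": "...", ...}
      record["hypothesis"] -> {"language": [...], "translation": [...]}
    """
    premise_by_lang = record["premise"]
    hypothesis = record["hypothesis"]
    # Pair each language code with its translation of the hypothesis.
    hypothesis_by_lang = dict(zip(hypothesis["language"], hypothesis["translation"]))

    rows = []
    for lang, premise_text in premise_by_lang.items():
        # Skip languages that lack a hypothesis translation.
        if lang in hypothesis_by_lang:
            rows.append((lang, premise_text, hypothesis_by_lang[lang], record["label"]))
    return rows


# Toy record with invented text, for illustration only.
toy_record = {
    "premise": {"en": "He said, Mom, I'm home.", "de": "Er sagte, Mama, ich bin zu Hause."},
    "hypothesis": {
        "language": ["en", "de"],
        "translation": ["He called his mom.", "Er rief seine Mutter an."],
    },
    "label": 1,
}

flat_rows = flatten_record(toy_record)
```

Each resulting tuple has the same shape as an example from a single-language configuration, which can make it easier to reuse one processing path for both kinds of configuration.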

ar

Size of downloaded dataset files: 483.96 MB
Size of the generated dataset: 109.32 MB
Total amount of disk used: 593.29 MB

An example of 'validation' looks as follows.

{
    "hypothesis": "اتصل بأمه حالما أوصلته حافلة المدرسية.",
    "label": 1,
    "premise": "وقال، ماما، لقد عدت للمنزل."
}

bg

Size of downloaded dataset files: 483.96 MB
Size of the generated dataset: 128.32 MB
Total amount of disk used: 612.28 MB

An example of 'train' looks as follows.

This example was too long and was cropped:

{
    "hypothesis": "\"губиш нещата на следното ниво , ако хората си припомнят .\"...",
    "label": 0,
    "premise": "\"по време на сезона и предполагам , че на твоето ниво ще ги загубиш на следващото ниво , ако те решат да си припомнят отбора на ..."
}

de

Size of downloaded dataset files: 483.96 MB
Size of the generated dataset: 86.17 MB
Total amount of disk used: 570.14 MB

An example of 'train' looks as follows.

This example was too long and was cropped:

{
    "hypothesis": "Man verliert die Dinge auf die folgende Ebene , wenn sich die Leute erinnern .",
    "label": 0,
    "premise": "\"Du weißt , während der Saison und ich schätze , auf deiner Ebene verlierst du sie auf die nächste Ebene , wenn sie sich entschl..."
}

el

Size of downloaded dataset files: 483.96 MB
Size of the generated dataset: 142.30 MB
Total amount of disk used: 626.26 MB

An example of 'validation' looks as follows.

This example was too long and was cropped:

{
    "hypothesis": "\"Τηλεφώνησε στη μαμά του μόλις το σχολικό λεωφορείο τον άφησε.\"...",
    "label": 1,
    "premise": "Και είπε, Μαμά, έφτασα στο σπίτι."
}

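Data Fields

The data fields are the same among all splits.

all_languages

premise : a multilingual string variable, with possible languages including ar , bg , de , el , en .
hypothesis : a multilingual string variable, with possible languages including ar , bg , de , el , en .
label : a classification label, with possible values including entailment (0), neutral (1), contradiction (2).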

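ar

premise : a string feature.
hypothesis : a string feature.
label : a classification label, with possible values including entailment (0), neutral (1), contradiction (2).

bg

premise : a string feature.
hypothesis : a string feature.
label : a classification label, with possible values including entailment (0), neutral (1), contradiction (2).

de

premise : a string feature.
hypothesis : a string feature.
label : a classification label, with possible values including entailment (0), neutral (1), contradiction (2).

el

premise : a string feature.
hypothesis : a string feature.
label : a classification label, with possible values including entailment (0), neutral (1), contradiction (2).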
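
The integer labels listed above can be mapped to and from their names with a small helper. This is only a sketch: the names LABEL_NAMES, label_name, and label_id are hypothetical and belong to no library. The ordering follows the values stated in this card: entailment (0), neutral (1), contradiction (2).

```python
# Label names in the order given by this card: index == integer label.
LABEL_NAMES = ("entailment", "neutral", "contradiction")


def label_name(label: int) -> str:
    """Convert an integer label (0, 1, or 2) to its name."""
    if not 0 <= label < len(LABEL_NAMES):
        raise ValueError(f"unknown label id: {label}")
    return LABEL_NAMES[label]


def label_id(name: str) -> int:
    """Convert a label name back to its integer id."""
    try:
        return LABEL_NAMES.index(name)
    except ValueError:
        raise ValueError(f"unknown label name: {name!r}") from None


# The Arabic validation example in this card carries label 1.
example_label = label_name(1)
```

Keeping the mapping in one place avoids scattering the constants 0, 1, and 2 through evaluation code.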

Data Splits

name	train	validation	test
all_languages	392702	2490	5010
ar	392702	2490	5010
bg	392702	2490	5010
de	392702	2490	5010
el	392702	2490	5010
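
As a quick sanity check on the table above, the split sizes can be parsed and summed per configuration. This is an illustrative sketch: the rows are copied from the table, and the helper names parse_split_table and total_examples are hypothetical.

```python
from typing import Dict

# Rows copied from the split table in this card (tab-separated).
SPLIT_TABLE = (
    "name\ttrain\tvalidation\ttest\n"
    "all_languages\t392702\t2490\t5010\n"
    "ar\t392702\t2490\t5010\n"
    "bg\t392702\t2490\t5010\n"
    "de\t392702\t2490\t5010\n"
    "el\t392702\t2490\t5010\n"
)


def parse_split_table(text: str) -> Dict[str, Dict[str, int]]:
    """Parse the tab-separated table into {config: {split: size}}."""
    lines = text.strip().splitlines()
    split_names = lines[0].split("\t")[1:]  # drop the leading "name" column
    sizes = {}
    for line in lines[1:]:
        fields = line.split("\t")
        sizes[fields[0]] = dict(zip(split_names, (int(v) for v in fields[1:])))
    return sizes


def total_examples(sizes: Dict[str, Dict[str, int]], config: str) -> int:
    """Sum the train, validation, and test sizes for one configuration."""
    return sum(sizes[config].values())


SPLIT_SIZES = parse_split_table(SPLIT_TABLE)
ar_total = total_examples(SPLIT_SIZES, "ar")
```

Every configuration listed here has identical split sizes, so the totals are the same across rows.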

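Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

More Information Needed

Citation Information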

@InProceedings{conneau2018xnli,
  author = {Conneau, Alexis
                 and Rinott, Ruty
                 and Lample, Guillaume
                 and Williams, Adina
                 and Bowman, Samuel R.
                 and Schwenk, Holger
                 and Stoyanov, Veselin},
  title = {XNLI: Evaluating Cross-lingual Sentence Representations},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods
               in Natural Language Processing},
  year = {2018},
  publisher = {Association for Computational Linguistics},
  location = {Brussels, Belgium},
}

Contributions

Thanks to @lewtun, @mariamabarham, @thomwolf, @lhoestq, @patrickvonplaten for adding this dataset.

Author:

Anonymous

Dataset size:

63.36 KB
