Dataset: americas_nli
Task: text classification
Language creators: expert-generated
Annotation creators: expert-generated
Source datasets: extended|xnli
Preprint: arXiv:2104.08726
License: unknown

AmericasNLI is an extension of XNLI (Conneau et al., 2018), a natural language inference (NLI) dataset covering 15 high-resource languages, to 10 low-resource indigenous languages spoken in the Americas: Ashaninka, Aymara, Bribri, Guarani, Nahuatl, Otomi, Quechua, Raramuri, Shipibo-Konibo, and Wixarika. As with MNLI, the goal is to predict textual entailment (does sentence A imply, contradict, or neither imply nor contradict sentence B); this is a classification task (given two sentences, predict one of three labels).
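For illustration, the dataset can be loaded with the Hugging Face `datasets` library. This is a minimal sketch; the configuration and split names are taken from the tables later in this card:

```python
from datasets import load_dataset

# Load the Aymara configuration; the "all_languages" configuration bundles
# all ten languages and adds a `language` field to each example.
ds = load_dataset("americas_nli", "aym")

example = ds["test"][0]
print(example["premise"])
print(example["hypothesis"])
print(example["label"])  # 0 = entailment, 1 = neutral, 2 = contradiction
```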
[Needs More Information]
all_languages
An example of the test split looks as follows:
{'language': 'aym', 'premise': "Ukhamaxa, janiw ukatuqits lup'kayätti, ukhamarus wali phiñasitayätwa, ukatx jupampiw mayamp aruskipañ qallanttha.", 'hypothesis': 'Janiw mayamp jupampix parlxapxti.', 'label': 2}

aym
An example of the test split looks as follows:
{'premise': "Ukhamaxa, janiw ukatuqits lup'kayätti, ukhamarus wali phiñasitayätwa, ukatx jupampiw mayamp aruskipañ qallanttha.", 'hypothesis': 'Janiw mayamp jupampix parlxapxti.', 'label': 2}

bzd
An example of the test split looks as follows:
{'premise': "Bua', kèq ye' kũ e' bikeitsök erë ye' chkénãwã tã ye' ujtémĩne ie' tã páxlĩnẽ.", 'hypothesis': "Kèq ye' ùtẽnẽ ie' tã páxlĩ.", 'label': 2}

cni
An example of the test split looks as follows:
{'premise': 'Kameetsa, tee nokenkeshireajeroji, iro kantaincha tee nomateroji aisati nintajaro noñanatajiri iroakera.', 'hypothesis': 'Tee noñatajeriji.', 'label': 2}

gn
An example of the test split looks as follows:
{'premise': "Néi, ni napensaikurihína upéva rehe, ajepichaiterei ha añepyrûjey añe'ê hendive.", 'hypothesis': "Nañe'êvéi hendive.", 'label': 2}

hch
An example of the test split looks as follows:
{'premise': 'mu hekwa.', 'hypothesis': 'neuka tita xatawe m+k+ mat+a.', 'label': 2}

nah
An example of the test split looks as follows:
{'premise': 'Cualtitoc, na axnimoihliaya ino, nicualaniztoya queh naha nicamohuihqui', 'hypothesis': 'Ayoc nicamohuihtoc', 'label': 2}

oto
An example of the test split looks as follows:
{'premise': 'mi-ga, nin mibⴘy mbô̮nitho ane guenu, guedi mibⴘy nho ⴘnmⴘy xi di mⴘdi o ñana nen nⴘua manaigui', 'hypothesis': 'hin din bi pengui nen nⴘa', 'label': 2}

quy
An example of the test split looks as follows:
{…, 'label': 2}

shp
An example of the test split looks as follows:
{'premise': 'Jakon riki, ja shinanamara ea ike, ikaxbi kikin frustradara ea ike jakopira ea jabe yoyo iribake.', 'hypothesis': 'Eara jabe yoyo iribiama iki.', 'label': 2}

tar
An example of the test split looks as follows:
{'premise': 'Ga’lá ju, ke tási newalayé nejé echi kítira, we ne majáli, a’lí ko uchécho ne yua ku ra’íchaki.', 'hypothesis': 'Tási ne uchecho yua ra’ícha échi rejói.', 'label': 2}
all_languages
- language: a multilingual string variable, with possible values including aym, bzd, cni, gn, hch, nah, oto, quy, shp, tar.
- premise: a multilingual string variable.
- hypothesis: a multilingual string variable.
- label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2).

aym, bzd, cni, gn, hch, nah, oto, quy, shp, tar
Each per-language configuration has the same fields:
- premise: a string feature.
- hypothesis: a string feature.
- label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2).
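Because `label` is a ClassLabel feature, the integer ids above can be converted to and from their string names. A small sketch, assuming the dataset loads as shown earlier:

```python
from datasets import load_dataset

ds = load_dataset("americas_nli", "aym", split="test")

# ClassLabel features expose id <-> name conversion helpers.
label_feature = ds.features["label"]
print(label_feature.names)               # ['entailment', 'neutral', 'contradiction']
print(label_feature.int2str(2))          # 'contradiction'
print(label_feature.str2int("neutral"))  # 1
```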
Language | ISO | Family | Dev | Test |
---|---|---|---|---|
all_languages | -- | -- | 6457 | 7486 |
Aymara | aym | Aymaran | 743 | 750 |
Ashaninka | cni | Arawak | 658 | 750 |
Bribri | bzd | Chibchan | 743 | 750 |
Guarani | gn | Tupi-Guarani | 743 | 750 |
Nahuatl | nah | Uto-Aztecan | 376 | 738 |
Otomi | oto | Oto-Manguean | 222 | 748 |
Quechua | quy | Quechuan | 743 | 750 |
Raramuri | tar | Uto-Aztecan | 743 | 750 |
Shipibo-Konibo | shp | Panoan | 743 | 750 |
Wixarika | hch | Uto-Aztecan | 743 | 750 |
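The counts above can be verified programmatically. A sketch, assuming the dev split is exposed under the standard `validation` split name:

```python
from datasets import load_dataset

# Print dev/test sizes for every per-language configuration.
for config in ["aym", "bzd", "cni", "gn", "hch", "nah", "oto", "quy", "shp", "tar"]:
    ds = load_dataset("americas_nli", config)
    print(f"{config}: dev={len(ds['validation'])}, test={len(ds['test'])}")
```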
[Needs More Information]
The authors translate from the Spanish subset of XNLI:

> AmericasNLI is the translation of a subset of XNLI (Conneau et al., 2018). As translators between Spanish and the target languages are more frequently available than those for English, we translate from the Spanish version.

(As per Section 3.1 of the original paper.)
Initial Data Collection and Normalization
[Needs More Information]
Who are the source language producers?
[Needs More Information]
The dataset comprises expert translations from Spanish XNLI:

> Additionally, some translators reported that code-switching is often used to describe certain topics, and, while many words without an exact equivalence in the target language are worked in through translation or interpretation, others are kept in Spanish. To minimize the amount of Spanish vocabulary in the translated examples, we choose sentences from genres that we judged to be relatively easy to translate into the target languages: “face-to-face,” “letters,” and “telephone.”

(As per Section 3.1 of the original paper.)
Who are the annotators?
[Needs More Information]
Personal and Sensitive Information
[Needs More Information]

Social Impact of Dataset
[Needs More Information]

Discussion of Biases
[Needs More Information]

Other Known Limitations
[Needs More Information]

Dataset Curators
[Needs More Information]

Licensing Information
[Needs More Information]
```
@article{DBLP:journals/corr/abs-2104-08726,
  author     = {Abteen Ebrahimi and Manuel Mager and Arturo Oncevay and Vishrav Chaudhary and Luis Chiruzzo and Angela Fan and John Ortega and Ricardo Ramos and Annette Rios and Ivan Vladimir and Gustavo A. Gim{\'{e}}nez{-}Lugo and Elisabeth Mager and Graham Neubig and Alexis Palmer and Rolando A. Coto Solano and Ngoc Thang Vu and Katharina Kann},
  title      = {AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages},
  journal    = {CoRR},
  volume     = {abs/2104.08726},
  year       = {2021},
  url        = {https://arxiv.org/abs/2104.08726},
  eprinttype = {arXiv},
  eprint     = {2104.08726},
  timestamp  = {Mon, 26 Apr 2021 17:25:10 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2104-08726.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```
Thanks to @fdschmidt93 for adding this dataset.