Dataset:

PlanTL-GOB-ES/wnli-es

License:

cc-by-4.0

Source datasets:

extended|glue

Annotation creators:

expert-generated

Language creators:

found

Size:

size_categories:unknown

Multilinguality:

monolingual

Languages:

Subtasks:

natural-language-inference

Tasks:

Text classification

WNLI-es

Dataset Summary

"A Winograd schema is a pair of sentences that differ in only one or two words and that contain an ambiguity that is resolved in opposite ways in the two sentences and requires the use of world knowledge and reasoning for its resolution. The schema takes its name from Terry Winograd." Source: The Winograd Schema Challenge.

The Winograd NLI dataset presents 855 sentence pairs, in which the first sentence contains an ambiguity and the second one offers a possible interpretation of it. The label indicates whether the interpretation is correct (1) or not (0).

This dataset is a professional translation into Spanish of the Winograd NLI dataset as published in the GLUE Benchmark.

Both the original dataset and this translation are licensed under a Creative Commons Attribution 4.0 International License.

Supported Tasks and Leaderboards

Textual entailment, text classification, language modeling.

Languages

Spanish (es)

Dataset Structure

Data Instances

Three TSV files, one per split.

Data Fields

index
sentence 1: first sentence of the pair
sentence 2: second sentence of the pair
label: relation between the two sentences:
- 0: the second sentence does not entail a correct interpretation of the first one (neutral)
- 1: the second sentence entails a correct interpretation of the first one (entailment)
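The four fields above can be read into typed records with the standard library. The following is a minimal sketch, not the curators' loading code: it assumes a tab-separated layout with a header row and the column names `index`, `sentence1`, `sentence2`, and `label`, none of which are confirmed here. The sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass

# Readable names for the two label values described in Data Fields.
LABEL_NAMES = {0: "not entailed (neutral)", 1: "entailment"}


@dataclass(frozen=True)
class WnliPair:
    index: int
    sentence1: str
    sentence2: str
    label: int


def read_pairs(handle, delimiter="\t"):
    """Parse rows with the four documented fields into WnliPair records.

    The header names and the tab delimiter are assumptions; adjust them
    to match the actual files.
    """
    reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        label = int(row["label"])
        if label not in LABEL_NAMES:
            raise ValueError(f"unexpected label {label!r} at index {row['index']}")
        pairs.append(
            WnliPair(int(row["index"]), row["sentence1"], row["sentence2"], label)
        )
    return pairs


# Invented sample rows for illustration only; NOT taken from the dataset.
SAMPLE = (
    "index\tsentence1\tsentence2\tlabel\n"
    "0\tEl trofeo no cabe en la maleta porque es demasiado grande.\t"
    "El trofeo es demasiado grande.\t1\n"
    "1\tEl trofeo no cabe en la maleta porque es demasiado grande.\t"
    "La maleta es demasiado grande.\t0\n"
)

pairs = read_pairs(io.StringIO(SAMPLE))
for pair in pairs:
    print(pair.index, LABEL_NAMES[pair.label], "|", pair.sentence2)
```

To read a real file, pass an open file handle instead of the in-memory sample, and change `delimiter` if the files turn out to be comma-separated.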

Data Splits

wnli-train-es.csv: 636 sentence pairs
wnli-dev-es.csv: 72 sentence pairs
wnli-test-shuffled-es.csv: 147 sentence pairs
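As a quick consistency check, the split sizes listed above add up to the 855 sentence pairs mentioned in the Dataset Summary. A small sketch that computes the total and each split's share, using only the numbers given in this card:

```python
# Split sizes as listed in Data Splits (file name -> number of sentence pairs).
SPLIT_SIZES = {
    "wnli-train-es.csv": 636,
    "wnli-dev-es.csv": 72,
    "wnli-test-shuffled-es.csv": 147,
}

total = sum(SPLIT_SIZES.values())
shares = {name: count / total for name, count in SPLIT_SIZES.items()}

for name, count in SPLIT_SIZES.items():
    print(f"{name}: {count} pairs ({shares[name]:.1%})")
print(f"total: {total} pairs")
```

The training split accounts for roughly 74% of the pairs, the development split for about 8%, and the test split for about 17%.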

Dataset Creation

Curation Rationale

We translated this dataset to contribute to the development of language models in Spanish.

Source Data

GLUE Benchmark site

Initial Data Collection and Normalization

This is a professional translation of the WNLI dataset into Spanish, commissioned by BSC TeMU within the framework of the Plan-TL.

For more information on how the Winograd NLI dataset was created, visit The Winograd Schema Challenge webpage.

Who are the source language producers?

For more information on how the Winograd NLI dataset was created, visit The Winograd Schema Challenge webpage.

Annotations

Annotation process

We commissioned a professional translation of the WNLI dataset into Spanish.

Who are the annotators?

The translation was commissioned from a professional translation agency.

Personal and Sensitive Information

No personal or sensitive information is included.

Considerations for Using the Data

Social Impact of Dataset

This dataset contributes to the development of language models in Spanish.

Discussion of Biases

[N/A]

Other Known Limitations

[N/A]

Additional Information

Dataset Curators

Text Mining Unit (TeMU) at the Barcelona Supercomputing Center (bsc-temu@bsc.es).

For further information, send an email to plantl-gob-es@bsc.es.

This work was funded by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) within the framework of the Plan-TL.

Licensing information

This work is licensed under a Creative Commons Attribution 4.0 International License.

Contributions

[N/A]

Author:

PlanTL-GOB-ES

Dataset size:

165.98 KB