Dataset:

persiannlp/parsinlu_entailment

Sub-tasks:

natural-language-inference

Languages:

Multilinguality:

monolingual

Size:

1K<n<10K

Language creators:

expert-generated

Annotation creators:

expert-generated

Source datasets:

extended|translated|mnli

Preprint:

arxiv:2012.06154

License:

cc-by-nc-sa-4.0

Dataset Card for ParsiNLU (Textual Entailment)

Dataset Summary

A Persian textual entailment task: deciding whether sent1 entails sent2. The sentence pairs are partially translated from the MNLI dataset and partially written by expert annotators.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The text in the dataset is in Persian (fa).

Dataset Structure

Data Instances

Here is an example from the dataset:

{
  "sent1": "سالها است که کنگره در تلاش است تا اثربخشی مدیریت اطلاعات و فناوری را در دولت فدرال افزایش دهد.",
  "sent2": "کنگره بودجه ویژه ای برای مدیریت اطلاعات و فناوری در دولت فدرال دارد.",
  "label": "n",
  "category": "translation-train"
}

Data Fields

sent1 : the first sentence.
sent2 : the second sentence.
category : whether the sentence pair was translated from MNLI (values starting with translation-) or written by native speakers (values starting with natural-).
label : e if sent2 is entailed by sent1; c if sent2 contradicts sent1; n if the two sentences are neutral.
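The field descriptions above can be sketched in code. This is a minimal illustration, not part of the dataset or of any library: LABEL_NAMES and describe_example are hypothetical names, the readable label names are chosen for illustration, and splitting the category value on its first hyphen is an assumption based on the single example instance shown earlier ("translation-train"). The sentence texts are replaced by placeholders.

```python
import json

# Hypothetical mapping from the single-letter label codes to readable names.
LABEL_NAMES = {"e": "entailment", "c": "contradiction", "n": "neutral"}


def describe_example(example: dict) -> dict:
    """Return a readable view of one record (hypothetical helper)."""
    # Assumption: the category value looks like "<origin>-<suffix>",
    # as in the example instance "translation-train".
    origin, _, suffix = example["category"].partition("-")
    return {
        "label_name": LABEL_NAMES[example["label"]],
        "origin": origin,
        "category_suffix": suffix,
    }


# Same shape as the example instance, with placeholder sentence text.
record = json.loads(
    '{"sent1": "...", "sent2": "...", "label": "n", "category": "translation-train"}'
)
view = describe_example(record)
print(view["label_name"], view["origin"])
```

An unknown label code raises a KeyError here, which is a deliberate choice in this sketch so that unexpected values are not silently mapped.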

Data Splits

The train/dev/test splits contain 756/271/1751 samples, respectively.
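As a quick arithmetic check on these figures, the short sketch below sums the split sizes and computes each split's share of the whole. It uses only the numbers stated in this card; the variable names are illustrative. The total is 2778 samples, which agrees with the 1K<n<10K size category, and the test split is the largest of the three.

```python
# Split sizes as stated in this card.
split_sizes = {"train": 756, "dev": 271, "test": 1751}

total = sum(split_sizes.values())
# Fraction of all samples that falls in each split.
shares = {name: size / total for name, size in split_sizes.items()}

for name, size in split_sizes.items():
    print(f"{name}: {size} samples ({shares[name]:.1%})")
print(f"total: {total}")
```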

Dataset Creation

Curation Rationale

For details, see the corresponding draft (arXiv:2012.06154).

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

CC BY-NC-SA 4.0 License

Citation Information

@article{huggingface:dataset,
    title = {ParsiNLU: A Suite of Language Understanding Challenges for Persian},
    author = {Khashabi, Daniel and Cohan, Arman and Shakeri, Siamak and Hosseini, Pedram and Pezeshkpour, Pouya and Alikhani, Malihe and Aminnaseri, Moin and Bitaab, Marzieh and Brahman, Faeze and Ghazarian, Sarik and others},
    year = {2020},
    journal = {arXiv e-prints},
    eprint = {2012.06154},
}

Contributions

Thanks to @danyaljj for adding this dataset.

Author:

persiannlp

Dataset size:

12.34 KB