classla/copa_hr | ATYUN.COM 官网-人工智能教程资讯全方位服务平台

数据集:

classla/copa_hr

任务:

文本分类

子任务:

natural-language-inference

语言:

预印本库:

arxiv:2005.00333 arxiv:2104.09243

其他:

causal-reasoning textual-entailment commonsense-reasoning

许可:

cc-by-sa-4.0

数据集介绍文件清单

中文

The COPA-HR dataset (Choice of plausible alternatives in Croatian) is a translation of the English COPA dataset ( https://people.ict.usc.edu/~gordon/copa.html ) by following the XCOPA dataset translation methodology ( https://arxiv.org/abs/2005.00333 ). The dataset consists of 1000 premises (My body cast a shadow over the grass), each given a question (What is the cause?), and two choices (The sun was rising; The grass was cut), with a label encoding which of the choices is more plausible given the annotator or translator (The sun was rising).

The dataset is split into 400 training samples, 100 validation samples, and 500 test samples. It includes the following features: 'premise', 'choice1', 'choice2', 'label', 'question', 'changed' (boolean).

If you use the dataset in your work, please cite

@article{DBLP:journals/corr/abs-2104-09243,
 author    = {Nikola Ljube\\\\v{s}i\\\\'{c} and
              Davor Lauc},
 title     = {BERTi{\\\\'{c}} - The Transformer Language Model for Bosnian, Croatian,
               Montenegrin and Serbian},
 journal   = {CoRR},
 volume    = {abs/2104.09243},
 year      = {2021},
 url       = {https://arxiv.org/abs/2104.09243},
 archivePrefix = {arXiv},
}

作者:

classla

数据集大小:

52.57 KB