Dataset: ruanchaves/faquad-nli
Tasks: Question Answering
Sub-tasks: extractive-qa
Languages: pt
Multilinguality: monolingual
Size: n<1K
Language creators: found
Annotation creators: expert-generated
Source datasets: extended|wikipedia
License: cc-by-4.0

FaQuAD is a pioneering Portuguese reading comprehension dataset that follows the challenging format of the Stanford Question Answering Dataset (SQuAD). The dataset aims to address the large volume of questions sent by academics whose answers can be found in available institutional documents of the Brazilian higher education system. It consists of 900 questions about 249 reading passages taken from 18 official documents of a computer science college at a Brazilian federal university and 21 Wikipedia articles related to the Brazilian higher education system.
FaQuAD-NLI is a modified version of the FaQuAD dataset that repurposes the question answering task as a textual entailment task between a question and its possible answers.
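The conversion procedure itself is not described here. As a minimal sketch of what recasting question answering as textual entailment can look like, assume each question comes with a list of candidate answer sentences and a set of gold answers. The names `EntailmentPair` and `to_entailment_pairs`, the exact-match labeling rule, and the sample strings are all hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntailmentPair:
    question: str
    answer: str
    label: int  # 1 = the candidate answers the question, 0 = it does not


def to_entailment_pairs(question, candidates, gold_answers):
    """Pair a question with every candidate answer and label each pair.

    Assumption: a candidate counts as a correct answer only if it exactly
    matches one of the gold answers. The real dataset may use another rule.
    """
    gold = set(gold_answers)
    return [
        EntailmentPair(question, candidate, int(candidate in gold))
        for candidate in candidates
    ]


# Illustrative strings only; they are not drawn from FaQuAD-NLI.
pairs = to_entailment_pairs(
    question="Qual é a duração do curso?",
    candidates=[
        "O curso tem duração de quatro anos.",
        "As matrículas são feitas online.",
    ],
    gold_answers=["O curso tem duração de quatro anos."],
)
for pair in pairs:
    print(pair.label, pair.answer)
```

Under this framing, one extractive question yields several binary classification examples, one per candidate answer.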
This dataset is in Brazilian Portuguese.
The dataset is split into three subsets: train, validation, and test. The splits were constructed so that question and answer pairs belonging to the same document do not appear in more than one split.
| | Train | Validation | Test |
|---|---|---|---|
| Instances | 3128 | 731 | 650 |
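A document-disjoint split like the one described above can be sketched by assigning whole documents, rather than individual pairs, to each subset. This is an illustration only and not the procedure the dataset authors used: the `document_index` field name, the `split_by_document` helper, the ratios, and the seed are all assumptions.

```python
import random
from collections import defaultdict


def split_by_document(examples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Split examples so that no document contributes to two subsets.

    Assumption: each example is a dict with a "document_index" key that
    identifies its source document.
    """
    by_doc = defaultdict(list)
    for example in examples:
        by_doc[example["document_index"]].append(example)

    # Shuffle the document ids, not the examples, to keep documents intact.
    doc_ids = sorted(by_doc)
    random.Random(seed).shuffle(doc_ids)

    n_train = int(len(doc_ids) * ratios[0])
    n_val = int(len(doc_ids) * ratios[1])
    groups = {
        "train": doc_ids[:n_train],
        "validation": doc_ids[n_train:n_train + n_val],
        "test": doc_ids[n_train + n_val:],
    }
    return {
        name: [example for doc in ids for example in by_doc[doc]]
        for name, ids in groups.items()
    }


# Toy data: 10 documents with 3 question-answer pairs each.
toy = [
    {"document_index": doc, "question": f"q{doc}-{i}", "answer": f"a{doc}-{i}"}
    for doc in range(10)
    for i in range(3)
]
splits = split_by_document(toy)
print({name: len(rows) for name, rows in splits.items()})
```

Because documents differ in how many pairs they contain, the resulting instance counts only roughly follow the requested ratios.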
Thanks to @ruanchaves for adding this dataset.