Dataset Card for FaQuAD-NLI

Dataset Summary

FaQuAD is a Portuguese reading comprehension dataset that follows the format of the Stanford Question Answering Dataset (SQuAD), and it is a pioneering Portuguese dataset in adopting that challenging format. The dataset addresses the large volume of questions sent by academics whose answers can already be found in institutional documents of the Brazilian higher education system. It consists of 900 questions about 249 reading passages taken from 18 official documents of a computer science college at a Brazilian federal university and 21 Wikipedia articles related to the Brazilian higher education system.

FaQuAD-NLI is a modified version of the FaQuAD dataset that repurposes the question answering task as a textual entailment task between a question and its possible answers.

Supported Tasks and Leaderboards

question_answering : The dataset can be used to train a model for question-answering tasks in the domain of Brazilian higher education institutions.
textual_entailment : FaQuAD-NLI can be used to train a model for textual entailment tasks, where answers in Q&A pairs are classified as either suitable or unsuitable.
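
As a rough illustration of the textual_entailment framing, predictions over the binary labels can be scored with accuracy and F1 on the "suitable" class. The sketch below uses only the Python standard library. The function name `score_predictions` and the toy label lists are assumptions made for illustration; they are not part of the dataset and do not represent an official evaluation protocol.

```python
from typing import Dict, Sequence


def score_predictions(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy and F1 for the positive class (1 = suitable)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("gold and pred must not be empty")

    # Count the confusion-matrix cells needed for precision and recall.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": correct / len(gold), "f1": f1}


# Toy labels, invented for illustration only (not taken from FaQuAD-NLI).
toy_gold = [1, 0, 1, 0]
toy_pred = [1, 0, 0, 0]
scores = score_predictions(toy_gold, toy_pred)
print(scores)
```

Reporting F1 alongside accuracy is a common choice for binary tasks because accuracy alone can look good when one label dominates.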

Languages

This dataset is in Brazilian Portuguese.

Dataset Structure

Data Fields

document_index : an integer representing the index of the document.
document_title : a string containing the title of the document.
paragraph_index : an integer representing the index of the paragraph within the document.
question : a string containing the question related to the paragraph.
answer : a string containing the answer related to the question.
label : an integer (0 or 1) indicating whether the answer is suitable (1) or unsuitable (0) for the question.
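
The fields above can be mirrored in a small typed record, which may help when validating rows before training. This is a minimal sketch: the class name `FaquadNliRecord` is hypothetical, and the example row is invented for illustration rather than copied from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaquadNliRecord:
    """One row, with the six fields listed in this card."""

    document_index: int   # index of the source document
    document_title: str   # title of the source document
    paragraph_index: int  # index of the paragraph within the document
    question: str         # question related to the paragraph
    answer: str           # candidate answer to the question
    label: int            # 1 = suitable answer, 0 = unsuitable answer

    def __post_init__(self) -> None:
        # The card defines label as 0 or 1; reject anything else early.
        if self.label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {self.label!r}")


# Invented example row (not an actual dataset instance).
example = FaquadNliRecord(
    document_index=0,
    document_title="Exemplo",
    paragraph_index=0,
    question="Qual é a duração do curso?",
    answer="O curso tem duração de quatro anos.",
    label=1,
)
print(example.question, "->", example.label)
```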

Data Splits

The dataset is split into three subsets: train, validation, and test. The splits were constructed so that question and answer pairs belonging to the same document do not appear in more than one split.

Split	Train	Validation	Test
Instances	3128	731	650
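
The split sizes above sum to 4509 instances. The sketch below computes each split's share and shows one way to check the document-level separation described in this section. The helper name `shared_documents` and the toy rows are assumptions for illustration; only the `document_index` field name and the split sizes come from this card.

```python
from typing import Dict, Iterable, Mapping, Set

# Split sizes as reported in the table above.
SPLIT_SIZES = {"train": 3128, "validation": 731, "test": 650}

total = sum(SPLIT_SIZES.values())
shares = {name: round(100 * n / total, 1) for name, n in SPLIT_SIZES.items()}
print(total, shares)


def shared_documents(splits: Mapping[str, Iterable[Mapping]]) -> Set[int]:
    """Return document_index values that occur in more than one split."""
    first_seen: Dict[int, str] = {}
    shared: Set[int] = set()
    for split_name, rows in splits.items():
        for row in rows:
            doc = row["document_index"]
            # Remember the first split that contained this document.
            owner = first_seen.setdefault(doc, split_name)
            if owner != split_name:
                shared.add(doc)
    return shared


# Toy rows, invented for illustration: document 2 leaks across splits.
toy_splits = {
    "train": [{"document_index": 0}, {"document_index": 2}],
    "validation": [{"document_index": 1}],
    "test": [{"document_index": 2}],
}
print(shared_documents(toy_splits))
```

An empty result from `shared_documents` on the real splits would be consistent with the separation described above.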

Contributions

Thanks to @ruanchaves for adding this dataset.

Author:

ruanchaves

Dataset size:

33.63 KB