Dataset:
voidful/EQG-RACE-PLUS
The QGG-RACE dataset is a subset of RACE containing three types of questions: Factoid, Cloze, and Summarization.
Dataset Download: GitHub Release
Data Statistics:
Types | Examples | Train | Dev | Test |
---|---|---|---|---|
Cloze | Yingying is Wangwang's _ . | 43167 | 2405 | 2462 |
Factoid | What can Mimi do? | 18405 | 1030 | 944 |
Summarization | According to this passage we know that _ . | 3004 | 175 | 184 |
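The table shows a strong imbalance between question types. As a minimal sketch, the counts above can be totaled per split and turned into training-set shares; the variable names here are illustrative and not part of the dataset.

```python
# Counts copied from the statistics table above: (train, dev, test) per question type.
counts = {
    "Cloze": (43167, 2405, 2462),
    "Factoid": (18405, 1030, 944),
    "Summarization": (3004, 175, 184),
}

# Sum each column to get the total number of examples per split.
split_totals = [sum(column) for column in zip(*counts.values())]

# Fraction of the training split taken up by each question type.
train_share = {name: row[0] / split_totals[0] for name, row in counts.items()}

for name, share in train_share.items():
    print(f"{name}: {share:.1%} of train")
```

Shares like these can inform per-type sampling or per-type evaluation, since summarization questions are far rarer than cloze questions.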
The dataset is in English.
An example data instance from the dataset is shown below:
{
  "answers": [ "D", "A", "B", "C" ],
  "options": [
    [ "States", "Doubts", "Confirms", "Removes" ],
    [
      "shows the kind of male birds females seek out.",
      "indicates the wandering albatross is the most faithful.",
      "is based on Professor Stutchbury's 20 years' research.",
      "suggests that female birds select males near their home."
    ],
    [
      "young birds' quality depends on their feather.",
      "some male birds care for others' young as their own.",
      "female birds go to find males as soon as autumn comes.",
      "female birds are responsible for feeding the hungry babies."
    ],
    [
      "A book about love-birds.",
      "Birds' living habits and love life",
      "The fact that birds don't love their mates forever.",
      "The factors that influence birds to look for another mate."
    ]
  ],
  "questions": [
    "What does the underline word \"dispels\" mean?",
    "The book The Private Lives of Birds _ .",
    "According to the passage, we can infer that _ .",
    "What is the passage mainly about?"
  ],
  "article": "Birds are not as loyal to their partners as you might think ...",
  "id": "high11327.txt",
  "factoid_questions": [ "What does the underline word \"dispels\" mean?" ],
  "cloze_questions": [ "The book The Private Lives of Birds _ ." ],
  "summarization_questions": [ "According to the passage, we can infer that _ ." ]
}
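In the instance above, `questions`, `options`, and `answers` are parallel lists, and the three `*_questions` fields record which questions belong to each type. The following is a minimal sketch of how one might pair each question with its type and the text of its correct option. The helper names (`TYPE_FIELDS`, `question_type`, `label_questions`) are hypothetical, and the sketch assumes every instance follows the same field layout as this example.

```python
from typing import Optional

# Fields copied from the example instance above; "article" and "id" are
# omitted because this sketch does not use them.
instance = {
    "answers": ["D", "A", "B", "C"],
    "options": [
        ["States", "Doubts", "Confirms", "Removes"],
        [
            "shows the kind of male birds females seek out.",
            "indicates the wandering albatross is the most faithful.",
            "is based on Professor Stutchbury's 20 years' research.",
            "suggests that female birds select males near their home.",
        ],
        [
            "young birds' quality depends on their feather.",
            "some male birds care for others' young as their own.",
            "female birds go to find males as soon as autumn comes.",
            "female birds are responsible for feeding the hungry babies.",
        ],
        [
            "A book about love-birds.",
            "Birds' living habits and love life",
            "The fact that birds don't love their mates forever.",
            "The factors that influence birds to look for another mate.",
        ],
    ],
    "questions": [
        "What does the underline word \"dispels\" mean?",
        "The book The Private Lives of Birds _ .",
        "According to the passage, we can infer that _ .",
        "What is the passage mainly about?",
    ],
    "factoid_questions": ["What does the underline word \"dispels\" mean?"],
    "cloze_questions": ["The book The Private Lives of Birds _ ."],
    "summarization_questions": ["According to the passage, we can infer that _ ."],
}

# Hypothetical mapping from the per-type field names to short labels.
TYPE_FIELDS = {
    "factoid_questions": "factoid",
    "cloze_questions": "cloze",
    "summarization_questions": "summarization",
}


def question_type(instance: dict, question: str) -> Optional[str]:
    """Return the type label for a question, or None if it is in no type list."""
    for field, label in TYPE_FIELDS.items():
        if question in instance.get(field, []):
            return label
    return None


def label_questions(instance: dict) -> list:
    """Pair each question with its type and the text of its correct option."""
    rows = []
    for question, options, letter in zip(
        instance["questions"], instance["options"], instance["answers"]
    ):
        rows.append({
            "question": question,
            "type": question_type(instance, question),
            # Answer letters "A".."D" index into the four options.
            "answer": options[ord(letter) - ord("A")],
        })
    return rows


for row in label_questions(instance):
    print(row["type"], "|", row["question"], "->", row["answer"])
```

Note that the fourth question in this instance does not appear in any of the three type fields, so the sketch labels it `None` rather than guessing a type.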
The QGG-RACE dataset was created as a subset of RACE, focusing on three types of questions: Factoid, Cloze, and Summarization. It is intended to facilitate research in question generation and reading comprehension.
The QGG-RACE dataset is derived from the RACE dataset.
Who are the source language producers? The source language producers are the authors of the RACE dataset.
The dataset is annotated with questions and their corresponding answer options.
Who are the annotators? The annotators are the authors of the RACE dataset.
The dataset does not contain any personal or sensitive information.
The QGG-RACE dataset can be used for research in question generation and reading comprehension, leading to improvements in these fields.
Because it is a subset of RACE, the dataset may inherit biases from RACE.
No other known limitations.
The QGG-RACE dataset is curated by the authors of the QGG-RACE dataset GitHub repository.
The dataset is released under the CC BY 4.0 License.
No citation information is available for the QGG-RACE dataset.
Thanks to @p208p2002 for creating the QGG-RACE dataset.