Dataset:

race

Tasks:

multiple-choice

Sub-tasks:

multiple-choice-qa

Languages:

Multilinguality:

monolingual

Size categories:

10K<n<100K

Language creators:

found

Annotation creators:

expert-generated

Source datasets:

original

Preprint:

arxiv:1704.04683

License:

other

Dataset Card for "race"

Dataset Summary

RACE is a large-scale reading comprehension dataset with more than 28,000 passages and nearly 100,000 questions. The dataset was collected from English examinations in China that are designed for middle school and high school students. It can serve as training and test data for machine comprehension.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

all

Size of downloaded dataset files: 25.44 MB
Size of the generated dataset: 174.73 MB
Total amount of disk used: 200.17 MB

An example of 'train' looks as follows.

This example was too long and was cropped:

{
    "answer": "A",
    "article": "\"Schoolgirls have been wearing such short skirts at Paget High School in Branston that they've been ordered to wear trousers ins...",
    "example_id": "high132.txt",
    "options": ["short skirts give people the impression of sexualisation", "short skirts are too expensive for parents to afford", "the headmaster doesn't like girls wearing short skirts", "the girls wearing short skirts will be at the risk of being laughed at"],
    "question": "The girls at Paget High School are not allowed to wear skirts in that    _  ."
}

high

Size of downloaded dataset files: 25.44 MB
Size of the generated dataset: 140.12 MB
Total amount of disk used: 165.56 MB

An example of 'train' looks as follows.

This example was too long and was cropped:

{
    "answer": "A",
    "article": "\"Schoolgirls have been wearing such short skirts at Paget High School in Branston that they've been ordered to wear trousers ins...",
    "example_id": "high132.txt",
    "options": ["short skirts give people the impression of sexualisation", "short skirts are too expensive for parents to afford", "the headmaster doesn't like girls wearing short skirts", "the girls wearing short skirts will be at the risk of being laughed at"],
    "question": "The girls at Paget High School are not allowed to wear skirts in that    _  ."
}

middle

Size of downloaded dataset files: 25.44 MB
Size of the generated dataset: 34.61 MB
Total amount of disk used: 60.05 MB

An example of 'train' looks as follows.

This example was too long and was cropped:

{
    "answer": "B",
    "article": "\"There is not enough oil in the world now. As time goes by, it becomes less and less, so what are we going to do when it runs ou...",
    "example_id": "middle3.txt",
    "options": ["There is more petroleum than we can use now.", "Trees are needed for some other things besides making gas.", "We got electricity from ocean tides in the old days.", "Gas wasn't used to run cars in the Second World War."],
    "question": "According to the passage, which of the following statements is TRUE?"
}
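
The three configurations shown above (all, high, middle) could be loaded along the following lines. This is a minimal sketch, not an official recipe: it assumes the Hugging Face `datasets` library is installed and that the dataset resolves under the identifier "race", as the card title suggests. The helper names `check_config` and `load_race` are hypothetical.

```python
RACE_CONFIGS = ("all", "high", "middle")  # configuration names listed in this card


def check_config(name: str) -> str:
    """Return the name unchanged if it is a configuration listed in this card."""
    if name not in RACE_CONFIGS:
        raise ValueError(
            f"unknown RACE configuration {name!r}; expected one of {RACE_CONFIGS}"
        )
    return name


def load_race(config: str = "all", split: str = "train", dataset_id: str = "race"):
    """Load one split of one configuration (needs network access on first use).

    Assumption: `dataset_id` resolves on the Hugging Face Hub; change it if the
    dataset is published under a different identifier in your environment.
    """
    # Imported lazily so that check_config stays usable without the library.
    from datasets import load_dataset

    return load_dataset(dataset_id, check_config(config), split=split)
```

A call such as `load_race("middle", "validation")` would then return the validation split of the middle-school subset, provided the assumptions above hold.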

Data Fields

The data fields are the same across all configurations and splits.

all

example_id : a string feature.
article : a string feature.
answer : a string feature.
question : a string feature.
options : a list of string features.

high

example_id : a string feature.
article : a string feature.
answer : a string feature.
question : a string feature.
options : a list of string features.

middle

example_id : a string feature.
article : a string feature.
answer : a string feature.
question : a string feature.
options : a list of string features.
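
Since `answer` holds a letter and `options` holds a list of strings, a common first step is to recover the text of the correct option. The sketch below assumes that letters map to list positions in order ("A" to the first option, "B" to the second, and so on); the card does not state this explicitly. The helper `answer_text` is hypothetical, and the record is the cropped middle example from this card with the `article` field omitted.

```python
from typing import Dict, List


def answer_text(example: Dict[str, object]) -> str:
    """Return the option string selected by the letter in `answer`.

    Assumption: "A" -> options[0], "B" -> options[1], and so on.
    """
    options: List[str] = list(example["options"])
    letter = str(example["answer"]).strip().upper()
    if len(letter) != 1:
        raise ValueError(f"expected a single answer letter, got {letter!r}")
    index = ord(letter) - ord("A")
    if not 0 <= index < len(options):
        raise ValueError(f"answer {letter!r} does not index into {len(options)} options")
    return options[index]


example = {
    "answer": "B",
    "example_id": "middle3.txt",
    "options": [
        "There is more petroleum than we can use now.",
        "Trees are needed for some other things besides making gas.",
        "We got electricity from ocean tides in the old days.",
        "Gas wasn't used to run cars in the Second World War.",
    ],
    "question": "According to the passage, which of the following statements is TRUE?",
}

print(answer_text(example))
# prints: Trees are needed for some other things besides making gas.
```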

Data Splits

name	train	validation	test
all	87866	4887	4934
high	62445	3451	3498
middle	25421	1436	1436
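
The counts in the table are consistent with `all` being the combination of `high` and `middle`. The short check below only re-does that arithmetic on the numbers copied from the table; it does not inspect the data itself, and the variable names are illustrative.

```python
# Example counts copied from the Data Splits table in this card.
SPLITS = {
    "all": {"train": 87866, "validation": 4887, "test": 4934},
    "high": {"train": 62445, "validation": 3451, "test": 3498},
    "middle": {"train": 25421, "validation": 1436, "test": 1436},
}

# Splits where high + middle does not add up to all (expected to be none).
mismatches = [
    split
    for split in ("train", "validation", "test")
    if SPLITS["high"][split] + SPLITS["middle"][split] != SPLITS["all"][split]
]

# Total number of questions across the three splits of the full configuration.
total_questions = sum(SPLITS["all"].values())

print(mismatches)       # prints: []
print(total_questions)  # prints: 97687
```

The total of 97,687 matches the "nearly 100,000 questions" figure given in the summary.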

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

http://www.cs.cmu.edu/~glai1/data/race/

RACE dataset is available for non-commercial research purpose only.

All passages are obtained from the Internet which is not property of Carnegie Mellon University. We are not responsible for the content nor the meaning of these passages.

You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purpose, any portion of the contexts and any portion of derived data.

We reserve the right to terminate your access to the RACE dataset at any time.

Citation Information

@inproceedings{lai-etal-2017-race,
    title = "{RACE}: Large-scale {R}e{A}ding Comprehension Dataset From Examinations",
    author = "Lai, Guokun  and
      Xie, Qizhe  and
      Liu, Hanxiao  and
      Yang, Yiming  and
      Hovy, Eduard",
    booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
    month = sep,
    year = "2017",
    address = "Copenhagen, Denmark",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D17-1082",
    doi = "10.18653/v1/D17-1082",
    pages = "785--794",
}

Contributions

Thanks to @abarbosa94, @patrickvonplaten, @lewtun, @thomwolf, @mariamabarham for adding this dataset.

Author:

Anonymous

Dataset size:

21.44 KB
