数据集:
race
任务:
多项选择子任务:
multiple-choice-qa语言:
en计算机处理:
monolingual大小:
10K<n<100K语言创建人:
found批注创建人:
expert-generated源数据集:
original预印本库:
arxiv:1704.04683许可:
otherRACE is a large-scale reading comprehension dataset with more than 28,000 passages and nearly 100,000 questions. The dataset is collected from English examinations in China, which are designed for middle school and high school students. The dataset can be served as the training and test sets for machine comprehension.
An example of 'train' looks as follows.
This example was too long and was cropped: { "answer": "A", "article": "\"Schoolgirls have been wearing such short skirts at Paget High School in Branston that they've been ordered to wear trousers ins...", "example_id": "high132.txt", "options": ["short skirts give people the impression of sexualisation", "short skirts are too expensive for parents to afford", "the headmaster doesn't like girls wearing short skirts", "the girls wearing short skirts will be at the risk of being laughed at"], "question": "The girls at Paget High School are not allowed to wear skirts in that _ ." }high
An example of 'train' looks as follows.
This example was too long and was cropped: { "answer": "A", "article": "\"Schoolgirls have been wearing such short skirts at Paget High School in Branston that they've been ordered to wear trousers ins...", "example_id": "high132.txt", "options": ["short skirts give people the impression of sexualisation", "short skirts are too expensive for parents to afford", "the headmaster doesn't like girls wearing short skirts", "the girls wearing short skirts will be at the risk of being laughed at"], "question": "The girls at Paget High School are not allowed to wear skirts in that _ ." }middle
An example of 'train' looks as follows.
This example was too long and was cropped: { "answer": "B", "article": "\"There is not enough oil in the world now. As time goes by, it becomes less and less, so what are we going to do when it runs ou...", "example_id": "middle3.txt", "options": ["There is more petroleum than we can use now.", "Trees are needed for some other things besides making gas.", "We got electricity from ocean tides in the old days.", "Gas wasn't used to run cars in the Second World War."], "question": "According to the passage, which of the following statements is TRUE?" }
The data fields are the same among all splits.
allname | train | validation | test |
---|---|---|---|
all | 87866 | 4887 | 4934 |
high | 62445 | 3451 | 3498 |
middle | 25421 | 1436 | 1436 |
http://www.cs.cmu.edu/~glai1/data/race/
RACE dataset is available for non-commercial research purpose only.
All passages are obtained from the Internet which is not property of Carnegie Mellon University. We are not responsible for the content nor the meaning of these passages.
You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purpose, any portion of the contexts and any portion of derived data.
We reserve the right to terminate your access to the RACE dataset at any time.
@inproceedings{lai-etal-2017-race, title = "{RACE}: Large-scale {R}e{A}ding Comprehension Dataset From Examinations", author = "Lai, Guokun and Xie, Qizhe and Liu, Hanxiao and Yang, Yiming and Hovy, Eduard", booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing", month = sep, year = "2017", address = "Copenhagen, Denmark", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/D17-1082", doi = "10.18653/v1/D17-1082", pages = "785--794", }
Thanks to @abarbosa94 , @patrickvonplaten , @lewtun , @thomwolf , @mariamabarham for adding this dataset.