Dataset: qasc
Languages: en
Multilinguality: monolingual
Size categories: 1K<n<10K
Language creators: found
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1910.11473
License: cc-by-4.0

QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.
An example from the 'validation' split looks as follows:

    {
        "answerKey": "F",
        "choices": {
            "label": ["A", "B", "C", "D", "E", "F", "G", "H"],
            "text": [
                "sand",
                "occurs over a wide range",
                "forests",
                "Global warming",
                "rapid changes occur",
                "local weather conditions",
                "measure of motion",
                "city life"
            ]
        },
        "combinedfact": "Climate is generally described in terms of local weather conditions",
        "fact1": "Climate is generally described in terms of temperature and moisture.",
        "fact2": "Fire behavior is driven by local weather conditions such as winds, temperature and moisture.",
        "formatted_question": "Climate is generally described in terms of what? (A) sand (B) occurs over a wide range (C) forests (D) Global warming (E) rapid changes occur (F) local weather conditions (G) measure of motion (H) city life",
        "id": "3NGI5ARFTT4HNGVWXAMLNBMFA0U1PG",
        "question": "Climate is generally described in terms of what?"
    }

The data fields are the same among all splits.
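
To make the relationship between the fields concrete, here is a minimal sketch that works on the validation example shown above. It looks up the answer text from `answerKey` and rebuilds `formatted_question` from `question` and `choices`. The helper names `answer_text` and `format_question` are hypothetical, and the `(A) ... (H) ...` layout is inferred from this single record, so other records may not follow it exactly.

```python
# The validation example from this card, reduced to the fields used below.
example = {
    "answerKey": "F",
    "choices": {
        "label": ["A", "B", "C", "D", "E", "F", "G", "H"],
        "text": [
            "sand",
            "occurs over a wide range",
            "forests",
            "Global warming",
            "rapid changes occur",
            "local weather conditions",
            "measure of motion",
            "city life",
        ],
    },
    "question": "Climate is generally described in terms of what?",
    "formatted_question": (
        "Climate is generally described in terms of what? "
        "(A) sand (B) occurs over a wide range (C) forests (D) Global warming "
        "(E) rapid changes occur (F) local weather conditions "
        "(G) measure of motion (H) city life"
    ),
}


def answer_text(record):
    """Return the text of the choice whose label equals record['answerKey']."""
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    return texts[labels.index(record["answerKey"])]


def format_question(record):
    """Rebuild the 'question (A) ... (B) ...' string from the separate fields.

    Assumption: the layout is inferred from the single example on this card.
    """
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    options = " ".join(f"({label}) {text}" for label, text in zip(labels, texts))
    return f"{record['question']} {options}"


print(answer_text(example))
print(format_question(example) == example["formatted_question"])
```

In this record the correct choice text also ends `combinedfact`, which is composed from `fact1` and `fact2`.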
name | train | validation | test |
---|---|---|---|
default | 8134 | 926 | 920 |
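
The split sizes in the table can be checked against a downloaded copy. The sketch below assumes the Hugging Face `datasets` library is installed and that the dataset is available under the Hub identifier `allenai/qasc`. That identifier is an assumption, since this card only names the dataset `qasc`. The helper `mismatched_splits` is hypothetical and is not called here because it needs network access.

```python
# Split sizes as listed in the table on this card.
EXPECTED_SIZES = {"train": 8134, "validation": 926, "test": 920}


def mismatched_splits(dataset_id="allenai/qasc"):
    """Return {split: actual_row_count} for splits that differ from the table.

    Assumption: `dataset_id` is the Hub identifier; the card itself only says `qasc`.
    """
    # Imported lazily so the constants above are usable without the library.
    from datasets import load_dataset

    dataset = load_dataset(dataset_id)
    return {
        split: dataset[split].num_rows
        for split, expected in EXPECTED_SIZES.items()
        if dataset[split].num_rows != expected
    }


# The three splits add up to the 9,980 questions stated in the description.
total_questions = sum(EXPECTED_SIZES.values())
print(total_questions)
```

An empty dictionary from `mismatched_splits()` would indicate that the downloaded splits match the table.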
The dataset is released under the CC BY 4.0 license.

    @article{allenai:qasc,
      author  = {Tushar Khot and Peter Clark and Michal Guerquin and Peter Jansen and Ashish Sabharwal},
      title   = {QASC: A Dataset for Question Answering via Sentence Composition},
      journal = {arXiv:1910.11473v2},
      year    = {2020},
    }

Thanks to @thomwolf, @patrickvonplaten, @lewtun for adding this dataset.