Dataset: quartz
License: cc-by-4.0
Source datasets: original
Annotations creators: crowdsourced
Language creators: crowdsourced
Size: 1K<n<10K
Multilinguality: monolingual
Language: en
Task: question-answering

QuaRTz (dataset V1) is a crowdsourced dataset of 3864 multiple-choice questions about open-domain qualitative relationships. Each question is paired with one of 405 different background sentences (sometimes short paragraphs).
The dataset is split into train (2696), dev (384) and test (784). A background sentence will only appear in a single split.
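The split property described above (each background sentence belongs to exactly one split) can be checked mechanically. The following is a minimal sketch, assuming each record carries a `para_id` field as in the dataset's records; the helper name `shared_para_ids` and the toy records are hypothetical and are not part of the dataset or any library.

```python
from collections import defaultdict


def shared_para_ids(splits):
    """Return a dict of para_id -> set of split names for every
    background sentence that occurs in more than one split."""
    seen = defaultdict(set)
    for split_name, records in splits.items():
        for record in records:
            seen[record["para_id"]].add(split_name)
    return {pid: names for pid, names in seen.items() if len(names) > 1}


# Toy stand-in for the real splits; the "HYPOTHETICAL" ids are made up.
toy_splits = {
    "train": [{"para_id": "QRSent-10116"}, {"para_id": "QRSent-10116"}],
    "dev": [{"para_id": "QRSent-HYPOTHETICAL-1"}],
    "test": [{"para_id": "QRSent-HYPOTHETICAL-2"}],
}

# An empty result means no background sentence leaks across splits.
assert shared_para_ids(toy_splits) == {}

# The documented split sizes add up to the documented total.
assert 2696 + 384 + 784 == 3864
```

Several questions may share one background sentence within a split, which is why the check groups by `para_id` rather than by question `id`.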
An example of 'train' looks as follows.
{
  "answerKey": "A",
  "choices": {
    "label": ["A", "B"],
    "text": ["higher", "lower"]
  },
  "id": "QRQA-10116-3",
  "para": "Electrons at lower energy levels, which are closer to the nucleus, have less energy.",
  "para_anno": {
    "cause_dir_sign": "LESS",
    "cause_dir_str": "closer",
    "cause_prop": "distance from a nucleus",
    "effect_dir_sign": "LESS",
    "effect_dir_str": "less",
    "effect_prop": "energy"
  },
  "para_id": "QRSent-10116",
  "question": "Electrons further away from a nucleus have _____ energy levels than close ones.",
  "question_anno": {
    "less_cause_dir": "electron energy levels",
    "less_cause_prop": "nucleus",
    "less_effect_dir": "lower",
    "less_effect_prop": "electron energy levels",
    "more_effect_dir": "higher",
    "more_effect_prop": "electron energy levels"
  }
}
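In the record above, `answerKey` holds a label rather than the answer text, so the answer has to be looked up through the parallel `label` and `text` lists under `choices`. A minimal sketch of that lookup follows; the helper name `answer_text` is hypothetical, and the record is trimmed to the fields the lookup needs.

```python
def answer_text(record):
    """Map the record's answerKey (e.g. "A") to the matching choice text."""
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    return texts[labels.index(record["answerKey"])]


# Trimmed copy of the 'train' example shown above.
example = {
    "answerKey": "A",
    "choices": {"label": ["A", "B"], "text": ["higher", "lower"]},
    "question": "Electrons further away from a nucleus have _____ energy levels than close ones.",
}

# Fill the blank in the question with the resolved answer.
filled = example["question"].replace("_____", answer_text(example))
print(filled)
# Electrons further away from a nucleus have higher energy levels than close ones.
```

Looking the label up with `index` instead of assuming that "A" is position 0 keeps the helper correct even if the labels were ordered differently in another record.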
The data fields are the same among all splits.
name | train | validation | test |
---|---|---|---|
default | 2696 | 384 | 784 |
The dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
@InProceedings{quartz,
  author = {Oyvind Tafjord and Matt Gardner and Kevin Lin and Peter Clark},
  title = {"QUARTZ: An Open-Domain Dataset of Qualitative Relationship Questions"},
  year = {"2019"},
}
Thanks to @patrickvonplaten, @lewtun, @thomwolf for adding this dataset.