Dataset:
lmqg/qg_squad
This is a subset of QG-Bench, a unified question generation benchmark proposed in "Generative Language Models for Paragraph-Level Question Generation: A Unified Benchmark and Evaluation" (EMNLP 2022 main conference). It contains the SQuAD dataset prepared for the question generation (QG) task. The train/development/test split follows the "Neural Question Generation" work and is compatible with the leaderboard.
English (en)
An example of 'train' looks as follows.
{ "question": "What is heresy mainly at odds with?", "paragraph": "Heresy is any provocative belief or theory that is strongly at variance with established beliefs or customs. A heretic is a proponent of such claims or beliefs. Heresy is distinct from both apostasy, which is the explicit renunciation of one's religion, principles or cause, and blasphemy, which is an impious utterance or action concerning God or sacred things.", "answer": "established beliefs or customs", "sentence": "Heresy is any provocative belief or theory that is strongly at variance with established beliefs or customs .", "paragraph_sentence": "<hl> Heresy is any provocative belief or theory that is strongly at variance with established beliefs or customs . <hl> A heretic is a proponent of such claims or beliefs. Heresy is distinct from both apostasy, which is the explicit renunciation of one's religion, principles or cause, and blasphemy, which is an impious utterance or action concerning God or sacred things.", "paragraph_answer": "Heresy is any provocative belief or theory that is strongly at variance with <hl> established beliefs or customs <hl>. A heretic is a proponent of such claims or beliefs. Heresy is distinct from both apostasy, which is the explicit renunciation of one's religion, principles or cause, and blasphemy, which is an impious utterance or action concerning God or sacred things.", "sentence_answer": "Heresy is any provocative belief or theory that is strongly at variance with <hl> established beliefs or customs <hl> ." }
The data fields are the same among all splits.
The paragraph_answer, paragraph_sentence, and sentence_answer features are each intended as input for training a question generation model, but each carries different information. The paragraph_answer and sentence_answer features are for answer-aware question generation, and the paragraph_sentence feature is for sentence-aware question generation. In each feature, the relevant span is wrapped in <hl> markers, as in the example above.
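The highlighted features can be illustrated with a minimal sketch that wraps a span in `<hl>` markers. The `highlight` helper below is hypothetical and is not the preprocessing code used to build the dataset; it only reproduces the `paragraph_answer` format seen in the example record, assuming the answer appears verbatim in the paragraph.

```python
HL = "<hl>"  # highlight token as it appears in the example record


def highlight(text: str, span: str) -> str:
    """Wrap the first occurrence of `span` in `text` with <hl> markers.

    Hypothetical helper for illustration only; assumes `span` occurs
    verbatim in `text`.
    """
    start = text.find(span)
    if start < 0:
        raise ValueError("span not found in text")
    end = start + len(span)
    return f"{text[:start]}{HL} {span} {HL}{text[end:]}"


# First two sentences of the example paragraph from the 'train' record.
paragraph = (
    "Heresy is any provocative belief or theory that is strongly at "
    "variance with established beliefs or customs. A heretic is a "
    "proponent of such claims or beliefs."
)
answer = "established beliefs or customs"

# Answer-aware input: the answer span is highlighted inside the paragraph.
paragraph_answer = highlight(paragraph, answer)
print(paragraph_answer)
```

Note that the `sentence` field in the example record is tokenized slightly differently from the paragraph (a space precedes the final period), so reproducing `paragraph_sentence` exactly would need sentence-boundary handling that this sketch does not attempt.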
train | validation | test |
---|---|---|
75722 | 10570 | 11877 |
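As a rough sketch, the split sizes above could be checked after downloading the data with the Hugging Face `datasets` library. The helper names `load_split_sizes` and `find_size_mismatches` are hypothetical, and the sketch assumes the dataset is reachable under the identifier `lmqg/qg_squad` with its default configuration.

```python
# Split sizes as listed in the table above.
EXPECTED_SPLIT_SIZES = {"train": 75722, "validation": 10570, "test": 11877}


def load_split_sizes(name: str = "lmqg/qg_squad") -> dict:
    """Download the dataset and return the number of rows per split.

    Requires the `datasets` package and network access; the import is
    kept local so the rest of the sketch runs without it.
    """
    from datasets import load_dataset

    dataset = load_dataset(name)  # a DatasetDict keyed by split name
    return {split: dataset[split].num_rows for split in dataset}


def find_size_mismatches(actual: dict, expected: dict = EXPECTED_SPLIT_SIZES) -> dict:
    """Return {split: (actual_size, expected_size)} for splits that differ."""
    return {
        split: (actual.get(split), size)
        for split, size in expected.items()
        if actual.get(split) != size
    }


if __name__ == "__main__":
    mismatches = find_size_mismatches(load_split_sizes())
    print(mismatches or "all split sizes match the table")
```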
@inproceedings{ushio-etal-2022-generative, title = "{G}enerative {L}anguage {M}odels for {P}aragraph-{L}evel {Q}uestion {G}eneration", author = "Ushio, Asahi and Alva-Manchego, Fernando and Camacho-Collados, Jose", booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing", month = dec, year = "2022", address = "Abu Dhabi, U.A.E.", publisher = "Association for Computational Linguistics", }