Dataset:
GEM/squad_v2
Language:
en
Multilinguality:
unknown
Language creators:
unknown
Annotation creators:
crowd-sourced
Source datasets:
original
Preprint:
arxiv:1806.03822
License:
cc-by-sa-4.0
You can find the main data card on the GEM Website.
SQuAD2.0 is a dataset that tests the ability of a system to not only answer reading comprehension questions, but also abstain when presented with a question that cannot be answered based on the provided paragraph. F1 score is used to evaluate models on the leaderboard. In GEM, we are using this dataset for the question-generation task, in which a model should generate SQuAD-like questions from an input text.
You can load the dataset via:
import datasets
data = datasets.load_dataset('GEM/squad_v2')
The data loader can be found here.
Authors: Pranav Rajpurkar, Robin Jia and Percy Liang
@inproceedings{Rajpurkar2018KnowWY,
  title={Know What You Don’t Know: Unanswerable Questions for SQuAD},
  author={Pranav Rajpurkar and Robin Jia and Percy Liang},
  booktitle={ACL},
  year={2018}
}
Contact Name: Robin Jia
Contact Email: robinjia@stanford.edu
Has a Leaderboard? yes
Leaderboard Details: SQuAD2.0 tests the ability of a system to not only answer reading comprehension questions, but also abstain when presented with a question that cannot be answered based on the provided paragraph. F1 score is used to evaluate models on the leaderboard.
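To make the leaderboard metric concrete, here is a minimal sketch of a token-overlap F1 that also handles abstention. It follows the general pattern of SQuAD-style scoring but is not the official evaluation script; the function names and the convention that an empty string means "no answer" are assumptions made for illustration.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(gold, pred):
    """Token-overlap F1; an empty string stands for 'no answer' (assumption)."""
    gold_tokens, pred_tokens = normalize(gold), normalize(pred)
    if not gold_tokens or not pred_tokens:
        # Abstaining is only rewarded when the question is unanswerable.
        return float(gold_tokens == pred_tokens)
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(gold_answers, pred):
    """Score against every reference answer and keep the best match."""
    if not gold_answers:  # unanswerable question: no reference answers
        gold_answers = [""]
    return max(token_f1(gold, pred) for gold in gold_answers)
```

With this sketch, a system that abstains on an unanswerable question scores 1.0, while any non-empty prediction for that question scores 0.0.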
no
Covered Languages: English
License: cc-by-sa-4.0: Creative Commons Attribution Share Alike 4.0 International
Intended Use: The idea behind the SQuAD2.0 dataset is to make models recognize when a question cannot be answered given a context. This helps in building models that know what they don't know and therefore understand language at a deeper level. The dataset supports machine reading comprehension, extractive QA, and question generation.
Primary Task: Question Generation
Communicative Goal: Given an input passage and an answer span, the goal is to generate a question that asks for the answer.
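As an illustration of this goal, the sketch below turns one record into a (source, target) pair for answer-aware question generation. The `answer: ... context: ...` template and the function name are hypothetical, and treating an empty `answers` list as "unanswerable" is an assumption about the data layout. The sample passage is the visible prefix of the cropped validation example, without its leading escaped quote.

```python
def to_qg_pair(record):
    """Build a (source, target) pair for question generation.

    Returns None when the record has no answer span (assumed to mean the
    question is unanswerable). The source template is a hypothetical format.
    """
    texts = record["answers"]["text"]
    starts = record["answers"]["answer_start"]
    if not texts:
        return None
    answer, start = texts[0], starts[0]
    context = record["context"]
    # The character offset should point at the answer text in the passage.
    if context[start:start + len(answer)] != answer:
        raise ValueError("answer_start does not match the answer text")
    source = f"answer: {answer} context: {context}"
    return source, record["question"]


record = {
    "answers": {
        "answer_start": [94, 87],
        "text": ["10th and 11th centuries", "in the 10th and 11th centuries"],
    },
    "context": (
        "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) "
        "were the people who in the 10th and 11th centuries gave thei"
    ),
    "question": "When were the Normans in Normandy?",
}

source, target = to_qg_pair(record)
print(source[:40])
print(target)
```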
academic
Curation Organization(s): Stanford University
Dataset Creators: Pranav Rajpurkar, Robin Jia and Percy Liang
Funding: Facebook and NSF Graduate Research Fellowship under Grant No. DGE-114747
Who added the Dataset to GEM? [Abinaya Mahendiran](https://github.com/AbinayaM02), Manager Data Science, NEXT Labs
The data fields are the same among all splits.
squad_v2
Here is an example of a validation data point. This example was too long and was cropped:
{
  "gem_id": "gem-squad_v2-validation-1",
  "id": "56ddde6b9a695914005b9629",
  "answers": {
    "answer_start": [94, 87, 94, 94],
    "text": ["10th and 11th centuries", "in the 10th and 11th centuries", "10th and 11th centuries", "10th and 11th centuries"]
  },
  "context": "\"The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave thei...",
  "question": "When were the Normans in Normandy?",
  "title": "Normans"
}
Data Splits
The original SQuAD2.0 dataset has only training and dev (validation) splits. For GEM, the original train split is further divided into a train split and a test split.
name | train | validation | test |
---|---|---|---|
squad_v2 | 90403 | 11873 | 39916 |
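The proportions implied by the table can be checked with a few lines of Python. The counts are copied from the table; the variable names are illustrative only.

```python
# Split sizes copied from the table above.
splits = {"train": 90403, "validation": 11873, "test": 39916}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

# The GEM train and test splits together form the pool that was divided.
train_test_pool = splits["train"] + splits["test"]
test_share_of_pool = splits["test"] / train_test_pool

for name, share in shares.items():
    print(f"{name}: {splits[name]} examples ({share:.1%} of all examples)")
print(f"test share of train+test pool: {test_share_of_pool:.1%}")
```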
SQuAD2.0 will encourage the development of new reading comprehension models that know what they don’t know, and therefore understand language at a deeper level. It can also help in building better models for answer-aware question generation.
Similar Datasets: no
Unique Language Coverage: yes
Ability that the Dataset measures: Reasoning capability
yes
GEM Modifications: other
Additional Splits? yes
Split Information: The train (80%) and validation (10%) splits of SQuAD2.0 are publicly available, whereas the test split (10%) is not.
As part of GEM, the original train split (80% of the original data) is divided into a new train split (90%) and a test split (the remaining 10%), so that all three splits are available to users.
Extractive QA, Question Generation
Metrics: Other: Other Metrics, METEOR, ROUGE, BLEU
Other Metrics: yes
Other Evaluation Approaches: Question generation uses METEOR, ROUGE-L, BLEU-4
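To give a feel for what ROUGE-L measures on generated questions, here is a simplified, self-contained sketch based on the longest common subsequence (LCS) of whitespace tokens. It uses a plain F1 over LCS precision and recall and is not a replacement for an established scorer; the function names and the candidate question are made up for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Simplified ROUGE-L: F1 of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "When were the Normans in Normandy?"
candidate = "When did the Normans live in Normandy?"  # hypothetical model output
score = rouge_l_f1(reference, candidate)
print(f"ROUGE-L (simplified): {score:.3f}")
```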
Relevant Previous Results:
@article{Dong2019UnifiedLM,
  title={Unified Language Model Pre-training for Natural Language Understanding and Generation},
  author={Li Dong and Nan Yang and Wenhui Wang and Furu Wei and Xiaodong Liu and Yu Wang and Jianfeng Gao and M. Zhou and Hsiao-Wuen Hon},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.03197}
}
The dataset is curated in three stages:
SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.
Communicative Goal: To build systems that not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
Sourced from Different Sources: yes
Source Details: Wikipedia
Found
Where was it found? Single website
Topics Covered: The dataset contains 536 articles covering a wide range of topics, from musical celebrities to abstract concepts.
Data Validation: validated by crowdworker
Data Preprocessing: Individual paragraphs are extracted from the sampled Wikipedia articles, with images, figures, and tables stripped away and paragraphs shorter than 500 characters discarded. The data is then partitioned into a training set (80%), a development set (10%), and a test set (10%).
Was Data Filtered? algorithmically
Filter Criteria: To retrieve high-quality articles, Project Nayuki’s Wikipedia internal PageRanks were used to obtain the top 10000 articles of English Wikipedia, from which 536 articles were sampled uniformly at random.
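The sampling and preprocessing steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the fixed seed, the ranked title list, and the choice to partition at the article level are all hypothetical.

```python
import random

MIN_PARAGRAPH_CHARS = 500  # paragraphs shorter than this are discarded


def sample_articles(ranked_titles, top_k=10000, n_sample=536, seed=0):
    """Sample articles uniformly at random from the top-ranked ones."""
    rng = random.Random(seed)
    return rng.sample(ranked_titles[:top_k], n_sample)


def keep_paragraphs(paragraphs, min_chars=MIN_PARAGRAPH_CHARS):
    """Drop paragraphs shorter than the minimum length."""
    return [p for p in paragraphs if len(p) >= min_chars]


def partition(titles, seed=0):
    """Split into roughly 80% train, 10% dev, and the rest as test."""
    shuffled = list(titles)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.8 * len(shuffled))
    n_dev = int(0.1 * len(shuffled))
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


# Placeholder titles standing in for a PageRank-ordered article list.
ranked = [f"article_{i}" for i in range(20000)]
sampled = sample_articles(ranked)
train, dev, test = partition(sampled)
print(len(sampled), len(train), len(dev), len(test))
```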
crowd-sourced
Number of Raters: unknown
Rater Qualifications: Crowdworkers from the United States or Canada with a 97% HIT acceptance rate and a minimum of 1000 HITs were employed to create questions.
Raters per Training Example: 0
Raters per Test Example: 0
Annotation Service? yes
Which Annotation Service: other, Amazon Mechanical Turk
Annotation Values: For SQuAD 1.1, crowdworkers were tasked with asking and answering up to 5 questions on the content of each paragraph. The questions had to be entered in a text field, and the answers had to be highlighted in the paragraph.
For SQuAD2.0, each task consisted of an entire article from SQuAD 1.1. For each paragraph in the article, workers were asked to pose up to five questions that were impossible to answer based on the paragraph alone, while referencing entities in the paragraph and ensuring that a plausible answer is present.
Any Quality Control? validated by another rater
Quality Control Details: Questions from workers who wrote 25 or fewer questions on an article are removed; this filter helped remove noise from workers who had trouble understanding the task and therefore quit before completing the whole article. This filter is applied to both the SQuAD2.0 questions and the existing answerable questions from SQuAD 1.1.
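The quality-control rule above can be expressed as a small filter. This is a minimal sketch assuming each question is a dictionary with hypothetical `worker_id` and `article` fields; the original annotation data may be organized differently.

```python
from collections import Counter


def filter_low_volume_workers(questions, threshold=25):
    """Keep only questions from workers who wrote more than `threshold`
    questions on the same article."""
    counts = Counter((q["worker_id"], q["article"]) for q in questions)
    return [
        q for q in questions
        if counts[(q["worker_id"], q["article"])] > threshold
    ]


# Toy data: worker w1 wrote 26 questions on an article, worker w2 wrote 25.
questions = (
    [{"worker_id": "w1", "article": "Normans", "question": f"q{i}"} for i in range(26)]
    + [{"worker_id": "w2", "article": "Normans", "question": f"q{i}"} for i in range(25)]
)
kept = filter_low_volume_workers(questions)
print(len(kept))
```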
no
unlikely
Any PII Identification? no identification
no
no
no
yes