Datasets:

GEM/squad_v2

Languages:

en

Multilinguality:

unknown

Language Creators:

unknown

Annotations Creators:

crowd-sourced

Source Datasets:

original

ArXiv:

arxiv:1806.03822

Dataset Card for GEM/squad_v2

Link to Main Data Card

You can find the main data card on the GEM Website.

Dataset Summary

SQuAD2.0 is a dataset that tests the ability of a system not only to answer reading comprehension questions, but also to abstain when presented with a question that cannot be answered based on the provided paragraph. F1 score is used to evaluate models on the leaderboard. In GEM, this dataset is used for the question-generation task, in which a model should generate SQuAD-like questions from an input text.

You can load the dataset via:

import datasets
data = datasets.load_dataset('GEM/squad_v2')

The data loader can be found here.

Website

Website

Paper

Arxiv

Authors

Pranav Rajpurkar, Robin Jia and Percy Liang

Dataset Overview

Where to find the Data and its Documentation

Webpage

Website

Download

Website

Paper

Arxiv

BibTex
@inproceedings{Rajpurkar2018KnowWY,
  title={Know What You Don’t Know: Unanswerable Questions for SQuAD},
  author={Pranav Rajpurkar and Robin Jia and Percy Liang},
  booktitle={ACL},
  year={2018}
}
Contact Name

Robin Jia

Contact Email

robinjia@stanford.edu

Has a Leaderboard?

yes

Leaderboard Link

Website

Leaderboard Details

SQuAD2.0 tests the ability of a system to not only answer reading comprehension questions, but also abstain when presented with a question that cannot be answered based on the provided paragraph. F1 score is used to evaluate models on the leaderboard.

Languages and Intended Use

Multilingual?

no

Covered Languages

English

License

cc-by-sa-4.0: Creative Commons Attribution Share Alike 4.0 International

Intended Use

The idea behind the SQuAD2.0 dataset is to make models recognize when a question cannot be answered from a given context. This helps build models that know what they don't know and therefore understand language at a deeper level. The dataset supports machine reading comprehension, extractive QA, and question generation.

Primary Task

Question Generation

Communicative Goal

Given an input passage and an answer span, the goal is to generate a question that asks for the answer.
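Answer-aware question generation needs the answer span to be visible to the model in the input passage. One common approach, assumed here rather than prescribed by this card, is to wrap the answer in highlight tokens. The `<hl>` token and the `format_qg_input` helper below are hypothetical; the sketch only relies on the `context` and `answers` fields described under Data Fields and returns `None` for an example that has no answer to ask about.

```python
from typing import Optional

HL = "<hl>"  # hypothetical highlight token; GEM does not prescribe one


def format_qg_input(context: str, answers: dict, hl: str = HL) -> Optional[str]:
    """Mark the first gold answer in the passage with highlight tokens.

    Returns None for examples without an answer (nothing to ask about).
    """
    if not answers["text"]:
        return None
    text = answers["text"][0]
    start = answers["answer_start"][0]
    end = start + len(text)
    # Guard against offsets that do not line up with the answer text.
    if context[start:end] != text:
        raise ValueError("answer_start does not point at the answer text")
    return f"{context[:start]}{hl} {text} {hl}{context[end:]}"


print(format_qg_input(
    "The Normans gave their name to Normandy.",
    {"text": ["Normandy"], "answer_start": [31]},
))
# The Normans gave their name to <hl> Normandy <hl>.
```

A model trained on such inputs would then be asked to produce the question as its target text.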

Credit

Curation Organization Type(s)

academic

Curation Organization(s)

Stanford University

Dataset Creators

Pranav Rajpurkar, Robin Jia and Percy Liang

Funding

Facebook and NSF Graduate Research Fellowship under Grant No. DGE-114747

Who added the Dataset to GEM?

Abinaya Mahendiran (https://github.com/AbinayaM02), Manager Data Science, NEXT Labs

Dataset Structure

Data Fields

The data fields are the same among all splits.

squad_v2
  • id : a string feature.
  • gem_id : a string feature.
  • title : a string feature.
  • context : a string feature.
  • question : a string feature.
  • answers : a dictionary feature containing:
    • text : a list of string features.
    • answer_start : a list of int32 features (character offsets into context).
Example Instance

Here is an example of a validation data point. This example was too long and was cropped:

{
    "gem_id": "gem-squad_v2-validation-1",
    "id": "56ddde6b9a695914005b9629",
    "answers": {
        "answer_start": [94, 87, 94, 94],
        "text": ["10th and 11th centuries", "in the 10th and 11th centuries", "10th and 11th centuries", "10th and 11th centuries"]
    },
    "context": "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave thei...",
    "question": "When were the Normans in Normandy?",
    "title": "Normans"
}
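The offsets in this example are consistent with `answer_start` being a zero-based character offset into `context`: 94 lands on "10th" and 87 on "in the 10th". The sketch below slices each answer back out of the cropped context (starting at "The Normans") and compares it with the stored text; `answer_spans` is a hypothetical helper, not part of the GEM loader.

```python
def answer_spans(example: dict) -> list:
    """Slice each gold answer out of the context using its character offset."""
    context = example["context"]
    answers = example["answers"]
    return [context[s:s + len(t)] for t, s in zip(answers["text"], answers["answer_start"])]


example = {
    # Cropped context from the validation example above.
    "context": "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people "
               "who in the 10th and 11th centuries gave thei",
    "answers": {
        "answer_start": [94, 87, 94, 94],
        "text": ["10th and 11th centuries", "in the 10th and 11th centuries",
                 "10th and 11th centuries", "10th and 11th centuries"],
    },
}

# Every slice should reproduce the stored answer text exactly.
print(answer_spans(example) == example["answers"]["text"])  # True
```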
Data Splits

Only the training and dev (validation) splits of the original SQuAD2.0 dataset are publicly available. For GEM, the original training split is further divided into a train split and a test split.

| name | train | validation | test |
|---|---|---|---|
| squad_v2 | 90403 | 11873 | 39916 |

Dataset in GEM

Rationale for Inclusion in GEM

Why is the Dataset in GEM?

SQuAD2.0 will encourage the development of new reading comprehension models that know what they don’t know, and therefore understand language at a deeper level. It can also help build better models for answer-aware question generation.

Similar Datasets

no

Unique Language Coverage

yes

Ability that the Dataset measures

Reasoning capability

GEM-Specific Curation

Modified for GEM?

yes

GEM Modifications

other

Additional Splits?

yes

Split Information

The train (80%) and validation (10%) splits of SQuAD2.0 are publicly available, whereas the test split (10%) is not.

As part of GEM, the original train split (80% of the original data) is further divided into a new train split (90%) and a test split (the remaining 10%), so that users have access to all three splits.
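Assuming that the GEM train and test splits together make up the original SQuAD2.0 training split, their shares can be computed directly from the counts in the Data Splits table. By these counts, the division works out to roughly 69% train and 31% test rather than 90%/10%. The helper name below is illustrative.

```python
# Split sizes copied from the Data Splits table of this card.
GEM_SPLITS = {"train": 90403, "validation": 11873, "test": 39916}


def shares_of_original_train(splits: dict) -> dict:
    """Fraction of the original SQuAD2.0 train split that went to GEM train and test."""
    original_train = splits["train"] + splits["test"]
    return {
        "original_train": original_train,
        "train": splits["train"] / original_train,
        "test": splits["test"] / original_train,
    }


shares = shares_of_original_train(GEM_SPLITS)
print(shares["original_train"], round(shares["train"], 3), round(shares["test"], 3))
# 130319 0.694 0.306
```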

Getting Started with the Task

Previous Results

Previous Results

Measured Model Abilities

Extractive QA, Question Generation

Metrics

Other: Other Metrics, METEOR, ROUGE, BLEU

Other Metrics
  • Extractive QA uses Exact Match and F1 score
  • Question generation uses METEOR, ROUGE-L, BLEU-4
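For extractive QA, Exact Match and F1 are computed after normalizing both strings and taking the best score over all gold answers. The sketch below is modeled on the answer normalization of the official SQuAD2.0 evaluation script (lowercasing, removing punctuation and the articles a/an/the, collapsing whitespace), but it is re-implemented here from that description, so small differences from the official script are possible. Function names are illustrative.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> int:
    return int(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    pred_toks = normalize_answer(prediction).split()
    gold_toks = normalize_answer(gold).split()
    if not pred_toks or not gold_toks:
        # No-answer case: full credit only when both sides are empty.
        return float(pred_toks == gold_toks)
    common = collections.Counter(pred_toks) & collections.Counter(gold_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction: str, golds: list) -> float:
    # An unanswerable question (no gold answers) is scored against the empty string.
    golds = golds or [""]
    return max(metric(prediction, g) for g in golds)


golds = ["10th and 11th centuries", "in the 10th and 11th centuries"]
print(best_over_golds(exact_match, "In the 10th and 11th centuries.", golds))  # 1
print(round(best_over_golds(f1_score, "11th centuries", golds), 3))  # 0.667
```

Dataset-level scores are then the mean of these per-question values.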
Previous results available?

yes

Other Evaluation Approaches

Question generation uses METEOR, ROUGE-L, BLEU-4
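ROUGE-L scores a generated question by the longest common subsequence (LCS) of tokens it shares with the reference question. Below is a minimal sketch of the sentence-level ROUGE-L F-measure with β = 1, using a deliberately simple lowercase alphanumeric tokenizer (an assumption); standard packages such as `rouge-score` apply their own tokenization and optional stemming, so their numbers will not match exactly.

```python
import re


def _tokens(text: str) -> list:
    # Simple lowercase alphanumeric tokenization; real scorers differ.
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a: list, b: list) -> int:
    # Classic dynamic program over token positions, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    c, r = _tokens(candidate), _tokens(reference)
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)


reference = "When were the Normans in Normandy?"
print(rouge_l_f1("when were the normans in normandy", reference))  # 1.0
print(round(rouge_l_f1("when did the normans live", reference), 3))  # 0.545
```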

Relevant Previous Results

@article{Dong2019UnifiedLM,
  title={Unified Language Model Pre-training for Natural Language Understanding and Generation},
  author={Li Dong and Nan Yang and Wenhui Wang and Furu Wei and Xiaodong Liu and Yu Wang and Jianfeng Gao and M. Zhou and Hsiao-Wuen Hon},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.03197}
}

Dataset Curation

Original Curation

Original Curation Rationale

The dataset is curated in three stages:

  • Curating passages
  • Crowdsourcing question-answer pairs on those passages
  • Obtaining additional answers

As part of SQuAD1.1, the top 10,000 high-quality articles of English Wikipedia were retrieved using Project Nayuki’s Wikipedia internal PageRanks, and 536 articles were sampled uniformly at random from them. From each of these articles, individual paragraphs were extracted, stripping away images, figures, and tables, and discarding paragraphs shorter than 500 characters.
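The article selection and paragraph filtering described above can be sketched as follows. The PageRank scores are assumed to be precomputed and passed in as a hypothetical `{title: score}` dictionary, and paragraphs are assumed to be plain text with images, figures, and tables already stripped; the function names and the fixed seed are illustrative, not part of the original pipeline.

```python
import random

MIN_PARAGRAPH_CHARS = 500  # paragraphs shorter than this are discarded


def sample_articles(pagerank: dict, top_k: int = 10_000, n_sample: int = 536, seed: int = 0) -> list:
    """Keep the top_k titles by PageRank score, then sample n_sample of them uniformly."""
    ranked = sorted(pagerank, key=pagerank.get, reverse=True)
    top = ranked[:top_k]
    # Sampling without replacement: every top-ranked article is equally likely.
    return random.Random(seed).sample(top, n_sample)


def keep_long_paragraphs(paragraphs: list, min_chars: int = MIN_PARAGRAPH_CHARS) -> list:
    """Drop paragraphs shorter than min_chars characters."""
    return [p for p in paragraphs if len(p) >= min_chars]


# Toy usage with made-up scores: pick 3 of the top 5 articles.
scores = {f"article_{i}": float(i) for i in range(20)}
print(sample_articles(scores, top_k=5, n_sample=3, seed=1))
print(len(keep_long_paragraphs(["x" * 499, "y" * 500, "z" * 800])))  # 2
```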

SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.

Communicative Goal

To build systems that not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.

Sourced from Different Sources

yes

Source Details

Wikipedia

Language Data

How was Language Data Obtained?

Found

Where was it found?

Single website

Topics Covered

The dataset contains 536 articles covering a wide range of topics, from musical celebrities to abstract concepts.

Data Validation

validated by crowdworker

Data Preprocessing

Individual paragraphs were extracted from the sampled Wikipedia articles, stripping away images, figures, and tables and discarding paragraphs shorter than 500 characters. The data was then partitioned into a training set (80%), a development set (10%), and a test set (10%).
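A random 80/10/10 partition can be sketched as below. This sketch assumes the split is made over whole articles, so that paragraphs from one article never appear in two splits, and that it is applied to the 536 sampled articles; both are assumptions for illustration, as are the function name and seed.

```python
import random


def partition_articles(titles: list, train_frac: float = 0.8, dev_frac: float = 0.1, seed: int = 0) -> dict:
    """Randomly assign whole articles to train/dev/test so no article spans two splits."""
    shuffled = list(titles)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_dev = round(len(shuffled) * dev_frac)
    return {
        "train": shuffled[:n_train],
        "dev": shuffled[n_train:n_train + n_dev],
        "test": shuffled[n_train + n_dev:],  # the remainder, roughly 10%
    }


splits = partition_articles([f"article_{i}" for i in range(536)])
print({name: len(items) for name, items in splits.items()})
# {'train': 429, 'dev': 54, 'test': 53}
```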

Was Data Filtered?

algorithmically

Filter Criteria

To retrieve high-quality articles, Project Nayuki’s Wikipedia internal PageRanks were used to obtain the top 10,000 articles of English Wikipedia, from which 536 articles were sampled uniformly at random.

Structured Annotations

Additional Annotations?

crowd-sourced

Number of Raters

unknown

Rater Qualifications

Crowdworkers from the United States or Canada with a 97% HIT acceptance rate and a minimum of 1,000 HITs were employed to create questions.

Raters per Training Example

0

Raters per Test Example

0

Annotation Service?

yes

Which Annotation Service

other, Amazon Mechanical Turk

Annotation Values

For SQuAD1.1, crowdworkers were tasked with asking and answering up to 5 questions on the content of each paragraph. The questions had to be entered in a text field, and the answers had to be highlighted in the paragraph.

For SQuAD2.0, each task consisted of an entire article from SQuAD1.1. For each paragraph in the article, workers were asked to pose up to five questions that were impossible to answer based on the paragraph alone, while referencing entities in the paragraph and ensuring that a plausible answer is present.

Any Quality Control?

validated by another rater

Quality Control Details

Questions from workers who wrote 25 or fewer questions on an article were removed; this filter helped remove noise from workers who had trouble understanding the task and therefore quit before completing the whole article. The filter was applied to both SQuAD2.0 and the existing answerable questions from SQuAD1.1.
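The quality-control rule above amounts to counting questions per (worker, article) pair and dropping every pair with 25 or fewer. In the minimal sketch below, the record keys `worker_id` and `article` are hypothetical, since worker identifiers are not part of the released GEM data.

```python
from collections import Counter

MAX_REMOVED = 25  # workers with 25 or fewer questions on an article are dropped


def filter_low_effort(questions: list, max_removed: int = MAX_REMOVED) -> list:
    """Keep questions whose (worker, article) pair has more than max_removed questions.

    Each question is a dict with hypothetical keys "worker_id" and "article".
    """
    counts = Counter((q["worker_id"], q["article"]) for q in questions)
    return [q for q in questions if counts[(q["worker_id"], q["article"])] > max_removed]


# w1 finished the article (26 questions); w2 stopped at 25 and is filtered out.
questions = (
    [{"worker_id": "w1", "article": "Normans", "question": f"q{i}"} for i in range(26)]
    + [{"worker_id": "w2", "article": "Normans", "question": f"p{i}"} for i in range(25)]
)
print(len(filter_low_effort(questions)))  # 26
```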

Consent

Any Consent Policy?

no

Private Identifying Information (PII)

Contains PII?

unlikely

Any PII Identification?

no identification

Maintenance

Any Maintenance Plan?

no

Broader Social Context

Previous Work on the Social Impact of the Dataset

Usage of Models based on the Data

no

Impact on Under-Served Communities

Addresses needs of underserved Communities?

no

Discussion of Biases

Any Documented Social Biases?

yes

Considerations for Using the Data

PII Risks and Liability

Licenses

Known Technical Limitations