Dataset: facebook/babi_qa
Tasks: question-answering
Languages: en
Multilinguality: monolingual
Language Creators: machine-generated
Annotation Creators: machine-generated
Source Datasets: original
Other: chained-qa
License: cc-by-3.0

Dataset Card for bAbI QA

Dataset Summary

The 20 QA bAbI tasks are a set of proxy tasks that evaluate reading comprehension via question answering. The tasks measure understanding in several ways: whether a system is able to answer questions via chaining facts, simple induction, deduction, and more. The tasks are designed to be prerequisites for any system that aims to be capable of conversing with a human. The aim is to classify these tasks into skill sets, so that researchers can identify (and then rectify) the failings of their systems.

Supported Tasks and Leaderboards

The dataset supports a set of 20 proxy story-based question answering tasks in English and Hindi, available in several "types" described below. The tasks are:

task_no task_name
qa1 single-supporting-fact
qa2 two-supporting-facts
qa3 three-supporting-facts
qa4 two-arg-relations
qa5 three-arg-relations
qa6 yes-no-questions
qa7 counting
qa8 lists-sets
qa9 simple-negation
qa10 indefinite-knowledge
qa11 basic-coreference
qa12 conjunction
qa13 compound-coreference
qa14 time-reasoning
qa15 basic-deduction
qa16 basic-induction
qa17 positional-reasoning
qa18 size-reasoning
qa19 path-finding
qa20 agents-motivations

The "types" are are:

  • en

    • the tasks in English, readable by humans.
  • hn

    • the tasks in Hindi, readable by humans.
  • shuffled

    • the same tasks with shuffled letters, so that they are not readable by humans and existing parsers and taggers cannot be used in a straightforward fashion to leverage extra resources; in this case the learner is forced to rely on the given training data alone. This mimics a learner being presented with a new language and having to learn it from scratch.
  • en-10k, shuffled-10k and hn-10k

    • the same tasks in the three formats above, but with 10,000 training examples rather than 1,000.
  • en-valid and en-valid-10k

    • the same as en and en-10k, except the train sets have been split into train and validation portions (a 90%/10% split).

To load a particular dataset, use load_dataset('babi_qa', type=type, task_no=task_no), where type is one of the types above and task_no is one of the task numbers. For example: load_dataset('babi_qa', type='en', task_no='qa1').
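
A minimal loading sketch (assuming the Hugging Face datasets library is installed; recent releases may additionally require trust_remote_code=True for script-based datasets such as this one):

```python
from datasets import load_dataset

# Load the first English task (single supporting fact).
# `type` and `task_no` select the configuration as described above.
dataset = load_dataset('babi_qa', type='en', task_no='qa1')

print(dataset)                       # DatasetDict with 'train' and 'test' splits
print(dataset['train'][0]['story'])  # one story: interleaved context and question lines
```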

Languages

The tasks are available in English (en) and Hindi (hn); the shuffled types contain the same tasks with scrambled letters and are not human-readable.

Dataset Structure

Data Instances

An instance from the en-qa1 config's train split:

{'story': {
  'answer': ['', '', 'bathroom', '', '', 'hallway', '', '', 'hallway',
             '', '', 'office', '', '', 'bathroom'],
  'id': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10',
         '11', '12', '13', '14', '15'],
  'supporting_ids': [[], [], ['1'], [], [], ['4'], [], [], ['4'],
                     [], [], ['11'], [], [], ['8']],
  'text': ['Mary moved to the bathroom.', 'John went to the hallway.',
           'Where is Mary?', 'Daniel went back to the hallway.',
           'Sandra moved to the garden.', 'Where is Daniel?',
           'John moved to the office.', 'Sandra journeyed to the bathroom.',
           'Where is Daniel?', 'Mary moved to the hallway.',
           'Daniel travelled to the office.', 'Where is Daniel?',
           'John went back to the garden.', 'John moved to the bedroom.',
           'Where is Sandra?'],
  'type': [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
}}

Data Fields

  • story: a dictionary feature containing:
    • id: a string feature denoting the line number within the example.
    • type: a classification label, with possible values context (0) and question (1), denoting whether the line is context or a question.
    • text: a string feature containing the text of the line, whether context or question.
    • supporting_ids: a list of string features containing the line numbers of the lines in the example which support the answer.
    • answer: a string feature containing the answer to the question, or an empty string if the type is not question.
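
To illustrate how these fields fit together, here is a minimal sketch (the helper extract_qa_pairs is hypothetical, written only for illustration) that walks one story and pairs each question with its answer and supporting context lines:

```python
def extract_qa_pairs(story):
    """Pair each question line with its answer and supporting context lines.

    `story` is the dictionary feature described above; this helper is a
    hypothetical example, not part of the dataset itself.
    """
    # Map line ids to their text so `supporting_ids` can be resolved.
    lines = dict(zip(story['id'], story['text']))
    pairs = []
    for i, line_type in enumerate(story['type']):
        if line_type == 1:  # 1 = question, 0 = context
            pairs.append({
                'question': story['text'][i],
                'answer': story['answer'][i],
                'support': [lines[sid] for sid in story['supporting_ids'][i]],
            })
    return pairs

# For the en-qa1 instance above, the first pair would be:
# {'question': 'Where is Mary?', 'answer': 'bathroom',
#  'support': ['Mary moved to the bathroom.']}
```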

Data Splits

The splits and corresponding sizes are:

config train test validation
en-qa1 200 200 -
en-qa2 200 200 -
en-qa3 200 200 -
en-qa4 1000 1000 -
en-qa5 200 200 -
en-qa6 200 200 -
en-qa7 200 200 -
en-qa8 200 200 -
en-qa9 200 200 -
en-qa10 200 200 -
en-qa11 200 200 -
en-qa12 200 200 -
en-qa13 200 200 -
en-qa14 200 200 -
en-qa15 250 250 -
en-qa16 1000 1000 -
en-qa17 125 125 -
en-qa18 198 199 -
en-qa19 1000 1000 -
en-qa20 94 93 -
en-10k-qa1 2000 200 -
en-10k-qa2 2000 200 -
en-10k-qa3 2000 200 -
en-10k-qa4 10000 1000 -
en-10k-qa5 2000 200 -
en-10k-qa6 2000 200 -
en-10k-qa7 2000 200 -
en-10k-qa8 2000 200 -
en-10k-qa9 2000 200 -
en-10k-qa10 2000 200 -
en-10k-qa11 2000 200 -
en-10k-qa12 2000 200 -
en-10k-qa13 2000 200 -
en-10k-qa14 2000 200 -
en-10k-qa15 2500 250 -
en-10k-qa16 10000 1000 -
en-10k-qa17 1250 125 -
en-10k-qa18 1978 199 -
en-10k-qa19 10000 1000 -
en-10k-qa20 933 93 -
en-valid-qa1 180 200 20
en-valid-qa2 180 200 20
en-valid-qa3 180 200 20
en-valid-qa4 900 1000 100
en-valid-qa5 180 200 20
en-valid-qa6 180 200 20
en-valid-qa7 180 200 20
en-valid-qa8 180 200 20
en-valid-qa9 180 200 20
en-valid-qa10 180 200 20
en-valid-qa11 180 200 20
en-valid-qa12 180 200 20
en-valid-qa13 180 200 20
en-valid-qa14 180 200 20
en-valid-qa15 225 250 25
en-valid-qa16 900 1000 100
en-valid-qa17 113 125 12
en-valid-qa18 179 199 19
en-valid-qa19 900 1000 100
en-valid-qa20 85 93 9
en-valid-10k-qa1 1800 200 200
en-valid-10k-qa2 1800 200 200
en-valid-10k-qa3 1800 200 200
en-valid-10k-qa4 9000 1000 1000
en-valid-10k-qa5 1800 200 200
en-valid-10k-qa6 1800 200 200
en-valid-10k-qa7 1800 200 200
en-valid-10k-qa8 1800 200 200
en-valid-10k-qa9 1800 200 200
en-valid-10k-qa10 1800 200 200
en-valid-10k-qa11 1800 200 200
en-valid-10k-qa12 1800 200 200
en-valid-10k-qa13 1800 200 200
en-valid-10k-qa14 1800 200 200
en-valid-10k-qa15 2250 250 250
en-valid-10k-qa16 9000 1000 1000
en-valid-10k-qa17 1125 125 125
en-valid-10k-qa18 1781 199 197
en-valid-10k-qa19 9000 1000 1000
en-valid-10k-qa20 840 93 93
hn-qa1 200 200 -
hn-qa2 200 200 -
hn-qa3 167 167 -
hn-qa4 1000 1000 -
hn-qa5 200 200 -
hn-qa6 200 200 -
hn-qa7 200 200 -
hn-qa8 200 200 -
hn-qa9 200 200 -
hn-qa10 200 200 -
hn-qa11 200 200 -
hn-qa12 200 200 -
hn-qa13 125 125 -
hn-qa14 200 200 -
hn-qa15 250 250 -
hn-qa16 1000 1000 -
hn-qa17 125 125 -
hn-qa18 198 198 -
hn-qa19 1000 1000 -
hn-qa20 93 94 -
hn-10k-qa1 2000 200 -
hn-10k-qa2 2000 200 -
hn-10k-qa3 1667 167 -
hn-10k-qa4 10000 1000 -
hn-10k-qa5 2000 200 -
hn-10k-qa6 2000 200 -
hn-10k-qa7 2000 200 -
hn-10k-qa8 2000 200 -
hn-10k-qa9 2000 200 -
hn-10k-qa10 2000 200 -
hn-10k-qa11 2000 200 -
hn-10k-qa12 2000 200 -
hn-10k-qa13 1250 125 -
hn-10k-qa14 2000 200 -
hn-10k-qa15 2500 250 -
hn-10k-qa16 10000 1000 -
hn-10k-qa17 1250 125 -
hn-10k-qa18 1977 198 -
hn-10k-qa19 10000 1000 -
hn-10k-qa20 934 94 -
shuffled-qa1 200 200 -
shuffled-qa2 200 200 -
shuffled-qa3 200 200 -
shuffled-qa4 1000 1000 -
shuffled-qa5 200 200 -
shuffled-qa6 200 200 -
shuffled-qa7 200 200 -
shuffled-qa8 200 200 -
shuffled-qa9 200 200 -
shuffled-qa10 200 200 -
shuffled-qa11 200 200 -
shuffled-qa12 200 200 -
shuffled-qa13 200 200 -
shuffled-qa14 200 200 -
shuffled-qa15 250 250 -
shuffled-qa16 1000 1000 -
shuffled-qa17 125 125 -
shuffled-qa18 198 199 -
shuffled-qa19 1000 1000 -
shuffled-qa20 94 93 -
shuffled-10k-qa1 2000 200 -
shuffled-10k-qa2 2000 200 -
shuffled-10k-qa3 2000 200 -
shuffled-10k-qa4 10000 1000 -
shuffled-10k-qa5 2000 200 -
shuffled-10k-qa6 2000 200 -
shuffled-10k-qa7 2000 200 -
shuffled-10k-qa8 2000 200 -
shuffled-10k-qa9 2000 200 -
shuffled-10k-qa10 2000 200 -
shuffled-10k-qa11 2000 200 -
shuffled-10k-qa12 2000 200 -
shuffled-10k-qa13 2000 200 -
shuffled-10k-qa14 2000 200 -
shuffled-10k-qa15 2500 250 -
shuffled-10k-qa16 10000 1000 -
shuffled-10k-qa17 1250 125 -
shuffled-10k-qa18 1978 199 -
shuffled-10k-qa19 10000 1000 -
shuffled-10k-qa20 933 93 -
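
To confirm a configuration's split sizes programmatically, a sketch under the same assumptions as the loading example above:

```python
from datasets import load_dataset

# en-valid configurations expose train/test/validation splits.
dataset = load_dataset('babi_qa', type='en-valid', task_no='qa1')
print({split: len(dataset[split]) for split in dataset})
# Expected, per the table above: {'train': 180, 'test': 200, 'validation': 20}
```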

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

Code to generate the tasks is available on GitHub: https://github.com/facebook/bAbI-tasks

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, and Jason Weston, at Facebook Research.

Licensing Information

Creative Commons Attribution 3.0 License

Citation Information

@misc{dodge2016evaluating,
      title={Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems}, 
      author={Jesse Dodge and Andreea Gane and Xiang Zhang and Antoine Bordes and Sumit Chopra and Alexander Miller and Arthur Szlam and Jason Weston},
      year={2016},
      eprint={1511.06931},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contributions

Thanks to @gchhablani for adding this dataset.