Datasets: wiki_movies
Tasks: question-answering
Sub-tasks: closed-domain-qa
Languages: en
Multilinguality: monolingual
Size categories: 100K<n<1M
Language creators: crowdsourced
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1606.03126
License: cc-by-3.0

The WikiMovies dataset consists of roughly 100k (templated) questions over 75k entities, based on questions with answers in the Open Movie Database (OMDb). It is the QA part of the Movie Dialog dataset.
The text in the dataset is written in English.
The raw data consists of question-answer pairs, with the question and its answer separated by a tab. Here are 3 examples:
1 what does Grégoire Colin appear in?	Before the Rain
1 Joe Thomas appears in which movies?	The Inbetweeners Movie, The Inbetweeners 2
1 what films did Michelle Trachtenberg star in?	Inspector Gadget, Black Christmas, Ice Princess, Harriet the Spy, The Scribbler
It is unclear what the leading 1 on each line is for, but it is removed in the Dataset object.
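A minimal sketch of turning one raw line into a record like the ones produced by Datasets, assuming every line has the form `<id> <question>\t<answer>` as in the examples above. The helper name `parse_line` is hypothetical and not part of the Datasets library or the original release.

```python
def parse_line(line: str) -> dict:
    """Convert one raw WikiMovies line into a {'answer', 'question'} record.

    Hypothetical helper; assumes the layout '<id> <question>\\t<answer>'.
    """
    line = line.rstrip("\n")
    # The question and the answer are separated by a single tab.
    question_part, answer = line.split("\t", 1)
    # Drop the leading numeric ID (the '1' at the start of every line).
    _, question = question_part.split(" ", 1)
    return {"answer": answer, "question": question}


record = parse_line("1 what does Grégoire Colin appear in?\tBefore the Rain\n")
print(record)
```

Splitting with `maxsplit=1` keeps any later spaces in the question and any later tabs in the answer intact.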
Here is an example of a data point as loaded by Datasets:
{ 'answer': 'Before the Rain', 'question': 'what does Grégoire Colin appear in?' }
answer: a string containing the answer to the corresponding question.
question: a string containing the relevant question.
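As the raw examples show, a single `answer` string can hold several titles. The sketch below assumes, without confirmation from the dataset authors, that multiple answers are joined with commas. Titles that themselves contain a comma would be split incorrectly by this naive rule. Both `split_answers` and `is_hit` are hypothetical helpers for illustration, and `is_hit` counts a prediction as correct if it matches any listed answer, ignoring case.

```python
def split_answers(answer: str) -> list[str]:
    """Split an answer string into individual titles (naive comma split)."""
    # Assumption: answers are comma-separated; commas inside titles break this.
    return [a.strip() for a in answer.split(",") if a.strip()]


def is_hit(prediction: str, answer: str) -> bool:
    """Return True if the prediction matches any of the listed answers."""
    gold = {a.lower() for a in split_answers(answer)}
    return prediction.strip().lower() in gold


print(split_answers("The Inbetweeners Movie, The Inbetweeners 2"))
print(is_hit("the inbetweeners 2", "The Inbetweeners Movie, The Inbetweeners 2"))
```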
The data is split into train, test, and dev sets. The split sizes are as follows:
| wiki-entities_qa_* | n examples |
|---|---|
| train.txt | 96185 |
| dev.txt | 10000 |
| test.txt | 9952 |
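The split sizes in the table add up to 116,137 examples, which is consistent with the 100K<n<1M size category above. The short sketch below reproduces the total and the share of each split. The dictionary simply restates the table values.

```python
# Split sizes copied from the table above.
split_sizes = {"train.txt": 96185, "dev.txt": 10000, "test.txt": 9952}

total = sum(split_sizes.values())
print(f"total: {total}")
for name, n in split_sizes.items():
    # Share of all examples that falls in each split.
    print(f"{name}: {n} ({n / total:.1%})")
```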
WikiMovies was built with the following goals in mind: (i) machine learning techniques should have ample training examples for learning; and (ii) one can easily analyze the performance of different representations of knowledge and break down the results by question type. The dataset can be downloaded from http://fb.ai/babi
@misc{miller2016keyvalue,
  title={Key-Value Memory Networks for Directly Reading Documents},
  author={Alexander Miller and Adam Fisch and Jesse Dodge and Amir-Hossein Karimi and Antoine Bordes and Jason Weston},
  year={2016},
  eprint={1606.03126},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Thanks to @aclifton314 for adding this dataset.