Dataset Card for "trivia_qa"
Dataset Summary
TriviaQA is a reading comprehension dataset containing over 650K
question-answer-evidence triples. TriviaQA includes 95K question-answer
pairs authored by trivia enthusiasts and independently gathered evidence
documents, six per question on average, that provide high-quality distant
supervision for answering the questions.
Supported Tasks and Leaderboards
More Information Needed
Languages
English.
Dataset Structure
Data Instances
rc
- Size of downloaded dataset files: 2.67 GB
- Size of the generated dataset: 16.02 GB
- Total amount of disk used: 18.68 GB
An example of 'train' looks as follows.
rc.nocontext
- Size of downloaded dataset files: 2.67 GB
- Size of the generated dataset: 126.27 MB
- Total amount of disk used: 2.79 GB
An example of 'train' looks as follows.
unfiltered
- Size of downloaded dataset files: 3.30 GB
- Size of the generated dataset: 29.24 GB
- Total amount of disk used: 32.54 GB
An example of 'validation' looks as follows.
unfiltered.nocontext
- Size of downloaded dataset files: 632.55 MB
- Size of the generated dataset: 74.56 MB
- Total amount of disk used: 707.11 MB
An example of 'train' looks as follows.
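The four configurations listed above (rc, rc.nocontext, unfiltered, unfiltered.nocontext) are selected by name at load time. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the dataset is published under the id `trivia_qa` used in this card's title (the Hub may host it under a namespaced id instead). The helper `load_triviaqa` and its defaults are hypothetical.

```python
# Configuration names as listed in this card.
CONFIGS = ("rc", "rc.nocontext", "unfiltered", "unfiltered.nocontext")


def load_triviaqa(config="rc.nocontext", split="validation", dataset_id="trivia_qa"):
    """Load one split of one TriviaQA configuration.

    `dataset_id` is an assumption taken from the card title; adjust it if the
    Hub hosts the dataset under a different name.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # Imported lazily so the name check above works without the library.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(dataset_id, config, split=split)
```

Calling `load_triviaqa("rc.nocontext", "validation")` downloads the data on first use, about 2.67 GB according to the sizes listed above. The .nocontext configurations take far less disk space once generated.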
Data Fields
The data fields are the same among all splits.
rc
- question: a string feature.
- question_id: a string feature.
- question_source: a string feature.
- entity_pages: a dictionary feature containing:
  - doc_source: a string feature.
  - filename: a string feature.
  - title: a string feature.
  - wiki_context: a string feature.
- search_results: a dictionary feature containing:
  - description: a string feature.
  - filename: a string feature.
  - rank: an int32 feature.
  - title: a string feature.
  - url: a string feature.
  - search_context: a string feature.
- aliases: a list of string features.
- normalized_aliases: a list of string features.
- matched_wiki_entity_name: a string feature.
- normalized_matched_wiki_entity_name: a string feature.
- normalized_value: a string feature.
- type: a string feature.
- value: a string feature.
rc.nocontext
- question: a string feature.
- question_id: a string feature.
- question_source: a string feature.
- entity_pages: a dictionary feature containing:
  - doc_source: a string feature.
  - filename: a string feature.
  - title: a string feature.
  - wiki_context: a string feature.
- search_results: a dictionary feature containing:
  - description: a string feature.
  - filename: a string feature.
  - rank: an int32 feature.
  - title: a string feature.
  - url: a string feature.
  - search_context: a string feature.
- aliases: a list of string features.
- normalized_aliases: a list of string features.
- matched_wiki_entity_name: a string feature.
- normalized_matched_wiki_entity_name: a string feature.
- normalized_value: a string feature.
- type: a string feature.
- value: a string feature.
unfiltered
- question: a string feature.
- question_id: a string feature.
- question_source: a string feature.
- entity_pages: a dictionary feature containing:
  - doc_source: a string feature.
  - filename: a string feature.
  - title: a string feature.
  - wiki_context: a string feature.
- search_results: a dictionary feature containing:
  - description: a string feature.
  - filename: a string feature.
  - rank: an int32 feature.
  - title: a string feature.
  - url: a string feature.
  - search_context: a string feature.
- aliases: a list of string features.
- normalized_aliases: a list of string features.
- matched_wiki_entity_name: a string feature.
- normalized_matched_wiki_entity_name: a string feature.
- normalized_value: a string feature.
- type: a string feature.
- value: a string feature.
unfiltered.nocontext
- question: a string feature.
- question_id: a string feature.
- question_source: a string feature.
- entity_pages: a dictionary feature containing:
  - doc_source: a string feature.
  - filename: a string feature.
  - title: a string feature.
  - wiki_context: a string feature.
- search_results: a dictionary feature containing:
  - description: a string feature.
  - filename: a string feature.
  - rank: an int32 feature.
  - title: a string feature.
  - url: a string feature.
  - search_context: a string feature.
- aliases: a list of string features.
- normalized_aliases: a list of string features.
- matched_wiki_entity_name: a string feature.
- normalized_matched_wiki_entity_name: a string feature.
- normalized_value: a string feature.
- type: a string feature.
- value: a string feature.
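Since all four configurations share the same fields, one structural check can cover any of them. The following is a minimal sketch, assuming records arrive as plain Python dicts laid out exactly as in the lists above. The names `TOP_LEVEL_FIELDS`, `NESTED_FIELDS`, and `missing_fields` are hypothetical, and a loaded dataset may nest the alias and value fields differently.

```python
# Field names copied from the lists above.
NESTED_FIELDS = {
    "entity_pages": ("doc_source", "filename", "title", "wiki_context"),
    "search_results": (
        "description", "filename", "rank", "title", "url", "search_context",
    ),
}
TOP_LEVEL_FIELDS = (
    "question", "question_id", "question_source",
    "aliases", "normalized_aliases",
    "matched_wiki_entity_name", "normalized_matched_wiki_entity_name",
    "normalized_value", "type", "value",
)


def missing_fields(record):
    """Return the dotted names of expected fields absent from `record`."""
    missing = [name for name in TOP_LEVEL_FIELDS if name not in record]
    for parent, children in NESTED_FIELDS.items():
        if parent not in record:
            # The whole dictionary feature is absent; report the parent only.
            missing.append(parent)
            continue
        missing.extend(
            f"{parent}.{child}" for child in children if child not in record[parent]
        )
    return missing
```

An empty return value means the record carries every field named in this card. Any other result lists what to investigate before relying on the record.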
Data Splits
| name | train | validation | test |
| rc | 138384 | 18669 | 17210 |
| rc.nocontext | 138384 | 18669 | 17210 |
| unfiltered | 87622 | 11313 | 10832 |
| unfiltered.nocontext | 87622 | 11313 | 10832 |
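The split counts above can be turned into totals and proportions. The following is a short sketch using only the numbers from the table. The dictionary `SPLITS` and the helper `split_shares` are hypothetical names.

```python
# Example counts copied from the table above; the .nocontext variants
# have the same counts as their parent configurations.
SPLITS = {
    "rc": {"train": 138384, "validation": 18669, "test": 17210},
    "unfiltered": {"train": 87622, "validation": 11313, "test": 10832},
}


def split_shares(counts):
    """Return (total, {split: fraction of total}) for one configuration."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


for config, counts in SPLITS.items():
    total, shares = split_shares(counts)
    summary = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
    print(f"{config}: {total} examples ({summary})")
```

In both configurations the train split comes to roughly four fifths of the examples.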
Dataset Creation
Curation Rationale
More Information Needed
Source Data
Initial Data Collection and Normalization
More Information Needed
Who are the source language producers?
More Information Needed
Annotations
Annotation process
More Information Needed
Who are the annotators?
More Information Needed
Personal and Sensitive Information
More Information Needed
Considerations for Using the Data
Social Impact of Dataset
More Information Needed
Discussion of Biases
More Information Needed
Other Known Limitations
More Information Needed
Additional Information
Dataset Curators
More Information Needed
Licensing Information
The University of Washington does not own the copyright of the questions and documents included in TriviaQA.
Citation Information
@article{2017arXivtriviaqa,
  author = {{Joshi}, Mandar and {Choi}, Eunsol and {Weld}, Daniel and {Zettlemoyer}, Luke},
  title = "{TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension}",
  journal = {arXiv e-prints},
  year = 2017,
  eid = {arXiv:1705.03551},
  pages = {arXiv:1705.03551},
  archivePrefix = {arXiv},
  eprint = {1705.03551},
}
Contributions
Thanks to @thomwolf, @patrickvonplaten, @lewtun for adding this dataset.