Dataset:

trivia_qa

Tasks:

Question Answering

Text2Text Generation

Sub-tasks:

open-domain-qa open-domain-abstractive-qa extractive-qa

Languages:

Multilinguality:

monolingual

Size:

10K<n<100K 100K<n<1M

Language creators:

machine-generated

Annotation creators:

crowdsourced

Source datasets:

original

Preprint:

arxiv:1705.03551

License:

license:unknown

Dataset Card for "trivia_qa"

Dataset Summary

TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. It includes 95K question-answer pairs authored by trivia enthusiasts, together with independently gathered evidence documents (six per question on average) that provide high-quality distant supervision for answering the questions.

Supported Tasks and Leaderboards

More Information Needed

Languages

English.

Dataset Structure

Data Instances

rc

Size of downloaded dataset files: 2.67 GB
Size of the generated dataset: 16.02 GB
Total amount of disk used: 18.68 GB

An example of 'train' looks as follows.

rc.nocontext

Size of downloaded dataset files: 2.67 GB
Size of the generated dataset: 126.27 MB
Total amount of disk used: 2.79 GB

An example of 'train' looks as follows.

unfiltered

Size of downloaded dataset files: 3.30 GB
Size of the generated dataset: 29.24 GB
Total amount of disk used: 32.54 GB

An example of 'validation' looks as follows.

unfiltered.nocontext

Size of downloaded dataset files: 632.55 MB
Size of the generated dataset: 74.56 MB
Total amount of disk used: 707.11 MB

An example of 'train' looks as follows.

Data Fields

The data fields are the same among all splits.

rc

question : a string feature.
question_id : a string feature.
question_source : a string feature.
entity_pages : a dictionary feature containing:
- doc_source : a string feature.
- filename : a string feature.
- title : a string feature.
- wiki_context : a string feature.
search_results : a dictionary feature containing:
- description : a string feature.
- filename : a string feature.
- rank : an int32 feature.
- title : a string feature.
- url : a string feature.
- search_context : a string feature.
aliases : a list of string features.
normalized_aliases : a list of string features.
matched_wiki_entity_name : a string feature.
normalized_matched_wiki_entity_name : a string feature.
normalized_value : a string feature.
type : a string feature.
value : a string feature.

rc.nocontext

question : a string feature.
question_id : a string feature.
question_source : a string feature.
entity_pages : a dictionary feature containing:
- doc_source : a string feature.
- filename : a string feature.
- title : a string feature.
- wiki_context : a string feature.
search_results : a dictionary feature containing:
- description : a string feature.
- filename : a string feature.
- rank : an int32 feature.
- title : a string feature.
- url : a string feature.
- search_context : a string feature.
aliases : a list of string features.
normalized_aliases : a list of string features.
matched_wiki_entity_name : a string feature.
normalized_matched_wiki_entity_name : a string feature.
normalized_value : a string feature.
type : a string feature.
value : a string feature.

unfiltered

question : a string feature.
question_id : a string feature.
question_source : a string feature.
entity_pages : a dictionary feature containing:
- doc_source : a string feature.
- filename : a string feature.
- title : a string feature.
- wiki_context : a string feature.
search_results : a dictionary feature containing:
- description : a string feature.
- filename : a string feature.
- rank : an int32 feature.
- title : a string feature.
- url : a string feature.
- search_context : a string feature.
aliases : a list of string features.
normalized_aliases : a list of string features.
matched_wiki_entity_name : a string feature.
normalized_matched_wiki_entity_name : a string feature.
normalized_value : a string feature.
type : a string feature.
value : a string feature.

unfiltered.nocontext

question : a string feature.
question_id : a string feature.
question_source : a string feature.
entity_pages : a dictionary feature containing:
- doc_source : a string feature.
- filename : a string feature.
- title : a string feature.
- wiki_context : a string feature.
search_results : a dictionary feature containing:
- description : a string feature.
- filename : a string feature.
- rank : an int32 feature.
- title : a string feature.
- url : a string feature.
- search_context : a string feature.
aliases : a list of string features.
normalized_aliases : a list of string features.
matched_wiki_entity_name : a string feature.
normalized_matched_wiki_entity_name : a string feature.
normalized_value : a string feature.
type : a string feature.
value : a string feature.
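
As a rough illustration of the field lists above, the following Python sketch writes the layout as typed dictionaries and checks a record against it. It follows the flat listing in this card as written. The helper `missing_fields` and the placeholder record are hypothetical and contain no real dataset entries. How the fields are nested once a configuration is actually loaded is not stated here, so inspect a loaded example before relying on this layout.

```python
from typing import List, TypedDict


class EntityPages(TypedDict):
    doc_source: str
    filename: str
    title: str
    wiki_context: str


class SearchResults(TypedDict):
    description: str
    filename: str
    rank: int  # listed as an int32 feature in the card
    title: str
    url: str
    search_context: str


class TriviaQARecord(TypedDict):
    question: str
    question_id: str
    question_source: str
    entity_pages: EntityPages
    search_results: SearchResults
    aliases: List[str]
    normalized_aliases: List[str]
    matched_wiki_entity_name: str
    normalized_matched_wiki_entity_name: str
    normalized_value: str
    type: str
    value: str


def missing_fields(record: dict) -> List[str]:
    """Return the top-level field names from the card that `record` lacks."""
    expected = TriviaQARecord.__annotations__.keys()
    return sorted(name for name in expected if name not in record)


# Placeholder values only (assumption: invented for illustration).
placeholder_record = {
    "question": "<question text>",
    "question_id": "<id>",
    "question_source": "<source>",
    "entity_pages": {"doc_source": "", "filename": "", "title": "", "wiki_context": ""},
    "search_results": {
        "description": "", "filename": "", "rank": 0,
        "title": "", "url": "", "search_context": "",
    },
    "aliases": [],
    "normalized_aliases": [],
    "matched_wiki_entity_name": "",
    "normalized_matched_wiki_entity_name": "",
    "normalized_value": "",
    "type": "",
    "value": "",
}

if __name__ == "__main__":
    print(missing_fields(placeholder_record))
```

A check like this is mainly useful as a guard in preprocessing code: if a configuration omits or renames a field, the helper reports it before downstream code fails on a missing key.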

Data Splits

name	train	validation	test
rc	138384	18669	17210
rc.nocontext	138384	18669	17210
unfiltered	87622	11313	10832
unfiltered.nocontext	87622	11313	10832
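
To make the table concrete, the sketch below records the split sizes exactly as listed, totals them per configuration, and wraps `datasets.load_dataset` in a small helper. The helper name `load_trivia_qa` is hypothetical, and the dataset identifier `trivia_qa` is taken from this card's title and may differ on the hosting hub. Loading requires the `datasets` package and a multi-gigabyte download, so the import is deferred until the helper is called.

```python
# Split sizes copied from the Data Splits table in this card.
SPLIT_SIZES = {
    "rc": {"train": 138384, "validation": 18669, "test": 17210},
    "rc.nocontext": {"train": 138384, "validation": 18669, "test": 17210},
    "unfiltered": {"train": 87622, "validation": 11313, "test": 10832},
    "unfiltered.nocontext": {"train": 87622, "validation": 11313, "test": 10832},
}


def total_examples(config: str) -> int:
    """Sum the train, validation, and test counts for one configuration."""
    return sum(SPLIT_SIZES[config].values())


def split_fractions(config: str) -> dict:
    """Return each split's share of the configuration's total examples."""
    total = total_examples(config)
    return {split: count / total for split, count in SPLIT_SIZES[config].items()}


def load_trivia_qa(config: str, split: str = "train"):
    """Load one split of one configuration (hypothetical convenience wrapper)."""
    if config not in SPLIT_SIZES:
        raise ValueError(f"unknown configuration: {config!r}")
    if split not in SPLIT_SIZES[config]:
        raise ValueError(f"unknown split: {split!r}")
    # Deferred import: only needed when data is actually requested.
    from datasets import load_dataset

    # Assumption: "trivia_qa" is the identifier, as in this card's title.
    return load_dataset("trivia_qa", config, split=split)


if __name__ == "__main__":
    for name in SPLIT_SIZES:
        shares = {k: round(v, 3) for k, v in split_fractions(name).items()}
        print(name, total_examples(name), shares)
```

Comparing `len(load_trivia_qa(config, split))` with the matching entry in `SPLIT_SIZES` is a cheap sanity check that a download completed and that the expected configuration was loaded.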

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

The University of Washington does not own the copyright of the questions and documents included in TriviaQA.

Citation Information

@article{2017arXivtriviaqa,
       author = {{Joshi}, Mandar and {Choi}, Eunsol and {Weld},
                 Daniel and {Zettlemoyer}, Luke},
        title = "{TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension}",
      journal = {arXiv e-prints},
         year = 2017,
          eid = {arXiv:1705.03551},
        pages = {arXiv:1705.03551},
archivePrefix = {arXiv},
       eprint = {1705.03551},
}

Contributions

Thanks to @thomwolf, @patrickvonplaten, @lewtun for adding this dataset.

Author:

Anonymous

Dataset size:

3.27 GB
