Dataset: cfq
Language: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: expert-generated
Annotation creators: no-annotation
Source datasets: original
Preprint: arxiv:1912.09713
Other: compositionality
License: cc-by-4.0

The Compositional Freebase Questions (CFQ) dataset is designed specifically to measure compositional generalization. It is a large, simple yet realistic dataset of natural language questions and answers that also provides, for each question, a corresponding SPARQL query against the Freebase knowledge base. CFQ can therefore also be used for semantic parsing.
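As a rough illustration of how the dataset might be loaded, the sketch below wraps the Hugging Face `datasets` library. It assumes that `cfq` (the identifier in the metadata above) is the hub name and that the configuration names match those in this card's split table. The helper `load_cfq` is hypothetical and not part of any library.

```python
# Configuration names as listed in this card's split table.
CFQ_CONFIGS = (
    "mcd1",
    "mcd2",
    "mcd3",
    "query_complexity_split",
    "query_pattern_split",
    "question_complexity_split",
    "question_pattern_split",
    "random_split",
)


def load_cfq(config="mcd1"):
    """Hypothetical helper: validate the config name, then load it."""
    if config not in CFQ_CONFIGS:
        raise ValueError(f"unknown CFQ configuration: {config!r}")
    # Imported lazily so that name validation works without the library.
    from datasets import load_dataset

    # Assumption: "cfq" is the hub identifier given in the card metadata.
    return load_dataset("cfq", config)
```

Under these assumptions, `load_cfq("mcd1")["train"][0]` would be expected to return a record with a question and a query, but the exact hub identifier may differ depending on the library version.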
The text in the dataset is in English (en).
mcd1
An example of 'train' looks as follows.
{ 'query': 'SELECT count(*) WHERE {\n?x0 a ns:people.person .\n?x0 ns:influence.influence_node.influenced M1 .\n?x0 ns:influence.influence_node.influenced M2 .\n?x0 ns:people.person.spouse_s/ns:people.marriage.spouse|ns:fictional_universe.fictional_character.married_to/ns:fictional_universe.marriage_of_fictional_characters.spouses ?x1 .\n?x1 a ns:film.cinematographer .\nFILTER ( ?x0 != ?x1 )\n}', 'question': 'Did a person marry a cinematographer , influence M1 , and influence M2' }

mcd2
An example of 'train' looks as follows.
{ 'query': 'SELECT count(*) WHERE {\n?x0 ns:people.person.parents|ns:fictional_universe.fictional_character.parents|ns:organization.organization.parent/ns:organization.organization_relationship.parent ?x1 .\n?x1 a ns:people.person .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person ?x0 .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person M2 .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person M3 .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person M4 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person ?x0 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person M2 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person M3 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person M4\n}', 'question': "Did M1 and M5 employ M2 , M3 , and M4 and employ a person 's child" }

mcd3
An example of 'train' looks as follows.
{ "query": "SELECT /producer M0 . /director M0 . ", "question": "Who produced and directed M0?" }

query_complexity_split
An example of 'train' looks as follows.
{ "query": "SELECT /producer M0 . /director M0 . ", "question": "Who produced and directed M0?" }

query_pattern_split
An example of 'train' looks as follows.
{ "query": "SELECT /producer M0 . /director M0 . ", "question": "Who produced and directed M0?" }
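To make the record structure concrete, here is a minimal sketch that takes the first 'train' example shown above and pulls out its entity placeholders (`M1`, `M2`, ...) and the individual clauses of its SPARQL query. The helper names `entity_placeholders` and `query_clauses` are illustrative assumptions, not part of the dataset or any library, and the clause splitting relies on the one-clause-per-line layout seen in that example.

```python
import re

# Record copied from the first 'train' example above.
record = {
    "query": (
        "SELECT count(*) WHERE {\n"
        "?x0 a ns:people.person .\n"
        "?x0 ns:influence.influence_node.influenced M1 .\n"
        "?x0 ns:influence.influence_node.influenced M2 .\n"
        "?x0 ns:people.person.spouse_s/ns:people.marriage.spouse"
        "|ns:fictional_universe.fictional_character.married_to"
        "/ns:fictional_universe.marriage_of_fictional_characters.spouses ?x1 .\n"
        "?x1 a ns:film.cinematographer .\n"
        "FILTER ( ?x0 != ?x1 )\n"
        "}"
    ),
    "question": "Did a person marry a cinematographer , influence M1 , and influence M2",
}


def entity_placeholders(text):
    """Return the distinct M<number> placeholders in text, in numeric order."""
    found = set(re.findall(r"\bM\d+\b", text))
    return sorted(found, key=lambda name: int(name[1:]))


def query_clauses(query):
    """Return the non-empty lines between the outermost braces of the query."""
    body = query[query.index("{") + 1 : query.rindex("}")]
    return [line.strip() for line in body.splitlines() if line.strip()]


print(entity_placeholders(record["question"]))
for clause in query_clauses(record["query"]):
    print(clause)
```

In this example the question and the query mention the same placeholders, which is one simple consistency check that can be run over records before using them for semantic parsing.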
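The data fields are the same among all splits and configurations: each example has a 'question' string and a 'query' string. The number of examples per configuration and split is: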
name | train | test |
---|---|---|
mcd1 | 95743 | 11968 |
mcd2 | 95743 | 11968 |
mcd3 | 95743 | 11968 |
query_complexity_split | 100654 | 9512 |
query_pattern_split | 94600 | 12589 |
question_complexity_split | 98999 | 10340 |
question_pattern_split | 95654 | 11909 |
random_split | 95744 | 11967 |
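The split sizes above can be turned into totals and test fractions with a few lines of arithmetic. The following is a small sketch using only the numbers from the table. The function name `split_summary` is an illustrative assumption.

```python
# (train, test) example counts copied from the split table above.
SPLIT_SIZES = {
    "mcd1": (95743, 11968),
    "mcd2": (95743, 11968),
    "mcd3": (95743, 11968),
    "query_complexity_split": (100654, 9512),
    "query_pattern_split": (94600, 12589),
    "question_complexity_split": (98999, 10340),
    "question_pattern_split": (95654, 11909),
    "random_split": (95744, 11967),
}


def split_summary(sizes):
    """Map each configuration to (total examples, fraction held out as test)."""
    summary = {}
    for name, (train, test) in sizes.items():
        total = train + test
        summary[name] = (total, test / total)
    return summary


for name, (total, test_fraction) in split_summary(SPLIT_SIZES).items():
    print(f"{name:28s} total={total:6d} test={test_fraction:.1%}")
```

By this count, the three MCD configurations and the random split each contain 107,711 train and test examples in total, while the complexity and pattern splits differ slightly in both total size and test share.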
@inproceedings{Keysers2020,
  title={Measuring Compositional Generalization: A Comprehensive Method on Realistic Data},
  author={Daniel Keysers and Nathanael Sch{\"a}rli and Nathan Scales and Hylke Buisman and Daniel Furrer and Sergii Kashubin and Nikola Momchev and Danila Sinopalnikov and Lukasz Stafiniak and Tibor Tihon and Dmitry Tsarkov and Xiao Wang and Marc van Zee and Olivier Bousquet},
  booktitle={ICLR},
  year={2020},
  url={https://arxiv.org/abs/1912.09713.pdf},
}
Thanks to @thomwolf, @patrickvonplaten, @lewtun, and @brainshawn for adding this dataset.