Datasets: msr_sqa
Tasks: question-answering
Sub-tasks: extractive-qa
Languages: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: crowdsourced
Source datasets: original
License: ms-pl

Recent work in semantic parsing for question answering has focused on long and complicated questions, many of which would seem unnatural if asked in a normal conversation between two humans. In an effort to explore a conversational QA setting, we present a more realistic task: answering sequences of simple but inter-related questions.
We created SQA by asking crowdsourced workers to decompose 2,022 questions from WikiTableQuestions (WTQ), which contains highly compositional questions about tables from Wikipedia. We had three workers decompose each WTQ question, resulting in a dataset of 6,066 sequences that contain 17,553 questions in total. Each question is also associated with answers in the form of cell locations in the tables.
English (en).
{'id': 'nt-639', 'annotator': 0, 'position': 0, 'question': 'where are the players from?', 'table_file': 'table_csv/203_149.csv', 'table_header': ['Pick', 'Player', 'Team', 'Position', 'School'], 'table_data': [['1', 'Ben McDonald', 'Baltimore Orioles', 'RHP', 'Louisiana State University'], ['2', 'Tyler Houston', 'Atlanta Braves', 'C', '"Valley HS (Las Vegas', ' NV)"'], ['3', 'Roger Salkeld', 'Seattle Mariners', 'RHP', 'Saugus (CA) HS'], ['4', 'Jeff Jackson', 'Philadelphia Phillies', 'OF', '"Simeon HS (Chicago', ' IL)"'], ['5', 'Donald Harris', 'Texas Rangers', 'OF', 'Texas Tech University'], ['6', 'Paul Coleman', 'Saint Louis Cardinals', 'OF', 'Frankston (TX) HS'], ['7', 'Frank Thomas', 'Chicago White Sox', '1B', 'Auburn University'], ['8', 'Earl Cunningham', 'Chicago Cubs', 'OF', 'Lancaster (SC) HS'], ['9', 'Kyle Abbott', 'California Angels', 'LHP', 'Long Beach State University'], ['10', 'Charles Johnson', 'Montreal Expos', 'C', '"Westwood HS (Fort Pierce', ' FL)"'], ['11', 'Calvin Murray', 'Cleveland Indians', '3B', '"W.T. White High School (Dallas', ' TX)"'], ['12', 'Jeff Juden', 'Houston Astros', 'RHP', 'Salem (MA) HS'], ['13', 'Brent Mayne', 'Kansas City Royals', 'C', 'Cal State Fullerton'], ['14', 'Steve Hosey', 'San Francisco Giants', 'OF', 'Fresno State University'], ['15', 'Kiki Jones', 'Los Angeles Dodgers', 'RHP', '"Hillsborough HS (Tampa', ' FL)"'], ['16', 'Greg Blosser', 'Boston Red Sox', 'OF', 'Sarasota (FL) HS'], ['17', 'Cal Eldred', 'Milwaukee Brewers', 'RHP', 'University of Iowa'], ['18', 'Willie Greene', 'Pittsburgh Pirates', 'SS', '"Jones County HS (Gray', ' GA)"'], ['19', 'Eddie Zosky', 'Toronto Blue Jays', 'SS', 'Fresno State University'], ['20', 'Scott Bryant', 'Cincinnati Reds', 'OF', 'University of Texas'], ['21', 'Greg Gohr', 'Detroit Tigers', 'RHP', 'Santa Clara University'], ['22', 'Tom Goodwin', 'Los Angeles Dodgers', 'OF', 'Fresno State University'], ['23', 'Mo Vaughn', 'Boston Red Sox', '1B', 'Seton Hall University'], ['24', 'Alan Zinter', 'New York Mets', 'C', 'University of Arizona'], ['25', 'Chuck Knoblauch', 'Minnesota Twins', '2B', 'Texas A&M University'], ['26', 'Scott Burrell', 'Seattle Mariners', 'RHP', 'Hamden (CT) HS']], 'answer_coordinates': {'row_index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], 'column_index': [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]}, 'answer_text': ['Louisiana State University', 'Valley HS (Las Vegas, NV)', 'Saugus (CA) HS', 'Simeon HS (Chicago, IL)', 'Texas Tech University', 'Frankston (TX) HS', 'Auburn University', 'Lancaster (SC) HS', 'Long Beach State University', 'Westwood HS (Fort Pierce, FL)', 'W.T. White High School (Dallas, TX)', 'Salem (MA) HS', 'Cal State Fullerton', 'Fresno State University', 'Hillsborough HS (Tampa, FL)', 'Sarasota (FL) HS', 'University of Iowa', 'Jones County HS (Gray, GA)', 'Fresno State University', 'University of Texas', 'Santa Clara University', 'Fresno State University', 'Seton Hall University', 'University of Arizona', 'Texas A&M University', 'Hamden (CT) HS']}
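The instance above stores answers as cell locations: `answer_coordinates` holds a `row_index` list and a `column_index` list that point into `table_data`. A minimal sketch of resolving those locations to cell strings follows. It assumes the two lists are aligned pairwise, as they appear to be in the instance; the helper name `cells_at` and the shortened record are hypothetical and not part of the dataset.

```python
from typing import Dict, List


def cells_at(table_data: List[List[str]],
             answer_coordinates: Dict[str, List[int]]) -> List[str]:
    """Return the cell strings addressed by pairwise (row, column) indices."""
    rows = answer_coordinates["row_index"]
    cols = answer_coordinates["column_index"]
    # Assumption: rows[i] and cols[i] together address one answer cell.
    return [table_data[r][c] for r, c in zip(rows, cols)]


# Hypothetical, shortened record shaped like the instance above.
example = {
    "table_header": ["Pick", "Player", "School"],
    "table_data": [
        ["1", "Ben McDonald", "Louisiana State University"],
        ["3", "Roger Salkeld", "Saugus (CA) HS"],
    ],
    "answer_coordinates": {"row_index": [0, 1], "column_index": [2, 2]},
}

selected = cells_at(example["table_data"], example["answer_coordinates"])
print(selected)
```

In the full instance, cells that contain a comma show up as two separate entries in `table_data`, so a lookup like this may return a fragment that differs from the corresponding `answer_text` string.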
Note that some text fields may contain Tab or LF characters and are therefore wrapped in quotes. It is recommended to process the data with a CSV parser, such as Python's csv module, rather than splitting lines by hand.
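To illustrate the recommendation, here is a small sketch that compares naive comma splitting with Python's standard `csv` module on one quoted field. The `raw` string is a hypothetical two-line excerpt modeled on the table in the instance above, not the actual file contents.

```python
import csv
import io

# Hypothetical excerpt: the School field is quoted because it contains a comma.
raw = 'Pick,Player,School\n2,Tyler Houston,"Valley HS (Las Vegas, NV)"\n'

# Naive splitting ignores quoting and breaks the quoted field in two.
naive = raw.splitlines()[1].split(",")

# csv.reader honors the quotes and keeps the field intact.
parsed = list(csv.reader(io.StringIO(raw)))
header, rows = parsed[0], parsed[1:]

print(naive)
print(rows[0])
```

When reading from disk, the `csv` documentation advises opening the file with `newline=""` so that line breaks embedded in quoted fields are handled correctly.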
| | train | test |
|---|---|---|
| N. examples | 14541 | 3012 |
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
Microsoft Research Data License Agreement.
@inproceedings{iyyer-etal-2017-search,
    title = "Search-based Neural Structured Learning for Sequential Question Answering",
    author = "Iyyer, Mohit and Yih, Wen-tau and Chang, Ming-Wei",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P17-1167",
    doi = "10.18653/v1/P17-1167",
    pages = "1821--1831",
}
Thanks to @mattbui for adding this dataset.