Dataset: wikisql
Tasks:
Languages:
Multilinguality: monolingual
Size: 10K<n<100K
Annotation creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1709.00103
Other: text-to-sql
License:
A large crowd-sourced dataset for developing natural language interfaces for relational databases.
WikiSQL is a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia.
An example from the 'validation' split looks as follows.
This example was too long and was cropped:
{
    "phase": 1,
    "question": "How would you answer a second test question?",
    "sql": {
        "agg": 0,
        "conds": {
            "column_index": [2],
            "condition": ["Some Entity"],
            "operator_index": [0]
        },
        "human_readable": "SELECT Header1 FROM table WHERE Another Header = Some Entity",
        "sel": 0
    },
    "table": "{\"caption\": \"L\", \"header\": [\"Header1\", \"Header 2\", \"Another Header\"], \"id\": \"1-10015132-9\", \"name\": \"table_10015132_11\", \"page_i..."
}
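To make the index-based `sql` field easier to read, here is a minimal Python sketch that rebuilds the `human_readable` string of the example above from `sel`, `agg`, and `conds`. The operator tables `AGG_OPS` and `COND_OPS` are not given in this card; they are assumed from the WikiSQL reference code. The function name `to_human_readable` is hypothetical, and the header list is copied by hand from the visible part of the cropped `table` string.

```python
# Assumed operator tables (from the WikiSQL reference code, not from this card).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def to_human_readable(sql, header):
    """Render an index-based WikiSQL query dict as a readable string.

    Hypothetical helper for illustration; the exact formatting used by
    the dataset for aggregated columns is an assumption.
    """
    sel_col = header[sql["sel"]]
    agg = AGG_OPS[sql["agg"]]
    select = f"{agg} {sel_col}" if agg else sel_col

    conds = sql["conds"]
    clauses = [
        f"{header[col]} {COND_OPS[op]} {value}"
        for col, op, value in zip(
            conds["column_index"], conds["operator_index"], conds["condition"]
        )
    ]

    query = f"SELECT {select} FROM table"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query


# Values taken from the cropped validation example.
header = ["Header1", "Header 2", "Another Header"]
example_sql = {
    "agg": 0,
    "conds": {
        "column_index": [2],
        "condition": ["Some Entity"],
        "operator_index": [0],
    },
    "sel": 0,
}

result = to_human_readable(example_sql, header)
print(result)
```

Here `sel` and `column_index` are positions in the table header, while `agg` and `operator_index` are positions in the two operator lists.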
The data fields are the same among all splits.
default

| name | train | validation | test |
|---|---|---|---|
| default | 56355 | 8421 | 15878 |
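As a quick sanity check, the split sizes in the table can be summed and compared with the 80654 examples quoted in the description. The snippet below is only an illustrative sketch; the variable names are arbitrary.

```python
# Split sizes copied from the table above.
splits = {"train": 56355, "validation": 8421, "test": 15878}

# Total number of examples across all splits.
total = sum(splits.values())

# Share of each split as a percentage of the total, rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The total agrees with the figure in the dataset description, so the three splits partition the full set of examples.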
@article{zhongSeq2SQL2017,
  author  = {Victor Zhong and
             Caiming Xiong and
             Richard Socher},
  title   = {Seq2SQL: Generating Structured Queries from Natural Language using
             Reinforcement Learning},
  journal = {CoRR},
  volume  = {abs/1709.00103},
  year    = {2017}
}
Thanks to @lewtun, @ghomasHudson, @thomwolf for adding this dataset.