Dataset: squad_it
Tasks: question-answering
Languages: it
Multilinguality: monolingual
Language creators: machine-generated
Annotations creators: machine-generated
Source datasets: extended|squad
License: unknown

SQuAD-it is derived from the SQuAD dataset through semi-automatic translation into Italian. It is a large-scale dataset for open question answering on factoid questions in Italian, containing more than 60,000 question/answer pairs derived from the original English dataset. The dataset is split into training and test sets to support reproducible benchmarking of QA systems.
An example of 'train' looks as follows.
This example was too long and was cropped: { "answers": "{\"answer_start\": [243, 243, 243, 243, 243], \"text\": [\"evitare di essere presi di mira dal boicottaggio\", \"evitare di essere pres...", "context": "\"La crisi ha avuto un forte impatto sulle relazioni internazionali e ha creato una frattura all' interno della NATO. Alcune nazi...", "id": "5725b5a689a1e219009abd28", "question": "Perchè le nazioni europee e il Giappone si sono separati dagli Stati Uniti durante la crisi?" }
The data fields are the same among all splits.
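As a rough illustration of how the fields in the cropped example fit together, the sketch below checks that each answer text can be found in the context at its stated offset. It assumes that `answers` is a dictionary of parallel `answer_start` and `text` lists and that offsets are character positions, as in the original SQuAD format; the record and the helper name `answer_spans_match` are hypothetical and not taken from the dataset.

```python
def answer_spans_match(record):
    """Return True if every answer text appears in the context at its stated offset."""
    context = record["context"]
    answers = record["answers"]
    # Assumption: answer_start holds character offsets into the context string.
    return all(
        context[start:start + len(text)] == text
        for start, text in zip(answers["answer_start"], answers["text"])
    )


# Hypothetical record, invented for illustration; not an actual SQuAD-it entry.
record = {
    "id": "example-0001",
    "context": "Roma è la capitale d'Italia.",
    "question": "Qual è la capitale d'Italia?",
    "answers": {"answer_start": [0], "text": ["Roma"]},
}

print(answer_spans_match(record))
```

A check of this kind can be useful with machine-translated data, where answer offsets may drift from the translated context.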
name | train | test |
---|---|---|
default | 54159 | 7609 |
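The split sizes above sum to 61,768 examples, which is consistent with the "more than 60,000" figure in the description. A minimal sketch of that arithmetic, with variable names chosen here purely for illustration:

```python
# Split sizes as reported in the table above.
split_sizes = {"train": 54159, "test": 7609}

total = sum(split_sizes.values())
fractions = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name}: {count} examples ({fractions[name]:.1%} of the total)")
print(f"total: {total}")
```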
@InProceedings{10.1007/978-3-030-03840-3_29,
  author="Croce, Danilo and Zelenanska, Alexandra and Basili, Roberto",
  editor="Ghidini, Chiara and Magnini, Bernardo and Passerini, Andrea and Traverso, Paolo",
  title="Neural Learning for Question Answering in Italian",
  booktitle="AI*IA 2018 -- Advances in Artificial Intelligence",
  year="2018",
  publisher="Springer International Publishing",
  address="Cham",
  pages="389--402",
  isbn="978-3-030-03840-3"
}
Thanks to @thomwolf, @lewtun, @albertvillanova, @mariamabarham, @patrickvonplaten for adding this dataset.