Dataset:
TUKE-DeutscheTelekom/skquad
SK-QuAD is the first QA dataset for the Slovak language. It is manually annotated, so it contains no distortion caused by machine translation. The dataset is thematically diverse: it does not overlap with SQuAD, so it brings new knowledge. It went through a second round of annotation, so each question and its answer were reviewed by at least two annotators.
Example record (cropped because it is too long to show in full): { "answers": { "answer_start": [94, 87, 94, 94], "text": ["10th and 11th centuries", "in the 10th and 11th centuries", "10th and 11th centuries", "10th and 11th centuries"] }, "context": "\"The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave thei...", "id": "56ddde6b9a695914005b9629", "question": "When were the Normans in Normandy?", "title": "Normans" }
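In this SQuAD-style format, each `answer_start` value appears to be a character offset into `context`, and the matching entry in `text` is the answer span found at that offset. The sketch below checks that reading against the cropped example. It is a minimal illustration, not part of any official tooling: the function name `answer_spans_match` is hypothetical, and the context is only the portion shown before the crop. The offsets 94 and 87 line up only if the leading escaped quote (`\"`) is not counted, so the sketch assumes that quote is a display artifact and omits it.

```python
# Minimal sketch (hypothetical helper): in SQuAD-style records,
# answers["answer_start"][i] is assumed to be a character offset into
# "context", and answers["text"][i] the answer span at that offset.

def answer_spans_match(record: dict) -> list[bool]:
    """For each annotated answer, report whether its text sits at its offset."""
    context = record["context"]
    answers = record["answers"]
    results = []
    for start, text in zip(answers["answer_start"], answers["text"]):
        # Slice exactly len(text) characters starting at the stated offset.
        results.append(context[start:start + len(text)] == text)
    return results


# Fields taken from the cropped example; the context is only the prefix
# shown before the crop, without the leading escaped quote (an assumption).
example = {
    "context": (
        "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) "
        "were the people who in the 10th and 11th centuries"
    ),
    "question": "When were the Normans in Normandy?",
    "answers": {
        "answer_start": [94, 87, 94, 94],
        "text": [
            "10th and 11th centuries",
            "in the 10th and 11th centuries",
            "10th and 11th centuries",
            "10th and 11th centuries",
        ],
    },
}

print(answer_spans_match(example))  # [True, True, True, True]
```

The four entries correspond to the four annotated answers; they differ only in whether the annotator included the leading "in the".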
The data fields are the same across all splits.
squad_v2

| | Train | Dev | Translated |
|---|---:|---:|---:|
| Documents | 8,377 | 940 | 442 |
| Paragraphs | 22,062 | 2,568 | 18,931 |
| Questions | 81,582 | 9,583 | 120,239 |
| Answers | 65,839 | 7,822 | 79,978 |
| Unanswerable | 15,877 | 1,784 | 40,261 |
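As a reading aid for the split statistics, the share of unanswerable questions in each column can be computed from the Questions and Unanswerable rows. This is a hypothetical helper that works only on the counts copied from the table; it does not load the dataset.

```python
# Counts copied from the split table (Questions and Unanswerable rows).
SPLITS = {
    "Train": {"questions": 81_582, "unanswerable": 15_877},
    "Dev": {"questions": 9_583, "unanswerable": 1_784},
    "Translated": {"questions": 120_239, "unanswerable": 40_261},
}


def unanswerable_share(counts: dict) -> float:
    """Fraction of questions in a split that are marked unanswerable."""
    return counts["unanswerable"] / counts["questions"]


for name, counts in SPLITS.items():
    print(f"{name}: {unanswerable_share(counts):.1%}")
# Train: 19.5%
# Dev: 18.6%
# Translated: 33.5%
```

With these counts, roughly a fifth of the Train and Dev questions are unanswerable, compared with about a third in the Translated column.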
License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)