Model:
mrm8488/distilbert-multi-finetuned-for-xqua-on-tydiqa
distilbert-base-multilingual-cased
TyDi QA contains 200k human-annotated question-answer pairs in 11 typologically diverse languages. The questions were written without seeing the answer and without the use of translation, and the dataset is designed for training and evaluating automatic question-answering systems. The official TyDi QA repository provides evaluation code and a baseline system: https://ai.google.com/research/tydiqa
Given a passage that is guaranteed to contain the answer, the task is to predict the single contiguous span of characters that answers the question. The Gold Passage (GoldP) task differs from the primary task in several ways: only the gold answer passage is provided rather than the entire Wikipedia article, unanswerable questions have been discarded, and evaluation uses the SQuAD 1.1 metrics.
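For reference, the gold-passage data can be inspected with the Hugging Face `datasets` library. The sketch below assumes its `tydiqa` loader, where `secondary_task` corresponds to GoldP (this loader is not referenced by this card):

```python
from datasets import load_dataset

# "secondary_task" is the Gold Passage (GoldP) configuration of TyDi QA
# (the config name is an assumption, not taken from this card).
goldp = load_dataset("tydiqa", "secondary_task", split="validation")

sample = goldp[0]
print(sample["question"])       # the question text
print(sample["context"][:200])  # the gold passage guaranteed to contain the answer
print(sample["answers"])        # {'text': [...], 'answer_start': [...]}
```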
The model was fine-tuned on a Tesla P100 GPU with 25GB of RAM, using the following script:
```bash
python transformers/examples/question-answering/run_squad.py \
  --model_type distilbert \
  --model_name_or_path distilbert-base-multilingual-cased \
  --do_train \
  --do_eval \
  --train_file /path/to/dataset/train.json \
  --predict_file /path/to/dataset/dev.json \
  --per_gpu_train_batch_size 24 \
  --per_gpu_eval_batch_size 24 \
  --learning_rate 3e-5 \
  --num_train_epochs 5 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /content/model_output \
  --overwrite_output_dir \
  --save_steps 1000 \
  --threads 400
```
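Note that `run_squad.py` reads `train.json` and `dev.json` in SQuAD v1.1 format, so the TyDi QA GoldP data must be converted to that schema first. A minimal sketch of the expected structure (the record below is illustrative, not taken from the dataset):

```python
import json

# SQuAD v1.1-style file as consumed by run_squad.py; answer_start is the
# character offset of the answer span inside the context.
squad_like = {
    "version": "v1.1",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "TyDi QA covers 11 typologically diverse languages.",
            "qas": [{
                "id": "example-0",
                "question": "How many languages does TyDi QA cover?",
                "answers": [{"text": "11", "answer_start": 15}],
            }],
        }],
    }],
}

with open("train.json", "w", encoding="utf-8") as f:
    json.dump(squad_like, f, ensure_ascii=False)
```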
Overall results on the dev set:

| Metric | Value |
|---|---|
| EM | 63.85 |
| F1 | 75.70 |
Results per language:

| Language | # Samples | EM | F1 |
|---|---|---|---|
| Arabic | 1314 | 66.66 | 80.02 |
| Bengali | 180 | 53.09 | 63.50 |
| English | 654 | 62.42 | 73.12 |
| Finnish | 1031 | 64.57 | 75.15 |
| Indonesian | 773 | 67.89 | 79.70 |
| Korean | 414 | 51.29 | 61.73 |
| Russian | 1079 | 55.42 | 70.08 |
| Swahili | 596 | 74.51 | 81.15 |
| Telugu | 874 | 66.21 | 79.85 |
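A minimal usage sketch with the `transformers` question-answering pipeline (the example question and context are illustrative):

```python
from transformers import pipeline

# Load the fine-tuned model; the pipeline handles tokenization and
# span decoding (returning the highest-scoring contiguous span).
qa = pipeline(
    "question-answering",
    model="mrm8488/distilbert-multi-finetuned-for-xqua-on-tydiqa",
)

result = qa(
    question="How many languages does TyDi QA cover?",
    context="TyDi QA covers 11 typologically diverse languages.",
)
print(result)  # e.g. {'score': ..., 'start': 15, 'end': 17, 'answer': '11'}
```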
You can also try bert-multi-cased-finedtuned-xquad-tydiqa-goldp, which achieves F1 = 82.16 and EM = 71.06 (and, of course, better scores per language).
Created by Manuel Romero/@mrm8488
Made with ♥ in Spain