This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the FTSpeech dataset, which contains 1,800 hours of transcribed speeches from the Danish parliament.
The model achieves the following word error rate (WER) scores (lower is better):
| Dataset | WER without LM | WER with 5-gram LM |
|---|---|---|
| Danish part of Common Voice 8.0 | 20.48 | 17.91 |
| Alvenir test set | 15.46 | 13.84 |
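
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python illustration and not the evaluation script behind the table: the function name `word_error_rate`, the whitespace tokenisation, and the percentage scale are all assumptions, and the text normalisation used for the reported scores is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage (assumed scale): edit distance / reference length."""
    ref = reference.split()   # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentence pair, not taken from either test set.
    print(word_error_rate("det er en god dag", "det er god dag i"))
```

Corpus-level WER is usually computed by summing the edit distances over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance scores.
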
Use of this model must adhere to this license from the Danish Parliament.