This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m, trained on the FTSpeech dataset, which contains 1,800 hours of transcribed speeches from the Danish Parliament.
The model achieves the following WER scores (lower is better):
| Dataset | WER without LM | WER with 5-gram LM |
|---|---|---|
| Danish part of Common Voice 8.0 | 20.48 | 17.91 |
| Alvenir test set | 15.46 | 13.84 |
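
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the model output, divided by the number of reference words. The snippet below is a minimal sketch of that definition, not the evaluation script used to produce the table. The function name `word_error_rate` and the Danish example sentences are illustrative, and it assumes whitespace tokenisation with no text normalisation, which may differ from the actual evaluation setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER of `hypothesis` against `reference` as a fraction.

    Illustrative helper: splits on whitespace and applies no casing or
    punctuation normalisation (an assumption, not the card's exact setup).
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,                       # deletion
                    current[j - 1] + 1,                    # insertion
                    previous[j - 1] + substitution_cost,   # match or substitution
                )
            )
        previous = current

    return previous[-1] / len(ref_words)


# One of five reference words is missing from the hypothesis.
wer = word_error_rate("det er en god dag", "det er en dag")
print(f"WER: {100 * wer:.2f}%")
```

Multiplying the fraction by 100 gives a percentage, which is how scores such as those in the table are commonly reported.
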
Use of this model must adhere to the license from the Danish Parliament.