Model:
mesolitica/wav2vec2-xls-r-300m-mixed
Finetuned https://huggingface.co/facebook/wav2vec2-xls-r-300m on https://github.com/huseinzol05/malaya-speech/tree/master/data/mixed-stt
This model was finetuned on 3 languages: Malay, Singlish, and Mandarin.
This model was trained on a single RTX 3090 Ti (24GB VRAM), provided by https://mesolitica.com/.
The evaluation set is taken from https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt and has the following sizes:
Malay 765, Singlish 3579, Mandarin 614 (4958 examples in total).
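The model achieves the following results on the evaluation set, computed with evaluate-gpu.ipynb. CER and WER are the character and word error rates (lower is better); the "with LM" columns use language-model decoding.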
Evaluation  CER                   WER                   CER with LM           WER with LM
Mixed       0.0481054244857041    0.1322198446007387    0.041196586938584696  0.09880169127621556
Malay       0.051636391937588406  0.19561999547293663   0.03917689630621449   0.12710746406824835
Singlish    0.0494915200071987    0.12763802881676573   0.04271234986432335   0.09677160640413336
Mandarin    0.035626554824269824  0.07993515937860181   0.03487760945087219   0.07536807168546154
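The Mixed numbers can be cross-checked against the per-language numbers. If each reported metric is the mean of per-example scores, the Mixed value should equal the average of the Malay, Singlish, and Mandarin values weighted by their evaluation-set sizes (765, 3579, 614). The sketch below uses only figures from this card; agreement is consistent with per-example averaging over the pooled 4958 examples, but it is an inference, not something the card states. It also makes clear that Singlish, about 72% of the examples, dominates the Mixed score.

```python
# Sizes of the per-language evaluation sets, as reported in this card.
SIZES = {"malay": 765, "singlish": 3579, "mandarin": 614}

# Per-language results, as reported in this card.
RESULTS = {
    "malay": {"cer": 0.051636391937588406, "wer": 0.19561999547293663,
              "cer_lm": 0.03917689630621449, "wer_lm": 0.12710746406824835},
    "singlish": {"cer": 0.0494915200071987, "wer": 0.12763802881676573,
                 "cer_lm": 0.04271234986432335, "wer_lm": 0.09677160640413336},
    "mandarin": {"cer": 0.035626554824269824, "wer": 0.07993515937860181,
                 "cer_lm": 0.03487760945087219, "wer_lm": 0.07536807168546154},
}

# Reported results on the mixed evaluation.
MIXED = {"cer": 0.0481054244857041, "wer": 0.1322198446007387,
         "cer_lm": 0.041196586938584696, "wer_lm": 0.09880169127621556}


def weighted_mean(metric: str) -> float:
    """Average one metric over languages, weighting each language by its set size."""
    total = sum(SIZES.values())
    return sum(SIZES[lang] * RESULTS[lang][metric] for lang in SIZES) / total


# Compare the size-weighted average with the reported Mixed value for each metric.
for metric, reported in MIXED.items():
    print(f"{metric:7s} weighted={weighted_mean(metric):.10f} reported={reported:.10f}")
```

If the two columns printed for each metric agree to many decimal places, the Mixed row adds no information beyond the per-language rows and their sizes.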
Language model used for the "with LM" results: https://huggingface.co/huseinzol05/language-model-bahasa-manglish-combined
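For contrast with LM decoding, here is a minimal sketch of how the plain (no-LM) transcripts are commonly produced. It assumes, although this card does not say so, that the checkpoint uses a standard CTC head, as fine-tuned wav2vec2 speech-to-text models usually do, and that no-LM decoding is greedy: take the best token in each audio frame, merge consecutive repeats, then drop the CTC blank. The vocabulary and frame scores are hypothetical toy values, not the model's real tokenizer or outputs.

```python
from typing import List, Optional, Sequence

# Hypothetical toy vocabulary (not the model's real tokenizer); index 0 is the CTC blank.
VOCAB = ["<blank>", "a", "k", "u", " "]
BLANK = 0


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]]) -> str:
    """Greedy CTC decoding: best token per frame, merge consecutive repeats, drop blanks."""
    # Index of the highest-scoring token in each frame.
    best = [max(range(len(frame)), key=lambda i: frame[i]) for frame in frame_scores]
    tokens: List[int] = []
    prev: Optional[int] = None
    for idx in best:
        # Emit a token only when a new run starts and that token is not the blank.
        if idx != prev and idx != BLANK:
            tokens.append(idx)
        prev = idx
    return "".join(VOCAB[i] for i in tokens)


# Made-up scores for 6 frames over the 5-symbol toy vocabulary.
example_frames = [
    [0.1, 2.0, 0.0, 0.0, 0.0],  # "a"
    [0.1, 1.5, 0.0, 0.0, 0.0],  # "a" again, merged with the previous frame
    [3.0, 0.0, 0.0, 0.0, 0.0],  # blank
    [0.0, 0.0, 2.5, 0.0, 0.0],  # "k"
    [0.0, 0.0, 0.0, 1.8, 0.0],  # "u"
    [0.2, 0.0, 0.0, 1.1, 0.0],  # "u" again, merged
]

print(greedy_ctc_decode(example_frames))  # aku
```

Because a blank resets the run, two identical characters separated by a blank frame are kept as a double letter. Decoding with the language model instead replaces the per-frame argmax with a beam search that combines these frame scores with LM scores, which is not shown here.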