Model:
Harveenchadha/wav2vec2-pretrained-clsril-23-10k
We present CLSRIL-23 (Cross Lingual Speech Representations on Indic Languages), a self-supervised audio pre-trained model that learns cross-lingual speech representations from raw audio across 23 Indic languages. It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations while jointly learning a quantization of the latents shared across all languages.
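To make the objective above concrete, here is a toy NumPy sketch of a wav2vec 2.0-style contrastive loss: a context vector at a masked position must identify the true quantized latent among distractors. This is an illustrative simplification (the function names, dimensions, and temperature are mine), not the actual fairseq implementation.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two vectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def contrastive_loss(context, positive, distractors, temperature=0.1):
    """InfoNCE-style loss: the context vector at a masked timestep should
    score the true quantized latent (index 0) above the distractors."""
    sims = [cosine(context, positive)] + [cosine(context, d) for d in distractors]
    logits = np.array(sims) / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # cross-entropy on the true latent

rng = np.random.default_rng(0)
ctx = rng.standard_normal(8)
pos = ctx + 0.1 * rng.standard_normal(8)        # quantized target near the context
neg = [rng.standard_normal(8) for _ in range(5)]  # sampled distractors
loss = contrastive_loss(ctx, pos, neg)
```

Training pushes `loss` down by making the context representation similar to its own quantized latent and dissimilar from the distractors; sharing the quantizer across all 23 languages is what yields cross-lingual representations.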
The original repo contains the models in fairseq format.
Language | Data (in hours) |
---|---|
Assamese | 254.9 |
Bengali | 331.3 |
Bodo | 26.9 |
Dogri | 17.1 |
English | 819.7 |
Gujarati | 336.7 |
Hindi | 4563.7 |
Kannada | 451.8 |
Kashmiri | 67.8 |
Konkani | 36.8 |
Maithili | 113.8 |
Malayalam | 297.7 |
Manipuri | 171.9 |
Marathi | 458.2 |
Nepali | 31.6 |
Odia | 131.4 |
Punjabi | 486.05 |
Sanskrit | 58.8 |
Santali | 6.56 |
Sindhi | 16 |
Tamil | 542.6 |
Telugu | 302.8 |
Urdu | 259.68 |
The experimentation platform is built on top of fairseq.
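Since this is a pretrained encoder without an ASR head, one plausible way to use the Hugging Face checkpoint is as a feature extractor via `Wav2Vec2Model`. The helper below is a sketch under the assumption that a feature-extractor config ships with the repo; the function name and defaults are mine, not from the card.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_ID = "Harveenchadha/wav2vec2-pretrained-clsril-23-10k"

def extract_features(waveform, sampling_rate=16000, model_id=MODEL_ID):
    """Return frame-level hidden states from the pretrained encoder.

    `waveform` is a 1-D float array of raw audio sampled at `sampling_rate` Hz.
    """
    extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
    model = Wav2Vec2Model.from_pretrained(model_id)
    model.eval()
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        outputs = model(input_values=inputs.input_values)
    return outputs.last_hidden_state  # shape: (1, num_frames, hidden_size)
```

The returned hidden states can then be fed to a downstream head (e.g. a CTC layer fine-tuned per language) for speech recognition.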