Massively Multilingual Speech (MMS) - Finetuned ASR - L1107
This checkpoint is a model fine-tuned for multi-lingual ASR and part of Facebook's
Massive Multilingual Speech project
.
This checkpoint is based on the
Wav2Vec2 architecture
and makes use of adapter models to transcribe 1000+ languages.
The checkpoint consists of
1 billion parameters
and has been fine-tuned from
facebook/mms-1b
on 1107 languages.
Table Of Content
-
Example
-
Supported Languages
-
Model details
-
Additional links
Example
This MMS checkpoint can be used with
Transformers
to transcribe audio of 1107 different
languages. Let's look at a simple example.
First, we install transformers and some other libraries
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
Note
: In order to use MMS you need to have at least
transformers >= 4.30
installed. If the
4.30
version
is not yet available
on PyPI
make sure to install
transformers
from
source:
pip install git+https://github.com/huggingface/transformers.git
Next, we load a couple of audio samples via
datasets
. Make sure that the audio data is sampled to 16000 kHz.
from datasets import load_dataset, Audio
# English
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]
# French
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
fr_sample = next(iter(stream_data))["audio"]["array"]
Next, we load the model and processor
from transformers import Wav2Vec2ForCTC, AutoProcessor
import torch
model_id = "facebook/mms-1b-l1107"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
Now we process the audio data, pass the processed audio data to the model and transcribe the model output, just like we usually do for Wav2Vec2 models such as
facebook/wav2vec2-base-960h
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
# 'joe keton disapproved of films and buster also had reservations about the media'
We can now keep the same model in memory and simply switch out the language adapters by calling the convenient
load_adapter()
function for the model and
set_target_lang()
for the tokenizer. We pass the target language as an input - "fra" for French.
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")
inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
# "ce dernier est volé tout au long de l'histoire romaine"
In the same way the language can be switched out for all other supported languages. Please have a look at:
processor.tokenizer.vocab.keys()
For more details, please have a look at
the official docs
.
Supported Languages
This model supports 1107 languages. Unclick the following to toogle all supported languages of this checkpoint in
ISO 639-3 code
.
You can find more details about the languages and their ISO 649-3 codes in the
MMS Language Coverage Overview
.
Click to toggle
-
abi
-
abp
-
aca
-
acd
-
ace
-
acf
-
ach
-
acn
-
acr
-
acu
-
ade
-
adh
-
adj
-
adx
-
aeu
-
agd
-
agg
-
agn
-
agr
-
agu
-
agx
-
aha
-
ahk
-
aia
-
aka
-
akb
-
ake
-
akp
-
alj
-
alp
-
alt
-
alz
-
ame
-
amf
-
amh
-
ami
-
amk
-
ann
-
any
-
aoz
-
apb
-
apr
-
ara
-
arl
-
asa
-
asg
-
asm
-
ata
-
atb
-
atg
-
ati
-
atq
-
ava
-
avn
-
avu
-
awa
-
awb
-
ayo
-
ayr
-
ayz
-
azb
-
azg
-
azj-script_cyrillic
-
azj-script_latin
-
azz
-
bak
-
bam
-
ban
-
bao
-
bav
-
bba
-
bbb
-
bbc
-
bbo
-
bcc-script_arabic
-
bcc-script_latin
-
bcl
-
bcw
-
bdg
-
bdh
-
bdq
-
bdu
-
bdv
-
beh
-
bem
-
ben
-
bep
-
bex
-
bfa
-
bfo
-
bfy
-
bfz
-
bgc
-
bgq
-
bgr
-
bgt
-
bgw
-
bha
-
bht
-
bhz
-
bib
-
bim
-
bis
-
biv
-
bjr
-
bjv
-
bjw
-
bjz
-
bkd
-
bkv
-
blh
-
blt
-
blx
-
blz
-
bmq
-
bmr
-
bmu
-
bmv
-
bng
-
bno
-
bnp
-
boa
-
bod
-
boj
-
bom
-
bor
-
bov
-
box
-
bpr
-
bps
-
bqc
-
bqi
-
bqj
-
bqp
-
bru
-
bsc
-
bsq
-
bss
-
btd
-
bts
-
btt
-
btx
-
bud
-
bul
-
bus
-
bvc
-
bvz
-
bwq
-
bwu
-
byr
-
bzh
-
bzi
-
bzj
-
caa
-
cab
-
cac-dialect_sanmateoixtatan
-
cac-dialect_sansebastiancoatan
-
cak-dialect_central
-
cak-dialect_santamariadejesus
-
cak-dialect_santodomingoxenacoj
-
cak-dialect_southcentral
-
cak-dialect_western
-
cak-dialect_yepocapa
-
cap
-
car
-
cas
-
cat
-
cax
-
cbc
-
cbi
-
cbr
-
cbs
-
cbt
-
cbu
-
cbv
-
cce
-
cco
-
cdj
-
ceb
-
ceg
-
cek
-
cfm
-
cgc
-
chf
-
chv
-
chz
-
cjo
-
cjp
-
cjs
-
cko
-
ckt
-
cla
-
cle
-
cly
-
cme
-
cmo-script_khmer
-
cmo-script_latin
-
cmr
-
cnh
-
cni
-
cnl
-
cnt
-
coe
-
cof
-
cok
-
con
-
cot
-
cou
-
cpa
-
cpb
-
cpu
-
crh
-
crk-script_latin
-
crk-script_syllabics
-
crn
-
crq
-
crs
-
crt
-
csk
-
cso
-
ctd
-
ctg
-
cto
-
ctu
-
cuc
-
cui
-
cuk
-
cul
-
cwa
-
cwe
-
cwt
-
cya
-
cym
-
daa
-
dah
-
dar
-
dbj
-
dbq
-
ddn
-
ded
-
des
-
deu
-
dga
-
dgi
-
dgk
-
dgo
-
dgr
-
dhi
-
did
-
dig
-
dik
-
dip
-
div
-
djk
-
dnj-dialect_blowowest
-
dnj-dialect_gweetaawueast
-
dnt
-
dnw
-
dop
-
dos
-
dsh
-
dso
-
dtp
-
dts
-
dug
-
dwr
-
dyi
-
dyo
-
dyu
-
dzo
-
eip
-
eka
-
ell
-
emp
-
enb
-
eng
-
enx
-
ese
-
ess
-
eus
-
evn
-
ewe
-
eza
-
fal
-
fao
-
far
-
fas
-
fij
-
fin
-
flr
-
fmu
-
fon
-
fra
-
frd
-
ful
-
gag-script_cyrillic
-
gag-script_latin
-
gai
-
gam
-
gau
-
gbi
-
gbk
-
gbm
-
gbo
-
gde
-
geb
-
gej
-
gil
-
gjn
-
gkn
-
gld
-
glk
-
gmv
-
gna
-
gnd
-
gng
-
gof-script_latin
-
gog
-
gor
-
gqr
-
grc
-
gri
-
grn
-
grt
-
gso
-
gub
-
guc
-
gud
-
guh
-
guj
-
guk
-
gum
-
guo
-
guq
-
guu
-
gux
-
gvc
-
gvl
-
gwi
-
gwr
-
gym
-
gyr
-
had
-
hag
-
hak
-
hap
-
hat
-
hau
-
hay
-
heb
-
heh
-
hif
-
hig
-
hil
-
hin
-
hlb
-
hlt
-
hne
-
hnn
-
hns
-
hoc
-
hoy
-
hto
-
hub
-
hui
-
hun
-
hus-dialect_centralveracruz
-
hus-dialect_westernpotosino
-
huu
-
huv
-
hvn
-
hwc
-
hyw
-
iba
-
icr
-
idd
-
ifa
-
ifb
-
ife
-
ifk
-
ifu
-
ify
-
ign
-
ikk
-
ilb
-
ilo
-
imo
-
inb
-
ind
-
iou
-
ipi
-
iqw
-
iri
-
irk
-
isl
-
itl
-
itv
-
ixl-dialect_sangasparchajul
-
ixl-dialect_sanjuancotzal
-
ixl-dialect_santamarianebaj
-
izr
-
izz
-
jac
-
jam
-
jav
-
jbu
-
jen
-
jic
-
jiv
-
jmc
-
jmd
-
jun
-
juy
-
jvn
-
kaa
-
kab
-
kac
-
kak
-
kan
-
kao
-
kaq
-
kay
-
kaz
-
kbo
-
kbp
-
kbq
-
kbr
-
kby
-
kca
-
kcg
-
kdc
-
kde
-
kdh
-
kdi
-
kdj
-
kdl
-
kdn
-
kdt
-
kek
-
ken
-
keo
-
ker
-
key
-
kez
-
kfb
-
kff-script_telugu
-
kfw
-
kfx
-
khg
-
khm
-
khq
-
kia
-
kij
-
kik
-
kin
-
kir
-
kjb
-
kje
-
kjg
-
kjh
-
kki
-
kkj
-
kle
-
klu
-
klv
-
klw
-
kma
-
kmd
-
kml
-
kmr-script_arabic
-
kmr-script_cyrillic
-
kmr-script_latin
-
kmu
-
knb
-
kne
-
knf
-
knj
-
knk
-
kno
-
kog
-
kor
-
kpq
-
kps
-
kpv
-
kpy
-
kpz
-
kqe
-
kqp
-
kqr
-
kqy
-
krc
-
kri
-
krj
-
krl
-
krr
-
krs
-
kru
-
ksb
-
ksr
-
kss
-
ktb
-
ktj
-
kub
-
kue
-
kum
-
kus
-
kvn
-
kvw
-
kwd
-
kwf
-
kwi
-
kxc
-
kxf
-
kxm
-
kxv
-
kyb
-
kyc
-
kyf
-
kyg
-
kyo
-
kyq
-
kyu
-
kyz
-
kzf
-
lac
-
laj
-
lam
-
lao
-
las
-
lat
-
lav
-
law
-
lbj
-
lbw
-
lcp
-
lee
-
lef
-
lem
-
lew
-
lex
-
lgg
-
lgl
-
lhu
-
lia
-
lid
-
lif
-
lip
-
lis
-
lje
-
ljp
-
llg
-
lln
-
lme
-
lnd
-
lns
-
lob
-
lok
-
lom
-
lon
-
loq
-
lsi
-
lsm
-
luc
-
lug
-
lwo
-
lww
-
lzz
-
maa-dialect_sanantonio
-
maa-dialect_sanjeronimo
-
mad
-
mag
-
mah
-
mai
-
maj
-
mak
-
mal
-
mam-dialect_central
-
mam-dialect_northern
-
mam-dialect_southern
-
mam-dialect_western
-
maq
-
mar
-
maw
-
maz
-
mbb
-
mbc
-
mbh
-
mbj
-
mbt
-
mbu
-
mbz
-
mca
-
mcb
-
mcd
-
mco
-
mcp
-
mcq
-
mcu
-
mda
-
mdv
-
mdy
-
med
-
mee
-
mej
-
men
-
meq
-
met
-
mev
-
mfe
-
mfh
-
mfi
-
mfk
-
mfq
-
mfy
-
mfz
-
mgd
-
mge
-
mgh
-
mgo
-
mhi
-
mhr
-
mhu
-
mhx
-
mhy
-
mib
-
mie
-
mif
-
mih
-
mil
-
mim
-
min
-
mio
-
mip
-
miq
-
mit
-
miy
-
miz
-
mjl
-
mjv
-
mkl
-
mkn
-
mlg
-
mmg
-
mnb
-
mnf
-
mnk
-
mnw
-
mnx
-
moa
-
mog
-
mon
-
mop
-
mor
-
mos
-
mox
-
moz
-
mpg
-
mpm
-
mpp
-
mpx
-
mqb
-
mqf
-
mqj
-
mqn
-
mrw
-
msy
-
mtd
-
mtj
-
mto
-
muh
-
mup
-
mur
-
muv
-
muy
-
mvp
-
mwq
-
mwv
-
mxb
-
mxq
-
mxt
-
mxv
-
mya
-
myb
-
myk
-
myl
-
myv
-
myx
-
myy
-
mza
-
mzi
-
mzj
-
mzk
-
mzm
-
mzw
-
nab
-
nag
-
nan
-
nas
-
naw
-
nca
-
nch
-
ncj
-
ncl
-
ncu
-
ndj
-
ndp
-
ndv
-
ndy
-
ndz
-
neb
-
new
-
nfa
-
nfr
-
nga
-
ngl
-
ngp
-
ngu
-
nhe
-
nhi
-
nhu
-
nhw
-
nhx
-
nhy
-
nia
-
nij
-
nim
-
nin
-
nko
-
nlc
-
nld
-
nlg
-
nlk
-
nmz
-
nnb
-
nnq
-
nnw
-
noa
-
nod
-
nog
-
not
-
npl
-
npy
-
nst
-
nsu
-
ntm
-
ntr
-
nuj
-
nus
-
nuz
-
nwb
-
nxq
-
nya
-
nyf
-
nyn
-
nyo
-
nyy
-
nzi
-
obo
-
ojb-script_latin
-
ojb-script_syllabics
-
oku
-
old
-
omw
-
onb
-
ood
-
orm
-
ory
-
oss
-
ote
-
otq
-
ozm
-
pab
-
pad
-
pag
-
pam
-
pan
-
pao
-
pap
-
pau
-
pbb
-
pbc
-
pbi
-
pce
-
pcm
-
peg
-
pez
-
pib
-
pil
-
pir
-
pis
-
pjt
-
pkb
-
pls
-
plw
-
pmf
-
pny
-
poh-dialect_eastern
-
poh-dialect_western
-
poi
-
pol
-
por
-
poy
-
ppk
-
pps
-
prf
-
prk
-
prt
-
pse
-
pss
-
ptu
-
pui
-
pwg
-
pww
-
pxm
-
qub
-
quc-dialect_central
-
quc-dialect_east
-
quc-dialect_north
-
quf
-
quh
-
qul
-
quw
-
quy
-
quz
-
qvc
-
qve
-
qvh
-
qvm
-
qvn
-
qvo
-
qvs
-
qvw
-
qvz
-
qwh
-
qxh
-
qxl
-
qxn
-
qxo
-
qxr
-
rah
-
rai
-
rap
-
rav
-
raw
-
rej
-
rel
-
rgu
-
rhg
-
rif-script_arabic
-
rif-script_latin
-
ril
-
rim
-
rjs
-
rkt
-
rmc-script_cyrillic
-
rmc-script_latin
-
rmo
-
rmy-script_cyrillic
-
rmy-script_latin
-
rng
-
rnl
-
rol
-
ron
-
rop
-
rro
-
rub
-
ruf
-
rug
-
run
-
rus
-
sab
-
sag
-
sah
-
saj
-
saq
-
sas
-
sba
-
sbd
-
sbl
-
sbp
-
sch
-
sck
-
sda
-
sea
-
seh
-
ses
-
sey
-
sgb
-
sgj
-
sgw
-
shi
-
shk
-
shn
-
sho
-
shp
-
sid
-
sig
-
sil
-
sja
-
sjm
-
sld
-
slu
-
sml
-
smo
-
sna
-
sne
-
snn
-
snp
-
snw
-
som
-
soy
-
spa
-
spp
-
spy
-
sqi
-
sri
-
srm
-
srn
-
srx
-
stn
-
stp
-
suc
-
suk
-
sun
-
sur
-
sus
-
suv
-
suz
-
swe
-
swh
-
sxb
-
sxn
-
sya
-
syl
-
sza
-
tac
-
taj
-
tam
-
tao
-
tap
-
taq
-
tat
-
tav
-
tbc
-
tbg
-
tbk
-
tbl
-
tby
-
tbz
-
tca
-
tcc
-
tcs
-
tcz
-
tdj
-
ted
-
tee
-
tel
-
tem
-
teo
-
ter
-
tes
-
tew
-
tex
-
tfr
-
tgj
-
tgk
-
tgl
-
tgo
-
tgp
-
tha
-
thk
-
thl
-
tih
-
tik
-
tir
-
tkr
-
tlb
-
tlj
-
tly
-
tmc
-
tmf
-
tna
-
tng
-
tnk
-
tnn
-
tnp
-
tnr
-
tnt
-
tob
-
toc
-
toh
-
tom
-
tos
-
tpi
-
tpm
-
tpp
-
tpt
-
trc
-
tri
-
trn
-
trs
-
tso
-
tsz
-
ttc
-
tte
-
ttq-script_tifinagh
-
tue
-
tuf
-
tuk-script_arabic
-
tuk-script_latin
-
tuo
-
tur
-
tvw
-
twb
-
twe
-
twu
-
txa
-
txq
-
txu
-
tye
-
tzh-dialect_bachajon
-
tzh-dialect_tenejapa
-
tzj-dialect_eastern
-
tzj-dialect_western
-
tzo-dialect_chamula
-
tzo-dialect_chenalho
-
ubl
-
ubu
-
udm
-
udu
-
uig-script_arabic
-
uig-script_cyrillic
-
ukr
-
unr
-
upv
-
ura
-
urb
-
urd-script_arabic
-
urd-script_devanagari
-
urd-script_latin
-
urk
-
urt
-
ury
-
usp
-
uzb-script_cyrillic
-
vag
-
vid
-
vie
-
vif
-
vmw
-
vmy
-
vun
-
vut
-
wal-script_ethiopic
-
wal-script_latin
-
wap
-
war
-
waw
-
way
-
wba
-
wlo
-
wlx
-
wmw
-
wob
-
wsg
-
wwa
-
xal
-
xdy
-
xed
-
xer
-
xmm
-
xnj
-
xnr
-
xog
-
xon
-
xrb
-
xsb
-
xsm
-
xsr
-
xsu
-
xta
-
xtd
-
xte
-
xtm
-
xtn
-
xua
-
xuo
-
yaa
-
yad
-
yal
-
yam
-
yao
-
yas
-
yat
-
yaz
-
yba
-
ybb
-
ycl
-
ycn
-
yea
-
yka
-
yli
-
yor
-
yre
-
yua
-
yuz
-
yva
-
zaa
-
zab
-
zac
-
zad
-
zae
-
zai
-
zam
-
zao
-
zaq
-
zar
-
zas
-
zav
-
zaw
-
zca
-
zga
-
zim
-
ziw
-
zlm
-
zmz
-
zne
-
zos
-
zpc
-
zpg
-
zpi
-
zpl
-
zpm
-
zpo
-
zpt
-
zpu
-
zpz
-
ztq
-
zty
-
zyb
-
zyp
-
zza
Model details
-
Developed by:
Vineel Pratap et al.
-
Model type:
Multi-Lingual Automatic Speech Recognition model
-
Language(s):
1000+ languages, see
supported languages
-
License:
CC-BY-NC 4.0 license
-
Num parameters
: 1 billion
-
Audio sampling rate
: 16,000 kHz
-
Cite as:
@article{pratap2023mms,
title={Scaling Speech Technology to 1,000+ Languages},
author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
journal={arXiv},
year={2023}
}
Additional Links