模型:

facebook/mms-lid-1024

许可:

cc-by-nc-4.0

预印本库:

arxiv:2305.13516

其他:

mms wav2vec2

语言:

数据集:

3Agoogle/fleurs

类库:

Transformers Safetensors PyTorch

任务:

音频分类

模型介绍文件清单

中文

Massively Multilingual Speech (MMS) - Finetuned LID

This checkpoint is a model fine-tuned for speech language identification (LID) and part of Facebook's Massive Multilingual Speech project . This checkpoint is based on the Wav2Vec2 architecture and classifies raw audio input to a probability distribution over 1024 output classes (each class representing a language). The checkpoint consists of 1 billion parameters and has been fine-tuned from facebook/mms-1b on 1024 languages.

Table Of Content

Example
Supported Languages
Model details
Additional links

Example

This MMS checkpoint can be used with Transformers to identify the spoken language of an audio. It can recognize the following 1024 languages .

Let's look at a simple example.

First, we install transformers and some other libraries

pip install torch accelerate torchaudio datasets
pip install --upgrade transformers

Note : In order to use MMS you need to have at least transformers >= 4.30 installed. If the 4.30 version is not yet available on PyPI make sure to install transformers from source:

pip install git+https://github.com/huggingface/transformers.git

Next, we load a couple of audio samples via datasets . Make sure that the audio data is sampled to 16000 kHz.

from datasets import load_dataset, Audio

# English
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]

# Arabic
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "ar", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
ar_sample = next(iter(stream_data))["audio"]["array"]

Next, we load the model and processor

from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch

model_id = "facebook/mms-lid-1024"

processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)

Now we process the audio data, pass the processed audio data to the model to classify it into a language, just like we usually do for Wav2Vec2 audio classification models such as ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition

# English
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits

lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
# 'eng'

# Arabic
inputs = processor(ar_sample, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits

lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
# 'ara'

To see all the supported languages of a checkpoint, you can print out the language ids as follows:

processor.id2label.values()

For more details, about the architecture please have a look at the official docs .

Supported Languages

This model supports 1024 languages. Unclick the following to toogle all supported languages of this checkpoint in ISO 639-3 code . You can find more details about the languages and their ISO 649-3 codes in the MMS Language Coverage Overview .

Click to toggle

Model details

Developed by: Vineel Pratap et al.
Model type: Multi-Lingual Automatic Speech Recognition model
Language(s): 1024 languages, see supported languages
License: CC-BY-NC 4.0 license
Num parameters : 1 billion
Audio sampling rate : 16,000 kHz

Cite as:

@article{pratap2023mms,
  title={Scaling Speech Technology to 1,000+ Languages},
  author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
journal={arXiv},
year={2023}
}

Additional Links

Blog post
Transformers documentation .
Paper
GitHub Repository
Other MMS checkpoints
MMS base checkpoints:
- facebook/mms-1b
- facebook/mms-300m
Official Space

作者:

Meta AI

数据集大小:

7.21 GB