To train, fine-tune, or experiment with the model, you will need to install NVIDIA NeMo. We recommend installing it after you have installed the latest PyTorch version.
pip install nemo_toolkit['all']
The model is available for use in the NeMo toolkit [1], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.22")
First, let's get a sample
Use any WAV file containing Korean telephone speech, saved here as sample-kr.wav.
Then simply do:
asr_model.transcribe(['sample-kr.wav'])
To transcribe many audio files at once, run:
python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py pretrained_name="ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.22" audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
This model accepts 16 kHz (16000 Hz) mono-channel audio (WAV files) as input.
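Since the model expects a specific input format, it can help to check a file before transcribing it. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` and the two constants are illustrative and not part of NeMo. It assumes an uncompressed PCM WAV file, which is the only kind the `wave` module reads.

```python
import wave

# Expected input format, taken from the model description above.
EXPECTED_RATE = 16000   # Hz
EXPECTED_CHANNELS = 1   # mono


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches.

    Hypothetical helper for illustration; it is not part of NeMo.
    """
    problems = []
    # wave.open reads only the header fields we need here.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected {EXPECTED_CHANNELS}")
    return problems
```

For example, `check_wav_format("sample-kr.wav")` returns an empty list when the file is already 16 kHz mono. A non-empty list suggests the file should be resampled or downmixed with an audio tool of your choice before it is passed to the model.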
This model provides transcribed speech as a string for a given audio sample.
See the NeMo toolkit and reference papers.
Trained for about 30 days on 2 A6000 GPUs.
Private real-world call center data (1100 hours)
< 0.13 CER
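For readers unfamiliar with the metric, CER (character error rate) is commonly defined as the character-level edit distance between the reference transcript and the model output, divided by the number of characters in the reference. The sketch below assumes that standard definition; the source does not state how the figure above was computed (for example, whether spaces were counted), and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

As a usage example, `cer("안녕하세요", "안녕하세오")` returns 0.2: one substituted character out of five in the reference. Under this definition, a CER below 0.13 corresponds to fewer than 13 character errors per 100 reference characters.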
This model was trained with 650 hours of Korean telephone voice data for customer service in a call center. It may perform poorly on general-purpose dialogue and on specific accents.