Model:
NeuML/ljspeech-vits-onnx
This model is espnet/kan-bayashi_ljspeech_vits exported to ONNX using the espnet_onnx library.
txtai has a built-in Text to Speech (TTS) pipeline that makes using this model easy.
    import soundfile as sf

    from txtai.pipeline import TextToSpeech

    # Build pipeline
    tts = TextToSpeech("NeuML/ljspeech-vits-onnx")

    # Generate speech
    speech = tts("Say something here")

    # Write to file
    sf.write("out.wav", speech, 22050)
This model can also be run directly with ONNX, provided the input text is tokenized. Tokenization can be done with ttstokenizer.
Note that the txtai pipeline has additional functionality such as batching large inputs together that would need to be duplicated with this method.
    import onnxruntime
    import soundfile as sf
    import yaml

    from ttstokenizer import TTSTokenizer

    # This example assumes the files have been downloaded locally
    with open("ljspeech-vits-onnx/config.yaml", "r", encoding="utf-8") as f:
        config = yaml.safe_load(f)

    # Create model
    model = onnxruntime.InferenceSession(
        "ljspeech-vits-onnx/model.onnx",
        providers=["CPUExecutionProvider"]
    )

    # Create tokenizer
    tokenizer = TTSTokenizer(config["token"]["list"])

    # Tokenize inputs
    inputs = tokenizer("Say something here")

    # Generate speech
    outputs = model.run(None, {"text": inputs})

    # Write to file
    sf.write("out.wav", outputs[0], 22050)
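When running the model directly, long inputs have to be handled by hand. The following is a minimal sketch of one way to do that: split the text into sentence-sized chunks and synthesize each chunk separately. It is not how txtai implements batching. The names `split_text`, `synthesize_chunks`, and the `max_chars` limit of 200 are hypothetical and chosen for illustration only.

```python
import re


def split_text(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    max_chars=200 is an arbitrary illustrative default, not a model limit.
    """
    # Split after sentence-ending punctuation that is followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate

    if current:
        chunks.append(current)
    return chunks


def synthesize_chunks(text, synthesize, max_chars=200):
    """Call synthesize (a hypothetical callable: str -> audio) once per chunk.

    Returns a list with one audio result per chunk, in order.
    """
    return [synthesize(chunk) for chunk in split_text(text, max_chars)]
```

With the ONNX example above, `synthesize` could be something like `lambda chunk: model.run(None, {"text": tokenizer(chunk)})[0]`, and the per-chunk results could be joined with `numpy.concatenate` before calling `sf.write`.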
More information on how to export ESPnet models to ONNX can be found here.