Dataset:
Bingsu/KSS_Dataset
Task:
text-to-speech
Language:
ko
Multilinguality:
monolingual
Size:
10K<n<100K
Language creators:
expert-generated
Annotation creators:
expert-generated
Source datasets:
original
License:
cc-by-nc-sa-4.0
KSS Dataset is designed for the Korean text-to-speech task. It consists of audio files recorded by a professional female voice actress, each aligned with text extracted from my books. As the copyright holder, and by courtesy of the publishers, I release this dataset to the public. To the best of my knowledge, this is the first publicly available speech dataset for Korean.
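Each line in transcript.v.1.3.txt is delimited by | into six fields: the path to the wav file, the original script, the expanded script, the decomposed script, the audio duration in seconds, and the English translation.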
e.g.,
1/1_0470.wav|저는 보통 20분 정도 낮잠을 잡니다.|저는 보통 이십 분 정도 낮잠을 잡니다.|저는 보통 이십 분 정도 낮잠을 잡니다.|4.1|I usually take a nap for 20 minutes.
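A minimal sketch of reading this format in Python: split each line on | and check that it yields exactly six fields. The `TranscriptEntry` type and the `parse_transcript_line` and `read_transcript` helpers are hypothetical names chosen for illustration. The field names mirror the Hugging Face feature names (`original_script`, `expanded_script`, and so on) plus the wav path. The UTF-8 encoding of the file is an assumption.

```python
from typing import NamedTuple


class TranscriptEntry(NamedTuple):
    # Hypothetical labels for the six |-separated columns of the transcript.
    path: str               # relative path to the wav file
    original_script: str
    expanded_script: str
    decomposed_script: str
    duration: float         # audio duration in seconds
    english_translation: str


def parse_transcript_line(line: str) -> TranscriptEntry:
    """Split one transcript line on '|' and convert the duration to float."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    path, original, expanded, decomposed, duration, english = fields
    return TranscriptEntry(path, original, expanded, decomposed, float(duration), english)


def read_transcript(filename: str) -> list[TranscriptEntry]:
    """Parse every non-blank line of a transcript file (assumed to be UTF-8)."""
    with open(filename, encoding="utf-8") as f:
        return [parse_transcript_line(line) for line in f if line.strip()]
```

Parsing the example line above gives an entry whose `path` is `1/1_0470.wav` and whose `duration` is `4.1`. A line that does not split into exactly six fields raises `ValueError` rather than being silently truncated.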
License: CC BY-NC-SA 4.0. You CANNOT use this dataset for ANY COMMERCIAL purpose. Otherwise, you are free to use it.
If you want to cite the KSS Dataset, please use the following reference:
Kyubyong Park, KSS Dataset: Korean Single speaker Speech Dataset, https://kaggle.com/bryanpark/korean-single-speaker-speech-dataset , 2018
Check out this project, which uses the KSS Dataset.
You can contact me at kbpark.linguist@gmail.com .
April, 2018.
Kyubyong Park
12,853 Korean audio files with transcription.
text-to-speech
korean
>>> from datasets import load_dataset
>>> dataset = load_dataset("Bingsu/KSS_Dataset")
>>> dataset["train"].features
{'audio': Audio(sampling_rate=44100, mono=True, decode=True, id=None),
 'original_script': Value(dtype='string', id=None),
 'expanded_script': Value(dtype='string', id=None),
 'decomposed_script': Value(dtype='string', id=None),
 'duration': Value(dtype='float32', id=None),
 'english_translation': Value(dtype='string', id=None)}
>>> dataset["train"][0]
{'audio': {'path': None,
  'array': array([ 0.00000000e+00,  3.05175781e-05, -4.57763672e-05, ...,
          0.00000000e+00, -3.05175781e-05, -3.05175781e-05]),
  'sampling_rate': 44100},
 'original_script': '그는 괜찮은 척하려고 애쓰는 것 같았다.',
 'expanded_script': '그는 괜찮은 척하려고 애쓰는 것 같았다.',
 'decomposed_script': '그는 괜찮은 척하려고 애쓰는 것 같았다.',
 'duration': 3.5,
 'english_translation': 'He seemed to be pretending to be okay.'}
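Each decoded example carries both the raw waveform and a `duration` field, so you can cross-check the two. The clip length in seconds is the number of samples divided by `sampling_rate`. The helpers below are hypothetical and are not part of the `datasets` library or this dataset card. The 0.1 s tolerance is an assumption based on the one-decimal durations in the examples (3.5, 4.1). The sketch runs on a synthetic record, so it needs no download.

```python
from typing import Any, Mapping, Sequence


def audio_seconds(array: Sequence[float], sampling_rate: int) -> float:
    """Clip length in seconds: number of samples / samples per second."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return len(array) / sampling_rate


def duration_matches(example: Mapping[str, Any], tolerance: float = 0.1) -> bool:
    """Compare the decoded audio length with the example's 'duration' field.

    The tolerance is an assumed allowance for durations stored with one
    decimal place (and as float32); it is not documented by the dataset.
    """
    audio = example["audio"]
    measured = audio_seconds(audio["array"], audio["sampling_rate"])
    return abs(measured - example["duration"]) <= tolerance


# Synthetic record shaped like dataset["train"][0]:
# 3.5 s of silence at 44.1 kHz (not real KSS audio).
fake = {
    "audio": {"path": None, "array": [0.0] * 154350, "sampling_rate": 44100},
    "duration": 3.5,
}
print(audio_seconds(fake["audio"]["array"], fake["audio"]["sampling_rate"]))  # 3.5
print(duration_matches(fake))  # True
```

With the dataset loaded as above, something like `dataset["train"].filter(lambda ex: not duration_matches(ex))` would collect clips whose stored duration disagrees with the audio. This decodes every 44.1 kHz clip, so it can be slow. `len()` works the same on the NumPy array the dataset returns as on the Python list used here.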
| | train |
|---|---|
| # of examples | 12853 |