Hebrew Dataset for ASR
{'audio': {'path': '/root/.cache/huggingface/datasets/downloads/extracted/8ce7402f6482c6053251d7f3000eec88668c994beb48b7ca7352e77ef810a0b6/train/e429593fede945c185897e378a5839f4198.wav', 'array': array([-0.00265503, -0.0018158 , -0.00149536, ..., -0.00135803, -0.00231934, -0.00190735]), 'sampling_rate': 16000}, 'sentence': 'היא מבינה אותי יותר מכל אחד אחר'}
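As a minimal sketch of how a record shaped like the one above can be read, the snippet below computes a clip's duration from its sample count and `sampling_rate`. The record here is synthetic: the path `example.wav` and the all-zero waveform are placeholders, not data from the dataset, and the helper name `clip_duration_seconds` is hypothetical. Real records would normally come from the Hugging Face `datasets` library instead.

```python
def clip_duration_seconds(sample):
    """Return the clip length in seconds: number of samples / sampling rate."""
    audio = sample["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


# Synthetic record with the same keys as the example instance above.
# The path and the waveform are placeholders (assumptions), not real data.
synthetic_sample = {
    "audio": {
        "path": "example.wav",
        "array": [0.0] * 32000,  # 32000 samples of silence
        "sampling_rate": 16000,
    },
    "sentence": "היא מבינה אותי יותר מכל אחד אחר",
}

duration = clip_duration_seconds(synthetic_sample)
print(f"{duration:.2f} s: {synthetic_sample['sentence']}")
```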
| | train | validation |
|---|---|---|
| number of samples | 8000 | 2000 |
| hours | 6.92 | 1.73 |
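The split statistics imply an average clip length of roughly 3.1 seconds in both splits, with the validation split holding 20% of the samples. The short calculation below derives those figures from the table. The variable and function names are illustrative only.

```python
# Split statistics copied from the table above.
SPLITS = {
    "train": {"samples": 8000, "hours": 6.92},
    "validation": {"samples": 2000, "hours": 1.73},
}


def mean_clip_seconds(samples, hours):
    """Average clip length in seconds for one split."""
    return hours * 3600 / samples


averages = {name: mean_clip_seconds(**stats) for name, stats in SPLITS.items()}
total_samples = sum(stats["samples"] for stats in SPLITS.values())
validation_share = SPLITS["validation"]["samples"] / total_samples

for name, seconds in averages.items():
    print(f"{name}: {seconds:.2f} s per clip on average")
print(f"validation share of samples: {validation_share:.0%}")
```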
The data was scraped from YouTube (the כאן channel, i.e. Kan). Outliers were removed based on clip length and on the ratio between the length of the audio and the length of the sentence.
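The card does not give the thresholds or the exact ratio used for outlier removal, so the following is only a sketch of what such a filter might look like. It assumes the ratio is measured as characters of transcript per second of audio. All threshold values and the function name `is_outlier` are hypothetical, not the author's settings.

```python
def is_outlier(
    duration_s,
    sentence,
    min_duration_s=1.0,    # hypothetical threshold
    max_duration_s=15.0,   # hypothetical threshold
    min_chars_per_s=2.0,   # hypothetical threshold
    max_chars_per_s=30.0,  # hypothetical threshold
):
    """Flag a clip whose length, or text-to-audio ratio, is out of range."""
    if duration_s <= 0:
        return True
    if not (min_duration_s <= duration_s <= max_duration_s):
        return True
    chars_per_second = len(sentence.strip()) / duration_s
    return not (min_chars_per_s <= chars_per_second <= max_chars_per_s)


# Illustrative (duration in seconds, transcript) pairs, not real dataset rows.
clips = [
    (3.1, "היא מבינה אותי יותר מכל אחד אחר"),  # plausible clip
    (0.2, "כן"),                                # too short
    (3.0, "א" * 200),                           # far too much text for the audio
    (10.0, "כן"),                               # far too little text for the audio
]

kept = [clip for clip in clips if not is_outlier(*clip)]
print(f"kept {len(kept)} of {len(clips)} clips")
```

A filter of this kind tends to catch misaligned subtitles, where the transcript belongs to a longer or shorter stretch of audio than the clip actually contains.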
@misc{imvladikon2022hebrew_speech_kan,
  author = {Gurevich, Vladimir},
  title = {Hebrew Speech Recognition Dataset: Kan},
  year = {2022},
  howpublished = {\url{https://huggingface.co/datasets/imvladikon/hebrew_speech_kan}},
}