Dataset:
mkb
The Indian Prime Minister's Mann Ki Baat speeches, broadcast on All India Radio and translated into many languages.
[MORE INFORMATION NEEDED]
Hindi, Telugu, Tamil, Malayalam, Gujarati, Urdu, Bengali, Oriya, Marathi, Punjabi, and English
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
Initial Data Collection and Normalization
[MORE INFORMATION NEEDED]
Who are the source language producers?
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
Who are the annotators?
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
The datasets and pretrained models provided here are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
@misc{siripragada2020multilingual,
  title={A Multilingual Parallel Corpora Collection Effort for Indian Languages},
  author={Shashank Siripragada and Jerin Philip and Vinay P. Namboodiri and C V Jawahar},
  year={2020},
  eprint={2007.07691},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Thanks to @vasudevgupta7 for adding this dataset.