Dataset:
mkb
Speeches by the Prime Minister of India from Mann Ki Baat, a programme broadcast on All India Radio, translated into many languages.
[MORE INFORMATION NEEDED]
Languages: Hindi, Telugu, Tamil, Malayalam, Gujarati, Urdu, Bengali, Oriya, Marathi, Punjabi, and English.
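The language list above defines the space of possible translation directions. Below is a minimal sketch that enumerates the unordered language pairs. It assumes the standard ISO 639-1 codes as identifiers, and both the `xx-yy` naming and the `candidate_pairs` helper are hypothetical, not part of the dataset's API. With 11 languages there are at most 11 × 10 / 2 = 55 pairs. The card does not state which of these pairs are actually aligned in the corpus, so treat the result as an upper bound.

```python
from itertools import combinations

# Standard ISO 639-1 codes for the languages listed in this card.
# Using these codes as pair identifiers is a hypothetical convention.
LANGUAGES = {
    "hi": "Hindi",
    "te": "Telugu",
    "ta": "Tamil",
    "ml": "Malayalam",
    "gu": "Gujarati",
    "ur": "Urdu",
    "bn": "Bengali",
    "or": "Oriya",
    "mr": "Marathi",
    "pa": "Punjabi",
    "en": "English",
}


def candidate_pairs(codes):
    """Return every unordered language pair as an 'xx-yy' string.

    Codes are sorted first, so each pair appears once in a fixed order.
    This is an upper bound only: which pairs the corpus actually
    contains is not stated in the card.
    """
    return [f"{a}-{b}" for a, b in combinations(sorted(codes), 2)]


pairs = candidate_pairs(LANGUAGES)
print(len(pairs))  # 55
print(pairs[:3])   # ['bn-en', 'bn-gu', 'bn-hi']
```

Sorting the codes before pairing keeps the identifiers deterministic, so "en-hi" and "hi-en" are never both produced.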
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
Initial Data Collection and Normalization: [MORE INFORMATION NEEDED]
Who are the source language producers? [MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
Who are the annotators? [MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
[MORE INFORMATION NEEDED]
The datasets and pretrained models provided here are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
@misc{siripragada2020multilingual,
title={A Multilingual Parallel Corpora Collection Effort for Indian Languages},
author={Shashank Siripragada and Jerin Philip and Vinay P. Namboodiri and C V Jawahar},
year={2020},
eprint={2007.07691},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Thanks to @vasudevgupta7 for adding this dataset.