数据集:
Someman/hindi-summarization
Hindi Text Short and Large Summarization Corpus is a collection of ~180k articles with their headlines and summary collected from Hindi News Websites.
This is a first of its kind Dataset in Hindi which can be used to benchmark models for Text summarization in Hindi. This does not contain articles contained in Hindi Text Short Summarization Corpus which is being released parallely with this Dataset.
The dataset retains original punctuation, numbers etc in the articles.
The language is Hindi.
MIT