Dataset:
mbshr/XSUMUrdu-DW_BBC
Urdu summarization dataset containing 76,637 records of article and summary pairs scraped from the BBC Urdu and DW Urdu news websites. This is the preprocessed version: articles are limited to 512 tokens (approximately words), and URLs, picture captions, etc. have been removed.
Summarization
- Extractive and abstractive
- urT5, adapted from mT5 with its own monolingual Urdu vocabulary of 40k tokens, was fine-tuned
- ROUGE-1 F-score: 40.03 combined, 46.35 on BBC Urdu data points only, and 36.91 on DW Urdu data points only
- BERTScore: 75.1 combined, 77.0 on BBC Urdu data points only, and 74.16 on DW Urdu data points only
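As a reading aid for the ROUGE-1 F-score figures, here is a minimal sketch of how a unigram-overlap F-score can be computed. It assumes plain whitespace tokenization and returns a value in the 0 to 1 range; the card does not state which ROUGE implementation, tokenizer, or scaling produced the reported numbers, so `rouge1_f` is a hypothetical helper and its output should not be expected to match them exactly.

```python
from collections import Counter


def rouge1_f(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary.

    Assumption: tokens are split on whitespace. The tokenizer used for
    the scores reported in this card is not documented.
    """
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    if not ref_counts or not cand_counts:
        return 0.0

    # Clipped overlap: a token is credited at most as many times as it
    # occurs in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

Multiplying the result by 100 gives a percentage-style score, which appears to be the scale used for the figures above.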
Urdu.
[More Information Needed]
- url: URL of the article from which the record was scraped (BBC Urdu URLs contain the topic text in English followed by a number; DW Urdu URLs contain the topic text in Urdu). dtype: {string}
- Summary: Short summary of the article, written by the article's author, similar to highlights. dtype: {string}
- Text: Complete text of the article, intelligently truncated to 512 tokens. dtype: {string}
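The three fields described above can be checked with a small sketch like the following. The field names `url`, `Summary`, and `Text` come from this card; the helper names, the host-based source detection, and the whitespace word count are assumptions made for illustration, not part of the dataset's tooling.

```python
from urllib.parse import urlparse

# Field names as listed in this card; all three are strings.
REQUIRED_FIELDS = ("url", "Summary", "Text")


def is_valid_record(record: dict) -> bool:
    """Return True if every documented field is present and is a string."""
    return all(isinstance(record.get(name), str) for name in REQUIRED_FIELDS)


def guess_source(url: str) -> str:
    """Guess the news outlet from the URL's host name.

    Assumption: BBC Urdu articles live on a host containing the label
    'bbc' and DW Urdu articles on a host containing the label 'dw'.
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    if "bbc" in labels:
        return "BBC Urdu"
    if "dw" in labels:
        return "DW Urdu"
    return "unknown"


def word_count(text: str) -> int:
    """Approximate token count using whitespace-separated words."""
    return len(text.split())
```

Splitting records by `guess_source` is one way to look at BBC Urdu and DW Urdu data points separately, and `word_count(record["Text"])` gives a rough check against the 512-token limit, since the card treats tokens as approximately words.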