Language model: roberta-urdu-small Model size: 125M Language: Urdu Training data: News data from urdu news resources in Pakistan
roberta-urdu-small is a language model for urdu language.
from transformers import pipeline fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small")
roberta-urdu-small was trained on urdu news corpus. Training data was normalized using normalization module from urduhack to eliminate characters from other languages like arabic.
Urduhack is a Natural Language Processing (NLP) library for urdu language. Github: https://github.com/urduhack/urduhack