This is part of the Flax/JAX Community Week, organized by Hugging Face, with TPU usage sponsored by Google.
We used the OSCAR dataset, a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus.
You can use this model directly with a pipeline for text generation:

    from transformers import pipeline, AutoTokenizer, GPT2LMHeadModel

    tokenizer = AutoTokenizer.from_pretrained('flax-community/gpt2-medium-persian')
    model = GPT2LMHeadModel.from_pretrained('flax-community/gpt2-medium-persian')

    # Build a text-generation pipeline from the loaded model and tokenizer.
    generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

    # max_length is passed at call time; it caps the prompt plus the generated tokens.
    generated_text = generator('در یک اتفاق شگفت انگیز، پژوهشگران', max_length=100)
To use TensorFlow, import TFGPT2LMHeadModel instead of GPT2LMHeadModel.
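The pipeline returns a list of dictionaries, each with a `generated_text` key, so a small wrapper can make it easier to get plain strings back. The following is only a sketch: the helper names `build_generator` and `generate_texts` are hypothetical and not part of this model card or of `transformers`, and the sampling settings are illustrative assumptions rather than recommended values.

```python
def build_generator(model_name="flax-community/gpt2-medium-persian"):
    """Load the tokenizer and PyTorch weights and wrap them in a pipeline."""
    # Imported lazily so generate_texts can be reused without loading transformers.
    from transformers import AutoTokenizer, GPT2LMHeadModel, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    return pipeline("text-generation", model=model, tokenizer=tokenizer)


def generate_texts(generator, prompt, max_length=100, num_return_sequences=1):
    """Return only the generated strings from the pipeline output (hypothetical helper)."""
    outputs = generator(
        prompt,
        max_length=max_length,  # caps prompt plus generated tokens
        do_sample=True,  # sampling is needed to get several distinct continuations
        num_return_sequences=num_return_sequences,
    )
    # Each item is a dict such as {"generated_text": "..."}.
    return [item["generated_text"] for item in outputs]


# Example usage (downloads the model weights on first run):
# generator = build_generator()
# for text in generate_texts(generator, "در یک اتفاق شگفت انگیز، پژوهشگران", num_return_sequences=2):
#     print(text)
```

Because sampling is enabled, the output will differ from run to run.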
... SOON
... SOON