
GPT2 Medium 4 Persian

This model is part of the Flax/JAX Community Week, organized by Hugging Face, with TPU usage sponsored by Google.

Team Members

Dataset

We used the OSCAR dataset, a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus.

How To Use

You can use this model directly with a pipeline for text generation.

from transformers import pipeline, AutoTokenizer, GPT2LMHeadModel
# Load the tokenizer and the PyTorch model weights from the Hub
tokenizer = AutoTokenizer.from_pretrained('flax-community/gpt2-medium-persian')
model = GPT2LMHeadModel.from_pretrained('flax-community/gpt2-medium-persian')
# Build the pipeline; generation options such as max_length are passed at call time
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
generated_text = generator('در یک اتفاق شگفت انگیز، پژوهشگران', max_length=100)

To use TensorFlow, import TFGPT2LMHeadModel instead of GPT2LMHeadModel.
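The TensorFlow variant mentioned above could look roughly like the sketch below. It assumes a transformers release that still includes TensorFlow support and that the repository provides TensorFlow weights. Neither is confirmed here. The helper name build_tf_generator is hypothetical and used only for illustration.

```python
def build_tf_generator(model_id="flax-community/gpt2-medium-persian"):
    """Return a text-generation pipeline backed by TFGPT2LMHeadModel.

    Hypothetical helper for illustration; not part of the model repository.
    """
    # Imported lazily so this module can be loaded without TensorFlow installed.
    from transformers import AutoTokenizer, TFGPT2LMHeadModel, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: TensorFlow weights are available for this model.
    # If only PyTorch weights exist, from_pretrained(model_id, from_pt=True)
    # converts them on load (this requires PyTorch to be installed as well).
    model = TFGPT2LMHeadModel.from_pretrained(model_id)
    return pipeline("text-generation", model=model, tokenizer=tokenizer)
```

Calling build_tf_generator() downloads the weights on first use. The returned pipeline is called like the PyTorch one, with a prompt and max_length=100, and yields a list of dictionaries whose "generated_text" key holds the output.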

Demo

... SOON

Evaluation

... SOON