模型:
facebook/mbart-large-50-many-to-one-mmt
This model is a fine-tuned checkpoint of mBART-large-50 . mbart-large-50-many-to-many-mmt is fine-tuned for multilingual machine translation. It was introduced in Multilingual Translation with Extensible Multilingual Pretraining and Finetuning paper. The model can translate directly between any pair of 50 languages.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast article_hi = "संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है" article_ar = "الأمين العام للأمم المتحدة يقول إنه لا يوجد حل عسكري في سوريا." model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-one-mmt") tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-one-mmt") # translate Hindi to English tokenizer.src_lang = "hi_IN" encoded_hi = tokenizer(article_hi, return_tensors="pt") generated_tokens = model.generate(**encoded_hi) tokenizer.batch_decode(generated_tokens, skip_special_tokens=True) # => "The head of the UN says there is no military solution in Syria." # translate Arabic to English tokenizer.src_lang = "ar_AR" encoded_ar = tokenizer(article_ar, return_tensors="pt") generated_tokens = model.generate(**encoded_ar) tokenizer.batch_decode(generated_tokens, skip_special_tokens=True) # => "The Secretary-General of the United Nations says there is no military solution in Syria."
See the model hub to look for more fine-tuned versions.
Arabic (ar_AR), Czech (cs_CZ), German (de_DE), English (en_XX), Spanish (es_XX), Estonian (et_EE), Finnish (fi_FI), French (fr_XX), Gujarati (gu_IN), Hindi (hi_IN), Italian (it_IT), Japanese (ja_XX), Kazakh (kk_KZ), Korean (ko_KR), Lithuanian (lt_LT), Latvian (lv_LV), Burmese (my_MM), Nepali (ne_NP), Dutch (nl_XX), Romanian (ro_RO), Russian (ru_RU), Sinhala (si_LK), Turkish (tr_TR), Vietnamese (vi_VN), Chinese (zh_CN), Afrikaans (af_ZA), Azerbaijani (az_AZ), Bengali (bn_IN), Persian (fa_IR), Hebrew (he_IL), Croatian (hr_HR), Indonesian (id_ID), Georgian (ka_GE), Khmer (km_KH), Macedonian (mk_MK), Malayalam (ml_IN), Mongolian (mn_MN), Marathi (mr_IN), Polish (pl_PL), Pashto (ps_AF), Portuguese (pt_XX), Swedish (sv_SE), Swahili (sw_KE), Tamil (ta_IN), Telugu (te_IN), Thai (th_TH), Tagalog (tl_XX), Ukrainian (uk_UA), Urdu (ur_PK), Xhosa (xh_ZA), Galician (gl_ES), Slovene (sl_SI)
@article{tang2020multilingual, title={Multilingual Translation with Extensible Multilingual Pretraining and Finetuning}, author={Yuqing Tang and Chau Tran and Xian Li and Peng-Jen Chen and Naman Goyal and Vishrav Chaudhary and Jiatao Gu and Angela Fan}, year={2020}, eprint={2008.00401}, archivePrefix={arXiv}, primaryClass={cs.CL} }