hfl/cino-large | ATYUN.COM Official Site, a Full-Service Platform for AI Tutorials and News

Model:

hfl/cino-large

Task:

Fill-Mask

Libraries:

PyTorch TensorFlow Transformers

Languages:

Other:

xlm-roberta AutoTrain Compatible

License:

apache-2.0


CINO: Pre-trained Language Models for Chinese Minority Languages

Multilingual pre-trained language models (PLMs), such as mBERT and XLM-R, provide multilingual and cross-lingual ability for language understanding. Multilingual PLMs have progressed rapidly in recent years. However, few contributions address PLMs for Chinese minority languages, which hinders researchers from building powerful NLP systems.

To address the absence of Chinese minority PLMs, the Joint Laboratory of HIT and iFLYTEK Research (HFL) proposes CINO (Chinese-miNOrity pre-trained language model), which is built on XLM-R with additional pre-training on Chinese minority-language corpora covering:

Chinese (zh)
Tibetan (bo)
Mongolian (Uighur form) (mn)
Uyghur (ug)
Kazakh (Arabic form) (kk)
Korean (ko)
Zhuang
Cantonese (yue)

Please see our GitHub repository for more details (in Chinese): https://github.com/ymcui/Chinese-Minority-PLM
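Since this page lists the model under the fill-mask task with the Transformers library, a minimal usage sketch might look like the following. It is an illustration under stated assumptions, not the authors' documented usage: the helper names mask_word and top_predictions are hypothetical, the mask token is assumed to be "<mask>" (the XLM-R convention), and the checkpoint is assumed to ship a masked-LM head that the Transformers fill-mask pipeline can load.

```python
MODEL_ID = "hfl/cino-large"  # model name as listed on this page


def mask_word(sentence: str, word: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `word` with the mask token.

    Hypothetical helper; "<mask>" is assumed because CINO is built on XLM-R.
    """
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return sentence.replace(word, mask_token, 1)


def top_predictions(masked_text: str, k: int = 5):
    """Return (token, score) pairs for the k most likely fillers.

    Hypothetical helper. The import is local so that mask_word works even
    when transformers is not installed. Calling this function downloads
    the model weights on first use.
    """
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    # With a single mask token, the pipeline returns a list of dicts
    # containing "token_str" and "score" among other keys.
    return [(r["token_str"], r["score"]) for r in fill(masked_text, top_k=k)]
```

For example, top_predictions(mask_word("Lhasa is the capital of Tibet.", "capital")) would ask the model to fill in the masked word; the predictions depend on the downloaded checkpoint, so none are shown here.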

You may also be interested in:

Chinese MacBERT: https://github.com/ymcui/MacBERT
Chinese BERT series: https://github.com/ymcui/Chinese-BERT-wwm
Chinese ELECTRA: https://github.com/ymcui/Chinese-ELECTRA
Chinese XLNet: https://github.com/ymcui/Chinese-XLNet
Knowledge Distillation Toolkit - TextBrewer: https://github.com/airaria/TextBrewer

More resources by HFL: https://github.com/ymcui/HFL-Anthology

Author:

Joint Laboratory of HIT and iFLYTEK Research (HFL)

Dataset size:

4.37 GB