Model:

xlm-mlm-100-1280

Task:

Fill-Mask

Libraries:

PyTorch TensorFlow Transformers

Language:

multilingual

Other:

xlm AutoTrain Compatible

Preprints:

arxiv:1901.07291 arxiv:1911.02116 arxiv:1910.09700

License:

cc-by-nc-4.0


xlm-mlm-100-1280

Model Details

Uses

Bias, Risks, and Limitations

Training

Evaluation

Environmental Impact

Technical Specifications

Citation

Model Card Authors

How to Get Started with the Model

Model Details

xlm-mlm-100-1280 is an XLM model, the cross-lingual language model proposed in Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau. This checkpoint was trained on Wikipedia text in 100 languages. It is a transformer pretrained with a masked language modeling (MLM) objective.
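The MLM objective can be illustrated with a small sketch of the input-corruption step. Following the recipe described in the XLM paper (borrowed from BERT), about 15% of token positions are selected for prediction; of those, 80% are replaced by a mask token, 10% by a random token, and 10% are left unchanged. The mask id, the vocabulary size and the -100 "ignore" label below are assumptions chosen for illustration, and the paper's frequency-based reweighting of which tokens get selected is omitted.

```python
import random
from typing import List, Optional, Tuple

MASK_ID = 0        # hypothetical id of the mask token (illustration only)
VOCAB_SIZE = 1000  # hypothetical vocabulary size used for random replacements


def mask_tokens(
    token_ids: List[int],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (corrupted inputs, labels); labels are -100 where no prediction is made.

    -100 is the ignore index conventionally used by PyTorch's cross-entropy loss.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected: left alone and not scored
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID  # 80% of selected positions: mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

For example, `mask_tokens(list(range(1, 21)), rng=random.Random(0))` corrupts a toy 20-token sequence reproducibly; the loss is then computed only where the label is not -100.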

Model Description

Developed by: See the associated paper and GitHub repo
Model type: Language model
Language(s) (NLP): 100 languages; see the GitHub repo for the full list.
License: CC-BY-NC-4.0
Related Models: xlm-mlm-17-1280
Resources for more information:

Uses

Direct Use

The model is a language model and can be used for masked language modeling.
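Masked language modeling can be tried through the Transformers `fill-mask` pipeline. This is a minimal sketch that assumes `transformers`, `torch` and `sacremoses` are installed and that the checkpoint is reachable on the Hugging Face Hub as `xlm-mlm-100-1280`. The `[MASK]` placeholder and the helper names are hypothetical conveniences of this sketch; the placeholder is swapped for the tokenizer's real mask token before inference, so the caller never has to hard-code it.

```python
from typing import Dict, List

MODEL_ID = "xlm-mlm-100-1280"  # Hub id of this checkpoint (assumed reachable)
MASK_PLACEHOLDER = "[MASK]"    # hypothetical placeholder used only in this sketch


def build_masked_text(text: str, mask_token: str) -> str:
    """Swap the sketch's placeholder for the tokenizer's actual mask token."""
    if text.count(MASK_PLACEHOLDER) != 1:
        raise ValueError(f"expected exactly one {MASK_PLACEHOLDER} placeholder")
    return text.replace(MASK_PLACEHOLDER, mask_token)


def top_predictions(text: str, top_k: int = 5) -> List[Dict]:
    """Return the pipeline's top-k fills for the single placeholder in `text`."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=MODEL_ID)
    masked = build_masked_text(text, unmasker.tokenizer.mask_token)
    # Each result is a dict with "score", "token", "token_str" and "sequence".
    return unmasker(masked, top_k=top_k)
```

Usage: `for r in top_predictions("Paris is the [MASK] of France."): print(r["token_str"], round(r["score"], 3))`. The first call downloads the weights, so expect a delay.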

Downstream Use

To learn more about this task and potential downstream uses, see the Hugging Face fill-mask docs and the Hugging Face Multilingual Models for Inference docs. Also see the associated paper.

Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Training

This model is the XLM model trained on Wikipedia text in 100 languages. Preprocessing included tokenization with byte-pair encoding (BPE). See the GitHub repo and the associated paper for further details on the training data and training procedure.
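XLM processes all languages with one shared BPE vocabulary. The core idea of learning BPE merges (repeatedly fuse the most frequent adjacent symbol pair into a new symbol) can be sketched on a hypothetical toy word-frequency table. This is an illustration of the technique only, not the tooling used to build this model's vocabulary; the `</w>` end-of-word marker is a common BPE convention adopted here as an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple

Word = Tuple[str, ...]


def pair_counts(vocab: Dict[Word, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts: Counter = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(vocab: Dict[Word, int], pair: Tuple[str, str]) -> Dict[Word, int]:
    """Replace every occurrence of `pair` with the concatenated symbol."""
    merged: Dict[Word, int] = {}
    for word, freq in vocab.items():
        out: List[str] = []
        i = 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to `num_merges` merge operations from a word-frequency table."""
    # Start from characters plus an end-of-word marker.
    vocab: Dict[Word, int] = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break  # every word is already a single symbol
        best = max(counts, key=counts.get)  # most frequent pair; ties go to the first seen
        merges.append(best)
        vocab = merge_pair(vocab, best)
    return merges
```

On the toy table `{"low": 5, "lower": 2, "newest": 6, "widest": 3}`, `learn_bpe(table, 3)` returns `[("e", "s"), ("es", "t"), ("est", "</w>")]`: the frequent suffix "est" becomes a single unit after three merges.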

Conneau et al. (2020), in Unsupervised Cross-lingual Representation Learning at Scale, report that this model has 16 layers, a hidden size of 1280, 16 attention heads, and a feed-forward dimension of 5120. The vocabulary size is 200k and the total number of parameters is 570M (see Table 7 of that paper).
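A back-of-the-envelope parameter count helps check how these hyperparameters add up to the reported total. The sketch assumes a standard post-LayerNorm transformer encoder with biases, a feed-forward width of 4 × 1280 = 5120 (the convention used by the Transformers XLM implementation), 512 learned positions, no language embeddings, and an output layer that reuses the token-embedding matrix plus a bias. These are assumptions, so the result is an estimate rather than an exact count of the released checkpoint.

```python
def estimate_xlm_params(
    vocab_size: int = 200_000,
    d_model: int = 1280,
    n_layers: int = 16,
    ffn_dim: int = 5120,      # assumed 4 * d_model
    max_positions: int = 512,  # assumed number of learned positions
) -> int:
    """Rough parameter count for an XLM-style encoder with tied output embeddings."""
    # Token embeddings, position embeddings, and the embedding LayerNorm (scale + shift).
    embeddings = vocab_size * d_model + max_positions * d_model + 2 * d_model
    # Q, K, V and output projections, each d_model x d_model with a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Two linear layers: d_model -> ffn_dim -> d_model, with biases.
    ffn = d_model * ffn_dim + ffn_dim + ffn_dim * d_model + d_model
    # One LayerNorm after attention and one after the feed-forward block.
    layer_norms = 2 * (2 * d_model)
    per_layer = attention + ffn + layer_norms
    # Output projection reuses the embedding matrix and only adds a bias.
    output_bias = vocab_size
    return embeddings + n_layers * per_layer + output_bias


total = estimate_xlm_params()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")  # prints: 571,696,960 parameters (~572M)
```

The embedding matrix alone accounts for 256M of the total, which is why a large multilingual vocabulary dominates the size. With a feed-forward width of 1520 the same estimate falls to about 424M, so the reported 570M is consistent with the 5120-wide layer.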

Evaluation

Testing Data, Factors & Metrics

The model developers evaluated the model on the XNLI cross-lingual classification task, using test accuracy as the metric (see the XNLI data card for more details on XNLI). See the GitHub repo for further details on the testing data, factors, and metrics.

Results

For xlm-mlm-100-1280, the XNLI test accuracies in English (en), Spanish (es), German (de), Arabic (ar), Chinese (zh) and Urdu (ur) are:

Language	en	es	de	ar	zh	ur
Accuracy (%)	83.7	76.6	73.6	67.4	71.7	62.9

See the GitHub repo for further details.
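As a worked summary of the six reported accuracies, the snippet below averages them and computes how far each language falls below English, a simple way to read the cross-lingual transfer gap. It only re-derives figures from the results table; no new measurements are involved.

```python
# XNLI test accuracies (%) for xlm-mlm-100-1280, copied from the results table.
xnli_accuracy = {"en": 83.7, "es": 76.6, "de": 73.6, "ar": 67.4, "zh": 71.7, "ur": 62.9}

mean_acc = sum(xnli_accuracy.values()) / len(xnli_accuracy)

# Gap to English in accuracy points, rounded to one decimal like the table.
gaps = {
    lang: round(xnli_accuracy["en"] - acc, 1)
    for lang, acc in xnli_accuracy.items()
    if lang != "en"
}

print(f"mean over {len(xnli_accuracy)} languages: {mean_acc:.2f}")  # prints: mean over 6 languages: 72.65
print(gaps)
```

Urdu shows the largest gap (20.8 points), Spanish the smallest (7.1 points).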

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: More information needed
Hours used: More information needed
Cloud Provider: More information needed
Compute Region: More information needed
Carbon Emitted: More information needed
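The core arithmetic behind such estimates is energy consumed multiplied by the carbon intensity of the electricity grid; the calculator derives the power draw and intensity from the hardware type, provider and region you select. The sketch below shows only that product. Every input in the example is hypothetical, since none of the values above are known for this model.

```python
def estimate_co2_kg(
    power_kw: float,
    hours: float,
    kg_co2_per_kwh: float,
    num_devices: int = 1,
) -> float:
    """Energy (kWh) times grid carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = power_kw * hours * num_devices
    return energy_kwh * kg_co2_per_kwh


# Hypothetical inputs only: 8 devices drawing 0.25 kW each for 100 hours
# on a grid emitting 0.4 kg CO2eq per kWh. Not values for this model.
example = estimate_co2_kg(power_kw=0.25, hours=100.0, kg_co2_per_kwh=0.4, num_devices=8)
print(f"{example:.1f} kg CO2eq")  # prints: 80.0 kg CO2eq
```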

Technical Specifications

Citation

BibTeX:

@article{lample2019cross,
  title={Cross-lingual language model pretraining},
  author={Lample, Guillaume and Conneau, Alexis},
  journal={arXiv preprint arXiv:1901.07291},
  year={2019}
}

APA:

Lample, G., & Conneau, A. (2019). Cross-lingual language model pretraining. arXiv preprint arXiv:1901.07291.

Model Card Authors

This model card was written by the team at Hugging Face.

How to Get Started with the Model

More information needed. See the IPython notebook in the associated GitHub repo for examples.
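A minimal sketch of loading the checkpoint directly with Transformers and ranking candidates for a masked position. It assumes `transformers`, `torch` and `sacremoses` are installed and that the Hub id `xlm-mlm-100-1280` resolves; `XLMTokenizer` and `XLMWithLMHeadModel` are the Transformers classes for XLM checkpoints, while `mask_positions` and `predict_masked` are hypothetical helper names.

```python
from typing import List

MODEL_ID = "xlm-mlm-100-1280"  # Hub id of this checkpoint (assumed reachable)


def mask_positions(input_ids: List[int], mask_token_id: int) -> List[int]:
    """Indices of mask tokens in a flat list of token ids."""
    return [i for i, tok in enumerate(input_ids) if tok == mask_token_id]


def predict_masked(text: str, top_k: int = 5) -> List[str]:
    """Return the top-k candidate tokens for the first mask token in `text`."""
    # Lazy imports: requires transformers, torch and sacremoses.
    import torch
    from transformers import XLMTokenizer, XLMWithLMHeadModel

    tokenizer = XLMTokenizer.from_pretrained(MODEL_ID)
    model = XLMWithLMHeadModel.from_pretrained(MODEL_ID)
    model.eval()  # disable dropout for inference

    enc = tokenizer(text, return_tensors="pt")
    positions = mask_positions(enc["input_ids"][0].tolist(), tokenizer.mask_token_id)
    if not positions:
        raise ValueError(f"text must contain the mask token {tokenizer.mask_token!r}")

    with torch.no_grad():
        logits = model(**enc).logits  # shape: (1, seq_len, vocab_size)

    top_ids = logits[0, positions[0]].topk(top_k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)
```

Usage: `print(predict_masked("Paris is the <special1> of France."))`, where `<special1>` is the default mask token of `XLMTokenizer`; check `tokenizer.mask_token` rather than relying on that default. The returned items are BPE units, so word-final pieces may carry a `</w>` end-of-word marker.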

Author:

None

Dataset size:

4.16 GB
