xlm-mlm-enfr-1024

Table of Contents

  • Model Details
  • Uses
  • Bias, Risks, and Limitations
  • Training
  • Evaluation
  • Environmental Impact
  • Technical Specifications
  • Citation
  • Model Card Authors
  • How To Get Started With the Model

    Model Details

    The XLM model was proposed in Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau. xlm-mlm-enfr-1024 is a transformer pretrained with a masked language modeling (MLM) objective on English and French. The model uses language embeddings to specify the language at inference time. See the Hugging Face Multilingual Models for Inference docs for further details.

    Model Description

    Uses

    Direct Use

    The model is a language model and can be used for masked language modeling.

    Downstream Use

    To learn more about this task and potential downstream uses, see the Hugging Face fill mask docs and the Hugging Face Multilingual Models for Inference docs.

    Out-of-Scope Use

    The model should not be used to intentionally create hostile or alienating environments for people.

    Bias, Risks, and Limitations

    Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

    Recommendations

    Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

    Training

    The model developers write:

    In all experiments, we use a Transformer architecture with 1024 hidden units, 8 heads, GELU activations (Hendrycks and Gimpel, 2016), a dropout rate of 0.1 and learned positional embeddings. We train our models with the Adam optimizer (Kingma and Ba, 2014), a linear warm-up (Vaswani et al., 2017) and learning rates varying from 10^−4 to 5 × 10^−4.

    See the associated paper for links, citations, and further details on the training data and training procedure.
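As a rough illustration of the linear warm-up mentioned in the quote above, here is a minimal sketch of a warm-up schedule. The quote gives only the learning-rate range, so the warm-up length (`warmup_steps=4000`) and the choice to hold the rate constant after warm-up are assumptions made for this example, not settings confirmed by the model developers. The function name `linear_warmup_lr` is hypothetical.

```python
def linear_warmup_lr(step, peak_lr=1e-4, warmup_steps=4000):
    """Return the learning rate at a 1-indexed optimizer step.

    The rate grows linearly from peak_lr / warmup_steps up to peak_lr
    over the first `warmup_steps` steps.

    Assumptions (not from the model card): warmup_steps=4000, and the
    rate is held at peak_lr once warm-up is finished.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    if warmup_steps < 1:
        raise ValueError("warmup_steps must be >= 1")
    # Fraction of warm-up completed, capped at 1.0 after warm-up ends.
    progress = min(1.0, step / warmup_steps)
    return peak_lr * progress


if __name__ == "__main__":
    for s in (1, 1000, 2000, 4000, 8000):
        print(s, linear_warmup_lr(s))
```

A function of this shape could be passed to a scheduler such as PyTorch's `LambdaLR` as a multiplier by dividing its result by `peak_lr`.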

    The model developers also write that:

    If you use these models, you should use the same data preprocessing / BPE codes to preprocess your data.

    See the associated GitHub Repo for further details.

    Evaluation

    Testing Data, Factors & Metrics

    The model developers evaluated the model on the WMT'14 English-French dataset using the BLEU metric. See the associated paper for further details on the testing data, factors and metrics.

    Results

    For xlm-mlm-enfr-1024 results, see Table 1 and Table 2 of the associated paper.

    Environmental Impact

    Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

    • Hardware Type: More information needed
    • Hours used: More information needed
    • Cloud Provider: More information needed
    • Compute Region: More information needed
    • Carbon Emitted: More information needed

    Technical Specifications

    The model developers write:

    We implement all our models in PyTorch (Paszke et al., 2017), and train them on 64 Volta GPUs for the language modeling tasks, and 8 GPUs for the MT tasks. We use float16 operations to speed up training and to reduce the memory usage of our models.

    See the associated paper for further details.

    Citation

    BibTeX:

    @article{lample2019cross,
      title={Cross-lingual language model pretraining},
      author={Lample, Guillaume and Conneau, Alexis},
      journal={arXiv preprint arXiv:1901.07291},
      year={2019}
    }
    

    APA:

    • Lample, G., & Conneau, A. (2019). Cross-lingual language model pretraining. arXiv preprint arXiv:1901.07291.

    Model Card Authors

    This model card was written by the team at Hugging Face.

    How to Get Started with the Model

    More information needed. This model uses language embeddings to specify the language used at inference. See the Hugging Face Multilingual Models for Inference docs for further details.
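The following is a minimal sketch, not an official usage recipe, of how language embeddings might be passed at inference with the `transformers` library. It assumes `XLMTokenizer`, `XLMWithLMHeadModel`, the tokenizer's `lang2id` mapping, and the model's `langs` argument behave as described in the Multilingual Models for Inference docs, and that the tokenizer keeps its mask token intact when it appears in the input text. The helper names `build_language_ids` and `predict_masked_token` are hypothetical.

```python
def build_language_ids(lang2id, lang, seq_len):
    """Build a (1, seq_len) nested list filled with the id of `lang`.

    `lang2id` is a mapping such as the tokenizer's `lang2id` attribute.
    """
    if lang not in lang2id:
        raise ValueError(f"unknown language {lang!r}; expected one of {sorted(lang2id)}")
    return [[lang2id[lang]] * seq_len]


def predict_masked_token(text_with_mask, lang="en",
                         model_name="xlm-mlm-enfr-1024", top_k=5):
    """Return the top_k candidate tokens for the first mask in the text.

    Downloads the checkpoint on first use. The XLM tokenizer may also
    need the `sacremoses` package installed.
    """
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import XLMTokenizer, XLMWithLMHeadModel

    tokenizer = XLMTokenizer.from_pretrained(model_name)
    model = XLMWithLMHeadModel.from_pretrained(model_name)
    model.eval()

    input_ids = torch.tensor([tokenizer.encode(text_with_mask)])
    # One language id per input position, same shape as input_ids.
    langs = torch.tensor(
        build_language_ids(tokenizer.lang2id, lang, input_ids.shape[1])
    )

    with torch.no_grad():
        logits = model(input_ids, langs=langs).logits

    mask_positions = (input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    if len(mask_positions) == 0:
        raise ValueError("no mask token found in the input text")

    top_ids = logits[0, mask_positions[0]].topk(top_k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)
```

To try it, build the input from the tokenizer's own `mask_token` rather than hard-coding the mask string, then call `predict_masked_token(text, lang="en")` for English or `lang="fr"` for French.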