Model:
xlm-clm-ende-1024
Task:
Fill-Mask
The XLM model was proposed in Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau. xlm-clm-ende-1024 is a transformer pretrained with a causal language modeling (CLM) objective (next-token prediction) for English-German.
The model is a language model and can be used for causal language modeling.
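To make the next-token prediction objective concrete, here is a minimal sketch of how a token sequence is turned into (context, target) pairs under causal language modeling. The helper name `next_token_pairs` and the whitespace tokenization are assumptions made for illustration; the model itself uses its own subword vocabulary.

```python
from typing import List, Sequence, Tuple


def next_token_pairs(tokens: Sequence[str]) -> List[Tuple[Tuple[str, ...], str]]:
    """Return (context, target) pairs for next-token prediction.

    Position t may only see tokens[0..t]; its target is tokens[t + 1].
    """
    return [(tuple(tokens[: t + 1]), tokens[t + 1]) for t in range(len(tokens) - 1)]


# Hypothetical whitespace tokenization, for illustration only; the real
# model uses its own subword vocabulary.
example = "Wikipedia was used to".split()
for context, target in next_token_pairs(example):
    print(" ".join(context), "->", target)
```

Each pair shows what the model is allowed to see on the left and what it is trained to predict on the right; no position can look at tokens that come after it.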
To learn more about this task and potential downstream uses, see the Hugging Face Multilingual Models for Inference docs.
The model should not be used to intentionally create hostile or alienating environments for people.
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
See the associated paper for details on the training data and training procedure.
See the associated paper for details on the testing data, factors and metrics.
For xlm-clm-ende-1024 results, see Table 2 of the associated paper.
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
The model developers write:
We implement all our models in PyTorch (Paszke et al., 2017), and train them on 64 Volta GPUs for the language modeling tasks, and 8 GPUs for the MT tasks. We use float16 operations to speed up training and to reduce the memory usage of our models.
See the associated paper for further details.
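As a rough illustration of why float16 reduces memory usage, the sketch below compares the storage needed for the same number of values in 32-bit and 16-bit floating point, and shows the rounding that half precision introduces. The parameter count is hypothetical and the helper names are assumptions; this covers storage size only and says nothing about the developers' actual training setup or the speed-up.

```python
import struct

BYTES_FP32 = struct.calcsize("<f")  # 4 bytes per value
BYTES_FP16 = struct.calcsize("<e")  # 2 bytes per value ("e" is IEEE 754 half precision)


def weight_mib(num_values: int, bytes_per_value: int) -> float:
    """Memory needed to store num_values values, in MiB."""
    return num_values * bytes_per_value / (1024 ** 2)


def roundtrip_fp16(x: float) -> float:
    """Round x to the nearest float16 value and convert it back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical parameter count, chosen only to make the arithmetic concrete.
num_params = 100_000_000
print(weight_mib(num_params, BYTES_FP32))
print(weight_mib(num_params, BYTES_FP16))
print(roundtrip_fp16(0.1))  # close to, but not exactly, 0.1
```

Halving the bytes per value halves the storage for the same number of values, at the cost of coarser rounding.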
BibTeX:
@article{lample2019cross,
  title={Cross-lingual language model pretraining},
  author={Lample, Guillaume and Conneau, Alexis},
  journal={arXiv preprint arXiv:1901.07291},
  year={2019}
}
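APA:
Lample, G., & Conneau, A. (2019). Cross-lingual language model pretraining. arXiv preprint arXiv:1901.07291.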
This model card was written by the team at Hugging Face.
Use the code below to get started with the model.
    import torch
    from transformers import XLMTokenizer, XLMWithLMHeadModel

    tokenizer = XLMTokenizer.from_pretrained("xlm-clm-ende-1024")
    model = XLMWithLMHeadModel.from_pretrained("xlm-clm-ende-1024")

    input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")])  # batch size of 1

    language_id = tokenizer.lang2id["en"]  # integer id the tokenizer assigns to English
    langs = torch.tensor([language_id] * input_ids.shape[1])  # one language id per input token

    # We reshape it to be of size (batch_size, sequence_length)
    langs = langs.view(1, -1)  # is now of shape [1, sequence_length] (we have a batch size of 1)

    outputs = model(input_ids, langs=langs)
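The forward pass above returns raw vocabulary scores rather than text. As a minimal sketch of the step that usually follows, the helpers below turn the scores for one position into probabilities and pick the most likely token id (greedy selection). The helper names and the toy score list are assumptions for illustration; the commented lines at the end assume that the first element of `outputs` holds scores of shape (batch_size, sequence_length, vocabulary_size).

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities, subtracting the max for numerical stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(last_position_scores: Sequence[float]) -> Tuple[int, float]:
    """Return (token_id, probability) of the highest-scoring vocabulary entry."""
    probs = softmax(last_position_scores)
    token_id = max(range(len(probs)), key=probs.__getitem__)
    return token_id, probs[token_id]


# Toy scores over a hypothetical 4-entry vocabulary, standing in for the
# model's scores at the last input position.
token_id, prob = greedy_next_token([0.5, 2.0, -1.0, 0.1])
print(token_id, round(prob, 3))

# With the objects from the snippet above, the same step would look roughly like:
#   scores = outputs[0][0, -1].tolist()          # scores at the last position
#   token_id, prob = greedy_next_token(scores)
#   print(tokenizer.decode([token_id]))
```

Note that `tokenizer.encode` may add special tokens around the text, in which case the final position follows a boundary token rather than the last word; inspecting the encoded ids before reading off a prediction is advisable.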