Model:
distilroberta-base
This model is a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT. The code for the distillation process can be found here. This model is case-sensitive: it distinguishes between "english" and "English".
The model has 6 layers, a hidden dimension of 768, and 12 attention heads, for a total of 82M parameters (compared with 125M parameters for RoBERTa-base). On average, DistilRoBERTa is twice as fast as RoBERTa-base.
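The parameter totals quoted above can be roughly reproduced from the architecture alone. The sketch below is a back-of-the-envelope count, not the library's own accounting: the vocabulary size (50,265), the number of position embeddings (514), and the feed-forward width (3,072) are assumptions taken from the usual RoBERTa-base configuration and are not stated in this card, and the function name is hypothetical. The number of attention heads does not enter the count, since the heads split the same 768-wide projections.

```python
def roberta_encoder_params(num_layers, hidden=768, vocab=50265,
                           max_positions=514, intermediate=3072):
    """Approximate parameter count of a RoBERTa-style encoder with a pooler.

    vocab, max_positions and intermediate are assumed values (see lead-in).
    """
    layer_norm = 2 * hidden  # scale and bias

    # Word, position and (single) token-type embeddings, plus one LayerNorm.
    embeddings = vocab * hidden + max_positions * hidden + hidden + layer_norm

    # Self-attention: query, key, value and output projections, with biases.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Feed-forward block: hidden -> intermediate -> hidden, with biases.
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden + layer_norm)

    pooler = hidden * hidden + hidden
    return embeddings + num_layers * (attention + feed_forward) + pooler


distil = roberta_encoder_params(num_layers=6)
base = roberta_encoder_params(num_layers=12)
print(f"6 layers:  {distil / 1e6:.1f}M parameters")
print(f"12 layers: {base / 1e6:.1f}M parameters")
```

Under these assumptions the counts come out at about 82M and 125M. Halving the number of layers does not halve the total, because the embedding matrix (about 39M parameters) is the same size in both models.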
We encourage users of this model card to check out the RoBERTa-base model card to learn more about usage, limitations and potential biases.
You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you.
Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation, you should look at a model like GPT2.
The model should not be used to intentionally create hostile or alienating environments for people. The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='distilroberta-base')
>>> unmasker("The man worked as a <mask>.")
[{'score': 0.1237526461482048,
  'sequence': 'The man worked as a waiter.',
  'token': 38233,
  'token_str': ' waiter'},
 {'score': 0.08968018740415573,
  'sequence': 'The man worked as a waitress.',
  'token': 35698,
  'token_str': ' waitress'},
 {'score': 0.08387645334005356,
  'sequence': 'The man worked as a bartender.',
  'token': 33080,
  'token_str': ' bartender'},
 {'score': 0.061059024184942245,
  'sequence': 'The man worked as a mechanic.',
  'token': 25682,
  'token_str': ' mechanic'},
 {'score': 0.03804653510451317,
  'sequence': 'The man worked as a courier.',
  'token': 37171,
  'token_str': ' courier'}]

>>> unmasker("The woman worked as a <mask>.")
[{'score': 0.23149248957633972,
  'sequence': 'The woman worked as a waitress.',
  'token': 35698,
  'token_str': ' waitress'},
 {'score': 0.07563332468271255,
  'sequence': 'The woman worked as a waiter.',
  'token': 38233,
  'token_str': ' waiter'},
 {'score': 0.06983394920825958,
  'sequence': 'The woman worked as a bartender.',
  'token': 33080,
  'token_str': ' bartender'},
 {'score': 0.05411609262228012,
  'sequence': 'The woman worked as a nurse.',
  'token': 9008,
  'token_str': ' nurse'},
 {'score': 0.04995106905698776,
  'sequence': 'The woman worked as a maid.',
  'token': 29754,
  'token_str': ' maid'}]
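One way to read the two outputs side by side is to check which completions are unique to each prompt and how the shared ones are scored. The sketch below is an illustration only: it hard-codes the token strings and scores printed above (rounded to four decimals) rather than calling the model, and the variable names are hypothetical.

```python
# Top-5 completions copied from the outputs above (scores rounded).
man = {"waiter": 0.1238, "waitress": 0.0897, "bartender": 0.0839,
       "mechanic": 0.0611, "courier": 0.0380}
woman = {"waitress": 0.2315, "waiter": 0.0756, "bartender": 0.0698,
         "nurse": 0.0541, "maid": 0.0500}

# Completions that appear for only one of the two prompts.
only_man = sorted(set(man) - set(woman))
only_woman = sorted(set(woman) - set(man))
# Completions that appear for both prompts.
shared = sorted(set(man) & set(woman))

print("only for 'The man':", only_man)
print("only for 'The woman':", only_woman)
for job in shared:
    ratio = woman[job] / man[job]
    print(f"{job}: woman/man score ratio = {ratio:.2f}")
```

A comparison like this covers only five completions for a single pair of prompts, so it illustrates the issue rather than measuring it.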
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
DistilRoBERTa was pre-trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset (roughly 4 times less training data than was used for the teacher, RoBERTa). See the roberta-base model card for further details on training.
When fine-tuned on downstream tasks, this model achieves the following results (see the GitHub repo):
GLUE test results:
Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE |
---|---|---|---|---|---|---|---|---|
Score | 84.0 | 89.4 | 90.8 | 92.5 | 59.3 | 88.3 | 86.6 | 67.9 |
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
@article{Sanh2019DistilBERTAD,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.01108}
}
You can use the model directly with a pipeline for masked language modeling:
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='distilroberta-base')
>>> unmasker("Hello I'm a <mask> model.")
[{'score': 0.04673689603805542,
  'sequence': "Hello I'm a business model.",
  'token': 265,
  'token_str': ' business'},
 {'score': 0.03846118599176407,
  'sequence': "Hello I'm a freelance model.",
  'token': 18150,
  'token_str': ' freelance'},
 {'score': 0.03308931365609169,
  'sequence': "Hello I'm a fashion model.",
  'token': 2734,
  'token_str': ' fashion'},
 {'score': 0.03018997237086296,
  'sequence': "Hello I'm a role model.",
  'token': 774,
  'token_str': ' role'},
 {'score': 0.02111748233437538,
  'sequence': "Hello I'm a Playboy model.",
  'token': 24526,
  'token_str': ' Playboy'}]
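The pipeline returns a plain list of dictionaries, so its results can be post-processed with ordinary Python. The sketch below assumes the output shape shown above and reuses two of those entries as sample data instead of calling the model; `top_tokens` is a hypothetical helper, not part of `transformers`. It also strips the leading space that appears in each `token_str`.

```python
def top_tokens(predictions, min_score=0.0):
    """Return (token, score) pairs at or above min_score, best first."""
    kept = [p for p in predictions if p["score"] >= min_score]
    kept.sort(key=lambda p: p["score"], reverse=True)
    # token_str carries a leading space in the output above; strip it.
    return [(p["token_str"].strip(), p["score"]) for p in kept]


# Sample data copied from the output above (first and last entries).
sample = [
    {"score": 0.04673689603805542, "sequence": "Hello I'm a business model.",
     "token": 265, "token_str": " business"},
    {"score": 0.02111748233437538, "sequence": "Hello I'm a Playboy model.",
     "token": 24526, "token_str": " Playboy"},
]

print(top_tokens(sample, min_score=0.03))
```

With a threshold of 0.03, only the "business" completion is kept from this sample. The right threshold depends on the task, since even the best completion here has a score below 0.05.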