Model:
distilroberta-base
This model is the base version of DistilRoBERTa, a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT, and the code for the distillation process can be found here. The model is case-sensitive: English and english are treated differently.
The model has 6 layers, a hidden size of 768 and 12 attention heads, for a total of 82M parameters (compared to 125M parameters for RoBERTa-base). On average, DistilRoBERTa is about twice as fast as RoBERTa-base.
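As a quick sanity check on these numbers, the configuration and parameter count can be inspected with the standard transformers API. This is only a minimal sketch; the printed values are what you would verify locally, not figures taken from this card:

```python
from transformers import AutoConfig, AutoModel

# Inspect the architecture hyperparameters of distilroberta-base.
config = AutoConfig.from_pretrained("distilroberta-base")
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)

# Count parameters; this should come out to roughly 82M.
model = AutoModel.from_pretrained("distilroberta-base")
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```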
Users of this model card are encouraged to check out the RoBERTa-base model card to learn more about usage, limitations and potential biases.
You can use the raw model for masked language modeling, but it is mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you.
Note that this model is primarily intended to be fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation, you should look at models like GPT-2.
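As a rough illustration of that intended use, the checkpoint can be loaded with a classification head and then fine-tuned on a labelled dataset. The sketch below only shows the loading step with the generic transformers classes; `num_labels=2` and the example sentence are placeholders for a hypothetical binary task, not recommendations from this card:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the checkpoint with a (randomly initialised) sequence-classification head.
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained("distilroberta-base", num_labels=2)

# A single forward pass; until the head is fine-tuned, these logits carry no meaning.
inputs = tokenizer("This movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])
```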
The model should not be used to intentionally create hostile or alienating environments for people. It was not trained to be a factual or true representation of people or events, so using the model to generate such content is out of scope for its abilities.
Significant research has explored the bias and fairness issues of language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes, identity characteristics, and sensitive social and occupational groups. For example:
>>> from transformers import pipeline >>> unmasker = pipeline('fill-mask', model='distilroberta-base') >>> unmasker("The man worked as a <mask>.") [{'score': 0.1237526461482048, 'sequence': 'The man worked as a waiter.', 'token': 38233, 'token_str': ' waiter'}, {'score': 0.08968018740415573, 'sequence': 'The man worked as a waitress.', 'token': 35698, 'token_str': ' waitress'}, {'score': 0.08387645334005356, 'sequence': 'The man worked as a bartender.', 'token': 33080, 'token_str': ' bartender'}, {'score': 0.061059024184942245, 'sequence': 'The man worked as a mechanic.', 'token': 25682, 'token_str': ' mechanic'}, {'score': 0.03804653510451317, 'sequence': 'The man worked as a courier.', 'token': 37171, 'token_str': ' courier'}] >>> unmasker("The woman worked as a <mask>.") [{'score': 0.23149248957633972, 'sequence': 'The woman worked as a waitress.', 'token': 35698, 'token_str': ' waitress'}, {'score': 0.07563332468271255, 'sequence': 'The woman worked as a waiter.', 'token': 38233, 'token_str': ' waiter'}, {'score': 0.06983394920825958, 'sequence': 'The woman worked as a bartender.', 'token': 33080, 'token_str': ' bartender'}, {'score': 0.05411609262228012, 'sequence': 'The woman worked as a nurse.', 'token': 9008, 'token_str': ' nurse'}, {'score': 0.04995106905698776, 'sequence': 'The woman worked as a maid.', 'token': 29754, 'token_str': ' maid'}]
Users (both direct and downstream) should be made aware of the model's risks, biases and limitations.
DistilRoBERTa was pre-trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset (roughly 4x less training data than the teacher RoBERTa). See the roberta-base model card for further details on training.
When fine-tuned on downstream tasks, this model achieves the following results (see the GitHub Repo):
GLUE test results:
| Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE |
|:----:|:----:|:---:|:----:|:-----:|:----:|:-----:|:----:|:---:|
|      | 84.0 | 89.4 | 90.8 | 92.5  | 59.3 | 88.3  | 86.6 | 67.9 |
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
```bibtex
@article{Sanh2019DistilBERTAD,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.01108}
}
```
APA format: Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv, abs/1910.01108.
You can use the model directly with a pipeline for masked language modeling:
>>> from transformers import pipeline >>> unmasker = pipeline('fill-mask', model='distilroberta-base') >>> unmasker("Hello I'm a <mask> model.") [{'score': 0.04673689603805542, 'sequence': "Hello I'm a business model.", 'token': 265, 'token_str': ' business'}, {'score': 0.03846118599176407, 'sequence': "Hello I'm a freelance model.", 'token': 18150, 'token_str': ' freelance'}, {'score': 0.03308931365609169, 'sequence': "Hello I'm a fashion model.", 'token': 2734, 'token_str': ' fashion'}, {'score': 0.03018997237086296, 'sequence': "Hello I'm a role model.", 'token': 774, 'token_str': ' role'}, {'score': 0.02111748233437538, 'sequence': "Hello I'm a Playboy model.", 'token': 24526, 'token_str': ' Playboy'}]