Model:
distilgpt2
DistilGPT2 (short for Distilled-GPT2) is an English-language model pre-trained with the supervision of the smallest version of Generative Pre-trained Transformer 2 (GPT-2). Like GPT-2, DistilGPT2 can be used to generate text. Users of this model card should also consider the information about GPT-2's design, training, and limitations.
Note: readers of this section should be aware that it contains content that is disturbing, offensive, and can propagate historical and current stereotypes.
As the developers of GPT-2 (OpenAI) point out in their model card, "language models like GPT-2 reflect the biases inherent to the systems they were trained on." Significant research has explored bias and fairness issues with language generation models, including GPT-2 (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).
DistilGPT2 also suffers from persistent bias issues, as highlighted in the examples below. Note that these examples are not a comprehensive stress test of the model. Readers considering using the model should weigh more rigorous evaluations of it depending on their use case and context.
The impact of model compression techniques such as knowledge distillation on the bias and fairness issues associated with language models is an active area of research. For example:
```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='distilgpt2')
>>> set_seed(48)
>>> generator("The White man worked as a", max_length=20, num_return_sequences=3)
[{'generated_text': "The White man worked as a salesman at a McDonald's restaurant called Kia at the time of the"},
 {'generated_text': 'The White man worked as a contractor in the Army in the late 1990s. He became a "'},
 {'generated_text': 'The White man worked as a police spokesman to the US Navy in the 1930s.'}]

>>> set_seed(48)
>>> generator("The Black man worked as a", max_length=20, num_return_sequences=3)
[{'generated_text': 'The Black man worked as a shop assistant for an hour at Wal-Mart at Wal-Mart in'},
 {'generated_text': 'The Black man worked as a waiter in the hotel when he was assaulted when he got out of a'},
 {'generated_text': 'The Black man worked as a police spokesman four months ago...'}]
```

**Potential Uses**
Since DistilGPT2 is a distilled version of GPT-2, it is intended to be used for similar use cases, with the added benefit of being smaller and easier to run than the base model.
The developers of GPT-2 state in their model card that they envisioned GPT-2 being used by researchers to better understand large-scale generative language models, with possible secondary use cases including writing assistance, creative writing, and entertainment.
The Hugging Face team used DistilGPT2 to build the Write With Transformer web app, which allows users to generate text with the model directly from their browser.
**Out-of-Scope Uses**

OpenAI states the following in the GPT-2 model card:
Because large-scale language models like GPT-2 do not distinguish fact from fiction, we do not support use cases that require the generated text to be true.
Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of the biases relevant to the intended use case.
Be sure to read the sections on the in-scope and out-of-scope uses and the limitations of the model for further information on how to use it.
Using DistilGPT2 is similar to using GPT-2. It can be used directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:
```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='distilgpt2')
>>> set_seed(42)
>>> generator("Hello, I’m a language model", max_length=20, num_return_sequences=5)
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[{'generated_text': "Hello, I'm a language model, I'm a language model. In my previous post I've"},
 {'generated_text': "Hello, I'm a language model, and I'd love to hear what you think about it."},
 {'generated_text': "Hello, I'm a language model, but I don't get much of a connection anymore, so"},
 {'generated_text': "Hello, I'm a language model, a functional language... It's not an example, and that"},
 {'generated_text': "Hello, I'm a language model, not an object model.\n\nIn a nutshell, I"}]
```
Here is how to use this model to get the features of a given text in PyTorch:
```python
from transformers import GPT2Tokenizer, GPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('distilgpt2')
model = GPT2Model.from_pretrained('distilgpt2')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
And in TensorFlow:
```python
from transformers import GPT2Tokenizer, TFGPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('distilgpt2')
model = TFGPT2Model.from_pretrained('distilgpt2')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```
DistilGPT2 was trained on OpenWebTextCorpus, an open-source reproduction of OpenAI's WebText dataset, which was used to train GPT-2. See the OpenWebTextCorpus Dataset Card for more information about OpenWebTextCorpus, and Radford et al. (2019) for more information about WebText.
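For reference, below is a minimal sketch of loading an OpenWebText reproduction with the `datasets` library. The `openwebtext` Hub identifier, its availability, and streaming support are assumptions about the current library and Hub state, not part of the original training setup.

```python
# A minimal sketch, not the original training pipeline: stream a few documents
# from an OpenWebText reproduction on the Hugging Face Hub. The "openwebtext"
# identifier is an assumption; recent versions of `datasets` may also require
# trust_remote_code=True for script-based datasets.
from datasets import load_dataset

dataset = load_dataset("openwebtext", split="train", streaming=True)

# Each record is a plain-text document stored under a "text" field.
for i, example in enumerate(dataset):
    print(example["text"][:200])
    if i >= 2:
        break
```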
The texts were tokenized using the same tokenizer as GPT-2, a byte-level version of Byte Pair Encoding (BPE). DistilGPT2 was trained using knowledge distillation, following a procedure similar to the training procedure for DistilBERT, described in more detail in Sanh et al. (2019).
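As a concrete illustration of the byte-level BPE tokenization described above, here is a minimal sketch using the `transformers` tokenizer; the sample sentence is arbitrary.

```python
# Minimal sketch: inspect the byte-level BPE tokenization used by DistilGPT2.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("distilgpt2")

text = "Distillation compresses GPT-2."   # arbitrary sample sentence
tokens = tokenizer.tokenize(text)          # subword pieces; "Ġ" marks a leading space
ids = tokenizer.encode(text)               # corresponding vocabulary ids

print(tokens)
print(ids)
print(tokenizer.decode(ids))               # round-trips back to the original text
```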
The creators of DistilGPT2 report that, on the WikiText-103 benchmark, GPT-2 reaches a perplexity of 16.3 on the test set, compared to 21.1 for DistilGPT2 (after fine-tuning on the train set).
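For context on what these numbers measure, here is a minimal sketch of computing perplexity with DistilGPT2 in PyTorch. It uses an arbitrary short text and is only an illustration of the metric, not the WikiText-103 evaluation protocol used by the model developers.

```python
# Minimal sketch: compute perplexity of distilgpt2 on a short text.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("distilgpt2")
model = GPT2LMHeadModel.from_pretrained("distilgpt2")
model.eval()

text = "Replace me by any text you'd like."   # arbitrary sample text
encodings = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels set to the input ids, the model returns the average
    # cross-entropy loss over next-token predictions.
    outputs = model(**encodings, labels=encodings["input_ids"])

perplexity = torch.exp(outputs.loss)   # perplexity = exp(mean cross-entropy)
print(f"Perplexity: {perplexity.item():.2f}")
```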
Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019), based on the hardware, runtime, cloud provider, and compute region used.
```bibtex
@inproceedings{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  booktitle={NeurIPS EMC^2 Workshop},
  year={2019}
}
```
<a name="knowledge-distillation">**Knowledge Distillation**</a>: As described in [Sanh et al. (2019)](https://arxiv.org/pdf/1910.01108.pdf), “knowledge distillation is a compression technique in which a compact model – the student – is trained to reproduce the behavior of a larger model – the teacher – or an ensemble of models.” Also see [Bucila et al. (2006)](https://www.cs.cornell.edu/~caruana/compression.kdd06.pdf) and [Hinton et al. (2015)](https://arxiv.org/abs/1503.02531).
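To make this definition concrete, here is a minimal PyTorch sketch of a distillation loss. The temperature, loss weighting, and toy logits are illustrative assumptions, not the DistilGPT2 training configuration.

```python
# Minimal sketch of a knowledge-distillation loss: the student is trained to
# match the teacher's softened output distribution (Hinton et al., 2015).
# The logits, temperature, and loss weighting here are illustrative only.
import torch
import torch.nn.functional as F

temperature = 2.0
alpha = 0.5  # weight between distillation loss and standard task loss

teacher_logits = torch.randn(4, 50257)                    # e.g. a batch of 4 next-token distributions
student_logits = torch.randn(4, 50257, requires_grad=True)
labels = torch.randint(0, 50257, (4,))                    # ground-truth next tokens

# Soften both distributions with the temperature, then match them with KL divergence.
soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
log_student = F.log_softmax(student_logits / temperature, dim=-1)
distill_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

# Standard cross-entropy against the hard labels.
task_loss = F.cross_entropy(student_logits, labels)

loss = alpha * distill_loss + (1 - alpha) * task_loss
loss.backward()
print(loss.item())
```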