Model: echarlaix/distilbert-sst2-inc-dynamic-quantization-magnitude-pruning-0.1
Model Description: This model is a DistilBERT model fine-tuned on SST-2, then dynamically quantized and pruned with a magnitude pruning strategy to a sparsity of 10% using optimum-intel and Intel® Neural Compressor.
This requires installing Optimum with the Intel Neural Compressor extra: `pip install optimum[neural-compressor]`
To load the quantized model and run inference with the Transformers `pipeline` API, you can do the following:
```python
from transformers import AutoTokenizer, pipeline
from optimum.intel import INCModelForSequenceClassification

model_id = "echarlaix/distilbert-sst2-inc-dynamic-quantization-magnitude-pruning-0.1"
model = INCModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
cls_pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "He's a dreadful magician."
outputs = cls_pipe(text)
```
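For reference, a text-classification pipeline returns a list of dicts with `label` and `score` keys. A minimal sketch of consuming that output shape (the values below are illustrative placeholders, not results from an actual run of this model):

```python
# Hypothetical pipeline output for a negative sentence; the real
# labels/scores come from running cls_pipe(text) as shown above.
outputs = [{"label": "NEGATIVE", "score": 0.99}]

# Pick the prediction with the highest score (there is only one
# entry per input unless top_k is set on the pipeline).
top = max(outputs, key=lambda d: d["score"])
print(top["label"])  # prints "NEGATIVE" for this placeholder output
```

If you need scores for both classes, you can pass `top_k=None` when constructing the pipeline, in which case each input yields one dict per label.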