Model:
OpenVINO/bert-base-uncased-sst2-int8-unstructured80
Language:
en
This model applies unstructured magnitude pruning, quantization, and distillation jointly to BERT-base during fine-tuning on the GLUE SST-2 dataset. It achieves the following results on the evaluation set:
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
pip install optimum[openvino,nncf]==1.7.0
pip install datasets sentencepiece scipy scikit-learn protobuf evaluate
pip install wandb  # optional
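Because the setup pins `optimum` to 1.7.0, it can help to confirm what actually got installed before starting a long training run. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the list of distribution names queried at the bottom are assumptions for illustration, not part of the original setup.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed distribution names; adjust to match your environment.
    names = ["torch", "optimum", "optimum-intel", "nncf", "openvino"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```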
See https://gist.github.com/yujiepan-work/5d7e513a47b353db89f6e1b512d7c080
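To illustrate what "unstructured magnitude pruning" means in the model description: individual weights with the smallest absolute values are set to zero, with no constraint on where the zeros fall. The sketch below is a simplified, one-shot version in plain Python. The function name `magnitude_prune` is hypothetical, and the 80% target is an assumption read off the `unstructured80` suffix in the model name. The actual NNCF algorithm is more involved and is not reproduced here.

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude entries of a 2D weight list.

    Roughly a `sparsity` fraction of the entries ends up zero; ties at the
    threshold can push the count slightly higher.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = round(len(magnitudes) * sparsity)  # number of weights to remove
    if k == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[k - 1]  # largest magnitude that is still pruned
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


weights = [
    [0.9, -0.05, 0.3, 0.01, -0.7],
    [0.02, 0.6, -0.1, 0.04, -0.2],
]
pruned = magnitude_prune(weights, sparsity=0.8)
zeros = sum(w == 0.0 for row in pruned for w in row)
print(pruned)
print(f"sparsity: {zeros / 10:.0%}")
```

Only the two largest-magnitude weights (0.9 and -0.7) survive here, regardless of which row they sit in, which is what distinguishes unstructured pruning from removing whole rows, heads, or blocks.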
We use a single GPU for training.
NNCFCFG=/path/to/nncf_config/json
python run_glue.py \
  --lr_scheduler_type cosine_with_restarts \
  --cosine_lr_scheduler_cycles 11 6 \
  --record_best_model_after_epoch 9 \
  --load_best_model_at_end True \
  --metric_for_best_model accuracy \
  --model_name_or_path textattack/bert-base-uncased-SST-2 \
  --teacher_model_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
  --distillation_temperature 2 \
  --task_name sst2 \
  --nncf_compression_config $NNCFCFG \
  --distillation_weight 0.95 \
  --output_dir /tmp/bert-base-uncased-sst2-int8-unstructured80 \
  --overwrite_output_dir \
  --run_name bert-base-uncased-sst2-int8-unstructured80 \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --learning_rate 5e-05 \
  --optim adamw_torch \
  --num_train_epochs 17 \
  --logging_steps 1 \
  --evaluation_strategy steps \
  --eval_steps 250 \
  --save_strategy steps \
  --save_steps 250 \
  --save_total_limit 1 \
  --fp16 \
  --seed 1
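The `--distillation_temperature 2` and `--distillation_weight 0.95` flags suggest a standard knowledge-distillation objective. The sketch below shows one common formulation for a single example: a temperature-softened KL term between teacher and student, scaled by the squared temperature and blended with the ordinary cross-entropy loss. It is an assumption that the training script combines the terms this way. The function names are hypothetical, and the logits are made-up values for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Task loss: negative log-probability of the true label."""
    return -math.log(softmax(logits)[label])


def distillation_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return kl * temperature ** 2


def combined_loss(student_logits, teacher_logits, label,
                  temperature=2.0, distillation_weight=0.95):
    """Weighted blend of task loss and distillation loss (assumed form)."""
    task = cross_entropy(student_logits, label)
    distill = distillation_loss(student_logits, teacher_logits, temperature)
    return (1 - distillation_weight) * task + distillation_weight * distill


# Illustrative logits for a two-class (negative/positive) example.
student = [0.2, 1.1]
teacher = [-1.5, 2.5]
print(combined_loss(student, teacher, label=1))
```

With a weight of 0.95, the student is driven almost entirely by the teacher's softened predictions, and the ground-truth labels contribute only 5% of the loss under this formulation.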