microsoft/MiniLM-L12-H384-uncased

Model: microsoft/MiniLM-L12-H384-uncased

Task: Text classification

Libraries: PyTorch, TensorFlow, JAX, Transformers

Other: bert

Preprints: arxiv:2002.10957, arxiv:1810.04805

License: mit

MiniLM: Small and Fast Pre-trained Models for Language Understanding and Generation

MiniLM is a distilled model from the paper "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers".

For preprocessing, training, and full details of MiniLM, see the original MiniLM repository.

Please note: This checkpoint can be used as a drop-in replacement for BERT, but it needs to be fine-tuned before use!
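As a rough sketch of what loading the checkpoint in place of BERT might look like with the Transformers library listed above: the snippet assumes the `transformers` package (with a PyTorch backend) is installed and that the checkpoint can be downloaded under the model ID shown on this page. The helper name `load_for_classification` and its `num_labels` default are illustrative choices, not part of the original model card.

```python
MODEL_ID = "microsoft/MiniLM-L12-H384-uncased"


def load_for_classification(num_labels: int = 2):
    """Load the tokenizer and a sequence-classification model for fine-tuning.

    Hypothetical helper for illustration. The classification head is newly
    initialized, so the returned model has to be fine-tuned before its
    predictions mean anything.
    """
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID, num_labels=num_labels
    )
    return tokenizer, model
```

Calling `tokenizer, model = load_for_classification(num_labels=3)` would, under these assumptions, return objects that can be passed to an ordinary BERT fine-tuning loop; only the model ID differs from a BERT-Base setup.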

English Pre-trained Models

We release the uncased 12-layer model with 384 hidden size, distilled from an in-house pre-trained UniLM v2 model of BERT-Base size.

MiniLMv1-L12-H384-uncased: 12-layer, 384-hidden, 12-heads, 33M parameters, 2.7x faster than BERT-Base
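The 33M figure can be sanity-checked with a back-of-the-envelope count for a BERT-style encoder. The sketch below is not taken from the MiniLM repository. It assumes the usual BERT uncased defaults, none of which are stated on this page: a 30522-token vocabulary, 512 positions, 2 segment types, a feed-forward size of 4x the hidden size, and a pooler layer. The function name `bert_param_count` is hypothetical.

```python
def bert_param_count(hidden, layers, intermediate,
                     vocab=30522, max_positions=512, type_vocab=2):
    """Count weights and biases of a BERT-style encoder with a pooler.

    Default vocab/position/segment sizes are assumed BERT uncased values.
    """
    layer_norm = 2 * hidden  # scale + bias

    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm

    # Query, key, value and output projections, followed by one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    feed_forward = (
        hidden * intermediate + intermediate   # up-projection
        + intermediate * hidden + hidden       # down-projection
        + layer_norm
    )
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler


minilm = bert_param_count(hidden=384, layers=12, intermediate=1536)
bert_base = bert_param_count(hidden=768, layers=12, intermediate=3072)
print(f"MiniLM-L12-H384: {minilm / 1e6:.1f}M")
print(f"BERT-Base:       {bert_base / 1e6:.1f}M")
```

Under these assumptions the counts land close to the 33M and 109M reported on this page. The number of attention heads does not appear in the formula because splitting the projections into heads does not change their total size.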

Fine-tuning on NLU tasks

We present the dev results on SQuAD 2.0 and several GLUE benchmark tasks.

Model	#Param	SQuAD 2.0	MNLI-m	SST-2	QNLI	CoLA	RTE	MRPC	QQP
BERT-Base	109M	76.8	84.5	93.2	91.7	58.9	68.6	87.3	91.3
MiniLM-L12xH384	33M	81.7	85.7	93.0	91.5	58.5	73.3	89.5	91.3
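To read the table at a glance, the per-task gap between the two rows can be computed directly. This is only a small illustrative sketch: the scores are copied from the table above, and the variable names are arbitrary.

```python
TASKS = ["SQuAD 2.0", "MNLI-m", "SST-2", "QNLI", "CoLA", "RTE", "MRPC", "QQP"]
BERT_BASE = [76.8, 84.5, 93.2, 91.7, 58.9, 68.6, 87.3, 91.3]
MINILM = [81.7, 85.7, 93.0, 91.5, 58.5, 73.3, 89.5, 91.3]

# Positive values mean MiniLM-L12xH384 scores higher than BERT-Base.
deltas = {
    task: round(minilm - bert, 1)
    for task, bert, minilm in zip(TASKS, BERT_BASE, MINILM)
}

for task, delta in deltas.items():
    print(f"{task:<10} {delta:+.1f}")
```

The largest gaps in MiniLM's favor are on SQuAD 2.0 and RTE, while SST-2, QNLI, and CoLA are slightly lower than BERT-Base.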

Citation

If you find MiniLM useful in your research, please cite the following paper:

@misc{wang2020minilm,
    title={MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers},
    author={Wenhui Wang and Furu Wei and Li Dong and Hangbo Bao and Nan Yang and Ming Zhou},
    year={2020},
    eprint={2002.10957},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Author: Microsoft

Dataset size: 382.32 MB