Model:

KoboldAI/fairseq-dense-125M


This is a Hugging Face transformers-compatible conversion of the original dense 125M-parameter model from the paper "Efficient Large Scale Language Modeling with Mixtures of Experts" by Artetxe et al. Please refer to the original model card at https://github.com/facebookresearch/fairseq/blob/main/examples/moe_lm/model_card.md.
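
Since the checkpoint is transformers-compatible, it should load through the standard `AutoModelForCausalLM` / `AutoTokenizer` classes. Below is a minimal generation sketch; the prompt string and sampling settings are illustrative choices, not recommendations from the original model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the converted checkpoint through the standard Auto classes.
tokenizer = AutoTokenizer.from_pretrained("KoboldAI/fairseq-dense-125M")
model = AutoModelForCausalLM.from_pretrained("KoboldAI/fairseq-dense-125M")

# Illustrative prompt; the sampling parameters are arbitrary defaults,
# not tuned values from the original card.
inputs = tokenizer("Efficient language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```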