DeBERTa-v3-large-mnli

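Model description

This model was trained on the MultiNLI dataset, which contains 433k sentence pairs annotated with textual entailment information.

The base model is DeBERTa-v3-large from Microsoft. By using disentangled attention and an enhanced mask decoder, DeBERTa v3 outperforms BERT and RoBERTa on most NLU benchmarks. For more information about the original model, see the official repository and the paper.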

Intended uses and limitations

How to use the model

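import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder: replace with this model's full Hugging Face Hub ID (namespace/name).
model_name = "DeBERTa-v3-large-mnli"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

premise = "The Movie have been criticized for the story. However, I think it is a great movie."
hypothesis = "I liked the movie."

# Encode the premise/hypothesis pair and move every tensor to the target device.
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    output = model(**inputs)

# Turn the logits into class probabilities and print the most likely label.
prediction = torch.softmax(output["logits"][0], -1)
label_names = ["entailment", "neutral", "contradiction"]
print(label_names[prediction.argmax(0).tolist()])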
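
The last two steps of the snippet above (softmax over the logits, then argmax into the label list) can be illustrated without loading the model. The following is a minimal sketch in plain Python; the helper names `softmax` and `predict_label` and the example logits are hypothetical and are not part of the model's API.

```python
import math

# Label order taken from the usage snippet in this card.
LABEL_NAMES = ["entailment", "neutral", "contradiction"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, label_names=LABEL_NAMES):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return label_names[best], probs[best]


# Hypothetical logits, for illustration only (not real model output).
example_logits = [2.0, 0.5, -1.0]
label, prob = predict_label(example_logits)
print(label, round(prob, 3))
```

Returning the probability alongside the label makes it possible to apply a confidence threshold before accepting a prediction.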


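Training data

The model was trained on the MultiNLI training set, which contains 392K sentence pairs labeled for textual entailment.

Training procedure

DeBERTa-v3-large-mnli was trained with the Hugging Face Trainer using the following hyperparameters: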

train_args = TrainingArguments(
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    warmup_ratio=0.06,
    weight_decay=0.1,
    fp16=True,
    seed=42,
)
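
To give a sense of what these hyperparameters imply, the sketch below estimates the number of optimizer steps and warm-up steps. It rests on assumptions not stated in this card: a single device, no gradient accumulation, a final partial batch that is kept, and warm-up steps computed as the ceiling of `warmup_ratio` times the total steps. The function `training_schedule` is a hypothetical helper, and 392,000 is only an approximation of the "392K" figure.

```python
import math


def training_schedule(num_examples, batch_size, num_epochs, warmup_ratio):
    """Estimate (total_steps, warmup_steps).

    Assumes one device and no gradient accumulation, so each batch
    corresponds to exactly one optimizer step.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


# 392_000 approximates the "392K" training pairs quoted in this card.
total, warmup = training_schedule(
    num_examples=392_000, batch_size=8, num_epochs=3, warmup_ratio=0.06
)
print(f"total steps: {total}, warmup steps: {warmup}")
```

With several GPUs or gradient accumulation, the effective batch size grows and the step counts shrink accordingly.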

BibTeX entry and citation info

If you use this model, please cite the DeBERTa paper and the MultiNLI dataset, and include a reference to this Hugging Face Hub page.

Author:

Khalid Almubarak

Dataset size:

1.62 GB