Model:

alexandrainst/scandi-nli-large

Datasets:

strombergnlp/danfever KBLab/overlim MoritzLaurer/multilingual-NLI-26lang-2mil7

Other:

bert text-classification

License:

mit

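ScandiNLI - Natural Language Inference model for Scandinavian Languages

This model is a fine-tuned version of NbAiLab/nb-bert-large for Natural Language Inference in Danish, Norwegian Bokmål and Swedish.

We have released three models for Scandinavian NLI, of different sizes:

alexandrainst/scandi-nli-large (this model)
alexandrainst/scandi-nli-base
alexandrainst/scandi-nli-small

A demo of the large model can be found in this Hugging Face Space. Check it out!

The performance and model size of each model can be found in the Performance section below.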

Quick start

You can use this model in your own scripts as follows:

>>> from transformers import pipeline
>>> classifier = pipeline(
...     "zero-shot-classification",
...     model="alexandrainst/scandi-nli-large",
... )
>>> classifier(
...     "Mexicansk bokser advarer Messi - 'Du skal bede til gud, om at jeg ikke finder dig'",
...     candidate_labels=['sundhed', 'politik', 'sport', 'religion'],
...     hypothesis_template="Dette eksempel handler om {}",
... )
{'sequence': "Mexicansk bokser advarer Messi - 'Du skal bede til gud, om at jeg ikke finder dig'",
 'labels': ['sport', 'religion', 'politik', 'sundhed'],
 'scores': [0.6134647727012634,
  0.30309760570526123,
  0.05021871626377106,
  0.03321893885731697]}
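
To make the call above more concrete, the following sketch shows how the zero-shot pipeline is understood to use `hypothesis_template`: each candidate label is substituted into the template to form one hypothesis per label, and the input text serves as the premise. It also checks that the scores printed above sum to one, which is what the pipeline's default single-label mode yields by normalising the entailment scores across labels. The variable names are illustrative, and no model is loaded here.

```python
# Inputs copied from the quick start example above.
premise = (
    "Mexicansk bokser advarer Messi - "
    "'Du skal bede til gud, om at jeg ikke finder dig'"
)
candidate_labels = ["sundhed", "politik", "sport", "religion"]
hypothesis_template = "Dette eksempel handler om {}"

# One (premise, hypothesis) pair per candidate label; the NLI model scores
# how strongly the premise entails each hypothesis.
pairs = [(premise, hypothesis_template.format(label)) for label in candidate_labels]
for _, hypothesis in pairs:
    print(hypothesis)

# Scores copied from the output above, in the order the pipeline returned them.
scores = [
    0.6134647727012634,
    0.30309760570526123,
    0.05021871626377106,
    0.03321893885731697,
]
total = sum(scores)
print(abs(total - 1.0) < 1e-6)
```

Because the scores are normalised across the candidate labels, adding or removing a label changes every score, not only the score of that label.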

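Performance

We assess the models both on their aggregate Scandinavian performance and on their language-specific performance in Danish, Swedish and Norwegian Bokmål.

In all cases, we report the Matthews correlation coefficient (MCC), the macro-average F1-score and the accuracy.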
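
For readers unfamiliar with these three metrics, here is a minimal pure-Python sketch of their standard multi-class definitions. It is not the evaluation code used for the tables; the function name `nli_metrics` and the toy labels are made up for illustration.

```python
import math
from collections import Counter


def nli_metrics(y_true, y_pred):
    """Return (mcc, macro_f1, accuracy) for multi-class label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_correct = sum(correct.values())

    accuracy = n_correct / n

    # Macro-F1: unweighted mean of the per-class F1 scores,
    # where F1 = 2*TP / (number of true + number of predicted instances).
    f1_scores = []
    for label in labels:
        denom = true_counts[label] + pred_counts[label]
        f1_scores.append(2 * correct[label] / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / len(labels)

    # Multi-class MCC computed from the confusion-matrix marginals.
    cov_tp = n_correct * n - sum(true_counts[l] * pred_counts[l] for l in labels)
    cov_tt = n * n - sum(true_counts[l] ** 2 for l in labels)
    cov_pp = n * n - sum(pred_counts[l] ** 2 for l in labels)
    mcc = cov_tp / math.sqrt(cov_tt * cov_pp) if cov_tt and cov_pp else 0.0
    return mcc, macro_f1, accuracy


# Toy example (invented labels, not taken from any of the datasets).
y_true = ["entailment", "entailment", "neutral", "neutral", "contradiction", "contradiction"]
y_pred = ["entailment", "entailment", "neutral", "contradiction", "contradiction", "contradiction"]
mcc, macro_f1, accuracy = nli_metrics(y_true, y_pred)
print(f"MCC={mcc:.4f} Macro-F1={macro_f1:.4f} Accuracy={accuracy:.4f}")
```

In practice, scikit-learn's `matthews_corrcoef`, `f1_score(..., average="macro")` and `accuracy_score` compute the same quantities. Macro-F1 weights every class equally, so it can sit well below accuracy when a rare class is predicted poorly.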

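Scandinavian Evaluation

The Scandinavian scores are the average of the Danish, Swedish and Norwegian Bokmål scores, which can be found in the sections below.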

Model	MCC	Macro-F1	Accuracy	Number of Parameters
alexandrainst/scandi-nli-large (this)	73.70%	74.44%	83.91%	354M
1235321	69.01%	71.99%	80.66%	279M
1236321	67.42%	71.54%	80.09%	178M
1237321	64.17%	70.80%	77.29%	560M
1238321	63.94%	70.41%	77.23%	279M
1239321	61.71%	68.36%	76.08%	178M
12310321	56.02%	65.30%	73.56%	22M
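
As a worked check of the averaging, the sketch below recomputes the aggregate row for alexandrainst/scandi-nli-large from its per-language scores, which are copied from the tables in the sections that follow. It assumes a plain unweighted mean over the three languages.

```python
# (MCC, Macro-F1, Accuracy) in percent for alexandrainst/scandi-nli-large,
# copied from the per-language tables.
per_language = {
    "Danish": (73.80, 58.41, 86.98),
    "Swedish": (76.69, 84.47, 84.38),
    "Norwegian Bokmål": (70.61, 80.43, 80.36),
}

# Unweighted mean over the three languages, metric by metric.
scandinavian = tuple(
    round(sum(column) / len(column), 2) for column in zip(*per_language.values())
)
print(scandinavian)
```

The result matches the 73.70% MCC, 74.44% Macro-F1 and 83.91% accuracy reported for the large model in the table above.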

Danish Evaluation

We use the test split of the DanFEVER dataset to evaluate the Danish performance of the models.

The test split was generated using this gist.

Model	MCC	Macro-F1	Accuracy	Number of Parameters
alexandrainst/scandi-nli-large (this)	73.80%	58.41%	86.98%	354M
1235321	68.37%	57.10%	83.25%	279M
1236321	62.44%	55.00%	80.42%	178M
1239321	56.92%	53.25%	76.39%	178M
1238321	52.79%	52.00%	72.35%	279M
1237321	49.18%	50.31%	69.73%	560M
12310321	47.28%	48.88%	73.46%	22M

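Swedish Evaluation

We use the test split of the machine-translated version of the MultiNLI dataset to evaluate the Swedish performance of the models.

We acknowledge that not evaluating on a gold-standard dataset is not ideal, but unfortunately we are not aware of any NLI datasets in Swedish.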

Model	MCC	Macro-F1	Accuracy	Number of Parameters
alexandrainst/scandi-nli-large (this)	76.69%	84.47%	84.38%	354M
1237321	75.35%	83.42%	83.55%	560M
1238321	73.84%	82.46%	82.58%	279M
1235321	73.32%	82.15%	82.08%	279M
1236321	72.29%	81.37%	81.51%	178M
1239321	64.69%	76.40%	76.47%	178M
12310321	62.35%	74.79%	74.93%	22M

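Norwegian Bokmål Evaluation

We use the test split of the machine-translated version of the MultiNLI dataset to evaluate the Norwegian performance of the models.

We acknowledge that not evaluating on a gold-standard dataset is not ideal, but unfortunately we are not aware of any NLI datasets in Norwegian.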

Model	MCC	Macro-F1	Accuracy	Number of Parameters
alexandrainst/scandi-nli-large (this)	70.61%	80.43%	80.36%	354M
1237321	67.99%	78.68%	78.60%	560M
1236321	67.53%	78.24%	78.33%	178M
1235321	65.33%	76.73%	76.65%	279M
1238321	65.18%	76.76%	76.77%	279M
1239321	63.51%	75.42%	75.39%	178M
12310321	58.42%	72.22%	72.30%	22M

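Training procedure

The model was fine-tuned on a dataset composed of DanFEVER, machine-translated versions of MultiNLI and CommitmentBank in all three languages, and machine-translated versions of FEVER and Adversarial NLI in Swedish.

The training split of DanFEVER was generated using this gist.

The three languages are sampled equally during training. The models are validated on the validation splits of DanFEVER and of the machine-translated versions of MultiNLI for Swedish and Norwegian Bokmål, also sampled equally.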
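
One plausible reading of "sampled equally" is that the language of each training example is drawn uniformly at random. The sketch below illustrates that reading only; it is not the authors' data pipeline, and the language codes, the function name `sample_languages` and the seed are assumptions made for the example.

```python
import random
from collections import Counter

LANGUAGES = ["da", "sv", "nb"]  # Danish, Swedish, Norwegian Bokmål


def sample_languages(num_draws, seed=0):
    """Draw the language of each training example uniformly at random."""
    rng = random.Random(seed)
    return [rng.choice(LANGUAGES) for _ in range(num_draws)]


draws = sample_languages(30_000)
shares = {lang: count / len(draws) for lang, count in Counter(draws).items()}
print(shares)  # each share should be close to one third
```

With Hugging Face Datasets, a comparable effect can be obtained by passing equal `probabilities` to `interleave_datasets`; whether the training code does this is not stated here.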

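Check out the GitHub repository for the code used to train the ScandiNLI models. The full training logs can be found in this Weights and Biases report.

Training hyperparameters

The following hyperparameters were used during training: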

learning_rate: 2e-05
train_batch_size: 2
eval_batch_size: 2
seed: 4242
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
max_steps: 50,000
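
Two of the values above follow from the others. The sketch below shows how `total_train_batch_size` relates to the per-device batch size and the gradient accumulation steps, and what a linear schedule with 500 warmup steps looks like over 50,000 steps. It assumes a single training device and the usual shape of a linear schedule (linear warmup from zero, then linear decay to zero, as in Transformers' `get_linear_schedule_with_warmup`); the function name `linear_lr` is illustrative.

```python
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_STEPS = 500
MAX_STEPS = 50_000

# Examples contributing to each optimizer update (single device assumed).
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def linear_lr(step):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - step)
    return LEARNING_RATE * remaining / (MAX_STEPS - WARMUP_STEPS)


print(total_train_batch_size)
for step in (0, 250, 500, 25_250, 50_000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 2e-05 after step 500 and has fallen to half of that by step 25,250, the midpoint of the decay phase.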

Author:

Alexandra Institute

Dataset size:

2.65 GB