Model:

alexandrainst/scandi-nli-large

Datasets:

strombergnlp/danfever, KBLab/overlim, MoritzLaurer/multilingual-NLI-26lang-2mil7

Other:

bert, text classification

License:

mit

ScandiNLI - Natural Language Inference model for Scandinavian Languages

This model is a fine-tuned version of NbAiLab/nb-bert-large for Natural Language Inference in Danish, Norwegian Bokmål and Swedish.

We have released three models for Scandinavian NLI, of different sizes:

alexandrainst/scandi-nli-large (this)
alexandrainst/scandi-nli-base
alexandrainst/scandi-nli-small

A demo of the large model can be found in this Hugging Face Space - check it out!

The performance and model size of each of them can be found in the Performance section below.

Quick start

You can use this model in your scripts as follows:

>>> from transformers import pipeline
>>> classifier = pipeline(
...     "zero-shot-classification",
...     model="alexandrainst/scandi-nli-large",
... )
>>> classifier(
...     "Mexicansk bokser advarer Messi - 'Du skal bede til gud, om at jeg ikke finder dig'",
...     candidate_labels=['sundhed', 'politik', 'sport', 'religion'],
...     hypothesis_template="Dette eksempel handler om {}",
... )
{'sequence': "Mexicansk bokser advarer Messi - 'Du skal bede til gud, om at jeg ikke finder dig'",
 'labels': ['sport', 'religion', 'politik', 'sundhed'],
 'scores': [0.6134647727012634,
  0.30309760570526123,
  0.05021871626377106,
  0.03321893885731697]}
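
To make the output above easier to interpret, here is a minimal sketch of what the zero-shot pipeline does in its default single-label mode, as far as we understand it: each candidate label is inserted into the hypothesis template, the NLI model scores every (text, hypothesis) pair, and the entailment logits are normalised with a softmax across the labels. The logits below are made-up numbers for illustration, not outputs of this model, and `build_hypotheses` and `rank_labels` are hypothetical helper names.

```python
import math


def build_hypotheses(labels, template):
    """Insert each candidate label into the hypothesis template."""
    return [template.format(label) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax the entailment logits across labels and sort by score."""
    max_logit = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - max_logit) for logit in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


labels = ["sundhed", "politik", "sport", "religion"]
hypotheses = build_hypotheses(labels, "Dette eksempel handler om {}")

# Made-up entailment logits, one per hypothesis (illustration only).
toy_logits = [-1.2, -0.8, 1.7, 1.0]
ranking = rank_labels(labels, toy_logits)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

Because the softmax runs across the labels, the scores always sum to one, which is also true of the pipeline output shown above.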

Performance

We assess the models on both their aggregate Scandinavian performance and their language-specific Danish, Swedish and Norwegian Bokmål performance.

In all cases, we report the Matthews Correlation Coefficient (MCC), the macro-average F1-score and the accuracy.
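
For readers unfamiliar with these metrics, the following is a minimal pure-Python sketch of how they can be computed for a multiclass task such as NLI. It uses the standard multiclass generalisation of MCC; it is not the evaluation code behind the tables below, and the toy labels are invented for illustration.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        actual = sum(t == c for t in y_true)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    classes = set(true_counts) | set(pred_counts)
    cov = correct * n - sum(true_counts[c] * pred_counts[c] for c in classes)
    var_pred = n * n - sum(v * v for v in pred_counts.values())
    var_true = n * n - sum(v * v for v in true_counts.values())
    denom = math.sqrt(var_pred * var_true)
    return cov / denom if denom else 0.0


# Invented toy labels, two examples per class.
y_true = ["entailment", "entailment", "neutral", "neutral", "contradiction", "contradiction"]
y_pred = ["entailment", "entailment", "neutral", "contradiction", "contradiction", "contradiction"]
print(f"MCC={mcc(y_true, y_pred):.4f}")
print(f"Macro-F1={macro_f1(y_true, y_pred):.4f}")
print(f"Accuracy={accuracy(y_true, y_pred):.4f}")
```

Macro-F1 weights every class equally and MCC accounts for the full confusion matrix, so both can differ noticeably from accuracy.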

Scandinavian Evaluation

The Scandinavian scores are the average of the Danish, Swedish and Norwegian scores, which can be found in the sections below.

Model	MCC	Macro-F1	Accuracy	Number of Parameters
alexandrainst/scandi-nli-large (this)	73.70%	74.44%	83.91%	354M
MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7	69.01%	71.99%	80.66%	279M
alexandrainst/scandi-nli-base	67.42%	71.54%	80.09%	178M
joeddav/xlm-roberta-large-xnli	64.17%	70.80%	77.29%	560M
MoritzLaurer/mDeBERTa-v3-base-mnli-xnli	63.94%	70.41%	77.23%	279M
NbAiLab/nb-bert-base-mnli	61.71%	68.36%	76.08%	178M
alexandrainst/scandi-nli-small	56.02%	65.30%	73.56%	22M
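
As a quick sanity check of the averaging described above, the sketch below recomputes the Scandinavian scores of alexandrainst/scandi-nli-large from its Danish, Swedish and Norwegian scores as reported in this model card. The helper name `scandinavian_average` is ours.

```python
# Scores (MCC, macro-F1, accuracy) of alexandrainst/scandi-nli-large,
# copied from the language-specific tables in this model card.
per_language = {
    "Danish": (73.80, 58.41, 86.98),
    "Swedish": (76.69, 84.47, 84.38),
    "Norwegian": (70.61, 80.43, 80.36),
}


def scandinavian_average(scores):
    """Average each metric over the languages, rounded to two decimals."""
    columns = zip(*scores.values())
    return tuple(round(sum(col) / len(scores), 2) for col in columns)


mcc_avg, f1_avg, acc_avg = scandinavian_average(per_language)
print(f"MCC={mcc_avg:.2f}%  Macro-F1={f1_avg:.2f}%  Accuracy={acc_avg:.2f}%")
```

The result agrees with the first row of the table above (73.70%, 74.44%, 83.91%).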

Danish Evaluation

We use a test split of the DanFEVER dataset to evaluate the Danish performance of the models.

The test split is generated using this gist.

Model	MCC	Macro-F1	Accuracy	Number of Parameters
alexandrainst/scandi-nli-large (this)	73.80%	58.41%	86.98%	354M
MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7	68.37%	57.10%	83.25%	279M
alexandrainst/scandi-nli-base	62.44%	55.00%	80.42%	178M
NbAiLab/nb-bert-base-mnli	56.92%	53.25%	76.39%	178M
MoritzLaurer/mDeBERTa-v3-base-mnli-xnli	52.79%	52.00%	72.35%	279M
joeddav/xlm-roberta-large-xnli	49.18%	50.31%	69.73%	560M
alexandrainst/scandi-nli-small	47.28%	48.88%	73.46%	22M

Swedish Evaluation

We use the test split of the machine translated version of the MultiNLI dataset to evaluate the Swedish performance of the models.

We acknowledge that evaluating on a machine translated dataset rather than a gold standard one is not ideal, but unfortunately we are not aware of any NLI datasets in Swedish.

Model	MCC	Macro-F1	Accuracy	Number of Parameters
alexandrainst/scandi-nli-large (this)	76.69%	84.47%	84.38%	354M
joeddav/xlm-roberta-large-xnli	75.35%	83.42%	83.55%	560M
MoritzLaurer/mDeBERTa-v3-base-mnli-xnli	73.84%	82.46%	82.58%	279M
MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7	73.32%	82.15%	82.08%	279M
alexandrainst/scandi-nli-base	72.29%	81.37%	81.51%	178M
NbAiLab/nb-bert-base-mnli	64.69%	76.40%	76.47%	178M
alexandrainst/scandi-nli-small	62.35%	74.79%	74.93%	22M

Norwegian Evaluation

We use the test split of the machine translated version of the MultiNLI dataset to evaluate the Norwegian performance of the models.

We acknowledge that evaluating on a machine translated dataset rather than a gold standard one is not ideal, but unfortunately we are not aware of any NLI datasets in Norwegian.

Model	MCC	Macro-F1	Accuracy	Number of Parameters
alexandrainst/scandi-nli-large (this)	70.61%	80.43%	80.36%	354M
joeddav/xlm-roberta-large-xnli	67.99%	78.68%	78.60%	560M
alexandrainst/scandi-nli-base	67.53%	78.24%	78.33%	178M
MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7	65.33%	76.73%	76.65%	279M
MoritzLaurer/mDeBERTa-v3-base-mnli-xnli	65.18%	76.76%	76.77%	279M
NbAiLab/nb-bert-base-mnli	63.51%	75.42%	75.39%	178M
alexandrainst/scandi-nli-small	58.42%	72.22%	72.30%	22M

Training procedure

The model has been fine-tuned on a dataset composed of DanFEVER, machine translated versions of MultiNLI and CommitmentBank in all three languages, and machine translated versions of FEVER and Adversarial NLI in Swedish.

The training split of DanFEVER is generated using this gist.

The three languages are sampled equally during training. The models are validated on the validation split of DanFEVER for Danish and on the validation splits of the machine translated versions of MultiNLI for Swedish and Norwegian Bokmål, again sampled equally.
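
The sketch below illustrates one plausible reading of "sampled equally", namely that each training example is drawn from a language chosen uniformly at random, regardless of how large that language's dataset is. This is an assumption on our part and not the actual training code (see the repository mentioned below for that); `sample_equally` and the toy data are hypothetical.

```python
import itertools
import random


def sample_equally(datasets_by_language, num_samples, seed=4242):
    """Yield (language, example) pairs, picking the language uniformly at random."""
    rng = random.Random(seed)
    languages = sorted(datasets_by_language)
    # Cycle through each dataset so that smaller ones are simply revisited.
    iterators = {
        lang: itertools.cycle(datasets_by_language[lang]) for lang in languages
    }
    for _ in range(num_samples):
        lang = rng.choice(languages)
        yield lang, next(iterators[lang])


# Toy datasets of different sizes (illustration only).
toy_data = {
    "da": ["da-0", "da-1"],
    "sv": ["sv-0", "sv-1", "sv-2", "sv-3", "sv-4"],
    "nb": ["nb-0", "nb-1", "nb-2"],
}
counts = {lang: 0 for lang in toy_data}
for lang, _ in sample_equally(toy_data, num_samples=3000):
    counts[lang] += 1
print(counts)
```

Each language ends up with roughly a third of the samples even though the toy datasets differ in size.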

Check out the GitHub repository for the code used to train the ScandiNLI models. The full training logs can be found in this Weights and Biases report.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 2
eval_batch_size: 2
seed: 4242
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
max_steps: 50,000
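
To show how these values fit together, here is a small sketch that derives the total batch size and traces the learning rate over training. It assumes a single device and the usual linear schedule with warmup (linear increase from zero over the warmup steps, then linear decay to zero at `max_steps`); the function `linear_schedule_lr` is a hypothetical helper and not part of the training code.

```python
train_batch_size = 2
gradient_accumulation_steps = 16
learning_rate = 2e-05
warmup_steps = 500
max_steps = 50_000

# Effective batch size per optimizer step, assuming a single device.
total_train_batch_size = train_batch_size * gradient_accumulation_steps


def linear_schedule_lr(step):
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, max_steps - step)
    return learning_rate * remaining / (max_steps - warmup_steps)


print("total_train_batch_size:", total_train_batch_size)
for step in (0, 250, 500, 25_250, 50_000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at 2e-05 after 500 steps and returns to zero at step 50,000.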

Author:

Alexandra Institute

Dataset size:

2.65 GB