
About RobBERTje

RobBERTje is a collection of distilled models based on RobBERT. The models differ in size and training settings, so you can choose the one that fits your use case.

We are also continuously working on releasing better-performing models, so watch the repository for updates.

News

  • February 21, 2022: Our paper about RobBERTje has been published in volume 11 of the CLIN journal!
  • July 2, 2021: Publicly released 4 RobBERTje models.
  • May 12, 2021: RobBERTje was accepted at CLIN31 for an oral presentation!

The models

| Model | Description | Parameters | Training size | Hugging Face ID |
| --- | --- | --- | --- | --- |
| Non-shuffled | Trained on the non-shuffled variant of the OSCAR corpus, without any operations to preserve this order during training and distillation. | 74 M | 1 GB | this model |
| Shuffled | Trained on the publicly available and shuffled OSCAR corpus. | 74 M | 1 GB | DTAI-KULeuven/robbertje-1-gb-shuffled |
| Merged (p=0.5) | Same as the non-shuffled variant, but sequential sentences of the same document are merged with a probability of 50%. | 74 M | 1 GB | DTAI-KULeuven/robbertje-1-gb-merged |
| BORT | A smaller version with 8 attention heads instead of 12 and 4 layers instead of 6 (RobBERT itself has 12 layers). | 46 M | 1 GB | DTAI-KULeuven/robbertje-1-gb-bort |
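
As a rough illustration of choosing a variant, the sketch below maps short variant names to the Hugging Face IDs listed above and loads one through the `transformers` fill-mask pipeline. The dictionary `ROBBERTJE_MODELS`, the helpers `load_fill_mask` and `demo`, and the Dutch example sentence are hypothetical names introduced here, and the sketch assumes the checkpoints ship with a masked-language-model head. The non-shuffled model is left out because its ID is not spelled out in the table.

```python
# Hugging Face IDs copied from the table above (the non-shuffled ID is not listed there).
ROBBERTJE_MODELS = {
    "shuffled": "DTAI-KULeuven/robbertje-1-gb-shuffled",
    "merged": "DTAI-KULeuven/robbertje-1-gb-merged",
    "bort": "DTAI-KULeuven/robbertje-1-gb-bort",
}


def load_fill_mask(variant="shuffled"):
    """Return a fill-mask pipeline for the chosen variant (downloads the weights)."""
    # Imported lazily so the mapping above can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=ROBBERTJE_MODELS[variant])


def demo(variant="shuffled"):
    """Print the top predictions for one masked Dutch sentence (illustrative only)."""
    fill_mask = load_fill_mask(variant)
    # Use the tokenizer's own mask token rather than hard-coding it.
    mask = fill_mask.tokenizer.mask_token
    for prediction in fill_mask(f"Er staat een {mask} in mijn tuin."):
        print(prediction["token_str"], round(prediction["score"], 3))
```

Calling `demo("bort")` would run the same sentence through the smaller model, which can be a quick way to compare the variants qualitatively before fine-tuning one.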

Results

Intrinsic results

We calculated the pseudo-perplexity (PPPL) from cite, which is a built-in metric in our distillation library. This metric indicates how well the model captures the input distribution; lower values are better.
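
To make the metric concrete, here is a minimal sketch of the usual corpus-level definition of pseudo-perplexity: each token is masked in turn, the model's log-probability for the original token is recorded, and PPPL is the exponential of the negative mean of those log-probabilities. This is an assumption about the general technique, not a copy of the distillation library's implementation; the function names `masked_token_log_probs` and `pseudo_perplexity` are hypothetical, and the first helper assumes a `transformers` tokenizer and masked-LM model that add one special token at each end of the sequence.

```python
import math
from typing import List, Sequence


def masked_token_log_probs(sentence: str, tokenizer, model) -> List[float]:
    """Mask each token in turn and return log P(original token | rest of sentence)."""
    import torch  # lazy import: only needed when scoring with a real model

    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    log_probs = []
    # Skip the first and last positions (assumed to be <s> and </s>).
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs.append(torch.log_softmax(logits, dim=-1)[ids[i]].item())
    return log_probs


def pseudo_perplexity(sentence_log_probs: Sequence[Sequence[float]]) -> float:
    """Corpus-level PPPL: exp of the negative mean log-probability over all tokens."""
    total = sum(sum(sentence) for sentence in sentence_log_probs)
    n_tokens = sum(len(sentence) for sentence in sentence_log_probs)
    if n_tokens == 0:
        raise ValueError("need at least one scored token")
    return math.exp(-total / n_tokens)
```

For example, if a model assigns probability 0.5 to every masked token, `pseudo_perplexity` returns 2, which matches the intuition that the model is choosing between two equally likely options at each position.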

| Model | PPPL |
| --- | --- |
| RobBERT (teacher) | 7.76 |
| Non-shuffled | 12.95 |
| Shuffled | 18.74 |
| Merged (p=0.5) | 17.10 |
| BORT | 26.44 |

Extrinsic results

We also evaluated our models on several downstream tasks, just like the teacher model RobBERT. Since that evaluation, SICK-NL, a Dutch NLI task, has been released, and we evaluated our models on it as well.

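| Model | DBRD | DIE-DAT | NER | POS | SICK-NL |
| --- | --- | --- | --- | --- | --- |
| RobBERT (teacher) | 94.4 | 99.2 | 89.1 | 96.4 | 84.2 |
| Non-shuffled | 90.2 | 98.4 | 82.9 | 95.5 | 83.4 |
| Shuffled | 92.5 | 98.2 | 82.7 | 95.6 | 83.4 |
| Merged (p=0.5) | 92.9 | 96.5 | 81.8 | 95.2 | 82.8 |
| BORT | 89.6 | 92.2 | 79.7 | 94.3 | 81.0 |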