This is a fine-tuned model based on RobBERT (v2). We used DBRD, which consists of book reviews from hebban.nl, hence our example sentences are about books. We ran some limited experiments to test whether the model also works for other domains, but the results were underwhelming.
We released both a distilled model and a base-sized model. Both perform well, with only a slight accuracy trade-off:
| Model | Identifier | Layers | #Params. | Accuracy (%) |
|---|---|---|---|---|
| RobBERT (v2) | DTAI-KULeuven/robbert-v2-dutch-sentiment | 12 | 116M | 93.3* |
| RobBERTje - Merged (p=0.5) | DTAI-KULeuven/robbertje-merged-dutch-sentiment | 6 | 74M | 92.9 |
*The RobBERT results are from a different run than the one reported in the paper.
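Both models can be loaded through the Hugging Face `transformers` pipeline. The sketch below is illustrative and not part of the original card; the example sentence is made up, and the exact label names in the output depend on the model's configuration.

```python
# Minimal sketch: running the sentiment model with the transformers pipeline.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="DTAI-KULeuven/robbert-v2-dutch-sentiment",
)

# Returns a list with one dict per input, each holding a label and a confidence score.
print(classifier("Dit boek was fantastisch, ik heb het in één ruk uitgelezen!"))
```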
We used the Dutch Book Reviews Dataset (DBRD) from van der Burgh et al. (2019). Originally, these reviews carried a five-star rating, which we converted to positive (⭐️⭐️⭐️⭐️ and ⭐️⭐️⭐️⭐️⭐️), neutral (⭐️⭐️⭐️) and negative (⭐️ and ⭐️⭐️). We used 19.5k reviews for the training set, 528 reviews for the validation set and 2,224 reviews to calculate the final accuracy.
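As an illustration of this label conversion, a hypothetical helper (not the actual preprocessing script) could map the star ratings to the three classes as follows:

```python
# Hypothetical helper illustrating the star-to-label mapping described above;
# the real preprocessing code is not included in this card.
def stars_to_label(stars: int) -> str:
    if stars <= 2:       # ⭐️ and ⭐️⭐️
        return "negative"
    if stars == 3:       # ⭐️⭐️⭐️
        return "neutral"
    return "positive"    # ⭐️⭐️⭐️⭐️ and ⭐️⭐️⭐️⭐️⭐️
```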
We used the validation set to run a random hyperparameter search over the learning rate, weight decay and gradient accumulation steps. The full training details are available in training_args.bin as a binary PyTorch file.
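The training_args.bin file is the `TrainingArguments` object serialized by the Hugging Face Trainer and can be inspected with PyTorch; a minimal sketch, assuming the file sits in the repository root:

```python
# Minimal sketch: inspecting the saved training arguments.
# Newer PyTorch versions may require weights_only=False to unpickle this object.
import torch

training_args = torch.load("training_args.bin", weights_only=False)
print(training_args.learning_rate)
print(training_args.weight_decay)
print(training_args.gradient_accumulation_steps)
```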
This project was created by Pieter Delobelle, Thomas Winters and Bettina Berendt. If you would like to cite our paper or models, you can use the following BibTeX:
```bibtex
@inproceedings{delobelle2020robbert,
    title     = "{R}ob{BERT}: a {D}utch {R}o{BERT}a-based {L}anguage {M}odel",
    author    = "Delobelle, Pieter and Winters, Thomas and Berendt, Bettina",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month     = nov,
    year      = "2020",
    address   = "Online",
    publisher = "Association for Computational Linguistics",
    url       = "https://www.aclweb.org/anthology/2020.findings-emnlp.292",
    doi       = "10.18653/v1/2020.findings-emnlp.292",
    pages     = "3255--3265"
}
```