# UWB-AIR/Czert-B-base-cased-long-zero-shot
This repository contains the trained Czert-B-base-cased-long-zero-shot model for the paper *Czert – Czech BERT-like Model for Language Representation*. For more information, see the paper.
This is the long version of Czert-B-base-cased, created without any fine-tuning on long documents. The positional embeddings were created by simply repeating the positional embeddings of the original Czert-B model. For tokenization, please use BertTokenizer; the model cannot be used with AutoTokenizer.
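A minimal loading sketch in Python, assuming the checkpoint works with the standard transformers AutoModel API (the example sentence is illustrative):

```python
from transformers import BertTokenizer, AutoModel

# Load the tokenizer explicitly with BertTokenizer; AutoTokenizer cannot
# resolve the tokenizer for this repository.
tokenizer = BertTokenizer.from_pretrained("UWB-AIR/Czert-B-base-cased-long-zero-shot")
model = AutoModel.from_pretrained("UWB-AIR/Czert-B-base-cased-long-zero-shot")

# Encode a (potentially long) Czech text and run a forward pass.
inputs = tokenizer("Ukázková česká věta.", return_tensors="pt", truncation=True)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```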
You can also download the MLM & NSP only pretrained models: CZERT-A-v1, CZERT-B-v1.
After some additional experiments, we found out that the tokenizer configs were exported incorrectly. In Czert-B-v1, the tokenizer parameter "do_lower_case" was wrongly set to true. In Czert-A-v1, the parameter "strip_accents" was incorrectly set to true.
Both mistakes are fixed in v2: CZERT-A-v2, CZERT-B-v2.
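If you still need the v1 checkpoints, the mis-exported flags can be overridden when loading the tokenizer. This is a hedged sketch; the repository paths below are placeholders, not confirmed IDs:

```python
from transformers import BertTokenizer

# Czert-B-v1: "do_lower_case" was wrongly exported as true, so force it off.
tokenizer_b = BertTokenizer.from_pretrained("path/to/CZERT-B-v1", do_lower_case=False)

# Czert-A-v1: "strip_accents" was wrongly exported as true, so force it off.
tokenizer_a = BertTokenizer.from_pretrained("path/to/CZERT-A-v1", strip_accents=False)
```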
Or choose one of the fine-tuned models (a usage sketch follows the table):
| Task | Models |
|---|---|
| Sentiment Classification (Facebook or CSFD) | CZERT-A-sentiment-FB, CZERT-B-sentiment-FB, CZERT-A-sentiment-CSFD, CZERT-B-sentiment-CSFD |
| Named Entity Recognition | CZERT-A-ner-CNEC, CZERT-B-ner-CNEC, PAV-ner-CNEC, CZERT-A-ner-BSNLP, CZERT-B-ner-BSNLP, PAV-ner-BSNLP |
| Morphological Tagging | CZERT-A-morphtag-126k, CZERT-B-morphtag-126k |
| Semantic Role Labelling | CZERT-A-srl, CZERT-B-srl |
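For illustration, a fine-tuned sentiment checkpoint could be used roughly as follows. This is a sketch under assumptions: the repository path and the sequence-classification head are not confirmed by this card, and the label mapping depends on the checkpoint.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Assumed (hypothetical) repository path for the CSFD sentiment checkpoint.
repo = "UWB-AIR/CZERT-B-sentiment-CSFD"
tokenizer = BertTokenizer.from_pretrained(repo)
model = BertForSequenceClassification.from_pretrained(repo)

# Classify a short Czech movie review.
inputs = tokenizer("Ten film byl skvělý!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # index into the checkpoint's label set
```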
We evaluate our model on two sentence-level tasks:

* Sentiment Classification,
* Semantic Text Similarity.

We evaluate our model on one document-level task:

* Multi-label Document Classification.

We evaluate our model on three token-level tasks:

* Named Entity Recognition,
* Morphological Tagging,
* Semantic Role Labelling.
| | mBERT | SlavicBERT | ALBERT-r | Czert-A | Czert-B |
|---|---|---|---|---|---|
| FB | 71.72 ± 0.91 | 73.87 ± 0.50 | 59.50 ± 0.47 | 72.47 ± 0.72 | 76.55 ± 0.14 |
| CSFD | 82.80 ± 0.14 | 82.51 ± 0.14 | 75.40 ± 0.18 | 79.58 ± 0.46 | 84.79 ± 0.26 |

Average F1 results for the Sentiment Classification task. For more information, see the paper.
| | mBERT | Pavlov | Albert-random | Czert-A | Czert-B |
|---|---|---|---|---|---|
| STA-CNA | 83.335 ± 0.063 | 83.593 ± 0.050 | 43.184 ± 0.125 | 82.942 ± 0.106 | 84.345 ± 0.028 |
| STS-SVOB-img | 79.367 ± 0.486 | 79.900 ± 0.810 | 15.739 ± 2.992 | 79.444 ± 0.338 | 83.744 ± 0.395 |
| STS-SVOB-hl | 78.833 ± 0.296 | 76.996 ± 0.305 | 33.949 ± 1.807 | 75.089 ± 0.806 | 79.827 ± 0.469 |

Comparison of Pearson correlation achieved using pre-trained CZERT-A, CZERT-B, mBERT, Pavlov and randomly initialised Albert on semantic text similarity. For more information, see the paper.
| | mBERT | SlavicBERT | ALBERT-r | Czert-A | Czert-B |
|---|---|---|---|---|---|
| AUROC | 97.62 ± 0.08 | 97.80 ± 0.06 | 94.35 ± 0.13 | 97.49 ± 0.07 | 98.00 ± 0.04 |
| F1 | 83.04 ± 0.16 | 84.08 ± 0.14 | 72.44 ± 0.22 | 82.27 ± 0.17 | 85.06 ± 0.11 |

Comparison of F1 and AUROC scores achieved using pre-trained CZERT-A, CZERT-B, mBERT, Pavlov and randomly initialised Albert on multi-label document classification. For more information, see the paper.
| | mBERT | Pavlov | Albert-random | Czert-A | Czert-B |
|---|---|---|---|---|---|
| Universal Dependencies | 99.176 ± 0.006 | 99.211 ± 0.008 | 96.590 ± 0.096 | 98.713 ± 0.008 | 99.300 ± 0.009 |

Comparison of F1 score achieved using pre-trained CZERT-A, CZERT-B, mBERT, Pavlov and randomly initialised Albert on the morphological tagging task. For more information, see the paper.
| | mBERT | Pavlov | Albert-random | Czert-A | Czert-B | dep-based | gold-dep |
|---|---|---|---|---|---|---|---|
| span | 78.547 ± 0.110 | 79.333 ± 0.080 | 51.365 ± 0.423 | 72.254 ± 0.172 | 81.861 ± 0.102 | - | - |
| syntax | 90.226 ± 0.224 | 90.492 ± 0.040 | 80.747 ± 0.131 | 80.319 ± 0.054 | 91.462 ± 0.062 | 85.19 | 89.52 |

SRL results – the dep columns are evaluated with the labelled F1 score from the CoNLL 2009 evaluation script; the other columns are evaluated with the span F1 score, the same as used for the NER evaluation. For more information, see the paper.
| | mBERT | Pavlov | Albert-random | Czert-A | Czert-B |
|---|---|---|---|---|---|
| CNEC | 86.225 ± 0.208 | 86.565 ± 0.198 | 34.635 ± 0.343 | 72.945 ± 0.227 | 86.274 ± 0.116 |
| BSNLP 2019 | 84.006 ± 1.248 | 86.699 ± 0.370 | 19.773 ± 0.938 | 48.859 ± 0.605 | 86.729 ± 0.344 |

Comparison of F1 score achieved using pre-trained CZERT-A, CZERT-B, mBERT, Pavlov and randomly initialised Albert on the named entity recognition task. For more information, see the paper.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/
For now, please cite the arXiv paper:
```bibtex
@article{sido2021czert,
      title={Czert -- Czech BERT-like Model for Language Representation},
      author={Jakub Sido and Ondřej Pražák and Pavel Přibáň and Jan Pašek and Michal Seják and Miloslav Konopík},
      year={2021},
      eprint={2103.13031},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      journal={arXiv preprint arXiv:2103.13031},
}
```