Model:
elenanereiss/bert-german-ler
This model is a fine-tuned version of bert-base-german-cased, trained on the German LER Dataset.
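As a rough illustration of how a fine-tuned token-classification model like this one is typically used, the sketch below loads it through the `transformers` `pipeline` API. The helper names `load_ner_pipeline` and `summarize` and the example sentence are assumptions made for illustration; they do not come from the model card.

```python
from typing import Any

MODEL_ID = "elenanereiss/bert-german-ler"  # model name as given in this card


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so that summarize() works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize(entities: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Reduce aggregated pipeline output to (entity class, surface text) pairs."""
    return [(e["entity_group"], e["word"]) for e in entities]


# Usage (hypothetical sentence; requires network access for the download):
# ner = load_ner_pipeline()
# print(summarize(ner("Das Bundesverfassungsgericht hat am 1. Januar 2020 entschieden.")))
```

The `entity_group` values should correspond to the class abbreviations listed in the distribution table (for example `GRT` for courts), assuming the model's label names follow the dataset's tag set.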
Distribution of classes in the dataset:
No. | Tag | Fine-grained class | # | % |
---|---|---|---|---|
1 | PER | Person | 1,747 | 3.26 |
2 | RR | Judge | 1,519 | 2.83 |
3 | AN | Lawyer | 111 | 0.21 |
4 | LD | Country | 1,429 | 2.66 |
5 | ST | City | 705 | 1.31 |
6 | STR | Street | 136 | 0.25 |
7 | LDS | Landscape | 198 | 0.37 |
8 | ORG | Organization | 1,166 | 2.17 |
9 | UN | Company | 1,058 | 1.97 |
10 | INN | Institution | 2,196 | 4.09 |
11 | GRT | Court | 3,212 | 5.99 |
12 | MRK | Brand | 283 | 0.53 |
13 | GS | Law | 18,520 | 34.53 |
14 | VO | Ordinance | 797 | 1.49 |
15 | EUN | European legal norm | 1,499 | 2.79 |
16 | VS | Regulation | 607 | 1.13 |
17 | VT | Contract | 2,863 | 5.34 |
18 | RS | Court decision | 12,580 | 23.46 |
19 | LIT | Legal literature | 3,006 | 5.60 |
Total | | | 53,632 | 100 |
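The percentage column can be rechecked from the raw counts. The sketch below recomputes each class share. It assumes the GS and RS counts are 18,520 and 12,580, which are the values consistent with the stated total of 53,632 and with the listed percentages.

```python
# Entity counts per fine-grained class, taken from the table above.
# Assumption: GS = 18,520 and RS = 12,580 (values consistent with the total).
COUNTS = {
    "PER": 1747, "RR": 1519, "AN": 111, "LD": 1429, "ST": 705,
    "STR": 136, "LDS": 198, "ORG": 1166, "UN": 1058, "INN": 2196,
    "GRT": 3212, "MRK": 283, "GS": 18520, "VO": 797, "EUN": 1499,
    "VS": 607, "VT": 2863, "RS": 12580, "LIT": 3006,
}

total = sum(COUNTS.values())

# Share of each class in percent, rounded to two decimals like the table.
percentages = {tag: round(100 * n / total, 2) for tag, n in COUNTS.items()}

for tag, pct in sorted(percentages.items(), key=lambda item: -item[1]):
    print(f"{tag:<4} {COUNTS[tag]:>6} {pct:>6.2f}")
print(f"Total {total}")
```

The two largest classes, GS (laws) and RS (court decisions), together account for well over half of all annotated entities. That imbalance is worth keeping in mind when reading per-class scores for rare classes such as AN or STR.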
To fine-tune another model on the German LER Dataset, see GitHub.
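One step that fine-tuning a BERT-style model for NER usually involves is aligning word-level labels with subword tokens. The sketch below shows a common convention: only the first subword of each word is labelled, and all other positions are masked. It is a generic illustration, not necessarily what the linked repository does. The function name `align_labels` is hypothetical, and `word_ids` is assumed to have the shape returned by a fast tokenizer's `word_ids()` method.

```python
from typing import Optional

# Label id that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def align_labels(word_ids: list[Optional[int]], word_labels: list[int]) -> list[int]:
    """Map one label per word onto one label per subword token.

    word_ids holds, for each token, the index of the word it belongs to,
    or None for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens never contribute to the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subword of a word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later subwords of the same word are masked out.
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned


# Three words; the second is split into two subwords.
print(align_labels([None, 0, 1, 1, 2, None], [5, 0, 7]))
```

Masking continuation subwords keeps the number of scored positions equal to the number of words, so token-level predictions map back to the word-level annotations.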
The following hyperparameters were used during training:
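Evaluation results (per-class precision, recall, F1-score, and support):

Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
AN | 0.75 | 0.50 | 0.60 | 12 |
EUN | 0.92 | 0.93 | 0.92 | 116 |
GRT | 0.95 | 0.99 | 0.97 | 331 |
GS | 0.98 | 0.98 | 0.98 | 1720 |
INN | 0.84 | 0.91 | 0.88 | 199 |
LD | 0.95 | 0.95 | 0.95 | 109 |
LDS | 0.82 | 0.43 | 0.56 | 21 |
LIT | 0.88 | 0.92 | 0.90 | 231 |
MRK | 0.50 | 0.70 | 0.58 | 23 |
ORG | 0.64 | 0.71 | 0.67 | 103 |
PER | 0.86 | 0.93 | 0.90 | 186 |
RR | 0.97 | 0.98 | 0.97 | 144 |
RS | 0.94 | 0.95 | 0.94 | 1126 |
ST | 0.91 | 0.88 | 0.89 | 58 |
STR | 0.29 | 0.29 | 0.29 | 7 |
UN | 0.81 | 0.85 | 0.83 | 143 |
VO | 0.76 | 0.95 | 0.84 | 37 |
VS | 0.62 | 0.80 | 0.70 | 56 |
VT | 0.87 | 0.92 | 0.90 | 275 |
micro avg | 0.92 | 0.94 | 0.93 | 4897 |
macro avg | 0.80 | 0.82 | 0.80 | 4897 |
weighted avg | 0.92 | 0.94 | 0.93 | 4897 |

Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
AN | 1.00 | 0.89 | 0.94 | 9 |
EUN | 0.90 | 0.97 | 0.93 | 150 |
GRT | 0.98 | 0.98 | 0.98 | 321 |
GS | 0.98 | 0.99 | 0.98 | 1818 |
INN | 0.90 | 0.95 | 0.92 | 222 |
LD | 0.97 | 0.92 | 0.94 | 149 |
LDS | 0.91 | 0.45 | 0.61 | 22 |
LIT | 0.92 | 0.96 | 0.94 | 314 |
MRK | 0.78 | 0.88 | 0.82 | 32 |
ORG | 0.82 | 0.88 | 0.85 | 113 |
PER | 0.92 | 0.88 | 0.90 | 173 |
RR | 0.95 | 0.99 | 0.97 | 142 |
RS | 0.97 | 0.98 | 0.97 | 1245 |
ST | 0.79 | 0.86 | 0.82 | 64 |
STR | 0.75 | 0.80 | 0.77 | 15 |
UN | 0.90 | 0.95 | 0.93 | 108 |
VO | 0.80 | 0.83 | 0.81 | 71 |
VS | 0.73 | 0.84 | 0.78 | 64 |
VT | 0.93 | 0.97 | 0.95 | 290 |
micro avg | 0.94 | 0.96 | 0.95 | 5322 |
macro avg | 0.89 | 0.89 | 0.89 | 5322 |
weighted avg | 0.95 | 0.96 | 0.95 | 5322 |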
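To make the summary rows easier to interpret, the sketch below recomputes the macro and weighted F1 averages from the per-class values of the first report above. The macro average is the unweighted mean over the 19 classes, and the weighted average weights each class by its support. The function names are illustrative. Because the inputs are already rounded to two decimals, the results only approximate the reported figures, and the micro average cannot be recovered this way since it needs the raw true-positive and false-positive counts.

```python
# (precision, recall, f1, support) per class, copied from the first report.
REPORT = {
    "AN": (0.75, 0.50, 0.60, 12), "EUN": (0.92, 0.93, 0.92, 116),
    "GRT": (0.95, 0.99, 0.97, 331), "GS": (0.98, 0.98, 0.98, 1720),
    "INN": (0.84, 0.91, 0.88, 199), "LD": (0.95, 0.95, 0.95, 109),
    "LDS": (0.82, 0.43, 0.56, 21), "LIT": (0.88, 0.92, 0.90, 231),
    "MRK": (0.50, 0.70, 0.58, 23), "ORG": (0.64, 0.71, 0.67, 103),
    "PER": (0.86, 0.93, 0.90, 186), "RR": (0.97, 0.98, 0.97, 144),
    "RS": (0.94, 0.95, 0.94, 1126), "ST": (0.91, 0.88, 0.89, 58),
    "STR": (0.29, 0.29, 0.29, 7), "UN": (0.81, 0.85, 0.83, 143),
    "VO": (0.76, 0.95, 0.84, 37), "VS": (0.62, 0.80, 0.70, 56),
    "VT": (0.87, 0.92, 0.90, 275),
}
F1, SUPPORT = 2, 3  # column positions inside each tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_avg(report: dict, column: int) -> float:
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[column] for row in report.values()) / len(report)


def weighted_avg(report: dict, column: int) -> float:
    """Mean over classes, weighted by the number of gold entities."""
    total = sum(row[SUPPORT] for row in report.values())
    return sum(row[column] * row[SUPPORT] for row in report.values()) / total


macro_f1 = macro_avg(REPORT, F1)
weighted_f1 = weighted_avg(REPORT, F1)
print(f"macro F1 = {macro_f1:.2f}, weighted F1 = {weighted_f1:.2f}")
```

The gap between the two averages reflects the class imbalance: rare classes such as STR and LDS pull the macro average down, while the weighted average is dominated by the frequent and well-recognized GS and RS classes.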
@misc{https://doi.org/10.48550/arxiv.2003.13016,
  doi = {10.48550/ARXIV.2003.13016},
  url = {https://arxiv.org/abs/2003.13016},
  author = {Leitner, Elena and Rehm, Georg and Moreno-Schneider, Julián},
  keywords = {Computation and Language (cs.CL), Information Retrieval (cs.IR), FOS: Computer and information sciences},
  title = {A Dataset of German Legal Documents for Named Entity Recognition},
  publisher = {arXiv},
  year = {2020},
  copyright = {arXiv.org perpetual, non-exclusive license}
}