bert-german-ler

Model description

This model is a version of bert-base-german-cased fine-tuned for named entity recognition on the German LER Dataset.
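A minimal usage sketch, assuming the model is loaded through the Hugging Face `transformers` token-classification pipeline. This card does not state the Hub repository ID, so `model_id` is left as a parameter to be supplied by the reader; `load_ner_pipeline` and `group_by_class` are hypothetical helper names introduced only for illustration.

```python
from collections import defaultdict


def load_ner_pipeline(model_id):
    """Build a token-classification pipeline for a Hub ID or local path."""
    # Imported lazily so group_by_class can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans,
    # so each result carries an "entity_group" key instead of per-token tags.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def group_by_class(entities):
    """Collect the recognised entity strings per class (e.g. GRT, GS, RS)."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Example call (requires transformers and access to the model weights):
# ner = load_ner_pipeline("<hub-id-or-local-path>")
# print(group_by_class(ner("Das Urteil des Bundesgerichtshofs vom 12. Mai ...")))
```

Grouping by class is only one way to consume the output; the raw pipeline result also contains character offsets (`start`, `end`) and a confidence `score` per span.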

Distribution of classes in the dataset:

No.	Tag	Fine-grained class	#	%
1	PER	Person	1,747	3.26
2	RR	Judge	1,519	2.83
3	AN	Lawyer	111	0.21
4	LD	Country	1,429	2.66
5	ST	City	705	1.31
6	STR	Street	136	0.25
7	LDS	Landscape	198	0.37
8	ORG	Organization	1,166	2.17
9	UN	Company	1,058	1.97
10	INN	Institution	2,196	4.09
11	GRT	Court	3,212	5.99
12	MRK	Brand	283	0.53
13	GS	Law	18,520	34.53
14	VO	Ordinance	797	1.49
15	EUN	European legal norm	1,499	2.79
16	VS	Regulation	607	1.13
17	VT	Contract	2,863	5.34
18	RS	Court decision	12,580	23.46
19	LIT	Legal literature	3,006	5.60
Total	53,632	100
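The percentages in the table can be rechecked from the counts with a short script. This is only a sketch: the counts are copied from the table, with GS and RS entered as 18,520 and 12,580, the values implied by the listed percentages and the total of 53,632.

```python
# Entity counts per fine-grained class, copied from the table above.
counts = {
    "PER": 1747, "RR": 1519, "AN": 111, "LD": 1429, "ST": 705,
    "STR": 136, "LDS": 198, "ORG": 1166, "UN": 1058, "INN": 2196,
    "GRT": 3212, "MRK": 283, "GS": 18520, "VO": 797, "EUN": 1499,
    "VS": 607, "VT": 2863, "RS": 12580, "LIT": 3006,
}

total = sum(counts.values())

# Share of each class in percent, rounded to two decimals as in the table.
shares = {tag: round(100 * n / total, 2) for tag, n in counts.items()}

# List the classes from most to least frequent to make the imbalance visible.
for tag, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{tag:>4} {counts[tag]:>6} {share:>6.2f}")
```

The two largest classes, GS and RS, together account for more than half of all annotated entities, which is worth keeping in mind when reading the per-class scores for rare classes such as AN or STR.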

For instructions on fine-tuning another model on the German LER Dataset, see GitHub.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 12
eval_batch_size: 16
max_seq_length: 512
num_epochs: 3
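As a rough sketch, the values above could be mapped onto keyword arguments for `transformers.TrainingArguments`. The mapping is an assumption, not the author's training script: it treats the reported batch sizes as per-device values (single-device training), and `output_dir` is a hypothetical path. `max_seq_length` is not a `TrainingArguments` field, so it is shown as a tokenizer setting instead.

```python
# Hyperparameters exactly as reported in this card.
reported = {
    "learning_rate": 1e-05,
    "train_batch_size": 12,
    "eval_batch_size": 16,
    "max_seq_length": 512,
    "num_epochs": 3,
}

# Assumed mapping onto transformers.TrainingArguments keyword names.
# Usage: TrainingArguments(**training_kwargs)
training_kwargs = {
    "output_dir": "ler-finetune-output",  # hypothetical output directory
    "learning_rate": reported["learning_rate"],
    "per_device_train_batch_size": reported["train_batch_size"],
    "per_device_eval_batch_size": reported["eval_batch_size"],
    "num_train_epochs": reported["num_epochs"],
}

# The sequence length is applied when tokenizing, not in TrainingArguments.
# Usage: tokenizer(words, is_split_into_words=True, **tokenizer_kwargs)
tokenizer_kwargs = {
    "truncation": True,
    "max_length": reported["max_seq_length"],
}
```

Settings not listed in the card (optimizer, scheduler, seed, weight decay) are left at library defaults here, which may differ from what was actually used.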

Results

Results on the dev set:

              precision    recall  f1-score   support

          AN       0.75      0.50      0.60        12
         EUN       0.92      0.93      0.92       116
         GRT       0.95      0.99      0.97       331
          GS       0.98      0.98      0.98      1720
         INN       0.84      0.91      0.88       199
          LD       0.95      0.95      0.95       109
         LDS       0.82      0.43      0.56        21
         LIT       0.88      0.92      0.90       231
         MRK       0.50      0.70      0.58        23
         ORG       0.64      0.71      0.67       103
         PER       0.86      0.93      0.90       186
          RR       0.97      0.98      0.97       144
          RS       0.94      0.95      0.94      1126
          ST       0.91      0.88      0.89        58
         STR       0.29      0.29      0.29         7
          UN       0.81      0.85      0.83       143
          VO       0.76      0.95      0.84        37
          VS       0.62      0.80      0.70        56
          VT       0.87      0.92      0.90       275

   micro avg       0.92      0.94      0.93      4897
   macro avg       0.80      0.82      0.80      4897
weighted avg       0.92      0.94      0.93      4897
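To make the difference between the macro and weighted averages concrete, the sketch below recomputes both F1 averages from the per-class dev-set rows. It works from the rounded values printed in the table, so it can only reproduce the reported averages to two decimals; it is not the evaluation code used to produce them.

```python
# (class, f1-score, support) copied from the dev-set table above.
dev_rows = [
    ("AN", 0.60, 12), ("EUN", 0.92, 116), ("GRT", 0.97, 331),
    ("GS", 0.98, 1720), ("INN", 0.88, 199), ("LD", 0.95, 109),
    ("LDS", 0.56, 21), ("LIT", 0.90, 231), ("MRK", 0.58, 23),
    ("ORG", 0.67, 103), ("PER", 0.90, 186), ("RR", 0.97, 144),
    ("RS", 0.94, 1126), ("ST", 0.89, 58), ("STR", 0.29, 7),
    ("UN", 0.83, 143), ("VO", 0.84, 37), ("VS", 0.70, 56),
    ("VT", 0.90, 275),
]

total_support = sum(support for _, _, support in dev_rows)

# Macro average: every class counts equally, regardless of its frequency.
macro_f1 = sum(f1 for _, f1, _ in dev_rows) / len(dev_rows)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f1 * support for _, f1, support in dev_rows) / total_support

print(f"support={total_support} macro={macro_f1:.2f} weighted={weighted_f1:.2f}")
```

The macro average is pulled down by rare, low-scoring classes such as STR (support 7) and LDS (support 21), while the weighted average is dominated by GS and RS, which together make up more than half of the dev-set entities.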

Results on the test set:

              precision    recall  f1-score   support

          AN       1.00      0.89      0.94         9
         EUN       0.90      0.97      0.93       150
         GRT       0.98      0.98      0.98       321
          GS       0.98      0.99      0.98      1818
         INN       0.90      0.95      0.92       222
          LD       0.97      0.92      0.94       149
         LDS       0.91      0.45      0.61        22
         LIT       0.92      0.96      0.94       314
         MRK       0.78      0.88      0.82        32
         ORG       0.82      0.88      0.85       113
         PER       0.92      0.88      0.90       173
          RR       0.95      0.99      0.97       142
          RS       0.97      0.98      0.97      1245
          ST       0.79      0.86      0.82        64
         STR       0.75      0.80      0.77        15
          UN       0.90      0.95      0.93       108
          VO       0.80      0.83      0.81        71
          VS       0.73      0.84      0.78        64
          VT       0.93      0.97      0.95       290

   micro avg       0.94      0.96      0.95      5322
   macro avg       0.89      0.89      0.89      5322
weighted avg       0.95      0.96      0.95      5322

Reference

@misc{https://doi.org/10.48550/arxiv.2003.13016,
  doi = {10.48550/ARXIV.2003.13016},
  url = {https://arxiv.org/abs/2003.13016},  
  author = {Leitner, Elena and Rehm, Georg and Moreno-Schneider, Julián},  
  keywords = {Computation and Language (cs.CL), Information Retrieval (cs.IR), FOS: Computer and information sciences},
  title = {A Dataset of German Legal Documents for Named Entity Recognition},  
  publisher = {arXiv},  
  year = {2020},  
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Author:

Elena Leitner

Dataset size:

828.54 MB