Datasets:
jfrenz/legalglue
Multilinguality:
multilingual
Source datasets:
extended
The "Legal General Language Understanding Evaluation" (LegalGLUE) dataset was created as part of a bachelor thesis. It combines four existing datasets that cover three task types and a total of 23 languages.
Dataset | Source | Task Type | Languages |
---|---|---|---|
German_LER | Leitner et al. | Named Entity Recognition | German |
LeNER_Br | de Araujo et al., 2018 | Named Entity Recognition | Portuguese |
SwissJudgmentPrediction | Niklaus et al. | Binary Text Classification | German, French, Italian |
MultiEURLEX | Chalkidis et al. | Multi-label Text Classification | 23 languages (see below) |
For the 23 MultiEURLEX languages, see the split table below.
German_LER example
from datasets import load_dataset
dataset = load_dataset('jfrenz/legalglue', 'german_ler')
{ 'id': '66722', 'tokens': ['4.', 'Die', 'Kostenentscheidung', 'für', 'das', 'gerichtliche', 'Antragsverfahren', 'beruht', 'auf', '§', '21', 'Abs.', '2', 'Satz', '1', 'i.', 'V.', 'm.', '§', '20', 'Abs.', '1', 'Satz', '1', 'WBO', '.'], 'ner_tags': [38, 38, 38, 38, 38, 38, 38, 38, 38, 3, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 38] }
LeNER-Br
LeNER-Br example
from datasets import load_dataset
dataset = load_dataset('jfrenz/legalglue', 'lener_br')
{ 'id': '7826', 'tokens': ['Firmado', 'por', 'assinatura', 'digital', '(', 'MP', '2.200-2/2001', ')', 'JOSÉ', 'ROBERTO', 'FREIRE', 'PIMENTA', 'Ministro', 'Relator', 'fls', '.', 'PROCESSO', 'Nº', 'TST-RR-1603-79.2010.5.20.0001'], 'ner_tags': [0, 0, 0, 0, 0, 9, 10, 0, 3, 4, 4, 4, 0, 0, 0, 0, 11, 12, 12] }
SwissJudgmentPrediction
swissJudgmentPrediction_de example
from datasets import load_dataset
dataset = load_dataset('jfrenz/legalglue', 'swissJudgmentPrediction_de')
{ 'id': 48755, 'year': 2014, 'text': "Sachverhalt: A. X._ fuhr am 25. Juli 2012 bei Mülligen mit seinem Personenwagen auf dem zweiten Überholstreifen der Autobahn A1 in Richtung Zürich. Gemäss Anklage schloss er auf einen Lieferwagen auf und schwenkte vom zweiten auf den ersten Überholstreifen aus. Danach fuhr er an zwei Fahrzeugen rechts vorbei und wechselte auf die zweite Überholspur zurück. B. Das Obergericht des Kantons Aargau erklärte X._ am 14. Januar 2014 zweitinstanzlich der groben Verletzung der Verkehrsregeln schuldig. Es bestrafte ihn mit einer bedingten Geldstrafe von 30 Tagessätzen zu Fr. 430.-- und einer Busse von Fr. 3'000.--. C. X._ führt Beschwerde in Strafsachen. Er beantragt, er sei von Schuld und Strafe freizusprechen. Eventualiter sei die Sache an die Vorinstanz zurückzuweisen. ", 'label': 0, 'language': 'de', 'region': 'Northwestern Switzerland', 'canton': 'ag', 'legal area': 'penal law' }
MultiEURLEX
Monolingual example out of the MultiEURLEX-Dataset
from datasets import load_dataset
dataset = load_dataset('jfrenz/legalglue', 'multi_eurlex_de')
{ 'celex_id': '32002R0130', 'text': 'Verordnung (EG) Nr. 130/2002 der Kommission\nvom 24. Januar 2002\nbezüglich der im Rahmen der Auss...', 'labels': [3, 17, 5]}
Multilingual example out of the MultiEURLEX-Dataset
from datasets import load_dataset
dataset = load_dataset('jfrenz/legalglue', 'multi_eurlex_all_languages')
{ 'celex_id': '32002R0130', 'text': { 'bg': None, 'cs': None, 'da': 'Kommissionens ...', 'de': 'Verordnung ... ', 'el': '...', 'en': '...', ... }, 'labels': [3, 17, 5] }
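In the multilingual configuration, the `text` field maps language codes to document text, and languages without a translation hold `None`. A minimal sketch of filtering such a record down to the available translations is shown here; the helper name `available_translations` is hypothetical, and the record is a shortened copy of the example above rather than real dataset content.

```python
from typing import Dict, Optional


def available_translations(text_by_lang: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Keep only the languages for which the document has a translation."""
    return {lang: text for lang, text in text_by_lang.items() if text is not None}


# Record shaped like the multilingual example; the texts are truncated
# placeholders taken from the example, not full documents.
record = {
    "celex_id": "32002R0130",
    "text": {"bg": None, "cs": None, "da": "Kommissionens ...", "de": "Verordnung ... "},
    "labels": [3, 17, 5],
}

translations = available_translations(record["text"])
print(sorted(translations))
```

Filtering before tokenization avoids passing `None` to code that expects a string.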
Monolingual use:
Multilingual use:
By default, the labels list contains level 1 EUROVOC concepts. This can be changed by passing the label_level parameter when loading the dataset (available levels: level_1, level_2, level_3, all_levels).
from datasets import load_dataset
dataset = load_dataset('jfrenz/legalglue', 'multi_eurlex_de', label_level="level_3")
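Since `label_level` only accepts the four values listed above, it can help to validate the value before loading. The wrapper below is a sketch, not part of the dataset: `load_multi_eurlex` and `LABEL_LEVELS` are hypothetical names, and it assumes every language configuration follows the `multi_eurlex_<ISO code>` pattern seen in `multi_eurlex_de`.

```python
# The four label levels listed in this dataset card.
LABEL_LEVELS = ("level_1", "level_2", "level_3", "all_levels")


def load_multi_eurlex(language: str = "de", label_level: str = "level_1"):
    """Load one MultiEURLEX configuration after validating label_level."""
    if label_level not in LABEL_LEVELS:
        raise ValueError(
            f"label_level must be one of {LABEL_LEVELS}, got {label_level!r}"
        )
    # Imported lazily so the validation above runs even without the library.
    from datasets import load_dataset

    # Assumption: configurations are named 'multi_eurlex_<ISO code>'.
    return load_dataset(
        "jfrenz/legalglue", f"multi_eurlex_{language}", label_level=label_level
    )
```

Calling `load_multi_eurlex("de", "level_3")` would then be equivalent to the call shown above, while a misspelled level fails early with a `ValueError`.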
Dataset | Language | ISO code | Number of Documents (train / dev / test) |
---|---|---|---|
German-LER | German | de | 66,723 / - / - |
LeNER-Br | Portuguese | pt | 7,828 / 1,177 / 1,390 |
SwissJudgmentPrediction | German | de | 35,458 / 4,705 / 9,725 |
SwissJudgmentPrediction | French | fr | 21,179 / 3,095 / 6,820 |
SwissJudgmentPrediction | Italian | it | 3,072 / 408 / 812 |
MultiEURLEX | English | en | 55,000 / 5,000 / 5,000 |
MultiEURLEX | German | de | 55,000 / 5,000 / 5,000 |
MultiEURLEX | French | fr | 55,000 / 5,000 / 5,000 |
MultiEURLEX | Italian | it | 55,000 / 5,000 / 5,000 |
MultiEURLEX | Spanish | es | 52,785 / 5,000 / 5,000 |
MultiEURLEX | Polish | pl | 23,197 / 5,000 / 5,000 |
MultiEURLEX | Romanian | ro | 15,921 / 5,000 / 5,000 |
MultiEURLEX | Dutch | nl | 55,000 / 5,000 / 5,000 |
MultiEURLEX | Greek | el | 55,000 / 5,000 / 5,000 |
MultiEURLEX | Hungarian | hu | 22,664 / 5,000 / 5,000 |
MultiEURLEX | Portuguese | pt | 23,188 / 5,000 / 5,000 |
MultiEURLEX | Czech | cs | 23,187 / 5,000 / 5,000 |
MultiEURLEX | Swedish | sv | 42,490 / 5,000 / 5,000 |
MultiEURLEX | Bulgarian | bg | 15,986 / 5,000 / 5,000 |
MultiEURLEX | Danish | da | 55,000 / 5,000 / 5,000 |
MultiEURLEX | Finnish | fi | 42,497 / 5,000 / 5,000 |
MultiEURLEX | Slovak | sk | 15,986 / 5,000 / 5,000 |
MultiEURLEX | Lithuanian | lt | 23,188 / 5,000 / 5,000 |
MultiEURLEX | Croatian | hr | 7,944 / 2,500 / 5,000 |
MultiEURLEX | Slovene | sl | 23,184 / 5,000 / 5,000 |
MultiEURLEX | Estonian | et | 23,126 / 5,000 / 5,000 |
MultiEURLEX | Latvian | lv | 23,188 / 5,000 / 5,000 |
MultiEURLEX | Maltese | mt | 17,521 / 5,000 / 5,000 |
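According to the table, German-LER ships only a train split, so anyone who needs dev and test data has to carve them out. The sketch below shows one possible way to do that with a reproducible index shuffle; the function name `split_indices`, the 80/10/10 ratio, and the seed are illustrative assumptions, not a split defined by the dataset.

```python
import random
from typing import List, Tuple


def split_indices(
    n: int, dev_frac: float = 0.1, test_frac: float = 0.1, seed: int = 42
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle range(n) reproducibly and cut it into train/dev/test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_dev = int(n * dev_frac)
    n_test = int(n * test_frac)
    dev = indices[:n_dev]
    test = indices[n_dev:n_dev + n_test]
    train = indices[n_dev + n_test:]
    return train, dev, test


# 66723 is the German-LER train size from the table.
train_idx, dev_idx, test_idx = split_indices(66723)
print(len(train_idx), len(dev_idx), len(test_idx))
```

With the `datasets` library, index lists like these could be passed to `Dataset.select` on the loaded train split. Note that a random split at this level may place closely related examples in different partitions.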