Dataset: greek_legal_code
Task: text classification
Language: el
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: found
Source datasets: original
Preprint: arxiv:2109.15298
License: cc-by-4.0
Greek_Legal_Code (GLC) is a dataset consisting of approx. 47k legal resources from Greek legislation. The origin of GLC is “Permanent Greek Legislation Code - Raptarchis”, a collection of Greek legislative documents classified into multi-level (from broader to more specialized) categories.
Topics
GLC consists of 47 legislative volumes, each corresponding to a main thematic topic. Each volume is divided into thematic subcategories called chapters, and each chapter in turn breaks down into subjects, which contain the legal resources. There are 389 chapters and 2285 subjects in total, forming an interlinked thematic hierarchy. GLC therefore has 47 classes at the top thematic level (volume), 389 classes at the next level (chapter), and 2285 classes at the innermost level (subject).
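One way to picture the volume, chapter, and subject levels is as nested mappings. The following is a minimal sketch with placeholder names (they are not real GLC labels); `hierarchy` and `count_classes` are hypothetical helpers introduced only for illustration. It counts the distinct classes per level, which on the full data would correspond to the 47 / 389 / 2285 figures.

```python
# Toy hierarchy: volume -> chapter -> list of subjects.
# All names are placeholders for illustration, not actual GLC labels.
hierarchy = {
    "volume_A": {
        "chapter_A1": ["subject_A1a", "subject_A1b"],
        "chapter_A2": ["subject_A2a"],
    },
    "volume_B": {
        "chapter_B1": ["subject_B1a", "subject_B1b", "subject_B1c"],
    },
}


def count_classes(tree):
    """Return the number of distinct classes at each thematic level.

    Assumption: a class is identified by its full path (volume, chapter,
    subject), so equal names under different parents count separately.
    """
    volumes = set(tree)
    chapters = {(v, c) for v, chs in tree.items() for c in chs}
    subjects = {
        (v, c, s)
        for v, chs in tree.items()
        for c, subs in chs.items()
        for s in subs
    }
    return {
        "volume": len(volumes),
        "chapter": len(chapters),
        "subject": len(subjects),
    }


print(count_classes(hierarchy))
```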
GLC classes are divided into three categories at each thematic level: frequent classes, which occur in more than 10 training documents and can be found in all three subsets (training, development and test); few-shot classes, which appear in 1 to 10 training documents and also appear in the documents of the development and test sets; and zero-shot classes, which appear in the development and/or test sets, but not in the training documents.
The dataset supports:
Multi-class Text Classification: Given the text of a document, a model predicts the corresponding class.
Few-shot and Zero-shot learning: As already noted, the classes can be divided into three groups: frequent, few-shot, and zero-shot, depending on whether they were assigned to more than 10, at most 10 but at least one, or no training documents, respectively.
Level | Total | Frequent | Few-Shot (<10) | Zero-Shot |
---|---|---|---|---|
Volume | 47 | 47 | 0 | 0 |
Chapter | 389 | 333 | 53 | 3 |
Subject | 2285 | 712 | 1431 | 142 |
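The grouping rule described above (more than 10 training documents, 1 to 10, or none) can be sketched as a small function. This is an illustrative simplification, not the authors' code: `group_classes` is a hypothetical helper, and it assigns groups from training counts alone without checking that few-shot classes also occur in the development and test sets.

```python
from collections import Counter


def group_classes(train_labels, eval_labels, threshold=10):
    """Bucket classes by how often they occur in the training labels.

    frequent:  more than `threshold` training documents
    few_shot:  between 1 and `threshold` training documents
    zero_shot: seen only in the evaluation (dev/test) labels
    """
    train_counts = Counter(train_labels)
    groups = {"frequent": set(), "few_shot": set(), "zero_shot": set()}
    for label in set(train_labels) | set(eval_labels):
        n = train_counts[label]  # Counter yields 0 for unseen labels
        if n > threshold:
            groups["frequent"].add(label)
        elif n >= 1:
            groups["few_shot"].add(label)
        else:
            groups["zero_shot"].add(label)
    return groups


# Toy labels: "a" occurs 11 times, "b" 10 times, "c" once, "d" never in training.
train = ["a"] * 11 + ["b"] * 10 + ["c"]
evaluation = ["a", "b", "d"]
groups = group_classes(train, evaluation)
print(groups)
```

In this toy run, `"b"` lands in the few-shot group because it has exactly 10 training documents, which is the boundary case of the rule.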
All documents are written in Greek.
{
  "text": "179. ΑΠΟΦΑΣΗ ΥΠΟΥΡΓΟΥ ΜΕΤΑΦΟΡΩΝ ΚΑΙ ΕΠΙΚΟΙΝΩΝΙΩΝ Αριθ. Β-οικ. 68425/4765 της 2/17 Νοεμ. 2000 (ΦΕΚ Β΄ 1404) Τροποποίηση της 42000/2030/81 κοιν. απόφασης του Υπουργού Συγκοινωνιών «Κωδικοποίηση και συμπλήρωση καν. Αποφάσεων» που εκδόθηκαν κατ’ εξουσιοδότηση του Ν.Δ. 102/73 «περί οργανώσεως των δια λεωφορείων αυτοκινήτων εκτελουμένων επιβατικών συγκοινωνιών». ",
  "volume": 24, # "ΣΥΓΚΟΙΝΩΝΙΕΣ"
}
The following data fields are provided for documents (train, dev, test):
text: (str) The full content of each document, which is represented by its header and articles (i.e., the main_body).
label: (class label) Depending on the configuration, the volume, chapter, or subject of the document. At the volume level, the label is one of: ["ΚΟΙΝΩΝΙΚΗ ΠΡΟΝΟΙΑ", "ΓΕΩΡΓΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΡΑΔΙΟΦΩΝΙΑ ΚΑΙ ΤΥΠΟΣ", "ΒΙΟΜΗΧΑΝΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΥΓΕΙΟΝΟΜΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΠΟΛΕΜΙΚΟ ΝΑΥΤΙΚΟ", "ΤΑΧΥΔΡΟΜΕΙΑ - ΤΗΛΕΠΙΚΟΙΝΩΝΙΕΣ", "ΔΑΣΗ ΚΑΙ ΚΤΗΝΟΤΡΟΦΙΑ", "ΕΛΕΓΚΤΙΚΟ ΣΥΝΕΔΡΙΟ ΚΑΙ ΣΥΝΤΑΞΕΙΣ", "ΠΟΛΕΜΙΚΗ ΑΕΡΟΠΟΡΙΑ", "ΝΟΜΙΚΑ ΠΡΟΣΩΠΑ ΔΗΜΟΣΙΟΥ ΔΙΚΑΙΟΥ", "ΝΟΜΟΘΕΣΙΑ ΑΝΩΝΥΜΩΝ ΕΤΑΙΡΕΙΩΝ ΤΡΑΠΕΖΩΝ ΚΑΙ ΧΡΗΜΑΤΙΣΤΗΡΙΩΝ", "ΠΟΛΙΤΙΚΗ ΑΕΡΟΠΟΡΙΑ", "ΕΜΜΕΣΗ ΦΟΡΟΛΟΓΙΑ", "ΚΟΙΝΩΝΙΚΕΣ ΑΣΦΑΛΙΣΕΙΣ", "ΝΟΜΟΘΕΣΙΑ ΔΗΜΩΝ ΚΑΙ ΚΟΙΝΟΤΗΤΩΝ", "ΝΟΜΟΘΕΣΙΑ ΕΠΙΜΕΛΗΤΗΡΙΩΝ ΣΥΝΕΤΑΙΡΙΣΜΩΝ ΚΑΙ ΣΩΜΑΤΕΙΩΝ", "ΔΗΜΟΣΙΑ ΕΡΓΑ", "ΔΙΟΙΚΗΣΗ ΔΙΚΑΙΟΣΥΝΗΣ", "ΑΣΦΑΛΙΣΤΙΚΑ ΤΑΜΕΙΑ", "ΕΚΚΛΗΣΙΑΣΤΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΕΚΠΑΙΔΕΥΤΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΔΗΜΟΣΙΟ ΛΟΓΙΣΤΙΚΟ", "ΤΕΛΩΝΕΙΑΚΗ ΝΟΜΟΘΕΣΙΑ", "ΣΥΓΚΟΙΝΩΝΙΕΣ", "ΕΘΝΙΚΗ ΑΜΥΝΑ", "ΣΤΡΑΤΟΣ ΞΗΡΑΣ", "ΑΓΟΡΑΝΟΜΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΔΗΜΟΣΙΟΙ ΥΠΑΛΛΗΛΟΙ", "ΠΕΡΙΟΥΣΙΑ ΔΗΜΟΣΙΟΥ ΚΑΙ ΝΟΜΙΣΜΑ", "ΟΙΚΟΝΟΜΙΚΗ ΔΙΟΙΚΗΣΗ", "ΛΙΜΕΝΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΑΣΤΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΠΟΛΙΤΙΚΗ ΔΙΚΟΝΟΜΙΑ", "ΔΙΠΛΩΜΑΤΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΔΙΟΙΚΗΤΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΑΜΕΣΗ ΦΟΡΟΛΟΓΙΑ", "ΤΥΠΟΣ ΚΑΙ ΤΟΥΡΙΣΜΟΣ", "ΕΘΝΙΚΗ ΟΙΚΟΝΟΜΙΑ", "ΑΣΤΥΝΟΜΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΑΓΡΟΤΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΕΡΓΑΤΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΠΟΙΝΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΕΜΠΟΡΙΚΗ ΝΟΜΟΘΕΣΙΑ", "ΕΠΙΣΤΗΜΕΣ ΚΑΙ ΤΕΧΝΕΣ", "ΕΜΠΟΡΙΚΗ ΝΑΥΤΙΛΙΑ", "ΣΥΝΤΑΓΜΑΤΙΚΗ ΝΟΜΟΘΕΣΙΑ"]
The label can also be the chapter-level or subject-level class the document belongs to. The chapter labels (389 classes) and subject labels (2285 classes) are not listed here because of their number.
Split | No of Documents | Avg. words |
---|---|---|
Train | 28,536 | 600 |
Development | 9,511 | 574 |
Test | 9,516 | 595 |
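As a quick sanity check on the table above, the split sizes can be summed and converted to shares. This is a small illustrative calculation using only the document counts listed in the table; the variable names are arbitrary.

```python
# Document counts per split, copied from the table above.
splits = {"train": 28536, "development": 9511, "test": 9516}

total = sum(splits.values())
# Share of each split as a percentage of all documents, rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)   # 47563
print(shares)  # {'train': 60.0, 'development': 20.0, 'test': 20.0}
```

The total of 47,563 documents matches the "approx. 47k" figure, and the splits come out at roughly 60/20/20.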
The dataset was curated by Papaloukas et al. (2021) in the hope of supporting and encouraging further research in NLP for the Greek language.
The Permanent Greek Legislation Code - Raptarchis is a thorough catalogue of Greek legislation from the creation of the Greek state in 1834 until 2015. It includes Laws, Royal and Presidential Decrees, Regulations and Decisions, retrieved from the Official Government Gazette, where Greek legislation is published. This collection is one of the official, publicly available sources of classified Greek legislation suitable for classification tasks.
Currently, the original catalogue is publicly offered in MS Word (.doc) format through the e-Themis portal, the legal database and its management service, under the administration of the Ministry of the Interior (Affairs). E-Themis focuses primarily on providing legislation across a multitude of predefined thematic categories, as described in the catalogue. Its main goal is to help users find legislation of interest using the thematic index.
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
The dataset does not include personal or sensitive information.
Papaloukas et al. (2021)
Christos Papaloukas, Ilias Chalkidis, Konstantinos Athinaios, Despina-Athanasia Pantazi and Manolis Koubarakis. Multi-granular Legal Topic Classification on Greek Legislation. Proceedings of the 3rd Natural Legal Language Processing (NLLP) Workshop, Punta Cana, Dominican Republic, 2021
@inproceedings{papaloukas-etal-2021-glc,
  title = "Multi-granular Legal Topic Classification on Greek Legislation",
  author = "Papaloukas, Christos and Chalkidis, Ilias and Athinaios, Konstantinos and Pantazi, Despina-Athanasia and Koubarakis, Manolis",
  booktitle = "Proceedings of the Natural Legal Language Processing Workshop 2021",
  year = "2021",
  address = "Punta Cana, Dominican Republic",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/2109.15298",
  doi = "10.48550/arXiv.2109.15298",
  pages = "63--75"
}
Thanks to @christospi for adding this dataset.