Dataset: joelito/lextreme
Multilinguality: multilingual
Size: 10K<n<100K
Language creators: found
Annotation creators: other
Source datasets: extended
Preprint: arxiv:2301.13126
License: cc-by-4.0

The dataset consists of 11 diverse multilingual legal NLU datasets. Six datasets have a single configuration and five have two or three configurations, for a total of 18 tasks (8 single-label text classification tasks, 5 multi-label text classification tasks and 5 token classification tasks).
Use the dataset like this:
from datasets import load_dataset
dataset = load_dataset("joelito/lextreme", "swiss_judgment_prediction")
The dataset supports the tasks of text classification and token classification. In detail, we support the following tasks and configurations:
task | task type | configurations | link |
---|---|---|---|
Brazilian Court Decisions | Judgment Prediction | (judgment, unanimity) | joelito/brazilian_court_decisions |
Swiss Judgment Prediction | Judgment Prediction | default | joelito/swiss_judgment_prediction |
German Argument Mining | Argument Mining | default | joelito/german_argument_mining |
Greek Legal Code | Topic Classification | (volume, chapter, subject) | greek_legal_code |
Online Terms of Service | Unfairness Classification | (unfairness level, clause topic) | online_terms_of_service |
Covid 19 Emergency Event | Event Classification | default | covid19_emergency_event |
MultiEURLEX | Topic Classification | (level 1, level 2, level 3) | multi_eurlex |
LeNER BR | Named Entity Recognition | default | lener_br |
LegalNERo | Named Entity Recognition | default | legalnero |
Greek Legal NER | Named Entity Recognition | default | greek_legal_ner |
MAPA | Named Entity Recognition | (coarse, fine) | mapa |
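
The task counts can be cross-checked against the table. The following is a minimal sketch that transcribes the table's task types and configurations into a plain dictionary and tallies them; the names `CONFIGURATIONS` and `summarize` are illustrative and not part of the dataset or of any library.

```python
# Task type and configurations per dataset, transcribed from the table above.
# "default" marks a dataset with a single configuration.
CONFIGURATIONS = {
    "Brazilian Court Decisions": ("Judgment Prediction", ["judgment", "unanimity"]),
    "Swiss Judgment Prediction": ("Judgment Prediction", ["default"]),
    "German Argument Mining": ("Argument Mining", ["default"]),
    "Greek Legal Code": ("Topic Classification", ["volume", "chapter", "subject"]),
    "Online Terms of Service": ("Unfairness Classification", ["unfairness level", "clause topic"]),
    "Covid 19 Emergency Event": ("Event Classification", ["default"]),
    "MultiEURLEX": ("Topic Classification", ["level 1", "level 2", "level 3"]),
    "LeNER BR": ("Named Entity Recognition", ["default"]),
    "LegalNERo": ("Named Entity Recognition", ["default"]),
    "Greek Legal NER": ("Named Entity Recognition", ["default"]),
    "MAPA": ("Named Entity Recognition", ["coarse", "fine"]),
}


def summarize(configurations):
    """Return (datasets, single-config datasets, multi-config datasets, total tasks, NER tasks)."""
    single = sum(1 for _, configs in configurations.values() if len(configs) == 1)
    multi = len(configurations) - single
    total = sum(len(configs) for _, configs in configurations.values())
    ner = sum(
        len(configs)
        for task_type, configs in configurations.values()
        if task_type == "Named Entity Recognition"
    )
    return len(configurations), single, multi, total, ner


print(summarize(CONFIGURATIONS))  # (11, 6, 5, 18, 5)
```

The result matches the figures in the description: 11 datasets, 6 with a single configuration, 5 with several, 18 tasks overall, 5 of which are token classification (NER) tasks.
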
The following languages are supported: bg, cs, da, de, el, en, es, et, fi, fr, ga, hr, hu, it, lt, lv, mt, nl, pl, pt, ro, sk, sl, sv
The file format is JSONL, and each configuration has three data splits (train, validation and test).
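
Since JSONL stores one JSON object per line, a split file can be read with the standard library alone. The sketch below is illustrative only: the field names `input`, `label` and `language` and the sample records are assumptions made for the example, not taken from the dataset files.

```python
import json


def read_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


# Hypothetical records; the real field names and values may differ.
sample = (
    '{"input": "Example text", "label": 0, "language": "de"}\n'
    '{"input": "Autre exemple", "label": 1, "language": "fr"}\n'
)

records = read_jsonl(sample.splitlines())

# Keep only the records of one language (assumes a "language" field exists).
german = [record for record in records if record["language"] == "de"]

print(len(records), len(german))  # 2 1
```

The same function accepts an open file handle, for example `read_jsonl(open("train.jsonl", encoding="utf-8"))`, where `train.jsonl` is a placeholder path.
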
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
How can I contribute a dataset to lextreme? Please follow these steps:
[More Information Needed]
@misc{niklaus2023lextreme,
  title={LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain},
  author={Joel Niklaus and Veton Matoshi and Pooja Rani and Andrea Galassi and Matthias Stürmer and Ilias Chalkidis},
  year={2023},
  eprint={2301.13126},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Thanks to @JoelNiklaus for adding this dataset.