Dataset: joelito/lextreme
Multilinguality: multilingual
Size: 10K<n<100K
Language creators: found
Annotation creators: other
Source datasets: extended
Preprint: arxiv:2301.13126
License:
LEXTREME consists of 11 diverse multilingual legal NLU datasets. Six datasets have a single configuration and five have two or three configurations, for a total of 18 tasks (8 single-label text classification tasks, 5 multi-label text classification tasks, and 5 token classification tasks).
Use the dataset like this:
```python
from datasets import load_dataset

dataset = load_dataset("joelito/lextreme", "swiss_judgment_prediction")
```
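After loading a configuration, a quick sanity check is to count the examples in each split. The helper below is a minimal sketch: `split_sizes` is a hypothetical name, not part of the `datasets` library, and it assumes only that the loaded object behaves like a mapping from split name to a sized collection. A plain dictionary with made-up rows stands in for the downloaded data so the snippet runs offline.

```python
from typing import Dict, Mapping, Sized


def split_sizes(dataset: Mapping[str, Sized]) -> Dict[str, int]:
    """Return the number of examples in each split of a loaded dataset."""
    return {name: len(split) for name, split in dataset.items()}


# Stand-in for a loaded configuration (made-up rows, not real LEXTREME data).
toy = {
    "train": [{"id": 0}, {"id": 1}, {"id": 2}],
    "validation": [{"id": 3}],
    "test": [{"id": 4}],
}
print(split_sizes(toy))  # prints {'train': 3, 'validation': 1, 'test': 1}
```

With the real data, the object returned by `load_dataset("joelito/lextreme", "swiss_judgment_prediction")` would be passed in place of `toy`.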
The dataset supports text classification and token classification. In detail, the following tasks and configurations are supported:
| task | task type | configurations | link |
|---|---|---|---|
| Brazilian Court Decisions | Judgment Prediction | (judgment, unanimity) | joelito/brazilian_court_decisions |
| Swiss Judgment Prediction | Judgment Prediction | default | joelito/swiss_judgment_prediction |
| German Argument Mining | Argument Mining | default | joelito/german_argument_mining |
| Greek Legal Code | Topic Classification | (volume, chapter, subject) | greek_legal_code |
| Online Terms of Service | Unfairness Classification | (unfairness level, clause topic) | online_terms_of_service |
| Covid 19 Emergency Event | Event Classification | default | covid19_emergency_event |
| MultiEURLEX | Topic Classification | (level 1, level 2, level 3) | multi_eurlex |
| LeNER BR | Named Entity Recognition | default | lener_br |
| LegalNERo | Named Entity Recognition | default | legalnero |
| Greek Legal NER | Named Entity Recognition | default | greek_legal_ner |
| MAPA | Named Entity Recognition | (coarse, fine) | mapa |
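The counts stated in the overview (11 datasets, 6 with a single configuration, 18 tasks, 5 of them token classification) can be cross-checked against the table above. The sketch below transcribes the number of configurations per row by hand; the dictionary and set names are illustrative only, and treating every Named Entity Recognition configuration as a token-classification task is an assumption.

```python
# Number of configurations per dataset, transcribed from the table above.
CONFIGS_PER_DATASET = {
    "Brazilian Court Decisions": 2,  # judgment, unanimity
    "Swiss Judgment Prediction": 1,
    "German Argument Mining": 1,
    "Greek Legal Code": 3,           # volume, chapter, subject
    "Online Terms of Service": 2,    # unfairness level, clause topic
    "Covid 19 Emergency Event": 1,
    "MultiEURLEX": 3,                # level 1, level 2, level 3
    "LeNER BR": 1,
    "LegalNERo": 1,
    "Greek Legal NER": 1,
    "MAPA": 2,                       # coarse, fine
}

# Assumption: the Named Entity Recognition rows are the token-classification tasks.
NER_DATASETS = {"LeNER BR", "LegalNERo", "Greek Legal NER", "MAPA"}

n_datasets = len(CONFIGS_PER_DATASET)
n_single = sum(1 for n in CONFIGS_PER_DATASET.values() if n == 1)
n_tasks = sum(CONFIGS_PER_DATASET.values())
n_token_tasks = sum(
    n for name, n in CONFIGS_PER_DATASET.items() if name in NER_DATASETS
)

print(n_datasets, n_single, n_tasks, n_token_tasks)  # prints 11 6 18 5
```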
The following languages are supported: bg, cs, da, de, el, en, es, et, fi, fr, ga, hr, hu, it, lt, lv, mt, nl, pl, pt, ro, sk, sl, sv.
The file format is jsonl and three data splits are present for each configuration (train, validation and test).
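Because each split is stored as JSON Lines (one JSON object per line), the files can also be read with the standard library alone. The following is a rough sketch: `read_jsonl` is a hypothetical helper, and the field names `input`, `label`, and `language` as well as the sample records are assumptions made for illustration, not a documented schema.

```python
import io
import json
from collections import Counter
from typing import Dict, Iterator, TextIO


def read_jsonl(handle: TextIO) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up records; the field names are assumptions for illustration.
sample = io.StringIO(
    '{"input": "Der Beschwerdefuehrer ...", "label": 0, "language": "de"}\n'
    '{"input": "Le recourant ...", "label": 1, "language": "fr"}\n'
    '{"input": "Die Beschwerde ...", "label": 0, "language": "de"}\n'
)

records = list(read_jsonl(sample))
per_language = Counter(r["language"] for r in records)
print(len(records), dict(per_language))  # prints 3 {'de': 2, 'fr': 1}
```

For a file on disk, an opened file object (for example `open(path, encoding="utf-8")`) would take the place of the in-memory `sample`.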
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
How can I contribute a dataset to LEXTREME? Please follow these steps:
[More Information Needed]
```bibtex
@misc{niklaus2023lextreme,
    title={LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain},
    author={Joel Niklaus and Veton Matoshi and Pooja Rani and Andrea Galassi and Matthias Stürmer and Ilias Chalkidis},
    year={2023},
    eprint={2301.13126},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
Thanks to @JoelNiklaus for adding this dataset.