Dataset:

rcds/swiss_law_area_prediction

Task:

文本分类

Languages:

de, fr, it

Multilinguality:

multilingual

Size:

100K<n<1M

Language creators:

expert-generated

Annotation creators:

machine-generated

Source datasets:

original

License:

cc-by-sa-4.0

Preprint:

arxiv:2306.09237

Dataset Card for Law Area Prediction

Dataset Summary

The dataset contains cases to be classified into the four main areas of law: Public, Civil, Criminal and Social.

These can be further classified into sub-areas (no sub-areas are listed for Social):

"public": ["Tax", "Urban Planning and Environmental", "Expropriation", "Public Administration", "Other Fiscal"],
"civil": ["Rental and Lease", "Employment Contract", "Bankruptcy", "Family", "Competition and Antitrust", "Intellectual Property"],
"criminal": ["Substantive Criminal", "Criminal Procedure"]
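The label hierarchy above can be expressed as a mapping from main area to sub-areas, with an inverse lookup from a sub-area to its main area. This is a minimal sketch. `LAW_AREAS`, `SUB_TO_AREA` and `main_area` are hypothetical names. The sub-area strings are the display names from this card, and they may not match the exact values stored in the `law_sub_area` column.

```python
# Main areas of law and their sub-areas, as listed in this card.
# "social" is a main area, but the card lists no sub-areas for it.
# Assumption: these display names may differ from the raw column values.
LAW_AREAS: dict[str, list[str]] = {
    "public": ["Tax", "Urban Planning and Environmental", "Expropriation",
               "Public Administration", "Other Fiscal"],
    "civil": ["Rental and Lease", "Employment Contract", "Bankruptcy", "Family",
              "Competition and Antitrust", "Intellectual Property"],
    "criminal": ["Substantive Criminal", "Criminal Procedure"],
    "social": [],
}

# Inverse lookup: sub-area -> main area.
SUB_TO_AREA: dict[str, str] = {
    sub: area for area, subs in LAW_AREAS.items() for sub in subs
}


def main_area(sub_area: str) -> str:
    """Return the main area of law for a sub-area label (KeyError if unknown)."""
    return SUB_TO_AREA[sub_area]
```

With this mapping, a sub-area prediction can always be collapsed to its main area. For example, `main_area("Tax")` returns `"public"`.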

Supported Tasks and Leaderboards

Law Area Prediction can be used as a text classification task: given the text of a decision, predict its area of law.
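To illustrate the task framing, here is a small bag-of-words baseline: a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. It is not the method used by the dataset authors. The class and function names are hypothetical, and the toy training strings in the usage note are made up, not taken from the dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; \w also matches accented letters (e.g. German, French).
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.doc_counts = Counter(labels)            # documents per label
        self.word_counts = defaultdict(Counter)      # label -> token counts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_words = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, text: str) -> str:
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            score = math.log(n / self.n_docs)  # log prior
            for w in tokenize(text):
                # Smoothed log-likelihood of each token under this label.
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage with toy strings: `NaiveBayes().fit(["tax income assessment", "rental lease tenant", "theft sentence prison"], ["public", "civil", "criminal"]).predict("tenant rental dispute")` returns `"civil"`.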

Languages

Switzerland has four official languages, three of which (German, French and Italian) are represented in the dataset. The decisions are written by the judges and clerks in the language of the proceedings.

Language	Subset	Number of Documents
German	de	127K
French	fr	156K
Italian	it	46K

Dataset Structure

decision_id: unique identifier for the decision
facts: facts section of the decision
considerations: considerations section of the decision
law_area: label of the decision (main area of law)
law_sub_area: sub area of law of the decision
language: language of the decision
year: year of the decision
court: court of the decision
chamber: chamber of the decision
canton: canton of the decision
region: region of the decision
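The fields listed above can be mirrored in a typed record, together with a helper that turns a record into a `(text, label)` pair for main-area classification. This is a sketch under stated assumptions. `Decision` and `to_example` are hypothetical names. The field types (for example `year` as `int`, and `law_sub_area` as optional) are guesses not stated in this card. Using the facts followed by the considerations as model input is also an illustrative choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    """One decision, mirroring the fields listed in this card (types assumed)."""
    decision_id: str
    facts: str
    considerations: str
    law_area: str
    law_sub_area: Optional[str]  # assumption: may be missing for some rows
    language: str
    year: int
    court: str
    chamber: str
    canton: str
    region: str


def to_example(d: Decision) -> tuple[str, str]:
    """Build a (text, label) pair for main-area classification.

    Assumption: input text = facts followed by considerations, skipping
    any section that is empty.
    """
    text = "\n\n".join(part for part in (d.facts, d.considerations) if part)
    return text, d.law_area
```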

Data Fields

[More Information Needed]

Data Instances

[More Information Needed]

Data Splits

The dataset was split by decision year (date-stratified):

Train: 2002-2015
Validation: 2016-2017
Test: 2018-2022
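The year ranges above reduce to a simple lookup from decision year to split. This is a minimal sketch. `split_for_year` is a hypothetical helper, and it assumes the boundaries are inclusive as written.

```python
def split_for_year(year: int) -> str:
    """Map a decision year to its split, using the (inclusive) ranges in this card."""
    if 2002 <= year <= 2015:
        return "train"
    if 2016 <= year <= 2017:
        return "validation"
    if 2018 <= year <= 2022:
        return "test"
    raise ValueError(f"year {year} is outside the 2002-2022 range covered by the dataset")
```

Under a chronological split like this, every test decision is more recent than every training decision. A model is therefore evaluated on later cases than those it was trained on.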

Dataset Creation

Curation Rationale

Source Data

Initial Data Collection and Normalization

The original data are published by the Swiss Federal Supreme Court ( https://www.bger.ch ) in unprocessed formats (HTML). The documents were downloaded in HTML from the Entscheidsuche portal ( https://entscheidsuche.ch ).

Who are the source language producers?

The decisions are written by the judges and clerks in the language of the proceedings.

Annotations

Annotation process

Who are the annotators?

Personal and Sensitive Information

The dataset contains publicly available court decisions from the Swiss Federal Supreme Court. Personal or sensitive information has been anonymized by the court before publication according to the following guidelines: https://www.bger.ch/home/juridiction/anonymisierungsregeln.html .

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

We release the data under CC-BY-4.0, which complies with the court's licensing terms ( https://www.bger.ch/files/live/sites/bger/files/pdf/de/urteilsveroeffentlichung_d.pdf ). © Swiss Federal Supreme Court, 2002-2022

The copyright for the editorial content of this website and the consolidated texts, which is owned by the Swiss Federal Supreme Court, is licensed under the Creative Commons Attribution 4.0 International licence. This means that you can re-use the content provided you acknowledge the source and indicate any changes you have made. Source: https://www.bger.ch/files/live/sites/bger/files/pdf/de/urteilsveroeffentlichung_d.pdf

Citation Information

Please cite our arXiv preprint:

@misc{rasiah2023scale,
      title={SCALE: Scaling up the Complexity for Advanced Language Model Evaluation}, 
      author={Vishvaksenan Rasiah and Ronja Stern and Veton Matoshi and Matthias Stürmer and Ilias Chalkidis and Daniel E. Ho and Joel Niklaus},
      year={2023},
      eprint={2306.09237},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contributions

Author:

rcds

Dataset size:

1.38 GB