Dataset Card for Swiss Legislation

Dataset Summary

Swiss Legislation is a multilingual, diachronic dataset of 36K Swiss laws. This dataset is part of a challenging Information Retrieval task.

Supported Tasks and Leaderboards

Languages

The dataset contains 35,698 texts in total and is saved as lexfind_v2.jsonl (JSON Lines). Switzerland has four official languages: German, French, Italian and Romansh. A small number of laws in English are also represented. Laws are written by legal experts.

Language   Subset   Number of Documents
German     de       18K
French     fr       11K
Italian    it       6K
Romansh    rm       534
English    en       207

Dataset Structure

Data Fields

Each entry in the dataset is a dictionary with the following keys:

  • canton : the canton of origin of the legislation
    • example: "ag"
  • language : the language of the legislation
    • example: "de"
  • uuid : a unique identifier for the legislation
    • example: "ec312f57-05fe-4552-ba50-8c9c269e0f3b"
  • title : the title of the legislation
    • example: "Gesetz über die Geoinformation im Kanton Aargau"
  • short : a short description of the legislation
    • example: "Kantonales Geoinformationsgesetz"
  • abbreviation : an abbreviation for the legislation
    • example: "KGeoIG"
  • sr_number : a reference number for the legislation
    • example: "740.100"
  • is_active : whether the legislation is currently in force
    • example: true
  • version_active_since : the date from which this version of the legislation has been active
    • example: "2021-09-01"
  • family_active_since : the date from which the family this version belongs to has been active
    • example: "2011-05-24"
  • version_inactive_since : the date from which this version of the legislation has been inactive
    • example: null
  • version_found_at : the date on which this version of the legislation was found
    • example: "2021-09-01"
  • pdf_url : a link to the legislation's pdf
  • html_url : a link to the legislation's html
  • pdf_content : the legislation's pdf content
    • example: "740.100 - Gesetz über..."
  • html_content : the legislation's html content
    • example: ""
  • changes : a list of changes made to the legislation
    • example: []
  • history : a list of the legislation's history
    • example: []
  • quotes : a list of quotes from the legislation
    • example: []
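As a minimal sketch of how an entry with these keys might be consumed, the snippet below parses one JSON Lines record assembled from the example values listed above and checks whether that version was active on a given date. The helper names `parse_date` and `in_force_on` are hypothetical, and treating `version_inactive_since` as an exclusive end date (with null meaning "not superseded") is an assumption, not something this card states.

```python
import json
from datetime import date

# One JSON Lines entry assembled from the example values in this card.
# The URL fields are omitted because the card gives no example for them.
SAMPLE_LINE = json.dumps({
    "canton": "ag",
    "language": "de",
    "uuid": "ec312f57-05fe-4552-ba50-8c9c269e0f3b",
    "title": "Gesetz über die Geoinformation im Kanton Aargau",
    "short": "Kantonales Geoinformationsgesetz",
    "abbreviation": "KGeoIG",
    "sr_number": "740.100",
    "is_active": True,
    "version_active_since": "2021-09-01",
    "family_active_since": "2011-05-24",
    "version_inactive_since": None,
    "version_found_at": "2021-09-01",
    "pdf_content": "740.100 - Gesetz über...",
    "html_content": "",
    "changes": [],
    "history": [],
    "quotes": [],
}, ensure_ascii=False)


def parse_date(value):
    """Turn an ISO date string into a date; pass None through unchanged."""
    return date.fromisoformat(value) if value else None


def in_force_on(entry, day):
    """Hypothetical helper: was this version active on `day`?

    Assumes version_inactive_since is an exclusive end date and that
    null means the version has not been superseded.
    """
    start = parse_date(entry["version_active_since"])
    end = parse_date(entry["version_inactive_since"])
    if start is None or day < start:
        return False
    return end is None or day < end


entry = json.loads(SAMPLE_LINE)
print(entry["abbreviation"], in_force_on(entry, date(2022, 1, 1)))
```

Keeping the date fields as strings until they are needed avoids failures on null values, which the example for `version_inactive_since` shows can occur.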

Data Instances

[More Information Needed]

Data Splits

  • 'ch': Switzerland (Federal) - 15840
  • 'fr': Fribourg - 1633
  • 'be': Bern - 1344
  • 'vs': Valais - 1328
  • 'gr': Graubünden - 1205
  • 'ne': Neuchâtel - 1115
  • 'zh': Zurich - 974
  • 'bs': Basel-Stadt - 899
  • 'bl': Basel-Landschaft - 863
  • 'vd': Vaud - 870
  • 'ge': Geneva - 837
  • 'sg': St. Gallen - 764
  • 'ju': Jura - 804
  • 'zg': Zug - 632
  • 'ti': Ticino - 627
  • 'lu': Lucerne - 584
  • 'so': Solothurn - 547
  • 'ow': Obwalden - 513
  • 'ik': Intercantonal - 510
  • 'sh': Schaffhausen - 469
  • 'gl': Glarus - 467
  • 'tg': Thurgau - 453
  • 'sz': Schwyz - 423
  • 'ai': Appenzell Innerrhoden - 416
  • 'ag': Aargau - 483
  • 'ar': Appenzell Ausserrhoden - 330
  • 'nw': Nidwalden - 401
  • 'ur': Uri - 367
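The per-canton counts above could be recomputed from the JSON Lines file roughly as follows. This is a sketch under assumptions: it supposes each line of the file is one JSON object with a `canton` key, as described under Data Fields, and it uses three made-up lines in place of the real file. `count_by_canton` is an illustrative name, not part of the dataset's tooling.

```python
import io
import json
from collections import Counter


def count_by_canton(lines):
    """Count entries per canton code from an iterable of JSON Lines strings."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines
            continue
        counts[json.loads(line)["canton"]] += 1
    return counts


# Stand-in for the real file; with the actual data one would pass
# open("lexfind_v2.jsonl", encoding="utf-8") instead (file name taken
# from this card, location assumed).
fake_file = io.StringIO(
    '{"canton": "ch", "language": "de"}\n'
    '{"canton": "ch", "language": "fr"}\n'
    '{"canton": "ag", "language": "de"}\n'
)

counts = count_by_canton(fake_file)
for canton, n in counts.most_common():
    print(f"{canton}: {n}")
```

Reading line by line keeps memory use low, which matters because each entry carries the full text of a law in `pdf_content` or `html_content`.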
Dataset Creation

    Curation Rationale

    Source Data

    Initial Data Collection and Normalization

    The original data are published by the Swiss Federal Supreme Court ( https://www.bger.ch ) in unprocessed form (HTML). The documents were downloaded from the Entscheidsuche portal ( https://entscheidsuche.ch ) as HTML.

    Who are the source language producers?

    The decisions are written by the judges and clerks in the language of the proceedings.

    Annotations

    Annotation process

    Who are the annotators?

    Metadata is published by the Swiss Federal Supreme Court ( https://www.bger.ch ).

    Personal and Sensitive Information

    The dataset contains publicly available court decisions from the Swiss Federal Supreme Court. Personal or sensitive information has been anonymized by the court before publication according to the following guidelines: https://www.bger.ch/home/juridiction/anonymisierungsregeln.html .

    Considerations for Using the Data

    Social Impact of Dataset

    [More Information Needed]

    Discussion of Biases

    [More Information Needed]

    Other Known Limitations

    [More Information Needed]

    Additional Information

    Dataset Curators

    [More Information Needed]

    Licensing Information

    We release the data under CC-BY-4.0, which complies with the court licensing ( https://www.bger.ch/files/live/sites/bger/files/pdf/de/urteilsveroeffentlichung_d.pdf ). © Swiss Federal Supreme Court, 2002-2022

    The copyright for the editorial content of this website and the consolidated texts, which is owned by the Swiss Federal Supreme Court, is licensed under the Creative Commons Attribution 4.0 International licence. This means that you can re-use the content provided you acknowledge the source and indicate any changes you have made. Source: https://www.bger.ch/files/live/sites/bger/files/pdf/de/urteilsveroeffentlichung_d.pdf

    Citation Information

    Please cite our arXiv preprint:

    @misc{rasiah2023scale,
          title={SCALE: Scaling up the Complexity for Advanced Language Model Evaluation}, 
          author={Vishvaksenan Rasiah and Ronja Stern and Veton Matoshi and Matthias Stürmer and Ilias Chalkidis and Daniel E. Ho and Joel Niklaus},
          year={2023},
          eprint={2306.09237},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }