Dataset: joelito/brazilian_court_decisions
Task: text classification
Language: pt
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotation creators: found
Source datasets: original
Preprint: arxiv:1905.10348
License: other

The dataset is a collection of 4043 Ementa (summary) court decisions and their metadata from the Tribunal de Justiça de Alagoas (TJAL), the State Supreme Court of Alagoas, Brazil. The court decisions are labeled according to 7 categories and according to whether the judges decided unanimously. The dataset supports the task of Legal Judgment Prediction.
Legal Judgment Prediction
Brazilian Portuguese
The file format is jsonl and three data splits are present (train, validation and test) for each configuration.
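Because each split is stored as JSON Lines (one JSON object per line), it can be read with the Python standard library alone. The following is a minimal sketch; the helper name `read_jsonl` and any file name passed to it are assumptions for illustration, not part of the dataset's tooling.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate empty lines between records
                records.append(json.loads(line))
    return records
```

A call such as `read_jsonl("train.jsonl")` would return one dictionary per court decision, assuming a local copy of the split exists under that (hypothetical) file name.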
The dataset contains the following fields:
The data has been split randomly into 80% train (3234), 10% validation (404), 10% test (405).
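As a quick sanity check, the split sizes quoted above can be summed and converted to proportions. This sketch uses only the three counts stated in this card; the variable names are illustrative.

```python
# Split sizes as stated in this dataset card.
split_sizes = {"train": 3234, "validation": 404, "test": 405}

total = sum(split_sizes.values())  # 4043, the size of the whole collection

# Fraction of the full dataset that falls into each split.
fractions = {name: size / total for name, size in split_sizes.items()}

for name, fraction in fractions.items():
    print(f"{name}: {split_sizes[name]} examples ({fraction:.1%})")
```

The three counts add up to the 4043 decisions in the collection, and each proportion is within a fraction of a percentage point of the nominal 80/10/10 split.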
There are two tasks possible for this dataset.
Judgment
Label Distribution
judgment | train | validation | test |
---|---|---|---|
no | 1960 | 221 | 234 |
partial | 677 | 96 | 93 |
yes | 597 | 87 | 78 |
total | 3234 | 404 | 405 |
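
The judgment labels are imbalanced, so a majority-class baseline is a useful reference point when evaluating a classifier. The sketch below derives that baseline from the counts in the table above; it is plain arithmetic on the published figures, not a result reported by the dataset authors.

```python
# Judgment label counts per split, copied from the table above.
judgment_counts = {
    "no": {"train": 1960, "validation": 221, "test": 234},
    "partial": {"train": 677, "validation": 96, "test": 93},
    "yes": {"train": 597, "validation": 87, "test": 78},
}

# The label a trivial classifier would always predict: the most frequent one in train.
majority_label = max(judgment_counts, key=lambda label: judgment_counts[label]["train"])


def majority_accuracy(split):
    """Accuracy obtained on `split` by always predicting the train majority label."""
    split_total = sum(counts[split] for counts in judgment_counts.values())
    return judgment_counts[majority_label][split] / split_total


for split in ("train", "validation", "test"):
    print(f"{split}: always predicting '{majority_label}' gives {majority_accuracy(split):.3f}")
```

Any model trained on the judgment task should be compared against this baseline rather than against chance.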
In the unanimity configuration, all cases that have not_determined as unanimity_label can be removed.
Label Distribution
unanimity_label | train | validation | test |
---|---|---|---|
not_determined | 1519 | 193 | 201 |
unanimity | 1681 | 205 | 200 |
not-unanimity | 34 | 6 | 4 |
total | 3234 | 404 | 405 |
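
Removing the `not_determined` cases, as suggested for the unanimity task, can be sketched as a simple filter over the records. The field name `unanimity_label` and its values come from this card; the helper name and the toy records are hypothetical.

```python
def drop_not_determined(records):
    """Keep only records whose unanimity_label is something other than 'not_determined'."""
    return [r for r in records if r.get("unanimity_label") != "not_determined"]


# Toy records for illustration only; real records carry additional fields.
sample = [
    {"unanimity_label": "unanimity"},
    {"unanimity_label": "not_determined"},
    {"unanimity_label": "not-unanimity"},
]
filtered = drop_not_determined(sample)

# Split sizes that remain after filtering, derived from the table above.
unanimity_counts = {
    "unanimity": {"train": 1681, "validation": 205, "test": 200},
    "not-unanimity": {"train": 34, "validation": 6, "test": 4},
}
remaining = {
    split: sum(counts[split] for counts in unanimity_counts.values())
    for split in ("train", "validation", "test")
}
print(remaining)
```

After filtering, the `not-unanimity` class accounts for only a few dozen training examples, so metrics that are sensitive to class imbalance are worth considering for this task.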
This dataset was created to further the research on developing models for predicting Brazilian court decisions that are also able to predict whether the decision will be unanimous.
The data was scraped from the Tribunal de Justiça de Alagoas (TJAL), the State Supreme Court of Alagoas, Brazil.
Initial Data Collection and Normalization
“We developed a Web scraper for collecting data from Brazilian courts. The scraper first searched for the URL that contains the list of court cases […]. Then, the scraper extracted from these HTML files the specific case URLs and downloaded their data […]. Next, it extracted the metadata and the contents of legal cases and stored them in a CSV file format […].” (Lage-Freitas et al., 2022)
Who are the source language producers?
The source language producers are presumably attorneys, judges, and other legal professionals.
The dataset was not annotated.
Who are the annotators?
[More Information Needed]
The court decisions might contain sensitive information about individuals.
[More Information Needed]
[More Information Needed]
Note that the information given in this dataset card refers to the dataset version as provided by Joel Niklaus and Veton Matoshi. The dataset at hand is intended to be part of a bigger benchmark dataset. Creating a benchmark dataset consisting of several other datasets from different sources requires postprocessing. Therefore, the structure of the dataset at hand, including the folder structure, may differ considerably from the original dataset. In addition, the dataset statistics may differ from those given in the respective papers. The reader is advised to consult the conversion script convert_to_hf_dataset.py in order to retrace the steps for converting the original dataset into the present jsonl format. For further information on the original dataset structure, we refer to the bibliographical references and the original GitHub repositories and/or web pages provided in this dataset card.
Lage-Freitas, A., Allende-Cid, H., Santana Jr, O., & Oliveira-Lage, L. (2019). Predicting Brazilian court decisions:
The names of the original dataset curators and creators can be found in the references given below, in the section Citation Information. Additional changes were made by Joel Niklaus and Veton Matoshi.
No licensing information was provided for this dataset. However, please make sure that you use the dataset according to Brazilian law.
@misc{https://doi.org/10.48550/arxiv.1905.10348,
  author = {Lage-Freitas, Andr{\'{e}} and Allende-Cid, H{\'{e}}ctor and Santana, Orivaldo and de Oliveira-Lage, L{\'{i}}via},
  doi = {10.48550/ARXIV.1905.10348},
  keywords = {Computation and Language (cs.CL),FOS: Computer and information sciences,Social and Information Networks (cs.SI)},
  publisher = {arXiv},
  title = {{Predicting Brazilian court decisions}},
  url = {https://arxiv.org/abs/1905.10348},
  year = {2019}
}

@article{Lage-Freitas2022,
  author = {Lage-Freitas, Andr{\'{e}} and Allende-Cid, H{\'{e}}ctor and Santana, Orivaldo and Oliveira-Lage, L{\'{i}}via},
  doi = {10.7717/peerj-cs.904},
  issn = {2376-5992},
  journal = {PeerJ. Computer science},
  keywords = {Artificial intelligence,Jurimetrics,Law,Legal,Legal NLP,Legal informatics,Legal outcome forecast,Litigation prediction,Machine learning,NLP,Portuguese,Predictive algorithms,judgement prediction},
  language = {eng},
  month = {mar},
  pages = {e904--e904},
  publisher = {PeerJ Inc.},
  title = {{Predicting Brazilian Court Decisions}},
  url = {https://pubmed.ncbi.nlm.nih.gov/35494851 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9044329/},
  volume = {8},
  year = {2022}
}
Thanks to @kapllan and @joelniklaus for adding this dataset.