Dataset Card for A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering

Note: this card describes SARA v1. For SARA v2, please see https://nlp.jhu.edu/law/ (it will be on Hugging Face soon!).

Dataset Summary

If you use this dataset, we would appreciate you citing our work:

@inproceedings{Holzenberger2020ADF,
  title={A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering},
  author={Nils Holzenberger and Andrew Blair-Stanek and Benjamin Van Durme},
  booktitle={NLLP@KDD},
  year={2020}
}

Supported Tasks and Leaderboards

There are two tasks: question answering and natural language inference, both with train and test sets. There is no official leaderboard.

Language

English

Dataset Structure

Data Instances

Here's an example instance:

{
    "id": "s151_a_neg",
    "text": "Alice's income in 2015 is $100000. She gets one exemption of $2000 for the year 2015 under section 151(c). Alice is not married.",
    "question": "Alice's total exemption for 2015 under section 151(a) is equal to $6000",
    "answer": "Contradiction",
    "facts": ":- discontiguous s151_c\/4.\n:- [statutes\/prolog\/init].\nincome_(alice_makes_money).\nagent_(alice_makes_money,alice).\nstart_(alice_makes_money,\"2015-01-01\").\nend_(alice_makes_money,\"2015-12-31\").\namount_(alice_makes_money,100000).\ns151_c(alice,_,2000,2015).",
    "test": ":- \\+ s151_a(alice,6000,2015)."
}

Data Fields

id : unique ID for the instance, indicating the case number and, where applicable, the relevant statute
text : the background details of the legal case
question : the question (or hypothesis) for the instance
answer : the answer to the question, or the NLI judgement (Entailment/Contradiction)
facts : the relevant facts of the case, in Prolog
test : the Prolog code that is executed to check the case
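
As a minimal sketch of how these fields might be consumed, the snippet below parses a trimmed copy of the example instance above, maps the NLI judgement to an integer, and splits the Prolog in facts into directives and plain facts. The LABEL_TO_INT mapping and the one-clause-per-line assumption are illustrative choices, not something the dataset defines.

```python
import json

# Trimmed copy of the example instance from this card, kept as raw JSON so
# that its escapes (\/, \n, \") are decoded by the JSON parser, not by Python.
raw = r'''{
    "id": "s151_a_neg",
    "answer": "Contradiction",
    "facts": ":- discontiguous s151_c\/4.\n:- [statutes\/prolog\/init].\nincome_(alice_makes_money).\nagent_(alice_makes_money,alice).\nstart_(alice_makes_money,\"2015-01-01\").\nend_(alice_makes_money,\"2015-12-31\").\namount_(alice_makes_money,100000).\ns151_c(alice,_,2000,2015).",
    "test": ":- \\+ s151_a(alice,6000,2015)."
}'''

instance = json.loads(raw)

# Hypothetical label mapping for a binary NLI classifier (an assumption,
# not part of the dataset).
LABEL_TO_INT = {"Entailment": 1, "Contradiction": 0}
label = LABEL_TO_INT[instance["answer"]]

# Assumption: each Prolog clause sits on its own line, as in this example.
clauses = instance["facts"].splitlines()
directives = [c for c in clauses if c.startswith(":-")]
facts = [c for c in clauses if not c.startswith(":-")]

print(label)
print(len(directives), len(facts))
print(instance["test"])
```

Splitting on newlines is only a convenience for inspection; running the facts and test together still needs a Prolog interpreter and the statute definitions referenced by the init directive.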

Data Splits

Data splits can be accessed as:

from datasets import load_dataset
qa_test = load_dataset("jhu-clsp/SARA", "qa", split="test")
qa_train = load_dataset("jhu-clsp/SARA", "qa", split="train")
nli_test = load_dataset("jhu-clsp/SARA", "nli", split="test")
nli_train = load_dataset("jhu-clsp/SARA", "nli", split="train")
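
Once a split is loaded, a quick check of its label balance can be useful. The sketch below counts the values of the answer field over any iterable of row dictionaries. The label_counts helper and the three stand-in rows are made up for illustration so that the snippet runs without network access; they are not taken from the dataset.

```python
from collections import Counter


def label_counts(rows):
    """Count how often each value of the "answer" field occurs."""
    return Counter(row["answer"] for row in rows)


# Stand-in rows, invented for illustration only.
# With the dataset available, pass a loaded split instead, e.g. nli_train.
sample_rows = [
    {"id": "example_pos", "answer": "Entailment"},
    {"id": "example_neg", "answer": "Contradiction"},
    {"id": "example_neg_2", "answer": "Contradiction"},
]

counts = label_counts(sample_rows)
print(counts.most_common())
```

Iterating over a split loaded with load_dataset yields one dictionary per instance, so the same helper should accept nli_train or nli_test directly.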

Dataset Creation

Full details are in the paper: https://ceur-ws.org/Vol-2645/paper5.pdf

Author:

jhu-clsp

Dataset size:

383.83 KB