Note: this is SARA v1. For SARA v2, please see https://nlp.jhu.edu/law/ (will be on Huggingface soon!)
If you use this dataset, we would appreciate you citing our work:
@inproceedings{Holzenberger2020ADF,
  title={A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering},
  author={Nils Holzenberger and Andrew Blair-Stanek and Benjamin Van Durme},
  booktitle={NLLP@KDD},
  year={2020}
}
There are two tasks: question answering and natural language inference, both with train and test sets. There is no official leaderboard.
The dataset is in English.
Here's an example instance:
{
  "id": "s151_a_neg",
  "text": "Alice's income in 2015 is $100000. She gets one exemption of $2000 for the year 2015 under section 151(c). Alice is not married.",
  "question": "Alice's total exemption for 2015 under section 151(a) is equal to $6000",
  "answer": "Contradiction",
  "facts": ":- discontiguous s151_c\/4.\n:- [statutes\/prolog\/init].\nincome_(alice_makes_money).\nagent_(alice_makes_money,alice).\nstart_(alice_makes_money,\"2015-01-01\").\nend_(alice_makes_money,\"2015-12-31\").\namount_(alice_makes_money,100000).\ns151_c(alice,_,2000,2015).",
  "test": ":- \\+ s151_a(alice,6000,2015)."
}
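To make the field layout concrete, here is a minimal sketch that parses an abbreviated copy of the instance above with Python's standard `json` module and unpacks the Prolog `facts` string into separate clauses. The abbreviation (only the first three clauses are kept) and the variable names `raw`, `fact_lines`, and `is_negated` are illustrative choices, not part of the dataset. Note that reading a leading `\+` in `test` as a sign of a `Contradiction` label is an assumption drawn from this single example.

```python
import json

# Abbreviated copy of the example instance (first three Prolog clauses only).
# A raw string keeps the JSON escape sequences exactly as they appear on disk.
raw = r'''
{
  "id": "s151_a_neg",
  "question": "Alice's total exemption for 2015 under section 151(a) is equal to $6000",
  "answer": "Contradiction",
  "facts": ":- discontiguous s151_c\/4.\n:- [statutes\/prolog\/init].\nincome_(alice_makes_money).",
  "test": ":- \\+ s151_a(alice,6000,2015)."
}
'''

instance = json.loads(raw)

# "facts" is one string of newline-separated Prolog clauses.
fact_lines = instance["facts"].split("\n")

# Assumption (from this one example only): a goal wrapped in Prolog's
# negation-as-failure operator \+ accompanies a "Contradiction" answer.
is_negated = instance["test"].startswith(":- \\+")

for clause in fact_lines:
    print(clause)
print(instance["answer"], is_negated)
```

After decoding, the `\/` escapes become plain `/` and each `\n` becomes a real newline, so each clause could be written out as one line of a Prolog file.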
Data splits can be accessed as:
from datasets import load_dataset

qa_test = load_dataset("jhu-clsp/SARA", "qa", split="test")
qa_train = load_dataset("jhu-clsp/SARA", "qa", split="train")
nli_test = load_dataset("jhu-clsp/SARA", "nli", split="test")
nli_train = load_dataset("jhu-clsp/SARA", "nli", split="train")
Full details are in the paper: https://ceur-ws.org/Vol-2645/paper5.pdf