Dataset: copenlu/sufficient_facts
Task: text-classification
Sub-task: fact-checking
Language: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotation creators: crowdsourced
License: mit

This is the dataset SufficientFacts, introduced in the paper "Fact Checking with Insufficient Evidence", accepted to the TACL journal in 2022.
Automating the fact checking (FC) process relies on information obtained from external sources. In this work, we posit that it is crucial for FC models to make veracity predictions only when there is sufficient evidence and otherwise indicate when it is not enough. To this end, we are the first to study what information FC models consider sufficient by introducing a novel task and advancing it with three main contributions. First, we conduct an in-depth empirical analysis of the task with a new fluency-preserving method for omitting information from the evidence at the constituent and sentence level. We identify when models consider the remaining evidence (in)sufficient for FC, based on three trained models with different Transformer architectures and three FC datasets. Second, we ask annotators whether the omitted evidence was important for FC, resulting in a novel diagnostic dataset, SufficientFacts, for FC with omitted evidence. We find that models are least successful in detecting missing evidence when adverbial modifiers are omitted (21% accuracy), whereas it is easiest for omitted date modifiers (63% accuracy). Finally, we propose a novel data augmentation strategy for contrastive self-learning of missing evidence by employing the proposed omission method combined with tri-training. It improves performance for Evidence Sufficiency Prediction by up to 17.8 F1 score, which in turn improves FC performance by up to 2.6 F1 score.
The dataset is in English.
The dataset consists of three files, one for each of the source datasets -- FEVER, HoVer, and VitaminC. Each file consists of JSON lines with the following format:
{ "claim": "Unison (Celine Dion album) was originally released by Atlantic Records.", "evidence": [ [ "Unison (Celine Dion album)", "The album was originally released on 2 April 1990 ." ] ], "label_before": "REFUTES", "label_after": "NOT ENOUGH", "agreement": "agree_ei", "type": "PP", "removed": ["by Columbia Records"], "text_orig": "[[Unison (Celine Dion album)]] The album was originally released on 2 April 1990 <span style=\"color:red;\">by Columbia Records</span> ." }
| name | test_fever | test_hover | test_vitaminc |
|---|---|---|---|
| test | 1000 | 1000 | 600 |
The instances are augmented from the test splits of the corresponding source datasets.
The workers were provided with the following task description:
For each evidence text, some facts have been removed (marked in red). You should annotate whether, given the remaining facts in the evidence text, the evidence is still enough for verifying the claim.
Note: You should not incorporate your own knowledge or beliefs! You should rely only on the evidence provided for the claim.
The annotators were then given example instance annotations. Finally, annotators were asked to complete a qualification test in order to be allowed to annotate instances for the task. The resulting inter-annotator agreement for SufficientFacts is 0.81 Fleiss' κ from three annotators.
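For readers who want to compute such an agreement score on their own annotations, a minimal sketch using `statsmodels` is shown below; the rating matrix is toy data for illustration, not the actual SufficientFacts annotations.

```python
# Toy sketch of computing Fleiss' kappa for three annotators on five instances.
# The ratings are illustrative only, not the real SufficientFacts annotations.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One row per instance, one column per annotator; 0 = enough evidence, 1 = not enough.
ratings = np.array([
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
])

# Convert per-annotator labels into per-category counts, then compute kappa.
counts, _ = aggregate_raters(ratings)
print(fleiss_kappa(counts))  # values close to 1 indicate strong agreement
```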
Who are the annotators? The annotations were performed by workers on Amazon Mechanical Turk.
The dataset is released under the MIT license.
```bibtex
@article{10.1162/tacl_a_00486,
    author = {Atanasova, Pepa and Simonsen, Jakob Grue and Lioma, Christina and Augenstein, Isabelle},
    title = "{Fact Checking with Insufficient Evidence}",
    journal = {Transactions of the Association for Computational Linguistics},
    volume = {10},
    pages = {746-763},
    year = {2022},
    month = {07},
    abstract = "{Automating the fact checking (FC) process relies on information obtained from external sources. In this work, we posit that it is crucial for FC models to make veracity predictions only when there is sufficient evidence and otherwise indicate when it is not enough. To this end, we are the first to study what information FC models consider sufficient by introducing a novel task and advancing it with three main contributions. First, we conduct an in-depth empirical analysis of the task with a new fluency-preserving method for omitting information from the evidence at the constituent and sentence level. We identify when models consider the remaining evidence (in)sufficient for FC, based on three trained models with different Transformer architectures and three FC datasets. Second, we ask annotators whether the omitted evidence was important for FC, resulting in a novel diagnostic dataset, SufficientFacts1, for FC with omitted evidence. We find that models are least successful in detecting missing evidence when adverbial modifiers are omitted (21\\% accuracy), whereas it is easiest for omitted date modifiers (63\\% accuracy). Finally, we propose a novel data augmentation strategy for contrastive self-learning of missing evidence by employing the proposed omission method combined with tri-training. It improves performance for Evidence Sufficiency Prediction by up to 17.8 F1 score, which in turn improves FC performance by up to 2.6 F1 score.}",
    issn = {2307-387X},
    doi = {10.1162/tacl_a_00486},
    url = {https://doi.org/10.1162/tacl\_a\_00486},
    eprint = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl\_a\_00486/2037141/tacl\_a\_00486.pdf},
}
```
Thanks to @apepa for adding this dataset.