Dataset: sileod/probability_words_nli
Language: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotation creators: expert-generated
Source datasets: original
License: apache-2.0

This dataset tests how well language models capture the meaning of words of estimative probability (WEP, also called verbal probabilities), e.g. words like "probably", "maybe", "surely", "impossible".
We used probabilistic soft logic to combine probabilistic statements expressed with WEP (WEP-Reasoning), and we also used the UNLI dataset (https://nlp.jhu.edu/unli/) to directly check whether models can detect the WEP matching human-annotated probabilities according to Fagen-Ulmschneider, 2018. The dataset can be used as natural language inference data (context, premise, label) or as multiple-choice question answering (context, valid_hypothesis, invalid_hypothesis).
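As a rough illustration of the multiple-choice format, the sketch below turns one record into a two-option question. It assumes each record is a plain dictionary carrying the `context`, `valid_hypothesis` and `invalid_hypothesis` fields named above. The helper name `to_multiple_choice` and the example record are hypothetical and written only for this illustration; they are not taken from the dataset or from the authors' code.

```python
import random


def to_multiple_choice(record, rng):
    """Build a two-option multiple-choice item from one record.

    Assumes `record` has the keys `context`, `valid_hypothesis` and
    `invalid_hypothesis`, and that the two hypotheses differ.
    """
    options = [record["valid_hypothesis"], record["invalid_hypothesis"]]
    # Shuffle so that the correct option is not always in first position.
    rng.shuffle(options)
    return {
        "context": record["context"],
        "options": options,
        # Index of the correct option after shuffling.
        "answer": options.index(record["valid_hypothesis"]),
    }


# Hypothetical record, invented for illustration only.
example = {
    "context": "A man is walking a dog in the park.",
    "valid_hypothesis": "It is almost certain that an animal is outside.",
    "invalid_hypothesis": "It is impossible that an animal is outside.",
}

item = to_multiple_choice(example, random.Random(0))
print(item["context"])
for i, option in enumerate(item["options"]):
    marker = "*" if i == item["answer"] else " "
    print(f"{marker} ({i}) {option}")
```

Passing a seeded `random.Random` keeps the option order reproducible across runs, which helps when comparing models on identical prompts.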
Code: colab
Accepted at StarSem 2023 (The 12th Joint Conference on Lexical and Computational Semantics). Temporary citation:
@article{sileo2022probing,
  title={Probing neural language models for understanding of words of estimative probability},
  author={Sileo, Damien and Moens, Marie-Francine},
  journal={arXiv preprint arXiv:2211.03358},
  year={2022}
}