Dataset:
mteb/amazon_counterfactual
Preprint:
arxiv:2104.06893
The dataset contains sentences from Amazon customer reviews (sampled from the Amazon product review dataset) annotated for counterfactual detection (CFD), a binary classification task. Counterfactual statements describe events that did not or cannot take place. They can be identified as statements of the form "If p was true, then q would be true", i.e. assertions whose antecedent (p) and consequent (q) are known or assumed to be false.
The key features of this dataset are:
Please see the paper for data statistics and a detailed description of data collection and annotation.
GitHub repo URL: https://github.com/amazon-research/amazon-multilingual-counterfactual-dataset
You can load each of the languages as follows:
from datasets import get_dataset_config_names, load_dataset

dataset_id = "SetFit/amazon_counterfactual"
# Returns ['de', 'en', 'en-ext', 'ja']
configs = get_dataset_config_names(dataset_id)
# Load English subset
dset = load_dataset(dataset_id, name="en")
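As a rough sketch of how one might go on to inspect the class balance in every language subset, the snippet below loads each config and counts label values per split. The helper names `label_counts` and `summarize_configs` are hypothetical, and the column name `label` is an assumption about the schema rather than something stated above; check `dset.column_names` before relying on it. Whether the dataset loads may also depend on the installed version of the `datasets` library.

```python
from collections import Counter


def label_counts(rows, label_key="label"):
    """Count how often each label value occurs in an iterable of row dicts.

    `label_key` defaults to "label", which is an assumed column name.
    """
    return Counter(row[label_key] for row in rows)


def summarize_configs(dataset_id="SetFit/amazon_counterfactual"):
    """Print per-split label counts for every config (needs network access)."""
    # Imported here so label_counts stays usable without the datasets library.
    from datasets import get_dataset_config_names, load_dataset

    for name in get_dataset_config_names(dataset_id):
        # Without a `split` argument, load_dataset returns a DatasetDict
        # that maps split names to Dataset objects.
        dset = load_dataset(dataset_id, name=name)
        for split, rows in dset.items():
            print(name, split, dict(label_counts(rows)))
```

Calling `summarize_configs()` prints one line per config and split. `label_counts` can also be applied to any iterable of dictionaries, which makes it easy to try out on a few hand-written rows first.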