Dataset: nightingal3/fig-qa
Tasks: multiple-choice
Sub-tasks: multiple-choice-qa
Languages: en
Multilinguality: monolingual
Size categories: 10K<n<100K
Language creators: crowdsourced
Source datasets: original
ArXiv: arxiv:2204.12632
License: mit

This is the dataset for the paper "Testing the Ability of Language Models to Interpret Figurative Language". Fig-QA consists of 10,256 examples of human-written creative metaphors that are paired as a Winograd schema. It can be used to evaluate the commonsense reasoning of models. The metaphors themselves can also be used as training data for other tasks, such as metaphor detection or generation.
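For instance, one common way to evaluate a causal language model on such Winograd-style pairs is to compare the total log-probability it assigns to each ending. Below is a minimal sketch; the model choice (gpt2) is arbitrary, and the field names (`startphrase`, `ending1`, `ending2`) are assumptions that should be verified against the actual dataset schema:

```python
# Minimal zero-shot evaluation sketch: pick whichever ending the language
# model assigns the higher total log-probability. Field names are assumed;
# verify them against the actual dataset schema.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sequence_logprob(text: str) -> float:
    """Total log-probability the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token,
    # so multiply by the number of predicted (shifted) positions.
    return -out.loss.item() * (ids.size(1) - 1)

def predict(example: dict) -> int:
    """Return 0 or 1 for whichever ending the model scores higher."""
    scores = [
        sequence_logprob(f"{example['startphrase']} {ending}")
        for ending in (example["ending1"], example["ending2"])
    ]
    return int(scores[1] > scores[0])
```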
You can evaluate your models on the test set by submitting to the leaderboard on ExplainaBoard. Click on "New" and select qa-multiple-choice for the task field, and select accuracy for the metric. You should upload results in the form of a system output file in JSON or JSONL format.
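The snippet below is a hypothetical sketch of producing such a file with Python's standard `json` module. The exact field names ExplainaBoard expects are an assumption here, so consult its documentation before submitting:

```python
# Hypothetical sketch: write one JSON object per line (JSONL). The
# "predicted_label" field name is an assumption; check the ExplainaBoard
# docs for the schema the qa-multiple-choice task actually expects.
import json

predictions = [1, 0, 1]  # predicted answer index for each test example, in order

with open("system_output.jsonl", "w") as f:
    for pred in predictions:
        f.write(json.dumps({"predicted_label": pred}) + "\n")
```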
This is the English version. The multilingual version can be found here.
Data splits:
- Train-{S, M (no suffix), XL}: different training set sizes
- Dev
- Test (labels not provided for the test set)
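A minimal sketch of loading the data with the Hugging Face `datasets` library follows; the split and configuration names are assumptions, so inspect the returned `DatasetDict` to see what this repository actually exposes:

```python
# Minimal loading sketch using the Hugging Face `datasets` library.
# The split names below are assumptions; print the DatasetDict to see
# which splits and sized training sets are actually available.
from datasets import load_dataset

ds = load_dataset("nightingal3/fig-qa")
print(ds)              # available splits and their sizes
print(ds["train"][0])  # inspect one example (assumed "train" split name)
```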
These metaphors are human-generated and may contain insults or other explicit content. The authors of the paper manually removed offensive content, but users should keep in mind that some potentially offensive content may remain in the dataset.
MIT License
If you find the dataset useful, please cite this paper:
```
@misc{https://doi.org/10.48550/arxiv.2204.12632,
  doi       = {10.48550/ARXIV.2204.12632},
  url       = {https://arxiv.org/abs/2204.12632},
  author    = {Liu, Emmy and Cui, Chen and Zheng, Kenneth and Neubig, Graham},
  keywords  = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title     = {Testing the Ability of Language Models to Interpret Figurative Language},
  publisher = {arXiv},
  year      = {2022},
  copyright = {Creative Commons Attribution Share Alike 4.0 International}
}
```