Dataset:
jmhessel/newyorker_caption_contest
See capcon.dev for more!
@article{hessel2022androids,
  title={Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest},
  author={Hessel, Jack and Marasovi{\'c}, Ana and Hwang, Jena D and Lee, Lillian and Da, Jeff and Zellers, Rowan and Mankoff, Robert and Choi, Yejin},
  journal={arXiv preprint arXiv:2209.06293},
  year={2022}
}
If you use this dataset, we would appreciate it if you cited our work, as well as the several other papers that this corpus builds on. See Citation Information.
We challenge AI models to "demonstrate understanding" of the sophisticated multimodal humor of The New Yorker Caption Contest. Concretely, we develop three carefully circumscribed tasks for which it suffices (but is not necessary) to grasp potentially complex and unexpected relationships between image and caption, and similarly complex and unexpected allusions to the wide varieties of human experience.
Three tasks are supported: Matching, Ranking, and Explanation.
There are no official leaderboards (yet).
English
Here's an example instance from Matching:
{'caption_choices': ['Tell me about your childhood very quickly.',
                     "Believe me . . . it's what's UNDER the ground that's "
                     'most interesting.',
                     "Stop me if you've heard this one.",
                     'I have trouble saying no.',
                     'Yes, I see the train but I think we can beat it.'],
 'contest_number': 49,
 'entities': ['https://en.wikipedia.org/wiki/Rule_of_three_(writing)',
              'https://en.wikipedia.org/wiki/Bar_joke',
              'https://en.wikipedia.org/wiki/Religious_institute'],
 'from_description': 'scene: a bar description: Two priests and a rabbi are '
                     'walking into a bar, as the bartender and another patron '
                     'look on. The bartender talks on the phone while looking '
                     'skeptically at the incoming crew. uncanny: The scene '
                     'depicts a very stereotypical "bar joke" that would be '
                     'unlikely to be encountered in real life; the skepticism '
                     'of the bartender suggests that he is aware he is seeing '
                     'this trope, and is explaining it to someone on the '
                     'phone. entities: Rule_of_three_(writing), Bar_joke, '
                     'Religious_institute. choices A: Tell me about your '
                     "childhood very quickly. B: Believe me . . . it's what's "
                     "UNDER the ground that's most interesting. C: Stop me if "
                     "you've heard this one. D: I have trouble saying no. E: "
                     'Yes, I see the train but I think we can beat it.',
 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=L size=323x231 at 0x7F34F283E9D0>,
 'image_description': 'Two priests and a rabbi are walking into a bar, as the '
                      'bartender and another patron look on. The bartender '
                      'talks on the phone while looking skeptically at the '
                      'incoming crew.',
 'image_location': 'a bar',
 'image_uncanny_description': 'The scene depicts a very stereotypical "bar '
                              'joke" that would be unlikely to be encountered '
                              'in real life; the skepticism of the bartender '
                              'suggests that he is aware he is seeing this '
                              'trope, and is explaining it to someone on the '
                              'phone.',
 'instance_id': '21125bb8787b4e7e82aa3b0a1cba1571',
 'label': 'C',
 'n_tokens_label': 1,
 'questions': ['What is the bartender saying on the phone in response to the '
               'living, breathing, stereotypical bar joke that is unfolding?']}
The label "C" indicates that the 3rd choice in the caption_choices is correct.
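The letter-to-caption lookup can be sketched as follows. This is a minimal illustration, assuming labels are single uppercase letters with "A" pointing at the first entry of caption_choices; `caption_for_label` is a hypothetical helper and not part of the dataset's own tooling.

```python
def caption_for_label(instance):
    """Return the caption that the letter label points to (hypothetical helper)."""
    # Assumption: labels are single uppercase letters, with "A" meaning index 0.
    index = ord(instance["label"]) - ord("A")
    return instance["caption_choices"][index]


# Trimmed copy of the Matching instance shown above (only the fields used here).
example = {
    "caption_choices": [
        "Tell me about your childhood very quickly.",
        "Believe me . . . it's what's UNDER the ground that's most interesting.",
        "Stop me if you've heard this one.",
        "I have trouble saying no.",
        "Yes, I see the train but I think we can beat it.",
    ],
    "label": "C",
}

print(caption_for_label(example))  # Stop me if you've heard this one.
```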
Here's an example instance from Ranking (in the from-pixels setting; this task is also available in the from-description setting):
{'caption_choices': ['I guess I misunderstood when you said long bike ride.',
                     'Does your divorce lawyer have any other cool ideas?'],
 'contest_number': 582,
 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=L size=600x414 at 0x7F8FF9F96610>,
 'instance_id': 'dd1c214a1ca3404aa4e582c9ce50795a',
 'label': 'A',
 'n_tokens_label': 1,
 'winner_source': 'official_winner'}
The label indicates that the first caption choice ("A", here) in the caption_choices list was more highly rated.
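Since each Ranking instance has a single letter label, one natural way to score a model is the fraction of instances where its predicted letter matches the label. The sketch below is only an illustration: `pairwise_accuracy` is a hypothetical helper, and both label lists are made-up placeholders rather than dataset contents or reported results.

```python
def pairwise_accuracy(predicted, gold):
    """Fraction of positions where the predicted letter equals the gold letter."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one instance to score")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Made-up labels for illustration only; real values come from the `label` field.
gold_labels = ["A", "B", "A", "A"]
predicted_labels = ["A", "B", "B", "A"]

accuracy = pairwise_accuracy(predicted_labels, gold_labels)
print(accuracy)  # 0.75
```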
Here's an example instance from Explanation:
{'caption_choices': 'The classics can be so intimidating.',
 'contest_number': 752,
 'entities': ['https://en.wikipedia.org/wiki/Literature',
              'https://en.wikipedia.org/wiki/Solicitor'],
 'from_description': 'scene: a road description: Two people are walking down a '
                     'path. A number of giant books have surrounded them. '
                     'uncanny: There are book people in this world. entities: '
                     'Literature, Solicitor. caption: The classics can be so '
                     'intimidating.',
 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=L size=800x706 at 0x7F90003D0BB0>,
 'image_description': 'Two people are walking down a path. A number of giant '
                      'books have surrounded them.',
 'image_location': 'a road',
 'image_uncanny_description': 'There are book people in this world.',
 'instance_id': 'eef9baf450e2fab19b96facc128adf80',
 'label': 'A play on the word intimidating --- usually if the classics (i.e., '
          'classic novels) were to be intimidating, this would mean that they '
          'are intimidating to read due to their length, complexity, etc. But '
          'here, they are surrounded by anthropomorphic books which look '
          'physically intimidating, i.e., they are intimidating because they '
          'may try to beat up these people.',
 'n_tokens_label': 59,
 'questions': ['What do the books want?']}
The label is an explanation of the joke, which serves as the autoregressive target.
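As a rough sketch of how an instance could become an (input, target) text pair for a text-to-text model: the example below assumes the from-description setting and uses the from_description field as the model input. `to_text_pair` is a hypothetical helper; the input formatting the authors actually used is described in the paper and may differ.

```python
def to_text_pair(instance):
    """Return (input_text, target_text) for one Explanation instance.

    Hypothetical helper: assumes the from-description setting, where the
    textual `from_description` field stands in for the cartoon image.
    """
    return instance["from_description"], instance["label"]


# Trimmed copy of the Explanation instance shown above (text fields only).
example = {
    "from_description": (
        "scene: a road description: Two people are walking down a path. "
        "A number of giant books have surrounded them. uncanny: There are "
        "book people in this world. entities: Literature, Solicitor. "
        "caption: The classics can be so intimidating."
    ),
    "label": (
        "A play on the word intimidating --- usually if the classics (i.e., "
        "classic novels) were to be intimidating, this would mean that they "
        "are intimidating to read due to their length, complexity, etc. But "
        "here, they are surrounded by anthropomorphic books which look "
        "physically intimidating, i.e., they are intimidating because they "
        "may try to beat up these people."
    ),
}

source_text, target_text = to_text_pair(example)
print(source_text)
print(target_text)
```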
See above
Data splits can be accessed as:
from datasets import load_dataset

dset = load_dataset("jmhessel/newyorker_caption_contest", "matching")
dset = load_dataset("jmhessel/newyorker_caption_contest", "ranking")
dset = load_dataset("jmhessel/newyorker_caption_contest", "explanation")
Or, in the from-pixels setting, e.g.,
from datasets import load_dataset

dset = load_dataset("jmhessel/newyorker_caption_contest", "ranking_from_pixels")
Because the dataset is small, we initially reported results in a 5-fold cross-validation setting. The default splits are split 0. You can access the other splits, e.g.:
from datasets import load_dataset

# the 4th data split
dset = load_dataset("jmhessel/newyorker_caption_contest", "explanation_4")
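To iterate over all five folds, the configuration names can be generated programmatically. The sketch below assumes that folds 1 through 4 of every task follow the `<task>_<fold>` pattern seen in "explanation_4", and that fold 0 is the bare task name; only "explanation_4" is confirmed by the example above, and `config_name` is a hypothetical helper.

```python
TASKS = ("matching", "ranking", "explanation")
N_FOLDS = 5  # 5-fold cross-validation, folds numbered 0 through 4


def config_name(task, fold=0):
    """Build a configuration name for a task and fold (hypothetical helper).

    Assumption: fold 0 is the default config (the bare task name), and the
    other folds are named "<task>_<fold>", as in "explanation_4".
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if not 0 <= fold < N_FOLDS:
        raise ValueError(f"fold must be in 0..{N_FOLDS - 1}, got {fold}")
    return task if fold == 0 else f"{task}_{fold}"


explanation_configs = [config_name("explanation", k) for k in range(N_FOLDS)]
print(explanation_configs)
```

Each generated name can then be passed as the second argument to load_dataset, in the same way as the calls above.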
Full details are in the paper.
See the paper for rationale/motivation.
See citation below. We combined 3 sources of data, and added significant annotations of our own.
Initial Data Collection and Normalization: Full details are in the paper.
Who are the source language producers? We paid crowdworkers $15/hr to annotate the corpus. In addition, significant annotation efforts were conducted by the authors of this work.
Full details are in the paper.
Annotation process: Full details are in the paper.
Who are the annotators? A mix of crowdworkers and authors of this paper.
Personal and sensitive information has been redacted from the dataset. The images have already been published in The New Yorker.
It's plausible that humor could perpetuate negative stereotypes. The jokes in this corpus are a mix of highly rated crowdsourced entries and ones published in The New Yorker.
Humor is subjective, and some of the jokes may be considered offensive. The images may contain adult themes and minor cartoon nudity.
More details are in the paper.
The dataset was curated by researchers at AI2.
The annotations we provide are CC-BY-4.0. See www.capcon.dev for more info.
Our data contributions are:
We release the data we contribute under CC-BY (see DATASET_LICENSE). If you find this data useful in your work, in addition to citing our contributions, please also cite the following works, from which the cartoons and captions in our corpus are derived:
@misc{newyorkernextmldataset,
  author={Jain, Lalit and Jamieson, Kevin and Mankoff, Robert and Nowak, Robert and Sievert, Scott},
  title={The {N}ew {Y}orker Cartoon Caption Contest Dataset},
  year={2020},
  url={https://nextml.github.io/caption-contest-data/}
}

@inproceedings{radev-etal-2016-humor,
  title = "Humor in Collective Discourse: Unsupervised Funniness Detection in The {New Yorker} Cartoon Caption Contest",
  author = "Radev, Dragomir and Stent, Amanda and Tetreault, Joel and Pappu, Aasish and Iliakopoulou, Aikaterini and Chanfreau, Agustin and de Juan, Paloma and Vallmitjana, Jordi and Jaimes, Alejandro and Jha, Rahul and Mankoff, Robert",
  booktitle = "LREC",
  year = "2016",
}

@inproceedings{shahaf2015inside,
  title={Inside jokes: Identifying humorous cartoon captions},
  author={Shahaf, Dafna and Horvitz, Eric and Mankoff, Robert},
  booktitle={KDD},
  year={2015},
}