Conference Paper 2018

Punny Captions: Witty Wordplay in Image Descriptions


Wit is a form of rich interaction that is often grounded in a specific situation (e.g., a comment in response to an event). In this work, we attempt to build computational models that can produce witty descriptions for a given image. Inspired by a cognitive account of humor appreciation, we employ linguistic wordplay, specifically puns, in image descriptions. We develop two approaches which involve retrieving witty descriptions for a given image from a large corpus of sentences, or generating them via an encoder-decoder neural network architecture. We compare our approach against meaningful baseline approaches via human studies and show substantial improvements. We find that when a human is subject to similar constraints as the model regarding word usage and style, people vote the image descriptions generated by our model to be slightly wittier than human-written witty descriptions. Unsurprisingly, humans are almost always wittier than the model when they are free to choose the vocabulary, style, etc.
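The retrieval approach described above can be illustrated with a toy sketch. This is not the authors' code: the homophone list, corpus, and function names are illustrative stand-ins. The idea is to map words associated with an image to pun alternatives (homophones) and keep corpus sentences that contain such a pun word.

```python
# Minimal illustrative sketch (assumed, not the authors' implementation):
# pun-aware retrieval of witty sentences for an image.
HOMOPHONES = {
    "sun": ["son"],
    "bored": ["board"],
    "sell": ["cell"],
}

def pun_candidates(image_words):
    """Map each image-associated word to its pun (homophone) forms."""
    return {w: HOMOPHONES[w] for w in image_words if w in HOMOPHONES}

def retrieve_witty(image_words, corpus):
    """Return corpus sentences that contain a pun on an image word."""
    puns = pun_candidates(image_words)
    alternates = {alt for alts in puns.values() for alt in alts}
    return [s for s in corpus
            if any(a in s.lower().split() for a in alternates)]

corpus = [
    "a proud father with his son at the beach",
    "a man walking a dog in the park",
]
print(retrieve_witty(["sun", "beach"], corpus))
# -> ['a proud father with his son at the beach']
```

In the paper's actual setting the corpus is large and candidates are scored for relevance to the image; this sketch only shows the pun-matching step.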

Author(s): Arjun Chandrasekaran and Devi Parikh and Mohit Bansal
Book Title: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
Year: 2018
Month: June
Bibtex Type: Conference Paper (inproceedings)
Event Name: EMNLP
Electronic Archiving: grant_archive

BibTex

@inproceedings{Chandrasekaran:EMNLP:2018,
  title = {Punny Captions: Witty Wordplay in Image Descriptions},
  booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing ({EMNLP})},
  abstract = {Wit is a form of rich interaction that is often grounded in a specific situation (e.g., a comment in response to an event). In this work, we attempt to build computational models that can produce witty descriptions for a given image. Inspired by a cognitive account of humor appreciation, we employ linguistic wordplay, specifically puns, in image descriptions. We develop two approaches which involve retrieving witty descriptions for a given image from a large corpus of sentences, or generating them via an encoder-decoder neural network architecture. We compare our approach against meaningful baseline approaches via human studies and show substantial improvements. We find that when a human is subject to similar constraints as the model regarding word usage and style, people vote the image descriptions generated by our model to be slightly wittier than human-written witty descriptions. Unsurprisingly, humans are almost always wittier than the model when they are free to choose the vocabulary, style, etc.},
  month = jun,
  year = {2018},
  slug = {punnycap-emnlp-2018},
  author = {Chandrasekaran, Arjun and Parikh, Devi and Bansal, Mohit},
  month_numeric = {6}
}