DOI: 10.1145/3336191.3371866
research-article

Illustrate Your Story: Enriching Text with Images

Published: 22 January 2020

Abstract

Human perception is known to be predominantly visual. As modern web infrastructure promoted the storage of media, the web-data paradigm shifted from text-only documents to documents containing both text and images. A multitude of blog posts, news articles, and social media posts on the Internet today are examples of such multimodal stories. Manually aligning images and text in a story is time-consuming and labor-intensive. We present a web application that automatically selects relevant images from an album and places them in suitable contexts within a body of text. The application solves a global optimization problem that maximizes the coherence between text paragraphs and image descriptors, and it lets users explore the underlying image descriptors and similarity metrics. Experiments show that our method aligns images with text with high semantic fit and to users' satisfaction.
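The abstract does not specify the descriptors, the similarity metric, or the solver, so the following is only a minimal sketch of what "selecting images from an album and placing them in paragraphs by global optimization" can look like. It assumes that each image is described by a short text string (for example, a list of tags), that paragraph/image similarity is cosine similarity over bag-of-words counts, and that each image is used at most once and each paragraph receives at most one image. The names `bag_of_words`, `cosine`, and `align` are hypothetical, and the exhaustive search stands in for whatever solver the actual application uses.

```python
import math
from collections import Counter
from itertools import combinations, permutations


def bag_of_words(text):
    """Hypothetical descriptor: lowercase tokens with surrounding punctuation stripped."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return Counter(t for t in tokens if t)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def align(paragraphs, image_descriptors):
    """Exhaustively search one-to-one paragraph/image assignments.

    Places min(#paragraphs, #images) images, each image at most once and at most
    one image per paragraph, maximizing the summed paragraph-image similarity.
    Returns (best_score, {paragraph_index: image_index}).
    """
    sim = [[cosine(bag_of_words(p), bag_of_words(d)) for d in image_descriptors]
           for p in paragraphs]
    k = min(len(paragraphs), len(image_descriptors))
    best_score, best = -1.0, {}
    # Every choice of k paragraphs paired with every ordered choice of k images
    # enumerates all one-to-one placements of size k (the global search space).
    for paras in combinations(range(len(paragraphs)), k):
        for imgs in permutations(range(len(image_descriptors)), k):
            score = sum(sim[p][i] for p, i in zip(paras, imgs))
            if score > best_score:
                best_score, best = score, dict(zip(paras, imgs))
    return best_score, best


# Usage with a toy two-paragraph story and a three-image album.
paragraphs = [
    "We hiked up the mountain trail at sunrise.",
    "Later we ate fresh seafood at a harbor restaurant.",
]
album = [
    "restaurant table seafood plate",
    "mountain trail hiker sunrise",
    "city street car",
]
score, placement = align(paragraphs, album)
print(placement)  # {0: 1, 1: 0}
```

Exhaustive search grows factorially with album size, so it only suits toy inputs. For a pure one-to-one assignment, `scipy.optimize.linear_sum_assignment(sim, maximize=True)` solves the same objective in polynomial time, while richer constraints (for example, allowing paragraphs to stay without an image) are usually expressed as an integer linear program. This sketch also always places as many images as possible, even when some similarities are zero.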

Published In

WSDM '20: Proceedings of the 13th International Conference on Web Search and Data Mining
January 2020
950 pages
ISBN:9781450368223
DOI:10.1145/3336191
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. combinatorial optimization
  2. multi-modal content generation
  3. text-image semantic alignment

Conference

WSDM '20

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 0

Reflects downloads up to 23 Dec 2024.
