DOI: 10.1145/2960811.2967148
short-paper

Using Convolutional Neural Networks for Content Extraction from Online Flyers

Published: 13 September 2016

Abstract

The rise of online shopping has hurt physical retailers, which struggle to persuade customers to buy products in physical stores rather than online. Marketing flyers are a great means of increasing the visibility of physical retailers, but the unstructured offers appearing in those documents cannot be easily compared with similar online deals, making it hard for a customer to understand whether it is more convenient to order a product online or to buy it from the physical shop. In this work we tackle this problem by introducing a content extraction algorithm that automatically extracts structured data from flyers. Unlike competing approaches, which mainly focus on textual content or simply analyze font type, color and text positioning, we propose a new approach that uses Convolutional Neural Networks to classify words extracted from flyers typically used in marketing materials to attract the attention of readers towards specific deals. We obtained good results and a high degree of language and genre independence.

      Published In

      DocEng '16: Proceedings of the 2016 ACM Symposium on Document Engineering
      September 2016
      222 pages
      ISBN: 9781450344388
      DOI: 10.1145/2960811
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Sponsors

      In-Cooperation

      • SIGDOC: ACM Special Interest Group on Systems Documentation

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. content extraction
      2. convolutional neural network
      3. marketing flyers
      4. portable document format

      Qualifiers

      • Short-paper

      Conference

      DocEng '16: ACM Symposium on Document Engineering 2016
      September 13-16, 2016
      Vienna, Austria

      Acceptance Rates

      DocEng '16 Paper Acceptance Rate 11 of 35 submissions, 31%;
      Overall Acceptance Rate 194 of 564 submissions, 34%
