short-paper

Using Convolutional Neural Networks for Content Extraction from Online Flyers

Authors: Alessandro Calefati, Ignazio Gallo, Alessandro Zamberletti, Lucia Noce

DocEng '16: Proceedings of the 2016 ACM Symposium on Document Engineering

Pages 127 - 130

https://doi.org/10.1145/2960811.2967148

Published: 13 September 2016

Abstract

The rise of online shopping has hurt physical retailers, which struggle to persuade customers to buy products in physical stores rather than online. Marketing flyers are a great means of increasing the visibility of physical retailers, but the unstructured offers appearing in these documents cannot easily be compared with similar online deals, making it hard for customers to tell whether it is more convenient to order a product online or to buy it from the physical shop. In this work we tackle this problem by introducing a content extraction algorithm that automatically extracts structured data from flyers. Unlike competing approaches that mainly focus on textual content or simply analyze font type, color and text positioning, we propose a new approach that uses Convolutional Neural Networks to classify the words extracted from flyers, documents typically used in marketing to draw readers' attention to specific deals. The approach obtained good results and showed a high degree of language and genre independence.
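The abstract states that a CNN classifies the words extracted from a flyer but does not describe the network. As a rough, hypothetical illustration of that step only, the pure-Python sketch below runs the forward pass of a tiny, untrained CNN on one word crop: convolution, ReLU, global max pooling per kernel, a fully connected layer, and a softmax over word classes. The label set `LABELS`, the helper names, the 5x5 toy crop, and every weight are assumptions made for the example, not the authors' architecture or parameters.

```python
"""Untrained toy sketch: classify one word crop with a tiny CNN forward pass.

Assumptions (not from the paper): the crop is a small grayscale grid with
values in [0, 1], LABELS is a hypothetical class set, and all weights are
hand-picked for illustration rather than learned.
"""
import math
from typing import List

Matrix = List[List[float]]

# Hypothetical word classes; the paper's actual label set is not given here.
LABELS = ["price", "product_name", "other"]


def conv2d_valid(image: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """2-D cross-correlation with 'valid' padding (no zero-padding at borders)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap: Matrix) -> Matrix:
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def global_max_pool(fmap: Matrix) -> float:
    """Reduce a whole feature map to its single largest activation."""
    return max(max(row) for row in fmap)


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_word_crop(image: Matrix, kernels: List[Matrix],
                       fc_weights: List[List[float]],
                       fc_bias: List[float]) -> List[float]:
    """Conv -> ReLU -> global max pool (one feature per kernel) -> FC -> softmax."""
    features = [global_max_pool(relu(conv2d_valid(image, k))) for k in kernels]
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(fc_weights, fc_bias)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy 5x5 "word crop": a bright one-pixel vertical stroke in column 2.
    crop = [[1.0 if c == 2 else 0.0 for c in range(5)] for _ in range(5)]
    kernels = [
        [[-1.0, 1.0, -1.0]] * 3,               # fires on thin vertical strokes
        [[-1.0] * 3, [1.0] * 3, [-1.0] * 3],   # fires on thin horizontal strokes
    ]
    # With these toy weights the vertical-stroke feature drives the first class.
    fc_weights = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
    fc_bias = [0.0, 0.0, 0.0]
    probs = classify_word_crop(crop, kernels, fc_weights, fc_bias)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    print(LABELS[best], [round(p, 3) for p in probs])
    # prints: price [0.987, 0.002, 0.011]
```

Global max pooling reduces each feature map to a single value, so the feature vector has one entry per kernel regardless of the crop's size; that is convenient for word crops, whose widths vary. A practical system would learn many more kernels from labeled flyer words rather than hand-setting two.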

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
    2. Machine learning approaches

Information & Contributors

Information

Published In

DocEng '16: Proceedings of the 2016 ACM Symposium on Document Engineering

September 2016

222 pages

ISBN:9781450344388

DOI:10.1145/2960811

General Chair: Robert Sablatnig, TU Wien, Austria
Program Chair: Tamir Hassan, HP Labs, Austria

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

SIGDOC: ACM Special Interest Group on Systems Documentation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 September 2016

Qualifiers

Short-paper

Conference

DocEng '16

Sponsor:

SIGWEB

DocEng '16: ACM Symposium on Document Engineering 2016

September 13 - 16, 2016

Vienna, Austria

Acceptance Rates

DocEng '16 Paper Acceptance Rate 11 of 35 submissions, 31%;

Overall Acceptance Rate 194 of 564 submissions, 34%

Bibliometrics

Article Metrics

Total Citations: 0
Total Downloads: 120
Downloads (last 12 months): 4
Downloads (last 6 weeks): 1

Reflects downloads up to 07 Nov 2024
