Text mining for product attribute extraction

Published: 01 June 2006

Abstract

We describe our work on extracting attribute-value pairs from textual product descriptions. The goal is to augment product databases by representing each product as a set of attribute-value pairs. Such a representation is beneficial for tasks where treating a product as a set of attribute-value pairs is more useful than treating it as an atomic entity. Examples of such applications include demand forecasting, assortment optimization, product recommendations, and assortment comparison across retailers and manufacturers. We deal with both implicit and explicit attributes and formulate both kinds of extraction as classification problems. Using single-view and multi-view semi-supervised learning algorithms, we exploit the large amounts of unlabeled data available in this domain while reducing the need for initial labeled data, which is expensive to obtain. We present promising results on apparel and sporting goods products and show that our system can accurately extract attribute-value pairs from product descriptions. We also describe a variety of applications built on top of the results of the attribute extraction system.
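To make the multi-view semi-supervised idea concrete, the following is a minimal sketch of a co-training-style loop, not the system described in the abstract. It assumes each candidate word is an instance with two views, the preceding word (context) and the word itself, and it substitutes simple count-based classifiers for whatever learners the authors used. The labels ("value", "other"), the toy data, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_view(labeled, view):
    """Count label frequencies per feature for one view (0 = context, 1 = token)."""
    counts = defaultdict(Counter)
    for instance, label in labeled.items():
        counts[instance[view]][label] += 1
    return counts

def predict(counts, feature):
    """Return (label, confidence), or (None, 0.0) if the feature was never seen."""
    if feature not in counts:
        return None, 0.0
    label, n = counts[feature].most_common(1)[0]
    return label, n / sum(counts[feature].values())

def co_train(seeds, unlabeled, rounds=5, threshold=1.0):
    """Grow the labeled set: each view labels the instances it is confident about."""
    labeled = dict(seeds)
    pool = [x for x in unlabeled if x not in labeled]
    for _ in range(rounds):
        newly = {}
        for view in (0, 1):
            model = train_view(labeled, view)
            for inst in pool:
                label, conf = predict(model, inst[view])
                if label is not None and conf >= threshold and inst not in newly:
                    newly[inst] = label
        if not newly:  # nothing confident left to add
            break
        labeled.update(newly)
        pool = [x for x in pool if x not in newly]
    return labeled

# Hypothetical toy data: (previous word, word) pairs from product descriptions.
SEEDS = {("color", "red"): "value", ("the", "shirt"): "other"}
UNLABELED = [("color", "blue"), ("trim", "blue"), ("trim", "white"), ("the", "jacket")]

labels = co_train(SEEDS, UNLABELED)
for instance, label in labels.items():
    print(instance, label)
```

In this toy run the context view first labels "blue" after "color", the token view then carries "blue" over to the unseen context "trim", and the context view finally labels "white" after "trim". This is the sense in which two views let a small set of seed labels spread across unlabeled data.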

Published In

ACM SIGKDD Explorations Newsletter, Volume 8, Issue 1
June 2006
104 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/1147234

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article
