Text mining for product attribute extraction

Authors:

Katharina Probst,

Andrew Fano

ACM SIGKDD Explorations Newsletter, Volume 8, Issue 1

Pages 41 - 48

https://doi.org/10.1145/1147234.1147241

Published: 01 June 2006

Abstract

We describe our work on extracting attribute-value pairs from textual product descriptions. The goal is to augment databases of products by representing each product as a set of attribute-value pairs. Such a representation is beneficial for tasks where treating the product as a set of attribute-value pairs is more useful than treating it as an atomic entity. Examples of such applications include demand forecasting, assortment optimization, product recommendations, and assortment comparison across retailers and manufacturers. We deal with both implicit and explicit attributes and formulate both kinds of extraction as classification problems. Using single-view and multi-view semi-supervised learning algorithms, we are able to exploit the large amounts of unlabeled data present in this domain while reducing the need for initial labeled data, which is expensive to obtain. We present promising results on apparel and sporting goods products and show that our system can accurately extract attribute-value pairs from product descriptions. We describe a variety of applications that are built on top of the results obtained by the attribute extraction system.
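
To make the idea of multi-view semi-supervised labeling concrete, here is a minimal sketch of a co-training-style bootstrapping loop. It is not the authors' system: the seed labels, the sample phrases, and the function names (`context_features`, `bootstrap`) are all hypothetical, and the real classifiers are replaced by simple unanimous-vote rules. One view is the word itself, the other is its neighbouring words; labels learned in one view are used to label data in the other.

```python
from collections import Counter, defaultdict

# Hypothetical seed labels (view 1: the word itself).
SEED_LABELS = {"cotton": "value", "fabric": "attribute", "leather": "value"}

# Hypothetical unlabeled data: (previous word, word, next word) triples
# from made-up product descriptions.
CANDIDATES = [
    ("100%", "cotton", "fabric"),
    ("100%", "polyester", "fabric"),
    ("soft", "leather", "upper"),
    ("soft", "suede", "upper"),
    ("cotton", "fabric", "blend"),
    ("nylon", "lining", "blend"),
]


def context_features(prev_word, next_word):
    """View 2: the words immediately around the candidate."""
    return (f"prev={prev_word}", f"next={next_word}")


def bootstrap(seed_labels, candidates, max_rounds=5):
    """Alternate between the two views until no new word gets a label."""
    word_labels = dict(seed_labels)
    for _ in range(max_rounds):
        # Step 1: learn context rules from the words labeled so far.
        votes = defaultdict(Counter)
        for prev_word, word, next_word in candidates:
            label = word_labels.get(word)
            if label is not None:
                for feat in context_features(prev_word, next_word):
                    votes[feat][label] += 1
        # Keep only context features that always co-occur with one label.
        rules = {f: next(iter(c)) for f, c in votes.items() if len(c) == 1}

        # Step 2: use the context rules to label words not yet labeled.
        word_votes = defaultdict(Counter)
        for prev_word, word, next_word in candidates:
            if word in word_labels:
                continue
            for feat in context_features(prev_word, next_word):
                if feat in rules:
                    word_votes[word][rules[feat]] += 1
        # Accept a new word only if every matching rule agrees on its label.
        new_labels = {w: next(iter(c)) for w, c in word_votes.items() if len(c) == 1}
        if not new_labels:
            break
        word_labels.update(new_labels)
    return word_labels


labels = bootstrap(SEED_LABELS, CANDIDATES)
```

Starting from three seed words, the loop labels "polyester" and "suede" as values and "lining" as an attribute, purely from shared contexts. A practical system would use probabilistic classifiers with confidence thresholds in place of the unanimous-vote rules assumed here.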

Index Terms

Text mining for product attribute extraction
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
2. Information systems
  1. Data management systems
    1. Database management system engines
  2. Information systems applications
    1. Data mining

Published In

ACM SIGKDD Explorations Newsletter Volume 8, Issue 1

June 2006

104 pages

ISSN:1931-0145

EISSN:1931-0153

DOI:10.1145/1147234

Copyright © 2006 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Bibliometrics

Total citations: 134
Total downloads: 1,981
Downloads (last 12 months): 86
Downloads (last 6 weeks): 14

Reflects downloads up to 16 Oct 2024
