Unsupervised Extraction of Popular Product Attributes from E-Commerce Web Sites by Considering Customer Reviews

Published: 15 April 2016

Abstract

We develop an unsupervised learning framework for extracting popular product attributes from product description pages originating from different E-commerce Web sites. Unlike existing information extraction methods, which do not consider the popularity of product attributes, our framework not only detects popular product features from a collection of customer reviews but also maps these popular features to the related product attributes. One novelty of our framework is that it bridges the vocabulary gap between the text in product description pages and the text in customer reviews. Technically, we develop a discriminative graphical model based on hidden Conditional Random Fields. Because the model is unsupervised, the framework can be applied to new domains and Web sites without labeling training samples. Extensive experiments have been conducted to demonstrate the effectiveness and robustness of our framework.

      Published In

      ACM Transactions on Internet Technology, Volume 16, Issue 2
      April 2016
      150 pages
      ISSN:1533-5399
      EISSN:1557-6051
      DOI:10.1145/2909066
      Editor: Munindar P. Singh
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 15 April 2016
      Accepted: 01 December 2015
      Revised: 01 November 2015
      Received: 01 November 2013

      Author Tags

      1. Information extraction
      2. conditional random fields
      3. customer reviews
      4. product attribute

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Research Grant Council of the Hong Kong Special Administrative Region, China
      • The Hong Kong Institute of Education
      • Direct Grant of the Faculty of Engineering, CUHK
