DOI: 10.1145/3366424.3386196
research-article

Interpretable Methods for Identifying Product Variants

Published: 20 April 2020

Abstract

For e-commerce companies with large product selections, organizing and grouping products in meaningful ways is important for creating great customer shopping experiences and cultivating an authoritative brand image. One important way of grouping products is to identify a family of product variants, where the variants are mostly the same but differ in slight yet distinct ways (e.g., color or pack size). In this paper, we introduce a novel approach to identifying product variants. It combines constrained clustering with tailored NLP techniques (e.g., extracting the product family name from unstructured product titles and identifying products with similar model numbers) to outperform an existing baseline that uses a vanilla classification approach. In addition, we design the algorithm to meet certain business criteria, including meeting high accuracy requirements across a wide range of categories (e.g., appliances, decor, tools, and building materials) and prioritizing the interpretability of the model so that it is accessible and understandable to all business partners.
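To make the combination of model-number similarity and constrained clustering concrete, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. The record fields (`sku`, `brand`, `model`), the rule that products of different brands may never be grouped (a cannot-link constraint), the edit-distance threshold, and the single-linkage merging are all assumptions chosen for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def group_variants(products, max_distance=2):
    """Group products whose model numbers are close (hypothetical rule).

    Assumed cannot-link constraint: two products with different brands
    are never linked directly, however similar their model numbers are.
    Linked pairs are merged with union-find (single linkage).
    """
    parent = list(range(len(products)))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(products)), 2):
        p, q = products[i], products[j]
        if p["brand"] != q["brand"]:
            continue  # constraint violated: skip this pair
        if edit_distance(p["model"], q["model"]) <= max_distance:
            parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(products):
        groups.setdefault(find(i), []).append(p["sku"])
    return sorted(sorted(g) for g in groups.values())


# Made-up example records, not data from the paper.
products = [
    {"sku": "A1", "brand": "Acme", "model": "DW1200-BLK"},
    {"sku": "A2", "brand": "Acme", "model": "DW1200-WHT"},
    {"sku": "A3", "brand": "Acme", "model": "HX9000"},
    {"sku": "B1", "brand": "Bolt", "model": "DW1200-BLK"},
]
print(group_variants(products, max_distance=3))
```

Each group in this sketch can be explained by pointing at the specific pairs that were linked and the rule that linked them, which is one simple way to keep a grouping interpretable. Single linkage can chain loosely related items into one group, so a stricter threshold or extra constraints may be needed in practice.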


Published In

WWW '20: Companion Proceedings of the Web Conference 2020
April 2020
854 pages
ISBN: 9781450370240
DOI: 10.1145/3366424

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. constrained clustering
2. natural language processing
3. product variants

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

WWW '20: The Web Conference 2020
April 20 - 24, 2020
Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
