
iFetch: Multimodal Conversational Agents for the Online Fashion Marketplace

Published: 17 November 2021

Abstract

In the near future, most of the interaction between large organizations and their users will be mediated by AI agents. This view is becoming widely accepted as online shopping comes to dominate entire market segments and the "digitally-native" generations become consumers. iFetch is a new generation of task-oriented conversational agents that interact with users seamlessly using verbal and visual information. Through the conversation, iFetch provides targeted advice and a "physical store-like" experience while maintaining user engagement. This setting entails the following vital components: 1) highly complex memory models that keep track of the conversation, 2) extraction of key semantic features from language and images that reveal user intent, 3) generation of multimodal responses that keep users engaged in the conversation, and 4) an interrelated knowledge base of products from which to extract relevant product lists.
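To make the four components concrete, here is a minimal, text-only toy sketch of how they could fit together in one turn-by-turn loop. All class and function names below are illustrative assumptions, not the authors' implementation (iFetch uses learned models for state tracking, intent extraction, and multimodal response generation):

```python
# Hypothetical sketch of the abstract's four components as a toy pipeline.
# (1) DialogueState = memory model, (2) extract_intent = semantic feature
# extraction, (3) respond = response generation, (4) retrieve = knowledge
# base lookup. Real systems replace each step with a learned model.
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    tags: set


@dataclass
class DialogueState:
    """(1) Memory model: accumulates user constraints across turns."""
    constraints: set = field(default_factory=set)


def extract_intent(utterance: str, vocabulary: set) -> set:
    """(2) Semantic feature extraction: here, naive keyword matching."""
    return {w for w in utterance.lower().split() if w in vocabulary}


def retrieve(state: DialogueState, catalog: list) -> list:
    """(4) Knowledge base lookup: products matching all constraints so far."""
    return [p for p in catalog if state.constraints <= p.tags]


def respond(state: DialogueState, catalog: list) -> str:
    """(3) Response generation: a verbal answer built from retrieved items."""
    hits = retrieve(state, catalog)
    if not hits:
        return "No matches yet -- could you tell me more?"
    return "How about: " + ", ".join(p.name for p in hits)


catalog = [
    Product("red summer dress", {"red", "dress", "summer"}),
    Product("blue denim jacket", {"blue", "jacket", "denim"}),
]
vocab = {"red", "blue", "dress", "jacket", "summer", "denim"}

state = DialogueState()
state.constraints |= extract_intent("I want a red dress", vocab)
print(respond(state, catalog))   # one product satisfies {red, dress}
state.constraints |= extract_intent("something blue instead", vocab)
print(respond(state, catalog))   # {red, dress, blue} matches nothing
```

The sketch also shows why the memory model matters: the second turn is only interpretable against the constraints accumulated from the first, and a real system would additionally decide when a new constraint replaces an old one rather than conjoining with it.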



          Published In

          MuCAI'21: Proceedings of the 2nd ACM Multimedia Workshop on Multimodal Conversational AI
          November 2021
          32 pages
          ISBN:9781450386791
          DOI:10.1145/3475959
          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

          1. computer vision
          2. conversational commerce
          3. natural language processing

          Qualifiers

          • Poster

          Funding Sources

          • ERDF, COMPETE 2020, NORTE 2020 and FCT under CMU Portugal

          Conference

MM '21: ACM Multimedia Conference
          October 20, 2021
          Virtual Event, China
