ModaNet: A Large-scale Street Fashion Dataset with Polygon Annotations

Authors:

M. Hadi Kiapour,

Robinson Piramuthu

MM '18: Proceedings of the 26th ACM international conference on Multimedia

Pages 1670 - 1678

https://doi.org/10.1145/3240508.3240652

Published: 15 October 2018

Abstract

Understanding clothes from a single image would have huge commercial and cultural impacts on modern societies. However, this task remains a challenging computer vision problem due to wide variations in the appearance, style, brand and layering of clothing items. We present a new database called ModaNet, a large-scale collection of images based on the Paperdoll dataset. Our dataset provides 55,176 street images, fully annotated with polygons, on top of the 1 million weakly annotated street images in Paperdoll. ModaNet aims to provide a technical benchmark to fairly evaluate the progress of applying the latest computer vision techniques that rely on large data for fashion understanding. The rich annotation of the dataset makes it possible to measure, in detail, the performance of state-of-the-art algorithms for object detection, semantic segmentation and polygon prediction on street fashion images.
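As an illustration of why polygon annotations can serve several tasks at once, the minimal sketch below derives a bounding box (usable for object detection) and a pixel area (usable for segmentation statistics) from a single polygon. The function names, the `(x, y, width, height)` box convention, and the example coordinates are assumptions made for this sketch; they are not taken from the ModaNet paper or its annotation files.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_to_bbox(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return the tight axis-aligned box (x, y, width, height) around a polygon.

    The (x, y, width, height) layout is an assumed convention for this sketch.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)


def polygon_area(polygon: List[Point]) -> float:
    """Return the area of a simple polygon using the shoelace formula."""
    twice_area = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# Hypothetical annotation: a rectangular outline of a clothing item, in pixels.
example_polygon = [(10.0, 20.0), (50.0, 20.0), (50.0, 80.0), (10.0, 80.0)]

bbox = polygon_to_bbox(example_polygon)
area = polygon_area(example_polygon)
print(bbox, area)
```

A single polygon therefore yields both a detection target and a region measurement without separate labeling passes, which is one way to read the abstract's claim that the annotation supports detection, segmentation and polygon prediction together.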

Index Terms

ModaNet: A Large-scale Street Fashion Dataset with Polygon Annotations
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
        Object detection

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia

October 2018

2167 pages

ISBN:9781450356657

DOI:10.1145/3240508

General Chairs:
Susanne Boll (University of Oldenburg, Germany)
Kyoung Mu Lee (Seoul National University, Korea)
Jiebo Luo (University of Rochester, USA)
Wenwu Zhu (Tsinghua University, China)

Program Chairs:
Hyeran Byun (Yonsei University, Korea)
Chang Wen Chen (State Univ. of New York at Buffalo, USA)
Rainer Lienhart (University of Augsburg, Germany)
Tao Mei (JD AI, China)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '18

Sponsor:

SIGMM

MM '18: ACM Multimedia Conference

October 22 - 26, 2018

Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
