
Deep Human Parsing with Active Template Regression

Published: 01 December 2015

Abstract

In this work, the human parsing task, namely decomposing a human image into semantic fashion/body regions, is formulated as an active template regression (ATR) problem: the normalized mask of each fashion/body item is expressed as a linear combination of learned mask templates, and then morphed into a more precise mask with active shape parameters, including the position, scale, and visibility of each semantic region. The mask template coefficients and the active shape parameters together generate the human parsing result, and are thus called the structure outputs for human parsing. A deep Convolutional Neural Network (CNN) is utilized to build the end-to-end relation between the input human image and the structure outputs. More specifically, the structure outputs are predicted by two separate networks: the first CNN uses max-pooling and is designed to predict the template coefficients for each label mask, while the second omits max-pooling to preserve sensitivity to label mask position and accurately predict the active shape parameters. For a new image, the structure outputs of the two networks are fused to generate the probability of each label for each pixel, and super-pixel smoothing is finally applied to refine the human parsing result. Comprehensive evaluations on a large dataset demonstrate the significant superiority of the ATR framework over state-of-the-art methods for human parsing. In particular, the F1-score of our ATR framework reaches 64.38 percent, significantly higher than the 44.76 percent of the state-of-the-art algorithm [28].
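To make the structure outputs concrete, below is a minimal sketch of how one label mask could be rebuilt from the two networks' predictions. The function name reconstruct_mask, the 0.5 visibility threshold, and the nearest-neighbor resizing are illustrative assumptions; this sketch illustrates the ATR decomposition described above rather than reproducing the authors' implementation.

    import numpy as np

    def reconstruct_mask(coeffs, templates, cx, cy, scale, visibility,
                         out_h, out_w, vis_thresh=0.5):
        # coeffs:    (K,) template coefficients from the first CNN (with max-pooling)
        # templates: (K, h, w) learned normalized mask templates for one label
        # cx, cy, scale, visibility: active shape parameters from the second
        #            CNN (without max-pooling); low visibility means "absent"
        canvas = np.zeros((out_h, out_w))
        if visibility < vis_thresh:  # assumed threshold, not from the paper
            return canvas

        # Normalized mask as a linear combination of the learned templates.
        mask = np.tensordot(coeffs, templates, axes=1)  # shape (h, w)

        # Morph the normalized mask: nearest-neighbor rescale, then paste it
        # centered at the predicted position (cx, cy).
        h, w = mask.shape
        th = max(1, int(round(h * scale)))
        tw = max(1, int(round(w * scale)))
        resized = mask[np.ix_(np.arange(th) * h // th, np.arange(tw) * w // tw)]

        r0, c0 = int(round(cy - th / 2)), int(round(cx - tw / 2))
        rr0, cc0 = max(r0, 0), max(c0, 0)
        rr1, cc1 = min(r0 + th, out_h), min(c0 + tw, out_w)
        if rr1 > rr0 and cc1 > cc0:  # clip against the image borders
            canvas[rr0:rr1, cc0:cc1] = resized[rr0 - r0:rr1 - r0,
                                               cc0 - c0:cc1 - c0]
        return canvas

Running this once per label gives a per-pixel score map for every fashion/body item; stacking the maps, taking a pixel-wise argmax, and applying super-pixel smoothing as described above would yield the final parsing.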

References

[1]
J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu, “Semantic segmentation with second-order pooling,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 430–443.
[2]
J. Carreira and C. Sminchisescu, “CPMC: Automatic object segmentation using constrained parametric min-cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 7, pp. 1312–1328, Jul. 2012.
[3]
H. Chen, Z. Xu, Z. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2006, pp. 943–950.
[4]
H. Chen, A. Gallagher, and B. Girod, “Describing clothing by semantic attributes,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 609–623.
[5]
T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681–685, Jun. 2001.
[6]
T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, “Active shape models—their training and application,” Comput. Vis. Image Understanding, vol. 61, no. 1, pp. 38–59, 1995.
[7]
M. Dantone, J. Gall, C. Leistner, and L. V. Gool, “Human pose estimation using body parts dependent joint regressors,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2013, pp. 3041–3048.
[8]
J. Dong, Q. Chen, W. Xia, Z. Huang, and S. Yan, “A deformable mixture parsing model with parselets,” in Proc. Int. Conf. Comput. Vis., 2013, pp. 3408–3415.
[9]
C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915–1929, Aug. 2013.
[10]
P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient graph-based image segmentation,” Int. J. Comput. Vis., vol. 59, no. 2, pp. 167–181, 2004.
[11]
B. Fulkerson, A. Vedaldi, and S. Soatto, “Class segmentation and object localization with superpixel neighborhoods,” in Proc. Int. Conf. Comput. Vis., 2009, pp. 670–677.
[12]
R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 580–587.
[13]
V. Gulshan, C. Rother, A. Criminisi, A. Blake, and A. Zisserman, “Geodesic star convexity for interactive image segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2010, pp. 3129–3136.
[14]
Y. Jia, Caffe: An open source convolutional architecture for fast feature embedding, 2013. [Online]. Available: http://caffe.berkeleyvision.org/
[15]
I. Jolliffe, Principal Component Analysis. New York, NY, USA: Springer, 2002.
[16]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inform. Process. Syst., 2012, pp. 1097–1105.
[17]
D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, pp. 788–791, 1999.
[18]
L. Lin, X. Wang, W. Yang, and J.-H. Lai, “Discriminatively trained and-or graph models for object shape detection,” CoRR, 2015. [Online]. Available: http://arxiv.org/abs/1502.00341
[19]
S. Liu, J. Feng, C. Domokos, H. Xu, J. Huang, Z. Hu, and S. Yan, “Fashion parsing with weak color-category labels,” IEEE Trans. Multimedia, vol. 16, no. 1, pp. 253–265, Jan. 2014.
[20]
S. Liu, J. Feng, Z. Song, T. Zhang, H. Lu, C. Xu, and S. Yan, “Hi, magic closet, tell me what to wear!” in Proc. 20th ACM Int. Conf. Multimedia, 2012, pp. 619–628.
[21]
S. Liu, Z. Song, G. Liu, C. Xu, H. Lu, and S. Yan, “Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2012, pp. 3330–3337.
[22]
J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online learning for matrix factorization and sparse coding,” J. Mach. Learn. Res., vol. 11, pp. 19–60, 2010.
[23]
Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2233–2246, Nov. 2012.
[24]
P. H. O. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene labeling,” in Proc. Int. Conf. Mach. Learn., 2014, pp. 82–90.
[25]
J. Shotton, M. Johnson, and R. Cipolla, “Semantic texton forests for image categorization and segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2008, pp. 1–8.
[26]
C. Szegedy, A. Toshev, and D. Erhan, “Deep neural networks for object detection,” in Proc. Adv. Neural Inform. Process. Syst., 2013, pp. 2553–2561.
[27]
A. Toshev and C. Szegedy, “DeepPose: Human pose estimation via deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 1653–1660.
[28]
K. Yamaguchi, M. H. Kiapour, and T. L. Berg, “Paper doll parsing: Retrieving similar styles to parse clothing items,” in Proc. Int. Conf. Comput. Vis., 2013, pp. 3519–3526.
[29]
K. Yamaguchi, M. H. Kiapour, L. E. Ortiz, and T. L. Berg, “Parsing clothing in fashion photographs,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2012, pp. 3570–3577.
[30]
W. Yang, L. Lin, and P. Luo, “Clothing co-parsing by joint image segmentation and labeling,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 3182–3189.
[31]
M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 818–833.

        Published In

        IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 37, Issue 12, Dec. 2015, 240 pages

        Publisher

        IEEE Computer Society

        United States

        Author Tags

        1. active shape network
        2. Active template regression
        3. CNN
        4. human parsing
        5. active template network
