
Deep Human Parsing with Active Template Regression

Published: 01 December 2015

Abstract

In this work, the human parsing task, namely decomposing a human image into semantic fashion/body regions, is formulated as an active template regression (ATR) problem: the normalized mask of each fashion/body item is expressed as a linear combination of learned mask templates, and then morphed into a more precise mask with active shape parameters, including the position, scale, and visibility of each semantic region. The mask template coefficients and the active shape parameters together generate the human parsing result, and are thus called the structure outputs for human parsing. A deep Convolutional Neural Network (CNN) is utilized to build the end-to-end relation between the input human image and the structure outputs. More specifically, the structure outputs are predicted by two separate networks: the first CNN uses max-pooling and is designed to predict the template coefficients for each label mask, while the second omits max-pooling to preserve sensitivity to label mask position and accurately predict the active shape parameters. For a new image, the structure outputs of the two networks are fused to generate the probability of each label for each pixel, and super-pixel smoothing is finally applied to refine the human parsing result. Comprehensive evaluations on a large dataset demonstrate the significant superiority of the ATR framework over state-of-the-art methods for human parsing. In particular, the F1-score of our ATR framework reaches 64.38 percent, significantly higher than the 44.76 percent of the state-of-the-art algorithm [28].
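To make the structure outputs concrete, below is a minimal sketch of how one label mask could be rebuilt from the two networks' predictions. The function name reconstruct_mask, the 0.5 visibility threshold, and the nearest-neighbor resizing are illustrative assumptions; this sketch illustrates the ATR decomposition described above rather than reproducing the authors' implementation.

    import numpy as np

    def reconstruct_mask(coeffs, templates, cx, cy, scale, visibility,
                         out_h, out_w, vis_thresh=0.5):
        # coeffs:    (K,) template coefficients from the first CNN (with max-pooling)
        # templates: (K, h, w) learned normalized mask templates for one label
        # cx, cy, scale, visibility: active shape parameters from the second
        #            CNN (without max-pooling); low visibility means "absent"
        canvas = np.zeros((out_h, out_w))
        if visibility < vis_thresh:  # assumed threshold, not from the paper
            return canvas

        # Normalized mask as a linear combination of the learned templates.
        mask = np.tensordot(coeffs, templates, axes=1)  # shape (h, w)

        # Morph the normalized mask: nearest-neighbor rescale, then paste it
        # centered at the predicted position (cx, cy).
        h, w = mask.shape
        th = max(1, int(round(h * scale)))
        tw = max(1, int(round(w * scale)))
        resized = mask[np.ix_(np.arange(th) * h // th, np.arange(tw) * w // tw)]

        r0, c0 = int(round(cy - th / 2)), int(round(cx - tw / 2))
        rr0, cc0 = max(r0, 0), max(c0, 0)
        rr1, cc1 = min(r0 + th, out_h), min(c0 + tw, out_w)
        if rr1 > rr0 and cc1 > cc0:  # clip against the image borders
            canvas[rr0:rr1, cc0:cc1] = resized[rr0 - r0:rr1 - r0,
                                               cc0 - c0:cc1 - c0]
        return canvas

Running this once per label gives a per-pixel score map for every fashion/body item; stacking the maps, taking a pixel-wise argmax, and applying super-pixel smoothing as described above would yield the final parsing.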

References

[1]
J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu, “Semantic segmentation with second-order pooling,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 430–443.
[2]
J. Carreira and C. Sminchisescu, “CPMC: Automatic object segmentation using constrained parametric min-cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 7, pp. 1312–1328, Jul. 2012.
[3]
H. Chen, Z. Xu, Z. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2006, pp. 943–950.
[4]
H. Chen, A. Gallagher, and B. Girod, “Describing clothing by semantic attributes,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 609–623.
[5]
T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681–685, Jun. 2001.
[6]
T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, “Active shape models—their training and application,” Comput. Vis. Image Understanding, vol. 61, no. 1, pp. 38–59, 1995.
[7]
M. Dantone, J. Gall, C. Leistner, and L. V. Gool, “Human pose estimation using body parts dependent joint regressors,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2013, pp. 3041–3048.
[8]
J. Dong, Q. Chen, W. Xia, Z. Huang, and S. Yan, “A deformable mixture parsing model with parselets,” in Proc. Int. Conf. Comput. Vis., 2013, pp. 3408–3415.
[9]
C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915–1929, Aug. 2013.
[10]
P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient graph-based image segmentation,” Int. J. Comput. Vis., vol. 59, no. 2, pp. 167–181, 2004.
[11]
B. Fulkerson, A. Vedaldi, and S. Soatto, “Class segmentation and object localization with superpixel neighborhoods,” in Proc. Int. Conf. Comput. Vis., 2009, pp. 670–677.
[12]
R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 580–587.
[13]
V. Gulshan, C. Rother, A. Criminisi, A. Blake, and A. Zisserman, “Geodesic star convexity for interactive image segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2010, pp. 3129–3136.
[14]
Y. Jia, Caffe: An open source convolutional architecture for fast feature embedding, 2013. [Online]. Available: http://caffe.berkeleyvision.org/
[15]
I. Jolliffe, Principal Component Analysis. New York, NY, USA: Springer, 2002.
[16]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inform. Process. Syst., 2012, pp. 1097–1105.
[17]
D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, pp. 788–791, 1999.
[18]
L. Lin, X. Wang, W. Yang, and J.-H. Lai, “Discriminatively trained and-or graph models for object shape detection,” CoRR, 2015. [Online]. Available: http://arxiv.org/abs/1502.00341
[19]
S. Liu, J. Feng, C. Domokos, H. Xu, J. Huang, Z. Hu, and S. Yan, “Fashion parsing with weak color-category labels,” IEEE Trans. Multimedia, vol. 16, no. 1, pp. 253–265, Jan. 2014.
[20]
S. Liu, J. Feng, Z. Song, T. Zhang, H. Lu, C. Xu, and S. Yan, “Hi, magic closet, tell me what to wear!” in Proc. 20th ACM Int. Conf. Multimedia, 2012, pp. 619–628.
[21]
S. Liu, Z. Song, G. Liu, C. Xu, H. Lu, and S. Yan, “Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2012, pp. 3330–3337.
[22]
J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online learning for matrix factorization and sparse coding,” J. Mach. Learn. Res., vol. 11, pp. 19–60, 2010.
[23]
Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2233–2246, Nov. 2012.
[24]
P. H. O. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene labeling,” in Proc. Int. Conf. Mach. Learn., 2014, pp. 82–90.
[25]
J. Shotton, M. Johnson, and R. Cipolla, “Semantic texton forests for image categorization and segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2008, pp. 1–8.
[26]
C. Szegedy, A. Toshev, and D. Erhan, “Deep neural networks for object detection,” in Proc. Adv. Neural Inform. Process. Syst., 2013, pp. 2553–2561.
[27]
A. Toshev and C. Szegedy, “DeepPose: Human pose estimation via deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 1653–1660.
[28]
K. Yamaguchi, M. H. Kiapour, and T. L. Berg, “Paper doll parsing: Retrieving similar styles to parse clothing items,” in Proc. Int. Conf. Comput. Vis., 2013, pp. 3519–3526.
[29]
K. Yamaguchi, M. H. Kiapour, L. E. Ortiz, and T. L. Berg, “Parsing clothing in fashion photographs,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2012, pp. 3570–3577.
[30]
W. Yang, L. Lin, and P. Luo, “Clothing co-parsing by joint image segmentation and labeling,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 3182–3189.
[31]
M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 818–833.

        Published In

        IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 37, Issue 12, Dec. 2015, 240 pages

        Publisher

        IEEE Computer Society

        United States

        Author Tags

        1. active shape network
        2. Active template regression
        3. CNN
        4. human parsing
        5. active template network
