research-article

A new 3D descriptor for human classification: application for human detection in a multi-kinect system

Published: 01 August 2019

Abstract

In this paper we present a new 3D descriptor for human classification and a human detection method based on it. The proposed descriptor classifies an object represented by a point cloud as human or non-human. It is derived from the well-known Histogram of Oriented Gradients (HOG) by using surface normals instead of gradients. The object point cloud is subdivided into blocks, and these blocks model the spatial distribution of surface normal orientations across the different parts of the object. This distribution is expressed as a histogram. In addition, we have set up a multi-Kinect acquisition system that provides Complete Point Clouds (CPCs), i.e. 360° views. CPCs support suitable processing, particularly in the case of occlusions, and they allow the frontal orientation of a human to be determined. Based on the proposed 3D descriptor, we have developed a human detection method that is applied to CPCs. First, we evaluated the 3D descriptor on a set of CPC candidates using a Support Vector Machine (SVM) classifier. The learning process was conducted on an original CPC database that we built. The results are very promising: the descriptor discriminates human from non-human candidates and provides the frontal direction of humans with high precision. We also demonstrated that using CPCs significantly improves classification results compared with Single Point Clouds (i.e. point clouds acquired with only one Kinect). Second, we compared our detection method with two others: the HOG detector on RGB images and a 3D HOG-based detection method applied to RGB-depth data. The results obtained in different situations show that the proposed human detection method outperforms both.
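The block-and-histogram idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the grid layout, the angular binning, or the normalization, so the `grid`, `azimuth_bins` and `elevation_bins` parameters, the azimuth/elevation encoding of normals, and the per-block L1 normalization below are all assumptions chosen for illustration. The function name `normal_histogram_descriptor` is hypothetical, and the surface normals are assumed to be precomputed.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normal_histogram_descriptor(
    points: Sequence[Vec3],
    normals: Sequence[Vec3],
    grid: Tuple[int, int, int] = (2, 4, 1),   # assumed block layout (x, y, z)
    azimuth_bins: int = 8,                    # assumed angular resolution
    elevation_bins: int = 4,                  # assumed angular resolution
) -> List[float]:
    """Concatenate per-block histograms of surface-normal orientation."""
    if not points or len(points) != len(normals):
        raise ValueError("points and normals must be non-empty and aligned")

    # Axis-aligned bounding box of the object point cloud.
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]

    n_blocks = grid[0] * grid[1] * grid[2]
    bins_per_block = azimuth_bins * elevation_bins
    hists = [[0.0] * bins_per_block for _ in range(n_blocks)]

    for p, n in zip(points, normals):
        # Locate the block that contains this point.
        idx = []
        for a in range(3):
            extent = maxs[a] - mins[a]
            frac = (p[a] - mins[a]) / extent if extent > 0 else 0.0
            idx.append(min(int(frac * grid[a]), grid[a] - 1))
        block = (idx[0] * grid[1] + idx[1]) * grid[2] + idx[2]

        # Encode the normal direction as (azimuth, elevation).
        length = math.sqrt(sum(c * c for c in n))
        if length == 0.0:
            continue  # skip degenerate normals
        nx, ny, nz = (c / length for c in n)
        azimuth = math.atan2(ny, nx)                    # in [-pi, pi]
        elevation = math.asin(max(-1.0, min(1.0, nz)))  # in [-pi/2, pi/2]

        a_bin = min(int((azimuth + math.pi) / (2 * math.pi) * azimuth_bins),
                    azimuth_bins - 1)
        e_bin = min(int((elevation + math.pi / 2) / math.pi * elevation_bins),
                    elevation_bins - 1)
        hists[block][a_bin * elevation_bins + e_bin] += 1.0

    # Assumed normalization: each block histogram sums to 1 (L1).
    descriptor: List[float] = []
    for h in hists:
        total = sum(h)
        descriptor.extend(v / total if total > 0 else 0.0 for v in h)
    return descriptor
```

The resulting fixed-length vector (number of blocks times bins per block) is the kind of input a classifier such as an SVM could be trained on. How the paper actually sizes the blocks and bins is not stated in the abstract.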

Published In

Multimedia Tools and Applications, Volume 78, Issue 16
Aug 2019
1594 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Human classification
  2. 3D descriptor
  3. Multi-kinect
