DOI: 10.1145/1178782.1178793

Multiview fusion for canonical view generation based on homography constraints

Published: 27 October 2006

Abstract

Activity and gait recognition are among the applications that require view-specific input. In a real surveillance scenario, it is impractical to assume that the desired canonical view will always be available. We present a framework to generate the canonical view of any translating object in a scene monitored by multiple cameras. The method can recover this view even though no single camera observes it. In this two-step process, the camera and scene geometry are first used to identify the sagittal plane of the object, which defines the canonical view. Next, each original view is warped to the canonical view through planar homographies learnt from geometric constraints. The warped images are then combined by evidence fusion to recover the shape energy map, from which the final binary silhouette of the object's shape is obtained. Results for various indoor and outdoor sequences demonstrate the efficacy of this method in generating the shape of the object as seen from the canonical view while resolving occlusions.
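The warp-and-fuse step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion rule or how the homographies are estimated, so everything below is an assumption made for illustration: the function names `warp_to_canonical` and `fuse_views` are hypothetical, the homographies are supplied by hand rather than learnt from geometric constraints, fusion is a plain per-pixel average of warped foreground masks, and the 0.5 threshold is arbitrary.

```python
import numpy as np

def warp_to_canonical(mask, H, out_shape):
    """Warp a foreground mask into the canonical view by inverse mapping.

    H is a 3x3 homography taking homogeneous source pixels (x, y, 1) to
    canonical-view pixels. Sampling is nearest-neighbour; pixels that fall
    outside the source image are set to 0.
    """
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Canonical-view pixel grid as a 3 x N array of homogeneous coordinates.
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ dst
    w = src[2]
    ok = np.abs(w) > 1e-12            # skip points mapped to infinity
    sx = np.full(w.shape, -1.0)
    sy = np.full(w.shape, -1.0)
    sx[ok] = src[0, ok] / w[ok]
    sy[ok] = src[1, ok] / w[ok]
    sx = np.rint(sx).astype(int)
    sy = np.rint(sy).astype(int)
    inside = ok & (sx >= 0) & (sx < mask.shape[1]) & (sy >= 0) & (sy < mask.shape[0])
    out = np.zeros(h_out * w_out, dtype=float)
    out[inside] = mask[sy[inside], sx[inside]]
    return out.reshape(out_shape)

def fuse_views(masks, homographies, out_shape, threshold=0.5):
    """Average the warped masks (assumed fusion rule) and threshold the result."""
    warped = [warp_to_canonical(m, H, out_shape) for m, H in zip(masks, homographies)]
    energy = np.mean(warped, axis=0)   # stand-in for the shape energy map
    silhouette = energy >= threshold   # final binary silhouette
    return energy, silhouette

# Toy data: a 4x4 square seen in three synthetic "views" of an 8x8 image.
truth = np.zeros((8, 8))
truth[2:6, 2:6] = 1.0

view1 = truth.copy()
view1[2:4, :] = 0.0                    # top half of the object is occluded
view2 = np.zeros((8, 8))
view2[2:6, 1:5] = 1.0                  # same object, shifted one pixel left
view3 = truth.copy()

identity = np.eye(3)
shift_right = np.array([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0]])  # moves view2 back into alignment

energy, silhouette = fuse_views([view1, view2, view3],
                                [identity, shift_right, identity],
                                truth.shape)
```

In this toy case the occluded rows receive support from two of the three views, so the thresholded map still recovers the full square. Real homographies would generally be projective rather than pure translations, and a practical system would likely use interpolated sampling instead of nearest-neighbour lookup.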


Published In

VSSN '06: Proceedings of the 4th ACM international workshop on Video surveillance and sensor networks
October 2006
230 pages
ISBN:1595934960
DOI:10.1145/1178782
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. canonical view generation
  2. homography
  3. multiview fusion
  4. shape energy map
  5. sagittal plane

Qualifiers

  • Article

Conference

MM06: The 14th ACM International Conference on Multimedia 2006
October 27, 2006
Santa Barbara, California, USA
