Article

Learning methods for generic object recognition with invariance to pose and lighting

Authors:

Léon Bottou

CVPR'04: Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition

Pages 97 - 104

Published: 27 June 2004

Abstract

We assess the applicability of several popular learning methods to the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 azimuths, 9 elevations, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest Neighbor methods, Support Vector Machines, and Convolutional Networks, operating on raw pixels or on PCA-derived features, were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13% for SVM and 7% for Convolutional Nets. On a segmentation/recognition task with highly cluttered images, SVM proved impractical, while Convolutional Nets yielded 16/7% error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.
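The image total quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts come from the abstract, while the variable names and the assumption that each stereo pair contributes exactly two individual images are ours.

```python
from math import prod

# Capture factors as stated in the abstract.
capture_factors = {
    "object_instances": 50,     # 10 instances in each of 5 categories
    "azimuths": 36,
    "elevations": 9,
    "lighting_conditions": 6,
}

# Assumption: one stereo pair = two individual images (left and right).
IMAGES_PER_STEREO_PAIR = 2

stereo_pairs = prod(capture_factors.values())
individual_images = stereo_pairs * IMAGES_PER_STEREO_PAIR

print(f"stereo pairs:      {stereo_pairs:,}")       # 97,200
print(f"individual images: {individual_images:,}")  # 194,400
```

The product of the four capture factors gives 97,200 stereo pairs, and doubling it matches the 194,400 individual images reported in the abstract.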




Published In

CVPR'04: Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition

June 2004

1041 pages

Sponsors

IEEE-CS\DATC: IEEE Computer Society

Publisher

IEEE Computer Society

United States
