
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

Published: 01 May 2001

Abstract

In this paper, we propose a computational model of the recognition of real-world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low-dimensional representation of the scene that we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene provides information about its probable semantic category.
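As a rough illustration of what "spectral and coarsely localized information" could look like in practice, the sketch below computes the energy spectrum of a grayscale image and then summarizes it per cell of a coarse grid, split into a few orientation bands. This is not the authors' implementation: the function names, the 4 x 4 grid, the four orientation bands, and the use of NumPy are all assumptions made for the example, and the abstract does not specify how the perceptual dimensions are estimated from such features.

```python
import numpy as np


def energy_spectrum(image):
    """Squared magnitude of the 2-D DFT of a mean-subtracted grayscale image.

    The zero frequency is shifted to the center of the returned array.
    """
    image = np.asarray(image, dtype=float)
    image = image - image.mean()  # remove the DC component
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.abs(spectrum) ** 2


def coarse_spectral_features(image, grid=4, n_orient=4):
    """Hypothetical descriptor: spectral energy per grid cell and orientation band.

    The image is cut into grid x grid non-overlapping cells (any remainder
    rows/columns are ignored). For each cell, the energy spectrum is summed
    inside n_orient equal-width orientation bands covering [0, pi).
    """
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    cell_h, cell_w = height // grid, width // grid

    # Orientation of every frequency sample of a cell-sized spectrum.
    fy = np.fft.fftshift(np.fft.fftfreq(cell_h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(cell_w))[None, :]
    theta = np.mod(np.arctan2(fy, fx), np.pi)
    band = np.minimum((theta / np.pi * n_orient).astype(int), n_orient - 1)

    features = np.zeros((grid, grid, n_orient))
    for i in range(grid):
        for j in range(grid):
            cell = image[i * cell_h:(i + 1) * cell_h,
                         j * cell_w:(j + 1) * cell_w]
            energy = energy_spectrum(cell)
            for k in range(n_orient):
                features[i, j, k] = energy[band == k].sum()
    return features.ravel()


if __name__ == "__main__":
    # Synthetic test image: vertical stripes (intensity varies along x only).
    x = np.arange(64)
    stripes = np.tile(np.sin(2 * np.pi * x / 8.0), (64, 1))
    descriptor = coarse_spectral_features(stripes)
    print(descriptor.shape)
```

A descriptor of this kind has `grid * grid * n_orient` entries; a further dimensionality-reduction or regression step (not shown, and not specified in the abstract) would be needed to map it onto dimensions such as openness or roughness.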



Published In

International Journal of Computer Vision, Volume 42, Issue 3
May-June 2001
73 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. energy spectrum
  2. natural images
  3. principal components
  4. scene recognition
  5. spatial layout
