DOI: 10.1145/1869790.1869829
research-article

Bag-of-visual-words and spatial extensions for land-use classification

Published: 02 November 2010

Abstract

We investigate bag-of-visual-words (BOVW) approaches to land-use classification in high-resolution overhead imagery. We consider a standard non-spatial representation in which the frequencies, but not the locations, of quantized image features are used to discriminate between classes, analogous to how words are used for text document classification without regard to their order of occurrence. We also consider two spatial extensions: the established spatial pyramid match kernel, which considers the absolute spatial arrangement of the image features, and a novel method, which we term the spatial co-occurrence kernel, that considers their relative arrangement. These extensions are motivated by the importance of spatial structure in geographic data.
The methods are evaluated using a large ground-truth image dataset of 21 land-use classes. In addition to comparisons with standard approaches, we perform an extensive evaluation of different configurations, such as the size of the visual dictionaries used to derive the BOVW representations and the scale at which the spatial relationships are considered.
We show that even though BOVW approaches do not necessarily perform better than the best standard approaches overall, they represent a robust alternative that is more effective for certain land-use classes. We also show that extending the BOVW approach with our proposed spatial co-occurrence kernel consistently improves performance.
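
As a rough illustration of the representations described in the abstract, the following Python sketch builds a non-spatial BOVW histogram and a simple visual-word co-occurrence matrix. The abstract does not define the spatial co-occurrence kernel, so the pair-counting rule (keypoints within a fixed `radius`), the histogram intersection used for comparison, and every function name and toy value here are assumptions made for illustration, not the authors' implementation. In practice the codebook would typically be learned by clustering local descriptors; here it is hard-coded.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Assign each descriptor the index of its nearest codeword (visual word)."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bovw_histogram(words, num_words):
    """Non-spatial BOVW: normalized word frequencies; keypoint locations are ignored."""
    counts = np.bincount(words, minlength=num_words).astype(float)
    return counts / max(counts.sum(), 1.0)

def cooccurrence_matrix(words, positions, num_words, radius):
    """Count unordered pairs of visual words whose keypoints lie within `radius`.

    Assumed reading of "relative arrangement"; the abstract gives no formula.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Upper triangle with k=1: each pair counted once, no self-pairs.
    i, j = np.where(np.triu(dist <= radius, k=1))
    lo = np.minimum(words[i], words[j])
    hi = np.maximum(words[i], words[j])
    m = np.zeros((num_words, num_words))
    np.add.at(m, (lo, hi), 1.0)  # accumulate, including repeated index pairs
    return m

def intersection_kernel(a, b):
    """Histogram intersection: sum of element-wise minima of two count arrays."""
    return float(np.minimum(a, b).sum())

# Toy data (hypothetical): a 2-word codebook, 4 descriptors, 4 keypoint positions.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.0, 0.2]])
positions = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 5.0], [100.0, 100.0]])

words = quantize(descriptors, codebook)
hist = bovw_histogram(words, num_words=2)
vwcm = cooccurrence_matrix(words, positions, num_words=2, radius=5.5)
print(words, hist, vwcm, intersection_kernel(vwcm, vwcm), sep="\n")
```

The histogram discards where each visual word occurs, while the co-occurrence matrix keeps only which words appear near each other, so neither depends on absolute image position. The `radius` parameter plays the role of the scale at which spatial relationships are considered.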

Published In

GIS '10: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems
November 2010
566 pages
ISBN:9781450304283
DOI:10.1145/1869790
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bag-of-visual-words
  2. land-use classification
  3. local invariant features

Conference

GIS '10
Acceptance Rates

Overall Acceptance Rate 257 of 1,238 submissions, 21%
