DOI: 10.1145/1989323.1989429
research-article

Flexible aggregate similarity search

Published: 12 June 2011

Abstract

Aggregate similarity search, also known as the aggregate nearest neighbor (ANN) query, has many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves from a database P the object (or the top-k objects) most similar to Q, where similarity is an aggregation (e.g., sum or max) of the distances between a retrieved object p and all the objects in Q. In this paper, we add flexibility to the query definition: the similarity is an aggregation over the distances between p and any subset of αM objects in Q, for some support 0 < α ≤ 1. We call this new definition flexible aggregate similarity (FANN) search. It generalizes the ANN problem, which is the special case α = 1. Next, we present algorithms for answering FANN queries exactly and approximately. Our approximation algorithms are especially appealing: they are simple, highly efficient, and work well in both low and high dimensions. They also return near-optimal answers with guaranteed constant-factor approximations in any dimension. Extensive experiments on large real and synthetic datasets ranging from 2 to 74 dimensions demonstrate their superior efficiency and high answer quality.
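The following Python sketch makes the FANN definition concrete. It offers a brute-force exact answer and a simple candidate-based approximation. It rests on several assumptions. Points are Euclidean tuples. "Any subset of αM objects" is read as the subset that minimizes the aggregate. αM is rounded up to an integer. The names `flex_cost`, `fann_exact`, and `fann_approx` are hypothetical and do not come from the paper. For monotone aggregates such as sum and max, the best subset for a fixed p is simply its ⌈αM⌉ nearest query objects, so the exact version needs only one sort per data object. The approximate version evaluates only the nearest neighbor in P of each query object. By the triangle inequality, this candidate set always contains a point whose cost is at most three times the optimum for both sum and max. This is an illustrative construction and not necessarily the algorithm described in the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Aggregate = Callable[[List[float]], float]


def flex_cost(p: Point, Q: Sequence[Point], alpha: float,
              agg: Aggregate = sum) -> float:
    """Aggregate distance from p to its best subset of ceil(alpha * M) query objects.

    Assumption: for monotone aggregates such as sum and max, the best subset
    is the ceil(alpha * M) query objects nearest to p.
    """
    if not 0 < alpha <= 1:
        raise ValueError("support alpha must satisfy 0 < alpha <= 1")
    # Round alpha * M up; the small tolerance absorbs float error
    # (e.g. 0.3 * 10 == 3.0000000000000004 would otherwise round up to 4).
    k = max(1, math.ceil(alpha * len(Q) - 1e-9))
    dists = sorted(math.dist(p, q) for q in Q)
    return agg(dists[:k])


def fann_exact(P: Sequence[Point], Q: Sequence[Point], alpha: float,
               agg: Aggregate = sum) -> Point:
    """Exact FANN answer by scanning every data object: O(|P| * M log M) time."""
    return min(P, key=lambda p: flex_cost(p, Q, alpha, agg))


def fann_approx(P: Sequence[Point], Q: Sequence[Point], alpha: float,
                agg: Aggregate = sum) -> Point:
    """Approximate FANN answer from a small candidate set.

    Candidates are the nearest neighbor in P of each query object; for sum
    and max the best candidate costs at most 3x the optimum (triangle inequality).
    """
    # dict.fromkeys drops duplicate candidates while keeping their order.
    candidates = list(dict.fromkeys(
        min(P, key=lambda p: math.dist(p, q)) for q in Q))
    return min(candidates, key=lambda p: flex_cost(p, Q, alpha, agg))
```

Consider P = [(0, 0), (10, 0), (5, 5)] and Q = [(0, 1), (1, 0), (10, 1), (20, 20)] with α = 0.5 and sum. Both functions return (0, 0), because its two nearest query objects are each at distance 1. With α = 1 and max, the query reduces to a classic max-ANN query, and (5, 5) wins. In practice, the linear scans would typically be replaced by a spatial or metric index. They are kept here only for clarity.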

Published In

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN:9781450306614
DOI:10.1145/1989323
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. aggregate nearest neighbor
  2. aggregate similarity search
  3. nearest neighbor
  4. similarity search

Conference

SIGMOD/PODS '11

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 0
Reflects downloads up to 25 Dec 2024
