Google Scholar

[PDF][PDF] A cost model for similarity queries in metric spaces

P Ciaccia, M Patella, P Zezula - Proceedings of the seventeenth ACM …, 1998 - dl.acm.org

Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on …, 1998•dl.acm.org

Abstract

Wu consider tho problem of estimating CPU (distance computntlons) nnd I/O costs for processing range and k-nearest neighbors qucrics over metric spaces. Unlike the specific case of vector spaces, where information on data distribution has been exploited to derive cost models for predicting the performanco of multi-dimensional access methods, in a generic metric space there is no such a possibility, which makes the problem quite different and requires a novel approach. We insist that the distance distribution of objects can be profitably used to solve the problem, and consequently develop a concrete cost model for the M-tree access method [lo]. Our results rely on the assumption that the indexed dataset comes from a metric space which is “homogeneous” enough (in a probabilistic sense) to allow reliable cost estimations even if the distance distribution with respect to a specific query object is unknown. We experimentally validate the modol ovor both real and synthetic datasets, and show how the model can be used to tune the M-tree in order to minimlzo a combination of CPU and I/O costs. Finally, we sketch how the same approach can be applied to derive a cost model for the up-tree index structure [8].

ACM Digital Library

Show moreShow less

Save Cite Cited by 155 Related articles All 12 versions

Cite

Advanced search

Saved to My library

[PDF][PDF] A cost model for similarity queries in metric spaces