DOI: 10.1145/1273496.1273588
Article

Learning for efficient retrieval of structured data with noisy queries

Published: 20 June 2007

Abstract

Increasingly large collections of structured data necessitate the development of efficient, noise-tolerant retrieval tools. In this work, we address this need by describing an approach to learning a similarity function that is not only accurate but also increases the effectiveness of retrieval data structures. We present an algorithm that uses functional gradient boosting to maximize both retrieval accuracy and the retrieval efficiency of vantage point trees. We demonstrate the effectiveness of our approach on two datasets, including a moderately sized real-world dataset of folk music.
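The abstract assumes familiarity with vantage point trees. As a point of reference, the following is a minimal, generic Python sketch of a vp-tree with exact nearest-neighbor search. It is not the authors' algorithm and does not include their boosted similarity function. It assumes `dist` is a true metric, picks vantage points at random, and splits each node at the median distance; the names `VPNode`, `build`, `nearest`, and `Dist` are hypothetical. The `calls` counter returned by `nearest` counts distance evaluations, a common proxy for query cost in metric indexes.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical type alias: any symmetric distance function on the stored items.
Dist = Callable[[Any, Any], float]


@dataclass
class VPNode:
    point: Any                   # vantage point stored at this node
    radius: float                # median distance from the vantage point to the other items
    inside: Optional["VPNode"]   # subtree of items with dist(point, x) < radius
    outside: Optional["VPNode"]  # subtree of items with dist(point, x) >= radius


def build(items: List[Any], dist: Dist, rng: random.Random) -> Optional[VPNode]:
    """Recursively build a vp-tree, picking a random vantage point at each node."""
    if not items:
        return None
    i = rng.randrange(len(items))
    vp, rest = items[i], items[:i] + items[i + 1:]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    ds = [dist(vp, x) for x in rest]
    radius = sorted(ds)[len(ds) // 2]  # median split roughly balances the two subtrees
    inside = [x for x, d in zip(rest, ds) if d < radius]
    outside = [x for x, d in zip(rest, ds) if d >= radius]
    return VPNode(vp, radius, build(inside, dist, rng), build(outside, dist, rng))


def nearest(root: Optional[VPNode], query: Any, dist: Dist) -> Tuple[Any, float, int]:
    """Exact 1-nearest-neighbor search; also returns the number of distance evaluations."""
    best_point: Any = None
    best_d = float("inf")
    calls = 0

    def visit(node: Optional[VPNode]) -> None:
        nonlocal best_point, best_d, calls
        if node is None:
            return
        d = dist(query, node.point)
        calls += 1
        if d < best_d:
            best_point, best_d = node.point, d
        # Search the side the query falls in first, then the other side only if
        # the triangle inequality cannot rule out a closer point there.
        if d < node.radius:
            visit(node.inside)
            if d + best_d >= node.radius:
                visit(node.outside)
        else:
            visit(node.outside)
            if d - best_d < node.radius:
                visit(node.inside)

    visit(root)
    return best_point, best_d, calls


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.uniform(0.0, 1000.0) for _ in range(2000)]
    metric = lambda a, b: abs(a - b)  # a true metric on the real line
    tree = build(data, metric, rng)
    point, d, calls = nearest(tree, 123.4, metric)
    print(f"nearest={point:.3f} distance={d:.3f} evaluations={calls} of {len(data)}")
```

Pruning is safe only when `dist` obeys the triangle inequality. How often the second subtree can be skipped depends on how distances are distributed around each node's radius, so the same data can be cheap or expensive to search depending on the distance function. This dependence is what makes it possible to tune a similarity function for vp-tree efficiency at all, as the abstract describes.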

References

[1]
Beygelzimer, A., Kakade, S., & Langford, J. (2006). Cover trees for nearest neighbor. ICML '06: Proceedings of the 23rd international conference on Machine learning (pp. 97--104). Pittsburgh, Pennsylvania.
[2]
Carré, M., Philippe, P., & Apélian, C. (2001). New query-by-humming music retrieval system conception and evaluation based on a query nature study. Proc. COST G-6 Conference on Digital Audio Effects. Limerick, Ireland.
[3]
Dannenberg, R. B., Birmingham, W. P., Tzanetakis, G., Meek, C., Hu, N., & Pardo, B. (2003). The musart testbed for query-by-humming evaluation. Proc. 4th International Symposium on Music Information Retrieval.
[4]
Faloutsos, C., Barber, R., Flickner, M., Hafner, J., Niblack, W., Petkovic, D., & Equitz, W. (1994). Efficient and effective querying by image content. Journal of Intelligent Information Systems, 3, 231--262.
[5]
Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28, 337--407.
[6]
Ghias, A., Logan, J., Chamberlin, D., & Smith, B. C. (1995). Query by humming: Music information retrieval in an audio database. Proc. 3rd ACM Multimedia Conference (pp. 231--236).
[7]
Hjaltason, G. R., & Samet, H. (2003). Index-driven similarity search in metric spaces (survey article). ACM Transactions on Database Systems, 28, 517--580.
[8]
Joachims, T. (2003). Learning to align sequences: A maximum-margin approach (Technical Report). Cornell University.
[9]
Krogh, A., Brown, M., Mian, I. S., Sjölander, K., & Haussler, D. (1994). Hidden Markov models in computational biology: Applications to protein modeling. Journal of Molecular Biology, 235, 1501--1531.
[10]
Meek, C. (2004). Modelling error in query-by-humming applications. Doctoral dissertation, The University of Michigan.
[11]
Pardo, B., & Birmingham, W. (2002). Encoding timing information for musical query matching. Proc. 3rd International Symposium on Music Information Retrieval.
[12]
Pardo, B., Birmingham, W., & Shifrin, J. (2004). Name that tune: A pilot study in finding a melody from a sung query. Journal of the American Society for Information Science and Technology, 55.
[13]
Parker, C., Fern, A., & Tadepalli, P. (2006). Gradient boosting for sequence alignment. The Twenty-First National Conference on Artificial Intelligence (AAAI-06). Boston, MA.
[14]
Skopal, T. (2006). On fast non-metric similarity search by metric access methods. Proc. 10th International Conference on Extending Database Technology (EDBT '06) (pp. 718--736).
[15]
Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of Molecular Biology, 147, 195--197.
[16]
Tsochantaridis, I., Hofmann, T., Joachims, T., & Altun, Y. (2004). Support vector machine learning for interdependent and structured output spaces. Proc. 21st International Conference on Machine Learning.
[17]
Weber, R., Schek, H.-J., & Blott, S. (1998). A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. Proc. 24th Int. Conf. Very Large Data Bases, VLDB (pp. 194--205).


Published In

ICML '07: Proceedings of the 24th international conference on Machine learning
June 2007
1233 pages
ISBN:9781595937933
DOI:10.1145/1273496
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • Machine Learning Journal

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Article

Conference

ICML '07 & ILP '07

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 1
Reflects downloads up to 15 Sep 2024
