DOI: 10.1145/2348283.2348392
research-article

Content-based retrieval for heterogeneous domains: domain adaptation by relative aggregation points

Published: 12 August 2012

Abstract

We introduce the problem of domain adaptation for content-based retrieval and propose a domain adaptation method based on relative aggregation points (RAPs). Content-based retrieval, including image retrieval and spoken document retrieval, enables a user to input examples as a query and retrieves relevant data based on their similarity to the examples. However, input examples and relevant data can be dissimilar, especially when the domain from which the user selects examples differs from the domain from which the system retrieves data. In content-based geographic object retrieval, for example, suppose that a user who lives in Beijing visits Kyoto, Japan, and wants to search for relatively inexpensive restaurants serving popular local dishes by means of a content-based retrieval system. Because such restaurants in Beijing and Kyoto differ in average cost and in the dishes that are popular in each area, it is difficult to find relevant restaurants in Kyoto based on examples selected in Beijing. We address this problem by assuming that RAPs in different domains correspond to each other: they may be dissimilar, but they play the same role. A RAP is defined as the expectation of the instances in a domain that are classified into a certain class, e.g., the most expensive restaurants, average restaurants, or restaurants serving the most popular dishes. Our proposed method constructs a new feature space based on the RAPs estimated in each domain, bridging the domain difference to improve content-based retrieval in heterogeneous domains. To verify the effectiveness of the proposed method, we evaluated various methods with a test collection developed for content-based geographic object retrieval. Experimental results show that our proposed method achieved significant improvements over baseline methods. Moreover, we observed that the search performance of content-based retrieval in heterogeneous domains was significantly lower than that in homogeneous domains. This finding suggests that relevant data for the same search intent depend on the search context, that is, the location where the user searches and the domain from which the system retrieves data.
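The abstract does not specify how the RAP-based feature space is built, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes (hypothetically) that each domain's classes are defined by simple quantile rules over two toy features, cost and dish popularity; it estimates each RAP as the sample mean (expectation) of the instances in its class, and it re-expresses every instance by its Euclidean distances to its own domain's RAPs, so that corresponding RAPs serve as shared reference points across domains. The names `CLASS_RULES`, `estimate_raps`, `rap_space`, `retrieve`, and `retrieve_raw`, the distance-based representation, and the toy data are all illustrative assumptions.

```python
import numpy as np

# Hypothetical class rules (an assumption, not from the paper): each maps one
# domain's instances to a boolean class mask. Column 0 = cost, column 1 =
# popularity of the dishes served. The rule order defines which RAPs correspond.
CLASS_RULES = {
    "average": lambda X: np.ones(len(X), dtype=bool),
    "expensive": lambda X: X[:, 0] >= np.quantile(X[:, 0], 0.75),
    "popular": lambda X: X[:, 1] >= np.quantile(X[:, 1], 0.75),
}


def estimate_raps(X):
    """One RAP per class: the sample mean of the domain's instances in that class."""
    return np.stack([X[rule(X)].mean(axis=0) for rule in CLASS_RULES.values()])


def rap_space(X, raps):
    """Assumed representation: Euclidean distance from each instance to each RAP."""
    # (n, 1, d) - (1, r, d) -> (n, r, d); norm over features -> (n, r)
    return np.linalg.norm(X[:, None, :] - raps[None, :, :], axis=2)


def retrieve(query_examples, source_X, target_X, k=3):
    """Rank target instances by closeness to the query examples in RAP space."""
    q = rap_space(query_examples, estimate_raps(source_X)).mean(axis=0)
    t = rap_space(target_X, estimate_raps(target_X))
    return np.argsort(np.linalg.norm(t - q, axis=1))[:k]


def retrieve_raw(query_examples, target_X, k=3):
    """Baseline: rank by plain feature-space distance, ignoring the domain shift."""
    q = query_examples.mean(axis=0)
    return np.argsort(np.linalg.norm(target_X - q, axis=1))[:k]


# Toy data: Kyoto is Beijing shifted to higher costs (a hypothetical domain shift).
beijing = np.array([[10, 2], [20, 8], [30, 5], [40, 9],
                    [50, 1], [60, 6], [70, 3], [80, 7]], dtype=float)
kyoto = beijing + np.array([100.0, 0.0])
query = beijing[[3]]  # relatively inexpensive, serves the most popular dish

print(retrieve(query, beijing, kyoto)[0])  # 3
print(retrieve_raw(query, kyoto)[0])       # 0
```

In this toy setup the raw-distance baseline simply returns the cheapest Kyoto restaurant, whereas the RAP-relative representation recovers the Kyoto restaurant that plays the same role as the query. Real domains differ in more than a translation, and distances to RAPs are only one of several possible relative representations.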

    Published In

    SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
    August 2012
    1236 pages
    ISBN:9781450314725
    DOI:10.1145/2348283

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. content-based retrieval
    2. domain adaptation

    Conference

    SIGIR '12

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 6
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 20 Jan 2025
