research-article

User language model for collaborative personalized search

Published: 09 March 2009

Abstract

Traditional personalized search approaches rely solely on individual profiles to construct a user model. They are often confronted by two major problems: data sparseness and cold-start for new individuals. Data sparseness refers to the fact that most users only visit a small portion of Web pages and hence a very sparse user-term relationship matrix is generated, while cold-start for new individuals means that the system cannot conduct any personalization without previous browsing history. Recently, community-based approaches were proposed to use the group's social behaviors as a supplement to personalization. However, these approaches only consider the commonality of a group of users and still cannot satisfy the diverse information needs of different users. In this article, we present a new approach, called collaborative personalized search. It considers not only the commonality factor among users for defining group user profiles and global user profiles, but also the specialties of individuals. Then, a statistical user language model is proposed to integrate the individual model, group user model and global user model together. In this way, the probability that a user will like a Web page is calculated through a two-step smoothing mechanism. First, a global user model is used to smooth the probability of unseen terms in the individual profiles and provide aggregated behavior of global users. Then, in order to precisely describe individual interests by looking at the behaviors of similar users, users are clustered into groups and group-user models are constructed. The group-user models are integrated into an overall model through a cluster-based language model. The behaviors of the group users can be utilized to enhance the performance of personalized search. This model can alleviate the two aforementioned problems and provide a more effective personalized search than previous approaches. 
Large-scale experimental evaluations are conducted to show that the proposed approach substantially improves the relevance of a search over several competitive methods.
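
The two-step smoothing idea described in the abstract can be illustrated with a small sketch. The abstract does not give the actual formulas, so the code below assumes a plain linear interpolation of maximum-likelihood term distributions, with hypothetical weights `lam` and `beta`, hand-assigned groups instead of learned clusters, and a hypothetical floor `eps` for terms no user has seen. It is an illustration of the general technique, not the article's model.

```python
import math
from collections import Counter


def ml_estimate(counts):
    """Maximum-likelihood term distribution from raw term counts."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def build_models(profiles, groups):
    """Aggregate individual profiles into one global model and per-group models."""
    global_counts = Counter()
    group_counts = {}
    for user, counts in profiles.items():
        global_counts.update(counts)
        group_counts.setdefault(groups[user], Counter()).update(counts)
    group_lms = {g: ml_estimate(c) for g, c in group_counts.items()}
    return ml_estimate(global_counts), group_lms


def user_term_prob(term, user, profiles, groups, global_lm, group_lms,
                   lam=0.5, beta=0.3):
    """Smoothed P(term | user); lam and beta are assumed, untuned weights."""
    p_user = ml_estimate(profiles.get(user, Counter())).get(term, 0.0)
    p_group = group_lms.get(groups.get(user), {}).get(term, 0.0)
    p_global = global_lm.get(term, 0.0)
    # Mix the group model with the global model, then mix the result
    # with the individual profile.
    background = (1 - beta) * p_group + beta * p_global
    return (1 - lam) * p_user + lam * background


def page_log_likelihood(page_terms, user, profiles, groups, global_lm,
                        group_lms, eps=1e-9):
    """Log-likelihood of a page's terms under the smoothed user model."""
    return sum(
        math.log(max(user_term_prob(t, user, profiles, groups,
                                    global_lm, group_lms), eps))
        for t in page_terms
    )


# Toy data (hypothetical): three users, two hand-assigned groups.
profiles = {
    "alice": Counter({"python": 3, "search": 1}),
    "bob": Counter({"python": 1, "ranking": 3}),
    "carol": Counter({"cooking": 4}),
}
groups = {"alice": "g1", "bob": "g1", "carol": "g2"}
global_lm, group_lms = build_models(profiles, groups)
```

In this toy setup, a term that alice never used but her group did ("ranking") gets more probability than a term seen only outside her group ("cooking"), and a user with no history at all still receives nonzero probabilities from the global model, which is the cold-start behavior the abstract describes.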


      Published In

      ACM Transactions on Information Systems, Volume 27, Issue 2
      February 2009
      184 pages
      ISSN:1046-8188
      EISSN:1558-2868
      DOI:10.1145/1462198
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 09 March 2009
      Accepted: 01 August 2008
      Revised: 01 April 2007
      Received: 01 July 2006

      Author Tags

      1. collaborative personalized search
      2. clustering
      3. cold-start
      4. data sparseness
      5. smoothing
      6. user language model

      Qualifiers

      • Research-article
      • Research
      • Refereed
