CL4DIV: A Contrastive Learning Framework for Search Result Diversification

Published: 04 March 2024


Search result diversification aims to produce a diversified document ranking that covers as many intents as possible and so satisfies the varied information needs of different users. Existing approaches usually represent documents with pretrained embeddings (such as doc2vec and GloVe). These representations do not adequately capture a document's content, and they struggle to capture how well a document covers the user intents underlying a given query. Moreover, the limited amount of labeled data for search result diversification makes it even harder to obtain better document representations. To alleviate these problems and learn more effective document representations, we propose a Contrastive Learning framework for search result DIVersification (CL4DIV). Specifically, we design three contrastive learning tasks from the perspectives of subtopics, documents, and candidate document sequences, which correspond to three essential elements of search result diversification. These training tasks are used to pretrain the document encoder and the document sequence encoder, which are then used in the diversified ranking model. Experimental results show that CL4DIV significantly outperforms all existing diversification models. Further analysis demonstrates that our method has wide applicability and can also be used to improve several existing methods.
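
The abstract does not state which loss the three contrastive tasks use. To illustrate the general idea behind contrastive pretraining of an encoder, the following is a minimal sketch of InfoNCE with in-batch negatives, a commonly used contrastive objective and not necessarily the one used in CL4DIV. The function names, the plain-list vector format, and the temperature value are assumptions made for this illustration.

```python
import math

def _normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def info_nce_loss(anchors, positives, temperature=0.1):
    """Illustrative InfoNCE loss (hypothetical helper, not the paper's code).

    anchors[i] and positives[i] are two embeddings that should be pulled
    together; every positives[j] with j != i acts as a negative for anchors[i].
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Temperature-scaled cosine similarity of anchor i to every candidate.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true positive.
        total += log_denominator - logits[i]
    return total / len(a)

# Toy embeddings: two orthogonal "documents".
docs = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce_loss(docs, docs)         # each anchor matches its positive
swapped_loss = info_nce_loss(docs, docs[::-1])   # positives deliberately mismatched
```

In this toy setup the loss is close to zero when each anchor is paired with its own embedding and large when the pairs are swapped, which is the signal a contrastive pretraining task would use to shape the encoder.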





    WSDM '24: Proceedings of the 17th ACM International Conference on Web Search and Data Mining
    March 2024


    Author Tags

    contrastive learning
    search result diversification
    self-supervised learning


