1. Introduction
In today’s fast-growing information era, people enjoy diverse services offered by electronic platforms. However, as the number of users continues to rise, the issue of information overload has become increasingly severe, leaving users feeling overwhelmed. Search engines can assist users in sifting through vast amounts of information by matching keywords, but for users without a specific goal, they are less effective and cannot efficiently target the desired content. As a solution to information overload, recommendation systems have gained wide acceptance [1]. By learning from user interaction history, these systems automatically recommend items that are likely to interest users [2]. With the progress of society and the improvement in material living standards, people now have more non-survival needs, which are closely linked to individual personalities and preferences. This has led to the continuous refinement of personalized recommendation systems, which are now widely deployed across various fields, including e-commerce and news platforms [3,4,5].
Group activities like office gatherings and family movie nights are common forms of daily entertainment. In recent years, with the rise of online social networking, it has become increasingly convenient for users with similar interests to form online groups [6,7]. In such scenarios, group recommendations are necessary. Traditional recommendation algorithms designed for individuals cannot serve groups effectively, thus giving rise to group recommendation algorithms. These algorithms consider group preferences observed in group interactions and aggregate the diverse preferences of group members, enabling the group to filter information quickly and receive satisfactory item recommendations. Unlike personalized recommendation algorithms, group recommendations must also model the affiliation between the group and its members to aid in efficient decision making.
Both traditional user-based recommendations and group recommendation systems face the same challenge: historical records of users and groups are often sparse compared with the vast number of items, making it difficult for the system to provide users and groups with accurate content recommendations, significantly impairing the user experience. Collaborative filtering, a widely used approach to alleviate data sparsity, has been applied extensively to recommendation systems [8,9,10,11,12,13,14,15,16]. However, collaborative filtering often struggles with cold-start problems and fails to fully capture complex relationships between users, items, and groups. Inspired by the success of graph convolution and self-supervised learning in other fields, recent group recommendation algorithms have integrated these techniques to address data sparsity effectively. Graph convolution networks (GCNs) are particularly well suited for recommendation tasks as they can model higher-order connectivity and capture rich contextual relationships in sparse data environments. Similarly, self-supervised learning leverages unlabeled data through contrastive learning or auxiliary tasks, providing an additional layer of optimization that enhances the quality of embeddings for sparse datasets. These techniques offer significant advantages over traditional collaborative filtering by exploiting structural patterns in the data and learning robust representations even with limited interactions.
Early research on group recommendations typically focused on aggregating the preferences of group members or scoring items across users. The three most common aggregation methods include the average, least misery, and maximum satisfaction strategies [17,18,19]. However, these methods are often overly simplistic, overlooking interactions between users within the group. In recent years, with the rapid development of deep learning, more group recommendation models have adopted attention mechanisms to model member interactions within groups. For example, Cao et al. [20] integrated attention networks with neural collaborative filtering to solve preference aggregation issues by learning aggregation strategies from data, significantly improving group recommendation performance, particularly for groups with no interaction history. Subsequently, He et al. [21] used heterogeneous information networks and attention mechanisms to learn multi-view embeddings and member weights. Vinh et al. [22] employed self-attention networks to understand individual user preferences and model member interactions. Yin et al. [23] introduced a sophisticated group recommendation model that incorporates a latent variable and attention mechanism to capture both the local and global social influence of users and to model interactions within groups, using bipartite graph embedding to mitigate data sparsity. Jia et al. [24] proposed a dual-channel hypergraph convolutional network for group recommendations in which member-level and group-level preference networks independently learn both personal and general group preferences.
Traditional group recommendation algorithms mostly focus on generating a recommendation list for the group and tend to overlook individual recommendations. Although users are not the primary target in group recommendation scenarios, individual recommendation performance influences overall group satisfaction to some extent, as a group interaction is likely only when most members are satisfied with the recommended items. Focusing solely on group recommendations may also waste valuable user interaction data, since graph convolution tends to underperform when trained on group interaction data alone.
To this end, this paper proposes a multi-view co-training and self-supervised learning model (MCSS) for group recommendations, addressing both group and user recommendation tasks. By utilizing user–item, group–item, and group–user interactions, three bipartite graphs are generated, with graph convolution and attention mechanisms [9] used to obtain three sets of embeddings. A self-supervised auxiliary task is designed to further leverage the data, generating recommendation lists for each group and user through multi-task joint training. The model comprises three steps: embedding propagation, embedding fusion, and multi-task joint training. Embedding propagation leverages three bipartite graphs (user–item, group–user, and group–item) to capture the complex relationships among users, groups, and items, producing initial embeddings that form the foundation for further refinement. Embedding fusion then integrates these embeddings, employing an attention mechanism to balance individual member preferences with group-level interactions, ensuring the final embeddings reflect the group’s collective interests rather than simple aggregations. Finally, multi-task joint training combines group recommendation, user recommendation, and contrastive learning tasks, which enhances model robustness by maximizing features shared among group members, thereby addressing data sparsity. Together, these steps enable the model to deliver accurate and personalized recommendations that better meet the needs of both groups and individuals.
Firstly, embedding propagation is applied to generate initial embeddings for users, groups, and items based on three bipartite graphs: user–item, group–user, and group–item. LightGCN is chosen for its ability to model high-order connectivity efficiently while maintaining computational simplicity, making it well suited to sparse data scenarios. Unlike conventional convolutional models, LightGCN removes self-loops and combines embeddings across propagation layers with a weighted sum, enabling a flexible representation. This propagation process ultimately provides multiple sets of embeddings for users, groups, and items that are used in subsequent steps.
Following this, embedding fusion integrates these embeddings to capture both individual and group preferences more comprehensively. An attention mechanism is employed to fuse the embeddings, thereby selecting the most relevant features across different views. This approach ensures that group preferences are not solely reliant on individual preferences but are shaped by broader group-level interactions, which is crucial for recommendations that meet collective satisfaction.
To tackle data sparsity, a multi-task joint training process combines group and user recommendation tasks with a contrastive learning task derived from self-supervised learning. In the contrastive learning task, positive samples are defined as pairs of group and member embeddings from the same group, such as a group embedding generated from Group A and a member embedding from a user in Group A. Hard negative samples are defined as embeddings of users with similar interaction histories but belonging to different groups, such as a user from Group B who has interacted with similar items as Group A members. For the recommendation tasks, a non-sampling strategy is employed that uses all non-positive samples as negatives, improving robustness in sparse conditions. The contrastive learning task, leveraging the InfoNCE loss, encourages the model to maximize the mutual information between group embeddings and member embeddings, helping the model learn shared latent preferences among group members and reinforcing the learning of nuanced relationships within the group.
The contributions of the paper are as follows.
We introduce a multi-view embedding propagation framework. This framework leverages three bipartite graphs—user–item, group–user, and group–item—to capture complex relationships and generate initial embeddings for users, groups, and items, improving representation in sparse data scenarios.
We propose a multi-task joint training strategy with contrastive learning. By combining group recommendation, user recommendation, and contrastive learning tasks, the model enhances robustness and generalization. The self-supervised contrastive learning task maximizes shared features among group members, addressing data sparsity and improving recommendation accuracy for both groups and individuals.
We extensively evaluate the proposed model on public datasets, demonstrating significant improvements in recommendation accuracy and robustness in sparse data scenarios. The experimental results validate the effectiveness of the multi-view embedding propagation, attention-based fusion, and multi-task joint training approach in enhancing group recommendation performance.
The remainder of this paper is organized as follows:
Section 2 reviews related works in recommendation systems, particularly focusing on group recommendation techniques and self-supervised learning.
Section 3 details our proposed methodology, including multi-view embedding propagation, attention-based embedding fusion, and multi-task joint training.
Section 4 presents the experimental results and analysis on public datasets, demonstrating the model’s effectiveness and comparing it with state-of-the-art approaches. Finally,
Section 5 concludes the paper.
3. Methodology
In group recommendation scenarios, in addition to group–item interaction data, there are often interaction data between individual users within the group and items as well as information about each group’s membership. The core challenge lies in effectively leveraging these data to enhance recommendation performance.
Graph convolution can use interaction data to construct a user–item bipartite graph, enabling the propagation of learned node embeddings. Multi-view co-training allows the model to learn embeddings from different views, each reflecting various pieces of semantic information. This approach captures rich, multi-angle information, yielding more accurate embeddings for users, groups, and items. Self-supervised learning, by designing auxiliary tasks, can further harness data to boost recommendation performance, alleviating data sparsity issues.
To this end, we propose a multi-view co-training and self-supervised learning model (MCSS) that simultaneously performs group and user recommendations. Based on interaction and group membership data from the dataset, three bipartite graphs—user–item, group–user, and group–item—are constructed. These graphs enable the model to effectively capture the complex relationships between users, items, and groups, which is crucial for handling data sparsity and individual preferences within the group. After applying graph convolution, an attention mechanism is used to derive three sets of embeddings for users, groups, and items. The attention mechanism allows the model to focus on the most relevant interactions, optimizing the embeddings by considering both individual user preferences and group dynamics. Additionally, a contrastive learning task is designed to maximize the mutual information between groups and their members, fully utilizing interaction data to address data sparsity. This contrastive learning task further enhances the model’s robustness by ensuring that the learned representations are consistent with both individual- and group-level preferences, which is key to improving recommendation accuracy. Group recommendation, user recommendation, and contrastive learning tasks are trained jointly to generate the final recommendation list. This joint training ensures that the model not only learns the preferences of individual users but also aligns these preferences with the broader group context, which is essential for effective group recommendations.
The overall architecture of the proposed group recommendation algorithm, which focuses on both group and user recommendations through multi-view co-training and self-supervised learning, is illustrated in
Figure 1.
As shown in the figure, the entire algorithm is divided into three parts: an embedding propagation layer, which captures interactions across the user–item, group–user, and group–item views; an embedding fusion layer, which integrates these embeddings with an attention mechanism; and a multi-task joint training layer, which jointly optimizes the user and group recommendation tasks along with contrastive learning.
Table 1 summarizes the key symbols and their definitions used throughout the proposed model. These symbols represent various components and parameters, including embeddings, loss functions, and hyperparameters, which are essential for understanding the theoretical framework and implementation details.
3.1. Embedding Propagation
This section corresponds to the embedding propagation layer in
Figure 1, which primarily obtains group embeddings, user embeddings, and item embeddings to accomplish the tasks of the algorithm. In traditional recommendation algorithms that incorporate graphs, user–item interactions are typically modeled using bipartite graphs. Simply put, a bipartite graph is structured so that its vertices are divided into two disjoint subsets, with each edge connecting vertices from different subsets. This structure aligns well with user–item interactions.
In this paper, we adopt a bipartite graph structure to construct three types of bipartite graphs: user–item, group–user, and group–item. The user–item graph captures interaction information between users and items, the group–user graph displays group membership and the relationships of users across different groups, while the group–item graph models all interactions between groups and items.
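For concreteness, the following sketch shows one plausible way to build the symmetrically normalized adjacency matrix of each bipartite graph from raw interaction pairs; the helper name build_norm_adj and the pair lists are illustrative assumptions, not identifiers from our released code.

```python
import numpy as np
import scipy.sparse as sp

def build_norm_adj(pairs, num_rows, num_cols):
    """Build the symmetrically normalized adjacency D^{-1/2} A D^{-1/2}
    of one bipartite graph from (row, col) interaction pairs."""
    rows, cols = zip(*pairs)
    r = sp.coo_matrix((np.ones(len(pairs)), (rows, cols)),
                      shape=(num_rows, num_cols))
    # Both node types share a single index space in the full adjacency.
    n = num_rows + num_cols
    adj = sp.lil_matrix((n, n))
    adj[:num_rows, num_rows:] = r          # edges row-type -> col-type
    adj[num_rows:, :num_rows] = r.T        # and the symmetric direction
    adj = adj.tocsr()
    deg = np.asarray(adj.sum(axis=1)).flatten()
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.power(deg, -0.5)
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.0  # isolated nodes keep zero degree
    return sp.diags(d_inv_sqrt) @ adj @ sp.diags(d_inv_sqrt)

# Three views, one normalized adjacency per bipartite graph:
# ui_adj = build_norm_adj(user_item_pairs, num_users, num_items)
# gu_adj = build_norm_adj(group_user_pairs, num_groups, num_users)
# gi_adj = build_norm_adj(group_item_pairs, num_groups, num_items)
```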
After constructing these three bipartite graphs, we randomly initialize three sets of embeddings for graph convolution. We apply a parallel convolution strategy on the three bipartite graphs, using the LightGCN algorithm for graph convolution. At each layer, LightGCN propagates a node’s embedding by aggregating the normalized embeddings of its neighbors. Unlike traditional approaches, LightGCN removes self-loops in the convolution process and instead combines the embeddings from all propagation layers using a weighted sum, which captures the effect of self-loops. For instance, the final user embedding is represented by Equation (1):

$$e_u = \sum_{k=0}^{K} \alpha_k \, e_u^{(k)} \qquad (1)$$

where $e_u^{(k)}$ represents the user embedding at propagation layer $k$, $K$ is the number of propagation layers, and $\alpha_k$ is a manually adjustable weight for each layer. To simplify the process, our model omits the weighted combination of all layers’ embeddings to obtain the final embedding and instead adds self-loops to LightGCN’s structure to maintain model performance.
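To make the simplified propagation concrete, the following PyTorch sketch contrasts the standard LightGCN layer combination of Equation (1) with the self-loop variant described above; the function and argument names are assumptions for exposition, not identifiers from our implementation.

```python
import torch

def propagate(embeddings, norm_adj, num_layers, alphas=None, add_self_loops=False):
    """embeddings: (N, d) initial node embeddings of one bipartite graph;
    norm_adj: (N, N) sparse normalized adjacency (torch sparse COO tensor)."""
    if add_self_loops:
        # Keeping each node's own embedding in every layer removes the need
        # for the weighted sum over layers (the simplification used here).
        n = norm_adj.shape[0]
        eye = torch.sparse_coo_tensor(
            torch.arange(n).repeat(2, 1), torch.ones(n), (n, n))
        norm_adj = norm_adj + eye
    layers = [embeddings]
    for _ in range(num_layers):
        layers.append(torch.sparse.mm(norm_adj, layers[-1]))
    if add_self_loops:
        return layers[-1]                  # simplified variant of this paper
    # Standard LightGCN: weighted sum over all layers, Equation (1).
    alphas = alphas or [1.0 / (num_layers + 1)] * (num_layers + 1)
    return sum(a * e for a, e in zip(alphas, layers))
```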
The graph convolution operation generates two sets of embeddings for users, groups, and items, capturing their interactions across different contexts. For users, UE1 (User Embedding 1) captures individual preferences derived from user–item interactions, while UE2 (User Embedding 2) reflects their roles and contributions within group settings from user–group relationships. For groups, GE1 (Group Embedding 1) models collective preferences based on group–item interactions, and GE2 (Group Embedding 2) captures internal group dynamics from user–group relationships. Similarly, for items, IE1 (Item Embedding 1) highlights relevance derived from user–item interactions, while IE2 (Item Embedding 2) focuses on item popularity within group–item interactions. These complementary embeddings collectively capture both individual- and group-level preferences, enabling the model to effectively learn nuanced relationships and improve recommendation accuracy across diverse scenarios.
It is worth mentioning that although this paper employs the LightGCN encoder, the proposed framework’s encoder is modular, allowing LightGCN to be easily replaced by other graph convolution encoders in the code. Thus, our framework is model agnostic, providing a relatively general approach for group recommendation. Future research can use this framework for further performance studies.
3.2. Embedding Fusion
In some group recommendation models, group embeddings are aggregated from the embeddings of their members, as in GroupIM [27]. This approach effectively encodes common preferences among members, which positively impacts group recommendation tasks. However, in specific situations, this approach may not provide accurate recommendations for the group. For example, in a family of three, where the parents prefer thriller and romance movies, respectively, and the child likes animated films, the family may choose a comedy or educational film suitable for all ages when watching together. In such scenarios, aggregating user embeddings to form group embeddings may overlook the unique interaction purpose of the group.
Therefore, this paper obtains group embeddings through graph convolutions based on group interactions and fuses these embeddings with those derived from group–user relationships to create the final group embedding.
In the embedding acquisition stage mentioned above, we obtain three sets of embeddings: (UE1, UE2), (GE1, GE2), and (IE1, IE2). We employ an attention mechanism to fuse these embeddings into the final embeddings for users, groups, and items. The attention mechanism is designed to automatically identify and prioritize the most relevant features from the different embeddings, ensuring that the fused embeddings effectively represent the underlying preferences and interactions.
For group recommendations, the attention mechanism addresses unique challenges by dynamically balancing the contributions of individual member preferences (captured by UE2 and GE2) with collective group-level preferences (captured by GE1 and IE2). This ensures that the final group embeddings reflect both the individual member contributions and the overall group consensus, which is critical for generating recommendations that satisfy the group as a whole. Similarly, for user and item embeddings, the attention mechanism selects features that capture nuanced relationships within user–item and group–item interactions, enhancing the model’s ability to provide accurate and personalized recommendations.
By leveraging the attention mechanism in this way, the model effectively mitigates the risk of over-relying on either individual- or group-level interactions, addressing the inherent complexity of group recommendations. This approach enables the system to adaptively weigh different sources of information, improving the robustness and accuracy of the recommendations in diverse scenarios.
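As an illustration, a lightweight attention-based fusion over two embedding views might look as follows; the module name and the two-layer scoring network are our assumptions for exposition, one plausible form among several.

```python
import torch
import torch.nn as nn

class ViewAttentionFusion(nn.Module):
    """Fuse two embedding views (e.g., GE1 and GE2 for a group) with a
    learned attention weight per view."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, 1, bias=False))

    def forward(self, view1, view2):
        # views: (batch, d). Stack to (batch, 2, d), score each view,
        # normalize with softmax, and take the weighted sum.
        views = torch.stack([view1, view2], dim=1)
        weights = torch.softmax(self.score(views), dim=1)  # (batch, 2, 1)
        return (weights * views).sum(dim=1)

# fuse_g = ViewAttentionFusion(64)
# group_emb = fuse_g(GE1, GE2)   # final group embedding
```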
3.3. Multi-Task Joint Training
This study aims to improve the performance of both group and user recommendations while using contrastive learning from self-supervised learning to address the issue of data sparsity. As shown in
Figure 2, the model includes three tasks: a group recommendation task, a user recommendation task, and an auxiliary contrastive learning task.
3.3.1. Group Recommendation Task
The group recommendation algorithm proposed in this paper prioritizes target items using implicit feedback data. Two common training strategies for implicit feedback are the negative sampling strategy and the non-sampling strategy [33]. Briefly, the main difference lies in the selection of negative samples: the former selects a subset of all samples, excluding positive samples, as negatives, while the latter uses all non-positive samples as negatives. One of the algorithms compared in this paper, BPR, adopts the negative sampling strategy, which uses fewer training samples and is faster but less robust, making it difficult for the model to achieve optimal performance.
Due to the inclusion of all samples, traditional non-sampling strategies generally yield better training results but are inefficient, with a model training complexity of $O(|B||V|d)$, where $|B|$ is the batch size of users, $|V|$ is the total number of items, and $d$ is the embedding dimension. This complexity is generally unacceptable for recommendation models. In recent years, the information retrieval group at Tsinghua University has explored non-sampling strategies for recommendation systems, designing and implementing efficient non-sampling learning algorithms successfully applied in various recommendation system scenarios [33].
The present study employs a non-sampling strategy, with the loss function for the group recommendation task given in Equation (2). This formulation is designed to balance computational efficiency and representation accuracy, leveraging the relationships between groups, items, and users to enhance model performance. Here, $c_i$ denotes the weight of the sample, and $\hat{y}_{gi}$ represents the predicted interaction score between group $g$ and item $i$, computed as shown in Equation (3):

$$\hat{y}_{gi} = h^{\top}(e_g \odot e_i) \qquad (3)$$

In Equation (3), $e_g$ and $e_i$ denote the embedding vectors of group $g$ and item $i$, respectively, ⊙ represents the element-wise (Hadamard) product of these vectors, and $h$ is a trainable parameter vector that projects the interaction features into a scalar value.
The computational complexity of this loss function is $O((|B|+|V|)d^2 + |\mathcal{Y}^+|d)$, where $|\mathcal{Y}^+|$ denotes the number of positive samples. Given that the number of positive samples in practical data satisfies $|\mathcal{Y}^+| \ll |B||V|$, the complexity of this loss function is an order of magnitude lower than that of traditional non-sampling losses, allowing for efficient application in neural recommendation systems [33]. This loss function integrates seamlessly into the broader model framework by jointly optimizing the group recommendation task (Equation (2)) alongside the user recommendation task and the contrastive learning task, as described in the following sections. This joint optimization ensures that the theoretical goals of balancing group-level and individual-level interactions translate effectively into practical implementation, enhancing recommendation accuracy across all tasks.
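To make the efficiency argument concrete, the sketch below implements the prediction function of Equation (3) together with an efficient whole-data loss in the style of [33]; the uniform negative weight c0 is an illustrative simplification of the per-sample weights, and the names are ours for exposition.

```python
import torch

def predict(e_g, e_i, h):
    """Equation (3): y_hat = h^T (e_g ⊙ e_i), batched over (group, item) pairs."""
    return (e_g * e_i * h).sum(-1)

def efficient_group_loss(E_g, E_i, h, pos_g, pos_i, c0=0.1):
    """E_g: (|B|, d) batch of group embeddings; E_i: (|V|, d) all item
    embeddings; (pos_g, pos_i): index tensors of observed interactions.
    The whole-data term sum_{g,i} y_hat^2 factorizes as
    sum_{d,d'} h_d h_d' (sum_g e_gd e_gd') (sum_i e_id e_id'),
    costing O((|B|+|V|) d^2) instead of O(|B||V| d); the positive
    correction then costs O(|Y+| d). The constant from the squared
    positive targets is dropped, as it does not affect the gradients."""
    hh = torch.outer(h, h)                        # (d, d)
    gg = E_g.t() @ E_g                            # (d, d)
    ii = E_i.t() @ E_i                            # (d, d)
    whole = c0 * (hh * gg * ii).sum()             # weighted sum over all pairs
    y_pos = predict(E_g[pos_g], E_i[pos_i], h)    # scores of positives only
    return whole + ((1.0 - c0) * y_pos.pow(2) - 2.0 * y_pos).sum()
```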
3.3.2. User Recommendation Task
The group recommendation task aims to provide a recommendation list for each group, whereas the user recommendation task seeks to generate a recommendation list for each user. To maintain consistency and reduce model complexity, the user recommendation task adopts the same efficient non-sampling strategy mentioned earlier. The loss function for the user recommendation task is expressed in Equation (4).
The definitions of the various parameters in this equation are similar to those in the group recommendation loss, with the only difference being the shift from groups to users.
3.3.3. Contrastive Learning Task
To address the issue of data sparsity, this study introduces a contrastive learning approach derived from self-supervised learning techniques. By constructing auxiliary tasks, it enhances the performance of the main tasks and the model’s generalization ability. In practice, the model typically struggles to complete recommendation tasks due to insufficient group interaction data. Group preferences are partially dependent on the preferences of group members, and members who frequently interact within a group often exhibit similar preferences. Consequently, group activities reveal both intra-group connections and inter-group distinctions.
By maximizing the mutual information between group members and group embeddings—contrasting the preference representations of group members with those of non-group members who have similar item interaction histories—it effectively regularizes the feature spaces of user and group representations. This process promotes the encoding of shared features among group members, which may not be discernible from their limited interaction histories in the group–item graph. Positive sample pairs are defined as (group embedding, group member embedding), while negative sample pairs are (group embedding, non-group member embedding with similar interaction history). The contrastive loss utilizes InfoNCE loss, defined as in Equation (5).
$$\mathcal{L}_{CL} = -\sum_{g\in\mathcal{G}}\sum_{u\in K_g}\log\frac{\exp\!\big(s(e_g, e_u)/\tau\big)}{\exp\!\big(s(e_g, e_u)/\tau\big)+\sum_{u'\in N_g}\exp\!\big(s(e_g, e_{u'})/\tau\big)} \qquad (5)$$

In this equation, $\tau$ is the temperature coefficient for the InfoNCE loss, a manually adjustable hyperparameter; $s(\cdot,\cdot)$ denotes a similarity function (e.g., the inner product); $\mathcal{G}$ denotes the set of all groups; $K_g$ represents the set of members within group $g$; $e_g$ indicates the embedding of group $g$; $e_u$ refers to the embedding of a group member; and $e_{u'}$ signifies the embedding of a user-specific negative sample for group $g$, drawn from the sampled set $N_g$.
This study employs a preference-based negative sampling distribution $p(u' \mid g)$, which allocates a higher probability to non-group-member users who have purchased items interacted with by the group. These hard negative sample pairs encourage the model to learn shared latent information among group members by contrasting against other users with similar individual item histories. The sampling distribution $p(u' \mid g)$ is defined as in Equation (6). Here, $\mathbb{1}(\cdot)$ is the indicator function, and $\eta$ is the hyperparameter used to control the sampling bias. Variables $X_g$ and $X_{u'}$ represent the interaction histories of group $g$ and negative sample $u'$, respectively. This sampling method is more effective than random negative sampling in achieving the model’s objectives.
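A minimal sketch of the InfoNCE objective of Equation (5) for one group is given below; the dot-product similarity and the tensor layout are assumptions for exposition, and the hard negatives are assumed to have been drawn beforehand from the distribution described above.

```python
import torch
import torch.nn.functional as F

def group_infonce(e_g, e_members, e_negs, tau=0.07):
    """Equation (5) for one group. e_g: (d,) group embedding; e_members:
    (|K|, d) member embeddings; e_negs: (M, d) hard-negative user embeddings."""
    pos = e_members @ e_g / tau                      # (|K|,) positive logits
    neg = e_negs @ e_g / tau                         # (M,)  negative logits
    # Each member is contrasted against all M sampled negatives; the
    # positive sits at index 0 of every row.
    logits = torch.cat([pos.unsqueeze(1),
                        neg.unsqueeze(0).expand(len(pos), -1)], dim=1)
    labels = torch.zeros(len(pos), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```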
This section corresponds to the contrastive learning task illustrated in Figure 2, where $|K|$ indicates the number of members in group $g$, and $M$ denotes the number of negative samples drawn for group $g$, which is also an adjustable hyperparameter.
3.3.4. Multi-Task Joint Training Loss
The model optimization employs a joint training approach for the group recommendation task, user recommendation task, and contrastive learning task. The overall objective of the model comprises the group loss (Equation (2)), user loss (Equation (4)), and contrastive loss (Equation (5)). The composite objective is expressed in Equation (7):

$$\mathcal{L} = \mathcal{L}_G + \lambda_1 \mathcal{L}_U + \lambda_2 \mathcal{L}_{CL} \qquad (7)$$

In this equation, $\lambda_1$ and $\lambda_2$ are hyperparameters used to regulate the importance of the different tasks: $\lambda_1$ balances the group recommendation task and the user recommendation task, while $\lambda_2$ adjusts the weight of contrastive learning.
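In code, the composite objective reduces to a weighted sum of the three task losses; the symbol names below are assumed for illustration.

```python
def joint_loss(l_group, l_user, l_cl, lam1, lam2):
    # Equation (7): group loss + lambda_1 * user loss + lambda_2 * contrastive loss
    return l_group + lam1 * l_user + lam2 * l_cl
```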
4. Experiments and Analysis
To verify the performance of the proposed algorithm, this section implements the MCSS group recommendation algorithm based on multi-view co-training and self-supervised learning. The algorithm is then compared with three classical recommendation algorithms, one cold-start-oriented algorithm, and five group recommendation algorithms on two public datasets. Additionally, we analyze the impact of key hyperparameters on the model’s effectiveness. Further ablation studies are conducted on essential components of the model, examining performance differences between single-task and multi-task models, exploring possible reasons, and analyzing the impact of sample interaction quantities on model performance.
The experiments are divided into three main parts:
Performance Evaluation of MCSS: This section implements the proposed algorithm and compares its performance with three classical recommendation algorithms, one cold-start-oriented algorithm, and five group recommendation algorithms on two public datasets.
Hyperparameter Sensitivity Analysis: The MCSS algorithm contains several hyperparameters, with three particularly important ones: the depth of the neural network (i.e., the number of graph convolution layers, $l$); the number of negative samples, $M$, in contrastive learning; and the temperature coefficient, $\tau$, in the contrastive loss. This section analyzes the sensitivity of these three hyperparameters on the CAMRa2011 dataset.
Ablation Study: This experiment analyzes the role of key components in the model by testing three variants: a standalone group recommendation task, a standalone user recommendation task, and a dual-task model compared with the overall model with contrastive learning. The performance of these variants is compared and analyzed.
4.1. Datasets, Baselines, and Setup
This study selects two public datasets, CAMRa2011 [20] and Mafengwo [20], commonly used for group recommendation. The CAMRa2011 dataset is a public dataset provided for a movie recommendation competition, containing interaction records between individual users, families, and movies. It includes 602 users, 290 groups, 7710 items, 116,344 user–item interactions, and 145,068 group–item interactions, with an average group size of 2.08. The Mafengwo dataset comes from the travel website Mafengwo, where users can record travel destinations and create or join group trips. This dataset contains 5275 users, 995 groups, 1513 items, 39,765 user–item interactions, and 3595 group–item interactions, with an average group size of 7.19 users. Details of these two datasets are presented in Table 2.
In this section, we first implement the group recommendation algorithm based on multi-view co-training and self-supervised learning on the CAMRa2011 and Mafengwo datasets. Additionally, three classical recommendation algorithms (BPR, NGCF, and LightGCN), one recommendation algorithm for mitigating the cold start problem (DUAL), and five group recommendation algorithms (AGREE, HCR, HHGR, LARGE, and CDRec) are implemented on both datasets for comparative evaluation.
Since this study investigates group recommendations in sparse scenarios, we use only 40% of the original training set for training, 20% as a validation set, and the remaining 40% as a test set. In our approach, the embedding dimension is set to 64, and the batch size is 512. The baseline methods retain their optimal parameter settings. The variance proportion threshold β is set at 0.075 for CAMRa2011 and 0.3 for Mafengwo. Notably, the three classical recommendation algorithms are general-purpose algorithms in the recommendation domain, not specifically designed for group recommendations. As a result, they do not capture group–user affiliation relationships, so they rely only on group–item and user–item interactions to separately generate group and user recommendation lists. For fairness in algorithm comparison, the group recommendation algorithms are adjusted to ensure consistency in evaluation metrics with the proposed approach. This adjustment is made based on the original source code provided by the authors of these algorithms.
4.2. Overall Performance Evaluation
The overall comparisons for group and user recommendations on the CAMRa2011 and Mafengwo datasets are shown in
Table 3 and
Table 4, respectively. Precision, Recall, and NDCG are used as evaluation metrics to assess model performance at the Top-20 level.
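For reference, a minimal sketch of how these Top-K metrics can be computed per group or user is given below; the function name and tie-breaking details are our assumptions and may differ from the exact evaluation protocol.

```python
import numpy as np

def metrics_at_k(ranked_items, ground_truth, k=20):
    """ranked_items: item ids sorted by predicted score (descending);
    ground_truth: set of held-out positive items for one group or user."""
    top_k = ranked_items[:k]
    hits = [1.0 if item in ground_truth else 0.0 for item in top_k]
    precision = sum(hits) / k
    recall = sum(hits) / max(len(ground_truth), 1)
    dcg = sum(h / np.log2(i + 2) for i, h in enumerate(hits))
    ideal = sum(1.0 / np.log2(i + 2)
                for i in range(min(len(ground_truth), k)))
    ndcg = dcg / ideal if ideal > 0 else 0.0
    return precision, recall, ndcg
```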
We highlight the best results across all models in bold and underline the best-performing comparison algorithm, with “improve” in the tables representing the increase relative to the best comparison algorithm. Based on
Table 3, it can be observed that on the CAMRa2011 dataset, our proposed model achieves the highest scores across all metrics at Top-20 for both group recommendation and user recommendation tasks compared with the other algorithms. Among the three classical recommendation algorithms, NGCF and LightGCN perform comparably and outperform BPR, as both NGCF and LightGCN utilize graph convolution to effectively model the interaction process. While BPR relies on a traditional matrix factorization approach, NGCF and LightGCN exploit graph-based representations, capturing more complex interaction patterns within the data. The performance boost in NGCF and LightGCN highlights the importance of utilizing graph convolutional networks to model high-order interactions, which proves particularly advantageous in scenarios with dense interaction data.
The DUAL method, which incorporates two auxiliary tasks to alleviate the cold-start problem in general recommendation settings, demonstrates competitive performance, achieving the second-best precision in user recommendation while also showing strong results in group recommendation. The dual-task approach allows the model to better generalize across different scenarios by addressing data sparsity and providing more robust embeddings for both users and items, particularly in cold-start situations.
Among the five group recommendation algorithms compared, HHGR, CDRec, and LARGE significantly outperform AGREE and HCR in terms of precision for the group recommendation task. This is because AGREE and HCR primarily generate group embeddings from group–item interactions, which limits their ability to capture the diverse preferences of individual group members. In contrast, HHGR, CDRec, and LARGE better model the interactions between group members and items, either through hierarchical structures (HHGR), contrastive learning (CDRec), or leadership dynamics (LARGE). These models focus on aggregating embeddings from individual members, which allows for a more comprehensive and representative group preference model.
CDRec, in particular, stands out by addressing the cold-start problem in group recommendations, resulting in better precision compared with the other baseline methods. This highlights the critical importance of effectively addressing cold-start issues, particularly in group settings where data for new or less-active groups is often limited.
In summary, our model consistently outperforms the baseline methods across multiple metrics, particularly by better handling both user and group recommendation tasks. The comparative analysis demonstrates the advantages of incorporating advanced techniques such as graph convolutions, auxiliary tasks, and contrastive learning to address the specific challenges in recommendation systems, such as cold-start problems and the accurate aggregation of group member preferences.
Referring to
Table 4, we observe that on the Mafengwo dataset, our proposed model again outperforms all other algorithms across all metrics for both group and user recommendation tasks. The performance of the other algorithms on this dataset is similar to their performance on the CAMRa2011 dataset. Notably, some group recommendation algorithms exhibit slightly worse performance compared with NGCF or LightGCN on both datasets. This can be attributed to the larger base of items used when generating recommendation lists in the group recommendation algorithms. For example, suppose that in a dataset, item set A contains 1000 items that have been interacted with by users, while item set B contains 500 items that have been interacted with by groups. Since sets A and B overlap only partially, the total item set C that has been interacted with by either users or groups may comprise 1200 items. The group recommendation baselines use item set C to generate recommendations for both the group and user tasks. In contrast, NGCF and LightGCN, which do not consider group–user affiliation relationships, use item set B for the group recommendation task and item set A for the user recommendation task, resulting in slightly better performance than some group recommendation baselines.
4.3. Hyperparameter Sensitivity Analysis
We conduct a hyperparameter sensitivity analysis on the CAMRa2011 dataset, using NDCG@20 and Recall@20 as performance evaluation metrics. This analysis examines the effects of three key hyperparameters: the number of graph convolution layers, the number of negative samples M, and the temperature coefficient τ in contrastive loss. The impact of the graph convolution layers l on each performance metric is shown in
Figure 3.
From the figure, it can be observed that for both group and user recommendation tasks, as the number of graph convolution layers (i.e., the neural network depth) increases, the metrics NDCG@20 and Recall@20 initially rise and then decline, peaking at a graph convolution layer count of 3. Therefore, setting the graph convolution layer number to 3 yields the best model performance. The performance decrease beyond this peak may be due to the over-smoothing problem in deep graph convolutional networks.
The impact of the number of negative samples M per group on model performance is shown in
Figure 4.
From the figure above, it can be observed that as the number of negative samples increases from 2 to 10, both metrics for both tasks initially improve and then decline, reaching their highest values at a negative sample count of 6. This suggests that a small number of negative samples can enhance the recommendation task, while a larger number of negative samples may mislead the recommendation system, leading to decreased performance.
The impact of the temperature coefficient $\tau$ in the contrastive loss on the model is shown in Figure 5. Figure 5a illustrates the effect of $\tau$ on group recommendation, and Figure 5b shows its effect on user recommendation. As the temperature coefficient decreases, the contrastive loss degrades into a loss function that focuses only on the hardest negative samples. Conversely, as the temperature coefficient increases, the contrastive loss applies nearly equal weighting to all negative samples, thus losing the focus on hard samples.
Figure 5 indicates that the method presented in this paper is not highly sensitive to the temperature coefficient. Consequently, we adopt a value of τ = 0.07, which comparatively achieves higher performance, as the final temperature coefficient.
4.4. Ablation Study
In this section, an ablation study is conducted to validate the effectiveness and necessity of each major component of the model, analyzing the possible reasons behind the results obtained. The model consists of three tasks: group recommendation, user recommendation, and contrastive learning tasks.
To evaluate the importance of each component, we design three variants: a standalone group recommendation model (MCSS-G), a standalone user recommendation model (MCSS-U), and a dual-task model without self-supervised learning (MCSS-Dual). By comparing the performance of these three variants with the complete model (MCSS), we can assess whether the two main tasks are mutually beneficial and the true impact of the contrastive learning task.
This experiment is conducted on the CAMRa2011 and Mafengwo datasets, with NDCG@20, Recall@20, NDCG@50, and Recall@50 as the evaluation metrics, representing the Top-20 and Top-50 performance for each metric.
Table 5 shows the specific results for the CAMRa2011 dataset.
Firstly, examining the data in
Table 5, where the left side shows Top-20 and the right side shows Top-50 results, we observe that in the Top-50 setting, the final model (MCSS) achieves the best performance for both group and user recommendation tasks. This confirms the effectiveness and high performance of the proposed multi-view co-training and self-supervised learning-based group recommendation algorithm.
Moreover, the dual-task model (MCSS-Dual) slightly outperforms the two single-task models, indicating that on the CAMRa2011 dataset, jointly training the two tasks yields better performance. We attribute this to the following: for the group recommendation task, a group’s preferences are largely dependent on the common preferences of its member users. Therefore, training the user task together with this task helps improve the learning of group embeddings. For the user recommendation task, the interactions within groups often relate to the preferences of individual users, which may be influenced by group interactions. For instance, Alice might never have tried Sichuan cuisine, but after dining with roommates who enjoy it, she tries it and grows to like it, increasing her likelihood of choosing Sichuan cuisine when dining alone. Thus, combining group and user recommendations is mutually beneficial.
Further examining
Table 5, we observe that the performance improvement of the dual-task model (MCSS-Dual) over the single-task model (MCSS-G) is not very large. Similarly, while adding the contrastive learning task to the dual-task model (MCSS) enhances performance, the improvement is not substantial. In theory, adding the user task to the group task should provide considerable benefits, as it compensates for sparse group data with a wealth of user interactions. However, analyzing the CAMRa2011 dataset reveals that the average group size is only 2.08, and the group–item interactions exceed user–item interactions by 28,724 records, which deviates from the typical assumption. Under these conditions, user interactions do not significantly aid the group task, making the modest performance gain reasonable. Additionally, statistics show that, on average, 40% of items interacted with by users are also interacted with by groups the users belong to, further reducing the contribution of user interactions to groups, supporting the validity of our algorithm’s results. The conclusions drawn from the Top-20 results on the CAMRa2011 dataset align with this analysis.
4.5. Discussion
The proposed model demonstrates significant advancements over existing group recommendation approaches. It successfully integrates the strengths of multiple learning techniques, including attention mechanisms, contrastive learning, and multi-task training. In comparison with traditional models, such as AGREE, HCR, and LightGCN, the proposed model offers several key advantages.
One of the primary strengths of the model is its ability to address the cold-start problem more effectively through contrastive learning and preference-based negative sampling. This enables the model to learn from limited data and make more accurate predictions, especially in scenarios wherein group interactions are sparse or limited. The use of attention mechanisms in the model further enhances its ability to prioritize important user and group interactions, thereby improving the accuracy of both group and user recommendation tasks.
Moreover, the joint optimization of multiple tasks—group recommendation, user recommendation, and contrastive learning—allows the model to learn shared features among group members and refine user and group representations simultaneously. This holistic approach is particularly effective in modeling complex group dynamics, where group preferences are not simply an aggregation of individual preferences.
However, there are also limitations. Despite its promising performance, the model still faces challenges related to computational efficiency and scalability, particularly when dealing with very large datasets. The increased complexity introduced by multiple attention mechanisms and the contrastive learning component may also result in longer training times. Additionally, the model’s reliance on group embeddings and individual user interactions could limit its ability to generalize in more heterogeneous group settings, where member preferences may vary significantly.
In comparison with other models, the proposed approach shows clear advantages in terms of recommendation accuracy and the ability to handle sparse data, but it may require further optimization to balance model complexity with computational efficiency.
5. Conclusions
In conclusion, this paper presents a novel approach to group recommendations designed to address challenges associated with data sparsity and varying group member preferences. By introducing a multi-view embedding propagation framework, an attention-based embedding fusion process, and a multi-task joint training strategy, the model effectively captures complex user–group–item interactions, balancing both individual and collective preferences within groups. Extensive experiments on public datasets demonstrate the model’s robustness and superior performance over existing methods, particularly in sparse data scenarios. This approach not only improves recommendation accuracy but also provides a flexible, modular framework that can be adapted to various recommendation contexts.
While the proposed multi-view co-training and self-supervised learning model (MCSS) provides a solid foundation for group recommendations, there are several promising directions for future research. First, the embedding fusion layer in the model relies on an attention mechanism to integrate user, group, and item embeddings. Future work could explore advanced attention techniques, such as multi-head attention or hierarchical attention networks, to better capture the complex relationships between users and groups, especially in diverse groups where individual interactions vary significantly. This would enhance the model’s ability to focus on the most relevant interactions, improving overall recommendation accuracy.
Second, our approach leverages contrastive learning to mitigate data sparsity, but there is potential for further refinement. Future research could focus on optimizing the contrastive learning task to better handle situations where user–item interactions are particularly sparse. For instance, semi-supervised learning approaches could be incorporated to make better use of limited labeled data, improving the model’s robustness and its ability to generalize from fewer interactions.
Lastly, the multi-task joint training layer simultaneously optimizes group and user recommendation tasks. Future work could investigate how to fine-tune this multi-task learning process to improve the balance between individual user satisfaction and group-level recommendations. Techniques such as dynamic task weighting or adaptive task prioritization could help ensure that both tasks are optimized effectively, leading to more accurate and personalized recommendations for both individual users and groups.