research-article

Robust multimodal federated learning for incomplete modalities

Published: 12 April 2024

Abstract

Consumer electronics continuously collect multimodal data such as audio and video, and multimodal learning mechanisms can be adopted to analyze these data. Out of privacy concerns, several successful attempts at multimodal federated learning (MMFed) have been made. However, real-world multimodal data often suffers from missing modalities, which can significantly degrade the accuracy of the global model in MMFed. Effectively fusing and analyzing incomplete multimodal data remains a challenging problem. To tackle it, we propose a robust Multimodal Federated Learning framework for Incomplete Modalities (FedInMM). Specifically, we design a Long Short-Term Memory (LSTM)-based module to extract information from temporal sequences, dynamically learn a weight map that rescales the features of each modality to reflect their different contributions, and then fuse the content of all modalities into a uniform representation. By automatically accounting for temporal dependencies and the intra-relations among modalities during training, the framework efficiently mitigates the effects of missing modalities. Using two multimodal datasets, DEAP and AReM, we conduct comprehensive experiments that simulate different levels of incompleteness. The results demonstrate that FedInMM outperforms competing approaches and trains highly accurate models under diverse incompleteness patterns, making it well suited for integration into practical multimodal applications.
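The fusion mechanism the abstract describes (per-modality features rescaled by a learned weight map, with missing modalities excluded, then aggregated federally) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: `fuse_modalities`, `fedavg`, and `score_w` are hypothetical names, the LSTM feature extractor is replaced by precomputed per-modality feature vectors, and the learned weight module is reduced to a single scoring vector.

```python
import numpy as np

def fuse_modalities(feats, present, score_w):
    """Fuse per-modality features into one representation.

    feats:   (M, D) array, one feature vector per modality
             (standing in for LSTM final states).
    present: (M,) boolean mask, False where a modality is missing;
             at least one modality is assumed present.
    score_w: (D,) scoring vector standing in for the learned weight map.
    """
    scores = feats @ score_w                      # (M,) raw importance scores
    scores = np.where(present, scores, -np.inf)   # missing modalities get zero weight
    w = np.exp(scores - scores[present].max())    # stable softmax numerator
    w = w / w.sum()                               # weights over available modalities
    return w @ feats                              # (D,) fused representation

def fedavg(client_params, client_sizes):
    """Sample-size-weighted FedAvg aggregation of client parameter vectors."""
    sizes = np.asarray(client_sizes, dtype=float)
    return (sizes[:, None] * np.stack(client_params)).sum(0) / sizes.sum()

# Toy example: three modalities, the third missing on this client.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
present = np.array([True, True, False])
fused = fuse_modalities(feats, present, score_w=np.array([1.0, 1.0]))
# The two available modalities score equally, so fused is their mean.
```

Masking with `-inf` before the softmax guarantees an absent modality contributes exactly zero weight, so the fused representation degrades gracefully rather than being corrupted by placeholder values.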


Cited By

  • (2024) Cross-Modal Meta Consensus for Heterogeneous Federated Learning. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 975–984. DOI: 10.1145/3664647.3681510. Online publication date: 28 Oct 2024.


Published In

Computer Communications, Volume 214, Issue C, January 2024, 285 pages

Publisher

Elsevier Science Publishers B.V., Netherlands


Author Tags

  1. Multimodal fusion
  2. Federated learning
  3. Data incompleteness
  4. Missing modalities
