research-article

Open access

FedVQA: Personalized Federated Visual Question Answering over Heterogeneous Scenes

Authors:

Mingrui Lao,

Nan Pu,

Zhun Zhong,

Nicu Sebe,

Michael S. LewAuthors Info & Claims

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 7796 - 7807

https://doi.org/10.1145/3581783.3611958

Published: 27 October 2023 Publication History

PDF eReader

Abstract

This paper presents a new setting for visual question answering (VQA) called personalized federated VQA (FedVQA) that addresses the growing need for decentralization and data privacy protection. FedVQA is both practical and challenging, requiring clients to learn well-personalized models on scene-specific datasets with severe feature/label distribution skews. These models then collaborate to optimize a generic global model on a central server, which is desired to generalize well on both seen and unseen scenes without sharing raw data with the server and other clients. The primary challenge of FedVQA is that, client models tend to forget the global knowledge initialized from central server during the personalized training, which impairs their personalized capacity due to the potential overfitting issue on local data. This further leads to divergence issues when aggregating distinct personalized knowledge at the central server, resulting in an inferior generalization ability on unseen scenes. To address the challenge, we propose a novel federated pairwise preference preserving (FedP3) framework to improve personalized learning via preserving generic knowledge under FedVQA constraints. Specifically, we first design a differentiable pairwise preference (DPP) to improve knowledge preserving by formulating a flexible yet effective global knowledge. Then, we introduce a forgotten-knowledge filter (FKF) to encourage the client models to selectively consolidate easily-forgotten knowledge. By aggregating the DPP and the FKF, FedP3 coordinates the generic and the personalized knowledge to enhance the personalized ability of clients and generalizability of the server. Extensive experiments show that FedP3 consistently surpasses the state-of-the-art in FedVQA task.

Supplemental Material

MP4 File

The presentation video for paper "FedVQA: Personalized Federated Visual Question Answering over Heterogeneous Scenes"

Download
37.00 MB

References

[1]

Durmus Alp Emre Acar, Yue Zhao, Ramon Matas Navarro, Matthew Mattina, Paul N Whatmough, and Venkatesh Saligrama. 2021. Federated learning based on dynamic regularization. arXiv preprint arXiv:2111.04263 (2021).

Abstract

Supplemental Material

References

Cited By

Index Terms

Recommendations

Multimodal attention-driven visual question answering for Malayalam

Visual question answering: Which investigated applications?

Item group based pairwise preference learning for personalized ranking

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Acceptance Rates

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations