FedDdrl: Federated Double Deep Reinforcement Learning for Heterogeneous IoT with Adaptive Early Client Termination and Local Epoch Adjustment
Abstract
1. Introduction
- We modeled the FL system as an MDP. Then, we proposed to use a DDRL framework for adaptive early client termination and local epoch adjustment, to maximize the global model accuracy while minimizing the training latency and communication costs.
- We demonstrated our proposed algorithm in a non-IID setting on MNIST, CIFAR-10 and CrisisIBD datasets. We showed that our solution could outperform existing methods in terms of global model accuracy with shorter training latency and lower communication costs.
- We explored the influence of balanced-MixUp in the FL system. In most settings, balanced-MixUp could mitigate weight divergence and improve convergence speed.
2. Related Work
2.1. Federated Learning and Deep Reinforcement Learning
2.2. Weight Divergence in Federated Learning
3. System Model and Problem Formulation
3.1. System Model
- Model broadcasting: In the first communication round, the FL server will initialize a global model. In every later round, the FL server will collect the client models trained in the previous round and aggregate them into a new global model. Then, the FL server will broadcast the global model to the randomly selected clients.
- Probing training: Each selected client will perform one epoch of local training called probing training. The purpose of probing training is to acquire the metadata of each client. The metadata consist of the client’s states, which will be fed to the DRL agents for adaptive early client termination and local epoch adjustment. The details of the client states will be defined later together with the specification of the DRL agents. After probing training, each client will upload its metadata to the server and proceed to the next phase.
- Early client termination: Based on the collected client states, the DRL agents at the FL server will drop non-essential clients to reduce the total latency and total communication cost of the current round. The decisions made by the DRL agents will be sent to each client.
- Completion of training: Only the clients that are not dropped by the DRL agents will resume training. Each remaining client will continue local training until its assigned number of local epochs is reached. Each locally trained model will then be uploaded to the FL server for model aggregation.
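The four phases above can be sketched as a single function. This is only an illustrative outline, not the authors' implementation: the `Client` fields, the `probe` stub, the `toy_policy` rule and all numeric values are hypothetical, and a real system would replace the policy with the DRL agents and the probing stub with actual local training.

```python
import random
from dataclasses import dataclass


@dataclass
class Client:
    client_id: int
    dataset_size: int
    probing_latency: float  # seconds for one probing epoch (hypothetical value)
    upload_latency: float   # seconds to upload a model (hypothetical value)


def probe(client, rng):
    """Probing training: one local epoch, then report metadata to the server."""
    return {
        "client_id": client.client_id,
        "probing_loss": rng.random(),  # stand-in for a real training loss
        "probing_latency": client.probing_latency,
        "upload_latency": client.upload_latency,
        "dataset_size": client.dataset_size,
    }


def run_round(clients, num_selected, policy, rng):
    """Return (client, remaining_epochs) pairs for clients that finish the round."""
    selected = rng.sample(clients, num_selected)   # model broadcasting
    states = [probe(c, rng) for c in selected]     # probing training
    decisions = [policy(s) for s in states]        # early client termination
    survivors = []
    for client, (keep, total_epochs) in zip(selected, decisions):
        if keep:
            # The probing epoch already counts as one local epoch.
            survivors.append((client, total_epochs - 1))
    return survivors                               # completion of training


def toy_policy(state):
    """Hypothetical rule standing in for the DRL agents: drop slow uploaders."""
    keep = state["upload_latency"] < 2.0
    return keep, 5


rng = random.Random(0)
clients = [Client(i, 100 + i, 0.5, 1.0 + (i % 3)) for i in range(100)]
survivors = run_round(clients, 10, toy_policy, rng)
print([(c.client_id, remaining) for c, remaining in survivors])
```

Keeping the policy as a plain callable makes it easy to swap the toy rule for a learned one without touching the round logic.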
3.2. Problem Formulation
Algorithm 1 Search for the best , and
1: Input: Set , ,
2: Output: The best , ,
3: Run one complete iteration of FedAvg with communication rounds and record the trace values of , and for .
4: Initialize an empty set to store all that satisfy constraints (10c–e), where is the weighted-sum optimization goal
5: for do
6:   for do
7:     for do
8:       Compute , , , based on the recorded trace values, where we assume
9:       if (10c–e) are satisfied:
10:        Compute from the trace values
11:        Record in
12:      end if
13:    end for
14:  end for
15: end for
16: From , find out which combination of results in the smallest . This can be treated as finding the worst-case .
17: return
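The control flow of Algorithm 1 is a constrained grid search: enumerate every combination of three candidate values, keep the combinations that satisfy the constraints, and return the one with the smallest recorded objective. A minimal sketch of that pattern follows. The function names, the `evaluate` and `is_feasible` callables and the toy numbers are all hypothetical and stand in for quantities the algorithm derives from the recorded FedAvg traces.

```python
from itertools import product


def grid_search(candidates_1, candidates_2, candidates_3, evaluate, is_feasible):
    """Return the feasible combination with the smallest objective, or None."""
    feasible = []
    # product() replaces the three nested for-loops of the algorithm.
    for combo in product(candidates_1, candidates_2, candidates_3):
        objective = evaluate(combo)
        if is_feasible(combo):
            feasible.append((objective, combo))
    if not feasible:
        return None
    return min(feasible, key=lambda item: item[0])[1]


# Toy stand-ins (hypothetical): a simple objective and a simple constraint.
def toy_objective(combo):
    a, b, c = combo
    return (a - 2) ** 2 + b + c


def toy_constraint(combo):
    return sum(combo) <= 3


best = grid_search([1, 2, 3], [0.1, 0.5], [0.1, 0.5], toy_objective, toy_constraint)
print(best)
```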
4. Proposed Method
4.1. Deep Reinforcement Learning for Federated Learning Optimization
4.1.1. Early Client Termination
- State: The state consisted of the client states for each VDN agent. Each agent's state had six components: (i) the probing loss, (ii) the probing training latencies, (iii) the model uploading latencies, (iv) the communication cost from client to server, (v) the local training dataset size and (vi) the current communication round index. The state vector for each agent is given by Equation (11). It is noteworthy that since each agent in the VDN only has access to its own local observation rather than the fully observed environment, the policy has to incorporate past agent observations from history [33]. Thus, the historical values of the probing latencies and model uploading latencies were included in the state vector to mitigate the limitation of local observation. The numbers of historical values kept for the probing latencies and for the model uploading latencies are two separate parameters.
- Action: The action comprised the client termination decision for each VDN agent. The action space for client termination was {0, 1}, where 0 indicates the termination of the client and 1 indicates that the client is selected for complete training.
- Reward: A vanilla reward for VDN 1 can be adopted from the FL optimization problem as described in Equation (12), where the system is rewarded for accuracy improvement and penalized for latency and communication cost. However, Equation (12) has one obvious limitation. When the accuracy improvement approaches zero, the reward reduces to the penalty terms alone, regardless of the magnitude of the weight on the accuracy term. This causes the optimization problem to shift from improving accuracy under the latency and communication cost constraints to merely reducing latency and communication cost. To show the severity of this problem, we trained the VDN agents using the reward defined by Equation (12) and computed the expected values of the accuracy improvement term and of the penalty terms. For the MNIST dataset, the expected values of both components for the last communication rounds are given in Equations (13) and (14). It is observed that the expected accuracy improvement is smaller than the expected penalties for the last five communication rounds. This is because, as training approaches the end, the accuracy improvement is often smaller than in the earlier stages. Consequently, the VDN agents start to terminate more clients from complete training, giving way to the reduction of latency and communication cost. To make sure the agents are motivated to learn even when the accuracy improvement is small, we can introduce a bias term to the reward. Hence, the reward function can be reformulated as shown in Equation (15). Note that we only added the bias term to the reward when the accuracy improved, since it is intended to encourage accuracy improvement. We did not subtract the bias term from the reward when the accuracy did not improve, since the penalty terms are sufficient to penalize the inferior actions.
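To make the role of the bias term concrete, the sketch below assumes a weighted-sum reward: a weighted accuracy improvement minus weighted latency and communication cost penalties, plus a bias that is added only when accuracy improves. The exact forms of Equations (12) and (15) are not reproduced here, so the function name, argument names, default weights and bias value are illustrative assumptions rather than the authors' definitions.

```python
def vdn1_reward(acc_improvement, latency, comm_cost,
                w_acc=1.0, w_lat=0.1, w_comm=0.2, bias=1.0):
    """Assumed weighted-sum reward with a one-sided bias (hypothetical values)."""
    reward = w_acc * acc_improvement - w_lat * latency - w_comm * comm_cost
    if acc_improvement > 0:
        # The bias is only added, never subtracted: the penalty terms
        # already punish rounds without an accuracy gain.
        reward += bias
    return reward


# Without a bias, a late-round gain of 1% barely moves the reward,
# so the penalties dominate the agents' learning signal.
print(vdn1_reward(0.01, 1.0, 1.0, bias=0.0))
print(vdn1_reward(0.00, 1.0, 1.0, bias=0.0))

# With the bias, any improving round is clearly separated from a flat one.
print(vdn1_reward(0.01, 1.0, 1.0, bias=1.0))
print(vdn1_reward(0.00, 1.0, 1.0, bias=1.0))
```

The same function with `w_comm=0.0` gives a reward that ignores the communication cost, which is the simplification described for the second VDN.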
4.1.2. Local Epoch Adjustments
- State: The second VDN shared the same state as in Equation (11), since both VDNs required the same local observation for decision making.
- Action: The action comprises the local epoch count for each VDN agent. The action space is a discrete set of candidate local epoch counts. This action aims to assign more training epochs to client devices with stronger computation power and fewer to weaker devices.
- Reward: We adopted Equation (15) as the starting point for the reward function of VDN 2. However, the communication cost was not part of the optimization objectives of VDN 2, since local epoch adjustment is only bounded by the latency constraint. Hence, the reward function for this VDN can ignore the communication cost penalty. The resulting reward is defined in Equation (16), where we used the same bias term as in Equation (15) for simplicity.
Algorithm 2 FedDdrl Algorithm
1: Input: Initialize VDN 1 and its target network for client selection policy
          Initialize VDN 2 and its target network for local epoch adjustment policy
2: Output: Trained and networks
3: Set
4: for Episode do
5:   Reset the FL environment
6:   Initialize a global model
7:   for communication round do
8:     Randomly select clients from all clients
9:     Broadcast the global model to each selected client
10:    for each client in parallel do
11:      ; Copy the global model as each client model
12:      Update the client model using the local training dataset
13:      Upload client states to the FL server
14:    end for
15:    Each agent in VDN 1 selects the optimal action = argmax with a × 100% probability, else randomly output actions
16:    Each agent in VDN 2 selects the optimal action = argmax with a × 100% probability, else randomly output actions
17:    Send action and to each client
18:    for each client in parallel do
19:      if :
20:        Continue updating using until is reached
21:        Return updated
22:      end if
23:    end for
24:    Aggregate global model where
25:    Reward and are given to and based on
26:    , ,
27:    Store transitions 1 for into memory buffer 1
28:    Store transitions 2 for into memory buffer 2
29:    Sample mini-batches with size from memory buffer to train , and
30:    Decay gradually from 1.0 to 0.1
31:  end for
32: end for
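Three steps of Algorithm 2 are easy to illustrate in isolation: the exploration rule used by each agent, the decay of the exploration rate from 1.0 to 0.1, and the aggregation of the surviving client models. The sketch below makes several assumptions that are not stated in the algorithm: the random action is taken with probability epsilon (the usual epsilon-greedy convention), the decay is linear, and aggregation is a dataset-size-weighted average as in FedAvg. All function names are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return a random action with probability epsilon, else the argmax action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def linear_epsilon(step, total_steps, start=1.0, end=0.1):
    """Assumed linear decay of epsilon from start to end over total_steps."""
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return start + frac * (end - start)


def aggregate(client_weights, dataset_sizes):
    """Dataset-size-weighted average of flat weight vectors (FedAvg-style)."""
    total = sum(dataset_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, dataset_sizes)) / total
        for i in range(dim)
    ]


rng = random.Random(0)
# One agent of VDN 1 choosing between terminate (0) and keep (1).
action = epsilon_greedy([0.2, 0.7], linear_epsilon(300, 600), rng)
# Two surviving clients with flat weight vectors and dataset sizes 1 and 3.
global_weights = aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(action, global_weights)
```

With 40 episodes of 15 communication rounds each, as listed in the parameter table, a per-round schedule would use `total_steps=600`; whether the authors decay per round or per episode is not specified here.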
4.2. Balanced-MixUp to Mitigate Weight Divergence
5. Simulation Results
- (1) We created a pool of client configurations, each consisting of (i) the client’s computing latency per data sample, (ii) the model upload latency and (iii) the local dataset identity (ID) number. To closely simulate the heterogeneity of resources in an IoT network as in [14], the computing latency per data sample and the model upload latency in each client configuration each take one of a fixed set of values (in seconds).
- (2) CPU 1, CPU 2 and GPU simulated three, three and four clients, respectively. The simulated clients represent the 10 clients randomly selected from the total of 100 clients in each communication round.
- (3) In each communication round, 10 client configurations were randomly sampled from the configuration pool. The 10 simulated clients (in CPU 1, CPU 2 and GPU) were configured according to the selected client configurations. This entire process, step (3), is equivalent to an FL process that randomly selects 10 clients with unique local datasets and resources.
- (4) After step (3), each simulated client proceeded with its training. If the FL algorithm was FedAvg or FedProx, all 10 simulated clients underwent complete training for the default number of local epochs. In contrast, if the FL algorithm was FedMarl or FedDdrl, only the simulated clients that were not terminated by FedMarl/FedDdrl completed their local training, with the local epoch count given by the greedy (argmax) action of VDN 2.
5.1. Results and Ablation Study
5.1.1. Model Accuracy
5.1.2. Training Latency
5.1.3. Communication Efficiency
5.2. Strategy Learned by FedDdrl
5.3. How FedDdrl Optimizes the Three Objectives Simultaneously
5.4. Computational Complexity Analysis
5.5. Why Balanced-MixUp Helps in Federated Learning
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- [1] Al-Maslamani, N.; Abdallah, M.; Ciftler, B.S. Secure Federated Learning for IoT Using DRL-Based Trust Mechanism. In Proceedings of the 2022 International Wireless Communications and Mobile Computing, IWCMC 2022, Dubrovnik, Croatia, 30 May–3 June 2022; pp. 1101–1106.
- [2] Reinsel, D.; Gantz, J.; Rydning, J. The Digitization of the World from Edge to Core. Fram. Int. Data Corp. 2018, 16, 16–44.
- [3] Sheller, M.J.; Edwards, B.; Reina, G.A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.; Colen, R.R.; et al. Federated Learning in Medicine: Facilitating Multi-Institutional Collaborations without Sharing Patient Data. Sci. Rep. 2020, 10, 12598.
- [4] McMahan, B.H.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, Fort Lauderdale, FL, USA, 20–22 April 2017.
- [5] Hard, A.; Rao, K.; Mathews, R.; Ramaswamy, S.; Beaufays, F.; Augenstein, S.; Eichner, H.; Kiddon, C.; Ramage, D. Federated Learning for Mobile Keyboard Prediction. arXiv 2018, arXiv:1811.03604.
- [6] Ahmed, L.; Ahmad, K.; Said, N.; Qolomany, B.; Qadir, J.; Al-Fuqaha, A. Active Learning Based Federated Learning for Waste and Natural Disaster Image Classification. IEEE Access 2020, 8, 208518–208531.
- [7] Wong, Y.J.; Tham, M.-L.; Kwan, B.-H.; Gnanamuthu, E.M.A.; Owada, Y. An Optimized Multi-Task Learning Model for Disaster Classification and Victim Detection in Federated Learning Environments. IEEE Access 2022, 10, 115930–115944.
- [8] Reina, G.A.; Gruzdev, A.; Foley, P.; Perepelkina, O.; Sharma, M.; Davidyuk, I.; Trushkin, I.; Radionov, M.; Mokrov, A.; Agapov, D.; et al. OpenFL: An Open-Source Framework for Federated Learning. arXiv 2021, arXiv:2105.06413.
- [9] Chen, X.; Li, Z.; Ni, W.; Wang, X.; Zhang, S.; Xu, S.; Pei, Q. Two-Phase Deep Reinforcement Learning of Dynamic Resource Allocation and Client Selection for Hierarchical Federated Learning. In Proceedings of the 2022 IEEE/CIC International Conference on Communications in China, ICCC 2022, Foshan, China, 11–13 August 2022; pp. 518–523.
- [10] Yang, W.; Xiang, W.; Yang, Y.; Cheng, P. Optimizing Federated Learning with Deep Reinforcement Learning for Digital Twin Empowered Industrial IoT. IEEE Trans. Industr. Inform. 2022, 19, 1884–1893.
- [11] Zhang, W.; Yang, D.; Wu, W.; Peng, H.; Zhang, N.; Zhang, H.; Shen, X. Optimizing Federated Learning in Distributed Industrial IoT: A Multi-Agent Approach. IEEE J. Sel. Areas Commun. 2021, 39, 3688–3703.
- [12] Liu, L.; Zhang, J.; Song, S.H.; Letaief, K.B. Client-Edge-Cloud Hierarchical Federated Learning. In Proceedings of the IEEE International Conference on Communications 2020, Dublin, Ireland, 7–11 June 2020.
- [13] Song, Q.; Lei, S.; Sun, W.; Zhang, Y. Adaptive Federated Learning for Digital Twin Driven Industrial Internet of Things. In Proceedings of the 2021 IEEE Wireless Communications and Networking Conference (WCNC), Nanjing, China, 29 March–1 April 2021.
- [14] Zhang, S.Q.; Lin, J.; Zhang, Q. A Multi-Agent Reinforcement Learning Approach for Efficient Client Selection in Federated Learning. Proc. AAAI Conf. Artif. Intell. 2022, 36, 9091–9099.
- [15] Abdulrahman, S.; Tout, H.; Ould-Slimane, H.; Mourad, A.; Talhi, C.; Guizani, M. A Survey on Federated Learning: The Journey from Centralized to Distributed on-Site Learning and Beyond. IEEE Internet Things J. 2021, 8, 5476–5497.
- [16] Zhao, Y.; Li, M.; Lai, L.; Suda, N.; Civin, D.; Chandra, V. Federated Learning with Non-IID Data. arXiv 2018, arXiv:1806.00582.
- [17] Wang, J.; Liu, Q.; Liang, H.; Joshi, G.; Vincent Poor, H. Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6–12 December 2020.
- [18] Park, J.; Yoon, D.; Yeo, S.; Oh, S. AMBLE: Adjusting Mini-Batch and Local Epoch for Federated Learning with Heterogeneous Devices. J. Parallel. Distrib. Comput. 2022, 170, 13–23.
- [19] Zhang, H.; Xie, Z.; Zarei, R.; Wu, T.; Chen, K. Adaptive Client Selection in Resource Constrained Federated Learning Systems: A Deep Reinforcement Learning Approach. IEEE Access 2021, 9, 98423–98432.
- [20] Zhang, P.; Wang, C.; Jiang, C.; Han, Z. Deep Reinforcement Learning Assisted Federated Learning Algorithm for Data Management of IIoT. IEEE Trans. Industr. Inform. 2021, 17, 8475–8484.
- [21] Wang, H.; Kaplan, Z.; Niu, D.; Li, B. Optimizing Federated Learning on Non-IID Data with Reinforcement Learning. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 1698–1707.
- [22] Galdran, A.; Carneiro, G.; González Ballester, M.A. Balanced-MixUp for Highly Imbalanced Medical Image Classification. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021, Proceedings of the 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2021; Volume 12905, pp. 323–333.
- [23] Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated Optimization in Heterogeneous Networks. arXiv 2018, arXiv:1812.06127.
- [24] Nishio, T.; Yonetani, R. Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge. In Proceedings of the ICC 2019 - 2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019.
- [25] Zheng, J.; Li, K.; Tovar, E.; Guizani, M. Federated Learning for Energy-Balanced Client Selection in Mobile Edge Computing. In Proceedings of the 2021 International Wireless Communications and Mobile Computing, IWCMC 2021, Harbin, China, 28 June–2 July 2021; pp. 1942–1947.
- [26] Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602.
- [27] Vinyals, O.; Babuschkin, I.; Czarnecki, W.M.; Mathieu, M.; Dudzik, A.; Chung, J.; Choi, D.H.; Powell, R.; Ewalds, T.; Georgiev, P.; et al. Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning. Nature 2019, 575, 350–354.
- [28] Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv 2017, arXiv:1712.01815.
- [29] Han, M.; Sun, X.; Zheng, S.; Wang, X.; Tan, H. Resource Rationing for Federated Learning with Reinforcement Learning. In Proceedings of the 2021 Computing, Communications and IoT Applications (ComComAp), Shenzhen, China, 26–28 November 2021; pp. 150–155.
- [30] Xiong, Z.; Cheng, Z.; Xu, C.; Lin, X.; Liu, X.; Wang, D.; Luo, X.; Zhang, Y.; Qiao, N.; Zheng, M.; et al. Facing Small and Biased Data Dilemma in Drug Discovery with Federated Learning. bioRxiv 2020.
- [31] Jallepalli, D.; Ravikumar, N.C.; Badarinath, P.V.; Uchil, S.; Suresh, M.A. Federated Learning for Object Detection in Autonomous Vehicles. In Proceedings of the IEEE 7th International Conference on Big Data Computing Service and Applications, BigDataService, Oxford, UK, 23–26 August 2021; pp. 107–114.
- [32] Li, Q.; Diao, Y.; Chen, Q.; He, B. Federated Learning on Non-IID Data Silos: An Experimental Study. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Virtual, 9–12 May 2021; pp. 965–978.
- [33] Sunehag, P.; Lever, G.; Gruslys, A.; Marian Czarnecki, W.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-Decomposition Networks for Cooperative Multi-Agent Learning. arXiv 2017, arXiv:1706.05296.
- [34] Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. Mixup: Beyond Empirical Risk Minimization. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018. Conference Track Proceedings 2017.
- [35] Thulasidasan, S.; Chennupati, G.; Bilmes, J.A.; Bhattacharya, T.; Michalak, S. On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks. Adv. Neural. Inf. Process. Syst. 2019, 32.
- [36] Zhou, Z.; Qi, L.; Shi, Y. Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration. In Proceedings of the 17th European Conference, Computer Vision–ECCV 2022, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2020; pp. 420–436.
- [37] Sun, L.; Xia, C.; Yin, W.; Liang, T.; Yu, P.S.; He, L. Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 3436–3440.
- [38] Guo, H.; Mao, Y.; Zhang, R. Augmenting Data with Mixup for Sentence Classification: An Empirical Study. arXiv 2019, arXiv:1905.08941.
- [39] Chou, H.P.; Chang, S.C.; Pan, J.Y.; Wei, W.; Juan, D.C. Remix: Rebalanced Mixup. In Proceedings of the Computer Vision–ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Volume 12540, pp. 95–110.
- [40] Alam, F.; Alam, T.; Ofli, F.; Imran, M. Social Media Images Classification Models for Real-Time Disaster Response. arXiv 2021, arXiv:2104.04184v1.
- [41] Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- [42] TensorFlow Federated. Using TFF for Federated Learning Research. Available online: https://www.tensorflow.org/federated/tff_for_research (accessed on 25 December 2022).
- [43] Xu, Z.-Q.J.; Zhang, Y.; Luo, T.; Xiao, Y.; Ma, Z. Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks. Commun. Comput. Phys. 2019, 28, 1746–1767.
- [44] Li, X.; Huang, K.; Yang, W.; Wang, S.; Zhang, Z. On the Convergence of FedAvg on Non-IID Data. arXiv 2019, arXiv:1907.02189.
- [45] Yang, H.; Xiong, Z.; Zhao, J.; Niyato, D.; Xiao, L.; Wu, Q. Deep Reinforcement Learning Based Intelligent Reflecting Surface for Secure Wireless Communications. IEEE Trans. Wirel. Commun. 2020, 20, 375–388.
Method | Resource Optimization | Client Selection | Local Epoch Adjustment |
---|---|---|---|
FedAvg [4] | - | Random | Fixed |
FedProx [23] | - | Random | Fixed |
FedNova [17] | Computing | Random | Flexible |
FAVOR [21] | Computing | DRL Agent | Fixed |
TP-DDPG [9] | Computing + Communication | DRL Agent | Fixed |
Research work [10] | Computing + Communication | DRL Agent | Fixed |
FedMarl [14] | Computing + Communication | DRL Agent | Fixed |
Research work [19] | Computing + Communication | DRL Agent | Fixed |
Research work [20] | Computing + Communication | Random | Fixed |
Research work [29] | Communication | Random | Fixed |
Proposed FedDdrl | Computing + Communication | DRL Agent | DRL Agent |
Notation | Definition |
---|---|
Index of communication round | |
The total number of client devices (IoT devices) | |
The total number of client devices selected at each communication round | |
Index of selected IoT devices at communication round | |
Model broadcasting latency from server to client | |
Probing training latency for client | |
Metadata uploading latency from client to server | |
Model uploading latency from client to server | |
Complete training latency for communication round | |
Communication cost of client | |
Total communication cost for communication round | |
Accuracy of the global model at communication round | |
Global model’s accuracy improvement | |
Client selection matrix at communication round | |
Local epoch count matrix at communication round |
Parameters | Values |
---|---|
Number of agents in each VDN network | 10 |
Total number of clients | 100 |
Local training dataset distribution | 0.8 |
Learning rate for VDN network | 1 × 10⁻³ |
Target network update interval | 5 |
Number of episodes | 40 |
Number of clients selected for training in each round | 10 |
Default number of local epochs (before adjustment by FedDdrl) | 5 |
Number of communication rounds | 15 |
Batch size to update VDN agents | 32 |
Initial ε-greedy exploration value | 1 |
Final ε-greedy exploration value | 0.1 |
Replay memory size | 300 |
VDN 1 agent (MLP) size | 10 × 256 × 256 × 2 |
VDN 2 agent (MLP) size | 10 × 256 × 256 × 3 |
ID | Method | MNIST | CIFAR-10 | CrisisIBD |
---|---|---|---|---|
- | FedAvg | 94.6% ± 2.1% | 72.8% ± 3.9% | 43.2% ± 5.5% |
- | FedAvg with Balanced-MixUp | 93.2% ± 2.0% | 76.5% ± 1.7% | 60.2% ± 1.5% |
- | FedProx (μ = 0.01) | 95.6% ± 0.5% | 74.5% ± 0.2% | 48.1% ± 2.9% |
- | FedProx (μ = 0.01) with Balanced-MixUp | 95.4% ± 0.7% | 77.8% ± 0.5% | 60.7% ± 2.0% |
A | FedMarl (weights: 1.0, 0.1, 0.2) | 91.5% ± 1.1% | 65.5% ± 2.3% | 42.4% ± 3.6% |
B | A + Optimized weights (2.9, 0.1, 0.2) | 93.2% ± 1.4% | 71.7% ± 2.9% | 44.4% ± 3.9% |
C | B + Balanced-MixUp | 93.3% ± 1.2% | 75.0% ± 2.6% | 63.3% ± 2.0% |
D | C + Local Epoch Adjustment (FedDdrl) | 94.9% ± 1.1% | 78.2% ± 2.4% | 64.2% ± 1.4% |
Method | MNIST | CIFAR-10 | CrisisIBD | |
---|---|---|---|---|
FedAvg | 78.9% ± 9.3% | 62.7% ± 2.9% | 42.0% ± 3.4% | |
FedAvg with Balanced-MixUp | 88.1% ± 3.6% | 69.4% ± 2.4% | 52.9% ± 4.2% | |
FedAvg | 94.6% ± 2.1% | 72.8% ± 3.9% | 43.2% ± 5.5% | |
FedAvg with Balanced-MixUp | 93.2% ± 2.0% | 76.5% ± 1.7% | 60.2% ± 1.5% |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wong, Y.J.; Tham, M.-L.; Kwan, B.-H.; Owada, Y. FedDdrl: Federated Double Deep Reinforcement Learning for Heterogeneous IoT with Adaptive Early Client Termination and Local Epoch Adjustment. Sensors 2023, 23, 2494. https://doi.org/10.3390/s23052494