Auto-weighted Robust Federated Learning with Corrupted Data Sources

Published: 11 June 2022


Federated learning provides a communication-efficient and privacy-preserving training process by enabling statistical models to be learned from a massive number of participants without accessing their local data. Standard federated learning techniques that naively minimize an average loss function are vulnerable to data corruption from outliers, systematic mislabeling, or even adversaries. In this article, we address this challenge by proposing Auto-weighted Robust Federated Learning (ARFL), a novel approach that jointly learns the global model and the weights of local updates to provide robustness against corrupted data sources. We prove a learning bound on the expected loss with respect to the predictor and the client weights, which guides the definition of the objective for robust federated learning. We present an objective that minimizes the weighted sum of the clients' empirical risks plus a regularization term, where the weights can be allocated by comparing the empirical risk of each client with the average empirical risk of the best \(p\) clients. This method downweights clients with significantly higher losses, thereby lowering their contributions to the global model. We show that this approach achieves robustness when the data of corrupted clients is distributed differently from that of benign ones. To optimize the objective function, we propose a communication-efficient algorithm based on the blockwise minimization paradigm. We conduct extensive experiments on multiple benchmark datasets, including CIFAR-10, FEMNIST, and Shakespeare, with different neural network models. The results show that our solution is robust in different scenarios, including label shuffling, label flipping, and noisy features, and outperforms the state-of-the-art methods in most of them.

A Proof of Theorem 1

For any \(h \in \mathcal {H}\), we trivially have:
\begin{equation} \mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) \le \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) + \sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)- \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)\right) . \end{equation}
To link the second term to its expectation, we prove the following:
Lemma 1.
Define the function \(\phi :\left(\mathcal {X}\times \mathcal {Y}\right)^m \rightarrow \mathbb {R}\) by:
\[\phi \left(\lbrace x_{1,1}, y_{1,1}\rbrace , \ldots , \lbrace x_{N, m_N}, y_{N, m_N}\rbrace \right) = \sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)- \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)\right).\]
Denote for brevity \(z_{i,j} = \lbrace x_{i,j}, y_{i,j}\rbrace\). Then, for any \(i \in \lbrace 1, 2, \ldots , N\rbrace , j \in \lbrace 1, 2, \ldots , m_i\rbrace\):
\begin{equation} \begin{split} \sup _{z_{1,1}, \ldots , z_{N, m_N}, z_{i,j}^{^{\prime }}} |\phi (z_{1,1},\ldots , z_{i,j}, \ldots , z_{N, m_N}) - \phi (z_{1,1}, \ldots , z_{i,j}^{^{\prime }}, \ldots , z_{N, m_N})| \le \frac{\alpha _i}{m_i}\mathcal {M} \end{split} . \end{equation}
Proof. Fix any \(i, j\) and any \(z_{1,1}, \ldots , z_{N, m_N}, z_{i,j}^{\prime }\). Denote by \(\hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}^{\prime }\) the \(\alpha\)-weighted empirical average of the loss computed on the sample \(z_{1,1}, \ldots , z_{i,j}^{\prime }, \ldots , z_{N, m_N}\), i.e., the original sample with \(z_{i,j}\) replaced by \(z_{i,j}^{\prime }\). Then, we have that:
\begin{align*} |\phi (\ldots , z_{i,j}, \ldots) - \phi (\ldots , z_{i,j}^{\prime }, \ldots)| & = \left|\sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)\right) - \sup _{f\in \mathcal {H}} \left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}^{\prime }\left(f\right)\right)\right| \\ & \le \sup _{f\in \mathcal {H}}\left|\hat{\mathcal {L}}^{\prime }_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)\right| \\ & = \frac{\alpha _i}{m_i}\sup _{f\in \mathcal {H}}\left|\ell _f(z^{\prime }_{i,j}) - \ell _f(z_{i,j})\right| \\ & \le \frac{\alpha _i}{m_i}\mathcal {M} . \end{align*}
The first inequality uses \(|\sup _{f} A(f) - \sup _{f} B(f)| \le \sup _{f} |A(f) - B(f)|\), which holds for functions that are bounded on \(\mathcal {H}\); the last one uses that the loss is bounded, so that \(|\ell _f(z^{\prime }_{i,j}) - \ell _f(z_{i,j})| \le \mathcal {M}\).□
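As a quick sanity check of Lemma 1, the following Python sketch assumes a finite hypothesis class, so that each supremum becomes a maximum over a loss matrix, and draws hypothetical loss values uniformly from \([0, \mathcal {M}]\). Replacing a single sample \(z_{i,j}\) of client \(i\) should change \(\phi\) by at most \(\alpha _i \mathcal {M}/m_i\). The vector `true_loss` is only a placeholder for \(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}(f)\), which does not depend on the sample and therefore cancels in the difference; all sizes and weights are illustrative assumptions.

```python
import numpy as np

def weighted_empirical_loss(losses, alpha):
    # losses[i] has shape (K, m_i): loss of each of the K hypotheses on client i's samples.
    # Returns the length-K vector sum_i alpha_i * (1/m_i) * sum_j losses[i][:, j].
    return sum(a * l.mean(axis=1) for a, l in zip(alpha, losses))

def phi(true_loss, losses, alpha):
    # sup over the finite class of (L_{D_alpha}(f) - hat L_{D_alpha}(f))
    return np.max(true_loss - weighted_empirical_loss(losses, alpha))

rng = np.random.default_rng(0)
M = 1.0                                   # assumed loss range: 0 <= loss <= M
K = 5                                     # hypothetical size of the finite class H
m = [3, 7, 4]                             # hypothetical client sample sizes m_i
alpha = np.array([0.5, 0.2, 0.3])         # client weights on the simplex
losses = [rng.uniform(0, M, size=(K, mi)) for mi in m]
true_loss = rng.uniform(0, M, size=K)     # placeholder for L_{D_alpha}(f); it cancels out

base = phi(true_loss, losses, alpha)
max_gap = []
for i, mi in enumerate(m):
    gaps = []
    for j in range(mi):
        swapped = [l.copy() for l in losses]
        swapped[i][:, j] = rng.uniform(0, M, size=K)   # replace z_{i,j} by z'_{i,j}
        gaps.append(abs(base - phi(true_loss, swapped, alpha)))
    max_gap.append(max(gaps))
    print(f"client {i}: max gap {max_gap[i]:.4f} <= alpha_i * M / m_i = {alpha[i] * M / mi:.4f}")
```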
Let \(S\) denote a random sample of size \(m = \sum _{i=1}^N m_i\) drawn in the same way as our data (i.e., \(m_i\) samples from \(\mathcal {D}_i\) for each \(i\)). Using Lemma 1, McDiarmid's inequality then gives:
\begin{equation*} \begin{split} \mathbb {P}\left(\phi (S) - \mathbb {E}(\phi (S)) \ge t\right) & \le \exp \left(-\frac{2t^2}{\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _i^2}{m_i^2}\mathcal {M}^2} \right) \\ & = \exp \left(-\frac{2t^2}{\mathcal {M}^2\sum _{i=1}^N \frac{\alpha _i^2}{m_i}}\right) . \end{split} \end{equation*}
For any \(\delta \gt 0\), setting the right-hand side above equal to \(\delta /4\), solving for \(t\), and combining the result with Equation (12), we obtain that, with probability at least \(1-\delta /4\):
\begin{equation} \begin{split} \mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) \le \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) & + \mathbb {E}_S\left(\sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}} (f) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(f)\right)\right) + \sqrt {\frac{\log \left(\frac{4}{\delta }\right)\mathcal {M}^2}{2}}\sqrt {\sum _{i=1}^N\frac{\alpha _i^2}{m_i}} . \end{split} \end{equation}
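Solving \(\exp \left(-2t^2/(\mathcal {M}^2\sum _i \alpha _i^2/m_i)\right) = \delta /4\) for \(t\) gives the last term of the bound above. The Python sketch below, with hypothetical sample sizes and an assumed \(\mathcal {M} = 1\), computes this deviation for a few weight vectors and plugs it back into the tail. By the Cauchy-Schwarz inequality, \(\sum _i \alpha _i^2/m_i \ge 1/m\), with equality when \(\alpha _i = m_i/m\), so proportional weights give the smallest deviation, while concentrating all weight on a small client gives a much larger one.

```python
import math

def mcdiarmid_deviation(alpha, m, M, delta):
    # t such that exp(-2 t^2 / (M^2 * sum_i alpha_i^2 / m_i)) = delta / 4
    s = sum(a * a / mi for a, mi in zip(alpha, m))
    return math.sqrt(math.log(4.0 / delta) * M * M / 2.0) * math.sqrt(s)

def mcdiarmid_tail(t, alpha, m, M):
    # right-hand side of McDiarmid's inequality for deviation t
    s = sum(a * a / mi for a, mi in zip(alpha, m))
    return math.exp(-2.0 * t * t / (M * M * s))

m = [100, 400, 500]                        # hypothetical client sample sizes m_i
M, delta = 1.0, 0.05                       # assumed loss bound and confidence level
uniform = [1 / 3, 1 / 3, 1 / 3]
proportional = [mi / sum(m) for mi in m]   # alpha_i = m_i / m
concentrated = [1.0, 0.0, 0.0]             # all weight on the smallest client

for name, a in [("uniform", uniform), ("proportional", proportional), ("concentrated", concentrated)]:
    t = mcdiarmid_deviation(a, m, M, delta)
    print(f"{name:>12}: t = {t:.4f}, tail = {mcdiarmid_tail(t, a, m, M):.4f}")
```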
To handle the expectation in the second term, introduce a ghost sample \(S^{\prime }\), drawn independently from the same distributions as the original sample \(S\). Denote by \(\hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}^{\prime }\) the weighted empirical loss with respect to the ghost sample, and let \(\beta _i = m_i/m\) for all \(i\). Since the supremum is convex, Jensen's inequality gives:
\begin{equation*} \begin{split} \mathbb {E}_S \left(\sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}} (f) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(f)\right)\right) & = \mathbb {E}_{S}\left(\sup _{f\in \mathcal {H}}\left(\mathbb {E}_{S^{\prime }}\left(\hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}^{\prime }(f)\right) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(f)\right)\right) \\ & \le \mathbb {E}_{S, S^{\prime }} \left(\sup _{f\in \mathcal {H}}\left(\hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}^{\prime }(f) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(f) \right)\right) \\ & = \mathbb {E}_{S, S^{\prime }}\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _i}{\beta _i}\left(\ell _f(z^{\prime }_{i,j}) - \ell _f(z_{i,j})\right)\right)\right) . \end{split} \end{equation*}
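The last equality rewrites the weighted empirical loss as a pooled average over all \(m\) samples, using \(\frac{1}{m_i} = \frac{1}{m\beta _i}\). For a single fixed hypothesis, this identity can be checked numerically in Python; the per-sample loss values below are hypothetical stand-ins for \(\ell _f(z_{i,j})\).

```python
import numpy as np

rng = np.random.default_rng(1)
m_i = np.array([5, 12, 8])                 # hypothetical client sample sizes
alpha = np.array([0.2, 0.5, 0.3])          # client weights on the simplex
m = m_i.sum()
beta = m_i / m                             # beta_i = m_i / m, as defined above
client_losses = [rng.uniform(0, 1, size=k) for k in m_i]  # stand-ins for ell_f(z_{i,j})

# Per-client form: sum_i alpha_i * (1/m_i) * sum_j ell_f(z_{i,j})
per_client = sum(a * l.mean() for a, l in zip(alpha, client_losses))
# Pooled form: (1/m) * sum_i sum_j (alpha_i / beta_i) * ell_f(z_{i,j})
pooled = sum((a / b) * l.sum() for a, b, l in zip(alpha, beta, client_losses)) / m
print(per_client, pooled)                  # equal up to floating-point rounding
```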
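Introduce \(m\) independent Rademacher random variables \(\sigma _{i,j}\), each uniform on \(\lbrace -1, +1\rbrace\). Since \(z_{i,j}\) and \(z^{\prime }_{i,j}\) are independent draws from the same distribution \(\mathcal {D}_i\), the quantities \(\ell _f(z^{\prime }_{i,j}) - \ell _f(z_{i,j})\) and \(\sigma _{i,j}(\ell _f(z^{\prime }_{i,j}) - \ell _f(z_{i,j}))\) have the same distribution. Using this, the subadditivity of the supremum, and the fact that \(-\sigma _{i,j}\) has the same distribution as \(\sigma _{i,j}\), we obtain: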
\begin{equation*} \begin{split} \mathbb {E}_S \left(\sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}} (f) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(f)\right)\right) & \le \mathbb {E}_{S, S^{\prime }, \sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _i}{\beta _i}\sigma _{i,j}\left(\ell _f(z^{\prime }_{i,j}) - \ell _f(z_{i,j})\right)\right)\right) \\ & \le \mathbb {E}_{S^{\prime }, \sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _{i}}{\beta _{i}}\sigma _{i,j}\ell _f(z^{\prime }_{i,j})\right)\right) \\ &\quad + \mathbb {E}_{S, \sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _{i}}{\beta _{i}}(-\sigma _{i,j})\ell _f(z_{i,j})\right)\right) \\ & = 2\mathbb {E}_{S, \sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _{i}}{\beta _{i}}\sigma _{i,j}\ell _f(z_{i,j})\right)\right). \end{split} \end{equation*}
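The last term can now be related to its empirical counterpart, the weighted empirical Rademacher complexity, by applying McDiarmid's inequality once more: as in Lemma 1, replacing a single \(z_{i,j}\) changes the inner expectation over \(\sigma\) by at most \(\frac{\alpha _i}{m_i}\mathcal {M}\), so with probability at least \(1-\delta /4\) the expectation over \(S\) exceeds its empirical value by at most the same deviation term as above. This deviation enters multiplied by 2, which together with the deviation already present explains the factor 3 below. Putting this together via the union bound, we obtain that for any \(\delta \gt 0\), with probability at least \(1 - \delta /2\):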
\begin{equation} \begin{split} \mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) & \le \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}} \left(h\right) + 2\mathbb {E}_{\sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _{i}}{\beta _{i}}\sigma _{i,j}\ell _f(z_{i,j})\right)\right) + 3 \sqrt {\frac{\log \left(\frac{4}{\delta }\right)\mathcal {M}^2}{2}}\sqrt {\sum _{i=1}^N\frac{\alpha _i^2}{m_i}} . \end{split} \end{equation}
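Finally, since \(\alpha _i \ge 0\), \(\frac{1}{m}\frac{\alpha _i}{\beta _i} = \frac{\alpha _i}{m_i}\), and the supremum of a sum is at most the sum of the suprema, we have: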
\begin{align*} \mathbb {E}_{\sigma } \left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _{i}}{\beta _{i}}\sigma _{i,j}\ell _f(z_{i,j})\right)\right) & \le \mathbb {E}_{\sigma }\left(\sum _{i=1}^{N}\alpha _i\sup _{f\in \mathcal {H}}\left(\frac{1}{m_i}\sum _{j=1}^{m_i}\sigma _{i,j}\ell _f(z_{i,j})\right)\right) \\ & = \sum _{i=1}^N \alpha _i \mathbb {E}_{\sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m_i}\sum _{j=1}^{m_i}\sigma _{i,j}\ell _f(z_{i,j})\right)\right) \\ & = \sum _{i=1}^N \alpha _i \mathcal {R}_i \left(\mathcal {H}\right) . \end{align*}
A symmetric argument shows that \(\hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(h) - \mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}(h)\) is bounded by the same quantity with probability at least \(1 - \delta /2\). Applying the union bound to the two events completes the proof.□
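The final display in the proof of Theorem 1 uses \(\alpha _i \ge 0\) to move the supremum inside the sum over clients. The Python sketch below estimates both sides by Monte Carlo for a finite hypothesis class with hypothetical loss values, assuming \(\mathcal {R}_i(\mathcal {H})\) denotes the empirical Rademacher complexity of the loss class on client \(i\)'s sample. Because the inequality holds for every draw of \(\sigma\), it also holds for the averages, up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 6                                   # hypothetical finite class size |H|
m_i = [4, 9, 6]                         # hypothetical client sample sizes
alpha = np.array([0.3, 0.3, 0.4])
losses = [rng.uniform(0, 1, size=(K, k)) for k in m_i]   # ell_f(z_{i,j}) for each f
n_draws = 2000

weighted, per_client = 0.0, np.zeros(len(m_i))
for _ in range(n_draws):
    sigma = [rng.choice([-1.0, 1.0], size=k) for k in m_i]
    # client i's term (1/m_i) sum_j sigma_ij ell_f(z_ij) for every f, shape (K,)
    contrib = [l @ s / k for l, s, k in zip(losses, sigma, m_i)]
    # left side: sup over f of the alpha-weighted sum (alpha_i / (m beta_i) = alpha_i / m_i)
    weighted += np.max(sum(a * c for a, c in zip(alpha, contrib)))
    # right side: per-client suprema, taken separately
    per_client += np.array([np.max(c) for c in contrib])

weighted /= n_draws
R_hat = per_client / n_draws            # Monte Carlo estimates of R_i(H)
print(weighted, alpha @ R_hat)
```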

B Proof of Theorem 2

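The Lagrangian of the problem in Equation (6) is
\begin{equation} \mathbb {L} = \mathbf {\alpha }^\top {\hat{\mathcal {L}}}(\mathbf {w}) + \frac{\lambda }{2} \| \mathbf {\alpha } \circ \mathbf {m}^{\circ - \frac{1}{2}} \|^2_2 - \mathbf {\alpha }^{\top } \mathbf {\beta } - \eta (\mathbf {\alpha }^{\top } \mathbf {1} - 1), \end{equation}
where \(\hat{\mathcal {L}}(\mathbf {w}) = [\hat{\mathcal {L}}_1(\mathbf {w}),\hat{\mathcal {L}}_2(\mathbf {w}),\ldots , \hat{\mathcal {L}}_N(\mathbf {w})]^\intercal\), \(\mathbf {m} = [m_1, m_2, \ldots , m_N]^\intercal\), \(\circ\) denotes the Hadamard (element-wise) product, and \(\mathbf {m}^{\circ -\frac{1}{2}}\) is the Hadamard power with entries \(m_i^{-1/2}\), so that the regularizer equals \(\frac{\lambda }{2}\sum _{i=1}^N \alpha _i^2/m_i\). The vector \(\mathbf {\beta } \ge 0\) and the scalar \(\eta\) are the Lagrange multipliers of the constraints \(\mathbf {\alpha } \ge 0\) and \(\mathbf {\alpha }^\top \mathbf {1} = 1\), respectively. The optimal solution then satisfies the following Karush-Kuhn-Tucker (KKT) conditions: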
\begin{align} \partial _{\mathbf {\alpha }} \mathbb {L}(\mathbf {\alpha }, \mathbf {\beta }, \eta) &= 0 , \end{align}
\begin{align} \mathbf {\alpha }^\intercal \mathbf {1} - 1 &= 0, \end{align}
\begin{align} \mathbf {\alpha } &\ge 0, \end{align}
\begin{align} \mathbf {\beta } &\ge 0, \end{align}
\begin{align} \alpha _i \beta _i &= 0, \quad \forall i = 1, 2, \ldots , N. \end{align}
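Setting the derivative of \(\mathbb {L}\) with respect to \(\alpha _i\) to zero, as required by Equation (17), gives \(\hat{\mathcal {L}}_i(\mathbf {w}) + \frac{\lambda \alpha _i}{m_i} - \beta _i - \eta = 0\), that is: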
\begin{equation} \alpha _i = \frac{m_i(\beta _i + \eta - \hat{\mathcal {L}}_i(\mathbf {w}))}{\lambda }. \end{equation}
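The stationarity step can be checked numerically. This Python sketch assumes the regularizer is \(\frac{\lambda }{2}\sum _i \alpha _i^2/m_i\) (the element-wise reading under which the expression for \(\alpha _i\) above follows), uses arbitrary hypothetical values for \(\hat{\mathcal {L}}_i(\mathbf {w})\), \(m_i\), \(\beta _i\), and \(\eta\), and compares a central finite-difference gradient of \(\mathbb {L}\) with the analytic one.

```python
import numpy as np

rng = np.random.default_rng(3)
N, lam = 4, 2.0
m = rng.integers(10, 50, size=N).astype(float)   # hypothetical sample sizes m_i
L_hat = rng.uniform(0, 1, size=N)                # hypothetical empirical risks
beta = rng.uniform(0, 1, size=N)                 # arbitrary multipliers beta_i >= 0
eta = 0.7                                         # arbitrary multiplier eta

def lagrangian(alpha):
    # alpha^T L_hat + (lam/2) sum_i alpha_i^2 / m_i - alpha^T beta - eta (alpha^T 1 - 1)
    return (alpha @ L_hat + 0.5 * lam * np.sum(alpha ** 2 / m)
            - alpha @ beta - eta * (alpha.sum() - 1.0))

alpha = rng.uniform(0, 1, size=N)
h = 1e-6
# Central finite differences along each coordinate direction
numeric = np.array([(lagrangian(alpha + h * e) - lagrangian(alpha - h * e)) / (2 * h)
                    for e in np.eye(N)])
analytic = L_hat + lam * alpha / m - beta - eta
print(np.max(np.abs(numeric - analytic)))

# Solving analytic == 0 for alpha_i gives alpha_i = m_i (beta_i + eta - L_hat_i) / lam
alpha_star = m * (beta + eta - L_hat) / lam
print(np.max(np.abs(L_hat + lam * alpha_star / m - beta - eta)))
```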
Since \(\beta _i \ge 0\), we distinguish two cases:
When \(\beta _i = 0\), we have \(\alpha _i = \frac{m_i(\eta - \hat{\mathcal {L}}_i(\mathbf {w}))}{\lambda }\). Because \(\alpha _i \ge 0\), \(m_i > 0\), and \(\lambda > 0\), this further implies \(\eta - \hat{\mathcal {L}}_i(\mathbf {w}) \ge 0\).
When \(\beta _i \gt 0\), the complementary slackness condition \(\alpha _i \beta _i = 0\) gives \(\alpha _i = 0\), and the expression for \(\alpha _i\) above then yields \(\eta - \hat{\mathcal {L}}_i(\mathbf {w}) = -\beta _i \lt 0\).
Combining both cases, the optimal solution to Equation (6) is given by:
\begin{equation} \alpha _i(\mathbf {w}) = \left[\frac{m_i (\eta - \hat{\mathcal {L}}_i(\mathbf {w}))}{\lambda }\right]_{+}, \end{equation}
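where \([\cdot ]_+ = \max (0, \cdot)\).
Without loss of generality, index the clients so that \(\hat{\mathcal {L}}_1(\mathbf {w}) \le \hat{\mathcal {L}}_2(\mathbf {w}) \le \cdots \le \hat{\mathcal {L}}_N(\mathbf {w})\), and let \(p\) be the number of clients with positive weight; by Equation (23), these are the \(p\) clients with the smallest empirical risks. The constraint \(\mathbf {\alpha }^\top \mathbf {1} = 1\) then reads \(\sum _{i=1}^p \alpha _i = 1\), i.e., \(\sum _{i=1}^p m_i(\eta - \hat{\mathcal {L}}_i(\mathbf {w})) = \lambda\), which gives: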
\begin{equation} \eta = \frac{\sum _{i=1}^{p} m_i \hat{\mathcal {L}}_i(\mathbf {w}) + \lambda }{\sum _{i=1}^{p} m_i}. \end{equation}
The condition \(\eta - \hat{\mathcal {L}}_i(\mathbf {w}) \ge 0\) for the clients with nonzero weight yields Equations (7) and (8). Finally, plugging Equation (24) into Equation (23) yields Equation (9).□
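The closed form can be turned into a short routine. The Python sketch below sorts the clients by empirical risk, computes \(\eta\) from Equation (24) for every candidate \(p\), and keeps the largest \(p\) whose \(p\)-th smallest risk still lies strictly below the corresponding \(\eta\); under this rule the remaining clients satisfy \(\eta - \hat{\mathcal {L}}_i(\mathbf {w}) \le 0\) and receive zero weight. This water-filling rule for choosing \(p\) is our reading of the KKT conditions above, not a transcription of Equations (7) and (8), and the risks and sample sizes in the usage example are hypothetical.

```python
import numpy as np

def arfl_weights(L_hat, m, lam):
    """Minimize alpha^T L_hat + (lam/2) * sum_i alpha_i^2 / m_i over the simplex (lam > 0).

    Returns (alpha, eta, p) with alpha_i = [m_i (eta - L_hat_i) / lam]_+.
    """
    L_hat, m = np.asarray(L_hat, dtype=float), np.asarray(m, dtype=float)
    order = np.argsort(L_hat)                       # clients by ascending empirical risk
    L_sorted, m_sorted = L_hat[order], m[order]
    # eta_p = (sum_{i<=p} m_i L_i + lam) / sum_{i<=p} m_i for every candidate p = 1..N
    eta_all = (np.cumsum(m_sorted * L_sorted) + lam) / np.cumsum(m_sorted)
    # largest p whose p-th smallest risk lies strictly below eta_p (p = 1 always does)
    p = int(np.nonzero(L_sorted < eta_all)[0][-1]) + 1
    eta = eta_all[p - 1]
    alpha = np.maximum(0.0, m * (eta - L_hat) / lam)
    return alpha, eta, p

def arfl_objective(alpha, L_hat, m, lam):
    # alpha^T L_hat + (lam/2) * sum_i alpha_i^2 / m_i
    alpha, L_hat, m = (np.asarray(v, dtype=float) for v in (alpha, L_hat, m))
    return alpha @ L_hat + 0.5 * lam * np.sum(alpha ** 2 / m)

L_hat = [0.2, 0.3, 0.25, 2.0]   # hypothetical risks; the last client looks corrupted
m = [100, 100, 100, 100]
alpha, eta, p = arfl_weights(L_hat, m, lam=50.0)
print(p, eta, alpha)             # p == 3; the client with risk 2.0 gets weight 0.0
```

Because the objective is strictly convex on the simplex for \(\lambda > 0\), no point of the simplex should attain a smaller objective value than the returned weights, which gives a cheap way to test such a routine against random simplex points.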


