Example: FedMGDA+

This section describes how to use FLGo to implement an algorithm that modifies the aggregation phase. The example used here is FedMGDA+, an algorithm proposed by Hu et al. in 2020 and published in IEEE Transactions on Network Science and Engineering in 2022. Inspired by the Multiple Gradient Descent Algorithm (MGDA) from multi-objective optimization, it applies MGDA to the aggregation stage of FL to prevent the aggregated model update from harming the interests of any party (i.e., the inner product between the global update and each client's update is non-negative).

Compared to FedAvg, FedMGDA+ differs only in the aggregation stage. It finds aggregation weights that deconflict the global model update and the local model updates by solving the following problem:

\[\mathbf{\lambda}^*=\arg\min_{\mathbf{\lambda}}\left\|\sum_{i\in \mathbb{S}_t}\lambda_i d_i\right\|_2 \quad \text{s.t.} \quad \|\mathbf{\lambda}-\mathbf{\lambda}_0\|_{\infty}\le \epsilon,\ \mathbf{1}^\top\mathbf{\lambda}=1\]

where \(\Delta\theta_i = \theta^t-\theta^{t+1}_i\), \(d_i=\frac{\Delta\theta_i}{\|\Delta\theta_i\|}\), \(\mathbb{S}_t\) is the set of clients selected at round \(t\), and \(\mathbf{\lambda}_0\) is the original aggregation weight vector (e.g., proportional to the clients' local data sizes).

Then, the global model is updated with the global learning rate \(\eta\):

\[\theta^{t+1}=\theta^t-\eta \sum_{i} \lambda_i d_i\]
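To make the aggregation rule concrete, here is a minimal toy sketch (independent of FLGo; the client updates, weights, and variable names are all illustrative) that solves the constrained minimum-norm problem for two normalized client updates with CVXOPT and then forms the global update:

import numpy as np
import cvxopt

d = np.array([[1.0, 0.0],        # d_1: normalized update of client 1
              [0.6, 0.8]])       # d_2: normalized update of client 2
lambda0 = np.array([0.5, 0.5])   # original aggregation weights λ0
eps, eta = 0.1, 1.0              # hyperparameters ε and η

# minimize ||Σ_i λ_i d_i||² = λᵀ(ddᵀ)λ over λ
P = cvxopt.matrix(2.0 * d @ d.T)
q = cvxopt.matrix(np.zeros(2))
# box constraints λ0_i - ε <= λ_i <= λ0_i + ε (already within [0,1] here)
G = cvxopt.matrix(np.vstack([-np.eye(2), np.eye(2)]))
h = cvxopt.matrix(np.hstack([-(lambda0 - eps), lambda0 + eps]))
# simplex equality constraint 1ᵀλ = 1
A = cvxopt.matrix(np.ones((1, 2)))
b = cvxopt.matrix(1.0)
lam = np.array(cvxopt.solvers.qp(P, q, G, h, A, b)['x']).flatten()
global_update = eta * (lam @ d)  # θ^{t+1} = θ^t - global_update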

Implementation

First, note that FedMGDA+ has two hyperparameters, \(\eta\) and \(\epsilon\), which are registered in the initialization method initialize. For the aggregation part, the aggregate method receives the locally trained models as the parameter models by default, so the implementation of aggregate strictly follows the three steps above. The weight optimization is delegated to the method self.optim_lambda, which takes the current set of normalized gradients and the original weights, and uses the hyperparameter self.epsilon to bound the feasible weights (the implementations of optim_lambda and quadprog can be skipped on a first reading, since they are unrelated to the main workflow of FLGo). The CVXOPT library is used here to solve the minimum-norm problem. If cvxopt is not installed, run the following command to install it:

!pip install cvxopt

The code of FedMGDA+ is as follows:

from flgo.utils import fmodule
from flgo.algorithm.fedbase import BasicServer
from flgo.algorithm.fedavg import Client
import flgo
import os
import numpy as np
import copy
import cvxopt

class Server(BasicServer):
    def initialize(self, *args, **kwargs):
        # init hyper-parameters
        self.init_algo_para({'eta':1.0, 'epsilon':0.1})

    ##############################################################
    ########## Overwrite the aggregate method ####################
    def aggregate(self, models: list, *args, **kwargs):
        # 1. calculate normalized gradients
        grads = [self.model - w for w in models]
        for gi in grads: gi.normalize()

        # 2. calculate λ0
        nks = [len(self.clients[cid].train_data) for cid in self.received_clients]
        nt = sum(nks)
        lambda0 = [1.0 * nk / nt for nk in nks]
        # 3. optimize lambdas to minimize ||λ'g||² s.t. λ∈Δ, ||λ - λ0||∞ <= ε
        op_lambda = self.optim_lambda(grads, lambda0)
        op_lambda = [ele[0] for ele in op_lambda]

        # 4. aggregate grads
        dt = fmodule._model_average(grads, op_lambda)
        return self.model - dt * self.eta

    def optim_lambda(self, grads, lambda0):
        # create H_m*m = 2J'J where J=[grad_i]_n*m
        n = len(grads)
        Jt = []
        for gi in grads:
            Jt.append((copy.deepcopy(fmodule._modeldict_to_tensor1D(gi.state_dict())).cpu()).numpy())
        Jt = np.array(Jt)
        # target function
        P = 2 * np.dot(Jt, Jt.T)

        q = np.zeros((n, 1))
        # equality constraint λ∈Δ (the weights sum to 1)
        A = np.ones(n)
        b = np.array([1])
        # boundary
        lb = np.array([max(0, lambda0[i] - self.epsilon) for i in range(n)])
        ub = np.array([min(1, lambda0[i] + self.epsilon) for i in range(n)])
        G = np.zeros((2*n,n))
        for i in range(n):
            G[i][i]=-1
            G[n+i][i]=1
        h = np.zeros((2*n,1))
        for i in range(n):
            h[i] = -lb[i]
            h[n+i] = ub[i]
        res=self.quadprog(P, q, G, h, A, b)
        return res

    def quadprog(self, P, q, G, h, A, b):
        """
        Input: numpy arrays; the format follows MATLAB's quadprog function: https://www.mathworks.com/help/optim/ug/quadprog.html
        Output: numpy array of the solution
        """
        # cvxopt.matrix interprets a nested list as a list of columns, so the
        # row-major numpy arrays are transposed back where necessary before
        # being passed to the solver.
        P = cvxopt.matrix(P.tolist())
        q = cvxopt.matrix(q.tolist(), tc='d')
        G = cvxopt.matrix(G.tolist())
        h = cvxopt.matrix(h.tolist())
        A = cvxopt.matrix(A.tolist())
        b = cvxopt.matrix(b.tolist(), tc='d')
        sol = cvxopt.solvers.qp(P, q.T, G.T, h.T, A.T, b)
        return np.array(sol['x'])
# Construct FedMGDA+
class fedmgda:
    Server = Server
    Client = Client

Experiment

import flgo.algorithm.fedavg as fedavg
task = './test_cifar10'
config = {
    'benchmark': {'name': 'flgo.benchmark.cifar10_classification'},
    'partitioner': {'name': 'DiversityPartitioner', 'para': {'num_clients': 100, 'diversity': 0.2}}
}
if not os.path.exists(task): flgo.gen_task(config, task_path=task)
option = {'learning_rate':0.01, 'num_steps':5, 'num_rounds':200,'gpu':0}

fedavg_runner = flgo.init(task, fedavg, option=option)
fedmgda_runner_eta1epsilon01 = flgo.init(task, fedmgda, option=option)  # eta=1.0, epsilon=0.1 (defaults)
fedmgda_runner_eta05epsilon01 = flgo.init(task, fedmgda, option={'learning_rate':0.01, 'num_steps':5, 'num_rounds':200, 'gpu':0, 'algo_para':[0.5, 0.1]})  # eta=0.5, epsilon=0.1
fedmgda_runner_eta01epsilon01 = flgo.init(task, fedmgda, option={'learning_rate':0.01, 'num_steps':5, 'num_rounds':200, 'gpu':0, 'algo_para':[0.1, 0.1]})  # eta=0.1, epsilon=0.1

fedavg_runner.run()
fedmgda_runner_eta1epsilon01.run()
fedmgda_runner_eta05epsilon01.run()
fedmgda_runner_eta01epsilon01.run()
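
After the runs finish, the recorded results of the four runners can be compared. The snippet below is a minimal sketch assuming the Selector/Painter analysis plan of flgo.experiment.analyzer shown in FLGo's tutorials; the header names and the metric key 'val_loss' may need to be adapted to your setup:

import flgo.experiment.analyzer
analysis_plan = {
    'Selector': {'task': task, 'header': ['fedavg', 'fedmgda']},
    'Painter': {'Curve': [{'args': {'x': 'communication_round', 'y': 'val_loss'}}]},
}
flgo.experiment.analyzer.show(analysis_plan)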