1.1 Descriptor of FL
We first introduce how FL is described in our framework. The API flgo.init creates a federated runner that carries out a single run of FL, as described below:
def init(task: str, algorithm, option={}, model=None,
         Logger: flgo.experiment.logger.BasicLogger = flgo.experiment.logger.simple_logger.SimpleLogger,
         Simulator: BasicSimulator = flgo.simulator.DefaultSimulator,
         scene='horizontal'):
    r"""
    Initialize a runner in FLGo, which optimizes a model on a specific task
    (e.g. IID-mnist-of-100-clients) with the selected federated algorithm.
    :param
        task (str): the directory of the federated task
        algorithm (module|class): the algorithm used to optimize the model in a federated manner; it must contain the pre-defined attributes (e.g. algorithm.Server and algorithm.Client for horizontal federated learning)
        option (dict|str): the configurations of training, environment, algorithm, logger and simulator
        model (module|class): the model module that contains two methods: model.init_local_module(object) and model.init_global_module(object)
        Logger (class): the class of the logger, inherited from flgo.experiment.logger.BasicLogger
        Simulator (class): the class of the simulator, inherited from flgo.simulator.BasicSimulator
        scene (str): 'horizontal' or 'vertical' in the current version of FLGo
    :return
        runner: the object instance that has the method runner.run()
    """
    ...
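Given this signature, a run consists of creating a runner and calling runner.run(). Below is a minimal usage sketch; the algorithm module path flgo.algorithm.fedavg, the task name 'my_task', and the option keys num_rounds and learning_rate are illustrative assumptions rather than values prescribed by this section:

import flgo
import flgo.algorithm.fedavg as fedavg   # assumed algorithm module path

# 'my_task' stands for a previously generated task directory.
runner = flgo.init(
    task='my_task',
    algorithm=fedavg,
    option={'num_rounds': 20, 'learning_rate': 0.1},  # assumed option keys
)
runner.run()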
Each run of a federated training process aims to optimize a given model on a specific task by using an algorithm with some hyper-parameters (i.e. the option) under a particular environment (e.g. the scene and the hardware condition).
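To make the model contract named in the signature concrete, here is a hypothetical model module sketch; the class-name check and the object.model attribute are assumptions about how the framework attaches models, not confirmed by this section:

import torch.nn as nn

class CNN(nn.Module):
    # A toy model; any torch.nn.Module would serve the same role.
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(784, 10)

    def forward(self, x):
        return self.fc(x.flatten(1))

def init_global_module(object):
    # Assumption: in horizontal FL, only the server holds the global model.
    if 'Server' in object.__class__.__name__:
        object.model = CNN()

def init_local_module(object):
    # No client-specific local module is needed in this plain example.
    pass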
The term model usually has the same meaning as in centralized ML. The term task describes how the datasets are distributed among clients, together with task-specific configurations (e.g. the dataset information, the target). Algorithm is the optimization strategy in use, and option contains several runtime options like the learning rate and the number of training rounds. The hardware condition is simulated by the Simulator; for example, different clients may have different computing power, network latency and communication bandwidth (a customization sketch follows the list below). Finally, the scene refers to the four main paradigms in FL: Horizontal FL, Vertical FL, Hierarchical FL and Decentralized FL, as shown in Figure 1.
- (a) Horizontal FL: a server coordinates different clients to collaboratively train the model. Particularly, each client owns different samples, and each sample carries the full features and a label.
- (b) Vertical FL: an active party (i.e. the label owner) coordinates other passive parties to improve the model performance on its local objective. Particularly, different parties own different dimensions of the features of each sample, and the parties share a common set of sample IDs.
- (c) Hierarchical FL: edge servers are responsible for coordinating their own clients, and a global server coordinates the edge servers to train the model.
- (d) Decentralized FL: clients communicate directly with other clients to collaboratively maintain a global model or improve their own local models under specific communication protocols (e.g. line, ring, fully-connected).
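As noted above, the Simulator injects hardware heterogeneity into a run. The following is a hedged sketch of a custom simulator; the import path, the hook name update_client_responsiveness, and the set_variable helper are assumptions about the BasicSimulator interface, not confirmed by this section:

import random
from flgo.simulator.base import BasicSimulator  # assumed import path

class HeterogeneousSimulator(BasicSimulator):
    # Hypothetical hook, assumed to be called by the framework each round
    # to refresh the response latencies of the sampled clients.
    def update_client_responsiveness(self, client_ids):
        # Draw a random latency (in virtual time units) per client to mimic
        # differing computing power, network latency and bandwidth.
        latencies = [random.randint(1, 10) for _ in client_ids]
        self.set_variable(client_ids, 'latency', latencies)  # assumed helper

Passing Simulator=HeterogeneousSimulator to flgo.init would then replace the DefaultSimulator.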
Finally, from the perspective of running experiments, we add the term Logger to customize the logging of the variables of interest (e.g. model checkpoints, training-time performance). Some experiment options (e.g. the device, the number of processes) are also contained in the term option.
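A minimal sketch of such a customized logger is given below; the hook name log_once and the self.coordinator / self.output attributes are assumptions about the BasicLogger interface, not confirmed by this section:

from flgo.experiment.logger import BasicLogger  # path taken from the signature above

class TestMetricLogger(BasicLogger):
    # Hypothetical hook, assumed to be invoked once per logging interval.
    def log_once(self, *args, **kwargs):
        # Evaluate the current global model and record each test metric.
        metrics = self.coordinator.test()  # assumed evaluation entry point
        for name, value in metrics.items():
            self.output['test_' + name].append(value)

Passing Logger=TestMetricLogger to flgo.init would then override the default SimpleLogger.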