Flower: Building Federated Learning Systems Without Centralizing Your Data

Hook

Your users’ most valuable training data will never reach your servers—and with federated learning frameworks like Flower, it doesn’t need to.

Context

Traditional machine learning operates on a simple assumption: centralize your data, train your model, deploy it back to the edge. But this paradigm breaks down when data can’t be centralized—whether due to privacy regulations (GDPR, HIPAA), bandwidth constraints, or user trust concerns. Healthcare providers can’t share patient records. Mobile keyboard apps can’t slurp up everything users type. IoT devices generate terabytes of sensor data that’s expensive to transmit.

Federated learning flips the script: instead of bringing data to the model, you bring the model to the data. Clients train on local datasets and send only model updates to a central server, which aggregates them into a global model. Google pioneered this for Gboard’s next-word prediction, training on millions of phones without ever seeing what users typed. But Google’s infrastructure isn’t available to most teams. Flower originated from a research project at the University of Oxford to make federated learning accessible, providing a production-ready framework that works with any ML library and deploys anywhere—from Android phones to Raspberry Pis to hospital servers.

Technical Insight

[System architecture — auto-generated diagram. The Flower Server (an orchestrator plus a pluggable aggregation strategy such as FedAvg/FedProx) communicates with Clients 1…N over a gRPC protocol layer. Each client holds a local model (PyTorch/TF) and a private dataset. Per round: the server selects clients and config, sends training instructions, each client runs local training and returns model updates, the server collects the updates into an aggregated model, then begins the next round.]

Flower’s architecture separates three critical concerns: communication protocols, aggregation strategies, and ML framework integration. At its core is a client-server model where the server orchestrates training rounds while clients maintain complete control over their local data.

The brilliance lies in Flower’s framework-agnostic design. Unlike TensorFlow Federated (which locks you into TensorFlow), Flower treats your ML framework as a black box. Here’s a minimal PyTorch client:

import flwr as fl
import torch

class FlowerClient(fl.client.NumPyClient):
    def __init__(self, model, trainloader, valloader):
        self.model = model
        self.trainloader = trainloader
        self.valloader = valloader

    def get_parameters(self, config):
        return [val.cpu().numpy() for _, val in self.model.state_dict().items()]

    def fit(self, parameters, config):
        self.set_parameters(parameters)
        # Train with your existing PyTorch code
        train(self.model, self.trainloader, epochs=1)
        return self.get_parameters(config), len(self.trainloader), {}

    def evaluate(self, parameters, config):
        self.set_parameters(parameters)
        loss, accuracy = test(self.model, self.valloader)
        return float(loss), len(self.valloader), {"accuracy": float(accuracy)}

    def set_parameters(self, parameters):
        params_dict = zip(self.model.state_dict().keys(), parameters)
        state_dict = {k: torch.tensor(v) for k, v in params_dict}
        self.model.load_state_dict(state_dict, strict=True)

fl.client.start_client(
    server_address="localhost:8080",
    client=FlowerClient(model, trainloader, valloader).to_client(),
)

Notice that Flower doesn’t dictate your training loop—train() and test() are your own functions. The framework only cares about three operations: getting parameters, training (fit), and evaluation. This abstraction means the same federated learning infrastructure works whether you’re using PyTorch, TensorFlow, JAX, or even scikit-learn for linear models.
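To make that contract concrete, here is a framework-free sketch of the three operations for a toy linear model — plain NumPy stands in for PyTorch, and the names (`fit`, `evaluate`, `X_local`, `y_local`) are illustrative rather than Flower's actual API:

```python
import numpy as np

# Toy "model": a single weight vector for linear regression.
# A real Flower client wraps your actual framework behind these operations.

def get_parameters(w):
    # Flower exchanges parameters as a list of NumPy arrays.
    return [w]

def fit(w, X_local, y_local, lr=0.1, epochs=50):
    # One round of local training: plain gradient descent on local data.
    for _ in range(epochs):
        grad = 2 * X_local.T @ (X_local @ w - y_local) / len(y_local)
        w = w - lr * grad
    # Return updated parameters, the local example count, and metrics.
    return [w], len(y_local), {}

def evaluate(w, X_local, y_local):
    loss = float(np.mean((X_local @ w - y_local) ** 2))
    return loss, len(y_local), {"mse": loss}

# Synthetic private dataset that never leaves this "client".
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w0 = np.zeros(3)
loss_before, _, _ = evaluate(w0, X, y)
params, n, _ = fit(w0, X, y)
loss_after, _, _ = evaluate(params[0], X, y)
```

The data (`X`, `y`) stays on the client; only `params` — the updated weights — would travel to the server.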

On the server side, Flower provides pluggable strategies that define how model updates get aggregated. The default FedAvg (Federated Averaging) performs a weighted average based on dataset sizes, but you can swap in FedProx for handling heterogeneous data distributions, or implement custom strategies:

import flwr as fl
from flwr.server.strategy import FedAvg

strategy = FedAvg(
    fraction_fit=0.1,  # Sample 10% of clients each round
    fraction_evaluate=0.05,
    min_fit_clients=10,
    min_evaluate_clients=5,
    min_available_clients=50,
)

fl.server.start_server(
    server_address="0.0.0.0:8080",
    config=fl.server.ServerConfig(num_rounds=100),
    strategy=strategy,
)

Flower’s strategy system is where research happens. The framework includes baselines reproducing papers like FedProx, FedBN, and FedMeta—complete implementations you can use as starting points. Want to experiment with differential privacy? Implement a custom strategy that clips and adds noise to gradients before aggregation. Testing a new aggregation algorithm? Subclass Strategy and override aggregate_fit().
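As an illustration of what a differential-privacy-flavored strategy might do to each client update before averaging, here is a sketch of the clip-then-noise step in plain NumPy. This is only the pre-aggregation transform, not a full `Strategy` subclass, and calibrating `noise_std` for a formal privacy guarantee is deliberately out of scope:

```python
import numpy as np

def clip_and_noise(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip a client's update to a maximum L2 norm, then add Gaussian noise.

    Mirrors the clip-then-noise step common in DP-style federated
    aggregation; a custom aggregate_fit would apply this per client
    before the weighted average.
    """
    rng = rng or np.random.default_rng()
    flat = np.concatenate([a.ravel() for a in update])
    norm = np.linalg.norm(flat)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [a * scale + rng.normal(0.0, noise_std, size=a.shape) for a in update]

# One client's (invented) update, deliberately larger than the clip norm.
update = [np.array([3.0, 4.0])]  # L2 norm = 5.0
clipped = clip_and_noise(update, clip_norm=1.0, noise_std=0.0)
# With noise disabled, the update is rescaled to unit norm: [0.6, 0.8]
```

In a real Flower strategy, this transform would live inside an overridden `aggregate_fit()` before the averaging step.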

The cross-platform story is particularly compelling. Flower provides quickstarts for Android (using TFLite) and iOS (using CoreML), enabling genuine on-device learning. The same Python server can orchestrate training across a fleet of mobile devices, edge servers, and cloud instances simultaneously. This heterogeneity is intentional—Flower’s design assumes clients have different computational capabilities, network conditions, and data distributions.

Gotcha

Flower makes federated learning accessible, but it can’t eliminate the inherent complexity of distributed systems. Debugging distributed federated learning can be challenging: when training fails, the issue could be a network timeout on a client, data-specific problems, or serialization bugs in custom strategies.

Performance overhead is unavoidable. Every training round serializes model weights (potentially hundreds of megabytes for large transformers), transmits them over networks of varying quality, and deserializes them on clients. With high-latency connections or frequent client failures, convergence can be significantly slower than centralized training. The framework appears to leave optimizations such as model compression and gradient quantization to user implementation.
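One such user-side optimization is lossy compression of the parameter payload. A hedged sketch — not a Flower built-in — of halving bandwidth by casting weights to float16 before serialization:

```python
import numpy as np

def compress(params):
    # Cast float32 arrays to float16 before sending: halves the payload,
    # at the cost of ~3 significant decimal digits of precision.
    return [a.astype(np.float16) for a in params]

def decompress(params):
    # Restore float32 on the receiving side (the precision loss is permanent).
    return [a.astype(np.float32) for a in params]

weights = [np.random.default_rng(1).normal(size=(256, 256)).astype(np.float32)]
small = compress(weights)
restored = decompress(small)

ratio = weights[0].nbytes / small[0].nbytes   # 2.0: half the bytes on the wire
max_err = float(np.max(np.abs(weights[0] - restored[0])))
```

Whether the precision loss is acceptable depends on the model; gradient quantization schemes trade further bandwidth against convergence speed in the same way.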

The framework’s privacy story is more nuanced than the concept of federated learning might suggest. Yes, raw data stays local, but model updates can potentially leak information. Flower’s README emphasizes framework flexibility and customization but doesn’t explicitly describe built-in secure aggregation or differential privacy guarantees. For privacy-sensitive applications, you may need to integrate additional privacy-preserving techniques, which requires specialized expertise.

Verdict

Use Flower if you’re building ML systems where data genuinely can’t be centralized—medical imaging across hospitals, predictive maintenance on industrial equipment, personalized models on user devices—or if you’re conducting federated learning research and need framework flexibility plus battle-tested baselines. It’s the best option for teams that need to support heterogeneous ML frameworks (some clients using PyTorch, others TensorFlow) or deploy across wildly different hardware (servers, phones, Raspberry Pis). The framework’s maturity, active community (6,732 stars), and comprehensive documentation make it production-ready. Skip it if you can centralize your data without privacy/regulatory concerns—traditional distributed training (PyTorch DDP, Horovod) will be simpler, faster, and easier to debug. Also skip if you need production-grade differential privacy or secure aggregation as turnkey solutions rather than research experiments, since these would require additional implementation work on top of Flower’s core framework.
