HFL Model Training with a Sample Local Dataset
An end-to-end tutorial on training an existing model with a synthetic dataset on a local machine.
To help you get started, we've put together a tutorial based on synthetic data, with pre-built configuration files. In this tutorial, you will train a federated feedforward neural network (iai_ffn) on data from two datasets. The datasets, model, and data configuration are provided for you.
The sample notebook (integrateai_api.ipynb) is an interactive tool for exploring the SDK, and should be used in parallel with this tutorial. This documentation provides supplementary and conceptual information to expand on the code demonstration.
Prerequisites
Complete the Environment Setup for your local machine.
Open the integrateai_api.ipynb notebook to test the code as you walk through this exercise.
Understanding Models
integrate.ai provides standard model classes for Feedforward Neural Nets (iai_ffn) and Generalized Linear Models (iai_glm). These standard models are defined using JSON configuration files during session creation.
The example below is a model provided by integrate.ai.
Review the sample model configuration
The model configuration is a JSON object that contains the model parameters for the session. There are five main properties with specific key-value pairs used to configure the model: strategy, model, ml-task, optimizer, and differential_privacy_params.
For this tutorial, you do not need to change any of the values.
Example JSON:
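As a rough sketch of the shape described above, the Python snippet below builds a configuration with the five top-level properties and checks that none is missing. Only the five property names come from this tutorial. Every nested key and value is a placeholder chosen for illustration, not a documented integrate.ai setting, and missing_keys is a hypothetical helper. Use the configuration provided with the sample notebook for real sessions.

```python
import json

# The five top-level properties named in this tutorial.
REQUIRED_KEYS = (
    "strategy",
    "model",
    "ml-task",
    "optimizer",
    "differential_privacy_params",
)

# Placeholder values for illustration only. The nested keys are assumptions,
# not documented integrate.ai settings.
model_config = {
    "strategy": {"name": "FedAvg"},
    "model": {"hidden_layer_sizes": [6, 6, 6]},
    "ml-task": {"type": "classification"},
    "optimizer": {"name": "SGD", "params": {"learning_rate": 0.2}},
    "differential_privacy_params": {"epsilon": 4},
}


def missing_keys(config: dict) -> list:
    """Return the required top-level properties that are absent from config."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Serialize to JSON, the format the session expects for configuration.
print(json.dumps(model_config, indent=2))
print("Missing properties:", missing_keys(model_config))
```

A check like this is useful before session creation because a misspelled property name (for example ml_task instead of ml-task) is easy to miss by eye.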
Review the sample data configuration
The data configuration is a JSON object in which you specify the predictor and target columns that describe the input data. The structure is the same for both GLM and FFN models.
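To make the predictor and target idea concrete, here is a minimal sketch of a data configuration and a check that the named columns exist in a dataset. The key names predictors and target, the column names x0, x1, x2, and y, and the helper absent_columns are all assumptions for illustration. The sample data configuration provided with the notebook is the authoritative version.

```python
# Assumed key names and hypothetical column names, for illustration only.
data_config = {
    "predictors": ["x0", "x1", "x2"],
    "target": "y",
}


def absent_columns(config: dict, available_columns: list) -> list:
    """Return configured columns that do not appear in the dataset."""
    wanted = list(config["predictors"]) + [config["target"]]
    return [column for column in wanted if column not in available_columns]


# Columns of a hypothetical dataset file.
dataset_columns = ["x0", "x1", "x2", "y"]
print("Columns missing from dataset:", absent_columns(data_config, dataset_columns))
```

Because the same structure applies to both model types, one check of this kind can be reused whether the session trains a GLM or an FFN.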
Now that you've reviewed the model and data configuration, the next step is to create a training session to begin working with the model and datasets.
Create and Start the Session
Federated learning models created in integrate.ai are trained through sessions. You define the parameters required to train a federated model, including data and model configurations, in a session.
Create a session each time you want to train a new model.
The following code sample demonstrates creating and starting a session with two training clients (two datasets) and two rounds. It returns a session ID that you can use to track and reference your session.
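The SDK call itself is in the sample notebook. The sketch below only shows which parameters a session bundles together, using a plain dataclass. SessionSpec and its field names other than min_num_clients are hypothetical and are not part of the integrate.ai SDK.

```python
from dataclasses import dataclass, field


@dataclass
class SessionSpec:
    """Hypothetical container for session parameters; not an SDK class."""

    name: str
    min_num_clients: int          # clients that must join before training starts
    num_rounds: int               # number of federated training rounds
    model_name: str               # for example "iai_ffn" or "iai_glm"
    model_config: dict = field(default_factory=dict)
    data_config: dict = field(default_factory=dict)

    def validate(self) -> None:
        """Reject values that could never produce a runnable session."""
        if self.min_num_clients < 1:
            raise ValueError("min_num_clients must be at least 1")
        if self.num_rounds < 1:
            raise ValueError("num_rounds must be at least 1")


# Two training clients and two rounds, matching this tutorial.
spec = SessionSpec(
    name="HFL tutorial session",
    min_num_clients=2,
    num_rounds=2,
    model_name="iai_ffn",
)
spec.validate()
print(spec)
```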
Join the Session
The next step is to join the session with the sample data. This example has data for two datasets simulating two clients, as specified with the min_num_clients argument. Therefore, to run this example, you will call subprocess.Popen twice to connect each dataset to the session as a separate client.
The session begins training once the minimum number of clients has joined the session.
Each client runs as a separate Docker container to simulate distributed data silos.
If you extracted the contents of the sample file to a different location than the default, change the data_path in the sample code before attempting to run it.
Example:
where:
data_path is the path to the sample data on your local machine
IAI_TOKEN is your access token
session.id is the ID returned by the previous step (Create and Start the Session)
train-path is the path to and name of the sample dataset file
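The pattern of starting one Docker container per client with subprocess.Popen can be sketched as follows. This is an assumption-laden illustration, not the command from the sample notebook: the image name example/fl-client:latest, the mount point /root/demo, the --session-id flag spelling, and the file names train_silo0.parquet and train_silo1.parquet are all hypothetical. Only data_path, IAI_TOKEN, the session ID, and train-path come from the list above.

```python
import subprocess


def build_client_command(data_path, token, session_id, train_file,
                         image="example/fl-client:latest"):
    """Build a docker run argument list for one client.

    The image name, mount point, and client flags are assumptions.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{data_path}:/root/demo",      # mount the sample data folder
        "-e", f"IAI_TOKEN={token}",           # pass the access token
        image,
        "--session-id", session_id,
        "--train-path", f"/root/demo/{train_file}",
    ]


def launch_clients(commands):
    """Start one process per client without waiting for any to finish."""
    return [subprocess.Popen(command) for command in commands]


# One command per dataset; each dataset joins as a separate client.
commands = [
    build_client_command(
        data_path="/path/to/sample-data",
        token="<IAI_TOKEN>",
        session_id="<session-id>",
        train_file=f"train_silo{i}.parquet",
    )
    for i in range(2)
]
for command in commands:
    print(" ".join(command))

# Uncomment to start the containers (requires Docker and a real session):
# processes = launch_clients(commands)
```

Popen returns immediately, which is why it suits this step: both clients must be running at the same time for the session to reach its minimum client count.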
Poll for Session Results
Sessions take some time to run. In the sample notebook and this tutorial, we poll the server to determine the session status.
You can log information about the session during this time. In this example, we are logging the current round and the clients that have joined the session.
Another popular option is to log session.metrics().as_dict() to view the in-progress training metrics.
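A polling loop of the kind described here might look like the sketch below. It runs against a stand-in object rather than the SDK: FakeSession, the status strings running, completed, and failed, and the poll_session helper are hypothetical. With a real session you would pass callables that read its status and, for metrics, something like lambda: session.metrics().as_dict().

```python
import time


def poll_session(get_status, get_metrics, interval=1.0, timeout=600.0,
                 done=("completed", "failed")):
    """Call get_status until it returns a terminal value, logging as it goes."""
    start = time.monotonic()
    while True:
        status = get_status()
        print(f"status={status} metrics={get_metrics()}")
        if status in done:
            return status
        if time.monotonic() - start > timeout:
            raise TimeoutError("session did not finish before the timeout")
        time.sleep(interval)


class FakeSession:
    """Stand-in that reports 'running' twice, then 'completed'."""

    def __init__(self):
        self._statuses = iter(["running", "running", "completed"])
        self.calls = 0

    def status(self):
        self.calls += 1
        return next(self._statuses)

    def metrics_dict(self):
        return {"polls": self.calls}


fake = FakeSession()
final_status = poll_session(fake.status, fake.metrics_dict, interval=0.01)
print("Final status:", final_status)
```

The timeout guards against waiting forever if a client never joins and the session never starts training.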
Session Complete
Congratulations, you have your first federated model! You can test it by making predictions. For more information, see Making Predictions.