Join a Session

Once a session has started, users can join their datasets to contribute to the federated model.

Open a terminal window and follow these steps to connect to the Docker container and join a dataset to a session that is in progress.

Before starting, be sure that you have set up a Docker image and prepared your data.
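If the image is not yet present on your machine, and assuming your Docker client is already authenticated to the registry as part of that setup, pulling the image referenced later in this guide is a single command:

docker pull 919740802015.dkr.ecr.ca-central-1.amazonaws.com/edge/fl-client:0.2.35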

You may encounter an error if Docker containers with duplicate names are already running. Use Docker Desktop to check for and remove duplicates.
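If you prefer the command line to Docker Desktop, the standard Docker commands below list and remove a conflicting container; the container name shown is a placeholder:

docker ps -a --filter "name=my-silo"   # list containers whose name matches
docker rm -f my-silo                   # remove the conflicting container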

Step 1: Generate token

Generate a JSON Web Token (JWT) to use for authentication when interacting with the integrate.ai Federated Learning client. In the navigation bar, find "Tokens" under "Client Management".

Make sure to keep this token safe! Treat it like a password; it is used for authentication.

The token automatically expires if it has not been used for 30 days. A token can also be manually revoked on the Token Management page.
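When you export the token inside the container in Step 4, one way to keep it out of your shell history is to read it from a hidden prompt, as in this sketch:

read -s -p "Paste your integrate.ai token: " TOKEN && export TOKEN
echo ""   # move to a new line after the hidden prompt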

Step 2: Create a directory where the training data is stored

The data you prepared for this session must be accessible for training.

Find your train and test datasets. If necessary, copy them into the same directory on the machine where the Docker container will run. Make note of their location; it is required in Step 3.

For example, if the train and test sets are in ~/data/session-test, that directory becomes the <data_path> in the next step.
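For instance, a minimal sketch that gathers the files into that directory; the file names and source locations are placeholders:

mkdir -p ~/data/session-test
cp ~/Downloads/train_silo.csv ~/data/session-test/
cp ~/Downloads/test.csv ~/data/session-test/
ls ~/data/session-test   # confirm both files are in place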

Step 3: Start the Docker container

Run the following commands to start the Docker container and log in to the container's shell.

In line 1: Assign a name to this data silo by replacing <silo_name>.

In line 2: Specify the <data_path> you created in Step 2 to give the Docker container access to the datasets. The <data_path> folder is mounted at /root/demo inside the container.

export SILO_NAME=<silo_name> 
docker run -it -d --name $SILO_NAME -v <data_path>:/root/demo 919740802015.dkr.ecr.ca-central-1.amazonaws.com/edge/fl-client:0.2.35
docker exec -it $SILO_NAME /bin/bash 

The Docker container runs with root user permissions because the local filesystem is mounted on docker run. Root access is used only to read the train and test datasets and to write prediction data locally. Depending on how your host machine is set up, this is a potential security risk.
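As an illustration, here are the same commands with placeholder values filled in; the silo name and data path are hypothetical, and the image reference is the one shown above:

export SILO_NAME=my-first-silo
docker run -it -d --name $SILO_NAME -v ~/data/session-test:/root/demo 919740802015.dkr.ecr.ca-central-1.amazonaws.com/edge/fl-client:0.2.35
docker exec -it $SILO_NAME /bin/bash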

Once the Docker container is running, you can run the following command in your terminal to view the help documentation:

hfl -h

Step 4: Connect the data silo to the training session

Once you are in the Docker container, use the following commands to connect the dataset to a session in progress, replacing the placeholder values as described below. A filled-in example follows the list.

cd /root/demo 
export TOKEN=<token>
hfl train --token $TOKEN --session-id <session_id> --train-path <train_path> --test-path <test_path> --batch-size <batch_size> --instruction-polling-time <polling_time>
In line 2: Update <token> with the token generated in Step 1.

In line 3:

  • Update <session_id> - found on the session page

  • Update <train_path> - the train set should be within the /root/demo location specified above. This is the data you want to train the model on.

  • Update <test_path> - the test set should be within the /root/demo location specified above. This is the data you want to test the model with.

  • Update <batch_size> - the number of samples processed before the model is updated; must be an integer.

  • Update <polling_time> - Optional. The time, in seconds, to wait for new instructions; must be an integer. If not specified, the default is 30 seconds.
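As an illustration, here is a filled-in version of the training command; the session ID, file names, batch size, and polling time are hypothetical, and $TOKEN is assumed to have been exported as shown above:

hfl train --token $TOKEN --session-id 3cd2ab9e --train-path /root/demo/train_silo.csv --test-path /root/demo/test.csv --batch-size 64 --instruction-polling-time 30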

Training begins once the minimum number of datasets has joined the session. It is normal for the integrate.ai API to remain in a 'waiting for server' state for a few minutes after all clients have joined.

If a user chooses to run a custom model, they are asked to confirm that decision before training kicks off:

The session is using model package <custom model package name> developed 
by <organization name>. Do you want to proceed?
Only 'yes' will be accepted to approve.
Enter a value:

The session begins only once the user confirms by entering yes.

If log_interval is set too large, the training client will not print intermediate results. For example, if the data has 1000 rows and batch_size is set to 20, there are 50 batches in the data. If log_interval is then set to anything larger than 50, say 100 (that is, printing intermediate results every 100 batches), the training client will not show any intermediate output.
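A quick way to sanity-check the setting is to compute the number of batches per epoch; this small shell calculation uses the hypothetical numbers from the example above:

ROWS=1000          # rows in the training data
BATCH_SIZE=20      # value passed as --batch-size
NUM_BATCHES=$(( (ROWS + BATCH_SIZE - 1) / BATCH_SIZE ))   # 50 batches per epoch
echo "Keep log_interval at or below $NUM_BATCHES to see intermediate results"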

Keep reading to learn how to make predictions with your completed model.
