Join a Session
Once a session has started, users can join their datasets to contribute to the federated model.
Open a terminal window and follow these steps to connect to the Docker container and join a dataset to a session that is in progress.
Before starting, be sure that you have set up a Docker image and prepared your data.
You may encounter an error if Docker containers with duplicate names are already running. Use Docker Desktop to find and remove duplicates.
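If you prefer the command line to Docker Desktop, standard Docker commands can also locate and remove a stale container with a conflicting name. A sketch, assuming `my-silo` is the name you plan to reuse:

```shell
# List all containers (including stopped ones) whose name matches
docker ps -a --filter "name=my-silo"

# Remove the stale duplicate by name so the new container can start
docker rm my-silo
```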
Step 1: Generate token
Generate a JSON Web Token to use for authentication while interacting with the integrate.ai Federated Learning client. Find "Tokens" under "Client Management" in the navigation bar.
Make sure to keep this token safe! Treat it like a password. This token is used for authentication.
The token will automatically expire if it has not been used for 30 days. A token can also be manually revoked on the Token Management page.
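Since the token is used in later shell commands, one common pattern is to keep it in an environment variable rather than pasting it repeatedly. This is a convenience sketch only; the variable name `IAI_TOKEN` is an assumption, not something the client requires:

```shell
# Example only: IAI_TOKEN is an arbitrary variable name, not a
# requirement of the integrate.ai client. Paste your real token here.
export IAI_TOKEN='paste-the-token-from-the-web-app-here'
```

Remember that anything placed in shell history or environment variables is visible to other processes running as your user, so treat this with the same care as a password.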
Step 2: Create a directory where the training data is stored
The data you prepared for this session needs to be accessible to use for training.
Find your train and test datasets. If necessary, copy them to the same directory on the machine where the Docker container will run. Make note of their file location; it will be required in Step 3.

For example, if the train and test sets are in ~/data/session-test, that directory becomes the <data_path> in the next step.
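The step above can be sketched in the shell. The path matches the ~/data/session-test example; the CSV file names are placeholders for whatever files you prepared:

```shell
# Create the directory that will later be mounted into the container
DATA_PATH="$HOME/data/session-test"
mkdir -p "$DATA_PATH"

# Copy your prepared datasets into it (file names are examples)
# cp train.csv test.csv "$DATA_PATH"/

echo "Datasets should live in: $DATA_PATH"
```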
Step 3: Start the Docker container
Run the following commands to start the Docker container and log in to the container's shell.
In line 1: Assign a name to this data silo, replacing <silo_name>.
In line 2: Specify the <data_path> you created in Step 2 to give the Docker container access to the datasets. The <data_path> folder will be mounted at /root/demo inside the container.
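A `docker run` invocation along these lines matches the description of lines 1 and 2. This is an illustrative sketch only: `<image>` is a placeholder for the integrate.ai client image you set up earlier, and the silo name and data path are examples:

```shell
# Sketch only: <image> stands in for your integrate.ai client image.
# Line 1 equivalent: name this container after your data silo.
# Line 2 equivalent: mount <data_path> from Step 2 at /root/demo.
docker run -it \
  --name my-silo \
  -v "$HOME/data/session-test":/root/demo \
  <image>
```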
The Docker container runs with root user permissions because the local filesystem is mounted at docker run. Root access is used only to read the train and test datasets and to write prediction data locally. Depending on how your host machine is set up, this is a potential security risk.
Once you start the Docker container, you can run the following command in your terminal for help documentation.
Step 4: Connect the data silo to the training session
Once you are in the Docker container, use the following commands to connect the dataset to a session in progress, making the required updates.
Update <token> with the JWT token obtained in Step 1.

In line 3:

- Update <session_id> - found on the session page.
- Update <train_path> - the train set should be within the /root/demo location specified above. This is the data you want to train the model on.
- Update <test_path> - the test set should be within the /root/demo location specified above. This is the data you want to test the model with.
- Update <batch_size> - the number of samples processed before the model is updated; it must be an integer.
- Update <polling_time> - optional. This integer is the time to wait for new instructions, in seconds. If not defined, the default value is 30 seconds.
Once the minimum number of datasets has joined the session, training will begin. It is normal for the integrate.ai API to remain in a 'waiting for server' state for a few minutes after all clients have joined.
If a user chooses to run a custom model, they will be asked to confirm that choice before training kicks off. The session will only begin once the user confirms by answering yes.
If the log_interval is set too large, the training client will not print intermediate results. For example, if the data has 1000 rows and the batch_size is set to 20, then there are 50 batches in the data. If log_interval is set to anything larger than 50, say 100 (i.e., print intermediate results every 100 batches), the training client will not show any intermediate output.
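The arithmetic behind that example can be checked directly in the shell, using the same numbers as above:

```shell
# Same numbers as the example: 1000 rows, batch_size of 20,
# log_interval of 100
ROWS=1000
BATCH_SIZE=20
LOG_INTERVAL=100

# Number of batches per pass over the data
NUM_BATCHES=$((ROWS / BATCH_SIZE))
echo "batches: $NUM_BATCHES"

# If log_interval exceeds the batch count, no intermediate
# results are ever printed
if [ "$LOG_INTERVAL" -gt "$NUM_BATCHES" ]; then
  echo "log_interval ($LOG_INTERVAL) > batches ($NUM_BATCHES): no intermediate output"
fi
```

To see intermediate results, pick a log_interval no larger than the number of batches (rows divided by batch_size).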
Keep reading to learn how to make predictions on your completed model.