Using AWS Batch with integrate.ai
Use the integrate.ai SDK to run batch jobs on remote datasets in AWS
AWS Batch provides convenient, scalable compute for machine learning workloads and can be integrated with the SDK.
Configure AWS Batch
If this is your first time running a training session in AWS Batch, ensure that you have followed the instructions for Setting up AWS Batch before you continue.
Running AWS Batch jobs through the SDK
Federated learning models are trained through sessions. You define the parameters required to train a federated model, including data and model configurations, in a session. Additional session parameters are required when using AWS Batch.
Install the SDK
Before you begin, ensure that you have installed the latest integrate.ai SDK. See Environment Setup for details.
Use the integrateai_batch_client.ipynb notebook to follow along and test the examples shown below by filling in your own variables as required.
Specify Batch parameters
In addition to the session definition, there are AWS Batch-specific parameters required by the SDK to run a batch job.
Training and test data paths
Specify the path(s) to your training and test data on S3.
Example:
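A minimal sketch with placeholder bucket and file names; replace them with the locations of your own datasets in S3:

```python
# Paths to the training and test data on S3.
# Bucket and object names below are placeholders.
train_path1 = "s3://my-bucket/train_silo0.parquet"
train_path2 = "s3://my-bucket/train_silo1.parquet"
test_path = "s3://my-bucket/test.parquet"
```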
AWS Authentication
If you are generating temporary AWS credentials, specify them as in the example below. Otherwise, use the default profile credentials, or pass in a Dict of AWS credential values.
Example:
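A sketch of passing temporary credentials as a Dict, read here from standard AWS environment variables. The key names shown are an assumption; check your SDK version for the exact keys it expects:

```python
import os

# Temporary AWS credentials passed as a Dict.
# Omit this and the SDK can fall back to your default profile credentials.
aws_creds = {
    "ACCESS_KEY": os.environ.get("AWS_ACCESS_KEY_ID"),
    "SECRET_KEY": os.environ.get("AWS_SECRET_ACCESS_KEY"),
    "SESSION_TOKEN": os.environ.get("AWS_SESSION_TOKEN"),
    "REGION": os.environ.get("AWS_REGION"),
}
```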
Batch environment
Specify the names of the job_queue, job_definition, and ssm_token that you created in Setting up AWS Batch.
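For example, with placeholder names standing in for the resources you created in Setting up AWS Batch:

```python
# Names of the AWS Batch resources created in "Setting up AWS Batch".
# The values below are placeholders -- substitute your own.
job_queue = "iai-client-batch-job-queue"
job_def = "iai-client-batch-job-definition"
ssm_token = "iai-batch-token"
```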
Define a training session
Prepare your model configuration and data schema. See Models for information on the models available out-of-the-box in integrate.ai, or see Building a Custom Model for information on building your own model.
Define your training session as usual. The session definition is passed to the batch through the task group that also contains the tasks for the batch.
Example:
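A sketch of a session definition. The configuration and schema dicts are illustrative placeholders (see the Models documentation for the exact format), and the create_fl_session method name and its parameters are assumptions based on typical SDK usage; adapt them to your installed version:

```python
# Illustrative configuration and schema; see the Models documentation
# for the format expected by your chosen model package.
model_config = {"strategy": {"name": "FedAvg", "params": {}}}
data_config = {"predictors": ["x0", "x1"], "target": "y"}

# Session parameters. min_num_clients must match the number of tasks
# that you later add to the task group.
session_args = {
    "name": "AWS Batch training session",
    "description": "Train a federated model on S3 data via AWS Batch",
    "min_num_clients": 2,
    "num_rounds": 5,
    "package_name": "iai_ffnet",   # an out-of-the-box model (assumption)
    "model_config": model_config,
    "data_config": data_config,
}

# With an authenticated integrate.ai client object, the session would
# be created along these lines (method name is an assumption):
# training_session = client.create_fl_session(**session_args).start()
```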
The min_num_clients specified here must match the number of tasks added to the task group. Specify the model_config and data_config names for the configuration and schema that you want to use.
Running a batch with task group and taskbuilder
Instead of running the integrate.ai client directly, import and use the taskgroup and taskbuilder functions. For each task, create a task object, then use the taskgroup to add each task to the batch.
One task is equivalent to one client in integrate.ai terms. The min_num_clients given in the training session definition must match the number of tasks defined in the batch.
Import the required functions.
Create a taskbuilder object, and provide the required parameters.
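The two steps above might look like the following. The import paths, the batch constructor, and its parameter names are assumptions based on the SDK layout; check your installed version if they differ:

```python
# Import paths below are assumptions -- verify against your SDK version.
from integrate_ai_sdk.taskgroup.taskbuilder import aws as taskbuilder_aws
from integrate_ai_sdk.taskgroup.base import SessionTaskGroup

# Batch environment names from "Setting up AWS Batch" (placeholders).
job_queue = "iai-client-batch-job-queue"
job_def = "iai-client-batch-job-definition"
aws_creds = {}  # or the Dict of temporary credentials from earlier

# Create a taskbuilder configured for your AWS Batch environment.
tb = taskbuilder_aws.batch(
    job_queue=job_queue,
    aws_credentials=aws_creds,
    cpu_job_definition=job_def,
)
```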
Create a task group and start the batch
The task group defines the training_session and the tasks to run for the batch. The following code snippet creates the task group and starts the job.
The vcpu and memory parameters are optional. Use them to adjust the values in the job definition if necessary.
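A sketch of creating the task group and starting the batch. It assumes the taskbuilder (tb), training_session, client, and S3 paths from the earlier steps; the task method name and parameters here are assumptions, so adapt them to your SDK version:

```python
# One task per client: the number of add_task calls must equal the
# min_num_clients given in the training session definition.
# Method and parameter names are assumptions -- verify against your SDK.
task_group_context = (
    SessionTaskGroup(training_session)
    .add_task(tb.hfl(train_path=train_path1, test_path=test_path,
                     vcpus="2", memory="16384", client=client))
    .add_task(tb.hfl(train_path=train_path2, test_path=test_path,
                     vcpus="2", memory="16384", client=client))
    .start()
)
```

The vcpus and memory arguments override the defaults in the job definition; omit them to use the values configured in AWS Batch.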
Monitor submitted jobs
The task group context contains the session ID.
To monitor the status of the tasks:
Specify a wait time that is appropriate for your tasks and monitor for session completion or failure:
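A generic polling sketch. It assumes the session object exposes a status() method returning a string such as "Completed" or "Failed"; adapt the attribute access to what your SDK version actually provides:

```python
import time

def wait_for_completion(session, poll_interval=30, max_polls=120):
    """Poll a session until it reports completion or failure.

    Assumes `session` has a `status()` method returning a string;
    the status names checked here are assumptions.
    """
    for _ in range(max_polls):
        status = session.status()
        if status in ("Completed", "Failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("Session did not finish within the allotted polls")
```

Choose a poll_interval appropriate to how long your tasks run; very short intervals add API load without finishing the job any faster.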
You can also review the results of the job(s) in the AWS console.