Running a training server on AWS Fargate
Learn how to run your integrate.ai training server on AWS Fargate.
To follow along with this tutorial, open the integrateai_fargate_server.ipynb notebook, which you can find in the sample_packages/sample_notebook folder in the SDK package.
Configure AWS Fargate
If this is your first time running a training session with a server in AWS Fargate, ensure that you have followed the instructions for Setting up AWS Fargate before you continue.
Running AWS Batch jobs in Fargate through the SDK
Install the SDK
Authenticate to the API client
First, authenticate the API client.
Model config and data schema
Set up your model configuration and data schema for your training session. For detailed information, see Building a Custom Model. A generic example that matches the sample notebook is provided below.
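The notebook's exact configuration is not reproduced here. As a rough sketch of what such a pair of definitions might look like, assuming a small feed-forward classifier on tabular data, the dictionaries below use illustrative key names and values. These are assumptions, not a confirmed integrate.ai schema, and check_schema_matches_model is a hypothetical helper added only for the example. See Building a Custom Model for the fields your SDK version expects.

```python
# Illustrative only: key names and values are assumptions, not the confirmed
# integrate.ai schema. Adjust them to match your own model package.
model_config = {
    "experiment_name": "fargate_example",
    "strategy": {"name": "FedAvg", "params": {}},
    "model": {
        "params": {"input_size": 15, "hidden_layer_sizes": [6, 6, 6], "output_size": 2}
    },
    "ml_task": {"type": "classification", "params": {}},
    "optimizer": {"name": "SGD", "params": {"learning_rate": 0.2, "momentum": 0.0}},
    "seed": 23,
}

data_schema = {
    "predictors": [f"x{i}" for i in range(15)],  # x0 ... x14
    "target": "y",
}


def check_schema_matches_model(config, schema):
    """Return True when the number of predictors equals the model's input size."""
    return len(schema["predictors"]) == config["model"]["params"]["input_size"]


schema_ok = check_schema_matches_model(model_config, data_schema)
```

A quick consistency check like this can catch a mismatch between the data schema and the model input size before any task is launched.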
Create a training session
This example session uses 2 clients and 2 rounds. The training_session definition is passed to the server as part of the task definition.
Note: If you are using a custom model, ensure that you specify the correct model_config and data_schema.
Specifying optional AWS Credentials
If you are generating temporary AWS credentials, specify them here. Otherwise, use the default profile credentials.
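One way to handle both cases is to read temporary credentials from the standard AWS environment variables and fall back to the default profile when they are absent. The sketch below assumes that approach. load_aws_credentials is a hypothetical helper, and how the resulting dictionary is passed to the SDK is not shown because it depends on your SDK version.

```python
import os


def load_aws_credentials(env=None):
    """Return temporary credentials from the environment, or None.

    None signals that the default profile credentials should be used.
    """
    env = os.environ if env is None else env
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not key_id or not secret:
        return None  # nothing set: rely on the default profile

    creds = {"aws_access_key_id": key_id, "aws_secret_access_key": secret}
    # Temporary credentials also carry a session token.
    token = env.get("AWS_SESSION_TOKEN")
    if token:
        creds["aws_session_token"] = token
    return creds


aws_creds = load_aws_credentials()
```

Keeping the credentials out of the notebook itself avoids committing secrets alongside the code.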
Specify the Fargate Cluster, Task Definition Name and Network Parameters
Configure the cluster, task definition, and network parameters on AWS first, then specify them as variables for the SDK.
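As an illustration, the variables might be declared as follows. The variable names and placeholder values are assumptions, not the names the notebook uses, and unset_placeholders is a hypothetical helper that flags any value you have not yet filled in.

```python
# Placeholder values: replace each one with the resource you created in AWS.
# The variable names are illustrative and may differ from the sample notebook.
aws_region = "<your-aws-region>"
cluster = "<your-fargate-cluster-name>"
task_definition = "<your-task-definition-name>"
subnet_id = "<your-subnet-id>"
security_group = "<your-security-group-id>"


def unset_placeholders(**settings):
    """Return the sorted names of settings that are empty or still a <placeholder>."""
    return sorted(
        name
        for name, value in settings.items()
        if not value or (value.startswith("<") and value.endswith(">"))
    )


missing = unset_placeholders(
    aws_region=aws_region,
    cluster=cluster,
    task_definition=task_definition,
    subnet_id=subnet_id,
    security_group=security_group,
)
```

If missing is not empty, at least one AWS resource still needs to be configured before the server can be launched.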
With the credentials and variables defined, you can now use the SDK to run the training server on AWS Fargate.
Run the training server
The SDK provides taskgroup and taskbuilder objects to simplify the process of creating and managing Fargate and AWS Batch tasks.
Create a Fargate task builder object
Create an AWS Batch task builder object
Create a taskgroup
The taskgroup starts the server and the batch jobs.
Here we are creating a session task group that takes as input the training_session created earlier. The first task added (tb) starts the server. The tb_batch task is added twice, once for each client.
Tip: See Create a training session to review the session definition.
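The ordering described above (one server task, then one batch task per client) can be sketched with a small stand-in class. SketchTaskGroup is hypothetical and is not the SDK's taskgroup. It only records tasks and reports the order in which they would be started, so it runs without AWS access or the SDK installed.

```python
from dataclasses import dataclass, field


@dataclass
class SketchTaskGroup:
    """Hypothetical stand-in for a session task group; NOT the SDK class."""

    session_id: str
    tasks: list = field(default_factory=list)

    def add_task(self, task):
        self.tasks.append(task)
        return self  # returning self allows chained calls

    def start(self):
        # A real task group would launch the tasks; this one only lists them
        # in the order they were added.
        return [f"{self.session_id}:{i}:{task}" for i, task in enumerate(self.tasks)]


num_clients = 2  # matches the 2-client example session

group = SketchTaskGroup(session_id="example-session")
group.add_task("server (Fargate)")  # the first task starts the server
for client_index in range(num_clients):
    # one batch task per client
    group.add_task(f"client-{client_index} (AWS Batch)")

launched = group.start()
```

With 2 clients, the group holds three tasks in total: the server followed by one batch task per client.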
You can monitor the running server to check training progress.