Setting up AWS Fargate
AWS Fargate provides serverless compute for containers. It simplifies deployment by removing the need to provision and maintain servers to run your containers. Fargate works with both Amazon ECS and Amazon EKS.
You can use Fargate with integrate.ai's SDK to create a machine learning environment that is flexible and portable. With an integrate.ai training server running on Fargate, your data in S3 buckets, and clients running on AWS Batch, you can use the SDK to manage and run fully remote training sessions.
This deployment scenario is best suited to those who want to maintain full control of the training process while working with existing remote capabilities and infrastructure in AWS.
Requirements
This section outlines the setup steps required to configure your working environment. Steps that are performed in the AWS platform are not explained in detail. Refer to the AWS documentation as needed.
The requirements are tool-agnostic: you can complete the steps through the AWS console or with a tool such as Terraform or AWS CloudFormation.
Generate an Access Token
To install the client, server, and SDK, you must generate an access token through the integrate.ai web portal.
Log in to your integrate.ai account on the web.
On the Dashboard, click Generate Access Token.
Copy the access token and save it to a secure location.
This is the only time that the API token can be viewed or downloaded. If you lose or forget your API token, you cannot retrieve it. Instead, create a new API token and revoke the old one. You can manage API tokens through the web portal.
Treat your API tokens like passwords and keep them secret. When working with the API, store the token in an environment variable instead of hardcoding it into your programs. In this documentation, the token is referenced as <IAI_TOKEN>.
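A minimal Python sketch of reading the token from the environment rather than from source code. The variable name IAI_TOKEN is an assumption; use whatever name your environment defines.

```python
import os
from typing import Mapping, Optional


def get_iai_token(var_name: str = "IAI_TOKEN",
                  environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the integrate.ai access token from the environment.

    IAI_TOKEN is an assumed variable name. Failing loudly when it is missing
    avoids any temptation to fall back to a token hardcoded in source.
    """
    env = os.environ if environ is None else environ
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"environment variable {var_name} is not set or empty")
    return token
```

Export the variable in your shell first (for example, export IAI_TOKEN=<IAI_TOKEN>), then call get_iai_token() wherever your code needs the token. The optional environ argument only exists to make the function easy to test.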
Configure AWS Credentials
Using the AWS CLI, run the following command to set your AWS session credentials, or use your default profile:
aws configure
Then store the IAI token as a parameter in AWS Systems Manager (SSM) Parameter Store. SSM handles retrieving and using the token as needed for the batch session.
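One way to store the token is with the SSM PutParameter API through boto3. The sketch below assumes boto3 is installed and uses a hypothetical parameter name, iai-token; keep whatever name you choose consistent with the wildcard in the SSM policies described later.

```python
from typing import Optional


def build_put_parameter_request(token: str, name: str = "iai-token") -> dict:
    """Build keyword arguments for the SSM PutParameter API.

    The parameter name "iai-token" is hypothetical; keep it consistent with
    the wildcard pattern used in your SSM IAM policies.
    """
    if not token:
        raise ValueError("token must not be empty")
    return {
        "Name": name,
        "Value": token,
        "Type": "SecureString",  # encrypted at rest with KMS
        "Overwrite": True,       # lets you rotate the token under the same name
    }


def store_iai_token(token: str, name: str = "iai-token",
                    region: Optional[str] = None) -> None:
    """Write the token to SSM Parameter Store (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above has no third-party dependency

    ssm = boto3.client("ssm", region_name=region)
    ssm.put_parameter(**build_put_parameter_request(token, name))
```

For example, store_iai_token(os.environ["IAI_TOKEN"]) stores the token from your shell under the default name, using the region from your AWS profile. Using SecureString keeps the token encrypted at rest.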
Install integrate.ai components
Install the integrate.ai CLI tool and pull the server image. See Environment Setup for details.
Push the IAI server Docker image to an AWS ECR repository. See the AWS ECR documentation for instructions on setting up ECR, then upload the integrate.ai server Docker image.
The IAI client and server versions change as updates are released. Make sure that you always upload the latest version by specifying the correct <version> number.
See Check version numbers for instructions on how to view the component version numbers.
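The tag-and-push step can be sketched as a small helper that produces the standard ECR login, tag, and push commands for one pinned version. The local image name and repository name are hypothetical parameters; ECR private registries follow the <account>.dkr.ecr.<region>.amazonaws.com host format.

```python
from typing import List


def ecr_registry(account_id: str, region: str) -> str:
    """Return the private ECR registry host for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_push_commands(local_image: str, account_id: str, region: str,
                      repository: str, version: str) -> List[str]:
    """Return the shell commands that tag and push one pinned image version.

    local_image and repository are hypothetical names; version should be the
    exact server version number you pulled, never an implicit default.
    """
    if not version:
        raise ValueError("an explicit version number is required")
    registry = ecr_registry(account_id, region)
    remote = f"{registry}/{repository}:{version}"
    return [
        # Authenticate Docker against the private registry.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Re-tag the locally pulled image with its ECR name, then push it.
        f"docker tag {local_image}:{version} {remote}",
        f"docker push {remote}",
    ]
```

Printing each returned command and running them in order uploads the image. Requiring an explicit version keeps the uploaded tag tied to the server version you checked.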
Create Roles in AWS IAM
This guide describes in brief how to create and manage roles and policies through the AWS IAM service console. The JSON configuration is also provided for those using Terraform or other tools.
You can create roles from the Roles link under Access Management in IAM.
Note: For the sample code that follows, replace any variable placeholders (such as <AWS_REGION>) with the correct information for your environment before attempting to use it.
Fargate Execution Role
Create an execution role to allow the batch job to access ECR and SSM.
In the IAM console, create an AWS Service role. Use the drop-down menu to select Elastic Container Service as the Use case, and select Elastic Container Service Task.
Under Permissions policies, select AmazonECSTaskExecutionRolePolicy.
Create a custom policy using the JSON below and add it to the execution role.
Note: The SSM policy uses a wildcard (*) in the token name to allow for flexible token use for this task.
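As a hedged sketch of what the execution role's documents might contain, the Python below builds the ECS task trust policy and a custom SSM policy, assuming the token parameters share a hypothetical iai-token name prefix. The exact action list is an assumption; if the parameter is encrypted with a customer managed KMS key, the role also needs kms:Decrypt on that key.

```python
import json


def ssm_parameter_arn(region: str, account_id: str, name_pattern: str) -> str:
    """Build an SSM parameter ARN; hierarchical names lose their leading '/'."""
    return f"arn:aws:ssm:{region}:{account_id}:parameter/{name_pattern.lstrip('/')}"


def execution_role_documents(region: str, account_id: str,
                             token_prefix: str = "iai-token") -> dict:
    """Return the trust policy and custom SSM policy for the execution role.

    token_prefix is a hypothetical parameter-name prefix.
    """
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # Lets ECS tasks (including Fargate tasks) assume this role.
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }
    ssm_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["ssm:GetParameters", "ssm:GetParameter"],
            # Trailing wildcard matches every token parameter sharing the prefix.
            "Resource": ssm_parameter_arn(region, account_id, token_prefix + "*"),
        }],
    }
    return {"trust": trust_policy, "ssm": ssm_policy}


print(json.dumps(execution_role_documents("<AWS_REGION>", "<AWS_ACCOUNT_ID>")["ssm"], indent=2))
```

The console's Elastic Container Service Task use case creates an equivalent trust relationship for you; the trust document matters mainly when you create the role with Terraform or CloudFormation.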
Fargate Task Role
This task role gives the server running in Fargate access to CloudWatch for logging, to the VPC, to S3, and to SSM.
In the IAM console, create an AWS Service role. Use the drop-down menu to select Elastic Container Service as the Use case, and select Elastic Container Service Task.
This role requires policies for four services: CloudWatch, VPC, S3, and SSM.
Create a policy with the following JSON for CloudWatch.
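A possible shape for the CloudWatch policy, sketched in Python. The action list is an assumption about what the server needs in order to write logs, and the log group name /ecs/iai-server is hypothetical; scope the resource to the log group you create.

```python
import json


def cloudwatch_logs_policy(region: str, account_id: str, log_group: str) -> dict:
    """Allow writing log streams and events to one CloudWatch Logs group.

    The action list is an assumption about what the server needs for logging.
    """
    group_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
            ],
            # ":*" extends the grant to every log stream in the group.
            "Resource": [group_arn, f"{group_arn}:*"],
        }],
    }


print(json.dumps(cloudwatch_logs_policy("<AWS_REGION>", "<AWS_ACCOUNT_ID>", "/ecs/iai-server"), indent=2))
```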
The VPC policy enables the IAI server to retrieve public IPs for service discovery registration.
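A hedged sketch of the VPC policy, assuming the lookup relies on ec2:DescribeNetworkInterfaces, the EC2 API that reports a network interface's public IP. The second function only illustrates reading the public IP from that API's response; it is not the IAI server's implementation.

```python
from typing import Optional


def vpc_describe_policy() -> dict:
    """Allow read-only lookups of network interfaces (assumed action).

    EC2 Describe* actions generally do not support resource-level
    permissions, so the resource is "*".
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["ec2:DescribeNetworkInterfaces"],
            "Resource": "*",
        }],
    }


def public_ip_from_response(response: dict) -> Optional[str]:
    """Return the first public IP in a DescribeNetworkInterfaces response, if any."""
    for eni in response.get("NetworkInterfaces", []):
        ip = eni.get("Association", {}).get("PublicIp")
        if ip:
            return ip
    return None
```

A Fargate task in awsvpc mode gets its own network interface, which is why the public IP is read from the interface rather than from a host.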
The Fargate Task role uses the S3 policy to store models in S3 after they have been trained.
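A minimal sketch of an S3 policy scoped to the location where trained models are stored. The bucket and prefix names are hypothetical, and the read permissions are an assumption; drop s3:GetObject if the server only writes models.

```python
import json


def s3_model_policy(bucket: str, prefix: str = "") -> dict:
    """Allow listing the bucket and reading/writing objects under one prefix.

    bucket and prefix are hypothetical; scope them to where models are stored.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket itself, not to objects.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to keys under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"{bucket_arn}/{prefix}*",
            },
        ],
    }


print(json.dumps(s3_model_policy("<MODEL_BUCKET>", "models/"), indent=2))
```

Bucket-level and object-level actions need separate statements because S3 evaluates them against different resource ARNs.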
The SSM policy allows access to required tokens.
Note: The SSM policy uses a wildcard (*) in the token name to allow for flexible token use for this task.
SDK User IAM Role
The SDK user requires permission to describe, start, and run an ECS Fargate task using the job definition provided. Any further permissions granted to the user depend on your application.
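A hedged sketch of the SDK user's policy, assuming the server is launched as an ECS task. The cluster, task definition family, and role names are hypothetical. iam:PassRole is included on the assumption that launching the task passes the execution and task roles to ECS, which AWS requires the caller to be allowed to do.

```python
from typing import List


def sdk_user_policy(region: str, account_id: str, cluster: str,
                    task_family: str, role_names: List[str]) -> dict:
    """Allow running, starting, and describing the server task.

    cluster, task_family, and role_names are hypothetical names.
    """
    ecs = f"arn:aws:ecs:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ecs:RunTask", "ecs:StartTask"],
                # Any revision of the server task definition family.
                "Resource": f"{ecs}:task-definition/{task_family}:*",
            },
            {
                "Effect": "Allow",
                "Action": ["ecs:DescribeTasks"],
                # Tasks launched in the given cluster.
                "Resource": f"{ecs}:task/{cluster}/*",
            },
            {
                # Launching the task passes the execution and task roles to ECS.
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": [f"arn:aws:iam::{account_id}:role/{n}" for n in role_names],
                "Condition": {"StringEquals": {"iam:PassedToService": "ecs-tasks.amazonaws.com"}},
            },
        ],
    }
```

The iam:PassedToService condition limits the user to handing these roles to ECS tasks only.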
Set up the Fargate environment
You can configure an AWS Fargate environment through the console UI, or through tools such as Terraform. The required components are an ECS cluster, security and log groups, and a job definition.
ECS cluster
There are no integrate.ai-specific settings required for the ECS cluster. You can use the default configuration.
EC2 Security Group
The security group manages traffic for the Fargate server resources.
CloudWatch Log Group
The log group captures and displays the output from task execution.
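A sketch of wiring a log group to the server container, assuming the awslogs log driver. The group name, stream prefix, and 30-day retention are hypothetical choices; the optional helper uses boto3's CloudWatch Logs client.

```python
from typing import Optional


def awslogs_configuration(log_group: str, region: str,
                          stream_prefix: str = "iai-server") -> dict:
    """Container logConfiguration that sends stdout/stderr to CloudWatch Logs.

    stream_prefix is a hypothetical default.
    """
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": log_group,
            "awslogs-region": region,
            # On Fargate, streams are named <prefix>/<container-name>/<task-id>.
            "awslogs-stream-prefix": stream_prefix,
        },
    }


def create_log_group(log_group: str, retention_days: int = 30,
                     region: Optional[str] = None) -> None:
    """Create the group and set a retention period (needs boto3 and credentials)."""
    import boto3  # lazy import: the config builder above needs no AWS SDK

    logs = boto3.client("logs", region_name=region)
    logs.create_log_group(logGroupName=log_group)
    logs.put_retention_policy(logGroupName=log_group, retentionInDays=retention_days)
```

Setting a retention period keeps logs from long-running training sessions from accumulating indefinitely.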
Fargate Compute Environment
Create a compute environment for the batch job. For detailed instructions, see the AWS documentation.
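A sketch of the request body for an AWS Batch Fargate compute environment, shaped for boto3's create_compute_environment. The name, subnets, security groups, and max_vcpus value are placeholders, not integrate.ai's recommended defaults.

```python
from typing import List


def fargate_compute_environment(name: str, subnets: List[str],
                                security_group_ids: List[str],
                                max_vcpus: int = 16) -> dict:
    """Keyword arguments for AWS Batch CreateComputeEnvironment on Fargate.

    All values, including max_vcpus=16, are hypothetical placeholders.
    """
    if not subnets or not security_group_ids:
        raise ValueError("Fargate compute environments need subnets and security groups")
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",   # AWS Batch provisions the Fargate capacity
        "state": "ENABLED",
        "computeResources": {
            "type": "FARGATE",  # "FARGATE_SPOT" uses interruptible capacity instead
            "maxvCpus": max_vcpus,
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
        },
    }
```

With boto3, pass the result as boto3.client("batch").create_compute_environment(**fargate_compute_environment(...)).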
We recommend the following defaults for the compute environment.
When using Batch with Fargate, add an egress rule to the security group with a port range from 443 to 9999.
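The rule can be sketched as an IpPermissions entry for the EC2 AuthorizeSecurityGroupEgress API. This reads the note as a single TCP range from port 443 to port 9999. If you only need the two ports individually, create two rules with equal from and to ports. The open 0.0.0.0/0 destination is an assumption; narrow it if you can.

```python
from typing import List


def egress_rule(from_port: int, to_port: int, cidr: str = "0.0.0.0/0") -> dict:
    """One IpPermissions entry for EC2 AuthorizeSecurityGroupEgress (TCP).

    The default CIDR is an assumption; restrict it where possible.
    """
    if not 0 <= from_port <= to_port <= 65535:
        raise ValueError("expected 0 <= from_port <= to_port <= 65535")
    return {
        "IpProtocol": "tcp",
        "FromPort": from_port,  # start of the port range
        "ToPort": to_port,      # end of the port range (inclusive)
        "IpRanges": [{"CidrIp": cidr}],
    }


def authorize_egress(group_id: str, rules: List[dict]) -> None:
    """Apply the rules to a security group (needs boto3 and AWS credentials)."""
    import boto3  # lazy import so egress_rule works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_egress(GroupId=group_id, IpPermissions=rules)
```

For example, authorize_egress("<SECURITY_GROUP_ID>", [egress_rule(443, 9999)]).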
Fargate Job Definition
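The job definition contains the remaining configuration for the server task, such as the container image, the execution and task roles, and logging settings.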
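A hedged sketch of the definition, assuming the server runs as an ECS task. In that case it is registered with the ECS RegisterTaskDefinition API, and an AWS Batch job definition would use different field names. The family, container name, the IAI_TOKEN variable name, and the 1024 CPU / 2048 MiB sizing are hypothetical; 1024/2048 is one valid Fargate combination.

```python
from typing import Dict


def server_task_definition(family: str, image_uri: str, execution_role_arn: str,
                           task_role_arn: str, token_parameter_arn: str,
                           log_group: str, region: str,
                           cpu: str = "1024", memory: str = "2048") -> Dict:
    """Keyword arguments for ECS RegisterTaskDefinition for the server on Fargate.

    Container name, IAI_TOKEN variable name, and sizing are hypothetical.
    """
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",          # the only network mode Fargate supports
        "cpu": cpu,                       # task-level CPU units, as a string
        "memory": memory,                 # task-level memory in MiB, as a string
        "executionRoleArn": execution_role_arn,  # pulls from ECR, reads SSM secrets
        "taskRoleArn": task_role_arn,            # CloudWatch, VPC, S3, SSM at runtime
        "containerDefinitions": [{
            "name": "iai-server",
            "image": image_uri,
            "essential": True,
            # ECS resolves the SSM parameter and injects it as an environment variable.
            "secrets": [{"name": "IAI_TOKEN", "valueFrom": token_parameter_arn}],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": log_group,
                    "awslogs-region": region,
                    "awslogs-stream-prefix": "iai-server",
                },
            },
        }],
    }
```

With boto3, register it with boto3.client("ecs").register_task_definition(**server_task_definition(...)). Passing the token through secrets rather than environment keeps it out of the task definition itself.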
This completes the configuration and setup for Fargate and the IAI components, as well as the roles, policies, and secrets required to run the server and client.
The remaining tasks focus on using the SDK to start and monitor the server tasks in Fargate. Examples of the required code are provided.
See Setting up AWS Batch for additional configuration.
Continue to Running a training server on AWS Fargate.