Setting up AWS Fargate

AWS Fargate provides serverless compute for containers: you run containers without provisioning or maintaining the servers that host them. Fargate works with both Amazon ECS and Amazon EKS.

You can use Fargate with integrate.ai's SDK to create a machine learning environment that is flexible and portable. With an integrate.ai training server running on Fargate, your data in S3 buckets, and clients running on AWS Batch, you can use the SDK to manage and run fully remote training sessions.

This deployment scenario is best suited to those who want to maintain full control of the training process while working with existing remote capabilities and infrastructure in AWS.

Requirements

This section outlines the setup steps required to configure your working environment. Steps that are performed in the AWS platform are not explained in detail. Refer to the AWS documentation as needed.

The requirements are tool-agnostic: you can complete the steps through the AWS Console or with a tool such as Terraform or AWS CloudFormation.

Required components

integrate.ai SDK, and client and server Docker images

AWS Fargate

AWS Batch

AWS CLI

Amazon Elastic Container Registry (ECR)

Amazon Elastic Container Service (ECS)

AWS Key Management Service (KMS)

AWS Systems Manager (SSM)

Amazon Simple Storage Service (S3)

AWS Console, AWS CloudFormation, or Terraform

Generate an Access Token

To install the client, server, and SDK, you must generate an access token through the integrate.ai web portal.

  1. Log in to your integrate.ai account on the web.

  2. On the Dashboard, click Generate Access Token.

  3. Copy the access token and save it to a secure location.

This is the only time the access token can be viewed or downloaded. If you lose or forget it, you cannot retrieve it; instead, generate a new token and revoke the old one. You can manage tokens through the web portal.

Treat your access tokens like passwords and keep them secret. When working with the API, store the token in an environment variable instead of hardcoding it into your programs. In this documentation, the token is referenced as <IAI_TOKEN>.
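A minimal Python sketch of reading the token from the environment rather than from source code. It assumes the token has been exported as an environment variable named IAI_TOKEN (for example, with export IAI_TOKEN=... in your shell); the function name get_iai_token is hypothetical and is not part of the integrate.ai SDK.

```python
import os


def get_iai_token(var_name: str = "IAI_TOKEN") -> str:
    """Return the integrate.ai access token from an environment variable.

    Hypothetical helper: it fails loudly instead of falling back to a
    hardcoded token when the variable is missing or empty.
    """
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(f"{var_name} is not set; export it before running this script")
    return token
```

You would then pass the returned value wherever a command or program expects <IAI_TOKEN>, so the secret never appears in your code or version control.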

Configure AWS Credentials

  1. Using the AWS CLI, run aws configure to set your AWS session credentials, or use your default profile.

  2. Store the IAI token as a SecureString parameter in AWS Systems Manager (SSM) Parameter Store. The token is then retrieved from SSM as needed during the batch session, so its value never appears in your job configuration.

aws ssm put-parameter --name iai-token --value <IAI_TOKEN> --type SecureString

Install integrate.ai components

  1. Install the integrate.ai SDK and CLI tool, then pull the client and server Docker images. See Environment Setup for details.

# Install integrate.ai packages
pip install integrate-ai

iai client pull --token <IAI_TOKEN>

iai server pull --token <IAI_TOKEN>

  2. Push the IAI server Docker image to an Amazon ECR repository. See the Amazon ECR documentation for instructions on setting up ECR, then tag and upload the integrate.ai server image.

# Create a repository for the server
aws ecr create-repository --repository-name iai_fl_server

# Log in and upload the server Docker image
aws ecr get-login-password --region <AWS_REGION> | docker login --username AWS --password-stdin <AWS_ACCOUNT>.dkr.ecr.<AWS_REGION>.amazonaws.com

docker tag 919740802015.dkr.ecr.<AWS_REGION>.amazonaws.com/edge/iai_fl_server:<version> <AWS_ACCOUNT>.dkr.ecr.<AWS_REGION>.amazonaws.com/iai_fl_server:<version>
docker push <AWS_ACCOUNT>.dkr.ecr.<AWS_REGION>.amazonaws.com/iai_fl_server:<version>

The IAI client and server versions change as updates are released. Make sure that you are always uploading the latest version by specifying the correct <version> number.

See Check version numbers for instructions on how to view the component version numbers.
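Because the same image URI appears in the docker tag and docker push commands and again in the Fargate job definition, it can help to build it in one place. A minimal Python sketch; the helper name ecr_image_uri and the example version 1.0.0 are hypothetical, and refusing the latest tag is a design choice made here to match the advice to specify an explicit <version>.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, version: str) -> str:
    """Build <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<version>.

    Hypothetical helper: rejects malformed account IDs and an unpinned
    "latest" tag so every push and job definition names an explicit version.
    """
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"AWS account IDs are 12 digits, got {account_id!r}")
    if not version or version == "latest":
        raise ValueError("specify an explicit server version tag, not 'latest'")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{version}"


# Example values only; substitute your account, region, and server version.
print(ecr_image_uri("123456789012", "us-east-1", "iai_fl_server", "1.0.0"))
```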

If you are using AWS Batch

Set up the dataset(s)

Create one or more S3 buckets to contain the dataset(s). The URL for the dataset is a required parameter for the SDK.

Set up AWS Batch

If you are using AWS Batch, follow the instructions for setting up the roles and permissions required for the batch process. See Setting up AWS Batch for details.

Create Roles in AWS IAM

This guide describes in brief how to create and manage roles and policies through the AWS IAM service console. The JSON configuration is also provided for those using Terraform or other tools.

You can create roles from the Roles link under Access Management in IAM.

Note: For the sample code that follows, replace any variable placeholders (such as <AWS_REGION>) with the correct information for your environment before attempting to use it.
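If you keep these samples as text templates, a small Python sketch can do the substitution and catch any placeholder you missed. The function name fill_placeholders and the example account ID are hypothetical; placeholders containing spaces are deliberately not matched and still need to be replaced by hand.

```python
import re

# Matches placeholders such as <AWS_REGION> or <version>; placeholders that
# contain spaces (for example <path to your data>) are not matched.
PLACEHOLDER = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def fill_placeholders(template: str, values: dict) -> str:
    """Substitute <NAME> placeholders, refusing to leave any unfilled."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError("no value supplied for: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


arn = fill_placeholders(
    "arn:aws:ssm:<AWS_REGION>:<AWS_ACCOUNT>:parameter/*-token",
    {"AWS_REGION": "us-east-1", "AWS_ACCOUNT": "123456789012"},
)
print(arn)  # arn:aws:ssm:us-east-1:123456789012:parameter/*-token
```

Failing on a missing value, rather than leaving the placeholder in place, keeps an unreplaced <AWS_REGION> from slipping into a deployed policy.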

Fargate Execution Role

The execution role allows Amazon ECS to pull the IAI server image from ECR and to read the IAI token from SSM when it starts the Fargate task.

In the IAM console, create an AWS Service role. Use the drop-down menu to select Elastic Container Service as the Use case, and select Elastic Container Service Task.

Fargate Execution role trust policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
        "Sid": "",
        "Effect": "Allow",
        "Principal": {
            "Service": "ecs-tasks.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
        }
    ]
}
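The Fargate execution role and the Fargate task role below use the same trust policy, so a script-based setup can generate it once and reuse it for both roles. A minimal Python sketch; the function name is hypothetical, and the empty "Sid" from the console output is omitted because Sid is optional.

```python
import json


def ecs_tasks_trust_policy() -> dict:
    """Trust policy that allows ECS tasks, including Fargate tasks, to assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Save the output to a file and pass it to the AWS CLI, for example:
#   aws iam create-role --role-name <ROLE_NAME> \
#       --assume-role-policy-document file://ecs-tasks-trust.json
print(json.dumps(ecs_tasks_trust_policy(), indent=2))
```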

Under Permissions policies, select AmazonECSTaskExecutionRolePolicy.

Create a custom policy using the JSON below and add it to the execution role.

Fargate Execution role SSM policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SSMAccessForTokens",
            "Effect": "Allow",
            "Action": [
                "ssm:DescribeParameters",
                "ssm:GetParameter",
                "ssm:GetParameters"
            ],
            "Resource": [
                "arn:aws:ssm:<AWS_REGION>:<AWS_ACCOUNT>:parameter/*-token"
            ]
        }
    ]
}

Note: The wildcard (*) in the resource ARN matches any SSM parameter whose name ends in -token, including the iai-token parameter created earlier, so you can add other tokens without editing the policy.
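To check whether a parameter name is covered by the policy before you create it, a quick local approximation is enough. This Python sketch uses fnmatchcase from the standard library as a stand-in for IAM's wildcard matching; it is not the IAM policy evaluator (for example, fnmatch also treats [ ] as a character class, which IAM does not), and the names are hypothetical.

```python
from fnmatch import fnmatchcase

# Parameter-name part of the resource ARN used by the SSM policies in this guide.
TOKEN_PARAMETER_PATTERN = "*-token"


def parameter_matches_policy(name: str, pattern: str = TOKEN_PARAMETER_PATTERN) -> bool:
    """Approximate IAM's '*' wildcard match for an SSM parameter name.

    fnmatchcase is used instead of fnmatch.fnmatch to avoid platform-dependent
    case folding.
    """
    return fnmatchcase(name, pattern)


for name in ["iai-token", "iai_token", "iai-token-backup"]:
    print(name, parameter_matches_policy(name))
```

Only names ending in -token match, so iai_token and iai-token-backup would not be readable under this policy.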

Fargate Task Role

The task role grants permissions to the IAI server container itself while it runs, such as writing logs to CloudWatch and querying the VPC.

In the IAM console, create an AWS Service role. Use the drop-down menu to select Elastic Container Service as the Use case, and select Elastic Container Service Task.

Fargate Task role trust policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
        "Sid": "",
        "Effect": "Allow",
        "Principal": {
            "Service": "ecs-tasks.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
        }
    ]
}

This role requires policies for four services: CloudWatch, VPC, S3, and SSM.

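Create a policy with the following JSON for CloudWatch. The log group in the resource ARN must be the one the server writes to: the example names /ecs/fl-server-fargate, so change it if you use a different log group, such as the iai-server-fargate-log-group example later in this guide.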

Fargate Task role CloudWatch policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowLogGroupAndStreamAccess",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:DescribeLogStreams",
                "logs:PutLogEvents",
                "logs:CreateLogGroup"
            ],
            "Resource": [
                "arn:aws:logs:<AWS_REGION>:<AWS_ACCOUNT>:log-group:/ecs/fl-server-fargate:*"
            ]
        }
    ]
}
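Because the resource ARN has to name the log group the server actually writes to, you may prefer to generate this policy from the log group name instead of editing the ARN by hand. A minimal Python sketch; the function name is hypothetical, and the region and account ID are example values.

```python
import json


def cloudwatch_logs_policy(region: str, account_id: str, log_group_name: str) -> dict:
    """Build the Task role CloudWatch policy for one named log group."""
    # Same ARN shape as the policy above; the trailing ':*' lets the pattern
    # also match the log streams inside the group.
    log_group_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group_name}:*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowLogGroupAndStreamAccess",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogStream",
                    "logs:DescribeLogStreams",
                    "logs:PutLogEvents",
                    "logs:CreateLogGroup",
                ],
                "Resource": [log_group_arn],
            }
        ],
    }


policy = cloudwatch_logs_policy("us-east-1", "123456789012", "iai-server-fargate-log-group")
print(json.dumps(policy, indent=2))
```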

The VPC policy enables the IAI server to retrieve public IPs for service discovery registration.

Fargate Task role VPC policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowVPC",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeNetworkInterfaces"
            ],
            "Resource": ["*"]
        }
    ]
}

The Fargate Task role uses the S3 policy to store models in S3 after they have been trained.

Fargate Task role S3 policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowS3BucketAccess",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetEncryptionConfiguration",
                "s3:GetObjectAcl",
                "s3:ListBucketVersions",
                "s3:*Object"
            ],
            "Resource": [
                "arn:aws:s3:::<BUCKET_NAME>",
                "arn:aws:s3:::<BUCKET_NAME>/*"
            ]
        }
    ]
}
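The two Resource entries in the policy above serve different actions: bucket-level actions apply to the bucket ARN, and object-level actions apply to the object ARN pattern. If you start from the S3 URL of your dataset, a minimal Python sketch can derive both entries; the function name and the bucket name are hypothetical.

```python
from urllib.parse import urlparse


def s3_policy_resources(s3_url: str) -> list:
    """Return [bucket ARN, object ARN pattern] for an s3://bucket[/prefix] URL.

    Bucket-level actions (s3:ListBucket, s3:ListBucketVersions,
    s3:GetEncryptionConfiguration) need the bucket ARN; object-level
    actions (s3:*Object, s3:GetObjectAcl) need the object pattern.
    """
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected s3://bucket[/prefix], got {s3_url!r}")
    bucket = parsed.netloc
    prefix = parsed.path.strip("/")
    objects = f"{bucket}/{prefix}/*" if prefix else f"{bucket}/*"
    return [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{objects}"]


print(s3_policy_resources("s3://example-bucket/training-data"))
# ['arn:aws:s3:::example-bucket', 'arn:aws:s3:::example-bucket/training-data/*']
```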

The SSM policy allows access to required tokens.

Fargate Task role SSM policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SSMAccessForTokens",
            "Effect": "Allow",
            "Action": [
                "ssm:DescribeParameters",
                "ssm:GetParameter",
                "ssm:GetParameters"
            ],
            "Resource": [
                "arn:aws:ssm:<AWS_REGION>:<AWS_ACCOUNT>:parameter/*-token"
            ]
        }
    ]
}

Note: As in the execution role policy, the wildcard (*) matches any parameter name that ends in -token.

SDK User IAM Role

The SDK user requires permission to describe, start, run, and stop the ECS Fargate task that uses the job definition provided, and to pass the Fargate roles to that task. Any further permissions granted to the user depend on your application.

Example SDK user IAM policy
{
    "Statement": [
        {
            "Action" : [
                "ecs:DescribeContainerInstances",
                "ecs:DescribeTasks",
                "ecs:DescribeTaskDefinition",
                "ecs:ListTasks",
                "ecs:UpdateContainerAgent",
                "ecs:StartTask",
                "ecs:StopTask",
                "ecs:RunTask"
            ],
            "Effect" : "Allow",
            "Resource": ["arn:aws:ecs:<AWS_REGION>:<AWS_ACCOUNT>:cluster/iai-server-ecs-cluster",
            "arn:aws:ecs:<AWS_REGION>:<AWS_ACCOUNT>:task-definition/iai-server-fargate-job:*"]
        }, {
            "Sid" : "SSMAccessForTokens",
            "Effect" : "Allow",
            "Action" : [
                "ssm:DescribeParameters",
                "ssm:GetParameter",
                "ssm:PutParameter"
            ],
            "Resource" : [
                "arn:aws:ssm:<AWS_REGION>:<AWS_ACCOUNT>:parameter/*-token"
            ]
        }, {
          "Sid": "PassRole",
          "Effect": "Allow",
          "Action": [
            "iam:PassRole"
          ],
          "Resource": [
            "<ECS_TASK_ROLE_ARN>",
            "<ECS_EXECUTION_ROLE_ARN>"
          ]
        }
    ],
    "Version": "2012-10-17"
}
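A single missing quote or trailing comma makes the whole policy document invalid, so it can be worth validating policies locally before you paste them into the console or a Terraform configuration. A minimal Python sketch using only the standard json module; the function name is hypothetical, and the checks are deliberately basic (they assume the list-of-statements shape used throughout this guide) and are no substitute for IAM's own validation.

```python
import json


def check_policy_document(text: str) -> dict:
    """Parse an IAM policy document and apply a few basic structural checks.

    json.loads rejects syntax errors such as an unterminated string or a
    trailing comma; the checks below only cover the shape used in this guide.
    """
    doc = json.loads(text)
    if doc.get("Version") != "2012-10-17":
        raise ValueError('expected "Version": "2012-10-17"')
    statements = doc.get("Statement")
    if not isinstance(statements, list) or not statements:
        raise ValueError('"Statement" should be a non-empty list')
    for index, statement in enumerate(statements):
        if statement.get("Effect") not in ("Allow", "Deny"):
            raise ValueError(f"statement {index}: Effect must be Allow or Deny")
        if "Action" not in statement:
            raise ValueError(f"statement {index}: missing Action")
    return doc


# A trailing comma inside a Resource list is enough to break the document.
broken = (
    '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", '
    '"Action": "iam:PassRole", "Resource": ["<ROLE_ARN>",]}]}'
)
try:
    check_policy_document(broken)
except json.JSONDecodeError as err:
    print(f"invalid policy JSON at line {err.lineno}, column {err.colno}: {err.msg}")
```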

Set up the Fargate environment

You can configure an AWS Fargate environment through the console UI, or through tools such as Terraform. The required components are an ECS cluster, security and log groups, and a job definition.

ECS cluster

There are no integrate.ai-specific settings required for the ECS cluster. You can use the default configuration.

Default ECS cluster configuration
resource "aws_ecs_cluster" "iai_server_ecs_cluster" {
  name = "iai-server-ecs-cluster"

  tags = {
    Name = "IAI Server ECS Cluster"
  }
}

EC2 Security Group

The security group manages traffic for the Fargate server resources.

Example Security group
resource "aws_security_group" "iai_server_security_group" {
  name        = "iai_server_security_group"
  vpc_id      = "<VPC_ID>"
  description = "Security group for IAI server ECS cluster"

  tags = {
    Name = "IAI Server Security Group"
  }

  ingress {
    description = "Allow incoming requests to the IAI server"
    from_port   = 9999
    to_port     = 9999
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "Allow nodes to access external APIs"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

}

CloudWatch Log Group

The log group captures and displays the output from task execution.

Example CloudWatch log group
resource "aws_cloudwatch_log_group" "iai_server_fargate_log_group" {
  name = "iai-server-fargate-log-group"

  tags = {
    Name = "IAI Server Log Group"
  }
}

Fargate Compute Environment

Create a compute environment for the AWS Batch jobs. For detailed instructions, see the AWS documentation.

We recommend the following defaults for the compute environment.

min_vcpus: 4
max_vcpus: 256
instance_type: ["c4.4xlarge", "m4.2xlarge", "r4.2xlarge"]
type: "EC2"

When using AWS Batch with Fargate, add an egress rule that allows traffic on port 9999 in addition to port 443.

Fargate Job Definition

The job definition is the ECS task definition that the SDK starts on Fargate. It specifies the Fargate execution role, the Fargate task role, the IAI server Docker image, the CloudWatch log group, the server port (9999), and the IAI_TOKEN secret, which ECS reads from the iai-token SSM parameter. Replace the placeholders with your own values, and update the image tag whenever you upload a new server version. The example uses the JSON format accepted by aws ecs register-task-definition --cli-input-json.

Example Job definition
{
  "family": "iai-server-fargate-job",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "<ECS_EXECUTION_ROLE_ARN>",
  "taskRoleArn": "<ECS_TASK_ROLE_ARN>",
  "runtimePlatform": {
    "operatingSystemFamily": "LINUX"
  },
  "placementConstraints": [],
  "containerDefinitions": [{
    "command": [],
    "cpu": 0,
    "disableNetworking": false,
    "entryPoint": [],
    "environment": [],
    "essential": true,
    "image": "<AWS_ACCOUNT>.dkr.ecr.<AWS_REGION>.amazonaws.com/iai_fl_server:<VERSION>",
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "<LOG_GROUP_NAME>",
        "awslogs-region": "<AWS_REGION>",
        "awslogs-stream-prefix": "ecs"
      }
    },
    "mountPoints": [],
    "name": "iai-server-fargate",
    "portMappings": [
      {
        "containerPort": 9999,
        "hostPort": 9999,
        "protocol": "tcp"
      }
    ],
    "secrets": [{
      "name": "IAI_TOKEN",
      "valueFrom": "iai-token"
    }],
    "volumesFrom": []
  }]
}
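Fargate accepts only specific task-level cpu/memory combinations, so changing the cpu or memory values in the job definition can make registration fail. A minimal Python pre-check; the table below is transcribed from the Amazon ECS task-size documentation at the time of writing (the 8 and 16 vCPU sizes carry additional platform requirements), and the names are hypothetical.

```python
# Supported Fargate task sizes: memory options (MiB) for each CPU value
# (CPU units, 1024 = 1 vCPU), transcribed from the Amazon ECS task-size
# table. Re-check the current AWS documentation before relying on it.
FARGATE_MEMORY_OPTIONS = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),
}


def is_valid_fargate_size(cpu: int, memory: int) -> bool:
    """Return True if the task-level cpu/memory pair is a supported Fargate size."""
    return memory in FARGATE_MEMORY_OPTIONS.get(cpu, [])


print(is_valid_fargate_size(1024, 2048))  # True: the values in the job definition above
print(is_valid_fargate_size(1024, 1024))  # False: 1 vCPU needs at least 2048 MiB
```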

This completes the configuration and setup for Fargate and the IAI components, as well as the roles, policies, and secrets required to run the server and client.

The remaining tasks focus on using the SDK to start and monitor the server tasks in Fargate. Examples of the required code are provided.

See Setting up AWS Batch for additional configuration.

Continue to Running a training server on AWS Fargate.
