Setting up AWS Batch

You can configure an AWS Batch environment through the console UI, or through tools such as Terraform.

Generate an Access Token

To install the client, server, and SDK, you must generate an access token through the integrate.ai web portal.

  1. Log in to your integrate.ai account on the web.

  2. On the Dashboard, click Generate Access Token.

  3. Copy the access token and save it to a secure location.

This is the only time that the access token can be viewed or downloaded. If you lose or forget your access token, you cannot retrieve it. Instead, generate a new token and revoke the old one. You can manage access tokens through the web portal.

Treat your access tokens like passwords and keep them secret. When working with the API, use the token as an environment variable instead of hardcoding it into your programs. In this documentation, the token is referenced as <IAI_TOKEN>.

Configure AWS Credentials

  1. On the AWS CLI, run aws configure to set your AWS session credentials, or use your default profile.

  2. Store the IAI token as an SSM parameter. The batch job retrieves the token from SSM as needed for the session.

aws ssm put-parameter --name iai-token --value <IAI_TOKEN> --type SecureString
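
To confirm the parameter was stored correctly, you can read it back. This decrypts the SecureString value, so run it only in a trusted shell:

```shell
# Read back the stored token to verify it (decrypts the SecureString)
aws ssm get-parameter \
  --name iai-token \
  --with-decryption \
  --query Parameter.Value \
  --output text
```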

Install integrate.ai components

  1. Install the integrate.ai CLI tool and pull the client. See Environment Setup for details.

# Install the CLI tool
pip install integrate-ai

# Pull the client image
iai pull client --token <IAI_TOKEN>
  2. Push the IAI client Docker image to an AWS ECR repository. See the AWS ECR documentation for detailed instructions on setting up ECR, then upload the integrate.ai client Docker image.

# Create a repository for the client
aws ecr create-repository --repository-name iai_client

# Log in and upload the Docker image
aws ecr get-login-password --region <AWS_REGION> | docker login --username AWS --password-stdin <AWS_ACCOUNT>.dkr.ecr.<AWS_REGION>.amazonaws.com

docker tag 919740802015.dkr.ecr.<AWS_REGION>.amazonaws.com/edge/fl-client:<version> <AWS_ACCOUNT>.dkr.ecr.<AWS_REGION>.amazonaws.com/iai_client:<version>

docker push <AWS_ACCOUNT>.dkr.ecr.<AWS_REGION>.amazonaws.com/iai_client:<version>
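
As a quick sanity check, you can list the image tags now present in your repository; the exact tags depend on the client version you pushed:

```shell
# List image tags in the iai_client repository to confirm the push succeeded
aws ecr describe-images \
  --repository-name iai_client \
  --query 'imageDetails[].imageTags'
```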

Create Roles in AWS IAM

This guide assumes you are creating and managing roles and policies through the AWS IAM service console. However, the JSON configuration is also provided for reference.

You can create roles from the Roles link under Access Management in IAM.

EC2 Instance Role

  1. Click Create Role.

  2. Select AWS Service.

  3. Select EC2.

  4. Click Next.

  5. In the Permissions policies search box, type AmazonEC2ContainerServiceforEC2 to find the AmazonEC2ContainerServiceforEC2Role managed policy.

  6. Select this policy and click Next.

  7. Provide a meaningful name for the role.

  8. Review the configuration and compare it to the example below.

  9. Click Create role.

Batch service role example
{
    "Version": "2012-10-17",
    "Statement": [
      {
          "Action": "sts:AssumeRole",
          "Effect": "Allow",
          "Principal": {
            "Service": "batch.amazonaws.com"
          }
      }
    ]
}

Batch service role managed policy ARN: arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole
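
If you manage IAM from the CLI instead of the console, the same role can be created from the trust policy JSON above and the managed policy attached to it. The role name and local file name here are examples only:

```shell
# Create the role using the trust policy JSON above, saved locally as batch-trust.json
aws iam create-role \
  --role-name iai-batch-service-role \
  --assume-role-policy-document file://batch-trust.json

# Attach the AWS-managed Batch service role policy
aws iam attach-role-policy \
  --role-name iai-batch-service-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole
```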

ECS Task Role (Job Role) with CloudWatch and S3 policies

This role requires access to the S3 bucket containing your data. It requires the ECS task role policy and an S3 access policy.

  1. Click Create Role.

  2. Select AWS Service.

  3. Select Elastic Container Service from the drop-down list.

  4. Select Elastic Container Service Task.

  5. Click Next.

  6. On the Add Permissions page, click Create policy. This opens a Policies page in a second browser window to allow you to create policies and return to your role to attach them.

  7. On the Create policy page, select the JSON tab and paste the following, with your <AWS_REGION> and <AWS_ACCOUNT> filled in.

ECS Task Role policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowLogGroupAndStreamAccess",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:DescribeLogStreams",
                "logs:PutLogEvents",
                "logs:CreateLogGroup"
            ],
            "Resource": [
                "arn:aws:logs:<AWS_REGION>:<AWS_ACCOUNT>:log-group:/iai-client/batch:log-stream:*"
            ]
        }
    ]
}

  8. Click Next: Tags, then click Next: Review.

  9. Provide a meaningful name for the policy.

  10. Click Create policy.

Add an S3 policy.

  1. Click Create policy.

  2. On the Create policy page, select the JSON tab and paste the following, with the name of <your.aws.data.bucket> filled in.

ECS Task Role S3 policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowS3BucketReadAccess",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket", 
                "s3:GetEncryptionConfiguration",
                "s3:GetObject",
                "s3:GetObjectAcl",
                "s3:ListBucketVersions"
            ],
            "Resource": [
                "arn:aws:s3:::<your.aws.data.bucket>",
                "arn:aws:s3:::<your.aws.data.bucket>/*"
            ]
        }
    ]
}

The S3 policy on the ECS task role restricts the Job Definition to the S3 buckets that are referenced. To restrict data access further, consider creating different Job Definitions that access different S3 buckets.

  1. Click Next: Tags, then click Next: Review.

  2. Provide a meaningful name for the policy.

  3. Click Create policy.

Add policies to the role

  1. Return to the Add permissions page in the previous browser window.

  2. Use the Permissions policies search box to search for and select the names of the two policies you just created.

  3. Click Next.

  4. Provide a meaningful name for the role, such as ecs-task-role.

  5. Click Create role.
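
The same policies can be created and attached from the CLI. The policy names and local file names below are examples; the role name matches the suggestion above:

```shell
# Create the two customer-managed policies from the JSON documents above
aws iam create-policy \
  --policy-name iai-client-logs \
  --policy-document file://ecs-task-logs-policy.json

aws iam create-policy \
  --policy-name iai-client-s3-read \
  --policy-document file://ecs-task-s3-policy.json

# Attach both policies to the task role
aws iam attach-role-policy \
  --role-name ecs-task-role \
  --policy-arn arn:aws:iam::<AWS_ACCOUNT>:policy/iai-client-logs

aws iam attach-role-policy \
  --role-name ecs-task-role \
  --policy-arn arn:aws:iam::<AWS_ACCOUNT>:policy/iai-client-s3-read
```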

ECS Execution Role with ECR, CloudWatch, and SSM policies

This role gives the batch job access to ECR and SSM. It requires the AmazonECSTaskExecutionRolePolicy managed policy, plus custom ECR, CloudWatch, and SSM policies.

  1. Click Create Role.

  2. Select AWS Service.

  3. Select Elastic Container Service from the drop-down list.

  4. Select Elastic Container Service Task.

  5. Click Next.

  6. In the Permissions policies search box, type AmazonECSTaskExecutionRolePolicy.

  7. Select this policy, then click Next.

  8. Provide a meaningful name for the role, such as ecs-execution-role.

  9. Scroll down to Step 2: Add permissions and click Edit.

  10. Click Create policy.

  11. On the Create policy page, select the JSON tab and paste the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

  1. Click Next: Tags and then click Next: Review.

  2. Provide a meaningful name for the policy.

  3. Click Create policy.

  4. In the Create role browser window, in the Permissions policies search box, type the name of the policy you just created. Note: you may have to click the Refresh button beside the Create policy button to make the new policy appear.

  5. Select the policy.

  6. Click Next.

  7. Repeat the process above to add an SSM policy. Fill in your <AWS_REGION> and <AWS_ACCOUNT> information.

ECS Execution Role SSM Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SSMAccessForIAIToken",
            "Effect": "Allow",
            "Action": [
                "ssm:DescribeParameters",
                "ssm:GetParameter",
                "ssm:GetParameters"
            ],
            "Resource": [
                "arn:aws:ssm:<AWS_REGION>:<AWS_ACCOUNT>:parameter/iai-token"
            ]
        }
    ]
}

  1. Add the policy to your ECS Execution role.

  2. Click Next.

  3. Click Create Role.

SDK User Role

The role and permissions required to set up and configure the environment are different from the permissions required to start an integrate.ai client batch job using the integrate.ai SDK.

The end user of the SDK requires, at minimum, permission to submit a batch job using the job queue and job definition created earlier, and permission to describe jobs.

Any additional permissions required depend on your application. For example, you may give the user the ability to access the SSM parameter set earlier, so that they can pull the integrate.ai access token and use it to manage sessions with the integrate.ai SDK.

Example IAM policy for SDK User Role
{
    "Statement": [
        {
            "Action": [
                "batch:SubmitJob"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:batch:<AWS_REGION>:<AWS_ACCOUNT>:job-queue/iai-client-batch-job-queue",
                "arn:aws:batch:<AWS_REGION>:<AWS_ACCOUNT>:job-queue/iai-client-batch-job-queue/*",
                "arn:aws:batch:<AWS_REGION>:<AWS_ACCOUNT>:job-definition/iai-client-batch-job"
            ],
            "Sid": "AllowUserBatchJobSubmission"
        },
        {
            "Action": [
                "batch:DescribeJobs"
            ],
            "Effect": "Allow",
            "Resource": [
                "*"
            ],
            "Sid": "AllowUserDescribeBatch"
        },
        {
            "Action": [
                "ssm:DescribeParameters",
                "ssm:GetParameter"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:ssm:<AWS_REGION>:<AWS_ACCOUNT>:parameter/iai-token"
            ],
            "Sid": "SSMAccessForIAIToken"
        }
    ],
    "Version": "2012-10-17"
}
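
With this policy attached, the SDK user can, for example, submit a job to the queue and definition named in the policy and then poll its status:

```shell
# Submit a job using the queue and definition named in the policy above
aws batch submit-job \
  --job-name iai-client-smoke-test \
  --job-queue iai-client-batch-job-queue \
  --job-definition iai-client-batch-job

# Poll the job status (the jobId is returned by submit-job)
aws batch describe-jobs --jobs <JOB_ID>
```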

Role and policy configuration is now complete. Continue on to set up the AWS Batch job.

Create a Compute Environment

Create a Compute environment for the batch job. For detailed instructions, see the AWS documentation. Configurations specific to the integrate.ai environment are described here.

The name of this environment is a required parameter for the SDK.

We recommend the following defaults for the compute environment.

min_vcpus: 4
max_vcpus: 256
instance_type: ["c4.4xlarge", "m4.2xlarge", "r4.2xlarge"]
type: "EC2"
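
As one way to apply these defaults, a managed EC2 compute environment can be created from the CLI. The environment name is an example, and the service role, instance profile, subnet, and security group placeholders are specific to your account:

```shell
# Create a managed EC2 compute environment with the recommended defaults
aws batch create-compute-environment \
  --compute-environment-name iai-compute-env \
  --type MANAGED \
  --service-role <BATCH_SERVICE_ROLE_ARN> \
  --compute-resources '{
    "type": "EC2",
    "minvCpus": 4,
    "maxvCpus": 256,
    "instanceTypes": ["c4.4xlarge", "m4.2xlarge", "r4.2xlarge"],
    "instanceRole": "<EC2_INSTANCE_PROFILE_ARN>",
    "subnets": ["<SUBNET_ID>"],
    "securityGroupIds": ["<SECURITY_GROUP_ID>"]
  }'
```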

When using Batch with AWS Fargate, allow TCP egress on ports 443 through 9999.

Example of adding egress in Terraform
egress {
    description = "Allow nodes to access external fargate"
    from_port   = 443
    to_port     = 9999
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

Create a Job Queue

Create a Job queue. The name of this queue is a required parameter for the SDK.

The queue must be associated with the compute environment that you created or configured earlier.

name: "<iai-client-batch-job-queue>"
state: "ENABLED"
priority: 1

There are no additional roles or policies required for the queue.
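
The equivalent CLI call, assuming a compute environment named iai-compute-env (substitute your own environment name or ARN):

```shell
# Create the job queue and associate it with the compute environment
aws batch create-job-queue \
  --job-queue-name iai-client-batch-job-queue \
  --state ENABLED \
  --priority 1 \
  --compute-environment-order order=1,computeEnvironment=iai-compute-env
```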

Create a Job Definition

Create a Job definition. The name of this definition is a required parameter for the SDK.

In the definition, you must specify the following:

  • image - this is the integrate.ai client Docker image you uploaded to ECR

    • <AWS Account> and <AWS Region>

    • <version> - the IAI client version

  • jobRoleArn - the ECS Task Role you created

  • executionRoleArn - the ECS Execution Role you created

Example:

{
    "image": "<AWS_ACCOUNT>.dkr.ecr.<AWS_REGION>.amazonaws.com/iai_client:<version>",
    "vcpus": 1,
    "memory": 60000,
    "command": [
      "hfl",
      "Ref::task",
      "--token",
      "Ref::token",
      "--session-id",
      "Ref::sessionId",
      "--train-path",
      "Ref::trainPath",
      "--test-path",
      "Ref::testPath",
      "--batch-size",
      "Ref::batchSize",
      "--instruction-polling-time",
      "Ref::pollingTime",
      "--log-interval",
      "Ref::logInterval",
      "--approve-custom-package"
    ],
    "parameters": { "task": "train" },
    "jobRoleArn": "<ECS_JOB_ROLE_ARN>",
    "executionRoleArn": "<ECS_EXECUTION_ROLE_ARN>",
    "volumes": [],
    "mountPoints": [],
    "ulimits": [
      {
        "name": "nofile",
        "hardLimit": 10240,
        "softLimit": 10240
      }
    ],
    "secrets": [{
      "name": "IAI_TOKEN",
      "valueFrom": "<iai-token>"
    }],
    "resourceRequirements": []
}
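
One way to register this definition is through the CLI, with the container properties above saved to a local file (the file name is an example). The definition name matches the one referenced in the SDK User Role policy:

```shell
# Register the job definition from the container properties JSON above
aws batch register-job-definition \
  --job-definition-name iai-client-batch-job \
  --type container \
  --container-properties file://container-props.json
```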

About "secrets"

For the batch job to access the IAI_TOKEN through SSM, the token must be set in the job definition's secrets configuration.

Set the name of the secret to IAI_TOKEN to create the IAI_TOKEN environment variable that the Docker client uses to authenticate with the session.

The valueFrom field identifies the SSM parameter that contains the integrate.ai access token. If you are running the batch job in the same region where the SSM parameter was created, pass in the name of the SSM parameter. If the SSM parameter is in a different region, pass in the SSM parameter ARN.

To have different tokens for different user groups, you must have a different batch job definition for each token.

The command and parameter values are examples only. They are overwritten when starting a batch job with the SDK.

Batch job configuration is now complete.

AWS Batch is now ready to use with the SDK. For more information, see Using AWS Batch with integrate.ai.
