Overview
This guide covers AWS-specific integration for CrewAI Platform deployments on Amazon EKS. It focuses on how CrewAI uses AWS services and the platform-specific configuration required, rather than general AWS setup.
This guide assumes you have:
- An EKS cluster running Kubernetes 1.32.0+
- AWS CLI and kubectl configured
- Helm 3.10+ installed
- Basic familiarity with AWS services (RDS, S3, ALB)
Prerequisites
Before configuring CrewAI Platform, ensure these AWS components are in place:
CrewAI Platform supports AMD64 (x86_64) Kubernetes worker nodes. ARM64 (aarch64) worker nodes are not currently supported. For full platform requirements, see the Requirements Guide.
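Before installing, it is worth confirming that every worker node reports a supported architecture. A minimal sketch (the kubectl command is standard; the helper function and its name are illustrative):

```shell
# List each node's architecture as reported by the kubelet (run against your cluster):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'

# EKS reports x86_64 nodes as "amd64" and Graviton nodes as "arm64".
check_node_arch() {
  case "$1" in
    amd64) echo "supported" ;;
    arm64) echo "unsupported" ;;
    *)     echo "unknown" ;;
  esac
}

check_node_arch amd64   # supported
check_node_arch arm64   # unsupported
```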
Required AWS Infrastructure
| Component | Documentation Link | Notes |
|---|---|---|
| EKS Cluster | AWS EKS Getting Started | Version 1.32.0 or later |
| AWS Load Balancer Controller | AWS LBC Installation | Required for ALB ingress |
| VPC and Subnets | EKS VPC Requirements | Public and private subnets recommended |
Do not proceed with CrewAI installation until these prerequisites are met. The Helm chart will fail to deploy without them.
Amazon Aurora for PostgreSQL
CrewAI Platform requires PostgreSQL 16.8+ for production deployments. This section covers RDS-specific requirements for CrewAI.
Aurora Instance Sizing
Minimum recommended specifications based on CrewAI workload characteristics:
| Deployment Size | RDS Instance Class | vCPU | RAM | Storage |
|---|---|---|---|---|
| Development | db.t3.medium | 2 | 4 GiB | 50 GiB gp3 |
| Small Production | db.r6g.large | 2 | 16 GiB | 100 GiB gp3 |
| Medium Production | db.r6g.xlarge | 4 | 32 GiB | 250 GiB gp3 |
| Large Production | db.r6g.2xlarge | 8 | 64 GiB | 500 GiB gp3 |
CrewAI’s Rails-based architecture benefits from memory-optimized instances (R6g family). Use gp3 storage with minimum 3000 IOPS for production workloads.
Network Connectivity
CrewAI pods must reach your RDS instance. Two options:
Option 1: RDS in Private Subnet (Recommended)
- Place RDS in private subnets within your EKS VPC
- No internet exposure
- Security group allows PostgreSQL (5432) from EKS node security group
**Option 2: RDS with Public Access**
- Enable public accessibility on RDS instance
- Configure security group to allow EKS NAT gateway IPs
- Requires SSL/TLS (enforce sslmode=require)
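For the public-access option, the TLS requirement can be encoded directly in a libpq connection URI. A sketch with placeholder host, user, and database values (these are examples, not chart defaults):

```shell
# Placeholder connection details -- substitute your own RDS endpoint and credentials.
DB_HOST="crewai-prod.cluster-abc123.us-east-1.rds.amazonaws.com"
DB_USER="crewai"
DB_NAME="crewai_plus_production"

# sslmode=require makes the client reject unencrypted connections.
DB_URL="postgresql://${DB_USER}@${DB_HOST}:5432/${DB_NAME}?sslmode=require"
echo "$DB_URL"
```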
Database Setup
CrewAI requires three databases (primary, cable, and OAuth), plus an optional fourth for Wharf tracing:
-- Create required databases
CREATE DATABASE crewai_plus_production;
CREATE DATABASE crewai_plus_cable_production;
CREATE DATABASE crewai_plus_oauth_production;
-- If using Wharf OTLP trace collector (wharf.enabled: true)
CREATE DATABASE wharf;
-- Grant privileges to the CrewAI user
GRANT ALL PRIVILEGES ON DATABASE crewai_plus_production TO crewai;
GRANT ALL PRIVILEGES ON DATABASE crewai_plus_cable_production TO crewai;
GRANT ALL PRIVILEGES ON DATABASE crewai_plus_oauth_production TO crewai;
GRANT ALL PRIVILEGES ON DATABASE wharf TO crewai;
When using an external RDS instance, you must manually create these databases before deploying CrewAI Platform. The Helm chart does not automatically create databases when postgres.enabled: false.
Helm Configuration
# Disable internal PostgreSQL
postgres:
enabled: false
envVars:
DB_HOST: "crewai-prod.cluster-abc123.us-east-1.rds.amazonaws.com"
DB_PORT: "5432"
DB_USER: "crewai"
secrets:
DB_PASSWORD: "your-secure-password"
Amazon S3 for Object Storage
CrewAI Platform uses S3 for storing crew artifacts, tool outputs, and user uploads. This section covers S3 integration and authentication.
S3 Bucket Configuration
# Create S3 bucket for CrewAI
aws s3api create-bucket \
--bucket crewai-prod-storage \
--region us-east-1
# Enable versioning for data protection
aws s3api put-bucket-versioning \
--bucket crewai-prod-storage \
--versioning-configuration Status=Enabled
# Enable default encryption
aws s3api put-bucket-encryption \
--bucket crewai-prod-storage \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}]
}'
S3 Authentication Options
CrewAI supports three authentication methods for S3. Choose based on your security requirements:
| Method | OIDC Provider Required | Credential Rotation | Setup Complexity | Best For |
|---|---|---|---|---|
| Pod Identity | No | Automatic | Low | New EKS 1.24+ clusters |
| IRSA | Yes | Automatic | Medium | Existing clusters with OIDC |
| Static Keys | No | Manual | Low | Development only |
Option 1: Pod Identity (Recommended - Newest)
Best for: New EKS deployments (EKS 1.24+), highest security
Pod Identity provides credentials without OIDC configuration or static keys.
Benefits:
- No long-lived credentials
- Simplified setup vs IRSA — does not require an OIDC provider on the EKS cluster
- Automatic credential rotation
- Native EKS integration
Setup Steps:
- Create IAM policy for S3 access:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::crewai-prod-storage",
        "arn:aws:s3:::crewai-prod-storage/*"
      ]
    }
  ]
}
- Create IAM role and associate with Pod Identity:
# Create IAM role
aws iam create-role \
--role-name CrewAIPodIdentityRole \
--assume-role-policy-document '{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {
"Service": "pods.eks.amazonaws.com"
},
"Action": ["sts:AssumeRole", "sts:TagSession"]
}]
}'
# Attach S3 policy
aws iam attach-role-policy \
--role-name CrewAIPodIdentityRole \
--policy-arn arn:aws:iam::ACCOUNT:policy/CrewAIS3Access
# Create Pod Identity Association
# The service account name must match the one used by CrewAI pods.
# With rbac.create: true (default), the chart creates "crewai-sa".
aws eks create-pod-identity-association \
--cluster-name your-cluster \
--namespace crewai \
--service-account crewai-sa \
--role-arn arn:aws:iam::ACCOUNT:role/CrewAIPodIdentityRole
Helm Configuration:
envVars:
  STORAGE_SERVICE: "amazon"
  AWS_REGION: "us-east-1"
  AWS_BUCKET: "crewai-prod-storage"

# rbac.create: true (default) auto-creates a ServiceAccount named crewai-sa
rbac:
  create: true
When rbac.create: true (the default), the chart automatically creates a ServiceAccount named crewai-sa. The Pod Identity association must reference this name. You do not need to set serviceAccount.name explicitly.
Option 2: Static Access Keys
Best for: Development environments, non-EKS Kubernetes clusters
Not recommended for production. Use Pod Identity or IRSA instead.
envVars:
  STORAGE_SERVICE: "amazon"
  AWS_REGION: "us-east-1"
  AWS_BUCKET: "crewai-prod-storage"

secrets:
  AWS_ACCESS_KEY_ID: "AKIAIOSFODNN7EXAMPLE"
  AWS_SECRET_ACCESS_KEY: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
Option 3: IAM Roles for Service Accounts (IRSA)
Best for: Existing EKS clusters with OIDC provider already configured
IRSA binds an IAM role to a Kubernetes ServiceAccount via an OIDC trust relationship. Unlike Pod Identity, IRSA requires an OIDC provider on the EKS cluster.
Prerequisites:
- An IAM OIDC identity provider associated with your EKS cluster
Setup Steps:
- Get your OIDC provider URL:
aws eks describe-cluster --name your-cluster \
--query "cluster.identity.oidc.issuer" --output text
# Returns: https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890
- Create IAM role with OIDC trust relationship:
# Extract the OIDC ID from the URL
OIDC_ID=$(aws eks describe-cluster --name your-cluster \
--query "cluster.identity.oidc.issuer" --output text | sed 's|https://||')
aws iam create-role \
--role-name CrewAIIRSARole \
--assume-role-policy-document '{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT:oidc-provider/'$OIDC_ID'"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"'$OIDC_ID':sub": "system:serviceaccount:crewai:crewai-sa"
}
}
}]
}'
# Attach S3 policy
aws iam attach-role-policy \
--role-name CrewAIIRSARole \
--policy-arn arn:aws:iam::ACCOUNT:policy/CrewAIS3Access
The Condition in the trust relationship must match the exact namespace and ServiceAccount name used by CrewAI pods. With rbac.create: true (default), the ServiceAccount is named crewai-sa. Adjust the namespace (crewai in the example) to match your deployment namespace.
- Configure Helm values with the IRSA annotation:
Helm Configuration:
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/CrewAIIRSARole

envVars:
  STORAGE_SERVICE: "amazon"
  AWS_REGION: "us-east-1"
  AWS_BUCKET: "crewai-prod-storage"

rbac:
  create: true
The same IAM role can grant both S3 and ECR permissions. See Combined IAM Policy for a single role covering all AWS services.
Application Load Balancer (ALB)
CrewAI Platform requires specific ALB configuration to support long-running crew executions and WebSocket connections.
CrewAI-Specific ALB Requirements
CrewAI’s architecture has specific needs:
- Long-running requests: Crew executions can take 5+ minutes
- WebSocket support: ActionCable requires persistent connections
- Session affinity: Not required (stateless application)
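Because crew executions can exceed the ALB's default 60-second idle timeout, raise the timeout via an ingress annotation. A hedged sketch (the 300-second value is an example to tune, not a chart default):

```yaml
web:
  ingress:
    annotations:
      # idle_timeout is a load balancer attribute; 300s covers most long-running crew requests
      alb.ingress.kubernetes.io/load-balancer-attributes: "idle_timeout.timeout_seconds=300"
```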
ALB Security Group Configuration
The ALB security group should allow:
- Inbound: HTTPS (443) from your allowed CIDR ranges (e.g., 0.0.0.0/0 for public access)
- Outbound: HTTP to EKS worker node security group on NodePort range
EKS worker node security group should allow:
- Inbound: HTTP from ALB security group
ACM Certificate
CrewAI requires a valid SSL certificate:
# Request certificate (DNS validation recommended)
aws acm request-certificate \
--domain-name crewai.your-company.com \
--validation-method DNS \
--region us-east-1
# Note the certificate ARN for use in Helm values
Internet-Facing vs Internal ALB
The scheme setting controls whether the ALB is publicly accessible or restricted to your internal network.
The scheme value is case-sensitive. The AWS Load Balancer Controller only accepts lowercase values. Using "Internal" or "Internet-Facing" (with capital letters) will cause the ALB to silently fail to provision.
| Scheme | Value | Use Case |
|---|---|---|
| Internet-facing | internet-facing | Public access from the internet |
| Internal | internal | Private access within your VPC only |
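A tiny pre-deploy check for the scheme value can catch the casing mistake before the controller silently ignores the ingress (the helper name is illustrative):

```shell
# Only the two lowercase spellings are accepted by the AWS Load Balancer Controller.
check_alb_scheme() {
  case "$1" in
    internal|internet-facing) echo "ok" ;;
    *) echo "invalid: use lowercase internal or internet-facing" ;;
  esac
}

check_alb_scheme "internet-facing"   # ok
check_alb_scheme "Internal"          # invalid
```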
Internet-facing ALB (default):
web:
  ingress:
    enabled: true
    className: "alb"
    host: "crewai.company.com"
    alb:
      scheme: "internet-facing"
      targetType: "ip"
      certificateArn: "arn:aws:acm:REGION:ACCOUNT:certificate/CERTIFICATE_ID"
      sslPolicy: "ELBSecurityPolicy-TLS-1-2-2017-01"
Internal ALB:
web:
  ingress:
    enabled: true
    className: "alb"
    host: "crewai.internal.company.com"
    alb:
      scheme: "internal"
      targetType: "ip"
      certificateArn: "arn:aws:acm:REGION:ACCOUNT:certificate/CERTIFICATE_ID"
      sslPolicy: "ELBSecurityPolicy-TLS-1-2-2017-01"
ALB Subnet Selection
The AWS Load Balancer Controller uses subnet tags to automatically discover which subnets to place the ALB in. If the wrong subnets are tagged, the ALB will be created in unintended subnets.
Required subnet tags:
| ALB Scheme | Required Tag | Tag Value |
|---|---|---|
| internet-facing | kubernetes.io/role/elb | 1 |
| internal | kubernetes.io/role/internal-elb | 1 |
Ensure only the subnets where you want the ALB placed have the appropriate tag. Remove the tag from any subnets that should not host the ALB.
Verify current subnet tags:
aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=YOUR_VPC_ID" \
--query 'Subnets[*].{SubnetId:SubnetId,AZ:AvailabilityZone,CIDR:CidrBlock,Tags:Tags[?Key==`kubernetes.io/role/internal-elb` || Key==`kubernetes.io/role/elb`]}' \
--output table
Tag the correct EKS subnets (internal ALB example):
aws ec2 create-tags \
--resources subnet-EXAMPLE1 subnet-EXAMPLE2 \
--tags Key=kubernetes.io/role/internal-elb,Value=1
Remove tags from incorrect subnets:
aws ec2 delete-tags \
--resources subnet-WRONG1 subnet-WRONG2 \
--tags Key=kubernetes.io/role/internal-elb
Explicit Subnet Selection
If you cannot modify subnet tags (e.g., shared VPC environments), you can explicitly specify which subnets the ALB should use via the web.ingress.annotations field:
web:
  ingress:
    enabled: true
    className: "alb"
    host: "crewai.internal.company.com"
    annotations:
      alb.ingress.kubernetes.io/subnets: "subnet-abc123,subnet-def456"
    alb:
      scheme: "internal"
      targetType: "ip"
      certificateArn: "arn:aws:acm:REGION:ACCOUNT:certificate/CERTIFICATE_ID"
      sslPolicy: "ELBSecurityPolicy-TLS-1-2-2017-01"
When using explicit subnet annotations, the controller bypasses tag-based auto-discovery entirely for that ingress resource.
ALB Security Features
You can enable additional security features on the ALB using web.ingress.annotations. These annotations are passed directly to the AWS Load Balancer Controller.
Deletion Protection
Prevents accidental deletion of the ALB:
web:
  ingress:
    annotations:
      alb.ingress.kubernetes.io/load-balancer-attributes: "deletion_protection.enabled=true"
With deletion protection enabled, helm uninstall and kubectl delete ingress will not be able to remove the ALB. You must first disable deletion protection manually before teardown:
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn YOUR_ALB_ARN \
  --attributes Key=deletion_protection.enabled,Value=false
Access Logs
Enables logging of all requests to an S3 bucket:
web:
  ingress:
    annotations:
      alb.ingress.kubernetes.io/load-balancer-attributes: "access_logs.s3.enabled=true,access_logs.s3.bucket=YOUR_LOG_BUCKET,access_logs.s3.prefix=crewai-alb"
The S3 bucket must have a bucket policy that allows the Elastic Load Balancing service to write logs. See AWS documentation on access log bucket requirements for the required policy and the ELB account ID for your region.
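As a sketch, a bucket policy for us-east-1 might look like the following; 127311923021 is the documented ELB log-delivery account for us-east-1, other regions use different account IDs, and regions launched after August 2022 use the logdelivery.elasticloadbalancing.amazonaws.com service principal instead. Verify the principal for your region against the AWS docs before applying:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::127311923021:root" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::YOUR_LOG_BUCKET/crewai-alb/AWSLogs/YOUR_ACCOUNT_ID/*"
    }
  ]
}
```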
WAF Integration
Attach an AWS WAFv2 Web ACL to protect the ALB:
web:
  ingress:
    annotations:
      alb.ingress.kubernetes.io/wafv2-acl-arn: "arn:aws:wafv2:REGION:ACCOUNT:regional/webacl/WEB_ACL_NAME/WEB_ACL_ID"
The Web ACL must already exist in the same region as the ALB. List available Web ACLs with:
aws wafv2 list-web-acls --scope REGIONAL --region YOUR_REGION \
--query 'WebACLs[*].{Name:Name,ARN:ARN}' --output table
Combined Example
To enable all three features together, combine the load-balancer-attributes values in a single annotation:
web:
  ingress:
    enabled: true
    className: "alb"
    host: "crewai.internal.company.com"
    annotations:
      alb.ingress.kubernetes.io/load-balancer-attributes: "deletion_protection.enabled=true,access_logs.s3.enabled=true,access_logs.s3.bucket=crewai-alb-logs,access_logs.s3.prefix=crewai"
      alb.ingress.kubernetes.io/wafv2-acl-arn: "arn:aws:wafv2:REGION:ACCOUNT:regional/webacl/WEB_ACL_NAME/WEB_ACL_ID"
    alb:
      scheme: "internal"
      targetType: "ip"
      certificateArn: "arn:aws:acm:REGION:ACCOUNT:certificate/CERTIFICATE_ID"
      sslPolicy: "ELBSecurityPolicy-TLS-1-2-2017-01"
Pre-existing Security Groups
By default, the AWS Load Balancer Controller automatically creates a security group for the ALB. In environments with restrictive AWS IAM policies that forbid automatic security group creation, you can use a pre-existing security group instead.
Using a Pre-existing Security Group
Specify your existing security group ID via annotation:
web:
  ingress:
    enabled: true
    className: "alb"
    host: "crewai.company.com"
    annotations:
      alb.ingress.kubernetes.io/security-groups: "sg-0123456789abcdef0"
    alb:
      scheme: "internet-facing"
      targetType: "ip"
      certificateArn: "arn:aws:acm:REGION:ACCOUNT:certificate/CERTIFICATE_ID"
You can specify multiple security groups as a comma-separated list:
annotations:
  alb.ingress.kubernetes.io/security-groups: "sg-0123456789abcdef0,sg-abcdef0123456789a"
Disabling Backend Security Group Management
By default, the AWS Load Balancer Controller also modifies the backend (node/pod) security groups to allow inbound traffic from the ALB. If your AWS policies also forbid modifying existing security groups, disable this behavior:
web:
  ingress:
    annotations:
      alb.ingress.kubernetes.io/security-groups: "sg-0123456789abcdef0"
      alb.ingress.kubernetes.io/manage-backend-security-group-rules: "false"
When manage-backend-security-group-rules is set to false, you must manually configure security group rules to allow traffic from the ALB to your EKS worker nodes or pods.
Required manual rule on node/pod security group:
- Type: Custom TCP
- Port: Target port (default: 443 for CrewAI)
- Source: ALB security group ID (e.g., sg-0123456789abcdef0)
Pre-existing Security Group Requirements
Your pre-existing security group must allow the following traffic:
| Direction | Port | Source/Destination | Purpose |
|---|---|---|---|
| Inbound | 443 | Your allowed CIDRs (e.g., 0.0.0.0/0) | HTTPS traffic to ALB |
| Inbound | 80 | Your allowed CIDRs | HTTP traffic (redirects to HTTPS) |
| Outbound | Target port | EKS node security group | Traffic to backend pods |
Complete Example for Restrictive Environments
For environments where both security group creation and modification are forbidden:
web:
  ingress:
    enabled: true
    className: "alb"
    host: "crewai.company.com"
    annotations:
      # Use pre-existing security group
      alb.ingress.kubernetes.io/security-groups: "sg-0123456789abcdef0"
      # Don't modify backend security groups
      alb.ingress.kubernetes.io/manage-backend-security-group-rules: "false"
      # Optionally specify subnets if tag-based discovery is also restricted
      alb.ingress.kubernetes.io/subnets: "subnet-abc123,subnet-def456"
    alb:
      scheme: "internal"
      targetType: "ip"
      certificateArn: "arn:aws:acm:REGION:ACCOUNT:certificate/CERTIFICATE_ID"
      sslPolicy: "ELBSecurityPolicy-TLS-1-2-2017-01"
Amazon ECR for Container Images
CrewAI Platform requires Amazon ECR for storing crew automation container images. When users create and deploy crews, CrewAI builds container images and pushes them to ECR.
ECR Repository Requirements
Critical Requirements:
- Repository URI must end in /crewai-enterprise
- Immutable tags must be disabled (CrewAI overwrites tags for crew versions)
- Lifecycle policies recommended to manage old images
Create ECR Repository
# Create ECR repository with correct naming
aws ecr create-repository \
--repository-name your-org/crewai-enterprise \
--region us-east-1 \
--image-scanning-configuration scanOnPush=true
# Disable immutable tags (required for CrewAI)
aws ecr put-image-tag-mutability \
--repository-name your-org/crewai-enterprise \
--image-tag-mutability MUTABLE \
--region us-east-1
# Optional: Set lifecycle policy to clean up untagged images
aws ecr put-lifecycle-policy \
--repository-name your-org/crewai-enterprise \
--lifecycle-policy-text '{
"rules": [{
"rulePriority": 1,
"description": "Remove untagged images after 7 days",
"selection": {
"tagStatus": "untagged",
"countType": "sinceImagePushed",
"countUnit": "days",
"countNumber": 7
},
"action": {"type": "expire"}
}]
}'
Valid repository URIs:
- ✅ 123456789012.dkr.ecr.us-east-1.amazonaws.com/crewai-enterprise
- ✅ 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-org/crewai-enterprise
- ✅ 123456789012.dkr.ecr.us-east-1.amazonaws.com/prod/crewai-enterprise
- ❌ 123456789012.dkr.ecr.us-east-1.amazonaws.com/crewai (must end in /crewai-enterprise)
- ❌ 123456789012.dkr.ecr.us-east-1.amazonaws.com/crewai-platform (wrong suffix)
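The suffix rule can be checked locally before creating the repository. A small helper (the function name is illustrative; no AWS access required):

```shell
# Returns "valid" only when the repository URI ends in /crewai-enterprise.
validate_crew_repo_uri() {
  case "$1" in
    */crewai-enterprise) echo "valid" ;;
    *)                   echo "invalid" ;;
  esac
}

validate_crew_repo_uri "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-org/crewai-enterprise"   # valid
validate_crew_repo_uri "123456789012.dkr.ecr.us-east-1.amazonaws.com/crewai-platform"            # invalid
```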
ECR Authentication with Pod Identity
CrewAI pods require ECR push and pull permissions for building and deploying crew images.
Create IAM policy for ECR access:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:PutImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload"
      ],
      "Resource": "arn:aws:ecr:us-east-1:ACCOUNT:repository/*/crewai-enterprise"
    }
  ]
}
Attach ECR policy to Pod Identity role:
# Create ECR policy
aws iam create-policy \
--policy-name CrewAIECRAccess \
--policy-document file://ecr-policy.json
# Attach to existing Pod Identity role
aws iam attach-role-policy \
--role-name CrewAIPodIdentityRole \
--policy-arn arn:aws:iam::ACCOUNT:policy/CrewAIECRAccess
Combined IAM Policy (S3 + ECR)
For production deployments using Pod Identity, combine S3 and ECR permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3Access",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::crewai-prod-storage",
        "arn:aws:s3:::crewai-prod-storage/*"
      ]
    },
    {
      "Sid": "ECRAuthToken",
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ECRPushPull",
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:PutImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload"
      ],
      "Resource": "arn:aws:ecr:us-east-1:ACCOUNT:repository/*/crewai-enterprise"
    }
  ]
}
Helm Configuration for ECR
envVars:
  # ECR registry configuration (REQUIRED - deployment will fail if not set)
  CREW_IMAGE_REGISTRY_OVERRIDE: "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-org"
  # Note: The /crewai-enterprise suffix is added automatically by CrewAI Platform
  # The Helm chart validates this field is set before deployment

  # S3 configuration
  STORAGE_SERVICE: "amazon"
  AWS_REGION: "us-east-1"
  AWS_BUCKET: "crewai-prod-storage"

# rbac.create: true (default) auto-creates ServiceAccount crewai-sa for Pod Identity
rbac:
  create: true
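Since the chart appends the /crewai-enterprise suffix itself, the repository that must exist in ECR can be derived from the override value. A quick sketch, assuming the suffix is appended verbatim as the note above states:

```shell
CREW_IMAGE_REGISTRY_OVERRIDE="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-org"

# The platform pushes crew images to <override>/crewai-enterprise.
FINAL_REPO="${CREW_IMAGE_REGISTRY_OVERRIDE}/crewai-enterprise"
echo "$FINAL_REPO"

# The matching pre-created ECR repository name (registry host stripped):
REPO_NAME="${FINAL_REPO#*/}"
echo "$REPO_NAME"
```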
ECR Authentication: credHelper vs Pod Identity / IRSA
When mirroring platform images to ECR and using image.registries[].credHelper: "ecr-login", be aware of an important distinction:
credHelper does not work for kubelet image pulls. Kubernetes kubelet only supports static auths entries from kubernetes.io/dockerconfigjson secrets — it does not execute credential helper binaries. The credHelper configuration is only effective for BuildKit, which mounts the docker config directly and can invoke the helper binary (if present in the BuildKit image).
For kubelet to pull platform images from ECR, use one of these approaches:
- Node-level ECR auth via EC2 instance profile (EKS worker nodes do this automatically for same-account ECR registries)
- A CronJob or controller (e.g., ecr-credential-helper) that periodically refreshes static tokens in the image pull secret
- Pod Identity or IRSA on the node (for cross-account ECR access)
For crew image builds and pushes (via BuildKit), Pod Identity or IRSA on the crewai-sa ServiceAccount is the recommended approach. BuildKit inherits the pod’s IAM credentials and can authenticate to ECR without static tokens.
Verifying ECR Access
Test ECR authentication from CrewAI pods:
# Check if pod can authenticate to ECR
kubectl exec -it deploy/crewai-web -- aws ecr get-login-password --region us-east-1
# Test ECR push permissions (requires BuildKit)
kubectl exec -it deploy/crewai-buildkit -- \
  buildctl debug workers
AWS Secrets Manager Integration
AWS Secrets Manager provides centralized secret management with automatic rotation for CrewAI Platform.
Which Secrets to Store
Store in AWS Secrets Manager (sensitive, need rotation):
- DB_PASSWORD - Database credentials
- SECRET_KEY_BASE - Rails secret key
- ENTRA_ID_CLIENT_SECRET / OKTA_CLIENT_SECRET - OAuth secrets
- AWS_SECRET_ACCESS_KEY - If using static S3 credentials
- GITHUB_TOKEN - For private repository access
Keep in values.yaml (configuration, not secrets):
- DB_HOST, DB_PORT, DB_USER, POSTGRES_DB, POSTGRES_CABLE_DB
- AWS_REGION, AWS_BUCKET
- APPLICATION_HOST
- AUTH_PROVIDER
Secret Structure in Secrets Manager
CrewAI expects secrets in specific formats. Two options:
Option 1: Single Secret with Multiple Keys
Create one secret crewai/platform with JSON structure:
{
  "DB_PASSWORD": "your-db-password",
  "SECRET_KEY_BASE": "your-secret-key-base",
  "ENTRA_ID_CLIENT_ID": "your-client-id",
  "ENTRA_ID_CLIENT_SECRET": "your-client-secret",
  "ENTRA_ID_TENANT_ID": "your-tenant-id"
}
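Before writing the secret, it can help to confirm the payload is valid JSON and contains the keys the chart expects. A sketch using python3 (the required-key list is illustrative, drawn from the example above):

```shell
# Example payload; replace the values with real credentials before storing.
SECRET_JSON='{"DB_PASSWORD":"x","SECRET_KEY_BASE":"y","ENTRA_ID_CLIENT_ID":"a","ENTRA_ID_CLIENT_SECRET":"b","ENTRA_ID_TENANT_ID":"c"}'

echo "$SECRET_JSON" | python3 -c '
import json, sys
required = {"DB_PASSWORD", "SECRET_KEY_BASE", "ENTRA_ID_CLIENT_SECRET"}
data = json.load(sys.stdin)           # fails loudly on invalid JSON
missing = sorted(required - data.keys())
print("missing: %s" % missing if missing else "ok")
'
```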
Option 2: Separate Secrets
Create individual secrets:
- crewai/db-password
- crewai/secret-key-base
- crewai/entra-id-credentials (JSON with client_id, client_secret, tenant_id)
External Secrets Operator Setup
CrewAI uses External Secrets Operator (ESO) to sync secrets from AWS Secrets Manager to Kubernetes.
Install ESO (if not already installed):
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets \
external-secrets/external-secrets \
--namespace external-secrets-operator \
--create-namespace
Create IAM Policy for ESO:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:crewai/*"
    }
  ]
}
Helm Configuration:
# Enable external secret store
externalSecret:
enabled: true
secretStore: "crewai-secret-store"
secretPath: "crewai/platform" # Path to your Secrets Manager secret
# Control which secrets to sync
includes_aws_credentials: false # Set true if S3 credentials in Secrets Manager
includes_azure_credentials: false
# Configure SecretStore resource
secretStore:
enabled: true
provider: "aws"
aws:
region: "us-east-1"
# Use IRSA for ESO authentication (recommended)
auth:
serviceAccount:
enabled: true
name: "crewai-secrets-reader"
# Annotate with IAM role ARN after deployment:
# kubectl annotate serviceaccount crewai-secrets-reader \
# eks.amazonaws.com/role-arn=arn:aws:iam::ACCOUNT:role/CrewAISecretsReader
Secret Mapping Example
If using single secret with JSON structure:
externalSecret:
  enabled: true
  secretStore: "crewai-secret-store"
  secretPath: "crewai/platform"

# CrewAI will automatically map these from JSON keys:
# DB_PASSWORD -> crewai/platform:DB_PASSWORD
# SECRET_KEY_BASE -> crewai/platform:SECRET_KEY_BASE
# ENTRA_ID_CLIENT_ID -> crewai/platform:ENTRA_ID_CLIENT_ID
# etc.
Complete AWS Deployment Example
Here’s a complete production configuration for AWS:
# values-aws-production.yaml

# Disable internal services (use AWS managed services)
postgres:
  enabled: false
minio:
  enabled: false

# Database configuration (RDS)
envVars:
  DB_HOST: "crewai-prod.cluster-abc123.us-east-1.rds.amazonaws.com"
  DB_PORT: "5432"
  DB_USER: "crewai"
  POSTGRES_DB: "crewai_plus_production"
  POSTGRES_CABLE_DB: "crewai_plus_cable_production"
  RAILS_MAX_THREADS: "5"
  DB_POOL: "5"

  # S3 configuration (using Pod Identity, no credentials needed)
  STORAGE_SERVICE: "amazon"
  AWS_REGION: "us-east-1"
  AWS_BUCKET: "crewai-prod-storage"

  # ECR configuration (REQUIRED - using Pod Identity, no credentials needed)
  CREW_IMAGE_REGISTRY_OVERRIDE: "123456789012.dkr.ecr.us-east-1.amazonaws.com/production"
  # Note: /crewai-enterprise suffix is added automatically
  # Chart validates this field is set before deployment

  # Application configuration
  APPLICATION_HOST: "crewai.company.com"
  AUTH_PROVIDER: "entra_id"
  RAILS_LOG_LEVEL: "info"

# External secrets from AWS Secrets Manager
externalSecret:
  enabled: true
  secretStore: "crewai-secret-store"
  secretPath: "crewai/platform"
  includes_aws_credentials: false # Using Pod Identity

secretStore:
  enabled: true
  provider: "aws"
  aws:
    region: "us-east-1"
    auth:
      serviceAccount:
        enabled: true
        name: "crewai-secrets-reader"

# Web application configuration
web:
  replicaCount: 3 # HA deployment
  resources:
    requests:
      cpu: "1000m"
      memory: "6Gi"
    limits:
      cpu: "6"
      memory: "12Gi"
  # ALB ingress (scheme, targetType, certificateArn, sslPolicy are set via alb.*)
  ingress:
    enabled: true
    className: "alb"
    host: "crewai.company.com"
    # Additional annotations beyond what alb.* provides
    # Do NOT duplicate scheme/targetType/certificateArn here -- they are set by alb.* below
    annotations:
      alb.ingress.kubernetes.io/ssl-redirect: "443"
      # idle_timeout is a load balancer attribute (not a target group attribute)
      alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=300
      alb.ingress.kubernetes.io/healthcheck-path: /up
      alb.ingress.kubernetes.io/tags: Environment=production,Application=crewai
    alb:
      scheme: "internet-facing"
      targetType: "ip"
      certificateArn: "arn:aws:acm:us-east-1:123456789012:certificate/abc-123"
      sslPolicy: "ELBSecurityPolicy-TLS13-1-2-2021-06"

# Worker configuration
worker:
  replicaCount: 3
  resources:
    requests:
      cpu: "1000m"
      memory: "6Gi"
    limits:
      cpu: "6"
      memory: "12Gi"

# BuildKit for crew builds
buildkit:
  enabled: true
  replicaCount: 1
  resources:
    requests:
      cpu: "500m"
      memory: "2Gi"
    limits:
      cpu: "4"
      memory: "8Gi"

# RBAC - auto-creates ServiceAccount crewai-sa for Pod Identity / IRSA
rbac:
  create: true
# For IRSA, add the role annotation:
# serviceAccount:
#   annotations:
#     eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/crewai-platform
Deploy:
# Deploy CrewAI Platform
helm install crewai-platform \
oci://registry.crewai.com/crewai/stable/crewai-platform \
--values values-aws-production.yaml \
--namespace crewai
Troubleshooting AWS-Specific Issues
ALB Not Provisioning
Symptoms: Ingress shows no ADDRESS after several minutes
kubectl get ingress --namespace crewai
# NAME CLASS HOSTS ADDRESS PORTS AGE
# crewai-ingress alb crewai.company.com 80, 443 5m
Common causes:
- Incorrect scheme casing — The scheme value is case-sensitive. Use "internal" or "internet-facing" (all lowercase). Values like "Internal" or "Internet-Facing" are silently rejected. See Internet-Facing vs Internal ALB.
- AWS Load Balancer Controller not installed or not running
- Insufficient IAM permissions for LBC
- Subnet tags missing for ALB discovery
Check LBC status:
kubectl get deployment -n kube-system aws-load-balancer-controller
kubectl logs -n kube-system deployment/aws-load-balancer-controller
Verify subnet tags (required for ALB):
- Public subnets: kubernetes.io/role/elb=1
- Private subnets: kubernetes.io/role/internal-elb=1
ALB Created in Wrong Subnets
Symptoms: ALB is provisioned but placed in subnets outside the EKS cluster VPC or in unintended subnets.
Common causes:
- Multiple subnets tagged with kubernetes.io/role/elb or kubernetes.io/role/internal-elb across different VPCs or availability zones
- Shared VPC where non-EKS subnets also carry the discovery tag
Resolution:
- Identify which subnets the EKS cluster uses:
aws eks describe-cluster --name YOUR_CLUSTER \
--query 'cluster.resourcesVpcConfig.subnetIds' --output table
- Remove discovery tags from non-EKS subnets and ensure only the correct subnets are tagged. See ALB Subnet Selection for details.
- Alternatively, use the explicit subnet annotation to bypass auto-discovery:
web:
  ingress:
    annotations:
      alb.ingress.kubernetes.io/subnets: "subnet-abc123,subnet-def456"
- After fixing tags or adding the annotation, delete the ingress and re-deploy to force the LBC to re-discover subnets.
RDS Connection Timeout
Symptoms: Pods show could not connect to server: Connection timed out
Check security groups:
# Verify RDS security group allows inbound from EKS worker nodes
aws ec2 describe-security-groups --group-ids sg-xxxxx
# Check EKS node security group
aws ec2 describe-security-groups --filters "Name=tag:aws:eks:cluster-name,Values=your-cluster"
Test connectivity from pod:
kubectl run -it --namespace crewai --rm debug --image=postgres:16 --restart=Never -- \
psql -h crewai-prod.cluster-abc123.us-east-1.rds.amazonaws.com -U crewai -d crewai_plus_production
S3 Access Denied
Symptoms: Logs show Access Denied or 403 errors for S3 operations
Verify authentication method:
For Pod Identity:
# Check Pod Identity Agent is running
kubectl get daemonset -n kube-system eks-pod-identity-agent
# List associations
aws eks list-pod-identity-associations --cluster-name your-cluster
For IRSA:
# Verify service account annotation
kubectl get serviceaccount crewai-sa -o yaml | grep eks.amazonaws.com/role-arn
# Test from pod
kubectl exec -it deploy/crewai-web -- aws sts get-caller-identity
kubectl exec -it deploy/crewai-web -- aws s3 ls s3://crewai-prod-storage/
For Static Keys:
# Verify secrets exist
kubectl get secret crewai-secrets -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d
Secrets Manager Access Denied
Symptoms: ExternalSecret shows SecretSyncedError
# Check ExternalSecret status
kubectl get externalsecret
kubectl describe externalsecret crewai-external-secret
# Check SecretStore status
kubectl get secretstore
kubectl describe secretstore crewai-secret-store
# Verify ESO can assume role
kubectl logs -n external-secrets-operator deployment/external-secrets
ECR Builds Fail After Running Successfully
Symptoms: Crew builds worked initially but start failing with unauthorized or denied errors after several hours
Common causes:
- ECR auth token expiration — ECR tokens obtained via aws ecr get-login-password expire after 12 hours. If you configured image.registries with static username/password (using an ECR token), builds will fail once the token expires.
- credHelper binary not present — If using credHelper: "ecr-login" in image.registries, the ecr-login binary must be available in the BuildKit image. Without it, the docker config references a helper that doesn’t exist, causing silent auth failures.
Resolution:
The recommended approach is to use Pod Identity or IRSA instead of static ECR tokens. With Pod Identity or IRSA, credentials are refreshed automatically, so builds never fail due to token expiry:
# Verify Pod Identity association exists and targets the correct service account
aws eks list-pod-identity-associations --cluster-name your-cluster \
--query 'associations[?serviceAccount==`crewai-sa`]'
# Or verify IRSA annotation on the service account
kubectl get serviceaccount crewai-sa -n crewai -o yaml | grep eks.amazonaws.com/role-arn
If you must use static tokens, set up a CronJob to refresh them before expiration:
# Manually refresh the ECR token (valid for 12 hours)
ECR_TOKEN=$(aws ecr get-login-password --region us-east-1)
kubectl create secret docker-registry docker-registry \
--docker-server=123456789012.dkr.ecr.us-east-1.amazonaws.com \
--docker-username=AWS \
--docker-password="$ECR_TOKEN" \
--namespace crewai \
--dry-run=client -o yaml | kubectl apply -f -
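One way to automate that refresh is a CronJob that re-runs the commands above on a schedule shorter than the 12-hour token lifetime. A sketch, assuming a ServiceAccount with ECR IAM access and RBAC permission to manage the secret, and a container image bundling both the AWS CLI and kubectl (all names here are placeholders, not chart defaults):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ecr-token-refresh
  namespace: crewai
spec:
  schedule: "0 */8 * * *"   # every 8h, well inside the 12h token lifetime
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: ecr-token-refresher   # needs IAM (ECR) and RBAC (secrets) rights
          restartPolicy: OnFailure
          containers:
            - name: refresh
              image: your-registry/aws-kubectl:latest   # placeholder: any image with aws + kubectl
              command:
                - /bin/sh
                - -c
                - |
                  TOKEN=$(aws ecr get-login-password --region us-east-1)
                  kubectl create secret docker-registry docker-registry \
                    --docker-server=123456789012.dkr.ecr.us-east-1.amazonaws.com \
                    --docker-username=AWS \
                    --docker-password="$TOKEN" \
                    --namespace crewai \
                    --dry-run=client -o yaml | kubectl apply -f -
```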
BuildKit Cannot Push to ECR
Symptoms: BuildKit logs show failed to push or unauthorized: authentication required when building crew images
Diagnosis:
# Check BuildKit logs
kubectl logs -l app.kubernetes.io/component=buildkit --tail=100
# Verify the docker config secret mounted in BuildKit
kubectl get secret docker-registry -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | python3 -m json.tool
Common causes:
- Missing ECR permissions — The IAM role must include ecr:PutImage, ecr:InitiateLayerUpload, ecr:UploadLayerPart, and ecr:CompleteLayerUpload. See Combined IAM Policy.
- ECR repository doesn’t exist — The repository path ending in /crewai-enterprise must be pre-created in ECR. CrewAI does not auto-create ECR repositories.
- Pod Identity / IRSA not reaching BuildKit — Ensure the Pod Identity association or IRSA trust relationship references the correct ServiceAccount (crewai-sa) and namespace.
# Verify the ECR repository exists
aws ecr describe-repositories \
--repository-names "my-org/crewai-enterprise" \
--region us-east-1
# Test ECR auth from the BuildKit pod
kubectl exec -it deploy/crewai-buildkit -- \
buildctl debug workers
ServiceAccount Mismatch with Pod Identity or IRSA
Symptoms: Pods show AccessDeniedException, ExpiredTokenException, or WebIdentityErr despite correct IAM policies
Common causes:
- Wrong ServiceAccount name in association/trust — Pod Identity associations and IRSA trust relationships must reference the exact ServiceAccount name (crewai-sa by default) and namespace. A mismatch means pods get no IAM credentials.
- Namespace mismatch — The namespace in the Pod Identity association or IRSA trust policy must match the Helm release namespace.
Diagnosis:
# Check which ServiceAccount the pods are using
kubectl get pods -l app.kubernetes.io/component=web -o jsonpath='{.items[0].spec.serviceAccountName}'
# Verify Pod Identity association matches
aws eks list-pod-identity-associations --cluster-name your-cluster
# For IRSA: check the trust relationship
aws iam get-role --role-name YOUR_ROLE_NAME \
--query 'Role.AssumeRolePolicyDocument' --output json