AWS CDK for EKS: Falling Short in Real-World, Multi-Account Kubernetes Deployments

AWS Cloud Development Kit (CDK) aims to simplify cloud infrastructure provisioning using familiar programming languages. While its EKS module promises to streamline Kubernetes cluster creation and management, a closer look reveals significant shortcomings, especially when considering practical, multi-account EKS deployments. This article will delve into these limitations, arguing that AWS CDK’s current EKS implementation, particularly the Cluster.addManifest function, is not truly useful for organizations adopting a shared, multi-account EKS strategy.

The Illusion of Simplicity: Cluster.addManifest and its Account Boundaries

The Cluster.addManifest(id: string, ...manifest: Record<string, any>[]): KubernetesManifest function in CDK appears to offer a straightforward way to deploy Kubernetes manifests to an EKS cluster. However, this simplicity is deceptive when considering real-world scenarios where EKS clusters are designed to be shared across multiple AWS accounts.
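
In the single-account happy path, the call is indeed concise. A minimal sketch (the cluster is assumed to be defined elsewhere in the same stack):

import * as eks from 'aws-cdk-lib/aws-eks';

declare const cluster: eks.Cluster; // defined elsewhere in the same stack

// Works fine: same CDK app, same account, same region
cluster.addManifest('hello-ns', {
    apiVersion: 'v1',
    kind: 'Namespace',
    metadata: { name: 'hello' },
});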

In practice, a central EKS cluster is often shared by various teams or applications residing in separate AWS accounts. This multi-account approach is crucial for security, isolation, and cost management. However, Cluster.addManifest operates under the implicit assumption of a single account and region deployment.

Evidence of this Limitation:

  1. Under the hood, addManifest is executed by a Lambda-backed custom resource (the kubectl provider) that CDK deploys alongside the cluster, in the cluster's own account and region.
  2. To target a cluster that was not defined in the same CDK app, you must import it with Cluster.fromClusterAttributes and supply a kubectlRoleArn; establishing the cross-account trust behind that role is left entirely to you.
  3. CDK tokens referenced inside a manifest cannot cross account or stack boundaries (more on this below).
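
A sketch of what that import looks like in practice (cluster name, account ID, and role ARN are hypothetical); note that only concrete strings work here, and every piece of the cross-account trust must already exist:

import * as eks from 'aws-cdk-lib/aws-eks';

// CDK happily synthesizes this, but the kubectl role must already
// trust the deploying account, and no tokens are allowed.
const shared = eks.Cluster.fromClusterAttributes(this, 'SharedCluster', {
    clusterName: 'central-eks', // hypothetical
    kubectlRoleArn: 'arn:aws:iam::111122223333:role/central-eks-kubectl', // hypothetical
});

shared.addManifest('team-ns', {
    apiVersion: 'v1',
    kind: 'Namespace',
    metadata: { name: 'team-a' },
});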

Ignoring the Network Foundation: A House Without Proper Plumbing

A truly practical EKS solution, especially in multi-account setups, hinges on a robust network foundation. This typically involves:

  1. A Transit Gateway connecting the VPCs of all participating accounts;
  2. Shared egress (NAT) so that spoke VPCs do not each need their own;
  3. Centralized IPAM and CIDR management so that connected VPCs never overlap;
  4. VPC sharing, so that teams can deploy into centrally owned subnets.

However, AWS CDK’s EKS implementation, including blueprints like aws-quickstart/cdk-eks-blueprints, often overlooks or simplifies this critical network layer. While these tools may automate EKS cluster creation and even VPC provisioning, they frequently fall short of providing comprehensive, automated solutions for setting up Transit Gateways or VPC sharing as an integral part of the EKS deployment process.

In real-world EKS architectures, the network layer is not an afterthought; it is the foundation upon which a secure, scalable, and multi-account Kubernetes environment is built. CDK’s focus on simplifying cluster creation while abstracting away network complexities leads to solutions that are ill-equipped for production-grade, shared EKS deployments.

Token Resolution Failures: CDK’s Promise Undermined

CDK’s strength lies in its use of tokens – placeholders that are resolved during deployment, allowing for dynamic configurations and resource references. However, Cluster.addManifest fails to properly resolve these tokens, further hindering its practicality.

CDK tokens are designed to be resolved within the scope of a single CDK application and CloudFormation stack. When attempting to use a token from a resource within a manifest deployed to a cluster using Cluster.addManifest, token resolution often breaks down. CDK’s default token resolution mechanisms are simply not designed to traverse account boundaries.
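
A contrived sketch of the failure mode (account IDs hypothetical): a bucket defined in one team's stack, referenced from a manifest destined for the cluster's stack. CDK rejects the cross-environment reference at synth time, since cross-stack references are only supported between stacks in the same account and region:

import { App, Stack } from 'aws-cdk-lib';
import { Bucket } from 'aws-cdk-lib/aws-s3';
import * as eks from 'aws-cdk-lib/aws-eks';

const app = new App();
const teamStack = new Stack(app, 'TeamStack', {
    env: { account: '111122223333', region: 'us-east-1' },
});
const clusterStack = new Stack(app, 'ClusterStack', {
    env: { account: '444455556666', region: 'us-east-1' },
});

const bucket = new Bucket(teamStack, 'Data');
declare const cluster: eks.Cluster; // defined in clusterStack

// Synth fails: bucket.bucketArn is a token owned by another
// account's stack and cannot be resolved into this manifest.
cluster.addManifest('app-config', {
    apiVersion: 'v1',
    kind: 'ConfigMap',
    metadata: { name: 'app-config' },
    data: { bucketArn: bucket.bucketArn },
});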

This limitation forces users to abandon CDK’s elegant token-based approach and resort to manually passing concrete values – such as VPC IDs, subnet IDs, and security group IDs – as context parameters or environment variables to their CDK applications. This manual value passing is not only less elegant but also introduces more opportunities for errors and reduces the overall benefits of using CDK in the first place.
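
In practice the workaround looks something like this (the context key names are illustrative): concrete IDs are fed in from the outside and trusted blindly:

// cdk.json or --context: { "sharedVpcId": "vpc-0abc...", "podSgId": "sg-0def..." }
const vpcId: string = app.node.tryGetContext('sharedVpcId'); // concrete string, not a token
const sgId: string = app.node.tryGetContext('podSgId');

cluster.addManifest('net-config', {
    apiVersion: 'v1',
    kind: 'ConfigMap',
    metadata: { name: 'net-config' },
    data: { vpcId, securityGroupId: sgId }, // drift and typos surface only at runtime
});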

The Solution Begins with Networking

Prerequisite: Embracing Application-Centric Infrastructure in the Cloud

In this diagram (an architecture diagram in the original article, not reproduced here), the Networking account runs two isolated envers, NT Enver LE and NT Enver Prod. Taking NT Enver Prod as the example:

  1. One Transit Gateway connects multiple VPCs across multiple accounts (same region), as sketched after this list;
  2. One NAT shares internet egress with all connected VPCs;
  3. One IPAM and CIDR pool serve all connected VPCs' subnets to avoid IP conflicts;
  4. Not shown: subnets, routing, SGs, DNS, hosted zones, Org, admin delegation, certs, …
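
None of these pieces is exotic; CDK can express them today, it just does not tie them into its EKS story. A minimal sketch of the hub resources, using L1 constructs since no high-level ones exist for TGW or IPAM (the Organization ARN is hypothetical):

import { CfnIPAM, CfnIPAMPool, CfnTransitGateway } from 'aws-cdk-lib/aws-ec2';
import { CfnResourceShare } from 'aws-cdk-lib/aws-ram';

// Item 1: one Transit Gateway; spoke accounts attach their own VPCs
const tgw = new CfnTransitGateway(this, 'Tgw', {
    autoAcceptSharedAttachments: 'enable',
});

// Item 3: one IPAM with a single CIDR pool so spoke VPCs never overlap
const ipam = new CfnIPAM(this, 'Ipam', {
    operatingRegions: [{ regionName: this.region }],
});
const pool = new CfnIPAMPool(this, 'Pool', {
    addressFamily: 'ipv4',
    ipamScopeId: ipam.attrPrivateDefaultScopeId,
    provisionedCidrs: [{ cidr: '10.0.0.0/8' }],
});

// Share the TGW with the whole Organization via RAM
new CfnResourceShare(this, 'TgwShare', {
    name: 'central-tgw',
    resourceArns: [
        `arn:aws:ec2:${this.region}:${this.account}:transit-gateway/${tgw.attrId}`,
    ],
    principals: ['arn:aws:organizations::111122223333:organization/o-example'], // hypothetical
});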

The central VPC acts as a proxy for deploying resources across VPCs:

  1. Lambdas that deploy k8s manifests to the EKS clusters of different envers (sketched below);
  2. Lambdas that deploy databases, schemas, roles, and users to the RDS clusters of different envers.
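
A sketch of the central proxy (handler code omitted): a Lambda in the hub VPC, routable to every spoke cluster through the TGW, that assumes a per-cluster deployer role before applying manifests. The centralVpc variable and the role naming convention here are assumptions:

import { Duration } from 'aws-cdk-lib';
import { PolicyStatement } from 'aws-cdk-lib/aws-iam';
import * as lambda from 'aws-cdk-lib/aws-lambda';

const k8sDeployer = new lambda.Function(this, 'K8sDeployer', {
    runtime: lambda.Runtime.NODEJS_18_X,
    handler: 'index.handler',
    code: lambda.Code.fromAsset('lambda/k8s-deployer'), // applies the manifests it receives
    vpc: centralVpc, // hub VPC, routed to every cluster via the TGW
    timeout: Duration.minutes(5),
});

// Allowed to assume each cluster's deployer role (mapped into RBAC on the cluster side)
k8sDeployer.addToRolePolicy(new PolicyStatement({
    actions: ['sts:AssumeRole'],
    resources: ['arn:aws:iam::*:role/eks-enver-deployer-*'], // hypothetical convention
}));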

The central VPC also acts as a proxy/hub connecting VPCs: pods in EKS and tasks in ECS reach databases in different RDS clusters across different envers:

Enver 1:

Enver 1 logically declares and controls all of the resources inside it (marked green), including k8s manifests and database-related resources (DB, schema, roles, …), and deploys or rolls back as a single transaction.

  1. The manifest is deployed to EKS Enver 1's cluster; the pod assumes IAM role 1 (through its service account and OIDC) to access DynamoDB, as sketched after this list.
  2. The database, schema, and roles/users are deployed to RDS Enver Prod; an ECS task inside it assumes IAM role 2 to access the DB through the Transit Gateway.
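
Because the table and the pod's role live in the same enver, the grant for item 1 stays a one-liner next to its consumer. A sketch, with the table name and the role variable (podRole1, standing in for "IAM role 1") assumed:

import { AttributeType, BillingMode, Table } from 'aws-cdk-lib/aws-dynamodb';

// Both sides of the permission are declared in this enver,
// so no cross-account IAM ceremony is needed.
const table = new Table(this, 'Orders', {
    partitionKey: { name: 'pk', type: AttributeType.STRING },
    billingMode: BillingMode.PAY_PER_REQUEST,
});
table.grantReadWriteData(podRole1); // "IAM role 1" from the description above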

Enver 2:

Enver 2 likewise logically declares and controls all of the resources inside it (marked purple). After its manifest is deployed to EKS Enver 1's cluster, the pod will:

  1. Assume IAM role A to access the DB through the Transit Gateway (no VPC needed in Enver 2!);
  2. Assume IAM role B to access an S3 bucket for files.

The platform takes care of the deployments, so apps and services focus purely on business logic:

  1. The k8s manifests declared in Enver 1 and Enver 2 are sent to the EKS cluster through the Central Account's Lambda function in VPC-Prod.
  2. The DB schemas/roles/users declared in Enver 1 and Enver 2 are sent to the RDS cluster through the Central Account's Lambda function in VPC-Prod.

A Working Example of Enver-Centric Design

https://github.com/ondemandenv/spring-boot-swagger-3-example

This project exemplifies an application-centric approach to cloud-native development, where all resources (application code, infrastructure, dependencies) are defined as a single vertical “enver” - a self-contained bounded context that deploys/rolls back as a unit.

1. Vertical Resource Ownership.

All resources needed to deliver this Tutorials API capability are co-located in one repository:

  1. The Spring Boot application code (src/);
  2. The container build that packages it;
  3. The CDK code (cdk/) that defines the pod's IAM role, the S3 bucket it uses, and the cdk8s deployment manifest.

2. Environment as Versioned Unit.

Each "enver" contains:

  1. The application source at a specific version;
  2. The infrastructure that version needs (roles, bucket, deployment manifest);
  3. References to platform-shared values (OIDC provider, ECR repository), resolved at deploy time.

Together these deploy, and roll back, as one versioned unit.

3. Platform Services Abstraction

Shared platform concerns (networking, the EKS cluster itself, the OIDC identity provider, the ECR registry) are consumed through getSharedValue references rather than being redefined in the application's stack.

Critical Integration Points

1. IAM Role Binding (CDK Stack)

// cdk/lib/cdk-stack.ts
import { FederatedPrincipal, Role } from 'aws-cdk-lib/aws-iam';

// Identifier of the cluster's OIDC provider, published by the platform enver
const oidcProvider = myEnver.oidcProvider.getSharedValue(this);

const podSaRole = new Role(this, 'podSaRole', {
    assumedBy: new FederatedPrincipal(
        oidcProvider, // From platform enver
        {
            StringEquals: {
                // Only tokens issued for this service account may assume the role
                [`${oidcProvider}:aud`]: 'sts.amazonaws.com',
                [`${oidcProvider}:sub`]: `system:serviceaccount:${namespace}:${serviceAccountName}`
            }
        },
        'sts:AssumeRoleWithWebIdentity'
    )
});
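
The other half of the binding is the Kubernetes service account itself; a sketch with cdk8s-plus, reusing the chart, namespace, and serviceAccountName assumed to be defined nearby in the same stack:

import * as cdk8splus from 'cdk8s-plus-27';

// Pods running under this service account automatically receive
// web-identity credentials for podSaRole.
new cdk8splus.ServiceAccount(chart, 'app-sa', {
    metadata: {
        name: serviceAccountName,
        namespace: namespace,
        annotations: { 'eks.amazonaws.com/role-arn': podSaRole.roleArn },
    },
});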

2. Environment-Aware Configuration

// src/main/java/com/bezkoder/spring/swagger/config/OpenAPIConfig.java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider;
import software.amazon.awssdk.services.s3.S3Client;

@Value("${aws.s3.bucket-name}")
private String bucketName; // Injected from enver-specific config

@Bean
public S3Client s3Client() {
    return S3Client.builder()
        // Picks up the web-identity token file and role ARN that EKS
        // injects into the pod (IRSA); no static credentials anywhere
        .credentialsProvider(WebIdentityTokenFileCredentialsProvider.create())
        .build();
}

3. Infrastructure Consistency

// cdk/lib/cdk-stack.ts
import { Fn } from 'aws-cdk-lib';
import { Repository } from 'aws-cdk-lib/aws-ecr';
import * as cdk8splus from 'cdk8s-plus-27';

// Shared ECR repository, published by the platform/CI enver
const appRepo = Repository.fromRepositoryName(this, 'repo',
    myEnver.appImgRepoRef.getSharedValue(this));

// Git SHA-based tag: first entry of a comma-separated list of published tags
const imageTag = Fn.select(0, Fn.split(',',
    myEnver.appImgLatestRef.getSharedValue(this)));

new cdk8splus.Deployment(chart, 'to-eks', {
    containers: [{
        image: `${appRepo.repositoryUri}:${imageTag}`,
        envVariables: {
            bucket_arn: cdk8splus.EnvValue.fromValue(bucket.bucketArn), // Enver-owned bucket
            region: cdk8splus.EnvValue.fromValue(this.region) // Inherited from platform
        }
    }]
});

Key Benefits

  1. One bounded context: application code, infrastructure, and dependencies deploy and roll back together.
  2. No hand-passed IDs: shared values such as the OIDC provider and ECR repository arrive through platform references instead of context parameters.
  3. Account isolation without account plumbing: the application enver never touches the networking or cluster accounts directly.

Obviously, the whole design and implementation depend on the platform:

Critical Platform Services

  1. The network foundation: Transit Gateway, shared NAT, and IPAM/CIDR management;
  2. The central proxy Lambdas that apply k8s manifests and RDS schemas across accounts;
  3. Identity and contracts: OIDC providers, cross-account IAM trust, and the getSharedValue mechanism for publishing values between envers.

What Developers Never See

VPC wiring, Transit Gateway attachments, cross-account IAM trust policies, kubectl access to the cluster: all of it is owned and operated by the platform.

This platform transforms AWS multi-account complexity into safe, self-service enver deployments while maintaining enterprise-grade security - exactly what raw AWS CDK/EKS lacks.