How I Built a Multi-Account AWS Platform for an Enterprise Media Company

Project Duration: Q4 2025 - Q1 2026
Infrastructure Scale: 6 AWS accounts, 2 production EKS clusters, 50+ engineers
Role: Staff Cloud Architect & Platform Lead


The Challenge

An enterprise media company was facing a classic scaling problem: multiple platform teams (advertising, data analytics, content hosting) were duplicating infrastructure across siloed AWS accounts. Each team built their own VPCs, security baselines, and CI/CD pipelines—resulting in:

  • Inconsistent security controls across accounts
  • Wasted engineering time rebuilding the same infrastructure
  • No centralized networking (teams couldn’t share resources)
  • Compliance gaps (CloudTrail, GuardDuty deployed inconsistently)
  • Slow onboarding for new products (weeks to provision infra)

The company needed a shared platform foundation that would:

  1. Prevent infrastructure duplication (DRY principle)
  2. Maintain team ownership boundaries (no central bottleneck)
  3. Enforce security/compliance standards consistently
  4. Enable fast self-service infrastructure for app teams

I was hired to design and implement this “Enterprise Cloud Platform” (ECP).


The Solution: Multi-Account AWS Foundation

Design Philosophy: “4 M’s”

  • Multiple Accounts: Billing isolation, blast radius containment
  • Multiple Regions: us-east-1 primary, disaster recovery ready
  • Multiple Environments: Dev, nonprod, prod separation
  • Multiple Stacks: Modular Terraform/Terragrunt for reusability

Account Structure

We built 6 AWS accounts under a single AWS Organization:

┌─────────────────────────────────────────────┐
│          AWS Organization (Root)            │
└─────────────┬───────────────────────────────┘
              │
     ┌────────┴────────┬──────────────────┐
     │                 │                  │
┌────▼─────┐      ┌────▼───────┐    ┌─────▼────────┐
│Management│      │ Security OU│    │Infrastructure│
│ Account  │      └─────┬──────┘    │      OU      │
└──────────┘            │           └──────┬───────┘
                        │                  │
              ┌─────────▼──────┐    ┌──────▼────────┐
              │  Security-Prod │    │  Infra-Prod   │
              │   (Audit Logs, │    │ (Transit GW,  │
              │   GuardDuty)   │    │   Shared DNS) │
              └────────────────┘    └───────────────┘

                ┌─────────────┐
                │Workloads OU │
                └──────┬──────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
   ┌────▼────┐   ┌─────▼──────┐ ┌────▼────┐
   │Workloads│   │ Workloads  │ │Workloads│
   │   Dev   │   │  NonProd   │ │  Prod   │
   │(Sandbox)│   │ (Testing)  │ │ (Apps)  │
   └─────────┘   └────────────┘ └─────────┘

Key Design Decisions:

  1. Management Account: Billing, AWS Organizations, cross-account IAM roles (no workloads)
  2. Security-Prod Account: Centralized CloudTrail logs, GuardDuty master, Security Hub
  3. Infrastructure-Prod Account: Shared services (Transit Gateway, private hosted zones)
  4. Workloads Accounts: Dev (sandbox), NonProd (testing), Prod (production apps)

Why this structure?

  • Blast radius containment: Compromise in dev doesn’t affect prod
  • Billing clarity: Each team sees their own costs
  • Compliance: Audit logs centralized, tamper-proof
  • Team autonomy: Platform teams own their workloads accounts
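
The OU layout above translates almost one-to-one into Terraform. A minimal sketch of what the ecp-ou-structure repo contains (resource names and the placeholder account email are illustrative, not the project's exact module):

resource "aws_organizations_organization" "this" {
  feature_set = "ALL"

  # Services that need org-wide trusted access (org CloudTrail, GuardDuty admin)
  aws_service_access_principals = [
    "cloudtrail.amazonaws.com",
    "guardduty.amazonaws.com",
  ]
}

resource "aws_organizations_organizational_unit" "security" {
  name      = "Security"
  parent_id = aws_organizations_organization.this.roots[0].id
}

resource "aws_organizations_organizational_unit" "infrastructure" {
  name      = "Infrastructure"
  parent_id = aws_organizations_organization.this.roots[0].id
}

resource "aws_organizations_organizational_unit" "workloads" {
  name      = "Workloads"
  parent_id = aws_organizations_organization.this.roots[0].id
}

# One member account per environment under the Workloads OU
resource "aws_organizations_account" "workloads_dev" {
  name      = "workloads-dev"
  email     = "aws-workloads-dev@example.com" # placeholder root email
  parent_id = aws_organizations_organizational_unit.workloads.id
}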

Network Architecture: Transit Gateway Hub-and-Spoke

The Problem

Teams needed to:

  • Share services across accounts (private APIs, databases)
  • Avoid VPC peering mesh (scales poorly beyond 3 accounts)
  • Maintain network isolation between environments

The Solution: Transit Gateway

        ┌─────────────────────────────────┐
        │     Infrastructure-Prod         │
        │   Transit Gateway (us-east-1)   │
        └────────┬────────────────────────┘
                 │
    ┌────────────┼────────────┐
    │            │            │
┌───▼───┐   ┌────▼────┐  ┌────▼───┐
│NonProd│   │  Prod   │  │Security│
│  VPC  │   │   VPC   │  │  VPC   │
│10.1/16│   │ 10.2/16 │  │10.3/16 │
└───────┘   └─────────┘  └────────┘

Each VPC (per account):

  • Public Subnets (2 AZs): ALB/NAT Gateways only
  • Private Subnets (2 AZs): EKS nodes, application workloads
  • Transit Gateway Attachment: Routes to other accounts via TGW

Benefits:

  • ✅ Centralized routing (add new account = 1 TGW attachment)
  • ✅ Network segmentation (route tables control inter-VPC traffic)
  • ✅ No peering mesh complexity (scales to 100+ VPCs)
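
In Terraform terms, the hub-and-spoke looks roughly like the sketch below: the Infrastructure-Prod account owns the Transit Gateway and shares it to the organization via AWS RAM, and each spoke account adds one attachment plus a route. The VPC, subnet, and route table references are assumed to come from the spoke's VPC module; names are illustrative.

# Hub (Infrastructure-Prod): one Transit Gateway, shared org-wide via RAM
resource "aws_ec2_transit_gateway" "hub" {
  description                    = "ECP shared transit gateway"
  auto_accept_shared_attachments = "enable"
}

resource "aws_ram_resource_share" "tgw" {
  name                      = "ecp-transit-gateway"
  allow_external_principals = false
}

resource "aws_ram_resource_association" "tgw" {
  resource_arn       = aws_ec2_transit_gateway.hub.arn
  resource_share_arn = aws_ram_resource_share.tgw.arn
}

resource "aws_ram_principal_association" "org" {
  principal          = aws_organizations_organization.this.arn # share with every account in the org
  resource_share_arn = aws_ram_resource_share.tgw.arn
}

# Spoke (each workloads account): one attachment plus a route to the other VPCs
resource "aws_ec2_transit_gateway_vpc_attachment" "spoke" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = aws_vpc.this.id
  subnet_ids         = aws_subnet.private[*].id
}

resource "aws_route" "to_tgw" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "10.0.0.0/8" # reach the other spokes through the TGW
  transit_gateway_id     = aws_ec2_transit_gateway.hub.id
}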

Security Baseline: Defense in Depth

Layer 1: Account-Level Controls

AWS Organizations + Service Control Policies (SCPs):

  • Deny region access outside us-east-1 (prevent shadow IT)
  • Deny S3 public access unless explicit exception
  • Deny root user access (force IAM)
  • Require MFA for privileged actions
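
As an example, the region restriction above can be written as an SCP and attached at the OU level. A sketch (the policy name and the exempted global services are illustrative and would need tuning per organization):

resource "aws_organizations_policy" "region_restriction" {
  name = "deny-regions-outside-us-east-1"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyNonApprovedRegions"
      Effect = "Deny"
      # Global services stay exempt so IAM, Route 53, CloudFront, etc. keep working
      NotAction = [
        "iam:*", "organizations:*", "sts:*",
        "cloudfront:*", "route53:*", "support:*",
      ]
      Resource = "*"
      Condition = {
        StringNotEquals = {
          "aws:RequestedRegion" = ["us-east-1"]
        }
      }
    }]
  })
}

resource "aws_organizations_policy_attachment" "workloads" {
  policy_id = aws_organizations_policy.region_restriction.id
  target_id = aws_organizations_organizational_unit.workloads.id # attach per OU, not per account
}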

IAM Identity Center (SSO):

  • 8 active users with Okta integration
  • 7 standard roles per account:
    • GitHubActionsRole (OIDC, no long-lived keys)
    • AdminRole (AdministratorAccess)
    • DeveloperRole (PowerUserAccess)
    • DataEngineerRole (S3, Glue, Athena)
    • NetworkAdminRole (VPC, TGW)
    • SecurityAuditorRole (ReadOnly + Security)
    • ReadOnlyRole (ViewOnlyAccess)
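
Each of the standard roles is a permission set in IAM Identity Center, assigned to Okta groups. A minimal sketch of one of them (ReadOnlyRole); the session duration is an assumption, not the project's exact value:

data "aws_ssoadmin_instances" "this" {}

resource "aws_ssoadmin_permission_set" "readonly" {
  name             = "ReadOnlyRole"
  instance_arn     = tolist(data.aws_ssoadmin_instances.this.arns)[0]
  session_duration = "PT8H" # assumed 8-hour sessions
}

# Attach the AWS-managed ViewOnlyAccess policy to the permission set
resource "aws_ssoadmin_managed_policy_attachment" "readonly" {
  instance_arn       = aws_ssoadmin_permission_set.readonly.instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.readonly.arn
  managed_policy_arn = "arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"
}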

Layer 2: Network Security

AWS WAF (on ALB):

  • Rate limiting: 2000 requests/5 minutes
  • OWASP Core Rule Set (SQL injection, XSS blocking)
  • Geo-blocking (optional per app team)
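
A rough sketch of that web ACL in Terraform, assuming WAFv2 associated with the ALB (resource names and metric names are illustrative; the ALB is assumed to be defined elsewhere):

resource "aws_wafv2_web_acl" "alb" {
  name  = "ecp-alb-waf"
  scope = "REGIONAL" # regional scope for ALB association

  default_action {
    allow {}
  }

  # WAF rate-based rules count requests per source IP over a 5-minute window
  rule {
    name     = "rate-limit"
    priority = 1
    action {
      block {}
    }
    statement {
      rate_based_statement {
        limit              = 2000
        aggregate_key_type = "IP"
      }
    }
    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "rate-limit"
      sampled_requests_enabled   = true
    }
  }

  # OWASP-style protections via the AWS managed Core Rule Set
  rule {
    name     = "aws-common-rules"
    priority = 2
    override_action {
      none {}
    }
    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesCommonRuleSet"
        vendor_name = "AWS"
      }
    }
    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "aws-common-rules"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "ecp-alb-waf"
    sampled_requests_enabled   = true
  }
}

resource "aws_wafv2_web_acl_association" "alb" {
  resource_arn = aws_lb.this.arn # ALB assumed to be defined elsewhere
  web_acl_arn  = aws_wafv2_web_acl.alb.arn
}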

Security Groups (least privilege):

  • ALB: Allow 80/443 from 0.0.0.0/0
  • EKS Nodes: Allow 443 from ALB security group only
  • No public SSH access (SSM Session Manager for debugging)
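
Expressed in Terraform, the two security groups chain together so that only the ALB can reach the nodes. A sketch (vpc_id is assumed to be a module input; only the 443 rules are shown, port 80 follows the same pattern):

resource "aws_security_group" "alb" {
  name_prefix = "ecp-alb-"
  vpc_id      = var.vpc_id
}

resource "aws_security_group_rule" "alb_https_in" {
  security_group_id = aws_security_group.alb.id
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_blocks       = ["0.0.0.0/0"] # public HTTPS entry point
}

resource "aws_security_group" "eks_nodes" {
  name_prefix = "ecp-eks-nodes-"
  vpc_id      = var.vpc_id
}

# Nodes accept traffic only from the ALB security group, never from the internet
resource "aws_security_group_rule" "nodes_from_alb" {
  security_group_id        = aws_security_group.eks_nodes.id
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 443
  to_port                  = 443
  source_security_group_id = aws_security_group.alb.id
}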

Layer 3: Monitoring & Detection

CloudTrail (all accounts → Security-Prod):

  • API audit logs with 90-day retention
  • S3 bucket with KMS encryption
  • Immutable logs (prevent tampering)
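
A minimal sketch of the organization trail (the bucket policy that grants CloudTrail write access and the KMS key are omitted for brevity; bucket and trail names are illustrative):

resource "aws_s3_bucket" "audit" {
  bucket_prefix = "ecp-audit-logs-"
}

# 90-day retention on the central audit bucket
resource "aws_s3_bucket_lifecycle_configuration" "audit" {
  bucket = aws_s3_bucket.audit.id

  rule {
    id     = "expire-after-90-days"
    status = "Enabled"
    filter {}
    expiration {
      days = 90
    }
  }
}

resource "aws_cloudtrail" "org" {
  name                       = "ecp-org-trail"
  s3_bucket_name             = aws_s3_bucket.audit.id
  is_organization_trail      = true # one trail covers every member account
  is_multi_region_trail      = true
  enable_log_file_validation = true # digest files make tampering detectable
}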

GuardDuty (Security-Prod master):

  • Threat detection across all accounts
  • EKS protection enabled
  • Findings routed to SNS → Slack (#aws-infra-alerts)
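
The routing from GuardDuty to Slack is an EventBridge rule that forwards findings to an SNS topic, which the Slack notifier Lambda subscribes to. A sketch (topic and rule names are illustrative):

resource "aws_sns_topic" "security_alerts" {
  name = "security-alerts" # the Slack notifier Lambda subscribes here
}

resource "aws_cloudwatch_event_rule" "guardduty_findings" {
  name        = "guardduty-findings"
  description = "Route GuardDuty findings to the alerting topic"

  event_pattern = jsonencode({
    source        = ["aws.guardduty"]
    "detail-type" = ["GuardDuty Finding"]
  })
}

resource "aws_cloudwatch_event_target" "to_sns" {
  rule      = aws_cloudwatch_event_rule.guardduty_findings.name
  target_id = "sns"
  arn       = aws_sns_topic.security_alerts.arn
}

# Allow EventBridge to publish to the topic
resource "aws_sns_topic_policy" "security_alerts" {
  arn = aws_sns_topic.security_alerts.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "events.amazonaws.com" }
      Action    = "sns:Publish"
      Resource  = aws_sns_topic.security_alerts.arn
    }]
  })
}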

Security Hub:

  • CIS AWS Foundations Benchmark
  • AWS Foundational Security Best Practices
  • Automated remediation for high-severity findings

Observability Stack (Opt-In):

  • CloudWatch Alarms: Unauthorized API calls, IAM changes, root login
  • EventBridge Rules: Real-time GuardDuty/Security Hub routing
  • SNS Topics: Critical/High/Medium severity alerts
  • Lambda Notifier: Slack integration
  • Cost: ~$6-11/month per account

Infrastructure as Code: Terraform + Terragrunt

Multi-Repo Strategy

We chose multi-repo over monorepo for clear ownership:

Repo                       Purpose                     Owner
ecp-ou-structure           AWS Org, IAM roles, SCPs    Infrastructure Team
ecp-network                VPCs, Transit Gateway, NAT  Infrastructure Team
ecp-security               CloudTrail, GuardDuty, WAF  Infrastructure Team
github-branch-protection   PR rules enforcement        Infrastructure Team
tf-live-aws-ad-stack       AdStack EKS, ALB, ECR       AdStack Team
tf-live-aws-data-delivery  MWAA, Glue, Athena          Data Team

Why multi-repo?

  • ✅ Clear ownership (each repo has a single owning team instead of a shared monorepo)
  • ✅ Team autonomy (AdStack can deploy without Infrastructure approval)
  • ✅ No merge conflicts (teams work independently)
  • ✅ Aligns with org structure (matches Jira projects, Slack channels)

Trade-off: Version management overhead (Terraform/Terragrunt updates require coordination across repos)

Mitigation: Renovate bot (future) for automated dependency updates

Terragrunt for DRY

Problem: Terraform requires duplicating backend config, provider config, and variable declarations across every stack.

Solution: Terragrunt wrapper with shared root.hcl:

# root.hcl (shared across all stacks)

# local.environment is referenced by the generated provider block below.
# Assumption for this sketch: each stack lives under an environment directory
# (e.g. live/nonprod/<stack>), so the environment name is the parent folder.
locals {
  environment = basename(dirname(get_terragrunt_dir()))
}

remote_state {
  backend = "s3"
  config = {
    bucket         = "terraform-state-${get_aws_account_id()}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"
  default_tags {
    tags = {
      ManagedBy = "Terraform"
      Environment = "${local.environment}"
    }
  }
}
EOF
}

Result: Each stack is 10-20 lines instead of 100+ (90% reduction in boilerplate)
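
For comparison, a child stack under this setup is little more than an include, a module source, and inputs. A sketch of what one looks like (the path, module source, and version ref are illustrative):

# live/nonprod/network/terragrunt.hcl
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "git::git@github.com:company/ecp-network.git//modules/vpc?ref=v1.4.0"
}

inputs = {
  cidr_block = "10.1.0.0/16"
  az_count   = 2
}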


CI/CD Pipeline: GitOps with GitHub Actions

Workflow

Developer → Feature Branch → PR → terraform-plan → Code Review
                                        ↓
                               Code Owner Approval
                                        ↓
                                 Merge to main
                                        ↓
                              terraform-apply → AWS
                                        ↓
                            Slack Notification + Jira Update

Branch Protection (Enforced via Terraform)

All infrastructure repos require:

  • ✅ 1 approval from designated reviewer
  • ✅ Code owner review (per CODEOWNERS file)
  • ✅ terraform-plan workflow must pass
  • ❌ No direct pushes to main
  • ❌ No admin bypass
  • ❌ No force pushes
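
With the Terraform GitHub provider, those rules look roughly like the sketch below (the repository name and status-check context are illustrative):

data "github_repository" "ecp_network" {
  full_name = "company/ecp-network"
}

resource "github_branch_protection" "main" {
  repository_id = data.github_repository.ecp_network.node_id
  pattern       = "main"

  required_pull_request_reviews {
    required_approving_review_count = 1
    require_code_owner_reviews      = true # honors the CODEOWNERS file
  }

  required_status_checks {
    strict   = true
    contexts = ["terraform-plan"] # the plan workflow must pass before merge
  }

  enforce_admins      = true # no admin bypass
  allows_force_pushes = false
  allows_deletions    = false
}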

Why Terraform for branch protection?

  • Codified rules (no manual GitHub UI clicks)
  • Consistent across all repos
  • Auditable (changes tracked in Git)

OIDC Authentication (No Long-Lived Keys)

Problem: Traditional approach uses AWS access keys stored in GitHub Secrets (security risk if leaked).

Solution: GitHub Actions OIDC provider + IAM role trust policy

# GitHubActionsRole in each account
data "aws_iam_policy_document" "github_oidc" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = ["arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"]
    }
    # Standard for the GitHub OIDC integration: pin the token audience to STS
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Scope role assumption to a single repo and branch
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:company/ecp-network:ref:refs/heads/main"]
    }
  }
}

Benefits:

  • ✅ No keys to rotate or leak
  • ✅ Scoped per repo and branch
  • ✅ Automatic expiration (temporary credentials)
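
The trust policy above hangs off a per-account OIDC provider and IAM role, roughly like this (the thumbprint shown is GitHub's commonly published value; verify the current one before use, and the attached deployment policy is assumed to be defined elsewhere):

resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"] # verify before use
}

resource "aws_iam_role" "github_actions" {
  name               = "GitHubActionsRole"
  assume_role_policy = data.aws_iam_policy_document.github_oidc.json
}

# Deployment permissions are attached separately, scoped to what the pipeline needs
resource "aws_iam_role_policy_attachment" "github_actions" {
  role       = aws_iam_role.github_actions.name
  policy_arn = aws_iam_policy.deploy.arn # policy assumed to be defined elsewhere
}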

Jira Integration

Commit Message Requirements:

  • NonProd deploys: IN-123 (Infrastructure Jira ticket)
  • Prod deploys: CHANGE-456 (Change record for audit)

GitHub Actions:

  • Extract Jira ticket from commit message
  • Post comment to Jira: “Deployed to NonProd - PR #89”
  • Update ticket status: In Progress → Deployed

Results & Metrics

Infrastructure Deployed (6 Months)

  • 6 AWS accounts managed via Organizations
  • 2 VPCs (nonprod, prod) with multi-AZ design
  • 7 IAM roles per account (standardized access)
  • 1 EKS cluster (nonprod) running production-ready workloads
  • 5 foundation repos enforcing DRY principles

Time Savings

  • New product infrastructure: 3 weeks → 2 days (~90% faster)
  • Security baseline deployment: 2 days → 1 hour (automated via modules)
  • Cross-account network setup: 1 week → 30 minutes (Transit Gateway)

Team Velocity

  • AdStack team: Deployed nonprod EKS cluster in 3 days (vs 3 weeks previously)
  • Data team: Self-service infrastructure with ecp-network modules (no infra team dependency)

Security Posture

  • 100% coverage: CloudTrail, GuardDuty, Security Hub across all accounts
  • Zero manual console changes: All infrastructure via Terraform (auditable)
  • Automated threat detection: GuardDuty findings routed to Slack in real-time

Lessons Learned

What Worked Well

  1. Multi-repo for ownership clarity: Teams felt ownership of their repos (vs shared monorepo)
  2. Terragrunt for DRY: Reduced config duplication by 90%
  3. Branch protection as code: Prevented accidental merges, enforced review process
  4. OIDC for GitHub Actions: No key rotation, better security posture

Challenges & Trade-offs

  1. Version management: Coordinating Terraform updates across 6 repos (future: Renovate bot)
  2. Initial learning curve: Teams unfamiliar with Terragrunt required onboarding
  3. Transit Gateway cost: ~$0.02/GB data processing plus hourly attachment charges (vs largely free VPC peering data transfer) - worth it for operational simplicity

What I’d Do Differently

  1. Start with monorepo: Build initial MVP in monorepo, split into multi-repo after team ownership stabilizes
  2. Renovate from day 1: Automate dependency updates instead of manual coordination
  3. Cost visibility dashboards: Deploy AWS Cost Explorer dashboards earlier (teams didn’t see costs until Month 3)

Next Steps (Q1-Q2 2026)

  1. Observability baseline: Deploy CloudWatch dashboards + Slack alerting to all accounts
  2. Production EKS: Promote nonprod cluster architecture to prod account
  3. Data platform support: Assist Data Science team with tf-live-aws-data-delivery
  4. Automated dependency updates: Implement Renovate bot for cross-repo Terraform version management

Key Takeaways for Your Organization

If you’re building a multi-account AWS platform, here’s what I’d recommend:

Start Simple

  • Don’t over-architect: Start with 3 accounts (Management, NonProd, Prod), expand later
  • Avoid premature optimization: Get one environment working before replicating

Design for Ownership

  • Multi-repo = clear ownership: If teams are siloed, multi-repo prevents merge conflicts
  • Monorepo = faster refactors: If teams collaborate closely, monorepo enables global changes

Enforce Standards Early

  • Branch protection from day 1: Prevent accidental merges before they happen
  • OIDC over access keys: More secure, easier to audit
  • Terraform for everything: Avoid manual console changes (impossible to track/replicate)

Measure Impact

  • Track time-to-infrastructure: How long does it take to provision a new environment?
  • Security coverage: What % of accounts have CloudTrail, GuardDuty, Security Hub?
  • Team velocity: How fast can app teams deploy without infrastructure team dependency?

Want to Build This for Your Team?

I help companies design and implement multi-account AWS platforms like this one.

What I can do for you:

  • Architecture review of your existing AWS setup
  • Design multi-account strategy tailored to your org structure
  • Implement production-ready EKS clusters with security baseline
  • Establish IaC best practices (Terraform/Terragrunt modules)
  • Set up CI/CD pipelines with branch protection and automated testing

Book a consultation call to discuss your project.


About the Author

Glenn Gray is a Staff Cloud Architect with 12+ years building enterprise AWS platforms. He specializes in multi-account architectures, production EKS, and infrastructure as code. Learn more at graycloudarch.com.

Want to learn more? Check out my course: Building an Enterprise Cloud Platform (coming Feb 2026).