How I Built a Multi-Account AWS Platform for an Enterprise Media Company

Project Duration: Q4 2025 - Q1 2026
Infrastructure Scale: 6 AWS accounts, 2 production EKS clusters, 50+ engineers
Role: Staff Cloud Architect & Platform Lead


The Challenge

An enterprise media company was facing a classic scaling problem: multiple platform teams (advertising, data analytics, content hosting) were duplicating infrastructure across siloed AWS accounts. Each team built their own VPCs, security baselines, and CI/CD pipelines—resulting in:

  • Inconsistent security controls across accounts
  • Wasted engineering time rebuilding the same infrastructure
  • No centralized networking (teams couldn’t share resources)
  • Compliance gaps (CloudTrail, GuardDuty deployed inconsistently)
  • Slow onboarding for new products (weeks to provision infra)

The company needed a shared platform foundation that would:

  1. Prevent infrastructure duplication (DRY principle)
  2. Maintain team ownership boundaries (no central bottleneck)
  3. Enforce security/compliance standards consistently
  4. Enable fast self-service infrastructure for app teams

I was hired to design and implement this “Enterprise Cloud Platform” (ECP).


The Solution: Multi-Account AWS Foundation

Design Philosophy: “4 M’s”

  • Multiple Accounts: Billing isolation, blast radius containment
  • Multiple Regions: us-east-1 primary, disaster recovery ready
  • Multiple Environments: Dev, nonprod, prod separation
  • Multiple Stacks: Modular Terraform/Terragrunt for reusability

Account Structure

We built 6 AWS accounts under a single AWS Organization:

┌─────────────────────────────────────────────┐
│          AWS Organization (Root)            │
└─────────────┬───────────────────────────────┘
              │
     ┌────────┴────────┬──────────────────┐
     │                 │                  │
┌────▼─────┐      ┌────▼───────┐    ┌─────▼────────┐
│Management│      │ Security OU│    │Infrastructure│
│ Account  │      └─────┬──────┘    │      OU      │
└──────────┘            │           └──────┬───────┘
                        │                  │
              ┌─────────▼──────┐    ┌──────▼────────┐
              │  Security-Prod │    │  Infra-Prod   │
              │   (Audit Logs, │    │ (Transit GW,  │
              │   GuardDuty)   │    │   Shared DNS) │
              └────────────────┘    └───────────────┘

                ┌─────────────┐
                │Workloads OU │
                └──────┬──────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
   ┌────▼────┐   ┌─────▼──────┐ ┌────▼────┐
   │Workloads│   │ Workloads  │ │Workloads│
   │   Dev   │   │  NonProd   │ │  Prod   │
   │(Sandbox)│   │ (Testing)  │ │ (Apps)  │
   └─────────┘   └────────────┘ └─────────┘

Key Design Decisions:

  1. Management Account: Billing, AWS Organizations, cross-account IAM roles (no workloads)
  2. Security-Prod Account: Centralized CloudTrail logs, GuardDuty master, Security Hub
  3. Infrastructure-Prod Account: Shared services (Transit Gateway, private hosted zones)
  4. Workloads Accounts: Dev (sandbox), NonProd (testing), Prod (production apps)

Why this structure?

  • Blast radius containment: Compromise in dev doesn’t affect prod
  • Billing clarity: Each team sees their own costs
  • Compliance: Audit logs centralized, tamper-proof
  • Team autonomy: Platform teams own their workloads accounts
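
The OU layout above translates almost one-to-one into Terraform. A minimal sketch of what the ecp-ou-structure repo contains (resource names and the placeholder account email are illustrative, not the project's exact module):

resource "aws_organizations_organization" "this" {
  feature_set = "ALL"

  # Services that need org-wide trusted access (org CloudTrail, GuardDuty admin)
  aws_service_access_principals = [
    "cloudtrail.amazonaws.com",
    "guardduty.amazonaws.com",
  ]
}

resource "aws_organizations_organizational_unit" "security" {
  name      = "Security"
  parent_id = aws_organizations_organization.this.roots[0].id
}

resource "aws_organizations_organizational_unit" "infrastructure" {
  name      = "Infrastructure"
  parent_id = aws_organizations_organization.this.roots[0].id
}

resource "aws_organizations_organizational_unit" "workloads" {
  name      = "Workloads"
  parent_id = aws_organizations_organization.this.roots[0].id
}

# One member account per environment under the Workloads OU
resource "aws_organizations_account" "workloads_dev" {
  name      = "workloads-dev"
  email     = "aws-workloads-dev@example.com" # placeholder root email
  parent_id = aws_organizations_organizational_unit.workloads.id
}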

Network Architecture: Transit Gateway Hub-and-Spoke

The Problem

Teams needed to:

  • Share services across accounts (private APIs, databases)
  • Avoid VPC peering mesh (scales poorly beyond 3 accounts)
  • Maintain network isolation between environments

The Solution: Transit Gateway

        ┌─────────────────────────────────┐
        │     Infrastructure-Prod         │
        │   Transit Gateway (us-east-1)   │
        └────────┬────────────────────────┘
                 │
    ┌────────────┼────────────┐
    │            │            │
┌───▼───┐   ┌────▼────┐  ┌────▼───┐
│NonProd│   │  Prod   │  │Security│
│  VPC  │   │   VPC   │  │  VPC   │
│10.1/16│   │ 10.2/16 │  │10.3/16 │
└───────┘   └─────────┘  └────────┘

Each VPC (per account):

  • Public Subnets (2 AZs): ALB/NAT Gateways only
  • Private Subnets (2 AZs): EKS nodes, application workloads
  • Transit Gateway Attachment: Routes to other accounts via TGW

Benefits:

  • ✅ Centralized routing (add new account = 1 TGW attachment)
  • ✅ Network segmentation (route tables control inter-VPC traffic)
  • ✅ No peering mesh complexity (scales to 100+ VPCs)
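
In Terraform terms, the hub-and-spoke looks roughly like the sketch below: the Infrastructure-Prod account owns the Transit Gateway and shares it to the organization via AWS RAM, and each spoke account adds one attachment plus a route. The VPC, subnet, and route table references are assumed to come from the spoke's VPC module; names are illustrative.

# Hub (Infrastructure-Prod): one Transit Gateway, shared org-wide via RAM
resource "aws_ec2_transit_gateway" "hub" {
  description                    = "ECP shared transit gateway"
  auto_accept_shared_attachments = "enable"
}

resource "aws_ram_resource_share" "tgw" {
  name                      = "ecp-transit-gateway"
  allow_external_principals = false
}

resource "aws_ram_resource_association" "tgw" {
  resource_arn       = aws_ec2_transit_gateway.hub.arn
  resource_share_arn = aws_ram_resource_share.tgw.arn
}

resource "aws_ram_principal_association" "org" {
  principal          = aws_organizations_organization.this.arn # share with every account in the org
  resource_share_arn = aws_ram_resource_share.tgw.arn
}

# Spoke (each workloads account): one attachment plus a route to the other VPCs
resource "aws_ec2_transit_gateway_vpc_attachment" "spoke" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = aws_vpc.this.id
  subnet_ids         = aws_subnet.private[*].id
}

resource "aws_route" "to_tgw" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "10.0.0.0/8" # reach the other spokes through the TGW
  transit_gateway_id     = aws_ec2_transit_gateway.hub.id
}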

Security Baseline: Defense in Depth

Layer 1: Account-Level Controls

AWS Organizations + Service Control Policies (SCPs):

  • Deny region access outside us-east-1 (prevent shadow IT)
  • Deny S3 public access unless explicit exception
  • Deny root user access (force IAM)
  • Require MFA for privileged actions
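
As an example, the region restriction above can be written as an SCP and attached at the OU level. A sketch (the policy name and the exempted global services are illustrative and would need tuning per organization):

resource "aws_organizations_policy" "region_restriction" {
  name = "deny-regions-outside-us-east-1"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyNonApprovedRegions"
      Effect = "Deny"
      # Global services stay exempt so IAM, Route 53, CloudFront, etc. keep working
      NotAction = [
        "iam:*", "organizations:*", "sts:*",
        "cloudfront:*", "route53:*", "support:*",
      ]
      Resource = "*"
      Condition = {
        StringNotEquals = {
          "aws:RequestedRegion" = ["us-east-1"]
        }
      }
    }]
  })
}

resource "aws_organizations_policy_attachment" "workloads" {
  policy_id = aws_organizations_policy.region_restriction.id
  target_id = aws_organizations_organizational_unit.workloads.id # attach per OU, not per account
}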

IAM Identity Center (SSO):

  • 8 active users with Okta integration
  • 7 standard roles per account:
    • GitHubActionsRole (OIDC, no long-lived keys)
    • AdminRole (AdministratorAccess)
    • DeveloperRole (PowerUserAccess)
    • DataEngineerRole (S3, Glue, Athena)
    • NetworkAdminRole (VPC, TGW)
    • SecurityAuditorRole (ReadOnly + Security)
    • ReadOnlyRole (ViewOnlyAccess)
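
Each of the standard roles is a permission set in IAM Identity Center, assigned to Okta groups. A minimal sketch of one of them (ReadOnlyRole); the session duration is an assumption, not the project's exact value:

data "aws_ssoadmin_instances" "this" {}

resource "aws_ssoadmin_permission_set" "readonly" {
  name             = "ReadOnlyRole"
  instance_arn     = tolist(data.aws_ssoadmin_instances.this.arns)[0]
  session_duration = "PT8H" # assumed 8-hour sessions
}

# Attach the AWS-managed ViewOnlyAccess policy to the permission set
resource "aws_ssoadmin_managed_policy_attachment" "readonly" {
  instance_arn       = aws_ssoadmin_permission_set.readonly.instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.readonly.arn
  managed_policy_arn = "arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"
}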

Layer 2: Network Security

AWS WAF (on ALB):

  • Rate limiting: 2000 requests/5 minutes
  • OWASP Core Rule Set (SQL injection, XSS blocking)
  • Geo-blocking (optional per app team)
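
A rough sketch of that web ACL in Terraform, assuming WAFv2 associated with the ALB (resource names and metric names are illustrative; the ALB is assumed to be defined elsewhere):

resource "aws_wafv2_web_acl" "alb" {
  name  = "ecp-alb-waf"
  scope = "REGIONAL" # regional scope for ALB association

  default_action {
    allow {}
  }

  # WAF rate-based rules count requests per source IP over a 5-minute window
  rule {
    name     = "rate-limit"
    priority = 1
    action {
      block {}
    }
    statement {
      rate_based_statement {
        limit              = 2000
        aggregate_key_type = "IP"
      }
    }
    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "rate-limit"
      sampled_requests_enabled   = true
    }
  }

  # OWASP-style protections via the AWS managed Core Rule Set
  rule {
    name     = "aws-common-rules"
    priority = 2
    override_action {
      none {}
    }
    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesCommonRuleSet"
        vendor_name = "AWS"
      }
    }
    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "aws-common-rules"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "ecp-alb-waf"
    sampled_requests_enabled   = true
  }
}

resource "aws_wafv2_web_acl_association" "alb" {
  resource_arn = aws_lb.this.arn # ALB assumed to be defined elsewhere
  web_acl_arn  = aws_wafv2_web_acl.alb.arn
}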

Security Groups (least privilege):

  • ALB: Allow 80/443 from 0.0.0.0/0
  • EKS Nodes: Allow 443 from ALB security group only
  • No public SSH access (SSM Session Manager for debugging)
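
Expressed in Terraform, the two security groups chain together so that only the ALB can reach the nodes. A sketch (vpc_id is assumed to be a module input; only the 443 rules are shown, port 80 follows the same pattern):

resource "aws_security_group" "alb" {
  name_prefix = "ecp-alb-"
  vpc_id      = var.vpc_id
}

resource "aws_security_group_rule" "alb_https_in" {
  security_group_id = aws_security_group.alb.id
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_blocks       = ["0.0.0.0/0"] # public HTTPS entry point
}

resource "aws_security_group" "eks_nodes" {
  name_prefix = "ecp-eks-nodes-"
  vpc_id      = var.vpc_id
}

# Nodes accept traffic only from the ALB security group, never from the internet
resource "aws_security_group_rule" "nodes_from_alb" {
  security_group_id        = aws_security_group.eks_nodes.id
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 443
  to_port                  = 443
  source_security_group_id = aws_security_group.alb.id
}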

Layer 3: Monitoring & Detection

CloudTrail (all accounts → Security-Prod):

  • API audit logs with 90-day retention
  • S3 bucket with KMS encryption
  • Immutable logs (prevent tampering)
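
A minimal sketch of the organization trail (the bucket policy that grants CloudTrail write access and the KMS key are omitted for brevity; bucket and trail names are illustrative):

resource "aws_s3_bucket" "audit" {
  bucket_prefix = "ecp-audit-logs-"
}

# 90-day retention on the central audit bucket
resource "aws_s3_bucket_lifecycle_configuration" "audit" {
  bucket = aws_s3_bucket.audit.id

  rule {
    id     = "expire-after-90-days"
    status = "Enabled"
    filter {}
    expiration {
      days = 90
    }
  }
}

resource "aws_cloudtrail" "org" {
  name                       = "ecp-org-trail"
  s3_bucket_name             = aws_s3_bucket.audit.id
  is_organization_trail      = true # one trail covers every member account
  is_multi_region_trail      = true
  enable_log_file_validation = true # digest files make tampering detectable
}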

GuardDuty (Security-Prod master):

  • Threat detection across all accounts
  • EKS protection enabled
  • Findings routed to SNS → Slack (#aws-infra-alerts)
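
The routing from GuardDuty to Slack is an EventBridge rule that forwards findings to an SNS topic, which the Slack notifier Lambda subscribes to. A sketch (topic and rule names are illustrative):

resource "aws_sns_topic" "security_alerts" {
  name = "security-alerts" # the Slack notifier Lambda subscribes here
}

resource "aws_cloudwatch_event_rule" "guardduty_findings" {
  name        = "guardduty-findings"
  description = "Route GuardDuty findings to the alerting topic"

  event_pattern = jsonencode({
    source        = ["aws.guardduty"]
    "detail-type" = ["GuardDuty Finding"]
  })
}

resource "aws_cloudwatch_event_target" "to_sns" {
  rule      = aws_cloudwatch_event_rule.guardduty_findings.name
  target_id = "sns"
  arn       = aws_sns_topic.security_alerts.arn
}

# Allow EventBridge to publish to the topic
resource "aws_sns_topic_policy" "security_alerts" {
  arn = aws_sns_topic.security_alerts.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "events.amazonaws.com" }
      Action    = "sns:Publish"
      Resource  = aws_sns_topic.security_alerts.arn
    }]
  })
}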

Security Hub:

  • CIS AWS Foundations Benchmark
  • AWS Foundational Security Best Practices
  • Automated remediation for high-severity findings

Observability Stack (Opt-In):

  • CloudWatch Alarms: Unauthorized API calls, IAM changes, root login
  • EventBridge Rules: Real-time GuardDuty/Security Hub routing
  • SNS Topics: Critical/High/Medium severity alerts
  • Lambda Notifier: Slack integration
  • Cost: ~$6-11/month per account

Infrastructure as Code: Terraform + Terragrunt

Multi-Repo Strategy

We chose multi-repo over monorepo for clear ownership:

Repo                       Purpose                     Owner
ecp-ou-structure           AWS Org, IAM roles, SCPs    Infrastructure Team
ecp-network                VPCs, Transit Gateway, NAT  Infrastructure Team
ecp-security               CloudTrail, GuardDuty, WAF  Infrastructure Team
github-branch-protection   PR rules enforcement        Infrastructure Team
tf-live-aws-ad-stack       AdStack EKS, ALB, ECR       AdStack Team
tf-live-aws-data-delivery  MWAA, Glue, Athena          Data Team

Why multi-repo?

  • ✅ Clear ownership (each repo has a single owning team instead of a shared monorepo)
  • ✅ Team autonomy (AdStack can deploy without Infrastructure approval)
  • ✅ No merge conflicts (teams work independently)
  • ✅ Aligns with org structure (matches Jira projects, Slack channels)

Trade-off: Version management overhead (Terraform/Terragrunt updates require coordination across repos)

Mitigation: Renovate bot (future) for automated dependency updates

Terragrunt for DRY

Problem: Terraform requires duplicating backend config, provider config, and variable declarations across every stack.

Solution: Terragrunt wrapper with shared root.hcl:

# root.hcl (shared across all stacks)

# local.environment is referenced by the generated provider block below.
# Assumption for this sketch: each stack lives under an environment directory
# (e.g. live/nonprod/<stack>), so the environment name is the parent folder.
locals {
  environment = basename(dirname(get_terragrunt_dir()))
}

remote_state {
  backend = "s3"
  config = {
    bucket         = "terraform-state-${get_aws_account_id()}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"
  default_tags {
    tags = {
      ManagedBy = "Terraform"
      Environment = "${local.environment}"
    }
  }
}
EOF
}

Result: Each stack is 10-20 lines instead of 100+ (90% reduction in boilerplate)
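
For comparison, a child stack under this setup is little more than an include, a module source, and inputs. A sketch of what one looks like (the path, module source, and version ref are illustrative):

# live/nonprod/network/terragrunt.hcl
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "git::git@github.com:company/ecp-network.git//modules/vpc?ref=v1.4.0"
}

inputs = {
  cidr_block = "10.1.0.0/16"
  az_count   = 2
}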


CI/CD Pipeline: GitOps with GitHub Actions

Workflow

Developer → Feature Branch → PR → terraform-plan → Code Review
                                        ↓
                               Code Owner Approval
                                        ↓
                                 Merge to main
                                        ↓
                              terraform-apply → AWS
                                        ↓
                            Slack Notification + Jira Update

Branch Protection (Enforced via Terraform)

All infrastructure repos require:

  • ✅ 1 approval from designated reviewer
  • ✅ Code owner review (per CODEOWNERS file)
  • ✅ terraform-plan workflow must pass
  • ❌ No direct pushes to main
  • ❌ No admin bypass
  • ❌ No force pushes
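
With the Terraform GitHub provider, those rules look roughly like the sketch below (the repository name and status-check context are illustrative):

data "github_repository" "ecp_network" {
  full_name = "company/ecp-network"
}

resource "github_branch_protection" "main" {
  repository_id = data.github_repository.ecp_network.node_id
  pattern       = "main"

  required_pull_request_reviews {
    required_approving_review_count = 1
    require_code_owner_reviews      = true # honors the CODEOWNERS file
  }

  required_status_checks {
    strict   = true
    contexts = ["terraform-plan"] # the plan workflow must pass before merge
  }

  enforce_admins      = true # no admin bypass
  allows_force_pushes = false
  allows_deletions    = false
}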

Why Terraform for branch protection?

  • Codified rules (no manual GitHub UI clicks)
  • Consistent across all repos
  • Auditable (changes tracked in Git)

OIDC Authentication (No Long-Lived Keys)

Problem: Traditional approach uses AWS access keys stored in GitHub Secrets (security risk if leaked).

Solution: GitHub Actions OIDC provider + IAM role trust policy

# GitHubActionsRole in each account
data "aws_iam_policy_document" "github_oidc" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = ["arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"]
    }
    # Standard for the GitHub OIDC integration: pin the token audience to STS
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Scope role assumption to a single repo and branch
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:company/ecp-network:ref:refs/heads/main"]
    }
  }
}

Benefits:

  • ✅ No keys to rotate or leak
  • ✅ Scoped per repo and branch
  • ✅ Automatic expiration (temporary credentials)
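
The trust policy above hangs off a per-account OIDC provider and IAM role, roughly like this (the thumbprint shown is GitHub's commonly published value; verify the current one before use, and the attached deployment policy is assumed to be defined elsewhere):

resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"] # verify before use
}

resource "aws_iam_role" "github_actions" {
  name               = "GitHubActionsRole"
  assume_role_policy = data.aws_iam_policy_document.github_oidc.json
}

# Deployment permissions are attached separately, scoped to what the pipeline needs
resource "aws_iam_role_policy_attachment" "github_actions" {
  role       = aws_iam_role.github_actions.name
  policy_arn = aws_iam_policy.deploy.arn # policy assumed to be defined elsewhere
}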

Jira Integration

Commit Message Requirements:

  • NonProd deploys: IN-123 (Infrastructure Jira ticket)
  • Prod deploys: CHANGE-456 (Change record for audit)

GitHub Actions:

  • Extract Jira ticket from commit message
  • Post comment to Jira: “Deployed to NonProd - PR #89”
  • Update ticket status: In Progress → Deployed

Results & Metrics

Infrastructure Deployed (6 Months)

  • 6 AWS accounts managed via Organizations
  • 2 VPCs (nonprod, prod) with multi-AZ design
  • 7 IAM roles per account (standardized access)
  • 1 EKS cluster (nonprod) running production-ready workloads
  • 5 foundation repos enforcing DRY principles

Time Savings

  • New product infrastructure: 3 weeks → 2 days (~90% faster)
  • Security baseline deployment: 2 days → 1 hour (automated via modules)
  • Cross-account network setup: 1 week → 30 minutes (Transit Gateway)

Team Velocity

  • AdStack team: Deployed nonprod EKS cluster in 3 days (vs 3 weeks previously)
  • Data team: Self-service infrastructure with ecp-network modules (no infra team dependency)

Security Posture

  • 100% coverage: CloudTrail, GuardDuty, Security Hub across all accounts
  • Zero manual console changes: All infrastructure via Terraform (auditable)
  • Automated threat detection: GuardDuty findings routed to Slack in real-time

Lessons Learned

What Worked Well

  1. Multi-repo for ownership clarity: Teams felt ownership of their repos (vs shared monorepo)
  2. Terragrunt for DRY: Reduced config duplication by 90%
  3. Branch protection as code: Prevented accidental merges, enforced review process
  4. OIDC for GitHub Actions: No key rotation, better security posture

Challenges & Trade-offs

  1. Version management: Coordinating Terraform updates across 6 repos (future: Renovate bot)
  2. Initial learning curve: Teams unfamiliar with Terragrunt required onboarding
  3. Transit Gateway cost: ~$0.02/GB data processing plus hourly attachment charges (vs largely free VPC peering data transfer) - worth it for operational simplicity

What I’d Do Differently

  1. Start with monorepo: Build initial MVP in monorepo, split into multi-repo after team ownership stabilizes
  2. Renovate from day 1: Automate dependency updates instead of manual coordination
  3. Cost visibility dashboards: Deploy AWS Cost Explorer dashboards earlier (teams didn’t see costs until Month 3)

Next Steps (Q1-Q2 2026)

  1. Observability baseline: Deploy CloudWatch dashboards + Slack alerting to all accounts
  2. Production EKS: Promote nonprod cluster architecture to prod account
  3. Data platform support: Assist Data Science team with tf-live-aws-data-delivery
  4. Automated dependency updates: Implement Renovate bot for cross-repo Terraform version management

Key Takeaways for Your Organization

If you’re building a multi-account AWS platform, here’s what I’d recommend:

Start Simple

  • Don’t over-architect: Start with 3 accounts (Management, NonProd, Prod), expand later
  • Avoid premature optimization: Get one environment working before replicating

Design for Ownership

  • Multi-repo = clear ownership: If teams are siloed, multi-repo prevents merge conflicts
  • Monorepo = faster refactors: If teams collaborate closely, monorepo enables global changes

Enforce Standards Early

  • Branch protection from day 1: Prevent accidental merges before they happen
  • OIDC over access keys: More secure, easier to audit
  • Terraform for everything: Avoid manual console changes (impossible to track/replicate)

Measure Impact

  • Track time-to-infrastructure: How long does it take to provision a new environment?
  • Security coverage: What % of accounts have CloudTrail, GuardDuty, Security Hub?
  • Team velocity: How fast can app teams deploy without infrastructure team dependency?

Want to Build This for Your Team?

I help companies design and implement multi-account AWS platforms like this one.

What I can do for you:

  • Architecture review of your existing AWS setup
  • Design multi-account strategy tailored to your org structure
  • Implement production-ready EKS clusters with security baseline
  • Establish IaC best practices (Terraform/Terragrunt modules)
  • Set up CI/CD pipelines with branch protection and automated testing

Book a consultation call to discuss your project.


About the Author

Glenn Gray is a Staff Cloud Architect with 12+ years building enterprise AWS platforms. He specializes in multi-account architectures, production EKS, and infrastructure as code. Learn more at graycloudarch.com.

Want to learn more? Check out my course: Building an Enterprise Cloud Platform (coming Feb 2026).