How I Built a Multi-Account AWS Platform for an Enterprise Media Company
Project Duration: Q4 2025 - Q1 2026
Infrastructure Scale: 6 AWS accounts, 2 EKS clusters, 50+ engineers
Role: Staff Cloud Architect & Platform Lead
The Challenge
An enterprise media company was facing a classic scaling problem: multiple platform teams (advertising, data analytics, content hosting) were duplicating infrastructure across siloed AWS accounts. Each team built their own VPCs, security baselines, and CI/CD pipelines—resulting in:
- Inconsistent security controls across accounts
- Wasted engineering time rebuilding the same infrastructure
- No centralized networking (teams couldn’t share resources)
- Compliance gaps (CloudTrail, GuardDuty deployed inconsistently)
- Slow onboarding for new products (weeks to provision infra)
The company needed a shared platform foundation that would:
- Prevent infrastructure duplication (DRY principle)
- Maintain team ownership boundaries (no central bottleneck)
- Enforce security/compliance standards consistently
- Enable fast self-service infrastructure for app teams
I was hired to design and implement this “Enterprise Cloud Platform” (ECP).
The Solution: Multi-Account AWS Foundation
Design Philosophy: “4 M’s”
- Multiple Accounts: Billing isolation, blast radius containment
- Multiple Regions: us-east-1 primary, disaster recovery ready
- Multiple Environments: Dev, nonprod, prod separation
- Multiple Stacks: Modular Terraform/Terragrunt for reusability
Account Structure
We built 6 AWS accounts under a single AWS Organization:
┌─────────────────────────────────────────────┐
│ AWS Organization (Root) │
└─────────────┬───────────────────────────────┘
│
┌────────┴────────┬──────────────────┐
│ │ │
┌────▼────┐ ┌─────▼──────┐ ┌─────▼────────┐
│Management│ │ Security OU │ │Infrastructure│
│ Account │ └─────┬──────┘ │ OU │
└──────────┘ │ └──────┬───────┘
│ │
┌─────────▼──────┐ ┌──────▼────────┐
│ Security-Prod │ │ Infra-Prod │
│ (Audit Logs, │ │ (Transit GW, │
│ GuardDuty) │ │ Shared DNS) │
└────────────────┘ └───────────────┘
┌─────────────┐
│Workloads OU │
└──────┬──────┘
│
┌──────────────┼──────────────┐
│ │ │
┌────▼────┐ ┌─────▼──────┐ ┌────▼────┐
│Workloads│ │ Workloads │ │Workloads│
│ Dev │ │ NonProd │ │ Prod │
│(Sandbox)│ │ (Testing) │ │ (Apps) │
└─────────┘ └────────────┘ └─────────┘
Key Design Decisions:
- Management Account: Billing, AWS Organizations, cross-account IAM roles (no workloads)
- Security-Prod Account: Centralized CloudTrail logs, GuardDuty master, Security Hub
- Infrastructure-Prod Account: Shared services (Transit Gateway, private hosted zones)
- Workloads Accounts: Dev (sandbox), NonProd (testing), Prod (production apps)
Why this structure?
- Blast radius containment: Compromise in dev doesn’t affect prod
- Billing clarity: Each team sees their own costs
- Compliance: Audit logs centralized, tamper-proof
- Team autonomy: Platform teams own their workloads accounts
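To make the structure concrete, here is a minimal Terraform sketch of one OU and one member account. Resource names and the billing email are placeholders, not the actual ecp-ou-structure module:

```hcl
# Minimal sketch of one OU and one member account (hypothetical names,
# not the actual ecp-ou-structure module)
data "aws_organizations_organization" "main" {}

resource "aws_organizations_organizational_unit" "workloads" {
  name      = "Workloads"
  parent_id = data.aws_organizations_organization.main.roots[0].id
}

resource "aws_organizations_account" "workloads_prod" {
  name      = "workloads-prod"
  email     = "aws+workloads-prod@example.com" # placeholder billing email
  parent_id = aws_organizations_organizational_unit.workloads.id
  role_name = "OrganizationAccountAccessRole"  # cross-account admin role created with the account
}
```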
Network Architecture: Transit Gateway Hub-and-Spoke
The Problem
Teams needed to:
- Share services across accounts (private APIs, databases)
- Avoid VPC peering mesh (scales poorly beyond 3 accounts)
- Maintain network isolation between environments
The Solution: Transit Gateway
┌─────────────────────────────────┐
│ Infrastructure-Prod │
│ Transit Gateway (us-east-1) │
└────────┬────────────────────────┘
│
┌────────────┼────────────┐
│ │ │
┌───▼───┐ ┌────▼────┐ ┌───▼────┐
│NonProd│ │ Prod │ │Security│
│ VPC │ │ VPC │ │ VPC │
│10.1/16│ │ 10.2/16 │ │10.3/16 │
└───────┘ └─────────┘ └────────┘
Each VPC (per account):
- Public Subnets (2 AZs): ALB/NAT Gateways only
- Private Subnets (2 AZs): EKS nodes, application workloads
- Transit Gateway Attachment: Routes to other accounts via TGW
Benefits:
- ✅ Centralized routing (add new account = 1 TGW attachment)
- ✅ Network segmentation (route tables control inter-VPC traffic)
- ✅ No peering mesh complexity (scales to 100+ VPCs)
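On the spoke side, the attachment and return route boil down to something like the following sketch. Resource names and the variable are placeholders; the Transit Gateway itself lives in Infrastructure-Prod and is shared to spoke accounts via AWS RAM:

```hcl
# Hypothetical spoke-side sketch: attach the VPC to the shared TGW and route
# cross-account traffic through it. Only the shared TGW ID is needed here.
resource "aws_ec2_transit_gateway_vpc_attachment" "spoke" {
  transit_gateway_id = var.shared_transit_gateway_id
  vpc_id             = aws_vpc.main.id
  subnet_ids         = aws_subnet.private[*].id # one attachment ENI per AZ
}

resource "aws_route" "to_other_accounts" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "10.0.0.0/8" # all internal ranges go via the TGW
  transit_gateway_id     = var.shared_transit_gateway_id
}
```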
Security Baseline: Defense in Depth
Layer 1: Account-Level Controls
AWS Organizations + Service Control Policies (SCPs):
- Deny region access outside us-east-1 (prevent shadow IT)
- Deny S3 public access unless explicit exception
- Deny root user access (force IAM)
- Require MFA for privileged actions
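As an example, the region restriction can be expressed as a Terraform-managed SCP along these lines. This is a simplified sketch; a production policy would also exempt global services such as IAM, Organizations, and CloudFront:

```hcl
# Simplified sketch of the deny-regions SCP
resource "aws_organizations_policy" "deny_other_regions" {
  name = "deny-regions-outside-us-east-1"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DenyOutsideUsEast1"
      Effect   = "Deny"
      Action   = "*"
      Resource = "*"
      Condition = {
        StringNotEquals = { "aws:RequestedRegion" = "us-east-1" }
      }
    }]
  })
}

resource "aws_organizations_policy_attachment" "workloads" {
  policy_id = aws_organizations_policy.deny_other_regions.id
  target_id = aws_organizations_organizational_unit.workloads.id # Workloads OU from the earlier sketch
}
```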
IAM Identity Center (SSO):
- 8 active users with Okta integration
- 7 standard roles per account:
  - GitHubActionsRole (OIDC, no long-lived keys)
  - AdminRole (AdministratorAccess)
  - DeveloperRole (PowerUserAccess)
  - DataEngineerRole (S3, Glue, Athena)
  - NetworkAdminRole (VPC, TGW)
  - SecurityAuditorRole (ReadOnly + Security)
  - ReadOnlyRole (ViewOnlyAccess)
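Most of these roles are provisioned as IAM Identity Center permission sets (GitHubActionsRole is a plain IAM role for OIDC instead). A minimal sketch of one permission set, with hypothetical resource names:

```hcl
# Minimal sketch of one Identity Center permission set (hypothetical names);
# the real setup loops over all roles and assigns Okta groups to accounts.
data "aws_ssoadmin_instances" "main" {}

resource "aws_ssoadmin_permission_set" "developer" {
  name             = "DeveloperRole"
  instance_arn     = tolist(data.aws_ssoadmin_instances.main.arns)[0]
  session_duration = "PT8H" # 8-hour sessions
}

resource "aws_ssoadmin_managed_policy_attachment" "developer_poweruser" {
  instance_arn       = tolist(data.aws_ssoadmin_instances.main.arns)[0]
  permission_set_arn = aws_ssoadmin_permission_set.developer.arn
  managed_policy_arn = "arn:aws:iam::aws:policy/PowerUserAccess"
}
```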
Layer 2: Network Security
AWS WAF (on ALB):
- Rate limiting: 2000 requests/5 minutes
- OWASP Core Rule Set (SQL injection, XSS blocking)
- Geo-blocking (optional per app team)
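A trimmed-down sketch of the web ACL with just the rate-limiting rule (names are illustrative; the managed OWASP rule groups are attached as additional rule blocks in the real config):

```hcl
# Trimmed sketch of the ALB web ACL with only the rate-based rule
resource "aws_wafv2_web_acl" "alb" {
  name  = "ecp-alb-waf" # illustrative name
  scope = "REGIONAL"    # regional scope is required for ALBs

  default_action {
    allow {}
  }

  rule {
    name     = "rate-limit"
    priority = 1

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 2000 # requests per 5-minute window, per source IP
        aggregate_key_type = "IP"
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "rate-limit"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "ecp-alb-waf"
    sampled_requests_enabled   = true
  }
}
```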
Security Groups (least privilege):
- ALB: Allow 80/443 from 0.0.0.0/0
- EKS Nodes: Allow 443 from ALB security group only
- No public SSH access (SSM Session Manager for debugging)
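The node ingress rule is a one-liner in Terraform; the security group references here are illustrative:

```hcl
# Illustrative rule: EKS nodes accept 443 only from the ALB's security group
resource "aws_security_group_rule" "nodes_from_alb" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_nodes.id # node SG (illustrative reference)
  source_security_group_id = aws_security_group.alb.id       # ALB SG (illustrative reference)
  description              = "HTTPS from ALB only"
}
```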
Layer 3: Monitoring & Detection
CloudTrail (all accounts → Security-Prod):
- API audit logs with 90-day retention
- S3 bucket with KMS encryption
- Immutable logs (prevent tampering)
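A minimal sketch of the organization trail; the bucket and KMS key names are placeholders for resources that live in Security-Prod:

```hcl
# Sketch of the organization-wide trail, created from the management
# (or delegated admin) account
resource "aws_cloudtrail" "org" {
  name                          = "ecp-org-trail"
  s3_bucket_name                = "ecp-central-audit-logs" # central bucket in Security-Prod
  kms_key_id                    = aws_kms_key.cloudtrail.arn
  is_organization_trail         = true
  is_multi_region_trail         = true
  include_global_service_events = true
  enable_log_file_validation    = true # digest files make tampering detectable
}
```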
GuardDuty (Security-Prod master):
- Threat detection across all accounts
- EKS protection enabled
- Findings routed to SNS → Slack (#aws-infra-alerts)
Security Hub:
- CIS AWS Foundations Benchmark
- AWS Foundational Security Best Practices
- Automated remediation for high-severity findings
Observability Stack (Opt-In):
- CloudWatch Alarms: Unauthorized API calls, IAM changes, root login
- EventBridge Rules: Real-time GuardDuty/Security Hub routing
- SNS Topics: Critical/High/Medium severity alerts
- Lambda Notifier: Slack integration
- Cost: ~$6-11/month per account
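The GuardDuty-to-Slack path above reduces to an EventBridge rule with an SNS target. A rough sketch (the topic name is illustrative, and the topic policy must allow events.amazonaws.com to publish; the Lambda notifier subscribes to the topic):

```hcl
# Rough sketch: route GuardDuty findings to an SNS topic for the Slack notifier
resource "aws_cloudwatch_event_rule" "guardduty_findings" {
  name = "guardduty-findings-to-sns"

  event_pattern = jsonencode({
    source        = ["aws.guardduty"]
    "detail-type" = ["GuardDuty Finding"]
  })
}

resource "aws_cloudwatch_event_target" "to_sns" {
  rule = aws_cloudwatch_event_rule.guardduty_findings.name
  arn  = aws_sns_topic.security_alerts.arn # illustrative topic name
}
```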
Infrastructure as Code: Terraform + Terragrunt
Multi-Repo Strategy
We chose multi-repo over monorepo for clear ownership:
| Repo | Purpose | Owner |
|---|---|---|
| ecp-ou-structure | AWS Org, IAM roles, SCPs | Infrastructure Team |
| ecp-network | VPCs, Transit Gateway, NAT | Infrastructure Team |
| ecp-security | CloudTrail, GuardDuty, WAF | Infrastructure Team |
| github-branch-protection | PR rules enforcement | Infrastructure Team |
| tf-live-aws-ad-stack | AdStack EKS, ALB, ECR | AdStack Team |
| tf-live-aws-data-delivery | MWAA, Glue, Athena | Data Team |
Why multi-repo?
- ✅ Clear ownership (each repo has a single owning team, rather than a slice of a shared monorepo)
- ✅ Team autonomy (AdStack can deploy without Infrastructure approval)
- ✅ No merge conflicts (teams work independently)
- ✅ Aligns with org structure (matches Jira projects, Slack channels)
Trade-off: Version management overhead (Terraform/Terragrunt updates require coordination across repos)
Mitigation: Renovate bot (future) for automated dependency updates
Terragrunt for DRY
Problem: Terraform requires duplicating backend config, provider config, and variable declarations across every stack.
Solution: Terragrunt wrapper with shared root.hcl:
```hcl
# root.hcl (shared across all stacks)
locals {
  # local.environment is referenced by the provider tags below; assuming stacks
  # are laid out as <environment>/<stack>, it can be derived from the path.
  environment = split("/", path_relative_to_include())[0]
}

remote_state {
  backend = "s3"
  config = {
    bucket         = "terraform-state-${get_aws_account_id()}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = "${local.environment}"
    }
  }
}
EOF
}
```
Result: Each stack is 10-20 lines instead of 100+ (90% reduction in boilerplate)
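For context, a child stack's terragrunt.hcl ends up looking roughly like this (the module source, version, and inputs are illustrative):

```hcl
# Illustrative child stack config; backend and provider come from root.hcl
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "git::https://github.com/company/ecp-network.git//modules/vpc?ref=v1.4.0"
}

inputs = {
  vpc_cidr    = "10.2.0.0/16"
  environment = "prod"
}
```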
CI/CD Pipeline: GitOps with GitHub Actions
Workflow
Developer → Feature Branch → PR → terraform-plan → Code Review
↓
Code Owner Approval
↓
Merge to main
↓
terraform-apply → AWS
↓
Slack Notification + Jira Update
Branch Protection (Enforced via Terraform)
All infrastructure repos require:
- ✅ 1 approval from designated reviewer
- ✅ Code owner review (per CODEOWNERS file)
- ✅ terraform-plan workflow must pass
- ❌ No direct pushes to main
- ❌ No admin bypass
- ❌ No force pushes
Why Terraform for branch protection?
- Codified rules (no manual GitHub UI clicks)
- Consistent across all repos
- Auditable (changes tracked in Git)
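A sketch of those rules using the integrations/github Terraform provider; the repository reference and status-check context are illustrative:

```hcl
# Sketch of codified branch protection (names are illustrative)
resource "github_branch_protection" "main" {
  repository_id = github_repository.ecp_network.node_id
  pattern       = "main"

  required_pull_request_reviews {
    required_approving_review_count = 1
    require_code_owner_reviews      = true
  }

  required_status_checks {
    strict   = true
    contexts = ["terraform-plan"]
  }

  enforce_admins      = true  # no admin bypass
  allows_force_pushes = false
}
```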
OIDC Authentication (No Long-Lived Keys)
Problem: Traditional approach uses AWS access keys stored in GitHub Secrets (security risk if leaked).
Solution: GitHub Actions OIDC provider + IAM role trust policy
```hcl
# GitHubActionsRole trust policy in each account
data "aws_iam_policy_document" "github_oidc" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = ["arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:company/ecp-network:ref:refs/heads/main"]
    }
  }
}
```
Benefits:
- ✅ No keys to rotate or leak
- ✅ Scoped per repo and branch
- ✅ Automatic expiration (temporary credentials)
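For completeness, the trust policy above hangs off an OIDC provider resource and a role, roughly like this (a sketch; verify GitHub's current certificate thumbprint before use):

```hcl
# Sketch: the OIDC provider plus the role that uses the trust policy above
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"] # verify GitHub's current thumbprint
}

resource "aws_iam_role" "github_actions" {
  name               = "GitHubActionsRole"
  assume_role_policy = data.aws_iam_policy_document.github_oidc.json
}
```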
Jira Integration
Commit Message Requirements:
- NonProd deploys: IN-123 (Infrastructure Jira ticket)
- Prod deploys: CHANGE-456 (Change record for audit)
GitHub Actions:
- Extract Jira ticket from commit message
- Post comment to Jira: “Deployed to NonProd - PR #89”
- Update ticket status: In Progress → Deployed
Results & Metrics
Infrastructure Deployed (6 Months)
- 6 AWS accounts managed via Organizations
- 2 VPCs (nonprod, prod) with multi-AZ design
- 7 IAM roles per account (standardized access)
- 1 EKS cluster (nonprod) running production-ready workloads
- 5 foundation repos enforcing DRY principles
Time Savings
- New product infrastructure: 3 weeks → 2 days (~90% faster)
- Security baseline deployment: 2 days → 1 hour (automated via modules)
- Cross-account network setup: 1 week → 30 minutes (Transit Gateway)
Team Velocity
- AdStack team: Deployed nonprod EKS cluster in 3 days (vs 3 weeks previously)
- Data team: Self-service infrastructure with ecp-network modules (no infra team dependency)
Security Posture
- 100% coverage: CloudTrail, GuardDuty, Security Hub across all accounts
- Zero manual console changes: All infrastructure via Terraform (auditable)
- Automated threat detection: GuardDuty findings routed to Slack in real-time
Lessons Learned
What Worked Well
- Multi-repo for ownership clarity: Teams felt ownership of their repos (vs shared monorepo)
- Terragrunt for DRY: Reduced config duplication by 90%
- Branch protection as code: Prevented accidental merges, enforced review process
- OIDC for GitHub Actions: No key rotation, better security posture
Challenges & Trade-offs
- Version management: Coordinating Terraform updates across 6 repos (future: Renovate bot)
- Initial learning curve: Teams unfamiliar with Terragrunt required onboarding
- Transit Gateway cost: ~$0.02/GB data processing plus per-attachment hourly charges (VPC peering has no processing fee) - worth it for operational simplicity
What I’d Do Differently
- Start with monorepo: Build initial MVP in monorepo, split into multi-repo after team ownership stabilizes
- Renovate from day 1: Automate dependency updates instead of manual coordination
- Cost visibility dashboards: Deploy AWS Cost Explorer dashboards earlier (teams didn’t see costs until Month 3)
Next Steps (Q1-Q2 2026)
- Observability baseline: Deploy CloudWatch dashboards + Slack alerting to all accounts
- Production EKS: Promote nonprod cluster architecture to prod account
- Data platform support: Assist Data Science team with tf-live-aws-data-delivery
- Automated dependency updates: Implement Renovate bot for cross-repo Terraform version management
Key Takeaways for Your Organization
If you’re building a multi-account AWS platform, here’s what I’d recommend:
Start Simple
- Don’t over-architect: Start with 3 accounts (Management, NonProd, Prod), expand later
- Avoid premature optimization: Get one environment working before replicating
Design for Ownership
- Multi-repo = clear ownership: If teams are siloed, multi-repo prevents merge conflicts
- Monorepo = faster refactors: If teams collaborate closely, monorepo enables global changes
Enforce Standards Early
- Branch protection from day 1: Prevent accidental merges before they happen
- OIDC over access keys: More secure, easier to audit
- Terraform for everything: Avoid manual console changes (impossible to track/replicate)
Measure Impact
- Track time-to-infrastructure: How long does it take to provision a new environment?
- Security coverage: What % of accounts have CloudTrail, GuardDuty, Security Hub?
- Team velocity: How fast can app teams deploy without infrastructure team dependency?
Want to Build This for Your Team?
I help companies design and implement multi-account AWS platforms like this one.
What I can do for you:
- Architecture review of your existing AWS setup
- Design multi-account strategy tailored to your org structure
- Implement production-ready EKS clusters with security baseline
- Establish IaC best practices (Terraform/Terragrunt modules)
- Set up CI/CD pipelines with branch protection and automated testing
Book a consultation call to discuss your project.
About the Author
Glenn Gray is a Staff Cloud Architect with 12+ years building enterprise AWS platforms. He specializes in multi-account architectures, production EKS, and infrastructure as code. Learn more at graycloudarch.com.
Want to learn more? Check out my course: Building an Enterprise Cloud Platform (coming Feb 2026).