Basic VPC design covers subnets and security groups. Production environments require advanced networking: connecting multiple VPCs, hybrid cloud connectivity, private service access, and intelligent DNS routing. This guide covers the patterns AWS architects use at scale.

Network Architecture Evolution

Stage 1: Single VPC           → Dev/test
Stage 2: Multi-VPC Peering    → 2-4 VPCs (mesh complexity grows)
Stage 3: Transit Gateway      → Hub-and-spoke, 5+ VPCs
Stage 4: Multi-Account + TGW  → Enterprise, team isolation
Stage 5: Hybrid Cloud         → Direct Connect + VPN to on-premises
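
The step from Stage 2 to Stage 3 comes down to counting: a full mesh of n VPCs needs n(n-1)/2 peering connections, while a hub needs one attachment per VPC. A minimal shell sketch of that arithmetic follows; the function names are illustrative and not part of any AWS tooling.

```shell
# Peering connections needed for a full mesh of N VPCs: N * (N - 1) / 2.
peering_connections() {
    echo $(( $1 * ($1 - 1) / 2 ))
}

# Transit Gateway attachments needed for N VPCs: one per VPC.
tgw_attachments() {
    echo "$1"
}

for n in 3 5 10; do
    echo "$n VPCs: $(peering_connections "$n") peerings vs $(tgw_attachments "$n") TGW attachments"
done
```

At 10 VPCs the mesh already needs 45 peering connections, each with its own route entries on both sides.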
  

AWS Transit Gateway

Central hub connecting VPCs, VPNs, and Direct Connect:

# Create Transit Gateway
aws ec2 create-transit-gateway \
    --description "Production hub" \
    --options DefaultRouteTableAssociation=enable,DefaultRouteTablePropagation=enable

# Attach VPC
aws ec2 create-transit-gateway-vpc-attachment \
    --transit-gateway-id tgw-xxx \
    --vpc-id vpc-production \
    --subnet-ids subnet-private-1a subnet-private-1b

# Attach another VPC (shared services)
aws ec2 create-transit-gateway-vpc-attachment \
    --transit-gateway-id tgw-xxx \
    --vpc-id vpc-shared-services \
    --subnet-ids subnet-shared-1a subnet-shared-1b

# Route table: production VPC → shared services
aws ec2 create-transit-gateway-route \
    --transit-gateway-route-table-id tgw-rtb-xxx \
    --destination-cidr-block 10.1.0.0/16 \
    --transit-gateway-attachment-id tgw-attach-shared
  

Transit Gateway vs VPC Peering

Feature             VPC Peering                            Transit Gateway
------------------  -------------------------------------  ------------------------------------
Topology            Full mesh                              Hub-and-spoke
Transitive routing  No                                     Yes
Max connections     Up to 125 per VPC (manageable to ~10)  Thousands
Cost                Free (data transfer only)              $0.05/hour per attachment + $0.02/GB
Cross-region        Separate peering per pair              Inter-region peering
Best for            2-4 VPCs                               5+ VPCs, enterprise
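
To make the cost row concrete, here is a rough monthly estimate using the rates listed above. It is a sketch under stated assumptions: the hourly charge is applied per attachment, a month is taken as 730 hours, and regional price differences and data transfer charges are ignored. The function name is hypothetical.

```shell
# Estimated monthly Transit Gateway cost in USD.
# Usage: tgw_monthly_cost ATTACHMENTS GB_PROCESSED
# Assumptions: $0.05 per attachment-hour, $0.02 per GB, 730 hours per month.
tgw_monthly_cost() {
    awk -v n="$1" -v gb="$2" 'BEGIN { printf "%.2f\n", n * 0.05 * 730 + gb * 0.02 }'
}

# Five attached VPCs moving 2000 GB through the hub in a month.
tgw_monthly_cost 5 2000
```

With few VPCs and little traffic, the fixed attachment charge dominates, which is why peering stays attractive for small setups.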

Site-to-Site VPN

Connect on-premises data center to AWS:

# Create Virtual Private Gateway
aws ec2 create-vpn-gateway --type ipsec.1
aws ec2 attach-vpn-gateway --vpn-gateway-id vgw-xxx --vpc-id vpc-production

# Create Customer Gateway (your on-premises VPN device)
aws ec2 create-customer-gateway \
    --type ipsec.1 \
    --public-ip 203.0.113.10 \
    --bgp-asn 65000

# Create VPN Connection
aws ec2 create-vpn-connection \
    --type ipsec.1 \
    --customer-gateway-id cgw-xxx \
    --vpn-gateway-id vgw-xxx \
    --options StaticRoutesOnly=true

# Download configuration for your VPN device
aws ec2 describe-vpn-connections --vpn-connection-ids vpn-xxx
  

Best practice: Each Site-to-Site VPN connection provides two tunnels; configure both on your on-premises device for redundancy. Throughput tops out at about 1.25 Gbps per tunnel, so use Direct Connect for higher bandwidth.
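
One way to verify tunnel redundancy is to count how many tunnels report UP. The sketch below parses whitespace-separated statuses; the function names are illustrative, and the sample input stands in for the VgwTelemetry status values that describe-vpn-connections returns.

```shell
# Count tunnels reported as UP. Reads whitespace-separated statuses on stdin.
count_up_tunnels() {
    tr -s '[:space:]' '\n' | grep -c '^UP$' || true
}

# Warn when fewer than two tunnels are up.
check_redundancy() {
    up=$(count_up_tunnels)
    if [ "$up" -ge 2 ]; then
        echo "OK: $up tunnels up"
    else
        echo "WARNING: only $up tunnel(s) up"
    fi
}

# Sample data. Against a real connection, pipe in the output of:
#   aws ec2 describe-vpn-connections --vpn-connection-ids vpn-xxx \
#       --query 'VpnConnections[0].VgwTelemetry[*].Status' --output text
printf 'UP\tDOWN\n' | check_redundancy
```

A check like this could run on a schedule so that a single failed tunnel is noticed before the second one fails.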

AWS Direct Connect

Dedicated network connection from on-premises to AWS (1 Gbps to 100 Gbps):

Aspect      VPN                      Direct Connect
----------  -----------------------  --------------------------
Bandwidth   Up to ~1.25 Gbps         1 Gbps – 100 Gbps
Latency     Variable (internet)      Consistent, lower
Cost        Low (VPN endpoint only)  Port hours + data transfer
Setup time  Hours                    Weeks (physical install)
Encryption  IPsec                    MACsec or VPN over DX

Use Direct Connect for steady high-bandwidth workloads (database replication, bulk data transfer). Combine with VPN as backup.
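
A quick transfer-time estimate shows why bulk workloads favor Direct Connect. This sketch assumes a fully utilized link with no protocol overhead and decimal terabytes (10^12 bytes), so real transfers will take longer. The function name and the 50 TB figure are illustrative.

```shell
# Hours needed to move TB terabytes (decimal) over a link of GBPS gigabits/s,
# assuming full utilization and no protocol overhead.
# 1 TB = 8000 gigabits, so seconds = TB * 8000 / GBPS.
transfer_hours() {
    awk -v tb="$1" -v gbps="$2" 'BEGIN { printf "%.1f\n", tb * 8000 / gbps / 3600 }'
}

echo "50 TB over VPN (1.25 Gbps): $(transfer_hours 50 1.25) hours"
echo "50 TB over DX (10 Gbps):    $(transfer_hours 50 10) hours"
```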

AWS PrivateLink

Access AWS services and third-party services without traversing the public internet:

# Create VPC Endpoint Service (provider side)
aws ec2 create-vpc-endpoint-service-configuration \
    --network-load-balancer-arns arn:aws:elasticloadbalancing:us-east-1:123:loadbalancer/net/my-nlb/xxx \
    --acceptance-required

# Create Interface Endpoint (consumer side)
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-consumer \
    --service-name com.amazonaws.vpce.us-east-1.vpce-svc-xxx \
    --vpc-endpoint-type Interface \
    --subnet-ids subnet-private-1a subnet-private-1b \
    --security-group-ids sg-endpoints
  
Scenario                                Benefit
--------------------------------------  ---------------------------------
SaaS provider exposes API to customers  No public internet exposure
Cross-account service access            No VPC peering needed
Access AWS services from on-premises    Over Direct Connect, not internet
Compliance (PCI, HIPAA)                 Traffic never leaves AWS network

Route 53 Advanced Routing

Routing Policies

Policy        Behavior                         Use Case
------------  -------------------------------  ------------------------------
Simple        Single resource                  Basic DNS
Weighted      Split traffic by weight          A/B testing, gradual migration
Latency       Route to lowest-latency region   Global applications
Failover      Active-passive DR                Disaster recovery
Geolocation   Route by user location           Content localization
Geoproximity  Route by geographic bias         Shift traffic between regions
Multi-value   Return multiple healthy records  Simple load distribution

# Weighted routing: 90% to current, 10% to new version
aws route53 change-resource-record-sets \
    --hosted-zone-id Z1234567890 \
    --change-batch '{
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "api.example.com",
                    "Type": "A",
                    "SetIdentifier": "current",
                    "Weight": 90,
                    "AliasTarget": {
                        "HostedZoneId": "Z35SXDOTRQ7X7K",
                        "DNSName": "current-alb-xxx.us-east-1.elb.amazonaws.com",
                        "EvaluateTargetHealth": true
                    }
                }
            },
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "api.example.com",
                    "Type": "A",
                    "SetIdentifier": "canary",
                    "Weight": 10,
                    "AliasTarget": {
                        "HostedZoneId": "Z35SXDOTRQ7X7K",
                        "DNSName": "canary-alb-xxx.us-east-1.elb.amazonaws.com",
                        "EvaluateTargetHealth": true
                    }
                }
            }
        ]
    }'
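
Weights are relative, not percentages: each record receives its weight divided by the sum of all weights in the set. The sketch below computes that share; the function name is hypothetical, and it assumes at least one weight is nonzero.

```shell
# Percentage of DNS responses a record receives: weight / sum of all weights.
# Usage: weight_share RECORD_WEIGHT ALL_WEIGHTS...
# (ALL_WEIGHTS includes the record's own weight; the sum must be nonzero.)
weight_share() {
    w=$1
    shift
    total=0
    for x in "$@"; do
        total=$(( $total + $x ))
    done
    awk -v w="$w" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * w / t }'
}

weight_share 10 90 10    # canary share with the weights used above
weight_share 1 3 1       # weights 3 and 1 give the smaller record a quarter
```

Because only the ratio matters, weights of 9 and 1 behave the same as 90 and 10. DNS caching in resolvers means the observed split only approximates the configured one.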
  

Health Checks for Failover

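# --caller-reference is required and must be unique per request
aws route53 create-health-check \
    --caller-reference "health-check-$(date +%s)" \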
    --health-check-config '{
        "IPAddress": "203.0.113.50",
        "Port": 443,
        "Type": "HTTPS",
        "ResourcePath": "/health",
        "RequestInterval": 30,
        "FailureThreshold": 3
    }'
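
The interval and threshold together set how quickly a failure is detected. As a rough estimate (an approximation, not a guarantee from Route 53), detection takes about RequestInterval × FailureThreshold seconds, plus any record TTL before resolvers stop serving the old answer. The function name and the 60-second TTL below are hypothetical.

```shell
# Approximate seconds until clients stop being sent to a failed endpoint:
# RequestInterval * FailureThreshold, plus an optional record TTL.
# Usage: failover_seconds INTERVAL THRESHOLD [TTL]
failover_seconds() {
    interval=$1
    threshold=$2
    ttl=${3:-0}
    echo $(( $interval * $threshold + $ttl ))
}

failover_seconds 30 3       # detection only, using the values above
failover_seconds 30 3 60    # detection plus a hypothetical 60-second TTL
```

If that window is too long for the recovery objective, a shorter request interval or lower threshold narrows it at the cost of more sensitivity to transient errors.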
  

AWS Network Firewall

Managed firewall for VPC traffic inspection:

aws network-firewall create-firewall \
    --firewall-name production-firewall \
    --vpc-id vpc-production \
    --subnet-mappings SubnetId=subnet-firewall-1a \
    --firewall-policy-arn arn:aws:network-firewall:us-east-1:123:firewall-policy/prod-policy
  

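Deploy the firewall endpoints in dedicated firewall subnets (one per Availability Zone you use), then update the route tables so traffic flows IGW → Firewall → Protected subnets and returns the same way.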

Multi-Account Network Architecture

                      AWS Organization
                         │
              ┌──────────┼──────────┐
              │          │          │
         Account:    Account:    Account:
         Network     Production  Development
         (TGW hub)   (workloads) (workloads)
              │          │          │
              └──── Transit Gateway ────┘
                         │
                    Direct Connect
                         │
                   On-Premises DC
  
Account          Purpose                          Network
---------------  -------------------------------  -------------
Network          TGW, Direct Connect, shared VPN  Hub
Production       Application workloads            Spoke via TGW
Development      Dev/test workloads               Spoke via TGW
Shared Services  DNS, logging, CI/CD              Spoke via TGW

Use AWS RAM (Resource Access Manager) to share TGW across accounts.
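
Sharing through RAM requires the Transit Gateway's ARN. The sketch below builds that ARN and prints, rather than runs, a matching create-resource-share command so it can be reviewed first. The account IDs, region, share name, and function name are placeholders chosen for illustration.

```shell
# Build the ARN that identifies a Transit Gateway.
# Usage: tgw_arn REGION ACCOUNT_ID TGW_ID
tgw_arn() {
    echo "arn:aws:ec2:$1:$2:transit-gateway/$3"
}

# Placeholder values: 111122223333 is the network account, 444455556666 is a
# workload account, and tgw-xxx is the gateway ID placeholder used earlier.
ARN=$(tgw_arn us-east-1 111122223333 tgw-xxx)

# Dry run: print the command instead of executing it.
echo "aws ram create-resource-share --name tgw-share --resource-arns $ARN --principals 444455556666"
```

Once the share is in place, each workload account creates its own VPC attachment against the shared gateway.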

Real-World Scenario: Global E-Commerce Platform

Component       Configuration
--------------  -------------------------------------------------------------
Route 53        Latency-based routing to us-east-1, eu-west-1, ap-southeast-1
CloudFront      Global CDN for static assets
ALB per region  Health-checked by Route 53
RDS per region  Read replicas; cross-region backup copy
TGW             Connects 8 VPCs across 3 accounts
Direct Connect  10 Gbps to on-premises ERP system
PrivateLink     Internal payment service accessed by all VPCs

Common Mistakes

  1. VPC peering mesh with 10+ VPCs: use Transit Gateway instead
  2. Single VPN tunnel: always configure redundant tunnels
  3. Overlapping CIDR blocks: plan IP addressing before multi-VPC design
  4. No Transit Gateway Flow Logs on TGW attachments: blind to cross-VPC traffic
  5. Public endpoints for internal services: use PrivateLink
  6. Ignoring DNS resolution across peering: enable DNS hostnames on both VPCs and DNS resolution on the peering connection
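
Mistake 3 can be caught before anything is deployed. The following is a minimal POSIX shell sketch of an IPv4 CIDR overlap check; the function names and the sample address plan are illustrative, and it assumes well-formed input without leading zeros in the octets.

```shell
# Convert a dotted-quad IPv4 address to an integer.
# The body runs in a subshell, so IFS and "set --" do not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print "START END" (inclusive integer range) for a block such as 10.1.0.0/16.
cidr_range() (
    ip=${1%/*}
    prefix=${1#*/}
    base=$(ip_to_int "$ip")
    size=$(( 1 << (32 - $prefix) ))
    start=$(( $base / $size * $size ))
    echo "$start $(( $start + $size - 1 ))"
)

# Succeed (exit 0) when the two CIDR blocks share at least one address.
cidrs_overlap() (
    set -- $(cidr_range "$1") $(cidr_range "$2")
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# Report on one pair of blocks.
check_pair() {
    if cidrs_overlap "$1" "$2"; then
        echo "$1 and $2 overlap"
    else
        echo "$1 and $2 are disjoint"
    fi
}

# Hypothetical address plan: the second pair collides.
check_pair 10.0.0.0/16 10.1.0.0/16
check_pair 10.0.0.0/16 10.0.128.0/20
```

Running a check like this over every pair of planned VPC ranges, including on-premises ranges, avoids finding the conflict only when a route fails to install.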

Troubleshooting

Issue                           Check                                      Fix
------------------------------  -----------------------------------------  ----------------------------------------------------
Cross-VPC traffic blocked       TGW route tables, VPC route tables, SGs    Verify routes propagate; check SG allows peer CIDR
VPN tunnel down                 VPN connection status, on-premises device  Check IKE/IPsec config; verify CGW IP hasn’t changed
PrivateLink connection pending  Endpoint service acceptance                Accept connection request on provider side
Route 53 failover not working   Health check status                        Verify health check endpoint returns 200
High Direct Connect costs       Data transfer direction                    Use DX for ingress; CloudFront for egress

Best Practices

  • Use Transit Gateway as the hub for 5+ VPCs
  • Plan non-overlapping CIDR blocks (/16 per VPC) before deployment
  • Implement defense in depth: NACL → SG → Network Firewall → WAF
  • Use PrivateLink for service-to-service communication
  • Configure Route 53 health checks with failover for DR
  • Enable VPC Flow Logs on all VPCs and Transit Gateway Flow Logs on TGW attachments
  • Document network topology with diagrams updated quarterly
  • Test VPN failover and Direct Connect failover regularly

Next: Disaster Recovery.