Advanced Networking
Basic VPC design covers subnets and security groups. Production environments require advanced networking: connecting multiple VPCs, hybrid cloud connectivity, private service access, and intelligent DNS routing. This guide covers the patterns AWS architects use at scale.
Network Architecture Evolution
Stage 1: Single VPC → Dev/test
Stage 2: Multi-VPC Peering → 2-4 VPCs (mesh complexity grows)
Stage 3: Transit Gateway → Hub-and-spoke, 5+ VPCs
Stage 4: Multi-Account + TGW → Enterprise, team isolation
Stage 5: Hybrid Cloud → Direct Connect + VPN to on-premises
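The jump from Stage 2 to Stage 3 comes down to simple arithmetic: a full mesh needs one peering connection per pair of VPCs, while a hub needs one attachment per VPC. The sketch below only illustrates that growth; the function names are made up for this example.

```bash
# Full-mesh peering needs one connection per VPC pair: n * (n - 1) / 2.
mesh_links() {
  echo $(( $1 * ($1 - 1) / 2 ))
}

# With a Transit Gateway, each VPC needs a single attachment.
tgw_attachments() {
  echo "$1"
}

for n in 2 4 5 10 20; do
  printf '%2d VPCs: %3d peering connections vs %2d TGW attachments\n' \
    "$n" "$(mesh_links "$n")" "$(tgw_attachments "$n")"
done
```

At 4 VPCs the mesh needs 6 connections, which is still manageable; at 10 it needs 45, each with its own route table entries on both sides.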
AWS Transit Gateway
Central hub connecting VPCs, VPNs, and Direct Connect:
# Create Transit Gateway
aws ec2 create-transit-gateway \
--description "Production hub" \
--options DefaultRouteTableAssociation=enable,DefaultRouteTablePropagation=enable
# Attach VPC
aws ec2 create-transit-gateway-vpc-attachment \
--transit-gateway-id tgw-xxx \
--vpc-id vpc-production \
--subnet-ids subnet-private-1a subnet-private-1b
# Attach another VPC (shared services)
aws ec2 create-transit-gateway-vpc-attachment \
--transit-gateway-id tgw-xxx \
--vpc-id vpc-shared-services \
--subnet-ids subnet-private-1a subnet-private-1b
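# Static TGW route: send traffic for shared services (10.1.0.0/16) to its attachment
# (redundant when route propagation is enabled, since attachment CIDRs propagate automatically)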
aws ec2 create-transit-gateway-route \
--transit-gateway-route-table-id tgw-rtb-xxx \
--destination-cidr-block 10.1.0.0/16 \
--transit-gateway-attachment-id tgw-attach-shared
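Transit Gateway routes only cover the hub side. Each VPC's own subnet route tables also need a route that points the remote CIDR at the Transit Gateway. A minimal sketch of that step follows; the route table ID `rtb-production-private` and the `run` dry-run helper are assumptions for illustration, and by default the script only prints the command.

```bash
# Dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Add a route for a remote CIDR via the Transit Gateway to one VPC route table.
# $1 = VPC route table ID, $2 = destination CIDR, $3 = Transit Gateway ID
add_tgw_route() {
  run aws ec2 create-route \
    --route-table-id "$1" \
    --destination-cidr-block "$2" \
    --transit-gateway-id "$3"
}

# Production VPC: reach shared services (10.1.0.0/16) through the hub.
# rtb-production-private is a hypothetical route table ID.
add_tgw_route rtb-production-private 10.1.0.0/16 tgw-xxx
```

Run it with `DRY_RUN=0` to execute the call. The shared services VPC needs the mirror-image route back to the production CIDR.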
Transit Gateway vs VPC Peering
| Feature | VPC Peering | Transit Gateway |
|---|---|---|
| Topology | Full mesh | Hub-and-spoke |
| Transitive routing | No | Yes |
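| Max connections | 50 active peerings per VPC by default (quota can be raised to 125; manageable to ~10) | Up to 5,000 attachments per gateway |
| Cost | No hourly charge (data transfer only) | ~$0.05 per attachment-hour + $0.02/GB processed (varies by region) |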
| Cross-region | Separate peering per pair | Inter-region peering |
| Best for | 2-3 VPCs | 5+ VPCs, enterprise |
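To see what the cost row means in practice, the rates in the table can be turned into a rough monthly estimate. This is a back-of-the-envelope sketch: it assumes 730 hours per month, uses the table's $0.05 and $0.02 figures as-is, and ignores regional price differences and ordinary data transfer charges.

```bash
# Rough monthly Transit Gateway cost in USD.
# $1 = number of attachments, $2 = GB processed per month
tgw_monthly_cost() {
  awk -v a="$1" -v gb="$2" \
    'BEGIN { printf "%.2f\n", a * 0.05 * 730 + gb * 0.02 }'
}

# Example: 5 VPC attachments processing 1,000 GB per month.
echo "Estimated monthly cost: \$$(tgw_monthly_cost 5 1000)"
```

Most of the bill at low traffic volumes is the per-attachment hourly charge, which is why peering stays attractive for two or three VPCs.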
Site-to-Site VPN
Connect on-premises data center to AWS:
# Create Virtual Private Gateway
aws ec2 create-vpn-gateway --type ipsec.1
aws ec2 attach-vpn-gateway --vpn-gateway-id vgw-xxx --vpc-id vpc-production
# Create Customer Gateway (your on-premises VPN device)
aws ec2 create-customer-gateway \
--type ipsec.1 \
--public-ip 203.0.113.10 \
--bgp-asn 65000
# Create VPN Connection
aws ec2 create-vpn-connection \
--type ipsec.1 \
--customer-gateway-id cgw-xxx \
--vpn-gateway-id vgw-xxx \
--options StaticRoutesOnly=true
# Print the configuration for your VPN device (returned as XML)
aws ec2 describe-vpn-connections --vpn-connection-ids vpn-xxx \
--query 'VpnConnections[0].CustomerGatewayConfiguration' --output text
Best practice: Each Site-to-Site VPN connection includes two tunnels; configure both on your device for redundancy. Each tunnel tops out at ~1.25 Gbps, so use Direct Connect for higher bandwidth.
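Tunnel redundancy is easy to verify from the tunnel telemetry that `describe-vpn-connections` returns. The sketch below counts tunnels reporting `UP`; the helper names are illustrative, and the sample input stands in for live CLI output so the script runs without AWS credentials.

```bash
# Count tunnels reporting UP. Reads whitespace-separated statuses on stdin.
count_up_tunnels() {
  tr -s ' \t' '\n' | grep -c '^UP$'
}

# Print a warning when fewer than two tunnels are up.
check_redundancy() {
  up=$(count_up_tunnels)
  if [ "$up" -lt 2 ]; then
    echo "WARNING: only $up tunnel(s) up, no redundancy"
  else
    echo "OK: $up tunnels up"
  fi
}

# Sample input standing in for live output:
printf 'UP\tDOWN\n' | check_redundancy

# Against a real connection (requires AWS credentials):
#   aws ec2 describe-vpn-connections --vpn-connection-ids vpn-xxx \
#     --query 'VpnConnections[0].VgwTelemetry[].Status' --output text \
#     | check_redundancy
```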
AWS Direct Connect
Dedicated network connection from on-premises to AWS. Dedicated ports come in 1, 10, and 100 Gbps; hosted connections through partners start at 50 Mbps:
| Aspect | VPN | Direct Connect |
|---|---|---|
| Bandwidth | Up to ~1.25 Gbps | 1 Gbps – 100 Gbps |
| Latency | Variable (internet) | Consistent, lower |
| Cost | Low (VPN endpoint only) | Port hours + data transfer |
| Setup time | Hours | Weeks (physical install) |
| Encryption | IPsec | None by default; add MACsec or VPN over DX |
Use Direct Connect for steady high-bandwidth workloads (database replication, bulk data transfer). Combine with VPN as backup.
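A quick way to judge whether a workload needs Direct Connect is to estimate how long a bulk transfer takes at each link speed. The following sketch assumes decimal units (1 TB = 8,000 gigabits) and a fully utilized link with no protocol overhead, so real transfers will take longer.

```bash
# Hours needed to move a data set over a link at full line rate.
# $1 = data size in TB, $2 = link speed in Gbps
transfer_hours() {
  awk -v tb="$1" -v gbps="$2" \
    'BEGIN { printf "%.2f\n", tb * 8000 / gbps / 3600 }'
}

echo "10 TB over one VPN tunnel (1.25 Gbps): $(transfer_hours 10 1.25) h"
echo "10 TB over Direct Connect (10 Gbps):   $(transfer_hours 10 10) h"
```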
AWS PrivateLink
Access AWS services and third-party services without traversing the public internet:
# Create VPC Endpoint Service (provider side)
aws ec2 create-vpc-endpoint-service-configuration \
--network-load-balancer-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/my-nlb/xxx \
--acceptance-required
# Create Interface Endpoint (consumer side)
aws ec2 create-vpc-endpoint \
--vpc-id vpc-consumer \
--service-name com.amazonaws.vpce.us-east-1.vpce-svc-xxx \
--vpc-endpoint-type Interface \
--subnet-ids subnet-private-1a subnet-private-1b \
--security-group-ids sg-endpoints
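When the endpoint service requires acceptance, the consumer's endpoint stays in a pending state until the provider approves it. A sketch of that provider-side step follows; the endpoint ID `vpce-consumer-xxx` and the `run` dry-run helper are assumptions for illustration, and by default the script only prints the commands.

```bash
# Dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List endpoint connections waiting for approval on a service.
# $1 = endpoint service ID
list_pending() {
  run aws ec2 describe-vpc-endpoint-connections \
    --filters "Name=service-id,Values=$1" \
              "Name=vpc-endpoint-state,Values=pendingAcceptance"
}

# Approve one consumer endpoint.
# $1 = endpoint service ID, $2 = consumer endpoint ID
accept_endpoint() {
  run aws ec2 accept-vpc-endpoint-connections \
    --service-id "$1" \
    --vpc-endpoint-ids "$2"
}

list_pending vpce-svc-xxx
accept_endpoint vpce-svc-xxx vpce-consumer-xxx   # hypothetical endpoint ID
```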
PrivateLink Use Cases
| Scenario | Benefit |
|---|---|
| SaaS provider exposes API to customers | No public internet exposure |
| Cross-account service access | No VPC peering needed |
| Access AWS services from on-premises | Over Direct Connect, not internet |
| Compliance (PCI, HIPAA) | Traffic never leaves AWS network |
Route 53 Advanced Routing
Routing Policies
| Policy | Behavior | Use Case |
|---|---|---|
| Simple | Single resource | Basic DNS |
| Weighted | Split traffic by weight | A/B testing, gradual migration |
| Latency | Route to lowest-latency region | Global applications |
| Failover | Active-passive DR | Disaster recovery |
| Geolocation | Route by user location | Content localization |
| Geoproximity | Route by distance to resources, adjustable with a bias | Shift traffic between regions |
| Multi-value | Return multiple healthy records | Simple load distribution |
# Weighted routing: 90% to current, 10% to new version
aws route53 change-resource-record-sets \
--hosted-zone-id Z1234567890 \
--change-batch '{
"Changes": [
{
"Action": "UPSERT",
"ResourceRecordSet": {
"Name": "api.example.com",
"Type": "A",
"SetIdentifier": "current",
"Weight": 90,
"AliasTarget": {
"HostedZoneId": "Z35SXDOTRQ7X7K",
"DNSName": "current-alb-xxx.us-east-1.elb.amazonaws.com",
"EvaluateTargetHealth": true
}
}
},
{
"Action": "UPSERT",
"ResourceRecordSet": {
"Name": "api.example.com",
"Type": "A",
"SetIdentifier": "canary",
"Weight": 10,
"AliasTarget": {
"HostedZoneId": "Z35SXDOTRQ7X7K",
"DNSName": "canary-alb-xxx.us-east-1.elb.amazonaws.com",
"EvaluateTargetHealth": true
}
}
}
]
}'
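Route 53 weights are relative, not percentages: each record receives its weight divided by the sum of all weights in the set. The helper below is an illustrative sketch of that calculation using integer math, so results are truncated to whole percent.

```bash
# Percentage of traffic a weighted record receives.
# $1 = weight of the record in question; remaining args = every weight in the set
weight_share() (
  w=$1; shift
  total=0
  for x in "$@"; do total=$(( total + x )); done
  [ "$total" -gt 0 ] || { echo "total weight must be > 0" >&2; exit 1; }
  echo $(( w * 100 / total ))
)

echo "current: $(weight_share 90 90 10)%"
echo "canary:  $(weight_share 10 90 10)%"
```

Because only the ratio matters, weights of 9 and 1 split traffic the same way as 90 and 10. Larger numbers simply leave more room for small adjustments during a gradual rollout.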
Health Checks for Failover
aws route53 create-health-check \
--caller-reference "api-health-$(date +%s)" \
--health-check-config '{
"IPAddress": "203.0.113.50",
"Port": 443,
"Type": "HTTPS",
"ResourcePath": "/health",
"RequestInterval": 30,
"FailureThreshold": 3
}'
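A health check only influences routing once a record references it. The sketch below generates a change batch with a PRIMARY record tied to a health check and a SECONDARY record that takes over when the check fails. The secondary IP `203.0.113.60`, the 60-second TTL, and the `hc-xxx` health check ID are placeholders chosen for illustration.

```bash
# Print a Route 53 change batch with PRIMARY and SECONDARY failover records.
# $1 = record name, $2 = primary IP, $3 = secondary IP, $4 = health check ID
failover_batch() {
  cat <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1", "Type": "A", "TTL": 60,
        "SetIdentifier": "primary", "Failover": "PRIMARY",
        "HealthCheckId": "$4",
        "ResourceRecords": [{"Value": "$2"}]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1", "Type": "A", "TTL": 60,
        "SetIdentifier": "secondary", "Failover": "SECONDARY",
        "ResourceRecords": [{"Value": "$3"}]
      }
    }
  ]
}
EOF
}

failover_batch api.example.com 203.0.113.50 203.0.113.60 hc-xxx

# To apply it (requires AWS credentials and a real health check ID):
#   aws route53 change-resource-record-sets --hosted-zone-id Z1234567890 \
#     --change-batch "$(failover_batch api.example.com 203.0.113.50 203.0.113.60 hc-xxx)"
```

With a 30-second interval and a failure threshold of 3, expect roughly 90 seconds before the primary is marked unhealthy, plus however long resolvers cache the old answer.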
AWS Network Firewall
Managed firewall for VPC traffic inspection:
aws network-firewall create-firewall \
--firewall-name production-firewall \
--vpc-id vpc-production \
--subnet-mappings SubnetId=subnet-firewall-1a \
--firewall-policy-arn arn:aws:network-firewall:us-east-1:123:firewall-policy/prod-policy
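Deploy in dedicated firewall subnets, one per Availability Zone. Route traffic through the firewall endpoint in both directions: IGW → Firewall → Protected subnets inbound, and the reverse path outbound.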
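The routing is done with ordinary VPC routes whose target is the firewall's VPC endpoint. The sketch below shows the two routes for one Availability Zone. The route table IDs, the `10.0.1.0/24` protected subnet, and the endpoint ID are hypothetical, and the `run` helper only prints the commands unless `DRY_RUN=0` is set.

```bash
# Dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Point a destination CIDR at the firewall endpoint in one route table.
# $1 = route table ID, $2 = destination CIDR, $3 = firewall endpoint ID (vpce-...)
route_via_firewall() {
  run aws ec2 create-route \
    --route-table-id "$1" \
    --destination-cidr-block "$2" \
    --vpc-endpoint-id "$3"
}

# The endpoint ID can be read from the firewall status, for example:
#   aws network-firewall describe-firewall --firewall-name production-firewall \
#     --query 'FirewallStatus.SyncStates.*.Attachment.EndpointId' --output text

# Inbound: the route table associated with the IGW sends traffic for the
# protected subnet to the firewall endpoint.
route_via_firewall rtb-igw-edge 10.0.1.0/24 vpce-firewall-1a

# Outbound: the protected subnet's default route goes to the same endpoint.
route_via_firewall rtb-protected-1a 0.0.0.0/0 vpce-firewall-1a
```

The firewall subnet itself keeps a default route to the internet gateway, so inspected traffic can continue on its way.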
Multi-Account Network Architecture
                AWS Organization
                       │
        ┌──────────────┼──────────────┐
        │              │              │
    Account:       Account:       Account:
    Network        Production     Development
    (TGW hub)      (workloads)    (workloads)
        │              │              │
        └────── Transit Gateway ──────┘
                       │
                Direct Connect
                       │
                On-Premises DC
| Account | Purpose | Network |
|---|---|---|
| Network | TGW, Direct Connect, shared VPN | Hub |
| Production | Application workloads | Spoke via TGW |
| Development | Dev/test workloads | Spoke via TGW |
| Shared Services | DNS, logging, CI/CD | Spoke via TGW |
Use AWS RAM (Resource Access Manager) to share TGW across accounts.
Real-World Scenario: Global E-Commerce Platform
| Component | Configuration |
|---|---|
| Route 53 | Latency-based routing to us-east-1, eu-west-1, ap-southeast-1 |
| CloudFront | Global CDN for static assets |
| ALB per region | Health-checked by Route 53 |
| RDS per region | Read replicas; cross-region backup copy |
| TGW | Connects 8 VPCs across 3 accounts |
| Direct Connect | 10 Gbps to on-premises ERP system |
| PrivateLink | Internal payment service accessed by all VPCs |
Common Mistakes
- VPC peering mesh with 10+ VPCs: use Transit Gateway instead
- Single VPN tunnel: always configure redundant tunnels
- Overlapping CIDR blocks: plan IP addressing before multi-VPC design
- No VPC Flow Logs on TGW attachments: you are blind to cross-VPC traffic
- Public endpoints for internal services: use PrivateLink
- Ignoring DNS resolution across peering: enable DNS hostnames on both VPCs and DNS resolution on the peering connection
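Overlapping CIDR blocks are the one mistake on this list that can be caught mechanically before anything is deployed. The following is a minimal sketch of an IPv4 overlap check in plain shell arithmetic; the function names are illustrative, and it assumes well-formed input with no leading zeros in the octets.

```bash
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print "first last" integer addresses for a CIDR such as 10.0.0.0/16.
cidr_range() (
  ip=${1%/*}
  prefix=${1#*/}
  size=$(( 1 << (32 - prefix) ))
  # Align the start to the network boundary in case host bits are set.
  start=$(( $(ip_to_int "$ip") / size * size ))
  echo "$start $(( start + size - 1 ))"
)

# Succeed (exit 0) when the two CIDR blocks share at least one address.
cidrs_overlap() (
  set -- $(cidr_range "$1") $(cidr_range "$2")
  # $1/$2 = first/last of block A, $3/$4 = first/last of block B
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

for pair in "10.0.0.0/16 10.1.0.0/16" "10.0.0.0/16 10.0.128.0/17"; do
  set -- $pair
  if cidrs_overlap "$1" "$2"; then
    echo "$1 and $2 overlap"
  else
    echo "$1 and $2 are disjoint"
  fi
done
```

Running a check like this over every planned VPC and on-premises range before creating attachments is much cheaper than renumbering a live VPC later.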
Troubleshooting
| Issue | Check | Fix |
|---|---|---|
| Cross-VPC traffic blocked | TGW route tables, VPC route tables, SGs | Verify routes propagate; check SG allows peer CIDR |
| VPN tunnel down | VPN connection status, on-premises device | Check IKE/IPsec config; verify CGW IP hasn’t changed |
| PrivateLink connection pending | Endpoint service acceptance | Accept connection request on provider side |
| Route 53 failover not working | Health check status | Verify health check endpoint returns a 2xx or 3xx status |
| High Direct Connect costs | Data transfer direction | Use DX for ingress; CloudFront for egress |
Best Practices
- Use Transit Gateway as the hub for 5+ VPCs
- Plan non-overlapping CIDR blocks (/16 per VPC) before deployment
- Implement defense in depth: NACL → SG → Network Firewall → WAF
- Use PrivateLink for service-to-service communication
- Configure Route 53 health checks with failover for DR
- Enable VPC Flow Logs on all VPCs and TGW attachments
- Document network topology with diagrams updated quarterly
- Test VPN failover and Direct Connect failover regularly
Next: Disaster Recovery.