Well-Architected Framework
The AWS Well-Architected Framework provides architectural best practices across six pillars. Use it to evaluate designs, identify risks, and build production systems that scale securely and cost-effectively. AWS offers free Well-Architected Reviews with a Solutions Architect for production workloads.
The Six Pillars
| Pillar | Focus | Key Question |
| --- | --- | --- |
| Operational Excellence | Run and monitor systems | Can you deploy, respond to incidents, and improve? |
| Security | Protect data and systems | Is everything encrypted, least-privilege, and audited? |
| Reliability | Recover from failures | Does the system meet SLAs across AZ/region failures? |
| Performance Efficiency | Use resources efficiently | Right-sized compute, caching, and async processing? |
| Cost Optimization | Avoid unnecessary spend | Reserved capacity, right-sizing, lifecycle policies? |
| Sustainability | Minimize environmental impact | Efficient resources, serverless, Graviton instances? |
Operational Excellence
Design Principles
Operations as code — CloudFormation, CDK, Terraform for reproducible infrastructure
Automate changes — CI/CD pipelines, no manual console changes in production
Anticipate failure — Game days, chaos engineering, runbooks
Learn from events — Blameless postmortems, update runbooks
Checklist
# Infrastructure as Code example (CloudFormation snippet)
# A deployable group also needs a launch template and subnets (VPCZoneIdentifier).
Resources:
  WebServerASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: 2
      MaxSize: 10
      HealthCheckType: ELB
      Tags:
        - Key: Environment
          Value: production
          PropagateAtLaunch: true
Security Pillar
Defense in Depth
Internet → WAF → ALB → Security Groups → Private Subnets → Encryption
↓
CloudTrail + GuardDuty + Config
| Layer | AWS Service |
| --- | --- |
| Identity | IAM, IAM Identity Center (formerly AWS SSO), MFA |
| Network | VPC, SG, NACL, WAF, Shield |
| Data at rest | KMS, S3 encryption, RDS encryption |
| Data in transit | TLS 1.3, ACM certificates |
| Detection | GuardDuty, Security Hub, Macie |
| Audit | CloudTrail, Config, Access Analyzer |
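Least privilege in the Identity layer is easier to keep when policies are checked mechanically. The following is a minimal sketch, assuming policies follow the standard IAM JSON layout (`Statement`, `Effect`, `Action`, `Resource`). The function name `find_wildcard_statements` and the sample policy are hypothetical, and the check ignores `NotAction`, conditions, and other grammar that a real review would need to cover.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """IAM allows a single value or a list for Statement, Action, and Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements that grant a whole service or apply to every resource."""
    flagged = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        # "*" or "service:*" usually grants more than a workload needs.
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = "*" in resources
        if broad_action or broad_resource:
            flagged.append(stmt)
    return flagged


# Hypothetical policy used only for illustration.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "ReadOneBucket", "Effect": "Allow",
         "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
}

flagged_sids = [s["Sid"] for s in find_wildcard_statements(example_policy)]
print(flagged_sids)
```

Some actions do not support resource-level permissions and legitimately require `"Resource": "*"`, so flagged statements are candidates for review rather than automatic failures.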
Security Checklist
Reliability Pillar
High Availability Patterns
| Pattern | Implementation | RTO | RPO |
| --- | --- | --- | --- |
| Multi-AZ | RDS Multi-AZ, ASG across AZs | Minutes | Zero |
| Multi-Region | Route 53 failover, S3 CRR | Minutes to hours | Minutes |
| Backup & Restore | RDS snapshots, S3 versioning | Hours | Hours |
| Pilot Light | Minimal DR region, scale on failover | 10-30 min | Minutes |
| Warm Standby | Reduced capacity in DR region | Minutes | Minutes |
| Active-Active | Full capacity in multiple regions | Near-zero | Near-zero |
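The benefit of the Multi-AZ and Active-Active rows comes from redundancy, and the arithmetic behind it is short enough to sketch. The example below assumes replicas fail independently, which real Availability Zones only approximate. The 99% figure is illustrative and is not an AWS SLA value.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant replicas when any one can serve traffic.

    Assumes replicas fail independently of each other.
    """
    return 1 - (1 - single) ** copies


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for a in components:
        result *= a
    return result


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes in a 365-day year."""
    return (1 - availability) * MINUTES_PER_YEAR


# Illustrative numbers only.
one_az = 0.99
two_az = parallel_availability(one_az, 2)
print(f"single AZ: {downtime_minutes_per_year(one_az):.0f} min/year")
print(f"two AZs:   {downtime_minutes_per_year(two_az):.0f} min/year")
```

`serial_availability` shows the opposite effect: every hard dependency placed in the request path multiplies availability down, which is one reason to keep critical paths short.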
Reliability Checklist
Right-Sizing and Selection
| Workload | Recommended Service |
| --- | --- |
| Static website | S3 + CloudFront |
| REST API (variable traffic) | Lambda + API Gateway |
| Containerized microservices | ECS Fargate or EKS |
| Long-running batch | EC2 Spot Instances |
| Real-time analytics | Kinesis + Lambda |
| Caching layer | ElastiCache (Redis) |
# Use Compute Optimizer for right-sizing recommendations
aws compute-optimizer get-ec2-instance-recommendations \
  --account-ids 123456789012
Cost Optimization
See the dedicated Cost Optimization page. Key principles:
Right-size — don’t over-provision; use Compute Optimizer
Reserved capacity — Savings Plans or Reserved Instances for steady workloads
Spot Instances — up to 90% savings for fault-tolerant workloads
Lifecycle policies — S3 IA/Glacier for infrequent data
Tag everything — cost allocation by team/project/environment
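Cost allocation by tag only works if every resource carries the tags. A rough sketch of a tag audit follows, assuming the three keys named in the last principle are spelled `team`, `project`, and `environment`. Those spellings, the function names, and the sample inventory are hypothetical. AWS tag keys are case-sensitive, so the check matches them exactly.

```python
REQUIRED_TAGS = ("team", "project", "environment")  # assumed key spellings


def missing_tags(resource_tags: dict[str, str], required=REQUIRED_TAGS) -> list[str]:
    """Return required tag keys that are absent or have an empty value."""
    present = {key for key, value in resource_tags.items() if value.strip()}
    return [key for key in required if key not in present]


def audit(resources: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each non-compliant resource ID to the tag keys it lacks."""
    report = {}
    for resource_id, tags in resources.items():
        gaps = missing_tags(tags)
        if gaps:
            report[resource_id] = gaps
    return report


# Hypothetical inventory; in practice this would come from an export or API call.
inventory = {
    "i-0aaa": {"team": "payments", "project": "checkout", "environment": "production"},
    "i-0bbb": {"team": "payments", "environment": ""},
    "vol-0ccc": {"Team": "data"},  # wrong case, so it does not count as "team"
}

tag_report = audit(inventory)
print(tag_report)
```

Running a check like this in CI or on a schedule keeps untagged spend from accumulating unnoticed.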
Sustainability
Prefer Graviton (ARM) instances — better performance per watt
Use serverless (Lambda, Fargate) — no idle capacity
Apply S3 lifecycle policies — reduce stored data volume
Choose regions powered by renewable energy where possible
Right-size to avoid over-provisioned resources
Well-Architected Review Process
Define workload scope (e.g., “Production e-commerce API”)
Answer questions for each pillar in the AWS WA Tool
Identify high-risk issues (HRIs) — must fix before production
Create improvement plan with prioritized remediation
Re-review after 6-12 months or major architecture changes
| Pillar | Current State | HRI | Remediation |
| --- | --- | --- | --- |
| Security | No WAF on ALB | Yes | Attach AWS WAF with managed rules |
| Reliability | Single AZ RDS | Yes | Enable Multi-AZ |
| Cost | All On-Demand EC2 | No | Purchase Compute Savings Plan |
| Operations | Manual deploys | Yes | Implement CodePipeline CI/CD |
| Performance | No CDN | No | Add CloudFront for static assets |
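The prioritization step of the review can be sketched in a few lines. This example encodes the sample findings above and orders them so that high-risk issues come first. The `Finding` class and both function names are illustrative and are not part of the AWS WA Tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    pillar: str
    current_state: str
    high_risk: bool
    remediation: str


# Rows copied from the example review findings above.
findings = [
    Finding("Security", "No WAF on ALB", True, "Attach AWS WAF with managed rules"),
    Finding("Reliability", "Single AZ RDS", True, "Enable Multi-AZ"),
    Finding("Cost", "All On-Demand EC2", False, "Purchase Compute Savings Plan"),
    Finding("Operations", "Manual deploys", True, "Implement CodePipeline CI/CD"),
    Finding("Performance", "No CDN", False, "Add CloudFront for static assets"),
]


def improvement_plan(items: list[Finding]) -> list[Finding]:
    """Order findings so HRIs come first; sorted() is stable, so ties keep input order."""
    return sorted(items, key=lambda f: not f.high_risk)


def launch_blockers(items: list[Finding]) -> list[str]:
    """Pillars with an open HRI, which the process treats as must-fix before production."""
    return [f.pillar for f in items if f.high_risk]


for rank, f in enumerate(improvement_plan(findings), start=1):
    label = "HRI" if f.high_risk else "   "
    print(f"{rank}. [{label}] {f.pillar}: {f.remediation}")
```

A real plan would probably also weigh effort and blast radius; sorting on the HRI flag alone is the simplest ordering consistent with the process described here.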
Architecture Patterns Reference
| Pattern | Services | When to Use |
| --- | --- | --- |
| Three-tier web | ALB + EC2/ECS + RDS | Traditional web apps |
| Serverless API | API GW + Lambda + DynamoDB | Variable traffic, event-driven |
| Event-driven | EventBridge + SQS + Lambda | Async workflows, decoupling |
| Data lake | S3 + Glue + Athena | Analytics on structured/unstructured data |
| Microservices | ECS/EKS + ALB + per-service DB | Independent team deployment |
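The decoupling that the event-driven row describes can be illustrated without any AWS services. In this sketch an in-process `queue.Queue` stands in for SQS and a thread stands in for a Lambda consumer; both substitutions are assumptions made to keep the example self-contained, and the `publish` and `worker` names are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to exit


def worker(inbox: queue.Queue, processed: list) -> None:
    """Consume messages until the sentinel arrives, like a consumer polling a queue."""
    while True:
        message = inbox.get()
        if message is STOP:
            break
        processed.append(f"handled:{message}")


def publish(inbox: queue.Queue, order_id: str) -> None:
    """Enqueue and return immediately; the producer never waits on the consumer."""
    inbox.put(order_id)


inbox: queue.Queue = queue.Queue()
processed: list[str] = []
consumer = threading.Thread(target=worker, args=(inbox, processed))
consumer.start()

for order_id in ("order-1", "order-2", "order-3"):
    publish(inbox, order_id)

inbox.put(STOP)
consumer.join()
print(processed)
```

Unlike this in-memory queue, a standard SQS queue delivers at least once and does not guarantee ordering, so real consumers should be idempotent.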
Common Architectural Mistakes
Single AZ production — AZ failure = total outage
Monolith on oversized EC2 — can’t scale components independently
No caching layer — database becomes bottleneck
Synchronous everything — tight coupling causes cascading failures
Shared database across microservices — defeats service independence
No observability — can’t debug what you can’t see
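The "no caching layer" mistake is usually addressed with the cache-aside pattern: read from the cache, fall back to the database on a miss, then store the result. Below is a minimal sketch in which a dictionary stands in for a cache such as ElastiCache and `load_product` is a hypothetical stand-in for a database query.

```python
import time
from typing import Callable


class CacheAside:
    """Cache-aside lookup: check the cache, fall back to the loader, store the result."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # key -> (expiry, value)

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]              # fresh hit: the database is not touched
        value = self._loader(key)        # miss or expired: one database read
        self._entries[key] = (self._clock() + self._ttl, value)
        return value


db_reads = 0


def load_product(key: str) -> str:
    """Hypothetical stand-in for a database query; counts how often it is called."""
    global db_reads
    db_reads += 1
    return f"product-{key}"


cache = CacheAside(load_product, ttl_seconds=60)
for _ in range(100):
    cache.get("42")
print(db_reads)
```

The TTL bounds how stale a cached value can get. Writes still need an invalidation or update strategy, which this sketch leaves out.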
Best Practices Summary
Run a Well-Architected Review before every major launch
Automate everything — infrastructure, deployment, scaling, remediation
Design for failure — assume any component will fail
Apply least privilege at every layer
Measure with SLIs/SLOs — availability, latency, error rate
Document decisions — ADRs (Architecture Decision Records) for future teams
Review architecture quarterly as requirements evolve
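Measuring with SLIs and SLOs comes down to simple arithmetic. The sketch below computes an availability SLI from request counts and the error budget that an SLO implies. The 99.9% target, the 30-day window, and the traffic numbers are illustrative assumptions, not recommendations.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded; defined as 1.0 when there was no traffic."""
    if total_requests == 0:
        return 1.0
    return good_requests / total_requests


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime the SLO tolerates over the window, in minutes."""
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, good_requests: int, total_requests: int) -> float:
    """Share of the request-based error budget still unspent (negative if overspent)."""
    allowed_failures = (1 - slo) * total_requests
    if allowed_failures == 0:
        return 0.0
    actual_failures = total_requests - good_requests
    return 1 - actual_failures / allowed_failures


# Illustrative target and traffic.
slo = 0.999
print(f"{error_budget_minutes(slo):.1f} minutes of budget per 30 days")
print(f"{budget_remaining(slo, good_requests=999_600, total_requests=1_000_000):.0%} of budget left")
```

When the remaining budget approaches zero, a common response is to pause risky releases and spend the time on reliability work instead.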