Azure Well-Architected Framework

The Azure Well-Architected Framework provides design principles and best practices for building secure, reliable, and efficient cloud workloads. Microsoft organizes guidance around five pillars. Applying these pillars systematically produces systems that are resilient, secure, cost-effective, and maintainable at scale.

The Five Pillars

Pillar	Goal	Key Practices
Reliability	Recover from failures, meet SLAs	Redundancy, health checks, DR plans
Security	Protect data and systems	Identity, encryption, threat protection
Cost Optimization	Maximize value, minimize spend	Right-sizing, reserved capacity, tagging
Operational Excellence	Run and improve systems	Automation, observability, IaC
Performance Efficiency	Scale to meet demand	Auto-scale, caching, async patterns

Each pillar includes a set of design principles and review questions you should answer for every production workload.

Reliability Patterns

Deploy across Availability Zones for zone-level redundancy (App Service, VMs, SQL, Storage)
Use paired regions for geo-disaster recovery; most Azure regions have a designated pair, but some newer regions do not, so confirm the pairing before planning DR
Implement health probes on load balancers, Front Door, and App Service
Design for graceful degradation when dependencies fail (circuit breakers, fallbacks)
Test recovery with chaos engineering (Azure Chaos Studio) and documented DR drills
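The graceful-degradation item above can be illustrated with a minimal circuit breaker sketch. This is not an Azure API or a production library; the class name, thresholds, and the `fetch_recommendations` / `cached_bestsellers` functions are hypothetical and exist only to show the shape of the pattern.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after repeated failures, retries after a cooldown."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the dependency entirely until the cooldown has elapsed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()
            # Cooldown elapsed: allow one trial call (half-open state).
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def fetch_recommendations():
    # Hypothetical dependency that is currently failing.
    raise ConnectionError("recommendation service unavailable")


def cached_bestsellers():
    # Hypothetical degraded response served while the dependency is down.
    return ["bestseller-1", "bestseller-2"]


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)
for _ in range(3):
    print(breaker.call(fetch_recommendations, cached_bestsellers))
```

Every call returns the fallback list, but once the threshold is reached the breaker stops calling the failing dependency until the cooldown passes, which keeps a struggling service from being hammered with retries.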

Example multi-region architecture:

  Global
  └── Traffic Manager / Front Door (priority routing: primary first, secondary on failover)

  Primary Region (East US)
  ├── App Service (active, 3 instances)
  ├── Azure SQL (primary, zone-redundant)
  └── Storage (RA-GZRS, primary endpoint)

  Secondary Region (West US)
  ├── App Service (standby, 1 instance; scale out on failover)
  ├── Azure SQL (geo-replica in failover group)
  └── Storage (RA-GZRS, read-only secondary endpoint)

# Verify zone support in a region
az account list-locations --query "[?name=='eastus'].availabilityZoneMappings" -o json

# Create a zone-redundant App Service plan (zone redundancy needs at least 3 instances)
az appservice plan create \
  --name plan-webapp-prod-zr \
  --resource-group rg-webapp-prod \
  --location eastus \
  --sku P1v3 \
  --number-of-workers 3 \
  --zone-redundant
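To see why a second region improves reliability, it can help to work through the composite availability arithmetic. The sketch below assumes component failures are independent and uses made-up SLA figures; the numbers and function names are illustrative assumptions, not published Azure SLAs.

```python
def serial_availability(*components):
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


def redundant_availability(single, copies=2):
    """Availability when any one of `copies` independent deployments is enough."""
    return 1.0 - (1.0 - single) ** copies


# Hypothetical SLA figures for illustration only; check the current SLA documents.
APP, SQL, STORAGE, ROUTER = 0.9995, 0.9999, 0.999, 0.9999

# One region: the app, database, and storage must all be available.
one_region = serial_availability(APP, SQL, STORAGE)

# Two regions: either region can serve traffic, but the global router must be up.
two_regions = serial_availability(redundant_availability(one_region), ROUTER)

print(f"single region: {one_region:.4%}")
print(f"two regions behind global routing: {two_regions:.4%}")
```

The result is an upper bound rather than a promise: a standby region that has to scale out first adds failover time that this simple model ignores, and the global routing layer becomes the limiting factor.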

Security Layers

Identity: Entra ID, MFA, conditional access, Managed Identities — no long-lived credentials
Network: NSGs, private endpoints, Azure Firewall, Azure DDoS Protection (formerly DDoS Protection Standard)
Data: Encryption at rest (SSE/CMK) and in transit (TLS 1.2+), Key Vault for secrets
Application: WAF on Application Gateway or Front Door, secure coding, dependency scanning
Governance: Azure Policy, Defender for Cloud, Activity Log audit, RBAC least privilege
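A rough way to picture these layers is as a set of checks run against a resource inventory. The sketch below is a simplified stand-in for what Azure Policy and Defender for Cloud do; the `Resource` fields, resource names, and findings text are hypothetical and do not mirror any Azure API schema.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    public_network_access: bool
    min_tls_version: str          # for example "1.0", "1.2", "1.3"
    uses_managed_identity: bool


def audit(resource):
    """Return a list of findings for one resource; an empty list means it passed."""
    findings = []
    if resource.public_network_access:
        findings.append("public endpoint enabled; prefer a private endpoint")
    tls = tuple(int(part) for part in resource.min_tls_version.split("."))
    if tls < (1, 2):
        findings.append("minimum TLS version below 1.2")
    if not resource.uses_managed_identity:
        findings.append("no managed identity; likely relies on stored credentials")
    return findings


# Hypothetical inventory; real data would come from Azure Resource Graph or similar.
inventory = [
    Resource("sql-webapp-prod", public_network_access=True,
             min_tls_version="1.0", uses_managed_identity=False),
    Resource("stwebappprod", public_network_access=False,
             min_tls_version="1.2", uses_managed_identity=True),
]

for res in inventory:
    for finding in audit(res):
        print(f"{res.name}: {finding}")
```

In practice these rules are better enforced centrally with policy assignments, so that non-compliant resources are flagged or blocked at deployment time rather than discovered by a script.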

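# Enable the Defender for Servers plan (part of Defender for Cloud) on the current subscription
az security pricing create \
  --name VirtualMachines \
  --tier Standard

# Assign the built-in policy for secure transfer (HTTPS) on storage accounts
# Check the definition's effect parameter: with an Audit effect it reports
# non-compliant accounts rather than blocking them
az policy assignment create \
  --name require-https-storage \
  --policy /providers/Microsoft.Authorization/policyDefinitions/404c3081-a854-4457-ae30-26a93ef643f9 \
  --scope /subscriptions/SUB_ID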

Cost Optimization

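# Tag the resource group for cost allocation
# --is-incremental merges with existing tags; without it, the existing tags are replaced
# Tags on a resource group are not inherited by the resources inside it
az resource tag \
  --is-incremental \
  --tags environment=prod cost-center=engineering project=web-app \
  --ids /subscriptions/SUB_ID/resourceGroups/rg-webapp-prod

# Review Advisor cost recommendations
az advisor recommendation list --category Cost --query "[].{Name:shortDescription.problem, Impact:impact}" -o table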

Cost strategies:

Use Reserved Instances and Savings Plans for predictable baseline compute
Right-size VMs with Azure Advisor — many workloads run at < 20% CPU
Apply auto-shutdown for dev/test VMs and use Azure DevTest Labs
Choose appropriate storage tiers (Cool/Archive) and redundancy (LRS for dev)
Delete orphaned resources: unattached disks, unused IPs, old snapshots
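The right-sizing strategy above can be sketched as a simple filter over utilization data. The thresholds, VM names, and CPU samples below are hypothetical; Azure Advisor applies its own logic, and this only shows the idea of checking both average and peak usage before downsizing.

```python
from statistics import mean


def rightsizing_candidates(cpu_by_vm, avg_limit=20.0, peak_limit=60.0):
    """Return (vm, avg, peak) for VMs whose average and peak CPU % are both low."""
    candidates = []
    for vm, samples in cpu_by_vm.items():
        avg, peak = mean(samples), max(samples)
        if avg < avg_limit and peak < peak_limit:
            candidates.append((vm, round(avg, 1), peak))
    return candidates


# Hypothetical hourly CPU percentages; real data would come from Azure Monitor.
cpu_by_vm = {
    "vm-web-01": [12, 15, 9, 18, 22, 14],
    "vm-batch-01": [4, 5, 90, 6, 5, 4],      # low average, but bursty
    "vm-api-01": [55, 62, 70, 48, 66, 59],
}

for vm, avg, peak in rightsizing_candidates(cpu_by_vm):
    print(f"{vm}: avg {avg}% / peak {peak}% -> review for a smaller SKU")
```

Checking the peak as well as the average avoids downsizing a bursty workload such as the batch VM here, which looks idle on average but needs its capacity during runs.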

Operational Excellence

Deploy with Bicep or Terraform — no manual Portal changes in production
Use Azure DevOps or GitHub Actions for CI/CD with environment gates
Centralize logs in Log Analytics with structured KQL queries and workbooks
Document runbooks for common operational tasks (failover, scaling, certificate rotation)
Conduct post-incident reviews (PIRs) and track action items to completion
Review Infrastructure as Code changes in pull requests before they merge

# Preview the changes the Bicep deployment would make
az deployment group what-if \
  --resource-group rg-webapp-prod \
  --template-file main.bicep \
  --parameters @parameters.prod.json

# Apply the deployment once the preview looks correct
az deployment group create \
  --resource-group rg-webapp-prod \
  --template-file main.bicep \
  --parameters @parameters.prod.json
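A pipeline can turn the what-if preview into an environment gate by inspecting its JSON output (produced with `--no-pretty-print`). The sketch below assumes the output contains a `changes` array whose items carry `resourceId` and `changeType` fields; treat that shape, the sample data, and the function name as assumptions to verify against real output before relying on them.

```python
import json


def destructive_changes(what_if_json, blocked=("Delete",)):
    """Return resource IDs whose change type is in `blocked`."""
    result = json.loads(what_if_json)
    return [
        change["resourceId"]
        for change in result.get("changes", [])
        if change.get("changeType") in blocked
    ]


# Hypothetical, trimmed what-if output used only to exercise the function.
PREFIX = "/subscriptions/SUB_ID/resourceGroups/rg-webapp-prod/providers"
sample = json.dumps({
    "status": "Succeeded",
    "changes": [
        {"resourceId": f"{PREFIX}/Microsoft.Web/sites/app-webapp-prod",
         "changeType": "Modify"},
        {"resourceId": f"{PREFIX}/Microsoft.Cache/redis/redis-old",
         "changeType": "Delete"},
    ],
})

blocked = destructive_changes(sample)
if blocked:
    print("Manual approval required before deleting:", *blocked, sep="\n  ")
```

A CI job could exit with a non-zero status when the list is non-empty, forcing a human approval step before any resource is removed from production.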

Performance Efficiency

Pattern	Azure Service	Benefit
Caching	Azure Cache for Redis	Reduce database load, lower latency
CDN	Azure Front Door / CDN	Edge delivery of static assets
Async processing	Service Bus, Functions	Decouple heavy work from request path
Auto-scale	App Service, AKS HPA, VMSS	Match capacity to demand
Read replicas	Azure SQL geo-replicas	Offload read traffic
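The caching row can be made concrete with the cache-aside pattern: read from the cache first, fall back to the database on a miss, and store the result with a time-to-live. In this sketch an in-memory dictionary stands in for Azure Cache for Redis, and `load_product` is a hypothetical database call; none of this is the Redis client API.

```python
import time


class CacheAside:
    """Cache-aside reads: check the cache, fall back to the loader, store with a TTL."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self.loader = loader
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value); a dict stands in for Redis

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]                       # cache hit
        value = self.loader(key)                  # miss or expired: read the source
        self._store[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key):
        # Call after a write so the next read reloads fresh data.
        self._store.pop(key, None)


def load_product(product_id):
    # Hypothetical database read; imagine a slow SQL query here.
    print(f"loading {product_id} from the database")
    return {"id": product_id, "name": "Sample product"}


catalog = CacheAside(load_product, ttl_seconds=300)
catalog.get("sku-123")   # first read goes to the database
catalog.get("sku-123")   # second read is served from the cache
```

The TTL bounds how stale a cached product can be, and `invalidate` covers the case where a write must be visible immediately.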

Real-World Scenario: E-Commerce Platform Review

Pillar	Assessment	Action Items
Reliability	Single-region App Service	Add geo-replica SQL + Front Door failover
Security	Public SQL endpoint	Migrate to private endpoint + Managed Identity
Cost	Over-provisioned D8s_v5 VMs	Downsize to D4s_v5; purchase 1-year RI
Operations	Manual deployments	Implement Bicep + GitHub Actions pipeline
Performance	No caching layer	Add Redis for session and product catalog

Pillar Trade-offs

Decision	Improves	May Impact
Multi-region deployment	Reliability	Cost, complexity
Private endpoints everywhere	Security	Operational complexity, DNS management
Reserved capacity	Cost	Flexibility
Comprehensive monitoring	Operations	Log ingestion costs
Premium SKUs	Performance	Cost

Common Mistakes

Optimizing one pillar in isolation — cheaper but unreliable is not a win
No architecture review before launch — technical debt accumulates fast
Ignoring the shared responsibility model — Azure secures the platform; you secure your data and access
Copying on-premises architecture 1:1 — cloud-native patterns reduce cost and improve resilience
Skipping DR testing — backups exist but restore procedures are untested
No tagging or governance — cost and security sprawl across subscriptions

Troubleshooting Design Issues

Symptom	Likely Pillar	Investigation
Frequent outages	Reliability	Check redundancy, health probes, dependency chains
Security audit failures	Security	Review RBAC, public endpoints, encryption settings
Budget overruns	Cost	Cost analysis by tag; Advisor recommendations
Slow incident response	Operations	Verify monitoring coverage, runbook availability
High latency under load	Performance	Profile bottlenecks; check scaling rules and caching

Best Practices

Run the Azure Well-Architected Review assessment for each major workload
Revisit architecture quarterly or when requirements change significantly
Document architecture decision records (ADRs) for significant design choices
Use Azure Architecture Center reference architectures as starting points
Balance pillars based on business priorities — not every workload needs multi-region
Include FinOps, security, and ops stakeholders in architecture reviews

Next: Cost Management.
