Cost Management
Cloud costs can grow quickly without governance. Azure provides tools and practices to forecast, monitor, and optimize spending across subscriptions, management groups, and teams. Cost optimization is an ongoing discipline — not a one-time cleanup.
Cost Management Tools
| Tool | Purpose |
|---|---|
| Cost Management + Billing | Analyze spend, export data, create budgets |
| Azure Pricing Calculator | Estimate costs before deployment |
| Azure Advisor | Right-sizing and reserved instance recommendations |
| Tags | Allocate costs to teams, projects, environments |
| Azure Policy | Enforce tagging and SKU restrictions |
| FinOps hubs (FinOps toolkit) | Cost data pipeline and Power BI reports for organization-wide analysis |
Analyze Current Spend
# List usage and pre-tax cost per resource for January 2025
# (meter names are only returned with --include-meter-details)
az consumption usage list \
--start-date 2025-01-01 \
--end-date 2025-01-31 \
--include-meter-details \
--query "[].{Resource:instanceName, Cost:pretaxCost, Meter:meterDetails.meterName}" \
--output table
# Export cost data to storage account (daily CSV)
az costmanagement export create \
--name daily-cost-export \
--scope "subscriptions/SUB_ID" \
--storage-account-id /subscriptions/SUB_ID/resourceGroups/rg-ops/providers/Microsoft.Storage/storageAccounts/stcostexport \
--storage-container cost-exports \
--timeframe MonthToDate \
--recurrence Daily \
--recurrence-period from=2025-01-01 to=2026-12-31
In the Portal: Cost Management + Billing → Cost analysis → group by resource group, service, tag, or meter. Save views for recurring reviews.
Tagging Strategy
Consistent tags enable cost allocation and chargeback:
| Tag | Example Values | Purpose |
|---|---|---|
| environment | dev, staging, prod | Separate environment costs |
| cost-center | engineering, marketing | Department billing |
| project | web-app, data-pipeline | Project-level tracking |
| owner | team-platform | Accountability |
| auto-shutdown | true/false | Identify schedulable resources |
Enforce tags with Azure Policy:
# Assign built-in policy: require 'environment' tag
az policy assignment create \
--name require-environment-tag \
--policy /providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-945faa0f723c \
--scope /subscriptions/SUB_ID \
--params '{"tagName": {"value": "environment"}}'
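Custom deny policy rule for resources missing the environment tag (define the policy with mode Indexed so that resource types that do not support tags are skipped):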
{
"if": {
"field": "tags['environment']",
"exists": false
},
"then": {
"effect": "deny"
}
}
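A deny effect only blocks create and update requests, so resources that existed before the assignment stay untagged until someone fixes them. The sketch below builds a JMESPath filter that lists resources missing a given tag. The helper name missing_tag_query is hypothetical, and the ==null comparison mirrors the query style used in the troubleshooting commands of this guide.

```shell
# Build a JMESPath filter matching resources that lack the given tag.
# The tag name is quoted so that names containing hyphens (cost-center) parse.
missing_tag_query() {
  printf '[?tags."%s"==null].{Name:name, RG:resourceGroup, Type:type}' "$1"
}

# Show the generated filter for the environment tag.
missing_tag_query environment
echo

# Usage (requires an authenticated Azure CLI session):
#   az resource list --query "$(missing_tag_query cost-center)" -o table
```

Running the az command once per mandatory tag gives a backlog of resources to fix before the deny policy starts rejecting updates to them.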
Budgets and Alerts
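# Create a monthly budget for the production resource group
# (as written, this command defines no alert thresholds; configure notifications separately, for example in the portal)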
az consumption budget create \
--budget-name monthly-prod-budget \
--amount 10000 \
--category cost \
--time-grain Monthly \
--start-date 2025-01-01 \
--end-date 2026-12-31 \
--resource-group rg-webapp-prod
# Create a smaller budget scoped to a dev resource group
az consumption budget create \
--budget-name rg-dev-monthly \
--amount 500 \
--category cost \
--time-grain Monthly \
--start-date 2025-01-01 \
--end-date 2026-12-31 \
--resource-group rg-learning-dev
Set alerts at 50%, 80%, and 100% of budget to catch overspend early. Assign budget owners who can act on alerts.
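To make the thresholds concrete, the following sketch converts percentages into spend amounts and projects month-end spend from the run rate so far. The function names and the 4200 spend figure are hypothetical, and the linear projection is a simplifying assumption, not how Azure computes its forecast.

```shell
# Print the spend level at which each percentage threshold fires.
alert_amounts() {
  local budget="$1"; shift
  local pct
  for pct in "$@"; do
    awk -v b="$budget" -v p="$pct" \
      'BEGIN { printf "%s%% of %s = %.2f\n", p, b, b * p / 100 }'
  done
}

# Linear run-rate projection: spend so far, scaled to the full month.
# Args: spend to date, days elapsed, days in month.
projected_month_end() {
  awk -v s="$1" -v d="$2" -v n="$3" 'BEGIN { printf "%.2f\n", s / d * n }'
}

alert_amounts 10000 50 80 100
projected_month_end 4200 12 30   # 10500.00, above a 10000 budget
```

A projection above the budget on day 12 is the kind of signal a budget owner can still act on before the invoice closes.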
Reserved Instances and Savings Plans
| Option | Commitment | Flexibility | Discount vs. Pay-As-You-Go |
|---|---|---|---|
| Reserved VM Instances | 1 or 3 years, specific VM size/region | Low (exchangeable) | Up to 72% |
| Savings Plans (Compute) | 1 or 3 years, fixed hourly spend | Higher (any VM family/region) | Up to 65% |
| Azure Hybrid Benefit | Existing Windows/SQL licenses | License reuse on VMs/SQL | Up to 40% additional |
| Reserved Capacity (SQL/Redis) | 1 or 3 years | Service-specific | Up to 65% |
# Check reservation recommendations
az advisor recommendation list \
--category Cost \
--query "[?contains(shortDescription.problem, 'Reserved')].{Problem:shortDescription.problem, Savings:extendedProperties.savingsAmount}" \
-o table
Purchase reservations for steady baseline capacity (24/7 production VMs, SQL databases). Use pay-as-you-go or spot for variable spikes.
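The reasoning behind "reserve the baseline only" can be checked with simple arithmetic: a reservation bills every hour at the discounted rate, while pay-as-you-go bills only the hours used. The sketch below assumes a hypothetical 0.20 per hour rate, a 40% discount, and 730 hours per month; none of these figures are actual Azure prices.

```shell
# Break-even utilization: below this share of hours, pay-as-you-go is cheaper.
# A reservation bills every hour at (1 - discount), so break-even = 100 - discount.
ri_break_even_pct() {
  awk -v d="$1" 'BEGIN { printf "%.0f\n", 100 - d }'
}

# Monthly saving (negative means a loss) of a reservation versus pay-as-you-go.
# Args: pay-as-you-go hourly rate, discount percent, hours used per month.
ri_monthly_saving() {
  awk -v r="$1" -v d="$2" -v h="$3" 'BEGIN {
    hours_in_month = 730                            # common billing approximation
    payg     = r * h                                # pay only for hours used
    reserved = r * (1 - d / 100) * hours_in_month   # pay for every hour
    printf "%.2f\n", payg - reserved
  }'
}

ri_break_even_pct 40            # 60
ri_monthly_saving 0.20 40 730   # 58.40  (always-on VM: reservation wins)
ri_monthly_saving 0.20 40 300   # -27.60 (part-time VM: reservation loses)
```

A workload that runs less than the break-even share of hours is a better fit for pay-as-you-go or spot capacity.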
Spot VMs and Dev/Test Optimization
# Create spot VM (up to 90% discount, can be evicted)
# --max-price is the hourly cap in USD; use -1 to cap at the pay-as-you-go price
az vm create \
--resource-group rg-batch \
--name vm-batch-worker \
--image Ubuntu2204 \
--size Standard_D4s_v5 \
--priority Spot \
--max-price 0.05 \
--eviction-policy Deallocate \
--nsg-rule SSH \
--generate-ssh-keys
# Auto-shutdown for a dev VM at 19:00 UTC (creates a DevTest Labs schedule; --time is hhmm in UTC)
az vm auto-shutdown \
--resource-group rg-learning-dev \
--name vm-dev-01 \
--time 1900
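To estimate what a shutdown schedule is worth, compare the hours a VM actually runs with the 168 hours in a week. The helper below is a hypothetical sketch; it assumes the VM is started again each working day (the auto-shutdown schedule only stops it) and it covers compute hours only, since disks continue to bill while a VM is deallocated.

```shell
# Percent of weekly compute hours avoided when a VM runs only
# hours_per_day on days_per_week instead of all 168 hours.
shutdown_saving_pct() {
  awk -v h="$1" -v d="$2" 'BEGIN { printf "%.1f\n", (1 - h * d / 168) * 100 }'
}

shutdown_saving_pct 10 5   # 70.2  (10 hours on weekdays)
shutdown_saving_pct 12 7   # 50.0  (12 hours every day)
```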
Real-World Scenario: Engineering Cost Governance
| Practice | Implementation |
|---|---|
| Tagging | Mandatory environment, cost-center, owner via Azure Policy |
| Budgets | $10K/month prod, $2K/month dev — alerts at 80% |
| Reservations | 3-year RI for 4 baseline SQL databases; 1-year Savings Plan for compute |
| Review cadence | Weekly dev cleanup; monthly Advisor review; quarterly RI analysis |
| Chargeback | Cost export → Power BI dashboard by cost-center tag |
| Guardrails | Policy denies creation of GRS storage in dev subscriptions |
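The chargeback row boils down to summing cost per cost-center value. The sketch below shows that aggregation on a simplified two-column input, which is an assumption: a real cost export has many more columns, and the tag value would have to be extracted from its tags field before this step. The function name is hypothetical.

```shell
# Sum cost per cost-center from "cost-center,cost" lines on stdin.
# The first line is treated as a header and skipped.
chargeback_by_cost_center() {
  awk -F, 'NR > 1 { total[$1] += $2 }
           END { for (k in total) printf "%s,%.2f\n", k, total[k] }' | sort
}

chargeback_by_cost_center <<'EOF'
cost-center,cost
engineering,1200.50
marketing,310.25
engineering,89.50
EOF
# engineering,1290.00
# marketing,310.25
```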
Cost Optimization Checklist
- Delete unused resources (VMs, disks, IPs, snapshots, orphaned NICs)
- Downsize over-provisioned VMs after reviewing 30-day CPU/memory metrics
- Use spot VMs for fault-tolerant batch, CI, and ML training workloads
- Move infrequent data to Cool/Archive storage tiers with lifecycle policies
- Shut down dev/test environments outside business hours
- Review Advisor recommendations monthly — act on high-impact items first
- Consolidate underutilized App Service plans and SQL databases
- Use serverless (Functions, Container Apps) for intermittent workloads
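The storage tiering item in the checklist can be expressed as a lifecycle management policy. The sketch below emits one such policy; the rule name, the day counts, and the logs container prefix are illustrative assumptions, and the account and resource group names in the commented command are placeholders.

```shell
# Emit a lifecycle policy: Cool after 30 days, Archive after 90, delete after 365.
# Applies to block blobs in a container named logs (illustrative values).
lifecycle_policy_json() {
  cat <<'EOF'
{
  "rules": [
    {
      "enabled": true,
      "name": "tier-down-logs",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 30 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
            "delete": { "daysAfterModificationGreaterThan": 365 }
          }
        },
        "filters": { "blobTypes": ["blockBlob"], "prefixMatch": ["logs/"] }
      }
    }
  ]
}
EOF
}

# Usage (requires an authenticated Azure CLI session):
#   lifecycle_policy_json > lifecycle-policy.json
#   az storage account management-policy create \
#     --account-name stcostexport \
#     --resource-group rg-ops \
#     --policy @lifecycle-policy.json
```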
FinOps Culture
Cost optimization is a continuous practice across three phases:
- Inform — visible dashboards, regular reports, cost per team/project
- Optimize — technical changes (right-sizing, reservations, tiering, architecture)
- Operate — budgets, policies, accountability, and executive sponsorship
Treat cloud spend as a shared responsibility between engineering, finance, and leadership.
Common Mistakes
- No tagging from day one: retroactive tagging across hundreds of resources is painful
- Buying RIs before workloads stabilize: commit after 30–60 days of steady usage patterns
- Ignoring storage transaction costs: millions of small blob reads add up quickly
- Leaving dev resources running 24/7: a VM that runs only 50 hours a week instead of 168 uses about 70% fewer compute hours (disks still bill while the VM is deallocated)
- No budget alerts: surprise invoices at month end instead of mid-month intervention
- Over-engineering for reliability in dev: GZRS and multi-region for test environments
Troubleshooting
| Issue | Diagnosis | Fix |
|---|---|---|
| Unexpected cost spike | Cost analysis by meter | Identify resource; check for misconfigured auto-scale |
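| Budget alert but spend looks normal | Cost analysis is showing amortized cost, while budgets evaluate actual cost | Switch cost analysis to the “Actual cost” view |
| RI not applying | Reservation scope, region, or VM size does not match | Verify the reservation's scope, region, and size family against the running VMs |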
| High data transfer costs | Cross-region traffic | Co-locate dependent resources in one region; note that private endpoints add their own per-GB processing charge |
| Orphaned disk charges | Unattached managed disks | az disk list --query "[?diskState=='Unattached']" and delete |
# Find unattached managed disks
az disk list --query "[?diskState=='Unattached'].{Name:name, RG:resourceGroup, Size:diskSizeGb}" -o table
# Find unused public IPs
az network public-ip list --query "[?ipConfiguration==null].{Name:name, RG:resourceGroup}" -o table
Best Practices
- Start every project with a cost estimate from the Pricing Calculator
- Implement mandatory tags via Azure Policy at the management group level
- Review the Cost category of the Advisor score monthly
- Use management groups to apply budgets and policies hierarchically
- Automate dev resource cleanup with scheduled runbooks
- Include cost impact in architecture review and PR discussions
- Track unit economics (cost per user, per transaction) not just total spend
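Unit economics is a single division, but tracking it changes how a rising bill reads. In the sketch below, the spend and user counts are hypothetical figures and unit_cost is an illustrative helper name.

```shell
# Cost per unit (user, transaction, and so on); prints n/a for zero units.
unit_cost() {
  awk -v c="$1" -v u="$2" 'BEGIN {
    if (u <= 0) { print "n/a"; exit }
    printf "%.4f\n", c / u
  }'
}

unit_cost 12000 48000   # 0.2500 per user last month
unit_cost 13200 60000   # 0.2200 per user this month
```

Total spend rose 10% between the two months, yet cost per user fell, which is a healthier signal than the invoice total alone suggests.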
Next: DevOps with Azure DevOps.