Cloud costs can grow quickly without governance. Azure provides tools and practices to forecast, monitor, and optimize spending across subscriptions, management groups, and teams. Cost optimization is an ongoing discipline, not a one-time cleanup.

Cost Management Tools

| Tool | Purpose |
| --- | --- |
| Cost Management + Billing | Analyze spend, export data, create budgets |
| Azure Pricing Calculator | Estimate costs before deployment |
| Azure Advisor | Right-sizing and reserved instance recommendations |
| Tags | Allocate costs to teams, projects, environments |
| Azure Policy | Enforce tagging and SKU restrictions |
| FinOps Hub (preview) | Executive dashboards and optimization workflows |

Analyze Current Spend

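# List per-resource usage and cost for January 2025 (requires Cost Management API access)
# Meter names sit under meterDetails, which is only returned with --include-meter-details
az consumption usage list \
  --start-date 2025-01-01 \
  --end-date 2025-01-31 \
  --include-meter-details \
  --query "[].{Resource:instanceName, Cost:pretaxCost, Meter:meterDetails.meterName}" \
  --output table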

# Export cost data to a storage account as a daily CSV (requires the costmanagement CLI extension)
az costmanagement export create \
  --name daily-cost-export \
  --scope "subscriptions/SUB_ID" \
  --storage-account-id /subscriptions/SUB_ID/resourceGroups/rg-ops/providers/Microsoft.Storage/storageAccounts/stcostexport \
  --storage-container cost-exports \
  --timeframe MonthToDate \
  --recurrence Daily \
  --recurrence-period from=2025-01-01 to=2026-12-31
  

In the Portal: Cost Management + Billing → Cost analysis → group by resource group, service, tag, or meter. Save views for recurring reviews.
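The same grouping can be done offline on exported cost data. The following is a minimal sketch, not the export's guaranteed schema: the column names `ResourceGroup`, `MeterCategory`, and `CostInBillingCurrency` and the sample rows are assumptions for illustration, so check the header row of your own export files before reusing it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of an exported cost CSV. Real export schemas vary,
# so treat these column names as placeholders.
SAMPLE_EXPORT = """\
ResourceGroup,MeterCategory,CostInBillingCurrency
rg-webapp-prod,Virtual Machines,412.50
rg-webapp-prod,Storage,38.20
rg-learning-dev,Virtual Machines,96.00
rg-ops,Storage,12.75
"""


def cost_by_column(csv_text, group_column, cost_column="CostInBillingCurrency"):
    """Sum the cost column per distinct value of group_column."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[group_column]] += float(row[cost_column])
    # Largest spend first, similar to a sorted cost analysis view.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for group, total in cost_by_column(SAMPLE_EXPORT, "ResourceGroup"):
        print(f"{group:<20} {total:>10.2f}")
```

Passing `"MeterCategory"` as the group column gives the per-service view instead.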

Tagging Strategy

Consistent tags enable cost allocation and chargeback:

| Tag | Example Values | Purpose |
| --- | --- | --- |
| environment | dev, staging, prod | Separate environment costs |
| cost-center | engineering, marketing | Department billing |
| project | web-app, data-pipeline | Project-level tracking |
| owner | team-platform | Accountability |
| auto-shutdown | true/false | Identify schedulable resources |

Enforce tags with Azure Policy:

  # Assign built-in policy: require 'environment' tag
az policy assignment create \
  --name require-environment-tag \
  --policy /providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-945faa0f723c \
  --scope /subscriptions/SUB_ID \
  --params '{"tagName": {"value": "environment"}}'
  

Custom deny policy for untagged resources:

{
  "if": {
    "field": "tags['environment']",
    "exists": false
  },
  "then": {
    "effect": "deny"
  }
}
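A deny policy only blocks new or updated resources, so existing ones still need an audit. As a rough sketch, the check below reports which required tags are missing from an inventory. The `INVENTORY` list is invented sample data shaped like the `name` and `tags` fields of `az resource list --output json`, and the required tag set is an assumption taken from the tagging table above. Unlike the policy rule, this sketch also treats an empty tag value as missing.

```python
# Assumed required tags, taken from the tagging strategy table.
REQUIRED_TAGS = ("environment", "cost-center", "owner")

# Hypothetical inventory; real data could come from `az resource list -o json`.
INVENTORY = [
    {"name": "vm-web-01", "tags": {"environment": "prod",
                                   "cost-center": "engineering",
                                   "owner": "team-platform"}},
    {"name": "vm-dev-01", "tags": {"environment": "dev"}},
    {"name": "stcostexport", "tags": None},  # untagged resources report null
]


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the required tag names that are absent or empty on a resource."""
    tags = resource.get("tags") or {}
    return [name for name in required if not tags.get(name)]


def audit(resources, required=REQUIRED_TAGS):
    """Map resource name -> missing tags, listing only non-compliant resources."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource, required)
        if gaps:
            report[resource["name"]] = gaps
    return report


if __name__ == "__main__":
    for name, gaps in audit(INVENTORY).items():
        print(f"{name}: missing {', '.join(gaps)}")
```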
  

Budgets and Alerts

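# Create a monthly budget scoped to the production resource group
# (this command sets no alert thresholds; add notifications to the budget separately, e.g. in the Portal)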
az consumption budget create \
  --budget-name monthly-prod-budget \
  --amount 10000 \
  --category cost \
  --time-grain Monthly \
  --start-date 2025-01-01 \
  --end-date 2026-12-31 \
  --resource-group rg-webapp-prod

# Create a smaller monthly budget scoped to the dev resource group
az consumption budget create \
  --budget-name rg-dev-monthly \
  --amount 500 \
  --category cost \
  --time-grain Monthly \
  --start-date 2025-01-01 \
  --end-date 2026-12-31 \
  --resource-group rg-learning-dev
  

Set alerts at 50%, 80%, and 100% of budget to catch overspend early. Assign budget owners who can act on alerts.
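To make the threshold logic concrete, here is a small sketch that reports which of the 50%, 80%, and 100% thresholds a given spend has reached and projects month-end spend from the run rate so far. The spend figure and date are hypothetical, and the straight-line projection is a deliberate simplification, not how Azure computes its own forecasts.

```python
import calendar
from datetime import date


def crossed_thresholds(spend, budget, thresholds=(0.5, 0.8, 1.0)):
    """Return the alert thresholds (as fractions of budget) already reached."""
    return [t for t in thresholds if spend >= budget * t]


def linear_forecast(spend_to_date, today):
    """Project month-end spend, assuming the daily run rate so far continues."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


if __name__ == "__main__":
    budget = 10_000            # same amount as monthly-prod-budget above
    spend = 8_200              # hypothetical month-to-date spend
    today = date(2025, 1, 20)  # hypothetical review date
    print("thresholds reached:", crossed_thresholds(spend, budget))
    print("projected month end:", round(linear_forecast(spend, today), 2))
```

With these sample numbers, spend has passed the 50% and 80% marks, and the run rate of 410 per day projects to 12,710 by 31 January, which is the mid-month warning a budget owner can still act on.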

Reserved Instances and Savings Plans

| Option | Commitment | Flexibility | Typical Discount |
| --- | --- | --- | --- |
| Reserved VM Instances | 1 or 3 years, specific VM size/region | Low (exchangeable) | Up to 72% |
| Savings Plans (Compute) | Hourly $ commitment | Higher (any VM family/region) | Up to 65% |
| Azure Hybrid Benefit | Existing Windows/SQL licenses | License reuse on VMs/SQL | Up to 40% additional |
| Reserved Capacity (SQL/Redis) | 1 or 3 years | Service-specific | Up to 65% |

# Check reservation recommendations
az advisor recommendation list \
  --category Cost \
  --query "[?contains(shortDescription.problem, 'Reserved')].{Problem:shortDescription.problem, Savings:extendedProperties.savingsAmount}" \
  -o table
  

Purchase reservations for steady baseline capacity (24/7 production VMs, SQL databases). Use pay-as-you-go or spot for variable spikes.
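The reasoning behind "reserve the steady baseline" is a break-even calculation: a reservation is billed for every hour of the term whether or not the resource runs, so it only pays off when utilization exceeds one minus the discount. The sketch below works this through. The hourly rate and the 40% discount are hypothetical placeholders, not published Azure prices.

```python
def break_even_utilization(discount):
    """Fraction of hours a resource must run for a reservation to pay off."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    return 1 - discount


def monthly_costs(payg_hourly_rate, discount, hours_used, hours_in_month=730):
    """Return (pay-as-you-go cost, reserved cost) for one month.

    Pay-as-you-go bills only the hours used; the reservation bills every
    hour in the month at the discounted rate, used or not.
    """
    payg = payg_hourly_rate * hours_used
    reserved = payg_hourly_rate * (1 - discount) * hours_in_month
    return payg, reserved


if __name__ == "__main__":
    rate, discount = 0.20, 0.40  # hypothetical rate and discount
    print("break-even utilization:", break_even_utilization(discount))
    for hours in (730, 300):     # always-on VM vs. part-time VM
        payg, reserved = monthly_costs(rate, discount, hours)
        print(f"{hours} h: pay-as-you-go {payg:.2f}, reserved {reserved:.2f}")
```

Under these assumptions the always-on VM is cheaper reserved, while the VM running 300 hours a month (about 41% utilization, below the 60% break-even) is cheaper on pay-as-you-go.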

Spot VMs and Dev/Test Optimization

  # Create spot VM (up to 90% discount, can be evicted)
az vm create \
  --resource-group rg-batch \
  --name vm-batch-worker \
  --image Ubuntu2204 \
  --size Standard_D4s_v5 \
  --priority Spot \
  --max-price 0.05 \
  --eviction-policy Deallocate \
  --nsg-rule SSH \
  --generate-ssh-keys

# Daily auto-shutdown schedule for a dev VM (--time is UTC, in hhmm format)
az vm auto-shutdown \
  --resource-group rg-learning-dev \
  --name vm-dev-01 \
  --time 1900
  

Real-World Scenario: Engineering Cost Governance

| Practice | Implementation |
| --- | --- |
| Tagging | Mandatory environment, cost-center, owner via Azure Policy |
| Budgets | $10K/month prod, $2K/month dev; alerts at 80% |
| Reservations | 3-year RI for 4 baseline SQL databases; 1-year Savings Plan for compute |
| Review cadence | Weekly dev cleanup; monthly Advisor review; quarterly RI analysis |
| Chargeback | Cost export → Power BI dashboard by cost-center tag |
| Guardrails | Policy denies creation of GRS storage in dev subscriptions |

Cost Optimization Checklist

  • Delete unused resources (VMs, disks, IPs, snapshots, orphaned NICs)
  • Downsize over-provisioned VMs after reviewing 30-day CPU/memory metrics
  • Use spot VMs for fault-tolerant batch, CI, and ML training workloads
  • Move infrequent data to Cool/Archive storage tiers with lifecycle policies
  • Shut down dev/test environments outside business hours
  • Review Advisor recommendations monthly — act on high-impact items first
  • Consolidate underutilized App Service plans and SQL databases
  • Use serverless (Functions, Container Apps) for intermittent workloads
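For the right-sizing item in the checklist, one possible screening rule is to flag a VM only when both its average and its 95th-percentile CPU stay low over the review window, so that a mostly idle but periodically busy VM is not downsized by mistake. The 20% and 50% limits below are hypothetical starting points, the sample series are invented, and memory would need the same check before acting.

```python
from statistics import mean, quantiles


def is_downsize_candidate(cpu_samples, avg_limit=20.0, p95_limit=50.0):
    """Flag a VM whose average AND 95th-percentile CPU are both below limits.

    cpu_samples: CPU utilization percentages, e.g. one value per day
    over the last 30 days (at least two samples are required).
    """
    avg = mean(cpu_samples)
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = quantiles(cpu_samples, n=20)[-1]
    return avg < avg_limit and p95 < p95_limit


if __name__ == "__main__":
    idle = [5] * 28 + [30, 35]   # consistently quiet
    spiky = [5] * 25 + [90] * 5  # low average, but real peaks
    busy = [60] * 30             # genuinely loaded
    for name, samples in (("idle", idle), ("spiky", spiky), ("busy", busy)):
        print(name, is_downsize_candidate(samples))
```

The percentile guard is the design choice that matters here: the spiky series has an average under 20% but would still be left alone.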

FinOps Culture

Cost optimization is a continuous practice across three phases:

  1. Inform — visible dashboards, regular reports, cost per team/project
  2. Optimize — technical changes (right-sizing, reservations, tiering, architecture)
  3. Operate — budgets, policies, accountability, and executive sponsorship

Treat cloud spend as a shared responsibility between engineering, finance, and leadership.

Common Mistakes

  1. No tagging from day one — retroactive tagging across hundreds of resources is painful
  2. Buying RIs before stabilizing workloads — commit after 30–60 days of steady usage patterns
  3. Ignoring storage transaction costs — millions of small blob reads add up quickly
  4. Leaving dev resources running 24/7 — auto-shutdown saves 65%+ on dev VMs
  5. No budget alerts — surprise invoices at month end instead of mid-month intervention
  6. Over-engineering for reliability in dev — GZRS and multi-region for test environments
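The savings figure in item 4 follows from simple arithmetic on the 168 hours in a week. The sketch below assumes a dev VM that runs only on weekdays and is deallocated the rest of the time. It covers compute charges only, since disks attached to a deallocated VM continue to be billed.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def shutdown_savings(hours_on_per_weekday, weekdays=5):
    """Fraction of compute hours saved versus running 24/7."""
    running = hours_on_per_weekday * weekdays
    if not 0 <= running <= HOURS_PER_WEEK:
        raise ValueError("running hours must fit within one week")
    return 1 - running / HOURS_PER_WEEK


if __name__ == "__main__":
    for hours in (8, 10, 12):
        print(f"{hours} h per weekday: {shutdown_savings(hours):.1%} saved")
```

A 12-hour weekday window leaves 60 of 168 hours running, just under 65% saved, so the 65%+ figure corresponds to keeping dev VMs on for no more than about 11.7 hours per weekday.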

Troubleshooting

| Issue | Diagnosis | Fix |
| --- | --- | --- |
| Unexpected cost spike | Cost analysis by meter | Identify resource; check for misconfigured auto-scale |
| Budget alert but spend looks normal | Amortized vs actual view | Switch cost analysis to “Actual cost” view |
| RI not applying | Wrong size/region/family | Verify RI scope matches running VMs |
| High data transfer costs | Cross-region traffic | Co-locate resources; use private endpoints |
| Orphaned disk charges | Unattached managed disks | List with az disk list --query "[?diskState=='Unattached']", then delete |

# Find unattached managed disks
az disk list --query "[?diskState=='Unattached'].{Name:name, RG:resourceGroup, Size:diskSizeGb}" -o table

# Find unused public IPs
az network public-ip list --query "[?ipConfiguration==null].{Name:name, RG:resourceGroup}" -o table
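For the "unexpected cost spike" row in the table above, a first pass before drilling into meters might be to flag days whose cost jumps well above the recent baseline. This is a simple illustrative heuristic, not a Cost Management feature: the 7-day window, the 1.5x factor, and the daily figures are all hypothetical.

```python
from statistics import mean


def find_spikes(daily_costs, window=7, factor=1.5):
    """Return indices of days whose cost exceeds factor x the trailing mean.

    The first `window` days have no full baseline and are never flagged.
    """
    spikes = []
    for i in range(window, len(daily_costs)):
        baseline = mean(daily_costs[i - window:i])
        if baseline > 0 and daily_costs[i] > factor * baseline:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Hypothetical daily totals; day index 8 contains a jump.
    costs = [100] * 7 + [105, 240, 110]
    print(find_spikes(costs))
```

Each flagged day is then a starting point for the meter-level breakdown in cost analysis.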
  

Best Practices

  • Start every project with a cost estimate from the Pricing Calculator
  • Implement mandatory tags via Azure Policy at the management group level
  • Review the Cost category of your Advisor score monthly
  • Use management groups to apply budgets and policies hierarchically
  • Automate dev resource cleanup with scheduled runbooks
  • Include cost impact in architecture review and PR discussions
  • Track unit economics (cost per user, per transaction) not just total spend
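The last point is easiest to see with numbers. In the sketch below, total spend rises month over month while cost per user falls, which is a healthier signal than the raw invoice suggests. All figures are invented for illustration.

```python
def unit_cost(total_cost, units):
    """Cost per unit (user, transaction, order, ...) for a period."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


if __name__ == "__main__":
    # Hypothetical months: (label, total spend, active users)
    months = [("Jan", 10_000, 40_000), ("Feb", 12_000, 60_000)]
    for label, spend, users in months:
        print(f"{label}: total {spend}, per user {unit_cost(spend, users):.3f}")
```

Here spend grows by 20% while cost per user drops from 0.25 to 0.20, so the growth is driven by usage rather than waste.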

Next: DevOps with Azure DevOps.