Azure Monitor

Azure Monitor is the unified observability platform for Azure. It collects metrics and logs from resources, applications, and infrastructure — enabling alerts, dashboards, root-cause analysis, and automated remediation. Effective monitoring turns raw telemetry into actionable operational intelligence.

Data Platform Overview

Data Type	Source	Storage	Query Interface
Metrics	Azure resources, custom	Azure Monitor Metrics (time-series)	Metrics Explorer
Logs	Resources, agents, apps	Log Analytics workspace	Kusto (KQL)
Traces	Application Insights	Log Analytics	KQL
Activity Logs	Control plane operations	Log Analytics / Storage	KQL
Alerts	Metrics, logs, activity	Alert rules	—

All diagnostic data should flow into a Log Analytics workspace for centralized querying and correlation.

Log Analytics Workspace Setup

# Create workspace (90-day retention)
az monitor log-analytics workspace create \
  --resource-group rg-webapp-prod \
  --workspace-name law-webapp-prod \
  --location eastus \
  --retention-time 90

# Send a VM's platform metrics to the workspace. VM diagnostic settings expose
# only the AllMetrics category; guest OS logs such as Syslog are collected by
# the Azure Monitor Agent through a data collection rule.
az monitor diagnostic-settings create \
  --name vm-diagnostics \
  --resource /subscriptions/SUB_ID/resourceGroups/rg-webapp-prod/providers/Microsoft.Compute/virtualMachines/vm-web-01 \
  --workspace /subscriptions/SUB_ID/resourceGroups/rg-webapp-prod/providers/Microsoft.OperationalInsights/workspaces/law-webapp-prod \
  --metrics '[{"category":"AllMetrics","enabled":true}]'

Application Insights

Application Insights provides application performance monitoring (APM) for web apps, APIs, and functions, tracking requests, dependencies, exceptions, and custom events:

az monitor app-insights component create \
  --app ai-webapp-prod \
  --location eastus \
  --resource-group rg-webapp-prod \
  --application-type web \
  --kind web \
  --workspace law-webapp-prod

Enable in Node.js:

const appInsights = require('applicationinsights');
appInsights.setup(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING)
  .setAutoDependencyCorrelation(true)
  .setAutoCollectRequests(true)
  .setAutoCollectExceptions(true)
  .setAutoCollectDependencies(true)
  .setSendLiveMetrics(true)
  .start();

Enable on App Service via CLI:

az webapp config appsettings set \
  --name my-webapp-prod \
  --resource-group rg-webapp-prod \
  --settings APPLICATIONINSIGHTS_CONNECTION_STRING="<connection-string>"

KQL Query Examples

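Kusto Query Language (KQL) powers log analysis across Azure Monitor Logs. The Application Insights examples below use the classic table names (`requests`, `exceptions`); when querying a workspace directly, the equivalent tables are `AppRequests` and `AppExceptions`, with `TimeGenerated` in place of `timestamp`: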

// Failed requests in the last hour with error details
requests
| where timestamp > ago(1h)
| where success == false
| summarize count(), avg(duration) by resultCode, name, operation_Name
| order by count_ desc

// Average CPU across VMs (5-minute bins)
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where TimeGenerated > ago(24h)
| summarize avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| render timechart

// Top 10 slowest API endpoints (P95 latency)
requests
| where timestamp > ago(7d)
| summarize percentiles(duration, 50, 95, 99) by name
| top 10 by percentile_duration_95 desc

// Correlated exceptions with requests
exceptions
| where timestamp > ago(1h)
| join kind=inner (
    requests | where timestamp > ago(1h)
) on operation_Id
| project timestamp, name, outerMessage, url, resultCode

Alerts and Action Groups

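# Create action group with an email receiver
# (replace <oncall-email-address>; a webhook receiver can be added with: --action webhook NAME URI)
az monitor action-group create \
  --name ag-platform-oncall \
  --resource-group rg-webapp-prod \
  --short-name platform \
  --action email oncall <oncall-email-address>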

# Metric alert: high CPU on VM
az monitor metrics alert create \
  --name alert-vm-high-cpu \
  --resource-group rg-webapp-prod \
  --scopes /subscriptions/SUB_ID/resourceGroups/rg-webapp-prod/providers/Microsoft.Compute/virtualMachines/vm-web-01 \
  --condition "avg Percentage CPU > 80" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action ag-platform-oncall \
  --severity 2 \
  --description "VM CPU exceeded 80% for 5 minutes"

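# Log query alert: error rate spike
# The scope is the workspace, so the query uses the workspace table AppRequests
# (the classic Application Insights name is requests).
az monitor scheduled-query create \
  --name alert-high-error-rate \
  --resource-group rg-webapp-prod \
  --scopes /subscriptions/SUB_ID/resourceGroups/rg-webapp-prod/providers/Microsoft.OperationalInsights/workspaces/law-webapp-prod \
  --condition "count 'ErrorSpike' > 0" \
  --condition-query ErrorSpike="AppRequests | where Success == false | summarize count() by bin(TimeGenerated, 5m) | where count_ > 50" \
  --evaluation-frequency 5m \
  --window-size 15m \
  --action-groups /subscriptions/SUB_ID/resourceGroups/rg-webapp-prod/providers/Microsoft.Insights/actionGroups/ag-platform-oncall \
  --severity 1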

Action Groups

Channel	Use Case
Email/SMS/Voice	On-call notifications
Webhook	PagerDuty, Slack, Teams integration
Azure Function	Custom auto-remediation (restart app, scale out)
Logic App	Complex notification and ticketing workflows
ITSM	ServiceNow, System Center integration
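
As a rough illustration of the Azure Function channel, the sketch below shows how a handler might decide on a remediation step from an incoming alert. It assumes the action group sends the common alert schema (fields such as `data.essentials.alertRule`, `severity`, `monitorCondition`, and `alertTargetIDs`). The function name `planRemediation`, the action names, and the routing rules are hypothetical and would need to match your own runbooks.

```javascript
// Sketch: map an Azure Monitor alert (common alert schema) to a remediation plan.
// planRemediation, the action names, and the routing rules are hypothetical.
function planRemediation(payload) {
  const essentials = payload?.data?.essentials;
  if (!essentials) {
    return { action: 'ignore', reason: 'not a common alert schema payload' };
  }
  // "Resolved" notifications need no remediation.
  if (essentials.monitorCondition !== 'Fired') {
    return { action: 'ignore', reason: `condition is ${essentials.monitorCondition}` };
  }
  const targets = essentials.alertTargetIDs ?? [];
  // Severity arrives as a string such as "Sev1"; keep only the number.
  const severity = Number(String(essentials.severity).replace(/^Sev/i, ''));
  if (essentials.alertRule === 'alert-vm-high-cpu') {
    return { action: 'scale-out', targets, severity };
  }
  if (severity <= 1) {
    return { action: 'page-oncall', targets, severity };
  }
  return { action: 'notify-only', targets, severity };
}

// Trimmed-down sample payload for the CPU alert rule defined earlier.
const sampleAlert = {
  schemaId: 'azureMonitorCommonAlertSchema',
  data: {
    essentials: {
      alertRule: 'alert-vm-high-cpu',
      severity: 'Sev2',
      monitorCondition: 'Fired',
      alertTargetIDs: [
        '/subscriptions/SUB_ID/resourcegroups/rg-webapp-prod/providers/microsoft.compute/virtualmachines/vm-web-01',
      ],
    },
  },
};

console.log(planRemediation(sampleAlert));
```

Keeping the decision logic in a pure function like this makes it easy to unit test separately from the code that actually restarts or scales a resource.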

Dashboards and Workbooks

Dashboards: Pin charts and metrics for at-a-glance monitoring — share across teams
Workbooks: Interactive reports combining metrics, logs, and parameters — ideal for incident triage
VM insights (formerly Azure Monitor for VMs): Infrastructure health, performance counters, dependency maps
Container Insights: AKS pod metrics, node health, controller logs

Real-World Scenario: Production SaaS Monitoring

Layer	Monitoring
Application	App Insights — request rate, P95 latency, failure rate SLI
Infrastructure	VM/AKS metrics — CPU, memory, disk IOPS
Database	Azure SQL diagnostic logs and metrics: DTU%, deadlocks
Network	NSG flow logs, Front Door health probes
Alerts	P1: error rate > 5%; P2: P95 > 2s; P3: disk > 85%
Dashboard	Single pane: availability, latency, error budget burn rate
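
The alert tiers and the error budget burn rate in the table above can be expressed as a small calculation. The sketch below is illustrative only: the 99.9% SLO target is an assumption (the scenario does not state one), and the function names `burnRate` and `classify` are hypothetical. Burn rate here is the observed error rate divided by the error budget (1 minus the SLO target), so a value of 1 means the budget is being spent exactly at the sustainable pace.

```javascript
// Assumed availability objective; the scenario above does not specify one.
const SLO_TARGET = 0.999;

// Burn rate = observed error rate / error budget (1 - SLO target).
function burnRate(failed, total, sloTarget = SLO_TARGET) {
  if (total === 0) return 0; // no traffic, nothing burned
  const errorRate = failed / total;
  const errorBudget = 1 - sloTarget;
  return errorRate / errorBudget;
}

// Map current readings to the P1/P2/P3 tiers from the table.
// errorRate is a fraction (0.05 = 5%), p95Seconds is in seconds,
// diskUsedPct is a percentage. Returns null when nothing is breached.
function classify({ errorRate, p95Seconds, diskUsedPct }) {
  if (errorRate > 0.05) return 'P1';
  if (p95Seconds > 2) return 'P2';
  if (diskUsedPct > 85) return 'P3';
  return null;
}

const reading = { errorRate: 0.01, p95Seconds: 2.4, diskUsedPct: 70 };
console.log('tier:', classify(reading));
console.log('burn rate:', burnRate(10, 1000).toFixed(1));
```

Checking the tiers in priority order means a single reading raises only its most severe alert, which helps keep notification volume down.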

Monitoring Tools Comparison

Tool	Scope	Best For
Azure Monitor	Full platform	Metrics, logs, alerts
Application Insights	Application APM	Request tracing, dependencies
Log Analytics	Log storage + KQL	Cross-resource correlation
Azure Monitor Agent	VM/on-prem collection	Replacing the legacy Log Analytics agent
Defender for Cloud	Security posture	Vulnerability and threat detection

Common Mistakes

No diagnostic settings enabled — resources emit metrics but not detailed logs
Alert fatigue — too many low-severity alerts; teams ignore all of them
Missing correlation — App Insights not linked to Log Analytics workspace
Short retention — 30-day default may be insufficient for trend analysis
No baseline before alerting — thresholds set arbitrarily without historical data
Ignoring Activity Log — security incidents missed without control-plane monitoring
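
To make the "no baseline" point concrete, here is a minimal sketch of deriving an alert threshold from historical samples instead of picking a number arbitrarily. It uses a nearest-rank percentile plus a headroom factor. This is one simple approach among several, and the sample values, the 10% headroom, and the function names are all assumptions for illustration.

```javascript
// Nearest-rank percentile of a numeric sample (p between 0 and 100).
function percentile(values, p) {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Suggest a threshold: a high percentile of history plus headroom, capped
// at a maximum (100 for percentage metrics such as CPU).
function suggestThreshold(history, { p = 99, headroom = 1.1, cap = 100 } = {}) {
  return Math.min(cap, percentile(history, p) * headroom);
}

// Hypothetical CPU readings (percent), e.g. exported from a Perf query.
const cpuHistory = [22, 25, 31, 28, 35, 40, 38, 30, 27, 60];
console.log('median:', percentile(cpuHistory, 50));
console.log('suggested threshold:', suggestThreshold(cpuHistory, { p: 95 }).toFixed(1));
```

In practice the history would come from several weeks of data so that weekly cycles are represented.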

Troubleshooting

Issue	Diagnosis	Fix
No data in workspace	Diagnostic settings not configured	Enable diagnostics on each resource
App Insights missing traces	SDK not initialized or wrong connection string	Verify `APPLICATIONINSIGHTS_CONNECTION_STRING`
Alert not firing	Wrong scope or threshold	Test KQL query manually; check evaluation frequency
High ingestion costs	Verbose logging, no filtering	Use transformation rules; filter at source
KQL query timeout	Too broad a time range or filters applied late	Add a time filter such as `where timestamp > ago(24h)` first; filter and `project` before `join` or `summarize`

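# Check workspace ingestion volume per table over the last 24 hours.
# The Usage table reports Quantity in MB; the query command takes the workspace
# GUID (customerId) and may require the log-analytics CLI extension.
az monitor log-analytics query \
  --workspace "$(az monitor log-analytics workspace show \
      --resource-group rg-webapp-prod \
      --workspace-name law-webapp-prod \
      --query customerId -o tsv)" \
  --analytics-query "Usage | where TimeGenerated > ago(24h) | summarize IngestedMB = sum(Quantity) by DataType | order by IngestedMB desc" \
  -o table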

# List active alert rules
az monitor metrics alert list --resource-group rg-webapp-prod -o table

Best Practices

Define SLIs (latency, error rate, availability) and SLOs for each service
Collect metrics at resource level and traces at application level
Set alerts on actionable thresholds — every alert should require human or automated action
Route all logs to a central Log Analytics workspace with RBAC
Use workbooks for incident response runbooks with pre-built KQL queries
Enable diagnostic settings on every production resource at deployment time
Review dashboards weekly and tune thresholds based on baselines
Implement alert enrichment — include resource links and runbook URLs in notifications
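
The alert enrichment practice could look roughly like the sketch below, which attaches a resource link and a runbook URL to an alert before it is forwarded. The runbook index, the `wiki.example.com` URL, and the `enrichAlert` function are hypothetical, and the portal deep-link format is an assumption that should be verified for your tenant. The input is assumed to be the `essentials` object of a common alert schema payload.

```javascript
// Hypothetical runbook index keyed by alert rule name.
const RUNBOOKS = {
  'alert-vm-high-cpu': 'https://wiki.example.com/runbooks/vm-high-cpu',
};

// Build an enriched notification from the alert's "essentials" fields.
function enrichAlert(essentials, runbooks = RUNBOOKS) {
  const resourceId = (essentials.alertTargetIDs ?? [])[0] ?? '';
  return {
    title: `[${essentials.severity}] ${essentials.alertRule}`,
    // Assumed portal deep-link format; confirm it before relying on it.
    resourceLink: resourceId ? `https://portal.azure.com/#resource${resourceId}` : null,
    // null signals that the rule still needs a runbook.
    runbookUrl: runbooks[essentials.alertRule] ?? null,
  };
}

const enriched = enrichAlert({
  alertRule: 'alert-vm-high-cpu',
  severity: 'Sev2',
  alertTargetIDs: [
    '/subscriptions/SUB_ID/resourceGroups/rg-webapp-prod/providers/Microsoft.Compute/virtualMachines/vm-web-01',
  ],
});
console.log(enriched);
```

Returning `null` for a missing runbook, instead of omitting the field, makes gaps visible so they can be reviewed alongside the weekly threshold tuning.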

Next: Azure Kubernetes Service (AKS).
