Query Store Deep Dive
Query Store captures query text, execution plans, and runtime statistics over time — persisting across restarts unlike the plan cache. It is the primary tool for detecting plan regressions after upgrades, statistics updates, or parameter sniffing issues.
Enable and Configure
ALTER DATABASE MyApp SET QUERY_STORE = ON (
OPERATION_MODE = READ_WRITE,
CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30),
DATA_FLUSH_INTERVAL_SECONDS = 900,
MAX_STORAGE_SIZE_MB = 2048,
INTERVAL_LENGTH_MINUTES = 60,
QUERY_CAPTURE_MODE = AUTO, -- AUTO, ALL, NONE, CUSTOM
SIZE_BASED_CLEANUP_MODE = AUTO
);
| Setting | Recommendation |
|---|---|
MAX_STORAGE_SIZE_MB |
1–4 GB production; monitor sys.query_store_options |
INTERVAL_LENGTH_MINUTES |
60 for stable baselines; 15 for active tuning |
QUERY_CAPTURE_MODE |
AUTO for most; ALL briefly during incidents |
WAIT_STATS_CAPTURE_MODE |
ON (2017+) — correlate waits with queries |
Verify status:
SELECT actual_state_desc, desired_state_desc, current_storage_size_mb, max_storage_size_mb
FROM sys.database_query_store_options;
What Query Store Captures
- Query text and query_id (parameterized)
- Multiple plans per query (plan_id)
- Runtime stats per plan per interval (duration, CPU, reads, executions)
- Wait categories per plan (2017+)
- Plan forcing and compatibility level context
Data lives in internal tables — query via catalog views only.
Find Top Resource Consumers
SELECT TOP 20
qsq.query_id,
qsp.plan_id,
SUM(qsrs.count_executions) AS executions,
SUM(qsrs.avg_duration * qsrs.count_executions) / NULLIF(SUM(qsrs.count_executions), 0) / 1000.0 AS avg_ms,
SUM(qsrs.avg_logical_io_reads * qsrs.count_executions) / NULLIF(SUM(qsrs.count_executions), 0) AS avg_reads,
LEFT(qt.query_sql_text, 300) AS query_text
FROM sys.query_store_query qsq
JOIN sys.query_store_query_text qt ON qsq.query_text_id = qt.query_text_id
JOIN sys.query_store_plan qsp ON qsq.query_id = qsp.query_id
JOIN sys.query_store_runtime_stats qsrs ON qsp.plan_id = qsrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsrsi
ON qsrs.runtime_stats_interval_id = qsrsi.runtime_stats_interval_id
WHERE qsrsi.start_time > DATEADD(day, -7, GETUTCDATE())
GROUP BY qsq.query_id, qsp.plan_id, qt.query_sql_text
ORDER BY avg_ms * executions DESC;
Weight by total impact (avg × executions), not avg alone.
Detect Plan Regressions
A new plan with worse performance after a change (upgrade, stats update, index drop):
SELECT
qsq.query_id,
qsp.plan_id,
qsp.last_execution_time,
qsrs.avg_duration / 1000.0 AS avg_ms,
qsp.is_forced_plan,
qp.query_plan
FROM sys.query_store_query qsq
JOIN sys.query_store_plan qsp ON qsq.query_id = qsp.query_id
JOIN sys.query_store_runtime_stats qsrs ON qsp.plan_id = qsrs.plan_id
CROSS APPLY (
SELECT query_plan FROM sys.query_store_plan AS p WHERE p.plan_id = qsp.plan_id
) qp
WHERE qsq.query_id = 42 -- Known problematic query
ORDER BY qsp.plan_id, qsrs.last_execution_time;
Compare plans in SSMS — Query Store → Track Query → Plan Summary.
Automated regression view (simplified):
-- Queries where latest plan avg duration > 2x older plan
WITH plan_stats AS (
SELECT
qsq.query_id,
qsp.plan_id,
AVG(qsrs.avg_duration) AS avg_dur,
MAX(qsrs.last_execution_time) AS last_run
FROM sys.query_store_query qsq
JOIN sys.query_store_plan qsp ON qsq.query_id = qsp.query_id
JOIN sys.query_store_runtime_stats qsrs ON qsp.plan_id = qsrs.plan_id
GROUP BY qsq.query_id, qsp.plan_id
)
SELECT a.query_id, a.plan_id AS slow_plan, b.plan_id AS fast_plan,
a.avg_dur / NULLIF(b.avg_dur, 0) AS slowdown_ratio
FROM plan_stats a
JOIN plan_stats b ON a.query_id = b.query_id AND a.plan_id <> b.plan_id
WHERE a.avg_dur > b.avg_dur * 2 AND a.last_run > b.last_run;
Force and Unforce Plans
When a known-good plan exists:
-- SSMS: Right-click plan → Force Plan
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 3;
-- Verify
SELECT query_id, plan_id, is_forced_plan, forced_plan_failure_count
FROM sys.query_store_plan WHERE query_id = 42;
-- Remove force
EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 3;
Forced plans can fail if schema changes invalidate plan — monitor forced_plan_failure_count.
Wait Stats in Query Store (2017+)
SELECT
qsq.query_id,
qsws.wait_category_desc,
SUM(qsws.avg_query_wait_time_ms * qsws.count_executions) AS total_wait_ms
FROM sys.query_store_query qsq
JOIN sys.query_store_plan qsp ON qsq.query_id = qsp.query_id
JOIN sys.query_store_wait_stats qsws ON qsp.plan_id = qsws.plan_id
GROUP BY qsq.query_id, qsws.wait_category_desc
ORDER BY total_wait_ms DESC;
Correlate CPU vs I/O vs Lock waits per query — faster than instance-wide wait stats alone.
Query Store and Upgrades
Microsoft recommends:
- Enable Query Store before upgrading SQL Server version
- Capture baseline for 1–2 weeks
- Upgrade
- Compare plan counts and duration — use Query Store with Compatibility Level insights
ALTER DATABASE MyApp SET COMPATIBILITY_LEVEL = 160; -- SQL 2022
-- Query Store tracks plans per CE version
Capture Mode Tuning
| Mode | Behavior |
|---|---|
| ALL | Every query — high overhead, short incidents only |
| AUTO | Threshold-based — default |
| NONE | Stops capture; existing data retained |
| CUSTOM | QUERY_CAPTURE_POLICY with time/CPU thresholds |
During firefighting:
ALTER DATABASE MyApp SET QUERY_STORE (QUERY_CAPTURE_MODE = ALL);
-- After incident
ALTER DATABASE MyApp SET QUERY_STORE (QUERY_CAPTURE_MODE = AUTO);
Storage and Cleanup
-- Manual cleanup old stats
EXEC sp_query_store_flush_db;
ALTER DATABASE MyApp SET QUERY_STORE CLEAR ALL; -- Nuclear — dev only
-- Read-only mode for investigation
ALTER DATABASE MyApp SET QUERY_STORE (OPERATION_MODE = READ_ONLY);
When current_storage_size_mb approaches max, auto cleanup drops oldest data — may lose pre-upgrade baseline if undersized.
Common Mistakes
| Mistake | Effect |
|---|---|
| Query Store off in production | No regression history after incidents |
| Storage too small | Auto purge deletes evidence |
| Force plan permanently | Schema drift causes failures |
| Ignoring AUTO capture gaps | Rare queries missing — brief ALL mode |
| Tuning on dev with empty QS | No representative plans |
Integration with SSMS Reports
Database → Reports → Standard Reports → Query Store:
- Top resource consuming queries
- Regressed queries
- Queries with forced plans
- Tracked queries
Pin tracked queries for critical stored procedures and API queries.
Production Workflow: Post-Deploy Regression
- Deploy application release Friday 6 PM
- Alert: P95 latency doubled on
GetOrdersAPI - Query Store → regressed queries report
- query_id 881 — new plan_id 12 (scan vs seek)
- Root cause: statistics auto-update skewed on
Statuscolumn - Short-term:
sp_query_store_force_planto plan_id 8 - Long-term: filtered index on active orders + update stats with FULLSCAN
- Unforce plan after index deployed; verify plan_id 12 gone or harmless
Document in postmortem with query_id and plan_id references.
Query Store vs Plan Cache DMVs
| Aspect | Query Store | dm_exec_query_stats |
|---|---|---|
| Persistence | Survives restart | Cleared on restart |
| Historical intervals | Yes | Single cumulative |
| Plan forcing | Built-in | Manual plan guides |
| Overhead | Moderate | None |
Use both — Query Store for trends, DMVs for live snapshot.
Troubleshooting
-- Query Store not capturing
SELECT actual_state_desc, readonly_reason, current_storage_size_mb
FROM sys.database_query_store_options;
-- readonly_reason: 65536 = size limit, 8192 = single user mode, etc.
-- Flush stuck
ALTER DATABASE MyApp SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
-- Compatibility with AG secondaries
-- Query Store on primary only for write workload DBs; readable secondaries may lag
Enable Query Store on every production database before you need it — plan regressions are inevitable; Query Store makes them diagnosable instead of mysterious.