Cost Optimization
Without governance, GCP costs tend to grow unchecked. Google provides tools and pricing models to forecast, monitor, and optimize spending across projects and teams. Cost optimization is not a one-time exercise: it is a continuous practice shared between engineering and finance, often called FinOps.
Cost Management Tools
| Tool | Purpose |
|---|---|
| Cloud Billing reports | Analyze spend by project, service, label |
| Budgets & alerts | Set spending thresholds with notifications |
| Recommender | Right-sizing, idle resource, CUD recommendations |
| Labels | Allocate costs to teams, environments, projects |
| Billing export (BigQuery) | Custom cost analysis and dashboards |
| Pricing Calculator | Estimate costs before deployment |
Analyze Spend
# List billing accounts
gcloud billing accounts list
# Describe project billing
gcloud billing projects describe learning-gcp-dev
# List all projects (this does not show billing status; use the describe command above per project)
gcloud projects list --format="table(projectId,name)"
In Console: Billing → Reports → filter by project, service, or label.
BigQuery Billing Analysis
Enable billing export, then query:
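-- Top 20 service/SKU combinations by cost over the last 30 days
-- Aliases differ from the service and sku column names so GROUP BY is unambiguous
SELECT
service.description AS service_name,
sku.description AS sku_name,
SUM(cost) AS total_cost,
SUM(usage.amount) AS total_usage
FROM `billing_export.gcp_billing_export_v1_XXXXX`
WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY service_name, sku_name
ORDER BY total_cost DESC
LIMIT 20;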
-- Cost by label (team)
SELECT
labels.value AS team,
SUM(cost) AS total_cost
FROM `billing_export.gcp_billing_export_v1_XXXXX`,
UNNEST(labels) AS labels
WHERE labels.key = 'team'
AND usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY team
ORDER BY total_cost DESC;
Labeling Strategy
| Label | Example Values | Purpose |
|---|---|---|
| environment | dev, staging, prod | Separate environment costs |
| team | platform, data, frontend | Team allocation |
| cost-center | engineering, marketing | Department billing |
| application | web-app, api, pipeline | App-level tracking |
# Add labels to an existing instance
gcloud compute instances add-labels web-server-01 \
--labels=environment=prod,team=platform,application=web-app \
--zone=us-central1-a
# Label at resource creation (preferred)
gcloud compute instances create web-server-02 \
--labels=environment=prod,team=platform \
--zone=us-central1-a \
--machine-type=e2-medium \
--image-family=ubuntu-2204-lts \
--image-project=ubuntu-os-cloud
Important: Labels applied after creation do not retroactively tag historical billing data.
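A labeling strategy only helps if it is enforced. The following is a minimal sketch of a pre-deployment check, assuming a hypothetical team policy (`REQUIRED_LABELS`) and a simplified ASCII-only version of the GCP label format rules (GCP also accepts international characters); the function name is illustrative, not part of any Google library.

```python
import re

# Hypothetical policy: label keys this team requires on every resource.
REQUIRED_LABELS = {"environment", "team", "application"}

# Simplified GCP label rules: keys start with a lowercase letter; keys and
# values use lowercase letters, digits, underscores, and hyphens (max 63 chars).
KEY_PATTERN = re.compile(r"[a-z][a-z0-9_-]{0,62}")
VALUE_PATTERN = re.compile(r"[a-z0-9_-]{0,63}")


def label_problems(labels: dict) -> list:
    """Return human-readable problems; an empty list means the labels pass."""
    problems = [
        f"missing required label: {key}"
        for key in sorted(REQUIRED_LABELS - set(labels))
    ]
    for key, value in labels.items():
        if not KEY_PATTERN.fullmatch(key):
            problems.append(f"invalid label key: {key!r}")
        if not VALUE_PATTERN.fullmatch(value):
            problems.append(f"invalid value for {key}: {value!r}")
    return problems


# The labels used for web-server-02 above lack the application label.
print(label_problems({"environment": "prod", "team": "platform"}))
```

A check like this could run in CI against deployment manifests so that unlabeled resources never reach billing in the first place.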
Committed Use Discounts (CUDs)
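| Type | Commitment | Flexibility | Discount |
|---|---|---|---|
| Resource-based CUD | Specific vCPU/memory in a region | Low (specific machine series) | Up to 57% (3-year) for most machine types |
| Spend-based CUD | Hourly spend on a specific service | Higher (not tied to a machine series) | Varies by service; up to 46% (3-year) for Compute flexible CUDs |
| Sustained use discount | Automatic, no commitment | Full, but only on eligible machine series (for example N1 and N2, not E2) | Up to 30% |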
# View resource-based CUD recommendations (generated per region, so pass a region as the location)
gcloud recommender recommendations list \
--project=learning-gcp-dev \
--location=us-central1 \
--recommender=google.compute.commitment.UsageCommitmentRecommender
Purchase CUDs for steady baseline workloads; use on-demand for variable spikes.
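Whether a commitment pays off depends on how much of it is actually used. Here is a rough sketch of that arithmetic, assuming a simplified model in which the commitment is billed at the discounted rate whether used or not, and usage above it is billed on demand. The function names and dollar figures are illustrative, and 57% is the table's maximum, not a quote for any particular machine type.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of the commitment that must be used for the CUD to break even."""
    return 1.0 - discount


def monthly_cost(usage: float, commitment: float, discount: float) -> float:
    """Monthly bill under the simplified model; amounts are at on-demand rates."""
    committed_cost = commitment * (1.0 - discount)  # paid even if unused
    overflow = max(0.0, usage - commitment)         # billed on demand
    return committed_cost + overflow


# Compare the bill with and without a $10,000/month commitment at 57% off.
for usage in (4_000, 10_000, 15_000):
    with_cud = monthly_cost(usage, commitment=10_000, discount=0.57)
    print(f"usage ${usage:,}: on-demand ${usage:,}, with CUD ${with_cud:,.0f}")

print(f"break-even utilization: {break_even_utilization(0.57):.0%}")
```

In this model a workload that uses only $4,000 of a $10,000 commitment pays about $4,300, more than on-demand, which is why commitments belong on the steady baseline rather than on spikes.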
Spot and Preemptible Savings
| Workload Type | On-Demand Cost | Spot Cost | Savings |
|---|---|---|---|
| Batch processing | $100/month | $9/month | 91% |
| CI/CD runners | $50/month | $8/month | 84% |
| Dev/test VMs | $30/month | $5/month | 83% |
gcloud compute instances create batch-job \
--provisioning-model=SPOT \
--instance-termination-action=DELETE \
--machine-type=n2-standard-4 \
--zone=us-central1-a \
--image-family=ubuntu-2204-lts \
--image-project=ubuntu-os-cloud
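The savings column in the table above is simply the price difference as a share of the on-demand cost. A short sketch that recomputes it from the table's own figures (the dictionary and function names are illustrative):

```python
# (on-demand, spot) monthly cost in USD, copied from the table above.
workloads = {
    "Batch processing": (100, 9),
    "CI/CD runners": (50, 8),
    "Dev/test VMs": (30, 5),
}


def savings_percent(on_demand: float, spot: float) -> int:
    """Savings as a whole-number percentage of the on-demand cost."""
    return round((on_demand - spot) / on_demand * 100)


for name, (on_demand, spot) in workloads.items():
    print(f"{name}: {savings_percent(on_demand, spot)}% saved")
```

Actual Spot prices vary by machine type and region and change over time, so the same calculation is worth rerunning against current prices.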
Budgets and Alerts
# Threshold percentages are fractions of the budget: 0.50 = 50%, 1.0 = 100%
gcloud billing budgets create \
--billing-account=BILLING_ACCOUNT_ID \
--display-name="Monthly Dev Budget" \
--budget-amount=500USD \
--threshold-rule=percent=0.50 \
--threshold-rule=percent=0.90 \
--threshold-rule=percent=1.0 \
--notifications-rule-pubsub-topic=projects/learning-gcp-dev/topics/billing-alerts
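Set budgets per project and per team; without a project or label filter, a budget covers the whole billing account. Budgets only notify and do not cap spend, so connect the Pub/Sub topic to an automated response (e.g., shut down dev VMs at 100%).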
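The automated response could start from something like the sketch below. It assumes the function receives the Pub/Sub event as a dict whose `data` field holds the base64-encoded JSON budget notification (with `costAmount` and `budgetAmount` fields). The handler name is hypothetical, and the actual VM shutdown is left as a placeholder rather than a real Compute Engine API call.

```python
import base64
import json


def parse_budget_notification(event: dict) -> dict:
    """Decode the base64 JSON payload of a Pub/Sub budget notification."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


def should_stop_dev_vms(notification: dict) -> bool:
    """True once actual cost has reached the budgeted amount."""
    return notification["costAmount"] >= notification["budgetAmount"]


def handle_budget_event(event: dict, context=None) -> str:
    """Hypothetical entry point: decide what to do with one notification."""
    notification = parse_budget_notification(event)
    if should_stop_dev_vms(notification):
        # Placeholder: a real function would stop the dev VMs here.
        return "stop"
    return "ok"


# Simulated notification for a budget that has been exceeded.
payload = {
    "budgetDisplayName": "Monthly Dev Budget",
    "costAmount": 512.40,
    "budgetAmount": 500.00,
    "currencyCode": "USD",
}
event = {"data": base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")}
print(handle_budget_event(event))
```

Budget notifications are published repeatedly, not only at the moment a threshold is crossed, so whatever action replaces the placeholder should be safe to run more than once.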
Cost Optimization Checklist
- Delete idle resources (VMs, disks, IPs, snapshots, unused load balancers)
- Right-size VMs using Recommender suggestions
- Use preemptible/spot VMs for fault-tolerant batch jobs
- Move infrequently accessed data to Nearline, Coldline, or Archive storage
- Enable autoscaling to match demand (MIGs, GKE HPA, Cloud Run)
- Shut down dev/test environments outside business hours
- Review billing reports monthly with engineering and finance
- Purchase CUDs for predictable baseline compute
- Use GKE Autopilot to avoid over-provisioned nodes
- Set Cloud Run min-instances=0 for dev services
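The storage item in the checklist is usually automated with a bucket lifecycle configuration rather than done by hand. The sketch below generates one, assuming a hypothetical tiering schedule (`TIERS`); the day thresholds are placeholders to adjust to real access patterns.

```python
import json

# Hypothetical schedule: object age in days -> storage class to move to.
TIERS = [(30, "NEARLINE"), (90, "COLDLINE"), (365, "ARCHIVE")]


def lifecycle_config(tiers: list) -> dict:
    """Build a Cloud Storage lifecycle configuration with one rule per tier."""
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": storage_class},
                "condition": {"age": age_days},
            }
            for age_days, storage_class in tiers
        ]
    }


print(json.dumps(lifecycle_config(TIERS), indent=2))
```

Saved as `lifecycle.json`, the output can be applied with `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME`, where `BUCKET_NAME` is a placeholder.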
FinOps Practices
| Phase | Activities | Participants |
|---|---|---|
| Inform | Dashboards, cost reports, showback | Finance + Engineering |
| Optimize | Right-sizing, CUDs, architecture changes | Engineering |
| Operate | Budgets, policies, accountability, chargeback | Finance + Leadership |
Real-World Scenario: Reducing a $50K Monthly Bill
A startup’s GCP bill grew to $50K/month. Analysis revealed:
| Finding | Monthly Cost | Action | Savings |
|---|---|---|---|
| 40 idle VMs (dev) | $12K | Auto-shutdown scheduler | $12K |
| Over-provisioned Cloud SQL | $8K | Right-size db-custom-8 → db-custom-4 | $4K |
| No storage lifecycle rules | $5K | Move to Nearline after 30 days | $3K |
| On-demand baseline compute | $15K | 1-year CUD for steady workloads | $6K |
| No autoscaling on GKE | $10K | Enable HPA + Cluster Autoscaler | $4K |
Total savings: ~$29K/month (58%) with no architecture changes — just governance and right-sizing.
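The totals follow directly from the table. A quick check of the arithmetic, using only the figures above:

```python
# (finding, monthly cost in USD, monthly savings in USD), copied from the table.
findings = [
    ("40 idle VMs (dev)", 12_000, 12_000),
    ("Over-provisioned Cloud SQL", 8_000, 4_000),
    ("No storage lifecycle rules", 5_000, 3_000),
    ("On-demand baseline compute", 15_000, 6_000),
    ("No autoscaling on GKE", 10_000, 4_000),
]

total_cost = sum(cost for _, cost, _ in findings)
total_savings = sum(saved for _, _, saved in findings)
savings_share = total_savings / total_cost

print(f"${total_cost:,} -> ${total_cost - total_savings:,} per month "
      f"({savings_share:.0%} saved)")
# prints: $50,000 -> $21,000 per month (58% saved)
```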
Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| No labels on resources | Cannot allocate costs | Label at creation time |
| Ignoring idle resources | Paying for unused VMs/disks | Weekly Recommender review |
| CUD for variable workloads | Paying for unused commitment | CUD only for steady baseline |
| No budgets | Surprise invoices | Budgets on every project |
| Dev environments running 24/7 | Paying roughly 3x the compute that business-hours use needs | Auto-shutdown outside hours |
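The 3x figure for always-on dev environments can be sanity-checked with a little arithmetic. This sketch assumes a schedule of 10 hours a day, 5 days a week; that schedule is an assumption, not something the table specifies.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours in a week


def always_on_ratio(hours_per_day: float, days_per_week: int) -> float:
    """How many times more hours a 24/7 VM runs than one on the given schedule."""
    return HOURS_PER_WEEK / (hours_per_day * days_per_week)


def shutdown_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of VM running time avoided by shutting down outside the schedule."""
    return 1 - (hours_per_day * days_per_week) / HOURS_PER_WEEK


print(f"{always_on_ratio(10, 5):.1f}x the needed hours")
print(f"{shutdown_savings(10, 5):.0%} of running time avoided")
```

This covers only time-billed compute. Persistent disks and reserved static IPs keep accruing charges while a VM is stopped.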
Best Practices
- Export billing to BigQuery from day one
- Review Recommender weekly for idle and overprovisioned resources
- Use showback (not chargeback initially) to build cost awareness
- Include cost estimates in architecture reviews and PR descriptions
- Automate dev environment shutdown with Cloud Scheduler + Cloud Functions
- Negotiate enterprise discounts when spend exceeds $100K/month
- Track unit economics (cost per user, per request) not just total spend
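Unit economics can reveal trends that total spend hides. A minimal sketch with made-up monthly figures (both the costs and the request counts are hypothetical):

```python
def cost_per_thousand_requests(total_cost: float, requests: int) -> float:
    """Cloud spend in USD per 1,000 requests served."""
    return total_cost * 1000 / requests


# Hypothetical months: (total cloud cost in USD, requests served).
months = {
    "month 1": (50_000, 10_000_000),
    "month 2": (60_000, 15_000_000),
}

for name, (cost, requests) in months.items():
    unit_cost = cost_per_thousand_requests(cost, requests)
    print(f"{name}: ${cost:,} total, ${unit_cost:.2f} per 1,000 requests")
```

In this example total spend rises by 20% while the cost per 1,000 requests falls from $5.00 to $4.00, a trend that a report of total spend alone would miss.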
Troubleshooting
Unexpected charges:
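-- Today's spend by service, from the BigQuery billing export (data can lag by several hours)
SELECT service.description AS service_name, SUM(cost) AS total_cost
FROM `billing_export.gcp_billing_export_v1_XXXXX`
WHERE DATE(usage_start_time) = CURRENT_DATE()
GROUP BY 1
ORDER BY 2 DESC;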
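CUD not applying: Verify that actual usage matches the commitment (machine series, region, and the project that holds it). Resource-based CUDs cover only the resources they name, so usage on a different machine series or in another region is billed at on-demand rates.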
Label not appearing in billing: Labels show up only on usage recorded after they were applied, so set them at resource creation. Retroactive labels do not affect historical data, and newly applied labels are not visible in billing data immediately.
Cloud cost optimization is a continuous practice shared between engineering and finance.
Next: CI/CD with Cloud Build — pipelines, triggers, and deployments.