A SaaS team we worked with last year watched their AWS bill go from $4,200 a month to $14,800 in six months. Revenue grew 40 percent in the same period. The math was ugly: cloud costs more than tripled while revenue grew by less than half. They weren’t doing anything wrong per se. They were shipping features, onboarding customers, and scaling infrastructure. But nobody owned the bill, nobody questioned instance sizes, and nobody noticed that three dev environments were running 24/7 with production-grade resources.
This is the default trajectory for cloud spend when cost optimization is treated as an afterthought. Spend grows faster than revenue, margins compress, and by the time someone flags it, the waste is structural, baked into architecture decisions made months ago.
If your product is growing and your cloud bill is growing faster, this guide covers the practical steps that actually move the number: visibility, compute right-sizing, storage lifecycle, autoscaling, data transfer, commitment discounts, and building cost into your engineering culture. If you’re already past the point of DIY fixes and want an outside team to audit your setup, book a free infrastructure review with our cloud engineering team.
Why Cloud Bills Spiral During Growth
The pattern is predictable. Early-stage teams pick generous instance sizes because “we might need the headroom.” Nobody sets up tagging because there are only two services. Autoscaling gets configured once and never revisited. Staging environments mirror production because it was easier to copy the Terraform module than write a separate one.
Then the product takes off. Traffic doubles. The team adds more services, more databases, more queues. Each new service inherits the same over-provisioned defaults. Within a year, 30 to 50 percent of the cloud bill is waste — resources running that nobody uses, storage nobody reads, data transfer routes that cross availability zones for no architectural reason.
According to the Flexera 2025 State of the Cloud Report, organizations estimate they waste 28 percent of their cloud spend on average. The actual number, Flexera notes, is usually higher because teams underestimate what they don’t measure.
The fix isn’t switching providers or renegotiating contracts. It starts with knowing what you’re spending and who’s responsible for each dollar.
Start With Visibility: Tag Everything, Measure Weekly
You cannot optimize what you cannot see. The single highest-leverage action for cloud cost optimization is implementing a tagging strategy and enforcing it before any other optimization work begins.
What to tag:
- Team or product area. Every resource gets an owner. Not “engineering” — a specific team. “Payments-backend” or “onboarding-frontend.”
- Environment. Production, staging, development, sandbox. This alone reveals waste: most teams discover dev and staging account for 20 to 35 percent of their bill.
- Cost center or project. If you’re building multiple products or features on shared infrastructure, this tells you which product line is profitable after infrastructure costs.
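To make the tagging rules concrete, here is a minimal sketch of a tag audit in Python. It assumes you have already exported your resource inventory into a list of dicts. The tag key names, the list of "too vague" owners, and the sample resource IDs are all illustrative choices, not a standard.

```python
REQUIRED_TAGS = ("team", "environment", "cost_center")  # assumed key names
ALLOWED_ENVIRONMENTS = {"production", "staging", "development", "sandbox"}
VAGUE_OWNERS = {"engineering", "dev", "ops"}  # too broad to count as an owner


def audit_tags(resource):
    """Return a list of human-readable problems with one resource's tags."""
    tags = {k.lower(): v.strip().lower() for k, v in resource.get("tags", {}).items()}
    problems = [f"missing tag: {key}" for key in REQUIRED_TAGS if not tags.get(key)]
    env = tags.get("environment")
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment: {env}")
    if tags.get("team") in VAGUE_OWNERS:
        problems.append(f"team is too vague: {tags['team']}")
    return problems


def untagged_report(resources):
    """Map resource id -> problems, keeping only resources that fail the audit."""
    report = {}
    for resource in resources:
        problems = audit_tags(resource)
        if problems:
            report[resource["id"]] = problems
    return report


inventory = [  # hypothetical inventory export
    {"id": "i-api-1", "tags": {"Team": "payments-backend", "Environment": "production", "Cost_Center": "checkout"}},
    {"id": "i-old-2", "tags": {"Team": "engineering", "Environment": "prod"}},
    {"id": "vol-3", "tags": {}},
]

for resource_id, problems in untagged_report(inventory).items():
    print(resource_id, "->", "; ".join(problems))
```

Running a report like this weekly gives each team a short list of resources to claim or delete, which is usually more actionable than a raw compliance percentage.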
Where to look:
- AWS: Cost Explorer with tag-based filtering, plus AWS Cost Anomaly Detection for automated alerts. Enable Cost and Usage Reports (CUR) for granular hourly data.
- GCP: Billing export to BigQuery lets you analyze costs with SQL, broken down by project, SKU, and label. Put labels on every resource and set budget alerts at 50, 80, and 100 percent thresholds.
How often: Weekly, not quarterly. Monthly is too slow to catch runaway spend. Set up Slack or email alerts for any cost category that jumps more than 15 percent week-over-week.
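The 15 percent week-over-week rule is simple enough to sketch directly. The example below assumes you already have weekly spend per cost category (for instance, grouped by team tag) as plain dicts; the category names and dollar amounts are made up for illustration.

```python
def week_over_week_alerts(last_week, this_week, threshold=0.15):
    """Return categories whose spend rose by more than `threshold` week over week.

    Both arguments map a cost category (for example a team tag) to dollars spent.
    """
    alerts = []
    for category, current in sorted(this_week.items()):
        previous = last_week.get(category, 0.0)
        if previous == 0:
            # Brand-new spend has no baseline, so flag it for a human to look at.
            if current > 0:
                alerts.append((category, previous, current, None))
            continue
        change = (current - previous) / previous
        if change > threshold:
            alerts.append((category, previous, current, change))
    return alerts


last_week = {"payments-backend": 820.0, "onboarding-frontend": 310.0}
this_week = {"payments-backend": 990.0, "onboarding-frontend": 325.0, "search": 140.0}

for category, previous, current, change in week_over_week_alerts(last_week, this_week):
    detail = "new spend" if change is None else f"+{change:.0%}"
    print(f"{category}: ${previous:,.0f} -> ${current:,.0f} ({detail})")
```

Treating brand-new categories as alerts is a design choice: new spend is not necessarily bad, but it is exactly the kind of change that should have an owner from the first week.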
A fintech startup we consulted for — call them Relay — had $11,000/month in AWS spend with no tags. After two days of tagging and one week of measurement, they found $3,400/month in resources tied to a feature that had been deprecated three months earlier. Nobody had turned off the infrastructure because nobody knew it existed.
If your infrastructure grew organically from a legacy system and tagging feels overwhelming, a legacy modernization engagement can include cost mapping as part of the migration assessment.
Compute Right-Sizing: Stop Paying for Idle CPUs
Compute is typically 60 to 70 percent of a cloud bill, and overprovisioning is the default. Teams pick an instance family, choose a “safe” size, and never revisit the decision.
The right-sizing process:
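- Pull utilization data for 14 days minimum. Use AWS CloudWatch metrics (CPUUtilization, plus memory from the CloudWatch Agent, which reports it as mem_used_percent on Linux because EC2 does not publish memory metrics by default) or GCP’s Recommender API. Seven days isn’t enough; one week of data can’t tell a recurring weekly pattern from a one-off spike.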
- Flag anything under 40 percent average CPU utilization. That’s your shortlist.
- Downsize by one step. If you’re running m6i.xlarge (4 vCPU, 16 GB), try m6i.large (2 vCPU, 8 GB). Monitor for two weeks. If P99 latency doesn’t degrade, the smaller size is correct.
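- Consider ARM-based instances. AWS Graviton3 (m7g family) can run 20 to 40 percent cheaper than equivalent x86 instances for many workloads once you count both the lower hourly price and the better per-vCPU performance. GCP’s Tau T2A offers similar economics. If your application runs in containers or on a managed runtime (Python, Node.js, Java), the migration is usually straightforward, provided your images and native dependencies are built for arm64.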
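The first three steps of the process can be sketched as a small filter over utilization data. This is an illustration under stated assumptions: the `SIZE_LADDER` list, the fleet dict layout, and the instance IDs are hypothetical, and in practice the daily averages would come from your monitoring system rather than literals.

```python
from statistics import mean

# Hypothetical size ladder within one instance family, smallest to largest.
SIZE_LADDER = ["large", "xlarge", "2xlarge", "4xlarge"]


def one_step_down(instance_type):
    """Return the next smaller size in the same family, or None if at the bottom."""
    family, size = instance_type.split(".")
    index = SIZE_LADDER.index(size)  # raises ValueError for sizes not in the ladder
    return f"{family}.{SIZE_LADDER[index - 1]}" if index > 0 else None


def rightsizing_shortlist(fleet, cpu_threshold=40.0, min_days=14):
    """Flag instances whose average CPU over at least `min_days` is below the threshold.

    `fleet` maps instance id -> {"type": str, "daily_cpu": [average percent per day]}.
    """
    shortlist = []
    for instance_id, info in sorted(fleet.items()):
        samples = info["daily_cpu"]
        if len(samples) < min_days:
            continue  # not enough history to judge
        average = mean(samples)
        if average < cpu_threshold:
            shortlist.append((instance_id, info["type"], round(average, 1), one_step_down(info["type"])))
    return shortlist


fleet = {  # hypothetical utilization export
    "api-1": {"type": "m6i.xlarge", "daily_cpu": [18, 22, 20, 19, 21, 12, 10] * 2},
    "worker-1": {"type": "m6i.2xlarge", "daily_cpu": [65] * 14},
    "new-1": {"type": "m6i.xlarge", "daily_cpu": [5] * 7},
}

for instance_id, current, avg_cpu, suggestion in rightsizing_shortlist(fleet):
    print(f"{instance_id}: {current} at {avg_cpu}% average CPU, try {suggestion}")
```

The sketch only looks at average CPU. Before acting on the shortlist, check memory and peak utilization too, since an instance can be CPU-idle and memory-bound at the same time.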
Real example: A B2B analytics platform — we’ll call them Metric Labs — was running 14 m5.2xlarge instances (8 vCPU, 32 GB each) for their API tier. Average CPU utilization: 18 percent. We moved them to 8 m7g.xlarge instances (4 vCPU, 16 GB, Graviton3). Monthly compute cost dropped from $6,720 to $2,480 — a 63 percent reduction with no performance impact. The AWS Well-Architected Framework cost optimization pillar recommends this exact pattern: measure, right-size, re-evaluate.
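A quick sanity check on those numbers, written out as arithmetic. It assumes, as a simplification, that one Graviton3 vCPU does roughly the same work as one of the old x86 vCPUs; that is an assumption of this sketch, not a measured result.

```python
old_fleet = {"count": 14, "vcpus": 8, "monthly_cost": 6720}
new_fleet = {"count": 8, "vcpus": 4, "monthly_cost": 2480}
old_avg_cpu = 0.18  # average utilization of the old fleet

# vCPUs actually doing work on the old fleet.
busy_vcpus = old_fleet["count"] * old_fleet["vcpus"] * old_avg_cpu
# Total vCPUs available after the move.
new_capacity = new_fleet["count"] * new_fleet["vcpus"]
# Expected average utilization if the same work lands on the new fleet.
projected_utilization = busy_vcpus / new_capacity
savings = 1 - new_fleet["monthly_cost"] / old_fleet["monthly_cost"]

print(f"busy vCPUs on old fleet: {busy_vcpus:.1f} of {old_fleet['count'] * old_fleet['vcpus']}")
print(f"projected utilization on new fleet: {projected_utilization:.0%}")
print(f"monthly savings: {savings:.0%}")
```

Under that assumption the new fleet lands at roughly 63 percent average utilization, which leaves headroom for peaks while no longer paying for more than 80 percent idle capacity.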
For teams building new products, choosing the right instance family from day one avoids this entirely. Our custom software development engagements include infrastructure architecture so you’re not paying to fix cost mistakes six months later.
Storage Lifecycle: Delete What Nobody Reads
Storage costs are insidious because they only grow. You add data every day and rarely delete anything. Without lifecycle policies, your S3 or GCS bill becomes a tax on every month you’ve been in business.
Where storage waste hides:
- EBS snapshots. Teams enable daily snapshots and never set a retention policy. After a year, you’re storing 365 snapshots of every volume. Keep 7 daily, 4 weekly, 12 monthly. Delete the rest.
- Log retention. CloudWatch Logs default to “never expire.” Application logs older than 30 days should move to S3 Infrequent Access. Logs older than 90 days should move to Glacier or be deleted entirely. Same applies to GCP Cloud Logging — export to Cloud Storage with lifecycle rules.
- S3 storage classes. Objects accessed less than once per month belong in S3 Infrequent Access ($0.0125/GB-month vs. $0.023/GB-month for Standard). Objects accessed less than once per quarter belong in Glacier Instant Retrieval ($0.004/GB-month). Both classes add per-GB retrieval fees and minimum storage durations, so they only pay off for data that is genuinely cold. Use S3 Intelligent-Tiering if you can’t predict access patterns; it automates class transitions for a small monitoring fee.
- Orphaned volumes. When you terminate an EC2 instance, its EBS volume doesn’t always go with it. Run a monthly audit: any unattached EBS volume older than 7 days is almost certainly safe to snapshot and delete.
- Container image registries. ECR and GCR (now Artifact Registry) accumulate hundreds of untagged images. Set lifecycle or cleanup policies to expire untagged images and keep only the last 10 tagged images per repository.
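The "7 daily, 4 weekly, 12 monthly" snapshot rule can be sketched as a selection function. One interpretation is assumed here: "weekly" means the newest snapshot in each of the last four ISO weeks, and "monthly" means the newest snapshot in each of the last twelve calendar months. The dates are generated for illustration only.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, daily=7, weekly=4, monthly=12):
    """Pick snapshot dates to retain: newest N days, plus newest per ISO week and per month."""
    newest_first = sorted(set(snapshot_dates), reverse=True)
    keep = set(newest_first[:daily])

    def keep_newest_per_bucket(bucket_of, limit):
        seen = []
        for day in newest_first:
            bucket = bucket_of(day)
            if bucket not in seen:
                seen.append(bucket)
                keep.add(day)  # first hit is the newest snapshot in that bucket
            if len(seen) == limit:
                break

    keep_newest_per_bucket(lambda d: tuple(d.isocalendar())[:2], weekly)  # (ISO year, ISO week)
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return keep


# One year of daily snapshots ending on a hypothetical "today".
today = date(2025, 6, 30)
all_snapshots = [today - timedelta(days=i) for i in range(365)]
keep = snapshots_to_keep(all_snapshots)
delete = sorted(set(all_snapshots) - keep)
print(f"keep {len(keep)} snapshots, delete {len(delete)}")
```

The function only decides what to keep. Wiring the delete list to an actual snapshot API is deliberately left out, because deletions are worth a dry run and a human review the first time.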
Dollar impact: A typical growth-stage product running for 18 months accumulates $500 to $2,000/month in storage waste. It’s never the biggest line item, but it compounds, and cleaning it up takes less than a day.
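Much of that cleanup can be expressed as lifecycle rules rather than manual deletes. The sketch below builds a rule in the dict shape that boto3’s S3 `put_bucket_lifecycle_configuration` call accepts, using the 30-day and 90-day thresholds from the log retention guidance. The `logs/` prefix, the rule ID, and the bucket name in the comment are hypothetical, and the function only builds the dict; it does not call AWS.

```python
def log_lifecycle_rules(prefix="logs/", ia_after_days=30, archive_after_days=90, expire_after_days=None):
    """Build an S3 lifecycle configuration dict for objects under `prefix`."""
    rule = {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        # Optional hard delete for data with no long-term retention requirement.
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


config = log_lifecycle_rules()
print(config)

# To apply it (requires boto3 and credentials; bucket name is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-log-bucket", LifecycleConfiguration=config
# )
```

Keeping the rule as code means it can live next to the Terraform or deployment scripts that create the bucket, so new buckets do not start life without a policy.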
Autoscaling: Tune the Thresholds, Not Just the Limits
Bad autoscaling costs you twice: it wastes money when it scales too aggressively, and it drops requests when it scales too slowly.
Common autoscaling mistakes:
- Scaling on CPU alone. CPU-based scaling works for compute-bound workloads but is terrible for I/O-bound services (most web applications). If your API is waiting on database queries, CPU stays at 15 percent while response times climb. Use request count per target or custom latency-based metrics instead.
- Min instances set too high. Teams set a minimum of 4 instances “just in case” when overnight traffic needs 1. That’s 3 instances running idle for 10 hours a day — roughly $150 to $400/month per service, depending on instance size.
- No scale-down cooldown. Without a cooldown period, autoscaling oscillates: scales up on a traffic spike, scales down when the spike passes, scales up again when the new request batch arrives. Set scale-down cooldown to 5 to 10 minutes. Scale-up cooldown can be shorter (1 to 2 minutes).
- No scheduled scaling. If your traffic pattern is predictable — high during US business hours, low overnight — use scheduled scaling policies instead of or in addition to reactive autoscaling. AWS lets you set this in Auto Scaling Groups. GCP supports it through managed instance group schedules.
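Two of these mistakes, an inflated minimum and the lack of scheduled scaling, are easy to put numbers on. The sketch below assumes an hourly instance price of $0.192 and a business-hours window of 13:00 to 23:00 UTC; both are illustrative values, so substitute your own.

```python
def idle_floor_cost(min_instances, needed_overnight, quiet_hours_per_day, hourly_price, days=30):
    """Monthly cost of instances kept running above what quiet-hours traffic needs."""
    idle_instances = max(min_instances - needed_overnight, 0)
    idle_instance_hours = idle_instances * quiet_hours_per_day * days
    return idle_instance_hours * hourly_price


def scheduled_min_capacity(hour_utc, business_hours=(13, 23), day_min=4, night_min=1):
    """Minimum fleet size for a given hour under a simple two-level schedule."""
    start, end = business_hours
    return day_min if start <= hour_utc < end else night_min


# A floor of 4 when overnight traffic needs 1, for 10 quiet hours a day.
monthly_waste = idle_floor_cost(min_instances=4, needed_overnight=1, quiet_hours_per_day=10, hourly_price=0.192)
print(f"idle floor costs about ${monthly_waste:,.0f}/month for this service")

for hour in (3, 15):
    print(f"{hour:02d}:00 UTC -> minimum {scheduled_min_capacity(hour)} instances")
```

The schedule function is only a model of the policy. In practice you would express the same two-level floor as scheduled actions on the Auto Scaling Group or managed instance group, and let reactive scaling handle anything above it.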
Target tracking vs. step scaling: Target tracking autoscaling (for example, maintain 60 percent average CPU or 1,000 requests per minute per instance) is simpler and works well for steady growth. Step scaling (add 2 instances when CPU > 70 percent, add 4 when > 85 percent) is better for spiky traffic. Most teams should start with target tracking and switch to step scaling only if they have sharp, unpredictable spikes.
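To see how the two approaches differ, here is a simplified model of each. The target-tracking function uses a plain proportional rule, which is an approximation for illustration and not the exact algorithm any cloud provider runs; the step function encodes the 70 and 85 percent thresholds from the example in the text.

```python
import math


def target_tracking_desired(current_instances, metric_value, target_value):
    """Proportional rule: size the fleet so the per-instance metric returns to target."""
    return max(1, math.ceil(current_instances * metric_value / target_value))


def step_scaling_desired(current_instances, cpu_percent):
    """Step rule: add 2 instances above 70 percent CPU, add 4 above 85 percent."""
    if cpu_percent > 85:
        return current_instances + 4
    if cpu_percent > 70:
        return current_instances + 2
    return current_instances


for cpu in (30, 60, 75, 90):
    tracked = target_tracking_desired(4, cpu, 60)
    stepped = step_scaling_desired(4, cpu)
    print(f"CPU {cpu}%: target tracking -> {tracked}, step scaling -> {stepped}")
```

The comparison shows the trade-off: the proportional rule scales in as well as out and follows load smoothly, while the step rule adds capacity in larger fixed jumps, which is what you want when a spike arrives faster than a gradual adjustment can follow.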
Data Transfer: The Hidden Margin Killer
Data transfer charges are the most misunderstood line item on a cloud bill. Compute and storage are intuitive. Data transfer is not.
What costs money:
- Cross-AZ traffic. In AWS, traffic between availability zones costs $0.01/GB in each direction ($0.02/GB round-trip). If your API servers in us-east-1a talk to your database in us-east-1b, every query and response incurs this charge. At scale, this adds thousands per month.
- Cross-region traffic. Roughly $0.02/GB between most AWS regions (the rate depends on the region pair); varies on GCP. If you’re replicating data to a DR region, you’re paying for every byte transferred. Make sure you’re only replicating what’s necessary.
- NAT Gateway charges. AWS NAT Gateways cost $0.045/hour plus $0.045/GB processed. A single NAT Gateway processing 1 TB/month costs about $33 for the gateway itself ($0.045 × 730 hours) plus $45 for the data, or roughly $78/month. If you have three services in private subnets, each with its own NAT Gateway, that’s about $99/month in hourly charges before a single byte is processed. Use gateway VPC endpoints for S3 and DynamoDB (they’re free and bypass the NAT Gateway entirely).
- Egress to the internet. The first 100 GB/month is free on AWS. After that, $0.09/GB. Serving 1 TB of data to users costs about $81/month in egress alone (900 billable GB × $0.09). This is where a CDN (CloudFront, Cloud CDN) pays for itself: CloudFront egress starts at $0.085/GB, and with caching, only a fraction of requests ever reach the origin.
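Because data transfer pricing is per-GB with different rules per path, a small calculator helps when estimating a design. The sketch below uses the per-GB and hourly rates quoted in this section and makes two assumptions explicit: a 730-hour month and a decimal terabyte (1,000 GB). Function and constant names are illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: average month length
GB_PER_TB = 1000       # assumption: decimal terabyte


def nat_gateway_monthly(gb_processed, hourly=0.045, per_gb=0.045):
    """Hourly charge for one gateway plus its data processing charge."""
    return HOURS_PER_MONTH * hourly + gb_processed * per_gb


def cross_az_monthly(gb_transferred, effective_per_gb=0.02):
    """Cross-AZ traffic, billed on both sides of the transfer."""
    return gb_transferred * effective_per_gb


def internet_egress_monthly(gb_out, per_gb=0.09, free_gb=100):
    """Internet egress after the monthly free allowance."""
    return max(gb_out - free_gb, 0) * per_gb


print(f"NAT Gateway, 1 TB processed: ${nat_gateway_monthly(1 * GB_PER_TB):,.2f}")
print(f"Cross-AZ, 5 TB transferred:  ${cross_az_monthly(5 * GB_PER_TB):,.2f}")
print(f"Internet egress, 1 TB:       ${internet_egress_monthly(1 * GB_PER_TB):,.2f}")
```

The same functions work in reverse as a sanity check on a bill: a $2,100 monthly cross-AZ line item at an effective $0.02/GB implies about 105 TB moving between zones, which is a strong hint that a chatty service pair is split across AZs.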
Tactical fixes:
- Put services that communicate heavily in the same AZ when possible.
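- Use gateway VPC endpoints for S3 and DynamoDB, which are free. Interface endpoints for other AWS services carry hourly and per-GB charges, so add them where they replace enough NAT Gateway traffic to pay for themselves.
- Compress API responses (gzip or brotli typically reduces text payloads such as JSON and HTML by 60 to 80 percent).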
- Cache aggressively at the edge. If your content doesn’t change per-request, it belongs behind a CDN.
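The compression claim is easy to check against your own payloads. This sketch uses Python’s standard gzip module on a made-up JSON response; the product records are hypothetical, and real savings depend on how repetitive your data is.

```python
import gzip
import json


def compression_savings(payload: bytes) -> float:
    """Fraction of bytes saved by gzip-compressing `payload` (0.0 to 1.0)."""
    if not payload:
        return 0.0
    compressed = gzip.compress(payload)
    return 1 - len(compressed) / len(payload)


# Hypothetical API response: a page of product records with repetitive keys.
products = [
    {"id": i, "name": f"Product {i}", "category": "outdoor", "in_stock": i % 3 != 0, "price_cents": 1999 + i}
    for i in range(200)
]
body = json.dumps(products).encode("utf-8")
print(f"{len(body)} bytes raw, {compression_savings(body):.0%} saved with gzip")
```

Already-compressed content such as images or video will show little or no saving, so measure per content type before assuming the same ratio across all traffic.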
One e-commerce client discovered $2,100/month in cross-AZ data transfer charges because their search service (us-east-1a) queried an Elasticsearch cluster (us-east-1c) for every product page load. Moving the search service to the same AZ cut that line item to $340.
Reserved Instances and Committed Use: Lock In What You Know
On-demand pricing is the most expensive way to run cloud infrastructure. If you know you’ll need certain resources for 12 months or more, commitment discounts save roughly 20 to 60 percent, depending on the term and how much you pay upfront.
AWS options:
- Savings Plans. Commit to a dollar-per-hour spend level for 1 or 3 years. Compute Savings Plans cover EC2, Fargate, and Lambda. Flexible across instance families, regions, and OS. A 1-year no-upfront Compute Savings Plan saves roughly 20 percent. A 3-year all-upfront plan saves up to 52 percent.
- Reserved Instances. More rigid — locked to a specific instance type and region. Slightly deeper discounts than Savings Plans for the same commitment. Use these for databases (RDS Reserved Instances save 30 to 40 percent) and stable baseline compute.
- Spot Instances. Up to 90 percent discount but can be interrupted with 2 minutes’ notice. Use for batch processing, CI/CD, data pipelines, and anything stateless and retry-safe. Not for production API servers.
GCP options:
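- Committed Use Discounts (CUDs). 1-year or 3-year commitments for Compute Engine. 1-year CUDs save 28 percent on memory-optimized and 20 percent on general-purpose. 3-year CUDs save up to 52 percent. Rates vary by machine family and by whether the commitment is resource-based or spend-based, so check current GCP pricing before modeling savings.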
- Sustained Use Discounts. Automatic on eligible machine families: if you run an instance for more than 25 percent of the month, GCP applies incremental discounts up to 30 percent. No commitment required.
- Preemptible/Spot VMs. Same concept as AWS Spot. Up to 80 percent discount, and instances can be terminated anytime. Legacy preemptible VMs also have a 24-hour maximum runtime; the newer Spot VMs do not.
When to commit: Only after you’ve right-sized. Buying a 1-year reservation on an overprovisioned instance locks in waste at a discount. Right-size first (2 to 4 weeks of observation), then commit to your new baseline.
How much to commit: Cover 70 to 80 percent of your steady-state baseline with commitments. Leave the remaining 20 to 30 percent on-demand to handle growth and variability. Re-evaluate quarterly.
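The coverage rule can be sketched as a small model. It assumes the steady-state baseline is the lowest hourly on-demand spend you observed after right-sizing, a 75 percent coverage level, and an illustrative 30 percent commitment discount; all three are parameters of this sketch, not recommendations for a specific account.

```python
def commitment_plan(hourly_on_demand, coverage=0.75, discount=0.30):
    """Estimate blended cost when part of the baseline moves under a commitment.

    `hourly_on_demand` is a list of right-sized on-demand dollars per hour.
    Assumes 0 <= coverage <= 1, so the committed portion never exceeds actual usage.
    """
    baseline = min(hourly_on_demand)            # conservative steady-state estimate
    covered = baseline * coverage               # on-demand dollars/hour moved under commitment
    commitment_rate = covered * (1 - discount)  # what that usage costs per hour once committed
    on_demand_total = sum(hourly_on_demand)
    # Each hour: pay the fixed commitment, plus on-demand for whatever exceeds the covered part.
    blended_total = sum(commitment_rate + (hour - covered) for hour in hourly_on_demand)
    return {
        "baseline": baseline,
        "hourly_commitment": commitment_rate,
        "on_demand_total": on_demand_total,
        "blended_total": blended_total,
        "savings": 1 - blended_total / on_demand_total,
    }


# Hypothetical day: $10/hour overnight, $16/hour during business hours.
plan = commitment_plan([10.0] * 12 + [16.0] * 12)
print(f"commit to ${plan['hourly_commitment']:.2f}/hour")
print(f"daily cost ${plan['on_demand_total']:.0f} on-demand vs ${plan['blended_total']:.0f} blended")
print(f"overall savings: {plan['savings']:.1%}")
```

Note that the overall saving is lower than the headline discount, because the discount only applies to the committed slice of the baseline. That gap is the price of keeping the rest flexible.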
Understanding your cost structure also helps when budgeting new projects. Our custom software development cost guide breaks down where money goes in a typical build, including infrastructure planning.
Make Cost a Delivery Metric
The most effective cloud cost optimization isn’t a tool or a one-time audit. It’s a cultural shift: treating infrastructure cost as an engineering quality metric, the same way you treat latency, uptime, or error rate.
How to implement this:
- Add cost to sprint reviews. Show cost-per-workload alongside velocity and bug counts. Not to punish teams — to create awareness. “Our API tier costs $0.003 per request. Last sprint, a caching change dropped it to $0.0018.” That’s the kind of insight that compounds.
- Set cost budgets per team. Not hard limits (those create perverse incentives to underscale). Soft budgets with alerts. “Your team’s baseline is $3,200/month. You’re at $4,100. What changed?” The conversation is the point.
- Run a monthly cost review. 30 minutes. Look at the top 10 cost line items, changes from last month, and any new resources that weren’t in the forecast. Assign owners to anything unexplained.
- Include cost in architecture reviews. Before approving a new service or database, ask: “What does this add to the monthly bill?” If nobody can answer, the design isn’t ready.
- Reward efficiency. When a team reduces their cost-per-transaction by 30 percent without degrading performance, that’s as valuable as a new feature. Recognize it.
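The two numbers that make these conversations concrete, cost per request and budget variance, take only a few lines to compute. The request volumes and the 10 percent tolerance below are assumptions chosen for illustration; the dollar figures mirror the examples in the list.

```python
def cost_per_request(monthly_cost, monthly_requests):
    """Unit cost of a workload in dollars per request."""
    return monthly_cost / monthly_requests


def budget_status(team, baseline, actual, tolerance=0.10):
    """Soft budget: return a prompt for discussion, or None if spend is within tolerance."""
    overage = (actual - baseline) / baseline
    if overage > tolerance:
        return f"{team}: ${actual:,.0f} vs ${baseline:,.0f} baseline (+{overage:.0%}). What changed?"
    return None


# Hypothetical API tier serving 3 million requests a month.
before = cost_per_request(9000, 3_000_000)
after = cost_per_request(5400, 3_000_000)
print(f"cost per request: ${before:.4f} -> ${after:.4f} ({1 - after / before:.0%} lower)")

message = budget_status("payments-backend", baseline=3200, actual=4100)
if message:
    print(message)
```

Returning a message instead of raising an error or blocking a deploy keeps the budget soft, which matches the intent: the output is a question for the team, not an enforcement action.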
Teams that adopted this approach with our guidance — typically as part of a cloud solutions engagement — reduced their cloud spend by 25 to 45 percent within 90 days. Not through heroic one-time efforts, but through consistent attention to a metric that was previously invisible.
What to Do This Week
You don’t need to overhaul your infrastructure to start saving money. Here’s a practical starting sequence:
- Day 1-2: Enable tagging. Tag every resource with team, environment, and cost center. Enforce tags via AWS Service Control Policies or GCP Organization Policies.
- Day 3-5: Pull two weeks of utilization data. Identify the top 5 most over-provisioned instances and the top 3 unnecessary resources (unattached volumes, forgotten dev environments, expired snapshots).
- Week 2: Right-size the top 5 instances. Set storage lifecycle policies. Review autoscaling thresholds.
- Week 3-4: Evaluate commitment discounts based on your new, right-sized baseline. Set up weekly cost alerts.
- Ongoing: Add cost-per-workload to your sprint dashboard. Hold a monthly cost review.
If your infrastructure grew alongside a legacy system and the architecture makes cost optimization difficult, a legacy modernization roadmap can help you untangle dependencies before optimizing spend.
For teams that want an experienced engineering partner to run this process — from audit through implementation — we do this regularly. Book a free 30-minute infrastructure review and bring your last three months of cloud bills. We’ll identify your top three savings opportunities and give you a realistic timeline to capture them.