Cost reviews produce spreadsheets. What organisations actually need is a sequence of changes, in priority order, with clear owner accountability and a measurement plan. Without that structure, recommendations sit in a document and the bill creeps back within six months.

I have run cost audits on AWS environments ranging from 20 instances to several thousand. The pattern is consistent: most of the addressable waste sits in three areas — oversized compute, underutilised Reserved Instance coverage, and unmanaged storage growth. Architectural change is the fourth lever: the savings are real but require more implementation effort. This article covers all four in the order I would tackle them.

Compute Rightsizing: The Work Behind the Recommendation

AWS Compute Optimizer is a starting point, not a strategy. It tells you what to downsize. It does not tell you whether the recommendation is safe for your specific workload, how to validate the change, or how to prevent over-provisioning from recurring.

The actual process:

Pull 30 days of CloudWatch utilisation data per instance — not per instance type and not per cluster average. You need per-instance P99 CPU and memory utilisation. Averages hide spiky workloads. A web server sitting at 5% CPU for 23 hours and hitting 85% during a nightly batch job looks like a rightsizing candidate on average data. It is not one.
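The averages-versus-percentiles point can be made concrete with a few lines. This is a minimal sketch on synthetic data — the web server example from the text, sampled once per minute — not a CloudWatch query:

```python
from statistics import quantiles

def p99(samples):
    # quantiles(n=100) returns 99 cut points; the last one approximates P99.
    return quantiles(samples, n=100)[-1]

# Hypothetical web server from the text: 5% CPU for 23 hours,
# 85% during a one-hour nightly batch, one sample per minute.
cpu = [5.0] * (23 * 60) + [85.0] * 60

avg = sum(cpu) / len(cpu)   # ~8.3% -- looks like a downsize candidate
peak = p99(cpu)             # 85.0 -- it is not one
```

Run the same calculation on real per-instance CloudWatch data and the spiky instances separate themselves from the genuinely idle ones immediately.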

Group instances by workload profile. Web tier (CPU-bound, short request cycles), batch processing (CPU or memory intensive, long-running), in-memory caches (memory-bound, sensitive to eviction), databases (IO-bound, latency-sensitive). Each category has different sizing logic and different risk tolerance for downsizing.

Headroom matters. The target for web tier is 40–60% average CPU utilisation with P99 below 80%. For batch workloads, you can target higher average utilisation. Never rightsize to 80%+ average on anything with unpredictable spikes — you will get paged when traffic increases 15% during a product launch.
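The headroom rules above can be expressed as a screening function. The profile names and cutoffs are the article's targets, not AWS guidance, and a positive result is an investigation trigger, never a green light to deploy:

```python
def downsize_candidate(avg_cpu, p99_cpu, profile="web"):
    """Screen an instance against the headroom targets above.
    Thresholds are the article's, applied as a rough filter."""
    if profile == "web":
        # Target 40-60% average with P99 below 80% after the change,
        # so only flag instances sitting well under that today.
        return avg_cpu < 40 and p99_cpu < 80
    if profile == "batch":
        # Batch tolerates higher sustained utilisation.
        return avg_cpu < 60 and p99_cpu < 90
    return False  # unknown profile: investigate manually

idle = downsize_candidate(8, 35, "web")    # genuinely idle: flag it
spiky = downsize_candidate(8, 85, "web")   # spiky: leave it alone
```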

Test in staging with a realistic load profile — not just health checks. For stateful services such as databases and caches, validate that memory eviction behaviour does not degrade under the new size.

Sequence changes within a family before switching families. Moving from r5.4xlarge to r5.2xlarge is low-risk and reversible within minutes. Moving from r5 to c6g — a different architecture with different memory/CPU ratio and Graviton2 — is a meaningful change that needs a planned cutover window. Do not combine both changes in one step.

The common mistake is treating Compute Optimizer recommendations as a deployment checklist rather than an investigation trigger. Each recommendation is a hypothesis. Validate it before acting.

Reserved Instances vs Savings Plans: When Each Applies

This is not just a pricing question. It is a flexibility-versus-certainty trade-off that depends on where your workload is in its lifecycle.

Reserved Instances commit to a specific instance family in a specific region. Standard RIs lock that family for the full term; Convertible RIs allow family changes through an exchange but carry lower discount ceilings. Discounts reach roughly 72% for 3-year Standard RIs. The trade-off is specificity — if the workload changes, the reservation becomes a cost without a corresponding benefit.

Savings Plans commit to a $/hour compute spend, applicable across EC2 instance types, regions, Lambda, and Fargate. Compute Savings Plans cover the broadest scope at approximately 66% maximum discount. EC2 Instance Savings Plans offer up to 72% but are locked to a specific instance family in a region — more restrictive than a Compute Savings Plan, but with automatic size and OS flexibility within the family at the Standard RI discount level.
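The trade-off is easiest to see in dollars. A rough sketch using the discount ceilings quoted above — actual rates vary by family, region, term, and payment option, so treat these as illustrative:

```python
# Illustrative maximum discounts from the text (3-year commitments).
# Real rates depend on family, region, and payment option.
DISCOUNT = {
    "on_demand": 0.00,
    "compute_savings_plan": 0.66,
    "ec2_instance_savings_plan": 0.72,
    "standard_ri_3yr": 0.72,
}

def effective_monthly(on_demand_monthly, option):
    return on_demand_monthly * (1 - DISCOUNT[option])

# A $10,000/month On-Demand workload:
compute_sp = effective_monthly(10_000, "compute_savings_plan")   # ~$3,400
standard_ri = effective_monthly(10_000, "standard_ri_3yr")       # ~$2,800
```

The $600/month gap between the two is the price of flexibility. If there is any chance the workload moves to Fargate, Lambda, or a different family within three years, that premium is cheap insurance.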

The rule I apply: Savings Plans as the default; Standard RIs only for workloads that have been stable in production for 12+ consecutive months with no planned architecture changes in the next 36 months. The practical candidates for Standard RIs are production databases (RDS Multi-AZ, ElastiCache), core infrastructure services, and workloads explicitly frozen from architectural change.

The mistake I see most often: buying 3-year Standard RIs on application servers while a containerisation or microservices project is in progress. I have reviewed environments with $200K+ in orphaned EC2 reservations from applications that moved to ECS Fargate 18 months into a 3-year term. Savings Plans would have covered both the old and new deployment model.

Coverage target: 70–80% of baseline compute spend under RI or Savings Plan. Reserve the remaining 20–30% as On-Demand to absorb growth and temporary spikes without over-committing.
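One way to pick the commitment level is to anchor on the floor of your hourly spend rather than the average. A toy model of that approach — the numbers are invented, and real sizing should use Cost Explorer's hourly granularity over at least 30 days:

```python
def commitment_for(hourly_spend, coverage=0.75):
    # Commit at 75% of the lowest observed hour: spend below the
    # commitment is discounted, everything above stays On-Demand.
    return min(hourly_spend) * coverage

def covered_fraction(hourly_spend, commit):
    covered = sum(min(commit, h) for h in hourly_spend)
    return covered / sum(hourly_spend)

# Hypothetical day: steady $100/hour baseline with a 4-hour $140 spike.
day = [100.0] * 20 + [140.0] * 4
commit = commitment_for(day)                    # $75/hour commitment
frac = covered_fraction(day, commit)            # ~70% of spend covered
```

Anchoring on the minimum hour is deliberately conservative: the commitment is always fully consumed, so you never pay for unused reservation, and growth simply shrinks the On-Demand remainder until the next purchase cycle.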

Storage Tiering: Where the Bill Silently Grows

S3 is the most underoptimised cost line in most environments because it is invisible. It grows continuously, nobody reviews it, and the cost appears as a single line item on the monthly bill without per-bucket breakdown unless you have configured Cost Allocation Tags.

S3 Intelligent-Tiering is frequently applied indiscriminately. The monitoring fee is $0.0025 per 1,000 objects per month, and objects smaller than 128 KB are not eligible for automatic tiering — there are no savings on small objects for that fee to pay for, so Intelligent-Tiering is not cost-effective on small-object buckets. Apply it selectively: buckets containing objects consistently larger than 128 KB with mixed or unpredictable access patterns. Backup sets, large log archives, generated reports, media files. Configure the Archive and Deep Archive access tiers if your retrieval latency SLA permits.
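The break-even arithmetic is worth doing per bucket. A sketch using current US East list prices from the text — verify rates for your region before acting on the result:

```python
# US East list prices quoted in the text ($/month); check your region.
MONITORING_PER_OBJECT = 0.0025 / 1000      # Intelligent-Tiering fee per object
STANDARD = 0.023                           # S3 Standard, $/GB
INFREQUENT_ACCESS = 0.0125                 # IA tier, $/GB

saving_per_gb = STANDARD - INFREQUENT_ACCESS          # $0.0105/GB/month

# Object size at which the IA saving exactly covers the monitoring fee.
break_even_gb = MONITORING_PER_OBJECT / saving_per_gb
break_even_kb = break_even_gb * 1_000_000             # ~238 KB (decimal KB)
```

Note the break-even sits above the 128 KB eligibility floor: even for objects that do get tiered, the fee only pays off once the per-object IA saving clears it, which is why average object size is the first thing to check on any Intelligent-Tiering candidate.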

For predictable cold storage — objects you know will not be accessed for 90+ days — S3 Glacier Instant Retrieval at $0.004/GB/month is cheaper than Intelligent-Tiering's Frequent Access tier and retrieval is measured in milliseconds. For compliance archives where a 12-hour retrieval window is acceptable, Glacier Deep Archive at $0.00099/GB/month is the cheapest durable storage AWS offers.

EBS gp2 → gp3 is the most consistently cost-positive change available without architectural work. gp3 provides a 3,000 IOPS baseline at no additional cost. gp2 provides 3 IOPS/GB — a 100 GB gp2 volume delivers 300 IOPS baseline. gp3 is 20% cheaper than gp2 at equivalent storage size. If your workload does not require more than 3,000 IOPS — which covers the majority of application volumes — gp2→gp3 is a cost reduction with a simultaneous performance improvement. Script the migration via AWS CLI and run it in your next maintenance window.
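A quick sketch of the savings arithmetic and the migration call. Prices are the us-east-1 list rates implied by the 20% figure above, and the fleet size is invented for illustration:

```python
# us-east-1 list prices at time of writing ($/GB-month); verify per region.
GP2_PER_GB = 0.10
GP3_PER_GB = 0.08

def migration_saving(size_gb, volume_count):
    return (GP2_PER_GB - GP3_PER_GB) * size_gb * volume_count

# Hypothetical fleet: 50 x 100 GB application volumes -> $100/month back,
# plus a 300 -> 3,000 baseline IOPS improvement on each volume.
monthly = migration_saving(100, 50)

def modify_cmd(volume_id):
    # The migration is one call per volume; it runs in place with the
    # volume still attached, no downtime required.
    return f"aws ec2 modify-volume --volume-id {volume_id} --volume-type gp3"

cmd = modify_cmd("vol-0123456789abcdef0")
```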

Lifecycle policies are non-negotiable for any bucket used for logs, backups, or generated reports. Transition to IA after 30 days of no access, Glacier after 90, delete after 365 — or whatever your compliance requirements allow. Without lifecycle policies, these buckets accumulate indefinitely.
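The 30/90/365 policy above translates directly into the structure boto3's `put_bucket_lifecycle_configuration` expects. A sketch — the bucket name and prefix are placeholders, and the day counts should follow your retention requirements:

```python
# The 30/90/365 policy from the text, in S3 lifecycle API shape.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "logs-tiering",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},   # placeholder prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

# Applied with boto3 (requires credentials and a real bucket):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-log-bucket", LifecycleConfiguration=LIFECYCLE)
```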

Architectural Patterns That Eliminate Cost at the Source

These changes take longer to implement but generate structural, sustained savings rather than one-time reductions.

SQS buffering for write-heavy workloads. If you are provisioning compute for peak concurrent write load, you are paying for the spike all the time. Decoupling write operations through an SQS queue and processing asynchronously lets you right-size to average throughput rather than peak. A workload that needs a r5.4xlarge to handle 500 concurrent synchronous writes can often run on a c6g.2xlarge with SQS-backed async processing, at 40–50% lower instance cost. This pattern also improves resilience — SQS absorbs burst traffic that would otherwise drop requests. For more on how this fits into a broader cloud architecture, see our Cloud Architecture & Zero Trust service page.
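The sizing logic behind the pattern can be shown with a toy simulation — a plain in-memory queue standing in for SQS, with invented traffic numbers. The point is that capacity sized between average and peak still clears the day:

```python
from collections import deque

def run_day(arrivals_per_hour, capacity_per_hour):
    """Toy model: a queue absorbs bursts so workers sized well below
    peak still drain everything by end of day."""
    queue, processed = deque(), 0
    for arriving in arrivals_per_hour:
        queue.extend([None] * arriving)            # burst lands in the queue
        for _ in range(min(capacity_per_hour, len(queue))):
            queue.popleft()                        # workers drain at a steady rate
            processed += 1
    return processed, len(queue)

# Hypothetical day: 100 writes/hour baseline, a 3-hour burst at 500.
# Average is 150/hour; peak-sized compute would need 500/hour.
day = [100] * 10 + [500] * 3 + [100] * 11
done, backlog = run_day(day, capacity_per_hour=200)
```

Here capacity of 200/hour — 40% of peak — processes every write, with the burst riding out as temporary queue depth instead of dropped requests or idle headroom.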

Spot instances for async workloads. Any workload that can be retried belongs on Spot: batch ETL jobs, ML training, video processing, report generation, CI/CD pipeline runners, dev/test environments. Spot pricing is 60–90% below On-Demand depending on instance type and region. Managing interruption requires three things: a Spot Fleet policy across 3+ instance types and diversified Availability Zones, checkpoint-based state persistence writing to S3 at regular intervals, and graceful handling of the 2-minute EC2 interruption notice via instance metadata or CloudWatch Events. For jobs that checkpoint every 5 minutes, worst-case rework on interruption is one checkpoint window.
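The checkpoint-and-resume mechanics look like this in miniature. A local temp file stands in for the S3 checkpoint object, and the interruption is simulated rather than read from instance metadata — everything here is illustrative scaffolding, not the EC2 API:

```python
import json
import os
import tempfile

CHECKPOINT = os.path.join(tempfile.mkdtemp(), "job.ckpt")  # stand-in for S3

def save_checkpoint(index):
    with open(CHECKPOINT, "w") as f:
        json.dump({"next": index}, f)

def load_checkpoint():
    if not os.path.exists(CHECKPOINT):
        return 0
    with open(CHECKPOINT) as f:
        return json.load(f)["next"]

def process(items, interrupt_at=None):
    """Process items, checkpointing every 5. interrupt_at simulates
    the 2-minute Spot notice arriving mid-run."""
    done = []
    for i in range(load_checkpoint(), len(items)):
        if interrupt_at is not None and i == interrupt_at:
            return done                    # instance reclaimed
        done.append(items[i] * 2)          # stand-in for real work
        if (i + 1) % 5 == 0:
            save_checkpoint(i + 1)
    save_checkpoint(len(items))
    return done

items = list(range(12))
first = process(items, interrupt_at=8)     # interrupted after item 7
resumed = process(items)                   # restarts from checkpoint 5
```

The resumed run reprocesses only items 5–7 — one checkpoint window of rework, exactly the worst-case bound described above.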

Lambda for low-frequency workloads. Any workload running fewer than approximately 20 hours per month is cheaper on Lambda than on the smallest always-on EC2 instance. A t3.nano at $0.0052/hour costs roughly $3.74 for a month of continuous runtime. Lambda's 400,000 GB-second free tier covers substantial invocation volume before billing begins. Scheduled jobs, webhook handlers, health check endpoints, lightweight ETL triggers, cron-based automation — these do not need dedicated compute.
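The comparison in numbers, assuming the standard always-free tier applies to the account (it does not under some organisation billing setups) and us-east-1 list prices:

```python
# us-east-1 list prices at time of writing; verify for your region.
GB_SECOND = 0.0000166667          # Lambda duration price
PER_REQUEST = 0.0000002           # Lambda request price
FREE_GB_SECONDS = 400_000         # monthly always-free duration allowance
FREE_REQUESTS = 1_000_000         # monthly always-free request allowance

T3_NANO_MONTH = 0.0052 * 720      # ~$3.74 for a month of continuous runtime

def lambda_monthly(memory_gb, runtime_hours, requests):
    gb_seconds = memory_gb * runtime_hours * 3600
    duration = max(0, gb_seconds - FREE_GB_SECONDS) * GB_SECOND
    request = max(0, requests - FREE_REQUESTS) * PER_REQUEST
    return duration + request

# 20 hours/month at 1 GB across 50k invocations: 72,000 GB-seconds,
# comfortably inside the free tier -> $0.
cost = lambda_monthly(1.0, 20, 50_000)
```

Even without the free tier, 72,000 GB-seconds bills at about $1.20 — still a third of the t3.nano, with no patching, no idle capacity, and no instance to forget about.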

What Not to Cut

Cost reduction exercises regularly surface recommendations to turn off monitoring, reduce redundancy, or disable security controls. These should be refused outright.

GuardDuty, CloudTrail, Security Hub. An undetected IAM compromise running cryptomining can generate more compute spend in a weekend than a year of monitoring costs. GuardDuty at scale costs $2–5/account/month. Disabling it to save $30/month on a 500-instance environment is a risk trade-off that no informed engineer would accept. If monitoring costs are significant, the problem is configuration scope — reduce log retention period or move to cost-effective storage tiers, not disable the controls.

Multi-AZ on production databases. The cost is 2× single-AZ. The protection is the difference between a 30-second automated failover and a 45-minute manual recovery with potential data loss. Single-AZ databases in production are not cost-optimised — they are unbudgeted risk.

KMS and TLS. KMS costs $1/key/month plus $0.03 per 10,000 API calls. The exposure from unencrypted data at rest — compliance incidents, breach notification requirements, remediation costs — is not comparable. These are fixed operating costs of secure infrastructure, not discretionary line items.

Observability. CloudWatch alarms, metric streams, and dashboards cost money. They also tell you when something is wrong before customers do. Cost-optimised infrastructure without observability is expensive infrastructure waiting for an undetected failure.

If you are unsure where your environment stands across these dimensions, the SCAI Cloud Health Check delivers a detailed eight-pillar audit — covering FinOps, security posture, IAM hygiene, and more — with a prioritised report you can act on the same week. If you want to talk through a specific cost challenge first, the contact page is the fastest route to a direct conversation.


FAQ

How quickly can I see savings after implementing these changes?

EBS gp2→gp3 migration and S3 lifecycle policy changes take effect within hours of deployment. Compute rightsizing savings appear in the next billing cycle. Reserved Instance and Savings Plan discounts begin immediately on purchase. In AWS environments I audit — typically 50–500 instances without a dedicated FinOps function — the median reduction is 25–40% of the monthly bill within 90 days, with the first measurable improvement visible in 30 days.

Is it safe to run production workloads on Spot instances?

Spot is appropriate for workloads that tolerate interruption: batch processing, async pipelines, CI/CD runners, ML training, and dev/test environments. Core production API servers and databases should remain on On-Demand or Reserved. The risk management mechanism is checkpoint-based state persistence — write job state to S3 every 5 minutes — combined with Spot Fleet policies across 3+ instance types and multiple Availability Zones. AWS provides a 2-minute interruption notice; with checkpointing, worst-case rework is one checkpoint window.

How do I know if S3 Intelligent-Tiering is worth it for my bucket?

Intelligent-Tiering adds a monitoring fee of $0.0025 per 1,000 objects per month, and objects smaller than 128 KB are not eligible for automatic tiering — skip it for small-object buckets. Apply it to buckets containing objects larger than 128 KB with mixed or unpredictable access patterns: backup archives, generated reports, large log files, media. For objects with predictable cold access after 90 days, S3 Glacier Instant Retrieval delivers better economics with no monitoring overhead.

When should I choose Reserved Instances over a Savings Plan?

Use Reserved Instances when a workload has been stable in production for 12+ months with no planned architectural changes in the next 36 months — core databases, infrastructure tooling, consistently-sized API servers. Use Savings Plans as the default for everything else: they apply across instance families, sizes, regions, and even Lambda and Fargate. The mistake I see most often is buying 3-year RIs on workloads that get containerised or migrated 18 months in, leaving orphaned reservations still billing.

What should never be cut during a cost reduction exercise?

CloudTrail, GuardDuty, and Security Hub. An undetected IAM compromise running cryptomining can generate more charges in a weekend than a year of monitoring costs. Multi-AZ on production databases — the cost is 2× single-AZ; the alternative is a 45-minute recovery with potential data loss. KMS encryption ($1/key/month) and TLS in transit. Monitoring and alerting. These are not cost line items — they are the controls that prevent a $30/month saving from becoming a $30,000 incident.