How We Cut AWS Costs by 85%: From $487 to $73/Month
This isn't another "we saved money on AWS" fluff piece. This is the exact playbook we used to cut a healthcare platform's AWS bill from $487 to $73/month - including the mistakes we made and the code we wrote.
📊 The Starting Point: A Typical Over-Provisioned Setup
❌ Before: $487/month
- 2x m5.large EC2 instances ($146.40)
- Application Load Balancer ($22.50)
- RDS MySQL t3.medium ($89.28)
- 200 GB EBS storage ($20.00)
- NAT Gateway ($45.00)
- Data transfer (~$45.00)
- CloudWatch, backups, etc. ($118.82)
✅ After: $73/month
- Lambda functions ($42.00)
- API Gateway (included)
- DynamoDB on-demand ($25.00)
- S3 storage ($2.30)
- CloudFront CDN ($1.20)
- CloudWatch Logs ($0.50)
- Minimal data transfer ($2.00)
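You can check the headline numbers straight from the two lists. This sketch re-adds the line items (the property names are ours) and derives the savings percentage and the annual figure quoted in the FAQ.

```javascript
// Monthly line items copied from the before/after lists above (USD).
const before = {
  ec2: 146.40, alb: 22.50, rds: 89.28, ebs: 20.00,
  natGateway: 45.00, dataTransfer: 45.00, cloudwatchBackupsEtc: 118.82,
};
const after = {
  lambda: 42.00, dynamodb: 25.00, s3: 2.30, cloudfront: 1.20,
  cloudwatchLogs: 0.50, dataTransfer: 2.00, // API Gateway has no separate line item
};

// Sum in whole cents so floating-point error can't creep into the totals.
function monthlyTotal(items) {
  const cents = Object.values(items).reduce((sum, usd) => sum + Math.round(usd * 100), 0);
  return cents / 100;
}

const beforeTotal = monthlyTotal(before);
const afterTotal = monthlyTotal(after);
const savingsPct = ((beforeTotal - afterTotal) / beforeTotal) * 100;
const annualSavings = (beforeTotal - afterTotal) * 12;

console.log(`$${beforeTotal} -> $${afterTotal}: ${savingsPct.toFixed(1)}% saved, $${annualSavings}/year`);
```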
🎯 Step 1: The Discovery Phase (Week 1)
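We started by enabling CloudWatch detailed monitoring and actually looking at the data. Memory figures like the one below also require the CloudWatch agent, because EC2 doesn't report memory by default. The results were shocking: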
What We Found:
- 📊 Average CPU usage: 8% (paying for 92% idle time)
- 💾 Memory usage: 2.1 GB of 8 GB available
- 🌐 Traffic pattern: 80% of requests between 9 AM - 5 PM
- 💤 Night usage: < 100 requests/hour (still paying full price)
- 🗄️ Database: 43 GB used of 100 GB allocated
- ⚡ Response time: 847ms average (mostly database queries)
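A figure like "80% of requests between 9 AM - 5 PM" comes from bucketing request counts by hour. The sketch below assumes you have already exported 24 hourly totals, for example by summing the load balancer's RequestCount metric per hour. The function name and input shape are hypothetical.

```javascript
// Share of a day's requests that fall inside a business-hours window.
// hourlyCounts[h] = requests during hour h (0-23, local time); the export step is assumed.
function businessHoursShare(hourlyCounts, startHour = 9, endHour = 17) {
  if (hourlyCounts.length !== 24) throw new Error('expected 24 hourly counts');
  let inWindow = 0;
  let total = 0;
  hourlyCounts.forEach((count, hour) => {
    total += count;
    // 9 AM - 5 PM means hours 9 through 16 (the 16:00-17:00 bucket is the last one).
    if (hour >= startHour && hour < endHour) inWindow += count;
  });
  return total === 0 ? 0 : inWindow / total;
}
```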
🚨 Mistake #1: Not Checking Actual Usage First
The client had been running oversized EC2 instances for 2 years based on "expected growth" that never materialized. They were literally paying $350/month for unused capacity.
📐 Step 2: Architecture Redesign (Week 2)
Day 1-2: Mapped All Dependencies
Created a complete diagram of every service, API endpoint, database table, and external integration. Found 23 endpoints, but only 5 were used daily.
Day 3-4: Designed Serverless Architecture
Planned migration to Lambda + API Gateway + DynamoDB. Identified which endpoints could be migrated first (stateless ones).
Day 5-7: Proof of Concept
Migrated the healthcheck endpoint to Lambda. It worked perfectly. The Lambda request charge came to $0.0000002 per request, before duration charges, compared with $0.00014 per request on EC2.
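The $0.0000002 figure covers only Lambda's request charge ($0.20 per million requests). Every invocation also pays for duration, measured in GB-seconds. Here is a minimal cost model. It assumes us-east-1 x86 list prices at the time of writing ($0.0000166667 per GB-second) and a hypothetical 128 MB function that runs for 100 ms.

```javascript
// Assumed us-east-1 x86 list prices (first pricing tier); check current AWS pricing.
const PRICE_PER_REQUEST = 0.20 / 1e6;      // $0.20 per 1M requests
const PRICE_PER_GB_SECOND = 0.0000166667;  // duration price per GB-second

// All-in cost of one invocation: request charge + duration charge.
function lambdaCostPerInvocation(memoryMb, durationMs) {
  const gbSeconds = (memoryMb / 1024) * (durationMs / 1000);
  return PRICE_PER_REQUEST + gbSeconds * PRICE_PER_GB_SECOND;
}

// Hypothetical healthcheck: 128 MB of memory, 100 ms per run.
console.log(lambdaCostPerInvocation(128, 100).toExponential(2));
```

For that hypothetical function, the duration charge roughly doubles the per-request cost to about $0.0000004. That is still far below the $0.00014 EC2 figure.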
💻 Step 3: The Actual Migration (Weeks 3-4)
Phase 1: API Migration to Lambda
❌ Before: Express.js on EC2
// app.js - Running 24/7 on EC2
const express = require('express');
const mysql = require('mysql2');

const app = express();
const db = mysql.createConnection({
  host: 'rds-instance.aws.com',
  user: 'admin',
  password: process.env.DB_PASS,
  database: 'healthapp'
});

app.get('/api/patients/:id', (req, res) => {
  db.query(
    'SELECT * FROM patients WHERE id = ?',
    [req.params.id],
    (err, results) => {
      // Don't echo raw database errors back to the client.
      if (err) return res.status(500).json({ error: 'Internal error' });
      if (results.length === 0) return res.status(404).json({ error: 'Not found' });
      res.json(results[0]);
    }
  );
});

app.listen(3000); // Running forever
✅ After: Lambda Function
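// getPatient.js - Only runs when needed
// Uses AWS SDK for JavaScript v3, which the Node.js 18+ Lambda runtimes bundle
// (the v2 'aws-sdk' package is not included in those runtimes).
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, GetCommand } = require('@aws-sdk/lib-dynamodb');

// Created outside the handler so warm invocations reuse the client and its connections.
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const headers = { 'Access-Control-Allow-Origin': '*' };

exports.handler = async (event) => {
  const { id } = event.pathParameters;
  try {
    // Keys follow the single-table schema (PK/SK) described in Phase 2.
    const result = await ddb.send(new GetCommand({
      TableName: 'healthcare-app',
      Key: { PK: `PATIENT#${id}`, SK: 'PROFILE' }
    }));
    if (!result.Item) {
      return { statusCode: 404, headers, body: JSON.stringify({ error: 'Not found' }) };
    }
    return { statusCode: 200, headers, body: JSON.stringify(result.Item) };
  } catch (error) {
    console.error(error); // details go to CloudWatch Logs, not to the caller
    return { statusCode: 500, headers, body: JSON.stringify({ error: 'Internal error' }) };
  }
};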
Phase 2: Database Migration (The Tricky Part)
Moving from RDS MySQL to DynamoDB was the scariest part. Here's exactly how we did it:
Step 1: Analyzed Query Patterns
-- Found that 94% of queries were simple key-value lookups
SELECT * FROM patients WHERE patient_id = ?;     -- 67%
SELECT * FROM appointments WHERE date = ?;       -- 18%
SELECT * FROM medications WHERE patient_id = ?;  -- 9%
-- Complex JOINs                                 -- 6%
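One way to produce a breakdown like this is to fingerprint logged queries: replace literals with ?, then count. This simplified sketch assumes you've already pulled raw SQL strings out of a log, such as MySQL's general query log. The regexes ignore edge cases such as doubled-quote escapes.

```javascript
// Collapse literals so queries that differ only in their parameters share a fingerprint.
function fingerprint(sql) {
  return sql
    .replace(/'(?:[^'\\]|\\.)*'/g, '?')   // single-quoted string literals
    .replace(/\b\d+(?:\.\d+)?\b/g, '?')   // numeric literals
    .replace(/\s+/g, ' ')                 // normalize whitespace
    .trim()
    .toLowerCase();
}

// Share of all queries per fingerprint, most frequent first.
function queryMix(queries) {
  const counts = new Map();
  for (const q of queries) {
    const fp = fingerprint(q);
    counts.set(fp, (counts.get(fp) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([fp, n]) => ({ fingerprint: fp, share: n / queries.length }))
    .sort((a, b) => b.share - a.share);
}
```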
Step 2: Designed DynamoDB Schema
{
  "TableName": "healthcare-app",
  "PartitionKey": "PK",        // PATIENT#12345
  "SortKey": "SK",             // PROFILE or APPT#2024-01-15
  "GSI1": {
    "PartitionKey": "GSI1PK",  // DATE#2024-01-15
    "SortKey": "GSI1SK"        // APPT#12345
  }
}
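In code, the composite keys are just string prefixes. The sketch below uses hypothetical helpers for the layout above. They produce items and Query inputs in the shape the DynamoDB DocumentClient accepts. The index name GSI1 and the helper names are assumptions.

```javascript
// Hypothetical helpers; only the key prefixes (PATIENT#, PROFILE, APPT#, DATE#) come from the schema.
function patientProfileKey(patientId) {
  return { PK: `PATIENT#${patientId}`, SK: 'PROFILE' };
}

function appointmentItem(patientId, date, attrs = {}) {
  return {
    ...attrs,                      // spread first so callers can't overwrite the keys
    PK: `PATIENT#${patientId}`,
    SK: `APPT#${date}`,            // ISO dates sort lexicographically in date order
    GSI1PK: `DATE#${date}`,        // GSI1: all appointments on a given day
    GSI1SK: `APPT#${patientId}`,
  };
}

// All appointments for one patient, via the table's primary key.
function patientAppointmentsQuery(patientId) {
  return {
    TableName: 'healthcare-app',
    KeyConditionExpression: 'PK = :pk AND begins_with(SK, :appt)',
    ExpressionAttributeValues: { ':pk': `PATIENT#${patientId}`, ':appt': 'APPT#' },
  };
}

// All appointments on one date, via the GSI (index name assumed to be GSI1).
function appointmentsOnDateQuery(date) {
  return {
    TableName: 'healthcare-app',
    IndexName: 'GSI1',
    KeyConditionExpression: 'GSI1PK = :d',
    ExpressionAttributeValues: { ':d': `DATE#${date}` },
  };
}
```

Because dates are stored in ISO form, a begins_with query on APPT# returns a patient's appointments in chronological order (Query sorts by sort key ascending by default). The GSI then answers the per-date query that used to hit the appointments table.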
Step 3: Dual-Write Strategy
For 2 weeks, we wrote to both MySQL and DynamoDB, but only read from MySQL. This let us verify data integrity with zero risk.
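Here is a minimal sketch of the dual-write phase. It assumes hypothetical adapters for MySQL and DynamoDB that each expose async put(item) and get(id). MySQL stays authoritative, so a DynamoDB failure is recorded rather than surfaced to the user. A separate comparison pass checks integrity before any reads move.

```javascript
// `primary` (MySQL) and `shadow` (DynamoDB) are hypothetical adapters with async put/get;
// records are assumed to be flat objects keyed by `id`.
async function dualWrite(primary, shadow, item, onShadowError) {
  // MySQL remains the source of truth: if this throws, the request fails exactly as before.
  await primary.put(item);
  try {
    await shadow.put(item);
  } catch (err) {
    // A DynamoDB failure must not break the live path; record it for a backfill job.
    onShadowError(item, err);
  }
}

// Key-order-independent serialization for flat records (null/undefined handled too).
function canonical(record) {
  return record == null ? String(record) : JSON.stringify(record, Object.keys(record).sort());
}

// Returns the ids whose MySQL and DynamoDB copies differ, including missing copies.
async function findMismatches(primary, shadow, ids) {
  const mismatches = [];
  for (const id of ids) {
    const [a, b] = await Promise.all([primary.get(id), shadow.get(id)]);
    if (canonical(a) !== canonical(b)) mismatches.push(id);
  }
  return mismatches;
}
```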
Step 4: Gradual Cutover
Switched reads to DynamoDB one endpoint at a time. Started with low-traffic endpoints, monitored for 24 hours, then proceeded.
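A per-endpoint flag can drive the one-endpoint-at-a-time cutover. This minimal sketch assumes hypothetical store adapters with a get(id) method. Endpoints without a flag keep reading from MySQL, so rolling one back takes a single map entry.

```javascript
// readFrom: Map of endpoint -> 'mysql' | 'dynamodb'; stores: { mysql, dynamodb } adapters.
// Endpoint names and adapters are hypothetical.
function makeReader(readFrom, stores) {
  return function read(endpoint, id) {
    const source = readFrom.get(endpoint) ?? 'mysql'; // unlisted endpoints stay on the old path
    const store = stores[source];
    if (!store) throw new Error(`unknown read source: ${source}`);
    return store.get(id);
  };
}

// Example: only the appointments endpoint has been cut over so far.
const read = makeReader(new Map([['/api/appointments', 'dynamodb']]), {
  mysql: { get: (id) => `mysql:${id}` },
  dynamodb: { get: (id) => `dynamodb:${id}` },
});
```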
💡 Key Learning: Single-Table Design
Instead of 12 MySQL tables, we used ONE DynamoDB table with composite keys. This reduced costs by 60% and actually improved query performance.
🚀 Step 4: Performance Optimization (Week 5)
The Unexpected Performance Gains
Before Performance
- API Response: 847ms average
- Cold starts: N/A (always running)
- Scaling: Manual (scary)
- Availability: 99.5% (2 outages/year)
After Performance
- API Response: 124ms average
- Cold starts: 1.2s (< 1% of requests)
- Scaling: Automatic (bounded by the account's Lambda concurrency quota, 1,000 concurrent executions per Region by default)
- Availability: 99.99% (AWS managed)
Optimization Techniques We Used:
- Lambda Memory Tuning: Started at 128MB, tested up to 3GB. Found sweet spot at 512MB (best cost/performance).
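- Client Reuse: Created the DynamoDB client outside the handler so warm invocations reuse it and its open HTTP connections. DynamoDB has no connection pool in the database sense.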
- CloudFront Caching: Cached API responses for 60 seconds (reduced Lambda invocations by 40%).
- Provisioned Concurrency: Set 2 warm instances for morning traffic spike.
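Memory tuning works because Lambda allocates CPU in proportion to memory. Duration usually falls as memory rises, so the bill can go down even at a higher setting. The sketch below ranks measured settings by cost per million invocations, assuming x86 us-east-1 list prices. The durations it takes as input come from your own load tests; the open-source AWS Lambda Power Tuning project automates that measuring.

```javascript
// Assumed us-east-1 x86 list prices (first tier); check current pricing for your region.
const PRICE_PER_GB_SECOND = 0.0000166667;
const PRICE_PER_REQUEST = 0.20 / 1e6;

// Cost of one million invocations at a memory size and measured average duration.
function costPerMillion(memoryMb, avgDurationMs) {
  const gbSeconds = (memoryMb / 1024) * (avgDurationMs / 1000);
  return (gbSeconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST) * 1e6;
}

// measurements: [{ memoryMb, avgDurationMs }] from your own tests; cheapest first.
function rankByCost(measurements) {
  return measurements
    .map((m) => ({ ...m, costPerMillion: costPerMillion(m.memoryMb, m.avgDurationMs) }))
    .sort((a, b) => a.costPerMillion - b.costPerMillion);
}
```

Cheapest isn't always the sweet spot. If two settings cost about the same, the faster one is usually the better pick, which is the cost/performance trade-off described above.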
💰 Step 5: Cost Optimization Tricks (Week 6)
Advanced Cost Savings We Implemented:
1. S3 Intelligent-Tiering
aws s3api put-bucket-lifecycle-configuration \
--bucket healthcare-uploads \
--lifecycle-configuration file://lifecycle.json
# Saved $18/month on old patient files
2. DynamoDB On-Demand vs Provisioned
Stayed on on-demand ($25/month). Traffic would need to grow by about 50% before provisioned capacity became the cheaper option.
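Whether provisioned capacity pays off comes down to comparing two formulas. In this simplified sketch, prices are parameters (look up current rates for your region) and traffic is counted in read and write request units. It also assumes provisioned capacity is sized for peak and left in place all month, with no auto scaling.

```javascript
// Every price is an input, not a quoted AWS rate.
// On-demand bills per request unit consumed.
function onDemandMonthly({ readUnits, writeUnits }, { perMillionReads, perMillionWrites }) {
  return (readUnits / 1e6) * perMillionReads + (writeUnits / 1e6) * perMillionWrites;
}

// Provisioned bills per capacity unit per hour, here held at peak for the whole month.
function provisionedMonthly({ peakRcu, peakWcu }, { perRcuHour, perWcuHour }, hoursPerMonth = 730) {
  return hoursPerMonth * (peakRcu * perRcuHour + peakWcu * perWcuHour);
}

function cheaperMode(usage, prices) {
  const onDemand = onDemandMonthly(usage, prices);
  const provisioned = provisionedMonthly(usage, prices);
  return { onDemand, provisioned, pick: provisioned < onDemand ? 'provisioned' : 'on-demand' };
}
```

Spiky traffic, like the 9-to-5 pattern found in Step 1, is what makes on-demand win: provisioned capacity has to be paid for at peak around the clock.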
3. CloudWatch Logs Retention
aws logs put-retention-policy \
--log-group-name /aws/lambda/patient-api \
--retention-in-days 7
# Saved $12/month on log storage
4. Lambda ARM Architecture
Switched to Graviton2 (arm64): a 20% lower price per GB-second, and 19% faster!
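The two arm64 gains compound, because Lambda bills memory times duration. This quick check assumes us-east-1 list prices per GB-second for x86 ($0.0000166667) and arm64 ($0.0000133334), and takes the 19% speedup at face value.

```javascript
// Assumed us-east-1 duration prices per GB-second (first tier).
const PRICE = { x86_64: 0.0000166667, arm64: 0.0000133334 };

function durationCost(arch, memoryMb, durationMs) {
  return PRICE[arch] * (memoryMb / 1024) * (durationMs / 1000);
}

// Same 512 MB function; arm64 assumed to finish in 81% of the x86 time.
const x86Cost = durationCost('x86_64', 512, 100);
const armCost = durationCost('arm64', 512, 100 * 0.81);
const durationSavings = 1 - armCost / x86Cost;
console.log(durationSavings.toFixed(3));
```

Under those assumptions, the duration bill drops by about 35%, noticeably more than the 20% price difference alone.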
🚨 The Mistakes We Made (So You Don't Have To)
Mistake #2: Forgetting About VPC
Initially put Lambda in VPC for "security". This added NAT Gateway ($45/month) and cold starts (3+ seconds). Removed VPC, used IAM roles instead. Saved $45/month and 2.8 seconds per cold start.
Mistake #3: Over-Engineering Logging
Set up elaborate X-Ray tracing and custom metrics. Cost: $38/month. Actual value: minimal. Removed it, relied on basic CloudWatch Logs.
Mistake #4: Not Setting DLQs
One endpoint's Lambda function kept retrying failed events with no retry cap and no dead-letter queue (DLQ). One bad request triggered 10,000 invocations. Cost: $2. It could have been $200 if we hadn't caught it quickly.
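On Lambda itself, the fix is configuration rather than code. For an SQS source queue, you set a redrive policy with maxReceiveCount. For asynchronous invocations, you set MaximumRetryAttempts plus an on-failure destination (or a DLQ). The logic those settings enforce looks roughly like this illustrative sketch, in which processWithDlq and the in-memory dlq array are hypothetical stand-ins.

```javascript
// Illustrative only: cap the attempts, then park the event instead of retrying forever.
async function processWithDlq(handler, event, { maxAttempts = 3, dlq }) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await handler(event);
    } catch (err) {
      lastError = err; // remember the failure and try again, up to the cap
    }
  }
  // Out of attempts: hand the event off for inspection rather than re-invoking.
  dlq.push({ event, error: lastError.message, attempts: maxAttempts });
  return undefined;
}
```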
📈 The Results: 6 Months Later
Unexpected Benefits:
- ✅ Developer productivity: Deployments went from 30 minutes to 30 seconds
- ✅ No more maintenance windows: Zero-downtime deployments
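- ✅ Better monitoring: Lambda's built-in metrics (invocations, errors, duration, throttles) beat what we had on EC2
- ✅ Compliance: Easier HIPAA compliance with serverless, since there are no servers for us to patch and harden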
- ✅ Team morale: Developers love not managing servers
📋 Your 5-Week Migration Checklist
Week 1: Analysis
- ☐ Enable CloudWatch detailed monitoring
- ☐ Document actual CPU/memory usage
- ☐ Map all endpoints and dependencies
- ☐ Calculate per-request costs
Week 2: Planning
- ☐ Design serverless architecture
- ☐ Choose migration order (easiest first)
- ☐ Set up development environment
- ☐ Create rollback plan
Week 3-4: Migration
- ☐ Migrate stateless endpoints first
- ☐ Implement dual-write for databases
- ☐ Run parallel systems for validation
- ☐ Monitor costs daily
Week 5: Optimization
- ☐ Tune Lambda memory settings
- ☐ Implement caching strategies
- ☐ Remove unnecessary services
- ☐ Celebrate massive savings! 🎉
🔑 Key Takeaways
The 80/20 Rule of AWS Savings
- 80% of savings came from eliminating idle EC2 capacity
- 15% of savings from right-sizing resources
- 5% of savings from advanced optimizations
Focus on the big wins first. You can optimize forever, but the first 20% of effort yields 80% of savings.
❓ FAQ
Q: How long did the migration really take?
6 weeks total: 2 weeks planning, 2 weeks migration, 2 weeks optimization. It could have been 4 weeks if we hadn't made the VPC mistake.
Q: What about vendor lock-in?
Yes, we're more locked into AWS now. But we're saving $4,968/year. That buys a lot of migration budget if needed.
Q: Did anything break during migration?
One pagination endpoint had a bug for 2 hours. That's it. The dual-write strategy prevented data issues.
Q: Is serverless always cheaper?
No. If CPU usage is consistently > 60%, EC2 might be cheaper. But that's rare - we've seen it in only 15% of applications.
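You can also frame the 60% CPU rule of thumb as a request-rate break-even: divide the instance's monthly cost by Lambda's all-in cost per request. This is a rough sketch. Both inputs are yours to measure or look up, and it ignores how much traffic a single instance can actually serve.

```javascript
// Monthly request volume at which an always-on instance and Lambda cost the same.
function breakEvenRequestsPerMonth(instanceMonthlyUsd, lambdaUsdPerRequest) {
  return instanceMonthlyUsd / lambdaUsdPerRequest;
}

// Convert a monthly volume to an average sustained rate (730 hours per month).
function averageRequestsPerSecond(requestsPerMonth, hoursPerMonth = 730) {
  return requestsPerMonth / (hoursPerMonth * 3600);
}

// Example inputs: one of the two m5.large instances from the "before" setup ($146.40 / 2)
// and a hypothetical all-in Lambda cost of $0.0000004 per request.
const breakEven = breakEvenRequestsPerMonth(146.40 / 2, 0.0000004);
console.log(Math.round(breakEven), averageRequestsPerSecond(breakEven).toFixed(1));
```

Below roughly that sustained rate, the per-request model wins. The 60% CPU rule makes the same comparison from the instance side.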
Q: What about cold starts?
They affect < 1% of requests and add 1.2 seconds. For an 85% cost saving, the client gladly accepted this tradeoff.