Lambda Concurrency
Understanding and configuring Lambda concurrency, scaling, and cold starts
Lambda concurrency controls how many function instances can run simultaneously. Understanding concurrency is essential for building scalable, reliable serverless applications.
Concurrency Fundamentals
What is Concurrency?
Concurrency is the number of requests your function can handle at the same time. Each concurrent execution requires a separate function instance.
| Term | Definition |
|---|---|
| Concurrent Execution | A function instance actively processing a request |
| Account Concurrency Limit | Total concurrent executions across all functions (default 1,000) |
| Reserved Concurrency | Concurrency dedicated to a specific function |
| Provisioned Concurrency | Pre-initialized execution environments |
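The relationship between request rate, duration, and concurrent executions follows Little's Law: concurrency ≈ requests per second × average duration in seconds. A minimal sketch of that estimate (the function name and buffer parameter are illustrative, not an AWS API):

```javascript
// Estimate required concurrency via Little's Law:
// concurrency ≈ requests per second × average duration in seconds.
// An optional buffer adds headroom for spiky traffic.
function requiredConcurrency(requestsPerSecond, avgDurationSeconds, buffer = 1.0) {
  return Math.ceil(requestsPerSecond * avgDurationSeconds * buffer);
}

// 100 req/s at 200ms average duration:
console.log(requiredConcurrency(100, 0.2));      // 20
console.log(requiredConcurrency(100, 0.2, 1.5)); // 30 (with a 50% buffer)
```

The same arithmetic appears in the Concurrency Calculator section below.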
Account-Level Limits
```shell
aws lambda get-account-settings
```

```json
{
  "AccountLimit": {
    "TotalCodeSize": 80530636800,
    "CodeSizeUnzipped": 262144000,
    "ConcurrentExecutions": 1000,
    "UnreservedConcurrentExecutions": 900
  },
  "AccountUsage": {
    "TotalCodeSize": 52093696,
    "FunctionCount": 15
  }
}
```

The default account limit is 1,000 concurrent executions. Request an increase through Service Quotas if needed.
Reserved Concurrency
Reserve a portion of account concurrency for critical functions:
```shell
aws lambda put-function-concurrency \
  --function-name my-critical-function \
  --reserved-concurrent-executions 100
```

```shell
aws lambda delete-function-concurrency \
  --function-name my-critical-function
```

Benefits of Reserved Concurrency
Guaranteed Capacity
Your function always has capacity available, regardless of what other functions are doing.
Rate Limiting
Prevent a single function from consuming all account concurrency and throttling others.
Protection
Protect downstream resources (databases, APIs) from being overwhelmed.
Example: Database Protection
```shell
# Database has max 100 connections
# Reserve concurrency to prevent overwhelming it
aws lambda put-function-concurrency \
  --function-name db-writer \
  --reserved-concurrent-executions 50
```

Setting reserved concurrency to 0 effectively disables a function, which is useful as an emergency stop.
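Reserved concurrency caps parallelism at the platform level. Inside the function, you can add a second guard by limiting in-flight calls to the downstream resource. A minimal sketch of such a limiter (the `Semaphore` class is illustrative, not an AWS API):

```javascript
// Simple promise semaphore: at most `limit` tasks run concurrently.
class Semaphore {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;
    this.queue = [];
  }

  async run(task) {
    // Wait in line if the limit is reached; each finishing task
    // releases exactly one waiter.
    if (this.active >= this.limit) {
      await new Promise((resolve) => this.queue.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      const next = this.queue.shift();
      if (next) next();
    }
  }
}

// Example: cap concurrent database writes at 2 per execution environment.
const dbSemaphore = new Semaphore(2);
```

Combined with reserved concurrency, this bounds total downstream pressure at roughly (reserved concurrency × per-environment limit).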
Provisioned Concurrency
Pre-initialize execution environments to eliminate cold starts:
```shell
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 50
```

```shell
aws lambda get-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod
```

```shell
aws lambda delete-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod
```

When to Use Provisioned Concurrency
| Use Case | Recommendation |
|---|---|
| Latency-sensitive APIs | ✅ Use provisioned |
| Scheduled tasks | ❌ Not needed |
| High-volume event processing | Consider based on latency needs |
| Development/testing | ❌ Not needed |
Provisioned concurrency is billed even when not in use. Use it strategically for latency-critical paths.
Auto Scaling Provisioned Concurrency
Scale provisioned concurrency based on utilization:
```shell
aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 10 \
  --max-capacity 100
```

```shell
aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --policy-name utilization-scaling \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
    },
    "ScaleInCooldown": 60,
    "ScaleOutCooldown": 0
  }'
```

Scheduled Scaling
Scale provisioned concurrency based on a schedule:
```shell
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name peak-hours \
  --schedule "cron(0 8 * * ? *)" \
  --scalable-target-action MinCapacity=50,MaxCapacity=200
```

Cold Starts
Understanding Cold Starts
A cold start occurs when Lambda creates a new execution environment. This includes downloading your code, creating the container, and initializing the runtime.
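Whether an invocation is a cold start can be observed from inside the function: module-scope code runs once per execution environment, so a module-level flag distinguishes the first invocation from warm ones. A minimal sketch (the response shape is illustrative):

```javascript
// Module scope runs once per execution environment, i.e. on a cold start.
let coldStart = true;
const initializedAt = Date.now();

export const handler = async (event) => {
  const wasCold = coldStart;
  coldStart = false; // every later invocation in this environment is warm
  return {
    coldStart: wasCold,
    environmentAgeMs: Date.now() - initializedAt,
  };
};
```

Logging this flag makes it easy to measure your actual cold-start rate in CloudWatch.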
Cold Start Duration by Runtime
| Runtime | Typical Cold Start |
|---|---|
| Python | 100-200ms |
| Node.js | 100-200ms |
| Go | 50-100ms |
| Java | 500ms-5s |
| .NET | 200-500ms |
Factors Affecting Cold Starts
| Factor | Impact |
|---|---|
| Package size | Larger = slower |
| VPC configuration | Minimal since the 2019 Hyperplane ENI change; previously added seconds |
| Memory allocation | More memory = faster (more CPU) |
| Dependencies | More = slower initialization |
| Runtime | Interpreted vs compiled |
Minimizing Cold Starts
Optimize Package Size
Keep deployment packages small. Use layers for dependencies.
```shell
ls -lh function.zip
```

Increase Memory
More memory = more CPU = faster initialization.
```shell
aws lambda update-function-configuration \
  --function-name my-function \
  --memory-size 1024
```

Use Provisioned Concurrency
For latency-critical functions:
```shell
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 10
```

Initialize Outside Handler
Move expensive initialization outside the handler:
```javascript
// Initialize OUTSIDE the handler (runs once per execution environment)
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";
const client = new DynamoDBClient({});

// Handler (runs on every invocation)
export const handler = async (event) => {
  // Reuse the pre-initialized client; only the command is built per request
  const command = new GetItemCommand({
    TableName: process.env.TABLE_NAME,
    Key: { id: { S: event.id } },
  });
  return await client.send(command);
};
```

SnapStart (Java)
For Java functions, SnapStart dramatically reduces cold starts:
```shell
aws lambda update-function-configuration \
  --function-name my-java-function \
  --snap-start ApplyOn=PublishedVersions
```

SnapStart creates a snapshot of the initialized execution environment. Cold starts drop from seconds to under 200ms.
Throttling
When requests exceed concurrency limits, Lambda throttles:
| Invocation Type | Throttle Behavior |
|---|---|
| Synchronous | Returns 429 error |
| Asynchronous | Retries automatically |
| Event Source | Varies by source |
Handling Throttling
```javascript
import { InvokeCommand, LambdaClient } from "@aws-sdk/client-lambda";

const invokeWithRetry = async (payload, maxRetries = 3) => {
  const client = new LambdaClient({});
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.send(new InvokeCommand({
        FunctionName: 'my-function',
        Payload: JSON.stringify(payload)
      }));
    } catch (error) {
      if (error.name === 'TooManyRequestsException' && attempt < maxRetries - 1) {
        // Exponential backoff: 100ms, 200ms, 400ms, ...
        await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 100));
        continue;
      }
      throw error;
    }
  }
};
```

Async invocations automatically retry twice by default:
```javascript
// Lambda does not pass an explicit retry counter to the handler, but async
// retries reuse the same request ID, so context.awsRequestId can serve as an
// idempotency key for detecting replays of the same event.
export const handler = async (event, context) => {
  console.log(`Request ID: ${context.awsRequestId}`);
  // Process event (make this idempotent; async invokes may run more than once)
};
```

Configure retry behavior:
```shell
aws lambda put-function-event-invoke-config \
  --function-name my-function \
  --maximum-retry-attempts 1 \
  --maximum-event-age-in-seconds 3600
```

For SQS, configure ScalingConfig:
```shell
aws lambda update-event-source-mapping \
  --uuid abc123 \
  --scaling-config MaximumConcurrency=10
```

For Kinesis/DynamoDB, use ParallelizationFactor:
```shell
aws lambda update-event-source-mapping \
  --uuid xyz789 \
  --parallelization-factor 2
```

With a parallelization factor of 2, up to two batches per shard are processed concurrently, so maximum concurrency for the stream is shards × factor.

Burst Concurrency
Lambda can burst to high concurrency for sudden traffic spikes:
| Region | Initial Burst |
|---|---|
| US East/West, EU (Ireland) | 3,000 |
| Other regions | 500-1,000 |
After the initial burst, concurrency can increase by 500 additional instances per minute.
Burst limits are shared across all functions in an account. A sudden spike from one function affects others.
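Under this model, the concurrency available t minutes into a sustained spike is roughly the initial regional burst plus 500 instances per minute, capped at the account limit. A sketch of that ramp (an approximation for capacity planning, not an AWS API):

```javascript
// Approximate concurrency available `minutes` into a sustained spike:
// the initial regional burst grows by 500 instances per minute,
// capped at the account concurrency limit.
function availableConcurrency(minutes, burst = 3000, accountLimit = 10000) {
  return Math.min(accountLimit, burst + 500 * Math.floor(minutes));
}

// In a 3,000-burst region with a 10,000 account limit:
console.log(availableConcurrency(0));  // 3000
console.log(availableConcurrency(5));  // 5500
console.log(availableConcurrency(20)); // 10000 (capped at the account limit)
```

If a spike needs more capacity sooner than this ramp provides, provisioned concurrency is the way to pre-stage it.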
Concurrency Monitoring
CloudWatch Metrics
```shell
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name ConcurrentExecutions \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T23:59:59Z \
  --period 300 \
  --statistics Maximum
```

Key metrics:
- ConcurrentExecutions: Current concurrent executions
- ProvisionedConcurrentExecutions: Provisioned capacity
- ProvisionedConcurrencyUtilization: Usage percentage
- UnreservedConcurrentExecutions: Non-reserved usage
- Throttles: Number of throttled requests
Set Up Alarms
```shell
aws cloudwatch put-metric-alarm \
  --alarm-name "LambdaThrottles" \
  --metric-name Throttles \
  --namespace AWS/Lambda \
  --dimensions Name=FunctionName,Value=my-function \
  --statistic Sum \
  --period 60 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
```

Concurrency Patterns
Pattern 1: Shared Pool
Default behavior—all functions share the account limit:
```
Account Limit: 1,000
├── Function A: up to 1,000 (unreserved)
├── Function B: up to 1,000 (unreserved)
└── Function C: up to 1,000 (unreserved)
```

Pattern 2: Reserved + Unreserved
Mix reserved and unreserved for critical functions:
```
Account Limit: 1,000
├── Function A: 200 (reserved) → guaranteed
├── Function B: 100 (reserved) → guaranteed
└── All others: 700 (unreserved) → shared pool
```

Pattern 3: Full Isolation
Reserve concurrency for all functions:
```
Account Limit: 1,000
├── Function A: 200 (reserved)
├── Function B: 300 (reserved)
├── Function C: 100 (reserved)
└── Remaining: 400 (unreserved for new functions)
```

Concurrency Calculator
Estimate required concurrency:
```
Required Concurrency = (Requests/Second) × (Average Duration in Seconds)

Example:
- 100 requests/second
- 200ms average duration

Concurrency = 100 × 0.2 = 20 concurrent executions
```

For spiky workloads, add headroom:

```
Recommended = Required × 1.5 (50% buffer)
```

Best Practices
Concurrency Best Practices
- Reserve for critical functions - Guarantee capacity for important workloads
- Limit database-connected functions - Prevent connection pool exhaustion
- Use provisioned concurrency sparingly - Only for latency-sensitive paths
- Monitor throttles - Set up alarms for throttling events
- Request limit increases early - Service Quotas requests take time
- Test at scale - Validate behavior under expected load
- Use scaling configurations - Set SQS MaximumConcurrency appropriately
- Implement retries - Handle throttling gracefully in clients
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Constant throttling | Hit account limit | Request limit increase |
| Intermittent throttling | Burst behavior | Add reserved concurrency |
| Cold starts on API | No warm instances | Add provisioned concurrency |
| Database overwhelmed | Too many connections | Reduce function concurrency |
| Slow scaling | Burst limit hit | Pre-warm with provisioned |