
Lambda Concurrency

Understanding and configuring Lambda concurrency, scaling, and cold starts

Lambda concurrency controls how many function instances can run simultaneously. Understanding concurrency is essential for building scalable, reliable serverless applications.

Concurrency Fundamentals

What is Concurrency?

Concurrency is the number of requests your function can handle at the same time. Each concurrent execution requires a separate function instance.

Term | Definition
Concurrent Execution | A function instance actively processing a request
Account Concurrency Limit | Total concurrent executions across all functions (default 1,000)
Reserved Concurrency | Concurrency dedicated to a specific function
Provisioned Concurrency | Pre-initialized execution environments

Account-Level Limits

Get account concurrency limit
aws lambda get-account-settings
Example output
{
  "AccountLimit": {
    "TotalCodeSize": 80530636800,
    "CodeSizeUnzipped": 262144000,
    "ConcurrentExecutions": 1000,
    "UnreservedConcurrentExecutions": 900
  },
  "AccountUsage": {
    "TotalCodeSize": 52093696,
    "FunctionCount": 15
  }
}

The default account limit is 1,000 concurrent executions. Request an increase through Service Quotas if needed.
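The increase can also be requested programmatically. A minimal sketch with the AWS SDK for JavaScript (the quota code below is the commonly documented one for Lambda concurrent executions; verify it with list-service-quotas before relying on it):

Request quota increase (SDK)
import {
  ServiceQuotasClient,
  RequestServiceQuotaIncreaseCommand,
} from "@aws-sdk/client-service-quotas";

const client = new ServiceQuotasClient({});

// L-B99A9384 is assumed to be the "Concurrent executions" quota code; verify it
const response = await client.send(new RequestServiceQuotaIncreaseCommand({
  ServiceCode: "lambda",
  QuotaCode: "L-B99A9384",
  DesiredValue: 5000,
}));

console.log(response.RequestedQuota?.Status); // e.g., PENDING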

Reserved Concurrency

Reserve a portion of account concurrency for critical functions:

Set reserved concurrency
aws lambda put-function-concurrency \
  --function-name my-critical-function \
  --reserved-concurrent-executions 100
Remove reserved concurrency
aws lambda delete-function-concurrency \
  --function-name my-critical-function

Benefits of Reserved Concurrency

Guaranteed Capacity

Your function always has capacity available, regardless of what other functions are doing.

Rate Limiting

Prevent a single function from consuming all account concurrency and throttling others.

Protection

Protect downstream resources (databases, APIs) from being overwhelmed.

Example: Database Protection

Limit database connections
# Database has max 100 connections
# Reserve concurrency to prevent overwhelming it
aws lambda put-function-concurrency \
  --function-name db-writer \
  --reserved-concurrent-executions 50

Setting reserved concurrency to 0 effectively disables a function—useful for emergency stops.
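A sketch of that emergency stop with the AWS SDK for JavaScript:

Emergency stop (SDK)
import { LambdaClient, PutFunctionConcurrencyCommand } from "@aws-sdk/client-lambda";

const client = new LambdaClient({});

// Reserved concurrency of 0 throttles every new invocation immediately
await client.send(new PutFunctionConcurrencyCommand({
  FunctionName: "my-critical-function",
  ReservedConcurrentExecutions: 0,
}));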

Provisioned Concurrency

Pre-initialize execution environments to eliminate cold starts:

Configure provisioned concurrency
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 50
Get provisioned concurrency status
aws lambda get-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod
Delete provisioned concurrency
aws lambda delete-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod
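Allocation is not instant; environments report ALLOCATING before READY. A sketch that polls until the capacity is usable:

Wait for provisioned capacity
import {
  LambdaClient,
  GetProvisionedConcurrencyConfigCommand,
} from "@aws-sdk/client-lambda";

const client = new LambdaClient({});

// Poll every 10 seconds until the provisioned environments are ready
for (;;) {
  const config = await client.send(new GetProvisionedConcurrencyConfigCommand({
    FunctionName: "my-function",
    Qualifier: "prod",
  }));
  console.log(config.Status, config.AvailableProvisionedConcurrentExecutions);
  if (config.Status === "READY") break;
  await new Promise((r) => setTimeout(r, 10_000));
}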

When to Use Provisioned Concurrency

Use Case | Recommendation
Latency-sensitive APIs | ✅ Use provisioned
Scheduled tasks | ❌ Not needed
High-volume event processing | Consider based on latency needs
Development/testing | ❌ Not needed

Provisioned concurrency is billed even when not in use. Use it strategically for latency-critical paths.

Auto Scaling Provisioned Concurrency

Scale provisioned concurrency based on utilization:

Register scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 10 \
  --max-capacity 100
Create scaling policy
aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --policy-name utilization-scaling \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
    },
    "ScaleInCooldown": 60,
    "ScaleOutCooldown": 0
  }'

Scheduled Scaling

Scale provisioned concurrency on a schedule (cron expressions are evaluated in UTC by default):

Scheduled scaling action
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name peak-hours \
  --schedule "cron(0 8 * * ? *)" \
  --scalable-target-action MinCapacity=50,MaxCapacity=200

Cold Starts

Understanding Cold Starts

A cold start occurs when Lambda creates a new execution environment. This includes downloading your code, creating the container, and initializing the runtime.
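You can observe cold starts from inside a Node.js function with a module-scope flag, since module-level code runs once per execution environment:

Detect cold starts
// Module scope runs once per execution environment (during the cold start)
let coldStart = true;

export const handler = async (event) => {
  if (coldStart) {
    console.log("Cold start: new execution environment");
    coldStart = false;
  }
  // Process event
};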

Cold Start Duration by Runtime

Runtime | Typical Cold Start
Python | 100-200ms
Node.js | 100-200ms
Go | 50-100ms
Java | 500ms-5s
.NET | 200-500ms

Factors Affecting Cold Starts

Factor | Impact
Package size | Larger packages = slower download and unpack
VPC configuration | Historically added 1-2 seconds; much smaller since the 2019 networking improvements
Memory allocation | More memory = more CPU = faster initialization
Dependencies | More dependencies = slower initialization
Runtime | Varies widely; JVM-based runtimes are slowest (see table above)

Minimizing Cold Starts

Optimize Package Size

Keep deployment packages small. Use layers for dependencies.

Check package size
ls -lh function.zip
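Bundling and minifying usually shrinks Node.js packages dramatically. A sketch using esbuild (entry point and output paths are illustrative):

Bundle with esbuild
// build.mjs
import { build } from "esbuild";

await build({
  entryPoints: ["src/handler.js"], // illustrative entry point
  bundle: true,                    // include only the code actually imported
  minify: true,
  platform: "node",
  target: "node20",
  outfile: "dist/handler.js",
  external: ["@aws-sdk/*"],        // AWS SDK v3 ships with the Node.js runtime
});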

Increase Memory

More memory = more CPU = faster initialization.

Increase memory
aws lambda update-function-configuration \
  --function-name my-function \
  --memory-size 1024
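The open-source AWS Lambda Power Tuning project automates this trade-off, invoking your function at several memory sizes and charting cost against duration.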

Use Provisioned Concurrency

For latency-critical functions:

Add provisioned concurrency
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 10

Initialize Outside Handler

Move expensive initialization outside the handler (the table name below is illustrative):

Optimized initialization
// Initialize OUTSIDE the handler (runs once per execution environment)
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";
const client = new DynamoDBClient({});

// Handler (runs on every invocation)
export const handler = async (event) => {
  // Reuse the pre-initialized client across invocations
  const command = new GetItemCommand({
    TableName: "my-table", // illustrative
    Key: { id: { S: event.id } },
  });
  return await client.send(command);
};

SnapStart (Java)

For Java functions, SnapStart dramatically reduces cold starts:

Enable SnapStart
aws lambda update-function-configuration \
  --function-name my-java-function \
  --snap-start ApplyOn=PublishedVersions

SnapStart creates a snapshot of the initialized execution environment and restores it on subsequent cold starts, typically cutting startup from seconds to a few hundred milliseconds. It applies only to published function versions, not $LATEST.

Throttling

When requests exceed concurrency limits, Lambda throttles:

Invocation Type | Throttle Behavior
Synchronous | Returns a 429 error to the caller
Asynchronous | Lambda retries automatically
Event source | Varies by source

Handling Throttling

Client-side retry
import { InvokeCommand, LambdaClient } from "@aws-sdk/client-lambda";

const invokeWithRetry = async (payload, maxRetries = 3) => {
  const client = new LambdaClient({});
  
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.send(new InvokeCommand({
        FunctionName: 'my-function',
        Payload: JSON.stringify(payload)
      }));
    } catch (error) {
      if (error.name === 'TooManyRequestsException' && attempt < maxRetries - 1) {
        // Exponential backoff
        await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 100));
        continue;
      }
      throw error;
    }
  }
};
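Adding random jitter to the backoff (for example, multiplying each delay by Math.random()) helps prevent synchronized retry storms when many throttled clients back off in lockstep.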

Asynchronous invocations are retried up to two times by default. Lambda does not expose a retry counter to the handler; async retries deliver the same event with the same request ID, so the request ID can serve as an idempotency key:

Idempotency via request ID
export const handler = async (event, context) => {
  // Async retries of the same event reuse the same awsRequestId
  const requestId = context.awsRequestId;
  
  // Check a durable store (e.g., DynamoDB) for requestId before processing,
  // so a retried event is not handled twice
  
  // Process event
};

Configure retry behavior:

Configure async retry
aws lambda put-function-event-invoke-config \
  --function-name my-function \
  --maximum-retry-attempts 1 \
  --maximum-event-age-in-seconds 3600

For SQS event sources, cap concurrency with ScalingConfig (the minimum allowed value is 2):

SQS concurrency limit
aws lambda update-event-source-mapping \
  --uuid abc123 \
  --scaling-config MaximumConcurrency=10

For Kinesis and DynamoDB streams, use ParallelizationFactor (1-10) to process multiple batches per shard concurrently:

Kinesis parallelization
aws lambda update-event-source-mapping \
  --uuid xyz789 \
  --parallelization-factor 2

Burst Concurrency

Lambda can burst to high concurrency for sudden traffic spikes:

Region | Initial Burst
US East/West, EU (Ireland) | 3,000
Other regions | 500-1,000

After the initial burst, concurrency can increase by 500 additional instances per minute.
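For example, under this model a function starting from the 3,000 initial burst would need roughly (10,000 - 3,000) / 500 = 14 minutes to reach 10,000 concurrent executions.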

Burst limits are shared across all functions in an account. A sudden spike from one function affects others.

Concurrency Monitoring

CloudWatch Metrics

Get concurrent executions
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name ConcurrentExecutions \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T23:59:59Z \
  --period 300 \
  --statistics Maximum

Key metrics:

  • ConcurrentExecutions: Current concurrent executions
  • ProvisionedConcurrentExecutions: Provisioned capacity
  • ProvisionedConcurrencyUtilization: Usage percentage
  • UnreservedConcurrentExecutions: Non-reserved usage
  • Throttles: Number of throttled requests
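The same data is available programmatically. A sketch that sums Throttles over the last hour in 5-minute buckets:

Query throttles (SDK)
import {
  CloudWatchClient,
  GetMetricStatisticsCommand,
} from "@aws-sdk/client-cloudwatch";

const client = new CloudWatchClient({});
const now = new Date();

const stats = await client.send(new GetMetricStatisticsCommand({
  Namespace: "AWS/Lambda",
  MetricName: "Throttles",
  Dimensions: [{ Name: "FunctionName", Value: "my-function" }],
  StartTime: new Date(now.getTime() - 60 * 60 * 1000), // one hour ago
  EndTime: now,
  Period: 300,         // 5-minute buckets
  Statistics: ["Sum"],
}));

console.log(stats.Datapoints);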

Set Up Alarms

Throttle alarm
aws cloudwatch put-metric-alarm \
  --alarm-name "LambdaThrottles" \
  --metric-name Throttles \
  --namespace AWS/Lambda \
  --dimensions Name=FunctionName,Value=my-function \
  --statistic Sum \
  --period 60 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts

Concurrency Patterns

Pattern 1: Shared Pool

Default behavior—all functions share the account limit:

Account Limit: 1,000
├── Function A: up to 1,000 (unreserved)
├── Function B: up to 1,000 (unreserved)
└── Function C: up to 1,000 (unreserved)

Pattern 2: Reserved + Unreserved

Mix reserved and unreserved for critical functions:

Account Limit: 1,000
├── Function A: 200 (reserved) → guaranteed
├── Function B: 100 (reserved) → guaranteed
└── All others: 700 (unreserved) → shared pool

Pattern 3: Full Isolation

Reserve concurrency for all functions:

Account Limit: 1,000
├── Function A: 200 (reserved)
├── Function B: 300 (reserved)
├── Function C: 100 (reserved)
└── Remaining: 400 (unreserved for new functions)

Concurrency Calculator

Estimate required concurrency:

Required Concurrency = (Requests/Second) × (Average Duration in Seconds)

Example:
- 100 requests/second
- 200ms average duration

Concurrency = 100 × 0.2 = 20 concurrent executions

For spiky workloads, add headroom:

Recommended = Required × 1.5 (50% buffer)
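This is Little's law applied to Lambda; as a small helper (function names are illustrative):

Concurrency estimate
// Little's law: concurrency = arrival rate x average time in system
const requiredConcurrency = (requestsPerSecond, avgDurationSeconds) =>
  requestsPerSecond * avgDurationSeconds;

// Add headroom for spiky traffic (50% buffer, as recommended above)
const recommendedConcurrency = (rps, durationSeconds, buffer = 1.5) =>
  Math.ceil(requiredConcurrency(rps, durationSeconds) * buffer);

console.log(requiredConcurrency(100, 0.2));    // 20
console.log(recommendedConcurrency(100, 0.2)); // 30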

Best Practices

Concurrency Best Practices

  1. Reserve for critical functions - Guarantee capacity for important workloads
  2. Limit database-connected functions - Prevent connection pool exhaustion
  3. Use provisioned concurrency sparingly - Only for latency-sensitive paths
  4. Monitor throttles - Set up alarms for throttling events
  5. Request limit increases early - Service Quotas requests take time
  6. Test at scale - Validate behavior under expected load
  7. Use scaling configurations - Set SQS MaximumConcurrency appropriately
  8. Implement retries - Handle throttling gracefully in clients

Troubleshooting

Issue | Cause | Solution
Constant throttling | Account limit reached | Request a limit increase
Intermittent throttling | Burst behavior | Add reserved concurrency
Cold starts on API | No warm instances | Add provisioned concurrency
Database overwhelmed | Too many connections | Reduce function concurrency
Slow scaling | Burst limit hit | Pre-warm with provisioned concurrency
