
Lambda Monitoring

Complete guide to Lambda observability with CloudWatch, X-Ray, and best practices

Effective monitoring is essential for maintaining healthy Lambda functions. This guide covers logs, metrics, tracing, and alerting strategies.

Monitoring Overview

The Three Pillars

  • Logs: Detailed function output and errors
  • Metrics: Quantitative performance data
  • Traces: Request flow across services

CloudWatch Logs

Lambda automatically sends logs to CloudWatch:

Log Groups

Each function has a log group: /aws/lambda/function-name

View log groups
aws logs describe-log-groups \
  --log-group-name-prefix /aws/lambda/

# Get recent logs
aws logs tail /aws/lambda/my-function --follow

Log Streams

Each function instance creates a log stream:

List log streams
aws logs describe-log-streams \
  --log-group-name /aws/lambda/my-function \
  --order-by LastEventTime \
  --descending \
  --limit 5

Querying Logs

Filter logs
aws logs filter-log-events \
  --log-group-name /aws/lambda/my-function \
  --filter-pattern "ERROR" \
  --start-time $(date -d '1 hour ago' +%s000)

Common patterns:

  • ERROR - Lines containing ERROR
  • "Error" - Exact match
  • ?ERROR ?WARN - ERROR or WARN
  • { $.level = "error" } - JSON field match (see the sketch below)
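
The JSON field pattern in the last bullet pairs with the structured logging shown later in this guide. The same filter also works programmatically; a minimal sketch with the AWS SDK for JavaScript (log group name and time window are illustrative):

Filter structured error logs
import { CloudWatchLogsClient, FilterLogEventsCommand } from '@aws-sdk/client-cloudwatch-logs';

const logs = new CloudWatchLogsClient({});

// Find JSON log lines where level === "error" from the last hour
const { events } = await logs.send(new FilterLogEventsCommand({
  logGroupName: '/aws/lambda/my-function',
  filterPattern: '{ $.level = "error" }',
  startTime: Date.now() - 60 * 60 * 1000
}));

for (const event of events ?? []) {
  console.log(event.message);
}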

CloudWatch Logs Insights

Start an Insights query
aws logs start-query \
  --log-group-name /aws/lambda/my-function \
  --start-time $(date -d '24 hours ago' +%s) \
  --end-time $(date +%s) \
  --query-string '
    fields @timestamp, @message
    | filter @message like /ERROR/
    | sort @timestamp desc
    | limit 20
  '

Useful queries:

Error rate by hour
filter @type = "REPORT" or @message like /ERROR/
| stats sum(strcontains(@message, "REPORT RequestId")) as invocations,
        sum(strcontains(@message, "ERROR")) as errors
        by bin(1h) as hour
| sort hour desc
Cold starts
filter @type = "REPORT"
| filter @message like /Init Duration/
| parse @message /Init Duration: (?<initDuration>.*?) ms/
| stats count(*) as coldStarts, avg(initDuration) as avgInitMs by bin(1h)
P99 latency
filter @type = "REPORT"
| stats pct(@duration, 99) as p99, pct(@duration, 95) as p95, avg(@duration) as avg by bin(5m)

Structured Logging

Use JSON for better querying:

Structured logging
let globalRequestId;

const log = (level, message, data = {}) => {
  console.log(JSON.stringify({
    level,
    message,
    timestamp: new Date().toISOString(),
    requestId: globalRequestId,
    ...data
  }));
};

export const handler = async (event, context) => {
  globalRequestId = context.awsRequestId;
  
  log('info', 'Processing request', { 
    path: event.path, 
    method: event.httpMethod 
  });
  
  try {
    const result = await processEvent(event);
    log('info', 'Request successful', { statusCode: 200 });
    return result;
  } catch (error) {
    log('error', 'Request failed', { 
      error: error.message, 
      stack: error.stack 
    });
    throw error;
  }
};
Structured logging
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

class JsonFormatter(logging.Formatter):
    def format(self, record):
        log_record = {
            'level': record.levelname,
            'message': record.getMessage(),
            'timestamp': self.formatTime(record),
            'requestId': getattr(record, 'requestId', None)
        }
        return json.dumps(log_record)

# Use a distinct name so the Lambda handler function below doesn't shadow it
json_handler = logging.StreamHandler()
json_handler.setFormatter(JsonFormatter())
logger.addHandler(json_handler)

def handler(event, context):
    logger.info('Processing request', extra={'requestId': context.aws_request_id})
    # Process event

Lambda Powertools Logger

Use AWS Lambda Powertools for enhanced logging:

Using Powertools
import { Logger } from '@aws-lambda-powertools/logger';

const logger = new Logger({ serviceName: 'my-service' });

export const handler = async (event, context) => {
  logger.addContext(context);
  
  logger.info('Processing order', { orderId: event.orderId });
  
  try {
    const result = await processOrder(event);
    logger.info('Order processed', { result });
    return result;
  } catch (error) {
    logger.error('Order failed', { error });
    throw error;
  }
};

CloudWatch Metrics

Lambda publishes metrics automatically:

Key Metrics

Metric                 Description                Unit
Invocations            Number of invocations      Count
Duration               Execution time             Milliseconds
Errors                 Invocations with errors    Count
Throttles              Throttled invocations      Count
ConcurrentExecutions   Concurrent runs            Count
IteratorAge            Stream processing lag      Milliseconds
DeadLetterErrors       DLQ delivery failures      Count

Getting Metrics

Get invocation count
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Invocations \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time $(date -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Sum
Get duration percentiles
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time $(date -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --extended-statistics p50 p95 p99

Custom Metrics

Publish your own metrics:

Embedded Metric Format (EMF) publishes metrics through structured log entries:

EMF custom metrics
import { MetricUnits, Metrics } from '@aws-lambda-powertools/metrics';

const metrics = new Metrics({ 
  namespace: 'MyApp', 
  serviceName: 'OrderService' 
});

export const handler = async (event) => {
  metrics.addDimension('Environment', process.env.ENVIRONMENT);
  
  const startTime = Date.now();
  const result = await processOrder(event);
  const duration = Date.now() - startTime;
  
  metrics.addMetric('OrderProcessingTime', MetricUnits.Milliseconds, duration);
  metrics.addMetric('OrderValue', MetricUnits.Count, result.total);
  
  metrics.publishStoredMetrics();
  
  return result;
};

Direct CloudWatch API:

PutMetricData
import { CloudWatchClient, PutMetricDataCommand } from "@aws-sdk/client-cloudwatch";

const cw = new CloudWatchClient({});

const publishMetric = async (name, value, unit = 'Count') => {
  await cw.send(new PutMetricDataCommand({
    Namespace: 'MyApp',
    MetricData: [{
      MetricName: name,
      Value: value,
      Unit: unit,
      Dimensions: [
        { Name: 'FunctionName', Value: process.env.AWS_LAMBDA_FUNCTION_NAME }
      ],
      Timestamp: new Date()
    }]
  }));
};

Each PutMetricData call is a synchronous API request that adds latency to the invocation; prefer EMF for high-volume metrics.
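
If you'd rather not add a dependency, EMF is just a JSON log line with an _aws metadata block; CloudWatch extracts the metric asynchronously from the log stream. A hand-rolled sketch (namespace and metric name are illustrative):

Raw EMF log line
export const handler = async (event) => {
  const durationMs = 123; // measure however you like

  // CloudWatch parses the _aws block and publishes the metric; no API call, no added latency
  console.log(JSON.stringify({
    _aws: {
      Timestamp: Date.now(),
      CloudWatchMetrics: [{
        Namespace: 'MyApp',
        Dimensions: [['FunctionName']],
        Metrics: [{ Name: 'OrderProcessingTime', Unit: 'Milliseconds' }]
      }]
    },
    FunctionName: process.env.AWS_LAMBDA_FUNCTION_NAME,
    OrderProcessingTime: durationMs
  }));

  return { statusCode: 200 };
};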

X-Ray Tracing

Distributed tracing across services:

Enable X-Ray

Enable active tracing
aws lambda update-function-configuration \
  --function-name my-function \
  --tracing-config Mode=Active

When enabling tracing from the CLI, also make sure the execution role can send trace data (for example, attach the AWSXRayDaemonWriteAccess managed policy).

Add X-Ray SDK

X-Ray tracing
import AWSXRay from 'aws-xray-sdk-core';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';

// Instrument AWS SDK
const dynamodb = AWSXRay.captureAWSv3Client(new DynamoDBClient({}));

export const handler = async (event) => {
  // Create custom subsegment
  const segment = AWSXRay.getSegment();
  const subsegment = segment.addNewSubsegment('ProcessOrder');
  
  try {
    subsegment.addAnnotation('orderId', event.orderId);
    subsegment.addMetadata('orderDetails', event);
    
    const result = await processOrder(event);
    
    subsegment.close();
    return result;
  } catch (error) {
    subsegment.addError(error);
    subsegment.close();
    throw error;
  }
};
X-Ray tracing
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all
import boto3

# Patch AWS SDK
patch_all()

dynamodb = boto3.resource('dynamodb')

def handler(event, context):
    # Create custom subsegment
    with xray_recorder.in_subsegment('ProcessOrder') as subsegment:
        subsegment.put_annotation('orderId', event['orderId'])
        subsegment.put_metadata('orderDetails', event)
        
        result = process_order(event)
        return result

Lambda Powertools Tracer

Powertools Tracer
import { Tracer } from '@aws-lambda-powertools/tracer';

const tracer = new Tracer({ serviceName: 'OrderService' });

export const handler = async (event, context) => {
  // Annotate the trace so it can be filtered in the X-Ray console
  tracer.putAnnotation('orderId', event.orderId);

  const result = await processOrder(event);

  // Attach the response to the trace as metadata
  tracer.addResponseAsMetadata(result);

  return result;
};

CloudWatch Alarms

Set up alerts for Lambda issues:

Error rate alarm
aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-HighErrorRate" \
  --metric-name Errors \
  --namespace AWS/Lambda \
  --dimensions Name=FunctionName,Value=my-function \
  --statistic Sum \
  --period 300 \
  --threshold 5 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --treat-missing-data notBreaching \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
Throttle alarm
aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-Throttles" \
  --metric-name Throttles \
  --namespace AWS/Lambda \
  --dimensions Name=FunctionName,Value=my-function \
  --statistic Sum \
  --period 60 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
Duration alarm
aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-SlowExecution" \
  --metric-name Duration \
  --namespace AWS/Lambda \
  --dimensions Name=FunctionName,Value=my-function \
  --extended-statistic p95 \
  --period 300 \
  --threshold 3000 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
Concurrency alarm
aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-HighConcurrency" \
  --metric-name ConcurrentExecutions \
  --namespace AWS/Lambda \
  --dimensions Name=FunctionName,Value=my-function \
  --statistic Maximum \
  --period 60 \
  --threshold 800 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
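
The error alarm above triggers on an absolute count, which can be noisy when traffic fluctuates. An alternative is to alarm on the error rate with metric math (Errors divided by Invocations). A sketch using the AWS SDK for JavaScript; the alarm name, 5% threshold, and SNS topic are placeholders:

Error rate alarm (metric math)
import { CloudWatchClient, PutMetricAlarmCommand } from '@aws-sdk/client-cloudwatch';

const cw = new CloudWatchClient({});

const dimensions = [{ Name: 'FunctionName', Value: 'my-function' }];
const metricStat = (name) => ({
  Metric: { Namespace: 'AWS/Lambda', MetricName: name, Dimensions: dimensions },
  Period: 300,
  Stat: 'Sum'
});

await cw.send(new PutMetricAlarmCommand({
  AlarmName: 'Lambda-HighErrorRate-Percent',
  // Only the math expression returns data, so the alarm evaluates the rate
  Metrics: [
    { Id: 'errors', MetricStat: metricStat('Errors'), ReturnData: false },
    { Id: 'invocations', MetricStat: metricStat('Invocations'), ReturnData: false },
    { Id: 'errorRate', Expression: '100 * errors / invocations', Label: 'Error rate (%)', ReturnData: true }
  ],
  Threshold: 5,
  ComparisonOperator: 'GreaterThanThreshold',
  EvaluationPeriods: 2,
  TreatMissingData: 'notBreaching',
  AlarmActions: ['arn:aws:sns:us-east-1:123456789012:alerts']
}));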

Lambda Insights

Enhanced monitoring with Lambda Insights:

Enable Lambda Insights
aws lambda update-function-configuration \
  --function-name my-function \
  --layers "arn:aws:lambda:us-east-1:580247275435:layer:LambdaInsightsExtension:38"

The execution role also needs permission to write the Insights data; attaching the CloudWatchLambdaInsightsExecutionRolePolicy managed policy covers this.

Lambda Insights provides:

  • CPU utilization
  • Memory utilization
  • Network I/O
  • Disk I/O
  • Process-level metrics
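
Insights metrics land in a separate CloudWatch namespace, so you can query and alarm on them like any other metric. A sketch assuming the LambdaInsights namespace, its memory_utilization metric, and the function_name dimension (check the exact names for your runtime in the CloudWatch console):

Query Lambda Insights metrics
import { CloudWatchClient, GetMetricStatisticsCommand } from '@aws-sdk/client-cloudwatch';

const cw = new CloudWatchClient({});

// Maximum memory utilization (%) over the last hour, in 5-minute buckets
const { Datapoints } = await cw.send(new GetMetricStatisticsCommand({
  Namespace: 'LambdaInsights',
  MetricName: 'memory_utilization',
  Dimensions: [{ Name: 'function_name', Value: 'my-function' }],
  StartTime: new Date(Date.now() - 60 * 60 * 1000),
  EndTime: new Date(),
  Period: 300,
  Statistics: ['Maximum']
}));

console.log(Datapoints);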

Dashboard Setup

Create a comprehensive dashboard:

Create CloudWatch dashboard
aws cloudwatch put-dashboard \
  --dashboard-name "LambdaMonitoring" \
  --dashboard-body '{
    "widgets": [
      {
        "type": "metric",
        "x": 0, "y": 0,
        "width": 12, "height": 6,
        "properties": {
          "title": "Invocations",
          "metrics": [
            ["AWS/Lambda", "Invocations", "FunctionName", "my-function"]
          ],
          "period": 300,
          "stat": "Sum"
        }
      },
      {
        "type": "metric",
        "x": 12, "y": 0,
        "width": 12, "height": 6,
        "properties": {
          "title": "Duration",
          "metrics": [
            ["AWS/Lambda", "Duration", "FunctionName", "my-function", {"stat": "p50"}],
            ["...", {"stat": "p95"}],
            ["...", {"stat": "p99"}]
          ],
          "period": 300
        }
      },
      {
        "type": "metric",
        "x": 0, "y": 6,
        "width": 12, "height": 6,
        "properties": {
          "title": "Errors & Throttles",
          "metrics": [
            ["AWS/Lambda", "Errors", "FunctionName", "my-function"],
            [".", "Throttles", ".", "."]
          ],
          "period": 300,
          "stat": "Sum"
        }
      }
    ]
  }'

Log Retention

Configure log retention to manage costs:

Set log retention
aws logs put-retention-policy \
  --log-group-name /aws/lambda/my-function \
  --retention-in-days 30
Retention       Use Case
1 day           Development/testing
7 days          Short-lived functions
30 days         Standard production
90 days         Compliance needs
Never expire    Audit requirements
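
Retention is set per log group, and log groups that Lambda creates default to Never expire, so new functions keep logs forever until you set a policy. One option is a small script that applies a policy to every Lambda log group; a sketch with the AWS SDK for JavaScript (the 30-day value is only an example):

Apply retention to all Lambda log groups
import {
  CloudWatchLogsClient,
  DescribeLogGroupsCommand,
  PutRetentionPolicyCommand
} from '@aws-sdk/client-cloudwatch-logs';

const logs = new CloudWatchLogsClient({});
const retentionInDays = 30;

let nextToken;
do {
  // Page through every /aws/lambda/* log group
  const page = await logs.send(new DescribeLogGroupsCommand({
    logGroupNamePrefix: '/aws/lambda/',
    nextToken
  }));

  for (const group of page.logGroups ?? []) {
    if (group.retentionInDays !== retentionInDays) {
      await logs.send(new PutRetentionPolicyCommand({
        logGroupName: group.logGroupName,
        retentionInDays
      }));
    }
  }

  nextToken = page.nextToken;
} while (nextToken);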

Debugging Tips

Check Invocations

Are requests reaching your function?

aws logs tail /aws/lambda/my-function --since 1h

Check for Errors

Look at error metrics:

aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Errors \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time $(date -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Sum

Check for Throttles

Are you hitting concurrency limits?

aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Throttles \
  ...

Review Traces

Use X-Ray to trace request flow:

aws xray get-trace-summaries \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --filter-expression 'service("my-function")'

Best Practices

Monitoring Best Practices

  1. Use structured logging - JSON for easy querying
  2. Include request IDs - Correlate logs across services
  3. Set up alarms - Errors, throttles, and duration
  4. Enable X-Ray - For distributed tracing
  5. Configure retention - Balance cost and needs
  6. Use dashboards - Visualize key metrics
  7. Monitor cold starts - Track initialization performance
  8. Set up anomaly detection - For dynamic thresholds (sketched below)
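
For item 8, dynamic thresholds come from CloudWatch anomaly detection: train a model on a metric, then alarm when the value leaves the expected band. A sketch with the AWS SDK for JavaScript, tracking Duration; the function name, band width of 2 standard deviations, and SNS topic are illustrative:

Anomaly detection alarm
import {
  CloudWatchClient,
  PutAnomalyDetectorCommand,
  PutMetricAlarmCommand
} from '@aws-sdk/client-cloudwatch';

const cw = new CloudWatchClient({});

const metric = {
  Namespace: 'AWS/Lambda',
  MetricName: 'Duration',
  Dimensions: [{ Name: 'FunctionName', Value: 'my-function' }]
};

// Train an anomaly detection model on average duration
await cw.send(new PutAnomalyDetectorCommand({
  SingleMetricAnomalyDetector: { ...metric, Stat: 'Average' }
}));

// Alarm when duration rises above the expected band
await cw.send(new PutMetricAlarmCommand({
  AlarmName: 'Lambda-Duration-Anomaly',
  ComparisonOperator: 'GreaterThanUpperThreshold',
  ThresholdMetricId: 'band',
  EvaluationPeriods: 2,
  Metrics: [
    { Id: 'm1', MetricStat: { Metric: metric, Period: 300, Stat: 'Average' }, ReturnData: true },
    { Id: 'band', Expression: 'ANOMALY_DETECTION_BAND(m1, 2)', Label: 'Expected duration', ReturnData: true }
  ],
  AlarmActions: ['arn:aws:sns:us-east-1:123456789012:alerts']
}));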
