DVA-C02 | Chapter 20 of 101 | Objective 1.3

DynamoDB Streams and Lambda Triggers

This chapter covers DynamoDB Streams and how to trigger AWS Lambda functions from them, a critical integration pattern for building event-driven architectures. DynamoDB Streams capture a time-ordered sequence of item-level modifications in a DynamoDB table, and Lambda can consume them to react to changes in near real time. This topic falls under Domain 1: Development with AWS Services and represents approximately 5-8% of DVA-C02 exam questions, often appearing in scenarios requiring real-time data processing, cross-region replication, or maintaining materialized views.

25 min read
Intermediate
Updated May 31, 2026

DynamoDB Streams: The River of Changes

Imagine a large river (DynamoDB table) with a constant flow of water (data). Every time a fish jumps (an item is created, updated, or deleted), a small, automated camera (DynamoDB Streams) captures a snapshot of that fish at the exact moment it breaks the surface. These snapshots are recorded in chronological order and stored in a series of lockers (shards) along the riverbank. Each locker holds a sequence of snapshots. You have a team of observers (Lambda functions) assigned to monitor specific lockers. When a new snapshot appears in their assigned locker, the observer immediately picks it up and processes it (e.g., updates a downstream database). The observers can request to see snapshots from the past (trim horizon) or start from the newest snapshot (latest). If an observer fails to pick up a snapshot in time (within 24 hours), the snapshot is discarded (expired). This system ensures that every change in the river is captured exactly once and can be processed in real-time, without the observers having to constantly scan the entire river.

How It Actually Works

What Are DynamoDB Streams?

DynamoDB Streams is a fully managed, ordered log of changes to the items in a DynamoDB table. When you enable a stream on a table, DynamoDB writes a stream record each time an item is created, updated, or deleted. Depending on the configured StreamViewType, each record carries the item's key attributes, its image before the change, its image after the change, or both images, plus metadata such as the event type, an approximate creation timestamp, and a sequence number. Each change appears exactly once in the stream, and changes to the same item appear in the order they were made. This guarantee describes the stream's contents: a consumer such as Lambda can still receive the same record more than once when it retries a batch.

How DynamoDB Streams Work Internally

When you enable a stream on a DynamoDB table, the service organizes the stream into shards. A shard is a container for stream records that DynamoDB creates, manages, and deletes automatically. Shards are tied to the table's partitions, so the number of shards grows as the table's partitions grow, and a shard can split into child shards; a reader must finish a parent shard before reading its children to preserve ordering. Stream records are written to each shard in the order the changes occurred, and each record is identified by a sequence number. The stream endpoint is separate from the table endpoint, and you use the DynamoDB Streams API (e.g., DescribeStream, GetShardIterator, GetRecords) to read records.

Key Stream Record Components

Each stream record contains:
- eventID: A globally unique identifier for the record.
- eventName: INSERT, MODIFY, or REMOVE.
- eventVersion: Stream record version (e.g., 1.0).
- eventSource: aws:dynamodb.
- awsRegion: Region of the table.
- dynamodb: Contains:
  - Keys: Key attributes of the item.
  - NewImage: Entire item after the change (if StreamViewType is NEW_IMAGE or NEW_AND_OLD_IMAGES).
  - OldImage: Entire item before the change (if StreamViewType is OLD_IMAGE or NEW_AND_OLD_IMAGES).
  - ApproximateCreationDateTime: Approximate time the change was recorded.
  - SequenceNumber: Monotonically increasing number within a shard.
  - SizeBytes: Size of the record.
  - StreamViewType: The view type configured.
- userIdentity: Present only on records for deletions performed by Time to Live (TTL). It identifies the DynamoDB service (type "Service", principalId "dynamodb.amazonaws.com"), not the IAM user or role that made a request.
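To make the field list concrete, here is a minimal Python sketch that copies one record from a Lambda event into a typed object. The StreamRecord class and its from_event_record method are hypothetical names; attribute values are left in DynamoDB's typed JSON form (for example {"N": "101"}).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamRecord:
    """Hypothetical typed view of one record from a DynamoDB stream Lambda event."""
    event_id: str
    event_name: str                       # "INSERT", "MODIFY" or "REMOVE"
    keys: dict
    sequence_number: str
    size_bytes: int
    view_type: str
    new_image: Optional[dict] = None      # missing for KEYS_ONLY, OLD_IMAGE, and REMOVE
    old_image: Optional[dict] = None      # missing for KEYS_ONLY, NEW_IMAGE, and INSERT
    user_identity: Optional[dict] = None  # present only for TTL deletions

    @classmethod
    def from_event_record(cls, record: dict) -> "StreamRecord":
        ddb = record["dynamodb"]
        return cls(
            event_id=record["eventID"],
            event_name=record["eventName"],
            keys=ddb["Keys"],
            sequence_number=ddb["SequenceNumber"],
            size_bytes=ddb["SizeBytes"],
            view_type=ddb["StreamViewType"],
            new_image=ddb.get("NewImage"),
            old_image=ddb.get("OldImage"),
            user_identity=record.get("userIdentity"),
        )
```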

Stream View Types

You choose a StreamViewType when enabling streams:
- KEYS_ONLY: Only the key attributes of the modified item.
- NEW_IMAGE: The entire item after it was modified.
- OLD_IMAGE: The entire item before it was modified.
- NEW_AND_OLD_IMAGES: Both the before and after images.

Default: KEYS_ONLY. The exam often tests the trade-offs: NEW_AND_OLD_IMAGES provides the most data but increases stream record size and cost.
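A small lookup makes the view-type trade-off explicit. This hypothetical helper reports which images a record should carry; it also encodes that an INSERT never has an old image and a REMOVE never has a new image, whatever the view type.

```python
# (has_old_image, has_new_image) for each StreamViewType
VIEW_TYPE_IMAGES = {
    "KEYS_ONLY": (False, False),
    "NEW_IMAGE": (False, True),
    "OLD_IMAGE": (True, False),
    "NEW_AND_OLD_IMAGES": (True, True),
}


def expected_images(view_type: str, event_name: str) -> set:
    """Return the image fields a record should contain (Keys are always present)."""
    has_old, has_new = VIEW_TYPE_IMAGES[view_type]
    images = set()
    # An INSERT has no "before" state, and a REMOVE has no "after" state.
    if has_old and event_name != "INSERT":
        images.add("OldImage")
    if has_new and event_name != "REMOVE":
        images.add("NewImage")
    return images
```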

Enabling DynamoDB Streams

You can enable streams via the AWS Management Console, AWS CLI, or CloudFormation. Using the AWS CLI:

```bash
aws dynamodb update-table --table-name MyTable --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
```

The response's TableDescription includes LatestStreamArn and LatestStreamLabel. The stream ARN has the format: arn:aws:dynamodb:<region>:<account-id>:table/<table-name>/stream/<timestamp>
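Because later steps need the stream ARN, it can help to split it into its parts. This sketch assumes the ARN format given above; the parse_stream_arn name is hypothetical. Note that the trailing timestamp label itself contains colons, so the split must stop after the fifth colon.

```python
from typing import NamedTuple


class StreamArn(NamedTuple):
    region: str
    account_id: str
    table_name: str
    stream_label: str  # the <timestamp> suffix


def parse_stream_arn(arn: str) -> StreamArn:
    """Split arn:aws:dynamodb:<region>:<account-id>:table/<table>/stream/<timestamp>."""
    # maxsplit=5 keeps the colons inside the timestamp label intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "dynamodb":
        raise ValueError(f"not a DynamoDB ARN: {arn}")
    resource = parts[5].split("/")
    if len(resource) != 4 or resource[0] != "table" or resource[2] != "stream":
        raise ValueError(f"not a DynamoDB stream ARN: {arn}")
    return StreamArn(parts[3], parts[4], resource[1], resource[3])
```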

Reading from a Stream (Without Lambda)

To read stream records manually:
1. Call DescribeStream to get the shard IDs.
2. For each shard, call GetShardIterator with the shard ID and an iterator type (TRIM_HORIZON, LATEST, AT_SEQUENCE_NUMBER, or AFTER_SEQUENCE_NUMBER).
3. Call GetRecords with the shard iterator to retrieve a batch of records (up to 1,000 records or 1 MB per call).
4. Use the NextShardIterator in the response to keep reading; it is absent once a closed shard has been fully read.

Iterator types:
- TRIM_HORIZON: Start from the oldest unexpired record.
- LATEST: Start from the newest record (only records added after this call).
- AT_SEQUENCE_NUMBER: Start exactly at a given sequence number.
- AFTER_SEQUENCE_NUMBER: Start after a given sequence number.
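The manual read loop can be sketched in Python against the boto3 "dynamodbstreams" client (describe_stream, get_shard_iterator, get_records). The list_shard_ids and read_shard names are hypothetical, and the client is passed in so the code can be exercised with a stub. A production reader would also need to process parent shards before child shards and keep polling open shards on a timer.

```python
def list_shard_ids(client, stream_arn):
    """Collect every shard ID in the stream, following DescribeStream pagination."""
    shard_ids, start_after = [], None
    while True:
        kwargs = {"StreamArn": stream_arn}
        if start_after:
            kwargs["ExclusiveStartShardId"] = start_after
        desc = client.describe_stream(**kwargs)["StreamDescription"]
        shard_ids.extend(shard["ShardId"] for shard in desc["Shards"])
        start_after = desc.get("LastEvaluatedShardId")
        if not start_after:
            return shard_ids


def read_shard(client, stream_arn, shard_id, iterator_type="TRIM_HORIZON", max_calls=10):
    """Read up to max_calls pages of records from one shard."""
    iterator = client.get_shard_iterator(
        StreamArn=stream_arn, ShardId=shard_id, ShardIteratorType=iterator_type
    )["ShardIterator"]
    records = []
    for _ in range(max_calls):
        if not iterator:                 # shard closed and fully read
            break
        page = client.get_records(ShardIterator=iterator, Limit=1000)
        records.extend(page["Records"])
        # An open shard keeps returning a NextShardIterator, even when Records is empty.
        iterator = page.get("NextShardIterator")
    return records


# Usage (requires AWS credentials):
# import boto3
# streams = boto3.client("dynamodbstreams")
# for shard_id in list_shard_ids(streams, stream_arn):
#     print(read_shard(streams, stream_arn, shard_id))
```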

Lambda Triggers from DynamoDB Streams

Lambda can be triggered automatically by DynamoDB Streams. You create an event source mapping that connects a Lambda function to a DynamoDB stream. The Lambda service polls the stream on your behalf, reads batches of records, and invokes your function synchronously (RequestResponse invocation), not asynchronously.

Key configuration parameters:
- Batch Size: Maximum number of records to read from the stream per invocation (default 100, max 10,000).
- Batch Window: Maximum time to wait to gather a full batch (default 0 seconds, max 300 seconds).
- Starting Position: TRIM_HORIZON or LATEST (similar to the iterator type).
- Maximum Retry Attempts: Number of times to retry a failed batch (default -1, meaning infinite).
- Maximum Record Age: Maximum age of a record before Lambda stops retrying it (default -1, meaning infinite, which in practice means until the record expires from the stream after 24 hours).
- Bisect Batch on Function Error: If true, a failed batch is split in two and each half is retried separately.
- On-failure destination: An SQS queue or SNS topic that receives details about a batch that could not be processed after all retries. For stream sources, the message describes the failed batch (shard ID and sequence-number range) rather than carrying the records themselves.
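These parameters map onto the keyword arguments of boto3's lambda create_event_source_mapping call. This sketch builds and range-checks them before the call; build_mapping_params is a hypothetical helper. The bounds for batch size and window come from this chapter, while the retry (up to 10,000) and record-age (60 to 604,800 seconds) ranges reflect Lambda's documented limits at the time of writing and should be confirmed against current documentation.

```python
def build_mapping_params(function_name, stream_arn, starting_position, batch_size=100,
                         batch_window=0, max_retries=-1, max_record_age=-1,
                         bisect=False, on_failure_arn=None):
    """Hypothetical helper: kwargs for boto3's lambda create_event_source_mapping."""
    if starting_position not in ("TRIM_HORIZON", "LATEST"):
        raise ValueError("DynamoDB streams support TRIM_HORIZON or LATEST")
    if not 1 <= batch_size <= 10_000:
        raise ValueError("BatchSize must be 1-10,000")
    if not 0 <= batch_window <= 300:
        raise ValueError("batch window must be 0-300 seconds")
    if not -1 <= max_retries <= 10_000:
        raise ValueError("MaximumRetryAttempts must be -1 (infinite) or 0-10,000")
    if max_record_age != -1 and not 60 <= max_record_age <= 604_800:
        raise ValueError("MaximumRecordAgeInSeconds must be -1 or 60-604,800")
    params = {
        "FunctionName": function_name,
        "EventSourceArn": stream_arn,
        "StartingPosition": starting_position,
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": batch_window,
        "MaximumRetryAttempts": max_retries,
        "MaximumRecordAgeInSeconds": max_record_age,
        "BisectBatchOnFunctionError": bisect,
    }
    if on_failure_arn:
        params["DestinationConfig"] = {"OnFailure": {"Destination": on_failure_arn}}
    return params


# Usage (requires AWS credentials):
# import boto3
# boto3.client("lambda").create_event_source_mapping(**build_mapping_params(...))
```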

Lambda Execution Role Permissions

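The Lambda function's execution role must have permissions for:
- dynamodb:DescribeStream
- dynamodb:GetRecords
- dynamodb:GetShardIterator
- dynamodb:ListStreams

And optionally:
- kms:Decrypt if the table uses AWS KMS encryption.

The AWS managed policy AWSLambdaDynamoDBExecutionRole grants the four stream permissions along with CloudWatch Logs permissions.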
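As a sketch, the required permissions can be expressed as an IAM policy document built in Python. Scoping the three read actions to a single stream ARN, while leaving ListStreams on "*", is an assumption of this example, as is the stream_reader_policy name; CloudWatch Logs permissions and the optional kms:Decrypt are omitted.

```python
import json


def stream_reader_policy(stream_arn: str) -> dict:
    """Hypothetical helper: least-privilege policy for reading one DynamoDB stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeStream",
                    "dynamodb:GetRecords",
                    "dynamodb:GetShardIterator",
                ],
                "Resource": stream_arn,
            },
            {
                # Assumption: ListStreams is left unscoped for simplicity.
                "Effect": "Allow",
                "Action": ["dynamodb:ListStreams"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    arn = "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable/stream/2015-06-27T00:48:05.899"
    print(json.dumps(stream_reader_policy(arn), indent=2))
```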

Event Payload Example

The Lambda event payload is a JSON object with an array of Records. Example:

```json
{
  "Records": [
    {
      "eventID": "1",
      "eventName": "INSERT",
      "eventVersion": "1.0",
      "eventSource": "aws:dynamodb",
      "awsRegion": "us-east-1",
      "dynamodb": {
        "Keys": {
          "Id": {
            "N": "101"
          }
        },
        "NewImage": {
          "Id": {
            "N": "101"
          },
          "Message": {
            "S": "Hello"
          }
        },
        "SequenceNumber": "111",
        "SizeBytes": 26,
        "StreamViewType": "NEW_AND_OLD_IMAGES"
      },
      "eventSourceARN": "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable/stream/2015-06-27T00:48:05.899"
    }
  ]
}
```
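Attribute values in the payload use DynamoDB's typed JSON. boto3 ships a full converter (boto3.dynamodb.types.TypeDeserializer); the dependency-free sketch below, with hypothetical function names, covers only the common types and rejects binary types as an intentional simplification.

```python
from decimal import Decimal


def from_dynamodb_json(value: dict):
    """Convert one typed attribute value such as {"N": "101"} into a Python value."""
    (type_tag, raw), = value.items()   # each attribute value has exactly one type tag
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)            # exact numeric precision, as boto3 does
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: from_dynamodb_json(v) for k, v in raw.items()}
    if type_tag == "L":
        return [from_dynamodb_json(v) for v in raw]
    if type_tag == "SS":
        return set(raw)
    if type_tag == "NS":
        return {Decimal(n) for n in raw}
    raise ValueError(f"unsupported type tag: {type_tag}")  # e.g. B and BS (binary)


def image_to_dict(image: dict) -> dict:
    """Convert a whole Keys, NewImage, or OldImage map."""
    return {name: from_dynamodb_json(v) for name, v in image.items()}
```

Applied to the example, image_to_dict on the NewImage yields {"Id": Decimal("101"), "Message": "Hello"}.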

Error Handling and Retries

When a Lambda invocation fails (the function returns an error or times out), Lambda retries the entire batch, and it does not process further records from that shard until the batch succeeds or is discarded. The retry behavior is governed by:
- Maximum Retry Attempts: By default, retries are infinite. You can set a finite number.
- Maximum Record Age: Once a record's age exceeds this value (in seconds), Lambda stops retrying it. Default is infinite.
- Bisect Batch on Function Error: If enabled, a failed batch is split into two halves, each retried separately. This isolates problematic records.
- On-failure destination: You can specify an SQS queue or SNS topic that receives details of batches that fail after all retries.
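To see why bisecting isolates a bad record, here is a simplified model in Python. It is not Lambda's implementation: it ignores retry counts and record age, and assumes a hypothetical process function that fails whenever its batch contains a poison record.

```python
def isolate_failures(batch, process):
    """Simplified model of BisectBatchOnFunctionError.

    Returns (succeeded, failed) lists of records.
    """
    try:
        process(batch)
        return list(batch), []
    except Exception:
        if len(batch) <= 1:
            return [], list(batch)     # a single bad record: report it as failed
        mid = len(batch) // 2
        ok_left, bad_left = isolate_failures(batch[:mid], process)
        ok_right, bad_right = isolate_failures(batch[mid:], process)
        return ok_left + ok_right, bad_left + bad_right
```

In the model, as in Lambda, records from a failed batch are processed again inside the halves, which is one more reason the function must be idempotent.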

Monitoring and Metrics

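CloudWatch metrics for DynamoDB Streams include:
- ReturnedRecordsCount: Number of records returned by GetRecords.
- ReturnedBytes: Bytes returned by GetRecords.
- UserErrors: Requests that generated an HTTP 400 error (reported for the account, not per stream).
- SystemErrors: Requests that generated an HTTP 500 (internal) error.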

Lambda monitoring includes:
- IteratorAge: The age of the last record in each batch when Lambda received it (in milliseconds). A growing value indicates the function is falling behind.
- Throttles: Invocations rejected because a concurrency limit was reached.
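IteratorAge is computed by Lambda, but a function can log a comparable figure itself. This sketch assumes the record's dynamodb.ApproximateCreationDateTime field (not shown in the example payload) arrives as epoch seconds, which is how it appears in Lambda events; iterator_age_ms is a hypothetical name.

```python
import time
from typing import Optional


def iterator_age_ms(last_record: dict, now: Optional[float] = None) -> int:
    """Approximate age of the last record in a batch, in milliseconds."""
    created = last_record["dynamodb"]["ApproximateCreationDateTime"]  # epoch seconds
    if now is None:
        now = time.time()
    return max(0, int((now - created) * 1000))
```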

Interaction with TTL (Time to Live)

DynamoDB TTL automatically deletes expired items. If streams are enabled, these deletions generate stream records with eventName = "REMOVE". The stream record includes the item's content before deletion if StreamViewType is OLD_IMAGE or NEW_AND_OLD_IMAGES. TTL deletions can be told apart from user deletes by their userIdentity field, which has type "Service" and principalId "dynamodb.amazonaws.com". This is commonly used to trigger cleanup or archiving actions after an item expires.
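A handler that must treat expirations differently from explicit deletes can check the userIdentity field. A minimal sketch, with a hypothetical function name:

```python
def is_ttl_deletion(record: dict) -> bool:
    """True if a stream record was produced by TTL expiry rather than a user delete."""
    identity = record.get("userIdentity") or {}
    return (
        record.get("eventName") == "REMOVE"
        and identity.get("type") == "Service"
        and identity.get("principalId") == "dynamodb.amazonaws.com"
    )
```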

Limits and Performance

Stream retention: 24 hours. Records older than 24 hours are automatically deleted (trimmed).

Maximum records per GetRecords call: 1,000 records or 1 MB (whichever is smaller).

Shard iterator lifetime: A shard iterator expires 15 minutes after it is returned, so request a new one rather than holding an idle iterator.

Throughput: Stream throughput scales with table partitions. AWS recommends that no more than two processes read from the same shard at the same time; additional readers can be throttled.

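Lambda concurrency: By default, Lambda processes one batch per shard at a time (sequential processing per shard), while multiple shards are processed concurrently. The ParallelizationFactor setting (1-10) allows up to 10 concurrent batches per shard while still keeping changes to the same partition key in order.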
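The concurrency ceiling for one mapping is simple arithmetic. This hypothetical helper multiplies open shards by ParallelizationFactor (1 to 10, default 1); treat the result as an upper bound, since reserved or account concurrency can cap it lower.

```python
def max_concurrent_batches(open_shards: int, parallelization_factor: int = 1) -> int:
    """Upper bound on simultaneous invocations for one DynamoDB stream mapping."""
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    return open_shards * parallelization_factor
```

For example, a stream with 4 open shards allows at most 4 concurrent invocations by default, or 40 with a ParallelizationFactor of 10.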

Best Practices

Idempotency: Since Lambda may retry batches, your function should be idempotent (processing the same record multiple times should produce the same result).
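One way to get idempotency is to remember which eventIDs have already been applied. In this sketch the in-memory set is only a stand-in for a durable store; a real function would need something that survives across execution environments, such as a DynamoDB table written with a conditional put. The IdempotentProcessor class and its apply callback are hypothetical.

```python
class IdempotentProcessor:
    """Hypothetical wrapper that applies each stream record at most once per eventID."""

    def __init__(self, apply):
        self.apply = apply     # side-effecting callback, e.g. a downstream write
        self.seen = set()      # stand-in for a durable "processed IDs" table

    def handle(self, records) -> int:
        applied = 0
        for record in records:
            event_id = record["eventID"]
            if event_id in self.seen:
                continue                 # duplicate from a retried batch: skip it
            self.apply(record)
            self.seen.add(event_id)      # mark only after the side effect succeeded
            applied += 1
        return applied
```

Marking a record only after its side effect succeeds means a crash between the two steps can still cause one repeat, so the side effect itself should tolerate being applied twice where possible.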

Batch Window: Use a batch window to accumulate records and reduce invocation frequency, at the cost of added latency.

Error Handling: Always implement error handling and consider using bisect batch for large batches.

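Monitor IteratorAge: If IteratorAge grows, your function is not keeping up. Speed up the function (for example, with more memory or fewer downstream calls), increase ParallelizationFactor, or tune the batch size; extra reserved concurrency helps only when invocations are actually being throttled.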

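Use Destinations: Configure an on-failure destination so that batches skipped after retries are recorded instead of silently lost. The destination receives the batch's shard and sequence-number details, which you can use to investigate while the records are still within the 24-hour retention window.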

Walk-Through

1. Enable DynamoDB Streams

First, you must enable the stream on the DynamoDB table. You specify the StreamViewType: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, or NEW_AND_OLD_IMAGES. This creates a stream with its own ARN, and DynamoDB begins capturing changes. The stream is enabled at the table level, and all changes (INSERT, MODIFY, REMOVE) are recorded. Use the AWS CLI: `aws dynamodb update-table --table-name MyTable --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES`. The response includes the stream ARN (LatestStreamArn), which you need for the next step.

2. Create Lambda Function

Write and deploy a Lambda function that will process stream records. The function receives an event object containing an array of records. Each record has fields like eventID, eventName, dynamodb (with NewImage, OldImage, Keys). The function should be idempotent and handle errors gracefully. Ensure the execution role has permissions for dynamodb:DescribeStream, dynamodb:GetRecords, dynamodb:GetShardIterator, and dynamodb:ListStreams.
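A minimal handler skeleton might look like the following. The on_insert, on_modify, and on_remove functions are hypothetical placeholders for your downstream logic; lambda_handler(event, context) is the standard Python Lambda entry-point signature.

```python
import json


def on_insert(keys, new_image):
    """Hypothetical placeholder: react to a newly created item."""


def on_modify(keys, old_image, new_image):
    """Hypothetical placeholder: react to an updated item."""


def on_remove(keys, old_image):
    """Hypothetical placeholder: react to a deleted (or TTL-expired) item."""


def lambda_handler(event, context):
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):   # tolerate an empty batch
        ddb = record["dynamodb"]
        name = record["eventName"]
        if name == "INSERT":
            on_insert(ddb["Keys"], ddb.get("NewImage"))
        elif name == "MODIFY":
            on_modify(ddb["Keys"], ddb.get("OldImage"), ddb.get("NewImage"))
        else:  # "REMOVE"
            on_remove(ddb["Keys"], ddb.get("OldImage"))
        counts[name] += 1
    # An uncaught exception above fails the whole batch, and Lambda retries it.
    print(json.dumps(counts))
    return counts
```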

3. Create Event Source Mapping

Create an event source mapping that connects the Lambda function to the DynamoDB stream. This tells Lambda to poll the stream and invoke your function with new records. Use the AWS CLI: `aws lambda create-event-source-mapping --function-name my-function --event-source-arn <stream-arn> --starting-position TRIM_HORIZON --batch-size 100`. The mapping can be created via console, CLI, or CloudFormation. You can configure batch size, batch window, starting position, and error handling settings.

4. Lambda Polls the Stream

Once the event source mapping is active, the Lambda service begins polling the DynamoDB stream. It uses the DynamoDB Streams API to read records from each shard. The polling is internal to AWS; you do not manage it manually. By default, Lambda processes each shard sequentially (one invocation at a time per shard), while multiple shards are processed concurrently. The service handles shard discovery and load balancing.

5. Invoke Lambda with Batch of Records

When Lambda has a batch of records (based on batch size and batch window), it invokes your function synchronously. The function receives the event payload and processes each record. If the function succeeds, Lambda marks the records as processed and moves the shard iterator forward. If the function fails (throws an exception or times out), Lambda retries the batch based on the configured retry policy (max retry attempts, maximum record age, bisect batch).
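The per-shard behavior in steps 4 and 5 can be modeled in a few lines. This is a simplified sketch, not Lambda's code: invoke is a hypothetical function that raises on failure, max_retries plays the role of a finite MaximumRetryAttempts (the default of -1 would keep retrying and block the shard until the records expire), and failed batches are collected where Lambda would notify the on-failure destination.

```python
def consume_shard(batches, invoke, max_retries):
    """Simplified model of one shard under an event source mapping.

    Batches are delivered strictly in order; the next batch is not sent until the
    current one succeeds or exhausts its retries.
    """
    succeeded, failed = [], []
    for batch in batches:
        for _ in range(max_retries + 1):   # first attempt plus max_retries retries
            try:
                invoke(batch)
            except Exception:
                continue                   # retry the same batch
            succeeded.append(batch)        # the shard position moves past this batch
            break
        else:
            failed.append(batch)           # Lambda would notify the on-failure destination
    return succeeded, failed
```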

What This Looks Like on the Job

Scenario 1: Real-Time Data Replication

A global e-commerce company needs to replicate order data from a primary DynamoDB table in us-east-1 to a secondary table in eu-west-1 for disaster recovery and low-latency reads. They enable DynamoDB Streams on the primary table with StreamViewType=NEW_AND_OLD_IMAGES. A Lambda function triggered by the stream writes each change to the secondary table using the DynamoDB PutItem or UpdateItem API. The function runs in us-east-1, the same region as the primary table and its stream, and writes across regions to eu-west-1. They configure the event source mapping with a batch size of 500 and a batch window of 10 seconds to balance throughput and cost. They monitor the IteratorAge metric; if it exceeds 5 seconds, they increase ParallelizationFactor or tune the function. They also implement idempotency with conditional writes (e.g., ConditionExpression: attribute_not_exists(version) OR version < :new_version) so that retried records cannot overwrite newer data. This setup replicates millions of orders per day, with end-to-end latency bounded mainly by the 10-second batch window.
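The conditional write in this scenario can be sketched as the keyword arguments for a low-level boto3 put_item call. The version attribute name and the replicate_put_kwargs helper are hypothetical; the target table's client would be created for eu-west-1.

```python
def replicate_put_kwargs(table_name: str, new_image: dict) -> dict:
    """Build put_item kwargs that only overwrite an older copy of the item.

    new_image is the stream record's NewImage in DynamoDB JSON and is assumed to
    carry a numeric "version" attribute.
    """
    return {
        "TableName": table_name,
        "Item": new_image,
        "ConditionExpression": "attribute_not_exists(#v) OR #v < :new_version",
        "ExpressionAttributeNames": {"#v": "version"},
        "ExpressionAttributeValues": {":new_version": new_image["version"]},
    }


# Usage (requires AWS credentials):
# import boto3
# replica = boto3.client("dynamodb", region_name="eu-west-1")
# replica.put_item(**replicate_put_kwargs("OrdersReplica", record["dynamodb"]["NewImage"]))
```

When the condition fails, put_item raises ConditionalCheckFailedException, which here means the replica already holds this or a newer version, so the function can treat it as success.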

Scenario 2: Real-Time Analytics Pipeline

A social media platform tracks user interactions (likes, comments, shares) stored in a DynamoDB table. They want to update a leaderboard in Amazon ElastiCache in real time. They enable streams with StreamViewType=NEW_IMAGE on the interactions table. A Lambda function processes each stream record and increments a counter in Redis (e.g., ZINCRBY). They use a batch size of 1000 and a batch window of 5 seconds. The function handles spikes in traffic (e.g., during a viral post) by scaling Lambda concurrency. They configure the Lambda function with a reserved concurrency of 500 to avoid throttling. They also set MaximumRetryAttempts to 3 and MaximumRecordAge to 60 seconds to avoid acting on stale data. If a batch still fails after retries, an SQS on-failure destination records its shard and sequence-number details for manual inspection. This pipeline processes 10,000 events per second with an average latency of 200ms.
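Since ZINCRBY accepts an increment, a batch can be collapsed into one increment per key before touching Redis. This sketch assumes a hypothetical schema in which each interaction item has a string PostId attribute and only INSERTs count as new interactions; leaderboard_increments is a hypothetical name.

```python
from collections import Counter


def leaderboard_increments(records) -> Counter:
    """Collapse a batch of stream records into one increment per post."""
    totals = Counter()
    for record in records:
        if record["eventName"] != "INSERT":
            continue                     # only new interactions count
        post_id = record["dynamodb"]["NewImage"]["PostId"]["S"]
        totals[post_id] += 1
    return totals
```

Each (post, count) pair would then become one ZINCRBY call (for example, zincrby("leaderboard", count, post_id) in redis-py). Increments are not idempotent, so a retried batch would double-count unless the function also tracks processed eventIDs.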

Scenario 3: Cross-Region Replication with Conflict Resolution

A multinational corporation maintains user profiles in DynamoDB tables in multiple regions for low latency. They use DynamoDB Streams and Lambda to replicate changes across regions. However, conflicts can arise when the same item is updated in two regions simultaneously. They implement a last-writer-wins strategy using a timestamp attribute. The Lambda function checks the timestamp in the NewImage against the current item in the target table. If the incoming change is older, it is ignored. They also use conditional writes to ensure atomic updates. They configure the stream with StreamViewType=NEW_AND_OLD_IMAGES to compare timestamps. They set the Lambda function timeout to 30 seconds and batch size to 100. They monitor CloudWatch metrics for errors and iterator age. This setup handles 50 million user profiles with 99.99% uptime.
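The last-writer-wins check itself is small. This sketch assumes a hypothetical numeric updatedAt attribute (epoch milliseconds, in DynamoDB JSON) on every item, and compares the incoming NewImage with the item currently in the target table; should_apply is a hypothetical name.

```python
from decimal import Decimal
from typing import Optional


def should_apply(incoming_image: dict, current_item: Optional[dict],
                 ts_attr: str = "updatedAt") -> bool:
    """Last-writer-wins: apply the incoming change only if it is strictly newer."""
    if current_item is None:
        return True                      # the target region has no copy yet
    incoming_ts = Decimal(incoming_image[ts_attr]["N"])
    current_ts = Decimal(current_item[ts_attr]["N"])
    # Equal timestamps are rejected: that is most likely our own write echoing back.
    return incoming_ts > current_ts
```

Requiring a strictly newer timestamp also stops a replicated write from bouncing back: when the target region's stream emits that write, its timestamp equals the original's and the source region ignores it.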

How DVA-C02 Actually Tests This

What DVA-C02 Tests on DynamoDB Streams and Lambda Triggers

The DVA-C02 exam (Domain 1: Development with AWS Services, Objective 1.3: Use data stores in application development) tests your understanding of how to configure and consume DynamoDB Streams with Lambda. Key areas:

StreamViewType options and their impact on data volume and cost.

Event source mapping configuration: batch size, batch window, starting position.

Error handling: retry behavior, bisect batch, maximum record age, destinations.

IAM permissions required for Lambda to read from the stream.

TTL integration: TTL deletions generate stream records with eventName REMOVE.

Monitoring: IteratorAge metric, Throttles.

Common Wrong Answers and Why Candidates Choose Them

1. Choosing KEYS_ONLY for a use case requiring full item data: Candidates often pick KEYS_ONLY to reduce cost, but the scenario requires the entire item (e.g., for replication). They overlook the need for NEW_IMAGE or NEW_AND_OLD_IMAGES.

2. Setting Starting Position to LATEST when the function must process existing records: LATEST only processes records written after the mapping is created. If the function must process records written before the mapping (and still within the 24-hour retention window), TRIM_HORIZON is needed.

3. Assuming Lambda processes records in order across all shards: Records within a shard are ordered, so changes to the same item are always delivered in order, but across shards order is not guaranteed. An answer that relies on forcing a DynamoDB stream onto a single shard is wrong, because DynamoDB Streams does not let you control the shard count; for scenarios that only need per-item ordering, the default behavior is already sufficient.

4. Configuring a dead-letter queue (DLQ) on the Lambda function instead of using Destinations: A function-level DLQ applies only to asynchronous invocations, so it never receives failures from a stream event source mapping. For stream-based triggers, configure an on-failure destination (SQS or SNS) on the event source mapping instead.

Specific Numbers and Terms to Memorize

Stream retention: 24 hours.

Maximum batch size: 10,000 records (default 100).

Maximum batch window: 300 seconds (5 minutes).

Default MaximumRetryAttempts: -1 (infinite).

Default MaximumRecordAge: -1 (infinite).

IteratorAge: CloudWatch metric indicating how old the last processed record is.

StreamViewType: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, NEW_AND_OLD_IMAGES.

Starting position: TRIM_HORIZON, LATEST.

Edge Cases and Exceptions

TTL and Streams: TTL deletions generate stream records with eventName REMOVE. The record includes the item's content before deletion if StreamViewType includes OLD_IMAGE. This is a common exam scenario for cleanup tasks.

Empty batches: Lambda normally invokes your function only when it has at least one record, but a defensive handler should still treat an empty or missing Records list as success and return immediately.

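Throttling: If Lambda concurrency is exhausted, invocations are throttled. The stream records are not lost as long as they are processed within the 24-hour retention window, but IteratorAge increases. The exam may ask how to mitigate: increase reserved concurrency (or the account concurrency limit) so the function has capacity for every shard.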

KMS Encryption: If the table uses AWS managed KMS key, Lambda needs kms:Decrypt permission.

How to Eliminate Wrong Answers

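If the scenario requires capturing every table change exactly once, with per-item ordering, favor DynamoDB Streams: each change appears exactly once in the stream, whereas SQS standard queues and Kinesis Data Streams can deliver duplicates. Remember that Lambda processing of any stream is still at-least-once, so an exactly-once outcome also requires an idempotent function.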

If the scenario requires processing changes in real-time, eliminate options that involve batch jobs (e.g., EMR, Glue).

If the scenario involves cross-region replication, prefer DynamoDB global tables (which are built on streams) or a streams-based solution, and eliminate options that rely on a separate polling or scanning mechanism.

Key Takeaways

DynamoDB Streams capture item-level changes (INSERT, MODIFY, REMOVE) with a 24-hour retention period.

StreamViewType determines what data is included: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, or NEW_AND_OLD_IMAGES.

Lambda can be triggered by DynamoDB Streams via an event source mapping with configurable batch size (max 10,000) and batch window (max 300 seconds).

Records within a shard are ordered; across shards, order is not guaranteed.

Error handling options: MaximumRetryAttempts, MaximumRecordAge, BisectBatchOnFunctionError, and Destinations (SQS/SNS).

TTL deletions generate stream records with eventName REMOVE and include the item's old image if configured.

Lambda execution role must have dynamodb:DescribeStream, dynamodb:GetRecords, dynamodb:GetShardIterator, and dynamodb:ListStreams permissions.

Monitor IteratorAge CloudWatch metric to detect if Lambda is falling behind.

Starting position must be specified when creating the mapping: TRIM_HORIZON (oldest unexpired record) or LATEST (new records only).

By default, each shard processes one Lambda invocation at a time while multiple shards invoke concurrently; ParallelizationFactor raises this to up to 10 concurrent batches per shard.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

DynamoDB Streams with Lambda

Enabled per table with a single setting; DynamoDB manages the shards.

Each change appears exactly once in the stream, in order per item.

24-hour retention; cannot be extended.

No ability to replay records beyond 24 hours.

Integrated with DynamoDB TTL for automatic deletion events.

Amazon Kinesis Data Streams with Lambda

Created and sized separately: you manage the shard count in provisioned mode, or use on-demand capacity mode.

At-least-once delivery; possible duplicates.

Retention configurable from 1 day to 365 days (extended retention).

Supports replay of records via shard iterators (up to retention period).

No native TTL integration; must implement custom expiry.

Watch Out for These

Mistake

DynamoDB Streams records are retained indefinitely.

Correct

Stream records are retained for a maximum of 24 hours. After that, they are automatically deleted (trimmed). You must process records within 24 hours or they are lost.

Mistake

Lambda processes records from all shards in strict order.

Correct

Records within a single shard are ordered, so changes to the same item are always in order, but across shards order is not guaranteed. You cannot limit a DynamoDB stream to a single shard, so if you need global ordering, you must use a different service, such as a Kinesis data stream with a single shard that receives the events directly.

Mistake

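Setting MaximumRetryAttempts to 0 is a safe way to keep a failing batch from blocking the shard.

Correct

MaximumRetryAttempts of 0 means zero retries: if a batch fails once, Lambda skips it (sending its details to the on-failure destination, if one is configured) and moves on, so those records are lost unless you re-read them manually. The default is -1 (infinite retries until the records expire). A safer middle ground is a small finite retry count combined with BisectBatchOnFunctionError and an on-failure destination.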

Mistake

You must create the Lambda function before enabling the stream.

Correct

You can enable the stream at any time. The event source mapping can be created after both the stream and function exist. The stream captures records from the moment it is enabled, regardless of when the mapping is created.

Mistake

Lambda can be triggered by streams on a DynamoDB table in a different AWS account.

Correct

The event source mapping must be in the same account as the DynamoDB table. To deliver changes to another account, run the function in the table's account and have it write to resources in the other account (for example, by assuming a cross-account IAM role).


Frequently Asked Questions

What is the difference between DynamoDB Streams and Kinesis Data Streams for Lambda triggers?

DynamoDB Streams is tightly integrated with DynamoDB tables and captures changes with no shard management on your part. Each change appears exactly once in the stream, but retention is fixed at 24 hours. Kinesis Data Streams requires you to manage capacity (shards in provisioned mode, or on-demand mode), may contain duplicate records, and supports configurable retention up to 365 days. Use DynamoDB Streams for simple, real-time reactions to DynamoDB changes; use Kinesis for complex streaming analytics or when you need longer retention.

How do I handle duplicate records from DynamoDB Streams in Lambda?

Each change appears only once in a DynamoDB stream, but Lambda retries a failed batch in full, so records that were already handled can be delivered again (and bisecting a batch re-delivers the records in both halves). Make your function idempotent by using the eventID as a unique identifier and checking whether the record has already been processed (e.g., by storing processed IDs in a DynamoDB table with a conditional write).

Can I process DynamoDB Stream records in order across all items?

No. DynamoDB Streams guarantees order only within a single shard, which means changes to the same item are always in order. Records from different shards (which correspond to different table partitions) may be processed out of order. Because you cannot limit a DynamoDB stream to a single shard, global ordering requires a different service, such as a Kinesis data stream with a single shard.

What happens if my Lambda function fails to process a batch from DynamoDB Streams?

Lambda will retry the batch based on the MaximumRetryAttempts and MaximumRecordAge settings. By default, retries are infinite until the records expire (24 hours), which blocks further processing of that shard. You can enable BisectBatchOnFunctionError to split the batch and isolate problematic records. You can also configure an on-failure destination (SQS or SNS) that receives details of batches that fail after all retries.

How do I enable DynamoDB Streams on an existing table?

You can enable streams via the console (Table > Exports and streams > Manage stream) or using the AWS CLI: `aws dynamodb update-table --table-name MyTable --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES`. The table remains available during the update. Once enabled, you can create a Lambda event source mapping referencing the stream ARN.

What is the cost of DynamoDB Streams?

Reading from a stream directly is billed in streams read request units: each GetRecords call counts as one unit and can return up to 1 MB of data. GetRecords calls made by Lambda as part of DynamoDB triggers are not charged, and there is no cost for enabling the stream or for writing records to it. Lambda invocations incur standard Lambda costs. The StreamViewType affects the size of each record, and therefore how much data each invocation receives and processes.

Can I use DynamoDB Streams with global tables?

Yes, DynamoDB Global Tables use streams internally to replicate changes across regions. You can also set up your own cross-region replication using streams and Lambda, but Global Tables handle this automatically. If you enable streams on a global table, you will see stream records for both local writes and replicated writes from other regions.
