SAA-C03Chapter 143 of 189Objective 3.2

Kinesis Data Streams: Shards and Consumers

For streaming applications ingesting large volumes of real-time data, Kinesis Data Streams shards and consumers — the core components that determine throughput, scalability, and cost — are fundamental. As a key topic under Domain 3 (High Performance), it appears in roughly 10-15% of SAA-C03 exam questions, often in scenarios involving real-time data ingestion, log aggregation, or stream processing. You will learn the internal mechanics of shards, how consumers interact with them, and the critical configuration choices that affect performance and cost.

25 min read

Intermediate

Updated Jul 20, 2026

Reviewed by Johnson Ajibi· Senior Network & Security Engineer · MSc IT Security

Jump to a section

Explain it to me simply Where people get tripped up Test what I know Look up key terms

Kinesis Shards: Conveyor Belt Analogy

An Amazon fulfillment center has multiple conveyor belts (shards). Each conveyor belt carries packages (data records) at a fixed speed: 1,000 packages per second for standard belts, or 2,000 for enhanced belts. Workers (consumers) stand at the end of each belt to pick up packages. Each worker has a notebook (shard iterator) where they record the last package they picked up. If a worker falls behind, packages pile up (data retention up to 365 days). If too many packages arrive for a single belt, some overflow and are lost (WriteProvisionedThroughputExceeded). You can add more belts (increase shard count) to handle more packages, but each belt still has its own speed limit. Workers can share a belt by taking turns (consumer fan-out), but each worker gets their own notebook and progresses independently. The key rule: a single worker can process at most 2 MB/s (data + partition key) per belt. If workers need faster access, they can use enhanced fan-out, which gives each worker their own dedicated copy of the belt, allowing 2 MB/s per worker per belt.

How It Actually Works

What is a Kinesis Data Stream?

A Kinesis Data Stream is a massively scalable, durable, real-time data streaming service. It captures and stores data records in ordered sequences across multiple shards. Each shard is a uniquely identified group of data records with a fixed capacity: 1 MB/s write throughput or 1,000 records/s (whichever is hit first) for standard shards, and 2 MB/s read throughput. Shards are the fundamental scaling unit — you increase stream capacity by adding shards.

Shard Mechanics and Data Model

A data record consists of: - Partition key: A Unicode string, up to 256 bytes, used to determine which shard the record goes to (via MD5 hash). - Data blob: Up to 1 MB (base64-encoded). - Sequence number: A unique, monotonically increasing number assigned by Kinesis when the record is written, used for ordering.

Kinesis uses the partition key to compute an MD5 hash, which maps to a specific shard. The hash space is 0 to 2^128 - 1, and each shard covers a contiguous range. When you split or merge shards (resharding), these ranges are reassigned.

Write Throughput

Each shard supports up to 1,000 records/s or 1 MB/s write throughput (including the partition key). If you exceed this, you get ProvisionedThroughputExceededException. The SAA-C03 exam often tests whether you know that the 1 MB/s includes the partition key size. For example, if your records are 1 KB each, you can write up to 1,000 records/s (since 1,000 * 1 KB = 1 MB). But if records are 10 KB, you are limited by the 1 MB/s cap, so you can only write about 100 records/s.

To increase write throughput, you must increase the number of shards. For instance, 5 shards give 5 MB/s write and 5,000 records/s.

Read Throughput (Standard Consumer)

Each shard supports up to 2 MB/s read throughput for standard consumers (shared fan-out). However, each consumer (application worker) gets a shard iterator that allows reading at a maximum of 2 MB/s per shard. But since multiple consumers share the same shard's throughput, the total read throughput across all consumers for a shard is 2 MB/s. This is a common exam trap: the 2 MB/s is per shard, not per consumer.

Enhanced Fan-Out Consumers

Enhanced fan-out (EFO) gives each consumer its own dedicated 2 MB/s read throughput per shard. This eliminates contention among consumers. The trade-off is cost — EFO costs more per GB read. The exam expects you to choose EFO when you have multiple consumers reading the same stream and need high throughput for each.

Shard Iterator Types

Consumers read records using shard iterators, which define the starting point. Types: - AT_SEQUENCE_NUMBER: Start after a specific sequence number. - AFTER_SEQUENCE_NUMBER: Start after a specific sequence number. - AT_TIMESTAMP: Start at a specific timestamp. - TRIM_HORIZON: Start from the oldest record in the shard. - LATEST: Start from the newest record (just after the latest).

The exam may ask which iterator to use for reprocessing from a specific time.

Data Retention

Default retention is 24 hours, extendable to 365 days (at extra cost). This allows replaying data. The exam often tests the maximum retention period.

Resharding (Scaling)

Resharding involves splitting or merging shards. Split increases shard count by dividing a shard's hash range into two (e.g., shard-1 covers 0-100, split into 0-50 and 51-100). Merge combines two adjacent shards into one. Both operations are asynchronous and take seconds to minutes. During resharding, the stream remains available, but the old shards expire after the retention period. The exam may ask about the impact of resharding on data ordering: data is ordered per shard, but across shards, order is not guaranteed. If you need global ordering, use a single shard (limited to 1 MB/s).

Consumer Implementation

Consumers are typically Kinesis Client Library (KCL) applications. KCL uses DynamoDB to track shard ownership and checkpointing. Each consumer instance leases one or more shards. Leases are stored in a DynamoDB table (default table name is kinesis-<stream-name>). The number of consumer instances should not exceed the number of shards for optimal performance. The exam often tests that you need a DynamoDB table for KCL.

Integration with AWS Services

Lambda: Can poll from Kinesis (max 5 concurrent batches per shard, each batch up to 10,000 records or 6 MB). Lambda's concurrency limit per shard is 1 concurrent invocation per shard unless using ParallelizationFactor (up to 10).

Kinesis Data Analytics: Uses SQL or Apache Flink to process streams.

Kinesis Firehose: Can read from Kinesis Data Streams and deliver to S3, Redshift, Elasticsearch, etc.

Monitoring

GetRecords.IteratorAgeMilliseconds: How far behind the consumer is. If > retention period, data is lost.

WriteProvisionedThroughputExceeded: Indicates shard write throttling.

ReadProvisionedThroughputExceeded: Indicates shard read throttling.

Encryption

Server-side encryption (SSE) with AWS KMS is available. Client-side encryption is also possible. The exam may test SSE-KMS integration.

Limits

Maximum shards per stream: 500 (default), can be increased via service quota.

Maximum data blob size: 1 MB.

Maximum partition key size: 256 bytes.

Maximum retention period: 365 days.

Exam Tips

Know the difference between shard-level throughput and stream-level throughput.

Remember that enhanced fan-out is for multiple consumers needing dedicated read throughput.

Resharding is for scaling write throughput, not read throughput (since read throughput is per shard).

KCL uses DynamoDB for lease management.

Lambda's ParallelizationFactor can increase concurrency per shard.

Walk-Through

Data Record Ingestion

A producer sends a `PutRecord` or `PutRecords` API call with a partition key and data blob. Kinesis applies an MD5 hash to the partition key to determine the target shard. If the shard's write throughput is exceeded (1 MB/s or 1,000 records/s), the request returns `ProvisionedThroughputExceededException`. The producer should implement exponential backoff. Each record gets a sequence number that is unique per shard and monotonically increasing.

Shard Iterator Creation

A consumer calls `GetShardIterator` with the shard ID and iterator type (e.g., TRIM_HORIZON, LATEST). Kinesis returns a shard iterator that points to a specific position in the shard. The iterator is valid for 5 minutes (300 seconds). If not used within that time, it expires and the consumer must get a new one. The exam often tests the 5-minute expiration.

Record Retrieval via GetRecords

The consumer calls `GetRecords` with the shard iterator. Kinesis returns up to 10,000 records or 10 MB (whichever is smaller) per call. The response includes a new shard iterator for the next call. The maximum throughput per shard for standard consumers is 2 MB/s. If the consumer exceeds this, it gets `ProvisionedThroughputExceededException`. Each `GetRecords` call consumes throughput; you can make multiple calls in parallel but must stay under the 2 MB/s limit.

Checkpointing in KCL

The KCL application periodically saves the sequence number of processed records to a DynamoDB table (lease table). This checkpoint allows the application to resume from where it left off after a failure. The checkpoint interval is configurable (e.g., every 1 minute). If the application crashes, the next instance reads from the last checkpoint. The exam may test that checkpointing is stored in DynamoDB and that the table must have sufficient capacity.

Lease Assignment in KCL

KCL uses DynamoDB to manage leases — each shard is assigned to a consumer instance via a lease. The number of consumer instances should be ≤ number of shards. If instances > shards, some instances remain idle. If an instance fails, its leases are reassigned to other instances after a heartbeat timeout (default 10 seconds). The exam may ask about optimal instance count.

What This Looks Like on the Job

Scenario 1: Real-Time Log Aggregation

A large e-commerce platform ingests clickstream data from millions of users. They use Kinesis Data Streams with 50 shards to handle 40,000 records/s (each record ~1 KB). Producers (web servers) use PutRecords to batch up to 500 records per call. Consumers include: (1) a Lambda function that aggregates metrics (e.g., page views per second) and writes to CloudWatch, (2) a Kinesis Firehose delivery stream that writes raw logs to S3 for batch analytics, and (3) a Kinesis Data Analytics application that detects anomalies in real-time. Since multiple consumers read the same stream, they enable enhanced fan-out for the Data Analytics app to avoid throttling. The Lambda consumer uses standard fan-out because its throughput is low. They monitor IteratorAgeMilliseconds to ensure consumers keep up. If the age exceeds 1 hour, they add shards or increase consumer parallelism.

Scenario 2: IoT Sensor Data

An industrial company collects sensor readings from thousands of IoT devices. Each device sends a 2 KB record every 5 seconds. They use Kinesis with 10 shards (10 MB/s write). The data is processed by a KCL application running on EC2 instances that does real-time anomaly detection and stores results in DynamoDB. They set data retention to 7 days for replay. A common misconfiguration is not using a partition key that distributes load evenly — if all devices use the same partition key, all data goes to one shard, causing throttling. They solved this by using device ID as partition key, which hashes evenly across shards. They also use PutRecords with up to 500 records per call to maximize throughput.

Scenario 3: Financial Transaction Processing

A financial services company processes stock trades. Each trade record is 500 bytes. They need exactly-once processing and global ordering per stock symbol. They use a single shard for each stock symbol (many streams) to guarantee ordering. They use enhanced fan-out because multiple downstream systems (fraud detection, risk analysis, reporting) need to read the same stream independently. They set retention to 365 days for audit compliance. A common failure is not monitoring ProvisionedThroughputExceeded — they use CloudWatch alarms to trigger auto-scaling (increase shards) when write throughput exceeds 80%.

How SAA-C03 Actually Tests This

The SAA-C03 exam tests Kinesis Data Streams under Domain 3.2 (High Performance). Key objectives: design scalable streaming solutions, choose appropriate throughput, and understand consumer patterns.

Common Wrong Answers

"Adding more consumers increases read throughput" — This is false for standard fan-out. Multiple consumers share the same 2 MB/s per shard. Only enhanced fan-out gives each consumer dedicated throughput.

"Shards can be added automatically" — Kinesis does not auto-scale shards. You must use AWS Auto Scaling or manual resharding. The exam often tests that you need to implement auto-scaling via CloudWatch metrics and Lambda.

"Kinesis Data Streams can directly load data into Redshift" — It cannot. You must use Kinesis Firehose to load into Redshift. This is a classic trap.

"Data is ordered globally across all shards" — Order is only guaranteed per shard. If you need global ordering, use a single shard.

Specific Numbers and Terms

Write throughput per shard: 1 MB/s or 1,000 records/s.

Read throughput per shard (standard): 2 MB/s.

Enhanced fan-out: 2 MB/s per consumer per shard.

Maximum data blob: 1 MB.

Maximum partition key: 256 bytes.

Default retention: 24 hours, max 365 days.

Shard iterator expiration: 5 minutes.

KCL uses DynamoDB for lease management.

Lambda maximum batch size: 10,000 records or 6 MB.

Edge Cases

If you need to reprocess data from a specific time, use AT_TIMESTAMP iterator.

If you need to reduce shards, you can merge adjacent shards (they must be adjacent in hash range).

Server-side encryption uses AWS KMS; you must have permissions for KMS keys.

Eliminating Wrong Answers

If the question mentions "multiple consumers needing high throughput," look for enhanced fan-out.

If the question mentions "cost-effective," standard fan-out is cheaper.

If the question mentions "ordering across all records," the answer likely involves a single shard or a different service like SQS FIFO.

If the question mentions "exactly-once processing," Kinesis does not guarantee exactly-once; you need idempotency in your consumer.

Key Takeaways

Each shard supports 1 MB/s write and 1,000 records/s write (whichever is lower).

Each shard supports 2 MB/s read throughput for standard consumers, shared among all consumers.

Enhanced fan-out gives each consumer its own 2 MB/s read throughput per shard.

Data retention can be set from 24 hours to 365 days.

Shard iterators expire after 5 minutes (300 seconds).

KCL uses DynamoDB for checkpointing and lease management.

Resharding (split/merge) is manual and takes seconds to minutes.

Lambda can poll Kinesis with a maximum batch of 10,000 records or 6 MB.

Order is guaranteed only within a shard, not across shards.

Use `ProvisionedThroughputExceeded` metrics to trigger auto-scaling.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Standard Consumer

Shared read throughput of 2 MB/s per shard across all consumers

Lower cost per GB read

Suitable for few consumers or low throughput

Can cause throttling if multiple consumers read heavily

Uses shard iterators that expire in 5 minutes

Enhanced Fan-Out Consumer

Dedicated read throughput of 2 MB/s per consumer per shard

Higher cost per GB read

Suitable for many consumers needing high throughput

No throttling due to other consumers

Uses dedicated throughput via HTTP/2; no shard iterator expiration concerns

Watch Out for These

Mistake

Adding more consumers increases read throughput per shard.

Correct

For standard consumers, the total read throughput per shard is fixed at 2 MB/s, shared by all consumers. Only enhanced fan-out gives each consumer its own 2 MB/s.

Mistake

Kinesis Data Streams can automatically scale shards.

Correct

Kinesis does not auto-scale. You must use AWS Auto Scaling with CloudWatch metrics (e.g., `WriteProvisionedThroughputExceeded`) or manually reshard.

Mistake

Data in a Kinesis Data Stream is ordered globally.

Correct

Order is guaranteed only within a single shard. Across shards, there is no global order.

Mistake

Kinesis Data Streams can load data directly into Amazon Redshift.

Correct

Kinesis Data Streams cannot directly load into Redshift. You must use Kinesis Firehose to deliver to Redshift.

Mistake

The maximum retention period for Kinesis Data Streams is 7 days.

Correct

The default is 24 hours, but you can increase it up to 365 days (additional cost).

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

What is the maximum write throughput per shard in Kinesis Data Streams?

Each shard supports up to 1 MB/s or 1,000 records per second, whichever is reached first. This includes the size of the partition key. If you exceed this, you get `ProvisionedThroughputExceededException`. To increase throughput, you must add more shards.

How does enhanced fan-out differ from standard consumers?

Standard consumers share the 2 MB/s read throughput per shard. Enhanced fan-out gives each consumer its own dedicated 2 MB/s per shard, eliminating contention. Enhanced fan-out costs more per GB read but is necessary when multiple consumers need high throughput.

What is the maximum retention period for data in Kinesis Data Streams?

The default retention period is 24 hours, but you can increase it up to 365 days. Extending retention incurs additional storage costs. This is useful for replaying data or compliance.

How does KCL (Kinesis Client Library) manage shard assignment?

KCL uses DynamoDB to store leases. Each shard has a lease, and each consumer instance claims one or more leases. DynamoDB is also used for checkpointing (storing the last processed sequence number). If a consumer fails, its leases are reassigned after a heartbeat timeout (default 10 seconds).

Can Kinesis Data Streams automatically scale shards?

No, Kinesis does not auto-scale shards. You must implement auto-scaling using CloudWatch metrics (e.g., `WriteProvisionedThroughputExceeded`) and a Lambda function to trigger resharding via the UpdateShardCount API. Alternatively, you can manually reshard.

What is the difference between PutRecord and PutRecords?

`PutRecord` sends a single record. `PutRecords` sends up to 500 records in a single call, reducing API overhead and costs. Both use partition key hashing to determine the shard. `PutRecords` is more efficient for high-throughput producers.

How does Lambda integrate with Kinesis Data Streams?

Lambda can poll a Kinesis stream as an event source. It reads batches of up to 10,000 records or 6 MB per invocation. By default, Lambda processes one batch per shard at a time. You can increase concurrency using `ParallelizationFactor` (up to 10). Lambda uses the `IteratorAge` metric to track lag.

Terms Worth Knowing

IAM Kinesis Region

Ready to put this to the test?

You've just covered Kinesis Data Streams: Shards and Consumers — now see how well it sticks with free SAA-C03 practice questions. Full explanations included, no account needed.

Try SAA-C03 practice questions Back to all chapters

Done with this chapter?

CloudFront Functions for Edge Logic

Kinesis Data Firehose Delivery Streams

See the full SAA-C03 study guide