This chapter dives deep into Kinesis Data Streams, focusing on shards and consumers — the core components that determine throughput, scalability, and cost. As a key topic under Domain 3 (High Performance), it appears in roughly 10-15% of SAA-C03 exam questions, often in scenarios involving real-time data ingestion, log aggregation, or stream processing. You will learn the internal mechanics of shards, how consumers interact with them, and the critical configuration choices that affect performance and cost.
Jump to a section
Imagine an Amazon fulfillment center with multiple conveyor belts (shards). Each conveyor belt carries packages (data records) at a fixed speed: 1,000 packages per second for standard belts, or 2,000 for enhanced belts. Workers (consumers) stand at the end of each belt to pick up packages. Each worker has a notebook (shard iterator) where they record the last package they picked up. If a worker falls behind, packages pile up (data retention up to 365 days). If too many packages arrive for a single belt, some overflow and are lost (WriteProvisionedThroughputExceeded). You can add more belts (increase shard count) to handle more packages, but each belt still has its own speed limit. Workers can share a belt by taking turns (consumer fan-out), but each worker gets their own notebook and progresses independently. The key rule: a single worker can process at most 2 MB/s (data + partition key) per belt. If workers need faster access, they can use enhanced fan-out, which gives each worker their own dedicated copy of the belt, allowing 2 MB/s per worker per belt.
What is a Kinesis Data Stream?
A Kinesis Data Stream is a massively scalable, durable, real-time data streaming service. It captures and stores data records in ordered sequences across multiple shards. Each shard is a uniquely identified group of data records with a fixed capacity: 1 MB/s write throughput or 1,000 records/s (whichever is hit first) for standard shards, and 2 MB/s read throughput. Shards are the fundamental scaling unit — you increase stream capacity by adding shards.
Shard Mechanics and Data Model
A data record consists of: - Partition key: A Unicode string, up to 256 bytes, used to determine which shard the record goes to (via MD5 hash). - Data blob: Up to 1 MB (base64-encoded). - Sequence number: A unique, monotonically increasing number assigned by Kinesis when the record is written, used for ordering.
Kinesis uses the partition key to compute an MD5 hash, which maps to a specific shard. The hash space is 0 to 2^128 - 1, and each shard covers a contiguous range. When you split or merge shards (resharding), these ranges are reassigned.
Write Throughput
Each shard supports up to 1,000 records/s or 1 MB/s write throughput (including the partition key). If you exceed this, you get ProvisionedThroughputExceededException. The SAA-C03 exam often tests whether you know that the 1 MB/s includes the partition key size. For example, if your records are 1 KB each, you can write up to 1,000 records/s (since 1,000 * 1 KB = 1 MB). But if records are 10 KB, you are limited by the 1 MB/s cap, so you can only write about 100 records/s.
To increase write throughput, you must increase the number of shards. For instance, 5 shards give 5 MB/s write and 5,000 records/s.
Read Throughput (Standard Consumer)
Each shard supports up to 2 MB/s read throughput for standard consumers (shared fan-out). However, each consumer (application worker) gets a shard iterator that allows reading at a maximum of 2 MB/s per shard. But since multiple consumers share the same shard's throughput, the total read throughput across all consumers for a shard is 2 MB/s. This is a common exam trap: the 2 MB/s is per shard, not per consumer.
Enhanced Fan-Out Consumers
Enhanced fan-out (EFO) gives each consumer its own dedicated 2 MB/s read throughput per shard. This eliminates contention among consumers. The trade-off is cost — EFO costs more per GB read. The exam expects you to choose EFO when you have multiple consumers reading the same stream and need high throughput for each.
Shard Iterator Types
Consumers read records using shard iterators, which define the starting point. Types:
- AT_SEQUENCE_NUMBER: Start after a specific sequence number.
- AFTER_SEQUENCE_NUMBER: Start after a specific sequence number.
- AT_TIMESTAMP: Start at a specific timestamp.
- TRIM_HORIZON: Start from the oldest record in the shard.
- LATEST: Start from the newest record (just after the latest).
The exam may ask which iterator to use for reprocessing from a specific time.
Data Retention
Default retention is 24 hours, extendable to 365 days (at extra cost). This allows replaying data. The exam often tests the maximum retention period.
Resharding (Scaling)
Resharding involves splitting or merging shards. Split increases shard count by dividing a shard's hash range into two (e.g., shard-1 covers 0-100, split into 0-50 and 51-100). Merge combines two adjacent shards into one. Both operations are asynchronous and take seconds to minutes. During resharding, the stream remains available, but the old shards expire after the retention period. The exam may ask about the impact of resharding on data ordering: data is ordered per shard, but across shards, order is not guaranteed. If you need global ordering, use a single shard (limited to 1 MB/s).
Consumer Implementation
Consumers are typically Kinesis Client Library (KCL) applications. KCL uses DynamoDB to track shard ownership and checkpointing. Each consumer instance leases one or more shards. Leases are stored in a DynamoDB table (default table name is kinesis-<stream-name>). The number of consumer instances should not exceed the number of shards for optimal performance. The exam often tests that you need a DynamoDB table for KCL.
Integration with AWS Services
Lambda: Can poll from Kinesis (max 5 concurrent batches per shard, each batch up to 10,000 records or 6 MB). Lambda's concurrency limit per shard is 1 concurrent invocation per shard unless using ParallelizationFactor (up to 10).
Kinesis Data Analytics: Uses SQL or Apache Flink to process streams.
Kinesis Firehose: Can read from Kinesis Data Streams and deliver to S3, Redshift, Elasticsearch, etc.
Monitoring
GetRecords.IteratorAgeMilliseconds: How far behind the consumer is. If > retention period, data is lost.
WriteProvisionedThroughputExceeded: Indicates shard write throttling.
ReadProvisionedThroughputExceeded: Indicates shard read throttling.
Encryption
Server-side encryption (SSE) with AWS KMS is available. Client-side encryption is also possible. The exam may test SSE-KMS integration.
Limits
Maximum shards per stream: 500 (default), can be increased via service quota.
Maximum data blob size: 1 MB.
Maximum partition key size: 256 bytes.
Maximum retention period: 365 days.
Exam Tips
Know the difference between shard-level throughput and stream-level throughput.
Remember that enhanced fan-out is for multiple consumers needing dedicated read throughput.
Resharding is for scaling write throughput, not read throughput (since read throughput is per shard).
KCL uses DynamoDB for lease management.
Lambda's ParallelizationFactor can increase concurrency per shard.
Data Record Ingestion
A producer sends a `PutRecord` or `PutRecords` API call with a partition key and data blob. Kinesis applies an MD5 hash to the partition key to determine the target shard. If the shard's write throughput is exceeded (1 MB/s or 1,000 records/s), the request returns `ProvisionedThroughputExceededException`. The producer should implement exponential backoff. Each record gets a sequence number that is unique per shard and monotonically increasing.
Shard Iterator Creation
A consumer calls `GetShardIterator` with the shard ID and iterator type (e.g., TRIM_HORIZON, LATEST). Kinesis returns a shard iterator that points to a specific position in the shard. The iterator is valid for 5 minutes (300 seconds). If not used within that time, it expires and the consumer must get a new one. The exam often tests the 5-minute expiration.
Record Retrieval via GetRecords
The consumer calls `GetRecords` with the shard iterator. Kinesis returns up to 10,000 records or 10 MB (whichever is smaller) per call. The response includes a new shard iterator for the next call. The maximum throughput per shard for standard consumers is 2 MB/s. If the consumer exceeds this, it gets `ProvisionedThroughputExceededException`. Each `GetRecords` call consumes throughput; you can make multiple calls in parallel but must stay under the 2 MB/s limit.
Checkpointing in KCL
The KCL application periodically saves the sequence number of processed records to a DynamoDB table (lease table). This checkpoint allows the application to resume from where it left off after a failure. The checkpoint interval is configurable (e.g., every 1 minute). If the application crashes, the next instance reads from the last checkpoint. The exam may test that checkpointing is stored in DynamoDB and that the table must have sufficient capacity.
Lease Assignment in KCL
KCL uses DynamoDB to manage leases — each shard is assigned to a consumer instance via a lease. The number of consumer instances should be ≤ number of shards. If instances > shards, some instances remain idle. If an instance fails, its leases are reassigned to other instances after a heartbeat timeout (default 10 seconds). The exam may ask about optimal instance count.
Scenario 1: Real-Time Log Aggregation
A large e-commerce platform ingests clickstream data from millions of users. They use Kinesis Data Streams with 50 shards to handle 40,000 records/s (each record ~1 KB). Producers (web servers) use PutRecords to batch up to 500 records per call. Consumers include: (1) a Lambda function that aggregates metrics (e.g., page views per second) and writes to CloudWatch, (2) a Kinesis Firehose delivery stream that writes raw logs to S3 for batch analytics, and (3) a Kinesis Data Analytics application that detects anomalies in real-time. Since multiple consumers read the same stream, they enable enhanced fan-out for the Data Analytics app to avoid throttling. The Lambda consumer uses standard fan-out because its throughput is low. They monitor IteratorAgeMilliseconds to ensure consumers keep up. If the age exceeds 1 hour, they add shards or increase consumer parallelism.
Scenario 2: IoT Sensor Data
An industrial company collects sensor readings from thousands of IoT devices. Each device sends a 2 KB record every 5 seconds. They use Kinesis with 10 shards (10 MB/s write). The data is processed by a KCL application running on EC2 instances that does real-time anomaly detection and stores results in DynamoDB. They set data retention to 7 days for replay. A common misconfiguration is not using a partition key that distributes load evenly — if all devices use the same partition key, all data goes to one shard, causing throttling. They solved this by using device ID as partition key, which hashes evenly across shards. They also use PutRecords with up to 500 records per call to maximize throughput.
Scenario 3: Financial Transaction Processing
A financial services company processes stock trades. Each trade record is 500 bytes. They need exactly-once processing and global ordering per stock symbol. They use a single shard for each stock symbol (many streams) to guarantee ordering. They use enhanced fan-out because multiple downstream systems (fraud detection, risk analysis, reporting) need to read the same stream independently. They set retention to 365 days for audit compliance. A common failure is not monitoring ProvisionedThroughputExceeded — they use CloudWatch alarms to trigger auto-scaling (increase shards) when write throughput exceeds 80%.
The SAA-C03 exam tests Kinesis Data Streams under Domain 3.2 (High Performance). Key objectives: design scalable streaming solutions, choose appropriate throughput, and understand consumer patterns.
Common Wrong Answers
"Adding more consumers increases read throughput" — This is false for standard fan-out. Multiple consumers share the same 2 MB/s per shard. Only enhanced fan-out gives each consumer dedicated throughput.
"Shards can be added automatically" — Kinesis does not auto-scale shards. You must use AWS Auto Scaling or manual resharding. The exam often tests that you need to implement auto-scaling via CloudWatch metrics and Lambda.
"Kinesis Data Streams can directly load data into Redshift" — It cannot. You must use Kinesis Firehose to load into Redshift. This is a classic trap.
"Data is ordered globally across all shards" — Order is only guaranteed per shard. If you need global ordering, use a single shard.
Specific Numbers and Terms
Write throughput per shard: 1 MB/s or 1,000 records/s.
Read throughput per shard (standard): 2 MB/s.
Enhanced fan-out: 2 MB/s per consumer per shard.
Maximum data blob: 1 MB.
Maximum partition key: 256 bytes.
Default retention: 24 hours, max 365 days.
Shard iterator expiration: 5 minutes.
KCL uses DynamoDB for lease management.
Lambda maximum batch size: 10,000 records or 6 MB.
Edge Cases
If you need to reprocess data from a specific time, use AT_TIMESTAMP iterator.
If you need to reduce shards, you can merge adjacent shards (they must be adjacent in hash range).
Server-side encryption uses AWS KMS; you must have permissions for KMS keys.
Eliminating Wrong Answers
If the question mentions "multiple consumers needing high throughput," look for enhanced fan-out.
If the question mentions "cost-effective," standard fan-out is cheaper.
If the question mentions "ordering across all records," the answer likely involves a single shard or a different service like SQS FIFO.
If the question mentions "exactly-once processing," Kinesis does not guarantee exactly-once; you need idempotency in your consumer.
Each shard supports 1 MB/s write and 1,000 records/s write (whichever is lower).
Each shard supports 2 MB/s read throughput for standard consumers, shared among all consumers.
Enhanced fan-out gives each consumer its own 2 MB/s read throughput per shard.
Data retention can be set from 24 hours to 365 days.
Shard iterators expire after 5 minutes (300 seconds).
KCL uses DynamoDB for checkpointing and lease management.
Resharding (split/merge) is manual and takes seconds to minutes.
Lambda can poll Kinesis with a maximum batch of 10,000 records or 6 MB.
Order is guaranteed only within a shard, not across shards.
Use `ProvisionedThroughputExceeded` metrics to trigger auto-scaling.
These come up on the exam all the time. Here's how to tell them apart.
Standard Consumer
Shared read throughput of 2 MB/s per shard across all consumers
Lower cost per GB read
Suitable for few consumers or low throughput
Can cause throttling if multiple consumers read heavily
Uses shard iterators that expire in 5 minutes
Enhanced Fan-Out Consumer
Dedicated read throughput of 2 MB/s per consumer per shard
Higher cost per GB read
Suitable for many consumers needing high throughput
No throttling due to other consumers
Uses dedicated throughput via HTTP/2; no shard iterator expiration concerns
Mistake
Adding more consumers increases read throughput per shard.
Correct
For standard consumers, the total read throughput per shard is fixed at 2 MB/s, shared by all consumers. Only enhanced fan-out gives each consumer its own 2 MB/s.
Mistake
Kinesis Data Streams can automatically scale shards.
Correct
Kinesis does not auto-scale. You must use AWS Auto Scaling with CloudWatch metrics (e.g., `WriteProvisionedThroughputExceeded`) or manually reshard.
Mistake
Data in a Kinesis Data Stream is ordered globally.
Correct
Order is guaranteed only within a single shard. Across shards, there is no global order.
Mistake
Kinesis Data Streams can load data directly into Amazon Redshift.
Correct
Kinesis Data Streams cannot directly load into Redshift. You must use Kinesis Firehose to deliver to Redshift.
Mistake
The maximum retention period for Kinesis Data Streams is 7 days.
Correct
The default is 24 hours, but you can increase it up to 365 days (additional cost).
Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.
Each shard supports up to 1 MB/s or 1,000 records per second, whichever is reached first. This includes the size of the partition key. If you exceed this, you get `ProvisionedThroughputExceededException`. To increase throughput, you must add more shards.
Standard consumers share the 2 MB/s read throughput per shard. Enhanced fan-out gives each consumer its own dedicated 2 MB/s per shard, eliminating contention. Enhanced fan-out costs more per GB read but is necessary when multiple consumers need high throughput.
The default retention period is 24 hours, but you can increase it up to 365 days. Extending retention incurs additional storage costs. This is useful for replaying data or compliance.
KCL uses DynamoDB to store leases. Each shard has a lease, and each consumer instance claims one or more leases. DynamoDB is also used for checkpointing (storing the last processed sequence number). If a consumer fails, its leases are reassigned after a heartbeat timeout (default 10 seconds).
No, Kinesis does not auto-scale shards. You must implement auto-scaling using CloudWatch metrics (e.g., `WriteProvisionedThroughputExceeded`) and a Lambda function to trigger resharding via the UpdateShardCount API. Alternatively, you can manually reshard.
`PutRecord` sends a single record. `PutRecords` sends up to 500 records in a single call, reducing API overhead and costs. Both use partition key hashing to determine the shard. `PutRecords` is more efficient for high-throughput producers.
Lambda can poll a Kinesis stream as an event source. It reads batches of up to 10,000 records or 6 MB per invocation. By default, Lambda processes one batch per shard at a time. You can increase concurrency using `ParallelizationFactor` (up to 10). Lambda uses the `IteratorAge` metric to track lag.
You've just covered Kinesis Data Streams: Shards and Consumers — now see how well it sticks with free SAA-C03 practice questions. Full explanations included, no account needed.
Done with this chapter?