This chapter covers Azure Event Hubs, Microsoft's fully managed, real-time data streaming platform. Event Hubs is a core service for ingesting millions of events per second from sources like IoT devices, application logs, and clickstreams. For the DP-900 exam, expect roughly 5-8% of questions to touch on Event Hubs, focusing on its role in the analytics pipeline, throughput units, partitions, and comparison with IoT Hub. Mastery of Event Hubs is essential for understanding how streaming data enters Azure analytics services like Stream Analytics and Power BI.
Imagine a major highway with thousands of cars (data events) entering a toll plaza every second. Each car must pay a toll and be counted. The toll plaza has multiple toll booths (partitions) open, each with its own lane. Cars are assigned to a specific lane based on their license plate (partition key): all cars with the same key go to the same booth, so they stay in order. Each booth has a ticket dispenser that stamps the arrival time and a counter that increments for every car (the enqueued time and sequence number). The toll operator (consumer) sits at a booth and collects tickets in batches, processing them in order. If one operator falls behind, the cars keep coming; the plaza doesn't stop. The plaza also retains tickets for up to 7 days (retention) so operators can catch up or replay. The number of booths is decided when the plaza is built and cannot be changed while it is open (partition count is fixed at creation), so a plaza that needs more lanes has to be rebuilt. The toll plaza manager (Event Hubs namespace) sets how much traffic the plaza as a whole may admit per second (throughput units). This is how Event Hubs works: events are ingested into partitions, consumers read from each partition in order, and capacity scales by adding throughput units.
What is Azure Event Hubs and Why Does It Exist?
Azure Event Hubs is a big data streaming platform and event ingestion service. It can receive and process millions of events per second from concurrent sources. Its primary purpose is to decouple event producers from event consumers — producers send events without waiting for consumers to process them, and consumers read events at their own pace. This is critical in scenarios where data velocity is high, such as telemetry from IoT devices, application logs, or clickstreams.
Event Hubs sits at the beginning of an analytics pipeline. It acts as a buffer or 'front door' for streaming data, allowing you to ingest data reliably and then route it to downstream services like Azure Stream Analytics, Azure Data Lake Storage, or Azure Functions. The DP-900 exam tests your understanding of Event Hubs' role in this pipeline, not deep implementation details.
How Event Hubs Works Internally
Event Hubs uses a partitioned consumer model. Here's the mechanism:
Ingestion: Producers send events to an Event Hubs namespace using AMQP 1.0, HTTPS, or Kafka protocol. Events are not stored indefinitely; they are retained for a configurable period (1 to 7 days, default 1 day).
Partitioning: Each event hub is created with a fixed number of partitions (minimum 2, default 4, maximum 32). A partition is an ordered sequence of events. When an event is sent, it is assigned to a partition based on a partition key (a string). If no key is provided, events are distributed round-robin. All events with the same partition key go to the same partition, ensuring order for that key.
Throughput Units (TUs): Throughput is measured in Throughput Units. Each TU allows ingress of up to 1 MB/s or 1000 events/sec (whichever limit is reached first) and egress of up to 2 MB/s or 4096 events/sec. You can auto-inflate TUs from 1 to 20. TUs are billed per hour.
Consumer Groups: A consumer group is a view of the entire event hub. Each consumer group has its own offset pointer for each partition. Multiple consumer groups allow different applications to read the same stream independently. For example, one consumer group for real-time dashboard, another for archival.
Checkpointing: Consumers checkpoint their position (offset) in a partition to persistent storage (e.g., Azure Blob Storage). This allows recovery after a crash without reprocessing all events.
Event Retention: Events are retained for the configured retention period, after which they are removed. This allows replay or catch-up.
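The partitioning rule above can be sketched as a small simulation. This is only an illustration under stated assumptions: the FNV-1a hash is a stand-in (Event Hubs uses its own internal hash), and the names PartitionForKey and Route are hypothetical, not SDK calls.

```csharp
#nullable enable
using System;
using System.Text;

const int partitionCount = 4;
int nextRoundRobin = 0;

// Stand-in hash (FNV-1a). Event Hubs uses its own internal hash function.
static int PartitionForKey(string partitionKey, int partitions)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(partitionKey))
    {
        hash ^= b;
        hash = unchecked(hash * 16777619u);
    }
    return (int)(hash % (uint)partitions);
}

// Keyed events always land on the same partition; unkeyed events rotate.
int Route(string? partitionKey)
{
    if (partitionKey is not null)
        return PartitionForKey(partitionKey, partitionCount);
    int partition = nextRoundRobin;
    nextRoundRobin = (nextRoundRobin + 1) % partitionCount;
    return partition;
}

foreach (string? key in new string?[] { "sensor-17", "sensor-42", "sensor-17", null, null })
    Console.WriteLine($"{key ?? "(no key)"} -> partition {Route(key)}");
```

Both "sensor-17" lines print the same partition number, while the two unkeyed events land on consecutive partitions. That is the ordering guarantee in miniature: order holds per key, not across the whole event hub.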
Key Components, Values, Defaults, and Timers
Namespace: Logical container for one or more event hubs. FQDN: <name>.servicebus.windows.net.
Event Hub: Specific data stream within a namespace.
Partition: Default 4 partitions (min 2, max 32). Partitions cannot be changed after creation.
Throughput Units: 1 TU default, can be set manually or auto-inflate (up to 20). Auto-inflate increases TUs as load increases, up to a maximum you set.
Retention: 1 day default, can be set from 1 to 7 days.
Consumer Groups: Default $Default consumer group. You can create up to 20 consumer groups.
Protocols: AMQP 1.0 (port 5671/5672), HTTPS (443), Kafka (9093).
Event Size: Max 1 MB per event. Larger payloads should be broken into smaller events.
Publisher Policy: Optional SAS tokens per publisher for security.
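As a rough sizing aid, the per-TU ingress figures quoted in this chapter (1 MB/s or 1000 events/s, whichever is reached first) can be turned into a small calculation. This is a sketch, not an official sizing tool: the Sizing class and the workload numbers are hypothetical, and 1 MB is treated as 1024 × 1024 bytes.

```csharp
using System;

// Scenario values below are made up for illustration.
Console.WriteLine(Sizing.ThroughputUnitsForIngress(eventsPerSecond: 5000, averageEventBytes: 512));  // 5
Console.WriteLine(Sizing.ThroughputUnitsForIngress(eventsPerSecond: 500, averageEventBytes: 6000));  // 3

static class Sizing
{
    // Per-TU ingress limits as quoted in this chapter.
    const double IngressMegabytesPerSecond = 1.0;
    const double IngressEventsPerSecond = 1000.0;

    public static int ThroughputUnitsForIngress(double eventsPerSecond, double averageEventBytes)
    {
        double megabytesPerSecond = eventsPerSecond * averageEventBytes / (1024.0 * 1024.0);
        int byBandwidth = (int)Math.Ceiling(megabytesPerSecond / IngressMegabytesPerSecond);
        int byEventRate = (int)Math.Ceiling(eventsPerSecond / IngressEventsPerSecond);
        // Whichever limit is hit first decides the TU count; a namespace has at least 1 TU.
        return Math.Max(1, Math.Max(byBandwidth, byEventRate));
    }
}
```

In the first case the event rate is the binding limit (5000 small events need 5 TUs even though they total under 3 MB/s). In the second, bandwidth is the binding limit. Note that partition count does not appear anywhere in the calculation.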
Configuration and Verification Commands
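You can create an Event Hubs namespace and event hub via Azure CLI:
# Create namespace (Standard tier)
az eventhubs namespace create --name mynamespace --resource-group myrg --location eastus --sku Standard
# Create event hub
# (depending on CLI version, retention may be set with --retention-time-in-hours instead; check az eventhubs eventhub create --help)
az eventhubs eventhub create --name myeventhub --namespace-name mynamespace --resource-group myrg --partition-count 4 --message-retention 1
# Enable auto-inflate
az eventhubs namespace update --name mynamespace --resource-group myrg --enable-auto-inflate true --maximum-throughput-units 10
To send events using the .NET SDK (Azure.Messaging.EventHubs package):
using System.Text;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;
// connectionString and eventHubName are assumed to be defined elsewhere
await using var producer = new EventHubProducerClient(connectionString, eventHubName);
using EventDataBatch eventBatch = await producer.CreateBatchAsync();
eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("Hello, Event Hubs!")));
await producer.SendAsync(eventBatch);
To consume events:
using System;
using System.Text;
using Azure.Messaging.EventHubs.Consumer;
// consumerGroup can be EventHubConsumerClient.DefaultConsumerGroupName ("$Default")
await using var consumer = new EventHubConsumerClient(consumerGroup, connectionString, eventHubName);
// Reads from all partitions until cancelled; suited to exploration rather than production
await foreach (PartitionEvent partitionEvent in consumer.ReadEventsAsync())
{
    Console.WriteLine(Encoding.UTF8.GetString(partitionEvent.Data.Body.ToArray()));
}
How Event Hubs Interacts with Related Technologies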
Azure Stream Analytics: Directly takes Event Hubs as input for real-time analytics. Output can be to Power BI, SQL Database, or Blob Storage.
Azure Functions: Can be triggered by Event Hubs (Event Hubs trigger). Useful for lightweight processing.
Azure Data Lake Storage Gen2: Events can be captured to Data Lake Storage via Event Hubs Capture.
Azure IoT Hub: IoT Hub uses Event Hubs as its underlying telemetry ingestion path. For DP-900, know that IoT Hub is for IoT device management and bidirectional communication, while Event Hubs is for high-throughput event ingestion.
Azure Service Bus: Service Bus is for enterprise messaging (queues/topics) with features like dead-lettering and sessions. Event Hubs is for streaming large volumes of events.
Power BI: Event Hubs can feed into Power BI via Stream Analytics for real-time dashboards.
Trap Patterns
Common wrong answers on the exam:
Choosing IoT Hub over Event Hubs for high-throughput telemetry: IoT Hub is built for device management, and its capacity is metered by a daily message quota per unit (for example, 400,000 messages/day for an S1 unit). Event Hubs is for high-throughput ingestion.
Thinking partitions can be increased after creation: Partitions are fixed at creation. You cannot change partition count later. To scale, you must create a new event hub with more partitions.
Confusing Throughput Units with partitions: TUs control ingress/egress bandwidth and event rate. Partitions control parallelism and ordering. More partitions do not increase throughput unless TUs are also increased.
Believing events are stored forever: Retention is 1-7 days. For long-term storage, use Capture to Data Lake or Blob Storage.
Assuming consumers can read events from any point in time: A consumer can start from the earliest or latest position, or from a specific offset, but only within the retention window. From that starting position, each partition is read sequentially.
Summary of Numbers for the Exam
Minimum partitions: 2
Default partitions: 4
Maximum partitions: 32
Default retention: 1 day
Retention range: 1-7 days
Max event size: 1 MB
Throughput units: 1-20 (auto-inflate up to 20)
Ingress per TU: 1 MB/s or 1000 events/s
Egress per TU: 2 MB/s or 4096 events/s
Protocols: AMQP, HTTPS, Kafka
Consumer groups per event hub: up to 20
Namespace tiers: Basic (limited), Standard (full), Dedicated (isolated)
This comprehensive understanding will prepare you for DP-900 questions on Event Hubs.
Provision Event Hubs Namespace
First, create an Event Hubs namespace in your Azure subscription. Choose a globally unique name (6-50 characters, letters, numbers, and hyphens), a region, and a pricing tier (Basic, Standard, or Dedicated). For DP-900, know that Standard tier is most common, offering features like auto-inflate, Kafka support, and Capture. The namespace provides a DNS endpoint (e.g., mynamespace.servicebus.windows.net) and acts as a container for your event hubs. You can create the namespace via Azure portal, CLI, or PowerShell.
Create an Event Hub
Within the namespace, create an event hub with a specific name. Choose the partition count (2-32) and message retention period (1-7 days). Partitions are fixed after creation, so choose based on expected parallelism. The default partition count is 4. The retention period determines how long events are kept before being discarded. Also, set the throughput units (TUs) — either manually or enable auto-inflate to automatically scale from 1 to a maximum (up to 20). This step defines the capacity and behavior of your stream.
Configure Security and Access
Secure your Event Hubs using Shared Access Signatures (SAS) or Microsoft Entra ID (formerly Azure Active Directory, AAD). SAS tokens can be generated for the namespace or for specific event hubs with specific permissions (Send, Listen, Manage). For DP-900, understand that SAS is a common method for granting access. You can also use managed identities for Azure resources. Additionally, consider network security: enable firewall rules to restrict IP addresses or use Virtual Network service endpoints. This step ensures only authorized producers and consumers can access the event hub.
Producers Send Events
Producers (e.g., IoT devices, applications) send events to the event hub using AMQP 1.0, HTTPS, or Kafka protocol. Each event can be up to 1 MB and optionally includes a partition key. If a partition key is provided, all events with that key go to the same partition, preserving order. Without a key, events are distributed round-robin across partitions. Producers should handle retries and backoff to manage throttling if TUs are exceeded. The event hub stores accepted events durably and keeps them for the retention period.
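The retry-and-backoff advice for producers can be sketched as follows. This is a minimal illustration, not the SDK's own retry logic: ThrottledException, SendWithBackoffAsync, and FlakySend are hypothetical names, and the simulated sender simply fails twice before succeeding. In practice the .NET client already retries transient failures according to its configured retry options, so a hand-written loop like this is mainly useful for understanding the pattern.

```csharp
using System;
using System.Threading.Tasks;

int calls = 0;

// Simulated send: throttled on the first two calls, then accepted.
Task FlakySend()
{
    calls++;
    if (calls <= 2) throw new ThrottledException();
    return Task.CompletedTask;
}

// Retries a throttled send, doubling the wait after each failed attempt.
static async Task<int> SendWithBackoffAsync(Func<Task> send, int maxAttempts, TimeSpan baseDelay)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            await send();
            return attempt;
        }
        catch (ThrottledException) when (attempt < maxAttempts)
        {
            double delayMs = baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1);
            await Task.Delay(TimeSpan.FromMilliseconds(delayMs));
        }
    }
}

int attempts = await SendWithBackoffAsync(FlakySend, maxAttempts: 5, baseDelay: TimeSpan.FromMilliseconds(10));
Console.WriteLine($"Delivered after {attempts} attempts");  // Delivered after 3 attempts

// Hypothetical stand-in for a "server busy" error from the service.
class ThrottledException : Exception { }
```

Once maxAttempts is reached, the exception filter no longer matches and the error propagates to the caller, which is usually the right moment to alert or to add throughput units.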
Consumers Read Events
Consumers (e.g., Stream Analytics, Azure Functions, custom apps) connect to the event hub using a consumer group. Each consumer group maintains its own offset per partition. Consumers can read from the beginning (earliest offset) or only new events (latest offset). They should checkpoint their progress periodically to persistent storage (e.g., Blob Storage) to enable recovery. Multiple consumers in the same consumer group must coordinate partition ownership to avoid duplicate processing, typically through the event processor pattern (EventProcessorClient in the current .NET SDK, Event Processor Host in the legacy one). Separately from any consumer, the Event Hubs Capture feature can automatically write all events to Azure Blob Storage or Data Lake Storage at regular intervals.
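The idea that each consumer group keeps its own position, and that reading does not remove events, can be illustrated with an in-memory model. This is a toy sketch with a single partition: the Read helper and the group names "dashboard" and "archiver" are hypothetical, and no Event Hubs SDK calls are involved.

```csharp
using System;
using System.Collections.Generic;

// One partition: an append-only, ordered list of events.
var partition = new List<string> { "e0", "e1", "e2", "e3", "e4" };

// Each consumer group tracks its own offset into the same partition.
var offsets = new Dictionary<string, int> { ["dashboard"] = 0, ["archiver"] = 0 };

List<string> Read(string consumerGroup, int maxEvents)
{
    int start = offsets[consumerGroup];
    int count = Math.Min(maxEvents, partition.Count - start);
    List<string> batch = partition.GetRange(start, count);
    offsets[consumerGroup] = start + count;  // only this group's position moves
    return batch;
}

Console.WriteLine(string.Join(",", Read("dashboard", 3)));  // e0,e1,e2
Console.WriteLine(string.Join(",", Read("archiver", 2)));   // e0,e1
Console.WriteLine(string.Join(",", Read("dashboard", 3)));  // e3,e4
Console.WriteLine(partition.Count);                         // 5 (nothing was deleted)
```

The archiver starts from e0 even though the dashboard has already read past it, and the partition still holds all five events afterwards. In the real service, events leave only when the retention period expires.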
Enterprise Scenario 1: IoT Telemetry Ingestion
A manufacturing company deploys thousands of sensors on factory equipment. Each sensor sends temperature, vibration, and pressure readings every second. They use Azure Event Hubs to ingest this high-velocity data. The event hub is configured with 16 partitions and 10 throughput units with auto-inflate enabled. Sensors send events using AMQP with a partition key based on sensor ID to ensure order per sensor. Downstream, Azure Stream Analytics reads from the event hub, computes rolling averages, and sends alerts to a dashboard when thresholds are exceeded. Additionally, all raw data is captured to Azure Data Lake Storage Gen2 for long-term analysis. Common issue: if TUs are under-provisioned, producers get throttled (HTTP 429 or AMQP link detach). Solution: monitor TU usage and enable auto-inflate. Another issue: if partition count is too low, consumers may not achieve desired parallelism. Best practice: choose partition count based on expected consumer parallelism, not throughput.
Enterprise Scenario 2: Clickstream Analytics
An e-commerce platform tracks user clicks on their website. They send click events to Event Hubs via HTTPS. The event hub has 8 partitions and 5 TUs. A consumer group is dedicated to real-time analytics using Stream Analytics, which outputs to Power BI for live traffic dashboards. Another consumer group is used by a custom .NET application that enriches events with user profile data and stores them in Cosmos DB for personalized recommendations. A third consumer group feeds into Azure Data Explorer for ad-hoc querying. The team uses checkpointing so that a restarted consumer resumes from its last saved position; this gives at-least-once processing, so downstream writes should tolerate occasional duplicates. A common misconfiguration: using the same consumer group for multiple incompatible consumers, causing offset conflicts. Best practice: create separate consumer groups for each distinct consumer application.
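To see what checkpointing does and does not guarantee, consider a consumer that checkpoints after every third event and crashes between checkpoints. The following is a simulation under made-up assumptions (eight events, a crash after the fifth); Run and the checkpoint variable are hypothetical and stand in for a real checkpoint store such as Blob Storage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var events = new[] { "e0", "e1", "e2", "e3", "e4", "e5", "e6", "e7" };
int checkpoint = 0;                  // persisted position: index of the next event to read
var processed = new List<string>();  // everything the consumer has handled, across restarts

// Processes events from the last checkpoint; stops early to simulate a crash.
void Run(int crashAfter)
{
    int handled = 0;
    for (int i = checkpoint; i < events.Length; i++)
    {
        processed.Add(events[i]);
        handled++;
        if ((i + 1) % 3 == 0) checkpoint = i + 1;  // checkpoint after every third event
        if (handled == crashAfter) return;         // crash before the next checkpoint
    }
    checkpoint = events.Length;
}

Run(crashAfter: 5);    // handles e0..e4, but the last checkpoint was taken after e2
Run(crashAfter: -1);   // restart: resumes at e3 and runs to the end

var duplicates = processed.GroupBy(e => e).Where(g => g.Count() > 1).Select(g => g.Key);
Console.WriteLine(string.Join(",", duplicates));  // e3,e4
```

No event is lost, but e3 and e4 are handled twice. Checkpointing more often narrows the replay window at the cost of more writes to the checkpoint store; it does not by itself remove duplicates.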
Enterprise Scenario 3: Log Aggregation
A SaaS company aggregates application logs from microservices running in Kubernetes. Each microservice sends structured log events to Event Hubs using the Kafka protocol. The event hub is configured with 32 partitions and 20 TUs (auto-inflate max). Logs are consumed by Azure Functions that parse and route logs to different sinks: critical errors go to Azure Monitor, debug logs go to Blob Storage, and audit logs go to SQL Database. The team uses Event Hubs Capture to archive all logs to Data Lake Storage for compliance. Problem: if retention is set too low (default 1 day), logs are lost before consumers can process during a backlog. Solution: increase retention to 7 days for critical pipelines. Also, ensure that the number of partitions matches the number of parallel consumers to maximize throughput.
Exactly What DP-900 Tests on Event Hubs
The DP-900 exam objective 3.5 covers 'Describe Azure Event Hubs for streaming data.' You need to:
Identify the use cases for Event Hubs (high-throughput telemetry, log ingestion, clickstreams).
Understand the concept of partitions and how they enable parallel processing and ordering.
Know the role of throughput units and how they control capacity.
Differentiate Event Hubs from IoT Hub and Service Bus.
Recognize Event Hubs as an input source for Azure Stream Analytics.
Understand event retention (1-7 days) and that events are not stored permanently.
Know that Event Hubs supports AMQP, HTTPS, and Kafka protocols.
Most Common Wrong Answers and Why
IoT Hub for telemetry ingestion: Many candidates choose IoT Hub when asked about high-throughput event ingestion. IoT Hub is designed for device management and bidirectional communication, and its capacity is metered by a daily message quota per unit (e.g., 400,000 messages/day for an S1 unit). Event Hubs is for high-throughput, unidirectional event streaming. The exam will test this distinction.
Partitions can be increased after creation: Candidates often think partitions can be scaled up like TUs. Actually, partition count is fixed at creation. If you need more partitions, you must create a new event hub and migrate. The exam may ask about this limitation.
Throughput units and partitions are the same: Some confuse TUs with partitions. TUs control bandwidth and event rate; partitions control ordering and parallelism. More partitions do not increase throughput unless TUs are also increased. The exam may ask about scaling.
Events are stored forever: Retention is 1-7 days. For long-term storage, you must use Capture or a downstream service. The exam may ask about retention limits.
Event Hubs is a queuing service: It is not a queue; it is a stream. Consumers can read multiple times from different consumer groups. There is no message deletion after consumption. The exam may compare with Service Bus queues.
Specific Numbers and Terms on the Exam
Default partition count: 4
Maximum partitions: 32
Default retention: 1 day
Retention range: 1-7 days
Max event size: 1 MB
Throughput units: 1-20 (auto-inflate up to 20)
Ingress per TU: 1 MB/s or 1000 events/s
Egress per TU: 2 MB/s or 4096 events/s
Protocols: AMQP, HTTPS, Kafka
Consumer groups: up to 20 per event hub
Edge Cases and Exceptions
Event Hubs Capture can be enabled on Standard tier or above.
Basic tier supports only a single consumer group ($Default), does not include Capture or the Kafka endpoint, and has lower limits.
Dedicated tier offers dedicated capacity and no throttling due to TUs.
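You can send events using the Kafka protocol without changing application code if your app already uses Kafka; only the client configuration (bootstrap server and authentication settings) changes.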
Partition keys are hashed to assign partitions; you cannot control which partition a key maps to.
If you use the same consumer group for multiple consumer instances, they must coordinate partition ownership to avoid duplicate processing (e.g., using EventProcessorClient, or EventProcessorHost in the legacy SDK).
How to Eliminate Wrong Answers
Read the scenario carefully. If the question mentions 'high-throughput telemetry from IoT devices,' the answer is likely Event Hubs, not IoT Hub. If it mentions 'device management' or 'commands to devices,' that's IoT Hub. If it mentions 'message queuing with dead-lettering,' that's Service Bus. If it mentions 'real-time analytics on streaming data,' that's Stream Analytics, but the input source is Event Hubs. Look for keywords like 'ingest,' 'streaming,' 'millions of events.' Remember that Event Hubs is about ingestion, not processing.
Event Hubs is a big data streaming platform for high-throughput event ingestion.
Partitions are fixed at creation (2-32) and enable ordered processing per partition key.
Throughput Units (TUs) control ingress (1 MB/s or 1000 events/s per TU) and egress (2 MB/s or 4096 events/s per TU).
Events are retained for 1-7 days (default 1 day). Use Capture for long-term storage.
Event Hubs supports AMQP, HTTPS, and Kafka protocols.
Consumer groups allow multiple independent consumers to read the same stream.
Distinguish Event Hubs from IoT Hub (device management) and Service Bus (messaging).
Auto-inflate can automatically scale TUs from 1 to a maximum (up to 20).
These come up on the exam all the time. Here's how to tell them apart.
Azure Event Hubs
High-throughput event ingestion (millions of events/sec).
Unidirectional streaming (producer to consumer).
No device management features.
Supports AMQP, HTTPS, and Kafka protocols.
Scales with Throughput Units (up to 20).
Azure IoT Hub
Lower throughput, metered by a daily message quota per unit (e.g., 400,000 messages/day for S1).
Bidirectional communication (cloud-to-device commands).
Includes device identity registry, twins, and direct methods.
Supports AMQP, HTTPS, and MQTT protocols.
Scales with units (S1, S2, S3) based on daily message quota.
Azure Event Hubs
Optimized for high-throughput event streaming.
Events are retained for a fixed period (1-7 days).
No message deletion after consumption.
Partitioned model for parallelism.
Supports Kafka protocol.
Azure Service Bus
Optimized for enterprise messaging (queues/topics).
Messages are removed from a queue or subscription once a receiver completes them (each topic subscription gets its own copy).
Supports dead-lettering, sessions, and transactions.
Supports competing consumers pattern.
Does not support Kafka protocol natively.
Mistake
Event Hubs stores events permanently.
Correct
Event Hubs retains events for a configurable period of 1 to 7 days (default 1 day). After that, events are automatically removed. For long-term storage, you must use Event Hubs Capture to write events to Azure Blob Storage or Azure Data Lake Storage.
Mistake
You can change the number of partitions after creating an event hub.
Correct
Partition count is fixed at creation and cannot be changed later. You must create a new event hub with the desired partition count and migrate your producers and consumers. Plan your partition count upfront based on expected parallelism.
Mistake
Increasing partitions automatically increases throughput.
Correct
Throughput is controlled by Throughput Units (TUs), not partitions. More partitions allow more parallel consumers but do not increase ingress/egress bandwidth. To increase throughput, you must add TUs. Partitions and TUs are independent.
Mistake
Event Hubs is the same as Azure IoT Hub.
Correct
IoT Hub is a managed service for IoT device management, bi-directional communication, and telemetry ingestion. Event Hubs is a high-throughput event streaming service. IoT Hub uses Event Hubs internally for telemetry, but it adds device identity, twin, and command features. For pure event streaming, Event Hubs is more appropriate.
Mistake
Events are deleted immediately after being consumed.
Correct
Events are not deleted upon consumption. They remain in the event hub for the retention period, regardless of how many consumer groups read them. This allows multiple consumers to read the same events independently.
IoT Hub is a managed service for IoT device management and bi-directional communication. It includes device identity, twins, and commands. Event Hubs is for high-throughput event streaming without device management. IoT Hub internally uses Event Hubs for telemetry ingestion, but its capacity is metered by a daily message quota per unit (e.g., 400,000 messages/day for an S1 unit). For pure streaming, use Event Hubs. For IoT scenarios requiring device control, use IoT Hub.
No, partition count is fixed at creation. You cannot increase or decrease partitions after the event hub is created. If you need more partitions, you must create a new event hub with the desired partition count and migrate your producers and consumers. Plan your partition count based on expected consumer parallelism.
Event Hubs preserves ordering within a partition. If you provide a partition key, all events with that key go to the same partition, maintaining order. Without a partition key, events are distributed round-robin, and order is not guaranteed across partitions. To ensure order for a specific entity, always use a partition key (e.g., device ID).
If you exceed the allocated TUs, Event Hubs throttles producers. Producers receive an HTTP 429 error (for HTTPS) or the AMQP link is detached. You can monitor throttling in Azure Metrics. To avoid throttling, enable auto-inflate to automatically increase TUs up to a maximum you set, or manually increase TUs.
Event Hubs Capture automatically writes all events from an event hub to Azure Blob Storage or Azure Data Lake Storage in Avro format. You can set a time interval (e.g., 5 minutes) or size threshold (e.g., 300 MB) to trigger a write. Capture is useful for long-term retention, archival, and batch processing. It is available in Standard and Dedicated tiers.
Yes, Event Hubs provides a Kafka endpoint that is compatible with Apache Kafka 1.0 and later. You can use existing Kafka clients (producers/consumers) to send and receive events without code changes. Just point your Kafka client to the Event Hubs namespace and use the Kafka protocol (port 9093).
The maximum event size is 1 MB. If your payload is larger than 1 MB, you must break it into smaller events. This limit applies to the entire event, including the body and metadata.
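The Capture trigger described above (a time interval or a size threshold, whichever is reached first) can be expressed as a one-line rule. This is a sketch of the rule only, with the example values from the text (5 minutes, 300 MB); the Capture class and ShouldFlush method are hypothetical, not part of any SDK.

```csharp
using System;

TimeSpan window = TimeSpan.FromMinutes(5);
long sizeLimitBytes = 300L * 1024 * 1024;

// Size threshold reached before the time window elapses: flush.
Console.WriteLine(Capture.ShouldFlush(TimeSpan.FromMinutes(2), 310L * 1024 * 1024, window, sizeLimitBytes));  // True
// Time window elapsed with little data buffered: flush.
Console.WriteLine(Capture.ShouldFlush(TimeSpan.FromMinutes(5), 12L * 1024 * 1024, window, sizeLimitBytes));   // True
// Neither limit reached yet: keep buffering.
Console.WriteLine(Capture.ShouldFlush(TimeSpan.FromMinutes(1), 12L * 1024 * 1024, window, sizeLimitBytes));   // False

static class Capture
{
    // A capture file is written as soon as either limit is hit.
    public static bool ShouldFlush(TimeSpan elapsed, long bufferedBytes, TimeSpan window, long sizeLimitBytes)
        => elapsed >= window || bufferedBytes >= sizeLimitBytes;
}
```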