DP-900Chapter 29 of 101Objective 3.5

Azure Stream Analytics

This chapter covers Azure Stream Analytics, a fully managed real-time event processing engine on Azure. For the DP-900 exam, you must understand its role in analytics, its core components (inputs, outputs, queries), and typical use cases. Approximately 5-8% of exam questions touch on stream analytics, often comparing it to batch processing or testing your knowledge of its integration with other Azure services. By the end, you'll be able to describe how Stream Analytics ingests, processes, and outputs streaming data with low latency.

25 min read
Intermediate
Updated May 31, 2026

Stream Analytics as an Airport Baggage System

Imagine a busy airport baggage handling system. Suitcases (data events) arrive continuously from check-in counters (input sources like IoT devices or event hubs). Each suitcase has a tag (timestamp and schema). The system must sort, filter, and route bags to the correct gates (output sinks) in real time without stopping. Conveyor belts (streaming pipelines) move bags at high speed. At each junction (query step), a sensor (streaming operator) inspects tags: if a bag is for a delayed flight, it diverts to a holding area (windowed aggregation); if it's from a VIP customer, it routes to express lane (filtering). The system can handle thousands of bags per second (throughput). If a bag's tag is missing (malformed event), it goes to a lost-and-found bin (error output). A central control room (Azure Stream Analytics job) monitors the entire operation, adjusting belt speeds (scaling) and rerouting as needed. This mirrors Azure Stream Analytics: it ingests events from Azure Event Hubs or IoT Hub, processes them using SQL-like queries with temporal windows, and outputs results to Azure SQL Database, Blob Storage, or Power BI—all with sub-second latency and exactly-once semantics within a time window.

How It Actually Works

What is Azure Stream Analytics?

Azure Stream Analytics (ASA) is a cloud-based, fully managed stream processing engine designed to handle high volumes of streaming data with sub-second latency. It enables real-time analytics on data from sources like Azure Event Hubs, Azure IoT Hub, and Blob Storage. ASA uses a SQL-like query language to define transformations, aggregations, and patterns over time windows. It is part of Azure's analytics portfolio, complementing batch services like Azure Synapse Analytics and storage services like Azure Data Lake.

Why Stream Analytics Exists

Traditional batch processing (e.g., Azure Batch or scheduled jobs) introduces latency of minutes to hours, which is unacceptable for scenarios like fraud detection, real-time dashboards, or IoT alerts. Stream Analytics fills this gap by processing events as they arrive, enabling immediate insights. It is designed for continuous, unbounded data streams and provides built-in support for time windows, event ordering, and exactly-once delivery semantics within a time window.

How It Works Internally

ASA jobs consist of three core components: input, query, and output. The input source (e.g., Event Hub) emits events as JSON, CSV, or Avro. The query defines the processing logic—filtering, aggregating, joining streams, or detecting patterns. The output sink receives results (e.g., Azure SQL Database, Power BI, Blob Storage).

Internally, ASA uses a distributed, scalable engine that partitions the stream based on a partition key (e.g., device ID). Each partition is processed independently, allowing horizontal scaling. The engine maintains state for windowed operations (e.g., tumbling, hopping, sliding windows) and uses checkpoints to recover from failures. Events are processed in order based on arrival time or event time (timestamp), with a configurable late arrival tolerance (default 5 seconds).

Key Components and Defaults

Inputs: Azure Event Hubs, IoT Hub, Blob Storage (for reference data). Event Hubs can ingest millions of events per second. Default partitioning: 16 partitions per Event Hub (configurable).

Outputs: Azure SQL Database, Azure Cosmos DB, Azure Blob Storage, Azure Data Lake Storage Gen2, Event Hubs, Power BI, Azure Functions, and more. Each output supports a specific write pattern (e.g., batch inserts to SQL).

Query: SQL-like syntax with extensions for streaming: TIMESTAMP BY to define event time, GROUP BY TumblingWindow(second, 10), HAVING, JOIN over streams. Queries can include user-defined functions (UDFs) in JavaScript or C#.

Windows:

Tumbling: fixed-size, non-overlapping (e.g., 10-second window).

Hopping: fixed-size, overlapping (e.g., 10-second window with 5-second hop).

Sliding: windows that start when an event arrives (e.g., last 10 seconds).

Session: groups events based on inactivity gap (e.g., 5-second gap).

Event Ordering: ASA handles out-of-order events with a configurable tolerance (default 5 seconds). Beyond that, events are dropped or adjusted based on policy (Adjusted, Drop).

Late Arrival Tolerance: Default 5 seconds. Events arriving later are either adjusted to the window or dropped.

Exactly-Once Delivery: ASA guarantees exactly-once delivery to output sinks within a time window, using checkpointing and idempotent writes.

Configuration and Verification

Creating an ASA job via Azure CLI:

az stream-analytics job create \
  --resource-group myResourceGroup \
  --name myStreamJob \
  --location eastus \
  --compatibility-level 1.2 \
  --output-error-policy Drop

Define input and output:

az stream-analytics input create \
  --job-name myStreamJob \
  --resource-group myResourceGroup \
  --name eventhubinput \
  --properties '{"type":"Stream","datasource":{"type":"Microsoft.EventHub/EventHub","properties":{"serviceBusNamespace":"myNamespace","eventHubName":"myHub","sharedAccessPolicyName":"RootManageSharedAccessKey","sharedAccessPolicyKey":"key"}}}'

az stream-analytics output create \
  --job-name myStreamJob \
  --resource-group myResourceGroup \
  --name sqldboutput \
  --datasource '{"type":"Microsoft.Sql/Server/Database","properties":{"server":"myserver","database":"mydb","user":"admin","password":"pwd","table":"Results"}}'

Define query:

az stream-analytics transformation create \
  --job-name myStreamJob \
  --resource-group myResourceGroup \
  --name Transformation \
  --streaming-units 3 \
  --saql "SELECT DeviceId, AVG(Temperature) AS AvgTemp INTO sqldboutput FROM eventhubinput TIMESTAMP BY EventTime GROUP BY DeviceId, TumblingWindow(second, 10)"

Start the job:

az stream-analytics job start \
  --resource-group myResourceGroup \
  --name myStreamJob \
  --output-start-mode JobStartTime

Monitoring: Use Azure Monitor metrics like Input Events, Output Events, Watermark delay, and Utilization (SU%). SU% should stay below 80% to avoid throttling.

Interaction with Related Technologies

ASA integrates natively with Azure Event Hubs and IoT Hub for ingestion. It can reference static data from Blob Storage (reference data joins) to enrich streams. Outputs can feed Power BI for real-time dashboards, Azure SQL for persistent storage, or Azure Functions for custom processing. ASA can also send alerts to Azure Monitor or trigger Logic Apps. For complex event processing, ASA can join multiple streams, detect patterns (e.g., MATCH_RECOGNIZE), and handle temporal anomalies.

Performance and Scaling

ASA scales by increasing Streaming Units (SU), which represent compute capacity. 1 SU provides 1 MB/s throughput. For high throughput, use partitioned inputs (e.g., Event Hub partitions) and ensure the query uses PARTITION BY. The maximum SU per job is 192 (in most regions). For larger workloads, use multiple jobs with a load balancer (e.g., Event Hub consumer groups).

Security and Compliance

ASA supports managed identity for secure access to resources, eliminating the need for shared access keys. It also supports VNET integration (via private endpoints) for network isolation. Data is encrypted at rest and in transit. ASA is compliant with ISO 27001, SOC 2, and HIPAA.

Walk-Through

1

Define Input Source

The first step is to configure one or more input sources. In the Azure portal, under the Stream Analytics job, you add an input. You specify the source type (Event Hub, IoT Hub, Blob Storage), connection details (namespace, event hub name, consumer group, authentication method), and serialization format (JSON, CSV, Avro). The default consumer group is '$Default'. For high throughput, use a separate consumer group to avoid throttling. The input schema is inferred from the first event or defined manually.

2

Define Output Sink

Next, you define where the processed results go. Output types include Azure SQL Database, Cosmos DB, Blob Storage, Power BI, and more. For SQL, you provide server, database, table, and authentication. For Blob, specify storage account, container, path pattern (e.g., 'yyyy/mm/dd'), and format. ASA supports exactly-once delivery by using idempotent writes (e.g., upsert to Cosmos DB) or checkpointing. Multiple outputs can be defined within one job.

3

Write Streaming Query

The query defines the transformation logic. It is written in Stream Analytics Query Language (SAQL), an extension of SQL. You use `TIMESTAMP BY` to define the event time column. Example: `SELECT DeviceId, AVG(Temperature) AS AvgTemp INTO output FROM input TIMESTAMP BY EventTime GROUP BY DeviceId, TumblingWindow(second, 10)`. Windows can be Tumbling, Hopping, Sliding, or Session. The query can also join multiple streams or reference data.

4

Configure Event Ordering

ASA must handle out-of-order and late-arriving events. In the job configuration, set the 'Out-of-order tolerance' (default 0 seconds) and 'Late arrival tolerance' (default 5 seconds). Events out of order beyond tolerance are either adjusted to the correct window (if possible) or dropped. Choose a policy: 'Adjusted' or 'Drop'. For exactly-once semantics, use 'Adjusted' and ensure tolerances match your data characteristics.

5

Start and Monitor Job

Once configured, start the job. You can start from the job creation time, custom time, or when the last output was produced (for restart). Monitor using Azure Metrics: Input Events, Output Events, Watermark delay (should be low), and SU Utilization (keep below 80%). If SU% exceeds 80%, increase Streaming Units or optimize query parallelism. Use `PARTITION BY` in the query to align with input partitions.

What This Looks Like on the Job

Enterprise Scenario 1: Real-Time Dashboard for IoT Sensor Data

A manufacturing company deploys thousands of temperature sensors across factories. Each sensor sends a JSON event every second to Azure IoT Hub. The company needs a real-time dashboard showing average temperature per machine per minute, and alerts when temperature exceeds 100°C. They create an ASA job with IoT Hub input, Power BI output for the dashboard, and Azure SQL output for historical storage. The query uses a TumblingWindow of 1 minute, grouping by MachineId. For alerts, a separate query filters Temperature > 100 and outputs to Event Hub, which triggers an Azure Function to send SMS. In production, they scale to 6 SUs for 10,000 sensors, monitoring SU% at 70%. A common misconfiguration is using a single partition in the query, causing a bottleneck. They fixed it by using PARTITION BY DeviceId with 16 partitions in IoT Hub.

Scenario 2: Fraud Detection in Financial Transactions

A bank processes credit card transactions in real time. Each transaction event includes amount, merchant, and timestamp. The bank needs to flag transactions that exceed $10,000 or occur more than 5 times within 10 minutes from the same card. They use ASA with Event Hub input and Azure SQL output for flagged transactions. The query uses a HoppingWindow of 10 minutes with 1-minute hop to count transactions per card. For threshold, they use HAVING COUNT(*) > 5. They also join with reference data (fraudulent merchant list) from Blob Storage. The job runs with 12 SUs to handle 50,000 transactions per second. A misconfiguration that caused issues: not setting TIMESTAMP BY correctly, leading to wrong window assignments. They fixed by using the transaction timestamp from the event payload.

Scenario 3: Log Analytics from Web Servers

A SaaS company collects web server logs (requests, errors) from multiple regions. Logs are sent to Event Hubs partitioned by region. They need real-time monitoring of error rates and a dashboard for ops team. They use ASA with Event Hub input, output to Azure Data Lake Storage Gen2 for long-term storage, and a separate output to Power BI for live dashboard. The query filters Status >= 500 and aggregates by region using a SlidingWindow of 5 minutes. They also use a reference data join to map region codes to names. The job is partitioned by PartitionId to align with Event Hub partitions. They set late arrival tolerance to 10 seconds because logs from some regions have network delays. A common mistake is forgetting to set the output path pattern for Blob Storage, causing all data to go to a single file. They use {date}/{region} pattern for partitioning.

How DP-900 Actually Tests This

Exam Focus for DP-900 Objective 3.5

The DP-900 exam tests your understanding of Azure Stream Analytics as a real-time analytics service. You need to know: - What it is: A fully managed stream processing engine for real-time analytics on streaming data. - Common use cases: Real-time dashboards, alerting, IoT telemetry, fraud detection. - Core components: Inputs (Event Hubs, IoT Hub, Blob), Outputs (SQL, Cosmos DB, Blob, Power BI), Queries (SQL-like with time windows). - Time window types: Tumbling, Hopping, Sliding, Session. Know their characteristics (fixed, overlapping, event-driven). - Event ordering: Late arrival tolerance (default 5 seconds) and out-of-order tolerance. Policies: Adjusted or Drop. - Scaling: Streaming Units (SU) – 1 SU = 1 MB/s throughput. Max 192 SU per job. Use PARTITION BY for parallelism. - Exactly-once delivery: Guaranteed within a time window.

Common Wrong Answers and Traps

1.

Confusing with Azure Data Lake Analytics: Candidates often choose Data Lake Analytics for real-time processing, but it is batch-only. Stream Analytics is for real-time.

2.

Thinking Stream Analytics can process batch data natively: It is designed for streams; batch data needs to be sent to a stream source first (e.g., Event Hub).

3.

Assuming it supports exactly-once globally: Exactly-once is within a time window, not across restarts or failures without idempotent outputs.

4.

Mixing up window types: Tumbling windows are non-overlapping; Hopping windows overlap. Sliding windows start at each event. Session windows group by inactivity.

5.

Forgetting about event ordering: The exam may ask about handling late events. Default late arrival is 5 seconds; out-of-order tolerance is 0 seconds.

Specific Numbers and Terms on the Exam

Default late arrival tolerance: 5 seconds.

Default out-of-order tolerance: 0 seconds.

Streaming Unit throughput: 1 MB/s.

Maximum SUs per job: 192 (in most regions).

Supported serialization: JSON, CSV, Avro.

Reference data source: Azure Blob Storage (static data for joins).

Query language: Stream Analytics Query Language (SAQL), based on SQL.

Edge Cases the Exam Loves

What happens if an event arrives after the late arrival tolerance? It is dropped or adjusted based on policy.

Can you use Stream Analytics with Azure Synapse? Yes, via output to Azure Data Lake Storage or SQL.

Is Stream Analytics serverless? Yes, it is fully managed; you don't manage infrastructure.

How to Eliminate Wrong Answers

If a question asks for a real-time processing service, eliminate batch services like Azure Data Factory, Azure Batch, or Azure Synapse pipelines. Look for keywords: 'real-time', 'streaming', 'low latency', 'event'. For window types, remember: Tumbling = fixed non-overlapping, Hopping = fixed overlapping, Sliding = event-triggered, Session = gap-based. For event ordering, if the question mentions 'out-of-order events', the answer involves configuring tolerance and policy.

Key Takeaways

Azure Stream Analytics is a fully managed real-time stream processing engine.

Inputs: Event Hubs, IoT Hub, Blob Storage. Outputs: SQL, Cosmos DB, Blob, Power BI, Event Hubs.

Query language is SAQL, an extension of SQL with time windows (Tumbling, Hopping, Sliding, Session).

Default late arrival tolerance is 5 seconds; out-of-order tolerance is 0 seconds.

Streaming Units (SU) measure throughput: 1 SU = 1 MB/s. Max 192 SUs per job.

Use PARTITION BY in queries to align with input partitions for scalability.

Exactly-once delivery is guaranteed within a time window when using idempotent outputs.

Stream Analytics is ideal for IoT telemetry, real-time dashboards, fraud detection, and alerting.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Azure Stream Analytics

Real-time stream processing with sub-second latency.

SQL-like query language (SAQL) with time windows.

Fully managed, no infrastructure to manage.

Scales via Streaming Units (1 MB/s per SU).

Outputs to Power BI, SQL, Cosmos DB, Blob, Event Hubs.

Azure Data Lake Analytics

Batch processing only, minutes to hours latency.

Uses U-SQL (SQL + C#) for analytics.

Also fully managed, but designed for large-scale batch jobs.

Scales via Azure Data Lake Analytics Units (AU).

Outputs to Azure Data Lake Storage, SQL, Blob.

Watch Out for These

Mistake

Azure Stream Analytics can process both streaming and batch data natively.

Correct

ASA is designed exclusively for streaming data. To process batch data, you must first send it to a streaming source like Event Hubs or IoT Hub. For batch processing, use Azure Data Factory or Azure Synapse.

Mistake

Stream Analytics guarantees exactly-once delivery globally, even across job restarts.

Correct

Exactly-once delivery is guaranteed within a time window, but only if the output sink supports idempotent writes (e.g., Cosmos DB upsert). For non-idempotent sinks, duplicates may occur during restarts.

Mistake

You can use any SQL function in Stream Analytics queries.

Correct

SAQL is a subset of T-SQL with extensions for streaming. Not all SQL functions are supported (e.g., complex subqueries). It supports window functions, aggregates, joins, and pattern matching.

Mistake

Stream Analytics jobs automatically scale to handle any throughput.

Correct

You must manually configure Streaming Units (SU) and use PARTITION BY to scale. The maximum is 192 SUs per job. For higher throughput, you need multiple jobs or increase input partitions.

Mistake

Event order is always based on arrival time.

Correct

You can specify event time using `TIMESTAMP BY` in the query. If not specified, arrival time is used. ASA supports out-of-order handling with configurable tolerance.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

What is the difference between Azure Stream Analytics and Azure Data Lake Analytics?

Azure Stream Analytics is for real-time stream processing (sub-second latency), while Azure Data Lake Analytics is for batch processing (minutes/hours). Stream Analytics uses SAQL with time windows; Data Lake Analytics uses U-SQL. Both are fully managed, but Stream Analytics is event-driven, whereas Data Lake Analytics is job-based.

How do I handle out-of-order events in Azure Stream Analytics?

Configure 'Out-of-order tolerance' (default 0 seconds) and 'Late arrival tolerance' (default 5 seconds) in the job settings. Events arriving out of order within tolerance are adjusted to the correct window; beyond tolerance, they are dropped or adjusted based on the policy (Adjusted or Drop).

Can Azure Stream Analytics output to multiple destinations?

Yes, a single ASA job can have multiple outputs. You define each output in the job and use separate `INTO` clauses in the query to direct data to different sinks (e.g., one to SQL, one to Power BI).

What are Streaming Units and how do I choose the right number?

Streaming Units (SU) represent compute capacity. 1 SU provides 1 MB/s throughput. Start with 1 SU for low throughput and monitor SU Utilization. If it exceeds 80%, increase SUs. The maximum is 192 SUs per job. For high throughput, ensure your query uses PARTITION BY to parallelize.

Is Azure Stream Analytics serverless?

Yes, ASA is fully managed and serverless. You do not need to provision or manage infrastructure. You only define the job, inputs, outputs, and query, and Azure handles scaling and fault tolerance.

What serialization formats does Azure Stream Analytics support?

ASA supports JSON, CSV, and Avro for input and output. JSON is the most common. For Avro, the schema must be provided inline or via a reference file.

Can I use Azure Stream Analytics with reference data?

Yes, you can join streaming data with static reference data stored in Azure Blob Storage. Reference data is loaded into memory at job start and can be refreshed periodically (e.g., every 5 minutes). Use `JOIN` with `ON` conditions in the query.

Terms Worth Knowing

Ready to put this to the test?

You've just covered Azure Stream Analytics — now see how well it sticks with free DP-900 practice questions. Full explanations included, no account needed.

Done with this chapter?