This chapter covers Azure Container Apps and KEDA (Kubernetes Event-Driven Autoscaling), the open-source autoscaler that Container Apps uses for event-driven scaling of containerized applications in Azure. For the AZ-204 exam, this topic appears in the Compute domain (Objective 1.2) and typically accounts for 5-10% of questions. You will be tested on how to configure event-driven scaling, integrate with Azure services like Service Bus and Event Hubs, and understand the differences between Container Apps, Azure Kubernetes Service (AKS), and Azure Functions. Mastery of KEDA scalers and Container Apps environment architecture is essential for passing the exam.
Imagine a restaurant kitchen that prepares dishes (containers) based on customer orders (events). The kitchen has a fixed number of chefs (replicas) that can prepare dishes. Normally, a hostess (Azure Container Apps built-in HTTP scaler) checks how many customers are waiting at the door and assigns chefs accordingly. However, some orders come in via phone, online, or delivery apps (external events such as queue messages). The hostess doesn't see these orders. So the restaurant hires a manager (KEDA) who monitors all order sources (phones, tablets, delivery services) using specialized sensors (scalers). When the manager sees a surge in phone orders, they walk to the kitchen and tell the hostess to add more chefs (scale replicas) even though the dining room is empty. The manager doesn't cook; they just observe external load and signal the hostess. The hostess still controls the actual hiring/firing (replica count via Kubernetes Horizontal Pod Autoscaler). If the manager fails (KEDA pod down), the kitchen runs on last-known settings until the manager returns. This separation of concerns, event-driven scaling (KEDA) vs. request-driven scaling (built-in HTTP), is exactly how Azure Container Apps and KEDA work together.
What is Azure Container Apps?
Azure Container Apps is a fully managed serverless container platform that is built on Kubernetes but abstracts away the cluster and its control plane. You define containers, ingress, and scaling rules declaratively, and Azure handles the underlying infrastructure. It is designed for microservices, API endpoints, and event-driven workloads. Unlike AKS, you do not manage nodes, pods, or the Kubernetes API directly. Container Apps is ideal for scenarios where you want container orchestration without operational overhead.
What is KEDA?
KEDA (Kubernetes Event-Driven Autoscaling) is an open-source project that extends Kubernetes to scale workloads based on events from external sources like Azure Service Bus, Event Hubs, Kafka, Redis, and many more. KEDA works with both Kubernetes Horizontal Pod Autoscaler (HPA) and Azure Container Apps. In Container Apps, KEDA is built-in and allows you to scale your container app replicas based on metrics like queue length, message count, or custom Prometheus metrics. KEDA has three main parts:
Scalers: Connectors to an external source (e.g., Service Bus) that retrieve metrics.
Operator: Activates or deactivates the workload (scaling between zero and one replica) and manages the HPA for it.
Metrics server: Exposes the scaler metrics to the HPA.
How KEDA works internally
KEDA operates in a pull-based model. Here is the step-by-step mechanism:
Scaler queries external source: At each polling interval (default 30 seconds), the KEDA scaler for the configured external source (e.g., an Azure Service Bus queue) reads the current metric value (e.g., number of messages).
Metric is exposed to Kubernetes: KEDA's metrics server publishes this value through the Kubernetes external metrics API, an endpoint internal to the cluster that the HPA can query.
Operator manages the HPA: The KEDA Operator creates the HPA object for the workload and sets its target value from the scale rule. For example, if the target is 10 messages per replica and there are 100 messages, the HPA will aim for 10 replicas.
HPA scales replicas: The Kubernetes HPA (or Container Apps scaling controller) adjusts the number of replicas by comparing the current metric with the target. In Container Apps, this is handled by the Container Apps scaling subsystem, which is equivalent to HPA but integrated.
Scale-to-zero: KEDA can scale down to zero replicas when no events are present. Because the HPA itself cannot go below one replica, the operator handles this step directly: when the scaler reports no activity and minReplicas is 0, it scales the workload to zero once the cooldown period has passed. When a new event arrives, KEDA activates the workload again (zero to one replica) and the HPA takes over from there.
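The steps above can be sketched as a small state machine. This is a simplified illustration, not KEDA's actual code: the function name `next_replicas` and the phase labels are hypothetical, and the cooldown delay before going to zero is deliberately left out to keep the two phases (activation vs. HPA scaling) easy to see.

```python
import math
from typing import List, Tuple


def next_replicas(current: int, metric: int, target: int,
                  max_replicas: int) -> Tuple[int, str]:
    """One evaluation cycle: return (new replica count, phase that acted)."""
    if metric <= 0:
        # No pending events: drop to zero (cooldown ignored in this sketch).
        return (0, "deactivate" if current > 0 else "idle")
    if current == 0:
        # Activation step: zero to one replica.
        return (1, "activate")
    # From one replica upward, HPA-style scaling toward metric / target.
    wanted = min(math.ceil(metric / target), max_replicas)
    return (wanted, "hpa")


def trace(samples: List[int], target: int,
          max_replicas: int) -> List[Tuple[int, str]]:
    """Feed a series of polled metric values through the cycle."""
    current, steps = 0, []
    for metric in samples:
        current, phase = next_replicas(current, metric, target, max_replicas)
        steps.append((current, phase))
    return steps


if __name__ == "__main__":
    # Hypothetical queue lengths seen on five consecutive polls.
    for step in trace([0, 5, 100, 100, 0], target=10, max_replicas=10):
        print(step)
```

The trace makes the division of labor visible: the first non-zero poll only wakes the app up, and the jump to a larger replica count happens on the following cycle.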
Key components, values, defaults, and timers
Cooldown period: Default 300 seconds (5 minutes). This is the time KEDA waits after the last event before scaling down to zero. Configurable in the ScaledObject.
Polling interval: Default 30 seconds. How often KEDA polls the external source. Can be set to as low as 1 second for high-throughput scenarios.
Min replicas: Default 0 for scale-to-zero enabled workloads. Can be set to a minimum to keep warm instances.
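Max replicas: Default 100 in a KEDA ScaledObject; Container Apps uses 10 when maxReplicas is not set. Upper limit to prevent runaway scaling.
Target metric value: For queue-based scalers, this is the number of messages per replica. Default varies by scaler (e.g., 5 for the Service Bus scaler's messageCount), so set it explicitly.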
KEDA version: As of 2025, KEDA 2.x is standard. Container Apps uses a managed version of KEDA.
Configuration and verification commands
To configure KEDA in Azure Container Apps, you define a scale rule in the Container Apps resource. Example ARM template snippet:
{
  "properties": {
    "template": {
      "scale": {
        "minReplicas": 0,
        "maxReplicas": 10,
        "rules": [
          {
            "name": "servicebus-queue",
            "custom": {
              "type": "azure-servicebus",
              "metadata": {
                "queueName": "myqueue",
                "namespace": "mynamespace",
                "messageCount": "10"
              },
              "auth": [
                {
                  "secretRef": "servicebus-connection-string",
                  "triggerParameter": "connection"
                }
              ]
            }
          }
        ]
      }
    }
  }
}
To verify scaling behavior, use Azure Monitor metrics (the replica count metric is exposed under the name Replicas):
az monitor metrics list --resource <container-app-id> --metric "Replicas" --interval 5m
How it interacts with related technologies
Azure Functions: Functions also support event-driven scaling, but they are limited to specific triggers (HTTP, Service Bus, etc.) and run in a Functions host. Container Apps is more flexible for custom containers.
AKS with KEDA: In AKS, you install KEDA manually via Helm. Container Apps provides KEDA as a managed service, reducing operational burden.
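Azure Event Hubs: KEDA can scale based on the number of unprocessed events in an Event Hub. The scaler compares the consumer's checkpoint information (typically stored in Azure Blob Storage) with the latest events in each partition. Replicas beyond the partition count add no read throughput.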
Azure Queue Storage: KEDA can scale based on queue message count. The scaler uses Azure Storage SDK.
Trap patterns and common mistakes
Confusing KEDA with HPA: KEDA is not a replacement for HPA; it provides custom metrics to HPA. In Container Apps, the built-in HTTP scaler is separate from KEDA scalers.
Forgetting authentication: KEDA scalers require connection strings or managed identities to reach the event source. If authentication is misconfigured, the scaler cannot read the metric, so the app does not scale out and can end up at zero replicas even when events exist.
Setting cooldown too low: A low cooldown period (e.g., 10 seconds) can cause rapid scale-down/up oscillations (thrashing). The default 300 seconds is designed to avoid this.
Assuming scale-to-zero is always enabled: By default, minReplicas is 0, but if you set minReplicas to 1, the app never scales to zero. This is common for latency-sensitive apps that need warm instances.
Define KEDA ScaledObject
Define the scale target (the container app revision), the scaler type (e.g., azure-servicebus), and parameters like queue name, namespace, and target message count. On AKS this is a ScaledObject YAML manifest; in Container Apps it is a scale rule in the ARM or Bicep template, which the platform translates into the KEDA configuration for you. The definition also includes polling interval and cooldown period. This object is the contract between KEDA and the external event source.
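As a minimal sketch of what such a definition contains, the helper below assembles a Service Bus scale rule as a Python dictionary and prints it as JSON. The property names mirror the ARM snippet earlier in this chapter; the helper names `servicebus_scale_rule` and `scale_block` are hypothetical and exist only for illustration.

```python
import json
from typing import Any, Dict, List


def servicebus_scale_rule(queue_name: str, namespace: str,
                          message_count: int, secret_ref: str) -> Dict[str, Any]:
    """Build one custom scale rule (hypothetical helper, not an Azure SDK call)."""
    return {
        "name": "servicebus-queue",
        "custom": {
            "type": "azure-servicebus",
            "metadata": {
                "queueName": queue_name,
                "namespace": namespace,
                # Scaler metadata values are written as strings in the template.
                "messageCount": str(message_count),
            },
            "auth": [
                {"secretRef": secret_ref, "triggerParameter": "connection"}
            ],
        },
    }


def scale_block(rules: List[Dict[str, Any]], min_replicas: int = 0,
                max_replicas: int = 10) -> Dict[str, Any]:
    """Wrap rules in a scale section after a basic sanity check."""
    if min_replicas < 0 or max_replicas < 1 or min_replicas > max_replicas:
        raise ValueError("need 0 <= min_replicas <= max_replicas and max_replicas >= 1")
    return {"minReplicas": min_replicas, "maxReplicas": max_replicas, "rules": rules}


if __name__ == "__main__":
    rule = servicebus_scale_rule("myqueue", "mynamespace", 10,
                                 "servicebus-connection-string")
    print(json.dumps({"scale": scale_block([rule])}, indent=2))
```

Generating the rule from code is one way to keep the target message count and replica bounds consistent across environments; hand-written templates work just as well.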
KEDA Scaler polls source
The KEDA scaler runs a polling loop at the configured interval (default 30s). It connects to the external service (e.g., Azure Service Bus) using the provided connection string or managed identity. It fetches the current metric value, e.g., the number of active messages in the queue. If the connection fails, no usable metric is available, so the app is not scaled out and may be scaled down.
Scaler exposes metric endpoint
The scaler's value is exposed through KEDA's metrics server in the format of the Kubernetes external metrics API. The endpoint is internal to the cluster. The HPA controller queries it on its sync period (every 15s by default) to get the latest metric.
Operator updates HPA target
The desired replica count follows the formula: desired replicas = ceil(current metric value / target metric value). For example, if target is 10 messages per replica and current is 100, desired is 10; with 101 messages it is 11, because the result is rounded up. The operator sets the target metric value on the HPA object (not the replica count directly). The HPA then adjusts replicas within the minReplicas and maxReplicas bounds.
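The replica arithmetic can be checked with a few lines of Python. This is a sketch of the calculation only, assuming the result is rounded up and then clamped to the configured bounds; the function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(current_metric: float, target_per_replica: float,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Replica count implied by a metric value and a per-replica target."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if current_metric <= 0:
        # Nothing to process: fall back to the configured floor.
        return min_replicas
    # Round up so a partial batch still gets its own replica.
    raw = math.ceil(current_metric / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(raw, max_replicas))


if __name__ == "__main__":
    print(desired_replicas(100, 10))                    # 100 messages, target 10
    print(desired_replicas(101, 10, max_replicas=100))  # one extra message
    print(desired_replicas(0, 10, min_replicas=1))      # warm instance kept
```

Rounding up matters at the edges: a single message with a target of 10 still needs one replica, and the clamp is what keeps a sudden backlog from requesting more replicas than the rule allows.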
Container App scales replicas
The Container Apps scaling controller (equivalent to HPA) compares the current metric with the target and adjusts the replica count of the container app revision. Scaling up happens quickly (within seconds). Scaling down is deliberately slower to avoid thrashing: replicas are removed only after a scale-down stabilization window (default 300s), and KEDA waits for the cooldown period (default 300s) after the last event before going from the last replica to zero.
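The cooldown rule for the final step to zero can be sketched as follows. This is a simplified model, not KEDA's implementation: it assumes the scaler is polled at fixed timestamps and that "last event" means the last poll that returned a non-zero metric. The function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def first_scale_to_zero_time(samples: Iterable[Tuple[int, int]],
                             cooldown_seconds: int = 300) -> Optional[int]:
    """Return the first poll time at which scale-to-zero is allowed.

    samples: (timestamp_seconds, metric_value) pairs in time order.
    Returns None if the cooldown never fully elapses.
    """
    last_active = None
    for timestamp, metric in samples:
        if metric > 0:
            # Activity seen: the cooldown clock restarts from here.
            last_active = timestamp
        elif last_active is not None and timestamp - last_active >= cooldown_seconds:
            return timestamp
    return None


if __name__ == "__main__":
    # Polls every 30 s; messages present only during the first minute.
    polls = [(t, 5 if t <= 60 else 0) for t in range(0, 601, 30)]
    print(first_scale_to_zero_time(polls))
    print(first_scale_to_zero_time(polls, cooldown_seconds=10))
```

Comparing the two calls shows why a very short cooldown invites thrashing: the app becomes eligible for zero replicas almost immediately after a lull, so the next burst pays the cold-start cost again.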
Enterprise Scenario 1: Order Processing with Azure Service Bus
A large e-commerce company processes millions of orders daily. Orders are placed via web and mobile apps, then pushed to an Azure Service Bus queue. A containerized order processor runs on Azure Container Apps. The processor must scale from zero during low traffic (e.g., 2 AM) to hundreds of instances during Black Friday. Using KEDA with the Service Bus scaler, the team sets target message count to 10 per replica. During peak, the queue depth reaches 50,000, which at that target calls for 5,000 replicas; the actual count is capped by maxReplicas, and Container Apps limits how high maxReplicas can go (1,000 at the time of writing), so the team sets the cap there and lets the queue absorb the remaining backlog. The cooldown is set to 5 minutes to avoid scaling down during brief lulls. A common mistake is forgetting to assign a managed identity to the Container App for accessing Service Bus; without it, the scaler cannot authenticate, so the app does not scale out and can sit at zero replicas while messages pile up. The team uses Azure Monitor to track replica count and queue depth metrics. They also set up alerts if the queue depth exceeds 100,000 for more than 5 minutes.
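The alert condition in this scenario (queue depth above 100,000 for more than 5 minutes) can be expressed as a small check over metric samples. This sketch only illustrates the logic of a sustained-threshold rule; it is not the Azure Monitor alert API, and the function name and sample format are assumptions.

```python
from typing import Iterable, Tuple


def sustained_breach(samples: Iterable[Tuple[int, int]],
                     threshold: int = 100_000,
                     window_seconds: int = 300) -> bool:
    """True if depth stays above threshold for longer than window_seconds.

    samples: (timestamp_seconds, queue_depth) pairs in time order.
    """
    breach_start = None
    for timestamp, depth in samples:
        if depth > threshold:
            if breach_start is None:
                # First sample of a new breach.
                breach_start = timestamp
            if timestamp - breach_start > window_seconds:
                return True
        else:
            # Any sample at or below the threshold resets the clock.
            breach_start = None
    return False


if __name__ == "__main__":
    # One sample per minute, depth pinned at 120,000 for seven samples.
    steady = [(t, 120_000) for t in range(0, 361, 60)]
    print(sustained_breach(steady))
```

Requiring the breach to persist, rather than alerting on a single high sample, avoids paging the team for short spikes that the autoscaler clears on its own.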
Enterprise Scenario 2: Real-time Analytics with Event Hubs
A financial services firm ingests stock market data from multiple exchanges into Azure Event Hubs. Each partition of the Event Hub corresponds to a stock symbol. They deploy a containerized analytics app on Container Apps that processes each event. With KEDA's Event Hubs scaler, the app scales based on the number of unprocessed events per partition. The scaler uses the Event Hubs checkpoint store (Azure Blob Storage) to track progress. The team sets the polling interval to 10 seconds for near-real-time scaling. A common issue is that the checkpoint store is in a different region, causing latency and stale metrics. They mitigate this by colocating the storage account in the same region as the Container Apps environment. They also configure maxReplicas to 200 to avoid overwhelming downstream databases.
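Scaling on unprocessed events per partition can be illustrated with a short calculation. This is a sketch under stated assumptions, not the scaler's real code: the per-partition counters are hypothetical inputs standing in for the latest enqueued position and the checkpointed position, the threshold of 64 events per replica is an example value, and the cap at the partition count reflects that extra readers beyond one per partition add no throughput.

```python
import math
from typing import Dict


def unprocessed_events(enqueued: Dict[str, int],
                       checkpointed: Dict[str, int]) -> int:
    """Total backlog across partitions.

    enqueued: events written so far, per partition id.
    checkpointed: events already processed, per partition id
                  (a missing partition means nothing processed yet).
    """
    total = 0
    for partition, written in enqueued.items():
        done = checkpointed.get(partition, 0)
        # Guard against a checkpoint that is briefly ahead of stale data.
        total += max(written - done, 0)
    return total


def replicas_for_lag(lag: int, threshold: int,
                     partition_count: int, max_replicas: int) -> int:
    """Replicas implied by the backlog, capped by partitions and maxReplicas."""
    if lag <= 0:
        return 0
    return min(math.ceil(lag / threshold), partition_count, max_replicas)


if __name__ == "__main__":
    enqueued = {"0": 1000, "1": 500, "2": 200}
    checkpointed = {"0": 900, "1": 500}  # partition "2" has no checkpoint yet
    lag = unprocessed_events(enqueued, checkpointed)
    print(lag, replicas_for_lag(lag, threshold=64, partition_count=3, max_replicas=200))
```

The calculation also shows why stale checkpoints matter: if the checkpoint store lags behind, the computed backlog is inflated and the app scales out more than the real workload requires.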
Enterprise Scenario 3: Background Job Processing with RabbitMQ
A media company transcodes videos uploaded by users. Uploads are placed in a RabbitMQ queue. They run a containerized transcoder on Container Apps with KEDA's RabbitMQ scaler. The scaler is configured to query the RabbitMQ management API (HTTP) for queue length. The team sets target to 1 message per replica because transcoding is CPU-intensive. They also set a cooldown of 10 minutes to avoid scaling down during a batch upload. A pitfall is that this mode requires the RabbitMQ management plugin to be enabled; without it, KEDA cannot retrieve metrics over HTTP. They also discovered that if the queue has many unacknowledged messages, the scaler counts them as well, leading to over-scaling. They mitigated this by using a separate queue for dead letters.
What AZ-204 Tests on This Topic (Objective 1.2: Compute)
The exam focuses on:
Identifying when to use Container Apps vs. AKS vs. Functions: Container Apps is for event-driven microservices, AKS for full Kubernetes control, Functions for simple event handlers.
Configuring KEDA scalers: You must know which scaler to use for which Azure service (e.g., azure-servicebus for Service Bus, azure-eventhub for Event Hubs).
Scale-to-zero behavior: Understand that KEDA enables scale-to-zero when minReplicas is 0. The exam tests that the app scales from zero when an event arrives.
Authentication for scalers: Managed identity is preferred over connection strings. The exam may present a scenario where a scaler fails due to missing authentication.
Common Wrong Answers and Why Candidates Choose Them
Choosing AKS over Container Apps for a simple event-driven app: Candidates pick AKS because they think they need full Kubernetes. But the question emphasizes 'minimal management overhead' — Container Apps is the answer.
Believing KEDA replaces HPA: Candidates think KEDA is a standalone scaler. In reality, KEDA provides metrics to HPA. The exam may ask 'What component actually changes the replica count?' The answer is HPA (or Container Apps scaling controller), not KEDA.
Setting cooldown to 0 for faster scale-down: Candidates think immediate scale-down is good. But the exam tests that a too-short cooldown causes thrashing. The correct default is 300 seconds.
Confusing polling interval with cooldown: Polling interval is how often KEDA checks the source; cooldown is how long after the last event before scaling down. The exam may present a scenario where the app scales down too quickly, and the fix is to increase cooldown.
Specific Numbers and Terms
Default polling interval: 30 seconds
Default cooldown period: 300 seconds
Default target message count for Service Bus: 5 messages per replica when messageCount is not set (the examples in this chapter set it to 10 explicitly)
Scale-to-zero: Requires minReplicas = 0
KEDA version: 2.x (managed in Container Apps)
Scaler types: azure-servicebus, azure-eventhub, azure-queue, azure-blob, etc.
Edge Cases and Exceptions
If the scaler fails to authenticate, it cannot read the metric, so the app does not scale out and can end up at zero replicas even with pending events. This is a common troubleshooting scenario.
If the external source is temporarily unavailable, the app does not drop to zero immediately: scaling to zero still waits for the cooldown period (default 300 seconds, i.e., 5 minutes) to pass.
Multiple scalers can be combined; the replica count is the maximum of all scalers' desired counts.
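The "maximum of all scalers" rule can be sketched in a few lines. This is an illustration of the arithmetic only, with a hypothetical function name and input shape: each rule is given as a (current metric, target per replica) pair.

```python
import math
from typing import Iterable, Tuple


def combined_replicas(rules: Iterable[Tuple[float, float]],
                      min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Replica count when several scale rules are active at once.

    rules: (current_metric, target_per_replica) pairs, one per rule.
    """
    # Each rule with pending work proposes its own replica count.
    proposals = [math.ceil(metric / target) for metric, target in rules if metric > 0]
    # The largest proposal wins; no proposals means nothing to do.
    wanted = max(proposals, default=0)
    return max(min_replicas, min(wanted, max_replicas))


if __name__ == "__main__":
    # One rule wants 5 replicas, the other wants 10.
    print(combined_replicas([(50, 10), (100, 10)]))
```

Taking the maximum rather than the sum means the busiest event source sets the replica count; quieter sources simply share those replicas.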
How to Eliminate Wrong Answers
If a question mentions 'event-driven scaling' and 'minimal configuration', eliminate AKS and Functions. Container Apps + KEDA is the answer.
If a question asks 'What scales the container app based on queue length?', the answer is 'KEDA scaler and HPA', not just KEDA.
If a question involves 'scale to zero', look for minReplicas = 0 and a scaler that supports it (most do).
If a question describes 'thrashing', the fix is to increase cooldown period.
Azure Container Apps is a serverless container platform that abstracts Kubernetes; KEDA provides event-driven autoscaling.
KEDA combines scalers (poll external sources), an operator (manages the HPA and scaling to and from zero), and a metrics server (feeds scaler metrics to the HPA).
Default polling interval is 30 seconds; default cooldown period is 300 seconds.
Scale-to-zero requires minReplicas = 0 and a scaler that supports it.
KEDA scalers include azure-servicebus, azure-eventhub, azure-queue, and many more.
Authentication for scalers can use connection strings or managed identities; managed identity is preferred.
The exam differentiates Container Apps (minimal management) from AKS (full control) and Functions (simple triggers).
These come up on the exam all the time. Here's how to tell them apart.
Azure Container Apps with KEDA
Fully managed Kubernetes control plane; no node management.
KEDA is built-in and configured declaratively via ARM/Bicep.
Scaling is managed by Container Apps scaling controller.
Ideal for event-driven microservices with minimal ops overhead.
Scale-to-zero supported out-of-the-box for event-driven workloads.
Azure Kubernetes Service (AKS) with KEDA
Full control over Kubernetes cluster, nodes, and networking.
KEDA must be installed manually via Helm or YAML manifests.
Scaling uses standard Kubernetes HPA.
Suitable for complex workloads requiring custom Kubernetes features.
Scale-to-zero of pods is possible with KEDA; releasing the underlying nodes as well requires additional configuration (cluster autoscaler).
Azure Container Apps with KEDA
Runs any container image; supports custom runtimes.
Event-driven scaling via KEDA scalers for many sources.
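Long-running background processing supported with no function-style execution timeout (HTTP requests through ingress still time out, after 240 seconds by default).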
More flexible for complex processing pipelines.
Requires container image management.
Azure Functions
Supports only specific trigger types (HTTP, Service Bus, etc.).
Built-in scaling for supported triggers; no KEDA needed.
Functions have a timeout (default 5 min, max 10 min on the Consumption plan; the Premium plan defaults to 30 min and can be raised).
Simpler for single-purpose event handlers.
No container image management; code is uploaded directly.
Mistake
KEDA is a replacement for the Kubernetes Horizontal Pod Autoscaler (HPA).
Correct
KEDA is not a replacement; it is an extension that provides custom metrics to HPA. HPA still performs the actual scaling of replicas. In Container Apps, the scaling controller is equivalent to HPA.
Mistake
Azure Container Apps requires Kubernetes knowledge to operate.
Correct
Container Apps abstracts Kubernetes entirely. You define scaling rules declaratively without managing pods, nodes, or the Kubernetes API. KEDA is built-in and configured via ARM templates or Bicep.
Mistake
KEDA can scale any container workload to zero, even HTTP-triggered apps.
Correct
KEDA scales to zero only when the scaler is event-driven. For HTTP-triggered apps, the built-in HTTP scaler in Container Apps can scale to zero only if there are no active requests. KEDA scalers are for external event sources.
Mistake
Setting minReplicas to 0 always enables scale-to-zero.
Correct
While minReplicas=0 is necessary, the scaler must also support scale-to-zero. Most KEDA scalers do, but if the scaler returns a non-zero metric (e.g., due to a bug), the app will not scale to zero.
Mistake
KEDA polling interval and cooldown period are the same thing.
Correct
Polling interval (default 30s) is how often KEDA checks the external source for new metrics. Cooldown period (default 300s) is how long KEDA waits after the last event before scaling down. They serve different purposes.
The built-in HTTP scaler scales based on HTTP request load, using metrics like request count or concurrent connections. KEDA scales based on external event sources (e.g., queue length, event hub messages). They can be used together: HTTP scaler for web traffic, KEDA for background processing. On the exam, remember that HTTP scaling is separate from KEDA.
Yes, if minReplicas is set to 0 and the scaler returns 0 metrics (no events), KEDA will scale the app to zero. When a new event arrives, KEDA scales back up. This is a key benefit for cost savings. However, if you need low latency, set minReplicas to 1 to keep a warm instance.
You can use connection strings stored in secrets or use a managed identity. Managed identity is recommended because it avoids storing secrets. In Container Apps, you assign a managed identity to the app and grant it permissions (e.g., Service Bus Data Receiver). Then reference the identity in the scaler configuration.
If the scaler cannot connect, it typically returns 0 metrics. This causes the app to scale down to zero (if minReplicas = 0) even if there are pending events. This is a common failure mode. To mitigate, configure health checks and alerts on scaler errors.
Yes, you can define multiple scale rules. The container app will scale to the maximum replica count across all rules. For example, if one scaler wants 5 replicas and another wants 10, the app will have 10 replicas. This is useful for apps that handle multiple event sources.
Container Apps abstracts Kubernetes, so you don't manage nodes, pods, or the control plane. AKS gives you full control but requires operational overhead. Container Apps is for event-driven microservices; AKS is for complex, custom workloads. The exam tests when to choose each.
Use Azure Monitor metrics such as the replica count, together with any custom scaler metrics you publish. You can also view the app's system logs in Log Analytics to find scaler errors. The exam may ask about troubleshooting scaling issues using these metrics.