This chapter covers microservices architecture on Azure, a key design pattern for building scalable, resilient cloud-native applications. For the AZ-305 exam, understanding microservices is critical as it appears in multiple domains, especially under 'Design for Infrastructure and Compute' and 'Design for High Availability.' Approximately 10-15% of exam questions touch on microservices, containerization, and related Azure services. This chapter provides the deep technical knowledge needed to design and explain microservices solutions on Azure, including container orchestration, API management, and service mesh integration.
Imagine a car factory that used to be a single massive assembly line (monolith). Now it's redesigned as a modular factory with specialized mini-factories: one for engines, one for transmissions, one for electronics, and one for final assembly. Each mini-factory has its own team, inventory, and schedule. They communicate via standardized interfaces: the engine factory sends completed engines to a shared buffer, and the transmission factory picks them up when ready. If the electronics factory needs to upgrade its process, it can do so independently without stopping the other lines—just as a microservice can be updated without redeploying the entire application. However, this modularity introduces complexity: the factories must coordinate delivery times, handle delays (latency), and ensure parts are compatible (API versioning). A central coordinator (API gateway) routes requests and manages load. If the engine factory fails, the assembly line can still use backup engines from inventory (circuit breaker pattern). This mirrors how microservices on Azure use containers, orchestration, and managed services to achieve agility and resilience.
What is Microservices Architecture?
Microservices architecture is a software development approach where an application is composed of small, independent services that communicate over well-defined APIs. Each service is self-contained, implements a single business capability, and can be developed, deployed, and scaled independently. This contrasts with monolithic applications where all functionality is tightly coupled and deployed as a single unit.
Why Microservices on Azure?
Azure provides a comprehensive set of services to build and run microservices, including:
- Azure Kubernetes Service (AKS) for container orchestration.
- Azure Container Instances (ACI) for serverless containers.
- Azure Service Fabric for stateful microservices.
- Azure Functions for event-driven, serverless compute.
- API Management for API gateway functionality.
- Azure DevOps for CI/CD pipelines.
The exam expects you to know when to choose each service and how they integrate.
How Microservices Work Internally
Microservices communicate via synchronous protocols (HTTP/REST, gRPC) or asynchronous messaging (Azure Service Bus, Event Grid, Event Hubs). Each service runs in its own process, often inside a container. Containers provide isolation and consistency across environments. Orchestrators like Kubernetes manage container lifecycle, scaling, networking, and service discovery.
Service Discovery: In a dynamic environment where services scale up and down, instance IP addresses change constantly. A service registry or cluster DNS (e.g., CoreDNS in Kubernetes) maps a stable service name to the endpoints that are currently healthy. When Service A needs to call Service B, it resolves that name instead of hard-coding an address.
API Gateway: An API gateway acts as a single entry point for external clients. It handles authentication, rate limiting, request routing, and protocol translation. Azure API Management is a fully managed service that provides these capabilities.
Circuit Breaker Pattern: To prevent cascading failures, if a downstream service is failing, the circuit breaker trips and returns a fallback response instead of waiting endlessly. This is implemented using libraries like Polly or Istio's circuit breaker.
Load Balancing: Traffic is distributed across service instances using round-robin, least connections, or consistent hashing. In AKS, the kube-proxy component manages load balancing within the cluster.
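The circuit breaker described above can be sketched in a few lines. This is a minimal illustration only, not the API of Polly or Istio; the class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Illustrative breaker: closed -> open after N failures -> half-open after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the example can be tested without waiting
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the downstream call and return the fallback immediately.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # trip (or re-trip) the breaker
            return fallback()
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each downstream request, for example `breaker.call(lambda: fetch_inventory(sku), lambda: CACHED_STOCK)`, where `fetch_inventory` and `CACHED_STOCK` are hypothetical names. The point of the design is that a failing dependency costs the caller almost nothing once the breaker is open.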
Key Components and Defaults
Azure Kubernetes Service (AKS): The default node size has historically been Standard_DS2_v2 (2 vCPUs, 7 GB RAM). With kubenet networking, the default pod CIDR is 10.244.0.0/16 and the default service CIDR is 10.0.0.0/16 (both customizable). AKS supports up to 1,000 nodes per node pool and up to 5,000 nodes per cluster.
Azure Container Registry (ACR): Stores container images. Geo-replication for cross-region deployments requires the Premium SKU. The entry-level Basic SKU includes 10 GB of storage and 2 webhooks.
Azure Service Bus: The Premium tier scales in messaging units (1, 2, 4, 8, or 16 per namespace). The maximum message size is 256 KB on Standard; Premium defaults to 1 MB and can be raised to 100 MB with large message support.
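Azure API Management: The Developer tier is limited to 1 unit and has no SLA; the Premium tier supports up to 10 units per region. Built-in cache size depends on the tier (for example, 10 MB on Developer and 5 GB per unit on Premium).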
Configuration and Verification Commands
To create an AKS cluster:
az aks create --resource-group myGroup --name myCluster --node-count 3 --enable-addons monitoring --generate-ssh-keys
To deploy a microservice:
kubectl apply -f deployment.yaml
To verify cluster status:
az aks show --resource-group myGroup --name myCluster --query provisioningState
To check pods:
kubectl get pods -o wide
Interaction with Related Technologies
Microservices often integrate with:
- Azure DevOps for CI/CD: Build pipelines create container images, push them to ACR, and deploy to AKS.
- Azure Monitor for observability: Collects metrics, logs, and traces from containers.
- Azure Policy for governance: Enforces tagging and ensures only approved images are used.
- Azure Active Directory (Azure AD, now Microsoft Entra ID) for authentication: AKS can integrate with Azure AD for Kubernetes RBAC.
- Azure Key Vault for secrets management: Pods can mount secrets from Key Vault using the Secrets Store CSI Driver.
Scaling and High Availability
Horizontal Pod Autoscaler (HPA) in Kubernetes automatically scales pods based on CPU/memory metrics or custom metrics. For example:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
For high availability, deploy AKS clusters across availability zones (minimum 3 zones). Use Azure Traffic Manager or Azure Front Door to route traffic to multiple clusters in different regions.
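To make the HPA settings above concrete, the following sketch reproduces the core scaling rule Kubernetes documents for the autoscaler: desired replicas equal the current replica count scaled by the ratio of observed to target utilization, rounded up and clamped to the min/max bounds. The function name and defaults are assumptions chosen to mirror the manifest (70% target, 3 to 10 replicas); the real controller also applies a tolerance band and stabilization windows, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas, current_utilization,
                     target_utilization=70, min_replicas=3, max_replicas=10):
    """Simplified HPA rule: ceil(current * observed / target), clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Three pods averaging 140% of their CPU request would be doubled.
print(desired_replicas(3, 140))   # 6
# Load far above target is capped at maxReplicas.
print(desired_replicas(8, 140))   # 10
# Low load never drops below minReplicas.
print(desired_replicas(3, 35))    # 3
```

Because utilization is measured against each pod's CPU request, the rule only works when requests are set in the pod spec.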
Security Considerations
Network Policies: Use Calico or Azure Network Policies to restrict pod-to-pod communication.
Pod Identity: Use Microsoft Entra Workload ID (the successor to the deprecated Azure AD Pod Identity) to give pods an identity that can access Azure resources such as Key Vault.
Secrets: Store secrets in Azure Key Vault, not in container images.
Image Scanning: Enable vulnerability scanning in ACR.
Monitoring and Debugging
Container Insights: Collects stdout/stderr, performance metrics, and Kubernetes events.
Application Insights: Enables distributed tracing for microservices.
Diagnostic Settings: Send AKS control plane logs (kube-apiserver, kube-controller-manager) to Log Analytics.
Common Pitfalls
Chatty Communication: Too many inter-service calls increase latency. Use asynchronous messaging or batch requests.
Data Consistency: Distributed transactions are hard. Use eventual consistency and saga patterns.
Versioning: Breaking changes in APIs cause failures. Use API versioning (e.g., /v1/, /v2/) and backward compatibility.
Resource Limits: Not setting CPU/memory limits can lead to resource starvation. Always specify requests and limits in pod specs.
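The saga pattern mentioned under Data Consistency can be illustrated with a small orchestrator. This is a sketch under stated assumptions: `run_saga` and the step names are hypothetical, steps run in-process rather than over Azure Service Bus, and compensations are assumed to succeed.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of all completed steps
    in reverse order and report failure.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log = []


def charge_card():
    raise RuntimeError("payment declined")   # simulated failure in the last step


succeeded = run_saga([
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("create order"), lambda: log.append("cancel order")),
    (charge_card, lambda: log.append("refund card")),
])
print(succeeded)   # False
print(log)         # ['reserve stock', 'create order', 'cancel order', 'release stock']
```

Each service keeps its own local transaction; consistency across services is restored by the compensating actions rather than by a distributed lock.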
Exam Relevance
The AZ-305 exam tests your ability to recommend appropriate Azure services for microservices. Key objectives include:
- Objective 4.1: Recommend a compute solution (AKS, Container Instances, Service Fabric).
- Objective 4.2: Recommend a container orchestration solution (AKS vs. Service Fabric).
- Objective 4.4: Design for application architecture (microservices, API management, messaging).
- Objective 5.1: Design for high availability (multi-region AKS, load balancing).
You must understand trade-offs: AKS is best for stateless microservices, Service Fabric supports stateful services, Container Instances are for simple, short-lived tasks. The exam also tests integration with API Management, Service Bus, and Event Grid.
Identify Business Capabilities
Begin by decomposing the application domain into bounded contexts, each representing a distinct business capability (e.g., order management, inventory, payment). This is based on Domain-Driven Design (DDD). Each bounded context becomes a candidate microservice. The goal is to ensure services are loosely coupled and highly cohesive. For example, an e-commerce platform might have separate services for product catalog, shopping cart, and order processing. This step is critical because poor decomposition leads to chatty communication or services that are too large.
Design API Contracts
Define the interfaces (APIs) that each microservice exposes. Use RESTful endpoints or gRPC for synchronous communication, and events for asynchronous. Specify request/response schemas (e.g., JSON Schema, Protobuf). Version the APIs from day one (e.g., /v1/orders). Document contracts using OpenAPI/Swagger. Azure API Management can enforce these contracts and provide developer portals. Test contracts early with consumer-driven contract tests to avoid breaking changes.
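As a rough illustration of versioning from day one, the sketch below routes `/v1/orders/{id}` and `/v2/orders/{id}` to separate handlers so that a breaking change in v2 leaves v1 clients untouched. The handlers, the response shapes, and the `dispatch` helper are hypothetical; in practice this routing would usually live in the web framework or in Azure API Management rather than in hand-written code.

```python
def get_order_v1(order_id):
    # Original contract: "total" is a plain number.
    return {"id": order_id, "total": 42.0}


def get_order_v2(order_id):
    # Breaking change: "total" becomes an object, so it ships under a new version.
    return {"id": order_id, "total": {"amount": 42.0, "currency": "USD"}}


ROUTES = {
    ("v1", "orders"): get_order_v1,
    ("v2", "orders"): get_order_v2,
}


def dispatch(path):
    """Resolve '/<version>/<resource>/<id>' to a (status, body) pair."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        return 404, None
    version, resource, item_id = parts
    handler = ROUTES.get((version, resource))
    if handler is None:
        return 404, None
    return 200, handler(item_id)


print(dispatch("/v1/orders/7"))   # (200, {'id': '7', 'total': 42.0})
print(dispatch("/v3/orders/7"))   # (404, None)
```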
Select Azure Compute and Orchestration
Choose the right compute platform: AKS for complex, scalable microservices; Azure Container Instances for simple, burstable tasks; Azure Functions for event-driven functions; Azure Service Fabric for stateful microservices requiring low latency. For most production scenarios, AKS is recommended. Configure the cluster with appropriate node sizes, availability zones, and network policies. For example, a production AKS cluster should have at least 3 nodes across 3 availability zones.
Implement Communication Patterns
Set up synchronous communication via API Management (for external-facing APIs) or internal load balancers (for internal services). For asynchronous communication, use Azure Service Bus (for reliable messaging) or Event Grid (for event-driven architectures). Implement retry policies with exponential backoff and circuit breakers to handle transient failures. For example, use Polly library in .NET or the Istio service mesh for circuit breaking. Ensure idempotency in message handling to support retries.
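Retries and idempotency work as a pair: retries mean a message may be delivered more than once, and the handler must tolerate that. The sketch below shows one possible shape for both. All names and defaults are assumptions for illustration, the "seen" set is in memory only (a real service would need a durable store), and this is not the Polly or Service Bus SDK API.

```python
import random
import time


def retry_with_backoff(func, max_attempts=4, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep, jitter=random.random):
    """Call func, retrying on any exception with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise                                   # out of attempts: surface the error
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * jitter())                     # jitter spreads out retry storms


class IdempotentHandler:
    """Process each message ID at most once, so redelivery is harmless."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()

    def handle(self, message_id, payload):
        if message_id in self.seen:
            return False                                # duplicate: skip side effects
        self.handler(payload)
        self.seen.add(message_id)
        return True
```

The `sleep` and `jitter` parameters are injectable only so the behaviour can be exercised without real waiting; production code would normally leave the defaults.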
Deploy and Orchestrate with CI/CD
Set up Azure DevOps pipelines to build container images, push them to Azure Container Registry, and deploy to AKS using Helm charts or Kubernetes manifests. Use blue-green or canary deployment strategies to minimize downtime. Implement health probes (liveness and readiness) in Kubernetes to ensure traffic is only sent to healthy pods. Monitor the deployment with Azure Monitor and set up alerts for failure rates. For example, a canary deployment might route 10% of traffic to the new version for validation, whereas a blue-green deployment switches all traffic at once after the new environment passes validation.
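A canary rollout sends a fixed share of traffic to the new version while the rest stays on the stable one. In AKS this split is normally done by an ingress controller or service mesh; the sketch below only illustrates the idea, using a hash of a request or user ID so that the same caller consistently lands on the same version. The function name and the 10% default are assumptions.

```python
import hashlib


def pick_version(request_id, canary_percent=10):
    """Deterministically map an ID to 'canary' or 'stable' by hashing it into 100 buckets."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


# The same ID always gets the same answer, which keeps a user's session on one version.
print(pick_version("user-123") == pick_version("user-123"))   # True
```

Raising `canary_percent` step by step (for instance 10, 25, 50, 100) while watching failure-rate alerts is one common way to promote a release.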
Enterprise Scenario 1: E-Commerce Platform on AKS
A large online retailer migrated from a monolithic .NET application to microservices on AKS. They decomposed the monolith into 50+ services: product catalog, shopping cart, order processing, payment, inventory, shipping, and user management. Each service runs in its own Kubernetes namespace with resource limits (e.g., 500m CPU, 512MB memory). They use Azure API Management as the gateway for external clients (web and mobile apps) and internal services communicate via Azure Service Bus for order events. The cluster has 20 nodes (Standard_D8s_v3) across three availability zones. They implemented HPA to scale based on CPU and custom metrics (e.g., queue depth). The biggest challenge was managing data consistency across services; they adopted the saga pattern using Azure Service Bus and compensating transactions. Misconfiguration of network policies initially caused connectivity issues—they had to explicitly allow traffic between services using Kubernetes Network Policies.
Enterprise Scenario 2: Financial Services with Service Fabric
A bank built a real-time fraud detection system using Azure Service Fabric for stateful microservices. The system processes credit card transactions in near real-time, maintaining user session state and fraud models. Service Fabric was chosen over AKS because it provides built-in support for stateful services with reliable collections and low-latency communication. The cluster runs on 10 nodes (Standard_D4s_v3) across two regions for disaster recovery. They use Azure Event Hubs to ingest transaction events, and Service Fabric actors to update user profiles. The key performance metric is latency under 10ms per transaction. A common mistake was not configuring partition schemes correctly, leading to hot partitions. They resolved this by using a consistent hash based on user ID.
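The hot-partition fix in this scenario relies on spreading user IDs evenly across partitions. The sketch below shows a generic consistent-hash ring with virtual nodes; it is an illustration of the technique only, not Service Fabric's partitioning API, and the class name, partition labels, and `vnodes` value are assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, partitions, vnodes=100):
        # Each partition is placed at many points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{p}#{i}"), p) for p in partitions for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    def partition_for(self, user_id):
        # Walk clockwise to the first ring point at or after the key's hash.
        idx = bisect.bisect(self._points, _hash(user_id)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["p0", "p1", "p2", "p3"])
print(ring.partition_for("user-1001") == ring.partition_for("user-1001"))   # True
```

Compared with a plain `hash % N`, a ring moves only a fraction of keys when a partition is added or removed, which matters when each partition holds replicated state.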
Enterprise Scenario 3: IoT Backend on Container Instances
A manufacturing company built a serverless backend for IoT device data ingestion. Each device sends telemetry to Azure IoT Hub, which triggers an Azure Function that processes and stores data. For compute-intensive tasks like image recognition, they use Azure Container Instances (ACI) with GPU support. ACI was chosen because it's simple, starts in seconds, and scales to zero when idle. They use Azure Logic Apps to orchestrate workflows. The challenge was managing container lifecycle—they implemented a custom scheduler using Azure Functions to start ACI containers based on queue length. Misconfiguration of container groups (e.g., not setting resource limits) caused cost overruns. They now enforce tags and budgets.
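The custom scheduler in this scenario boils down to one decision: given the queue length, how many more containers should be started? A minimal sketch of that calculation follows. The function name, the 100 messages per container, and the cap of 10 are hypothetical values; the cap is there to reflect the cost-overrun lesson from the scenario.

```python
import math


def containers_to_start(queue_length, running,
                        messages_per_container=100, max_containers=10):
    """Return how many additional containers to start for the current backlog."""
    wanted = min(max_containers, math.ceil(queue_length / messages_per_container))
    return max(0, wanted - running)


print(containers_to_start(250, running=1))    # 2  (3 wanted, 1 already running)
print(containers_to_start(5000, running=4))   # 6  (capped at 10 in total)
print(containers_to_start(0, running=0))      # 0  (nothing queued, nothing started)
```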
What AZ-305 Tests on Microservices
The exam focuses on three main areas: choosing the right compute service (AKS vs. Service Fabric vs. Container Instances vs. Functions), designing for communication (API Management, messaging services), and high availability and scalability (multi-region deployment, scaling patterns). Specific objectives: 4.1 (recommend compute), 4.2 (container orchestration), 4.4 (application architecture), 5.1 (high availability).
Common Wrong Answers and Traps
Choosing Service Fabric for stateless microservices – While Service Fabric supports stateless, AKS is the recommended choice for stateless containers due to broader ecosystem and community support. Candidates often pick Service Fabric because it's 'Azure-native' but AKS is the default for Kubernetes.
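Selecting Azure Functions for long-running workflows – On the Consumption plan, a function execution times out after 5 minutes by default and 10 minutes at most. For long-running processes, use Durable Functions, a Premium or Dedicated plan, AKS, or Service Fabric.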
Using blob storage for inter-service communication – Blob storage is not designed for messaging; use Service Bus or Event Grid. Candidates choose blob storage because it's simple, but it lacks features like queues, topics, and dead-lettering.
Neglecting API versioning – The exam often includes a scenario where a breaking API change causes failures. The correct answer is to version APIs (e.g., /v1/orders) and maintain backward compatibility.
Specific Numbers and Terms to Memorize
AKS default node size: Standard_DS2_v2
AKS max nodes: 1,000 per node pool, 5,000 per cluster
Service Bus message size limit: 256 KB (Standard), 1 MB by default on Premium (configurable up to 100 MB)
Azure Functions timeout: 5 minutes (default) and 10 minutes (max) on the Consumption plan; 30 minutes by default with no enforced maximum on the Premium plan
Container Instances CPU limit: 4 vCPUs per container group
API Management unit capacity: Developer: 1 unit, Premium: up to 10 units
Edge Cases and Exceptions
Stateful microservices: If the scenario requires stateful services (e.g., session state, reliable collections), choose Service Fabric over AKS. AKS supports stateful sets but Service Fabric is more mature.
Serverless containers: ACI is for simple, burstable scenarios. If the need is for orchestration (scaling, rolling updates), use AKS.
Event-driven architectures: Use Event Grid for high-throughput event routing, Service Bus for reliable messaging with queues/topics.
How to Eliminate Wrong Answers
Rule of thumb: If the scenario mentions 'stateful' or 'reliable collections', eliminate AKS and Functions. If it mentions 'Kubernetes' or 'container orchestration', eliminate ACI and Functions. If it mentions 'serverless' and 'event-driven', eliminate AKS and Service Fabric.
Check for cost: Functions and ACI are pay-per-execution; AKS has fixed node costs. If the scenario emphasizes cost savings for low traffic, choose serverless.
Check for latency: Service Fabric has lower latency for inter-service communication than AKS (no sidecar proxies). For sub-millisecond latency, choose Service Fabric.
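The elimination rules above can be written down as a small decision function. This is only a study aid that encodes this chapter's rules of thumb in the order given; the function name and flags are made up for the example, and real designs weigh more factors (cost, team skills, existing platforms).

```python
def recommend_compute(stateful=False, needs_orchestration=False,
                      event_driven=False, short_lived=False):
    """Map scenario keywords to the service this chapter's rules of thumb point to."""
    if stateful:
        # 'stateful' or 'reliable collections' eliminates AKS and Functions.
        return "Azure Service Fabric"
    if needs_orchestration:
        # 'Kubernetes' or 'container orchestration' eliminates ACI and Functions.
        return "Azure Kubernetes Service (AKS)"
    if event_driven:
        # 'serverless' and 'event-driven' eliminates AKS and Service Fabric.
        return "Azure Functions"
    if short_lived:
        return "Azure Container Instances (ACI)"
    # Default for production stateless microservices.
    return "Azure Kubernetes Service (AKS)"


print(recommend_compute(stateful=True))       # Azure Service Fabric
print(recommend_compute(event_driven=True))   # Azure Functions
```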
Microservices decompose an application into independent services that communicate via APIs or messaging.
AKS is the recommended compute service for most stateless microservices on Azure.
Azure Service Fabric is preferred for stateful microservices requiring low latency.
Azure Container Instances is for simple, short-lived containers without orchestration.
Azure Functions is for event-driven, serverless microservices; on the Consumption plan, executions time out after at most 10 minutes.
Use Azure API Management as an API gateway for routing, security, and throttling.
Azure Service Bus provides reliable messaging with queues and topics; Event Grid provides high-throughput event routing.
Always implement circuit breakers, retries, and idempotency in inter-service communication.
Design for high availability by deploying across availability zones and regions.
Use Azure DevOps for CI/CD with containerized microservices.
These come up on the exam all the time. Here's how to tell them apart.
Azure Kubernetes Service (AKS)
Open-source Kubernetes orchestration; broad ecosystem and community support.
Best for stateless microservices and containerized applications.
Supports stateful sets but less mature than Service Fabric for stateful services.
Stores cluster state in etcd, which AKS operates as part of the managed control plane; workloads can still be complex to manage at scale.
Scaling via HPA and cluster autoscaler; integrates with Azure Monitor.
Azure Service Fabric
Azure-native platform with built-in support for stateful services (Reliable Collections).
Best for stateful microservices requiring low latency and high throughput.
Provides service discovery, load balancing, and rolling upgrades out of the box.
Uses distributed system primitives; tighter integration with Azure services.
Scales via partition schemes and instance counts; uses Service Fabric Explorer for monitoring.
Mistake
Microservices always require Kubernetes.
Correct
Kubernetes is not mandatory. Azure Service Fabric, Container Instances, and Functions can also run microservices. The choice depends on statefulness, latency, and orchestration needs.
Mistake
Azure Functions are stateless and cannot maintain state.
Correct
Azure Functions can be stateful using Durable Functions, which provide orchestration and state management. However, for high-frequency stateful operations, Service Fabric is better.
Mistake
Container Instances can be scaled automatically.
Correct
ACI does not have built-in autoscaling. You must implement custom scaling using Azure Functions or other schedulers. For autoscaling, use AKS.
Mistake
API Management is only for external APIs.
Correct
API Management can also be used for internal APIs (in virtual network) to provide security, throttling, and routing within a microservices architecture.
Mistake
Service Bus and Event Grid are interchangeable.
Correct
Service Bus is for reliable, ordered messaging with queues/topics. Event Grid is for high-throughput event routing with push delivery. Choose based on delivery guarantees and ordering requirements.
Choose AKS when you need Kubernetes orchestration, prefer open-source ecosystem, or are running stateless containers. Service Fabric is better for stateful services requiring Reliable Collections, low latency, or if you want a fully managed platform with built-in state management. On the exam, if the scenario mentions 'stateful' or 'reliable collections', select Service Fabric. If it mentions 'Kubernetes' or 'container orchestration', select AKS.
Microservices communicate synchronously via HTTP/REST or gRPC through an API gateway (Azure API Management) or directly using internal load balancers. Asynchronous communication uses Azure Service Bus (queues/topics) for reliable messaging or Azure Event Grid for event-driven patterns. For exam scenarios, choose Service Bus when ordering and delivery guarantees are needed; choose Event Grid for high-throughput event routing.
Yes, AKS supports stateful sets and persistent volumes (Azure Disks or Azure Files). However, Service Fabric provides more mature support for stateful services with Reliable Collections, which are distributed, replicated, and low-latency. If the scenario requires high-performance stateful services, Service Fabric is the recommended choice.
ACI is a serverless container platform that starts containers in seconds and bills only while they run. It is ideal for simple, burstable tasks without orchestration. AKS is a managed Kubernetes service that provides orchestration, scaling, rolling updates, and service discovery. Use ACI for quick jobs and AKS for production microservices.
Use Azure AD for authentication, Azure Key Vault for secrets, network policies (Calico or Azure) to restrict pod communication, and Azure Policy to enforce governance. Enable Microsoft Defender for Cloud (formerly Azure Security Center) for threat detection. For API Management, use OAuth 2.0 and rate limiting. For inter-service communication, use mTLS with a service mesh like Istio or Open Service Mesh.
An API gateway is a single entry point for external clients. It handles authentication, rate limiting, request routing, response aggregation, and protocol translation. Azure API Management provides these capabilities and also offers developer portal, analytics, and policy enforcement. It reduces complexity for clients and centralizes cross-cutting concerns.
Implement circuit breakers to stop calls to failing services, retry policies with exponential backoff, and timeouts. Use dead-letter queues in Azure Service Bus for failed messages. Design for eventual consistency and use saga patterns for distributed transactions. On Azure, use Polly library or Istio for circuit breaking, and Azure Service Bus for reliable messaging.