AZ-305 | Chapter 41 of 103 | Objective 4.4

Saga Pattern for Distributed Transactions

This chapter covers the Saga pattern for distributed transactions in Azure, a critical design pattern for maintaining data consistency across microservices without using distributed locking. For the AZ-305 exam, this topic falls under objective 4.4: Design a solution for data consistency, and appears in approximately 5-8% of exam questions. Understanding Sagas is essential for architects designing reliable, scalable applications on Azure, especially when using Azure Cosmos DB, Azure Service Bus, or Azure Functions.

25 min read
Intermediate
Updated May 31, 2026

The Hotel Reservation Saga

Imagine you are planning a wedding and need to book a hotel block for 50 guests. You plan to reserve 20 rooms at Hotel A, 20 rooms at Hotel B, and 10 rooms at Hotel C. You start with Hotel A: they confirm and give you a temporary hold. Then you call Hotel B: they confirm. Then you call Hotel C: they say they only have 8 rooms available. Now you have a problem: you hold partial reservations but cannot complete the block, so you must undo them. You work backwards: you call Hotel B to cancel and they release the rooms, then you call Hotel A to cancel and they release the rooms. Nothing is left half-booked. You then retry with an alternative plan: Hotel D has 10 rooms, so you book Hotel A (20), Hotel B (20), and Hotel D (10), and this time every step succeeds.

This is the Saga pattern: a long-lived transaction broken into a series of local transactions, each with a compensating action (cancellation) that undoes it if a later step fails. In distributed systems, this avoids locking resources across multiple services for extended periods, improving availability and scalability. The coordinator (you) tracks the state and triggers compensations in reverse order if any step fails.

How It Actually Works

What is the Saga Pattern?

The Saga pattern is a design pattern for managing distributed transactions across multiple services in a microservices architecture. Unlike traditional ACID transactions that use two-phase commit (2PC) to ensure atomicity across databases, Sagas break a distributed transaction into a sequence of local transactions, each with a compensating action that can undo its effects. This avoids holding locks on resources for long periods, improving system availability and scalability.

Why Sagas Exist

In a monolithic application, a single database can enforce ACID properties using transactions. However, in a microservices architecture, each service typically owns its own database, and distributed transactions across databases are difficult and costly. Two-phase commit (2PC) requires a coordinator and locks resources, which can lead to blocking and reduced availability. Sagas provide a way to maintain data consistency without distributed locking, using eventual consistency and compensating actions.

How Sagas Work Internally

A Saga is a sequence of steps, each step being a local transaction that updates data within a single service and publishes a message or event. If a step fails, the Saga executes compensating transactions in reverse order to undo the effects of previous steps. There are two primary coordination approaches:

Choreography-based Saga: Each service publishes events and listens to events from other services. The decision to proceed or compensate is distributed.

Orchestration-based Saga: A central orchestrator (e.g., Azure Logic Apps, Durable Functions) tells each service what to do and manages compensation.
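The orchestrated flow described above can be sketched in plain C# with no Azure dependency. This is a minimal illustration, not a production implementation: SagaStep and SagaRunner are hypothetical names introduced here, and the steps are in-memory delegates standing in for calls to real services.

```csharp
using System;
using System.Collections.Generic;

// One saga step: a local transaction plus the action that semantically undoes it.
// (SagaStep and SagaRunner are illustrative names, not part of any Azure SDK.)
public sealed record SagaStep(string Name, Action Execute, Action Compensate);

public static class SagaRunner
{
    // Runs the steps in order. On the first failure, runs the compensations of the
    // steps that already completed, most recent first. Returns true on success.
    public static bool Run(IReadOnlyList<SagaStep> steps, List<string> log)
    {
        var completed = new Stack<SagaStep>();
        foreach (var step in steps)
        {
            try
            {
                step.Execute();
                log.Add($"done:{step.Name}");
                completed.Push(step);
            }
            catch (Exception ex)
            {
                log.Add($"failed:{step.Name}:{ex.Message}");
                // The failed step itself is not compensated; only earlier steps are.
                while (completed.Count > 0)
                {
                    var undo = completed.Pop();
                    undo.Compensate();
                    log.Add($"compensated:{undo.Name}");
                }
                return false;
            }
        }
        return true;
    }
}
```

A stack makes the reverse ordering explicit: the last step that committed is the first one undone. A real orchestrator would also persist each log entry durably before moving on, so that it can resume after a crash.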

Key Components

Local Transaction: A database transaction within a single service that commits immediately.

Compensating Transaction: A transaction that semantically undoes the effects of a previous local transaction. It must be idempotent to handle retries.

Saga Log: A durable log that records the state of the Saga (e.g., pending, completed, compensating). In Azure, this can be stored in Cosmos DB or Azure Table Storage.

Coordinator/Orchestrator: In orchestration, a central component that tracks progress and triggers actions.

Typical Values and Timers

The Saga pattern itself defines no defaults; the values below are common starting points that you tune per workload.

Timeout: Typically 30 seconds to 5 minutes per step. If a step does not respond within the timeout, the Saga assumes failure and starts compensation.

Retry Policy: Often exponential backoff with initial delay 1 second, max delay 30 seconds, and up to 3 retries.

Idempotency Keys: Each step and compensation request should include a unique idempotency key so that a retried request is applied only once.
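To make the retry numbers concrete, the following sketch computes the delay schedule for an exponential backoff policy. It assumes a doubling factor, which the text does not state, and Backoff.Delays is a hypothetical helper written for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class Backoff
{
    // Delay before retry n (0-based): initial * 2^n, capped at max.
    // The doubling factor is an assumption; real policies may use another coefficient.
    public static List<TimeSpan> Delays(TimeSpan initial, TimeSpan max, int retries)
    {
        var delays = new List<TimeSpan>();
        for (int attempt = 0; attempt < retries; attempt++)
        {
            double seconds = initial.TotalSeconds * Math.Pow(2, attempt);
            delays.Add(TimeSpan.FromSeconds(Math.Min(seconds, max.TotalSeconds)));
        }
        return delays;
    }
}
```

With an initial delay of 1 second, a cap of 30 seconds, and 3 retries, the delays are 1, 2, and 4 seconds, so the cap is never reached. The cap only starts to matter from the sixth retry, where 32 seconds is clamped to 30.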

Configuration in Azure

To implement a Saga on Azure:

Use Azure Durable Functions for orchestration. Define an orchestrator function that calls an activity function for each step (CallActivityAsync, or CallActivityWithRetryAsync when a retry policy is needed) and runs the compensating activities in a catch block if a step fails.

Use Azure Service Bus or Event Grid for choreography. Each service publishes events when a local transaction completes, and other services subscribe.

Store Saga state in Azure Cosmos DB with a TTL of 7 days for automatic cleanup.
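As an illustration of the TTL setting, the sketch below shows what a saga state item might look like when serialized for Cosmos DB. The document shape (SagaStateDocument and its sagaId and state fields) is an assumption made for this example. The ttl property is the per-item time-to-live in seconds, and it only takes effect when time-to-live is enabled on the container.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public static class SagaTtl
{
    // 7 days expressed in seconds: 7 * 24 * 60 * 60 = 604800.
    public const int SevenDaysInSeconds = 7 * 24 * 60 * 60;

    public static string ToJson(SagaStateDocument doc) => JsonSerializer.Serialize(doc);
}

// Hypothetical shape of one saga log item; only "id" and "ttl" have special
// meaning to Cosmos DB, the other fields are this example's own.
public sealed class SagaStateDocument
{
    [JsonPropertyName("id")] public string Id { get; set; } = "";
    [JsonPropertyName("sagaId")] public string SagaId { get; set; } = "";
    [JsonPropertyName("state")] public string State { get; set; } = "Pending";
    [JsonPropertyName("ttl")] public int Ttl { get; set; } = SagaTtl.SevenDaysInSeconds;
}
```

Because the TTL clock restarts whenever an item is written, a saga that is still being updated is not deleted mid-flight; the 7 days count from its last update.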

Example orchestrator function (C#):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

// OrderSagaInput (with OrderId and PaymentInfo) is assumed to be defined elsewhere.
public static class OrderSagaOrchestrator
{
    [FunctionName("OrderSaga")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var input = context.GetInput<OrderSagaInput>();
        // Each compensation is registered only after its step has succeeded.
        var compensationActions = new List<Func<Task>>();

        try
        {
            // Step 1: Reserve Inventory
            await context.CallActivityAsync("ReserveInventory", input.OrderId);
            compensationActions.Add(() => context.CallActivityAsync("ReleaseInventory", input.OrderId));

            // Step 2: Process Payment
            await context.CallActivityAsync("ProcessPayment", input.PaymentInfo);
            compensationActions.Add(() => context.CallActivityAsync("RefundPayment", input.PaymentInfo));

            // Step 3: Ship Order
            await context.CallActivityAsync("ShipOrder", input.OrderId);
            compensationActions.Add(() => context.CallActivityAsync("CancelShipment", input.OrderId));

            // All steps succeeded
            await context.CallActivityAsync("UpdateOrderStatus", new { OrderId = input.OrderId, Status = "Completed" });
        }
        catch (Exception)
        {
            // A step failed: undo the completed steps, most recent first.
            for (int i = compensationActions.Count - 1; i >= 0; i--)
            {
                await compensationActions[i]();
            }
            await context.CallActivityAsync("UpdateOrderStatus", new { OrderId = input.OrderId, Status = "Failed" });
        }
    }
}

Interaction with Related Technologies

Azure Cosmos DB: Supports multi-item transactions (transactional batch, stored procedures) only within a single logical partition of one container, not across partitions. Use Sagas for cross-partition or cross-service consistency.

Azure Service Bus: Provides reliable message delivery with sessions and duplicate detection, useful for choreography.

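Azure SQL Database: Supports elastic database transactions that span multiple Azure SQL databases, but these only cover SQL databases, so they do not help when a workflow also touches other data stores or services. Sagas have no such restriction.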

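Dapr: A runtime whose Workflow building block can orchestrate Saga steps and their compensations. Dapr has no dedicated Saga building block, so the compensation logic is still yours to write.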

Verification Commands

- List Durable Functions orchestration instances (Azure Functions Core Tools):

func durable get-instances --task-hub-name MyTaskHub

- Check Saga log in Cosmos DB:

SELECT * FROM c WHERE c.sagaId = 'order-123' ORDER BY c._ts DESC

Performance Considerations

Sagas increase latency due to multiple round trips and compensating actions.

Idempotency is critical to prevent duplicate compensations.

Use async communication (Service Bus) to decouple services.

Monitor Saga timeouts and retries to avoid partial failures.

Common Pitfalls

Missing Compensation: Every step must have a compensating action. Otherwise, the system may be left in an inconsistent state.

Non-idempotent Compensations: If a compensation is applied twice, it may cause data corruption.

Long-running Sagas: Sagas that take hours increase risk of conflicts. Use timeout and escalation mechanisms.

Exam Relevance

For AZ-305, understand when to choose Sagas over 2PC or a single local ACID transaction. The exam tests scenarios where Sagas are appropriate (e.g., long-running business processes, multiple services with independent databases) and where they are not (e.g., single database, low latency requirements).

Walk-Through

1. Initiate Saga

The saga is triggered by an external event, such as an order placement. The orchestrator (or first service in choreography) creates a saga log entry with a unique saga ID and sets the state to 'Pending'. It records the start time and the list of steps to execute. In Azure Durable Functions, this corresponds to starting a new orchestration instance. A timer for the overall saga is started (e.g., 2 minutes for the entire saga); it must be longer than the sum of the per-step timeouts. The saga ID is propagated to all subsequent steps via headers or message payload.

2. Execute the First Step

The orchestrator calls the first activity function (e.g., ReserveInventory). The activity performs a local transaction on its database (e.g., decrement inventory count) and commits immediately. It returns a success or failure response. If successful, the orchestrator adds a compensation action (e.g., ReleaseInventory) to a list. The saga log is updated with step 1 as 'Completed'. A timeout per step is enforced; if there is no response within that timeout (e.g., 30 seconds), the step is considered failed.

3. Execute Subsequent Steps

Steps 2, 3, etc., are executed sequentially. Each step performs its local transaction and commits. The orchestrator appends the corresponding compensation action to the list. The saga log tracks the current step index. If any step fails (returns error or times out), the orchestrator catches the exception and proceeds to compensation. The saga log state is set to 'Compensating'.

4. Compensate on Failure

If a step fails, the orchestrator iterates through the compensation list in reverse order. For each compensation action, it calls the corresponding activity (e.g., ReleaseInventory, RefundPayment). Each compensation must be idempotent, so that a retried call has no additional effect. It must also tolerate a missing original: if the original transaction did not actually occur (e.g., payment was not processed), the compensation should be a no-op. The saga log is updated with each compensation step as 'Compensated'. After all compensations, the saga state is set to 'Failed'.

5. Complete Saga

If all steps succeed, the orchestrator marks the saga as 'Completed' in the log. It may also send a notification (e.g., email, event) to indicate success. The saga log entry can be deleted after a retention period (e.g., 7 days in Cosmos DB). If compensation was executed, the saga ends in 'Failed' state, and an alert may be triggered for manual intervention.
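The saga log states used in this walk-through ('Pending', 'Compensating', 'Completed', 'Failed') form a small state machine. The sketch below enforces the legal transitions in memory; SagaLogEntry and its members are hypothetical names for this illustration, and a real log would persist each transition to a durable store.

```csharp
using System;
using System.Collections.Generic;

public enum SagaState { Pending, Compensating, Completed, Failed }

// In-memory stand-in for one saga log entry (illustrative only).
public sealed class SagaLogEntry
{
    // Pending can end in Completed or enter Compensating;
    // Compensating can only end in Failed; Completed and Failed are terminal.
    private static readonly Dictionary<SagaState, SagaState[]> Allowed = new()
    {
        [SagaState.Pending] = new[] { SagaState.Completed, SagaState.Compensating },
        [SagaState.Compensating] = new[] { SagaState.Failed },
        [SagaState.Completed] = Array.Empty<SagaState>(),
        [SagaState.Failed] = Array.Empty<SagaState>(),
    };

    public string SagaId { get; }
    public SagaState State { get; private set; } = SagaState.Pending;
    public List<SagaState> History { get; } = new() { SagaState.Pending };

    public SagaLogEntry(string sagaId) => SagaId = sagaId;

    public void MoveTo(SagaState next)
    {
        if (Array.IndexOf(Allowed[State], next) < 0)
            throw new InvalidOperationException($"Illegal transition {State} -> {next}");
        State = next;
        History.Add(next);
    }
}
```

Rejecting illegal transitions catches bugs such as marking a saga 'Completed' after compensation has already started, which would otherwise hide a failed order behind a success status.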

What This Looks Like on the Job

Enterprise Scenario 1: E-Commerce Order Processing

A large online retailer uses microservices for inventory, payment, shipping, and notification. When a customer places an order, the system must reserve inventory, charge the credit card, ship the item, and send a confirmation. If the credit card is declined after inventory is reserved, the inventory must be released. The Saga pattern is implemented using Azure Durable Functions as the orchestrator. Each step calls a separate Azure Function that updates its own Azure SQL Database or Cosmos DB container. The saga log is stored in Cosmos DB with a TTL of 30 days. Performance considerations: the entire saga must complete within 2 minutes to avoid customer timeout. The orchestrator uses retry policies with exponential backoff (initial delay 1 second, max 30 seconds, 3 retries). Common issues: if the inventory reservation times out but the database actually committed, the orchestrator never registers the compensation, so it does not run and the inventory stays reserved. To mitigate, the orchestrator also runs the compensation for a timed-out step, and the compensation action queries the database to verify the state before releasing.

Enterprise Scenario 2: Travel Booking System

A travel agency books flights, hotels, and car rentals across multiple providers. Each booking is a separate service with its own database. The Saga pattern coordinates the booking: first book flight, then hotel, then car. If the car rental fails, the hotel and flight bookings must be cancelled. The system uses choreography with Azure Service Bus topics. Each service publishes an event (e.g., FlightBooked). The next service listens and proceeds. If a service fails, it publishes a CompensationRequired event, and all previous services listen and execute their compensating actions. Challenge: handling concurrent bookings for the same hotel room. The hotel service uses optimistic concurrency with version numbers; if a conflict occurs, it fails the step and triggers compensation. Scale: the system handles 10,000 bookings per hour. Saga logs are stored in Azure Table Storage for cost efficiency.
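The event flow in this scenario can be sketched with an in-memory event bus standing in for Service Bus topics and subscriptions. Everything here is illustrative: EventBus and TravelChoreography are hypothetical names, event names other than FlightBooked and CompensationRequired are assumptions, and delivery is synchronous, whereas real subscribers receive messages asynchronously and in no guaranteed order.

```csharp
using System;
using System.Collections.Generic;

// Minimal in-memory stand-in for a topic with subscriptions (not a Service Bus client).
public sealed class EventBus
{
    private readonly Dictionary<string, List<Action<string>>> _handlers = new();
    public List<string> Published { get; } = new();

    public void Subscribe(string eventName, Action<string> handler)
    {
        if (!_handlers.TryGetValue(eventName, out var list))
            _handlers[eventName] = list = new List<Action<string>>();
        list.Add(handler);
    }

    public void Publish(string eventName, string sagaId)
    {
        Published.Add(eventName);
        if (_handlers.TryGetValue(eventName, out var list))
            foreach (var handler in list.ToArray()) handler(sagaId);
    }
}

public static class TravelChoreography
{
    public static EventBus Wire(bool carAvailable)
    {
        var bus = new EventBus();
        // Each service reacts to the previous service's event; nobody owns the whole flow.
        bus.Subscribe("BookingRequested", id => bus.Publish("FlightBooked", id));
        bus.Subscribe("FlightBooked", id => bus.Publish("HotelBooked", id));
        bus.Subscribe("HotelBooked", id =>
            bus.Publish(carAvailable ? "CarBooked" : "CompensationRequired", id));
        // Earlier services subscribe to the failure event and undo their own work.
        bus.Subscribe("CompensationRequired", id => bus.Publish("HotelCancelled", id));
        bus.Subscribe("CompensationRequired", id => bus.Publish("FlightCancelled", id));
        return bus;
    }
}
```

Calling TravelChoreography.Wire(false).Publish("BookingRequested", "trip-1") drives the failure path: the car service cannot book, and both earlier services cancel in response to the same CompensationRequired event. Note that no component sees the whole workflow, which is why choreography becomes harder to trace as the number of steps grows.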

Enterprise Scenario 3: Banking Funds Transfer

A bank transfers money between accounts in different regions. Each account is in a separate database (Azure SQL Database in different regions). The Saga pattern ensures that the debit and credit either both take effect or the debit is reversed; it does not make them atomic in the ACID sense, so the money is briefly in flight between the two steps. Step 1: Debit from source account. Step 2: Credit to destination account. If credit fails, debit is reversed. The orchestrator runs in Azure Functions with geo-redundant storage for saga logs. Compensation must be exact: if the debit was $100, the compensation must credit $100 back to the source account. Idempotency keys are used to prevent duplicate reversals. Performance: the saga must complete within 5 seconds to meet SLA. The system uses Azure Cache for Redis for temporary state to reduce latency. Misconfiguration: if the compensation is not idempotent and the orchestrator retries, the customer may be refunded twice. The bank implements a check: before compensating, it verifies that the original debit was not already reversed.

How AZ-305 Actually Tests This

AZ-305 Exam Focus on Saga Pattern

The AZ-305 exam tests the Saga pattern under objective 4.4: Design a solution for data consistency. Specifically, you must know when to use Sagas versus alternatives such as two-phase commit (2PC) or a single local ACID transaction. The exam also tests the difference between choreography and orchestration.

Common Wrong Answers and Why

1. Choosing 2PC for long-running transactions: Candidates often select 2PC because it guarantees atomicity. However, 2PC holds locks for the entire transaction, which is impractical for long-running operations (minutes to hours). Sagas are designed for such scenarios. The exam will present a scenario with multiple services and long duration; Sagas are the correct answer.

2. Using Sagas for single-database transactions: Some candidates think Sagas are always better than traditional transactions. But if all data is in one database, a simple ACID transaction is more efficient. Sagas add complexity and latency. The exam will test this edge case.

3. Ignoring compensating transactions: A question may describe a Saga without mentioning compensation. Candidates might assume it's fine, but every Saga step must have a compensation. The exam expects you to identify missing compensations as a design flaw.

4. Confusing choreography with orchestration: The exam may ask which approach to use based on complexity. Choreography is simpler but harder to manage for complex workflows. Orchestration is better for complex, multi-step processes. Candidates often pick choreography for complex workflows, which is wrong.

Specific Numbers and Terms

Timeout per step: typically 30 seconds to 5 minutes. The exam does not expect an exact value, but you should know that timeouts are configurable.

Idempotency: Must be ensured for compensating transactions.

Saga log: Must be durable (e.g., Cosmos DB, Table Storage).

Azure Durable Functions: The recommended orchestrator for orchestration-based Sagas.

Azure Service Bus: Used for choreography with topics and subscriptions.

Edge Cases

Partial failure with no compensation: If a step fails and no compensation exists, the system is inconsistent. The exam expects you to identify this as a design flaw.

Non-idempotent compensation: If a compensation is applied twice, it may cause data corruption. The exam may ask how to prevent this (use idempotency keys).

Saga timeout: If the entire saga times out, the system must handle partial state. The exam may ask what happens to incomplete steps.

How to Eliminate Wrong Answers

If the scenario involves a single database, eliminate Sagas and choose a local transaction.

If the scenario requires strict consistency and short duration (milliseconds), consider 2PC.

If the scenario involves multiple services and long duration, Sagas are correct.

If the question mentions 'compensation', it's likely a Saga.

If the question mentions 'orchestrator', it's orchestration-based Saga.

If the question mentions 'events', it's choreography.

Key Takeaways

Saga pattern breaks a distributed transaction into local transactions with compensating actions.

Every step in a Saga must have a compensating transaction to undo its effects.

Compensating transactions must be idempotent to handle retries.

Orchestration uses a central coordinator (e.g., Azure Durable Functions); choreography uses events (e.g., Azure Service Bus).

Sagas provide eventual consistency, not ACID.

Sagas are preferred over 2PC for long-running transactions across multiple services.

Store Saga state in a durable log (e.g., Cosmos DB) with a TTL for cleanup.

Timeouts and retry policies are critical for Saga reliability.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Saga Pattern (Orchestration)

Uses compensating transactions for rollback.

Does not hold locks across services.

Suitable for long-running transactions (minutes to hours).

Provides eventual consistency.

Requires idempotent compensations.

Two-Phase Commit (2PC)

Uses prepare and commit phases with locks.

Holds locks on resources until commit or abort.

Suitable for short transactions (milliseconds to seconds).

Provides strict ACID consistency.

Can block if the coordinator fails after participants have prepared: they keep holding their locks until it recovers.

Watch Out for These

Mistake

Sagas guarantee ACID consistency across services.

Correct

Sagas provide eventual consistency, not ACID. They ensure that either all steps complete or all compensations run, but there is a window where data is inconsistent. ACID across services requires 2PC, which Sagas avoid.

Mistake

Compensating transactions are the same as rollback in databases.

Correct

Compensating transactions are semantic undo operations, not database rollbacks. They are separate transactions that may not perfectly reverse the original state (e.g., sending an email cannot be unsent). They must be idempotent and handle partial failures.

Mistake

Sagas are only for microservices.

Correct

While common in microservices, Sagas can be used in any distributed system where multiple independent data stores are involved, including serverless architectures and cloud-native applications.

Mistake

Choreography is always better than orchestration.

Correct

Choreography is simpler but can become complex to manage for workflows with many steps or error handling. Orchestration centralizes control and is better for complex, long-running processes. The choice depends on the scenario.

Mistake

Sagas require a separate saga log database.

Correct

The saga log can be stored in any durable store, including the same database as one of the services, but it is best practice to use a separate, highly available store like Cosmos DB to avoid single points of failure.


Frequently Asked Questions

What is the difference between Saga and two-phase commit?

Saga uses a sequence of local transactions with compensations, while 2PC uses a prepare/commit protocol with locks. Sagas are for long-running, distributed transactions where locks are impractical; 2PC is for short, ACID-critical transactions. Sagas provide eventual consistency; 2PC provides strong consistency.

When should I use orchestration vs choreography for Sagas?

Use orchestration when the workflow is complex, has many steps, or requires centralized error handling and monitoring. Use choreography when the workflow is simple, services are loosely coupled, and you want to avoid a single point of failure. For example, a simple order flow with 3 steps can use choreography; a multi-step travel booking with compensations is better with orchestration.

How do I ensure idempotency in compensating transactions?

Use a unique idempotency key (e.g., saga ID + step number) in each compensation request. The service checks if the compensation has already been applied by looking up the key in its database. If found, it returns success without re-applying. This prevents duplicate compensations.
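A minimal sketch of that check, using an in-memory set in place of the service's database table. InventoryService and its members are hypothetical names for this example. In a real service the key and the stock change would need to be committed in the same local transaction, otherwise a crash between the two writes could still cause a double release.

```csharp
using System;
using System.Collections.Generic;

public sealed class InventoryService
{
    // Keys of compensations that were already applied (a database table in practice).
    private readonly HashSet<string> _appliedKeys = new();
    public int Available { get; private set; }

    public InventoryService(int available) => Available = available;

    // Releases stock at most once per (sagaId, stepNumber).
    // Returns true if this call applied the release, false if it was a duplicate.
    // Either way the caller can treat the compensation as successful.
    public bool ReleaseInventory(string sagaId, int stepNumber, int quantity)
    {
        string key = $"{sagaId}:{stepNumber}";
        if (!_appliedKeys.Add(key)) return false; // already compensated: no-op
        Available += quantity;
        return true;
    }
}
```

If the orchestrator retries ReleaseInventory("order-123", 1, 2) after a lost response, the second call finds the key and leaves the stock count unchanged.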

What happens if a compensating transaction fails?

If a compensation fails, the Saga is stuck in an inconsistent state. The system should log the error and trigger an alert for manual intervention. Retry policies with exponential backoff can be used, but if the compensation continues to fail, human intervention is required.

Can Sagas be used with Azure SQL Database?

Yes, each service can use Azure SQL Database for its local transactions. The Saga pattern does not require a specific database type. However, for the saga log, a NoSQL store like Cosmos DB is often used for its scalability and low latency.

What is the role of Azure Durable Functions in Sagas?

Azure Durable Functions provides a framework for orchestrating Sagas. The orchestrator function defines the sequence of steps and compensations. The runtime checkpoints orchestration state automatically, and it offers retry policies (CallActivityWithRetryAsync) and durable timers for timeouts, which you configure in the orchestrator. It is the recommended approach for orchestration-based Sagas on Azure.

How do I handle timeouts in a Saga?

Set a timeout per step (e.g., 30 seconds) and an overall saga timeout. If a step times out, assume failure and start compensation. In Durable Functions, race the activity against a durable timer (context.CreateTimer); elsewhere, implement custom timers. The saga log should record the timeout event.
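The per-step timeout can be sketched in plain .NET by racing the step against a delay. StepTimeout is a hypothetical helper for this illustration. Inside a Durable Functions orchestrator the same race should use a durable timer instead of Task.Delay, because orchestrator code must be deterministic when it is replayed.

```csharp
using System;
using System.Threading.Tasks;

public static class StepTimeout
{
    // Returns true if the step finished within the timeout, false if the timer won.
    public static async Task<bool> CompletedInTime(Task step, TimeSpan timeout)
    {
        Task first = await Task.WhenAny(step, Task.Delay(timeout));
        if (first != step) return false; // timed out: the caller should start compensation
        await step;                      // rethrows the step's own exception, if any
        return true;
    }
}
```

A timeout only means the caller stopped waiting; the step may still commit afterwards. That is why the compensation for a timed-out step should check whether the original transaction took effect before undoing it.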
