This chapter covers AWS Step Functions, specifically the two workflow types: Standard and Express. Understanding when to use each is critical for the DVA-C02 exam, where questions on orchestration, state management, and cost optimization appear regularly. These questions typically ask you to choose the correct workflow type for a given scenario or to identify a limitation. By the end of this chapter, you will know the differences, use cases, and exam traps for both Standard and Express Workflows.
Imagine a busy restaurant chain with two service models: a full-service dining room and an express counter. In the full-service dining room, every customer gets a dedicated waiter who takes their order, delivers food course by course, handles complaints, and waits for a tip before marking the table as 'done'. The waiter tracks the entire meal from start to finish, and the restaurant can pause between courses (like waiting for a steak to be cooked). This is like a Standard Workflow: it has an execution history, supports long pauses (up to one year in total), and charges per state transition.
In contrast, the express counter has no waiter. A customer orders, receives their food within moments, and leaves. The kitchen keeps no record of the customer after handing over the tray. The express counter is fast, cheap, and has no memory of past orders. This is like an Express Workflow: it typically runs in milliseconds to seconds (five minutes at most), does not persist execution history (events go to CloudWatch Logs only if configured), and is ideal for high-volume, short-lived tasks.
The key mechanistic difference: the full-service restaurant assigns a waiter (durable state) to each customer and logs every interaction, whereas the express counter handles each order quickly and keeps nothing afterwards. In AWS, Standard Workflows durably record each transition, while Express Workflows run without a persisted, queryable execution history.
What Are Step Functions Workflows?
AWS Step Functions is a serverless orchestration service that lets you coordinate multiple AWS services into a workflow. A workflow is defined as a state machine using Amazon States Language (ASL). Each workflow type—Standard or Express—determines how the state machine executes, how it is billed, and its operational characteristics.
Standard Workflows
Standard Workflows are the original Step Functions execution model. They are designed for long-running, durable, auditable processes and provide exactly-once workflow execution. Each execution is a distinct, auditable run with a unique execution ID.
How They Work Internally
When you start a Standard Workflow execution, Step Functions creates an execution record. Each state transition is persisted to an execution history, which is retained for up to 90 days after the execution closes and can be viewed in the console or retrieved with the GetExecutionHistory API. The execution progresses state by state. Because the service stores the current state durably, an execution can sit idle for long periods (in a Wait state or while waiting for a task token), as long as the total duration stays within one year. For example, a task might publish a task token to an SNS topic for human approval; the workflow pauses until an external process returns the token by calling SendTaskSuccess or SendTaskFailure.
Key Components, Values, and Defaults
Execution Duration: Maximum 1 year.
Execution History: Up to 25,000 events per execution. A single state transition can generate several events, so the limit is reached sooner than a count of states suggests. If exceeded, the execution fails.
Billing: Per state transition. The first 4,000 transitions per month are free under the Free Tier. After that, $0.025 per 1,000 transitions.
Rate Limits: Can start up to 2,000 executions per second per account per region (adjustable). Max execution history retention: 90 days.
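Execution Semantics: Standard Workflows provide exactly-once workflow execution: each step runs once unless you configure a Retry, which makes them suitable for non-idempotent actions such as charging a payment. StartExecution is also idempotent for a given execution name: repeating the call with the same name and input does not start a duplicate run.
Execution Start: Asynchronous only, via the StartExecution API. The call returns an execution ARN immediately; use DescribeExecution, a callback, or an event to obtain the result. StartSyncExecution is not available for Standard Workflows.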
Configuration and Verification
To create a Standard Workflow via CLI:
aws stepfunctions create-state-machine \
--name MyStandardWorkflow \
--definition file://definition.asl \
--role-arn arn:aws:iam::123456789012:role/StepFunctionsRole \
--type STANDARD
To verify the type:
aws stepfunctions describe-state-machine --state-machine-arn <arn>
Look for "type": "STANDARD" in the output.
Interactions
Standard Workflows integrate deeply with AWS services via service integrations (e.g., Lambda, DynamoDB, SQS, SNS, ECS, Glue). They support callbacks (task tokens) for human-in-the-loop scenarios. They also work with EventBridge to trigger executions on events.
Express Workflows
Express Workflows were introduced for high-volume, short-duration, event-processing workloads. They are designed to run in milliseconds to minutes, not hours or days.
How They Work Internally
Express Workflows do not persist execution history by default. Instead, they emit execution events to Amazon CloudWatch Logs (if configured) and optionally to CloudWatch Metrics. The execution is processed in a streaming fashion: each state transition is handled in memory without writing to durable storage until the end. This makes them much faster and cheaper per execution but at the cost of durability and observability.
Key Components, Values, and Defaults
Execution Duration: Maximum 5 minutes.
Execution History: Not persisted. You must enable CloudWatch Logs for logging; execution events are logged as they occur.
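Billing: Per request (execution) plus duration. Duration is metered in GB-seconds from the run time (rounded up to the nearest 100 ms) and the memory used (in 64 MB chunks). List pricing in US East is about $1.00 per 1 million requests plus about $0.00001667 per GB-second in the first tier. State transitions are not billed separately. This is significantly cheaper than Standard for high volumes of short executions.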
Rate Limits: Can start up to 100,000 executions per second per account per region (much higher than Standard).
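Execution Semantics: At-least-once for asynchronous Express executions and at-most-once for synchronous ones. Because an asynchronous execution may run more than once, design the workflow's actions to be idempotent (e.g., using idempotency keys).
Execution Start: Asynchronous via the StartExecution API, or synchronous via the StartSyncExecution API, which waits until the workflow completes and returns its output. StartSyncExecution is available only for Express Workflows.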
Configuration and Verification
To create an Express Workflow via CLI:
aws stepfunctions create-state-machine \
--name MyExpressWorkflow \
--definition file://definition.asl \
--role-arn arn:aws:iam::123456789012:role/StepFunctionsRole \
--type EXPRESS
To verify, use the same describe command and look for "type": "EXPRESS".
Interactions
Express Workflows support the same service integrations as Standard, but with limitations:
They cannot use .waitForTaskToken (callbacks) or the Run a Job (.sync) integration pattern because they do not persist state between transitions.
They cannot include activities (long-running workers) because activities require durable state.
They are ideal for streaming data processing, real-time analytics, and high-volume event ingestion.
Key Differences Summarized
Duration: Standard up to 1 year; Express up to 5 minutes.
Execution History: Standard persisted and queryable; Express only via CloudWatch Logs.
Billing Model: Standard per state transition; Express per request plus duration and memory (cheaper at high volumes of short executions).
Rate Limits: Standard 2,000 executions/sec; Express 100,000 executions/sec.
Execution Semantics: Standard exactly-once; Express at-least-once (asynchronous) or at-most-once (synchronous).
Synchronous Invocation: Express supports StartSyncExecution; Standard is asynchronous only (StartExecution).
Callback Pattern: Standard supports .waitForTaskToken and .sync; Express supports neither.
Activities: Standard supports; Express does not.
When to Use Which
Standard: Use for business-critical workflows that require auditing, human approval, long-running processes (e.g., order fulfillment, data pipelines that run for hours), or when you need to query execution history.
Express: Use for high-volume, short-lived tasks such as real-time data transformation, IoT event processing, or when cost per execution is a major concern and you can tolerate not having a full history.
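The selection rules above can be sketched as a small Python function. This is only an illustration: the `Requirements` fields and the `choose_workflow_type` name are hypothetical, and defaulting to Express when nothing forces Standard is a simplifying assumption that suits the high-volume scenarios the exam describes.

```python
from dataclasses import dataclass

EXPRESS_MAX_SECONDS = 5 * 60  # Express executions must finish within 5 minutes


@dataclass
class Requirements:
    """Hypothetical summary of what a scenario asks for."""
    max_duration_seconds: float
    needs_callback_or_sync_job: bool = False  # .waitForTaskToken or .sync
    needs_queryable_history: bool = False     # auditing via execution history
    needs_exactly_once: bool = False          # non-idempotent steps
    needs_sync_response: bool = False         # result in the same API call


def choose_workflow_type(req: Requirements) -> str:
    """Return 'STANDARD' or 'EXPRESS' for the given requirements."""
    needs_standard = (
        req.max_duration_seconds > EXPRESS_MAX_SECONDS
        or req.needs_callback_or_sync_job
        or req.needs_queryable_history
        or req.needs_exactly_once
    )
    if needs_standard and req.needs_sync_response:
        # StartSyncExecution exists only for Express, so no single type fits.
        raise ValueError("Standard-only features conflict with a synchronous response")
    return "STANDARD" if needs_standard else "EXPRESS"


print(choose_workflow_type(Requirements(max_duration_seconds=7200)))
print(choose_workflow_type(Requirements(max_duration_seconds=0.1)))
```

A scenario that needs both a Standard-only feature and a synchronous response raises an error here; in practice such designs usually split the work into a short Express front end and a Standard back end.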
Exam Relevance
The DVA-C02 exam frequently tests the distinction by presenting a scenario and asking which workflow type to use. Common traps include:
Choosing Express when the workflow requires a human approval (callback) – Express does not support callbacks.
Choosing Standard for a high-volume real-time processing pipeline – Standard is too slow and expensive.
Assuming Standard supports synchronous invocation – it does not. StartSyncExecution works only with Express.
Forgetting that Express has a 5-minute duration limit – if the scenario mentions a process that might take longer, Standard is required.
State Machine Definition Example (ASL)
A simple Standard Workflow that calls a Lambda and waits for a callback:
{
  "Comment": "A Standard Workflow with callback",
  "StartAt": "CallLambda",
  "States": {
    "CallLambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
      "Parameters": {
        "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:MyFunction",
        "Payload": {
          "token.$": "$$.Task.Token"
        }
      },
      "Next": "ProcessResult"
    },
    "ProcessResult": {
      "Type": "Pass",
      "End": true
    }
  }
}
An Express Workflow cannot use .waitForTaskToken; it would use arn:aws:states:::lambda:invoke instead.
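The other half of the callback pattern is the process that returns the token. The sketch below assumes a boto3 Step Functions client (for example `boto3.client("stepfunctions")`) is passed in as `sfn_client`; the function name `resolve_approval`, the `reviewer` field, and the `ApprovalRejected` error name are hypothetical. `send_task_success` and `send_task_failure` are the client methods for the SendTaskSuccess and SendTaskFailure APIs.

```python
import json


def resolve_approval(sfn_client, task_token, approved, reviewer="unknown"):
    """Resume a Standard Workflow that is paused on .waitForTaskToken.

    sfn_client is expected to behave like boto3.client("stepfunctions").
    """
    if approved:
        # The output becomes the result of the paused Task state.
        return sfn_client.send_task_success(
            taskToken=task_token,
            output=json.dumps({"approved": True, "reviewer": reviewer}),
        )
    # A failure can be handled by Retry/Catch on the Task state.
    return sfn_client.send_task_failure(
        taskToken=task_token,
        error="ApprovalRejected",  # hypothetical error name
        cause=f"Rejected by {reviewer}",
    )
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, without AWS credentials.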
Monitoring and Debugging
Standard: Use the Step Functions console to view execution history, step through states, and see input/output. CloudWatch Logs can be enabled for additional logging.
Express: Must enable CloudWatch Logs at the state machine level. Without it, you have no visibility into individual executions; only aggregate CloudWatch metrics for the state machine are available.
Pricing Example
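Consider 1 million executions with 10 state transitions each, where each execution runs for about 100 ms and uses 64 MB of memory (assumed figures for illustration):
- Standard: 10 million transitions × $0.025/1,000 = $250. There is no per-execution charge.
- Express: 1 million requests × $1.00/1,000,000 = $1.00, plus duration of 1,000,000 × 0.1 s × 0.0625 GB = 6,250 GB-seconds × $0.00001667 ≈ $0.10. Total ≈ $1.10.
Express is more than 200x cheaper for this scenario. The gap narrows as executions run longer or use more memory, because Express duration charges grow while Standard charges depend only on the number of transitions.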
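A comparison like this can be reproduced with a short calculator. The sketch below assumes Express is billed per request plus GB-seconds (duration rounded up to 100 ms, memory rounded up to 64 MB) and uses US East first-tier list prices as constants; the prices and the function names are assumptions to check against the current pricing page.

```python
import math

# Assumed list prices (US East, first pricing tier).
STANDARD_PER_TRANSITION = 0.025 / 1000   # dollars per state transition
EXPRESS_PER_REQUEST = 1.00 / 1_000_000   # dollars per execution request
EXPRESS_PER_GB_SECOND = 0.00001667       # dollars per GB-second of duration


def standard_cost(executions, transitions_per_execution):
    """Standard: billed only on the number of state transitions."""
    return executions * transitions_per_execution * STANDARD_PER_TRANSITION


def express_cost(executions, duration_ms, memory_mb):
    """Express: request charge plus duration charge in GB-seconds."""
    billed_seconds = math.ceil(duration_ms / 100) * 0.1   # 100 ms increments
    billed_gb = math.ceil(memory_mb / 64) * 64 / 1024     # 64 MB chunks
    request_cost = executions * EXPRESS_PER_REQUEST
    duration_cost = executions * billed_seconds * billed_gb * EXPRESS_PER_GB_SECOND
    return request_cost + duration_cost


print(f"Standard: ${standard_cost(1_000_000, 10):.2f}")
print(f"Express:  ${express_cost(1_000_000, 100, 64):.2f}")
```

Changing `duration_ms` or `memory_mb` shows how quickly the Express advantage shrinks for longer or heavier executions.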
Interaction with Other Services
EventBridge: Both workflow types can be triggered by EventBridge rules. Express is preferred for high-volume event streams.
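API Gateway: Can start both workflow types via service integration. For Express, API Gateway can call StartSyncExecution and return the workflow output in the HTTP response, but the workflow must finish within API Gateway's integration timeout (29 seconds by default). For Standard, the integration calls StartExecution and returns only the execution ARN.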
Lambda: Both can invoke Lambda. Standard can wait for a callback; Express cannot.
DynamoDB: Both can read/write. Express is better for high-throughput item processing.
Exam Trap: Execution History Limit
A common exam question: "You have a Standard Workflow whose execution history may exceed 25,000 events. What happens?" Answer: The execution fails. You must redesign the workflow to generate fewer events per execution or use a different approach (e.g., break into multiple executions).
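One way to reason about "break into multiple executions" is to budget history events per child execution. The sketch below is a rough planning aid, not an AWS API: the events-per-item figure, the fixed overhead, and the 80% safety margin are all assumptions you would measure or choose for your own workflow.

```python
import math

HISTORY_EVENT_LIMIT = 25_000  # hard limit on events in one Standard execution


def plan_child_executions(total_items, events_per_item,
                          overhead_events=10, safety_percent=80):
    """Return (items_per_execution, executions_needed).

    events_per_item, overhead_events and safety_percent are estimates
    supplied by the caller; Step Functions does not report them up front.
    """
    budget = HISTORY_EVENT_LIMIT * safety_percent // 100 - overhead_events
    items_per_execution = budget // events_per_item
    if items_per_execution < 1:
        raise ValueError("a single item would exceed the event budget")
    executions_needed = math.ceil(total_items / items_per_execution)
    return items_per_execution, executions_needed


items, runs = plan_child_executions(total_items=10_000, events_per_item=5)
print(f"{items} items per execution, {runs} executions")
```

A parent workflow can then start one child execution per chunk, so that no single history approaches the limit.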
Exam Trap: Synchronous Invocation with Standard
"You need to start a workflow and wait for the result in the same API call. You choose Standard and call StartSyncExecution." This is invalid. StartSyncExecution works only with Express Workflows; Standard supports only StartExecution (async). The correct choice for a synchronous response is Express with StartSyncExecution, provided the workflow finishes within 5 minutes.
Exam Trap: Callback Pattern with Express
"Your workflow requires a human approval via email. You choose Express." Wrong. Express does not support callbacks. Use Standard.
Exam Trap: Cost Calculation
"You have a high volume of short executions. Should you use Standard?" No, Express is cheaper per execution and supports higher throughput.
Define the State Machine
Create a JSON document using Amazon States Language (ASL) that defines the workflow's states, transitions, and choices. Each state has a type (Task, Choice, Pass, Wait, Succeed, Fail, Parallel, Map). For a Task state, specify the Resource ARN (a Lambda function ARN, an activity ARN, or a service integration ARN such as 'arn:aws:states:::sns:publish') and any parameters. The definition must have a StartAt field naming the initial state, and every state except Choice, Succeed, and Fail must have either a Next field or "End": true. Example: a Task state that invokes a Lambda function and waits for its response uses 'arn:aws:states:::lambda:invoke'. For Standard Workflows, you can append the '.waitForTaskToken' suffix for callbacks. For Express, you cannot use that suffix.
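The structural rules in this step can be checked before deployment with a few lines of Python. The sketch below is a deliberately small, hypothetical linter (`lint_definition` is not part of any AWS SDK) that covers only the rules mentioned here: StartAt must name a defined state, non-terminal states need Next or End, and Express definitions must not use `.waitForTaskToken` or `.sync` resources.

```python
from typing import List

# State types that legitimately have neither Next nor End.
NO_NEXT_REQUIRED = {"Choice", "Succeed", "Fail"}


def lint_definition(definition: dict, workflow_type: str = "STANDARD") -> List[str]:
    """Return a list of problems found in an ASL definition (empty if none)."""
    problems = []
    states = definition.get("States", {})
    start_at = definition.get("StartAt")
    if start_at not in states:
        problems.append(f"StartAt {start_at!r} is not a defined state")
    for name, state in states.items():
        has_exit = "Next" in state or state.get("End") is True
        if state.get("Type") not in NO_NEXT_REQUIRED and not has_exit:
            problems.append(f"{name}: needs a Next field or End set to true")
        resource = state.get("Resource", "")
        if workflow_type == "EXPRESS" and (
            ".waitForTaskToken" in resource or ".sync" in resource
        ):
            problems.append(f"{name}: {resource} is not supported in Express")
    return problems


callback_workflow = {
    "StartAt": "CallLambda",
    "States": {
        "CallLambda": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}
print(lint_definition(callback_workflow, "STANDARD"))
print(lint_definition(callback_workflow, "EXPRESS"))
```

The same definition passes for Standard and is flagged for Express, which mirrors the rule that the callback suffix is Standard-only.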
Create the State Machine via CLI or Console
Use the AWS CLI, SDK, or Console to create the state machine. In the CLI, run 'aws stepfunctions create-state-machine --name MyWorkflow --definition file://definition.asl --role-arn <role-arn> --type STANDARD' (or EXPRESS). The role ARN must grant Step Functions permission to invoke the resources in the workflow. The type parameter is critical: STANDARD or EXPRESS. If omitted, default is STANDARD. After creation, you receive a state machine ARN.
Start an Execution
Use the StartExecution API to begin a workflow asynchronously; this works for both workflow types. Provide the state machine ARN and an optional input JSON. The service returns an execution ARN immediately. For Express only, you can instead call StartSyncExecution, which waits until the workflow completes and returns its output in the response. Standard Workflows do not support StartSyncExecution; to get their result, poll DescribeExecution or have the workflow notify you. For Standard, the execution is durable and recorded. For Express, the execution is ephemeral.
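The choice of start API can be sketched as follows. The example assumes `sfn_client` behaves like `boto3.client("stepfunctions")`, whose `describe_state_machine`, `start_execution`, and `start_sync_execution` methods correspond to the APIs named in this step; the wrapper name `start_workflow` and its `wait_for_result` flag are hypothetical.

```python
import json


def start_workflow(sfn_client, state_machine_arn, payload, wait_for_result=False):
    """Start an execution, using the synchronous API only where it is valid."""
    description = sfn_client.describe_state_machine(stateMachineArn=state_machine_arn)
    machine_type = description["type"]  # "STANDARD" or "EXPRESS"
    body = json.dumps(payload)
    if wait_for_result:
        if machine_type != "EXPRESS":
            # StartSyncExecution is rejected for Standard state machines.
            raise ValueError("StartSyncExecution is only available for Express Workflows")
        # Blocks until the workflow finishes; the response carries status and output.
        return sfn_client.start_sync_execution(
            stateMachineArn=state_machine_arn, input=body
        )
    # Returns immediately with an execution ARN for either workflow type.
    return sfn_client.start_execution(stateMachineArn=state_machine_arn, input=body)
```

Looking up the type first costs an extra API call; a caller that already knows the type could skip it.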
Monitor Execution Progress
For Standard Workflows, you can describe the execution using 'aws stepfunctions describe-execution --execution-arn <arn>' to get its status (RUNNING, SUCCEEDED, FAILED, TIMED_OUT, ABORTED) along with its input and output. You can also list execution history using 'get-execution-history' to see each event with timestamps, inputs, and outputs. For Express Workflows, these APIs are not available and there is no stored execution history. You must rely on CloudWatch Logs (if enabled) to see per-execution events; without logging, only aggregated CloudWatch metrics are available.
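For a Standard execution, monitoring from code usually means polling DescribeExecution until a terminal status appears. The sketch below assumes `sfn_client` behaves like `boto3.client("stepfunctions")`; the function name, polling interval, and poll cap are illustrative choices, and the `sleep` parameter exists only so the loop can be exercised without real waiting.

```python
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def wait_for_execution(sfn_client, execution_arn, poll_seconds=2.0,
                       max_polls=30, sleep=time.sleep):
    """Poll a Standard execution until it reaches a terminal status."""
    for _ in range(max_polls):
        description = sfn_client.describe_execution(executionArn=execution_arn)
        if description["status"] in TERMINAL_STATUSES:
            return description  # includes "output" when the run succeeded
        sleep(poll_seconds)
    raise TimeoutError(f"{execution_arn} still running after {max_polls} polls")
```

Polling is simple but spends API calls; for long-running workflows an EventBridge rule on execution status changes is a common alternative.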
Handle Errors and Retries
Define error handling in the state machine definition using Retry and Catch fields. A Retry specifies which errors to retry (e.g., Lambda.ServiceException), the delay before the first retry (IntervalSeconds, default 1), the number of retries (MaxAttempts, default 3), and the multiplier applied to the delay after each attempt (BackoffRate, default 2.0). Both workflow types honor these settings. For Standard Workflows, the waits between retries can be long because the execution is durable. For Express, every attempt and every wait counts against the 5-minute execution limit. A Catch field redirects to a fallback state once retries are exhausted or for errors that are not retried. Exam tip: Express workflows have a maximum execution time of 5 minutes, so retries must fit within that limit.
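Whether a retry policy fits inside the Express limit is simple arithmetic. The sketch below assumes the ASL defaults (IntervalSeconds 1, MaxAttempts 3, BackoffRate 2.0) and a fixed worst-case task duration, and it ignores optional settings such as a maximum delay or jitter; the helper names are hypothetical.

```python
EXPRESS_LIMIT_SECONDS = 5 * 60


def retry_delays(interval_seconds=1, max_attempts=3, backoff_rate=2.0):
    """Wait before each retry: interval * backoff_rate ** n for n = 0..max_attempts-1."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


def worst_case_seconds(task_seconds, delays):
    """Initial attempt plus one attempt per retry, plus every wait in between."""
    return task_seconds * (len(delays) + 1) + sum(delays)


def fits_in_express(task_seconds, delays):
    return worst_case_seconds(task_seconds, delays) <= EXPRESS_LIMIT_SECONDS


defaults = retry_delays()
print(defaults, fits_in_express(2, defaults))
aggressive = retry_delays(interval_seconds=10, max_attempts=5)
print(aggressive, fits_in_express(2, aggressive))
```

With the defaults, a 2-second task needs at most 15 seconds. With a 10-second interval and five retries, the waits alone add up to 310 seconds, so that policy cannot complete inside an Express execution.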
Complete or Time Out
The workflow ends when it reaches a Succeed or Fail state (or a state with "End": true), or when the maximum duration is exceeded. For Standard, the maximum duration is 1 year. If exceeded, the execution status becomes TIMED_OUT. For Express, the maximum is 5 minutes; exceeding it causes TIMED_OUT. The execution history (Standard) or CloudWatch Logs (Express) will record the timeout. After completion, you can retrieve the final output via describe-execution for Standard. For Express, the output is returned directly by StartSyncExecution; for asynchronous Express executions it appears only in CloudWatch Logs (if enabled), so capture anything you need in the workflow itself (e.g., write to DynamoDB).
Enterprise Scenario 1: Order Fulfillment Pipeline
A large e-commerce company uses Standard Workflows to orchestrate order fulfillment. The workflow includes: validating inventory (DynamoDB), charging a credit card (via Lambda calling Stripe API), updating order status (DynamoDB), and sending a confirmation email (SES). The workflow must wait for payment confirmation, which may take seconds to minutes. It also requires a human approval step for orders over $10,000, using a callback to an SNS topic that sends an email with a link to approve. The company uses Standard because it supports callbacks, long durations (up to 1 year), exactly-once execution of the payment step, and full execution history for auditing. They configure CloudWatch Logs for additional logging. Scaling: they handle up to 1,000 orders per second, well within Standard's 2,000 executions/sec limit. They must also keep each execution's history below 25,000 events (their executions average 15 state transitions, so this is not a concern). If an order fails, they retry with exponential backoff. Misconfiguration: if they mistakenly used Express, the human approval step could not use the .waitForTaskToken callback at all, and there would be no persisted execution history, causing compliance issues.
Enterprise Scenario 2: Real-Time IoT Data Processing
A smart home company processes telemetry from millions of devices. Each device sends a JSON payload every minute. The company uses Express Workflows to transform and enrich the data (e.g., convert temperature from Celsius to Fahrenheit) and then write to a DynamoDB table. The workflow runs in under 100 milliseconds. They chose Express because of the high volume (over 10,000 executions per second) and low cost. They enable CloudWatch Logs for debugging but do not need per-execution history. If a workflow fails, they rely on a dead-letter queue (DLQ) to capture failed inputs. Misconfiguration: if they used Standard, the cost would be far higher (on the order of hundreds of times for executions this short) and they would hit the 2,000 executions/sec limit, causing throttling. Also, they do not need callbacks or long durations.
Enterprise Scenario 3: Data Lake ETL Pipeline
A media company runs nightly ETL jobs that process terabytes of data into a data lake (S3 + Glue). The workflow includes: start a Glue job, wait for it to complete (up to 2 hours), run a validation Lambda, and send a notification. They use Standard Workflows because the Glue job takes longer than 5 minutes, and they wait for it to finish using the Run a Job (.sync) integration pattern, which Express does not support. The workflow also needs to avoid duplicate processing if the job is retried, so they pass a unique run ID as input. Scaling: only one execution per night, so rate limits are irrelevant. Misconfiguration: if they used Express, the execution could not outlast the 5-minute limit, and neither .sync nor a callback would be available to wait for completion.
Exam Focus on Step Functions (DVA-C02 Domain 1)
The DVA-C02 exam tests your ability to choose the correct Step Functions workflow type for a given scenario. Questions often present a use case and ask which type to use, or ask about limitations. Step Functions falls under Domain 1: Development with AWS Services.
Common Wrong Answers and Why Candidates Choose Them
Choosing Express when the workflow needs a human approval callback. Candidates think Express is always better because it's cheaper and faster. They forget that Express does not support .waitForTaskToken. The correct answer is Standard.
Choosing Standard for a high-volume IoT event processing pipeline. Candidates default to Standard because they think it's more robust. They overlook the 2,000 executions/sec limit and higher cost. The correct answer is Express.
Assuming Standard supports synchronous invocation. Candidates see StartSyncExecution in the API and assume it works for every state machine. It works only for Express; Standard supports only asynchronous StartExecution. The exam may ask: "You need to start a workflow and get the result in the same API call. Which type?" Answer: Express, provided the workflow finishes within 5 minutes.
Forgetting the 5-minute duration limit for Express. Candidates choose Express for a process that may take hours. The exam will explicitly state a duration >5 minutes, so Standard is required.
Specific Numbers and Terms to Memorize
Standard max duration: 1 year (365 days).
Express max duration: 5 minutes.
Standard max execution history: 25,000 events per execution.
Standard rate limit: 2,000 executions/sec (per account per region).
Express rate limit: 100,000 executions/sec.
Standard billing: $0.025 per 1,000 state transitions.
Express billing: about $1.00 per 1 million requests + duration charges per GB-second (based on run time and memory); no per-transition charge.
Standard supports .waitForTaskToken; Express does not.
Standard supports Activities; Express does not.
Express must have CloudWatch Logs enabled for execution logging.
Edge Cases and Exceptions
If a Standard Workflow execution exceeds 25,000 history events, it fails. Redesign needed.
Express workflows can still use Map and Parallel states, but the whole execution, including every branch, must complete within 5 minutes.
You can convert a Standard to Express only by creating a new state machine; you cannot change the type after creation.
Both types support tagging and IAM policies.
How to Eliminate Wrong Answers
First, identify if the scenario requires: (a) long duration (>5 min), (b) callbacks/human approval or waiting on a job (.sync), (c) execution history auditing, (d) exactly-once execution of non-idempotent steps. If yes to any, eliminate Express. If the scenario involves high volume (>2,000 exec/sec), low cost, or sub-second execution, eliminate Standard. Also check if the result must come back in the same API call; if so, only Express (StartSyncExecution) works.
Standard Workflows are for long-running, durable, auditable processes (up to 1 year).
Express Workflows are for high-volume, short-lived, stateless event processing (up to 5 minutes).
Express Workflows do not support .waitForTaskToken, Run a Job (.sync) integrations, or Activities; they are the only type that supports synchronous invocation (StartSyncExecution).
Standard Workflows have a maximum of 25,000 events in the execution history; exceeding this causes failure.
Express Workflows can handle up to 100,000 executions per second; Standard up to 2,000.
Express Workflows require CloudWatch Logs to be enabled for any execution logging.
You cannot change the workflow type after creation; you must create a new state machine.
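Standard Workflows provide exactly-once execution; Express Workflows are at-least-once (asynchronous) or at-most-once (synchronous), so design their steps to be idempotent.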
These come up on the exam all the time. Here's how to tell them apart.
Standard Workflow
Maximum execution duration: 1 year
Supports callbacks (.waitForTaskToken)
Supports Activities (long-running workers)
Execution history persisted and queryable (up to 90 days)
Billing: $0.025 per 1,000 state transitions (no per-execution fee)
Express Workflow
Maximum execution duration: 5 minutes
No callback support
No Activities support
No persisted execution history; must use CloudWatch Logs
Billing: about $1.00 per 1 million requests + duration charges per GB-second (no per-transition fee)
Mistake
Express Workflows are just a cheaper version of Standard Workflows with the same features.
Correct
Express Workflows have significant feature limitations: no callbacks (.waitForTaskToken), no Run a Job (.sync) integrations, no activities, no persisted execution history (only CloudWatch Logs), and a 5-minute maximum duration. They do add one capability that Standard lacks: synchronous invocation through StartSyncExecution. They are designed for high-volume, short-lived, stateless processing, not as a drop-in replacement for Standard.
Mistake
Standard Workflows can run indefinitely.
Correct
Standard Workflows have a maximum execution duration of 1 year (365 days). While this is very long, it is not indefinite. The exam may test this exact limit.
Mistake
You can invoke a Standard Workflow synchronously using StartSyncExecution.
Correct
StartSyncExecution is available only for Express Workflows. Standard Workflows support only the asynchronous StartExecution API; to get the result you poll DescribeExecution or use a callback or event. If you need a synchronous response and the workflow finishes within 5 minutes, use Express.
Mistake
Express Workflows are automatically idempotent.
Correct
Express Workflows are not inherently idempotent. Asynchronous Express executions have at-least-once semantics, so an execution may run more than once and produce duplicate side effects. You must design idempotency into your workflow (e.g., using idempotency keys in DynamoDB). Standard Workflows provide exactly-once workflow execution, so each step runs once unless a Retry is configured.
Mistake
Both workflow types support the same pricing model.
Correct
Standard Workflows are billed per state transition only. Express Workflows are billed per request plus duration, which depends on run time and memory used, and not per state transition. For high volumes of short executions, Express is much cheaper.
5 minutes. If your workflow needs to run longer than 5 minutes, you must use a Standard Workflow, which supports up to 1 year. The exam often tests this limit by presenting a scenario where a process takes 10 minutes, and the correct answer is Standard.
No. Express Workflows do not support the .waitForTaskToken pattern because they do not persist state between transitions. If you need a callback, use a Standard Workflow. This is a common exam trap.
You must enable CloudWatch Logs on the state machine. Without it, you have no visibility into individual executions. You can also publish custom metrics to CloudWatch. For Standard Workflows, you can view execution history in the console without additional configuration.
The execution fails once its history exceeds 25,000 events. You must redesign the workflow to stay under the limit, for example by breaking it into multiple executions (such as starting child executions from a Map state). This is a key exam point.
Yes, but only for Express Workflows, using the StartSyncExecution API. Standard Workflows only support asynchronous invocation via StartExecution. If you need the result in the same API call, you must use Express.
Express Workflows. For example, 1 million short, low-memory executions with 10 transitions each cost roughly $1 with Express versus $250 with Standard. Express is designed for high throughput and low cost per execution.
Yes, you can integrate API Gateway with Express Workflows using the synchronous integration (StartSyncExecution). However, the API Gateway integration timeout is 29 seconds by default, so the workflow must complete within that time. For longer workflows, start the execution asynchronously with StartExecution and deliver the result another way (polling or a callback).