This chapter covers AWS Systems Manager Run Command, a feature for executing commands across multiple EC2 instances and on-premises servers at scale. For the SOA-C02 exam, Run Command is a key topic under Domain 3 (Deployment, Provisioning, and Automation) and Objective 3.2 (Automate manual or repeatable processes). Expect 5-10% of exam questions to touch on Systems Manager automation, with Run Command being the most commonly tested capability. Mastering Run Command is essential for automating operational tasks like patching, software installation, and inventory collection without SSH or RDP access.
Imagine you are the IT manager for a company with 1,000 laptops spread across multiple offices. You need to run a diagnostic script on every laptop simultaneously. Instead of walking to each desk, you use a centralized remote control system. Each laptop has a small agent (like the SSM Agent) that listens for commands from a central server. You log into the central console, select all laptops by their asset tags (instance IDs), and issue the command. The central server sends the command to each agent over a secure channel. Each agent executes the command locally, captures the output, and sends it back. If a laptop is offline, the command fails for that device; you can retry later. The system also lets you target by department (tags) so you only affect certain groups. This mirrors how SSM Run Command works: the agent polls the SSM service for commands, executes them, and reports status. The command document defines the script to run, and parameters customize it per instance. Rate control prevents overwhelming the network, and error thresholds stop the operation if too many failures occur. Just like you wouldn't want to reboot 1,000 laptops at once without a plan, Run Command gives you control over concurrency and failure limits.
What is SSM Run Command?
AWS Systems Manager Run Command is a service that lets you remotely and securely manage the configuration of your managed instances (EC2 instances and on-premises servers) at scale. Instead of logging into each instance via SSH or RDP, you can run shell scripts, PowerShell commands, or predefined SSM documents across hundreds or thousands of instances simultaneously. Run Command is part of AWS Systems Manager, a suite of tools for operational management.
Run Command is designed for one-time or ad-hoc operations. For recurring tasks, you should use State Manager associations or Maintenance Windows. However, Run Command is ideal for urgent tasks like applying a security patch or gathering diagnostic data.
How Run Command Works Internally
SSM Agent: Each managed instance must have the SSM Agent installed and running. The agent is open-source software that runs as a service (amazon-ssm-agent on Linux, AmazonSSMAgent on Windows). It communicates with the Systems Manager service in the AWS cloud over HTTPS (port 443). The agent does not open any inbound ports; it only initiates outbound connections to the SSM service endpoint.
Polling Mechanism: The agent polls the SSM service every 5 seconds by default to check for pending commands. This polling interval can be configured in the agent configuration file (amazon-ssm-agent.json). When a command is submitted via the AWS Management Console, CLI, or SDK, the SSM service makes it available for the agent to pick up.
Command Document: Every Run Command execution uses a document (SSM document) that defines the action to perform. Documents can be AWS-provided (predefined) or custom. A document is written in JSON or YAML and specifies the schema version, parameters, and steps. For example, the AWS-RunShellScript document runs a shell command on Linux instances.
Execution: The agent receives the command document, validates it, and executes the steps locally. The output (stdout, stderr) is captured and sent back to the SSM service. The agent also records the exit code (0 for success, non-zero for failure).
Status Reporting: The agent reports the status of each command execution to the SSM service. Statuses include Pending, InProgress, Success, Failed, TimedOut, Cancelled, and DeliveryTimedOut. You can view the status in the console or via the CLI.
Output Retrieval: By default, command output is stored in the SSM service for up to 30 days. You can also configure output to be written to an S3 bucket for long-term storage. The console shows at most 2500 characters of output per instance; to keep the full output, pass the --output-s3-bucket-name parameter to send-command and read the result from S3.
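The statuses listed above fall into two groups: still running, or finished. Here is a minimal Python sketch of how a script might tally them. It assumes invocation records shaped like the `CommandInvocations` entries returned by `list-command-invocations`. The sample records and the `summarize` helper are hypothetical, and the status names are the ones used in this chapter; the API may report additional values.

```python
from collections import Counter

# Statuses after which an invocation will not change again
# (names taken from the list in this chapter).
TERMINAL = {"Success", "Failed", "TimedOut", "Cancelled", "DeliveryTimedOut"}


def summarize(invocations):
    """Count invocations per status and list instances that are not finished."""
    counts = Counter(inv["Status"] for inv in invocations)
    unfinished = [inv["InstanceId"] for inv in invocations
                  if inv["Status"] not in TERMINAL]
    return dict(counts), unfinished


# Hypothetical sample data for illustration only.
sample = [
    {"InstanceId": "i-0aaa", "Status": "Success"},
    {"InstanceId": "i-0bbb", "Status": "InProgress"},
    {"InstanceId": "i-0ccc", "Status": "Failed"},
]
counts, unfinished = summarize(sample)
print(counts, unfinished)
```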
Key Components and Defaults
SSM Agent: Version 2.2.93.0 or later recommended. The agent auto-updates by default (can be disabled).
Polling Interval: 5 seconds (configurable).
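Command Timeout: The default execution timeout is 3600 seconds (1 hour) for the command to complete; you can change it per command through the document's executionTimeout parameter. The separate --timeout-seconds option of send-command limits how long the service waits for the command to start on an instance, not how long it may run.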
Output Limit: 2500 characters displayed in console; full output available via S3 or CLI.
Maximum Concurrent Instances: No hard limit, but rate control is recommended for large fleets.
Error Threshold: You can set a maximum number or percentage of errors before the command stops executing on remaining instances.
Retry Count: Not automatically retried; you can resubmit the command.
Configuration and Verification Commands
Prerequisites:
- IAM role (instance profile) with permissions for SSM: AmazonSSMManagedInstanceCore policy.
- SSM Agent installed and running.
- Outbound internet access or VPC endpoint for Systems Manager (com.amazonaws.region.ssm, com.amazonaws.region.ec2messages, com.amazonaws.region.ssmmessages).
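The three endpoint service names follow one pattern, with the region substituted in. A small Python sketch of that substitution (the helper name `ssm_endpoint_services` is hypothetical):

```python
def ssm_endpoint_services(region):
    """Return the three interface endpoint service names for one region."""
    return [f"com.amazonaws.{region}.{service}"
            for service in ("ssm", "ec2messages", "ssmmessages")]


print(ssm_endpoint_services("us-east-1"))
```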
Example: Run a shell command on all instances tagged Environment=Production
aws ssm send-command \
--document-name "AWS-RunShellScript" \
--targets "Key=tag:Environment,Values=Production" \
--parameters '{"commands":["df -h"]}' \
--timeout-seconds 600 \
--max-concurrency "50" \
--max-errors "10" \
--region us-east-1
Check command status:
aws ssm list-commands --command-id "command-id"
List command invocations for each instance:
aws ssm list-command-invocations --command-id "command-id" --details
Get output for a specific instance:
aws ssm get-command-invocation --command-id "command-id" --instance-id "i-1234567890abcdef0"
Interaction with Related Technologies
State Manager: For recurring tasks, use State Manager associations instead of Run Command. Run Command is for one-time execution.
Maintenance Windows: Schedule Run Command executions during maintenance windows to control timing.
Patch Manager: Uses Run Command behind the scenes to apply patches.
Inventory: Uses Run Command to collect inventory data.
Parameter Store: Store sensitive parameters (passwords, keys) and reference them in commands.
S3: Store command output in S3 for audit and compliance.
CloudWatch Logs: Stream command output to CloudWatch Logs for real-time monitoring.
EventBridge: Trigger Run Command based on events (e.g., instance launch, CloudWatch alarms).
Rate Controls: MaxConcurrency and MaxErrors
MaxConcurrency: Specifies the maximum number of instances that can run the command simultaneously. Can be an absolute number or a percentage of the target set. For example, --max-concurrency "10" means at most 10 instances run at once. --max-concurrency "10%" means 10% of the targeted instances run concurrently. This prevents overwhelming the fleet or network.
MaxErrors: Specifies the maximum number or percentage of errors (failed commands) tolerated before the service stops sending the command to further instances. For example, --max-errors "5" allows five failures and stops when the sixth occurs; --max-errors "10%" applies the same rule to 10% of the targeted instances. Invocations already running when the limit is crossed may still finish. This is critical for large-scale operations to avoid cascading failures.
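To see how the two controls interact, here is a simplified Python model, not the service's actual scheduler. It makes three assumptions: percentages round down, commands are dispatched in fixed waves of MaxConcurrency instances, and sending stops once the failure count exceeds MaxErrors. The function names are hypothetical.

```python
def resolve_limit(value, total, minimum=0):
    """Turn "10" or "10%" into an absolute count for `total` targets.

    Assumption: percentages round down; `minimum` keeps concurrency >= 1.
    """
    value = value.strip()
    if value.endswith("%"):
        count = total * int(value[:-1]) // 100
    else:
        count = int(value)
    return max(minimum, count)


def simulate(outcomes, max_concurrency="50", max_errors="0"):
    """Dispatch in waves; stop sending once errors exceed the allowed count.

    `outcomes` lists the result per instance in dispatch order (True = success).
    Returns (instances the command was sent to, failures seen).
    """
    total = len(outcomes)
    batch = resolve_limit(max_concurrency, total, minimum=1)
    allowed = resolve_limit(max_errors, total)
    sent = errors = 0
    while sent < total:
        wave = outcomes[sent:sent + batch]
        sent += len(wave)
        errors += wave.count(False)
        if errors > allowed:
            break  # remaining instances never receive the command
    return sent, errors


# 20 targets, 10% concurrency (2 per wave), one failure tolerated.
outcomes = [True] * 5 + [False] * 3 + [True] * 12
print(simulate(outcomes, max_concurrency="10%", max_errors="1"))
```

In this run the command reaches 8 of the 20 instances before the failure limit halts it. The last wave pushes the failure count to 3 because both of its instances were already running.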
Targeting Instances
You can target instances using:
- Instance IDs: --instance-ids "i-xxx" "i-yyy"
- Tags: --targets "Key=tag:Name,Values=web-server"
- Resource Groups: --targets "Key=resource-groups:Name,Values=MyGroup" (the value is the resource group name)
- All managed instances: A send-command request must name its targets through --instance-ids or --targets. To reach the whole fleet, target a tag that every instance carries.
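When calling the API from code instead of the CLI, each `Key=...,Values=...` pair becomes a small dictionary. A sketch of helpers that build those entries (the helper names are hypothetical; the `Key`/`Values` structure mirrors the CLI syntax shown above):

```python
def tag_target(key, *values):
    """Build a target entry that selects instances by tag."""
    return {"Key": f"tag:{key}", "Values": list(values)}


def resource_group_target(name):
    """Build a target entry that selects a resource group by name."""
    return {"Key": "resource-groups:Name", "Values": [name]}


targets = [tag_target("Name", "web-server")]
print(targets)
print(resource_group_target("MyGroup"))
```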
Security and Permissions
To use Run Command, you need:
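- Instance Profile: The instance must have an IAM role that lets the SSM Agent register with and talk to the service (for example ssm:UpdateInstanceInformation plus the ssmmessages and ec2messages actions). The managed policy AmazonSSMManagedInstanceCore provides these. The instance itself does not need ssm:SendCommand.
- User Permissions: The user or role issuing the command needs ssm:SendCommand, ssm:ListCommandInvocations, etc. These can come from the managed policy AmazonSSMFullAccess or, preferably, from a narrower custom policy.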
- KMS: If you want to encrypt command output stored in S3, you can use a KMS key. The instance profile must be allowed to use that key (for example kms:GenerateDataKey and kms:Decrypt) and to write to the bucket.
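A custom user policy usually grants `ssm:SendCommand` twice: once on the document, once on the instances. The sketch below builds such a policy as a Python dictionary. It assumes tag-based restriction through the `ssm:resourceTag/<key>` condition key, and the wildcard ARNs and the tag `Environment=Production` are illustrative values, not a recommendation.

```python
import json


def send_command_policy(document_name, tag_key, tag_value):
    """Allow SendCommand for one document, only on instances with a given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Which document may be run.
                "Effect": "Allow",
                "Action": "ssm:SendCommand",
                "Resource": [f"arn:aws:ssm:*:*:document/{document_name}"],
            },
            {   # Which instances may receive it (restricted by tag).
                "Effect": "Allow",
                "Action": "ssm:SendCommand",
                "Resource": ["arn:aws:ec2:*:*:instance/*"],
                "Condition": {
                    "StringLike": {f"ssm:resourceTag/{tag_key}": [tag_value]}
                },
            },
        ],
    }


policy = send_command_policy("AWS-RunShellScript", "Environment", "Production")
print(json.dumps(policy, indent=2))
```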
Common Use Cases
Patch Installation: Apply critical security patches to a fleet using AWS-RunPatchBaseline.
Software Installation: Install AWS-published packages such as the CloudWatch Agent using AWS-ConfigureAWSPackage, or install MSI applications on Windows using AWS-InstallApplication.
Inventory Collection: Gather system information using AWS-GatherSoftwareInventory.
Diagnostics: Run df -h, top, netstat to troubleshoot.
Configuration Changes: Update configuration files, restart services.
Limitations
Output Size: Maximum output per instance is 2500 characters in console; use S3 for larger output.
Timeout: The default execution timeout is 1 hour; the maximum is 48 hours.
Command Payload: The entire command document and parameters must be less than 100 KB.
Rate Limits: There are API rate limits; for very large fleets, use rate controls.
Exam Tips
Know that Run Command is not for recurring tasks; that's State Manager.
Understand the difference between MaxConcurrency and MaxErrors.
Remember that the SSM Agent polls every 5 seconds.
Be able to identify the correct IAM permissions for both the instance and the user.
Know that you can target by tags, instance IDs, or resource groups.
Understand that output can be sent to S3 or CloudWatch Logs.
Remember that Run Command works for both EC2 and on-premises instances (with hybrid activation).
1. Install and Configure SSM Agent
Ensure the SSM Agent is installed and running on each target instance. For EC2 instances launched from AMIs that include the agent (Amazon Linux 2, Ubuntu 18.04+, Windows Server 2016+), it is pre-installed. For older AMIs or on-premises servers, install it manually. The agent must have outbound HTTPS access to the SSM endpoints, and the instance needs an IAM instance profile with the appropriate permissions (AmazonSSMManagedInstanceCore). Verify agent status with `sudo systemctl status amazon-ssm-agent` on Linux or `Get-Service AmazonSSMAgent` in PowerShell on Windows.
2. Define the Command Document
Choose or create an SSM document that defines the command to run. Use an AWS-provided document like AWS-RunShellScript for Linux or AWS-RunPowerShellScript for Windows. For custom actions, write a document in JSON/YAML specifying schema version, parameters, and steps. The document can include multiple steps, conditional logic, and error handling. Upload custom documents using `aws ssm create-document`. The document is stored in SSM and referenced by name when sending commands.
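As an illustration of the document structure described above, the following Python sketch assembles a minimal custom Command document and prints it as JSON. It assumes schema version 2.2 with a single `aws:runShellScript` step; the description, step name, and command are placeholders.

```python
import json


def shell_document(description, commands):
    """Build a minimal Command document with one shell-script step."""
    return {
        "schemaVersion": "2.2",
        "description": description,
        "mainSteps": [
            {
                "action": "aws:runShellScript",   # plugin that runs shell commands
                "name": "runCommands",            # placeholder step name
                "inputs": {"runCommand": list(commands)},
            }
        ],
    }


doc = shell_document("Report disk usage", ["df -h"])
print(json.dumps(doc, indent=2))
```

Saved to a file, JSON like this could then be uploaded with `aws ssm create-document`.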
3. Send the Command with Targeting
Use the AWS Management Console, CLI, or SDK to send the command. Specify the document name, parameters, and targets. Targets can be instance IDs, tags, or resource groups. Set rate controls: MaxConcurrency (how many instances run simultaneously) and MaxErrors (how many failures before stopping). Also set a timeout for command execution. The command is submitted to the SSM service, which makes it available for the agent to poll.
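For the SDK route, the same options map onto keyword arguments of the `SendCommand` API. The sketch below only assembles the request as a dictionary, so it runs without AWS access. The helper name and default values are hypothetical choices, not service defaults.

```python
def build_send_command_request(document, commands, targets,
                               max_concurrency="50", max_errors="0",
                               timeout_seconds=600, s3_bucket=None):
    """Assemble keyword arguments for an SSM SendCommand call."""
    request = {
        "DocumentName": document,
        "Targets": targets,
        "Parameters": {"commands": list(commands)},
        "TimeoutSeconds": timeout_seconds,   # how long to wait for delivery
        "MaxConcurrency": max_concurrency,   # string: a count or a percentage
        "MaxErrors": max_errors,             # string: a count or a percentage
    }
    if s3_bucket:
        request["OutputS3BucketName"] = s3_bucket
    return request


request = build_send_command_request(
    "AWS-RunShellScript",
    ["df -h"],
    [{"Key": "tag:Environment", "Values": ["Production"]}],
    max_errors="10",
)
print(request)
```

With boto3 installed and credentials configured, `boto3.client("ssm").send_command(**request)` would submit it.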
4. Agent Polls and Executes the Command
The SSM Agent on each target instance polls the SSM service every 5 seconds. When it finds a pending command that matches its instance ID or tag, it downloads the command document and parameters. The agent executes the command locally (e.g., runs the shell script). It captures stdout, stderr, and exit code. If the command exceeds the timeout, the agent terminates it and reports a TimedOut status.
5. Report Status and Retrieve Output
After execution, the agent sends the status (Success, Failed, etc.) and output back to the SSM service. You can view the status in the console or using `aws ssm list-commands` and `aws ssm list-command-invocations`. Retrieve output directly from the console (limited to 2500 characters) or per instance with `aws ssm get-command-invocation`. For full output, set the `--output-s3-bucket-name` option when you send the command and download the result from S3.
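A script that sends a command usually has to wait for the result. The sketch below polls `get_command_invocation` until the status is final. It assumes a client object with the same method signature as boto3's SSM client. `FakeClient` is a hypothetical stand-in that replays canned statuses so the example runs without AWS access.

```python
import time

DONE = {"Success", "Failed", "TimedOut", "Cancelled"}


def wait_for_invocation(client, command_id, instance_id,
                        delay=2.0, max_attempts=30, sleep=time.sleep):
    """Poll one invocation until it reaches a final status, then return it."""
    for _ in range(max_attempts):
        result = client.get_command_invocation(
            CommandId=command_id, InstanceId=instance_id)
        if result["Status"] in DONE:
            return result
        sleep(delay)
    raise TimeoutError(f"{command_id} on {instance_id} did not finish")


class FakeClient:
    """Hypothetical stand-in for an SSM client; replays canned statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def get_command_invocation(self, CommandId, InstanceId):
        return {"Status": next(self._statuses), "StandardOutputContent": "ok\n"}


fake = FakeClient(["Pending", "InProgress", "Success"])
result = wait_for_invocation(fake, "command-id", "i-1234567890abcdef0",
                             sleep=lambda seconds: None)
print(result["Status"])
```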
Enterprise Scenario 1: Emergency Patching of a Critical Vulnerability
A financial services company discovers a critical remote code execution vulnerability in OpenSSL affecting all Linux servers. They have 2,000 EC2 instances across multiple accounts and regions. Using Run Command, they create a custom document that runs sudo yum update -y openssl and restarts the service. They target all instances tagged Environment=Production and OS=Linux. They set MaxConcurrency to 100 (5% of fleet) to avoid overwhelming the patch repository, and MaxErrors to 20 (1%) to halt if too many fail. The operation completes in 15 minutes. Without Run Command, each server would require SSH access, taking hours.
Scenario 2: Compliance Audit Data Collection
A healthcare provider must collect system inventory from 500 Windows servers for HIPAA compliance. They use Run Command with the AWS-GatherSoftwareInventory document to collect installed software, OS version, and patch status. Output is directed to an S3 bucket with server-side encryption. The operation runs weekly via a Scheduled Maintenance Window. If a server fails to respond (e.g., agent offline), the error is logged, and the team investigates. This automated collection replaces manual spreadsheet entry.
Scenario 3: Rolling Out a Custom Monitoring Agent
A SaaS company needs to install a custom monitoring agent on 1,000 EC2 instances. They create a custom SSM document that downloads the agent from S3, installs it, and starts the service. They use tags to target instances by environment (dev, staging, prod). For production, they set MaxConcurrency to 50 and MaxErrors to 5 to minimize impact. They also use a maintenance window to run during off-peak hours. The output is sent to CloudWatch Logs for real-time monitoring. After successful rollout, they use State Manager to ensure the agent remains installed.
Common Pitfalls in Production
Missing IAM Permissions: The instance has no instance profile, or its role lacks the permissions in AmazonSSMManagedInstanceCore. Result: instances are not visible as managed instances.
Agent Not Running: The SSM Agent may be stopped or outdated. Always verify agent status before large operations.
Network Issues: If instances cannot reach the SSM endpoints (e.g., no VPC endpoint, no internet gateway), commands will never be delivered. Use VPC endpoints for private subnets.
Rate Control Misconfiguration: Setting MaxConcurrency too high can overwhelm the fleet or downstream dependencies (e.g., patch repository). Setting MaxErrors too low can cause premature cancellation.
Output Too Large: If a command produces more than 2500 characters, the console truncates. Always configure S3 output for commands that generate large logs.
What SOA-C02 Tests on Run Command
This topic falls under Domain 3: Deployment, Provisioning, and Automation (Objective 3.2: Automate manual or repeatable processes). The exam tests your ability to:
Identify when to use Run Command vs. State Manager vs. Maintenance Windows.
Configure rate controls (MaxConcurrency, MaxErrors) and understand their effect.
Troubleshoot common issues: agent not reporting, permission errors, network connectivity.
Interpret command statuses and output retrieval methods.
Understand targeting options: tags, instance IDs, resource groups.
Common Wrong Answers and Why Candidates Choose Them
"Run Command is for recurring tasks" – Candidates confuse Run Command with State Manager. Run Command is one-time; State Manager is for recurring associations.
"The SSM Agent listens on port 443 for incoming commands" – The agent initiates outbound connections; it does not listen on any port. This is a security design.
"MaxConcurrency controls the total number of instances that can run the command" – It controls the number that run *simultaneously*, not the total. The total is the target set.
"Command output is stored indefinitely" – Output is stored for 30 days by default. For longer retention, use S3.
"You must use instance IDs to target" – You can target by tags or resource groups, which is more scalable.
Specific Numbers and Terms to Memorize
Polling interval: 5 seconds
Default command timeout: 3600 seconds (1 hour)
Maximum output in console: 2500 characters
MaxConcurrency and MaxErrors can be absolute numbers or percentages (0-100%)
SSM Agent required version: 2.2.93.0 or later
Port: 443 (outbound HTTPS)
Statuses: Pending, InProgress, Success, Failed, TimedOut, Cancelled, DeliveryTimedOut
Edge Cases and Exceptions
Instance not managed: If an instance does not have the SSM Agent or a proper IAM role, it is not registered as a managed instance. Tag-based targeting skips it, and naming it explicitly by instance ID causes the request to be rejected as an invalid instance ID.
Command timeout: If the command runs longer than the timeout, the agent terminates it and reports TimedOut. The partial output may be lost.
DeliveryTimedOut: This occurs when the agent does not pick up the command within a certain time (e.g., agent offline). The command is cancelled.
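Cross-account targeting: A Run Command request only targets instances in the account and region where it is issued. To cover other accounts or regions, send the command in each one (for example, by assuming a role in each account) or orchestrate it with Systems Manager Automation.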
On-premises servers: Use hybrid activations to register on-premises servers as managed instances. They require a managed instance ID.
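Because hybrid-activated servers receive managed instance IDs with an `mi-` prefix while EC2 instances use `i-`, a script can tell the two apart from the ID alone. A minimal sketch (the helper name and return labels are hypothetical):

```python
def node_kind(node_id):
    """Classify a managed node by the prefix of its ID."""
    if node_id.startswith("mi-"):
        return "hybrid"   # registered through a hybrid activation
    if node_id.startswith("i-"):
        return "ec2"
    return "unknown"


print(node_kind("i-1234567890abcdef0"), node_kind("mi-0123456789abcdef0"))
```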
How to Eliminate Wrong Answers
If the question mentions "scheduled" or "recurring", the answer is State Manager, not Run Command.
If the question asks about limiting the number of instances that run at the same time, think MaxConcurrency.
If the question asks about stopping the entire operation after too many failures, think MaxErrors.
If the question involves sending output to S3, remember the --output-s3-bucket-name parameter.
If the question involves encryption, think KMS.
If the question involves private subnets, think VPC endpoints.
Run Command executes commands on managed instances without opening inbound ports; the SSM Agent polls every 5 seconds.
Target instances using tags, instance IDs, or resource groups; tag-based targeting is exam-favorite.
MaxConcurrency controls simultaneous executions; MaxErrors stops the operation after too many failures.
Command output is stored for 30 days in SSM; use S3 for longer retention or large output.
The default command timeout is 3600 seconds (1 hour); max is 48 hours.
Run Command is for one-time tasks; use State Manager for recurring tasks.
Instances must have the SSM Agent installed and an IAM role with AmazonSSMManagedInstanceCore.
For private subnets, use VPC endpoints for Systems Manager (ssm, ec2messages, ssmmessages).
These come up on the exam all the time. Here's how to tell them apart.
SSM Run Command
One-time execution
Ad-hoc or manual trigger
Use for urgent tasks (patching, diagnostics)
No automatic retry
MaxConcurrency and MaxErrors for rate control
SSM State Manager
Recurring execution based on a schedule
Automatically enforced state
Use for ongoing compliance (e.g., ensure antivirus is running)
Retries automatically based on association
Association compliance reports
Mistake
Run Command opens an inbound SSH or RDP port to execute commands.
Correct
Run Command uses the SSM Agent, which initiates outbound HTTPS connections to the SSM service. No inbound ports are opened. The agent picks up commands by polling the service.
Mistake
Run Command is designed for recurring scheduled tasks.
Correct
Run Command is for one-time execution. For recurring tasks, use State Manager associations or Maintenance Windows.
Mistake
You can only target instances by instance ID.
Correct
You can target by instance IDs, tags, or resource groups. Tag-based targeting is preferred for scalability.
Mistake
Command output is stored forever in the SSM service.
Correct
Output is retained for 30 days. For long-term storage, configure output to be written to an S3 bucket.
Mistake
MaxConcurrency limits the total number of instances that will run the command.
Correct
MaxConcurrency limits the number of instances running the command simultaneously. The total number is the entire target set.
Use the `--targets` parameter with a tag key-value pair. For example, `--targets Key=tag:Environment,Values=Production`. This targets every managed instance that carries the tag in the same region and account as the request. To reach other accounts or regions, send the command separately in each one.
MaxConcurrency limits how many instances run the command at the same time (e.g., 10 or 10%). MaxErrors limits how many failures are allowed before the entire command stops. Both can be absolute numbers or percentages.
Common causes: SSM Agent not installed or running, missing IAM instance profile (AmazonSSMManagedInstanceCore), no outbound internet access or VPC endpoints, or the instance is in a region that doesn't support SSM. Verify the agent status and network connectivity.
Yes, use the AWS-RunPowerShellScript document. For Linux, use AWS-RunShellScript. You can also create custom documents that run any command.
When sending the command, use the `--output-s3-bucket-name` and optionally `--output-s3-key-prefix` parameters. The SSM Agent will upload the output to the specified S3 bucket.
Yes, if you register them as managed instances using a hybrid activation. You need to install the SSM Agent and provide the activation code and ID. They appear as managed instances with an ID starting with 'mi-'.
The SSM Agent terminates the command and reports a TimedOut status. The output (if any) up to the timeout is captured. You can increase the timeout up to 48 hours.