SAA-C03 | Chapter 140 of 189 | Objective 3.4

S3 Batch Operations

This chapter covers S3 Batch Operations, a powerful feature for performing bulk actions on billions of S3 objects with a single job. For the SAA-C03 exam, you need to understand when to use Batch Operations versus scripting, the types of operations supported, manifest formats, job lifecycle, and integration with Lambda. While not a major topic (appearing in perhaps 1-2 questions), it is often paired with questions on data lifecycle management, cost optimization, and automation. Master the key concepts, operation types, and failure handling to confidently answer exam questions.

Updated May 31, 2026

Batch Processing Like a Factory Assembly Line

S3 Batch Operations is like a factory assembly line that processes a large batch of items (objects) stored in a warehouse (S3 bucket). The factory manager (you) creates a job specification (manifest) that lists every item to be processed. You then choose a specific task (operation) to perform on each item, such as tagging, copying, or restoring from archive. The factory has a conveyor belt that moves items one by one through the processing station. The station uses a specialized tool (Lambda function or built-in operation) to perform the work. The factory automatically retries failed items up to a set number of times (retry count). You can monitor the progress on a dashboard (console) and receive a completion report. The factory can also notify you (SNS) when the job finishes. However, the factory can only process items from a single warehouse (bucket) per job, and the manifest must be stored in the same region. This is analogous to S3 Batch Operations where you create a job with a manifest, choose an operation (e.g., 'Put object tagging', 'Copy', 'Restore', or invoke a Lambda function), and the service executes the operation on each object in the manifest, with retries and monitoring.

How It Actually Works

What is S3 Batch Operations?

S3 Batch Operations is a managed service that enables you to perform bulk, asynchronous operations on billions of S3 objects with a single API request, CLI command, or console action. It eliminates the need to write custom scripts that iterate over objects, handle retries, and manage concurrency. The service automatically scales to process objects in parallel across multiple AWS-managed resources, providing progress tracking, completion notifications, and detailed failure reports.

Why It Exists

Before S3 Batch Operations, performing bulk actions (like tagging, copying, or restoring objects) required custom code using the S3 API or SDK, often involving pagination, rate limiting, error handling, and multi-threading. This was time-consuming and error-prone. S3 Batch Operations abstracts away these complexities, offering a serverless, scalable, and auditable solution. It is especially useful for large-scale data management, compliance, and cost optimization tasks.

How It Works Internally

When you create a batch job, you provide:

A manifest that lists the objects to process (either a CSV file or an S3 Inventory report)

An operation to perform on each object

Optional Lambda invocation for custom processing

Priority and retry settings

Completion report configuration

The service reads the manifest, divides the objects into chunks, and processes them in parallel using AWS-managed compute resources. For each object, it executes the specified operation. If the operation fails, it retries up to the configured number of times (default 1, maximum 5). After all objects are processed (or failed), the service generates a completion report in CSV format, stored in a specified S3 bucket. You can also be notified when the job completes, for example by routing the job's Amazon EventBridge events to an SNS topic.

Key Components, Values, Defaults, and Timers

- Manifest: Required. Either a CSV object listing bucket, key, and optionally version ID, or a CSV-formatted S3 Inventory report (ORC and Parquet inventory reports are not accepted as manifests). The manifest must be in the same AWS Region as the target bucket.
- Operations: Supported operations include:
  - Put object tagging: add or replace tags
  - Copy: copy objects within or across buckets (same Region or cross-Region)
  - Restore: restore objects from S3 Glacier or S3 Glacier Deep Archive
  - Invoke AWS Lambda: run a Lambda function on each object
  - Put object ACL: update ACLs (legacy)
  - Put object legal hold: set a legal hold
  - Put object retention: set a retention period
- Lambda invocation: You specify a Lambda function ARN. The function receives an event with object details and must return a result (success/failure). The Lambda function must be in the same Region as the job.
- Retry mode: You can set the number of retries (0-5). Default is 1. The retry count applies to each individual task (object operation).
- Priority: A non-negative integer. Jobs with a higher number are processed first; the API accepts values from 0 (lowest) up to 2,147,483,647.
- Completion report: Optional. If enabled, a CSV report is written to a specified bucket after job completion (including partial failures). The report contains the status of each object (succeeded/failed) and error codes.
- Notifications: Optional. Job status changes are available as events through CloudTrail and Amazon EventBridge, and an EventBridge rule can forward them to an SNS topic when the job completes or fails.
- Job statuses: Active, Cancelling, Cancelled, Complete, Completing, Failed, Failing, New, Paused, Pausing, Preparing, Ready, Suspended (waiting for confirmation before running, which applies to jobs created in the console).
- Permissions: The service requires an IAM role with permissions to read the manifest, perform the operation on objects, write the completion report, and invoke Lambda (if used). The role is passed in at job creation.
- Concurrent job limit: Up to 100,000 active jobs per account per Region. Each job can process up to billions of objects.
- Pricing: You pay per object processed (successful or failed), plus any S3 API costs (e.g., tagging, copying) and Lambda invocation costs.
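As a rough illustration of the per-object pricing model, the sketch below estimates the Batch Operations fee for a job. The rates are assumptions for illustration only (a flat $0.25 per job plus $1.00 per million objects, in line with the "typical" figure quoted in the comparison later in this chapter); check current AWS pricing before relying on them. S3 request, Lambda, and data-transfer charges are not included.

```python
from decimal import Decimal

# Assumed illustrative rates; not authoritative AWS pricing.
PER_JOB_FEE = Decimal("0.25")
PER_MILLION_OBJECTS = Decimal("1.00")


def estimate_batch_fee(object_count: int, jobs: int = 1) -> Decimal:
    """Estimate the Batch Operations fee in USD.

    Excludes S3 request, Lambda, and data-transfer charges.
    """
    if object_count < 0 or jobs < 1:
        raise ValueError("object_count must be >= 0 and jobs must be >= 1")
    per_object = PER_MILLION_OBJECTS * Decimal(object_count) / Decimal(1_000_000)
    return (PER_JOB_FEE * jobs + per_object).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 50 million objects in a single job under the assumed rates.
    print(estimate_batch_fee(50_000_000))
```

Under these assumed rates, the fee is dominated by the object count, so splitting the same objects across several jobs adds only the small per-job charge.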

Configuration and Verification Commands

To create a batch job using the AWS CLI:

aws s3control create-job \
    --account-id 123456789012 \
    --operation '{"S3PutObjectTagging": {"TagSet": [{"Key": "department", "Value": "finance"}]}}' \
    --manifest '{"Spec": {"Format": "S3BatchOperations_CSV_20180820", "Fields": ["Bucket", "Key"]}, "Location": {"ObjectArn": "arn:aws:s3:::my-manifest-bucket/manifest.csv", "ETag": "abc123"}}' \
    --report '{"Bucket": "arn:aws:s3:::my-report-bucket", "Format": "Report_CSV_20180820", "Enabled": true, "Prefix": "reports", "ReportScope": "AllTasks"}' \
    --priority 10 \
    --role-arn arn:aws:iam::123456789012:role/S3BatchOperationsRole \
    --region us-east-1

To monitor job status:

aws s3control describe-job --account-id 123456789012 --job-id <job-id> --region us-east-1

To list jobs:

aws s3control list-jobs --account-id 123456789012 --job-statuses Active Complete --region us-east-1

To update job priority:

aws s3control update-job-priority --account-id 123456789012 --job-id <job-id> --priority 50 --region us-east-1

To cancel a job:

aws s3control update-job-status --account-id 123456789012 --job-id <job-id> --requested-job-status Cancelled --region us-east-1

Interaction with Related Technologies

S3 Inventory: You can use an S3 Inventory report as the manifest, enabling you to process objects based on current inventory data. The inventory report must be in CSV format to be used as a manifest.

AWS Lambda: You can invoke a Lambda function for custom processing (e.g., image thumbnailing, data transformation). The Lambda function must be idempotent because the same object may be retried.
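The request and response shapes that a Lambda function exchanges with a batch job can be sketched as follows. This is a minimal handler assuming invocation schema version 1.0, in which each invocation carries one task; `process_object` is a hypothetical placeholder for the real per-object work, and mapping `TimeoutError` to a temporary failure is an illustrative choice rather than a requirement.

```python
import urllib.parse


def process_object(bucket_arn, key, version_id):
    """Hypothetical placeholder for the real work; it should be idempotent."""
    return f"processed {key}"


def lambda_handler(event, context):
    # Assumes schema version 1.0: one task per invocation.
    task = event["tasks"][0]
    # Object keys arrive URL-encoded in the event.
    key = urllib.parse.unquote_plus(task["s3Key"])
    try:
        message = process_object(task["s3BucketArn"], key, task.get("s3VersionId"))
        code = "Succeeded"
    except TimeoutError as exc:
        # Illustrative choice: treat timeouts as retryable.
        code, message = "TemporaryFailure", str(exc)
    except Exception as exc:
        code, message = "PermanentFailure", str(exc)
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": [
            {"taskId": task["taskId"], "resultCode": code, "resultString": message}
        ],
    }
```

The result code is what the job records for the object, so the function should only report success once its work for that object is actually done.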

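Amazon SNS: Receive notifications when jobs complete, typically by using an Amazon EventBridge rule that forwards job status events to an SNS topic. This is useful for triggering downstream workflows.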

AWS CloudTrail: Records API calls for job creation, updates, and cancellations, enabling auditing.

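IAM: Requires an IAM role that S3 Batch Operations can assume (its trust policy names the batchoperations.s3.amazonaws.com service principal) and that grants the permissions the job needs. The user creating the job must have iam:PassRole permission on that role.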

S3 Glacier: Use batch operations to restore objects from Glacier or Deep Archive. The restore itself can take hours; the batch job only initiates the restore request for each object and does not wait for the object to become available.

S3 Object Lock: You can set legal hold or retention via batch operations, useful for compliance.

Important Exam Considerations

Batch operations can only process objects in a single bucket per job. If you need to process multiple buckets, create separate jobs.

The manifest must list objects in the same region as the job. Cross-region manifests are not supported.

The Lambda function invoked must be in the same region as the job.

S3 Batch Operations does not offer an S3 Select operation and does not support objects in S3 Express One Zone (though this may change).

The completion report includes both succeeded and failed tasks. You can use it to retry failed objects.

A job can be paused and later resumed (S3 pauses a running job when a higher-priority job is submitted), but cancelling a job is irreversible.

The default retry count is 1, meaning each object will be attempted at most twice (original attempt + 1 retry).

Walk-Through

1. Create Manifest

Create a CSV file or use an S3 Inventory report listing all objects to process. The CSV has no header row: each row gives the bucket, the URL-encoded object key, and optionally the version ID, and the column order is declared in the manifest's Fields setting when you create the job. The manifest must be stored in an S3 bucket in the same Region as the batch job. For large datasets, use S3 Inventory to generate the manifest automatically; the inventory report must be in CSV format. The manifest object has an ETag that you provide when creating the job.
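A CSV manifest can be generated with a few lines of code. The sketch below assumes rows carry only data (bucket, URL-encoded key, and optionally a version ID), with the column order declared separately in the job's manifest Fields setting, as in the create-job example earlier; `build_manifest` is a hypothetical helper name.

```python
import csv
import io
import urllib.parse


def build_manifest(bucket, keys, version_ids=None):
    """Return manifest text with one 'bucket,key[,versionId]' row per object."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for i, key in enumerate(keys):
        # URL-encode the key but keep '/' so prefixes stay readable.
        row = [bucket, urllib.parse.quote(key, safe="/")]
        if version_ids is not None:
            row.append(version_ids[i])
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_manifest("my-bucket", ["reports/q1 summary.csv", "logo.png"]), end="")
```

After uploading the resulting file, its ETag is what the job's manifest location refers to, so the manifest should not be modified once the job has been created.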

2. Define Operation

Choose one of the supported operations: Put object tagging, Copy, Restore, Invoke Lambda, Put object ACL, Put object legal hold, or Put object retention. For each operation, provide the necessary parameters (e.g., tag set for tagging, destination bucket for copy, days for restore). If using Lambda, provide the function ARN. The operation is defined in JSON when using the CLI or SDK.
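The JSON passed as the job's operation can be built programmatically. The helpers below are a sketch with hypothetical function names; the dictionary keys follow the S3 Control job operation structure (for example, S3PutObjectTagging as used in the CLI example earlier), and only a few common parameters are shown.

```python
import json


def tagging_operation(tags):
    """Operation that replaces each object's tag set with the given tags."""
    tag_set = [{"Key": k, "Value": v} for k, v in tags.items()]
    return {"S3PutObjectTagging": {"TagSet": tag_set}}


def restore_operation(days, tier="BULK"):
    """Operation that initiates a restore; tier is 'BULK' or 'STANDARD'."""
    return {"S3InitiateRestoreObject": {"ExpirationInDays": days, "GlacierJobTier": tier}}


def copy_operation(target_bucket_arn, storage_class=None):
    """Operation that copies each object into the target bucket."""
    params = {"TargetResource": target_bucket_arn}
    if storage_class is not None:
        params["StorageClass"] = storage_class
    return {"S3PutObjectCopy": params}


def lambda_operation(function_arn):
    """Operation that invokes a Lambda function for each object."""
    return {"LambdaInvoke": {"FunctionArn": function_arn}}


if __name__ == "__main__":
    # The output can be passed as the --operation argument of create-job.
    print(json.dumps(tagging_operation({"department": "finance"})))
```

Building the operation as a dictionary and serializing it once avoids quoting mistakes that are easy to make when writing the JSON inline on the command line.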

3. Configure Job Settings

Set the priority (a non-negative integer; a higher number is processed first), retry count (0-5, default 1), and completion report options (bucket, prefix, format, scope). For notifications, set up an Amazon EventBridge rule that forwards job status events to an SNS topic. Also specify an IAM role that grants permissions for reading the manifest, performing operations, and writing reports. The role must have a trust policy allowing S3 Batch Operations to assume it.
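The role's two policy documents can be sketched as below for a tagging job. The trust policy names the batchoperations.s3.amazonaws.com service principal; the permissions policy is a minimal illustration, and the bucket names and the helper names are assumptions. A real job may need additional actions depending on the operation and on whether versioning or encryption is in use.

```python
import json


def trust_policy():
    """Trust policy that lets S3 Batch Operations assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def tagging_job_policy(target_bucket, manifest_bucket, report_bucket):
    """Minimal permissions sketch for a 'Put object tagging' job."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Perform the operation on the target objects.
                "Effect": "Allow",
                "Action": ["s3:PutObjectTagging", "s3:PutObjectVersionTagging"],
                "Resource": f"arn:aws:s3:::{target_bucket}/*",
            },
            {   # Read the manifest.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": f"arn:aws:s3:::{manifest_bucket}/*",
            },
            {   # Write the completion report.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{report_bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(trust_policy(), indent=2))
```

Keeping the three statements separate makes it easier to see which permission is missing when a job cannot read its manifest, act on an object, or write its report.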

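4. Submit Job

Use the AWS CLI, SDK, or Console to create the job. The job enters the 'New' state, then transitions to 'Preparing' while the service processes the manifest and job parameters. A job that requires confirmation (the default for jobs created in the console) then waits in 'Suspended' until you confirm it. Once confirmed, or if no confirmation is required, it becomes 'Ready' and then 'Active' as processing begins. If the IAM role cannot read the manifest, the job fails; if the role lacks permission for the operation itself, the affected tasks fail and appear in the completion report.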
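The state transitions above suggest a simple polling loop when a script needs to wait for a job. The sketch below is deliberately decoupled from the SDK: `get_status` is a hypothetical callable that returns the current status string (in practice it might wrap a describe-job call), and the terminal set is taken from the job statuses listed in this chapter.

```python
import time

# Statuses after which the job will not make further progress.
TERMINAL_STATUSES = {"Complete", "Failed", "Cancelled"}


def wait_for_job(get_status, poll_seconds=30.0, max_polls=1000, sleep=time.sleep):
    """Poll get_status() until the job reaches a terminal status.

    Returns the terminal status, or raises TimeoutError after max_polls.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal status")


if __name__ == "__main__":
    # Simulated status sequence instead of a real API call.
    statuses = iter(["New", "Preparing", "Ready", "Active", "Complete"])
    print(wait_for_job(lambda: next(statuses), poll_seconds=0))
```

Injecting the status source and the sleep function keeps the loop testable without AWS credentials; an event-driven notification is usually a better fit than polling for long-running jobs.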

5. Monitor and Handle Results

Monitor job progress by checking the job status and its task counts (describe-job reports the total, succeeded, and failed tasks). After all tasks are processed, the job enters 'Complete' or 'Failed' (if unrecoverable errors occurred). The completion report is written to the specified S3 bucket. If you have set up notifications (for example, an EventBridge rule that targets an SNS topic), a notification is sent. Review the report to identify failed objects and retry them if needed.
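Retrying failed objects usually means turning the failed rows of the completion report into a new manifest. The sketch below assumes the report is a headerless CSV whose first four columns are bucket, key, version ID, and task status, with failed tasks marked "failed"; verify this layout against an actual report before use. `failed_rows_to_manifest` is a hypothetical helper name.

```python
import csv
import io

# Assumed column positions in the completion report.
BUCKET, KEY, VERSION_ID, TASK_STATUS = 0, 1, 2, 3


def failed_rows_to_manifest(report_csv):
    """Return 'bucket,key' manifest text for every failed task in the report."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(report_csv)):
        if len(row) <= TASK_STATUS:
            continue  # skip blank or malformed lines
        if row[TASK_STATUS].strip().lower() == "failed":
            # Keys are copied through unchanged from the report.
            writer.writerow([row[BUCKET], row[KEY]])
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "my-bucket,a.txt,,succeeded,200,,Successful\n"
        "my-bucket,b.txt,,failed,403,AccessDenied,Access Denied\n"
    )
    print(failed_rows_to_manifest(sample), end="")
```

The new manifest can then be uploaded and used for a smaller follow-up job once the cause of the failures (often permissions) has been fixed.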

What This Looks Like on the Job

Enterprise Scenarios

1. Adding Tags to Millions of Objects for Cost Allocation

A large enterprise uses S3 to store petabytes of data across multiple accounts and regions. To implement cost allocation, they need to add a 'CostCenter' tag to every object. Instead of writing a Python script that uses list-objects and put-object-tagging with pagination and rate limiting, they create an S3 Inventory report for each bucket, then submit a batch job with the 'Put object tagging' operation. The job processes 50 million objects in a few hours, automatically retrying failures. The completion report shows 99.9% success; the remaining 0.1% are objects with permissions issues, which they fix and rerun a smaller job. This approach saved weeks of development and avoided throttling.

2. Bulk Restore from Glacier Deep Archive for Compliance Audit

A financial services company must retrieve archived data from S3 Glacier Deep Archive for a regulatory audit. They have 10 million objects archived. Using batch operations with the 'Restore' operation, they submit a job specifying a restore tier (Standard or Bulk; Expedited is not available for batch restores or for Deep Archive) and the number of days the restored copies should remain available. The job automatically initiates restore requests for each object. However, they must ensure that the restore completes before accessing the data. The batch job does not wait for restoration to complete; it only initiates the request. To verify, they can check an object's restore status with a HeadObject request or subscribe to the s3:ObjectRestore:Completed event notification. Common mistake: assuming the batch job waits for restore completion. It does not.
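One way to confirm that a restore has finished is to inspect the Restore value that a HeadObject request returns for the object. The sketch below only parses that string; fetching it (for example with an SDK head-object call) is left out, and `restore_state` is a hypothetical helper name.

```python
import re


def restore_state(restore_header):
    """Classify an object's restore status from its HeadObject Restore value.

    Returns 'not-requested', 'in-progress', 'restored', or 'unknown'.
    """
    if not restore_header:
        # No Restore value: no restore has been requested for this object.
        return "not-requested"
    match = re.search(r'ongoing-request="(true|false)"', restore_header)
    if match is None:
        return "unknown"
    return "in-progress" if match.group(1) == "true" else "restored"


if __name__ == "__main__":
    print(restore_state('ongoing-request="true"'))
    print(restore_state(
        'ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"'
    ))
```

A check like this belongs in a separate verification step after the batch job, since the job's own "succeeded" status only means the restore request was accepted.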

3. Copying Objects Between Buckets for Data Migration

A media company needs to copy 200 TB of video files from an old bucket to a new bucket with different encryption settings. They use the 'Copy' operation with the destination bucket and optional parameters like storage class, encryption, and metadata. The batch job handles copying objects of up to 5 GB each; for larger objects, they use S3 Batch Operations with a Lambda function that performs a multipart copy. Cross-region copy is supported but incurs data transfer costs. They set a priority of 100 to expedite the job. After completion, they verify the copy using the completion report. Because Batch Operations has no built-in delete-object operation, they then remove the source objects with an S3 Lifecycle expiration rule.
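Splitting the object list by the 5 GB copy limit can be sketched as follows. The input is assumed to be (key, size in bytes) pairs, for example taken from an S3 Inventory report; the limit is treated here as 5 GiB, and `split_by_copy_limit` is a hypothetical helper name.

```python
# Assumed size limit for the built-in Copy operation (5 GiB).
MAX_DIRECT_COPY_BYTES = 5 * 1024**3


def split_by_copy_limit(objects):
    """Split (key, size_bytes) pairs into two key lists.

    Returns (direct, multipart): keys the built-in Copy operation can handle,
    and keys that need a Lambda-driven multipart copy.
    """
    direct, multipart = [], []
    for key, size in objects:
        if size <= MAX_DIRECT_COPY_BYTES:
            direct.append(key)
        else:
            multipart.append(key)
    return direct, multipart


if __name__ == "__main__":
    inventory = [("clips/intro.mp4", 700 * 1024**2), ("masters/film.mov", 42 * 1024**3)]
    direct, multipart = split_by_copy_limit(inventory)
    print(direct, multipart)
```

Each list can then be written to its own manifest: one for a Copy job and one for an Invoke Lambda job.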

How SAA-C03 Actually Tests This

SAA-C03 Exam Focus

S3 Batch Operations is tested under Domain 3: High Performance (Objective 3.4: Select appropriate automation and optimization strategies). Expect 1-2 scenario-based questions where you must choose between batch operations, S3 events, Lambda, or custom scripts.

Common Wrong Answers and Why

1. Using S3 Event Notifications instead of Batch Operations: Candidates often choose S3 Events for bulk tagging or copying. However, S3 Events are real-time and triggered per object creation. For existing objects, you need batch operations. The exam will describe a scenario with millions of existing objects needing a one-time operation. Wrong answer: "Use S3 Events to trigger a Lambda function." Correct: "Use S3 Batch Operations."

2. Assuming batch operations can process objects across multiple buckets in one job: The exam may describe a situation with objects in multiple buckets. Some candidates think a single batch job can handle all of them, but each job is limited to one bucket. The correct answer is to create separate jobs per bucket.

3. Choosing AWS Glue or EMR for simple tagging: This is overkill. Batch operations are simpler and cheaper for straightforward operations.

4. Thinking the job tracks work that the Lambda function hands off: Candidates may assume Batch Operations follows any background or downstream work that a function starts. In reality, the job invokes the function for each object and waits only for the function's response, which must report success or failure within the function's configured timeout (Lambda's default is 3 seconds; the maximum is 15 minutes). Work that the function starts asynchronously and does not wait for is not tracked by the job.

Specific Numbers and Values

Maximum retries: 5 (default 1)

Priority range: 0 to 2,147,483,647 (a higher number means higher priority)

Concurrent job limit: 100,000 per account per region

Supported operations: PutObjectTagging, Copy, Restore, InvokeLambda, PutObjectAcl, PutObjectLegalHold, PutObjectRetention

Manifest formats: CSV (S3BatchOperations_CSV_20180820) or a CSV-formatted S3 Inventory report (S3InventoryReport_CSV_20161130)

Completion report format: Report_CSV_20180820

Edge Cases

If the manifest lists objects that do not exist, they are marked as failed with 'NoSuchKey' error.

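If the IAM role lacks permissions, the job or the affected tasks fail; the job is not suspended. 'Suspended' means the job is waiting for confirmation before it runs. Fix the role and submit a new job for the failed objects.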

Cross-region copy: Supported but incurs data transfer costs. The job must be created in the destination region.

Batch operations on objects in S3 Express One Zone: Not supported as of 2025 (check exam updates).

Eliminating Wrong Answers

If the question mentions "existing objects" and "one-time bulk operation", eliminate S3 Events, Lambda triggers, and Lifecycle policies – those are for ongoing or future operations.

If the question mentions multiple buckets, reject any single-job answer.

If the question mentions custom processing beyond built-in operations, the answer likely involves Lambda invocation with batch operations, not just batch operations alone.

Key Takeaways

S3 Batch Operations is ideal for one-time bulk operations on billions of existing S3 objects.

Each job processes objects from a single bucket only; use multiple jobs for multiple buckets.

Supported operations: PutObjectTagging, Copy, Restore, InvokeLambda, PutObjectAcl, PutObjectLegalHold, PutObjectRetention.

Manifest can be a CSV file or a CSV-formatted S3 Inventory report.

Default retry count is 1; maximum is 5.

Priority is a non-negative integer starting at 0 (lowest); a higher number means higher priority.

Lambda functions invoked must be idempotent and complete within the Lambda timeout.

Restore operation only initiates the restore; it does not wait for completion.

Jobs can be paused and resumed (S3 pauses a running job when a higher-priority job is submitted) or cancelled. Cancellation is irreversible.

Completion report includes both success and failure details for each object.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

S3 Batch Operations

Managed service – no infrastructure to manage.

Automatically scales to billions of objects.

Built-in retry and error handling.

Provides completion report and SNS notifications.

Pay per object processed ($1 per million objects typical).

Custom Script (e.g., Python with boto3)

Requires writing and maintaining code.

Must handle pagination, rate limiting, and concurrency.

Retry logic must be implemented manually.

Custom logging and monitoring required.

Only pay for compute (e.g., EC2 or Lambda) and S3 API costs.

S3 Batch Operations

Designed for bulk operations on existing objects.

Manually triggered (one-time or scheduled).

Processes objects from a manifest.

Supports built-in operations without Lambda.

Can invoke Lambda for custom processing.

S3 Event Notifications + Lambda

Real-time, triggered on object creation/events.

Continuous processing for new objects.

Processes objects as they are created.

Requires Lambda for any processing.

Not suitable for backfilling existing objects.

Watch Out for These

Mistake

S3 Batch Operations can process objects across multiple buckets in a single job.

Correct

Each batch job can only process objects from one bucket. To process multiple buckets, you must create separate jobs, each with its own manifest.

Mistake

Batch operations with Restore will wait for the restore to complete before marking the task as successful.

Correct

The batch job only initiates the restore request. It does not wait for the restore to complete. The task is considered successful if the restore request is accepted, even if the object is still being restored.

Mistake

Lambda functions invoked by batch operations can run for as long as they need.

Correct

The Lambda function must return a response within its configured timeout (Lambda's default is 3 seconds, and the maximum is 15 minutes). If the function takes longer, it times out and the task is marked as failed.

Mistake

Batch operations can use S3 Inventory reports in any format.

Correct

Only CSV-formatted S3 Inventory reports can be used as manifests. ORC and Parquet inventory reports are not supported.

Mistake

You can use batch operations to change the storage class of objects directly.

Correct

There is no direct 'change storage class' operation. You can achieve this by using the Copy operation to copy the object to the same key with a different storage class, or by using Lifecycle policies (which are not batch operations).


Frequently Asked Questions

Can S3 Batch Operations process objects in multiple buckets at once?

No, each batch job is limited to objects in a single bucket. If you need to process objects in multiple buckets, you must create separate batch jobs for each bucket. This is a common exam trap.

What is the default retry count for S3 Batch Operations?

The default retry count is 1, meaning each object will be attempted up to two times (original attempt plus one retry). You can set it from 0 to 5. The exam may ask about the default value.

Can I use S3 Batch Operations to change the storage class of objects?

There is no direct operation to change storage class. However, you can achieve this by using the Copy operation to copy the object to the same key with a different storage class. Alternatively, use S3 Lifecycle policies for ongoing transitions.

Does S3 Batch Operations support cross-region copy?

Yes, the Copy operation supports cross-region copy. You must create the batch job in the destination region, and the manifest must list source objects in the source region. Data transfer costs apply.

What happens if the IAM role for a batch job lacks permissions?

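The job does not wait for you to fix the role. If the role cannot read the manifest, the job fails; if it lacks permission for the operation on specific objects, those tasks fail and are listed in the completion report. After correcting the role, submit a new job for the failed objects. The 'Suspended' status means something different: the job is waiting for confirmation before it runs.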

Can I use S3 Batch Operations with objects in S3 Express One Zone?

As of 2025, S3 Batch Operations does not support S3 Express One Zone. Check the latest AWS documentation for updates, as this may change.

How does the Lambda invocation work with S3 Batch Operations?

You specify a Lambda function ARN. For each object, the batch job invokes the Lambda function with an event containing the object details and waits for the function's response. The function must return a result (success/failure) within its configured Lambda timeout. The job does not track any asynchronous work that the function starts but does not wait for.
