Amazon S3 (Simple Storage Service) is the foundational object storage service on AWS, and it is heavily tested on the DVA-C02 exam. This chapter covers everything a developer needs to know about S3, including bucket configuration, object management, security, performance optimization, and integration with other AWS services. Expect approximately 10-15% of exam questions to directly involve S3, either as the primary topic or as part of a multi-service scenario.
Imagine a massive warehouse with infinite shelving space. You can store any item (object) in any box (bucket), and each item has a unique barcode (key). The warehouse has multiple conveyor belts (S3 APIs) that let you add, retrieve, or remove items. But there's no central index; to find an item, you must know its exact barcode and which box it's in. The warehouse also has different storage zones: a 'frequent access' zone (Standard) for items you grab often, a 'cold storage' zone (Glacier) for items you might need once a year, and an 'archive' zone (Deep Archive) for items you'll rarely touch. To save money, you can set rules (lifecycle policies) that automatically move items from the frequent zone to cold storage after 30 days, then to archive after a year. The warehouse also has a special service (S3 Transfer Acceleration) that uses a network of fast couriers (AWS edge locations) to speed up long-distance deliveries. If you accidentally delete an item, you can recover it if you enabled versioning, which keeps all previous versions like a time machine. Access to the warehouse is controlled by a security guard (IAM policies) and a bucket-specific access list (bucket policies). The warehouse can also generate logs of every action (server access logs) and send notifications (S3 Events) to other services when items arrive or are removed.
What is Amazon S3 and Why It Exists
Amazon S3 is a fully managed object storage service that stores data as objects within buckets. Unlike block storage (EBS) or file storage (EFS), S3 is designed for scalability, durability, and low cost. It is the backbone of many cloud-native applications, serving as a data lake, backup target, static website host, and content distribution origin. The DVA-C02 exam focuses on developer-oriented features: programmatic access, security mechanisms, performance patterns, and event-driven integrations.
How S3 Works Internally
S3 is a key-value store. Each object is identified by a unique key (the full path, e.g., myfolder/image.jpg) within a bucket. The bucket name must be globally unique across all AWS accounts and regions. Objects consist of data (the file), metadata (key-value pairs), and a version ID (if versioning is enabled). When you upload an object, S3 stores it across multiple devices in multiple Availability Zones (for Standard storage) to achieve 99.999999999% (11 9's) durability. The service uses a distributed hash table to locate objects: given a bucket and key, the system routes the request to the correct partition.
Key Components, Values, Defaults, and Timers
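Buckets: Historically up to 100 buckets per account by default (a soft limit that could be raised to 1,000). AWS raised the default quota to 10,000 in late 2024, so check Service Quotas for the current value; older exam material still quotes 100. Bucket names must be 3-63 characters long, use only lowercase letters, numbers, hyphens, and periods (no uppercase letters or underscores), and must not be formatted as an IP address.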
Objects: Each object can be from 0 bytes to 5 TB. The maximum upload size in a single PUT is 5 GB. For larger objects, you must use multipart upload (up to 5 TB).
Storage Classes: Standard (frequent access), Intelligent-Tiering (auto-moves between tiers), Standard-IA (infrequent access), One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval, Glacier Deep Archive. Minimum storage duration charges apply for IA and Glacier classes: 30 days for Standard-IA and One Zone-IA, 90 days for Glacier Instant Retrieval, 90 days for Glacier Flexible Retrieval, 180 days for Glacier Deep Archive.
Lifecycle Policies: Transition actions (move to another class after N days) and expiration actions (delete objects after N days). For versioned buckets, you can apply separate rules for current and noncurrent versions.
Versioning: Off by default. Once enabled, it cannot be disabled, only suspended. Each object version has a unique version ID. When you delete an object without specifying a version, S3 adds a delete marker instead of actually removing the object.
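Consistency Model: S3 provides strong read-after-write consistency for all PUT and DELETE operations, including overwrites, and for LIST operations (since December 2020). After a successful write, a subsequent GET or LIST returns the latest version. Older material describes eventual consistency for overwrite PUTs and DELETEs; that no longer applies to objects, although bucket configuration changes (for example, enabling versioning) can take a short time to propagate.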
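Event Notifications: Can be configured to publish to SNS, SQS, or Lambda when objects are created, deleted, or restored. Delivery is at least once, so events can arrive more than once or out of order. If several writes hit the same key in a non-versioned bucket at nearly the same time, you may receive fewer notifications than writes; enabling versioning ensures a notification for every successful write.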
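The delete-marker behavior described under Versioning is easier to see in a small simulation. The sketch below is a toy in-memory model, not the S3 API: the class `ToyVersionedBucket` and its version IDs (`v1`, `v2`, ...) are hypothetical and exist only to illustrate the rules.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class ToyVersionedBucket:
    """Toy stand-in for a versioning-enabled bucket (illustration only)."""

    def __init__(self) -> None:
        # key -> list of (version_id, body); body None means "delete marker"
        self._versions: Dict[str, List[Tuple[str, Optional[bytes]]]] = {}
        self._ids = itertools.count(1)

    def _push(self, key: str, body: Optional[bytes]) -> str:
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key: str, body: bytes) -> str:
        # Every PUT creates a new current version.
        return self._push(key, body)

    def delete(self, key: str, version_id: Optional[str] = None) -> str:
        if version_id is None:
            # Simple delete: nothing is removed, a delete marker becomes current.
            return self._push(key, None)
        # Delete with a version ID: that version is removed permanently.
        self._versions[key] = [
            (vid, body) for vid, body in self._versions[key] if vid != version_id
        ]
        return version_id

    def get(self, key: str) -> Optional[bytes]:
        versions = self._versions.get(key, [])
        if not versions or versions[-1][1] is None:
            return None  # the real service would answer 404 here
        return versions[-1][1]


bucket = ToyVersionedBucket()
first = bucket.put("report.txt", b"draft")
marker = bucket.delete("report.txt")            # adds a delete marker
print(bucket.get("report.txt"))                 # object appears deleted
bucket.delete("report.txt", version_id=marker)  # remove the marker
print(bucket.get("report.txt"))                 # previous version is current again
```

Removing the delete marker by its version ID is what restores the object; deleting a data version by ID is the only permanent removal in this model.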
Configuration and Verification Commands
Using the AWS CLI (v2):
# Create a bucket
aws s3api create-bucket --bucket my-bucket --region us-east-1
# List objects
aws s3api list-objects --bucket my-bucket
# Enable versioning
aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled
# Set lifecycle policy
aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json
# Upload a file (multipart automatically used for large files)
aws s3 cp largefile.zip s3://my-bucket/
# Generate a presigned URL (valid for 3600 seconds)
aws s3 presign s3://my-bucket/myfile.txt --expires-in 3600
Verification:
Use aws s3api get-bucket-versioning --bucket my-bucket to check versioning status.
Use aws s3api get-bucket-lifecycle-configuration --bucket my-bucket to review lifecycle rules.
Use aws s3api get-object --bucket my-bucket --key myfile.txt output.txt to download and verify.
How S3 Interacts with Related Technologies
CloudFront: S3 can serve as the origin for a CloudFront distribution. Use Origin Access Control (OAC) to restrict access so that only CloudFront can read from the bucket. CloudFront caches objects at edge locations, reducing latency and S3 data transfer costs.
Lambda: S3 event notifications can trigger Lambda functions. For example, automatically resize an image when uploaded. Ensure the Lambda execution role has permissions to read from the bucket and write to the destination.
SQS/SNS: S3 can send events directly to SQS queues or SNS topics for decoupled processing. This is useful for large-scale workflows.
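AWS KMS: S3 integrates with KMS for server-side encryption (SSE-KMS). You can specify a customer managed key. Note that SSE-KMS requests count against the KMS request quota for cryptographic operations, which is shared across the account in each Region (5,500, 10,000, or 30,000 requests per second, depending on the Region rather than on the key) and can throttle high-throughput applications. Enabling S3 Bucket Keys reduces the number of KMS calls.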
AWS Glue: S3 is a common data lake store. Glue crawlers can catalog data in S3, and Glue ETL jobs can read/write from S3.
Performance Optimization
Multipart Upload: Recommended for objects over 100 MB. For files over 5 GB, it is required. It parallelizes uploads and improves throughput.
Byte-Range Fetches: For large downloads, use byte-range GETs to parallelize reads. This is useful for large files or when you need only portions of an object.
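S3 Transfer Acceleration: Uses AWS edge locations to speed up uploads over long distances. Enabled per bucket. Costs extra. Requests go to the accelerate endpoint (bucketname.s3-accelerate.amazonaws.com); with the CLI, pass --endpoint-url https://s3-accelerate.amazonaws.com to aws s3 cp or set use_accelerate_endpoint in the CLI configuration, and in the SDKs enable the accelerate option on the S3 client.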
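Burst vs. Baseline Performance: S3 automatically scales to handle high request rates, supporting at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix. There is no limit on the number of objects or prefixes in a bucket, so you can raise aggregate throughput by spreading objects across multiple prefixes and parallelizing requests. Randomized prefixes (for example, hex/ instead of date/) were once recommended to avoid hot partitions; they are no longer required, though older exam material still mentions them.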
S3 Select: Use SQL-like queries to retrieve only a subset of data from an object (CSV, JSON, Parquet). Reduces data transfer and latency for analytics.
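To make the byte-range technique above concrete, here is a minimal sketch that splits an object of known size into HTTP Range header values, one per parallel GET. The function name `byte_ranges` and the chunk size are assumptions for illustration; the object size would normally come from a HEAD request.

```python
from typing import List


def byte_ranges(object_size: int, chunk_size: int) -> List[str]:
    """Return Range header values that together cover the whole object."""
    if object_size < 0 or chunk_size <= 0:
        raise ValueError("object_size must be >= 0 and chunk_size must be > 0")
    ranges = []
    for start in range(0, object_size, chunk_size):
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + chunk_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


# A 10-byte object fetched in 4-byte chunks needs three requests.
print(byte_ranges(10, 4))
```

Each value can be passed as the Range of a separate GET request, and the responses concatenated in order to rebuild the object.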
Security Mechanisms
- IAM Policies: Identity-based policies attached to users, groups, or roles that grant or deny S3 actions.
- Bucket Policies: Resource-based policies attached to the bucket. They can grant cross-account access or enforce conditions (e.g., HTTPS, IP restrictions).
- Access Control Lists (ACLs): Legacy mechanism. Not recommended for new deployments. Best practice is to use bucket policies or IAM policies.
- Pre-Signed URLs: Generate temporary URLs for time-limited access to specific objects. Useful for sharing private content or allowing uploads from unauthenticated users. The URL includes a signature that expires.
- Encryption:
  - SSE-S3: Server-side encryption with S3-managed keys. AES-256.
  - SSE-KMS: Server-side encryption with AWS KMS keys. Provides an audit trail (CloudTrail) and control over key rotation.
  - SSE-C: Server-side encryption with customer-provided keys. You manage the keys and send the key with each request; S3 uses it to encrypt or decrypt and then discards it without storing it.
  - Client-Side Encryption: Encrypt data before uploading. The AWS SDKs support this (for example, AmazonS3EncryptionClient in the SDK for Java).
- Block Public Access: Four settings that can be applied at account or bucket level to prevent public access. These are a safety net.
Cross-Region Replication (CRR) and Same-Region Replication (SRR)
Replicates objects across buckets in different (CRR) or same (SRR) regions. Requires versioning enabled on both source and destination. Replication is asynchronous and eventual. Only new objects are replicated by default; you can optionally replicate existing objects. Delete markers are not replicated by default (unless configured). Replication time depends on object size and distance.
S3 Batch Operations
Allows you to perform bulk actions (e.g., copy, tag, restore) on billions of objects using a manifest. You can invoke a Lambda function per object. Useful for large-scale data management.
S3 Inventory
Generates a daily or weekly CSV/Parquet report of all objects in a bucket, including metadata. Helps with compliance and lifecycle management.
S3 Object Lock
Prevents objects from being deleted or overwritten for a fixed period (retention) or indefinitely (legal hold). Requires versioning. Used for regulatory compliance.
S3 Glacier and Glacier Deep Archive
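For long-term archival. Retrieval times depend on the storage class and the retrieval tier. Glacier Flexible Retrieval offers expedited (typically 1-5 minutes), standard (3-5 hours), and bulk (5-12 hours) retrievals. Glacier Deep Archive offers standard (within 12 hours) and bulk (within 48 hours) retrievals; expedited retrieval is not available. Lifecycle policies can transition objects automatically.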
S3 Event Notifications and Destination Configurations
Events can be sent to SNS, SQS, or Lambda. You can filter by object key prefix and suffix. For high-volume scenarios, consider using SQS or Lambda with reserved concurrency to handle bursts. Events are delivered at least once, but duplicates may occur.
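Because delivery is at least once, consumers are usually written to tolerate duplicates. The sketch below parses the S3 event record layout and skips records it has already seen. It makes two assumptions that are not from the text above: the `seen` set stands in for a durable store (such as a database table), and the tuple of bucket, key, and the record's `sequencer` value is used as the deduplication identity.

```python
from typing import Callable, Iterator, Set, Tuple
from urllib.parse import unquote_plus


def extract_objects(event: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (bucket, key, sequencer) for each record in an S3 event."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded, with spaces encoded as '+'.
        key = unquote_plus(s3["object"]["key"])
        yield s3["bucket"]["name"], key, s3["object"].get("sequencer", "")


def handle(event: dict, process: Callable[[str, str], None], seen: Set[tuple]) -> int:
    """Run process(bucket, key) once per distinct record; return how many ran."""
    handled = 0
    for bucket, key, sequencer in extract_objects(event):
        identity = (bucket, key, sequencer)
        if identity in seen:
            continue  # duplicate delivery, already processed
        seen.add(identity)
        process(bucket, key)
        handled += 1
    return handled


# Hypothetical event, delivered twice.
event = {"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "my-bucket"},
           "object": {"key": "photos/my+cat.jpg", "sequencer": "0055AED6DCD90281E5"}},
}]}
seen: Set[tuple] = set()
handle(event, lambda b, k: print("processing", b, k), seen)
handle(event, lambda b, k: print("processing", b, k), seen)  # skipped as duplicate
```

An in-memory set only deduplicates within one process; a Lambda function would need an external store, since separate invocations do not reliably share memory.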
S3 Access Points
Simplify managing access for large datasets. Each access point has its own policy and network controls (e.g., restricting access to a specific VPC). You can create multiple access points for different use cases.
S3 Multi-Region Access Points
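Provide a single global endpoint that routes requests to the bucket with the lowest network latency. Useful for multi-region applications. Typically paired with Cross-Region Replication to keep the underlying buckets in sync, although replication is configured separately.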
S3 Object Lambda
Allows you to modify objects on the fly as they are retrieved. For example, redact PII or resize images without storing multiple copies. You define a Lambda function that transforms the data.
S3 Storage Lens
Provides a dashboard and metrics for storage usage and activity across all accounts/regions. Helps optimize costs and identify anomalies.
S3 Outposts
Extends S3 to on-premises AWS Outposts. Provides local object storage for low-latency applications.
S3 on AWS Snow Family
Snowball Edge devices can store S3 objects for offline data transfer.
S3 and VPC Endpoints
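Use Gateway VPC Endpoints (for S3) to access S3 from within a VPC without going over the internet. Gateway endpoints have no additional charge and work through route table entries, so traffic stays on the AWS network. To reach S3 privately from on-premises networks (over Direct Connect or VPN), use Interface Endpoints (PrivateLink), which use private IP addresses in your VPC and are billed per hour and per GB processed.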
S3 and CloudWatch Metrics
You can enable request metrics (e.g., number of 4xx errors, latency) at the bucket or prefix level. These are billed. Storage metrics are available via CloudWatch automatically.
S3 and AWS CloudTrail
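CloudTrail logs bucket-level (management) S3 API calls by default. Object-level operations such as GetObject and PutObject are data events, which you must enable explicitly and which incur additional charges. You can also enable S3 server access logs (delivered to a separate bucket) for detailed access records.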
S3 and AWS Config
Can track configuration changes to buckets (e.g., policy changes, encryption settings).
S3 and AWS Trusted Advisor
Provides checks for bucket permissions, performance, and cost optimization (e.g., identifying underutilized storage classes).
S3 and AWS Cost Explorer
Helps analyze S3 costs by storage class, region, request type, and data transfer.
Create an S3 Bucket
Use the AWS Management Console, CLI, or SDK to create a bucket. The bucket name must be globally unique and DNS-compliant. You must specify a region where the bucket will reside. Optionally, you can enable versioning, default encryption, and block public access settings. The bucket is created with a default set of permissions (private).
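A quick local check can catch a non-compliant name before the create call is made. The sketch below covers only the naming rules mentioned in this chapter plus the ban on adjacent periods; the function name `is_valid_bucket_name` is hypothetical, and the full AWS rule set includes further restrictions (such as reserved prefixes and suffixes) that are not checked here. Global uniqueness can only be confirmed by the service itself.

```python
import ipaddress
import re

# 3-63 characters: lowercase letters, digits, hyphens, and periods,
# starting and ending with a letter or digit.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    if not _NAME_RE.fullmatch(name):
        return False
    if ".." in name:
        return False  # adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True   # not formatted as an IP address
    return False


for candidate in ["my-bucket", "My_Bucket", "ab", "192.168.5.4"]:
    print(candidate, is_valid_bucket_name(candidate))
```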
Configure Bucket Policies and IAM
Define who can access the bucket and what actions they can perform. Use IAM policies for users/roles within your account, and bucket policies for cross-account or public access. Conditions such as IP address range, VPC endpoint, or MFA can be enforced. For fine-grained control, use bucket policies with principal, action, resource, and condition elements.
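As one example of a condition-based bucket policy, the sketch below builds a policy document that denies any request not made over HTTPS, using the `aws:SecureTransport` condition key. The bucket name `my-bucket` and the helper name `https_only_policy` are placeholders; the resulting JSON would be attached with the console, CLI, or SDK.

```python
import json


def https_only_policy(bucket: str) -> dict:
    """Build a bucket policy that rejects plain-HTTP requests."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            # Cover both bucket-level and object-level operations.
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


policy_json = json.dumps(https_only_policy("my-bucket"), indent=2)
print(policy_json)
```

The statement uses an explicit Deny, so it overrides any Allow granted elsewhere, which is why this pattern works as a guardrail on top of existing IAM permissions.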
Upload Objects
Objects can be uploaded via the console, CLI (aws s3 cp), or SDK. For large objects, use multipart upload to parallelize and improve throughput. Each object has a key (name), metadata, and optional tags. You can set storage class at upload time or via lifecycle policy later. Encryption can be applied server-side or client-side.
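When planning a multipart upload by hand, the part size has to satisfy the service limits: parts of 5 MiB to 5 GiB (the last part may be smaller) and at most 10,000 parts per upload. The sketch below picks a part size for a given object size under those limits. The helper name `plan_parts`, the 8 MiB starting size, and the doubling strategy are illustrative choices, not something the SDKs are documented to do in exactly this way.

```python
from typing import Tuple

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART_SIZE = 5 * MIB
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000
MAX_OBJECT_SIZE = 5 * TIB


def plan_parts(object_size: int, preferred_part_size: int = 8 * MIB) -> Tuple[int, int]:
    """Return (part_size, part_count) that respects the multipart limits."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    part_size = min(max(preferred_part_size, MIN_PART_SIZE), MAX_PART_SIZE)
    # Double the part size until the upload fits within 10,000 parts.
    while -(-object_size // part_size) > MAX_PARTS:
        part_size *= 2
    part_count = -(-object_size // part_size)  # ceiling division
    return part_size, part_count


print(plan_parts(100 * MIB))  # small object: keeps the preferred part size
print(plan_parts(5 * TIB))    # largest object: part size grows to fit the limit
```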
Set Up Lifecycle Policies
Define rules to transition objects to lower-cost storage classes or expire them after a specified number of days. For example, move objects to Standard-IA after 30 days, to Glacier after 90 days, and delete after 365 days. Lifecycle policies can be applied to a whole bucket or filtered by prefix/tags. For versioned buckets, you can set separate rules for current and noncurrent versions.
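The example in the paragraph above (Standard-IA after 30 days, Glacier after 90, delete after 365) can be written as a lifecycle configuration document. The sketch below builds it as a Python dictionary and prints the JSON; the rule ID, the `logs/` prefix, and the 30-day and 7-day cleanup values are placeholders chosen for illustration.

```python
import json


def build_lifecycle_configuration(prefix: str = "logs/") -> dict:
    """Build a lifecycle configuration with transitions and expiration."""
    return {
        "Rules": [{
            "ID": "archive-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
            # Versioned buckets: clean up old versions separately.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            # Remove parts left behind by uploads that never completed.
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }]
    }


print(json.dumps(build_lifecycle_configuration(), indent=2))
```

Saved to a file, this output has the shape expected by the `--lifecycle-configuration file://...` argument of `aws s3api put-bucket-lifecycle-configuration`.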
Enable Event Notifications
Configure the bucket to send events (e.g., s3:ObjectCreated:Put) to a destination (SNS, SQS, Lambda). You can filter by prefix and suffix. Ensure the destination resource policy allows S3 to publish. For Lambda, the function must have permission to be invoked by S3. Events are delivered asynchronously.
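The prefix and suffix filtering described above is expressed as filter rules inside the notification configuration. The sketch below builds such a configuration for a Lambda destination and includes a small local matcher that mirrors the filter semantics. The function ARN, the configuration ID, and the helper names are hypothetical; the matcher only illustrates the rule and is not how S3 evaluates filters internally.

```python
def notification_config(function_arn: str, prefix: str, suffix: str) -> dict:
    """Build a notification configuration for s3:ObjectCreated:Put events."""
    return {
        "LambdaFunctionConfigurations": [{
            "Id": "on-upload",
            "LambdaFunctionArn": function_arn,
            "Events": ["s3:ObjectCreated:Put"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": prefix},
                {"Name": "suffix", "Value": suffix},
            ]}},
        }]
    }


def key_matches(entry: dict, key: str) -> bool:
    """Return True if the key satisfies the entry's prefix and suffix rules."""
    rules = {r["Name"]: r["Value"] for r in entry["Filter"]["Key"]["FilterRules"]}
    return key.startswith(rules.get("prefix", "")) and key.endswith(rules.get("suffix", ""))


config = notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:resize-image",  # placeholder ARN
    prefix="uploads/",
    suffix=".jpg",
)
entry = config["LambdaFunctionConfigurations"][0]
print(key_matches(entry, "uploads/cat.jpg"))   # matches both rules
print(key_matches(entry, "uploads/cat.png"))   # wrong suffix
```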
Generate Pre-Signed URLs
Use the AWS SDK to generate a URL that grants temporary access to a specific object. The URL includes a signature that expires after a specified time (default 3600 seconds). The user who generates the URL must have permissions to perform the intended operation (GET or PUT). Pre-signed URLs are commonly used for private content sharing or direct uploads from clients.
Enterprise Scenario 1: Media Asset Management
A media company stores raw video files (10-50 GB each) in an S3 bucket with versioning enabled. Editors need to download files from offices worldwide. The company uses S3 Transfer Acceleration to speed up uploads from remote locations. Lifecycle policies automatically move completed projects from Standard to Glacier after 90 days. For distribution, they use CloudFront with OAC to serve processed videos. They monitor costs with Storage Lens and set up budget alerts. Misconfiguration: They initially used a single prefix for all uploads, causing throttling during peak hours. They solved it by distributing files across random prefixes (e.g., UUID-based keys).
Enterprise Scenario 2: Data Lake for Analytics
A financial institution ingests terabytes of transaction data daily into an S3 data lake. They use S3 Inventory to track objects and S3 Select to query subsets without downloading entire files. They enable SSE-KMS for encryption and restrict access via VPC endpoints to prevent data exfiltration. They replicate critical data to another region using CRR for disaster recovery. Common issue: They exceeded the KMS request rate limit during high-volume ingestion, causing throttling. Because the KMS quota is shared across the account in a Region rather than set per key, they switched to SSE-S3 for less sensitive data, enabled S3 Bucket Keys to cut KMS calls for sensitive data, and requested a quota increase.
Enterprise Scenario 3: Web Application Static Assets
A SaaS company hosts its frontend (HTML, CSS, JS) in a private S3 bucket served through a CloudFront distribution with a custom domain and HTTPS. They configure the bucket policy to allow only CloudFront's OAC to read objects, which requires using the bucket's REST endpoint as the origin rather than the static website endpoint. They use Lambda@Edge to modify headers. They set up S3 event notifications that trigger a Lambda function to invalidate the CloudFront cache when assets are updated. Mistake: They initially made the bucket public, which posed a security risk. They later enabled Block Public Access and restricted access via OAC.
What DVA-C02 Tests on S3
The DVA-C02 exam tests your ability to choose the correct S3 feature for a given scenario. S3 questions fall mainly under Domain 1 (Development with AWS Services), particularly using data stores in application development, and Domain 2 (Security), which covers authorization and encryption. Expect questions on:
- Storage Classes: When to use Standard vs. Intelligent-Tiering vs. Glacier. Remember the minimum storage duration charges.
- Encryption: Differences between SSE-S3, SSE-KMS, SSE-C, and client-side encryption. Know that SSE-KMS provides an audit trail and separate permissions.
- Pre-Signed URLs: How to generate and use them for temporary access. The default expiration is 3600 seconds.
- Versioning: Behavior of delete markers, how to permanently delete, and that versioning cannot be disabled once enabled, only suspended.
- Lifecycle Policies: Transition rules and expiration actions, especially for versioned buckets.
- Event Notifications: Destinations and filtering. Know that S3 can send events to SNS, SQS, and Lambda.
- Cross-Region Replication: Requires versioning, asynchronous, only new objects by default.
- Performance: Multipart upload (required >5 GB, recommended >100 MB), byte-range fetches, and spreading requests across prefixes for high throughput.
Common Wrong Answers and Why Candidates Choose Them
Using S3 Standard-IA for frequently accessed data: Candidates think 'infrequent access' means 'cheaper', but they forget the retrieval fee and minimum duration. Standard is better for frequent access.
Enabling versioning to protect against accidental deletion without knowing about delete markers: They might think versioning prevents any deletion, but actually it adds a delete marker, and the object is still recoverable. The exam tests understanding of delete markers.
Choosing SSE-C when they need an audit trail: SSE-C does not log key usage; SSE-KMS does via CloudTrail. Candidates often pick SSE-C because they think they control the keys, but they miss the audit requirement.
Selecting S3 Transfer Acceleration for all scenarios: It is only beneficial for long-distance uploads. For short distances, it may add latency and cost. The exam tests when to use it.
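Assuming S3 is still eventually consistent for overwrites: Since December 2020, S3 provides strong read-after-write consistency for all GET, PUT, LIST, and DELETE operations. Older study material describes eventual consistency for overwrite PUTs and DELETEs, and candidates who learned that model may pick answers that add unnecessary retries or delays.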
Specific Numbers and Terms That Appear Verbatim
5 TB maximum object size
5 GB maximum single PUT
100 buckets per account (soft limit)
11 9's durability (99.999999999%)
99.99% availability for Standard
3600 seconds default pre-signed URL expiration
30 days minimum for Standard-IA
90 days minimum for Glacier Flexible Retrieval
180 days minimum for Glacier Deep Archive
Edge Cases and Exceptions
Object Lock with Governance mode: Users with special permissions can override retention settings. The exam might test that Compliance mode cannot be overridden.
Multipart upload completion: If you never complete or abort an upload, the uploaded parts remain stored and incur charges. Add a lifecycle rule with the AbortIncompleteMultipartUpload action to clean them up automatically.
Event notification duplicates: S3 delivers at least once; duplicates can occur. Design your application to be idempotent.
Bucket policy size limit: 20 KB. A bucket has only one policy, so if you need more room, move permissions into IAM policies or use S3 Access Points, each of which has its own policy.
How to Eliminate Wrong Answers
Read the question carefully for keywords: 'frequently accessed', 'audit trail', 'temporary access', 'disaster recovery', 'cost optimization'. Map each requirement to the specific feature. For example, if the question mentions 'audit trail for encryption keys', eliminate SSE-S3 and SSE-C because they don't provide key usage logs. If it says 'minimize cost for data accessed once a year', choose Glacier Deep Archive (but check retrieval time requirements). Always consider the trade-offs.
S3 is an object storage service with 99.999999999% durability and 99.99% availability for Standard.
Maximum object size is 5 TB; single PUT limit is 5 GB; use multipart upload for larger objects.
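Versioning cannot be disabled once enabled, only suspended; a delete without a version ID adds a delete marker instead of permanently removing the object.
Pre-signed URLs default to 3600 seconds expiration; with Signature Version 4 you can set an expiration of up to 7 days (604800 seconds) from the AWS CLI or SDKs, while URLs generated in the S3 console are capped at 12 hours. A URL signed with temporary credentials stops working when those credentials expire.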
Lifecycle policies can transition objects to cheaper storage classes or expire them; minimum durations apply for IA and Glacier.
S3 event notifications can be sent to SNS, SQS, or Lambda; filtering by prefix and suffix is supported.
Cross-Region Replication requires versioning and is asynchronous; only new objects are replicated by default.
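For high throughput, spread requests across multiple key prefixes (each prefix supports at least 3,500 writes and 5,500 reads per second); randomized prefixes are no longer required but still appear in older material.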
SSE-KMS provides an audit trail; SSE-C requires you to manage keys; client-side encryption encrypts before upload.
S3 Transfer Acceleration uses edge locations; test with the speed comparison tool before enabling.
These come up on the exam all the time. Here's how to tell them apart.
SSE-S3
S3 manages the encryption keys.
No additional cost.
No audit trail for key usage.
Less control over key rotation.
Suitable for most applications.
SSE-KMS
AWS KMS manages the keys; you can use a customer managed key.
Additional cost for KMS API calls.
Provides CloudTrail audit trail for key usage.
You can control key rotation and access policies.
Suitable for compliance requirements.
Mistake
S3 is eventually consistent for overwrite PUTs and DELETEs.
Correct
S3 provides strong read-after-write consistency for all operations, including overwrite PUTs, DELETEs, and LIST requests. This has been the case since December 2020: after a successful write, a subsequent GET returns the latest version. The eventual-consistency model appears only in outdated material.
Mistake
Versioning prevents object deletion entirely.
Correct
Versioning does not prevent deletion. When you delete an object without specifying a version, S3 adds a delete marker. The object is still present and recoverable. To permanently delete, you must specify the version ID.
Mistake
S3 Standard-IA is always cheaper than Standard.
Correct
Standard-IA has a lower storage price but charges a retrieval fee. For frequently accessed data, the retrieval fees can make it more expensive than Standard. Also, there is a minimum 30-day storage charge.
Mistake
S3 Transfer Acceleration always improves upload speed.
Correct
Transfer Acceleration is beneficial only for long-distance uploads (e.g., cross-continent). For short distances, it may add latency and cost. It should be tested using the speed comparison tool.
Mistake
Bucket policies and IAM policies are interchangeable.
Correct
They are different. IAM policies are attached to users/roles and define what they can do. Bucket policies are attached to the bucket and define who can access it. Both are evaluated: within a single account, access is allowed if either policy allows it and no policy explicitly denies it; for cross-account access, both the caller's IAM policy and the bucket policy must allow the request. An explicit deny always overrides any allow.
To make a bucket public, you must first disable Block Public Access settings at the account or bucket level. Then attach a bucket policy that grants public read access, e.g., `{"Effect":"Allow","Principal":"*","Action":"s3:GetObject","Resource":"arn:aws:s3:::bucket-name/*"}`. However, this is strongly discouraged for security reasons. Use CloudFront with OAC or pre-signed URLs instead.
S3 Standard is for frequently accessed data with no retrieval fees. S3 Standard-IA has lower storage cost but charges a retrieval fee per GB and requires a 30-day minimum storage duration. Use Standard-IA for data accessed less than once a month. For data with unknown access patterns, use Intelligent-Tiering.
If versioning is enabled, you can recover the object by removing the delete marker. Use the AWS Console: Show versions, delete the delete marker. Or use CLI: `aws s3api delete-object --bucket my-bucket --key myfile.txt --version-id <delete-marker-version-id>`. If versioning was not enabled, the object is permanently deleted and cannot be recovered.
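With the AWS CLI or SDKs (Signature Version 4), you can set expiration up to 7 days (604800 seconds). URLs generated in the S3 console are limited to 12 hours (43200 seconds). The default is 3600 seconds (1 hour). A URL signed with temporary credentials stops working when those credentials expire, even if the requested expiration is later.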
You can use server-side encryption (SSE-S3, SSE-KMS, SSE-C) or client-side encryption. SSE-S3 is automatic with S3-managed keys. SSE-KMS uses AWS KMS for key management and auditing. SSE-C requires you to provide your own encryption key. Client-side encryption encrypts data before uploading; you manage the keys.
Yes. Enable Static Website Hosting on the bucket and specify index and error documents. The website endpoint (`http://bucket-name.s3-website-region.amazonaws.com`) supports only HTTP and requires the objects to be publicly readable. For HTTPS or a private bucket, put CloudFront in front of the bucket; OAC works with the bucket's REST endpoint, not the website endpoint.
S3 Transfer Acceleration uses AWS edge locations to speed up uploads over long distances. It is beneficial when you have users or data sources far from the bucket region. You can test the speed improvement using the AWS provided speed comparison tool. It costs extra per GB transferred.