This chapter covers AWS Storage Gateway, a hybrid cloud storage service that lets on-premises applications use AWS cloud storage through standard storage protocols. Storage Gateway matters for the CLF-C02 exam because it falls under Domain 3: Cloud Technology and Services, Task Statement 3.6 (Identify AWS storage services). While this task statement is broad, Storage Gateway appears in roughly 5-10% of storage-related questions. We will explore the service's architecture, its four gateway types, use cases, and how it bridges on-premises and cloud storage.
Imagine you run a busy retail store in a city, but your main inventory warehouse is in a different state. You need quick access to your most popular items without waiting days for a truck delivery. AWS Storage Gateway is like having a small, automated loading dock at your store that syncs with your distant warehouse. The loading dock has a limited shelf space (the cache) where it keeps copies of your best-selling products. When a customer buys something, you first check the shelf. If it's there, you hand it over instantly. If not, the loading dock automatically requests a restock from the main warehouse, fetches it, and stores a copy on the shelf for next time. The loading dock also keeps a record of every sale and sends it back to the warehouse so the central inventory stays accurate. You don't need to know the truck routes or the warehouse layout—you just interact with the loading dock as if it were your local storage. The key mechanism: the gateway caches frequently accessed data locally, writes data to the local cache first for low latency, and asynchronously transfers that data to AWS durable storage (S3 or Glacier). This hybrid approach gives you the speed of local storage with the durability and scalability of the cloud.
What is AWS Storage Gateway?
AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage. It offers four gateway types, each designed for a specific use case: Amazon S3 File Gateway, Amazon FSx File Gateway, Volume Gateway (with stored and cached modes), and Tape Gateway. In this chapter, "File Gateway" refers to Amazon S3 File Gateway. The service runs as a virtual machine (VM) on your on-premises hypervisor (VMware ESXi, Microsoft Hyper-V, or Linux KVM) or as a hardware appliance. It connects to AWS over the internet or AWS Direct Connect.
The Problem It Solves
Many enterprises have on-premises applications that require low-latency access to data but also want the durability, scalability, and cost benefits of cloud storage. Moving all data to the cloud can cause latency issues for frequently accessed data. Conversely, keeping everything on-premises limits scalability and disaster recovery. Storage Gateway solves this by caching frequently accessed data locally while storing the full dataset in AWS. This provides low-latency access to hot data and durable, cost-effective storage for cold data.
How It Works
The Storage Gateway VM presents storage interfaces (NFS, SMB, iSCSI, or VTL) to your on-premises applications. Data written to these interfaces is first stored in the gateway's local cache (SSD or HDD). The gateway then asynchronously uploads this data to AWS services (Amazon S3, Amazon S3 Glacier, or Amazon EBS Snapshots). For reads, if the requested data is in the local cache, it is served immediately; if not, the gateway fetches it from AWS, stores a copy in the cache, and serves it. This caching mechanism is the core of the service.
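The read and write paths described above can be sketched as a toy model. This is only an illustration of write-back caching with least-recently-used eviction, not the gateway's actual implementation; the class name `GatewayCacheModel`, the block-count capacity, and the in-memory `cloud` dictionary standing in for S3 are all assumptions made for the example.

```python
from collections import OrderedDict


class GatewayCacheModel:
    """Toy model: local cache with write-back uploads and LRU eviction."""

    def __init__(self, capacity, backing_store=None):
        self.capacity = capacity                 # max blocks held locally
        self.cache = OrderedDict()               # block_id -> data, oldest first
        self.dirty = set()                       # written locally, not yet uploaded
        self.cloud = dict(backing_store or {})   # stands in for Amazon S3
        self.hits = 0
        self.misses = 0

    def write(self, block_id, data):
        # Writes land in the local cache first and are uploaded later.
        self.dirty.add(block_id)
        self._put(block_id, data)

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)     # mark as most recently used
            return self.cache[block_id]
        self.misses += 1
        data = self.cloud[block_id]              # cache miss: fetch from the cloud
        self._put(block_id, data)
        return data

    def flush(self):
        # Simulates the asynchronous upload catching up.
        for block_id in self.dirty:
            self.cloud[block_id] = self.cache[block_id]
        self.dirty.clear()

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def _put(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.capacity:
            self._evict_one()

    def _evict_one(self):
        # Evict the oldest block that has already been uploaded.
        victim = next((k for k in self.cache if k not in self.dirty), None)
        if victim is None:
            self.flush()                         # nothing clean: upload first
            victim = next(iter(self.cache))
        del self.cache[victim]


gateway = GatewayCacheModel(capacity=2, backing_store={"a": 1, "b": 2, "c": 3})
gateway.read("a")        # miss, fetched from the cloud
gateway.read("a")        # hit, served locally
gateway.write("d", 4)    # cached locally, upload pending
gateway.flush()          # upload completes
print(gateway.hit_ratio(), "d" in gateway.cloud)
```

The model never evicts a block that is still waiting to be uploaded, which mirrors the idea that locally written data must reach durable storage before its cache space is reused.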
Gateway Types
- File Gateway: Provides NFS (Network File System) and SMB (Server Message Block) access to objects stored in Amazon S3. It caches frequently accessed files locally. Use cases: file shares, backup, content management.
- Volume Gateway: Presents iSCSI block storage volumes to on-premises applications. Use cases: disaster recovery, backup. Two modes:
  - Cached Volumes: Primary data is stored in S3, with frequently accessed data cached locally. Provides low-latency access to hot data.
  - Stored Volumes: Entire dataset is stored locally and asynchronously backed up to S3 as EBS snapshots.
- Tape Gateway: Emulates a physical tape library (VTL) using iSCSI. Virtual tapes are held in Amazon S3 while they are in the library and are archived to Amazon S3 Glacier or S3 Glacier Deep Archive. Use cases: long-term backup, archival, compliance.
Pricing Model
You pay for:
Gateway usage per hour (varies by type and size)
Storage consumed in AWS (S3, Glacier, EBS snapshots)
Data transfer out of AWS (inbound transfer, such as uploads, is free)
AWS does not bill you for local cache storage, because you provide those disks yourself.
Comparison to On-Premises
Traditional on-premises storage requires upfront hardware investment, ongoing maintenance, and capacity planning. Storage Gateway eliminates these by providing a pay-as-you-go model. However, it introduces dependency on network connectivity. For latency-sensitive applications, the local cache mitigates this.
When to Use vs Alternatives
Use File Gateway when you need file-based access to S3 (e.g., lift-and-shift of on-premises file servers).
Use Volume Gateway for block-level applications like databases that need low-latency access with cloud backup.
Use Tape Gateway to replace physical tape libraries with virtual tapes in Glacier.
Use AWS DataSync for one-time large data transfers, not continuous access.
Use AWS Transfer Family for managed SFTP/FTPS to S3, but without caching.
Key Limits and Defaults
Each gateway can have up to 12 TB of local cache (SSD) per gateway.
File Gateway supports up to 10 file shares per gateway.
Volume Gateway supports up to 32 volumes per gateway.
Tape Gateway: maximum of 1,500 virtual tapes per gateway.
Minimum cache size: 150 GiB (SSD) for File Gateway.
Security
Data is encrypted in transit using SSL/TLS and at rest in S3 using server-side encryption, either with S3-managed keys (SSE-S3) or with AWS KMS keys (SSE-KMS). IAM roles control access to S3 buckets. On-premises, you manage local encryption.
Monitoring
CloudWatch metrics: CacheHitPercent, CachePercentUsed, UploadBytes, etc. You can set alarms for cache full or low hit rates.
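As a rough sketch of how such an alarm could be defined, the function below only builds the keyword arguments for CloudWatch's `put_metric_alarm` call. The function name `build_cache_alarm`, the gateway ID and name, the 80% threshold, and the period settings are illustrative assumptions; the metric name comes from the list above, and `AWS/StorageGateway` is the namespace under which gateway metrics are published.

```python
def build_cache_alarm(gateway_id, gateway_name, threshold_percent=80.0):
    """Return keyword arguments for a CachePercentUsed alarm (hypothetical helper)."""
    return {
        "AlarmName": f"{gateway_name}-cache-percent-used",
        "Namespace": "AWS/StorageGateway",
        "MetricName": "CachePercentUsed",
        "Dimensions": [
            {"Name": "GatewayId", "Value": gateway_id},
            {"Name": "GatewayName", "Value": gateway_name},
        ],
        "Statistic": "Average",
        "Period": 300,                 # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,        # consecutive breaches before alarming (assumed)
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


alarm = build_cache_alarm("sgw-12345678", "MyFileGateway")
print(alarm["AlarmName"])

# With credentials configured, the dictionary could be passed to boto3:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Keeping the request as plain data makes it easy to review the alarm settings before anything is sent to AWS.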
CLI and Console
You can manage gateways via the AWS Management Console, AWS CLI, or SDK. Example CLI command to activate a file gateway (replace the placeholder with the activation key obtained from your deployed gateway):
aws storagegateway activate-gateway --activation-key <ACTIVATION_KEY> --gateway-name MyFileGateway --gateway-timezone GMT --gateway-region us-east-1 --gateway-type FILE_S3
CloudFormation
Storage Gateway resources can be provisioned via AWS CloudFormation, enabling infrastructure as code.
Exam Tip
The CLF-C02 exam focuses on understanding the four gateway types, their use cases, and the caching behavior. You will not be asked to configure a gateway, but you must know which type to recommend for a given scenario.
Choose a Gateway Type
First, identify the use case. For file-based access to S3 (e.g., on-premises file server migration), select File Gateway. For block-level storage with low-latency access and cloud backup, select Volume Gateway (cached or stored). For tape backup replacement, select Tape Gateway. The wrong choice leads to performance issues or unnecessary costs. For example, using Tape Gateway for active file sharing would be inefficient because tape is designed for sequential access and archival, not random reads/writes.
Deploy the Gateway VM
Download the Storage Gateway VM image from AWS and deploy it on your on-premises hypervisor (VMware, Hyper-V, or KVM). You must allocate CPU, memory, and local storage for the cache and, for stored volumes, the local data. AWS provides recommended specifications: e.g., 4 vCPU, 16 GiB RAM for File Gateway. The VM must have outbound internet access to AWS endpoints. You can also use a hardware appliance, but the exam focuses on the VM.
Activate the Gateway
After deployment, you activate the gateway by associating it with your AWS account. This is done via the AWS Management Console or CLI. You provide the gateway's IP address or activation key. The gateway registers itself and establishes a secure connection to AWS. During activation, you specify the gateway's time zone and region. The gateway will then appear in the Storage Gateway console as available.
Configure Storage and Create Shares/Volumes/Tapes
For File Gateway, you create NFS or SMB file shares backed by an S3 bucket. You specify the S3 bucket name, IAM role, and optional settings like object metadata. For Volume Gateway, you create iSCSI volumes with a size (up to 16 TiB per stored volume or 32 TiB per cached volume) and attach them to on-premises servers. For Tape Gateway, you create virtual tape cartridges (50 GiB to 5 TiB) and a tape library. Each share/volume/tape is associated with the gateway.
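For the File Gateway case, the inputs named above (bucket, IAM role, gateway) map onto the parameters of the Storage Gateway `CreateNFSFileShare` API. The sketch below only assembles that request as a dictionary; the helper name `build_nfs_share_request`, the ARNs, and the bucket name are hypothetical, and the list of storage classes reflects the classes File Gateway can write to directly.

```python
import uuid

# Storage classes a file share can use as its default.
ALLOWED_STORAGE_CLASSES = {
    "S3_STANDARD",
    "S3_INTELLIGENT_TIERING",
    "S3_STANDARD_IA",
    "S3_ONEZONE_IA",
}


def build_nfs_share_request(gateway_arn, role_arn, bucket_name,
                            storage_class="S3_STANDARD"):
    """Assemble keyword arguments for create_nfs_file_share (hypothetical helper)."""
    if storage_class not in ALLOWED_STORAGE_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class}")
    return {
        "ClientToken": str(uuid.uuid4()),            # idempotency token
        "GatewayARN": gateway_arn,
        "Role": role_arn,                            # IAM role the gateway assumes
        "LocationARN": f"arn:aws:s3:::{bucket_name}",
        "DefaultStorageClass": storage_class,
    }


request = build_nfs_share_request(
    gateway_arn="arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-12345678",
    role_arn="arn:aws:iam::111122223333:role/FileGatewayAccess",
    bucket_name="media-archive",
    storage_class="S3_STANDARD_IA",
)
print(request["LocationARN"])

# With credentials configured, the request could be sent with boto3:
#   import boto3
#   boto3.client("storagegateway").create_nfs_file_share(**request)
```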
Mount and Use Storage from On-Premises
On-premises applications mount the file share (NFS/SMB), connect to the iSCSI volume, or access the virtual tape library as if it were local storage. Data written is cached locally and asynchronously uploaded to AWS. For reads, the gateway checks the local cache first; if there's a cache miss, it fetches from S3. This step is transparent to the application. The cache hit ratio is critical for performance; if too low, consider increasing cache size or moving to stored volumes.
Scenario 1: On-Premises File Server Migration to Amazon S3
A media production company has 50 TB of video files on an on-premises NAS. They want to reduce storage costs and enable remote team access. They deploy a File Gateway and create NFS shares pointing to an S3 bucket. The gateway caches the most recent projects locally, so editors experience low latency. Older projects are stored only in S3 (S3 Standard-IA for cost savings). The gateway automatically syncs metadata. Costs: roughly $0.0125/GB/month for S3 Standard-IA vs. $0.25/GB/month for on-premises storage. However, if the cache is too small, editors face latency as the gateway fetches data from S3 each time. Misconfiguration: leaving the cache at the 150 GiB minimum for 50 TB of active data results in constant cache misses, degrading performance.
Scenario 2: Disaster Recovery for On-Premises Databases
A financial services firm runs a critical SQL Server database on-premises. They use Volume Gateway (stored volumes) to create EBS snapshots of the database volume every hour. The entire dataset is stored locally, so reads and writes run at local-disk latency, and snapshots are asynchronously copied to S3. In the event of a disaster, they can create an EBS volume from the latest snapshot and attach it to an EC2 instance in AWS to restore the database. Cost: paying for gateway hours ($0.125/hour) and S3 snapshot storage. If they mistakenly use cached volumes, the database might experience latency during cache misses, which is unacceptable for transactions.
Scenario 3: Tape Backup Replacement for Compliance
A healthcare provider must retain patient records for 7 years. They previously used physical LTO tapes stored offsite. They deploy Tape Gateway, which emulates a tape library. Backup software writes to virtual tapes on the gateway, which are then archived in S3 Glacier. This eliminates tape hardware failures and manual handling. Cost: S3 Glacier at $0.004/GB/month is cheaper than tape media and offsite storage. However, if they need frequent access to archived data, Glacier's retrieval costs can add up. Misunderstanding: thinking Tape Gateway provides instant access like S3; in reality, retrieving an archived tape from Glacier can take hours.
What CLF-C02 Tests
Domain 3, Task Statement 3.6: "Identify AWS storage services." Questions on Storage Gateway test your ability to match the correct gateway type to a scenario. You must know:
The gateway types covered in this chapter: File, Volume (cached and stored), and Tape.
The underlying AWS storage service each uses: S3 for File and Volume (cached), S3 and EBS snapshots for Volume (stored), S3 Glacier for Tape.
The caching behavior: cached vs stored volumes.
Use cases: file shares, block storage, tape backup.
Common Wrong Answers and Why
Choosing "AWS Storage Gateway File Gateway" when the scenario describes block-level storage. Candidates confuse file and block protocols. The exam often describes an application that uses iSCSI (block), so the answer must be Volume Gateway.
Selecting "Tape Gateway" for active data that needs frequent access. Tape Gateway is designed for archival, not active use. Candidates see "tape" and think backup, but miss the access pattern.
Picking "Volume Gateway (cached volumes)" when the scenario requires zero latency for all data. Cached volumes only cache hot data; stored volumes keep the full dataset locally. If the scenario says "low latency for all data," stored volumes are correct.
Confusing Storage Gateway with AWS DataSync. DataSync is for one-time or scheduled transfers, not continuous access. The exam might describe a need for "ongoing access" vs "migration."
Specific Terms That Appear on the Exam
"NFS" and "SMB" for File Gateway.
"iSCSI" for Volume and Tape Gateway.
"Virtual Tape Library (VTL)" for Tape Gateway.
"Cache" and "Upload buffer."
"EBS snapshots" for stored volumes.
Tricky Distinctions
File Gateway vs. Volume Gateway: File Gateway provides file-level access (NFS/SMB) to S3; Volume Gateway provides block-level access (iSCSI) to volumes backed by S3.
Cached vs. Stored Volumes: Cached volumes store primary data in S3 and cache hot data locally; stored volumes store primary data locally and back up to S3.
Tape Gateway vs. S3 Glacier: Tape Gateway presents a VTL interface; S3 Glacier is the underlying storage class. They are not alternatives; Tape Gateway uses Glacier.
Decision Rule
For exam questions: 1. Identify the protocol: file (NFS/SMB) -> File Gateway; block (iSCSI) -> Volume Gateway; tape (VTL) -> Tape Gateway. 2. Determine access pattern: low latency for all data -> stored volumes; low latency for hot data -> cached volumes. 3. If the goal is to replace physical tape -> Tape Gateway. 4. If the goal is to provide on-premises access to S3 -> File Gateway.
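The decision rule above can be written out as a small function. This is only a study aid that encodes the chapter's rule; the function name `recommend_gateway` and its parameters are made up for the example.

```python
def recommend_gateway(protocol, needs_all_data_local=False):
    """Map a scenario's protocol and access pattern to a gateway type."""
    protocol = protocol.strip().lower()
    if protocol in ("nfs", "smb"):
        # File protocols, or on-premises file access to S3.
        return "File Gateway"
    if protocol == "vtl":
        # Replacing a physical tape library.
        return "Tape Gateway"
    if protocol == "iscsi":
        # Block storage: the access pattern picks the volume mode.
        if needs_all_data_local:
            return "Volume Gateway (stored volumes)"
        return "Volume Gateway (cached volumes)"
    raise ValueError(f"unrecognized protocol: {protocol}")


print(recommend_gateway("SMB"))
print(recommend_gateway("iSCSI", needs_all_data_local=True))
print(recommend_gateway("VTL"))
```

Tape Gateway also speaks iSCSI under the hood, so the function keys on the tape library interface (VTL) rather than the transport, just as the rule does.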
The AWS Storage Gateway types covered in this chapter are File Gateway, Volume Gateway (cached and stored modes), and Tape Gateway.
File Gateway provides file-level access (NFS/SMB) to Amazon S3; it caches frequently accessed files locally.
Volume Gateway provides block-level access (iSCSI); cached volumes store primary data in S3, stored volumes store primary data locally with S3 backup.
Tape Gateway emulates a physical tape library using virtual tapes stored in Amazon S3 Glacier or Glacier Deep Archive.
Data is written to local cache first, then asynchronously uploaded to AWS; this provides low-latency writes.
Minimum cache size for File Gateway is 150 GiB (SSD).
Storage Gateway can be deployed as a VM on VMware, Hyper-V, or KVM, or as a hardware appliance.
Use File Gateway for file share migration to S3, Volume Gateway for block storage backup/disaster recovery, Tape Gateway for tape replacement.
The exam tests matching gateway type to scenario; focus on protocol (file vs block vs tape) and access pattern (cached vs stored).
These come up on the exam all the time. Here's how to tell them apart.
AWS Storage Gateway (File Gateway)
Provides NFS/SMB access to S3 objects.
Caches data locally for low latency.
Requires on-premises or EC2 gateway VM.
Supports both NFS and SMB.
Useful for hybrid cloud file shares.
Amazon EFS (Elastic File System)
Provides NFS access to a fully managed file system.
No local cache; all data in AWS.
Accessible from EC2 and on-premises via VPN or Direct Connect.
Supports NFS only (Linux clients); Windows SMB file shares are served by a separate service, Amazon FSx for Windows File Server.
Native cloud file system for AWS.
Mistake
Storage Gateway requires a dedicated AWS hardware appliance.
Correct
Storage Gateway can run as a VM on VMware ESXi, Microsoft Hyper-V, or Linux KVM, or as a hardware appliance. The VM option is more common and cost-effective. The exam focuses on the VM deployment.
Mistake
All data written to File Gateway is immediately stored in S3.
Correct
Data is written to the local cache first and then asynchronously uploaded to S3. There is a slight delay (typically seconds) before data appears in S3. This is important for consistency.
Mistake
Volume Gateway (cached volumes) provides the same performance as locally attached storage.
Correct
Cached volumes only cache frequently accessed data. If the working set exceeds the cache size, performance degrades due to cache misses. For consistent low latency, use stored volumes.
Mistake
Tape Gateway stores data directly in S3.
Correct
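Tape data is not written as ordinary objects to an S3 bucket you can browse. Virtual tapes live in a service-managed virtual tape library, and archived tapes are stored in S3 Glacier or S3 Glacier Deep Archive, where retrieval can take hours.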
Mistake
Storage Gateway can only be used with on-premises infrastructure.
Correct
While typically used on-premises, Storage Gateway can also be deployed on an EC2 instance to provide access to S3 for applications running in AWS that require legacy protocols (e.g., NFS). This is a lesser-known use case.
Cached volumes store the primary data in Amazon S3 and cache only frequently accessed data on-premises. This provides low latency for hot data and reduces on-premises storage needs. Stored volumes store the entire dataset on-premises and asynchronously back up snapshots to S3. Stored volumes are used when you need low latency for all data and have sufficient local storage. On the exam, if a scenario requires consistent low latency for all data, choose stored volumes; if it emphasizes cost savings and only hot data needs speed, choose cached volumes.
Storage Gateway can be deployed on an Amazon EC2 instance. This is useful when you have applications in AWS that require legacy storage protocols like NFS, SMB, or iSCSI to access S3. The gateway VM runs on EC2 and connects to S3. However, the typical use case is on-premises. The exam focuses on the hybrid cloud scenario.
Data in transit is encrypted using SSL/TLS between the gateway and AWS. Data at rest in S3 is encrypted using SSE-S3 or SSE-KMS. The local cache can be encrypted using your own encryption software. The gateway also supports AWS KMS for key management. On the exam, know that encryption is enabled by default for data in transit and at rest in AWS.
When the cache is full, the gateway will evict least recently used (LRU) data to make room for new data. This can cause cache misses for evicted files, leading to higher latency as the gateway fetches data from S3. To avoid this, monitor the CachePercentUsed metric and increase cache size if it consistently exceeds 80%. The exam may test that cache eviction impacts performance.
Virtual tapes can range from 50 GiB to 5 TiB. A Tape Gateway can have up to 1,500 virtual tapes. Tapes are stored in S3 Glacier or S3 Glacier Deep Archive. The exam might ask about these limits or the storage class used.
Storage Gateway provides continuous, low-latency access to cloud storage via local caching, making it suitable for active workloads. DataSync is a data transfer service for moving large datasets to/from AWS, but it does not provide ongoing access; it's for one-time or scheduled migrations. The exam may present a scenario where you need to choose between them based on access pattern.
File Gateway uses Amazon S3 as its backing store. Objects stored in the file share become S3 objects. The gateway supports S3 storage classes like S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, and S3 One Zone-IA. It does not support S3 Glacier directly for active shares, but you can configure lifecycle policies to transition objects to Glacier after a period.