This chapter covers the four primary storage options in Google Cloud Platform (GCP): Blob (object) storage via Cloud Storage, Block storage via Persistent Disk, File storage via Filestore, and Archive storage as a tier within Cloud Storage. Understanding the differences is critical for the GCDL exam, as approximately 15-20% of questions touch on storage options, use cases, and cost implications. By the end of this chapter, you will be able to select the appropriate storage service based on performance, durability, availability, and cost requirements, and explain how each service works at a technical level.
Imagine a massive warehouse that stores items for a company. The warehouse has different sections: a high-speed shelf near the exit for frequently requested small items (like Cloud CDN caching content from Cloud Storage), a large open area for pallets of bulk goods (Cloud Storage buckets), a deep-storage section for archival documents (Archive storage), and a file cabinet system for shared files that need hierarchical organization (Filestore). Each item has a unique barcode (object key) and metadata. When a worker needs an item, they scan the barcode and the warehouse management system locates it in seconds. The high-speed shelf hands items to the worker almost instantly (low latency). Deep storage is very cheap to keep items in, but every retrieval carries a handling fee and items must stay there for a minimum period (retrieval fees and minimum storage duration); in Cloud Storage, Archive data is still readable within milliseconds, so there is no waiting period. The file cabinet system has folders and subfolders, and multiple workers can access files simultaneously, using locks to prevent conflicting overwrites (file locking). The warehouse also has a loading dock where trucks deliver and pick up items; this is analogous to the upload/download endpoints. The warehouse manager can set policies that automatically move items to deep storage after 30 days to save costs, mirroring Cloud Storage lifecycle rules.
What is Blob (Object) Storage?
Blob storage, known as Cloud Storage in GCP, is a scalable, durable, and highly available object storage service. It stores data as objects within buckets. Each object consists of data, metadata, and a unique key (object name). There is no hierarchy; objects are stored flat within a bucket. Cloud Storage offers four storage classes: Standard, Nearline, Coldline, and Archive. Standard is for frequently accessed data; Nearline (30-day minimum storage duration), Coldline (90-day minimum), and Archive (365-day minimum) are progressively cheaper to store and more expensive to retrieve. Archive is the cheapest for long-term retention and has the highest retrieval costs, but like every Cloud Storage class it serves reads with millisecond latency. Objects are immutable; to update one, you replace it with a new object. Versioning can be enabled to keep historical versions. Encryption at rest is on by default using Google-managed keys; you can instead use customer-managed (CMEK) or customer-supplied (CSEK) encryption keys. Access control is via IAM roles or ACLs.
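The flat namespace is easy to misread, because tools display "folders". Those folders are just shared name prefixes: the Cloud Storage list operation accepts a prefix and a delimiter and rolls deeper names up into folder-like prefixes. The sketch below emulates that behavior locally over a hypothetical list of object names; `list_with_delimiter` is an illustrative helper, not part of any Google client library, and it does not call the real API.

```python
def list_with_delimiter(names, prefix="", delimiter="/"):
    """Emulate a prefix/delimiter listing over a flat set of object names.

    Returns (objects, prefixes): names that sit directly "under" the prefix,
    and the distinct folder-like prefixes that roll up any deeper names.
    """
    objects, prefixes = [], set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            # No further delimiter: this is an object at this "level".
            objects.append(name)
        else:
            # Deeper name: report only the next prefix segment.
            prefixes.add(prefix + rest[: cut + len(delimiter)])
    return objects, sorted(prefixes)


# Hypothetical bucket contents: four objects, no real folders.
bucket = ["logs/2024/app.log", "logs/2024/db.log", "logs/readme.txt", "index.html"]
print(list_with_delimiter(bucket, prefix="logs/"))
```

No object named `logs/2024/` exists here; the prefix is derived from object names on each listing, which is why, in a flat-namespace bucket, "renaming a folder" means rewriting every object under that prefix.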
How Block Storage Works
Block storage in GCP is provided by Persistent Disk (PD), a network-attached block device that you attach to Compute Engine VMs. Data is divided into blocks, each with a unique address, and the VM's operating system manages the filesystem on top of the block device. PD offers four types: Standard (pd-standard, HDD, suited to sequential I/O), Balanced (pd-balanced, SSD, balancing cost and performance), SSD (pd-ssd, for high random IOPS), and Extreme (pd-extreme, SSD, for the most demanding workloads). For Standard, Balanced, and SSD disks, performance scales with disk size; Extreme disks let you provision IOPS directly. Snapshots can be taken for backup and used to create new disks. Regional persistent disks synchronously replicate data across two zones in the same region, so the disk survives a zonal outage. PD supports encryption at rest and in transit.
File Storage with Filestore
Filestore provides managed file storage for applications that require a shared filesystem, typically using NFS v3. It is used for lift-and-shift of legacy applications, media rendering, and data analytics. Filestore service tiers include Basic HDD, Basic SSD, and High Scale SSD, and capacity depends on the tier, from a 1 TB minimum (Basic HDD) up to 100 TB (High Scale SSD). It provides a fully managed NFS server with high throughput and low latency. IP-based NFS export settings control which clients can mount a share, and IAM controls who can create and manage Filestore instances. Filestore can be mounted by multiple VMs simultaneously, providing a shared filesystem.
Archive Storage – The Deep Archive Tier
Archive storage is not a separate service but a storage class within Cloud Storage. It is designed for data that is accessed less than once a year, and its minimum storage duration is 365 days. Retrieval costs are higher than for the other classes, but reads still return data within milliseconds: unlike tape-style archive tiers, there is no restore step or waiting period. Archive is ideal for regulatory compliance and long-term backup. It offers the same durability (99.999999999% annual) as the other classes. Objects can be transitioned to Archive via lifecycle policies.
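The minimum storage duration is what makes Archive expensive for short-lived data: deleting or replacing an object early is billed as if it had been stored for the full minimum. A rough estimate is sketched below, assuming the remaining days are charged at the class's storage price prorated over a 30-day month (a simplification for illustration; actual charges are computed by Google's billing). `early_deletion_charge` is a hypothetical helper.

```python
# Minimum storage durations (days) quoted in this chapter.
MIN_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def early_deletion_charge(storage_class, size_gb, days_stored, price_per_gb_month):
    """Approximate charge for deleting an object before its minimum duration.

    ASSUMPTION: the remaining days are billed at the class's storage price,
    prorated using a 30-day month.
    """
    remaining_days = max(0, MIN_DAYS[storage_class] - days_stored)
    return size_gb * price_per_gb_month * remaining_days / 30


# 1,000 GB in Archive (~$0.0012/GB/month) deleted after 65 days:
print(round(early_deletion_charge("ARCHIVE", 1000, 65, 0.0012), 2))  # prints 12.0
```

The object is billed for the 300 days it "still owed", so the early-deletion charge alone is ten times its monthly storage cost.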
Key Differences and Use Cases
Blob (Object): Best for static assets, backups, media files, and data lakes. Accessed via HTTP/HTTPS. No need to manage filesystem.
Block: Best for databases, high-performance computing, and operating system disks. Attached to a single VM (except shared disks).
File: Best for shared filesystems, legacy applications, and content management. Supports multiple concurrent readers/writers.
Archive: Best for long-term retention, compliance, and disaster recovery where retrieval time is not critical.
Performance and Cost Considerations
Blob (Standard): Low latency (first byte in milliseconds) and high throughput. Cost per GB per month: ~$0.020 (Standard).
Block (SSD persistent disk): IOPS up to 100,000 per VM. Cost: ~$0.170/GB/month (pd-ssd; Balanced is cheaper, at ~$0.100/GB/month).
File (Basic SSD): Throughput up to about 1.2 GB/s (High Scale SSD scales higher). Cost: ~$0.30/GB/month.
Archive: Cost: ~$0.0012/GB/month. Retrieval cost: $0.05/GB.
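To compare these figures for the same amount of data, here is a back-of-the-envelope calculation using only the approximate prices listed above. Real prices vary by region and change over time, so treat the dictionary as assumed inputs. Persistent Disk and Filestore bill for provisioned capacity, so their figures apply even if the volume is half empty.

```python
# ASSUMED approximate list prices from this section, USD per GB per month.
PRICE_PER_GB_MONTH = {
    "Cloud Storage Standard": 0.020,
    "Persistent Disk SSD": 0.170,
    "Filestore": 0.30,
    "Cloud Storage Archive": 0.0012,
}
ARCHIVE_RETRIEVAL_PER_GB = 0.05  # ASSUMED, from the Archive line above


def monthly_cost(service: str, size_gb: float) -> float:
    """Storage cost for one month at the assumed per-GB price."""
    return size_gb * PRICE_PER_GB_MONTH[service]


size = 1000  # GB
for service in PRICE_PER_GB_MONTH:
    print(f"{service:<24} ${monthly_cost(service, size):>8.2f}/month")
print(f"Reading all {size} GB back from Archive once: ${size * ARCHIVE_RETRIEVAL_PER_GB:.2f}")
```

Under these assumptions, a single full read of the Archive data ($50) costs about as much as 40 months of storing it ($1.20 per month), which is why Archive only pays off for data you rarely expect to read.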
Data Durability and Availability
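Cloud Storage is designed for 99.999999999% (11 9's) annual durability in every storage class. Persistent Disk and Filestore do not advertise the same 11-nines figure; Google publishes separate, lower durability targets for Persistent Disk, which is why snapshots and backups still matter. Availability varies by location and class: Cloud Storage Standard has typical availability above 99.99% in multi-region and dual-region buckets and 99.99% in regional buckets, with SLAs of 99.95% and 99.9% respectively. PD offers 99.99% availability within a zone. Filestore offers 99.95% availability.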
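Durability and availability are easy to mix up: durability is about not losing data, availability is about being able to reach it. Converting an availability percentage into permitted downtime makes the difference concrete. The helper below is a plain arithmetic sketch assuming a 30-day month; it says nothing about durability.

```python
def max_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of unavailability allowed by an availability percentage."""
    return (1 - availability_pct / 100) * days * 24 * 60


for pct in (99.9, 99.95, 99.99):
    print(f"{pct}% -> {max_downtime_minutes(pct):.1f} minutes per 30-day month")
```

At 99.95%, that is about 21.6 minutes of allowed downtime per 30-day month; at 99.99%, about 4.3 minutes.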
Encryption and Security
All data at rest is encrypted using AES-256 by default.
Data in transit is encrypted using TLS for Cloud Storage and Filestore; for PD, encryption is at the VM level (encrypted in transit between VM and PD using internal GCP mechanisms).
Customer-managed encryption keys (CMEK) are supported for Cloud Storage, PD, and Filestore.
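Access control: IAM for Cloud Storage and Filestore. For PD, IAM permissions on Compute Engine control who can create, attach, and snapshot disks; once a disk is attached, access to its data is governed by the VM's operating system.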
Lifecycle Management
Cloud Storage supports lifecycle policies that can automatically transition objects between storage classes or delete them. For example, you can set a rule to move objects to Nearline after 30 days, to Coldline after 90, and to Archive after 365, then delete after 3650 days. This is a key cost optimization feature.
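Lifecycle rules are written as JSON. The sketch below builds the rule set described above (Nearline at 30 days, Coldline at 90, Archive at 365, delete at 3,650) in the `{"rule": [...]}` shape that `gsutil lifecycle set lifecycle.json gs://my-bucket` reads. The helper name `lifecycle_config` is hypothetical, and it assumes objects start in the STANDARD class; the `matchesStorageClass` condition scopes each transition to objects still in the previous class, so the rules do not overlap.

```python
import json


def lifecycle_config(transitions, delete_after_days):
    """Build a Cloud Storage lifecycle config as a dict.

    transitions: list of (age_in_days, target_storage_class), ascending by age.
    ASSUMPTION: objects are uploaded in the STANDARD class.
    """
    rules = []
    previous = "STANDARD"
    for age, target in transitions:
        rules.append({
            "action": {"type": "SetStorageClass", "storageClass": target},
            # Only match objects still in the previous class.
            "condition": {"age": age, "matchesStorageClass": [previous]},
        })
        previous = target
    # Final rule: delete objects once they reach the retention limit.
    rules.append({"action": {"type": "Delete"}, "condition": {"age": delete_after_days}})
    return {"rule": rules}


config = lifecycle_config(
    [(30, "NEARLINE"), (90, "COLDLINE"), (365, "ARCHIVE")], delete_after_days=3650
)
with open("lifecycle.json", "w") as f:
    json.dump(config, f, indent=2)
```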
Transfer Services
Storage Transfer Service: For transferring data from other clouds or on-premises to Cloud Storage.
Transfer Appliance: Physical device for large data transfers (up to 480 TB).
gsutil: Command-line tool for managing Cloud Storage.
Commands and Examples
gsutil commands:
- gsutil mb gs://my-bucket – create bucket
- gsutil cp file.txt gs://my-bucket – upload object
- gsutil lifecycle set lifecycle.json gs://my-bucket – set lifecycle policy
- gsutil ls gs://my-bucket – list objects
Persistent Disk:
- gcloud compute disks create my-disk --size=100GB --type=pd-standard --zone=us-central1-a
- gcloud compute instances attach-disk my-instance --disk=my-disk --zone=us-central1-a
Filestore:
- gcloud filestore instances create my-filestore --zone=us-central1-a --tier=BASIC_HDD --file-share=name=vol1,capacity=1TB --network=name=default
Integration with Other GCP Services
Cloud Storage integrates with BigQuery for querying data directly from GCS (external tables).
Cloud Functions and Cloud Run can trigger on object changes.
Persistent Disk is used by Compute Engine VMs and by GKE (as persistent volumes).
Filestore is used by Compute Engine and GKE for shared storage.
Monitoring and Logging
Cloud Monitoring provides metrics for all storage services (e.g., object count, bytes, IOPS).
Cloud Audit Logs track admin activity and data access for Cloud Storage and Filestore.
For PD, monitoring is via VM metrics.
Best Practices
Use multi-region buckets for global access and high availability.
Use lifecycle policies to reduce costs.
For databases, use Persistent Disk SSD with appropriate IOPS provisioning.
For shared filesystems, use Filestore with the correct tier based on throughput needs.
Always enable versioning for critical data.
Use CMEK for regulatory compliance.
Common Misconfigurations
Using Standard storage for archival data (costly).
Using PD HDD for high IOPS workloads.
Not setting lifecycle policies, leading to unnecessary costs.
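Attaching one Persistent Disk to multiple VMs as writers without a cluster-aware filesystem; use read-only mode for shared PDs, or Filestore for shared read-write access.
Not planning for Archive retrieval fees and the 365-day minimum storage duration.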
Performance Tuning
For Cloud Storage, use parallel composite uploads for large files.
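For PD, size disks for performance as well as capacity: IOPS and throughput for Standard, Balanced, and SSD disks grow with disk size (up to per-VM limits that depend on the machine type), while Extreme disks use provisioned IOPS.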
For Filestore, choose High Scale SSD for high throughput workloads.
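Because Persistent Disk performance grows with size, a faster disk type can be cheaper than an over-provisioned slow one. The sketch below estimates the disk size needed for a target read IOPS. The per-GB rates in `ASSUMED_READ_IOPS_PER_GB` are assumptions for illustration, approximating figures Google has published; baseline IOPS, write IOPS, and per-VM caps (which depend on machine type) are left out or passed in by you, so check the current Persistent Disk performance documentation before sizing real disks.

```python
import math

# ASSUMED read IOPS per GB of provisioned size; illustration only.
# Baseline IOPS and machine-type-specific caps are deliberately ignored.
ASSUMED_READ_IOPS_PER_GB = {"pd-standard": 0.75, "pd-balanced": 6, "pd-ssd": 30}


def estimated_read_iops(disk_type: str, size_gb: int, per_vm_cap: int) -> float:
    """Size-proportional read IOPS, clipped at a per-VM cap you supply."""
    return min(size_gb * ASSUMED_READ_IOPS_PER_GB[disk_type], per_vm_cap)


def size_for_target_iops(disk_type: str, target_iops: int) -> int:
    """Smallest size (GB) whose size-proportional read IOPS reaches the target."""
    return math.ceil(target_iops / ASSUMED_READ_IOPS_PER_GB[disk_type])


for disk_type in ASSUMED_READ_IOPS_PER_GB:
    print(disk_type, size_for_target_iops(disk_type, 15_000), "GB")
```

Under these assumptions, 15,000 read IOPS needs roughly 500 GB of pd-ssd but about 20,000 GB of pd-standard, which illustrates why HDD disks are a poor fit for high-IOPS workloads.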
Disaster Recovery
Use regional PD for zonal failures.
Use multi-region Cloud Storage buckets for regional failures.
Use backup and DR plans with snapshots and transfer services.
Pricing Models
Cloud Storage: Pay per GB per month, plus retrieval and operation costs.
Persistent Disk: Pay per GB provisioned per month, plus snapshot storage.
Filestore: Pay per GB provisioned per month.
Archive: Cheapest per GB per month, but high retrieval costs.
Summary of Key Values
Cloud Storage object size limit: 5 TB per object.
Persistent Disk max size: 64 TB per disk.
Filestore capacity: 1 TB to 100 TB.
Cloud Storage bucket name: globally unique, 3-63 characters.
Minimum storage durations: Nearline 30 days, Coldline 90 days, Archive 365 days.
Identify Storage Requirements
First, determine the type of data: is it unstructured (blob), structured database (block), or shared files (file)? Also assess performance needs: IOPS, throughput, latency. For archival, consider access frequency and retrieval time tolerance. This step ensures you select the correct service from the start, avoiding costly re-architecture.
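This first step can be written down as a small decision function. It is a simplified heuristic that follows this chapter's guidance, not an official Google decision tree; the `Requirements` fields and the `pick_service` name are hypothetical, and real designs also weigh the performance, location, and cost factors covered in the following steps.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    database_or_boot_disk: bool     # low-latency block access from a VM
    needs_mounted_filesystem: bool  # application expects a POSIX-style mount
    shared_across_vms: bool         # several VMs read and write the same files
    read_less_than_yearly: bool     # archival data kept mainly for retention


def pick_service(req: Requirements) -> str:
    """First-pass mapping from requirements to a storage option (heuristic)."""
    if req.database_or_boot_disk:
        return "Persistent Disk (block)"
    if req.needs_mounted_filesystem:
        # Shared read-write access calls for a managed NFS share;
        # a single VM can simply format and mount a Persistent Disk.
        return "Filestore (file)" if req.shared_across_vms else "Persistent Disk (block)"
    if req.read_less_than_yearly:
        return "Cloud Storage, Archive class (object)"
    return "Cloud Storage, Standard class (object)"


# Legacy app on several VMs that expects a shared NFS mount:
print(pick_service(Requirements(False, True, True, False)))  # Filestore (file)
```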
Choose Storage Class or Tier
For Cloud Storage, pick the appropriate class: Standard for hot data, Nearline for data accessed monthly, Coldline for quarterly, Archive for yearly. For Persistent Disk, choose between Standard (HDD) for sequential I/O and SSD for random I/O. For Filestore, select Basic or High Scale based on throughput. Each choice directly impacts cost and performance.
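For Cloud Storage, the access-frequency guidance above reduces to a lookup on how often you expect to read an object. The thresholds in this sketch are an assumption that lines up with the minimum storage durations (30, 90, and 365 days); `storage_class_for` is a hypothetical helper, and a real decision should also account for retrieval fees and the volume of data read.

```python
def storage_class_for(days_between_reads: float) -> str:
    """Map a typical interval between reads to a Cloud Storage class.

    A rule of thumb based on the 30/90/365-day minimum storage durations,
    not a billing calculation.
    """
    if days_between_reads < 30:
        return "STANDARD"
    if days_between_reads < 90:
        return "NEARLINE"
    if days_between_reads < 365:
        return "COLDLINE"
    return "ARCHIVE"


print([storage_class_for(d) for d in (1, 30, 120, 400)])
# ['STANDARD', 'NEARLINE', 'COLDLINE', 'ARCHIVE']
```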
Configure Redundancy and Location
Decide on regional, dual-region, or multi-region for Cloud Storage. For PD, choose zonal or regional. For Filestore, Basic and High Scale SSD instances are zonal, while the Enterprise tier offers regional availability. Multi-region provides the highest availability at a higher cost. Regional is cheaper but vulnerable to region failures. This decision affects the SLA and disaster recovery.
Set Up Access Controls
Use IAM roles for broad access (e.g., roles/storage.objectViewer) and ACLs for finer control on Cloud Storage. For PD, access is controlled via VM permissions. For Filestore, use NFS export policies and IAM. Ensure least privilege to minimize security risk.
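Least privilege usually means granting a narrow role such as `roles/storage.objectViewer` to a specific member rather than a broad admin role. IAM policies are JSON documents containing a list of role bindings, and changes are typically made read-modify-write: fetch the policy (for example with `gsutil iam get gs://my-bucket`), add the binding, and write it back (`gsutil iam set policy.json gs://my-bucket`). The sketch below shows only the middle step on a plain dictionary; the `grant` helper and the member addresses are hypothetical, and conditional bindings are left out for simplicity.

```python
import copy


def grant(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy with `member` added to `role`.

    Handles unconditional bindings only; re-granting an existing member
    is a no-op, so the function is safe to re-run.
    """
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role and "condition" not in binding:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical existing policy with one admin group.
policy = {"bindings": [{"role": "roles/storage.admin",
                        "members": ["group:storage-admins@example.com"]}]}
policy = grant(policy, "roles/storage.objectViewer", "user:analyst@example.com")
print(policy["bindings"][-1])
```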
Implement Lifecycle Policies
For Cloud Storage, create lifecycle rules to transition objects between classes or delete them automatically. For example, move to Nearline after 30 days, then to Archive after 365, then delete after 10 years. This optimizes costs without manual intervention. For PD, use snapshots and automated backup schedules.
Enterprise Scenario 1: Media Streaming Platform
A video streaming company stores raw footage, transcoded videos, and thumbnails in Cloud Storage. They use Standard class for frequently accessed content (thumbnails and popular videos), Nearline for older videos accessed monthly, and Archive for backups and compliance. They set lifecycle policies to automatically transition objects after 90 days. They use multi-region buckets for global availability and low latency. They also use Persistent Disk SSD for their video processing VMs to handle high IOPS during transcoding. The system handles petabytes of data with Cloud Storage's 11 9's durability. Misconfiguration example: if they used Archive for all content, per-GB retrieval fees on every view would quickly outweigh the storage savings; if they used Standard for archives, storage would cost more than 15 times as much per GB ($0.020 vs. ~$0.0012 per GB per month).
Enterprise Scenario 2: Financial Services Database Backup
A bank uses Persistent Disk SSD for their Oracle databases running on Compute Engine. They take hourly PD snapshots for fast restores and export database backups to a regional Cloud Storage bucket in the Nearline class. After 30 days, lifecycle rules move the backups to Coldline, and after 1 year to Archive. They also use Filestore for shared configuration files across multiple application servers. They have strict compliance requiring encryption with CMEK. The system provides 99.99% availability for the database and 11 9's durability for backups. A common mistake is not setting lifecycle policies, resulting in 40% higher storage costs.
Enterprise Scenario 3: Genomics Research
A research lab generates massive genomics data files (FASTQ, BAM) that are accessed frequently during active research, then archived after project completion. They use Cloud Storage Standard for active projects, then lifecycle transitions to Archive after 6 months. They use Filestore High Scale SSD for shared analysis tools and reference genomes accessed by multiple compute nodes. They use Storage Transfer Service to move data from on-premises. They monitor costs using Cloud Billing reports. A pitfall is assuming archived data is free to read back: Archive data is available within milliseconds, but every retrieval is billed per GB and objects deleted within 365 days incur early-deletion charges, which must be communicated to researchers.
The GCDL exam (Objective 2.3) focuses on selecting the appropriate storage option based on use case, cost, and performance. Key areas tested:
- Storage classes: Know the minimum storage durations (30, 90, 365 days) and typical use cases. The exam often asks: 'Which storage class is most cost-effective for data accessed once a year?' Answer: Archive.
- Durability: Every Cloud Storage class offers 99.999999999% (11 9's) annual durability. This is a common trick: candidates might think colder classes are less durable, but they aren't; the classes differ in price, minimum storage duration, retrieval fees, and availability.
- Availability: Multi-region and dual-region Cloud Storage (Standard): 99.95% SLA; regional Cloud Storage (Standard): 99.9% SLA; PD: 99.99% (zonal); Filestore: 99.95%. If asked which Cloud Storage location type has the highest availability, the answer is multi-region or dual-region, because the data is stored redundantly across regions. Cloud Storage is strongly consistent in every location type, so consistency is not the trade-off.
- Block vs Object vs File: Typical question: 'Which storage type is best for a shared filesystem?' Answer: Filestore (file). 'Which for a database?' Answer: Persistent Disk (block). 'Which for static website assets?' Answer: Cloud Storage (object).
- Lifecycle policies: Understand that they can move objects between classes but cannot change bucket location.
- Encryption: Know that all data is encrypted at rest by default. CMEK is for customer control.
- Common wrong answers:
  1. Choosing Filestore for a database (should be block).
  2. Thinking Archive storage is a separate service (it's a class within Cloud Storage).
  3. Confusing durability (11 9's) with availability (99.95% etc.).
  4. Assuming colder classes are slower to read. All Cloud Storage classes return data within milliseconds; they differ in retrieval cost.
- Exam tips: Eliminate options that don't match the access pattern. If data is accessed frequently, eliminate Archive and Coldline. If data needs to be shared across VMs, eliminate block (unless a read-only shared PD is enough). Always check for 'minimum storage duration' questions; they love those.
Cloud Storage offers four classes: Standard, Nearline (30-day min), Coldline (90-day min), Archive (365-day min).
All Cloud Storage classes provide 99.999999999% (11 9's) annual durability.
Persistent Disk is block storage; attach to a single VM (read-write) or multiple VMs (read-only).
Filestore is managed NFS v3 file storage for shared access from multiple VMs.
Archive storage is a Cloud Storage class, not a separate service.
Lifecycle policies can automatically transition objects between storage classes to optimize cost.
Encryption at rest is enabled by default for all storage services; CMEK available for customer-managed keys.
These come up on the exam all the time. Here's how to tell them apart.
Cloud Storage (Object)
Unstructured data (images, videos, backups)
Accessed via HTTP/HTTPS
No filesystem management
11 9's durability, 99.95% availability (multi-region)
Pay per GB stored + operations
Persistent Disk (Block)
Structured data (databases, OS disks)
Attached as a block device to VMs
Requires filesystem (ext4, NTFS)
Lower published durability than Cloud Storage's 11 9's (protect with snapshots); 99.99% availability (zonal)
Pay per GB provisioned
Filestore (File)
Shared filesystem for multiple VMs
NFS v3 protocol
Hierarchical directory structure
Throughput up to about 1.2 GB/s (Basic SSD); High Scale SSD goes higher
Pay per GB provisioned
Cloud Storage (Object)
Object storage for individual files
RESTful API access
Flat namespace (no folders, but prefix simulation)
Scalable to exabytes
Pay per GB stored + operations
Mistake
Cloud Storage is only for static files.
Correct
Cloud Storage can trigger Cloud Functions on object changes for event-driven processing, host static websites, and serve as a data lake for analytics. It supports object versioning and lifecycle management.
Mistake
Persistent Disk is the same as local SSD.
Correct
Persistent Disk is network-attached block storage, while local SSD is physically attached to the host. Local SSD provides higher IOPS but data is lost on VM stop/delete; PD persists data.
Mistake
Filestore supports any protocol like SMB or iSCSI.
Correct
Filestore only supports NFS v3. It does not support SMB/CIFS or iSCSI. For SMB, you need a third-party solution or use Cloud Storage with a file gateway.
Mistake
Archive storage is a separate service with different durability.
Correct
Archive is a storage class within Cloud Storage with the same 11 9's durability as other classes. It is not a separate service.
Mistake
You can attach a Persistent Disk to multiple VMs in read-write mode.
Correct
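In standard configurations, a Persistent Disk can be attached to multiple VMs only in read-only mode. Multi-writer mode exists for some disk types but requires a cluster-aware filesystem; for ordinary shared read-write access, use a shared filesystem like Filestore.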
Cloud Storage is object storage for unstructured data accessed via HTTP. Persistent Disk is block storage attached to VMs for databases and OS disks. Cloud Storage is accessed globally, while PD is zonal (or regional) and reachable only from the VMs it is attached to. Cloud Storage is billed per GB stored; PD is billed per GB provisioned.
Not natively. You can use tools like gcsfuse to mount a Cloud Storage bucket as a filesystem, but it's not a true POSIX filesystem. For a shared filesystem, use Filestore.
365 days. If you delete an object before 365 days, you are billed for the remaining days. This is a common exam point.
Persistent Disk (block storage) with SSD. MySQL requires block-level access for performance and consistency. Cloud Storage is not suitable for databases.
For online transfers, use Storage Transfer Service or gsutil. For very large datasets (over 10 TB), use Transfer Appliance, a physical device shipped to Google.
No, Filestore only supports NFS v3. For SMB/CIFS, you need a third-party solution such as NetApp Cloud Volumes ONTAP.
5 TB per object. Data larger than that must be split across multiple objects; composite objects are subject to the same 5 TB limit.