GCDL | Chapter 98 of 101 | Objective 2.3

GCP Storage Options: Blob, Block, File, Archive

This chapter covers the four primary storage options in Google Cloud Platform (GCP): Blob (object) storage via Cloud Storage, Block storage via Persistent Disk, File storage via Filestore, and Archive storage as a tier within Cloud Storage. Understanding the differences is critical for the GCDL exam, as approximately 15-20% of questions touch on storage options, use cases, and cost implications. By the end of this chapter, you will be able to select the appropriate storage service based on performance, durability, availability, and cost requirements, and explain how each service works at a technical level.

25 min read
Intermediate
Updated May 31, 2026

Cloud Storage as a Warehouse System

Imagine a massive warehouse that stores items for a company. The warehouse has different sections: a high-speed shelf for frequently accessed small items (like Cloud CDN in front of Cloud Storage), a large open area for pallets of bulk goods (Cloud Storage buckets), a section for archival documents in deep storage (Archive storage), and a file cabinet system for shared files that need hierarchical organization (Filestore). Each item has a unique barcode (object key) and metadata. When a worker needs an item, they scan the barcode and the warehouse management system locates the item in seconds. The high-speed shelf uses a conveyor belt that brings items to the worker instantly (low latency), while the deep storage is very cheap to keep items in but charges a handling fee every time something is pulled out (retrieval cost) and expects items to stay for at least a year (minimum storage duration). The file cabinet system has folders and subfolders, and multiple workers can access files simultaneously with locks to prevent overwrites (file locking). The warehouse also has a loading dock where trucks deliver and pick up items; this is analogous to the upload/download endpoints. The warehouse manager can set lifecycle policies to automatically move items from the high-speed shelf to the deep storage after 30 days to save costs, mirroring Cloud Storage lifecycle rules.

How It Actually Works

What is Blob (Object) Storage?

Blob storage, known as Cloud Storage in GCP, is a scalable, durable, and highly available object storage service. It stores data as objects within buckets. Each object consists of data, metadata, and a unique key (object name). There is no real hierarchy; objects are stored flat within a bucket, and folders are simulated with name prefixes. Cloud Storage offers four storage classes: Standard, Nearline, Coldline, and Archive. Standard is for frequently accessed data and has no minimum storage duration; Nearline has a 30-day minimum, Coldline a 90-day minimum, and Archive a 365-day minimum. Archive is the cheapest for long-term retention but has the highest retrieval costs. Unlike archive tiers on some other platforms, it has no retrieval delay: objects in every class, including Archive, are available within milliseconds. Objects are immutable; to update one, you overwrite it. Versioning can be enabled to keep historical versions. Data is encrypted at rest by default using Google-managed keys, or you can use customer-managed (CMEK) or customer-supplied (CSEK) keys. Access control is via IAM roles or ACLs.

How Block Storage Works

Block storage in GCP is provided by Persistent Disk (PD). It is a network-attached block device that can be attached to Compute Engine VMs. Data is divided into blocks, each with a unique address. The operating system manages the filesystem on top of the block device. PD offers four types: Standard (HDD, for sequential I/O), Balanced (SSD, for moderate performance), SSD (for higher-performance random I/O), and Extreme (SSD with provisioned IOPS, for the most demanding workloads). Performance scales with disk size and, for Extreme disks, with provisioned IOPS. Snapshots can be taken for backup and can be used to create new disks. Regional persistent disks replicate data across two zones for higher durability. PD supports encryption at rest and in transit.

File Storage with Filestore

Filestore provides managed file storage for applications that require a shared filesystem, typically using NFS v3. It is used for lift-and-shift of legacy applications, media rendering, and data analytics. Filestore offers several service tiers, including Basic HDD, Basic SSD, and High Scale SSD (called Zonal in current documentation), plus Enterprise and Regional tiers that replicate data across zones. Capacity ranges from about 1 TB to 100 TB, depending on the tier. It provides a fully managed NFS server with high throughput and low latency. File access is controlled by NFS export policies and network reachability, while IAM governs who can create and manage instances. Filestore can be mounted by multiple VMs simultaneously, providing a shared filesystem.

Archive Storage – The Deep Archive Tier

Archive storage is not a separate service but a storage class within Cloud Storage. It is designed for data that is accessed less than once a year. The minimum storage duration is 365 days. Retrieval costs are higher than in other classes, but there is no restore step or waiting period: data is available within milliseconds, just as in the other classes. Archive is ideal for regulatory compliance and long-term backup. It offers the same durability (99.999999999% annual) as other classes. Objects can be transitioned to Archive via lifecycle policies.

Key Differences and Use Cases

Blob (Object): Best for static assets, backups, media files, and data lakes. Accessed via HTTP/HTTPS. No need to manage filesystem.

Block: Best for databases, high-performance computing, and operating system disks. Attached to a single VM (except shared disks).

File: Best for shared filesystems, legacy applications, and content management. Supports multiple concurrent readers/writers.

Archive: Best for long-term retention, compliance, and disaster recovery where data is rarely read and retrieval fees are acceptable.
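The use cases above can be read as a small decision procedure. The sketch below is one way to encode it; the `Workload` fields and the `pick_storage` function are hypothetical names invented for illustration, and the ordering of the checks is a simplification rather than official guidance.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload, for illustration only."""
    needs_block_device: bool         # database or OS disk that expects a raw disk
    shared_across_vms: bool          # many VMs read and write the same files
    accessed_less_than_yearly: bool  # long-term retention, rarely read


def pick_storage(w: Workload) -> str:
    """Map a workload to the storage option this chapter suggests."""
    if w.needs_block_device:
        return "Persistent Disk (block)"
    if w.shared_across_vms:
        return "Filestore (file)"
    if w.accessed_less_than_yearly:
        return "Cloud Storage, Archive class"
    return "Cloud Storage (object)"


if __name__ == "__main__":
    database = Workload(True, False, False)
    render_farm = Workload(False, True, False)
    compliance_logs = Workload(False, False, True)
    for name, w in [("database", database),
                    ("render farm", render_farm),
                    ("compliance logs", compliance_logs)]:
        print(f"{name}: {pick_storage(w)}")
```

Real selection usually weighs cost and performance as well, so treat this as a memory aid for the exam-style mapping rather than a sizing tool.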

Performance and Cost Considerations

Blob (Standard): Low latency (first byte in milliseconds) and high throughput. Cost per GB per month: ~$0.020 (Standard).

Block (SSD): IOPS up to 100,000 per instance. Cost: ~$0.170/GB/month for SSD (pd-ssd); Balanced (pd-balanced) is cheaper, at roughly $0.10/GB/month.

File (High Scale SSD): Throughput up to 1.2 GB/s. Cost: ~$0.30/GB/month.

Archive: Cost: ~$0.0012/GB/month. Retrieval cost: $0.05/GB.
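To see how the Archive retrieval fee offsets its low storage price, here is a small worked calculation using only the approximate list prices quoted above. The function names are made up for this sketch, and it deliberately ignores operation charges, network egress, and early-deletion fees.

```python
# Approximate prices from this chapter, in USD.
STANDARD_PER_GB_MONTH = 0.020
ARCHIVE_PER_GB_MONTH = 0.0012
ARCHIVE_RETRIEVAL_PER_GB = 0.05


def monthly_cost(gb, storage_price, retrieval_price=0.0, full_reads_per_month=0.0):
    """Storage cost plus retrieval cost if the whole dataset is read
    `full_reads_per_month` times each month."""
    return gb * (storage_price + retrieval_price * full_reads_per_month)


def break_even_reads_per_month():
    """Number of full reads per month at which Archive costs the same as Standard."""
    return (STANDARD_PER_GB_MONTH - ARCHIVE_PER_GB_MONTH) / ARCHIVE_RETRIEVAL_PER_GB


if __name__ == "__main__":
    gb = 10_000  # a 10 TB dataset
    print("Standard:", round(monthly_cost(gb, STANDARD_PER_GB_MONTH), 2))
    print("Archive, never read:", round(monthly_cost(gb, ARCHIVE_PER_GB_MONTH), 2))
    print("Archive, read fully once a month:",
          round(monthly_cost(gb, ARCHIVE_PER_GB_MONTH, ARCHIVE_RETRIEVAL_PER_GB, 1), 2))
    print("Break-even full reads per month:", round(break_even_reads_per_month(), 3))
```

At these prices the break-even is about 0.38 full reads per month, so a dataset that is read in full more often than roughly once every three months is cheaper in Standard than in Archive.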

Data Durability and Availability

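Cloud Storage is designed for 99.999999999% (11 9's) annual durability in every storage class. Persistent Disk and Filestore publish their own durability and availability figures, so do not assume the 11 9's number applies to them. Availability varies by class and location: Cloud Storage Standard has a 99.95% availability SLA in multi-region and dual-region locations and 99.9% in a single region. For Persistent Disk and Filestore, availability depends on the disk type or service tier and on whether the resource is zonal or regional; check the current SLA for exact values.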
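Durability and availability percentages are easier to compare once converted into concrete quantities. The sketch below turns an availability percentage into allowed downtime per 30-day month, and a durability percentage into the expected number of objects lost per year. Both helper names are hypothetical, and the model is plain arithmetic, not a statement of how any SLA is measured.

```python
def allowed_downtime_minutes(availability_percent, days=30):
    """Minutes of unavailability permitted over `days` at a given availability."""
    return (1 - availability_percent / 100) * days * 24 * 60


def expected_objects_lost_per_year(object_count, durability_percent=99.999999999):
    """Expected annual object loss if each object has the given annual durability."""
    return object_count * (1 - durability_percent / 100)


if __name__ == "__main__":
    for pct in (99.9, 99.95, 99.99):
        print(f"{pct}% availability: "
              f"{allowed_downtime_minutes(pct):.1f} minutes per 30-day month")
    lost = expected_objects_lost_per_year(1_000_000_000)
    print(f"1 billion objects at 11 nines: about {lost:.2f} objects lost per year")
```

The gap between the two ideas is the point: 99.95% availability allows about 21.6 minutes of downtime a month, while 11 nines of durability means that even a billion stored objects are expected to lose about 0.01 objects a year.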

Encryption and Security

All data at rest is encrypted using AES-256 by default.

Data in transit is encrypted using TLS for Cloud Storage and Filestore; for PD, encryption is at the VM level (encrypted in transit between VM and PD using internal GCP mechanisms).

Customer-managed encryption keys (CMEK) are supported for Cloud Storage, PD, and Filestore.

Access control: IAM for Cloud Storage and Filestore; for PD, access is via the VM's service account.

Lifecycle Management

Cloud Storage supports lifecycle policies that can automatically transition objects between storage classes or delete them. For example, you can set a rule to move objects to Nearline after 30 days, to Coldline after 90, and to Archive after 365, then delete after 3650 days. This is a key cost optimization feature.
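The rule described above can be written as a lifecycle configuration file. The sketch below builds that JSON in Python; `build_lifecycle` is a hypothetical helper, and the layout (a top-level `rule` list of `action` and `condition` pairs, with `SetStorageClass` and `Delete` actions and an `age` condition in days) follows the Cloud Storage lifecycle configuration format as I understand it, so verify it against the current documentation before applying it.

```python
import json


def build_lifecycle(transitions, delete_after_days):
    """Build a lifecycle configuration dict.

    transitions: list of (age_in_days, storage_class) pairs.
    delete_after_days: age at which objects are deleted.
    """
    rules = [
        {
            "action": {"type": "SetStorageClass", "storageClass": storage_class},
            "condition": {"age": age},
        }
        for age, storage_class in transitions
    ]
    rules.append({"action": {"type": "Delete"},
                  "condition": {"age": delete_after_days}})
    return {"rule": rules}


# The example from the text: Nearline at 30 days, Coldline at 90,
# Archive at 365, delete at 3650.
config = build_lifecycle(
    [(30, "NEARLINE"), (90, "COLDLINE"), (365, "ARCHIVE")],
    delete_after_days=3650,
)

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

Saving the printed output as lifecycle.json gives a file of the kind passed to the gsutil lifecycle set command listed later in this chapter.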

Transfer Services

Storage Transfer Service: For transferring data from other clouds or on-premises to Cloud Storage.

Transfer Appliance: Physical device for large data transfers (up to 480 TB).

gsutil: Command-line tool for managing Cloud Storage.
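Choosing between an online transfer and a physical appliance usually comes down to how long the upload would take over the available link. The following sketch estimates that time; `transfer_hours` and its `efficiency` parameter are illustrative assumptions (real throughput depends on the network, file sizes, and tooling), and it uses decimal units (1 TB = 10^12 bytes).

```python
def transfer_hours(terabytes, megabits_per_second, efficiency=0.8):
    """Estimate hours to move `terabytes` over a link of the given speed.

    efficiency is the assumed fraction of the link that is usable in practice.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for tb in (1, 10, 100, 480):
        hours = transfer_hours(tb, 1000)  # a 1 Gbps link at 80% efficiency
        print(f"{tb} TB over 1 Gbps: {hours:.1f} hours ({hours / 24:.1f} days)")
```

Even on an ideal 1 Gbps link, 10 TB takes about 22 hours and 480 TB takes more than six weeks, which is why very large datasets are often shipped on an appliance instead.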

Commands and Examples

gsutil commands:
- gsutil mb gs://my-bucket (create a bucket)
- gsutil cp file.txt gs://my-bucket (upload an object)
- gsutil lifecycle set lifecycle.json gs://my-bucket (set a lifecycle policy)
- gsutil ls gs://my-bucket (list objects)

Persistent Disk (the --zone value is an example; substitute your own):
- gcloud compute disks create my-disk --size=100GB --type=pd-standard --zone=us-central1-a
- gcloud compute instances attach-disk my-instance --disk=my-disk --zone=us-central1-a

Filestore (the --zone value is an example; substitute your own):
- gcloud filestore instances create my-filestore --zone=us-central1-a --tier=BASIC_HDD --file-share=name=vol1,capacity=1TB --network=name=default

Integration with Other GCP Services

Cloud Storage integrates with BigQuery for querying data directly from GCS (external tables).

Cloud Functions and Cloud Run can trigger on object changes.

Persistent Disk is used by Compute Engine VMs and by GKE nodes and persistent volumes.

Filestore is used by Compute Engine and GKE for shared storage.

Monitoring and Logging

Cloud Monitoring provides metrics for all storage services (e.g., object count, bytes, IOPS).

Cloud Audit Logs track admin activity and data access for Cloud Storage and Filestore.

For PD, monitoring is via VM metrics.

Best Practices

Use multi-region buckets for global access and high availability.

Use lifecycle policies to reduce costs.

For databases, use Persistent Disk SSD with appropriate IOPS provisioning.

For shared filesystems, use Filestore with the correct tier based on throughput needs.

Always enable versioning for critical data.

Use CMEK for regulatory compliance.

Common Misconfigurations

Using Standard storage for archival data (costly).

Using PD HDD for high IOPS workloads.

Not setting lifecycle policies, leading to unnecessary costs.

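Attaching a PD to multiple VMs and expecting concurrent writes; multi-VM attachment is read-only, so shared writes need a shared filesystem such as Filestore.

Not planning for retrieval fees and the 365-day minimum storage duration with Archive storage.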

Performance Tuning

For Cloud Storage, use parallel composite uploads for large files.

For PD, increase disk size to get more IOPS (HDD IOPS scales with size; SSD IOPS is based on provisioned size and type).

For Filestore, choose High Scale SSD for high throughput workloads.
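Because Persistent Disk IOPS grows with provisioned size, sizing a disk is mostly a multiplication. The sketch below illustrates the idea. The per-GB rates are assumptions chosen for illustration (they are not from this chapter and should be checked against current Compute Engine documentation); the 100,000 per-instance cap is the figure quoted earlier in the chapter, and baseline IOPS and machine-type limits are ignored.

```python
import math

# ASSUMED illustrative rates in IOPS per GB; verify against current docs.
ASSUMED_IOPS_PER_GB = {
    "pd-standard": 0.75,
    "pd-balanced": 6,
    "pd-ssd": 30,
}
PER_INSTANCE_CAP = 100_000  # per-instance figure quoted in this chapter


def estimated_iops(disk_type, size_gb):
    """Rough IOPS estimate for a disk, capped at the per-instance limit."""
    return min(size_gb * ASSUMED_IOPS_PER_GB[disk_type], PER_INSTANCE_CAP)


def size_for_iops(disk_type, target_iops):
    """Smallest size in GB that reaches target_iops under the assumed rate."""
    return math.ceil(target_iops / ASSUMED_IOPS_PER_GB[disk_type])


if __name__ == "__main__":
    print("500 GB pd-ssd:", estimated_iops("pd-ssd", 500), "IOPS")
    print("1000 GB pd-standard:", estimated_iops("pd-standard", 1000), "IOPS")
    print("GB of pd-balanced for 15,000 IOPS:", size_for_iops("pd-balanced", 15_000))
```

Under these assumed rates, reaching a given IOPS target on an HDD-backed disk requires far more capacity than on an SSD-backed one, which is the reasoning behind avoiding PD HDD for high-IOPS workloads.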

Disaster Recovery

Use regional PD for zonal failures.

Use multi-region Cloud Storage buckets for regional failures.

Use backup and DR plans with snapshots and transfer services.

Pricing Models

Cloud Storage: Pay per GB per month, plus retrieval and operation costs.

Persistent Disk: Pay per GB provisioned per month, plus snapshot storage.

Filestore: Pay per GB provisioned per month.

Archive: Cheapest per GB per month, but high retrieval costs.

Summary of Key Values

Cloud Storage object size limit: 5 TB per object.

Persistent Disk max size: 64 TB per disk, for all Persistent Disk types.

Filestore capacity: about 1 TB to 100 TB, depending on the service tier.

Cloud Storage bucket name: globally unique, 3-63 characters.

Minimum storage durations: Nearline 30 days, Coldline 90 days, Archive 365 days.
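The minimum storage durations translate directly into early-deletion charges: deleting an object before the minimum means paying for the remaining days anyway. A minimal sketch of that rule, with hypothetical helper names and day-level granularity assumed for simplicity:

```python
# Minimum storage durations from this chapter, in days.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def billable_days(storage_class, days_stored):
    """Days of storage billed for an object deleted after days_stored."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


def early_deletion_days(storage_class, days_stored):
    """Extra days billed because the object was deleted before the minimum."""
    return billable_days(storage_class, days_stored) - days_stored


if __name__ == "__main__":
    # An Archive object deleted after 100 days is billed as if stored for 365.
    print(billable_days("ARCHIVE", 100), early_deletion_days("ARCHIVE", 100))
    # A Nearline object kept for 45 days is past its minimum: no extra charge.
    print(billable_days("NEARLINE", 45), early_deletion_days("NEARLINE", 45))
```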

Walk-Through

1

Identify Storage Requirements

First, determine the type of data: is it unstructured (blob), structured database (block), or shared files (file)? Also assess performance needs: IOPS, throughput, latency. For archival, consider access frequency and retrieval time tolerance. This step ensures you select the correct service from the start, avoiding costly re-architecture.

2

Choose Storage Class or Tier

For Cloud Storage, pick the appropriate class: Standard for hot data, Nearline for data accessed monthly, Coldline for quarterly, Archive for yearly. For Persistent Disk, choose between Standard (HDD) for sequential I/O and SSD for random I/O. For Filestore, select Basic or High Scale based on throughput. Each choice directly impacts cost and performance.
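The access-frequency guidance for Cloud Storage classes can be expressed as a threshold lookup. This is a sketch of that rule of thumb only; `pick_storage_class` is a hypothetical helper, and real choices should also weigh retrieval fees and minimum storage durations.

```python
def pick_storage_class(days_between_accesses):
    """Suggest a Cloud Storage class from how often the data is read."""
    if days_between_accesses >= 365:
        return "ARCHIVE"    # read about once a year or less
    if days_between_accesses >= 90:
        return "COLDLINE"   # read about once a quarter
    if days_between_accesses >= 30:
        return "NEARLINE"   # read about once a month
    return "STANDARD"       # hot data


if __name__ == "__main__":
    for days in (1, 30, 120, 400):
        print(f"read every {days} days -> {pick_storage_class(days)}")
```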

3

Configure Redundancy and Location

Decide on regional, dual-region, or multi-region for Cloud Storage. For PD, choose zonal or regional. For Filestore, most tiers are zonal; the Enterprise and Regional tiers replicate across zones within a region. Multi-region provides the highest availability but at a higher cost. Regional is cheaper but vulnerable to region failures. This decision affects SLA and disaster recovery.

4

Set Up Access Controls

Use IAM roles for broad access (e.g., roles/storage.objectViewer) and ACLs for finer control on Cloud Storage. For PD, access is controlled via VM permissions. For Filestore, use NFS export policies and IAM. Ensure least privilege to minimize security risk.

5

Implement Lifecycle Policies

For Cloud Storage, create lifecycle rules to transition objects between classes or delete them automatically. For example, move to Nearline after 30 days, then to Archive after 365, then delete after 10 years. This optimizes costs without manual intervention. For PD, use snapshots and automated backup schedules.
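To check a lifecycle plan before applying it, it can help to simulate which class an object would be in at a given age. The sketch below does that for the example in this step. `class_at_age` is a hypothetical helper and a simplification: it assumes rules depend only on age and take effect exactly at the threshold, whereas real lifecycle actions are applied asynchronously.

```python
def class_at_age(age_days, transitions, delete_after_days, initial="STANDARD"):
    """Return the storage class at a given age, or None once deleted.

    transitions: list of (age_in_days, storage_class) pairs.
    """
    if age_days >= delete_after_days:
        return None
    current = initial
    for threshold, storage_class in sorted(transitions):
        if age_days >= threshold:
            current = storage_class
    return current


# The example from this step: Nearline at 30 days, Archive at 365,
# delete after 10 years (3650 days).
RULES = [(30, "NEARLINE"), (365, "ARCHIVE")]

if __name__ == "__main__":
    for age in (10, 200, 400, 3650):
        print(f"day {age}: {class_at_age(age, RULES, 3650)}")
```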

What This Looks Like on the Job

Enterprise Scenario 1: Media Streaming Platform

A video streaming company stores raw footage, transcoded videos, and thumbnails in Cloud Storage. They use Standard class for frequently accessed content (thumbnails and popular videos), Nearline for older videos accessed monthly, and Archive for backups and compliance. They set lifecycle policies to automatically transition objects after 90 days. They use multi-region buckets for global availability and low latency. They also use Persistent Disk SSD for their video processing VMs to handle high IOPS during transcoding. The system handles petabytes of data, with Cloud Storage designed for 11 9's durability. Misconfiguration example: if they used Archive for all content, retrieval fees on every view and early-deletion charges would wipe out the storage savings; if they used Standard for archives, storage costs would be roughly 17x higher at the list prices quoted in this chapter ($0.020 vs. $0.0012 per GB per month).

Enterprise Scenario 2: Financial Services Database Backup

A bank uses Persistent Disk SSD for their Oracle databases running on Compute Engine. They take hourly disk snapshots for fast recovery and also export database backups to a regional Cloud Storage bucket with Nearline class. After 30 days, lifecycle rules move the backups to Coldline, and after 1 year to Archive. They also use Filestore for shared configuration files across multiple application servers. They have strict compliance requirements that call for encryption with CMEK. The design targets high availability for the database and relies on Cloud Storage's 11 9's durability for backups. A common mistake is not setting lifecycle policies, resulting in 40% higher storage costs.

Enterprise Scenario 3: Genomics Research

A research lab generates massive genomics data files (FASTQ, BAM) that are accessed frequently during active research, then archived after project completion. They use Cloud Storage Standard for active projects, then lifecycle rules transition the data to Archive after 6 months. They use Filestore High Scale SSD for shared analysis tools and reference genomes accessed by multiple compute nodes. They use Storage Transfer Service to move data from on-premises. They monitor costs using Cloud Billing reports. A pitfall is assuming Archive is free to read: the data comes back in milliseconds, but every read incurs a retrieval fee and deletions within 365 days are charged, which must be communicated to researchers who might re-run analyses on archived data.

How GCDL Actually Tests This

The GCDL exam (Objective 2.3) focuses on selecting the appropriate storage option based on use case, cost, and performance. Key areas tested:

Storage classes: Know the minimum storage durations (30, 90, 365 days) and typical use cases. The exam often asks: 'Which storage class is most cost-effective for data accessed once a year?' Answer: Archive.

Durability: Cloud Storage is designed for 99.999999999% (11 9's) annual durability in every storage class. A common trick is to suggest that cheaper classes such as Archive are less durable; they are not.

Availability: Cloud Storage Standard has a 99.95% availability SLA in multi-region and dual-region locations and 99.9% in a single region. If the exam asks which location type offers the highest availability, the answer is multi-region or dual-region, because data is stored redundantly across regions. Cloud Storage is strongly consistent, so consistency is not the trade-off here. Persistent Disk and Filestore availability depends on the disk type or service tier and on zonal versus regional deployment.

Block vs Object vs File: Typical question: 'Which storage type is best for a shared filesystem?' Answer: Filestore (file). 'Which for a database?' Answer: Persistent Disk (block). 'Which for static website assets?' Answer: Cloud Storage (object).

Lifecycle policies: Understand that they can move objects between classes but cannot change bucket location.

Encryption: Know that all data is encrypted at rest by default. CMEK is for customer control.

Common wrong answers:
1. Choosing Filestore for a database (should be block).
2. Thinking Archive storage is a separate service (it's a class within Cloud Storage).
3. Confusing durability (11 9's) with availability (99.95% etc.).
4. Assuming Archive has a retrieval delay. All classes return data in milliseconds; they differ in retrieval fees, minimum storage duration, and availability.

Exam tips: Eliminate options that don't match the access pattern. If data is accessed frequently, eliminate Archive and Coldline. If data needs to be shared across VMs, eliminate block (unless using shared PD with read-only). Always check for 'minimum storage duration' questions, which appear often.

Key Takeaways

Cloud Storage offers four classes: Standard, Nearline (30-day min), Coldline (90-day min), Archive (365-day min).

Cloud Storage is designed for 99.999999999% (11 9's) annual durability in every storage class.

Persistent Disk is block storage; attach to a single VM (read-write) or multiple VMs (read-only).

Filestore is managed NFS v3 file storage for shared access from multiple VMs.

Archive storage is a Cloud Storage class, not a separate service.

Lifecycle policies can automatically transition objects between storage classes to optimize cost.

Encryption at rest is enabled by default for all storage services; CMEK available for customer-managed keys.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Cloud Storage (Object)

Unstructured data (images, videos, backups)

Accessed via HTTP/HTTPS

No filesystem management

11 9's durability, 99.95% availability (multi-region)

Pay per GB stored + operations

Persistent Disk (Block)

Structured data (databases, OS disks)

Attached as a block device to VMs

Requires filesystem (ext4, NTFS)

Zonal by default; regional PD replicates across two zones (durability and availability depend on disk type)

Pay per GB provisioned

Filestore (File)

Shared filesystem for multiple VMs

NFS v3 protocol

Hierarchical directory structure

Throughput up to 1.2 GB/s (High Scale)

Pay per GB provisioned

Cloud Storage (Object)

Object storage for individual files

RESTful API access

Flat namespace (no folders, but prefix simulation)

Scalable to exabytes

Pay per GB stored + operations

Watch Out for These

Mistake

Cloud Storage is only for static files.

Correct

Cloud Storage can trigger Cloud Functions when objects change, host static websites, and serve as a data lake for analytics. It supports object versioning and lifecycle management.

Mistake

Persistent Disk is the same as local SSD.

Correct

Persistent Disk is network-attached block storage, while local SSD is physically attached to the host. Local SSD provides higher IOPS but data is lost on VM stop/delete; PD persists data.

Mistake

Filestore supports any protocol like SMB or iSCSI.

Correct

Filestore is an NFS service (NFS v3 for exam purposes). It does not support SMB/CIFS or iSCSI. For SMB, you need a third-party or partner solution such as NetApp Cloud Volumes ONTAP.

Mistake

Archive storage is a separate service with different durability.

Correct

Archive is a storage class within Cloud Storage with the same 11 9's durability as other classes. It is not a separate service.

Mistake

You can attach a Persistent Disk to multiple VMs in read-write mode.

Correct

Persistent Disk can be attached to multiple VMs only in read-only mode. For shared read-write, you need a shared filesystem like Filestore.


Frequently Asked Questions

What is the difference between Cloud Storage and Persistent Disk?

Cloud Storage is object storage for unstructured data accessed via HTTP. Persistent Disk is block storage attached to VMs for databases and OS disks. Cloud Storage is accessed globally, while PD is zonal or regional. Cloud Storage bills per GB stored; PD bills per GB provisioned.

Can I use Cloud Storage as a filesystem?

Not natively. You can use tools like gcsfuse to mount a Cloud Storage bucket as a filesystem, but it's not a true POSIX filesystem. For a shared filesystem, use Filestore.

What is the minimum storage duration for Archive class?

365 days. If you delete an object before 365 days, you are billed for the remaining days. This is a common exam point.

Which storage option is best for a MySQL database?

Persistent Disk (block storage) with SSD. MySQL requires block-level access for performance and consistency. Cloud Storage is not suitable for databases.

How do I transfer large amounts of data to Cloud Storage?

For online transfers, use Storage Transfer Service or gsutil. For very large datasets (over 10 TB), use Transfer Appliance, a physical device shipped to Google.

Does Filestore support SMB protocol?

No. Filestore is an NFS service (NFS v3 for exam purposes). For SMB/CIFS, you need a third-party or partner solution such as NetApp Cloud Volumes ONTAP.

What is the maximum object size in Cloud Storage?

5 TB per object. The limit also applies to composite objects, so larger datasets must be split across multiple objects.
