This chapter covers Google Cloud's storage options: Cloud Storage, Persistent Disk, and Filestore. It details their use cases, performance characteristics, and how to choose between them. For the GCDL exam, storage questions appear in roughly 10-15% of the total, often as part of scenario-based questions. Understanding the differences between object, block, and file storage is critical for architecting cost-effective and performant solutions.
Imagine a global network of public warehouses, each with different services. Cloud Storage is like a standard self-service warehouse where you rent shelves (buckets) and place boxes (objects) on them. You can access any box from anywhere using a unique address (URL). The warehouse is infinitely scalable — you never run out of shelves, and you only pay for the space you use. Filestore is like a dedicated storage room you can rent inside a shared office building. Multiple employees (VMs) can access the same files simultaneously, but the room has a fixed size and performance level. Persistent Disk is like a high-speed external hard drive you plug directly into your computer (VM). It's fast and reliable, but only that computer can use it unless you unplug and move it. Cloud Storage is for unstructured data accessed via HTTP, Filestore for shared file systems, and Persistent Disk for block storage attached to compute instances. Each has a specific use case, and choosing the wrong one leads to performance or cost issues.
What is Google Cloud Storage?
Google Cloud Storage is a scalable, fully managed object storage service for unstructured data. It stores data as objects in buckets, with each object having a unique URL. It is designed for high durability (99.999999999% annual durability) and global accessibility via HTTP/HTTPS. Cloud Storage is not a file system: it has no native mount, and although gcsfuse can expose a bucket through FUSE, that is an adapter rather than true POSIX storage. Instead, you access objects via API calls or tools like gsutil.
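To make the "unique URL" idea concrete, here is a minimal sketch that builds the HTTPS address of an object. The helper name `object_url` and the sample bucket and object names are hypothetical; the `https://storage.googleapis.com/BUCKET/OBJECT` pattern is the standard path-style form, and whether a request to it succeeds still depends on the object's access settings.

```python
from urllib.parse import quote


def object_url(bucket: str, object_name: str) -> str:
    """Return the path-style HTTPS URL for an object (hypothetical helper)."""
    # Percent-encode the object name, but keep "/" so that folder-like
    # prefixes stay readable in the URL.
    encoded_name = quote(object_name, safe="/")
    return f"https://storage.googleapis.com/{bucket}/{encoded_name}"


# Sample names are placeholders, not real resources.
print(object_url("my-bucket", "reports/2024 summary.txt"))
```

Note that the "folders" in an object name are only a naming convention; the bucket itself is a flat namespace.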
How Cloud Storage Works
Data is stored as objects (files) within buckets (containers). Each object has:
- Data: the file content
- Metadata: key-value pairs describing the object
- Object name: a unique identifier within the bucket
- Generation number: a monotonically increasing integer for versioning
Buckets have:
- Globally unique name: must be unique across all of Google Cloud
- Location: region, dual-region, or multi-region
- Storage class: Standard, Nearline, Coldline, Archive
- Access control: IAM or ACLs
- Lifecycle policies: rules to automatically transition or delete objects
When you upload an object, Cloud Storage calculates a CRC32C checksum and MD5 hash for integrity. Objects are replicated across multiple devices and facilities within the location. For multi-region buckets, data is replicated across at least two geographic regions separated by at least 160 km.
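The integrity check described above can be mirrored on the client side. The sketch below computes an MD5 digest and base64-encodes it, which is the encoding Cloud Storage uses when it reports an object's MD5. The function name `md5_base64` is hypothetical, and CRC32C is left out because it is not in the Python standard library.

```python
import base64
import hashlib


def md5_base64(data: bytes) -> str:
    """Return the base64-encoded MD5 digest of data (hypothetical helper)."""
    digest = hashlib.md5(data).digest()  # raw 16-byte digest, not hex
    return base64.b64encode(digest).decode("ascii")


# Compare this value with the MD5 that Cloud Storage reports for the
# uploaded object; a mismatch suggests the data changed in transit.
local_hash = md5_base64(b"hello")
print(local_hash)
```

This only shows the hashing and encoding step; fetching the stored hash from the service is a separate call that is not sketched here.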
Key Components and Defaults
Bucket naming: 3-63 characters, lowercase letters, numbers, hyphens, underscores, and dots. Cannot start with "goog".
Object size limit: 5 TB per object
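Upload size: a single request can upload an object up to the 5 TB object limit; for large files, prefer resumable or multipart uploads (each part of an XML API multipart upload is limited to 5 GB)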
Default storage class: Standard
Minimum storage duration: Nearline (30 days), Coldline (90 days), Archive (365 days). Early deletion incurs a fee.
Versioning: disabled by default
Encryption: server-side encryption with Google-managed keys by default; customer-managed keys (CMEK) or customer-supplied keys (CSEK) optional
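The naming rule in the list above can be turned into a quick pre-flight check. This is a sketch only: the function name `is_plausible_bucket_name` is hypothetical, and it covers just the rules listed here plus the requirement that a name start and end with a letter or digit. The service enforces further rules that this check does not attempt.

```python
import re

# First and last character must be a lowercase letter or digit; the middle
# may also contain dots, underscores and hyphens. Total length: 3 to 63.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name: str) -> bool:
    """Check a candidate bucket name against the rules listed above."""
    if name.startswith("goog"):
        return False  # reserved prefix
    return _BUCKET_NAME_RE.fullmatch(name) is not None


for candidate in ["my-bucket", "My-Bucket", "ab", "google-data"]:
    print(candidate, is_plausible_bucket_name(candidate))
```

A passing name can still be rejected at creation time, for example because another project already owns it.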
Storage Classes
Standard: high-frequency access, no minimum storage duration. Ideal for frequently accessed data.
Nearline: low-frequency access (once a month or less). 30-day minimum.
Coldline: very low-frequency access (once a quarter). 90-day minimum.
Archive: rarely accessed (once a year). 365-day minimum. Lowest cost but higher retrieval fees.
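The access-frequency guidance above amounts to a small lookup. The sketch below is one possible reading of it: the function name `suggest_storage_class` and the exact thresholds (1, 4 and 12 accesses per year) are assumptions drawn from the "once a year / quarter / month" rules of thumb, not official cut-offs.

```python
def suggest_storage_class(accesses_per_year: float) -> str:
    """Map an expected access frequency to a storage class (rule of thumb)."""
    if accesses_per_year <= 1:
        return "ARCHIVE"    # about once a year or less
    if accesses_per_year <= 4:
        return "COLDLINE"   # about once a quarter
    if accesses_per_year <= 12:
        return "NEARLINE"   # about once a month
    return "STANDARD"       # frequently accessed


for rate in (0.5, 4, 12, 365):
    print(rate, suggest_storage_class(rate))
```

In practice the choice also depends on retrieval fees and on how long the data will be kept, since the colder classes carry minimum storage durations.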
How Persistent Disk Works
Persistent Disk (PD) provides block storage for Compute Engine VMs. Disks are network-attached, not physically attached, allowing live migration of VMs. Each disk is a virtual block device that can be formatted with a file system (ext4, XFS) or used raw. PD supports:
- Standard PD: HDD-based, lower cost, suitable for sequential reads/writes.
- Balanced PD: SSD-based, balanced performance and cost.
- SSD PD: high-performance SSD for latency-sensitive workloads.
- Extreme PD: highest performance with configurable IOPS.
Disks range in capacity from 10 GB to 64 TB. IOPS and throughput scale with size, up to limits set by the disk type and the VM's machine type. For example, SSD PD scales at roughly 30 IOPS per GB for both reads and writes, so a 1 TB disk can reach about 30,000 IOPS on a VM large enough to drive it. Disks can be attached in read-write mode to one VM or in read-only mode to multiple VMs. Snapshots are incremental and can be used to create new disks.
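The attachment rule (one read-write VM, or several read-only VMs) is easy to get wrong in exam scenarios, so here is a small sketch of it as a check. The function `can_attach` and the `"rw"`/`"ro"` mode strings are hypothetical modelling choices, not part of any Google Cloud API.

```python
from typing import Sequence


def can_attach(existing_modes: Sequence[str], new_mode: str) -> bool:
    """Decide whether one more VM may attach a disk.

    existing_modes lists the modes ("rw" or "ro") of the disk's current
    attachments; new_mode is the mode requested by the next VM.
    """
    if new_mode not in ("rw", "ro"):
        raise ValueError(f"unknown mode: {new_mode!r}")
    if not existing_modes:
        return True           # an unattached disk accepts either mode
    if "rw" in existing_modes:
        return False          # a read-write attachment is exclusive
    return new_mode == "ro"   # shared read-only disks accept only readers


print(can_attach([], "rw"))            # first attachment
print(can_attach(["rw"], "ro"))        # disk already held read-write
print(can_attach(["ro", "ro"], "ro"))  # another reader joins
```

When a scenario needs several VMs writing to the same data, this check always fails, which is the cue to consider Filestore instead.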
How Filestore Works
Filestore is a managed file storage service for applications that require a shared file system. It provides NFS (Network File System) v3 and v4.1 access. Filestore offers:
- Basic HDD: low-cost, 1 TB to 63.9 TB
- Basic SSD: higher performance, 2.5 TB to 63.9 TB
- High Scale SSD: up to 100 TB, designed for high-throughput workloads like HPC and media rendering
- Enterprise: regional high availability, with data replicated across zones within a region
Filestore instances are provisioned with a fixed capacity and performance tier. Throughput scales with capacity. For example, a 10 TB High Scale SSD instance provides 1.2 GB/s read throughput and 620 MB/s write throughput.
Interaction with Related Technologies
Cloud Storage integrates with BigQuery for querying data directly from buckets, with Cloud Functions for event-driven processing, and with Compute Engine via gcsfuse (FUSE adapter) for mounting buckets as file systems (not recommended for performance-critical workloads).
Persistent Disk is the primary boot disk for Compute Engine VMs. It can be used with Kubernetes PersistentVolumeClaims. Snapshots can be used for backup or migration.
Filestore is often used for shared home directories, content management systems, and applications requiring POSIX compatibility. It can be mounted on multiple VMs simultaneously.
Configuration and Verification Commands
Cloud Storage (gsutil)
gsutil mb -l us-central1 -c Standard gs://my-bucket
gsutil cp file.txt gs://my-bucket
gsutil ls gs://my-bucket
gsutil acl ch -u user@example.com:READ gs://my-bucket/file.txt
gsutil lifecycle set lifecycle.json gs://my-bucket
Persistent Disk (gcloud)
gcloud compute disks create my-disk --size=100GB --zone=us-central1-a --type=pd-ssd
gcloud compute instances attach-disk my-instance --disk=my-disk --zone=us-central1-a
gcloud compute disks snapshot my-disk --zone=us-central1-a --snapshot-names=my-snapshot
Filestore (gcloud)
gcloud filestore instances create my-filestore --zone=us-central1-c --tier=BASIC_SSD --file-share=name="myshare",capacity=2.5TB --network=name="default"
Performance Considerations
Cloud Storage latency: tens of milliseconds for first byte, high throughput for large objects.
Persistent Disk latency: single-digit milliseconds for SSDs, higher for HDDs.
Filestore latency: depends on tier, with SSD tiers offering low latency.
For sustained high IOPS, use SSD PD or Extreme PD. For throughput, use Filestore High Scale SSD.
Cost Optimization
Use lifecycle policies to move older data to Nearline/Coldline/Archive.
Use object versioning carefully — it increases storage costs.
For Persistent Disk, delete unattached disks. Use snapshots for backup instead of keeping full copies.
For Filestore, choose the right tier; basic HDD is cheapest, but High Scale SSD is expensive.
Identify Data Characteristics
Determine if your data is structured or unstructured. If it's unstructured (images, videos, backups), Cloud Storage is likely the best fit. If it requires block-level access (databases, boot disks), use Persistent Disk. If multiple VMs need concurrent access to the same files with POSIX semantics, use Filestore. Also consider access frequency — frequently accessed data belongs in Standard tier, infrequent in Nearline or Coldline.
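The decision in this step can be sketched as a short function. The name `choose_storage_service` and its two boolean inputs are hypothetical simplifications of the criteria in the paragraph above; real workloads usually weigh cost and performance as well.

```python
def choose_storage_service(needs_block_device: bool,
                           shared_posix_files: bool) -> str:
    """Pick a storage service from two workload traits (simplified)."""
    if shared_posix_files:
        # Several VMs need concurrent access to the same files.
        return "Filestore"
    if needs_block_device:
        # Databases and boot disks need block-level access from one VM.
        return "Persistent Disk"
    # Unstructured data such as images, videos and backups.
    return "Cloud Storage"


print(choose_storage_service(needs_block_device=True, shared_posix_files=False))
print(choose_storage_service(needs_block_device=False, shared_posix_files=True))
print(choose_storage_service(needs_block_device=False, shared_posix_files=False))
```

Checking the shared-file requirement first reflects the fact that a read-write Persistent Disk cannot be shared between VMs.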
Choose Location and Redundancy
Select a region (lowest latency for users near that region), dual-region (higher availability across two specific regions), or multi-region (highest availability across a large geographic area such as the US or EU) for Cloud Storage. For Persistent Disk, choose the same zone as your VM. For Filestore, choose a zone or region (Enterprise tier). Redundancy options: multi-region for Cloud Storage replicates across regions; Filestore Enterprise replicates across zones. Persistent Disk snapshots can be copied to other regions.
Configure Access Control
Use IAM for broad permissions (e.g., roles/storage.objectViewer) and ACLs for fine-grained object-level access. For Persistent Disk, IAM controls disk creation and attachment; OS-level permissions control file access. For Filestore, access is controlled via NFS export policies and VPC firewall rules. Use private IPs for internal access; Cloud Storage can be accessed via public internet or Private Google Access.
Set Up Lifecycle Policies
Define rules to automatically transition objects to colder storage classes or delete them. For example, move objects older than 30 days to Nearline, 90 days to Coldline, 365 days to Archive, and delete after 10 years. This reduces costs without manual intervention. Lifecycle policies are bucket-level and are applied asynchronously, so there can be a delay between an object meeting a rule's conditions and the action taking place; changes to the rules themselves can take up to 24 hours to take effect.
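The example policy above can be written out as a lifecycle configuration file. The sketch below generates that JSON. The function name `build_lifecycle_config` is hypothetical, 10 years is approximated as 3650 days, and the output is meant for a file such as `lifecycle.json` passed to `gsutil lifecycle set`.

```python
import json


def build_lifecycle_config() -> dict:
    """Build the 30/90/365-day transition and 10-year delete policy."""
    transitions = [(30, "NEARLINE"), (90, "COLDLINE"), (365, "ARCHIVE")]
    rules = [
        {
            "action": {"type": "SetStorageClass", "storageClass": storage_class},
            "condition": {"age": age_days},
        }
        for age_days, storage_class in transitions
    ]
    # Assumption: "10 years" is approximated as 3650 days.
    rules.append({"action": {"type": "Delete"}, "condition": {"age": 3650}})
    return {"rule": rules}


# Write this output to lifecycle.json before applying it to a bucket.
print(json.dumps(build_lifecycle_config(), indent=2))
```

Because a Delete rule is irreversible, it is worth applying a generated policy to a test bucket before using it on production data.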
Monitor and Optimize
Use Cloud Monitoring to track storage usage, request rates, and errors. For Cloud Storage, monitor object counts and bucket sizes. For Persistent Disk, monitor IOPS and throughput. For Filestore, monitor capacity and throughput. Adjust storage classes, resize disks (Persistent Disk capacity can be increased without downtime, but not decreased), or add more Filestore capacity as needed.
Enterprise Scenario 1: Media Asset Management
A media company stores raw video files, transcoded proxies, and final exports. They use Cloud Storage with multi-region buckets for global access. Raw footage is uploaded to a Standard bucket, then a Cloud Function triggers transcoding. Proxies are stored in Nearline, and final exports move to Archive after 90 days. Lifecycle policies automate transitions. They use gsutil for bulk uploads and Cloud CDN for fast delivery. Misconfiguration: a lifecycle rule was once set to delete objects instead of transitioning them, and data was lost. They learned to test policies on a sample bucket first.
Enterprise Scenario 2: High-Performance Database
A financial services company runs Oracle databases on Compute Engine VMs. They attach SSD Persistent Disks for data and logs. Each disk is 1 TB, sized so that IOPS, which scale with capacity, meet the database's needs. They take daily snapshots, stored in a regional snapshot location, for disaster recovery. To migrate to a different zone, they create a new disk from a snapshot. Performance issue: they initially used Standard PD, which caused high latency. Switching to SSD PD resolved it.
Enterprise Scenario 3: Shared Home Directories
A research lab uses Filestore to provide shared home directories for hundreds of researchers. They use Basic SSD for low latency. Each user mounts the NFS share on their VM. They set quotas to prevent any single user from filling the share. They monitor capacity and increase it as needed. Problem: they initially used Cloud Storage with gcsfuse, but it was too slow for frequent small writes. Switching to Filestore solved the performance issue.
The GCDL exam tests your ability to choose the right storage service based on workload requirements. Key objective codes: 2.3 (Storage on Google Cloud). You will see scenario-based questions asking which service to use for a given use case.
Common Wrong Answers
Choosing Cloud Storage for a database: Many candidates think Cloud Storage can replace a database because it stores objects. However, databases need block storage (Persistent Disk) for low latency and ACID transactions. Cloud Storage is not a file system and has higher latency.
Using Persistent Disk for shared file storage: Persistent Disk can be attached to only one VM in read-write mode. Candidates might think you can attach it to multiple VMs simultaneously — you can't. For shared access, use Filestore.
Selecting Filestore for archival data: Filestore is expensive per GB compared to Cloud Storage Archive class. Candidates might choose Filestore for simplicity, but cost would be prohibitive. Use Cloud Storage for archival.
Misunderstanding storage class minimums: Candidates forget that Nearline has a 30-day minimum, Coldline 90 days, Archive 365 days. Deleting early incurs a fee. The exam may ask which storage class is cheapest for data accessed once a year — Archive, but only if you keep it for at least a year.
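The early deletion fee is easier to remember with a worked calculation. The sketch below assumes the charge equals the storage cost of the days remaining in the minimum duration, prorated over a 30-day month. The function name `early_deletion_charge` and the price used in the example are illustrative assumptions; check current pricing before relying on any figure.

```python
# Minimum storage durations, in days, as listed in this chapter.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def early_deletion_charge(storage_class: str, size_gb: float,
                          price_per_gb_month: float, days_stored: int) -> float:
    """Estimate the fee for deleting an object before its minimum duration."""
    remaining_days = max(0, MIN_STORAGE_DAYS[storage_class] - days_stored)
    daily_price = price_per_gb_month / 30  # assumption: 30-day month
    return size_gb * daily_price * remaining_days


# 1000 GB in Coldline deleted after 30 days, at an illustrative
# price of $0.004 per GB per month: 60 days are still owed.
charge = early_deletion_charge("COLDLINE", 1000, 0.004, 30)
print(f"{charge:.2f}")
```

The same function returns zero for Standard, or for any object kept past its minimum, which is why Archive is only the cheapest option when the data really stays for a year.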
Specific Numbers to Know
Cloud Storage durability: 99.999999999% (11 9's)
Object size limit: 5 TB
Maximum part size in an XML API multipart upload: 5 GB (a single upload request can be as large as the 5 TB object limit)
Persistent Disk max size: 64 TB
Filestore Basic HDD max: 63.9 TB, Basic SSD: 63.9 TB, High Scale SSD: 100 TB
Nearline minimum: 30 days, Coldline: 90 days, Archive: 365 days
Multi-region: at least 2 regions separated by 160 km
Edge Cases
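Dual-region buckets: they replicate data across a specific pair of regions. Not the same as multi-region, which stores data across a large geographic area (such as US, EU, or ASIA) containing two or more regions.
Retention policies and object holds: a bucket retention policy prevents objects from being deleted or replaced until they reach a minimum age, and an object hold prevents deletion of an individual object until the hold is removed. Lifecycle delete actions do not take effect on objects that are still protected.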
Requester Pays: you can configure a bucket so the requester pays for egress and operations, not the bucket owner.
How to Eliminate Wrong Answers
If the question mentions "shared access across multiple VMs", eliminate Cloud Storage and Persistent Disk. If it mentions "lowest cost for long-term archival", eliminate Standard, Nearline, and Filestore. If it mentions "block storage for a database", eliminate Cloud Storage and Filestore. Use the mechanism: block storage is for single-VM low-latency access; file storage is for shared POSIX access; object storage is for HTTP-accessible unstructured data.
Cloud Storage is for unstructured data; Persistent Disk is for block storage; Filestore is for shared file systems.
Cloud Storage offers 11 9's durability and four storage classes: Standard, Nearline, Coldline, Archive.
Nearline has a 30-day minimum, Coldline 90 days, Archive 365 days — early deletion incurs a fee.
Persistent Disk max size is 64 TB; can only be attached in read-write to one VM.
Filestore offers Basic HDD, Basic SSD, High Scale SSD, and Enterprise tiers.
Use lifecycle policies to automatically transition objects to colder storage or delete them.
For databases, use Persistent Disk; for shared home directories, use Filestore; for backups, use Cloud Storage Archive.
These come up on the exam all the time. Here's how to tell them apart.
Cloud Storage vs. Persistent Disk
Cloud Storage
Object storage (unstructured data)
Access via HTTP/HTTPS API
Global scalability, unlimited objects
Lower cost per GB for infrequent access
Durability: 99.999999999%
Persistent Disk
Block storage (raw disk)
Attached to a single VM (read-write)
Max 64 TB per disk
Higher performance for databases
Snapshots for backup
Cloud Storage vs. Filestore
Cloud Storage
Object storage
No POSIX compliance
Access via HTTP
Auto-scale, pay per GB stored
Lifecycle management
Filestore
File storage (NFS)
POSIX-compliant shared file system
Access via NFS v3/v4.1
Fixed capacity, provisioned throughput
Shared access across multiple VMs
Mistake
Cloud Storage can be used as a file system mounted on multiple VMs.
Correct
Cloud Storage is object storage, not a file system. While gcsfuse can mount it as a FUSE filesystem, it is not POSIX-compliant and has higher latency. For shared file systems, use Filestore.
Mistake
Persistent Disk can be attached to multiple VMs in read-write mode.
Correct
Persistent Disk can be attached in read-write mode to only one VM. Attaching to multiple VMs in read-only mode is supported. For shared read-write access, use Filestore.
Mistake
Filestore is cheaper than Cloud Storage for archival data.
Correct
Filestore is more expensive per GB than Cloud Storage Archive class. Archive class costs about $0.0012/GB/month, while Filestore Basic HDD starts at $0.06/GB/month. Filestore is designed for active workloads.
Mistake
All storage classes have no minimum storage duration.
Correct
Standard has no minimum, but Nearline (30 days), Coldline (90 days), and Archive (365 days) have minimums. Early deletion incurs a fee equal to the cost of the remaining days.
Mistake
Cloud Storage buckets are region-specific and cannot be accessed globally.
Correct
Cloud Storage buckets can be accessed from anywhere via the global endpoint (storage.googleapis.com). However, for best performance, use a bucket in the same region as your clients. Multi-region buckets suit clients spread across a large geographic area such as the US or EU.
Cloud Storage is object storage for unstructured data accessed via HTTP. Persistent Disk is block storage attached to a VM as a virtual disk. Use Cloud Storage for backups, images, and videos. Use Persistent Disk for databases and boot disks. Persistent Disk offers lower latency but is limited to a single VM in read-write mode.
Yes, using gcsfuse (a FUSE adapter). However, it is not POSIX-compliant and has higher latency than Filestore. It is suitable for read-heavy workloads with large files, not for databases or applications requiring low-latency file operations.
Use Archive class. It is the cheapest at ~$0.0012/GB/month, but has a 365-day minimum. If you delete before 365 days, you pay an early deletion fee. For data accessed monthly, use Nearline; quarterly, use Coldline.
Persistent Disk itself is highly available within a zone. For cross-zone HA, use regional Persistent Disk, which synchronously replicates data between two zones in a region, or take snapshots and create disks in other zones. Alternatively, use a replicated database solution like Cloud SQL.
5 TB per object. A single upload request can be as large as the object limit, but resumable or multipart uploads are recommended for large objects; each part of an XML API multipart upload can be up to 5 GB.
Yes, Filestore can be provisioned as a PersistentVolume using the CSI driver. This allows pods to mount the same NFS share. It is commonly used for shared stateful workloads.
Use lifecycle policies to transition older data to Nearline, Coldline, or Archive. Delete unnecessary objects. Use object versioning only if needed. Consider using Requester Pays for shared datasets. Compress objects before uploading.