Google Professional Cloud Architect PCA Questions 226–300 | Page 4/7

226

MCQhard

A company has a Spanner instance for global transactions. They need to ensure reliability during a regional outage. What is the best approach?

A.Spanner is already resilient across zones; use backup/restore for regions.

B.Use multiple instances in different regions.

C.Enable multi-region configuration with read-only replicas.

D.Enable leader placement option.

AnswerC

Automatic failover and read scalability.

Why this answer

A Spanner multi-region configuration synchronously replicates data across regions, providing global transactional consistency and automatic failover without manual intervention. Read-only replicas serve reads locally, which adds read scalability, but they do not vote on commits and cannot become leaders; during a regional outage, leadership moves automatically to read-write replicas in a surviving region. This keeps the database available through a regional outage while preserving Spanner's strong consistency guarantees.

Exam trap

Google Cloud often tests the misconception that multiple independent instances (Option B) are needed for cross-region resilience, when in fact Spanner's multi-region configuration with read-only replicas provides built-in global high availability without application changes.

How to eliminate wrong answers

Option A is wrong because Spanner's zone-level resilience does not protect against a full regional outage; backup/restore is a disaster recovery mechanism with significant RTO/RPO, not a high-availability solution. Option B is wrong because using multiple independent Spanner instances in different regions would require application-level sharding or eventual consistency, breaking Spanner's native strong consistency and global transaction support. Option D is wrong because leader placement only controls which region hosts the leader replicas; it does not by itself add replicas in other regions or provide automatic failover for regional outage protection.

227

MCQmedium

A developer notices that web-server-1 is preemptible. They want to ensure their application remains available even if this instance is terminated. What should they do?

A.Modify the instance's preemptible flag to false.

B.Create a managed instance group for web-server-1 and set an autoscaler.

C.Create a load balancer pointing to web-server-1's external IP.

D.Create a snapshot schedule for web-server-1.

AnswerB

Managed instance groups automatically recreate instances, including preemptible ones, if they are terminated.

Why this answer

A managed instance group (MIG) with an autoscaler ensures that if the preemptible instance is terminated, the MIG automatically recreates it to maintain the desired number of instances. This provides resilience against preemption by restoring capacity without manual intervention. The load balancer can then distribute traffic across healthy instances in the group.

Exam trap

Google Cloud often tests the misconception that a load balancer alone provides high availability, but without a managed instance group to recreate terminated instances, the load balancer has no healthy backends to route traffic to.

How to eliminate wrong answers

Option A is wrong because setting the preemptible flag to false would make the instance a standard (non-preemptible) instance, but this does not address availability during termination; it only prevents future preemption, and the instance could still fail for other reasons. Option C is wrong because a load balancer pointing to a single instance's external IP does not provide high availability; if the instance is terminated, the load balancer has no healthy backend and traffic is lost. Option D is wrong because a snapshot schedule only backs up persistent disks; it does not recreate the instance or maintain application availability after termination.
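
The self-healing behavior described for Option B can be pictured with a small simulation. This is only a sketch of the idea, not how Compute Engine implements managed instance groups; the `reconcile` function and the replacement instance names are hypothetical.

```python
import itertools

def reconcile(running, target_size, name_source):
    """Return the instance list after the group restores its target size."""
    instances = list(running)
    # Keep creating replacements until the desired size is reached.
    while len(instances) < target_size:
        instances.append(next(name_source))
    return instances

# Hypothetical naming scheme for replacement instances.
names = (f"web-server-{n}" for n in itertools.count(2))

group = ["web-server-1"]
group.remove("web-server-1")  # simulate the preemptible VM being terminated
group = reconcile(group, target_size=1, name_source=names)
print(group)
```

A snapshot schedule or a load balancer alone has no equivalent of this loop, which is why neither restores capacity after preemption.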

228

Multi-Selectmedium

A company runs a stateful workload on Compute Engine with regional persistent disks (PD). They need to implement a disaster recovery (DR) plan with a Recovery Point Objective (RPO) of less than 1 hour and Recovery Time Objective (RTO) of less than 4 hours. Which THREE steps should they include in their DR plan? (Choose three.)

Select 3 answers

A.Take snapshots of the persistent disk every 30 minutes and copy them to a Cloud Storage bucket in another region

B.Create a snapshot schedule for the persistent disk every 4 hours

C.Create a custom machine image of the instance and store it in a Cloud Storage bucket in the DR region

D.Use regional persistent disks to automatically replicate data to a second zone

E.Test the failover procedure quarterly to validate RTO and RPO

AnswersA, C, E

Correct: meets RPO and protects against regional failure.

Why this answer

Option A is correct because taking snapshots every 30 minutes meets the RPO of less than 1 hour. By copying these snapshots to a Cloud Storage bucket in another region, you ensure data is available in a DR region for recovery, which is essential for cross-region disaster recovery.

Exam trap

The trap here is confusing zonal replication (regional PD) with cross-region disaster recovery; regional PDs only protect against zonal failures, not regional outages, so they cannot meet a cross-region DR requirement.
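
The RPO reasoning behind Options A and B can be checked with simple arithmetic. The sketch below assumes the worst-case data loss equals the snapshot interval plus any delay in copying the snapshot to the DR region; the function names and the `copy_delay` parameter are illustrative assumptions.

```python
from datetime import timedelta

def worst_case_data_loss(snapshot_interval, copy_delay=timedelta(0)):
    """Data written just after a snapshot is lost until the next one lands in the DR region."""
    return snapshot_interval + copy_delay

def meets_rpo(snapshot_interval, rpo, copy_delay=timedelta(0)):
    """True when the worst-case loss stays strictly below the RPO."""
    return worst_case_data_loss(snapshot_interval, copy_delay) < rpo

rpo = timedelta(hours=1)
print(meets_rpo(timedelta(minutes=30), rpo))  # Option A: 30-minute snapshots
print(meets_rpo(timedelta(hours=4), rpo))     # Option B: 4-hour schedule
```

Under these assumptions the 30-minute schedule fits within the 1-hour RPO, while the 4-hour schedule cannot.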

229

MCQeasy

A startup wants to grant a new employee read-only access to view all Compute Engine instances in a project. What is the minimum IAM role they should assign?

A.roles/owner

B.roles/compute.viewer

C.roles/iam.securityReviewer

D.roles/compute.admin

AnswerB

Viewer role provides read-only access to compute resources.

Why this answer

roles/compute.viewer provides read-only access to Compute Engine resources. compute.admin is full access; iam.securityReviewer gives read access to a wider set of resources; owner has full control.

230

MCQhard

An organization uses Cloud Functions (2nd gen) for event-driven processing. They notice that some functions fail with 'memory limit exceeded' errors during peak load. The function processes messages from Pub/Sub and writes to Firestore. What should they do to improve reliability without sacrificing throughput?

A.Increase the maximum number of concurrent function instances.

B.Increase the memory allocated to the Cloud Function.

C.Enable Pub/Sub batching to reduce the number of function invocations.

D.Split the function into multiple smaller functions, each handling a subset of the data.

AnswerB

More memory allows the function to handle larger data per invocation without hitting the limit.

Why this answer

The 'memory limit exceeded' error indicates that the function's allocated memory is insufficient for the workload during peak load. Increasing the memory allocation (Option B) directly resolves this by providing more RAM for processing larger messages or concurrent operations, without altering the invocation pattern or throughput. Cloud Functions (2nd gen) allow memory to be set up to 32 GiB, and this change does not reduce the number of events processed per second.

Exam trap

Google Cloud often tests the misconception that scaling out (more instances) solves memory issues, but the trap here is that memory limits are per-instance, so only increasing the per-instance memory allocation directly resolves the error.

How to eliminate wrong answers

Option A is wrong because increasing the maximum number of concurrent instances does not address the per-instance memory limit; it may actually worsen the problem by allowing more instances to hit the same memory ceiling simultaneously. Option C is wrong because Pub/Sub batching reduces the number of function invocations but does not increase the memory available per invocation; it could also increase latency and does not fix the root cause of memory exhaustion. Option D is wrong because splitting the function into multiple smaller functions does not increase the memory per function instance; it adds complexity and may reduce throughput due to additional overhead, without guaranteeing that each smaller function avoids memory limits.

231

MCQhard

A healthcare organization is storing sensitive patient data in Cloud Storage. They need to ensure that all objects are encrypted with a key managed by their on-premises HSM. Which encryption approach should they use?

A.Use Customer-Supplied Encryption Keys (CSEK) and store the key in a Secret Manager accessible only from the on-premises HSM.

B.Use Cloud External Key Manager (EKM) with a key hosted on the on-premises HSM.

C.Use Customer-Managed Encryption Keys (CMEK) with a Cloud KMS key that is generated from the on-premises HSM.

D.Encrypt each object client-side with a key from the on-premises HSM before uploading to Cloud Storage.

AnswerB

EKM allows you to use an external key management partner, including on-premises HSMs, to wrap the Google-managed encryption key.

Why this answer

Option B is correct because Cloud External Key Manager (EKM) lets you protect data with a key held in an external key management system, such as an on-premises HSM. Cloud Storage encrypts the data with a data encryption key, and that key is wrapped by the externally hosted key. Option A is wrong because CSEK requires the application to supply the raw key with every request, and keeping that key in Secret Manager means it is not managed by the HSM.

Option C is wrong because CMEK keys live in Cloud KMS rather than in the external HSM. Option D is not the best fit because client-side encryption moves all key handling into the application, which must encrypt and decrypt every object itself.

232

MCQhard

Refer to the exhibit. An architect created a VM instance using the above command. After the instance starts, the architect tries to access the nginx default page from the internet but gets a timeout. What is the most likely reason?

A.The VM is in a subnet without a Cloud NAT

B.The firewall rule allowing HTTP traffic is missing

C.The startup script failed to install nginx

D.The VM was created without an external IP address

AnswerD

The 'no-address' flag omits the external IP, making the VM unreachable from the internet.

Why this answer

The most likely reason for the timeout is that the VM was created without an external (public) IP address. Without an external IP, the VM is not directly reachable from the internet, even if nginx is running and firewall rules allow HTTP traffic. The timeout occurs because the internet has no route to the VM's private IP address.

Exam trap

Google Cloud often tests whether candidates can tell a missing firewall rule from a missing external IP address. Both typically surface as timeouts, because dropped packets and an unreachable host look the same to the client, so the deciding clue is the 'no-address' flag in the command.

How to eliminate wrong answers

Option A is wrong because Cloud NAT is used for outbound internet access from private VMs, not for inbound access from the internet; inbound traffic requires an external IP or a load balancer. Option B is wrong because even if a firewall rule allowing HTTP traffic exists, it cannot help if the VM has no external IP to receive the traffic from the internet. Option C is wrong because a failed nginx installation would result in a connection refused error, not a timeout; a timeout indicates the packet never reached the VM at all.

233

MCQeasy

A company uses Cloud Spanner for a global financial application. They experience increased latency and transaction aborts during peak hours. Which measure should they take first to improve reliability?

A.Increase the number of nodes in the Spanner instance.

B.Reduce the number of indexes on frequently updated columns.

C.Optimize transactions to reduce lock contention.

D.Use interleaved tables to co-locate related data.

AnswerC

Short, single-partition transactions reduce the chance of conflicts and aborts.

Why this answer

Option C is correct because transaction aborts and latency in Cloud Spanner are most commonly caused by lock contention during peak hours. By optimizing transactions—such as reducing their scope, using read-only transactions where possible, and avoiding hot-spot writes—you directly address the root cause of contention without incurring additional cost or schema changes. This aligns with Google's best practices for Spanner reliability.

Exam trap

Google Cloud often tests the misconception that scaling nodes (Option A) is the universal fix for performance issues, but the trap here is that Spanner's horizontal scaling does not resolve lock contention—it only increases parallelism, which can worsen contention if transactions are not optimized.

How to eliminate wrong answers

Option A is wrong because increasing nodes primarily improves throughput and storage capacity, not latency or abort rates caused by lock contention; adding nodes can even increase distributed transaction overhead. Option B is wrong because reducing indexes on frequently updated columns may reduce write amplification but does not address the immediate issue of lock contention and aborts; indexes are not the primary cause of transaction conflicts. Option D is wrong because interleaved tables co-locate parent-child rows for faster joins and lower latency, but they do not reduce lock contention; in fact, they can increase contention if the parent row becomes a hot spot.

234

MCQeasy

A developer wants to monitor a custom application metric from their application running on GKE. What should they use?

A.Cloud Logging

B.Cloud Trace

C.Cloud Debugger

D.Cloud Monitoring custom metrics API

AnswerD

The custom metrics API allows ingesting and monitoring custom application metrics.

Why this answer

Cloud Monitoring custom metrics API (option D) is the correct choice because it allows a developer to push custom application-specific metrics (e.g., request latency, queue depth) from a GKE pod using the `custom.googleapis.com` metric domain. This integrates directly with Cloud Monitoring for alerting and dashboards, whereas Cloud Logging is for log data, not metrics.

Exam trap

The trap here is that candidates confuse Cloud Logging (for logs) with Cloud Monitoring (for metrics), or assume that Cloud Trace can handle custom metrics because it deals with application performance data.

How to eliminate wrong answers

Option A is wrong because Cloud Logging ingests log entries (text-based events), not numeric metric data points; it cannot be used to monitor custom application metrics like counters or gauges. Option B is wrong because Cloud Trace is a distributed tracing system for latency analysis of requests, not for publishing custom numeric metrics. Option C is wrong because Cloud Debugger is used for inspecting application state at specific code points without stopping the app, not for collecting or monitoring time-series metrics.
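
To make the `custom.googleapis.com` metric domain concrete, here is a minimal sketch of the request body shape for writing one data point through the Cloud Monitoring time series API. It only builds the JSON and sends nothing; the metric name `queue_depth`, the helper `build_time_series`, and all label values are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone

def build_time_series(metric_name, value, pod_labels, when=None):
    """Build a request body containing a single custom-metric data point."""
    when = when or datetime.now(timezone.utc)
    return {
        "timeSeries": [{
            # Custom metrics live under the custom.googleapis.com domain.
            "metric": {"type": f"custom.googleapis.com/{metric_name}"},
            # k8s_pod is the monitored resource type for a GKE pod.
            "resource": {"type": "k8s_pod", "labels": pod_labels},
            "points": [{
                "interval": {"endTime": when.isoformat()},
                "value": {"doubleValue": float(value)},
            }],
        }]
    }

# Hypothetical label values for the pod reporting the metric.
labels = {
    "project_id": "example-project",
    "location": "us-central1",
    "cluster_name": "example-cluster",
    "namespace_name": "default",
    "pod_name": "worker-0",
}
body = build_time_series("queue_depth", 42, labels)
print(json.dumps(body, indent=2))
```

In a real application a client library would normally build and send this request; the point here is only that the payload is a numeric time series, not a log entry.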

235

Multi-Selecteasy

Which TWO of the following are benefits of using a VPC Service Controls perimeter?

Select 2 answers

A.Prevent data exfiltration from managed services like BigQuery and Cloud Storage

B.Act as a network firewall for Compute Engine instances

C.Provide encryption of data in transit between on-premises and Google Cloud

D.Replace Identity and Access Management (IAM) for service access control

E.Allow access to Google Cloud services only from within an authorized VPC network

AnswersA, E

VPC Service Controls restrict data movement outside the perimeter.

Why this answer

VPC Service Controls perimeters prevent data exfiltration by creating a security boundary around Google Cloud managed services (e.g., BigQuery, Cloud Storage). Within the perimeter, data can only be copied to other resources inside the same perimeter, blocking unauthorized transfers to external projects or the internet. This is achieved through context-aware access policies that enforce data access based on the client's network identity and project membership, not by inspecting packet contents.

Exam trap

Google Cloud often tests the misconception that VPC Service Controls are a firewall or encryption mechanism, when in fact they are a context-aware access boundary that works alongside IAM and network controls.

236

Multi-Selectmedium

Which TWO options are valid ways to connect an on-premises network to a VPC in Google Cloud? (Choose two.)

Select 2 answers

A.Cloud VPN.

B.Dedicated Interconnect.

C.Cloud NAT.

D.VPC Network Peering.

E.Private Google Access.

AnswersA, B

Cloud VPN provides IPsec tunnels to on-premises.

Why this answer

Cloud VPN is a valid way to connect an on-premises network to a VPC in Google Cloud. It uses IPsec (IKEv1 or IKEv2) to create an encrypted tunnel over the public internet between your on-premises VPN gateway and a Cloud VPN gateway in your VPC. This allows secure communication between your on-premises resources and your VPC subnets, making it a standard hybrid connectivity option.

Exam trap

Google Cloud often tests the distinction between services that provide connectivity to a VPC (like VPN and Interconnect) versus services that only enable outbound internet access or internal VPC-to-VPC peering, leading candidates to mistakenly select Cloud NAT or VPC Network Peering.

237

MCQmedium

A company migrated their on-premises database to Cloud SQL and now experiences high latency for read-heavy workloads. How can they optimize performance?

A.Switch to a higher machine type.

B.Enable automatic storage increase.

C.Use connection pooling.

D.Add read replicas.

AnswerD

Read replicas serve read traffic, reducing load on primary and improving read latency.

Why this answer

Adding read replicas is the correct optimization because Cloud SQL read replicas offload read traffic from the primary instance, reducing latency for read-heavy workloads. Read replicas asynchronously replicate data from the primary using MySQL or PostgreSQL native replication, allowing queries to be distributed across multiple instances. This directly addresses the high latency by scaling read capacity horizontally without impacting write performance.

Exam trap

Google Cloud often tests the misconception that vertical scaling (higher machine type) is the universal fix for performance issues, but the trap here is that read-heavy workloads require horizontal scaling via read replicas to distribute the read load, not just a more powerful single instance.

How to eliminate wrong answers

Option A is wrong because switching to a higher machine type (vertical scaling) may improve performance but does not specifically address read-heavy workloads; it increases cost without distributing the read load, and latency improvements are limited by the single instance's resources. Option B is wrong because enabling automatic storage increase only prevents storage-full errors and does not affect query latency or read throughput; it is a capacity management feature, not a performance optimization. Option C is wrong because connection pooling reduces the overhead of establishing new database connections but does not reduce latency for read-heavy workloads; it improves connection management efficiency, not query execution speed or read distribution.

238

MCQhard

A global e-commerce platform uses Cloud Spanner in a multi-region configuration across us-central1 (leader) and europe-west1. The application writes all orders to a single table and reads from both regions. During a flash sale, write latency spikes, causing order failures. The team notices that the leader region's CPU utilization is at 95%, while the europe-west1 region is mostly idle. The application uses partitioned DML for batch updates. The development team proposes increasing node count. What should the architect do to reduce write latency while maintaining global read performance?

A.Implement manual sharding by splitting the large table into multiple smaller tables across instances.

B.Use interleaved tables to reduce query latency for reads.

C.Create a new node pool with a machine type that has at least 16 vCPUs to handle the write-intensive workload.

D.Change the placement configuration to use a dual-region with multiple writable leaders.

AnswerC

Correct: The leader region is CPU-bound, so the fix is more compute capacity for the write load.

Why this answer

Option C is the intended answer because it is the only choice that adds compute capacity where the bottleneck is. Note that Cloud Spanner does not expose node pools or machine types; in practice, capacity is added by raising the instance's node (or processing unit) count, which is what the development team proposed. Since the leader region (us-central1) is at 95% CPU, more compute capacity lets Spanner spread the write load across more servers, reducing write latency without affecting read performance, as reads can still be served from both regions under the same multi-region configuration.

Exam trap

Google Cloud often tests the misconception that increasing node count only scales storage, but in Cloud Spanner, nodes scale both compute and storage, making it the correct response for CPU-bound write latency.

How to eliminate wrong answers

Option A is wrong because manual sharding into multiple tables across instances is not a native Cloud Spanner pattern; it would break transactional consistency and increase operational complexity without addressing the root cause of insufficient node capacity. Option B is wrong because interleaved tables optimize read performance by colocating related rows, but they do not reduce write latency or CPU pressure caused by high write throughput. Option D is wrong because changing to a dual-region with multiple writable leaders would require a different configuration (e.g., dual-region with two writable regions) and does not solve the immediate CPU bottleneck in the current leader region; it also risks increased write conflicts and latency due to cross-region replication.

239

MCQmedium

Refer to the exhibit. The process-image function fails intermittently with a memory limit exceeded error. Which action will MOST effectively resolve the issue?

A.Increase the function memory to 256MB.

B.Increase the function timeout to 120 seconds.

C.Reduce the maximum concurrent executions to 5.

D.Change the trigger to Cloud Pub/Sub to reduce load.

AnswerA

More memory directly addresses the 'memory limit exceeded' error.

Why this answer

The error indicates memory limit exceeded. Increasing the memory allocation will give the function more memory to process images, and is the direct fix.

240

MCQmedium

Your company is using Cloud Storage to store sensitive customer data. The security team requires that all objects be encrypted with a customer-managed encryption key (CMEK) and that the key be automatically rotated every 90 days. You need to implement this without changing the application code. You have created a Cloud KMS key ring and a key with rotation period set to 90 days. What additional configuration is required?

A.Set a bucket lifecycle rule to transition objects to a different storage class.

B.Create a custom customer-supplied encryption key (CSEK) and provide it in each request.

C.Grant the Cloud KMS CryptoKey Encrypter/Decrypter role to the Cloud Storage service account.

D.Set the default encryption key of the Cloud Storage bucket to the Cloud KMS key.

AnswerD

Setting the default encryption key on the bucket ensures all new objects are automatically encrypted with the CMEK without code changes. Cloud KMS handles automatic rotation.

Why this answer

Option D is correct because setting the default encryption key of the Cloud Storage bucket to the Cloud KMS key ensures that all objects written to the bucket are automatically encrypted with that CMEK, without requiring any application code changes. The Cloud KMS key's rotation period of 90 days is already configured, so the key will be rotated automatically, meeting the security team's requirement.

Exam trap

The trap here is that candidates may think granting the Cloud KMS role to the Cloud Storage service account (Option C) is sufficient, but they overlook the critical step of actually setting the key as the default encryption key on the bucket to enforce automatic encryption.

How to eliminate wrong answers

Option A is wrong because bucket lifecycle rules manage object transitions between storage classes or deletion, not encryption key configuration or rotation. Option B is wrong because CSEK requires providing the key in each request, which would necessitate changing application code, and CSEK keys cannot be automatically rotated by Cloud KMS. Option C is wrong because granting the Cloud KMS CryptoKey Encrypter/Decrypter role to the Cloud Storage service account is necessary for the service account to use the key, but it is not the additional configuration required to enforce encryption on the bucket; the key must also be set as the default encryption key on the bucket.
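
The configuration in Option D amounts to pointing the bucket's default encryption setting at the Cloud KMS key. The sketch below builds the key's resource name and the bucket metadata fragment that carries it; the helper names and the project, location, key ring, and key values are hypothetical, and nothing is sent to any API.

```python
def kms_key_name(project, location, key_ring, key):
    """Build the full Cloud KMS resource name for a key."""
    return (
        f"projects/{project}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )

def default_key_patch(key_name):
    """Bucket metadata fragment that sets the default CMEK for new objects."""
    return {"encryption": {"defaultKmsKeyName": key_name}}

# Hypothetical identifiers for illustration only.
key = kms_key_name("example-project", "us", "storage-ring", "customer-data-key")
patch = default_key_patch(key)
print(patch)
```

Because the name refers to the key rather than a specific key version, new objects pick up the current primary version after each rotation without any application change.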

241

MCQhard

The exhibit shows a command to create a Compute Engine instance. The instance is intended to run a web server that needs to access Cloud Storage buckets using its service account. However, the web server fails to read from a storage bucket. What is the most likely cause?

A.The service account is not attached to the instance

B.The tags http-server and https-server block outbound traffic

C.The boot disk type is SSD, which is not compatible with Cloud Storage

D.The service account lacks IAM permissions to read from Cloud Storage

AnswerD

Cloud Platform scope grants access, but IAM permissions on the bucket are missing.

Why this answer

Option D is correct because scopes=cloud-platform only permits the instance to call all Cloud APIs; each request is still authorized against IAM, so reads fail if the service account lacks a role such as roles/storage.objectViewer on the bucket. Option C is wrong because a pd-ssd boot disk has no bearing on Cloud Storage access. Option B is wrong because the tags are used to target firewall rules, not to control storage access.

Option A is wrong because the service account is specified in the command.
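
The relationship between access scopes and IAM can be summarized as a logical AND: a request succeeds only when both allow it. The following is a simplified model, not the actual authorization path; `request_allowed` and the example permission sets are assumptions.

```python
CLOUD_PLATFORM = "https://www.googleapis.com/auth/cloud-platform"
STORAGE_READ_ONLY = "https://www.googleapis.com/auth/devstorage.read_only"

def request_allowed(vm_scopes, accepted_scopes, granted_permissions, permission):
    """A call succeeds only if a scope covers the API AND IAM grants the permission."""
    scope_ok = bool(set(vm_scopes) & set(accepted_scopes))
    iam_ok = permission in granted_permissions
    return scope_ok and iam_ok

accepted = {CLOUD_PLATFORM, STORAGE_READ_ONLY}

# Broad scope, but the service account has no storage permissions: denied.
print(request_allowed({CLOUD_PLATFORM}, accepted, set(), "storage.objects.get"))

# Same scope after granting an object-viewer style permission: allowed.
print(request_allowed({CLOUD_PLATFORM}, accepted, {"storage.objects.get"}, "storage.objects.get"))
```

This matches the scenario in the question: the broad scope is present, so the missing piece is the IAM grant.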

242

MCQmedium

Company A runs a containerized application on Google Kubernetes Engine (GKE) with 3 node pools: one for frontend, one for backend, and one for stateful databases. The backend services experience periodic latency spikes. After investigation, they found that the spikes correlate with the node pool autoscaler scaling down nodes. The backend services are deployed as Deployments with resource requests and limits set to 100m CPU and 200Mi memory each. The node pool uses n1-standard-2 machine types. The cluster autoscaler is enabled. What should they do to prevent the latency spikes?

A.Disable cluster autoscaler for the backend node pool.

B.Use node taints and tolerations to isolate the backend services.

C.Increase the resource requests for the backend services to ensure they are scheduled on dedicated nodes.

D.Configure a PodDisruptionBudget for the backend Deployment with minAvailable set to a high value.

AnswerD

Limits the number of pods that can be disrupted during voluntary disruptions.

Why this answer

The latency spikes occur because the cluster autoscaler is terminating nodes that host backend Pods, causing those Pods to be rescheduled and disrupting traffic. A PodDisruptionBudget (PDB) with a high minAvailable value ensures that a minimum number of backend Pods remain available during voluntary disruptions like node scale-down, preventing the sudden loss of capacity that leads to latency spikes. This directly addresses the root cause without disabling autoscaling or misconfiguring scheduling.

Exam trap

The trap here is that candidates often confuse resource requests/limits or node isolation with disruption protection, failing to recognize that PodDisruptionBudgets are the specific Kubernetes mechanism to control voluntary disruptions like autoscaler-driven node termination.

How to eliminate wrong answers

Option A is wrong because disabling the cluster autoscaler for the backend node pool would prevent automatic scaling entirely, leading to either over-provisioning (waste) or under-provisioning (capacity issues), and does not solve the disruption caused by scaling events. Option B is wrong because node taints and tolerations isolate Pods to specific nodes but do not prevent the autoscaler from terminating those nodes, so latency spikes would still occur during scale-down. Option C is wrong because increasing resource requests only changes how the scheduler places Pods on nodes; it does not protect Pods from being evicted when the autoscaler decides to scale down a node.
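
The effect of minAvailable can be illustrated with the arithmetic a PodDisruptionBudget applies to voluntary evictions. This is a simplified sketch under the assumption that allowed disruptions equal healthy Pods minus minAvailable, floored at zero; the function names and numbers are illustrative.

```python
def allowed_disruptions(healthy_pods, min_available):
    """Number of Pods that may be voluntarily evicted right now."""
    return max(0, healthy_pods - min_available)

def evictable_from_node(pods_on_node, healthy_pods, min_available):
    """How many Pods on a draining node can be evicted immediately."""
    return min(pods_on_node, allowed_disruptions(healthy_pods, min_available))

# 6 healthy backend Pods, minAvailable=5, and the node being removed hosts 3 of them.
print(evictable_from_node(pods_on_node=3, healthy_pods=6, min_available=5))
```

With a high minAvailable, the drain can only proceed one Pod at a time as replacements become healthy elsewhere, which avoids the sudden capacity drop behind the latency spikes.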

243

MCQmedium

A team is designing a multi-tier web application on Compute Engine. They need to ensure that only the web tier can access the application tier over a specific port. They plan to use VPC firewall rules. Which approach minimizes the attack surface?

A.Allow ingress from the web tier's instances' service accounts to the application tier's instances

B.Allow ingress from any source to the application tier on the port

C.Allow ingress from the web tier's subnet to the application tier's instances on the port

D.Allow egress from the web tier to the application tier

AnswerA

Restricts access based on identity, minimizing attack surface.

Why this answer

Option A is correct because it uses service account-based firewall rules, which allow you to specify the source as the service account attached to the web tier's instances rather than their IP addresses or subnets. This ensures that only instances with that specific service account (i.e., the web tier) can reach the application tier on the designated port, regardless of their IP or subnet. By scoping access to a specific identity, you minimize the attack surface because no other instances, even those in the same subnet, can reach the application tier unless they also use that service account.

Exam trap

Google Cloud often tests the misconception that subnet-based rules are the most secure approach, but the trap here is that service account-based rules provide finer-grained, identity-based access control that reduces the attack surface more effectively than subnet-based rules.

How to eliminate wrong answers

Option B is wrong because allowing ingress from any source to the application tier on the port exposes the application tier to the entire internet or VPC, which dramatically increases the attack surface and defeats the purpose of restricting access. Option C is wrong because allowing ingress from the web tier's subnet permits any instance in that subnet (including compromised or unauthorized instances) to reach the application tier, which is broader than necessary and does not leverage identity-based controls. Option D is wrong because an egress rule on the web tier does not control inbound traffic to the application tier; firewall rules are stateful in GCP, but the direction of the rule must match the traffic flow (ingress to the application tier), and egress rules alone cannot restrict who can reach the application tier.
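
The difference between Options A and C can be sketched by modeling how each rule type matches a source instance. This is a toy model of rule matching, not the VPC firewall engine; the `Instance` class, IP addresses, and service account emails are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Instance:
    name: str
    ip: str
    service_account: str

def allowed_by_subnet_rule(source, source_cidr):
    """Subnet-based rule: any instance whose IP falls in the range matches."""
    return ipaddress.ip_address(source.ip) in ipaddress.ip_network(source_cidr)

def allowed_by_sa_rule(source, source_service_account):
    """Service-account-based rule: only instances running as that identity match."""
    return source.service_account == source_service_account

WEB_SA = "web-sa@example-project.iam.gserviceaccount.com"
web = Instance("web-1", "10.0.1.5", WEB_SA)
batch = Instance("batch-1", "10.0.1.9", "batch-sa@example-project.iam.gserviceaccount.com")

for vm in (web, batch):
    print(vm.name, allowed_by_subnet_rule(vm, "10.0.1.0/24"), allowed_by_sa_rule(vm, WEB_SA))
```

In this model the unrelated `batch-1` instance shares the web tier's subnet, so the subnet rule admits it while the service account rule does not.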

244

MCQmedium

A company uses Cloud Spanner for a global financial application. They need to ensure that a regional outage does not cause data loss. The application requires strong consistency and low latency reads and writes across multiple regions. Which configuration meets the reliability requirements?

A.Use a multi-region Spanner instance with read replicas in two other regions

B.Use a single-region Spanner instance and schedule backups to Cloud Storage

C.Use a multi-region Spanner instance with a primary region and two witness regions

D.Use a single-region Spanner instance with point-in-time recovery (PITR) enabled

AnswerC

Correct: provides synchronous replication and automatic failover.

Why this answer

Option C is correct because a multi-region Spanner instance replicates writes synchronously across regions, ensuring strong consistency and no data loss during a regional outage. Witness replicas vote in the Paxos write quorum but do not store a full copy of the data or serve reads, so every write is acknowledged only after a majority of voting replicas, spanning more than one region, has committed it. (Google's predefined multi-region configurations generally pair two read-write regions with a single witness region.) This meets the requirement for zero data loss with strongly consistent reads and writes across regions.

Exam trap

Google Cloud often tests the misconception that read replicas or backups can prevent data loss during a regional outage, but in Spanner, only synchronous replication via a multi-region instance with a quorum of regions (including witness regions) guarantees zero data loss and strong consistency across regions.

How to eliminate wrong answers

Option A is wrong because read-only replicas do not vote in the write quorum; they serve reads but cannot acknowledge commits, so adding them in two other regions does not by itself protect recent writes during a regional outage. Option B is wrong because a single-region instance with backups to Cloud Storage cannot provide strong consistency and low latency across multiple regions, and backups are asynchronous, risking data loss of recent writes during an outage. Option D is wrong because point-in-time recovery (PITR) only protects against accidental data deletion or corruption within a single region, not against a regional outage, and it does not provide multi-region availability or low latency reads and writes across regions.

Full explanation →

245

MCQmedium

Why did the VM resource fail while the disk succeeded?

A.The disk and VM must be in the same zone; us-central1-a is consistent.

B.The VM definition is missing a boot disk source reference.

C.The VM's machine type is not available in us-central1-a.

D.The VM's network is misspelled as 'global/networks/default' instead of 'global/networks/default' (correct).

AnswerB

A VM instance typically requires a boot disk; the disk resource exists, but the VM doesn't reference it as its boot disk.

Why this answer

Option B is correct because when you define a VM instance in Google Cloud, you must include a reference to a boot disk source. If the `source` field under `disks` is missing or empty, the API will reject the VM creation but may still succeed in creating the disk resource separately, since the disk creation does not depend on the VM. This explains why the disk succeeded while the VM failed.

Exam trap

Google Cloud often tests the subtle dependency that a boot disk must have an explicit `source` reference in the VM definition, and candidates mistakenly think the disk creation implies the VM will also succeed, or they confuse zone constraints with missing required fields.

How to eliminate wrong answers

Option A is wrong because it describes a condition that is already met rather than a cause of failure: a zonal disk can only be attached to a VM in the same zone, and both resources are in us-central1-a, so zone consistency is not the issue. Option C is wrong because if the machine type were unavailable in us-central1-a, the API would return a specific 'machine type not found' error, but the question does not indicate that error; the failure is due to a missing boot disk source.

Option D is wrong because the network string 'global/networks/default' is correctly formatted; the option claims it is misspelled but then shows the same string, which is a typo in the option itself and not a real issue.

Full explanation →

246

MCQeasy

A company runs a batch processing job that runs daily and can handle interruptions. The job runs on a single Compute Engine instance. Which machine configuration is the most cost-effective?

A.An n2-standard-4 VM with sustained use discount

B.A standard n1-standard-4 VM

C.A preemptible n1-standard-4 VM

D.An n1-standard-4 VM with a GPU

AnswerC

Preemptible VMs are much cheaper and suitable for fault-tolerant batch jobs.

Why this answer

Option C is correct because a preemptible VM costs significantly less than a standard VM (up to 80% discount) and is ideal for batch processing jobs that can handle interruptions. The job runs daily and can tolerate being stopped, so the lower cost of a preemptible instance provides the most cost-effective solution without sacrificing functionality.
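
To make the size of that discount concrete, here is a rough cost comparison. The hourly price and the job duration are hypothetical placeholders, not real Google Cloud prices, and the 80% figure is the upper bound quoted above rather than a guaranteed rate.

```python
def monthly_cost(hourly_price: float, hours: float, discount: float = 0.0) -> float:
    """Cost of running one VM for `hours`, after a fractional discount (0.0-1.0)."""
    return hourly_price * hours * (1.0 - discount)


HOURLY_PRICE = 0.20   # hypothetical on-demand price in USD per hour
JOB_HOURS = 2 * 30    # hypothetical: a 2-hour job, once a day for 30 days

on_demand = monthly_cost(HOURLY_PRICE, JOB_HOURS)
preemptible = monthly_cost(HOURLY_PRICE, JOB_HOURS, discount=0.80)

print(f"on-demand:   ${on_demand:.2f}")
print(f"preemptible: ${preemptible:.2f}")
```

Any other discount, such as a sustained use discount, can be compared by passing its fractional value as `discount`.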

Exam trap

Google Cloud often tests the misconception that sustained use discounts are the most cost-effective option, but the trap here is that preemptible VMs provide a much deeper discount for fault-tolerant workloads, and candidates may overlook the 'can handle interruptions' requirement in the question.

How to eliminate wrong answers

Option A is wrong because an n2-standard-4 VM with a sustained use discount is still more expensive than a preemptible VM; sustained use discounts apply automatically to instances that run for a large part of the billing month, but they do not match the deep discount of preemptible instances, and the N2 series is a newer, higher-performance generation that is unnecessary for a batch job that can handle interruptions. Option B is wrong because a standard n1-standard-4 VM incurs full on-demand pricing, which is not cost-effective for a fault-tolerant batch job that can use cheaper preemptible instances. Option D is wrong because adding a GPU to an n1-standard-4 VM increases cost significantly and provides no benefit for a batch processing job that does not require GPU acceleration, making it the least cost-effective option.

Full explanation →

247

Multi-Selecteasy

Which TWO of the following are valid ways to deploy a Cloud Function? (Choose two.)

Select 2 answers

A.gcloud functions deploy

B.Cloud Source Repositories

C.Cloud Run

D.Cloud Scheduler

E.Cloud Build triggers

AnswersA, E

Correct. gcloud is a primary method.

Why this answer

Option A is correct because `gcloud functions deploy` is the primary command-line interface (CLI) method to deploy a Cloud Function directly from a local source or a specified source location. Option E is correct because Cloud Build triggers can be configured to automatically deploy a Cloud Function whenever a change is pushed to a repository (e.g., Cloud Source Repositories, GitHub), enabling continuous deployment.

Exam trap

Google Cloud often tests the distinction between services that *trigger* or *store* code versus services that *deploy* code; the trap here is confusing Cloud Source Repositories (a source code host) or Cloud Scheduler (a job scheduler) with actual deployment methods, leading candidates to select them as valid deployment options.

Full explanation →

248

MCQeasy

A company has a Cloud Run service that processes images uploaded by users. The service reads the images from a Cloud Storage bucket and writes processed images to another bucket. The team recently updated the service to use a custom service account named 'image-processor-sa' with minimal permissions. After the update, the service fails with permission errors when trying to read from the source bucket. The team verified that the service account has the Storage Object Viewer role on the source bucket and Storage Object Creator role on the destination bucket. What should the architect do to resolve the issue?

A.Ensure the Cloud Run service uses the correct service account by redeploying with the --service-account flag set to 'image-processor-sa@project-id.iam.gserviceaccount.com'.

B.Grant the service account the Cloud Run Invoker role on the Cloud Run service.

C.Assign the Storage Admin role to the service account.

D.Enable the Cloud Storage API for the project.

AnswerA

Correct: This ensures the Cloud Run service uses the custom service account with appropriate permissions.

Why this answer

The error occurs because the Cloud Run service is not using the custom service account 'image-processor-sa' despite it being created and granted permissions. By default, Cloud Run uses the Compute Engine default service account unless explicitly overridden. Redeploying with the --service-account flag attaches the correct identity to the Cloud Run revision, allowing it to authenticate with Cloud Storage using the minimal permissions already assigned.
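
One way to confirm which identity a revision actually runs as is to ask the metadata server from inside the container. The sketch below uses only the standard library; the helper names are hypothetical, and the call in `current_service_account` only succeeds when the code runs on Google Cloud.

```python
import urllib.request

# Metadata server path that reports the service account attached to the workload.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def build_identity_request() -> urllib.request.Request:
    """Build the request; the metadata server requires the Metadata-Flavor header."""
    return urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})


def current_service_account(timeout: float = 2.0) -> str:
    """Return the runtime service account email (works only on Google Cloud)."""
    with urllib.request.urlopen(build_identity_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

If the returned email is the default service account rather than 'image-processor-sa', the roles granted to the custom account are never used by the service.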

Exam trap

Google Cloud often tests the distinction between granting permissions to a service account versus actually attaching that service account to a resource; candidates mistakenly assume that creating and granting roles to a service account automatically makes it the active identity of the Cloud Run service.

How to eliminate wrong answers

Option B is wrong because the Cloud Run Invoker role grants permission to invoke the service (i.e., call its HTTP endpoint), not to read from Cloud Storage; it does not resolve the missing identity binding. Option C is wrong because assigning Storage Admin is an overly permissive solution that violates the principle of least privilege; the service account already has the necessary Object Viewer and Object Creator roles, so the issue is not about missing permissions but about the service not using the correct account. Option D is wrong because the Cloud Storage API is enabled by default when Cloud Storage is used; the error is not due to a disabled API but due to the service running under the wrong identity.

Full explanation →

249

MCQeasy

A data scientist needs read-only access to a Cloud Storage bucket containing training data. What is the least privileged IAM role to grant at the bucket level?

A.roles/storage.objectAdmin

B.roles/storage.objectCreator

C.roles/storage.admin

D.roles/storage.objectViewer

AnswerD

ObjectViewer grants read-only access to objects.

Why this answer

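The roles/storage.objectViewer role grants read-only access to objects: it allows viewing objects and their metadata and listing the objects in a bucket, but not creating, overwriting, or deleting them. Option A (roles/storage.objectAdmin) is too broad because it grants full control over objects, including creating and deleting them. Option B (roles/storage.objectCreator) allows creating objects but not reading them. Option C (roles/storage.admin) is the broadest of the four: it grants full control of buckets and objects, including managing their IAM policies.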

Full explanation →

250

MCQhard

A company is migrating a monolithic application to microservices on Google Cloud. They need to manage service-to-service authentication and authorization. Which service should they use?

A.Cloud NAT

B.Cloud Identity-Aware Proxy

C.Cloud Endpoints

D.Service Mesh (Anthos)

AnswerD

Anthos Service Mesh provides mTLS, authorization policies, and observability for microservices.

Why this answer

Service Mesh (Anthos) provides a dedicated infrastructure layer for managing service-to-service communication, including mutual TLS (mTLS) authentication, fine-grained authorization policies, and observability. It uses sidecar proxies (Envoy) to intercept traffic and enforce security policies without modifying application code, making it ideal for microservices authentication and authorization.

Exam trap

The trap here is that candidates often confuse Cloud Endpoints (API management for external clients) with the internal service-to-service security needs of microservices, or assume Cloud IAP can be extended to internal traffic, but IAP only works for user-facing HTTP(S) requests and cannot enforce policies between backend services.

How to eliminate wrong answers

Option A is wrong because Cloud NAT is a network address translation service for outbound internet access from private instances, not for service-to-service authentication or authorization. Option B is wrong because Cloud Identity-Aware Proxy (IAP) is designed for user-to-application authentication and access control at the edge, not for internal service-to-service communication within a VPC. Option C is wrong because Cloud Endpoints is an API management service that handles API keys, authentication, and quotas for external-facing APIs, but it does not provide the sidecar-based, fine-grained service-to-service authentication and authorization needed for microservices.

Full explanation →

251

MCQmedium

Your company runs a customer-facing API on Cloud Run with a concurrency setting of 80. The API calls a backend Cloud Function that performs a heavy computation (2–5 seconds). During peak hours, the API experiences increased latency and some requests time out after 60 seconds. Monitoring shows that the Cloud Run max instances is set to 100, and the Cloud Function max instances is set to 10. The timeout for Cloud Run is set to 300 seconds. The Cloud Function's timeout is set to 540 seconds. You need to reduce end-to-end latency and prevent timeouts while minimizing cost. Which action is most effective?

A.Increase Cloud Run max instances from 100 to 500

B.Increase Cloud Run request timeout from 300 to 600 seconds

C.Increase Cloud Function max instances from 10 to 100

D.Reduce Cloud Run concurrency from 80 to 10

AnswerC

Correct: removes backend capacity bottleneck.

Why this answer

Option C is correct because the bottleneck is the Cloud Function's low max instances (10), which causes requests to queue. Increasing Cloud Function max instances allows more requests to be processed concurrently, reducing latency and timeouts. Option A is wrong because increasing Cloud Run max instances alone doesn't help when Cloud Function capacity is the limit; it only adds cost.

Option B is wrong because increasing the Cloud Run timeout doesn't reduce latency; it just keeps the connection open longer. Option D is wrong because Cloud Run concurrency is separate from backend capacity; reducing it would require more Cloud Run instances and increase cost without relieving the Cloud Function bottleneck.
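
A back-of-the-envelope capacity estimate shows why the Cloud Function limit dominates. This sketch assumes each function instance handles one request at a time (an assumption about the function's configuration, not stated in the question) and uses the 2–5 second computation time from the scenario; the function name is hypothetical.

```python
def max_throughput_rps(max_instances: int, seconds_per_request: float,
                       concurrency_per_instance: int = 1) -> float:
    """Upper bound on sustained requests per second for a pool of instances."""
    return max_instances * concurrency_per_instance / seconds_per_request


# Cloud Run can hold 100 instances x 80 concurrent requests in flight,
# but every one of those requests waits on the Cloud Function.
cloud_run_in_flight = 100 * 80

for instances in (10, 100):
    slowest = max_throughput_rps(instances, 5.0)
    fastest = max_throughput_rps(instances, 2.0)
    print(f"{instances} function instances: {slowest:.0f}-{fastest:.0f} requests/s")

print(f"Cloud Run in-flight capacity: {cloud_run_in_flight} requests")
```

Under these assumptions, 10 instances sustain only a handful of requests per second, so anything beyond that queues until it hits the 60-second timeout seen by clients.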

Full explanation →

252

Multi-Selectmedium

A company is migrating a legacy monolithic application to Google Cloud. They want to adopt microservices and improve deployment frequency. Which THREE practices should they adopt? (Choose 3.)

Select 3 answers

A.Use Infrastructure as Code (IaC) with Terraform.

B.Build a single deployment pipeline for all services.

C.Implement blue/green deployments.

D.Use canary releases with traffic splitting.

E.Apply immutable infrastructure principles.

AnswersA, D, E

IaC enables rapid provisioning of environments, increasing deployment frequency.

Why this answer

Option A is correct because Infrastructure as Code (IaC) with Terraform enables declarative, version-controlled provisioning of cloud resources. This supports the microservices migration by allowing teams to spin up consistent, repeatable environments for each service, which is essential for increasing deployment frequency without manual configuration errors.

Exam trap

The trap here is that candidates often confuse 'blue/green deployments' with 'canary releases' and assume both are equally valid, but the question specifically asks for three practices that improve deployment frequency, and canary releases with traffic splitting directly enable faster, safer rollouts, whereas blue/green is a broader strategy that may not inherently increase frequency.

Full explanation →

253

MCQmedium

A company is deploying a new application on Compute Engine and wants to automate the installation of a custom agent on every newly created VM in a specific project. Which Google Cloud service should they use?

A.VM Manager (OS Config) with a guest policy to install the agent.

B.Instance templates with startup scripts.

C.Deployment Manager with a template that includes the agent installation.

D.Cloud Build triggered on new VM creation events.

AnswerA

OS Config can enforce agent installation on all VMs in a project.

Why this answer

VM Manager (OS Config) with a guest policy is the correct choice because it provides a native, agent-based configuration management service that can enforce the installation of a custom agent on all existing and newly created VMs in a project without requiring changes to instance templates or startup scripts. Guest policies are evaluated and applied at VM boot time and periodically thereafter, ensuring consistent agent deployment across the fleet.

Exam trap

The trap here is that candidates often confuse configuration management (OS Config guest policies) with provisioning-time automation (startup scripts in instance templates), assuming that startup scripts are sufficient for fleet-wide enforcement when they only apply at creation time and are not re-evaluated.

How to eliminate wrong answers

Option B is wrong because instance templates with startup scripts only apply to VMs created from that specific template; they do not automatically cover VMs created from other templates, images, or via other methods, and they do not enforce the agent on existing VMs. Option C is wrong because Deployment Manager is an infrastructure-as-code tool for deploying resources, not a configuration management service; it cannot automatically apply agent installation to VMs created outside its deployment scope. Option D is wrong because Cloud Build is a CI/CD service for building and testing artifacts, and it cannot be triggered directly by new VM creation events; there is no native event trigger for Compute Engine VM creation in Cloud Build.

Full explanation →

254

MCQhard

A company runs a critical application on Compute Engine with a stateful workload. They want to achieve 99.99% availability within a single region. Which architecture should they recommend?

A.Two instances in different zones with a zonal persistent disk each and data replication using a custom script

B.One instance in a single zone with a persistent disk snapshot every hour

C.Two instances in different zones with a regional persistent disk attached to the active instance and failover using a load balancer

D.Four instances across two zones with a regional persistent disk and active-passive failover using a health check

AnswerC

Regional disk replicates data synchronously across zones; load balancer provides automated failover.

Why this answer

Option C is correct because it uses a regional persistent disk, which synchronously replicates data across two zones within the same region, ensuring data durability and availability. The active instance in one zone attaches the disk, and a load balancer with health checks detects failures and redirects traffic to the standby instance in the other zone, enabling automatic failover to meet the 99.99% availability target.

Exam trap

Google Cloud often tests the misconception that more instances or zones automatically increase availability, but the key is the data replication mechanism—regional persistent disks provide synchronous replication, while zonal disks with custom scripts or snapshots introduce data loss or latency that fails the 99.99% SLA.

How to eliminate wrong answers

Option A is wrong because zonal persistent disks are tied to a single zone and cannot be attached to instances in another zone; a custom script for data replication introduces latency and potential data loss, failing to meet the synchronous replication needed for 99.99% availability. Option B is wrong because a single instance with hourly snapshots provides no automatic failover and can result in up to an hour of data loss, which is insufficient for 99.99% availability (which allows only ~52.56 minutes of downtime per year). Option D is wrong because four instances across two zones with a regional persistent disk is over-provisioned and unnecessarily complex; the active-passive failover using a health check can be achieved with just two instances, and adding more instances does not improve availability beyond what the regional disk and load balancer already provide.
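
The downtime figure quoted for 99.99% follows directly from the number of minutes in a year. A minimal sketch of the arithmetic, ignoring leap years (the function name is hypothetical):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99):
    print(f"{target}% availability -> {downtime_budget_minutes(target):.2f} minutes/year")
```

A single recovery from an hourly snapshot can therefore consume more than the entire annual budget.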

Full explanation →

255

MCQeasy

A company is deploying a web application on Compute Engine. They want to ensure that only authenticated users can access the application. Which Google Cloud service should they use?

A.Identity-Aware Proxy

B.Cloud Load Balancing

C.Cloud CDN

D.Cloud DNS

AnswerA

IAP uses identity and context to enforce access control.

Why this answer

Identity-Aware Proxy (IAP) is the correct choice because it enforces access control at the edge of Google's network, verifying user identity and context before allowing traffic to reach the Compute Engine instance. IAP uses OAuth 2.0 and signed headers to authenticate users, ensuring only authorized requests are forwarded to the backend, without requiring any changes to the application itself.

Exam trap

The trap here is that candidates often confuse network-level services like Cloud Load Balancing or Cloud CDN with security controls, assuming they provide authentication simply because they sit in front of the application, but they lack any identity verification mechanism.

How to eliminate wrong answers

Option B (Cloud Load Balancing) is wrong because it distributes traffic across instances but does not authenticate users; it operates at Layer 4 or Layer 7 without any built-in identity verification. Option C (Cloud CDN) is wrong because it caches content at edge locations to reduce latency, but it does not enforce user authentication; it can be combined with IAP but alone provides no access control. Option D (Cloud DNS) is wrong because it translates domain names to IP addresses and has no mechanism for user authentication or authorization.

Full explanation →

256

MCQeasy

When creating a Compute Engine instance from a custom image stored in another project, which gcloud flag is required?

A.--image-project

B.--source-instance

C.--image

D.--image-family

AnswerC

Specifies the image name, which is always required.

Why this answer

Option C is correct because the `--image` flag is required when creating a Compute Engine instance from a custom image, regardless of whether the image is in the same project or a different project. This flag specifies the name of the custom image to use as the boot disk source. Without it, gcloud would default to a public image or fail to create the instance.

Exam trap

The trap here is that candidates treat `--image-project` as the flag that selects the image. It only tells gcloud which project to search; the image itself is still named with `--image`. For an image stored in another project, `--image-project` (or a full image URI passed to `--image`) accompanies `--image` rather than replacing it, and it can be omitted only when the image is in the current project.

How to eliminate wrong answers

Option A is wrong because `--image-project` cannot select an image on its own; it only names the project that holds the image (e.g., `--image-project debian-cloud` for a public image) and must be paired with `--image` or `--image-family`. Option B is wrong because `--source-instance` is used to create an image from an existing instance, not to specify an image when creating a new instance. Option D is wrong because `--image-family` is used to select the latest non-deprecated image from a family (e.g., `ubuntu-2204-lts`), not to reference a specific custom image by name.

Full explanation →

257

MCQmedium

A developer ran the above command to create a health check for a backend service. Which of the following should they do to resolve the error?

A.Change the request-path to a different value.

B.Delete the existing health check and recreate it.

C.Add the --global flag to the command.

D.Use --load-balancer-type internal to create a new health check with the same name.

E.Use a different name for the health check.

AnswerE

Using a different name resolves the conflict without disruption.

Why this answer

Option E is correct because the error indicates that a health check with the same name already exists. In Google Cloud, health check names must be unique within a project for a given scope (global or a specific region). By using a different name, the developer can create a new health check without conflicting with the existing one.

Exam trap

Google Cloud often tests the misconception that modifying parameters like request-path or load balancer type can resolve naming conflicts, when in fact the core issue is a duplicate name that must be changed.

How to eliminate wrong answers

Option A is wrong because changing the request-path does not resolve a naming conflict; it only alters the path probed by the health check. Option B is wrong because deleting the existing health check would disrupt whatever already depends on it, and Google Cloud rejects the deletion while a backend service still references it. Option C is wrong because the --global flag only selects the global scope for the health check; it does not resolve a naming conflict.

Option D is wrong because --load-balancer-type internal specifies the load balancer type, not the health check name; it does not address the duplicate name error.

Full explanation →

258

MCQhard

Your company runs a data pipeline on Google Cloud using Cloud Dataflow for streaming processing from Pub/Sub to BigQuery. The pipeline writes to a BigQuery table partitioned by day. The data is used for real-time dashboards. Recently, a spike in traffic caused the Dataflow pipeline to fall behind, and the dashboard displayed stale data. You need to design the pipeline to handle traffic spikes without data loss or long delays. The pipeline must be cost-efficient and use defaults where possible. Which solution should you implement?

A.Enable autoscaling in the Dataflow pipeline and use Streaming Engine to handle larger throughput

B.Modify the pipeline to use a batch (non-streaming) approach, writing hourly batches from Pub/Sub to BigQuery

C.Create a Cloud Scheduler job that increases the number of Dataflow workers every 5 minutes based on Pub/Sub subscription backlog

D.Change the Dataflow worker machine type from n1-standard-4 to n1-highmem-8

AnswerA

Correct: autoscaling dynamically adjusts workers; Streaming Engine reduces checkpoint overhead.

Why this answer

Option A is correct because enabling autoscaling in Dataflow allows the pipeline to dynamically adjust the number of workers based on the processing backlog, while Streaming Engine offloads the shuffle and state storage to Google-managed resources, reducing the impact of traffic spikes. This combination ensures the pipeline can scale up quickly to handle increased throughput without data loss or long delays, and it remains cost-efficient by scaling down when demand decreases.
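
As a rough illustration, the settings described above map to a small set of pipeline options. The sketch below only assembles the argument list and does not import or run Apache Beam; the option names are the ones the Beam Python SDK is understood to accept for the Dataflow runner and should be verified against the SDK version in use. The helper name and the project, region, and worker-count values are hypothetical.

```python
from typing import List


def dataflow_streaming_args(project: str, region: str, max_workers: int) -> List[str]:
    """Assemble pipeline arguments for an autoscaled streaming job (sketch only)."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        "--streaming",                               # unbounded Pub/Sub source
        "--enable_streaming_engine",                 # move shuffle/state off the workers
        "--autoscaling_algorithm=THROUGHPUT_BASED",  # let Dataflow add/remove workers
        f"--max_num_workers={max_workers}",          # cost ceiling for scale-up
    ]


# Hypothetical values for illustration.
args = dataflow_streaming_args("my-project", "us-central1", max_workers=20)
print("\n".join(args))
```

The worker cap keeps the pipeline cost-bounded during a spike, while autoscaling returns the job to a smaller footprint once the backlog drains.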

Exam trap

Google Cloud often tests the misconception that manual scaling (Option C) or static resource changes (Option D) are sufficient for handling spikes, when in fact Dataflow's built-in autoscaling and Streaming Engine are the designed, cost-efficient solutions for dynamic workloads.

How to eliminate wrong answers

Option B is wrong because switching to a batch approach introduces inherent latency (hourly batches) that would make the real-time dashboard stale, violating the requirement for minimal delays; it also does not handle spikes within the batch window. Option C is wrong because using Cloud Scheduler to manually adjust worker count every 5 minutes is reactive, not adaptive, and cannot respond quickly enough to sudden spikes; Dataflow's native autoscaling is designed to adjust more granularly and efficiently. Option D is wrong because simply changing the worker machine type to a larger instance (n1-highmem-8) does not address the need for dynamic scaling; it increases cost without guaranteeing sufficient capacity during spikes and does not leverage Dataflow's autoscaling capabilities.

Full explanation →

259

MCQhard

A financial services company runs a stateful backend service on Google Kubernetes Engine (GKE) using StatefulSets with Persistent Volumes. They observe that after a node failure, the pod is rescheduled on a different node but the Persistent Volume cannot be attached because it is still "released" and not "available". What is the most likely cause and solution?

A.The PersistentVolume has reclaim policy "Retain"; manually delete and recreate the volume.

B.The PersistentVolume has reclaim policy "Recycle"; it is not supported in GKE.

C.The PersistentVolumeClaim was not created with the correct storage class; recreate with reclaim policy "Delete".

D.The PersistentVolumeClaim's access mode is ReadWriteOnce, which prevents attachment to a new node; change to ReadWriteMany.

E.The PersistentVolume has reclaim policy "Retain" and the previous pod's volume attachment is not cleared; use a StatefulSet with volumeClaimTemplates and reclaim policy "Delete".

AnswerA

Retain policy leaves PV in 'Released' state; manual intervention is needed.

Why this answer

Option A is correct because when a PersistentVolume (PV) has a reclaim policy of 'Retain', after the PersistentVolumeClaim (PVC) is deleted, the PV enters a 'Released' state and is not automatically recycled for reuse. The underlying storage resource (e.g., a Compute Engine persistent disk) still exists but the PV cannot be re-attached until an administrator manually deletes the PV and recreates it, or edits the PV to remove the claim reference. This explains why the pod rescheduled on a new node cannot attach the volume.
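
The state transitions described above can be written down as a tiny model. This is a simplified sketch of the documented Kubernetes behavior, not code that talks to a cluster; both function names are hypothetical.

```python
def pv_state_after_claim_deleted(reclaim_policy: str) -> str:
    """State of a bound PersistentVolume once its claim is deleted."""
    if reclaim_policy == "Retain":
        return "Released"   # data and the old claim reference are kept
    if reclaim_policy == "Delete":
        return "Deleted"    # the PV and its backing disk are removed
    raise ValueError(f"unsupported reclaim policy: {reclaim_policy}")


def pv_state_after_claim_ref_cleared(state: str) -> str:
    """An administrator clearing the claim reference makes a Released PV bindable."""
    return "Available" if state == "Released" else state


state = pv_state_after_claim_deleted("Retain")
print(state)
print(pv_state_after_claim_ref_cleared(state))
```

The model makes the key point explicit: nothing moves a retained volume from Released back to Available except manual intervention.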

Exam trap

Google Cloud often tests the distinction between PV reclaim policies and the 'Released' vs 'Available' states, where candidates mistakenly think the issue is with the PVC's access mode or storage class rather than the PV's manual cleanup requirement.

How to eliminate wrong answers

Option B is wrong because the 'Recycle' reclaim policy is deprecated and not supported in GKE, but the scenario describes a 'Released' state, not a 'Recycle' issue. Option C is wrong because the storage class and reclaim policy are not the cause; the problem is the PV's 'Retain' policy, not the PVC's creation parameters. Option D is wrong because ReadWriteOnce allows attachment to a single node at a time, but after a node failure the PVC is unbound and can be re-attached to a new node; the issue is the PV's state, not the access mode.

Option E is wrong because while using volumeClaimTemplates with reclaim policy 'Delete' would avoid the problem, the existing PV has 'Retain' policy, and the solution is to manually handle the released PV, not to change the StatefulSet definition.

Full explanation →

260

MCQmedium

A DevOps engineer notices that a GKE cluster has nodes that are frequently preempted. They want to reduce costs but maintain resilience. What should they do?

A.Use node auto-repair

B.Use preemptible VMs for all nodes

C.Use committed use discounts

D.Use a regional cluster with multiple zones

AnswerD

Regional clusters distribute nodes across zones, improving resilience to zone-level preemption.

Why this answer

A regional cluster with multiple zones distributes workloads across zones, so if nodes in one zone are preempted, the cluster remains resilient by using nodes in other zones. This reduces costs by allowing the use of preemptible VMs (which are cheaper) while maintaining high availability, as the cluster can tolerate zone-level failures. The key is that regional clusters provide a managed control plane and node distribution across zones, which is essential for resilience against preemption.

Exam trap

The trap here is that candidates may think preemptible VMs are inherently unreliable and thus avoid them entirely, or they may confuse node auto-repair (which fixes health issues) with resilience against preemption, missing the key benefit of a regional architecture that distributes risk across zones.

How to eliminate wrong answers

Option A is wrong because node auto-repair only fixes unhealthy nodes (e.g., those with kernel issues) but does not prevent or mitigate the impact of preemption, which is a deliberate termination by GCP. Option B is wrong because using preemptible VMs for all nodes would cause the entire cluster to fail if all nodes are preempted simultaneously, offering no resilience; preemptible VMs can be terminated at any time within 24 hours. Option C is wrong because committed use discounts require a 1- or 3-year commitment and do not address the resilience issue; they reduce costs for sustained usage but do not protect against preemption.

Full explanation →

261

MCQhard

An organization needs to connect an on-premises data center to Google Cloud using Dedicated Interconnect with a 10 Gbps link. They require high availability and want to achieve 99.99% SLA. What is the minimum number of VLAN attachments and Interconnect connections needed?

A.Two Interconnect connections, each with one VLAN attachment.

B.One Interconnect connection with two VLAN attachments.

C.Four Interconnect connections with one VLAN attachment each.

D.Two Interconnect connections, each with two VLAN attachments.

AnswerC

Four connections, two in each of two metropolitan areas, are the minimum topology for the 99.99% SLA.

Why this answer

To qualify for the 99.99% SLA, Dedicated Interconnect needs at least four Interconnect connections: two in one metropolitan area and two in a second one, with the two connections in each metro placed in different edge availability domains. Each connection carries at least one VLAN attachment, for four attachments in total. Two connections in different edge availability domains of a single metro qualify only for the 99.9% SLA, because a metro-wide failure would take down both.

Exam trap

The trap here is that candidates confuse the 99.9% and 99.99% topologies, or confuse VLAN attachments with physical redundancy. Multiple VLAN attachments on a single Interconnect connection add only logical redundancy, and two connections in one metro stop at 99.9%; the 99.99% SLA requires physical diversity across both edge availability domains and metros.

How to eliminate wrong answers

Option A is wrong because two Interconnect connections, even in different edge availability domains, cover only one metro and therefore qualify for the 99.9% SLA, not 99.99%. Option B is wrong because one Interconnect connection with two VLAN attachments provides only logical redundancy on the same physical link; a single fiber cut or device failure would still cause an outage. Option D is wrong because adding a second VLAN attachment to each of two connections does not add physical diversity; the topology still has only two connections and cannot meet the 99.99% requirement.

Full explanation →

262

MCQhard

A media streaming company uses Google Cloud CDN to deliver content. They notice that users in certain regions experience high latency despite CDN caching. The content is dynamic based on user location (e.g., local news). What should they do to improve performance?

A.Deploy Cloud Run services in multiple regions and use a global external HTTPS load balancer with backend services to route requests to the nearest region

B.Use Cloud Functions with a regional HTTP trigger and Cloud CDN to cache the responses

C.Use Cloud Armor to route traffic to the nearest point of presence

D.Configure Cloud CDN to use cache keys based on user location headers

AnswerA

This reduces latency for dynamic content by serving from the nearest region.

Why this answer

Option A is correct because deploying Cloud Run services in multiple regions and using a global external HTTPS load balancer with backend services enables location-based routing via the load balancer's anycast IP and backend service configuration. The load balancer automatically routes user requests to the nearest Cloud Run backend that has capacity, reducing latency for dynamic, location-specific content that cannot be effectively cached by Cloud CDN.

Exam trap

Google Cloud often tests the misconception that Cloud CDN alone can solve latency for dynamic content, but the trap here is that dynamic, location-specific content cannot be effectively cached, so the correct solution is to deploy compute resources closer to users and use a global load balancer for intelligent routing.

How to eliminate wrong answers

Option B is wrong because Cloud Functions with a regional HTTP trigger cannot be fronted by Cloud CDN for dynamic content; Cloud CDN caches responses at edge locations, but the content is dynamic based on user location, so caching would serve stale or incorrect content to users in different regions. Option C is wrong because Cloud Armor is a web application firewall and DDoS protection service, not a traffic routing mechanism; it cannot route traffic to the nearest point of presence based on latency or geography. Option D is wrong because configuring Cloud CDN to use cache keys based on user location headers does not solve the latency issue for dynamic content; the content is already uncacheable or must be generated per region, and cache keys only affect how cached responses are served, not the origin latency.

Full explanation →

263

MCQhard

A financial services company needs a disaster recovery plan for a critical application running on GKE. The application uses Cloud SQL for MySQL. The RPO is 5 minutes and RTO is 15 minutes. Which design meets these requirements cost-effectively?

A.Use Cloud SQL cross-region replication (MySQL) with a failover replica in another region, and deploy GKE cluster in that region with the same configuration.

B.Use Cloud SQL for MySQL with multi-region database flag and route traffic to nearest region via Cloud Load Balancing.

C.Set up VPC peering between two regions and use Cloud DNS to direct traffic in failover.

D.Use Cloud SQL backups to Cloud Storage and restore in another region, with GKE cluster recreated via Deployment Manager.

AnswerA

Cross-region replication can meet 5 min RPO.

Why this answer

Cloud SQL cross-region replication provides an asynchronous replica in another region with a typical replication lag of a few seconds, meeting the 5-minute RPO. The replica can be promoted in minutes, and a GKE cluster deployed in the replica's region with identical configuration lets the application connect to the promoted replica, achieving the 15-minute RTO. The replica runs (and is billed) continuously, but this design is still cost-effective compared with a full active-active deployment because the standby region serves no production traffic until a failover test or an actual disaster.

Exam trap

The trap here is that candidates often confuse Cloud SQL's cross-region replication with Cloud Spanner's multi-region configuration, or assume that network-level solutions like VPC peering and DNS can satisfy data replication requirements without a dedicated database replication mechanism.

How to eliminate wrong answers

Option B is wrong because Cloud SQL for MySQL does not support a 'multi-region database flag'; that concept applies to Spanner, not MySQL, and Cloud SQL does not have built-in multi-region routing via Cloud Load Balancing for MySQL instances. Option C is wrong because VPC peering and Cloud DNS alone do not provide any database replication or failover mechanism; they only handle network connectivity and DNS resolution, leaving the critical data un-replicated and unable to meet the RPO/RTO. Option D is wrong because restoring from Cloud SQL backups to Cloud Storage and recreating a GKE cluster via Deployment Manager would take significantly longer than 15 minutes due to backup download, restore time, and cluster provisioning, failing the RTO requirement.
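
The RPO/RTO reasoning above can be sketched as a small check. This is an illustrative sketch only: the `DrDesign` type, the `meets_objectives` helper, and every duration below are hypothetical figures chosen for the example, not measured Cloud SQL or GKE numbers.

```python
from dataclasses import dataclass


@dataclass
class DrDesign:
    name: str
    worst_case_data_loss_min: float  # data at risk if the primary region is lost
    recovery_steps_min: list[float]  # durations of the sequential recovery steps


def meets_objectives(design: DrDesign, rpo_min: float, rto_min: float) -> bool:
    """Return True only when both the RPO and the RTO are satisfied."""
    recovery_time = sum(design.recovery_steps_min)
    return design.worst_case_data_loss_min <= rpo_min and recovery_time <= rto_min


# Hypothetical figures for illustration only.
replica = DrDesign("cross-region replica + standby GKE", 1.0, [5.0, 5.0])
backup = DrDesign("backup restore + cluster rebuild", 60.0, [30.0, 20.0])

for design in (replica, backup):
    print(design.name, meets_objectives(design, rpo_min=5, rto_min=15))
```

Plugging in real replication lag and rehearsed failover timings from a DR drill is what turns this kind of check into evidence that a design meets its objectives.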

Full explanation →

264

MCQhard

A company runs batch processing jobs on a GKE cluster using preemptible node pools. The jobs are fault-tolerant and can be interrupted. However, the cluster is experiencing high costs due to underutilized nodes. The batch jobs run for 2-3 hours each. What is the most cost-effective optimization?

A.Switch to compute-optimized (C2) machine types for faster job completion.

B.Use regional persistent disks for stateful workloads to improve performance.

C.Reduce the number of min-nodes in the node pool to zero during idle times and use cluster autoscaler.

D.Create multiple node pools with different machine types and use node auto-provisioning with preemptible nodes and custom machine types.

AnswerD

Node auto-provisioning with custom machine types ensures resources match job requirements, reducing waste.

Why this answer

Option D is the most cost-effective because it leverages node auto-provisioning with preemptible nodes and custom machine types, which dynamically creates node pools tailored to the specific resource requirements of each batch job. This eliminates waste from over-provisioned nodes while maintaining fault tolerance for interruptible workloads. Combined with preemptible instances (roughly 60-80% cheaper than regular VMs), this approach minimizes cost without sacrificing job completion, as the jobs are already designed to handle interruptions.

Exam trap

Google Cloud often tests the misconception that simply reducing node count (Option C) is sufficient for cost optimization, ignoring that node auto-provisioning with custom machine types can eliminate waste from mismatched instance sizes, which is the primary driver of underutilization costs in preemptible node pools.

How to eliminate wrong answers

Option A is wrong because switching to compute-optimized (C2) machine types increases per-node cost significantly (C2 instances are premium-priced for high CPU performance) and does not address underutilization; faster job completion may reduce runtime but not overall cost if nodes remain idle. Option B is wrong because regional persistent disks are designed for stateful workloads requiring high availability and durability, but the batch jobs are fault-tolerant and stateless; adding regional PDs increases storage costs without improving compute utilization. Option C is wrong because reducing min-nodes to zero and using cluster autoscaler only scales down idle nodes, but the cluster autoscaler cannot change machine types or right-size nodes for specific jobs; it still uses the same preemptible node pool configuration, leaving potential waste from mismatched instance sizes.
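
The "mismatched instance size" argument can be made concrete with a little arithmetic. The sketch below is a simplification under stated assumptions: the `wasted_fraction` helper is hypothetical, it considers only vCPUs, and it ignores the resources that GKE reserves on each node for system components.

```python
def wasted_fraction(node_vcpus: int, pod_vcpus: int) -> float:
    """Fraction of a node's vCPUs left idle when it is packed with identical pods."""
    if pod_vcpus <= 0 or node_vcpus < pod_vcpus:
        raise ValueError("pod must request a positive vCPU count that fits on the node")
    pods_per_node = node_vcpus // pod_vcpus      # whole pods that fit
    used_vcpus = pods_per_node * pod_vcpus
    return (node_vcpus - used_vcpus) / node_vcpus


# Hypothetical batch pod requesting 6 vCPUs.
fixed_pool = wasted_fraction(node_vcpus=8, pod_vcpus=6)    # one pod per 8-vCPU node
right_sized = wasted_fraction(node_vcpus=6, pod_vcpus=6)   # custom 6-vCPU node

print(f"fixed pool waste: {fixed_pool:.0%}, right-sized waste: {right_sized:.0%}")
```

Scaling a mismatched pool to zero when idle (Option C) removes idle-time cost, but the per-node waste computed here remains whenever jobs are running.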

Full explanation →

265

MCQhard

A company runs a stateful application on Google Kubernetes Engine (GKE) that requires persistent storage and low-latency access across multiple zones. The application needs to perform well even during zonal failures. Which storage solution should they use?

A.Zonal persistent disk with snapshots to another zone

B.Local SSDs attached to nodes

C.Cloud Filestore

D.Regional persistent disk

AnswerD

Regional PD replicates across zones and provides high availability.

Why this answer

Regional persistent disks (RPDs) synchronously replicate data across two zones in the same region, providing both the persistent storage and low-latency access required by the stateful application. This ensures that if one zone fails, the disk can be attached to a pod in the surviving zone without data loss or significant performance degradation, meeting the high-availability and multi-zone access requirements.

Exam trap

The trap here is that candidates confuse high-availability features like snapshots or local SSDs with true synchronous replication, overlooking that only regional persistent disks provide both persistence and zero-RPO failover across zones without manual restore steps.

How to eliminate wrong answers

Option A is wrong because zonal persistent disks with snapshots to another zone introduce recovery time (snapshot restore) and potential data loss (snapshot frequency), failing to provide the synchronous, low-latency multi-zone access needed during zonal failures. Option B is wrong because local SSDs are ephemeral and tied to a specific node; data is lost if the node or zone fails, and they cannot be shared across zones, violating the persistent storage requirement. Option C is wrong because Cloud Filestore is a managed NFS file storage service designed for shared file systems, not for low-latency block storage access required by stateful applications on GKE, and it introduces network latency compared to directly attached persistent disks.

Full explanation →

266

MCQhard

A company has a Cloud SQL for PostgreSQL instance with high read traffic. They want to offload read queries without modifying the application. Which strategy should they implement?

A.Increase the machine type of the primary instance

B.Implement a caching layer using Memorystore

C.Create read replicas and configure the application to use them

D.Migrate to Cloud Spanner for better scalability

AnswerC

Read replicas offload reads without app changes.

Why this answer

Creating read replicas in Cloud SQL for PostgreSQL offloads read traffic from the primary instance without changing application code, because the replicas are fully managed and serve read queries directly. The only change is in connection configuration: read traffic is pointed at the replica's endpoint, which handles SELECT statements while the primary continues to handle writes. This is the most direct and cost-effective solution for high read traffic when code changes are not permitted.

Exam trap

Google Cloud often tests the misconception that caching (Memorystore) is the only way to offload reads, but the key constraint here is 'without modifying the application,' which eliminates caching because it requires application-level cache integration.

How to eliminate wrong answers

Option A is wrong because increasing the machine type of the primary instance only scales vertically, which does not offload read queries—it simply gives the same instance more resources, and the application still sends all traffic to a single endpoint. Option B is wrong because implementing a caching layer using Memorystore would require modifying the application to check the cache before querying the database, which violates the requirement of not modifying the application. Option D is wrong because migrating to Cloud Spanner is a significant architectural change that requires application modifications (e.g., using Spanner-specific client libraries and handling strong consistency differently), and it is overkill for simply offloading read traffic.

Full explanation →

267

MCQmedium

An organization wants to define an SLO for their API hosted on Cloud Endpoints. Which metric should they use as a Service Level Indicator (SLI) for availability?

A.Number of HTTP 5xx errors

B.Request latency at the 99th percentile

C.Ratio of HTTP 200 responses to total requests

D.CPU utilization of backend instances

AnswerC

This directly measures the availability of the API.

Why this answer

For an availability SLO, the SLI must measure the proportion of successful requests. In Cloud Endpoints, availability is defined as the ratio of successful (HTTP 200) responses to total requests, as this directly reflects whether the API is functioning correctly. Option C is correct because it captures the fraction of requests that completed without error, which is the standard definition of availability in service-level monitoring.

Exam trap

Google Cloud often tests the distinction between availability and performance metrics, so the trap here is that candidates confuse latency (a performance SLI) with availability, or they mistakenly think that counting only server-side errors (5xx) is sufficient for an availability SLI, ignoring that availability is a ratio of successful to total requests.

How to eliminate wrong answers

Option A is wrong because HTTP 5xx errors are only one component of unavailability; they do not account for other failure modes (e.g., timeouts, 4xx errors caused by infrastructure issues) and using just the count of 5xx errors would not produce a ratio suitable for an availability SLI. Option B is wrong because request latency at the 99th percentile measures performance, not availability; an API can be available but slow, and latency is used for a different SLO (e.g., responsiveness). Option D is wrong because CPU utilization of backend instances is an infrastructure metric that does not directly measure whether the API is serving requests successfully; high CPU may indicate performance issues but does not equate to availability failures.
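
The ratio-based definition of an availability SLI can be sketched in a few lines. This is a minimal illustration, not Cloud Monitoring's implementation: the function names are hypothetical, and treating a period with no traffic as fully available is an assumed convention.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Availability SLI: successful requests divided by total requests."""
    if total_requests == 0:
        return 1.0  # assumed convention: no traffic counts as fully available
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target
    actual_failure = 1.0 - sli
    return (allowed_failure - actual_failure) / allowed_failure


# Hypothetical month of traffic measured against a 99.9% availability SLO.
sli = availability_sli(good_requests=99_950, total_requests=100_000)
print(f"SLI = {sli:.4%}, budget left = {error_budget_remaining(sli, 0.999):.1%}")
```

A raw count of 5xx errors (Option A) cannot be plugged into this calculation without the total request count, which is why the ratio is the SLI and the count is not.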

Full explanation →

268

MCQeasy

A company wants to provision multiple similar environments (dev, test, prod) with consistent networking configurations. Which approach is a best practice for infrastructure as code?

A.Use Ansible playbooks to run ad-hoc commands.

B.Use a single Terraform configuration with workspaces.

C.Run separate gcloud commands for each environment.

D.Use Cloud Deployment Manager templates with environment-specific parameters.

AnswerB

Workspaces allow reusable configuration across environments.

Why this answer

Terraform workspaces allow you to manage multiple distinct environments (e.g., dev, test, prod) from a single configuration by maintaining separate state files. This ensures consistent networking configurations across environments while avoiding duplication of code, which is a core best practice for infrastructure as code.

Exam trap

Google Cloud often tests the misconception that environment-specific parameters in Deployment Manager templates are equivalent to Terraform workspaces, but the trap is that Terraform's workspace feature provides native state isolation and multi-cloud portability, whereas Deployment Manager is GCP-specific and lacks the same level of abstraction for consistent multi-environment management.

How to eliminate wrong answers

Option A is wrong because Ansible playbooks are primarily for configuration management and ad-hoc command execution, not for declaratively provisioning cloud infrastructure with state management and drift detection. Option C is wrong because running separate gcloud commands for each environment is imperative, error-prone, and lacks version control and repeatability, violating IaC principles. Option D is wrong because Cloud Deployment Manager templates with environment-specific parameters can work but are less portable and flexible than Terraform workspaces, and Terraform is the more widely adopted multi-cloud IaC tool for consistent provisioning.

Full explanation →

269

Multi-Selecteasy

Which TWO IAM predefined roles grant read-only access to Cloud Storage objects but not the ability to list buckets?

Select 2 answers

A.roles/storage.legacyBucketReader

B.roles/storage.objectViewer

C.roles/storage.objectAdmin

D.roles/storage.viewer

E.roles/storage.admin

AnswersA, B

Allows reading objects if you know the bucket, but does not grant storage.buckets.list.

Why this answer

Option A, roles/storage.legacyBucketReader, is accepted because it does not include the storage.buckets.list permission, so users cannot list buckets. It grants storage.buckets.get and storage.objects.list, which lets a user list a known bucket's contents and read object metadata while listing; note that it does not include storage.objects.get, so it cannot download object data on its own. Option B, roles/storage.objectViewer, is correct because it provides read-only access to objects (storage.objects.get and storage.objects.list) without the ability to list buckets, as it lacks storage.buckets.list.

Exam trap

The trap here is that candidates often confuse roles/storage.viewer (which includes bucket listing) with roles/storage.objectViewer (which does not), or assume that any 'viewer' role excludes bucket listing, but the storage.viewer role actually includes storage.buckets.list, making it incorrect for this specific requirement.

Full explanation →

270

MCQhard

An organization requires that all Compute Engine instances in a project must have a specific tag for firewall rule compliance. How can they enforce this?

A.Use IAM roles to restrict instance creation

B.Use a startup script to add the tag

C.Use a mandatory tag via organization policy

D.Use Cloud Asset Inventory

AnswerC

An organization policy constraint can reject instance creation when the required tag is missing.

Why this answer

Option C is correct because Organization Policies can deny non-compliant resource creation at the API level, so compliance does not depend on user behavior. A custom organization policy constraint (custom constraint names carry the `custom.` prefix) can be written to require that every Compute Engine instance carries a particular tag, and any create request that violates it is rejected. Predefined constraints such as `compute.requireOsLogin` exist as well, but that one enforces OS Login and is unrelated to tags.

Exam trap

The trap here is that candidates often confuse IAM roles with Organization Policies, thinking that restricting creation permissions (Option A) is sufficient, but IAM cannot enforce resource-level attributes like tags, which is a common misconception in policy-based governance questions.

How to eliminate wrong answers

Option A is wrong because IAM roles control who can create instances, not what tags are applied to the instances; they cannot enforce a specific tag value. Option B is wrong because a startup script runs after the instance is created, so it cannot prevent the creation of an instance without the required tag, and the instance would already exist in violation of the firewall rule compliance. Option D is wrong because Cloud Asset Inventory is a service for discovering and monitoring cloud resources, not for enforcing policies or preventing non-compliant resource creation.

Full explanation →

271

MCQmedium

A company is deploying a microservices application on Google Kubernetes Engine (GKE). They want to optimize costs without sacrificing availability. They have varying traffic patterns. Which strategy should they recommend?

A.Use committed use discounts with a 3-year term on all nodes.

B.Use GKE Autopilot with a single node pool.

C.Use a regional cluster with node pools of different machine types.

D.Use node auto-provisioning with preemptible nodes.

AnswerD

Node auto-provisioning dynamically creates node pools and preemptible nodes lower cost.

Why this answer

Node auto-provisioning with preemptible nodes automatically creates node pools based on workload demands and uses cheaper preemptible VMs, reducing cost for variable traffic. Regional clusters focus on high availability, not cost. Committed use discounts lock in usage and are not optimal for variable traffic.

GKE Autopilot provides convenience but may not be the most cost-efficient with preemptible options.

Full explanation →

272

MCQmedium

The exhibit shows the output of a 'gcloud compute instances describe' command for an instance. What is the most likely impact on reliability if the host machine needs maintenance?

A.The instance will be terminated and then restarted, causing a brief downtime.

B.The instance will not be affected because automatic restart is enabled.

C.The instance will be backed up automatically before maintenance.

D.The instance will be live migrated to another host without interruption.

AnswerA

With TERMINATE, the instance is shut down and later restarted on a healthy host, resulting in downtime.

Why this answer

Option A is correct because when a host machine requires maintenance, Google Compute Engine instances that are not configured for live migration will be terminated and then restarted on another host. This behavior is determined by the 'onHostMaintenance' setting; if it is set to 'TERMINATE' (the default for instances with GPUs or preemptible VMs), the instance stops and restarts, causing brief downtime. The exhibit likely shows 'onHostMaintenance: TERMINATE' or the instance lacks live migration support, making termination and restart the expected outcome.

Exam trap

Google Cloud often tests the distinction between 'automatic restart' (which handles crash recovery) and 'onHostMaintenance' (which handles planned maintenance), causing candidates to mistakenly think automatic restart prevents downtime during maintenance.

How to eliminate wrong answers

Option B is wrong because 'automatic restart' is a separate setting that controls whether an instance restarts after a failure or after Compute Engine stops it, not how it behaves during host maintenance; it does not prevent downtime from maintenance events. Option C is wrong because Google Compute Engine does not automatically back up instances before host maintenance; backups must be configured separately via snapshots or images. Option D is wrong because live migration is only possible if the instance has 'onHostMaintenance' set to 'MIGRATE' and is neither preemptible nor GPU-attached; the exhibit likely shows a configuration that disables live migration, such as a GPU attached or the setting explicitly set to 'TERMINATE'.
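
The interaction between the two settings can be sketched as a small decision function. This is a hedged illustration: `maintenance_outcome` is a hypothetical helper, the input dict mirrors the `scheduling` block of the describe output (`onHostMaintenance`, `automaticRestart`), and treating missing fields as `MIGRATE` and `True` is an assumption made for the example.

```python
def maintenance_outcome(scheduling: dict) -> str:
    """Predict what happens to an instance during a host maintenance event."""
    # Assumption for this sketch: missing fields fall back to MIGRATE / True.
    policy = scheduling.get("onHostMaintenance", "MIGRATE")
    auto_restart = scheduling.get("automaticRestart", True)

    if policy == "MIGRATE":
        return "live-migrated without a reboot"
    if auto_restart:
        return "terminated, then restarted (brief downtime)"
    return "terminated and left stopped"


# Hypothetical scheduling block matching the scenario in the question.
exhibit = {"onHostMaintenance": "TERMINATE", "automaticRestart": True}
print(maintenance_outcome(exhibit))
```

The sketch shows why Option B fails: `automaticRestart` only decides what happens after the termination, while `onHostMaintenance` decides whether the termination happens at all.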

Full explanation →

273

MCQmedium

A company has a multi-project Google Cloud environment with strict compliance requirements. They need to ensure that all projects enforce a uniform set of constraints, such as requiring CMEK for Compute Engine disk encryption and blocking the use of public IPs on VMs. They have defined these constraints using Organization Policies at the organization level. However, the security team discovers that some projects are not enforcing the constraints because they have been overridden at the project level by the respective project owners. The security team wants a solution that prevents project-level overrides while maintaining the ability to apply exceptions at a folder level when approved. What should they do?

A.Deploy Forseti Security to automatically remediate when projects override policies.

B.Use Cloud Asset Inventory to monitor for non-compliant projects and alert the security team.

C.Manually remove the overridden policies in each project and set the constraints at the organization level again.

D.Move all projects under a common folder and set the Organization Policies at that folder level with 'enforce: true'.

AnswerD

Folder-level policies cannot be overridden by project-level policies, ensuring enforcement while allowing folder-level exceptions.

Why this answer

Option D is correct because setting the Organization Policy at the folder level (e.g., a 'compliance' folder that contains all projects) with the 'enforce: true' setting on constraints ensures that project-level overrides are not possible unless explicitly allowed by the folder policy. Option C is wrong because removing overrides manually is not scalable and does not prevent future overrides. Option B is wrong because Cloud Asset Inventory is for auditing, not enforcement.

Option A is wrong because Forseti is a security tool but does not provide the policy enforcement mechanism of Organization Policies.

Full explanation →

274

Multi-Selectmedium

Which TWO actions are required to allow a private GKE cluster to pull container images from Artifact Registry in the same project?

Select 2 answers

A.Create a firewall rule allowing outbound traffic to Artifact Registry IP ranges.

B.Set up VPC Network Peering with the Artifact Registry service.

C.Configure Cloud NAT for the GKE cluster.

D.Enable Private Google Access on the subnet where the GKE nodes are deployed.

E.Grant the Artifact Registry Reader role to the GKE service account.

AnswersD, E

Private Google Access allows nodes without external IPs to reach Google APIs.

Why this answer

Option D is correct because Private Google Access enables GKE nodes with only internal IP addresses to reach Google APIs and services, including Artifact Registry, over Google's private network rather than the public internet. Option E is correct because the GKE node's service account must have the Artifact Registry Reader role (roles/artifactregistry.reader) to authenticate and pull container images from the registry.

Exam trap

Google Cloud often tests the misconception that Cloud NAT is required for private clusters to access Google APIs, but Private Google Access is the correct mechanism for reaching Google-managed services like Artifact Registry without public IPs.

Full explanation →

275

MCQhard

A financial services company is designing a multi-region disaster recovery architecture for a critical application. The application runs on Compute Engine with a stateful backend using Cloud Spanner. The Recovery Time Objective (RTO) is 1 hour, and the Recovery Point Objective (RPO) is 15 minutes. What architecture meets these requirements cost-effectively?

A.Deploy the application in two regions with active-active traffic load balancing and Cloud Spanner multi-region configuration.

B.Deploy in one region with scheduled snapshots to Cloud Storage and use persistent disk snapshots for recovery.

C.Deploy in two regions with active-passive using Cloud Load Balancing and Cloud Spanner backup/restore.

D.Use a single region with Cloud SQL for PostgreSQL and enable cross-region replication using Cloud SQL replica.

AnswerA

Cloud Spanner multi-region provides synchronous replication with RPO < 15 min and automatic failover meets RTO.

Why this answer

Option A is correct because it uses Cloud Spanner's multi-region configuration, which provides synchronous replication across regions with automatic failover, meeting an RPO of 15 minutes (typically seconds) and an RTO of 1 hour. Active-active traffic load balancing with Compute Engine ensures that the application can immediately route traffic to the healthy region, minimizing downtime without the need for manual failover or backup/restore operations.

Exam trap

The trap here is that candidates often confuse Cloud Spanner's backup/restore (asynchronous, slow) with its multi-region configuration (synchronous, fast), or assume that active-passive with backups can meet low RTO/RPO when in reality only synchronous replication can achieve sub-minute RPO and automatic failover.

How to eliminate wrong answers

Option B is wrong because scheduled snapshots to Cloud Storage and persistent disk snapshots are asynchronous and can take longer than 15 minutes to capture, potentially exceeding the RPO; also, recovery from snapshots involves manual steps that likely exceed the 1-hour RTO. Option C is wrong because Cloud Spanner backup/restore is an asynchronous process that can take hours to restore a database, far exceeding the 1-hour RTO, and active-passive setups introduce failover delays that may not meet the RTO. Option D is wrong because Cloud SQL for PostgreSQL with cross-region replication uses asynchronous replication, which can result in data loss exceeding the 15-minute RPO, and Cloud SQL does not support the same multi-region synchronous replication capabilities as Cloud Spanner.

Full explanation →

276

MCQmedium

A company is migrating its on-premises Oracle database to Cloud SQL for PostgreSQL. The database team wants to minimize downtime during migration. Which approach should they use?

A.Set up Oracle GoldenGate to replicate to Cloud SQL.

B.Use Database Migration Service for PostgreSQL with continuous migration from Oracle via Homogeneous Migration.

C.Take a physical backup of Oracle and restore to Cloud SQL.

D.Export the database as a dump file, upload to Cloud Storage, and import into Cloud SQL.

AnswerB

DMS supports minimal downtime via continuous replication.

Why this answer

Database Migration Service (DMS) with continuous migration is the correct approach because it supports ongoing change data capture (CDC) from Oracle to Cloud SQL for PostgreSQL, enabling near-zero downtime. Although the option labels it 'Homogeneous', Oracle to PostgreSQL is technically a heterogeneous migration, since the source and target engines differ. DMS handles schema conversion and replicates data continuously, keeping the target synchronized until cutover, which minimizes downtime compared to batch methods.

Exam trap

Google Cloud often tests the misconception that any 'migration service' automatically supports heterogeneous migrations, but here the trap is that Database Migration Service for PostgreSQL is specifically designed for PostgreSQL targets and includes built-in schema conversion from Oracle, whereas options like GoldenGate or dump/restore are either too complex or cause downtime.

How to eliminate wrong answers

Option A is wrong because Oracle GoldenGate is a third-party tool that requires separate licensing, complex configuration, and is not natively integrated with Cloud SQL for PostgreSQL; it is overkill and not the recommended Google Cloud service for this migration. Option C is wrong because a physical backup of Oracle (e.g., RMAN) is platform-specific and cannot be directly restored to Cloud SQL for PostgreSQL, which uses a different database engine and storage format. Option D is wrong because exporting as a dump file and importing is a one-time, offline process that requires the source database to be quiesced or taken offline, causing significant downtime, and does not support continuous replication.

Full explanation →

277

MCQhard

A financial services company uses VPC Service Controls to protect their project containing BigQuery datasets and Cloud Storage buckets. They have a perimeter that includes the BigQuery service. Users report that they cannot export data from BigQuery to Cloud Storage using the web console. The export job fails with an access denied error. The team needs to allow exports while maintaining data exfiltration prevention. The users have the necessary IAM permissions (BigQuery Data Editor, Storage Object Admin) on the appropriate resources. What should the architect do?

A.Add Cloud Storage to the same VPC Service Controls perimeter.

B.Remove BigQuery from the VPC Service Controls perimeter.

C.Create an access level that permits exports during business hours.

D.Grant the users the Storage Object Admin role at the bucket level.

AnswerA

Correct: This allows controlled data flow between BigQuery and Cloud Storage within the perimeter.

Why this answer

Option A is correct because VPC Service Controls perimeters enforce data exfiltration prevention by default, blocking egress from protected services (like BigQuery) to unprotected services (like Cloud Storage). Adding Cloud Storage to the same perimeter allows BigQuery to export data to Cloud Storage while still preventing data from leaving the perimeter. The users already have the necessary IAM roles (BigQuery Data Editor and Storage Object Admin), so the issue is solely the perimeter boundary, not permissions.

Exam trap

The trap here is that candidates often confuse IAM permissions with VPC Service Controls boundaries, assuming that granting the correct IAM roles (like Storage Object Admin) will resolve the access denied error, when in fact the error is caused by the perimeter blocking cross-service egress, not by insufficient IAM privileges.

How to eliminate wrong answers

Option B is wrong because removing BigQuery from the perimeter would disable all VPC Service Controls protections for BigQuery, exposing the datasets to data exfiltration risks, which contradicts the requirement to maintain data exfiltration prevention. Option C is wrong because access levels control ingress based on client attributes (e.g., IP address, device state) and do not affect egress permissions between services within a perimeter; the export failure is a perimeter boundary issue, not an access level restriction. Option D is wrong because the users already have the Storage Object Admin role at the bucket level (as stated in the question), and the error is an access denied from the perimeter, not from IAM; granting the same role again does not resolve the VPC Service Controls boundary.
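
The boundary rule described above can be pictured with a toy model. The Python sketch below is only an illustration and not the VPC Service Controls API: the `Perimeter` class, its `restricted_services` field, and the `export_allowed` function are hypothetical names, and real perimeters also evaluate projects, ingress/egress policies, and access levels.

```python
from dataclasses import dataclass, field


@dataclass
class Perimeter:
    """Toy stand-in for a service perimeter (hypothetical, not a real API)."""
    restricted_services: set = field(default_factory=set)


def export_allowed(perimeter: Perimeter, source: str, destination: str) -> bool:
    """Allow a transfer only when it does not cross the perimeter boundary.

    Simplifying assumption: data may leave a protected service only towards
    another service protected by the same perimeter.
    """
    if source in perimeter.restricted_services:
        return destination in perimeter.restricted_services
    return True


# Perimeter as described in the question, then with Cloud Storage added (Option A).
before = Perimeter({"bigquery.googleapis.com"})
after = Perimeter({"bigquery.googleapis.com", "storage.googleapis.com"})

print(export_allowed(before, "bigquery.googleapis.com", "storage.googleapis.com"))
print(export_allowed(after, "bigquery.googleapis.com", "storage.googleapis.com"))
```

Under this simplified rule the export is refused while only BigQuery is inside the perimeter and permitted once Cloud Storage joins it. IAM roles never enter the check, which mirrors the point made in the exam trap.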

Full explanation →

278

Multi-Selecthard

Which THREE factors should be considered when choosing a Google Cloud region for deploying a low-latency application serving global users? (Choose three.)

Select 3 answers

A.Proximity to your user base to minimize network latency.

B.Availability of the specific Google Cloud services required by the application.

C.Pricing differences between regions due to variations in compute and storage costs.

D.Compliance with data residency requirements (e.g., GDPR, CCPA).

E.Number of zones in the region to ensure high availability.

AnswersA, B, D

Closer regions reduce round-trip time.

Why this answer

Options A, B, and D are correct. Latency to users, service availability, and data residency are the key factors; cost is secondary, and the number of zones is not a primary factor.

Full explanation →

279

MCQhard

An organization has a VPC with two subnets: subnet-a (10.0.1.0/24) and subnet-b (10.0.2.0/24). They launched a Compute Engine instance in subnet-a with an internal IP 10.0.1.2 and a public IP. They want the instance to only allow HTTPS traffic from the internet. Which firewall rule should they create?

A.Ingress rule: allow tcp:0-65535, source 0.0.0.0/0, target tag 'https-server'

B.Egress rule: allow tcp:443, destination 0.0.0.0/0, target tag 'https-server'

C.Ingress rule: allow tcp:443, source 10.0.0.0/16, target tag 'https-server'

D.Ingress rule: allow tcp:443, source 0.0.0.0/0, target tag 'https-server'

AnswerD

This rule correctly allows inbound HTTPS from any source to instances with the tag.

Why this answer

Option D is correct because the instance needs to accept incoming HTTPS traffic (TCP port 443) from the internet. An ingress firewall rule with source 0.0.0.0/0 allows traffic from any external IP, and applying it to instances with the target tag 'https-server' ensures only tagged instances are affected. This matches the requirement to allow only HTTPS from the internet.

Exam trap

The trap here is that candidates often confuse ingress vs. egress rules or mistakenly restrict the source to the VPC range (10.0.0.0/16) thinking it includes the internet, when in fact it only allows traffic from within the VPC.

How to eliminate wrong answers

Option A is wrong because it allows all TCP ports (0-65535) from the internet, which violates the requirement to allow only HTTPS traffic (port 443). Option B is wrong because it is an egress rule, which controls outbound traffic from the instance, not inbound HTTPS traffic from the internet. Option C is wrong because it restricts the source to the internal VPC range (10.0.0.0/16), which blocks all internet traffic and does not meet the requirement for allowing HTTPS from the internet.
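
To make the elimination concrete, the ingress options can be compared with a small matcher. This is a minimal sketch that assumes a rule is just one source CIDR plus one TCP port range; the `IngressAllowRule` class is a hypothetical name, target tags are ignored, and `203.0.113.10` is a documentation address standing in for an internet client.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class IngressAllowRule:
    """Simplified ingress allow rule: one source CIDR and one TCP port range."""
    name: str
    source_range: str
    port_low: int
    port_high: int

    def matches(self, source_ip: str, port: int) -> bool:
        network = ipaddress.ip_network(self.source_range)
        in_range = ipaddress.ip_address(source_ip) in network
        return in_range and self.port_low <= port <= self.port_high


option_a = IngressAllowRule("option-a", "0.0.0.0/0", 0, 65535)
option_c = IngressAllowRule("option-c", "10.0.0.0/16", 443, 443)
option_d = IngressAllowRule("option-d", "0.0.0.0/0", 443, 443)

internet_client = "203.0.113.10"  # stand-in for any public source address

for rule in (option_a, option_c, option_d):
    https = rule.matches(internet_client, 443)
    ssh = rule.matches(internet_client, 22)
    print(f"{rule.name}: https={https} ssh={ssh}")
```

Only the Option D rule admits port 443 from the public address while rejecting port 22. Option A admits both, and Option C admits neither because the client is outside 10.0.0.0/16.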

Full explanation →

280

MCQeasy

A developer accidentally deleted a bucket in Cloud Storage. The bucket had object versioning enabled. How can the bucket and its objects be restored?

A.Contact Cloud Support to restore the bucket from the undisclosed backup within a limited time window.

B.Restore the bucket from the Trash in the Cloud Console.

C.Enable bucket lock and then undo deletion.

D.Use the gsutil ls -a command to list deleted buckets and gsutil cp to restore.

AnswerA

Google can restore deleted buckets within a short period.

Why this answer

When a Cloud Storage bucket is deleted, even with versioning enabled, the bucket itself is removed along with its objects. Google Cloud does not provide a self-service restore option for deleted buckets; instead, it maintains an internal, undisclosed backup for a limited time (typically 7 days). Only Cloud Support can initiate the restoration process from this backup, making Option A the correct approach.

Exam trap

Google Cloud often tests the misconception that versioning provides a safety net for bucket deletion, but versioning only protects objects within an existing bucket—it does not prevent or undo the deletion of the bucket itself.

How to eliminate wrong answers

Option B is wrong because the Cloud Console has no 'Trash' from which a deleted bucket can be restored. Option C is wrong because bucket lock is a feature for retention policies (e.g., preventing object deletion or modification), not for undoing a bucket deletion; once a bucket is deleted, there is no 'undo deletion' operation. Option D is wrong because the `gsutil ls -a` command lists object versions within an existing bucket, not deleted buckets; there is no `gsutil` command to list or restore a deleted bucket.

Full explanation →

281

MCQmedium

A company has a microservices architecture on GKE. One service is failing due to resource exhaustion. How can they proactively prevent this?

A.Use vertical pod autoscaling.

B.Set up autoscaling based on CPU utilization.

C.Configure a horizontal pod autoscaler with custom metrics.

D.Implement a cluster autoscaler.

AnswerC

Custom metrics can detect specific exhaustion signals.

Why this answer

Option C is correct because a horizontal pod autoscaler with custom metrics (e.g., memory, request queue depth) can detect resource exhaustion early and scale pods before failure. Option A is wrong because vertical pod autoscaling may not react fast enough. Option B is wrong because CPU-based autoscaling may not capture all exhaustion types.

Option D is wrong because cluster autoscaler scales nodes, not pods.

Full explanation →

282

Matchingmedium

Match each GCP data processing service to its use case.

Concepts

Matches

Stream and batch data processing (Apache Beam)

Managed Hadoop and Spark clusters

Asynchronous messaging for event ingestion

Visual data integration pipelines

Workflow orchestration (Apache Airflow)

Why these pairings

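Each description corresponds to one GCP data processing service: Dataflow runs Apache Beam pipelines for stream and batch processing, Dataproc provides managed Hadoop and Spark clusters, Pub/Sub handles asynchronous messaging for event ingestion, Cloud Data Fusion builds visual data integration pipelines, and Cloud Composer orchestrates workflows with Apache Airflow.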

Full explanation →

283

Multi-Selectmedium

A company is designing a highly available application on GCE. Which TWO steps should they take to ensure reliability?

Select 2 answers

A.Use a global external HTTP(S) load balancer.

B.Use a managed instance group with autohealing.

C.Configure health checks that check the application endpoint.

D.Use persistent disks without snapshots.

E.Deploy instances in a single zone to avoid latency.

AnswersB, C

Automatically replaces unhealthy instances.

Why this answer

Option B is correct because a managed instance group (MIG) with autohealing automatically replaces unhealthy VM instances based on health check results, ensuring the application remains available even if individual instances fail. This is a core reliability pattern for stateless applications on Compute Engine, as it provides self-healing infrastructure without manual intervention.

Exam trap

Google Cloud often tests the distinction between load balancing (traffic distribution) and instance-level recovery (autohealing), causing candidates to incorrectly select a global load balancer as the sole reliability measure without recognizing the need for health-check-driven instance replacement.

Full explanation →

284

MCQmedium

A company is using Cloud SQL for PostgreSQL and needs to run a one-time heavy analytical query that takes over 30 minutes and uses 100% CPU. The production database is serving user traffic with high QPS. What should the company do to run the query without impacting production?

A.Run the query directly on the primary instance during low traffic hours.

B.Create a read replica of the production instance and run the query on the replica.

C.Use Cloud SQL's pgBouncer to pool connections and queue the query.

D.Create a clone of the production instance and run the query on the clone.

AnswerB

Read replicas are designed for offloading read-only workloads.

Why this answer

Option B is correct because a read replica in Cloud SQL for PostgreSQL is a separate instance that asynchronously replicates data from the primary. Running the heavy analytical query on the replica offloads the CPU-intensive workload from the production primary, ensuring user-facing traffic with high QPS is not impacted. The replica can handle read-only queries without affecting the primary's performance or availability.

Exam trap

Google Cloud often tests the distinction between a read replica (which offloads read traffic) and a clone (which is a point-in-time copy not kept in sync), leading candidates to choose the clone option because they confuse it with a replica's ability to handle production queries without impact.

How to eliminate wrong answers

Option A is wrong because even during low traffic hours, a query using 100% CPU on the primary instance will still degrade performance for any concurrent user requests, risking latency spikes or timeouts. Option C is wrong because pgBouncer is a connection pooler that manages database connections, not a query scheduler or resource isolator; it cannot queue or throttle a single heavy query to prevent CPU saturation. Option D is wrong because a clone is a new, independent instance created as a point-in-time copy; it takes time to provision and is not kept in sync with production, so it suits testing or development rather than offloading read-only workloads, which is what read replicas are designed for.

Full explanation →

285

MCQeasy

Your company runs a critical application on Compute Engine instances in a managed instance group across three zones. The application writes logs to local disk. You are asked to improve the reliability of log retention and ensure logs are available in case of instance failure. You have already configured a health check that automatically recreates instances. However, after a recent zonal outage, logs from the affected instances were lost. You need to implement a solution that preserves logs even when instances are terminated. What should you do?

A.Increase the size of the local SSD to accommodate more logs and set a longer retention period.

B.Configure each instance to write logs to a persistent disk that is retained after instance deletion.

C.Install the Cloud Logging agent on each instance and configure it to stream application logs to Cloud Logging.

D.Mount a Cloud Storage bucket using gcsfuse on each instance and write logs directly to the bucket.

AnswerC

Cloud Logging provides centralized, durable log storage independent of instance lifecycle.

Why this answer

Option C is correct because the Cloud Logging agent streams logs directly to Cloud Logging (now part of Google Cloud's operations suite), which stores logs independently of the Compute Engine instances. This ensures logs are preserved even if instances are terminated due to a zonal outage or health check recreation, as logs are sent to a centralized, durable logging service rather than being stored on local disk.

Exam trap

Google Cloud often tests the misconception that persistent disks or Cloud Storage buckets are sufficient for log durability, but the key requirement is centralized log management with automatic streaming, which only Cloud Logging provides without additional complexity or latency.

How to eliminate wrong answers

Option A is wrong because increasing local SSD size and retention period does not protect logs from instance termination; local SSDs are ephemeral and their data is lost when an instance is deleted or recreated. Option B is wrong because persistent disks are not retained after instance deletion unless the disk's auto-delete setting is turned off, and even then, logs would be tied to a specific zonal disk that is unavailable during a zonal outage unless it is replicated; the question requires a solution that works across instance failures, not just disk retention. Option D is wrong because while gcsfuse can mount a Cloud Storage bucket, writing logs directly to a bucket introduces latency and potential consistency issues, and the bucket is not a log management solution; Cloud Logging is purpose-built for log ingestion, analysis, and retention.

Full explanation →

286

Multi-Selecthard

A financial services company is designing a multi-tier application on Google Cloud. The application must meet PCI DSS compliance, with data encrypted at rest and in transit. They plan to use Cloud SQL for PostgreSQL for transactional data and Cloud Storage for archival data. Which TWO actions should the architect take to meet compliance requirements?

Select 2 answers

A.Configure client-side encryption in the application code

B.Rely on Google-managed default encryption for all data

C.Enable customer-managed encryption keys (CMEK) on Cloud SQL and Cloud Storage

D.Use VPC Service Controls to restrict data access

E.Use Cloud HSM with a key generated outside of Google Cloud

AnswersC, D

CMEK provides control over key management required for PCI DSS.

Why this answer

Option C is correct because enabling CMEK on Cloud SQL and Cloud Storage allows the company to use their own encryption keys, which is often required by PCI DSS to demonstrate control over key management. CMEK ensures data at rest is encrypted with keys managed via Cloud KMS, providing auditability and separation of duties beyond Google-managed default encryption.

Exam trap

The trap here is that candidates often assume Google-managed default encryption is sufficient for PCI DSS, but the exam tests the nuance that many compliance frameworks require customer-managed keys (CMEK) to demonstrate control over the encryption process, not just encryption itself.

Full explanation →

287

MCQmedium

A company runs a web application on Google Kubernetes Engine (GKE) with Cluster Autoscaler enabled. During a traffic spike, the application becomes slow and some requests timeout. The cluster has sufficient CPU and memory headroom. What is the most likely cause and solution?

A.Increase the node pool's machine type to a larger size.

B.Enable Cluster Autoscaler to add more nodes.

C.Deploy the application in a regional cluster for higher availability.

D.Configure Horizontal Pod Autoscaler (HPA) based on CPU utilization or custom metrics.

AnswerD

HPA automatically scales pods based on load, resolving the timeout issue.

Why this answer

The correct answer is D because the cluster has sufficient CPU and memory headroom, indicating that the issue is not about cluster capacity but about pod-level scaling. The Horizontal Pod Autoscaler (HPA) automatically scales the number of pod replicas based on observed CPU utilization or custom metrics, which directly addresses the application slowdown and timeouts during traffic spikes by distributing the load across more pods.

Exam trap

Google Cloud often tests the distinction between node-level scaling (Cluster Autoscaler) and pod-level scaling (HPA), trapping candidates who assume that adding more nodes is the solution when the cluster already has headroom, whereas the real issue is insufficient pod replicas to handle the load.

How to eliminate wrong answers

Option A is wrong because increasing the node pool's machine type addresses node-level resource constraints, but the cluster already has sufficient CPU and memory headroom, so the bottleneck is at the pod level, not the node level. Option B is wrong because Cluster Autoscaler is already enabled and the cluster has headroom, so adding more nodes would not solve the problem of insufficient pod replicas to handle the traffic spike. Option C is wrong because deploying in a regional cluster improves availability and resilience to zone failures, but does not directly address the performance degradation and timeouts caused by insufficient application instances during a traffic spike.
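
The pod-level scaling that HPA performs follows the replica formula given in the Kubernetes documentation: desired = ceil(current replicas × current metric / target metric). The sketch below applies only that core formula; the function name `desired_replicas` is an illustrative choice, and the tolerance band, stabilization window, and min/max replica bounds that a real HPA also applies are left out.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Replica count from the core Horizontal Pod Autoscaler formula.

    desired = ceil(current_replicas * current_metric / target_metric)
    """
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    return math.ceil(current_replicas * current_metric / target_metric)


# Three pods averaging 90% CPU against a 60% target: scale out.
print(desired_replicas(3, 90.0, 60.0))
# Three pods averaging 30% CPU against a 60% target: scale in.
print(desired_replicas(3, 30.0, 60.0))
```

With a 60% target, three pods at 90% average utilization scale out to five, and three pods at 30% scale in to two. Node count never appears in the calculation, which is why Cluster Autoscaler alone does not relieve a pod-level bottleneck.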

Full explanation →

288

Multi-Selectmedium

Your organization is moving a legacy monolithic application to Google Kubernetes Engine (GKE). The application currently runs on a single virtual machine with a local MySQL database. You need to design a cloud-native architecture that improves scalability and reliability. Which two actions should you take? (Choose TWO.)

Select 2 answers

A.Deploy the entire application in a single container with a large custom machine type to handle load.

B.Refactor the application into microservices and deploy each as a separate deployment in GKE.

C.Expose the application using a simple Service of type LoadBalancer with round-robin distribution.

D.Use Cloud SQL for MySQL instead of running the database in the same cluster.

E.Use a single Pod with multiple containers that communicate via localhost to reduce latency.

AnswersB, D

Microservices allow independent scaling and faster deployments.

Why this answer

Option B is correct because refactoring the monolithic application into microservices and deploying each as a separate Deployment in GKE aligns with cloud-native principles, enabling independent scaling, fault isolation, and easier updates. This approach improves scalability and reliability by allowing each microservice to scale horizontally based on demand, and failures in one service do not cascade to others.

Exam trap

Google Cloud often tests the misconception that simply containerizing a monolith or using a larger machine type is sufficient for cloud-native scalability, when in fact true scalability requires decoupling components into independently scalable units and separating stateful services like databases.

Full explanation →

289

MCQeasy

An engineer runs the above command and sees two firewall rules that allow SSH access. A security review requires that SSH access be allowed only from the bastion subnet 10.0.1.0/24. What should the engineer do to meet the requirement?

A.Add a firewall rule with priority 500 that denies SSH from all IPs

B.Change the priority of allow-ssh-ingress to 2000

C.Delete the allow-ssh-ingress rule

D.Remove the target tag 'ssh-allowed' from allow-ssh-from-bastion

AnswerC

Deleting the overly permissive rule leaves only the bastion-specific rule, meeting the requirement.

Why this answer

The correct answer is C because the allow-ssh-ingress rule has a higher priority (lower number) than the allow-ssh-from-bastion rule, allowing SSH from any source IP. Deleting this rule ensures that only the lower-priority rule (allow-ssh-from-bastion) remains, which restricts SSH access to the bastion subnet 10.0.1.0/24. In Google Cloud VPC firewall rules, lower priority numbers indicate higher precedence, so the allow-ssh-ingress rule (priority 1000) overrides the allow-ssh-from-bastion rule (priority 2000) for any traffic matching both.

Exam trap

Google Cloud often tests the misconception that adding a deny rule with a higher priority (lower number) will block unwanted traffic while preserving the allow rule, but candidates forget that the deny rule would also block the intended bastion traffic, breaking the requirement.

How to eliminate wrong answers

Option A is wrong because adding a deny rule with priority 500 would block SSH from all IPs, including the bastion subnet, since a priority-500 deny rule takes precedence over the allow rules at priorities 1000 and 2000; this would break the requirement to allow SSH from the bastion. Option B is wrong because changing the priority of allow-ssh-ingress to 2000 only makes it equal to the allow-ssh-from-bastion rule; it remains an allow rule with no source restriction, so SSH from any IP would still be permitted regardless of its priority. Option D is wrong because removing the target tag 'ssh-allowed' from allow-ssh-from-bastion changes which instances that rule applies to but leaves allow-ssh-ingress in place, so SSH from any source IP would still be allowed.
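
The priority reasoning can be checked with a simplified evaluator. This is a sketch under stated assumptions, not the actual VPC firewall engine: rules match on source CIDR and a single port only, the lowest priority number is evaluated first, deny wins a tie, and unmatched ingress traffic falls through to an implied deny. The rule name `deny-ssh-all` and both test addresses are hypothetical; the priorities 1000 and 2000 come from the explanation above.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    priority: int      # lower number = evaluated first
    action: str        # "allow" or "deny"
    source_range: str
    port: int


def evaluate(rules, source_ip, port):
    """Return the action of the highest-precedence matching rule."""
    # Sort by priority; at equal priority a deny rule sorts ahead of an allow.
    ordered = sorted(rules, key=lambda r: (r.priority, r.action != "deny"))
    address = ipaddress.ip_address(source_ip)
    for rule in ordered:
        if rule.port == port and address in ipaddress.ip_network(rule.source_range):
            return rule.action
    return "deny"  # implied ingress deny when nothing matches


allow_any = Rule("allow-ssh-ingress", 1000, "allow", "0.0.0.0/0", 22)
bastion = Rule("allow-ssh-from-bastion", 2000, "allow", "10.0.1.0/24", 22)
deny_all = Rule("deny-ssh-all", 500, "deny", "0.0.0.0/0", 22)  # Option A, hypothetical name

bastion_host = "10.0.1.5"
outsider = "198.51.100.7"

print(evaluate([allow_any, bastion], outsider, 22))            # current state
print(evaluate([allow_any, bastion, deny_all], bastion_host, 22))  # Option A
print(evaluate([bastion], bastion_host, 22), evaluate([bastion], outsider, 22))  # Option C
```

In this model the current rule set lets the outside address in, Option A locks out even the bastion host, and Option C leaves the bastion allowed while the outside address is denied.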

Full explanation →

290

MCQhard

A company uses Cloud Armor to protect their HTTP Load Balancer from DDoS attacks. Recently, they experienced a targeted attack that bypassed Cloud Armor's predefined rules. The attack involved a high rate of legitimate-looking requests from a small set of IPs that made the application unresponsive. The team needs to block the attack quickly without affecting legitimate users. What should they do?

A.Increase the load balancer's capacity to absorb the attack.

B.Configure rate limiting with a threshold based on the normal traffic pattern.

C.Enable Google Cloud Armor Adaptive Protection.

D.Add the attacking IPs to a Cloud Armor deny list.

AnswerC

Adaptive Protection learns normal traffic patterns and automatically blocks anomalous high-rate requests.

Why this answer

Option C is correct because Cloud Armor Adaptive Protection uses machine learning to analyze traffic patterns and automatically create tailored rules to block application-layer DDoS attacks that bypass predefined rules. In this scenario, the attack consists of legitimate-looking requests from a small set of IPs, which Adaptive Protection can detect as anomalous and generate a custom signature to block without manual intervention, preserving access for legitimate users.

Exam trap

The trap here is that candidates may choose Option D (adding IPs to a deny list) because it seems like a quick fix, but Google Cloud tests the understanding that Cloud Armor Adaptive Protection is the correct automated solution for application-layer DDoS attacks with legitimate-looking traffic, not manual IP blocking.

How to eliminate wrong answers

Option A is wrong because increasing the load balancer's capacity only absorbs volumetric attacks but does not address the application-layer nature of this attack; the high rate of legitimate-looking requests will still exhaust application resources regardless of capacity. Option B is wrong because configuring rate limiting with a threshold based on normal traffic patterns requires prior knowledge of those patterns and may inadvertently block legitimate users if the threshold is set too low, or fail to block the attack if the threshold is too high; it also does not leverage Cloud Armor's adaptive capabilities. Option D is wrong because adding the attacking IPs to a deny list is reactive and assumes the IPs are static; the attack may use rotating IPs or spoofed addresses, making manual deny lists ineffective and unsustainable for a rapid response.

Full explanation →

291

MCQmedium

A company is deploying a web application on Compute Engine behind a global HTTP(S) load balancer. They want to restrict access to only traffic from specific IP ranges. Which load balancer feature should they use?

A.Cloud Armor security policies.

B.VPC firewall rules.

C.Identity-Aware Proxy (IAP).

D.Cloud CDN.

AnswerA

Cloud Armor can allow/deny traffic based on IP.

Why this answer

Cloud Armor security policies are the correct choice because they allow you to define IP-based allow/deny rules at the edge of Google's network, directly integrated with the global HTTP(S) load balancer. This provides granular access control based on source IP ranges before traffic reaches your backend instances, which is exactly what the requirement specifies.

Exam trap

The trap here is that candidates often confuse VPC firewall rules with Cloud Armor, assuming that firewall rules can filter on the original client IP behind a load balancer, but in reality, VPC firewall rules only see the load balancer's proxy IPs, making Cloud Armor the only viable option for IP-based access control at the edge.

How to eliminate wrong answers

Option B is wrong because VPC firewall rules operate at the instance level (network interface) and cannot filter traffic based on the original client IP when a global HTTP(S) load balancer is used, as the load balancer's health check and proxy IPs are seen instead. Option C is wrong because Identity-Aware Proxy (IAP) controls access based on user identity and context (e.g., Google accounts, OAuth), not on source IP ranges, and is designed for application-layer authentication, not network-layer IP filtering. Option D is wrong because Cloud CDN is a content delivery network that caches content at edge locations to improve latency and reduce load, and it does not provide any IP-based access control or security policy enforcement.

Full explanation →

292

MCQeasy

A company wants to restrict access to a Cloud Storage bucket so that only a specific service account can read objects. The bucket contains sensitive data. Which identity and access management (IAM) approach should the architect use?

A.Grant the service account roles/iam.serviceAccountUser on the bucket.

B.Use a signed URL to allow access for the service account.

C.Grant the service account roles/storage.admin on the bucket.

D.Grant the service account roles/storage.objectViewer on the bucket and remove all other bindings.

AnswerD

This restricts read access to only the service account.

Why this answer

Option D is correct because the principle of least privilege dictates that the service account should be granted only the minimal permissions required to read objects, which is roles/storage.objectViewer. By removing all other bindings, the bucket becomes accessible exclusively to that service account, ensuring that no other identities (users, groups, or other service accounts) can read the sensitive data. This approach directly enforces the requirement using IAM roles on the bucket resource.

Exam trap

Google Cloud often tests the misconception that granting a broad role like roles/storage.admin is acceptable for simplicity, but the trap here is that candidates overlook the principle of least privilege and the specific read-only requirement, leading them to choose an overly permissive role.

How to eliminate wrong answers

Option A is wrong because roles/iam.serviceAccountUser grants permission to impersonate the service account (e.g., to run jobs as that account), not to read objects from a Cloud Storage bucket; it does not provide any storage access. Option B is wrong because signed URLs are used to grant temporary access to specific objects for any user (including non-Google accounts) via a cryptographic signature, not to restrict access to a specific service account; they are not an IAM-based access control mechanism. Option C is wrong because roles/storage.admin grants full control over the bucket, including the ability to delete objects and modify bucket metadata, which violates the principle of least privilege and exceeds the read-only requirement.
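
The difference between the options shows up when the bucket's IAM policy is written out. The sketch below uses the standard policy shape (a `bindings` list of `role` and `members`) as plain Python dictionaries. The service account email is hypothetical, `READ_ROLES` is a partial, illustrative list of roles that include object read access, and policies inherited from the project or organization are not modeled.

```python
# Illustrative subset of roles that permit reading objects (not exhaustive).
READ_ROLES = {
    "roles/storage.objectViewer",
    "roles/storage.objectAdmin",
    "roles/storage.admin",
}

SERVICE_ACCOUNT = "serviceAccount:reader-sa@example-project.iam.gserviceaccount.com"  # hypothetical


def readers(policy: dict) -> set:
    """Collect every member bound to a role that can read objects."""
    members = set()
    for binding in policy.get("bindings", []):
        if binding["role"] in READ_ROLES:
            members.update(binding["members"])
    return members


# Option D: a single objectViewer binding and nothing else on the bucket.
policy_d = {"bindings": [{"role": "roles/storage.objectViewer", "members": [SERVICE_ACCOUNT]}]}
# Option A: serviceAccountUser grants no storage access at all.
policy_a = {"bindings": [{"role": "roles/iam.serviceAccountUser", "members": [SERVICE_ACCOUNT]}]}

print(readers(policy_d))
print(readers(policy_a))
```

With the Option D policy the service account is the only reader, while the Option A policy yields no readers at all. In practice, bindings inherited from higher levels of the resource hierarchy would also need to be reviewed.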

Full explanation →

293

MCQeasy

A developer wants to store and retrieve non-relational data with flexible schema and automatic scaling. Which Google Cloud service should they use?

A.Cloud Bigtable.

B.Cloud SQL.

C.Firestore.

D.Cloud Spanner.

AnswerC

Firestore is NoSQL with flexible schema and auto-scaling.

Why this answer

Firestore is a NoSQL document database that supports flexible schema and automatic scaling, making it ideal for non-relational data. It offers real-time synchronization, offline support, and serverless scaling, which aligns with the requirement for storing and retrieving data without manual sharding or capacity planning.

Exam trap

Google Cloud often tests the distinction between NoSQL databases by presenting Cloud Bigtable as a trap for 'non-relational' requirements, but candidates overlook that Bigtable is optimized for analytical workloads with fixed column families, not for flexible schema and automatic scaling in transactional applications.

How to eliminate wrong answers

Option A is wrong because Cloud Bigtable is a wide-column NoSQL database designed for large analytical workloads (e.g., time-series, IoT) with high throughput, but it does not support flexible schema in the same way as Firestore (it requires predefined column families) and is not optimized for transactional, real-time client-side access. Option B is wrong because Cloud SQL is a fully managed relational database service (MySQL, PostgreSQL, SQL Server) that enforces a fixed schema and does not automatically scale beyond its instance limits without manual resizing or read replicas. Option D is wrong because Cloud Spanner is a globally distributed relational database that provides strong consistency and horizontal scaling, but it requires a predefined schema and SQL-based relational model, making it unsuitable for non-relational data with flexible schema.

Full explanation →

294

MCQhard

A company runs a critical web application behind an external HTTPS load balancer. The backend consists of a managed instance group of Compute Engine instances. Users report intermittent 502 Bad Gateway errors. The load balancer logs show occasional health check failures for some instances. The instances have a custom health check endpoint that returns a 200 status code only if the application is fully healthy. The application logs do not show any errors, and CPU/memory usage on the instances is normal. What should be the first troubleshooting step to identify the root cause?

A.Change the health check to a TCP check on the application's port

B.Increase the health check check interval and decrease the unhealthy threshold

C.Increase the number of instances in the managed instance group

D.Check the application's logs on the instances to see why the health check endpoint sometimes returns non-200

AnswerD

This directly investigates the health check failure.

Why this answer

Option D is correct. The health check is failing, and since the instances show normal CPU/memory, the application might be slow to respond under certain conditions. Checking the application logs on the instances will reveal why the health check endpoint returns non-200.

Option A is wrong because a TCP health check would not validate application health and could mask the problem. Option B is wrong because changing the check interval and unhealthy threshold doesn't fix the underlying issue. Option C is wrong because adding instances won't help if the health check is flaky.

Full explanation →

295

MCQmedium

A company runs a monolithic application on Compute Engine. They want to modernize by moving to microservices on Google Kubernetes Engine (GKE) to improve deployment frequency and resource utilization. However, they are concerned about the increased operational complexity. Which approach best balances modernization benefits with operational overhead?

A.Keep the monolithic application on Compute Engine and use Cloud Monitoring to optimize resource utilization.

B.Migrate all application components to Cloud Run and use Cloud Tasks for asynchronous communication.

C.Rewrite the entire application as microservices and deploy on GKE with Istio for service mesh.

D.Identify stateless components to migrate to Cloud Run, and keep stateful components on GKE with managed services like Cloud Spanner.

AnswerD

Balances modernization with reduced complexity by using serverless where appropriate.

Why this answer

Option D is correct because it pragmatically balances modernization benefits with operational overhead by migrating only stateless components to Cloud Run (a fully managed serverless platform that reduces operational complexity) while keeping stateful components on GKE with managed services like Cloud Spanner. This approach improves deployment frequency and resource utilization without requiring a full rewrite, and it leverages Cloud Run's automatic scaling and zero infrastructure management to minimize operational burden.

Exam trap

Google Cloud often tests the misconception that full microservices migration (Option C) is always the best modernization path, but the trap here is that candidates overlook the operational overhead of service mesh and full rewrites, failing to recognize that a hybrid approach using serverless for stateless components reduces complexity while still achieving modernization goals.

How to eliminate wrong answers

Option A is wrong because it fails to modernize the architecture—keeping the monolithic application on Compute Engine does not improve deployment frequency or resource utilization, and Cloud Monitoring alone cannot address the core issues of monolithic scaling and slow deployments. Option B is wrong because migrating all application components to Cloud Run is impractical for stateful workloads (Cloud Run is stateless by design, with no persistent local storage), and Cloud Tasks alone does not solve the complexity of managing stateful services or inter-service communication in a microservices architecture. Option C is wrong because rewriting the entire application as microservices and deploying on GKE with Istio introduces significant operational overhead (service mesh configuration, sidecar proxies, and increased complexity) that contradicts the goal of balancing modernization benefits with operational overhead, and it ignores the possibility of a phased migration.

296

MCQeasy

Refer to the exhibit. A DevOps engineer created this Terraform configuration to deploy a Compute Engine instance. After applying, they notice the instance is not accessible from the internet. What is the most likely cause?

A.The machine type e2-medium does not support public IP addresses.

B.The instance is not attached to a VPC network.

C.No firewall rule allows ingress traffic to the instance.

D.The boot disk size is too small to run the operating system.

AnswerC

Firewall rules are needed to allow inbound traffic; the default network may not have appropriate rules.

Why this answer

The most likely cause is that no firewall rule allows ingress traffic to the instance. Every VPC network has an implied deny-all ingress rule, so unless a firewall rule (e.g., allowing tcp:22 for SSH or tcp:80 for HTTP) applies to the instance, whether to all instances in the network or through a matching network tag or service account, all inbound traffic from the internet is blocked. The Terraform configuration shown in the exhibit likely omitted a `google_compute_firewall` resource or did not assign the necessary network tags to the instance.

Exam trap

Google Cloud often tests the misconception that assigning a public IP automatically makes an instance internet-accessible, but the trap here is that without a corresponding ingress firewall rule, the instance remains isolated regardless of the public IP.

How to eliminate wrong answers

Option A is wrong because the machine type e2-medium fully supports public IP addresses; public IP assignment is controlled by the `access_config` block in the Terraform resource, not by the machine type. Option B is wrong because a Compute Engine instance cannot be created without a network interface, and every network interface belongs to a VPC network; the exhibit does not indicate any misconfiguration that would leave the instance networkless. Option D is wrong because the boot disk size (e.g., 10 GB default) is sufficient for most operating systems; the issue is about network accessibility, not disk capacity.
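
The implied deny-all behavior can be illustrated with a small model. This is a simplified sketch, not the VPC firewall implementation or any Google Cloud API: `IngressRule` and `ingress_allowed` are hypothetical names, only tags, source ranges, and ports are modeled, and ties between rules of equal priority are not handled.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class IngressRule:
    priority: int          # lower number is evaluated first
    action: str            # "allow" or "deny"
    source_ranges: list    # CIDR strings
    ports: set             # e.g. {"tcp:22"}
    target_tags: set = field(default_factory=set)  # empty means all instances


def ingress_allowed(rules, instance_tags, source_ip, port):
    """Return True if the first matching rule, in priority order, allows the packet."""
    src = ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.target_tags and not (rule.target_tags & instance_tags):
            continue  # rule targets other instances
        if port not in rule.ports:
            continue
        if any(src in ip_network(cidr) for cidr in rule.source_ranges):
            return rule.action == "allow"
    return False  # implied deny-all ingress when no rule matches
```

With an empty rule list, `ingress_allowed([], {"web"}, "203.0.113.7", "tcp:80")` is `False` even if the instance has a public IP, which is the situation the question describes.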

297

MCQmedium

A company has a requirement to store application logs for 7 years for compliance. They are using Cloud Logging. What is the most cost-effective way to retain logs?

A.Set the log bucket retention to 7 years

B.Export logs to Cloud Storage with Object Lifecycle management to delete after 7 years

C.Export logs to BigQuery and run scheduled queries to delete old data

D.Use Cloud Logging's default retention and rely on backups

AnswerB

Cloud Storage is cost-effective for long-term retention with lifecycle rules.

Why this answer

Cloud Logging's default retention is limited: the _Default bucket keeps logs for 30 days unless you change it, and the _Required bucket keeps them for a fixed 400 days. Retention on the _Default bucket and on user-defined buckets can be raised, up to 3650 days, but logs kept beyond the default period accrue Cloud Logging retention charges. To meet a 7-year compliance requirement cost-effectively, export logs to Cloud Storage through a log sink and use Object Lifecycle Management to delete objects after 7 years. Cloud Storage, particularly its colder storage classes, costs less for long-term archival than keeping logs in a log bucket, and lifecycle rules automate deletion without ongoing compute costs.

Exam trap

The trap here is that candidates see that a log bucket's retention can be raised far enough to cover 7 years and stop there, but the question asks for the most cost-effective option: extended retention in Cloud Logging is billed, so exporting to Cloud Storage with lifecycle rules is the cheaper long-term solution.

How to eliminate wrong answers

Option A is wrong because, although the retention of the _Default bucket or a user-defined log bucket can be set as high as 3650 days, keeping 7 years of logs in Cloud Logging costs more than archiving them in Cloud Storage, so it is not the most cost-effective choice (the _Required bucket's 400-day retention cannot be changed at all). Option C is wrong because BigQuery storage costs are significantly higher than Cloud Storage for long-term archival, and running scheduled queries to delete old data incurs additional query costs and complexity. Option D is wrong because Cloud Logging's default retention (30 days for _Default, 400 days for _Required) does not meet the 7-year requirement, and backups are not a native retention mechanism for compliance.
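
As a rough illustration of the lifecycle rule, the sketch below builds a Cloud Storage lifecycle configuration with a single `Delete` action conditioned on object `age` in days. The 2-day leap-day allowance and the helper name `lifecycle_config` are assumptions for illustration; confirm the exact retention period with your compliance team and check the JSON against the current Cloud Storage documentation before applying it.

```python
import json

RETENTION_YEARS = 7
# 7 * 365 days plus 2 days to cover the leap days a 7-year window can contain
# (an assumption; the compliance requirement may define the period differently).
RETENTION_DAYS = RETENTION_YEARS * 365 + 2


def lifecycle_config(age_days):
    """Build a lifecycle configuration that deletes objects older than age_days."""
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {"age": age_days},
            }
        ]
    }


config_json = json.dumps(lifecycle_config(RETENTION_DAYS), indent=2)
print(config_json)
```

The printed JSON could be saved to a file and applied to the export bucket; the bucket itself would receive logs from a Cloud Logging sink.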

298

MCQmedium

A company uses BigQuery for analytics and has a large number of ad-hoc queries from different teams. Costs are rising unpredictably. They want to control costs while maintaining query performance. What should they do?

A.Use partitioning and clustering to reduce data scanned.

B.Reduce the number of slots available to each team.

C.Require each team to include a cost code in their queries.

D.Purchase flat-rate slots and assign them to a reservation for each team.

AnswerD

Flat-rate provides predictable cost and performance isolation.

Why this answer

Option D is correct because purchasing flat-rate slots and assigning them to a reservation for each team provides predictable, fixed-cost capacity for BigQuery. This eliminates the unpredictability of on-demand pricing while giving each team its own dedicated slots, ensuring consistent query performance without unexpected cost spikes. (Flat-rate commitments have since been superseded by capacity-based BigQuery editions, but the principle of paying for reserved slots instead of bytes scanned is the same.)

Exam trap

Google Cloud often tests the misconception that performance optimization techniques (like partitioning/clustering) alone can control costs, when in fact they only reduce per-query data scanned but do not cap total spending under on-demand pricing.

How to eliminate wrong answers

Option A is wrong because partitioning and clustering reduce data scanned per query, which lowers on-demand costs, but they do not cap total spending or prevent cost spikes from high query volumes; costs remain unpredictable if usage surges. Option B is wrong because reducing the number of slots available to each team would degrade query performance and cause queuing, violating the requirement to maintain performance; slots are a resource, not a cost control mechanism. Option C is wrong because requiring a cost code in queries only adds metadata for tracking and chargeback, but does not control or cap the actual compute costs incurred; it provides visibility, not cost control.
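
The difference between the two pricing models comes down to simple arithmetic: on-demand cost scales with data scanned, while capacity cost is fixed by the slots purchased. The sketch below uses hypothetical prices chosen only for illustration; they are not Google's actual rates.

```python
def on_demand_cost(tib_scanned, price_per_tib):
    """Monthly on-demand cost: grows linearly with data scanned."""
    return tib_scanned * price_per_tib


def capacity_cost(slots, price_per_slot_month):
    """Monthly capacity cost: fixed by the slots purchased, not by usage."""
    return slots * price_per_slot_month


# Hypothetical prices for illustration only; not Google's actual rates.
PRICE_PER_TIB = 5.0
PRICE_PER_SLOT_MONTH = 20.0

for tib in (100, 400, 1600, 3200):
    # On-demand varies with usage; the capacity figure stays constant.
    print(tib, on_demand_cost(tib, PRICE_PER_TIB), capacity_cost(500, PRICE_PER_SLOT_MONTH))
```

Under these assumed prices, the on-demand bill changes from month to month with query volume and eventually exceeds the capacity bill, while the capacity bill does not move, which is the predictability the question asks for.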

299

Multi-Selecthard

A company wants to optimize their cloud spending on Google Cloud. They have a mix of workloads including batch processing, real-time analytics, and web serving. Which TWO strategies should they implement to reduce costs without significant architectural changes? (Choose two.)

Select 2 answers

A.Use sustained use discounts for short-lived instances.

B.Use preemptible VMs for batch processing jobs that are fault-tolerant.

C.Purchase committed use discounts for 1-year or 3-year terms for stable workloads.

D.Right-size all Compute Engine instances by analyzing utilization metrics.

E.Migrate all web serving workloads to Cloud Functions to benefit from pay-per-use pricing.

AnswersB, C

Preemptible VMs are cost-effective for fault-tolerant workloads, and committed use discounts lower the price of stable workloads.

Why this answer

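Preemptible VMs are short-lived instances (they run for at most 24 hours and can be reclaimed at any time) that cost significantly less than standard VMs, making them ideal for fault-tolerant batch processing jobs that can handle interruptions. Committed use discounts give a lower price in exchange for a 1-year or 3-year commitment, which suits stable, predictable workloads such as web serving. Both strategies directly reduce compute costs without requiring architectural changes: the batch workloads are already resilient to failures, and a commitment changes only how existing usage is billed.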

Exam trap

The trap here is that candidates often confuse sustained use discounts (which require long-running instances) with preemptible VMs (which are for short-lived, fault-tolerant workloads), or they assume right-sizing is a 'no-change' strategy when it typically involves instance type modifications that affect architecture.

300

MCQmedium

A company deploys a web application on Compute Engine behind a Global HTTPS Load Balancer. They need to restrict access to the application based on the client's IP address. Which Google Cloud service should they use?

A.VPC firewall rules

B.Identity-Aware Proxy (IAP)

C.Cloud Armor

D.Cloud CDN

AnswerC

Cloud Armor provides IP-based access control and DDoS protection for load balancers.

Why this answer

Cloud Armor is the correct choice because it provides IP-based access control at the edge of Google's network, integrated directly with the Global HTTPS Load Balancer. It allows you to create security policies with IP allow/deny rules that are evaluated before traffic reaches your Compute Engine instances, making it the appropriate service for client IP restriction at the load balancer level.

Exam trap

The trap here is that candidates often confuse VPC firewall rules with edge security, not realizing that VPC firewall rules cannot see the original client IP when a Global Load Balancer is in front, making Cloud Armor the only option for IP-based access control at the load balancer level.

How to eliminate wrong answers

Option A is wrong because VPC firewall rules operate at the instance network interface level, not at the load balancer edge, and they cannot inspect the original client IP address when traffic passes through a Global HTTPS Load Balancer (the source IP becomes the load balancer's IP). Option B is wrong because Identity-Aware Proxy (IAP) controls access based on user identity and context (e.g., OAuth2, device security), not on client IP addresses; it is designed for authentication and authorization, not network-layer IP filtering. Option D is wrong because Cloud CDN is a content delivery network service that caches content at edge locations to improve latency and reduce load; it does not provide IP-based access control or security policy enforcement.
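
The way a security policy with IP allow/deny rules resolves a request can be modeled in a few lines. This is a simplified sketch, not the Cloud Armor API: `evaluate_policy`, the tuple layout, and the example CIDR ranges are hypothetical, and it assumes rules are evaluated in ascending priority order with a catch-all default rule last.

```python
from ipaddress import ip_address, ip_network

DEFAULT_PRIORITY = 2147483647  # lowest-precedence slot, used here for the default rule


def evaluate_policy(rules, client_ip):
    """Return the action of the first rule, in priority order, matching client_ip.

    `rules` is a list of (priority, action, cidr_list) tuples; lower priority
    numbers are evaluated first.
    """
    ip = ip_address(client_ip)
    for _priority, action, cidrs in sorted(rules, key=lambda r: r[0]):
        if any(ip in ip_network(cidr) for cidr in cidrs):
            return action
    return "deny"  # fallback for this sketch when no rule matches


policy = [
    (1000, "allow", ["198.51.100.0/24"]),       # hypothetical office range
    (DEFAULT_PRIORITY, "deny", ["0.0.0.0/0"]),  # default rule: deny everyone else
]
```

Here `evaluate_policy(policy, "198.51.100.25")` yields `"allow"`, while any address outside the allowed range falls through to the default deny rule.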
