Google Professional Cloud Architect PCA Questions 376–450 | Page 6/7

376

MCQ (hard)

A financial services company is migrating a monolithic Java application to Google Kubernetes Engine (GKE) for improved scalability and reliability. The application serves real-time trading data and has strict latency requirements. Post-migration, the team observes frequent pod restarts due to OutOfMemory (OOM) errors, increased latency during peak trading hours, and occasional database connection timeouts. The current setup uses a single GKE cluster with a node pool of n1-standard-4 machines, a stateless application deployed as a Deployment with resource requests and limits set to 512 Mi memory and 1 CPU. The database is a Cloud SQL PostgreSQL instance with 2 vCPUs and 7.5 GB memory, and applications connect using a hardcoded connection string. The team wants to ensure reliable operation under load and during node maintenance events. Which course of action best addresses the reliability issues?

A. Adjust resource requests to 1 Gi memory and 2 CPU, set limits to 2 Gi and 4 CPU, create an HPA based on a custom metric (e.g., requests per second), enable cluster autoscaler, implement Cloud SQL connection pooling via Cloud SQL Auth Proxy with a max connection pool size, and configure PDB with maxUnavailable 1.

B. Enable GKE node auto-upgrade, configure Pod Disruption Budgets (PDB) with minAvailable 1, and set readiness probes to check application health.

C. Migrate the database to a StatefulSet in GKE with persistent volumes, increase node count to 10, and enable cluster autoscaler.

D. Increase memory limits to 2 Gi and CPU to 2, add Horizontal Pod Autoscaler (HPA) based on CPU utilization, and implement connection pooling using Cloud SQL Auth Proxy.

Answer: A

Correctly addresses all issues: resource tuning for OOM, custom metric HPA for load, cluster autoscaler for capacity, connection pooling for timeouts, and PDB for maintenance.

Why this answer

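Option A comprehensively addresses all issues: raising the memory request and limit to match the application's real footprint stops the OOM kills (a container that exceeds its memory limit is OOM-killed), accurate requests let the scheduler place pods on nodes with enough capacity, an HPA on a custom metric such as requests per second scales with trading load, the cluster autoscaler adds nodes when scaled-out pods cannot be scheduled, the Cloud SQL Auth Proxy replaces the hardcoded connection string with authorized, encrypted connections while a bounded connection pool prevents connection exhaustion, and a PDB keeps replicas serving during node maintenance. Option B adds a PDB and readiness probes but leaves the memory, capacity and connection problems untouched. Option C moves the database into a StatefulSet, which adds operational burden without fixing the OOM errors or connection handling. Option D raises limits and adds pooling but scales only on CPU, leaves requests unchanged, and omits both the cluster autoscaler and a PDB for maintenance events.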
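
The HPA's core calculation, as described in the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue); scaling is skipped when the ratio is within a tolerance (0.1 by default), and the result is clamped to the HPA's minimum and maximum. A minimal sketch of that arithmetic for a per-pod requests-per-second metric follows. The function name and sample numbers are illustrative, and the real controller also accounts for unready pods and stabilization windows.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_value: float,
                         target_value: float,
                         min_replicas: int,
                         max_replicas: int,
                         tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for one per-pod metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA leaves the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the HPA's minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 300 requests/s each against a 200 requests/s target.
print(hpa_desired_replicas(4, 300, 200, min_replicas=3, max_replicas=10))  # prints 6
```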

377

MCQ (medium)

Your team manages a service with a 99.9% uptime SLO over a 30-day window. The error budget for this period is 43 minutes. In the first week, outages consumed 30 minutes of the budget. You are planning a new release. What should you do?

A. Reduce the SLO to 99.8% to increase the error budget.

B. Proceed with the release because the remaining budget is sufficient.

C. Delay the release and focus on improving reliability to rebuild the error budget.

D. Release the feature but only to a small percentage of users.

Answer: C

Conservative approach: hold the release until the error budget has recovered (for example, after a period of stable operation) rather than spend what little remains.

Why this answer

With only about 13 minutes of error budget remaining after the first week, proceeding with the release (Option B) risks exhausting the budget through any unforeseen issue and violating the 99.9% SLO. Delaying the release (Option C) lets the team focus on reliability improvements, such as adopting canary deployments, adding circuit breakers, or enhancing monitoring with tools like Prometheus and Grafana, while protecting the remaining budget for the other 23 days of the window; with a rolling window, the budget also recovers as the first week's outage minutes age out. This applies the Google SRE practice of using error budgets to balance release velocity against reliability.

Exam trap

Google Cloud often tests the misconception that a canary release (Option D) is always safe, but the trap here is that it still consumes error budget and does not solve the underlying reliability deficit when the budget is already critically low.

How to eliminate wrong answers

Option A is wrong because reducing the SLO to 99.8% would increase the error budget to 86.4 minutes, but this is a reactive measure that lowers the reliability target rather than addressing the root cause of the outages; it also violates the principle of maintaining a consistent SLO commitment to customers. Option B is wrong because proceeding with the release with only about 13 minutes of error budget left is reckless: any minor incident could exhaust the budget, leading to SLO violations and potential service credits or customer dissatisfaction, especially since the first week already consumed roughly 70% of the budget. Option D is wrong because releasing to a small percentage of users (e.g., a canary deployment) is a valid risk mitigation strategy, but it does not address the fact that the error budget is nearly depleted; even a small-scale release could introduce bugs that consume the remaining budget, and the team should first stabilize the service before any new changes.
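
The budget figures used above follow directly from the window length and the SLO target. A small sketch of that arithmetic (the function name is illustrative):

```python
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    window_minutes = window_days * 24 * 60  # 30 days = 43,200 minutes
    return window_minutes * (1 - slo)


budget = error_budget_minutes(0.999, 30)    # about 43.2 minutes
consumed = 30.0                             # outage minutes in week one
remaining = budget - consumed               # about 13.2 minutes
fraction_used = consumed / budget           # about 0.69, i.e. roughly 70%
relaxed = error_budget_minutes(0.998, 30)   # about 86.4 minutes (option A)

print(f"budget={budget:.1f} remaining={remaining:.1f} "
      f"used={fraction_used:.0%} relaxed={relaxed:.1f}")
```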

378

MCQ (medium)

A company is migrating a legacy monolithic application to Google Cloud. The application runs on a single VM and uses a local MySQL database. The goal is to minimize changes to the application code while improving availability. Which strategy should the company use?

A. Use a managed instance group for the application VM and store the database on a persistent disk attached to the primary instance.

B. Re-architect the application into microservices and use Cloud Run for stateless components.

C. Lift and shift the VM to Compute Engine, and migrate the database to Cloud SQL with a failover replica.

D. Containerize the application and deploy on Google Kubernetes Engine (GKE) with Cloud Spanner as the database.

Answer: C

Minimal code changes, uses managed database with high availability.

Why this answer

Option C is correct because it minimizes code changes by lifting the application VM to Compute Engine as-is while migrating the local MySQL database to Cloud SQL with a failover replica. Cloud SQL's managed automatic failover to a standby in a different zone improves availability, and because the application keeps using the MySQL protocol, only its connection settings need to change, not its database code.

Exam trap

The trap here is that candidates often choose Option A, mistakenly believing that a managed instance group with a persistent disk provides database high availability, but they overlook that the persistent disk cannot be shared across instances in a managed instance group without additional orchestration (e.g., regional persistent disks or a clustered filesystem), and the database process itself is not automatically failed over.

How to eliminate wrong answers

Option A is wrong because storing the database on a persistent disk attached to a single instance in a managed instance group does not provide high availability for the database; if the primary instance fails, the persistent disk cannot be attached to a new instance without manual intervention, and the database state is lost or requires complex recovery. Option B is wrong because re-architecting into microservices and using Cloud Run requires significant application code changes, contradicting the goal of minimizing changes to the application code. Option D is wrong because containerizing and deploying on GKE with Cloud Spanner requires substantial application code changes (Cloud Spanner uses a different SQL dialect and connection protocol than MySQL) and introduces unnecessary complexity, violating the requirement to minimize code changes.

379

MCQ (easy)

A startup deploys a web application on Compute Engine instances behind an HTTP load balancer. They need to handle unpredictable spikes in traffic with minimal operational overhead. What is the simplest scaling approach?

A. Set up a Kubernetes cluster with horizontal pod autoscaling

B. Use a managed instance group with autoscaling based on CPU utilization

C. Migrate the application to Cloud Run

D. Add more instances manually during peak hours

Answer: B

This is the simplest approach; it scales automatically with minimal configuration.

Why this answer

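A managed instance group with autoscaling adds and removes instances based on demand, so it requires minimal manual intervention and no change to how the application is deployed. The alternatives need more work: a Kubernetes cluster with horizontal pod autoscaling (A) requires containerizing the application and operating a cluster, migrating to Cloud Run (C) also requires containerizing and adapting the application, and adding instances manually (D) cannot react to unpredictable spikes.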

380

MCQ (medium)

A company wants to implement a CI/CD pipeline for a microservices application on GKE. They require automated canary deployments with gradual traffic shifting and automatic rollback on metric failure. Which Google Cloud service is most suitable?

A. Cloud Deploy with Skaffold.

B. Cloud Build with Deployment Manager.

C. Spinnaker on GKE.

D. Istio with manual traffic management.

Answer: A

Cloud Deploy provides built-in canary strategies and automatic rollback when combined with Skaffold.

Why this answer

Cloud Deploy with Skaffold is the most suitable because Cloud Deploy provides native progressive delivery to GKE, including canary deployments that shift traffic in stages (through the Kubernetes Gateway API or service-based networking) and, with automation rules, rollback when a post-deployment verification step, which can check Cloud Monitoring metrics, fails. Skaffold supplies the render and deploy configuration, while Cloud Deploy manages the rollout pipeline, approval gates, and rollback logic without manual intervention.

Exam trap

The trap here is that candidates often confuse a traffic management tool (like Istio) with a full CI/CD pipeline service, overlooking that Istio alone cannot automate rollback decisions based on metrics without extensive custom integration.

How to eliminate wrong answers

Option B is wrong because Cloud Build is a CI/CD orchestration service for building and testing, but it does not natively support canary deployments or automatic rollback based on metrics; Deployment Manager is an infrastructure-as-code tool, not a deployment pipeline manager. Option C is wrong because Spinnaker on GKE is a valid alternative but requires significant operational overhead to install, configure, and maintain, and it is not a fully managed Google Cloud service, making it less suitable for a company seeking a native, low-maintenance solution. Option D is wrong because Istio with manual traffic management provides the traffic shifting capability but lacks automated rollback on metric failure; it requires custom scripting and external monitoring integration to achieve the desired automation, which contradicts the requirement for an automated CI/CD pipeline.
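
Conceptually, the canary behavior described above is a loop: shift a slice of traffic, check a metric, and either continue or roll back. The sketch below illustrates only that control logic; it is not Cloud Deploy's API, and the phase percentages, the `error_rate_at` callback, and the threshold are hypothetical stand-ins for a real rollout configuration and monitoring query.

```python
from typing import Callable, Optional, Sequence, Tuple


def run_canary(phases: Sequence[int],
               error_rate_at: Callable[[int], float],
               max_error_rate: float) -> Tuple[bool, Optional[int]]:
    """Walk through traffic phases; return (promoted, phase_that_failed)."""
    for percent in phases:
        # In a real pipeline this would route `percent`% of traffic to the new
        # revision, wait, and then query monitoring for that revision.
        observed = error_rate_at(percent)
        if observed > max_error_rate:
            # Metric check failed: send all traffic back to the stable revision.
            return False, percent
    return True, None


def simulated_error_rate(percent: int) -> float:
    """Hypothetical metric source: healthy at 25%, errors appear from 50% on."""
    return 0.001 if percent < 50 else 0.05


print(run_canary([25, 50, 100], simulated_error_rate, max_error_rate=0.01))  # (False, 50)
```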

381

Multi-Select (easy)

You are deploying a stateless web application on Compute Engine. Which TWO actions improve availability? (Choose 2)

Select 2 answers

A. Use a regional managed instance group.

B. Enable Cloud CDN for the static content.

C. Purchase 1-year committed use contracts for the instances.

D. Enable automatic restart on the instance template.

E. Use preemptible VMs to reduce cost.

Answers: A, D

Regional MIGs spread instances across zones; if one zone fails, other zones continue serving.

Why this answer

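A regional managed instance group (MIG) distributes instances across multiple zones within a region, so if one zone fails, the load balancer routes traffic to healthy instances in the remaining zones. This removes a single zone as a point of failure, which suits a stateless web application that can serve any request from any instance. Enabling automatic restart in the instance template means Compute Engine restarts an instance that is terminated for a reason other than a user action, such as a host failure, so capacity recovers without manual intervention.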

Exam trap

Google Cloud often tests the distinction between cost optimization (committed use contracts, preemptible VMs) and availability improvements, leading candidates to mistakenly choose financial commitments or caching services as availability solutions.

382

MCQ (medium)

A team uses Cloud Build to build container images and deploy to Cloud Run. They want to automate deployments whenever a new image is pushed to Container Registry. What is the best approach?

A. Use Cloud Deploy with a delivery pipeline that polls for new images

B. Configure a Cloud Build trigger that runs on a push to the container image in Container Registry

C. Create a Cloud Function that subscribes to Pub/Sub and calls Cloud Run deploy

D. Set up a Cloud Scheduler job to run a script that deploys the latest image

Answer: B

Cloud Build triggers can respond to image push events directly.

Why this answer

Option B is correct because Cloud Build triggers can be configured to fire when an image is pushed to Container Registry, by subscribing to the 'gcr' Pub/Sub topic to which Container Registry publishes a message for each image push. Cloud Build can then run a build step (e.g., gcloud run deploy) that deploys the new image to Cloud Run without any polling or external infrastructure.

Exam trap

The trap here is that candidates may overcomplicate the solution by choosing Cloud Functions or Cloud Scheduler, missing the fact that Cloud Build triggers natively integrate with Container Registry's Pub/Sub events for automated, event-driven deployments.

How to eliminate wrong answers

Option A is wrong because Cloud Deploy delivery pipelines do not poll for new images; they are designed for continuous delivery with Skaffold-based configurations and require explicit triggers or manual releases, not automatic detection of image pushes. Option C is wrong because while a Cloud Function subscribing to Pub/Sub could work, it introduces unnecessary complexity and latency compared to the native Cloud Build trigger, which is the recommended and simpler approach for this exact use case. Option D is wrong because Cloud Scheduler jobs run on a fixed schedule and cannot detect new image pushes in real time, leading to either missed deployments or unnecessary redeployments of the same image.
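
An event-driven deployment typically needs to ignore deletions and deploy only newly pushed images. The sketch below shows that filtering step in Python, assuming the payload fields documented for the 'gcr' topic (`action`, `digest`, `tag`); the project and image names are hypothetical. In a Cloud Build Pub/Sub trigger, the same check can be expressed as a trigger filter rather than code.

```python
import base64
import json
from typing import Optional


def image_to_deploy(message_data_b64: str) -> Optional[str]:
    """Return the image reference to deploy, or None if the event should be skipped.

    Assumes the 'gcr' notification payload: a JSON object with an "action"
    field ("INSERT" for pushes, "DELETE" for deletions) plus "digest" and,
    for tagged pushes, "tag".
    """
    payload = json.loads(base64.b64decode(message_data_b64))
    if payload.get("action") != "INSERT":
        return None  # deletions must not trigger a deployment
    # Prefer the digest: it pins the exact image that was pushed.
    return payload.get("digest") or payload.get("tag")


# Hypothetical push event, base64-encoded the way Pub/Sub delivers message data.
event = {
    "action": "INSERT",
    "digest": "gcr.io/my-project/web@sha256:0123abcd",
    "tag": "gcr.io/my-project/web:v2",
}
encoded = base64.b64encode(json.dumps(event).encode()).decode()
print(image_to_deploy(encoded))
```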

383

MCQ (medium)

An organization is implementing a Hub-and-Spoke network topology with multiple VPCs. Which Google Cloud product is designed for centralized connectivity and policy enforcement?

A. Cloud VPN

B. Cloud NAT

C. Network Connectivity Center

D. Shared VPC

Answer: D

Centralized VPC management with policy enforcement.

Why this answer

Shared VPC (D) is the correct answer because it lets an organization centrally manage connectivity and enforce network policy for many projects from a single host project. Service projects are attached to the host project and deploy their resources into the host project's shared VPC network, while network administrators in the host project control subnets, routes, firewall rules, and the IAM permissions to use each subnet. In a hub-and-spoke model, the host project acts as the hub and the attached service projects as the spokes.

Exam trap

The trap here is that candidates often pick Network Connectivity Center (NCC) because it is built around a central hub. NCC connects spokes (hybrid attachments such as VPN and Interconnect, as well as VPC networks) through that hub, but it does not give a single team centralized administration of subnets, firewall rules, and IAM across projects in an organization, which is the domain of Shared VPC.

How to eliminate wrong answers

Option A (Cloud VPN) is wrong because it is a site-to-site VPN service that connects on-premises networks to Google Cloud, not a solution for centralized connectivity and policy enforcement between multiple VPCs. Option B (Cloud NAT) is wrong because it provides outbound internet access for private instances via network address translation, not inter-VPC connectivity or policy enforcement. Option C (Network Connectivity Center) is wrong because, while it can connect on-premises and VPC networks through a central hub, it is a connectivity service rather than a way to centrally administer subnets, firewall rules, and IAM for many projects; Shared VPC is the native solution for that purpose.

384

MCQ (easy)

You are reviewing an IAM policy for a Cloud Storage bucket. Alice is a member of the data-team group. What level of access does Alice have to objects in this bucket?

A. Read-only access.

B. No access, because the group policy overrides the individual policy.

C. Read and write access (admin).

D. Write-only access.

Answer: C

Her effective permissions are the union of both roles.

Why this answer

Option C is correct because the IAM policy grants the data-team group the roles/storage.objectAdmin role, which provides full read, write, and delete access to objects in the bucket. Alice, as a member of the data-team group, inherits this role and therefore has read and write (admin) access to the objects.

Exam trap

Google Cloud often tests the misconception that group policies override individual policies (a common RBAC misunderstanding), but in Google Cloud IAM, all applicable policies are additive unless a deny rule is explicitly applied.

How to eliminate wrong answers

Option A is wrong because the group policy grants the storage.objectAdmin role, not a read-only role like roles/storage.objectViewer. Option B is wrong because IAM policies are additive; group policies do not override individual policies—instead, the effective permissions are the union of all applicable policies. Option D is wrong because the storage.objectAdmin role includes both read and write permissions, not write-only access.
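
IAM allow policies are additive: a principal's effective permissions are the union of the roles granted directly and through groups. The sketch below models that union. The permission sets are abbreviated subsets of the real predefined roles, and the direct viewer binding for Alice is an illustrative assumption because the exhibit is not reproduced here. Deny policies are not modeled.

```python
from typing import Dict, Iterable, List, Set

# Abbreviated subsets of the predefined roles' permissions (illustrative only).
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "roles/storage.objectViewer": {"storage.objects.get", "storage.objects.list"},
    "roles/storage.objectAdmin": {
        "storage.objects.get", "storage.objects.list", "storage.objects.create",
        "storage.objects.update", "storage.objects.delete",
    },
}


def effective_permissions(bindings: Dict[str, List[str]],
                          user: str,
                          groups: Iterable[str]) -> Set[str]:
    """Union of permissions from every binding naming the user or one of their groups."""
    identities = {f"user:{user}"} | {f"group:{g}" for g in groups}
    granted: Set[str] = set()
    for role, members in bindings.items():
        if identities & set(members):
            granted |= ROLE_PERMISSIONS[role]
    return granted


# Hypothetical bucket policy: a direct viewer grant plus an admin grant via the group.
bucket_bindings = {
    "roles/storage.objectViewer": ["user:alice@example.com"],
    "roles/storage.objectAdmin": ["group:data-team@example.com"],
}
alice = effective_permissions(bucket_bindings, "alice@example.com",
                              ["data-team@example.com"])
print(sorted(alice))
```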

385

MCQ (hard)

A large e-commerce company runs its production workloads on Google Cloud. The security team has implemented a VPC Service Controls perimeter around the production project to prevent data exfiltration. The perimeter includes the project, and access is allowed only from an access level that requires the user to be on the corporate network (192.0.2.0/24). Recently, the DevOps team reported that their CI/CD pipeline, which runs on Cloud Build with a VPC connector attached to a shared VPC in a different project, is failing to deploy to Cloud Run. The pipeline uses a service account with roles/run.admin on the production project. The Cloud Build worker IPs are ephemeral and not in the corporate IP range. The pipeline's deployment step times out with permission errors. Which action will resolve the issue while maintaining security compliance?

A. Add the Cloud Build service account as a member of the access level used in the perimeter, so that it is not restricted by IP.

B. Remove the VPC Service Controls perimeter from the production project and rely solely on IAM permissions.

C. Add the Cloud Build worker IP range (0.0.0.0/0) to the access level's IP condition to allow all IPs.

D. Create a new service account for Cloud Build with roles/iam.serviceAccountUser and roles/run.admin, and assign it to the Cloud Run service.

Answer: A

Access levels can include service accounts as members, allowing them to bypass IP restrictions.

Why this answer

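Option A is correct. Adding the Cloud Build service account as a member of the access level lets requests made by that identity satisfy the access level regardless of source IP, while the project stays inside the perimeter and every other caller is still held to the corporate-network condition. Option C is wrong because the worker IPs are ephemeral, and adding 0.0.0.0/0 would admit every address, which removes the IP restriction entirely and weakens security.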

Option B is wrong because removing the perimeter defeats the security requirement. Option D is wrong because changing the service account does not change the IP address of the Cloud Build workers.
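
One detail matters when applying option A: within a single access-level condition, every specified attribute must match, so the service account belongs in its own condition, combined with the corporate-IP condition using OR (AND is the default combining function). The sketch below is a simplified model of that evaluation, not the Access Context Manager API; the service account name is hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List


@dataclass
class Condition:
    """Simplified access-level condition: every non-empty field must match (AND)."""
    ip_subnetworks: List[str] = field(default_factory=list)
    members: List[str] = field(default_factory=list)

    def matches(self, source_ip: str, identity: str) -> bool:
        addr = ipaddress.ip_address(source_ip)
        if self.ip_subnetworks and not any(
                addr in ipaddress.ip_network(net) for net in self.ip_subnetworks):
            return False
        if self.members and identity not in self.members:
            return False
        return True


def level_granted(conditions: List[Condition], source_ip: str,
                  identity: str, combining: str = "AND") -> bool:
    """Combine condition results the way an access level's combining function does."""
    results = [c.matches(source_ip, identity) for c in conditions]
    return any(results) if combining == "OR" else all(results)


CI_SA = "serviceAccount:deployer@prod-project.iam.gserviceaccount.com"  # hypothetical
corp = Condition(ip_subnetworks=["192.0.2.0/24"])
ci = Condition(members=[CI_SA])

# Ephemeral Cloud Build worker address outside the corporate range.
print(level_granted([corp, ci], "198.51.100.7", CI_SA, combining="OR"))  # True
```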

386

MCQ (easy)

A company deploys a web application on Compute Engine behind an HTTP Load Balancer. They want to ensure only healthy instances receive traffic. What should they configure?

A. Configure the instance group autoscaling based on CPU utilization

B. Configure an HTTP health check with a custom request path that returns a 200 status

C. Configure a TCP health check on port 80

D. Configure an SSL health check to verify TLS handshake

Answer: B

HTTP health check validates the application layer by checking a specific endpoint.

Why this answer

Option B is correct because an HTTP health check with a custom request path that returns a 200 status allows the HTTP Load Balancer to verify that the web application is actually serving requests correctly. This ensures that only instances passing the application-level health check are considered healthy and receive traffic, preventing requests from being routed to instances that may be running but not serving the expected content.

Exam trap

The trap here is that candidates often confuse health checks with autoscaling metrics, assuming that CPU-based autoscaling alone ensures traffic is only sent to healthy instances, when in fact health checks are a separate mechanism required for load balancer traffic routing.

How to eliminate wrong answers

Option A is wrong because autoscaling based on CPU utilization manages the number of instances but does not determine which instances are healthy for traffic routing; the load balancer still needs health checks to decide which instances to send traffic to. Option C is wrong because a TCP health check on port 80 only verifies that the TCP port is open, not that the web application is responding correctly; an instance could have a listening port but return errors or be unresponsive at the application layer. Option D is wrong because an SSL health check verifies the TLS handshake, which is unnecessary for HTTP traffic and does not validate the application's response; it is designed for HTTPS backends, not plain HTTP.
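
Health checks do not flip an instance's state on a single probe; they use consecutive-success and consecutive-failure thresholds (the health check's healthy and unhealthy thresholds). The toy model below illustrates that behavior for an HTTP check where only a 200 response counts as success. The starting state and threshold values are assumptions for illustration, not a description of Google's implementation.

```python
class HealthState:
    """Toy threshold-based health state for one backend instance."""

    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = False  # assumption: a new backend must prove itself first
        self._streak = 0      # consecutive probes that disagree with the current state

    def record_probe(self, status_code: int) -> bool:
        """Record one probe result and return whether the backend is now healthy."""
        success = status_code == 200
        if success == self.healthy:
            self._streak = 0  # probe agrees with the current state
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if success else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = success
            self._streak = 0
        return self.healthy


backend = HealthState()
print([backend.record_probe(code) for code in (200, 200, 500, 503, 200)])
```

Only instances currently in the healthy state would receive traffic from the load balancer in this model.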

387

MCQ (hard)

Refer to the exhibit. The log entry is from Cloud Logging for a VPC subnetwork. What is the most likely cause of this error?

A. A firewall rule blocking ingress on port 80.

B. The subnetwork default has no internet gateway.

C. The VM at 10.0.0.2 is not running.

D. The packet is malformed.

Answer: A

The error message attributes the drop to firewall policy 'default-deny-ingress'.

Why this answer

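The log entry indicates that a packet was dropped by a firewall rule. Because the destination is 10.0.0.2 on port 80 (HTTP), the most likely cause is that no firewall rule allows ingress on TCP port 80, or a higher-priority rule denies it. VPC networks deny all ingress by default through an implied rule, so a missing or misconfigured allow rule for TCP port 80 produces exactly this kind of drop. Because VPC firewall rules are stateful, return traffic for an allowed connection would not need a separate rule, which also points to the inbound rule as the problem.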

Exam trap

Google Cloud often tests the distinction between firewall drops and routing failures; the trap here is that candidates may confuse a firewall rule drop with a missing internet gateway or an unreachable VM, but the log entry's 'firewall' field explicitly indicates a firewall decision, not a routing or connectivity issue.

How to eliminate wrong answers

Option B is wrong because the absence of an internet gateway would not cause a packet drop logged by a firewall rule; it would result in a routing failure (e.g., no route to internet), which is logged differently. Option C is wrong because if the VM at 10.0.0.2 were not running, the packet would be dropped at the hypervisor level (e.g., ICMP unreachable or no ARP response), not by a firewall rule. Option D is wrong because a malformed packet would typically be dropped at a lower network layer (e.g., by the NIC or kernel) and would not generate a firewall rule log entry; firewall rules inspect valid packets against policy.
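
The reasoning above relies on how VPC firewall rules are evaluated: among rules that match a packet, the lowest priority number wins, a deny beats an allow at the same priority, and ingress that matches no rule hits the implied deny-ingress rule. The simplified model below ignores targets, sources, and hierarchical firewall policies; the rule names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IngressRule:
    name: str
    priority: int   # 0-65535; lower numbers take precedence
    action: str     # "allow" or "deny"
    protocol: str   # e.g. "tcp"
    ports: List[int]


def evaluate_ingress(rules: List[IngressRule], protocol: str, port: int) -> Tuple[str, str]:
    """Return (action, deciding rule) for an ingress packet in this simplified model."""
    matching = [r for r in rules if r.protocol == protocol and port in r.ports]
    if not matching:
        # Every VPC network has an implied rule that denies all ingress.
        return "deny", "implied-deny-ingress"
    # Lowest priority number wins; at equal priority, deny beats allow.
    winner = min(matching, key=lambda r: (r.priority, r.action != "deny"))
    return winner.action, winner.name


rules = [IngressRule("allow-ssh", 1000, "allow", "tcp", [22])]
print(evaluate_ingress(rules, "tcp", 80))  # ('deny', 'implied-deny-ingress')
```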

388

MCQ (hard)

A company runs a data analytics platform on Google Cloud using BigQuery, Dataflow, and Cloud Storage. They notice that Dataflow jobs are failing with 'out of memory' errors for certain large pipelines. The pipelines process variable amounts of data, sometimes spiking 10x normal. Which strategy should they use to handle these spikes cost-effectively?

A. Manually monitor the job and increase the number of workers when a spike is detected.

B. Increase the machine type of the workers to a high-memory type and disable autoscaling.

C. Configure the Dataflow pipeline to use autoscaling with a higher maximum number of workers and use preemptible VMs for cost savings.

D. Use Dataflow Streaming Engine to offload state to persistent storage and reduce memory usage.

Answer: C

Autoscaling adjusts workers dynamically; preemptible VMs reduce cost for fault-tolerant work.

Why this answer

Option C is correct because Dataflow's autoscaling can dynamically add workers to handle sudden data spikes, and using preemptible VMs significantly reduces cost for batch pipelines that can tolerate interruptions. This approach avoids manual intervention and over-provisioning, making it cost-effective for variable workloads.

Exam trap

Google Cloud often tests the distinction between batch and streaming optimizations, and candidates mistakenly apply Streaming Engine (designed for stateful streaming) to batch pipelines suffering from memory spikes, missing the cost-effective autoscaling with preemptible VMs strategy.

How to eliminate wrong answers

Option A is wrong because manual monitoring and scaling is not cost-effective or reliable for unpredictable spikes; it introduces latency and operational overhead. Option B is wrong because disabling autoscaling and using a fixed high-memory machine type leads to over-provisioning during normal loads and cannot handle spikes beyond the fixed capacity, wasting resources. Option D is wrong because Dataflow Streaming Engine is designed for streaming pipelines, where it reduces worker memory use by moving state storage off the workers; nothing in the question indicates a streaming pipeline, and offloading streaming state does not address the root cause of memory exhaustion during large spikes.

389

Multi-Select (easy)

Which THREE practices are recommended for organizing projects in a Google Cloud organization?

Select 3 answers

A. Create a separate project to hold organization policies.

B. Use a separate project for each environment (e.g., development, staging, production).

C. Apply IAM policies at the folder level instead of the organization level when possible.

D. Use a shared VPC host project for multiple service projects to centralize network management.

E. Consolidate all production resources into a single project for simplicity.

Answers: B, C, D

Separate projects isolate environments and allow independent management and billing.

Why this answer

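Option B is correct because using separate projects for each environment (development, staging, production) enforces resource isolation, prevents accidental cross-environment changes, and allows independent IAM policies, billing, and quotas, in line with Google Cloud's recommended resource hierarchy practices. Option C is correct because granting roles on folders rather than on the organization scopes access to only the projects that need it, following least privilege. Option D is correct because a Shared VPC host project lets a central network team manage the subnets, routes, and firewall rules that many service projects use. Option E is wrong because placing all production resources in one project widens the blast radius of mistakes and removes the isolation boundaries that separate projects provide.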

Exam trap

The trap here is that candidates often confuse the purpose of organization policies with project-level resources, mistakenly thinking a separate project is needed to hold policies, when in fact policies are inherited through the resource hierarchy (organization → folder → project).

390

Multi-Select (medium)

A company wants to improve the reliability of their microservices architecture on Google Cloud. Which TWO practices should they implement? (Choose 2)

Select 2 answers

A. Design with a single point of failure for simplicity

B. Implement retry with exponential backoff

C. Use synchronous communication between all services

D. Implement circuit breaker pattern

E. Disable health checks to reduce latency

Answers: B, D

Retry with backoff handles transient failures without overwhelming the system.

Why this answer

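B is correct because implementing retry with exponential backoff allows transient failures (e.g., network timeouts, temporary service unavailability) to be handled gracefully by automatically retrying the request after increasing delays, reducing load on the recovering service. D is correct because a circuit breaker stops calls to a dependency that keeps failing, so callers fail fast instead of piling up timeouts, and after a cool-down it lets a trial request through to detect recovery. Together, these patterns improve the reliability of microservices on Google Cloud without overwhelming downstream dependencies and keep one unhealthy service from causing cascading failures.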

Exam trap

Google Cloud often tests the misconception that synchronous communication is more reliable because it provides immediate feedback, but in distributed systems, asynchronous patterns and resilience mechanisms like retries and circuit breakers are actually critical for reliability.
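
Both patterns are small enough to sketch. The code below shows retry with capped exponential backoff and full jitter, plus a minimal single-threaded circuit breaker with closed, open, and half-open behavior. It illustrates the patterns rather than any particular library; `TransientError`, the thresholds, and the injectable `sleep` and `clock` parameters are assumptions made so the logic can be exercised without real delays.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Failure worth retrying (e.g., a timeout or HTTP 503)."""


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


def retry_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 0.1, max_delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to base * 2^attempt, capped.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping calls as `retry_with_backoff(lambda: breaker.call(fetch))` (where `fetch` is a hypothetical remote call) retries transient errors, while `CircuitOpenError` is not retried, so an open circuit fails fast instead of being hammered.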

391

MCQ (easy)

A company wants to ensure that all access to their Cloud Storage bucket is logged for compliance purposes. Which type of audit log should they enable?

A. Admin Activity audit logs

B. Data Access audit logs

C. System Event audit logs

D. Access Transparency logs

Answer: B

Data Access logs capture read and write operations on data.

Why this answer

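Data Access audit logs record operations that read resource configuration or metadata and operations that read or write user data, such as object reads and writes in Cloud Storage. Except for BigQuery, they are disabled by default, so they must be explicitly enabled. Admin Activity logs record administrative actions that modify configuration; System Event logs record actions taken by Google Cloud systems (not users) that modify resources; Access Transparency logs record actions taken by Google personnel when accessing customer content.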
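
Data Access audit logging is configured through the `auditConfigs` section of a project's (or folder's or organization's) IAM policy. The sketch below adds DATA_READ and DATA_WRITE logging for Cloud Storage to a policy dictionary, for example one exported with `gcloud projects get-iam-policy PROJECT_ID --format=json` and applied back with `gcloud projects set-iam-policy PROJECT_ID policy.json`. The helper name and sample policy are illustrative.

```python
import copy
from typing import Any, Dict


def enable_data_access_logs(policy: Dict[str, Any],
                            service: str = "storage.googleapis.com") -> Dict[str, Any]:
    """Return a copy of an IAM policy with DATA_READ and DATA_WRITE audit logs enabled."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    configs = updated.setdefault("auditConfigs", [])
    entry = next((c for c in configs if c.get("service") == service), None)
    if entry is None:
        entry = {"service": service, "auditLogConfigs": []}
        configs.append(entry)
    log_configs = entry.setdefault("auditLogConfigs", [])
    present = {c.get("logType") for c in log_configs}
    for log_type in ("DATA_READ", "DATA_WRITE"):
        if log_type not in present:
            log_configs.append({"logType": log_type})
    return updated


policy = {"bindings": [], "etag": "BwXyz123"}  # hypothetical exported policy
print(enable_data_access_logs(policy)["auditConfigs"])
```

Keeping the exported `etag` in the policy lets `set-iam-policy` reject the update if someone changed the policy in the meantime.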

392

MCQ (medium)

Refer to the exhibit. A user creates a Cloud SQL for PostgreSQL instance and a Compute Engine VM. The VM cannot connect to the database. What is the most likely cause?

A. The Cloud SQL instance requires SSL connections, and the client is not using SSL.

B. The Cloud SQL instance does not have a private IP assigned, but the VM is attempting to connect using the private IP.

C. The VM's firewall is blocking egress to port 5432 on the Cloud SQL public IP.

D. The authorized networks setting is too permissive; it should be restricted to the VM's public IP.

Answer: B

Correct: the '--assign-ip' flag assigns only a public IP. A private IP requires the instance to be attached to a VPC network through private services access, so a VM that connects to a private address has nothing to reach.

Why this answer

The exhibit shows a Cloud SQL instance created with '--assign-ip', which gives it a public IP only, and authorized networks set to 0.0.0.0/0, which admits connections to that public IP from any address. The VM has an external IP, and its firewall allows egress to 0.0.0.0/0. The connection fails with 'connection timed out', which suggests the traffic is not reaching the database at all: a network-reachability problem rather than an authentication or SSL problem.

Over the public IP, this configuration should work regardless of which VPC the VM is in. The most likely explanation is therefore that the VM is connecting to a private IP address for the instance, which does not exist because no private IP was assigned. The exhibit does not show the connection string, so this is an inference from the instance configuration.

Option A is unlikely because a missing SSL connection produces an explicit SSL-related error, not a timeout. Option C is wrong because the VM's firewall allows egress to 0.0.0.0/0, which includes port 5432 on the instance's public IP. Option D is wrong because overly permissive authorized networks admit more clients, not fewer; restricting them to the VM's public IP would improve security but would not fix a failed connection.

393

Multi-Selecthard

A company runs a microservices-based application on Google Kubernetes Engine (GKE) with a Regional cluster. They want to improve reliability by implementing best practices for pod scheduling and resilience. Which TWO actions should they take? (Choose two.)

Select 2 answers

A.Set terminationGracePeriodSeconds to 0 for faster pod termination during scale-down

B.Enable cluster autoscaler to automatically add nodes when pods are pending

C.Define a PodDisruptionBudget for each deployment to limit the number of concurrent disruptions

D.Set resource requests equal to limits to ensure guaranteed QoS class

E.Configure pod anti-affinity to spread replicas across different zones

AnswersC, E

Correct: PDB ensures minimum availability during voluntary disruptions.

Why this answer

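Option C is correct because a PodDisruptionBudget (PDB) limits the number of Pods of a replicated application that can be down simultaneously from voluntary disruptions, such as node maintenance or cluster upgrades. This ensures that a minimum number of replicas remain available during planned events. Option E is also correct: in a regional cluster, pod anti-affinity with the zone topology key (topology.kubernetes.io/zone) spreads replicas across zones, so losing a single zone does not take down every replica.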

Exam trap

Google Cloud often tests the distinction between voluntary disruptions (handled by PDB) and involuntary disruptions (e.g., node failure), and the trap here is that candidates confuse resource optimization (requests/limits) or scaling (cluster autoscaler) with resilience mechanisms like PDB and anti-affinity.
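A PDB only limits voluntary evictions, such as node drains during upgrades. The budget arithmetic can be sketched as the simplified Python model below; it assumes integer counts only (Kubernetes also accepts percentages, which are not modeled) and is an illustration, not the Kubernetes controller's code.

```python
from typing import Optional


def disruptions_allowed(expected_pods: int,
                        healthy_pods: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """How many more pods may be voluntarily evicted right now.

    Simplified PDB model: exactly one of min_available / max_unavailable
    is set, as an absolute pod count.
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    # Evictions are allowed only while healthy pods exceed the desired floor.
    return max(0, healthy_pods - desired_healthy)


# Three replicas, all healthy, maxUnavailable=1: one pod may be drained.
print(disruptions_allowed(3, 3, max_unavailable=1))  # 1
# One replica already down (e.g. mid-drain): further evictions are refused.
print(disruptions_allowed(3, 2, max_unavailable=1))  # 0
```

This is why a node drain during an upgrade proceeds one pod at a time instead of removing every replica of a deployment at once.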

394

MCQhard

Refer to the exhibit. The HPA is configured to scale based on CPU, but it has not scaled up despite the CPU usage being above the target. Which is the most likely cause?

A.The cluster has autoscaling enabled, which may conflict with HPA.

B.The node pool oauthScopes lack the monitoring scope required for HPA to read metrics.

C.The HPA target is 80%, but the current CPU is 90% which should trigger scaling; the HPA may be broken.

D.The HPA min replicas is 3, so it cannot scale down, but it should scale up.

AnswerB

Without the monitoring scope, the HPA cannot retrieve CPU metrics from the nodes.

Why this answer

The node pool uses a service account with devstorage.read_only scope, which does not include the required permissions for the HPA to read metrics. The HPA needs the monitoring scope or a service account with monitoring roles to access CPU metrics.
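The HPA's documented scaling rule is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The Python sketch below models that rule, assuming a single CPU metric; it shows that with 3 replicas at 90% against an 80% target the HPA should want 4 replicas, so staying at 3 points to the metric never reaching the HPA.

```python
import math
from typing import Optional


def desired_replicas(current_replicas: int,
                     current_cpu_pct: Optional[float],
                     target_cpu_pct: float,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule for one CPU-utilization metric."""
    if current_cpu_pct is None:
        # No metric available: the HPA cannot compute a ratio and does not scale.
        return current_replicas
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: avoid flapping.
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, proposed))


print(desired_replicas(3, 90.0, 80.0, 3, 10))  # 4
print(desired_replicas(3, None, 80.0, 3, 10))  # 3
```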

395

MCQeasy

A company is deploying a web application on Google Kubernetes Engine (GKE) and needs to ensure that the application's service account can only pull images from a specific Container Registry repository. What is the best practice to enforce this?

A.Use Workload Identity and grant the Kubernetes service account's associated Google service account the roles/storage.objectViewer role on the registry bucket.

B.Grant the Compute Engine default service account the roles/storage.objectViewer role on the registry bucket.

C.Set an IAM policy on the pod directly using the 'gke-default' service account.

D.Create an IAM condition on the node pool's service account that limits access to the registry bucket.

AnswerA

Workload Identity binds pod identity to a GSA, and bucket-level IAM restricts access.

Why this answer

Option A is correct because Workload Identity allows you to bind a Kubernetes service account to a Google service account (GSA), and you can then grant the GSA only the roles/storage.objectViewer role on the specific registry bucket. Option B is wrong because the Compute Engine default service account is shared by every workload on the nodes (and, by default, by other VMs in the project), so granting it access is far too broad. Option C is wrong because there is no direct IAM policy for a pod.

Option D is wrong because IAM conditions on the node pool's service account affect the nodes, not individual pods.

396

MCQeasy

A startup wants to encrypt data at rest in Cloud Storage using Customer-Managed Encryption Keys (CMEK). They have already created a Cloud KMS key ring and key. What additional step is required to enable CMEK for a new Cloud Storage bucket?

A.Enable the Cloud KMS API in the project where the bucket will reside.

B.Create a Cloud HSM key instead, as CMEK requires HSM.

C.Add a label to the key ring to associate it with the bucket.

D.Grant the Cloud Storage service account the Cloud KMS CryptoKey Encrypter/Decrypter role on the key.

AnswerD

The service account that Cloud Storage uses must be authorized to use the key.

Why this answer

Option D is correct because the Cloud Storage service account needs permission to encrypt and decrypt with the CMEK key. Option C is wrong because the key ring does not require a label. Option B is wrong because Cloud HSM is not required for CMEK, though it can be used.

Option A is wrong because the key ring and key already exist, so the Cloud KMS API is already enabled in the key's project; no further API enablement is needed.
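The required grant is an IAM binding on the key itself, naming the Cloud Storage service agent of the bucket's project. The Python sketch below only builds the strings involved; the project ID, project number, key ring, and key names are hypothetical placeholders, and the grant itself would be made in the console or with gcloud.

```python
def gcs_service_agent(project_number: int) -> str:
    """Email of the Cloud Storage service agent for the bucket's project."""
    return f"service-{project_number}@gs-project-accounts.iam.gserviceaccount.com"


def kms_key_name(project_id: str, location: str, key_ring: str, key: str) -> str:
    """Full resource name of a Cloud KMS key."""
    return (f"projects/{project_id}/locations/{location}"
            f"/keyRings/{key_ring}/cryptoKeys/{key}")


def cmek_binding(bucket_project_number: int, key_resource: str) -> dict:
    """IAM binding to add on the key so Cloud Storage can use it for CMEK."""
    return {
        "resource": key_resource,
        "role": "roles/cloudkms.cryptoKeyEncrypterDecrypter",
        "member": f"serviceAccount:{gcs_service_agent(bucket_project_number)}",
    }


# Hypothetical project number and names, for illustration only.
key = kms_key_name("my-project", "us-central1", "storage-ring", "bucket-key")
print(cmek_binding(123456789012, key))
```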

397

Multi-Selecthard

A company is running a multi-region application on Google Kubernetes Engine with workloads in us-central1 and europe-west1. They want to route traffic to the closest region based on user location. Which three components should they configure? (Choose three.)

Select 3 answers

A.Cloud Armor security policy

B.Cloud DNS with geo-routing policy

C.Network endpoint groups (NEGs) pointing to GKE pods

D.Regional internal load balancer

E.Global external HTTP(S) load balancer

AnswersB, C, E

Routes DNS queries to the closest region's load balancer IP.

Why this answer

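Option B is correct because a Cloud DNS geo-routing policy answers DNS queries based on the user's geographic location, enabling traffic to be routed to the nearest GKE region (us-central1 or europe-west1). This is essential for minimizing latency and optimizing user experience in a multi-region setup. Option C is correct because network endpoint groups let the load balancer send traffic directly to GKE pods, and option E is correct because the global external HTTP(S) load balancer provides the front end that distributes requests across backends in both regions.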

Exam trap

The trap here is that candidates often confuse Cloud Armor's security filtering capabilities with traffic routing, or mistakenly think a regional internal load balancer can handle multi-region traffic, when in fact only a global external HTTP(S) load balancer combined with geo-routing DNS and NEGs can achieve proximity-based routing across regions.

398

Multi-Selectmedium

Which TWO statements are true regarding the benefits of using VPC Network Peering over Cloud VPN for connecting two VPC networks?

Select 2 answers

A.VPC Network Peering provides lower latency because traffic stays within Google's network.

B.VPC Network Peering requires a separate VPN gateway appliance.

C.Cloud VPN incurs egress costs for data transfer, while VPC Network Peering typically does not.

D.VPC Network Peering can only be established within the same organization.

E.Cloud VPN encrypts traffic, which VPC Network Peering does not.

AnswersA, C

Peering uses Google's internal network, avoiding internet hops, thus lower latency.

Why this answer

Option A is correct because VPC Network Peering uses Google's internal infrastructure to route traffic directly between VPC networks, avoiding the public internet and reducing the number of network hops. This results in lower latency compared to Cloud VPN, which typically encrypts and tunnels traffic over the public internet, introducing additional overhead and potential variability in latency.

Exam trap

The trap here is that candidates may confuse the lack of encryption in VPC Network Peering as a disadvantage, but the question asks for benefits, so encryption (Option E) is not a benefit of peering; instead, the lower latency and reduced egress costs are the key advantages.

399

Multi-Selectmedium

Which THREE are valid methods to connect an on-premises network to a Google Cloud VPC?

Select 3 answers

A.Dedicated Interconnect

B.Cloud VPN

C.Cloud Router

D.VPC peering

E.Partner Interconnect

AnswersA, B, E

Dedicated Interconnect provides direct physical connection.

Why this answer

Dedicated Interconnect (A) provides a direct physical connection between your on-premises network and Google Cloud, offering high bandwidth and a Service Level Agreement (SLA) of up to 99.99% availability. It uses a cross-connect in a colocation facility to attach your on-premises router to a Google Cloud router, enabling private, low-latency connectivity to your VPC without traversing the public internet.

Exam trap

The trap here is that candidates confuse Cloud Router as a standalone connectivity method, when it is actually a routing component that must be paired with a VPN tunnel or Interconnect to function.

400

MCQhard

A multinational corporation operates in multiple regions and must comply with GDPR. They use Cloud Load Balancing to distribute traffic across regional backends. Their security team wants to block traffic from specific countries (e.g., non-EU countries) at the edge. What should they use?

A.Configure Cloud CDN to serve content only to EU-based users.

B.Use Cloud Armor security policies with geographic-based denylist rules.

C.Set VPC firewall rules to allow traffic only from EU IP ranges.

D.Configure Identity-Aware Proxy (IAP) to require user authentication from allowed countries.

AnswerB

Cloud Armor can block traffic from specific countries at the Google Cloud edge.

Why this answer

Cloud Armor security policies support geographic-based access control using denylist or allowlist rules that match client IP addresses against country-level geolocation data. This allows the security team to block traffic from specific countries at the edge, before it reaches the backend, which is the most efficient and compliant approach for GDPR enforcement.

Exam trap

The trap here is that candidates often confuse VPC firewall rules (which filter by IP ranges) with Cloud Armor's geolocation-based policies, or they assume Cloud CDN or IAP can enforce geographic access control, when in fact only Cloud Armor provides native country-level blocking at the edge.

How to eliminate wrong answers

Option A is wrong because Cloud CDN caches content but does not enforce geographic access control; it can serve cached content to any user regardless of location, and its 'geo restrictions' are only for signed URLs, not for blocking traffic at the edge. Option C is wrong because VPC firewall rules operate at the network layer and cannot reliably block traffic based on country-level geolocation; they only filter by IP ranges, which are not accurate for country-level blocking due to IP reassignment and lack of granularity. Option D is wrong because Identity-Aware Proxy (IAP) controls access based on user identity and context, not on the geographic origin of the IP address; it cannot block traffic at the edge based solely on country.
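Cloud Armor evaluates a policy's rules in priority order (lower number first), applies the first match, and ends with a default rule at the lowest priority. In a real policy the country test is written in the rules language as an expression on origin.region_code; the Python sketch below models only the evaluation order, and the EU set is an illustrative subset, not a complete list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    priority: int                   # lower number = evaluated first
    matches: Callable[[str], bool]  # predicate on the client's country code
    action: str                     # e.g. "allow" or "deny(403)"


def evaluate(policy: List[Rule], region_code: str) -> str:
    """Return the action of the first matching rule in priority order."""
    for rule in sorted(policy, key=lambda r: r.priority):
        if rule.matches(region_code):
            return rule.action
    raise ValueError("policy has no default rule")


# Illustrative subset of EU country codes; a real policy would list all of them.
EU = {"AT", "BE", "DE", "ES", "FR", "IE", "IT", "NL"}

policy = [
    Rule(1000, lambda cc: cc not in EU, "deny(403)"),
    Rule(2147483647, lambda cc: True, "allow"),  # default rule, lowest priority
]

print(evaluate(policy, "DE"))  # allow
print(evaluate(policy, "US"))  # deny(403)
```

Because the deny happens at Google's edge, blocked requests never reach the backend services.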

401

MCQmedium

A company runs a web application on Compute Engine with an HTTP Load Balancer. Users report intermittent 502 Bad Gateway errors. What is the most likely cause?

A.Load balancer quota exceeded.

B.Firewall rules block health checks.

C.SSL certificate expired.

D.Backend instances are unhealthy or overloaded.

AnswerD

502 Bad Gateway typically means the backend is not responding properly.

Why this answer

The 502 Bad Gateway error from an HTTP Load Balancer typically indicates that the backend instances are failing to respond to the load balancer's health checks or are overwhelmed, causing the load balancer to consider them unhealthy and return a 502 error. This is the most common cause because the load balancer relies on healthy backends to forward traffic, and overloaded or failing instances cannot handle requests.

Exam trap

The trap here is that candidates often attribute 502 errors to SSL or quota issues, but a 502 from an HTTP(S) load balancer usually points to the backends (unhealthy, overloaded, or failing instances) rather than to frontend configuration.

How to eliminate wrong answers

Option A is wrong because exceeding a load balancer quota would result in a 429 Too Many Requests or a 503 Service Unavailable error, not a 502 Bad Gateway. Option B is wrong because, although firewall rules that block health-check traffic cause backends to be marked unhealthy, that failure is persistent rather than intermittent, and overloaded backends are the more common cause of intermittent 502s. Option C is wrong because an expired certificate on the load balancer's frontend causes TLS handshake failures in the client before any HTTP response is returned, so it does not produce a 502; and an expired certificate on the backend would not cause a 502 from the load balancer's perspective.

402

MCQeasy

Your company has migrated its legacy web application from a single Compute Engine instance to a managed instance group (MIG) behind an HTTP(S) load balancer. The application was updated to a new version as part of the migration. After the migration, users report intermittent 502 Bad Gateway errors. The application logs show no errors, and the load balancer backend health checks are reported as healthy. On investigation, the developers discover that the new version requires a specific environment variable for authentication to a downstream service. This variable was set manually on the original instance but is missing from the MIG's instance template. The health check endpoint does not depend on this variable and always returns a 200 status even when the variable is absent. As a result, instances created from the template are considered healthy by the load balancer, but when they receive requests that require authentication, they fail and return a 502 error to the client. What is the most likely cause of the 502 errors?

A.The missing environment variable causes authentication failures on new instances.

B.The health check is configured to check the old application path, which no longer exists.

C.The load balancer's backend timeout is too short for the application's response time.

D.The MIG is not scaling out fast enough to handle peak traffic.

AnswerA

The environment variable is essential for authentication; its absence causes requests to fail with 502 errors. Health checks pass because they do not exercise that code path.

Why this answer

The 502 errors occur because the new application version requires a specific environment variable for authentication to a downstream service. The health check endpoint does not depend on this variable, so instances are marked healthy even though they cannot authenticate real requests. When the load balancer routes traffic to these instances, the missing variable causes authentication failures, leading to 502 Bad Gateway errors.

Exam trap

The trap here is that candidates assume passing health checks guarantee the application is fully functional, but the exam tests the nuance that health checks may not cover all dependencies, leading to 'false healthy' instances that fail on real requests.

How to eliminate wrong answers

Option B is wrong because the health check is reported as healthy, indicating it is hitting a valid endpoint (the old path would cause health check failures, not intermittent 502s). Option C is wrong because backend timeout issues would typically cause 504 Gateway Timeout errors, not 502 Bad Gateway errors, and the application logs show no errors. Option D is wrong because scaling issues would cause 503 Service Unavailable errors or increased latency, not 502 errors, and the MIG is not reported as overloaded.
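One way to close this gap is a health endpoint that fails when configuration needed by real requests is missing. The Python sketch below is a minimal illustration using only the standard library; the variable name DOWNSTREAM_AUTH_TOKEN and the /healthz path are hypothetical, not taken from the scenario.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Mapping, Tuple

# Hypothetical name for the variable the new release depends on.
REQUIRED_VARS = ("DOWNSTREAM_AUTH_TOKEN",)


def health_status(env: Mapping[str, str]) -> Tuple[int, str]:
    """Report unhealthy when configuration that real requests need is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        return 503, "missing config: " + ", ".join(missing)
    return 200, "ok"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/healthz":
            self.send_error(404)
            return
        code, body = health_status(os.environ)
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())


def serve(port: int = 8080) -> None:
    """Blocking server; call this from the instance's startup script."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Checking for the presence of configuration is cheap; calling the downstream service on every health check is a heavier design that can turn a downstream outage into a fleet-wide "unhealthy" signal, so it is kept out of this sketch.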

403

MCQeasy

Your company runs a global e-commerce platform on Google Cloud. The application is deployed across multiple regions for low latency. You use Cloud SQL for transactional data and Cloud Spanner for global consistency of inventory. Recently, the operations team reported that the application is experiencing increased latency during peak hours, and the monthly cloud bill has risen significantly. Upon investigation, you find that the Cloud SQL instance is underutilized (CPU < 20%) while Cloud Spanner split utilization is over 80%. The application instances are fronted by a global external HTTPS load balancer. Network egress costs are high. Which course of action would best address both the latency and cost issues?

A.Reduce the Cloud SQL instance tier to a lower machine type to save costs, and add read replicas in other regions for failover.

B.Add more nodes to the Cloud SQL instance and enable automatic storage increase to handle peak loads.

C.Increase the number of splits in Cloud Spanner to reduce hot spots, and configure Cloud CDN in front of the load balancer to cache static content.

D.Move the transactional database to Cloud Spanner and decommission Cloud SQL to reduce complexity.

AnswerC

Increasing splits improves Spanner performance; Cloud CDN reduces egress costs and latency for static content.

Why this answer

The symptoms point to Cloud Spanner, not Cloud SQL, as the bottleneck: Spanner utilization is above 80% while the Cloud SQL instance sits below 20% CPU, and the high egress costs suggest that a large volume of content is served directly from the origin. Option C is the best choice because relieving Spanner hot spots improves throughput and reduces latency (Spanner splits data automatically, so in practice this means adding compute capacity and choosing keys that spread load evenly), and Cloud CDN reduces egress costs and latency by caching static content at edge locations.

Option A trims Cloud SQL costs and adds replicas but does nothing for the Spanner bottleneck. Option B scales Cloud SQL, which is already underutilized; Cloud SQL instances also do not have nodes to add. Option D moves even more load onto Spanner, the component that is already constrained.
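A rough capacity estimate for the Spanner side can be sketched as below. This assumes, as a simplification, that load spreads evenly so utilization falls in proportion to added nodes; the node count and the 65% target are hypothetical values for illustration, and real sizing should follow observed metrics.

```python
import math


def nodes_for_target(current_nodes: int, current_util: float, target_util: float) -> int:
    """Nodes needed to bring utilization down to target_util.

    Assumes load spreads evenly, so utilization scales inversely with node count.
    Utilizations are fractions (0.8 means 80%).
    """
    if current_nodes < 1 or current_util < 0 or not 0 < target_util <= 1:
        raise ValueError("invalid node count or utilization")
    return max(current_nodes, math.ceil(current_nodes * current_util / target_util))


# Hypothetical instance: 3 nodes at 85% utilization, hypothetical 65% target.
print(nodes_for_target(3, 0.85, 0.65))  # 4
```

The estimate is only meaningful if hot spots have been addressed; a single hot key range stays hot no matter how many nodes are added.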

404

MCQhard

Your company runs a multi-region web application on Google Kubernetes Engine (GKE) with pods that process sensitive user data. The application uses Cloud SQL for PostgreSQL as the backend database. Your security team has implemented the following controls: 1) All traffic to the database is encrypted using SSL/TLS. 2) The GKE cluster uses Workload Identity to bind Kubernetes service accounts to IAM service accounts. 3) The Cloud SQL instance is configured with a public IP address and authorized networks to allow only the GKE cluster's node IP ranges. 4) The database credentials are stored in Secret Manager and mounted as volumes in the pods. Recently, a security audit revealed that a pod was compromised due to a container vulnerability. The attacker was able to exfiltrate sensitive data directly from the Cloud SQL database using the credentials from Secret Manager. The security team wants to prevent such exfiltration in the future while minimizing changes to the application code. Which course of action should you recommend?

A.Deploy Cloud SQL Auth Proxy as a sidecar container in each pod, and configure IAM database authentication to replace static credentials.

B.Migrate the database to Cloud Spanner, which has built-in IAM integration and automatic encryption.

C.Rotate the database password and store the new password in Secret Manager, then update the application to fetch the password from Secret Manager at startup.

D.Change the Cloud SQL instance to use a private IP address and disable public access, ensuring only the GKE cluster can reach it via VPC peering.

AnswerA

Cloud SQL Auth Proxy with IAM authentication removes static credentials and uses IAM roles to control access, preventing credential exfiltration.

Why this answer

Option A is correct because deploying Cloud SQL Auth Proxy as a sidecar container enables IAM database authentication, eliminating the static password that was exfiltrated. The proxy also handles SSL/TLS encryption automatically, and access is controlled through IAM permissions: the database login uses short-lived IAM tokens tied to the pod's Google service account via Workload Identity, so there is no long-lived credential that an attacker can copy out of the pod and reuse elsewhere. This approach requires minimal code changes, since the application connects to localhost instead of the Cloud SQL public IP.

Exam trap

Google Cloud often tests the misconception that network-level controls (like private IPs) are sufficient to prevent data exfiltration from a compromised pod, but the real vulnerability is the use of static credentials that can be stolen and reused regardless of network isolation.

How to eliminate wrong answers

Option B is wrong because migrating to Cloud Spanner is a significant architectural change that requires rewriting application code and data modeling, which violates the requirement to minimize changes to the application code. Option C is wrong because rotating the password and storing it in Secret Manager does not prevent exfiltration; if a pod is compromised, the attacker can still read the new password from the mounted volume and reuse it to access the database directly. Option D is wrong because using a private IP and disabling public access only restricts network-level access; it does not prevent an attacker who has compromised a pod within the cluster from using the stored credentials to connect to the database over the private network.
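A minimal sketch of the pieces involved, assuming Cloud SQL Auth Proxy v2 and PostgreSQL. The instance connection name, service account, and image tag are hypothetical placeholders; the service account would also need the Cloud SQL Client and Cloud SQL Instance User roles, and a matching IAM user must exist on the instance.

```python
SA_SUFFIX = ".gserviceaccount.com"


def iam_db_user(service_account_email: str) -> str:
    """PostgreSQL IAM database user for a service account: the email minus
    the '.gserviceaccount.com' suffix."""
    if not service_account_email.endswith(SA_SUFFIX):
        raise ValueError("expected a service account email")
    return service_account_email[: -len(SA_SUFFIX)]


def auth_proxy_sidecar(instance_connection_name: str, image: str) -> dict:
    """Pod container entry for a Cloud SQL Auth Proxy v2 sidecar."""
    return {
        "name": "cloud-sql-proxy",
        "image": image,  # placeholder; pin a specific proxy release
        "args": [
            "--auto-iam-authn",  # proxy logs in with the pod's IAM identity
            "--port=5432",       # listens on 127.0.0.1:5432 inside the pod
            instance_connection_name,  # PROJECT:REGION:INSTANCE
        ],
    }


# Hypothetical names, for illustration only.
gsa = "orders-app@my-project.iam.gserviceaccount.com"
sidecar = auth_proxy_sidecar("my-project:us-central1:orders-db",
                             "gcr.io/cloud-sql-connectors/cloud-sql-proxy:TAG")
print(iam_db_user(gsa))  # orders-app@my-project.iam
```

The application then connects to 127.0.0.1:5432 as that IAM user with no password, which is why the code change is small.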

405

Matchingmedium

Match each GCP security service to its function.

Concepts

Matches

Manage encryption keys

Hardware security module for key protection

Store API keys, passwords, certificates

Manage access control

Centralized security and risk management

Why these pairings

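These are core Google Cloud security services: Cloud KMS manages encryption keys; Cloud HSM provides hardware security modules for key protection; Secret Manager stores API keys, passwords, and certificates; IAM manages access control; and Security Command Center provides centralized security and risk management.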

406

MCQhard

A company has a global web application deployed across multiple regions. They use an external HTTPS Load Balancer with backend services in us-central1 and europe-west1. They want users to be routed to the closest healthy backend. Which load balancing configuration is required?

A.Internal HTTP(S) Load Balancer

B.External HTTPS Load Balancer with global backend

C.External TCP/UDP Network Load Balancer

D.Classic Application Load Balancer

E.Regional external HTTPS Load Balancer

AnswerB

Correct. Global external HTTPS Load Balancer supports proximity-based routing.

Why this answer

Option B is correct because an External HTTPS Load Balancer with a global backend configuration uses Google Cloud's global anycast IP and the Premium Tier network to route users to the closest healthy backend based on latency and proximity. This setup ensures that traffic from users worldwide is directed to the nearest region (us-central1 or europe-west1) with a healthy instance group, providing optimal performance and failover.

Exam trap

The trap here is that candidates often confuse 'global' with 'regional' load balancers, mistakenly thinking a regional external HTTPS load balancer can serve multiple regions, but only the global external HTTPS load balancer supports cross-region backend services with anycast routing.

How to eliminate wrong answers

Option A is wrong because an Internal HTTP(S) Load Balancer is used for traffic within a VPC network, not for external user traffic from the internet. Option C is wrong because an External TCP/UDP Network Load Balancer operates at Layer 4 and does not support HTTPS termination, content-based routing, or global backend selection across regions. Option D is not the best answer because the classic Application Load Balancer is the legacy version of the external HTTP(S) load balancer; although it can serve global backends in Premium Tier, the current global external HTTPS load balancer described in option B is the recommended configuration for new deployments.

Option E is wrong because a Regional external HTTPS Load Balancer is confined to a single region and cannot route traffic to backends in multiple regions like us-central1 and europe-west1.
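The "closest healthy backend" behavior can be sketched as a simplified selection rule: among healthy backends, pick the one with the lowest latency to the user. The round-trip times below are hypothetical, and the real load balancer also weighs backend capacity and balancing mode, which this Python sketch ignores.

```python
from typing import Dict


def pick_backend(rtt_ms: Dict[str, float], healthy: Dict[str, bool]) -> str:
    """Choose the lowest-latency backend region that is currently healthy."""
    candidates = [region for region in rtt_ms if healthy.get(region, False)]
    if not candidates:
        raise RuntimeError("no healthy backends")
    return min(candidates, key=lambda region: rtt_ms[region])


# Hypothetical round-trip times from a user in Paris.
rtt = {"us-central1": 105.0, "europe-west1": 12.0}
print(pick_backend(rtt, {"us-central1": True, "europe-west1": True}))   # europe-west1
print(pick_backend(rtt, {"us-central1": True, "europe-west1": False}))  # us-central1
```

The second call shows the failover case: when the nearest region is unhealthy, traffic moves to the next-closest region without any DNS change.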

407

MCQmedium

A company is migrating hundreds of on-premises VMs to Compute Engine. They want to minimize manual effort and downtime. Which service should they use?

A.Cloud Build

B.gcloud compute instances import

C.Transfer Appliance

D.Migrate for Compute Engine

E.CloudEndure

AnswerD

Correct. It supports bulk migration with minimal downtime.

Why this answer

Migrate for Compute Engine (formerly Velostrata) is the correct choice because it is a fully managed service specifically designed for migrating large-scale VM workloads to Compute Engine with minimal downtime. It uses a streaming migration approach that moves the OS and application data while the source VM continues running, then performs a cutover with near-zero downtime, making it ideal for hundreds of VMs without manual effort.

Exam trap

The trap here is that candidates may confuse CloudEndure (a popular third-party migration tool) with a native Google Cloud service, or assume gcloud compute instances import is sufficient for large-scale live migrations, but the exam emphasizes using the dedicated, fully managed migration service for minimal downtime and automation.

How to eliminate wrong answers

Option A (Cloud Build) is wrong because it is a CI/CD service for building, testing, and deploying software artifacts, not for migrating on-premises VMs to Compute Engine. Option B (gcloud compute instances import) is wrong because it is a command-line tool for importing single VM images or disks, not designed for orchestrating hundreds of live VM migrations with minimal downtime. Option C (Transfer Appliance) is wrong because it is a physical hardware device for offline bulk data transfer to Google Cloud, not suitable for live VM migration with minimal downtime.

Option E (CloudEndure) is wrong because it is an AWS service (now part of AWS Application Migration Service), not a Google Cloud service; while it can migrate to GCP, it is not a native Google Cloud offering and the question asks for a service they should use, implying a Google-managed solution.

408

MCQeasy

A company needs to deploy a stateless web application that can handle variable traffic. Which compute option is the most cost-effective and scales automatically?

A.App Engine standard environment with automatic scaling.

B.Google Kubernetes Engine (GKE) with cluster autoscaling.

C.Compute Engine with managed instance groups and autoscaling.

D.Compute Engine with preemptible VMs.

E.Cloud Run with CPU always allocated.

AnswerA

App Engine standard is serverless, cost-effective, and auto-scales.

Why this answer

App Engine standard environment with automatic scaling is the most cost-effective option here because it can scale to zero when there is no traffic, making it ideal for stateless web applications with variable traffic. It abstracts infrastructure management, charges only for resources used, and scales quickly without provisioning overhead.

Exam trap

Google Cloud often tests the misconception that managed instance groups or GKE are always the best for autoscaling, but the trap here is that for a stateless web app with variable traffic, serverless options like App Engine standard are more cost-effective because they scale to zero and require no infrastructure management.

How to eliminate wrong answers

Option B is wrong because GKE with cluster autoscaling requires managing a Kubernetes cluster, which adds operational overhead and cost for a simple stateless web app, and it does not scale to zero. Option C is wrong because Compute Engine with managed instance groups and autoscaling still requires managing VMs and has a minimum instance count, leading to higher costs and slower scaling compared to serverless options. Option D is wrong because preemptible VMs can be terminated at any time, making them unsuitable for a production web application that needs reliability and consistent availability.

Option E is wrong because Cloud Run with CPU always allocated incurs costs even when the application is idle, whereas the default mode, in which CPU is allocated only during request processing, is more cost-effective for variable traffic.

409

MCQmedium

A team manages a GKE cluster with node pools using different machine types. They plan to upgrade the cluster to a new Kubernetes version. What is the safest upgrade strategy to minimize application downtime?

A.Perform a rolling upgrade by draining all nodes simultaneously.

B.Create a new cluster with the desired version and migrate workloads.

C.Use a surge upgrade to add new nodes before removing old ones.

D.Upgrade the node pool configuration one by one.

AnswerC

Surge upgrade maintains capacity during the upgrade, minimizing disruption.

Why this answer

Option C is correct because a surge upgrade in GKE adds new nodes with the desired Kubernetes version before removing old nodes, ensuring capacity is maintained throughout the process. This minimizes application downtime by allowing pods to be rescheduled onto new nodes before old nodes are drained, following a controlled rolling update pattern that respects PodDisruptionBudgets.

Exam trap

Google Cloud often tests the misconception that draining all nodes simultaneously is a valid rolling upgrade strategy, when in fact it causes complete downtime and violates Kubernetes best practices for workload availability.

How to eliminate wrong answers

Option A is wrong because draining all nodes simultaneously would remove all running pods at once, causing complete application downtime and violating PodDisruptionBudgets if configured. Option B is wrong because creating a new cluster and migrating workloads requires manual or tool-based migration, which introduces significant operational overhead and potential downtime during the cutover, and is not the safest or most efficient strategy for an existing cluster. Option D is wrong because upgrading node pool configuration one by one does not specify a surge or rolling mechanism; without surge, it would drain nodes in the pool sequentially, potentially causing capacity shortages and downtime if the pool is under-provisioned.
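The capacity difference between these strategies can be sketched with a simplified model of a node-pool surge upgrade: each batch creates up to maxSurge new nodes and drains up to maxSurge + maxUnavailable old ones. This Python sketch is an illustration of the idea, not GKE's scheduler; in GKE the two values are set with the --max-surge-upgrade and --max-unavailable-upgrade flags on the node pool.

```python
from typing import List, Tuple


def surge_upgrade(nodes: int, max_surge: int, max_unavailable: int) -> List[Tuple[int, int]]:
    """Simplified surge upgrade; returns (nodes_upgraded, capacity_during_batch)."""
    if max_surge < 0 or max_unavailable < 0 or max_surge + max_unavailable == 0:
        raise ValueError("need max_surge + max_unavailable >= 1")
    upgraded, history = 0, []
    while upgraded < nodes:
        remaining = nodes - upgraded
        drained = min(max_surge + max_unavailable, remaining)  # old nodes taken down
        surged = min(max_surge, remaining)                      # extra new nodes added
        capacity = nodes - drained + surged  # nodes serving while this batch runs
        upgraded += drained
        history.append((upgraded, capacity))
    return history


print(surge_upgrade(3, max_surge=1, max_unavailable=0))  # [(1, 3), (2, 3), (3, 3)]
print(surge_upgrade(3, max_surge=0, max_unavailable=3))  # [(3, 0)]
```

The first call keeps full capacity throughout, which is the surge behavior option C relies on; the second models draining every node at once (option A), where capacity drops to zero.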

410

MCQhard

A company has a multi-region deployment of App Engine and wants to optimize request routing for latency and cost. Which GCP service should they use?

A.Cloud Endpoints.

B.Cloud Load Balancing with global anycast.

C.Cloud DNS with latency-based routing.

D.Cloud Traffic Director.

AnswerB

Global load balancing directs users to the closest healthy backend, minimizing latency and balancing cost.

Why this answer

Cloud Load Balancing with global anycast uses Google's global network and a single anycast IP address to route user traffic to the nearest healthy backend, minimizing latency. Network Service Tiers let you trade latency against cost: Premium Tier carries traffic over Google's global network and is required for global anycast load balancing, while Standard Tier is cheaper but regional. Together these address the latency and cost goals for a multi-region App Engine deployment.

Exam trap

The trap here is that candidates often confuse Cloud DNS latency-based routing (a DNS-level, cache-prone approach) with true anycast-based global load balancing, which provides immediate, health-aware routing without DNS caching delays.

How to eliminate wrong answers

Option A is wrong because Cloud Endpoints is an API management service for securing, monitoring, and managing APIs, not a global load balancer for routing traffic across regions based on latency and cost. Option C is wrong because Cloud DNS with latency-based routing is a DNS-level feature that can direct traffic based on latency, but it lacks the fine-grained health checking, anycast IP, and traffic splitting capabilities of a global load balancer, and DNS caching can cause routing delays. Option D is wrong because Cloud Traffic Director is a traffic management service for service mesh (e.g., with Istio on GKE), not designed for global HTTP(S) load balancing to App Engine; it operates at the service mesh layer, not the edge.

411

MCQmedium

Refer to the exhibit. A user reports that the instance 'batch-vm' is unavailable. Based on the output, what is the most likely cause of the unavailability?

A.The VM was stopped manually by a user.

B.The preemptible VM was terminated by Google due to its preemptible nature.

C.The VM lost its external IP address.

D.The VM crashed due to an out-of-memory error.

AnswerB

Preemptible instances can be terminated at any time, and the status is TERMINATED.

Why this answer

The exhibit shows the instance 'batch-vm' with a status of 'TERMINATED' and the 'preemptible' flag set to 'true'. Preemptible VMs in Google Cloud have a maximum runtime of 24 hours and can be terminated at any time by Google Compute Engine due to resource constraints. The termination reason is typically 'preemption', which matches the scenario of a user reporting unavailability without manual intervention.

Exam trap

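Google Cloud often tests whether candidates read instance status correctly. In Compute Engine, TERMINATED is the status of any stopped instance, whether a user stopped it or Compute Engine preempted it, so the status alone does not prove preemption; the preemptible flag (and a compute.instances.preempted operation, if available) is what points to preemption. A TERMINATED instance no longer incurs vCPU and memory charges, but attached persistent disks and reserved static IP addresses are still billed. Candidates who miss this confuse preemption with a manual stop or a crash.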

How to eliminate wrong answers

Option A is wrong because, although a manually stopped VM also shows the status 'TERMINATED', nothing in the exhibit indicates a user-initiated stop, while the preemptible flag makes preemption the most likely explanation. Option C is wrong because losing an external IP address does not stop a VM; the VM would still be 'RUNNING' but unreachable at that address. Option D is wrong because an out-of-memory condition inside the guest kills processes or hangs the operating system, but the instance itself normally stays 'RUNNING'; it does not explain a 'TERMINATED' status on a VM flagged as preemptible.

412

MCQmedium

A company is designing a VPC Service Controls perimeter to protect data stored in Google Cloud. They need to allow access from their on-premises network via a Cloud VPN tunnel while blocking all internet-based access. What is the most secure and manageable approach?

A.Configure firewall rules to only allow traffic from the on-premises CIDR to the VPC.

B.Use Cloud VPN and Private Google Access to allow on-premises access without public IPs.

C.Configure a VPC Service Controls perimeter and create an access level that includes the on-premises CIDR range.

D.Use Cloud IAP (Identity-Aware Proxy) to restrict access based on identity and context.

AnswerC

VPC Service Controls with an access level effectively restricts API access to the allowed CIDR, preventing data exfiltration via the internet.

Why this answer

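Option C is correct because a VPC Service Controls perimeter blocks access to protected Google APIs from outside the perimeter, and an access level that lists the on-premises CIDR range admits requests from that network only, which prevents data exfiltration to the internet. Option A is wrong because firewall rules control traffic to VM network interfaces, not calls to Google APIs, so they cannot prevent data exfiltration via API calls. Option B is wrong because Private Google Access gives hosts without public IPs a private path to Google APIs but does not block API access from the internet.

Option D is wrong because IAP controls access to applications based on user identity and context, not network-level access to Google Cloud APIs.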
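Conceptually, an IP-based access level admits a request when its source address falls inside one of the listed ranges. The sketch below models only that membership test with Python's `ipaddress` module; the function name and the documentation-range CIDRs used in the example are hypothetical, and real enforcement is done by Access Context Manager and VPC Service Controls, not by application code.

```python
import ipaddress


def source_ip_allowed(source_ip: str, allowed_ranges: list[str]) -> bool:
    """Return True if source_ip falls inside any allowed CIDR range.

    Conceptual model of an IP-subnetwork condition in an access level.
    An IPv4 address is never "in" an IPv6 network (the check returns False).
    """
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_ranges)
```

For example, `source_ip_allowed("203.0.113.25", ["203.0.113.0/24"])` returns `True`, while a request from `198.51.100.7` would be rejected.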

413

MCQhard

A company runs a stateful application on Compute Engine with persistent disks. They want to ensure data durability across a zone failure. What is the best approach?

A.Replicate data at application level to another instance in a different zone

B.Use Google Cloud NetApp Volumes with replication

C.Use regional persistent disks

D.Take regular snapshots of the persistent disks and store them in a multiregional bucket

AnswerC

Regional PDs replicate data across zones with synchronous writes, ensuring durability.

Why this answer

Regional persistent disks (RPDs) synchronously replicate every write to two zones in the same region, giving an RPO of zero for a zone failure. If the primary zone fails, the disk can be force-attached to a standby VM in the other zone, so no application-level replication is needed. This ensures data durability across a zone failure while maintaining consistent performance and low latency.
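To make the RPO difference concrete, the sketch below computes how much recent data a snapshot-based approach loses when a zone fails; with a regional persistent disk's synchronous replication, the same window is zero. The function name and the hourly schedule in the example are illustrative assumptions.

```python
from datetime import datetime, timedelta


def data_loss_window(snapshot_times: list[datetime], failure_time: datetime) -> timedelta:
    """Return how much recent data is lost: writes after the last snapshot before the failure.

    Assumes every listed snapshot completed successfully.
    """
    earlier = [t for t in snapshot_times if t <= failure_time]
    if not earlier:
        raise ValueError("no snapshot was taken before the failure")
    return failure_time - max(earlier)


# Hypothetical hourly snapshots at 00:00, 01:00 and 02:00.
hourly = [datetime(2024, 1, 1, h) for h in range(3)]
```

With a failure at 01:45, `data_loss_window(hourly, datetime(2024, 1, 1, 1, 45))` returns 45 minutes: every write made since the 01:00 snapshot is gone.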

Exam trap

Google Cloud often tests the distinction between synchronous replication (regional persistent disks) and asynchronous backup (snapshots), leading candidates to choose snapshots for durability when they actually need zero RPO across a zone failure.

How to eliminate wrong answers

Option A is wrong because replicating data at the application level adds complexity and latency and requires custom code, whereas Compute Engine offers managed, synchronous replication. Option B is wrong because Google Cloud NetApp Volumes is a managed file-storage service; adding and replicating a separate file tier brings extra cost and management overhead when regional persistent disks already replicate block storage for Compute Engine. Option D is wrong because regular snapshots stored in a multi-regional location provide point-in-time recovery but have an RPO of minutes to hours and do not replicate synchronously, so data written after the last snapshot is lost during a zone failure.

414

MCQhard

A company uses Cloud Bigtable for time-series data. They experience high latency and uneven load distribution across nodes. What is the most likely cause?

A.The data is stored in a single column family

B.The app is using strong reads instead of eventual consistency

C.The table has a single row key pattern that causes hot spotting

D.The cluster has too many nodes

AnswerC

Sequential row keys lead to hot spots.

Why this answer

Cloud Bigtable partitions data by row key range and distributes tablets across nodes. A single row key pattern (e.g., monotonically increasing timestamps) causes all writes to target the same tablet, creating a hot spot. This leads to uneven load distribution and high latency because one node is overwhelmed while others remain idle.
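A common remedy, shown here as a minimal sketch, is field promotion: put a high-cardinality identifier such as the device ID before the timestamp, so concurrent writes from many devices fall into different key ranges instead of one tablet. The `row_key` helper, the `#` delimiter, and the 13-digit millisecond padding are assumptions for illustration.

```python
def row_key(device_id: str, epoch_ms: int) -> str:
    """Field-promoted Bigtable row key: device ID first, then a zero-padded timestamp.

    Zero padding keeps lexicographic order equal to time order within one device,
    so a range scan over one device's prefix returns its rows chronologically.
    """
    if "#" in device_id:
        raise ValueError("device_id must not contain the '#' delimiter")
    return f"{device_id}#{epoch_ms:013d}"
```

For example, `row_key("sensor-42", 1700000000000)` returns `"sensor-42#1700000000000"`; writes from `sensor-7` at the same moment start with a different prefix and therefore land elsewhere in the key space.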

Exam trap

Google Cloud often tests the misconception that column families or read consistency levels are the root cause of performance issues, when in fact row key design is the primary driver of load distribution in Bigtable.

How to eliminate wrong answers

Option A is wrong because storing data in a single column family does not cause uneven load distribution; column families affect storage and read performance but not row key distribution. Option B is wrong because strong reads (read-after-write consistency) add latency but do not cause uneven load distribution across nodes; the issue is about write hot spotting, not read consistency. Option D is wrong because having too many nodes would reduce load per node, not increase latency or cause uneven distribution; the cluster would be over-provisioned, not hot-spotted.

415

MCQhard

A company runs multiple microservices on Cloud Run. Each service uses a Serverless VPC Access connector to connect to a shared Cloud Memorystore for Redis instance (standard tier) in a VPC network. The Redis instance is configured with a firewall rule that allows TCP connections on port 6379 from the VPC connector's subnet (10.8.0.0/28). After a recent code update, the order-service fails to connect to Redis, while the user-service continues to work. The error logs in order-service show 'connection refused'. The engineer verifies that both services use the same VPC connector, the same Redis instance IP, and the same service account. The VPC connector's metrics show no errors. What is the most likely cause?

A.The order-service is deployed in a different region than the Redis instance.

B.The order-service code now attempts to connect to Redis on port 6380.

C.The VPC connector is out of memory.

D.The Redis instance has reached its maximum number of connections.

AnswerB

A port change would break only the updated service: the Redis instance serves connections on port 6379, and the firewall rule also permits only that port.

Why this answer

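The order-service connected successfully to the same Redis instance before the code update. After the update, it fails with 'connection refused' while the user-service still works. Because both services share the same connector, Redis IP, and service account, the most likely cause is a change in the order-service code itself, such as connecting on a different port (e.g., 6380). Nothing serves Redis on that port, and the firewall rule does not allow it either. A TCP 'connection refused' means the target actively rejected the connection, which fits a port with nothing listening; a firewall that silently drops packets usually produces a timeout instead.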

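The other options would affect both services or do not match the symptoms. A Serverless VPC Access connector must be in the same region as the Cloud Run service that uses it, and both services share one connector (A). The connector's metrics show no errors (C). A Redis connection limit would affect every client and return a Redis error after the TCP connection is established, not a TCP refusal (D).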
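One way to confirm the symptom from inside the failing service is to classify the TCP connection attempt. This is a minimal diagnostic sketch that uses only the standard `socket` module; the `probe` name is hypothetical, and other network errors (such as an unreachable route) are deliberately left to propagate.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # RST received: the host is reachable but nothing listens on this port.
        return "refused"
    except socket.timeout:
        # No reply at all: often a firewall silently dropping packets.
        return "timeout"
```

Calling `probe` against the Redis IP on 6379 and on the port the new code uses would quickly show whether the service is simply pointed at the wrong port.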

416

MCQmedium

A data engineer needs to analyze data in BigQuery but must mask personally identifiable information (PII) based on user roles. Which service should they use?

A.BigQuery column-level security

B.Cloud Key Management Service

C.Cloud Data Catalog

D.Cloud Data Loss Prevention (DLP)

AnswerA

BigQuery column-level security with data masking can restrict PII based on roles.

Why this answer

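BigQuery column-level security uses policy tags, and data masking rules attached to those tags let BigQuery return masked values to some roles and raw values to others at query time. Option B is wrong because Cloud KMS manages encryption keys and cannot mask individual values per user. Option C is wrong because Data Catalog manages metadata; it hosts the policy tag taxonomies, but it does not enforce masking by itself.

Option D is wrong because Cloud DLP inspects and de-identifies data (for example, by tokenization) as a separate processing step; it does not apply role-based masking when users query BigQuery.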
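The decision BigQuery makes at query time can be modeled roughly as below. This is a conceptual simulation, not BigQuery code: the two role names are the Fine-Grained Reader and Masked Reader roles, the rule names follow the data policy's predefined masking expressions, and only three rules are modeled.

```python
import hashlib
from typing import Optional

FINE_GRAINED_READER = "roles/datacatalog.categoryFineGrainedReader"
MASKED_READER = "roles/bigquerydatapolicy.maskedReader"


def visible_value(value: str, roles: set[str], masking_rule: str) -> Optional[str]:
    """Conceptual model of what a reader sees in a policy-tagged STRING column."""
    if FINE_GRAINED_READER in roles:
        return value  # authorized readers see raw data
    if MASKED_READER in roles:
        if masking_rule == "SHA256":
            # BigQuery returns BYTES; a hex digest is used here for readability.
            return hashlib.sha256(value.encode("utf-8")).hexdigest()
        if masking_rule == "ALWAYS_NULL":
            return None
        if masking_rule == "DEFAULT_MASKING_VALUE":
            return ""  # default masking value modeled for STRING columns
        raise ValueError(f"rule not modeled in this sketch: {masking_rule}")
    raise PermissionError("no access to this policy-tagged column")
```

The design point is that masking is decided per reader at query time, so the stored data stays unchanged and one table serves both analysts and privileged users.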

417

Multi-Selectmedium

A company runs a web application on Compute Engine behind an HTTP load balancer. They want to improve reliability by implementing failover across two regions. Which TWO actions should they take?

Select 2 answers

A.Deploy a global external HTTP load balancer with backends in both regions.

B.Configure a backend service with a failover policy pointing to primary and secondary backends.

C.Configure DNS-based failover using Cloud DNS with health checks.

D.Use an internal load balancer to route traffic between regions.

E.Use a regional external HTTP load balancer with a multi-region backend.

AnswersA, B

Global load balancer automatically routes to healthy backends, providing cross-region failover.

Why this answer

A global external HTTP load balancer is required for cross-region failover because it uses a single anycast IP address and routes traffic to the closest healthy backend. By deploying backends in both regions, the load balancer automatically fails over to the secondary region if the primary region's backends become unhealthy, improving reliability without DNS propagation delays.
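A simplified model of that behavior: send each request to the lowest-latency healthy backend, so losing the primary region automatically shifts traffic to the next one. The dictionary fields are hypothetical, and a real load balancer also weighs backend capacity and balancing mode, not only round-trip time.

```python
def pick_backend(backends: list[dict]) -> str:
    """Return the region of the closest healthy backend (simplified model)."""
    healthy = [b for b in backends if b["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy backend in any region")
    return min(healthy, key=lambda b: b["rtt_ms"])["region"]


backends = [
    {"region": "us-east1", "rtt_ms": 10, "healthy": True},
    {"region": "europe-west1", "rtt_ms": 90, "healthy": True},
]
```

Here `pick_backend(backends)` returns `"us-east1"`; once that backend's health check fails, the same call returns `"europe-west1"` with no DNS change.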

Exam trap

The trap here is that candidates confuse DNS-based failover (which is slow and not recommended for HTTP load balancing) with the instant, anycast-based failover of a global load balancer, or mistakenly think a regional load balancer can span multiple regions.

418

Multi-Selectmedium

Your organization is implementing a Disaster Recovery plan for a critical database. Which THREE components are essential for a robust DR strategy? (Choose 3)

Select 3 answers

A.A single global load balancer for both regions.

B.Automated failover process to switch traffic to the DR region.

C.Data replication strategy (synchronous or asynchronous) to a secondary region.

D.Regular DR drills (testing failover at least once per quarter).

E.Using a single zone for the primary region.

AnswersB, C, D

Automation minimizes manual errors and reduces RTO.

Why this answer

Option B is correct because an automated failover process is essential for minimizing Recovery Time Objective (RTO) in a Disaster Recovery strategy. Without automation, manual intervention introduces delays and risks of human error, which can extend downtime significantly. In cloud or on-premises environments, automated failover typically relies on health checks, DNS updates, or traffic manager rules to seamlessly redirect traffic to the DR region when the primary fails.
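Drills are how a team verifies that its automation actually meets the RTO and RPO. The sketch below scores one drill from measured timestamps; the function name, fields, and targets are hypothetical.

```python
from datetime import datetime, timedelta


def evaluate_drill(outage_start: datetime, service_restored: datetime,
                   last_replicated_write: datetime,
                   rto_target: timedelta, rpo_target: timedelta) -> dict:
    """Compare a DR drill's measured recovery against its RTO and RPO targets."""
    achieved_rto = service_restored - outage_start  # how long the service was down
    # How much data was not yet replicated when the outage began (never negative).
    achieved_rpo = max(timedelta(0), outage_start - last_replicated_write)
    return {
        "rto": achieved_rto,
        "rpo": achieved_rpo,
        "rto_met": achieved_rto <= rto_target,
        "rpo_met": achieved_rpo <= rpo_target,
    }
```

For a drill that starts at 10:00, restores service at 10:12, and finds the last replicated write at 09:59:30, the result is a 12-minute RTO and a 30-second RPO, which meets 15-minute and 1-minute targets.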

Exam trap

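The trap here is treating a traffic-routing component, such as a single global load balancer, as a DR strategy in itself. A load balancer can redirect traffic, but a database is only recoverable if its data is replicated (C), failover is automated (B), and the procedure is proven by drills (D). Running the primary in a single zone (E) adds a single point of failure rather than removing one.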

419

MCQhard

An organization is migrating a legacy monolithic application to Google Cloud. The application currently runs on a single server with an on-premises database. The application is stateful and requires low-latency access to the database. The migration must minimize downtime and ensure high availability. Which architecture should the company adopt?

A.Deploy on GKE with StatefulSets and use Cloud Spanner for global consistency.

B.Deploy on Compute Engine with a regional persistent disk and use Cloud SQL for PostgreSQL with regional high availability.

C.Deploy on App Engine Standard Environment and use Cloud Firestore in Datastore mode.

D.Deploy on Cloud Run and use Cloud SQL with read replicas.

AnswerB

This provides HA and low-latency access needed for the stateful monolithic app.

Why this answer

Option B is correct because it combines Compute Engine with a regional persistent disk for synchronous replication across zones, ensuring high availability with minimal downtime during a zonal failure. Cloud SQL for PostgreSQL with regional high availability provides a managed, low-latency database with automatic failover, meeting the stateful application's need for low-latency access and high availability without the complexity of container orchestration.

Exam trap

The trap here is that candidates often overcomplicate the solution by choosing containerized or serverless options (GKE, Cloud Run, App Engine) without recognizing that a legacy monolithic stateful application with low-latency requirements is best served by a simple, proven VM-based architecture with regional persistent disks and a managed relational database with synchronous replication.

How to eliminate wrong answers

Option A is wrong because GKE with StatefulSets introduces orchestration overhead and potential disruption during cluster upgrades or node failures, and Cloud Spanner, while globally consistent, adds cost and migration effort that are overkill for a single-region, low-latency requirement. Option C is wrong because the App Engine standard environment is stateless by design and offers no persistent local storage, and Firestore in Datastore mode is a NoSQL database that does not provide the relational model and consistency a legacy monolithic database expects. Option D is wrong because Cloud Run is stateless and ephemeral, requiring external storage for state, and Cloud SQL read replicas use asynchronous replication, so they cannot guarantee zero data loss during a failover.

420

MCQmedium

A company is using Cloud Spanner to serve a global gaming application. They have a single instance in us-central1. Players in Asia experience high latency. The application reads and writes player profiles. The team wants to reduce latency for Asian players while keeping write latency low for global consistency. They need a solution that minimizes operational overhead and uses native Spanner capabilities. What should they do?

A.Configure a multi-region instance configuration that includes us-central1 and an Asian region.

B.Add read replicas in Asia using Spanner's read-only replicas.

C.Use Cloud CDN to cache player profiles at the edge.

D.Create a new instance in asia-east1 and use Directed Read options to route reads from Asia.

AnswerA

A multi-region configuration places Spanner replicas in Asia, so Asian players can read from a nearby replica while the database stays strongly consistent.

Why this answer

A multi-region instance configuration is the native Spanner way to serve users on several continents: one instance and one database whose replicas span multiple regions, kept strongly consistent by Spanner's synchronous replication. Asian players' reads can be served by a nearby replica (stale reads locally, strong reads after the replica confirms with the leader that it is up to date), which cuts read latency. Writes still commit through the leader region with a quorum of voting replicas, so write latency depends on where the leader and voting replicas are placed. Because replication is fully managed, this option meets the global-consistency and low-operational-overhead requirements without caches or extra instances.
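Why writes still depend on leader placement can be seen with a simplified quorum model: a write commits once a majority of voting replicas acknowledge it, so commit latency is roughly the round-trip time to the slowest replica needed for that majority. This sketch ignores TrueTime commit wait, locking, and replica types, and the RTT values in the example are hypothetical.

```python
def commit_latency_ms(voter_rtts_ms: list[float]) -> float:
    """Estimated commit wait for a leader that needs a majority of voting replicas.

    voter_rtts_ms holds the leader's round-trip time to every voting replica,
    including 0 for the leader itself.
    """
    if not voter_rtts_ms:
        raise ValueError("need at least one voting replica")
    majority = len(voter_rtts_ms) // 2 + 1
    # The commit completes when the majority-th fastest acknowledgment arrives.
    return sorted(voter_rtts_ms)[majority - 1]
```

With five voters of which three sit in the leader's region, `commit_latency_ms([0, 1, 1, 150, 150])` returns 1, whereas `commit_latency_ms([0, 150, 150])` returns 150 because every quorum must cross an ocean.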

Exam trap

The trap here is assuming Spanner needs a cache or a second instance to serve another continent. A multi-region configuration keeps one strongly consistent database with replicas in several regions; writes, however, are still coordinated by the leader region rather than committed locally in every region.

How to eliminate wrong answers

Option B is not the best choice because read-only replicas, which Spanner does support through custom instance configurations, serve reads only: they do not vote on writes or become leader, so they leave the write path unchanged and are a narrower fix than a multi-region configuration. Option C is wrong because Cloud CDN caches static content at the edge, but player profiles are dynamic, frequently updated data that requires strong consistency, which CDN cannot provide. Option D is wrong because a new instance in asia-east1 would hold an independent database with no built-in replication to the us-central1 instance, so the application would have to keep two copies consistent itself; directed reads is a Spanner feature, but it routes read-only transactions to replicas of the same instance, not to a different instance.

421

MCQmedium

A company has Compute Engine instances in us-east1-a and us-east1-b zones. They want to allow communication between these instances with minimal latency and no additional cost. What is the best networking approach?

A.Configure VPC Network Peering between two separate VPC networks.

B.Use a single VPC network that includes both zones.

C.Create a new subnet in each zone and use Cloud NAT.

D.Set up a Cloud VPN between the zones.

AnswerB

Instances in the same VPC network can communicate using internal IPs with low latency.

Why this answer

A VPC network is a global resource, and each subnet spans every zone in its region, so instances in us-east1-a and us-east1-b in the same VPC can communicate over internal IP addresses on Google's network with low latency and without deploying or paying for extra networking components such as VPN, peering, or NAT. Note that traffic between zones of the same region is billed at inter-zone rates; only traffic within a single zone is free.

Exam trap

The trap here is that candidates may overcomplicate the solution by thinking they need separate networks or VPNs for zone-to-zone communication, when in fact a single VPC already provides flat internal connectivity across zones without any additional components.

How to eliminate wrong answers

Option A is wrong because VPC Network Peering is used to connect separate VPC networks, which adds complexity and is unnecessary when instances are in the same VPC; it also does not reduce latency or cost compared to a single VPC. Option C is wrong because Cloud NAT is designed for outbound internet access from private instances, not for inter-zone communication, and it would introduce additional latency and cost. Option D is wrong because Cloud VPN is a site-to-site VPN solution for connecting on-premises networks or different VPCs across regions, not for intra-VPC zone-to-zone communication, and it adds latency and cost.

422

MCQmedium

A company runs a critical application on Compute Engine instances in a managed instance group (MIG) with autoscaling. During a traffic spike, some instances become unhealthy but are not automatically replaced. What is the most likely cause?

A.The MIG is regional and one zone failed.

B.The autohealing health check is misconfigured.

C.The instance template has a startup script error.

D.The HTTP load balancer's health check is failing.

AnswerB

MIG autohealing relies on a health check to detect unhealthy instances and replace them; a misconfiguration prevents detection.

Why this answer

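The most likely cause is that the autohealing health check is misconfigured. In a managed instance group, autohealing relies on its own health check to detect unhealthy instances and recreate them. If that check is missing from the autohealing policy, probes an endpoint that keeps returning success even when the application is failing, or uses an initial delay longer than the incident, the MIG never sees the instances as unhealthy and does not replace them, even during a traffic spike. (A check aimed at the wrong port has the opposite effect: healthy instances fail it and are recreated repeatedly.)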
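Health checks change an instance's state only after several consecutive probe results agree, which is why a probe that never fails can never trigger autohealing. The sketch below is a simplified model of that threshold logic; the class name is hypothetical, and the two parameters mirror the idea of a health check's healthy and unhealthy thresholds.

```python
class HealthState:
    """Simplified model of healthy/unhealthy thresholds on a health check.

    Assumes an instance starts healthy; real health checks track more states.
    """

    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 2) -> None:
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive probes that disagree with the current state

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the (possibly updated) state."""
        if probe_ok == self.healthy:
            self._streak = 0
            return self.healthy
        self._streak += 1
        limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= limit:
            self.healthy = probe_ok
            self._streak = 0
        return self.healthy
```

With `unhealthy_threshold=3`, two failed probes leave the instance healthy and the third flips it; an endpoint that keeps answering 200 while the application is broken only ever feeds `record(True)`.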

Exam trap

Google Cloud often tests the distinction between the MIG's autohealing health check and the load balancer's health check, leading candidates to incorrectly attribute instance replacement failures to load balancer issues rather than the MIG's own health check configuration.

How to eliminate wrong answers

Option A is wrong because a regional MIG responds to a zone failure by creating replacement instances in its remaining zones; a zone outage does not stop autohealing from working, and it would not leave unhealthy instances in the surviving zones unreplaced. Option C is wrong because a startup script error would cause instances to fail at boot, but the MIG would still attempt to replace them based on the health check; the issue is not the template but the detection mechanism. Option D is wrong because the HTTP load balancer's health check is separate from the MIG's autohealing health check; a failing load balancer health check does not prevent the MIG from replacing unhealthy instances if its own health check is properly configured.

423

Multi-Selecteasy

A startup deploys a microservices application on GKE. They need to ensure high availability of the services. Which two strategies should they implement? (Choose TWO.)

Select 2 answers

A.Use horizontal pod autoscaling

B.Use regional persistent disks for stateful components

C.Use node auto-repair

D.Deploy the application across multiple zones in a region

E.Use cluster autoscaler

AnswersB, D

Regional PDs replicate data synchronously across zones.

Why this answer

Option B is correct because regional persistent disks provide synchronous replication across two zones within a region, ensuring that stateful workloads (e.g., databases) remain available even if an entire zone fails. This is critical for high availability of stateful components in a GKE cluster, as it prevents data loss and allows pods to be rescheduled in another zone with the same persistent volume.

Exam trap

The trap here is that candidates often confuse auto-scaling mechanisms (HPA, cluster autoscaler) with high availability, failing to recognize that true HA requires redundancy across failure domains (zones) and persistent storage that survives zone outages.

424

Multi-Selecteasy

A company uses Cloud Build to automate their CI/CD pipeline. They want to optimize the build process for a Java application. Which three practices should they adopt? (Choose three.)

Select 3 answers

A.Parallelize independent build steps by using Cloud Build's step parallelism or by splitting into multiple builds.

B.Store Maven dependencies in a private repository in Artifact Registry for faster access.

C.Use Docker layer caching with Cloud Build by specifying a cached image.

D.Use a custom build step that downloads all tools from the internet each time.

E.Use a high-CPU machine type (e.g., n1-highcpu-64) for faster compilation.

AnswersA, B, C

Reduces overall build time.

Why this answer

Option A is correct because Cloud Build allows you to define build steps that run sequentially by default, but you can parallelize independent steps by using the `waitFor` field to specify dependencies. This reduces total build time by running non-dependent steps concurrently, which is a key optimization for CI/CD pipelines. Splitting into multiple builds is also a valid approach for parallel execution.
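The effect of `waitFor` can be estimated with a simplified scheduling model: a step without `waitFor` waits for all earlier steps, a step with `waitFor: ['-']` starts when the build starts, and otherwise a step starts when its listed dependencies finish. This sketch assumes every step has an `id` and ignores scheduling and container-pull overhead; the `build_duration` helper and the step timings are hypothetical.

```python
def build_duration(steps: list[dict]) -> float:
    """Estimate wall-clock time of a Cloud Build config under a simplified model.

    Each step: {"id": str, "seconds": float, optional "waitFor": list[str]}.
    """
    finish: dict[str, float] = {}
    for i, step in enumerate(steps):
        deps = step.get("waitFor")
        if deps is None:
            deps = [s["id"] for s in steps[:i]]  # default: wait for all earlier steps
        elif deps == ["-"]:
            deps = []  # start immediately when the build starts
        start = max((finish[d] for d in deps), default=0.0)
        finish[step["id"]] = start + step["seconds"]
    return max(finish.values(), default=0.0)
```

Three steps of 10, 20, and 5 seconds take 35 seconds in sequence; letting the first two start with `waitFor: ['-']` and the third wait for both cuts the estimate to 25 seconds.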

Exam trap

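Google Cloud often tests whether candidates know Cloud Build's machine options. The default pool offers only a fixed set of machine types (such as E2_HIGHCPU_8 and E2_HIGHCPU_32), selected with the machineType build option; a type like n1-highcpu-64 is not among them, and other machine types are available only through private pools. Larger machines also raise cost without fixing dependency downloads or missing caches, which options A, B, and C address.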

425

MCQmedium

An application uses Cloud Pub/Sub for asynchronous processing. Subscribers occasionally fail to acknowledge messages within the ack deadline, causing redelivery. How to improve reliability and prevent message buildup?

A.Increase the ack deadline to the maximum value

B.Set max delivery attempts to 1 to avoid redelivery

C.Implement exponential backoff in the subscriber retry logic

D.Use a dead-letter topic to capture failed messages

AnswerC

Exponential backoff allows the subscriber to retry after increasing delays, handling transient failures effectively.

Why this answer

Option C is correct because implementing exponential backoff in the subscriber retry logic allows the subscriber to gradually increase the delay between retries when messages are not acknowledged, reducing the likelihood of overwhelming the system and preventing message buildup. This approach aligns with Cloud Pub/Sub's recommended practices for handling transient failures, as it gives the subscriber time to recover without exhausting the ack deadline or causing excessive redelivery.
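A minimal sketch of subscriber-side retry with capped exponential backoff and full jitter follows. The function and parameter names are hypothetical, and `fn` stands in for whatever processes one message; Pub/Sub subscriptions can also apply exponential backoff to redeliveries through their own retry policy.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_s: float = 0.5,
    cap_s: float = 30.0,
    retryable: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
    rand: Callable[[], float] = random.random,
) -> T:
    """Call fn, retrying retryable errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # give up; let the message be nacked or dead-lettered
            # Full jitter: wait a random time between 0 and the capped exponential bound.
            sleep(rand() * min(cap_s, base_s * 2 ** attempt))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Injecting `sleep` and `rand` keeps the helper testable; jitter spreads retries from many subscribers so they do not hit a recovering dependency at the same instant.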

Exam trap

Google Cloud often tests the misconception that increasing the ack deadline or using a dead-letter topic alone solves reliability issues, but the key is implementing retry logic with backoff to handle transient failures without losing messages or causing buildup.

How to eliminate wrong answers

Option A is wrong because increasing the ack deadline to the maximum value (600 seconds) does not address the root cause of subscriber failures; it only delays redelivery, potentially leading to message buildup if the subscriber never recovers. Option B is wrong because Pub/Sub accepts max delivery attempts only between 5 and 100, and the setting applies only when a dead-letter topic is configured, so a value of 1 is not valid; even where allowed, it forwards messages to the dead-letter topic rather than improving processing reliability. Option D is wrong because a dead-letter topic captures failed messages after all delivery attempts are exhausted, but it does not prevent message buildup during the retry process; it is a last-resort mechanism, not a proactive reliability improvement.

426

Matchingmedium

Match each Google Cloud service to its primary purpose.

Concepts

Matches

Distribute traffic across instances

Cache content at edge locations

Protect against DDoS and web attacks

Enable outbound internet for private instances

Dedicated connection between on-prem and GCP

Why these pairings

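These are core networking services in Google Cloud: Cloud Load Balancing distributes traffic across instances, Cloud CDN caches content at edge locations, Google Cloud Armor protects against DDoS and web attacks, Cloud NAT enables outbound internet access for private instances, and Dedicated Interconnect provides a dedicated connection between on-premises networks and Google Cloud.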

427

Multi-Selectmedium

A data analytics team uses BigQuery to run large queries. They want to reduce query costs. Which three practices should they adopt? (Choose THREE.)

Select 3 answers

A.Use query caching

B.Use clustered tables on commonly filtered columns

C.Partition tables by date

D.Create materialized views for frequent aggregations

E.Always use SELECT * to ensure all columns are available

AnswersB, C, D

Clustering improves query performance and reduces cost by limiting scans.

Why this answer

Option B is correct because clustering tables on commonly filtered columns in BigQuery allows the query engine to prune blocks of data that don't match the filter, reducing the amount of data scanned and thus lowering query costs. This is especially effective when combined with partitioning, as it further narrows the scan to relevant clusters within a partition.
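Partition pruning is easy to quantify. The sketch below sums the bytes read when a date filter lets BigQuery skip partitions outside the requested range; it ignores clustering, column selection, and billing minimums, and the 365 daily partitions of 2 GB each are hypothetical.

```python
from datetime import date, timedelta


def bytes_scanned(partition_sizes: dict[date, int], start: date, end: date) -> int:
    """Bytes read when a date filter prunes every partition outside [start, end]."""
    return sum(size for day, size in partition_sizes.items() if start <= day <= end)


GB = 10**9
# Hypothetical table: one year of daily partitions, 2 GB each.
daily = {date(2024, 1, 1) + timedelta(days=i): 2 * GB for i in range(365)}
```

A query filtered to one week reads 14 GB, while the same query over an unpartitioned copy of the table would read all 730 GB.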

Exam trap

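Google Cloud often tests how query caching relates to cost. Cached results are not billed, so caching does help when an identical query runs again against unchanged tables, but it is enabled by default and does nothing for a new query or one with different filters. Partitioning, clustering, and materialized views reduce the bytes scanned by every qualifying query, which is why they are the practices to adopt.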

428

Multi-Selecthard

Which THREE options are valid strategies for disaster recovery (DR) in Google Cloud?

Select 3 answers

A.Store hourly snapshots of Compute Engine disks in the same region.

B.Deploy a mirrored environment in another region and use Traffic Director to fail over.

C.Enable Cloud CDN to cache static content from multiple origins.

D.Use a Cloud Storage bucket in a different region with Object Versioning enabled.

E.Configure a cross-region replica for Cloud SQL and promote it during failover.

AnswersB, D, E

Traffic Director can route traffic to the DR environment.

Why this answer

Option B is correct because Traffic Director, based on the xDS API (Envoy), can manage traffic routing across regions. By deploying a mirrored environment in another region and configuring Traffic Director with failover policies, you can redirect traffic to the secondary region if the primary fails, enabling a robust active-passive or active-active DR strategy.

Exam trap

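The trap here is choosing options that look protective but do not survive a regional outage: hourly snapshots stored in the same region (A) share that region's failure domain, and Cloud CDN (C) only caches content from origins that must themselves survive. True disaster recovery requires geographic separation plus a mechanism to fail over.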

429

MCQeasy

A security team wants to receive alerts when a user attempts to grant the 'roles/owner' role to a member outside of the organization's domain. Which log filter should they use to create a log-based metric?

A.Filter on Admin Activity log type with 'protoPayload.methodName="SetIamPolicy" AND protoPayload.serviceName="cloudresourcemanager.googleapis.com" AND NOT protoPayload.request.policy.bindings: member: "example.com"'.

B.Filter on Data Access log type with 'protoPayload.methodName="google.iam.v1.IAMPolicy.SetIamPolicy"'.

C.Filter on Admin Activity logs for 'resource.type="gce_instance" AND protoPayload.methodName="compute.instances.setServiceAccount"'.

D.Filter on System Event logs with a query for 'resource.type="project" AND protoPayload.response.status.code=7'.

AnswerA

This filter catches IAM policy changes where members are not from the allowed domain.

Why this answer

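Option A is correct because Cloud Audit Logs record IAM policy changes in Admin Activity logs, and this filter matches SetIamPolicy calls on Resource Manager resources (such as projects) whose bindings include a member outside the example.com domain. Option B is wrong because Data Access logs record reads and writes of user data, not administrative changes such as SetIamPolicy.

Option C is wrong because it matches a Compute Engine service-account change, not an IAM role grant. Option D is wrong because System Event logs record Google-initiated actions, and status code 7 (PERMISSION_DENIED) matches denied requests rather than successful role grants.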
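Once matching entries are exported, the binding changes can be inspected in code to narrow alerts to owner grants specifically. This sketch assumes entries follow the JSON shape of an IAM audit log with `protoPayload.serviceData.policyDelta.bindingDeltas`; the function name and the example domains are hypothetical.

```python
def external_owner_grants(entry: dict, allowed_domain: str) -> list[str]:
    """Return members newly granted roles/owner whose domain differs from allowed_domain.

    Note: service accounts, allUsers, and allAuthenticatedUsers are also flagged,
    because their identities do not end in the allowed domain.
    """
    deltas = (entry.get("protoPayload", {})
                   .get("serviceData", {})
                   .get("policyDelta", {})
                   .get("bindingDeltas", []))
    flagged = []
    for delta in deltas:
        if delta.get("action") != "ADD" or delta.get("role") != "roles/owner":
            continue
        member = delta.get("member", "")
        _, _, identity = member.partition(":")  # "user:alice@x.com" -> "alice@x.com"
        domain = identity.rsplit("@", 1)[-1] if "@" in identity else identity
        if domain.lower() != allowed_domain.lower():
            flagged.append(member)
    return flagged
```

Removals and non-owner roles are ignored, so only the risky change in the scenario raises an alert.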

430

MCQhard

A global e-commerce platform uses Spanner for its transactional database. They observe that some transactions are aborted with 'ABORTED' status due to contention. The application retries immediately, but throughput degrades. What design change should they implement to reduce contention?

A.Redesign the schema to use a separate table for frequently updated rows and batch updates using a single transaction

B.Increase the number of nodes in the Spanner instance

C.Use client-side retry with exponential backoff and jitter

D.Change the transaction isolation level to READ UNCOMMITTED

AnswerA

Isolating hot rows reduces lock conflicts; batching updates into a single transaction reduces lock hold time.

Why this answer

Option A is correct because Spanner contention arises when multiple transactions try to update the same row concurrently, causing aborts. By redesigning the schema to use a separate table for frequently updated rows and batching updates into a single transaction, you reduce the number of overlapping locks on hot rows. This minimizes lock conflicts and aborts, improving throughput without changing Spanner's underlying TrueTime-based concurrency control.

Exam trap

The trap here is that candidates confuse horizontal scaling (adding nodes) with solving lock contention, but Spanner's contention is a concurrency control issue, not a capacity issue, so scaling out does not reduce row-level lock conflicts.

How to eliminate wrong answers

Option B is wrong because adding nodes to a Spanner instance increases compute and throughput capacity but does not reduce lock contention on specific hot rows; contention is a locking issue, not a capacity issue. Option C is wrong because client-side retry with exponential backoff and jitter is a best practice for handling aborted transactions, but it does not address the root cause of contention; it makes retries less synchronized, not less necessary. Option D is wrong because Spanner does not offer READ UNCOMMITTED; read-write transactions use serializable isolation by default (read-only transactions can use stale reads), and dirty reads would violate its consistency guarantees anyway.

431

MCQhard

A multinational corporation must comply with GDPR and requires that all customer data stored in BigQuery be encrypted using customer-managed encryption keys (CMEK) and that the keys are stored in a specific region. Which combination of steps should they take?

A.Enable default encryption at rest in BigQuery and use Organization Policies to restrict key location

B.Create a Cloud KMS key ring and crypto key in the desired region, then associate the BigQuery dataset with the CMEK key using DDL

C.Create a Cloud HSM key, then use Cloud DLP to automatically encrypt the data before loading into BigQuery

D.Use Cloud External Key Manager (EKM) to integrate with an on-premises key management system

AnswerB

This is the standard procedure for CMEK in BigQuery.

Why this answer

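BigQuery CMEK requires creating a Cloud KMS key ring and key in the desired region (matching the dataset's location), granting the BigQuery encryption service account the Cloud KMS CryptoKey Encrypter/Decrypter role (roles/cloudkms.cryptoKeyEncrypterDecrypter) on the key, and then associating the key with the dataset, for example as its default key using DDL. Option A is wrong because default encryption uses Google-managed keys. Option C is wrong because Cloud DLP is for inspection and de-identification, not at-rest encryption. Option D is wrong because Cloud EKM uses keys held in an external key manager, which the requirement does not call for.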
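The final step can be sketched as a helper that builds the key's resource name and the DDL that sets it as the dataset default, refusing keys whose location differs from the dataset's. The project, dataset, key ring, and key names are hypothetical, and this sketch handles single-region locations only (multi-region datasets use multi-region key locations).

```python
def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Build a Cloud KMS crypto key resource name."""
    return f"projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{key}"


def default_cmek_ddl(project: str, dataset: str, dataset_location: str, key_name: str) -> str:
    """Return DDL that sets a dataset's default CMEK key after checking the key's location."""
    parts = key_name.split("/")
    key_location = parts[3] if len(parts) > 3 and parts[2] == "locations" else None
    if key_location != dataset_location.lower():
        raise ValueError(
            f"key location {key_location!r} does not match dataset location {dataset_location!r}"
        )
    return (f"ALTER SCHEMA `{project}.{dataset}` "
            f"SET OPTIONS (default_kms_key_name = '{key_name}')")
```

A dataset default key applies to tables created afterwards, so existing tables should be checked separately.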

432

MCQmedium

An organization is migrating a MySQL database to Cloud SQL. They require automatic failover with zero data loss in the event of a zone outage. Which configuration should they use?

A.Cloud SQL with a cross-region replica.

B.Cloud SQL with automated backups and binary logging.

C.Cloud SQL with a read replica in a different zone.

D.Cloud SQL with high availability (HA) configuration.

AnswerD

HA uses synchronous replication in two zones, providing automatic failover with no data loss.

Why this answer

Option D is correct because a Cloud SQL high availability (HA) configuration, also called a regional instance, keeps a standby in a second zone of the same region and replicates writes synchronously to persistent disk in both zones. A transaction is acknowledged to the client only after it has been written in both zones, so a zone outage does not lose committed transactions. If the primary's zone fails, Cloud SQL fails over to the standby automatically, with no manual intervention, meeting both the automatic failover and zero data loss requirements.
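The difference between the HA standby and an asynchronous replica comes down to when the client is acknowledged. The toy Python model below is not Cloud SQL's implementation. It simply counts how many acknowledged transactions would be missing from the second copy if the primary's zone failed immediately, using a hypothetical replication lag for the asynchronous case.

```python
def lost_on_zone_failure(num_commits: int, synchronous: bool, async_lag: int = 3) -> int:
    """Count client-acknowledged transactions missing from the second copy
    if the primary's zone fails right after num_commits transactions."""
    acknowledged: list[int] = []
    second_copy: list[int] = []
    in_flight: list[int] = []  # async only: acknowledged but not yet replicated
    for txn in range(num_commits):
        if synchronous:
            second_copy.append(txn)   # written in the second zone first...
            acknowledged.append(txn)  # ...then the client is acknowledged
        else:
            acknowledged.append(txn)  # acknowledged immediately
            in_flight.append(txn)
            if len(in_flight) > async_lag:
                # The replica trails the primary by async_lag transactions.
                second_copy.append(in_flight.pop(0))
    return len(set(acknowledged) - set(second_copy))


if __name__ == "__main__":
    print("synchronous (HA):", lost_on_zone_failure(10, synchronous=True))
    print("asynchronous replica:", lost_on_zone_failure(10, synchronous=False))
```

With synchronous replication the count is always zero; with asynchronous replication it equals whatever was still in flight when the zone failed.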

Exam trap

The trap here is that candidates often confuse a read replica (which uses asynchronous replication and requires manual promotion) with an HA standby (which uses synchronous replication and automatic failover), leading them to incorrectly select Option C.

How to eliminate wrong answers

Option A is wrong because a cross-region replica uses asynchronous replication, so any transactions not yet replicated when the primary fails are lost, and promoting the replica is a manual step; it fails both the zero data loss and automatic failover requirements. Option B is wrong because automated backups and binary logging enable point-in-time recovery, but they do not provide automatic failover: restoring is a manual operation that takes time to complete. Option C is wrong because a read replica in a different zone is designed for read scaling, not for automatic failover; promoting a read replica to primary is a manual process, and the replica uses asynchronous replication, risking data loss.

433

MCQhard

A company is designing a VPC architecture for a multi-tenant SaaS platform. Each tenant has isolated workloads that must not communicate with each other. They also need centralized network security and logging. Which VPC design meets these requirements?

A.Dedicated Cloud VPN connections per tenant

B.Use a Shared VPC with separate subnets for each tenant and firewall rules to enforce isolation

C.Single VPC with network tags and IAP tunnels

D.Peered VPCs for each tenant with Cloud NAT

AnswerB

Shared VPC allows centralized control and subnet isolation.

Why this answer

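Option B is correct because a Shared VPC lets a central host project own the network, subnets, firewall rules, and logging configuration, while each tenant's workloads use a separate subnet. Firewall rules that allow traffic only within each tenant's subnet range keep tenants isolated from one another. Option A is wrong because Cloud VPN connects networks to each other or to on-premises sites; it does not isolate tenants inside Google Cloud. Option C is wrong because a single VPC that relies only on network tags puts every tenant on the same network, and IAP tunnels provide administrative access to VMs rather than tenant isolation.

Option D is wrong because giving each tenant its own peered VPC isolates tenants, but each network keeps its own firewall rules and logging configuration, so it does not provide centralized network security and logging.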
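To make the per-tenant subnet idea concrete, here is a minimal Python sketch. It carves one range per tenant out of a parent CIDR with the standard ipaddress module, then applies an "allow only within the same tenant subnet" check, mirroring firewall rules that allow intra-tenant traffic and rely on the implied deny-ingress rule for everything else. The 10.10.0.0/16 parent range, the /24 tenant size, and the tenant names are assumptions for illustration.

```python
import ipaddress


def allocate_tenant_subnets(parent_cidr: str, tenants: list[str], prefix: int = 24) -> dict:
    """Assign each tenant the next free /prefix block inside parent_cidr."""
    blocks = ipaddress.ip_network(parent_cidr).subnets(new_prefix=prefix)
    allocation = {}
    for tenant in tenants:
        try:
            allocation[tenant] = next(blocks)
        except StopIteration:
            raise ValueError(f"{parent_cidr} has no room left for {tenant}") from None
    return allocation


def ingress_allowed(allocation: dict, src_ip: str, dst_ip: str) -> bool:
    """Allow traffic only when source and destination share a tenant subnet."""
    src, dst = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    return any(src in subnet and dst in subnet for subnet in allocation.values())


if __name__ == "__main__":
    subnets = allocate_tenant_subnets("10.10.0.0/16", ["tenant-a", "tenant-b"])
    print(subnets)
    print(ingress_allowed(subnets, "10.10.0.5", "10.10.1.9"))  # cross-tenant
```

In a real Shared VPC, the host project's firewall rules would express the same policy, for example one allow rule per tenant that accepts traffic only from that tenant's subnet range and targets that tenant's instances.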

434

MCQeasy

A company runs a critical application on Compute Engine instances in a managed instance group (MIG) with autoscaling. Users report intermittent 503 errors during traffic spikes. Which action should the company take to improve reliability?

A.Change the load balancer from regional to global

B.Configure a health check with a sufficient initial delay (grace period) in the MIG

C.Increase the autoscaling cool-down period from 60s to 120s

D.Increase the maximum number of instances in the MIG

AnswerB

Correct: ensures instances are healthy before traffic is sent.

Why this answer

Intermittent 503 errors during traffic spikes often indicate that new VM instances are being started but are not yet ready to serve traffic, causing the load balancer to forward requests to them prematurely. Configuring a health check with a sufficient initial delay (grace period) in the MIG ensures that newly created instances are given time to fully initialize and pass health checks before they receive traffic, preventing 503 errors. This directly addresses the root cause by allowing the application to become healthy before being added to the load balancer's backend.
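One way to choose the grace period is to base it on measured startup times. The Python sketch below builds the autoHealingPolicies fragment of a managed instance group, using the healthCheck and initialDelaySec fields of the Compute Engine instanceGroupManagers resource. The 1.5 safety factor, the helper name, and the health check path are assumptions, not Google recommendations.

```python
import math


def autohealing_policy(health_check_url: str, startup_secs: list[float],
                       safety_factor: float = 1.5) -> dict:
    """Size initialDelaySec from the slowest observed VM startup, plus headroom."""
    if not startup_secs:
        raise ValueError("need at least one startup-time measurement")
    initial_delay = math.ceil(max(startup_secs) * safety_factor)
    return {
        "autoHealingPolicies": [
            {"healthCheck": health_check_url, "initialDelaySec": initial_delay}
        ]
    }


if __name__ == "__main__":
    # Hypothetical measurements: seconds from boot to first healthy response.
    print(autohealing_policy("projects/my-project/global/healthChecks/web-hc", [95, 120, 110]))
```

Basing the delay on the slowest observed startup, rather than the average, avoids marking slow-booting instances unhealthy during a traffic spike.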

Exam trap

Google Cloud often tests the misconception that scaling-related errors are always solved by increasing capacity or adjusting scaling parameters, when in fact the root cause is often a misconfigured health check or insufficient initialization time for new instances.

How to eliminate wrong answers

Option A is wrong because changing the load balancer from regional to global does not stop new instances from receiving traffic before they are ready; global load balancing improves cross-region routing but does not affect instance readiness. Option C is wrong because the autoscaling cool-down (initialization) period only tells the autoscaler how long to ignore metrics from newly started instances when making scaling decisions; lengthening it from 60s to 120s does not stop the load balancer from sending traffic to instances that are still initializing. Option D is wrong because increasing the maximum number of instances in the MIG adds capacity but does not stop instances from joining the backend before they are ready; it may even make the problem worse by adding more not-yet-ready instances during a spike.

435

Multi-Selectmedium

Which TWO of the following are valid methods to securely access Google Cloud APIs from a Compute Engine instance without managing service account keys?

Select 2 answers

A.Download a service account key file and store it on the instance

B.Attach a custom service account to the instance using the gcloud command

C.Grant the appropriate IAM roles to the instance's service account

D.Use a Cloud KMS key to generate temporary credentials

E.Use the default Compute Engine service account

AnswersB, E

Custom service account can be attached at creation, no keys needed.

Why this answer

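Attaching a custom service account (B) and using the default Compute Engine service account (E) both let code on the instance obtain short-lived credentials from the metadata server, so no keys need to be created, stored, or rotated. Downloading a service account key file (A) is exactly the key management the question wants to avoid. Cloud KMS (D) creates and manages cryptographic keys; it does not issue credentials for calling Google Cloud APIs.

Granting IAM roles to the instance's service account (C) controls what the credentials are allowed to do, but it is not itself a method of obtaining credentials.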
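Under the hood, both correct options rely on the instance metadata server. The Python sketch below uses only the standard library to build a request to the documented token endpoint with the required Metadata-Flavor: Google header. fetch_access_token() works only on a Compute Engine VM or another environment that serves this endpoint. In practice, the Google Cloud client libraries perform this step for you.

```python
import json
import urllib.request

# Metadata-server endpoint for the attached service account's access token.
METADATA_TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def build_token_request() -> urllib.request.Request:
    # The metadata server rejects requests that lack this header.
    return urllib.request.Request(METADATA_TOKEN_URL, headers={"Metadata-Flavor": "Google"})


def fetch_access_token(timeout: float = 2.0) -> str:
    """Return a short-lived OAuth 2.0 access token (works only on a Google Cloud VM)."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as response:
        return json.load(response)["access_token"]
```

Because the token is short-lived and issued on demand, there is no long-lived secret on the instance to leak or rotate.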

436

MCQeasy

A company is using Cloud Storage for backups and wants to minimize costs. The backups are accessed infrequently and can tolerate retrieval delays. Which storage class is most appropriate?

A.Standard

B.Archive

C.Coldline

D.Nearline

AnswerB

Archive has the lowest storage cost and suits long-term backups that are rarely restored.

Why this answer

Archive storage has the lowest per-GB storage cost of all Cloud Storage classes, in exchange for the highest retrieval fees and a 365-day minimum storage duration. That trade-off suits backups that are kept long-term and accessed infrequently. Unlike some archival services, Cloud Storage serves Archive objects with the same millisecond latency as the other classes, so the tolerance for retrieval delays is not actually needed; the deciding factors are storage cost and access frequency.
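Minimum storage durations are part of the cost comparison: an object deleted early is still billed as if it had been stored for the minimum period. The short Python sketch below applies that rule using Cloud Storage's published minimums (none for Standard, 30 days for Nearline, 90 for Coldline, 365 for Archive). It deliberately leaves out prices, which vary by location.

```python
# Minimum storage duration, in days, for each Cloud Storage class.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days of storage billed for an object deleted after days_stored days."""
    minimum = MIN_STORAGE_DAYS[storage_class.upper()]
    return max(days_stored, minimum)


if __name__ == "__main__":
    for cls in MIN_STORAGE_DAYS:
        print(cls, billable_days(cls, 100))
```

This is why Archive pays off only for backups you expect to keep for at least a year.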

Exam trap

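Google Cloud often tests the misconception that Coldline is the cheapest storage class; Archive has the lowest storage cost. A related misconception is that Archive, like some other providers' archive tiers, takes hours to restore. In Cloud Storage, every class, including Archive, returns data in milliseconds; colder classes are penalized through retrieval fees and minimum storage durations instead.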

How to eliminate wrong answers

Option A is wrong because Standard storage is designed for frequently accessed data and has the highest per-GB storage cost, which is wasteful for backups that are rarely read. Option C is wrong because Coldline, while cheaper than Standard, has a higher per-GB storage cost than Archive, so it costs more for backups that are kept long-term and rarely restored. Option D is wrong because Nearline is intended for data accessed about once a month or less; it has a 30-day minimum storage duration and a higher per-GB storage cost than Coldline or Archive, making it less cost-efficient for rarely accessed backups.

437

MCQeasy

A company is migrating sensitive customer data to Google Cloud. They need to ensure data is encrypted at rest and in transit. Which Google Cloud service provides a centralized way to manage encryption keys used by Google Cloud services?

A.Cloud HSM

B.Cloud External Key Manager (Cloud EKM)

C.Cloud Key Management Service (Cloud KMS)

D.Secret Manager

AnswerC

Cloud KMS provides centralized management of encryption keys used by Google Cloud services.

Why this answer

Cloud KMS is the correct choice because it provides a centralized, managed service for creating, rotating, and destroying encryption keys used by Google Cloud services. It integrates directly with services like Cloud Storage, BigQuery, and Compute Engine through customer-managed encryption keys (CMEK), giving granular control over encryption at rest. For data in transit, Cloud KMS can hold keys used for application-level encryption, while Google Cloud applies its own default encryption in transit to traffic between its facilities.

Exam trap

Google Cloud often tests the distinction between Cloud KMS as the centralized key management service and Cloud HSM as a hardware-backed option within Cloud KMS, leading candidates to choose Cloud HSM when the question asks for the centralized service.

How to eliminate wrong answers

Option A is wrong because Cloud HSM is a managed service that performs key operations in FIPS 140-2 Level 3 validated hardware security modules; it is a protection level within Cloud KMS for higher security requirements, not a separate centralized key management service. Option B is wrong because Cloud External Key Manager (Cloud EKM) lets you use keys held in a supported external key manager outside Google Cloud; it is for externally stored keys, not the centralized Google Cloud service for managing encryption keys. Option D is wrong because Secret Manager is designed to store and manage secrets such as API keys, passwords, and certificates, not encryption keys for encrypting data at rest or in transit across Google Cloud services.

438

Multi-Selectmedium

Which TWO are required to allow on-premises hosts to access Google APIs using internal IP addresses (Private Google Access)? (Choose 2)

Select 2 answers

A.A Cloud Interconnect or Cloud VPN connection between on-premises and VPC

B.A Cloud Router instance configured in the on-premises network

C.VPC Service Controls enabled

D.Private Google Access enabled on the subnet that the on-premises traffic will use

E.A private DNS zone for googleapis.com

AnswersA, D

A provides private network connectivity between on-premises and Google Cloud; D is the subnet-level Private Google Access setting paired with it.

Why this answer

A Cloud Interconnect or Cloud VPN connection is required to establish private connectivity between on-premises hosts and a VPC network. It provides the network path for on-premises traffic to reach Google APIs over internal routes instead of the public internet. Without such a hybrid connection, on-premises hosts have no private path into Google Cloud and cannot use Private Google Access at all.

Exam trap

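The trap is option B: Cloud Router runs inside the VPC network, not in the on-premises network, so B is wrong as worded, and VPC Service Controls (C) defines a data-exfiltration perimeter rather than providing connectivity. Be aware that a working deployment also needs routing and DNS: Google's documented setup for on-premises hosts has Cloud Router advertise the private.googleapis.com (199.36.153.8/30) or restricted.googleapis.com (199.36.153.4/30) range, and has on-premises DNS resolve googleapis.com names to that range.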
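When on-premises hosts use Private Google Access, their DNS answers for googleapis.com names are expected to fall inside one of the documented VIP ranges: 199.36.153.8/30 for private.googleapis.com or 199.36.153.4/30 for restricted.googleapis.com. A minimal Python check like the one below could help when troubleshooting from an on-premises host. The helper name and the sample addresses are illustrative only.

```python
import ipaddress

# VIP ranges Google documents for Private Google Access.
PRIVATE_GOOGLEAPIS = ipaddress.ip_network("199.36.153.8/30")
RESTRICTED_GOOGLEAPIS = ipaddress.ip_network("199.36.153.4/30")


def classify_googleapis_answer(resolved_ip: str) -> str:
    """Report which Google API entry point a DNS answer for *.googleapis.com points to."""
    ip = ipaddress.ip_address(resolved_ip)
    if ip in PRIVATE_GOOGLEAPIS:
        return "private.googleapis.com"
    if ip in RESTRICTED_GOOGLEAPIS:
        return "restricted.googleapis.com"
    return "public"


if __name__ == "__main__":
    for answer in ["199.36.153.9", "199.36.153.5", "203.0.113.10"]:
        print(answer, "->", classify_googleapis_answer(answer))
```

An answer classified as "public" means the host would still reach Google APIs over the internet rather than through the hybrid connection.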

439

MCQhard

Refer to the exhibit, which shows a Cloud KMS key configuration. When will the key be automatically rotated?

A.Every 180 days

B.Only when manually triggered

C.Every 30 days

D.Every 90 days

AnswerD

7776000s = 90 days.

Why this answer

The rotationPeriod is 7776000 seconds, which equals 90 days. The nextRotationTime is set to 2024-04-01, 90 days after creation, confirming automatic rotation every 90 days.
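The arithmetic can be checked in a few lines of Python. Cloud KMS reports rotationPeriod as a duration string in seconds, such as 7776000s. The helper names below are illustrative, and the 2024-04-01 start date is the nextRotationTime stated above.

```python
from datetime import date, timedelta

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def rotation_period_days(rotation_period: str) -> int:
    """Convert a Cloud KMS rotationPeriod string such as '7776000s' to whole days."""
    seconds = int(rotation_period.removesuffix("s"))
    days, remainder = divmod(seconds, SECONDS_PER_DAY)
    if remainder:
        raise ValueError(f"{rotation_period} is not a whole number of days")
    return days


def rotation_schedule(next_rotation: date, period_days: int, count: int) -> list[date]:
    """The next `count` automatic rotation dates, starting with next_rotation."""
    return [next_rotation + timedelta(days=period_days * i) for i in range(count)]


if __name__ == "__main__":
    period = rotation_period_days("7776000s")
    print(period)  # 90
    for when in rotation_schedule(date(2024, 4, 1), period, 3):
        print(when.isoformat())
```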

440

MCQeasy

A company wants to store backup data that is accessed rarely but must be available for retrieval within minutes. Which Cloud Storage class is appropriate?

A.Standard

B.Nearline

C.Coldline

D.Archive

AnswerB

Low-cost storage for data accessed less than once a month with fast retrieval.

Why this answer

Nearline storage is designed for data read or modified about once a month or less, which fits backups that are restored only occasionally. It costs less per GB than Standard storage, and compared with Coldline and Archive it has lower retrieval fees and a shorter (30-day) minimum storage duration, so occasional restores stay inexpensive. Every Cloud Storage class, Nearline included, serves objects with millisecond latency, so the within-minutes requirement is comfortably met.

Exam trap

Google Cloud often tests whether candidates match the storage class to the expected access pattern, rather than assuming that 'rarely accessed' automatically means the cheapest class. Retrieval latency is not the differentiator in Cloud Storage: every class, including Coldline and Archive, serves objects in milliseconds. The differences are the intended access frequency, retrieval fees, and minimum storage duration.

How to eliminate wrong answers

Option A is wrong because Standard storage is intended for frequently accessed data and has the highest per-GB storage cost, making it unnecessarily expensive for rarely accessed backups. Option C is wrong because Coldline targets data accessed about once a quarter or less, with a 90-day minimum storage duration and higher retrieval fees than Nearline, which makes routine restores more expensive. Option D is wrong because Archive targets data accessed less than once a year, with a 365-day minimum storage duration and the highest retrieval fees; its data is still available in milliseconds, but its cost model is designed for long-term retention such as regulatory archives, not for backups that may need restoring.

441

MCQmedium

Refer to the exhibit. An engineer deploys this Terraform configuration. After deployment, they can SSH into the VM using its public IP. However, they want to restrict SSH access to only a specific IP range (203.0.113.0/24). What change is required?

A.Change the 'source_ranges' in the firewall rule to ['203.0.113.0/24']. The instance already has the required tag.

B.Modify the instance to use a network tag 'restricted-ssh' and update the firewall rule target_tags accordingly.

C.Add a new firewall rule with higher priority allowing SSH from 203.0.113.0/24, and keep the existing rule but change its priority to 100.

D.Update the 'source_ranges' in the firewall rule to ['203.0.113.0/24'] and remove the 'ssh-allowed' tag from the instance.

AnswerA

Correct: Updating the source ranges restricts incoming SSH to the specified IP range.

Why this answer

The firewall rule 'allow-ssh' currently allows SSH from all IPs (0.0.0.0/0) to instances with tag 'ssh-allowed'. To restrict to a specific IP range, the source_ranges must be updated to ['203.0.113.0/24']. The instance already has the tag 'ssh-allowed', so no change to tags is needed.
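A toy Python model of this single allow rule shows why only source_ranges needs to change. An ingress packet is allowed when its source falls in one of the rule's source ranges and the instance carries one of the rule's target tags; anything else falls through to the implied deny-ingress rule. Rule priorities, deny rules, and protocols other than TCP port 22 are deliberately left out, and the test addresses come from documentation ranges.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class AllowSshRule:
    """Simplified stand-in for the exhibit's 'allow-ssh' firewall rule."""
    source_ranges: list[str]
    target_tags: set[str] = field(default_factory=lambda: {"ssh-allowed"})
    port: int = 22


def is_allowed(rule: AllowSshRule, src_ip: str, instance_tags: set[str], port: int = 22) -> bool:
    src = ipaddress.ip_address(src_ip)
    in_source_range = any(src in ipaddress.ip_network(cidr) for cidr in rule.source_ranges)
    has_target_tag = bool(rule.target_tags & instance_tags)
    # Anything this rule does not allow is dropped by the implied deny-ingress rule.
    return port == rule.port and in_source_range and has_target_tag


if __name__ == "__main__":
    before = AllowSshRule(["0.0.0.0/0"])
    after = AllowSshRule(["203.0.113.0/24"])
    print(is_allowed(before, "198.51.100.7", {"ssh-allowed"}))
    print(is_allowed(after, "198.51.100.7", {"ssh-allowed"}))
```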

442

Multi-Selecteasy

A DevOps team is deploying a microservices application on Google Kubernetes Engine (GKE). They want to ensure that the pods can securely access Google Cloud APIs (e.g., Cloud Storage) without managing service account keys. Which TWO steps should they take? (Choose two.)

Select 2 answers

A.Create a dedicated GCP service account with necessary roles and bind it to Kubernetes service accounts via Workload Identity.

B.Use the Compute Engine default service account on each node.

C.Use a secrets management solution like HashiCorp Vault to store service account keys and retrieve them at runtime.

D.Enable Workload Identity on the GKE cluster.

E.Store service account keys in a Kubernetes Secret and mount them into pods.

AnswersA, D

This grants minimal required permissions to the workload, following the principle of least privilege, and leverages Workload Identity for secure access.

Why this answer

Option A is correct because Workload Identity allows you to bind a Kubernetes service account to a GCP service account, enabling pods to authenticate to Google Cloud APIs (e.g., Cloud Storage) without managing or storing service account keys. This eliminates the security risk of key leakage and simplifies credential rotation. Option D is correct because Workload Identity must be explicitly enabled on the GKE cluster (using the `--workload-pool` flag or via the console) before the binding can be established.
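The binding in option A has two halves: an IAM policy binding that lets the Kubernetes service account impersonate the Google service account, and an annotation on the Kubernetes ServiceAccount. The Python sketch below only builds those two objects as data and calls no API. The member format PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME], the roles/iam.workloadIdentityUser role, and the iam.gke.io/gcp-service-account annotation follow GKE's documented Workload Identity setup. The project, namespace, and account names are hypothetical.

```python
def workload_identity_binding(project_id: str, namespace: str, ksa_name: str,
                              gsa_email: str) -> tuple[dict, dict]:
    """Return (IAM binding for the Google service account, Kubernetes ServiceAccount manifest)."""
    # Identity of the Kubernetes service account inside the cluster's workload pool.
    member = f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"
    iam_binding = {"role": "roles/iam.workloadIdentityUser", "members": [member]}
    ksa_manifest = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa_name,
            "namespace": namespace,
            # Tells GKE which Google service account this KSA acts as.
            "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
        },
    }
    return iam_binding, ksa_manifest


if __name__ == "__main__":
    binding, ksa = workload_identity_binding(
        "my-project", "shop", "cart-ksa", "cart-gsa@my-project.iam.gserviceaccount.com")
    print(binding)
    print(ksa)
```

The IAM binding is applied on the Google service account itself, and pods then run with serviceAccountName set to the annotated Kubernetes service account.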

Exam trap

Google Cloud often tests the misconception that storing keys in Kubernetes Secrets or using node-level default service accounts is acceptable for secure API access, when in fact Workload Identity is the recommended, keyless approach for GKE.

443

MCQhard

A company runs a large-scale data processing pipeline using Dataflow with streaming data from Pub/Sub. They notice increasing costs due to high data shuffle operations. They want to optimize the pipeline performance and cost. Which approach should they take?

A.Use a larger machine type for workers.

B.Increase the number of workers to reduce shuffle.

C.Optimize the pipeline by partitioning data and using Combine transforms.

D.Switch to batch mode overnight.

AnswerC

Partitioning and Combine reduce the amount of data shuffled, lowering cost and improving performance.

Why this answer

Optimizing pipeline logic to minimize shuffle reduces resource usage and cost. Increasing workers or using larger machine types may improve performance but increase cost. Switching to batch mode would lose real-time processing capability.
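Combine helps because Dataflow can lift a Combine into a partial aggregation on each worker before the shuffle, whereas a GroupByKey followed by a sum ships every element across it. The plain-Python toy model below (not the Apache Beam API) counts the records crossing the shuffle for a per-key sum under both strategies, using made-up sample data.

```python
from collections import Counter, defaultdict


def shuffle_cost(partitions, combine_before_shuffle: bool):
    """Return (records sent through the shuffle, per-key sums) for a per-key sum."""
    sent = 0
    totals = defaultdict(int)
    for partition in partitions:
        if combine_before_shuffle:
            # Partial aggregation on the worker: one record per key per partition.
            local = Counter()
            for key, value in partition:
                local[key] += value
            records = list(local.items())
        else:
            # GroupByKey-then-sum: every element crosses the shuffle.
            records = list(partition)
        sent += len(records)
        for key, value in records:
            totals[key] += value
    return sent, dict(totals)


if __name__ == "__main__":
    parts = [[("a", 1), ("a", 2), ("b", 1)], [("a", 5), ("b", 3), ("b", 4)]]
    print(shuffle_cost(parts, combine_before_shuffle=False))
    print(shuffle_cost(parts, combine_before_shuffle=True))
```

Both strategies produce the same totals; only the shuffle volume differs, and the gap grows with the number of values per key.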

444

MCQhard

A company is using Cloud Storage to store sensitive data. They need to enforce that objects are deleted exactly 30 days after creation. Which object lifecycle rule should they configure?

A.AbortIncompleteMultipartUpload after 30 days.

B.Delete action with condition daysSinceNoncurrentTime: 30.

C.Delete action with condition age: 30.

D.SetStorageClass to Nearline after 30 days.

AnswerC

Deletes objects 30 days after creation.

Why this answer

Option C is correct because the 'Delete action with condition age: 30' rule tells Cloud Storage to remove objects once they are 30 days old. The 'age' condition is measured from the object's creation time, which matches the requirement. Lifecycle actions are applied asynchronously, so an object is deleted after it reaches 30 days of age rather than at that exact moment.
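The rule in option C corresponds to a lifecycle configuration like the one this Python snippet emits, in the JSON shape accepted by gsutil lifecycle set. The helper name is illustrative.

```python
import json


def delete_after_days(days: int) -> str:
    """Lifecycle config that deletes objects once they are `days` days old."""
    if days < 0:
        raise ValueError("age must be non-negative")
    config = {"rule": [{"action": {"type": "Delete"}, "condition": {"age": days}}]}
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(delete_after_days(30))
```

Saved to a file, the output could be applied with gsutil lifecycle set FILE gs://BUCKET, where FILE and BUCKET are placeholders.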

Exam trap

Google Cloud often tests the distinction between 'age' (measured from creation time) and 'daysSinceNoncurrentTime' (measured from when an object version became noncurrent), leading candidates to confuse deletion of live objects with cleanup of older versions.

How to eliminate wrong answers

Option A is wrong because AbortIncompleteMultipartUpload cancels incomplete multipart uploads after a specified number of days; it does not delete completed objects. Option B is wrong because 'daysSinceNoncurrentTime' applies only to noncurrent object versions in a bucket with Object Versioning enabled, counting from when a version became noncurrent, not from the creation time of the live object. Option D is wrong because SetStorageClass to Nearline moves the object to a colder storage class but does not delete it; it only changes the object's storage and retrieval costs.

445

Matchingmedium

Match each GCP networking concept to its definition.

Concepts and matches

VPC → Virtual Private Cloud for isolated network

Subnet → Regional IP address range within a VPC

Firewall rules → Controls ingress/egress traffic

Cloud Router → Dynamically exchange routes using BGP

VPC Network Peering → Connect two VPCs privately

Why these pairings

These are fundamental networking concepts in GCP.

446

Multi-Selectmedium

An organization wants to monitor network traffic between VMs in a VPC for troubleshooting. Which TWO services can provide this?

Select 2 answers

A.Cloud Audit Logs

B.Packet Mirroring (Network Intelligence Center)

C.VPC Flow Logs

D.Cloud Monitoring

E.Cloud Logging

AnswersB, C

Packet Mirroring provides full packet capture for deep inspection, and VPC Flow Logs provides sampled flow metadata.

Why this answer

Packet Mirroring is correct because it clones the actual packet contents (headers and payload) from VM instances and forwards them to a collector for deep packet inspection, enabling detailed troubleshooting of traffic between VMs in a VPC, including application-layer issues such as protocol errors. VPC Flow Logs is correct because it records a sample of network flows sent from and received by VM instances, including source and destination IP addresses, ports, protocol, and byte and packet counts, which shows which VMs are talking to each other and how much traffic they exchange.
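VPC Flow Logs records land in Cloud Logging, with the connection 5-tuple and traffic counters under jsonPayload. A minimal Python sketch that totals bytes per source/destination pair from already-exported entries could look like the following. The sample entries are fabricated for illustration, and only a subset of the real fields is used.

```python
from collections import defaultdict


def bytes_by_vm_pair(entries: list[dict]) -> dict:
    """Sum bytes_sent per (src_ip, dest_ip) from VPC Flow Logs entries."""
    totals: dict = defaultdict(int)
    for entry in entries:
        payload = entry["jsonPayload"]
        conn = payload["connection"]
        # Counters may be exported as strings in JSON, so convert explicitly.
        totals[(conn["src_ip"], conn["dest_ip"])] += int(payload["bytes_sent"])
    return dict(totals)


# Fabricated sample entries (illustrative only).
SAMPLE_ENTRIES = [
    {"jsonPayload": {"connection": {"src_ip": "10.0.0.2", "dest_ip": "10.0.1.5"}, "bytes_sent": "1000"}},
    {"jsonPayload": {"connection": {"src_ip": "10.0.0.2", "dest_ip": "10.0.1.5"}, "bytes_sent": "2000"}},
    {"jsonPayload": {"connection": {"src_ip": "10.0.1.5", "dest_ip": "10.0.0.2"}, "bytes_sent": "500"}},
]

if __name__ == "__main__":
    print(bytes_by_vm_pair(SAMPLE_ENTRIES))
```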

Exam trap

Google Cloud often tests the distinction between services that observe data-plane traffic and those that do not. Packet Mirroring captures raw packets and VPC Flow Logs records flow metadata, so both qualify. Cloud Audit Logs records API calls and administrative activity, not VM-to-VM traffic, and Cloud Monitoring and Cloud Logging are where metrics and logs are viewed rather than sources of traffic data on their own.

447

MCQeasy

A developer wants to deploy a stateless web application that automatically scales based on HTTP traffic. The application should be cost-effective and require minimal configuration. Which compute option is best?

A.App Engine Standard Environment

B.Cloud Functions

C.Compute Engine managed instance group

D.Cloud Run

E.Google Kubernetes Engine

AnswerD

Correct. Cloud Run scales automatically and is simple to deploy.

Why this answer

Cloud Run is the best choice because it automatically scales to zero when idle, scales up to handle HTTP traffic spikes, and requires minimal configuration—just deploy a container. It is cost-effective as you pay only for resources used during request processing, and it supports stateless web applications natively without managing servers or clusters.
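To illustrate how little Cloud Run needs, here is a minimal stateless Python web app that uses only the standard library's WSGI server. Cloud Run tells the container which port to listen on through the PORT environment variable (8080 by default). The handler and function names are illustrative, and a real service would typically use a production WSGI server such as gunicorn instead of wsgiref.

```python
import os
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Stateless handler: the response depends only on the request."""
    body = f"Hello from {environ.get('PATH_INFO', '/')}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve() -> None:
    """Container entrypoint: listen on the port Cloud Run provides via $PORT."""
    port = int(os.environ.get("PORT", "8080"))
    with make_server("", port, app) as server:
        server.serve_forever()
```

A container image whose command calls serve() can be deployed with gcloud run deploy. Cloud Run then adds or removes instances, down to zero, based on incoming requests.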

Exam trap

The trap here is that candidates often treat Cloud Run and Cloud Functions as interchangeable. Cloud Functions does support HTTP triggers, but it is built around single-purpose functions, whereas Cloud Run runs an entire containerized web application with its own routing and framework.

How to eliminate wrong answers

Option A is wrong because App Engine Standard Environment, while serverless, has more restrictive runtime environments and may require code modifications to fit its sandbox, whereas Cloud Run offers more flexibility with any container. Option B is wrong because Cloud Functions is designed for event-driven, short-lived functions, not for a full stateless web application that handles continuous HTTP traffic. Option C is wrong because Compute Engine managed instance groups require manual configuration of autoscaling policies, instance templates, and health checks, and do not scale to zero, leading to higher costs during idle periods.

Option E is wrong because Google Kubernetes Engine requires cluster management, node configuration, and more operational overhead, making it less minimal in configuration compared to Cloud Run's fully managed serverless container platform.

448

MCQhard

A Cloud Router BGP session is flapping. The logs show 'Interface flapping due to changes in the underlying network'. What is the most likely cause?

A.MTU mismatch across the network path.

B.BGP MD5 authentication failure.

C.Incorrect local AS number in Cloud Router configuration.

D.BGP timer misconfiguration between peers.

AnswerA

MTU mismatch can cause intermittent packet loss, leading to BGP session flapping.

Why this answer

The log message 'Interface flapping due to changes in the underlying network' indicates that the BGP session is unstable because the path beneath it keeps changing state, not because of a BGP protocol error. An MTU mismatch along the path is a classic cause of this pattern. Small packets such as BGP keepalives get through, so the session comes up, but larger packets such as full-size UPDATE messages are dropped, so the session repeatedly stalls and resets. This makes it the most likely cause among the options, because it directly affects the stability of the underlying network path.
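The MTU effect is easy to see with a little arithmetic. In the Python sketch below, the per-hop MTU values are hypothetical. The effective path MTU is the smallest link MTU, and with the Don't Fragment bit set, any larger packet is dropped. A bare BGP KEEPALIVE is a 19-byte message, so with 20-byte IPv4 and TCP headers (no options) it is a 59-byte packet that always fits, while full-size UPDATE packets may not.

```python
BGP_KEEPALIVE_BYTES = 19 + 20 + 20  # BGP KEEPALIVE + IPv4 header + TCP header, no options


def path_mtu(link_mtus: list[int]) -> int:
    """The largest packet that can cross every hop without fragmentation."""
    return min(link_mtus)


def dropped_with_df(packet_sizes: list[int], link_mtus: list[int]) -> list[int]:
    """Packet sizes that would be dropped when fragmentation is not allowed."""
    limit = path_mtu(link_mtus)
    return [size for size in packet_sizes if size > limit]


if __name__ == "__main__":
    hops = [1500, 1460, 1500]  # hypothetical per-link MTUs along the path
    print(dropped_with_df([BGP_KEEPALIVE_BYTES, 1500], hops))
```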

Exam trap

The trap here is that candidates often associate BGP flapping with timer misconfigurations or authentication issues, but the specific log message about 'changes in the underlying network' points directly to a Layer 2 or path-level problem like MTU mismatch, not BGP protocol errors.

How to eliminate wrong answers

Option B is wrong because BGP MD5 authentication failure would generate authentication error messages, not interface flapping logs, and would prevent the session from establishing rather than cause intermittent flaps. Option C is wrong because an incorrect local AS number in Cloud Router configuration would cause a BGP open message error and the session would fail to establish entirely, not flap due to interface changes. Option D is wrong because BGP timer misconfiguration (e.g., hold time or keepalive) would cause the session to time out and reset, but the log specifically mentions 'changes in the underlying network', not timer expiry.

449

MCQhard

You are responsible for incident management for a production service. You want to reduce manual toil during the initial response to common issues like high latency. What is the best approach?

A.Use Cloud Monitoring to trigger a Cloud Function that performs automated checks and rolls back the last deployment if latency spikes.

B.Set up Cloud Monitoring alerts with email notifications to the on-call engineer.

C.Create detailed runbooks and require the on-call to follow them step by step.

D.Enable Cloud Logging and set up a custom dashboard for the on-call.

AnswerA

Automated actions reduce manual toil and speed up response.

Why this answer

Option A is correct because it directly reduces manual toil by automating the initial response to common issues like high latency. A Cloud Monitoring alerting policy can publish to a Pub/Sub notification channel that triggers a Cloud Function. The function performs automated checks and, if latency has spiked, rolls back the last deployment, removing the need for human intervention during the first response phase.
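The core decision inside such a function can be small. The Python sketch below decides to roll back only after p95 latency exceeds a threshold for several consecutive evaluation windows, which avoids reacting to a single noisy sample. The threshold, window count, and function name are hypothetical, and the actual rollback call to your deployment tooling is left out.

```python
def should_roll_back(p95_latency_ms: list[float], threshold_ms: float,
                     consecutive_windows: int) -> bool:
    """True once p95 latency exceeds threshold_ms for consecutive_windows windows in a row."""
    streak = 0
    for value in p95_latency_ms:
        # Extend the streak on a breach; reset it on any healthy window.
        streak = streak + 1 if value > threshold_ms else 0
        if streak >= consecutive_windows:
            return True
    return False


if __name__ == "__main__":
    print(should_roll_back([200, 900, 950, 1000], threshold_ms=800, consecutive_windows=3))
```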

Exam trap

Google Cloud often tests the distinction between 'alerting' (which still requires manual action) and 'automated remediation' (which reduces toil), so candidates mistakenly choose options that provide visibility or documentation instead of automation.

How to eliminate wrong answers

Option B is wrong because email notifications alone still require the on-call engineer to manually investigate and respond, which does not reduce toil; it merely alerts them. Option C is wrong because requiring the on-call to follow runbooks step by step still involves manual effort and does not automate the response, leaving toil unchanged. Option D is wrong because enabling Cloud Logging and setting up a custom dashboard provides visibility but does not automate any action, so the on-call must still manually diagnose and respond to the issue.

450

MCQmedium

A company is designing a microservices architecture on Google Kubernetes Engine (GKE) for a global user base. They require high availability across multiple zones, automatic scaling, and rolling updates without downtime. Which Kubernetes workload resource should they use for each service?

A.StatefulSet with volumeClaimTemplates for persistent storage

B.Deployment with pod anti-affinity rules spread across zones

C.Job for batch processing

D.DaemonSet to ensure one pod per node

AnswerB

Deployments provide rolling updates and replica management; combined with pod anti-affinity, they can ensure pods are distributed across zones for high availability.

Why this answer

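Option B is correct because Deployments provide rolling updates and replica management, and pod anti-affinity rules keyed on the zone topology label spread replicas across zones for high availability. StatefulSets are for stateful workloads that need stable identities and storage, DaemonSets run one pod per node for node-level agents, and Jobs run batch tasks to completion.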
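The Python sketch below builds such a Deployment as a dict that could be serialized with json.dumps and applied with kubectl apply -f, since kubectl accepts JSON manifests. It uses a preferred (soft) pod anti-affinity rule on the standard topology.kubernetes.io/zone node label and a rolling update that never drops below the desired replica count. The name, image, and replica count are placeholders.

```python
import json


def zonal_spread_deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Deployment manifest that prefers one replica per zone and rolls out without downtime."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Add one new pod at a time and never take a ready pod away first.
            "strategy": {"type": "RollingUpdate",
                         "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0}},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            # Soft rule: spread replicas across zones when possible.
                            "preferredDuringSchedulingIgnoredDuringExecution": [{
                                "weight": 100,
                                "podAffinityTerm": {
                                    "labelSelector": {"matchLabels": labels},
                                    "topologyKey": "topology.kubernetes.io/zone",
                                },
                            }]
                        }
                    },
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image; the output can be piped to `kubectl apply -f -`.
    print(json.dumps(zonal_spread_deployment("cart", "us-docker.pkg.dev/my-project/repo/cart:1.0"), indent=2))
```

A preferred rule is used rather than a required one so that, if there are more replicas than zones, the extra pods can still be scheduled.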
