Google Professional Cloud Architect PCA Questions 451–509 | Page 7/7

451

Multi-Selecthard

A company is designing a highly available web application on Google Cloud. The application consists of stateless compute instances behind a global HTTP(S) Load Balancer. The compute instances must be able to handle sudden spikes in traffic. Which TWO strategies should the company implement? (Choose two.)

Select 2 answers

A.Use Cloud CDN to cache all responses from the application servers.

B.Use a managed instance group with autoscaling based on CPU utilization.

C.Use a single Compute Engine instance in a single zone with a large machine type.

D.Use a global HTTP(S) Load Balancer with backends in multiple regions.

E.Use vertical scaling by selecting a machine type with more vCPUs and memory.

AnswersB, D

Autoscaling handles spikes by adding instances.

Why this answer

Option B is correct because a managed instance group with autoscaling based on CPU utilization automatically adjusts the number of stateless compute instances in response to traffic spikes, ensuring the application can handle sudden load increases without manual intervention. This aligns with the requirement for stateless instances behind a global load balancer, as autoscaling adds or removes instances based on real-time CPU metrics, providing elasticity and high availability.

Exam trap

The trap here is that candidates often confuse caching (Cloud CDN) with compute scaling, or assume vertical scaling (larger machine types) is sufficient for sudden spikes, ignoring the need for horizontal elasticity and multi-zone redundancy in a highly available architecture.


452

Multi-Selectmedium

A company runs a critical application on a Compute Engine instance. They want to ensure that the application remains available even if the instance crashes. Which two GCP features should they use? (Choose two.)

Select 2 answers

A.Regular snapshots of the persistent disk.

B.A load balancer distributing traffic to a single instance.

C.Instance template with automatic restart.

D.Managed Instance Group with autohealing.

E.A Cloud CDN to cache static content.

AnswersC, D

Automatic restart restarts the instance on host failure.

Why this answer

Option C is correct because an instance template with automatic restart enables Compute Engine to automatically restart a VM instance if it crashes or is terminated due to a non-user-initiated failure. This feature is configured at the instance level and ensures that the application recovers quickly without manual intervention, improving availability for a single-instance workload.

Exam trap

The trap here is that candidates often confuse automatic restart (which handles VM crashes) with autohealing (which handles application-level failures), or incorrectly assume a load balancer alone provides high availability without a redundant backend.


453

MCQmedium

Your organization uses Cloud Spanner for a customer database with a 99.999% availability SLA. You need a Disaster Recovery plan that ensures data consistency with zero RPO in case of a region failure. What should you do?

A.Use a single-region instance configuration and enable read replicas.

B.Export the database periodically to Cloud Storage and set up a cross-region load balancer.

C.Configure daily backups and store them in Cloud Storage in a different region.

D.Use a multi-region instance configuration (e.g., nam-eur-asia) for the Spanner instance.

AnswerD

Multi-region configs use synchronous replication across regions, providing automatic failover with zero RPO.

Why this answer

Option D is correct because Cloud Spanner multi-region instance configurations (e.g., nam-eur-asia) provide synchronous replication across multiple regions, ensuring strong global consistency and zero RPO. This architecture uses Paxos-based replication to commit a write only after a majority of voting replicas, which are spread across regions, have durably stored it, so a region failure does not lose any committed data. The 99.999% availability SLA is met by automatic failover within the multi-region setup without manual intervention.

Exam trap

Google Cloud often tests the misconception that read replicas or periodic exports can achieve zero RPO, but only synchronous multi-region replication (as in Spanner's multi-region configurations) guarantees no data loss during a region failure.

How to eliminate wrong answers

Option A is wrong because a single-region instance configuration keeps all of its replicas in one region, so it cannot survive a full region failure; read-only replicas serve reads and do not take part in committing writes, so they cannot provide zero RPO on their own. Option B is wrong because exporting the database periodically to Cloud Storage introduces a non-zero RPO (the time between exports) and does not guarantee data consistency at the point of failure; cross-region load balancers do not handle Spanner's transactional consistency. Option C is wrong because daily backups stored in a different region leave an RPO of up to 24 hours, not zero, and cannot capture transactions committed after the last backup.
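
The majority rule behind zero RPO can be sketched in a few lines of Python. This is only an illustration of quorum arithmetic: the 2 + 2 + 1 replica layout and the region names are hypothetical and do not describe any specific Spanner instance configuration.

```python
def can_commit(voting_up, voting_total):
    """True when a strict majority of voting replicas can acknowledge a write."""
    return voting_up > voting_total // 2

# Hypothetical layout: 5 voting replicas spread 2 + 2 + 1 across three regions.
replicas_per_region = {"region-a": 2, "region-b": 2, "region-c": 1}
total = sum(replicas_per_region.values())

for failed, lost in replicas_per_region.items():
    survivors = total - lost
    status = "commits continue" if can_commit(survivors, total) else "writes stall"
    print(failed, "down ->", status)
```

In this layout, losing any single region still leaves at least 3 of 5 voting replicas, so every write that was acknowledged before the failure is already held by a surviving majority.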


454

Drag & Dropmedium

Drag and drop the steps to configure a Cloud Load Balancer with a backend service consisting of Compute Engine instances into the correct order.


Why this order

Health checks ensure traffic only goes to healthy instances; URL map defines routing; forwarding rule exposes the IP.


455

MCQeasy

A developer needs to deploy a containerized application on Google Kubernetes Engine (GKE) with minimal operational overhead. They want to automatically scale the number of pods based on CPU utilization. Which GKE feature should they use?

A.Horizontal Pod Autoscaler.

B.Node auto-repair.

C.Vertical Pod Autoscaler.

D.Cluster Autoscaler.

AnswerA

HPA scales pods based on metrics like CPU.

Why this answer

The Horizontal Pod Autoscaler (HPA) is the correct choice because it automatically scales the number of pod replicas in a GKE deployment based on observed CPU utilization (or other custom metrics). This directly meets the requirement of scaling pods with minimal operational overhead, as HPA is a native Kubernetes resource that requires no manual intervention once configured.

Exam trap

Google Cloud often tests the distinction between horizontal scaling (HPA) and vertical scaling (VPA), where candidates mistakenly choose VPA when the question explicitly asks for scaling the number of pods based on CPU utilization.

How to eliminate wrong answers

Option B (Node auto-repair) is wrong because it automatically repairs unhealthy nodes in the node pool, not scales pods based on CPU utilization. Option C (Vertical Pod Autoscaler) is wrong because it adjusts the CPU and memory requests/limits of existing pods (vertical scaling), not the number of pod replicas (horizontal scaling). Option D (Cluster Autoscaler) is wrong because it adds or removes nodes from the cluster based on pod scheduling needs, not directly scaling pods based on CPU utilization.
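
The scaling rule that HPA applies can be sketched as follows. The core formula (desired = ceil(current × currentMetric / targetMetric)) is the one described in the Kubernetes documentation; the function name, the replica bounds, and the 10% tolerance default are illustrative assumptions and not a GKE API.

```python
import math

def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the Horizontal Pod Autoscaler replica calculation."""
    ratio = current_cpu_pct / target_cpu_pct
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured minimum and maximum.
    return max(min_replicas, min(max_replicas, wanted))

# 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))
```

With 4 pods at 90% CPU and a 60% target, the ratio is 1.5, so the sketch asks for 6 pods. The number of nodes is untouched, which is why Cluster Autoscaler is a separate feature.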


456

MCQmedium

A company uses BigQuery for analytics. They have a large partitioned table that is queried frequently. The query performance has degraded over time. Which optimization should they try first?

A.Create a materialized view for each frequent query.

B.Increase the number of slots for the project.

C.Apply clustering on frequently filtered columns.

D.Denormalize the table to reduce joins.

AnswerC

Clustering sorts data, reducing scanned data for filters.

Why this answer

Clustering on frequently filtered columns reorganizes the data within partitions based on the values of those columns, which allows BigQuery to prune blocks more effectively during queries. This directly addresses the performance degradation by reducing the amount of data scanned, without requiring additional storage or compute resources.

Exam trap

Google Cloud often tests the misconception that adding more slots (Option B) is the default performance fix, when in reality the first step should be to reduce data scanned through clustering or partitioning optimization.

How to eliminate wrong answers

Option A is wrong because creating materialized views for each frequent query would increase storage costs and maintenance overhead, and they are not the first optimization to try for a partitioned table with degraded performance; clustering addresses the root cause of excessive data scanning. Option B is wrong because increasing the number of slots only improves concurrency and throughput, not the efficiency of individual queries; it does not reduce the amount of data read per query. Option D is wrong because denormalizing the table to reduce joins is a schema design change that may help with join-heavy workloads, but it does not address the core issue of scanning too many rows in a large partitioned table; clustering is a more targeted and less disruptive first step.
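
Why clustering reduces scanned data can be illustrated with a toy model. The sketch below is not BigQuery's storage engine: it assumes fixed-size blocks that each record the minimum and maximum of one column, and it counts how many blocks a point filter would have to read. The column name, block size, and data are all hypothetical.

```python
import random

def blocks_scanned(values, block_size, wanted):
    """Count blocks whose [min, max] range could contain `wanted`."""
    scanned = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        if min(block) <= wanted <= max(block):
            scanned += 1
    return scanned

random.seed(0)
customer_ids = [random.randrange(1000) for _ in range(100_000)]

# Same rows, same filter; only the physical ordering differs.
unclustered = blocks_scanned(customer_ids, 1_000, wanted=42)
clustered = blocks_scanned(sorted(customer_ids), 1_000, wanted=42)
print("unclustered:", unclustered, "clustered:", clustered)
```

When the rows are sorted on the filtered column, the matching values sit in a handful of adjacent blocks and the rest can be skipped, which is the effect the explanation attributes to clustering.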


457

MCQeasy

A developer needs to deploy a stateful application that requires persistent storage across pod restarts in Google Kubernetes Engine. Which resource should they use?

A.ConfigMap

B.EmptyDir

C.Secret

D.PersistentVolumeClaim

AnswerD

Provides persistent storage that remains across pod restarts.

Why this answer

A PersistentVolumeClaim (PVC) is the correct resource because it allows a pod to request persistent storage that survives pod restarts. In GKE, a PVC binds to a PersistentVolume (PV), which can be backed by Compute Engine persistent disks, ensuring data remains available even if the pod is rescheduled or restarted.

Exam trap

The trap here is that candidates confuse ephemeral volumes (EmptyDir) with persistent storage, or assume ConfigMaps/Secrets can store application data, when in fact they are for configuration and secrets only.

How to eliminate wrong answers

Option A is wrong because a ConfigMap is used to inject configuration data (e.g., environment variables, files) into pods, not for persistent storage. Option B is wrong because an EmptyDir volume is ephemeral—it is created when a pod starts and is deleted when the pod is removed, so data does not persist across pod restarts. Option C is wrong because a Secret is designed to store sensitive data (e.g., passwords, tokens) and is not a storage volume for application data.


458

MCQhard

A global e-commerce site uses an external HTTPS load balancer with a backend service pointing to a managed instance group. Some users report 503 errors during peak traffic. The backend instances are healthy and not overloaded. What is the most likely cause?

A.The CDN cache is not warming up properly

B.The backend service's health check interval is too short

C.The SSL certificate is expired

D.The load balancer's max rate per backend is configured too low

AnswerD

The load balancer enforces a rate limit at the backend level; exceeding it produces 503.

Why this answer

A 503 error from an external HTTPS load balancer with healthy backends typically indicates that the load balancer is throttling requests. The 'max rate per backend' setting limits the number of requests per second that the load balancer forwards to each backend instance. When this limit is exceeded, the load balancer returns 503 errors even though the instances themselves are not overloaded, which matches the scenario of peak traffic.

Exam trap

Google Cloud often tests the misconception that 503 errors always indicate backend overload or health check failures, when in fact the load balancer's rate limiting configuration can cause 503s with perfectly healthy instances.

How to eliminate wrong answers

Option A is wrong because CDN cache warming affects cache hit ratios and latency, not 503 errors from the load balancer; a cold cache would cause more origin requests but not throttling. Option B is wrong because a health check interval that is too short could cause flapping or false unhealthy status, but the question states backend instances are healthy and not overloaded, so health checks are passing. Option C is wrong because an expired SSL certificate would cause TLS handshake failures (e.g., ERR_CERT_DATE_INVALID) and 502 or connection errors, not 503 errors from the load balancer itself.


459

MCQeasy

A company wants to minimize egress costs for data transferred between Compute Engine instances in the same region but different zones. What is the best practice?

A.Use a VPN connection.

B.Use internal IPs and ensure they are in the same VPC.

C.Use Cloud NAT.

D.Use external IPs for all instances.

AnswerB

Internal IP traffic within the same VPC is the lowest-cost path between instances in a region.

Why this answer

B is correct because instances in the same VPC network can communicate over internal IP addresses, which keeps the traffic on Google's network and is the most cost-effective choice among the options. Traffic over internal IPs is free within a single zone; traffic that crosses zones in the same region is billed at a small per-GiB inter-zone rate, so it is low-cost but not strictly free. The other options add tunnels, gateways, or external addresses that bring extra charges without lowering that rate.

Exam trap

The trap here is that candidates often confuse 'different zones' with 'different regions' and assume inter-region egress rates apply, or they mistakenly think that using external IPs or NAT is necessary for inter-instance communication, when in fact internal IPs within the same VPC are the simplest and lowest-cost option.

How to eliminate wrong answers

Option A is wrong because using a VPN connection introduces additional complexity and does not reduce egress costs; VPN traffic still traverses the internet or uses Cloud VPN tunnels, which incur egress charges. Option C is wrong because Cloud NAT is used for outbound internet access from private instances and does not affect inter-instance traffic costs within the same region; it would add unnecessary overhead and potential costs. Option D is wrong because using external IPs for all instances forces traffic to go through the internet or Google's external network, incurring egress charges even within the same region, which is the opposite of minimizing costs.


460

MCQhard

Refer to the exhibit. A Deployment Manager template deploys a GKE cluster and a job that publishes to Pub/Sub. The job fails with a permission error. Which change would fix the issue?

A.Set the job's serviceAccountName to the default compute service account.

B.Change the oauthScopes to include https://www.googleapis.com/auth/cloud-platform.

C.Add dependsOn: [my-job] to the cluster resource to ensure the cluster is ready.

D.Add a serviceAccount field to nodeConfig with a custom service account that has roles/pubsub.publisher.

AnswerD

This ensures the nodes (and thus the job) have the required Pub/Sub publish permission.

Why this answer

The node pool's service account needs the Pub/Sub Publisher role. The exhibit shows the nodes are using the default compute engine service account with only pubsub scope (no roles). The fix is to assign a service account with the necessary IAM role.


461

MCQeasy

An organization needs to meet an RTO of 1 hour for a critical application running on GCE with persistent disks. What is the most cost-effective approach?

A.Use regional persistent disks.

B.Replica of compute instance in another zone.

C.Frequent disk image exports.

D.Regular snapshots to a regional bucket.

AnswerA

Synchronous replication, fast failover.

Why this answer

Regional persistent disks (PD) provide synchronous replication of data between two zones in the same region. If the primary zone fails, the disk can be force-attached to an instance in the other zone, so no restore step is needed. This meets the 1-hour RTO by allowing the instance to be recreated or failed over to the secondary zone quickly, and it is more cost-effective than maintaining a full replica instance because you only pay for the disk storage and replication, not for an idle compute instance.

Exam trap

The trap here is that candidates often confuse regional persistent disks with snapshots or image exports, assuming that any backup method can meet a strict RTO, but they overlook the synchronous replication and fast failover capability of regional PDs that make them the most cost-effective for this requirement.

How to eliminate wrong answers

Option B is wrong because maintaining a replica of the compute instance in another zone incurs additional compute costs for the idle replica, which is less cost-effective than using regional PDs that only replicate the disk. Option C is wrong because frequent disk image exports are time-consuming (exporting an image can take longer than 1 hour) and incur storage costs for each image, making it impractical for a 1-hour RTO and not cost-effective. Option D is wrong because regular snapshots to a regional bucket provide asynchronous backup, not synchronous replication; restoring from a snapshot requires creating a new disk and instance, which can exceed the 1-hour RTO due to snapshot export and disk creation times.


462

MCQhard

An organization has deployed a multi-region Cloud Spanner instance for a global application. The application is experiencing high latency for read requests from a specific region. The team has verified that the application is using stale reads and the data distribution is even. What is the most likely cause of the high latency?

A.The number of read replicas in the region is insufficient to handle the read volume.

B.The Spanner instance has too few nodes, causing contention.

C.The application is using read-write transactions instead of read-only transactions.

D.The Spanner instance does not have a read replica in a location close to the clients.

AnswerD

Adding a read replica in the region reduces network round-trip time, lowering read latency.

Why this answer

Option D is correct because a multi-region Cloud Spanner configuration places replicas only in the regions that the configuration defines. If the instance does not have a read replica in the region where the clients are located, read requests must traverse the network to a replica in another region, causing higher latency. Even with stale reads, the physical distance to the nearest replica directly impacts read latency.

Exam trap

Google Cloud often tests the misconception that adding more nodes or read replicas solves regional latency, when the real issue is the absence of a local replica in the specific region.

How to eliminate wrong answers

Option A is wrong because it assumes replicas already serve the affected region and are simply overloaded; latency that is confined to one region points to network distance from the nearest replica, not to insufficient read capacity. Option B is wrong because the team has verified that data distribution is even, and the question states the issue is specific to a region, not global contention; too few nodes would cause high latency across all regions, not just one. Option C is wrong because the team has already verified that the application is using stale reads, which are read-only operations; read-write transactions are not in use in this scenario.


463

Multi-Selecthard

Which THREE are valid Google Cloud Dedicated Interconnect connection options?

Select 3 answers

A.High availability (HA) with two 10 Gbps circuits.

B.10 Gbps single circuit.

C.IPsec VPN tunnel as a backup to the interconnect.

D.Partner Interconnect offering via a service provider.

E.100 Gbps single circuit.

AnswersA, B, E

HA option provides redundancy.

Why this answer

Option A is correct because Google Cloud Dedicated Interconnect supports high availability configurations using two 10 Gbps circuits to provide redundancy and meet SLA requirements. This setup ensures that if one circuit fails, traffic can be rerouted through the other, maintaining connectivity.

Exam trap

Google Cloud often tests the distinction between Dedicated Interconnect and Partner Interconnect, and the fact that IPsec VPN is a separate backup option, not a connection type for Dedicated Interconnect.


464

Multi-Selectmedium

Which TWO strategies should a company implement to optimize costs for a production GKE cluster? (Choose two.)

Select 2 answers

A.Use Istio for traffic management.

B.Use a regional cluster.

C.Enable GKE usage metering.

D.Use cluster autoscaler with preemptible node pools.

E.Use node local DNS cache.

AnswersC, D

Usage metering helps allocate costs per namespace and identify waste.

Why this answer

Option C is correct because GKE usage metering provides detailed cost allocation by breaking down cluster resource consumption (CPU, memory, storage) per Kubernetes namespace or label. This enables teams to track and optimize spending across different projects or departments, directly supporting cost optimization for a production cluster.

Exam trap

Google Cloud often tests the distinction between cost optimization and other operational goals like high availability or performance; candidates mistakenly choose regional clusters (high availability) or Istio (traffic management) as cost-saving measures when they are not.


465

MCQmedium

A company wants to restrict access to a Cloud Storage bucket so that only objects encrypted with a specific Cloud KMS key can be read. Which approach should they use?

A.Enable Key Access Justifications on the Cloud KMS key and allow access only for justified requests.

B.Set a bucket policy that denies access if the object's encryption type is not CMEK.

C.Use IAM conditions with the resource name condition 'resource.name.startsWith("projects/_/buckets/example-bucket/objects/")' and 'resource.hasTag("kmsKeyName", "projects/p/locations/l/keyRings/kr/cryptoKeys/ck")'.

D.Configure VPC Service Controls to include the bucket and the Cloud KMS key resource.

AnswerC

IAM conditions can check object metadata such as 'kmsKeyName' to restrict access to objects encrypted with a specific key.

Why this answer

Option C is correct because Cloud Storage objects expose a 'kmsKeyName' attribute that can be used in IAM conditions to require objects to be encrypted with a specific KMS key. Option D is wrong because VPC Service Controls prevent data exfiltration but do not enforce encryption at the object level. Option B is wrong because bucket policies do not directly examine encryption key metadata.

Option A is wrong because Key Access Justifications only provide reasons for key access but do not restrict object access based on encryption key.


466

Multi-Selectmedium

A cloud architect is implementing a CI/CD pipeline for a microservices-based application on Google Kubernetes Engine (GKE). The team needs to deploy new versions of the services with zero downtime and the ability to quickly roll back if issues are detected. Which two strategies should the architect consider? (Choose two.)

Select 2 answers

A.Shadow deployment

B.Rolling update

C.Blue/green deployment

D.Canary deployment

E.A/B testing deployment

AnswersC, D

Correct: blue/green allows instant rollback by switching traffic back to the old version.

Why this answer

Blue/green deployment (C) and canary deployment (D) are two strategies that provide zero downtime and quick rollback. Blue/green deploys the new version to a separate environment and then switches traffic; canary gradually shifts traffic and allows easy rollback. Rolling update (B) also provides zero downtime, but rollback requires a new update and is not immediate.

Shadow deployment (A) mirrors traffic for analysis but doesn't serve users. A/B testing (E) is a method for comparing features, not a deployment strategy.


467

MCQeasy

A multinational e-commerce company needs a globally distributed database that provides strong consistency and transactional support for order processing. Which Google Cloud database service should they use?

A.Cloud SQL

B.Cloud Spanner

C.Cloud Bigtable

D.Cloud Firestore

AnswerB

Cloud Spanner provides global distribution, strong consistency, and full transactional support, making it ideal for order processing.

Why this answer

Cloud Spanner is the correct choice because it is a globally distributed, horizontally scalable relational database service that provides strong consistency and full ACID transactional support across regions. Unlike other Google Cloud databases, Spanner uses synchronous replication and the TrueTime API to guarantee external consistency, making it ideal for order processing systems that require both global scale and transactional integrity.

Exam trap

The trap here is that candidates often confuse Cloud Spanner with Cloud SQL, assuming that a traditional relational database like Cloud SQL can be scaled globally by adding replicas, but they miss that Cloud SQL replicas are read-only and cannot provide the strong consistency and write scalability needed for a globally distributed transactional system.

How to eliminate wrong answers

Option A is wrong because Cloud SQL is a regional, single-writer database that cannot scale horizontally across multiple regions, and it does not provide the global strong consistency needed for a globally distributed order processing system. Option C is wrong because Cloud Bigtable is a NoSQL wide-column database designed for high-throughput, low-latency workloads; it does not support multi-row ACID transactions, so it is a poor fit for transactional order processing. Option D is wrong because Cloud Firestore is a NoSQL document database; although it supports transactions, it is not designed for the complex relational transactional workloads of a global e-commerce order processing system.


468

MCQeasy

An administrator is configuring firewall rules in a VPC. Two rules apply to the same traffic: rule 1 allows ingress from 0.0.0.0/0 on TCP 80, rule 2 denies ingress from 10.0.0.0/8 on TCP 80. Rule 1 has priority 1000, rule 2 has priority 500. What is the effective behavior for traffic from 10.0.0.1?

A.The result is unpredictable without knowing the rule creation order.

B.Traffic is allowed because allow rules override deny rules.

C.Traffic is denied because rule 2 has higher priority.

D.Traffic is allowed because rule 1 has a lower priority number.

AnswerC

Rule 2 (priority 500) has higher priority than rule 1 (priority 1000), so deny applies.

Why this answer

In Google Cloud VPC firewall rules, rules are evaluated in priority order, with lower numbers having higher priority. Rule 2 (priority 500) takes precedence over rule 1 (priority 1000), and since rule 2 explicitly denies traffic from 10.0.0.0/8 on TCP 80, traffic from 10.0.0.1 is denied regardless of rule 1's allow. There is no implicit override between allow and deny; the highest-priority matching rule determines the outcome.

Exam trap

Google Cloud often tests the misconception that allow rules override deny rules or that rule creation order matters, but the trap here is that candidates confuse priority numbers (lower = higher priority) and assume a higher number means higher priority.

How to eliminate wrong answers

Option A is wrong because rule creation order does not affect evaluation; only the priority number matters. Option B is wrong because allow rules do not inherently override deny rules; the rule with the highest priority (lowest number) that matches the traffic is applied. Option D is wrong because a lower priority number means higher priority, not lower; rule 1 has a higher priority number (1000) and thus lower priority, so it is not evaluated before rule 2.
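
The evaluation order can be sketched in Python using the two rules from the question. The dictionary field names and the `evaluate` helper are illustrative assumptions, not a Google Cloud API; the sketch models only source range, port, and priority.

```python
from ipaddress import ip_address, ip_network

# The two rules from the question; field names are illustrative.
RULES = [
    {"name": "rule-1", "priority": 1000, "action": "allow",
     "source": "0.0.0.0/0", "port": 80},
    {"name": "rule-2", "priority": 500, "action": "deny",
     "source": "10.0.0.0/8", "port": 80},
]

def evaluate(rules, source_ip, port):
    """Return the action of the highest-priority (lowest-numbered) match."""
    # Lower number first; on equal priority a deny rule is checked first.
    ordered = sorted(rules, key=lambda r: (r["priority"], r["action"] != "deny"))
    for rule in ordered:
        in_range = ip_address(source_ip) in ip_network(rule["source"])
        if in_range and port == rule["port"]:
            return rule["action"]
    return "deny"  # implied deny for ingress when nothing matches

print(evaluate(RULES, "10.0.0.1", 80))     # matched by rule-2 first
print(evaluate(RULES, "203.0.113.5", 80))  # only rule-1 matches
```

Swapping the two priority values would flip the result for 10.0.0.1 to allow, which shows that the outcome depends on the priority numbers and not on creation order or rule type.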


469

MCQhard

A company runs a multi-tier application on Google Cloud: a frontend on App Engine Standard, a backend on Cloud Run, and a Cloud SQL database. The application experiences intermittent 500 errors when users submit forms. The errors correlate with high CPU usage on the Cloud SQL instance (db-n1-standard-2, 7.5 GB memory). The Cloud Run service has a concurrency setting of 80 and a maximum of 10 instances. The App Engine service uses automatic scaling. The team has verified that the application code is not the issue. They suspect the database is hitting connection limits. Current max_connections on Cloud SQL is 250. The Cloud Run service uses a connection pool of 10 connections per instance. The App Engine service uses a connection pool of 5 connections per instance. They also have a few batch jobs that run occasionally, using up to 10 connections. The team wants to resolve the errors with minimal cost and complexity. Which course of action should they take?

A.Increase the maximum number of Cloud Run instances to 20 to handle more requests.

B.Upgrade the Cloud SQL instance to db-n1-standard-4 (15 GB memory) to handle more connections.

C.Increase the max_connections parameter on Cloud SQL to 500.

D.Reduce the concurrency setting on Cloud Run from 80 to 40.

AnswerC

This directly addresses the connection limit issue with minimal cost and no code changes.

Why this answer

The intermittent 500 errors are caused by the Cloud SQL instance hitting its max_connections limit of 250. With Cloud Run using 10 connections per instance and up to 10 instances (100 connections), App Engine using 5 connections per instance (unknown instance count but likely significant), and batch jobs using up to 10 connections, the total can easily exceed 250. Increasing max_connections to 500 directly addresses the connection limit without changing instance size or scaling behavior, which is the simplest and most cost-effective fix.

Exam trap

Google Cloud often tests the misconception that upgrading the instance tier (more memory/CPU) automatically increases connection limits, when in fact max_connections is a configurable parameter that can be increased independently without changing the instance size.

How to eliminate wrong answers

Option A is wrong because increasing Cloud Run instances to 20 would increase the total number of connections (up to 200 from Cloud Run alone), worsening the connection limit issue and potentially causing more 500 errors. Option B is wrong because upgrading to db-n1-standard-4 increases memory but does not change the default max_connections limit (which is based on tier, not memory alone); the current bottleneck is the connection count, not CPU or memory, so this adds cost without solving the problem. Option D is wrong because reducing concurrency on Cloud Run from 80 to 40 would decrease the number of concurrent requests per instance but does not reduce the number of connections per instance (still 10), and could lead to more instances being spun up, potentially increasing total connections.
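
The connection budget in this scenario can be checked with simple arithmetic. The Cloud Run and batch figures below come from the question; the App Engine instance count is not given, so the value 30 is a hypothetical assumption used only to show how the limit can be exceeded.

```python
def peak_connections(cloud_run_instances, cloud_run_pool,
                     app_engine_instances, app_engine_pool, batch_connections):
    """Worst-case connections if every pool is fully used at once."""
    return (cloud_run_instances * cloud_run_pool
            + app_engine_instances * app_engine_pool
            + batch_connections)

MAX_CONNECTIONS = 250

# 10 Cloud Run instances x 10, 30 (assumed) App Engine instances x 5, 10 batch.
total = peak_connections(10, 10, 30, 5, 10)
print(total, "connections; over limit:", total > MAX_CONNECTIONS)
```

Cloud Run and the batch jobs account for at most 110 connections, so under these assumptions App Engine needs more than 28 instances before the 250 limit is exceeded. Doubling Cloud Run to 20 instances, as in Option A, would add another 100 connections to the worst case.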


470

MCQhard

You are investigating a Vertex AI Workbench instance (instance-2) that is showing UNHEALTHY status. Based on the exhibit, what is the most likely cause of the issue?

A.The container image gcr.io/my-project/my-image:latest does not exist, or the service account used by the Workbench instance does not have storage.objectViewer access to the container registry.

B.The container registry endpoint is blocked by a firewall rule that does not allow egress to gcr.io.

C.The instance's underlying Compute Engine resources are exhausted, causing the container creation to timeout.

D.The Workbench instance is using an outdated custom image that is not compatible with the latest runtime version.

AnswerA

Option A is correct because the error is an image pull failure, which is typically due to a missing image or insufficient permissions.

Why this answer

The UNHEALTHY status in Vertex AI Workbench typically occurs when the instance fails to start its container. The most likely cause is that the specified container image (gcr.io/my-project/my-image:latest) does not exist in the Container Registry, or the service account attached to the instance lacks the storage.objectViewer role on the registry bucket. Without this permission, the instance cannot pull the image, leading to a container creation failure and an UNHEALTHY state.

Exam trap

Google Cloud often tests the distinction between container image availability/permissions and network-level issues; the trap here is that candidates may assume a firewall or resource exhaustion is the cause, but the exhibit's focus on a specific container image points directly to a missing image or insufficient IAM permissions on the Container Registry.

How to eliminate wrong answers

Option B is wrong because while a firewall blocking egress to gcr.io could cause a pull failure, the exhibit does not mention any firewall rules, and the question asks for the 'most likely' cause based on the exhibit—lack of image existence or permissions is a more common and direct issue. Option C is wrong because Compute Engine resource exhaustion (e.g., CPU/memory) would typically cause a timeout or error during instance creation, not a persistent UNHEALTHY status after the instance is running; Vertex AI Workbench handles resource allocation separately. Option D is wrong because an outdated custom image would likely cause compatibility warnings or startup failures, but the exhibit shows a specific container image reference (gcr.io/my-project/my-image:latest), not a custom image issue; the UNHEALTHY status is tied to container pull failures, not image version mismatches.

471

MCQmedium

You are designing a CI/CD pipeline for a containerized application on Google Cloud. The application is built with Cloud Build, stored in Container Registry, and deployed to GKE. The team wants to ensure that only images that pass vulnerability scanning are deployed. What should you do?

A.Add a step in Cloud Build that runs a vulnerability scanner on the image and fails the build if vulnerabilities exceed a threshold.

B.Configure Container Analysis to automatically scan images in Container Registry and block deployment via a webhook.

C.Enable Binary Authorization on the GKE cluster and configure a policy to require an attestation from a trusted authority.

D.Use Security Command Center to detect vulnerabilities and alert the team to manually block deployments.

AnswerA

This integrates scanning into the pipeline, preventing vulnerable images from being pushed.

Why this answer

Option A is correct because Cloud Build can include a custom step that runs a vulnerability scanner (e.g., using the Google Cloud `gcloud container images list-tags` with the `--show-occurrences-from` flag or a third-party tool like Trivy) and then evaluates the results against a threshold. If the scan finds vulnerabilities exceeding the defined threshold, the build step exits with a non-zero status, causing the Cloud Build pipeline to fail and preventing the image from being pushed to Container Registry or deployed. This directly enforces the requirement that only images passing vulnerability scanning proceed in the CI/CD pipeline.
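
The threshold check described above can be sketched in Python. This is a minimal sketch, assuming the scanner produces a list of findings that each carry a `severity` field; that finding format, the `MAX_ALLOWED` thresholds, and the function names are hypothetical and do not reflect any particular scanner's output.

```python
import sys
from collections import Counter

# Hypothetical thresholds: maximum number of findings tolerated per severity.
MAX_ALLOWED = {"CRITICAL": 0, "HIGH": 2}


def count_by_severity(findings):
    """Count findings per severity (assumed format: [{"severity": "HIGH"}, ...])."""
    return Counter(f["severity"].upper() for f in findings)


def passes_gate(findings, max_allowed=MAX_ALLOWED):
    """Return True when no severity exceeds its allowed count."""
    counts = count_by_severity(findings)
    return all(counts.get(sev, 0) <= limit for sev, limit in max_allowed.items())


def main(findings):
    """Return a process exit code: 0 lets the build continue, 1 fails it."""
    if passes_gate(findings):
        print("Vulnerability gate passed")
        return 0
    print("Vulnerability gate failed", file=sys.stderr)
    return 1
```

A build step script would end with something like `sys.exit(main(findings))`, so that a failed gate produces the non-zero exit status that stops the pipeline.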

Exam trap

The trap here is that candidates often confuse Binary Authorization (which requires attestations but does not perform scanning) with vulnerability scanning, or they assume Container Analysis can directly block deployments via a webhook, when in fact it only generates metadata that must be consumed by another policy engine.

How to eliminate wrong answers

Option B is wrong because Container Analysis automatically scans images in Container Registry, but it does not have a built-in webhook mechanism to block deployment; it only generates vulnerability occurrences that must be consumed by another service (e.g., Binary Authorization) to enforce policy. Option C is wrong because Binary Authorization enforces deployment policies based on attestations from trusted authorities, but it does not itself perform vulnerability scanning; it relies on an external attestor to verify the image, and the question requires that only images passing vulnerability scanning are deployed, not that an attestation is required. Option D is wrong because Security Command Center is a security and risk management platform that provides visibility and alerts, but it does not automatically block deployments; it requires manual intervention or integration with other tools to stop a deployment.

472

Multi-Selecthard

A company is designing a highly available architecture for a stateful application on Compute Engine. They need to protect against zonal failures. Which THREE steps should they take?

Select 3 answers

A.Store session state in memory

B.Use a global load balancer with health checks

C.Use a single zone instance group

D.Use regional persistent disks

E.Use a managed instance group across multiple zones

AnswersB, D, E

Distributes traffic and fails over.

Why this answer

To protect against zonal failures: use a global load balancer with health checks (B) to direct traffic to healthy instances; use regional persistent disks (D) that replicate data across zones; and use a managed instance group across multiple zones (E) to distribute instances. Storing session state in memory (A) is not durable. Using a single zone instance group (C) does not provide HA.

473

MCQeasy

A developer runs the command above and sees the output. The cluster has one node pool with 3 nodes, each of type e2-standard-4 (4 vCPU, 16 GB RAM). The application requires at least 2 GB of memory per pod and the cluster has 10 pods that need to be scheduled. The developer also notices that the node pool autoscaling is enabled with a minimum of 1 and maximum of 5 nodes. However, the cluster is unable to schedule all pods. What is the most likely cause?

A.The cluster is running an older version of Kubernetes that does not support node auto-scaling.

B.The node pool autoscaler is not properly configured to scale up based on pod resource requests.

C.The node auto-repair feature is disabled, causing a node to be unhealthy.

D.The pod resource requests exceed the allocatable resources on the existing nodes after accounting for system reservations.

AnswerD

System reservations (kube-reserved, eviction threshold) reduce allocatable CPU and memory, and the pod requests may exceed what is available.

Why this answer

Option D is correct because each e2-standard-4 node offers less than its nominal 4 vCPU and 16 GB RAM to pods. After system reservations (e.g., kubelet, OS, and eviction thresholds), the allocatable memory per node is typically around 13-14 GB, or about 39-42 GB across the 3 nodes, and DaemonSets consume part of that. The 10 pods each request at least 2 GB, so the total memory request is 20 GB or more, and every pod must fit within the remaining allocatable CPU and memory of a single node.

Scaling to the 5-node maximum would raise the total allocatable memory to around 65-70 GB, which is sufficient in aggregate. The most likely cause is therefore that the pod resource requests exceed the allocatable resources on the existing nodes, preventing scheduling, and the autoscaler may not have triggered yet or is constrained by other factors like CPU or node limits.
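
The allocatable-memory estimate can be worked through in a few lines of Python. This is a sketch under stated assumptions: the tier percentages and the 100 MiB eviction threshold follow the memory reservation scheme that GKE documents for nodes and should be checked against current documentation, and the function names are illustrative. It ignores CPU, DaemonSets, and the special case for machines with less than 1 GiB of memory.

```python
# Tiered memory reservation: (size of tier in GiB, fraction reserved).
# Assumed from the GKE node reservation scheme; verify before relying on it.
TIERS = [(4, 0.25), (4, 0.20), (8, 0.10), (112, 0.06), (float("inf"), 0.02)]
EVICTION_THRESHOLD_GIB = 100 / 1024  # 100 MiB held back for kubelet eviction


def allocatable_memory_gib(total_gib):
    """Estimate memory available to pods on a node with total_gib of RAM."""
    reserved = 0.0
    remaining = total_gib
    for span, fraction in TIERS:
        portion = min(remaining, span)
        reserved += portion * fraction
        remaining -= portion
        if remaining <= 0:
            break
    return total_gib - reserved - EVICTION_THRESHOLD_GIB


def pods_that_fit(node_count, node_memory_gib, pod_request_gib):
    """Count pods that fit by memory alone, packing whole pods per node."""
    per_node = int(allocatable_memory_gib(node_memory_gib) // pod_request_gib)
    return node_count * per_node
```

For a 16 GiB node this gives roughly 13.3 GiB allocatable. By memory alone, three such nodes hold 18 pods at a 2 GiB request but only 6 pods at a 5 GiB request, which shows how requests above the 2 GB minimum, or CPU requests, can leave pods unschedulable on the existing nodes.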

Exam trap

Google Cloud often tests the distinction between pod resource requests and limits, and the fact that the Cluster Autoscaler scales based on requests, not limits, leading candidates to overlook system reservations or assume autoscaling is misconfigured.

How to eliminate wrong answers

Option A is wrong because older Kubernetes versions (e.g., 1.15+) do support node autoscaling via the Cluster Autoscaler; the version is unlikely to be the issue. Option B is wrong because the node pool autoscaler is configured to scale based on unschedulable pods, and it does consider pod resource requests; the issue is that the autoscaler may not have scaled up sufficiently or the requests exceed the current node capacity. Option C is wrong because node auto-repair is unrelated to scheduling; it handles node health issues, not resource insufficiency.

474

MCQeasy

A developer is trying to deploy a Compute Engine instance from a Cloud Build step. The build fails with the above error. What is the problem?

A.The project has exceeded its service account quota.

B.The Cloud Build service account lacks 'compute.instances.create' permission.

C.Cloud Build does not have the 'iam.serviceAccounts.actAs' permission on the default compute service account.

D.The developer's personal account does not have permission to use Cloud Build.

AnswerC

When Cloud Build creates a VM, it must act as the VM's service account.

Why this answer

The error occurs because Cloud Build needs to impersonate the Compute Engine default service account to create a VM instance. The Cloud Build service account requires the 'iam.serviceAccounts.actAs' permission on the target service account to delegate its identity. Without this permission, the build step fails even if the Cloud Build service account has 'compute.instances.create' permission.
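
One way to picture the fix is as an extra binding in the IAM policy of the target service account. The sketch below only manipulates a policy represented as a Python dict; it does not call any Google Cloud API. The helper name is hypothetical, and any member string passed in is a placeholder. The predefined role `roles/iam.serviceAccountUser` is used because it contains the `iam.serviceAccounts.actAs` permission.

```python
import copy

# Predefined role that contains iam.serviceAccounts.actAs.
ROLE = "roles/iam.serviceAccountUser"


def grant_act_as(policy, member, role=ROLE):
    """Return a copy of an IAM policy dict with `member` bound to `role`."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the role if there is one.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated
```

The policy being edited here is the one attached to the Compute Engine default service account, and the member is the Cloud Build service account.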

Exam trap

Google Cloud often tests the subtle distinction between having resource-level permissions (like 'compute.instances.create') and the 'actAs' permission required to impersonate a service account, leading candidates to incorrectly choose the missing resource permission.

How to eliminate wrong answers

Option A is wrong because service account quotas are separate from IAM permissions; exceeding a quota would produce a different error (e.g., 'quota exceeded'), not a permission denied error. Option B is wrong because the error message specifically indicates an 'actAs' permission issue, not a missing 'compute.instances.create' permission; if that were the problem, the error would reference 'compute.instances.create' directly. Option D is wrong because Cloud Build uses its own service account for execution, not the developer's personal account; the error is about the Cloud Build service account's permissions, not the developer's.

475

MCQhard

A multinational corporation has deployed a web application across multiple Google Cloud regions using an external HTTPS load balancer with backend services in each region. They recently added a new region (asia-southeast1) and updated the load balancer configuration. After the update, some users in that region report high latency and occasional connection timeouts when accessing the application. The load balancer health checks show all backends as healthy. The network team confirms that the backend instances in asia-southeast1 are correctly configured and can be accessed directly via their external IPs. What should the architects investigate next?

A.Check the Cloud CDN cache settings for the new region

B.Verify that the backend service in asia-southeast1 has the correct timeout settings for the load balancer

C.Ensure that the firewall rules allow traffic from the load balancer's health check ranges to the instances

D.Review the Cloud Armor security policy rules that might be blocking traffic from that region

AnswerD

Cloud Armor geo-filtering may block traffic from that region while allowing health checks from Google IPs.

Why this answer

Option D is correct because Cloud Armor security policies can block traffic based on geographic location. If the new region (asia-southeast1) was added but the Cloud Armor policy was not updated to allow traffic from that region, requests from users in asia-southeast1 could be denied or rate-limited, causing high latency and timeouts even though health checks (which originate from Google's health check ranges, not user IPs) show backends as healthy. The direct access via external IPs works because it bypasses the load balancer and its associated Cloud Armor policy.

Exam trap

Google Cloud often tests the misconception that health check success implies full end-to-end connectivity. Health check probes reach the backends directly and are not evaluated against the Cloud Armor policies that apply to user traffic, so healthy backends do not guarantee that user traffic is allowed.

How to eliminate wrong answers

Option A is wrong because Cloud CDN cache settings affect content delivery speed and cache hit ratio, not connection timeouts or high latency caused by traffic blocking; CDN would not cause timeouts if the origin is reachable. Option B is wrong because timeout settings on the backend service control how long the load balancer waits for a response from the backend, but since health checks pass and direct access works, timeouts are not the issue; incorrect timeouts would affect all users, not just those in the new region. Option C is wrong because firewall rules for health check ranges are already correctly configured (health checks show all backends as healthy), and the issue is with user traffic, not health check probes; the network team confirmed backend instances are reachable via external IPs, indicating no firewall blockage.

476

Multi-Selecthard

Which THREE Google Cloud services can be used to implement a zero-trust architecture for network security? (Choose three.)

Select 3 answers

A.Cloud Armor

B.Access Context Manager (ACM)

C.Identity-Aware Proxy (IAP)

D.VPC Networks

E.Cloud VPN

AnswersA, B, C

Cloud Armor provides WAF and DDoS protection at the edge, enforcing security policies.

Why this answer

Options A, B, and C are correct. Cloud Armor enforces security policies at the edge, Access Context Manager defines access levels based on device, IP, etc., to enforce fine-grained access, and Identity-Aware Proxy (IAP) verifies identity and context before granting access. Option D is wrong because VPC Networks are the underlying network, but zero trust requires controls that go beyond the network perimeter.

Option E is wrong because Cloud VPN is for network connectivity, not zero-trust security.

477

MCQmedium

Alice needs to read objects in the bucket 'secret-bucket'. Based on the IAM policy, what is her effective access?

A.Alice cannot read objects because the deny rule overrides all allow bindings.

B.Alice can read objects only if she also has objectCreator role.

C.Alice can read objects because objectAdmin grants read access and is not denied.

D.Alice cannot read objects because the deny rule removes objectViewer and she has no other read access.

AnswerC

objectAdmin includes read, and the deny only applies to objectViewer.

Why this answer

The deny rule targets objectViewer specifically on secret-bucket. Alice also has objectAdmin, which the deny rule does not cover, so she can read objects via objectAdmin. Option A is incorrect because the deny rule applies only to the specified role and does not override every allow binding.

Option B is incorrect because objectAdmin already includes read access, so objectCreator is not needed. Option D is incorrect because Alice still has read access through objectAdmin.

478

MCQeasy

A service account needs to be able to start and stop Compute Engine instances in a specific project. Which IAM role should be assigned at the project level?

A.roles/iam.serviceAccountUser

B.roles/editor

C.roles/compute.viewer

D.roles/compute.instanceAdmin.v1

AnswerD

Grants necessary permissions to start and stop instances.

Why this answer

The correct answer is D, roles/compute.instanceAdmin.v1, because this role grants the necessary permissions to start, stop, and manage Compute Engine instances, including compute.instances.start and compute.instances.stop, at the project level. This role is specifically designed for managing compute resources without granting broader project-level access like editing all resources.

Exam trap

Google Cloud often tests the distinction between primitive roles (like roles/editor) and predefined roles (like roles/compute.instanceAdmin.v1), where candidates mistakenly choose the broader role due to its apparent convenience, overlooking the principle of least privilege and the specific permissions required for the task.

How to eliminate wrong answers

Option A is wrong because roles/iam.serviceAccountUser grants permission to impersonate service accounts, not to manage Compute Engine instances; it allows attaching a service account to a resource but does not include compute.instances.start or compute.instances.stop. Option B is wrong because roles/editor is a broad, primitive role that grants full edit access to all resources in the project, including Compute Engine, but it violates the principle of least privilege by providing excessive permissions beyond what is needed for instance management. Option C is wrong because roles/compute.viewer only provides read-only permissions to view Compute Engine resources (e.g., compute.instances.list, compute.instances.get) and does not include any write or action permissions like starting or stopping instances.

479

MCQeasy

A company stores sensitive data in Cloud Storage and wants to enforce encryption at rest using customer-managed keys. Which Google Cloud service should they use to manage the keys?

A.Cloud HSM

B.Secret Manager

C.Cloud KMS

D.IAM

AnswerC

Manages customer-managed encryption keys for Cloud Storage.

Why this answer

Cloud KMS (Key Management Service) is the correct choice because it is the native Google Cloud service for managing cryptographic keys, including customer-managed encryption keys (CMEK). It allows you to create, rotate, and control access to keys used to encrypt data at rest in Cloud Storage, and it integrates directly with Cloud Storage's CMEK feature. Cloud HSM is a hardware-backed key management option but is built on top of Cloud KMS, not a separate service for key management.

Exam trap

The trap here is that candidates confuse Cloud HSM as a separate key management service, but Cloud HSM is actually a hardware-backed key storage option that requires Cloud KMS for key management, not a replacement for it.

How to eliminate wrong answers

Option A is wrong because Cloud HSM is a hardware security module service that provides FIPS 140-2 Level 3 validated key storage, but it is an add-on to Cloud KMS, not a standalone key management service; you still use Cloud KMS to manage the keys stored in HSM. Option B is wrong because Secret Manager is designed to store and manage secrets such as API keys, passwords, and certificates, not for managing encryption keys used for data at rest in Cloud Storage. Option D is wrong because IAM (Identity and Access Management) is a service for managing access control and permissions, not for creating, storing, or managing encryption keys.

480

MCQeasy

You manage a batch data processing workload on Compute Engine that runs daily on a single VM. The VM uses a standard persistent disk (pd-standard) for input data and output results. Recently, the VM crashed due to a hardware failure, and the job failed. You need to implement a solution that automatically recovers from VM failures with minimal data loss. The job is idempotent and can restart from the beginning if necessary. Which solution should you choose?

A.Take a snapshot of the persistent disk every hour and create a new VM from the latest snapshot on failure

B.Use Cloud Scheduler to restart the VM every hour until the job completes

C.Add a startup script to the existing VM to rerun the job on boot, and enable automatic restart

D.Create a managed instance group (MIG) with an instance template that includes a startup script to run the job, and enable autohealing

AnswerD

Correct: MIG autohealing recreates VM on failure.

Why this answer

Option D is correct because a managed instance group (MIG) with autohealing automatically recreates a VM instance when it fails, and the startup script ensures the idempotent job reruns from the beginning on the new VM. This minimizes data loss by using the same persistent disk (or a fresh one) and leverages Compute Engine's health check mechanism to detect failure and trigger recovery without manual intervention.
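
What makes a job safe to rerun from the beginning on a recreated VM can be sketched as follows. This is an illustrative example, not the workload from the question: the uppercase transform is a placeholder, and `run_job` and its paths are hypothetical. The point is that each run recomputes the full output and publishes it with an atomic rename, so a crash mid-run never leaves a partial result behind.

```python
import os
import tempfile


def run_job(input_path, output_path):
    """Recompute the full output from the input; safe to rerun from scratch."""
    with open(input_path, encoding="utf-8") as src:
        # Placeholder transform: keep non-empty lines, uppercased.
        lines = [line.strip().upper() for line in src if line.strip()]

    # Write to a temp file in the destination directory, then rename it into
    # place atomically so readers only ever see a complete output file.
    out_dir = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as dst:
            dst.write("\n".join(lines) + "\n")
        os.replace(tmp_path, output_path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return len(lines)
```

A startup script in the instance template could invoke a job shaped like this on every boot; running it twice yields the same output file.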

Exam trap

The trap here is that candidates confuse automatic restart (which only works for transient failures on the same VM) with autohealing (which recreates the VM after hardware failure), leading them to pick Option C instead of D.

How to eliminate wrong answers

Option A is wrong because hourly snapshots introduce up to 1 hour of potential data loss and require manual steps to create a new VM from the snapshot, which does not provide automatic recovery. Option B is wrong because Cloud Scheduler restarting the VM every hour does not detect actual VM failure; it blindly restarts on a schedule, which could interrupt a running job and does not address hardware failure recovery. Option C is wrong because enabling automatic restart on a single VM only recovers from transient failures (e.g., host maintenance), not from hardware failures that destroy the VM; the VM must be recreated, and a startup script on a dead VM cannot execute.

481

MCQhard

An e-commerce platform uses Cloud Spanner for order processing. Recently, latency spikes have occurred during flash sales. The team suspects hot spots due to monotonically increasing order IDs. Which table design change would best solve this?

A.Remove the primary key and let Spanner auto-generate it.

B.Use interleaved tables to store orders under customers.

C.Add a random prefix to the order ID primary key.

D.Create a secondary index on the timestamp column.

AnswerC

Randomizing the first part of the key distributes writes across splits.

Why this answer

Monotonically increasing primary keys (like sequential order IDs) cause hot spots in Cloud Spanner because all writes are directed to a single split (tablet), overwhelming that node. Adding a random prefix (e.g., a hash of the customer ID) distributes writes across multiple splits, eliminating the hot spot and reducing latency spikes during high-throughput flash sales.
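
The key-distribution idea can be sketched in Python. Note the substitution: instead of a truly random prefix, this sketch derives the prefix from a hash of the order ID, so the same prefix can be recomputed when the row is read back. The shard count and the function name are hypothetical, and the code only builds key tuples; it does not talk to Cloud Spanner.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical number of key prefixes


def sharded_key(order_id, num_shards=NUM_SHARDS):
    """Return (shard, order_id), a composite key whose first part spreads writes."""
    digest = hashlib.sha256(str(order_id).encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % num_shards
    return (shard, order_id)
```

Sequential order IDs now map to different shard prefixes, so consecutive inserts no longer sort next to each other at the end of the key space. The trade-off is that range scans over order IDs must fan out across all prefixes.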

Exam trap

Google Cloud often tests the misconception that secondary indexes or interleaved tables can fix write hot spots, when in reality only primary key distribution strategies (like hash prefixes) address the root cause of split-level contention.

How to eliminate wrong answers

Option A is wrong because removing the primary key and relying on auto-generation still produces monotonically increasing values (e.g., Spanner's auto-generated keys are sequential), which does not solve the hot spot issue. Option B is wrong because interleaved tables organize child rows under a parent row, but if the parent key is monotonically increasing, writes still concentrate on the same split, failing to distribute load. Option D is wrong because a secondary index on the timestamp column does not affect the distribution of primary key writes; it only helps query performance, not write hot spots.

482

Multi-Selectmedium

Which TWO actions reduce egress costs when transferring data from Compute Engine to the internet? (Choose 2)

Select 2 answers

A.Use Cloud NAT for outbound traffic

B.Use Cloud CDN to cache content

C.Move instances to a lower-cost region

D.Use Premium Tier networking

E.Compress data before sending it

AnswersB, E

Reduces origin egress by serving from edge caches.

Why this answer

Options B and E are correct. Cloud CDN caches content at edge locations, which reduces egress from the origin. Compressing data before sending it reduces the volume transferred, and therefore the egress charge, although it is not always applicable and may affect latency.

Option A is wrong because Cloud NAT does not reduce egress. Option C is wrong because moving instances to a different region does not reduce the egress cost. Option D is wrong because Premium Tier networking does not reduce egress costs and may increase them.
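
As a rough illustration of the compression point, the snippet below measures how many bytes a payload occupies before and after gzip. The helper name is hypothetical, and actual savings depend heavily on the data: repetitive text such as JSON compresses well, while already-compressed media barely shrinks.

```python
import gzip


def compression_savings(payload: bytes):
    """Return (original_size, compressed_size) in bytes for a gzip-compressed payload."""
    compressed = gzip.compress(payload)
    return len(payload), len(compressed)
```

Because egress is billed on bytes transferred, the second number is the one that matters for responses served with gzip encoding.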

483

MCQeasy

An application running on Compute Engine instances behind a load balancer experiences intermittent failures. Health checks show instances passing, but some users get errors. What should be the first troubleshooting step?

A.Increase instance size.

B.Review the application logs for errors.

C.Enable HTTP health checks.

D.Check the load balancer's backend service configuration for session affinity.

AnswerB

Logs reveal application-level errors.

Why this answer

The correct first step is to review the application logs (Option B) because the issue is intermittent failures despite healthy load balancer health checks. Since health checks confirm the instances are reachable and responding correctly at the health check endpoint, the problem likely lies within the application itself—such as request handling errors, timeouts, or resource contention. Application logs provide the most direct evidence of what is happening when users encounter errors, enabling targeted debugging before modifying infrastructure.

Exam trap

The trap here is that candidates assume health checks are the cause of user errors. Google Cloud tests the distinction between infrastructure-level health (passing) and application-level errors (logged), and candidates who miss it adjust health checks or backend configuration instead of inspecting application logs.

How to eliminate wrong answers

Option A is wrong because increasing instance size addresses resource constraints (CPU/memory) but does not target the root cause of intermittent errors when health checks pass; it is a reactive scaling action, not a diagnostic step. Option C is wrong because enabling HTTP health checks (if not already enabled) would only change the health check protocol from TCP to HTTP, but the instances are already passing health checks, so the issue is not with health check configuration. Option D is wrong because checking the load balancer's backend service configuration for session affinity is premature; session affinity (sticky sessions) could cause uneven load distribution but would not explain intermittent errors if health checks are passing—this is a configuration review step, not the first troubleshooting action.

484

MCQeasy

A team wants to allow a service account to be used only on specific Compute Engine VMs. Which IAM condition should be applied to the service account's roles?

A.resource.service

B.resource.owner

C.resource.name

D.resource.type

E.resource.labels

AnswerC

Correct. resource.name can be used to restrict to specific resources.

Why this answer

Option C is correct because the `resource.name` IAM condition allows you to restrict a service account's roles to specific Compute Engine VM instances by matching the VM's resource name (e.g., `projects/project-id/zones/zone/instances/instance-name`). This ensures the service account can only be used on designated VMs, enforcing fine-grained access control.

Exam trap

Google Cloud often tests the misconception that `resource.type` or `resource.labels` can restrict access to a specific VM, but only `resource.name` provides a unique identifier for a single instance, while labels are for grouping and can change over time.

How to eliminate wrong answers

Option A is wrong because `resource.service` is not a valid IAM condition attribute for Compute Engine VMs; it is used in other services like Cloud Storage to match the service name. Option B is wrong because `resource.owner` is not a standard IAM condition attribute; IAM conditions use resource attributes like `resource.name`, not ownership metadata. Option D is wrong because `resource.type` refers to the resource type (e.g., `compute.googleapis.com/Instance`), which cannot narrow access to specific VMs—it applies to all VMs of that type.

Option E is wrong because `resource.labels` can filter VMs by label key-value pairs, but it does not uniquely identify a specific VM instance; labels are mutable and can be shared across multiple VMs, making them unsuitable for restricting to a single VM.

485

MCQmedium

A company uses Cloud Logging to monitor their application logs. They notice that some logs from their Compute Engine instances are missing. The instances have the required logging permission. What is the most likely cause?

A.The log sink is not configured correctly.

B.The logging agent is not configured to send logs to Cloud Logging.

C.The instances are using a custom image without the logging agent.

D.The log bucket is in a different project.

E.The log entries are being filtered by the exclusion filter.

AnswerB

The logging agent must be installed and configured to forward logs.

Why this answer

Compute Engine instances do not automatically send logs to Cloud Logging. They require the Cloud Logging agent (based on fluentd) to be installed and configured to forward logs. Even with correct IAM permissions, without the agent, logs will not be collected.

Option B correctly identifies an agent that is not set up to forward logs as the most likely cause.

Exam trap

Google Cloud often tests the distinction between log collection (agent) and log routing (sinks) — the trap here is that candidates assume IAM permissions alone are sufficient, overlooking the mandatory agent installation and configuration step.

How to eliminate wrong answers

Option A is wrong because a log sink controls where logs are routed (e.g., to BigQuery or Pub/Sub), not whether logs are collected from instances; missing logs are a collection issue, not a routing issue. Option C is wrong because while a custom image might lack the agent, the question states the instances have the required logging permission, implying the agent could be installed separately; the most likely cause is the agent not being configured, not the image itself. Option D is wrong because log buckets in a different project would still receive logs if the sink is configured correctly; the issue is logs not appearing at all, not appearing in the wrong project.

Option E is wrong because exclusion filters remove logs after they are ingested; if logs are missing entirely, they were never ingested, so exclusion is not the cause.

486

MCQhard

An organization has a multi-regional deployment of a stateful application on GKE using regional persistent disks. They need to implement disaster recovery with an RPO of less than 1 hour and RTO of 30 minutes. What is the most cost-effective approach?

A.Use zonal persistent disks and take snapshots every 45 minutes, then restore in secondary region.

B.Use regional persistent disks with asynchronous replication to a secondary region and deploy GKE clusters in both regions with a load balancer directing traffic.

C.Use a third-party replication tool to asynchronously replicate data to the secondary region.

D.Use Cloud Storage FUSE to write state to a multi-regional bucket and read from secondary cluster.

AnswerB

Regional persistent disks already replicate across zones within a region; adding asynchronous cross-region replication meets the RPO and RTO.

Why this answer

Option B is correct because regional persistent disks with asynchronous replication provide built-in, managed replication to a secondary region, meeting the RPO of less than 1 hour and RTO of 30 minutes without additional infrastructure costs. By deploying GKE clusters in both regions and using a load balancer, traffic can be redirected to the secondary cluster within the RTO, making this the most cost-effective approach as it avoids third-party tools or complex manual processes.

Exam trap

The trap here is that candidates often confuse the cost-effectiveness of snapshots (Option A) with the need for low RPO/RTO, overlooking that snapshot-based recovery cannot meet sub-hour RTOs due to restore times, while regional persistent disk replication provides near-continuous replication at a lower total cost than third-party tools.

How to eliminate wrong answers

Option A is wrong because zonal persistent disks with snapshots every 45 minutes cannot guarantee an RPO of less than 1 hour due to snapshot consistency delays and the time required to restore volumes in a secondary region, which would exceed the 30-minute RTO. Option C is wrong because using a third-party replication tool introduces additional licensing, operational overhead, and potential compatibility issues, making it less cost-effective than Google's native asynchronous replication. Option D is wrong because Cloud Storage FUSE introduces significant latency and consistency challenges for stateful applications, and multi-regional buckets do not provide the low-latency, consistent storage required for a stateful application's RPO and RTO targets.

487

MCQmedium

A company wants to migrate on-premises workloads to Google Cloud. They need to assess the existing infrastructure, plan the migration, and track progress. Which tool should they use?

A.Cloud Endpoints.

B.Cloud Deployment Manager.

C.Cloud Foundation Toolkit.

D.Migrate for Compute Engine.

AnswerD

Provides assessment and migration capabilities.

Why this answer

Migrate for Compute Engine (formerly Velostrata) is the correct tool because it is specifically designed to assess, plan, and migrate on-premises workloads to Google Cloud. It provides discovery of existing infrastructure, generates migration plans, and tracks progress through a dashboard, directly addressing the need for assessment, planning, and tracking.

Exam trap

The trap here is that candidates may confuse Cloud Foundation Toolkit (a foundation setup tool) with a migration tool, or assume Cloud Deployment Manager can handle migration planning, when in fact only Migrate for Compute Engine provides the full assessment-to-tracking workflow.

How to eliminate wrong answers

Option A is wrong because Cloud Endpoints is an API management service for securing and monitoring APIs, not a migration assessment or planning tool. Option B is wrong because Cloud Deployment Manager is an infrastructure-as-code tool for deploying Google Cloud resources using templates, not for assessing or migrating on-premises workloads. Option C is wrong because Cloud Foundation Toolkit provides Terraform templates and best practices for setting up a Google Cloud foundation (e.g., projects, networking), but it does not include discovery, assessment, or migration tracking for existing on-premises workloads.

488

MCQhard

An organization is running a stateful workload on Compute Engine with a single persistent disk. They want to migrate to a regional persistent disk for higher availability. The disk is 500 GB and currently 80% full. They need zero downtime during the migration. What is the recommended approach?

A.Attach a new regional disk to the instance and use RAID 1 mirroring.

B.Create a snapshot of the disk, then create a new regional persistent disk from that snapshot, and attach it to the instance.

C.Use rsync to copy data to a new regional disk while the instance is running.

D.Use gcloud compute disks resize to change the disk type to regional.

AnswerB

This is the recommended migration path; the snapshot can be taken while the instance keeps running.

Why this answer

Option B is correct because creating a snapshot of the existing persistent disk and then creating a new regional persistent disk from that snapshot allows you to attach the new disk to the instance with zero downtime. The snapshot captures the disk state at a point in time, and the regional disk is created asynchronously; once available, you can detach the original disk and attach the regional disk without stopping the instance, as Compute Engine supports live disk attachment/detachment.

Exam trap

Google Cloud often tests the misconception that you can change a disk's type in-place using a resize or update command, but the only supported way to switch from zonal to regional is to create a new disk from a snapshot or image.

How to eliminate wrong answers

Option A is wrong because RAID 1 mirroring requires two disks of the same type and is not a supported feature for attaching a regional disk to a running instance; it would also require downtime to configure the RAID array. Option C is wrong because rsync does not provide a consistent point-in-time copy of a disk that is actively being written to, risking data inconsistency and requiring application-level quiescence to avoid corruption. Option D is wrong because gcloud compute disks resize does not support changing a disk's type from zonal to regional; you must create a new regional disk from a snapshot or image, not modify the existing disk.

489

MCQmedium

A web application running on Compute Engine behind a global HTTP(S) load balancer experiences high latency during traffic spikes. Which quick fix would best address this issue without changing the architecture?

A.Configure managed instance group autoscaling to add more instances.

B.Enable Cloud CDN on the load balancer.

C.Switch to a regional load balancer to reduce latency.

D.Increase the machine type of the backend instances.

AnswerA

Horizontal scaling quickly increases capacity.

Why this answer

Managed instance group (MIG) autoscaling dynamically adds more instances when CPU utilization or other metrics exceed a threshold, directly absorbing the increased traffic during spikes. This is the quickest fix because it requires no architectural changes—just configuring autoscaling parameters on the existing MIG. By scaling out horizontally, the load balancer can distribute requests across more backends, reducing per-instance load and latency.
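
The scale-out reasoning above can be sketched as simple arithmetic. This is a simplified model, not the actual Compute Engine autoscaler algorithm (which also applies stabilization and cooldown behavior); the function name and the percent-based parameters are assumptions made for illustration.

```python
def recommended_instances(current_instances: int,
                          avg_cpu_percent: int,
                          target_cpu_percent: int) -> int:
    """Estimate the instance count that brings average CPU back to the target.

    Simplified model: total load is assumed to spread evenly across instances.
    """
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in (0, 100]")
    total_load = current_instances * avg_cpu_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-total_load // target_cpu_percent)
    return max(1, needed)


# A spike pushes 10 instances to 90% CPU while the target is 60%.
print(recommended_instances(10, 90, 60))
```

Under these assumptions the group grows until the same load averages out at or below the target utilization.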

Exam trap

Google Cloud often tests the distinction between horizontal scaling (autoscaling) and vertical scaling (increasing machine type) or caching solutions, leading candidates to choose Cloud CDN or machine type changes as a 'quick fix' when the real issue is insufficient compute capacity to handle dynamic request spikes.

How to eliminate wrong answers

Option B is wrong because enabling Cloud CDN caches static content at edge locations, which does not help with high latency caused by dynamic request processing during traffic spikes—CDN only reduces latency for cacheable content, not for the dynamic workload that is overwhelming the backend. Option C is wrong because switching to a regional load balancer would actually increase latency for global users, as it lacks the anycast IP and global distribution of the global HTTP(S) load balancer, and it requires architectural changes (e.g., changing the load balancer type). Option D is wrong because increasing the machine type (vertical scaling) is not a quick fix—it requires instance recreation or rolling update, and it does not scale as elastically as horizontal autoscaling; it also may not handle sudden spikes as effectively as adding more instances.

490

Multi-Selecteasy

A company is using BigQuery for data analytics. They want to optimize costs while maintaining query performance. Which TWO actions should they take? (Choose 2.)

Select 2 answers

A.Use reserved slots with flat-rate pricing.

B.Always use SELECT *.

C.Partition tables by date.

D.Materialize frequently used queries as tables.

E.Use clustering on frequently filtered columns.

AnswersC, E

Partitioning reduces the amount of data scanned, lowering costs.

Why this answer

Partitioning tables by date (Option C) is correct because it allows BigQuery to prune partitions during query execution, scanning only the relevant date ranges instead of the entire table. This reduces the amount of data processed, directly lowering query costs under on-demand pricing while maintaining performance through reduced I/O.
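
A rough way to see the effect of partition pruning is to compare bytes scanned with and without it. The sketch below assumes equally sized daily partitions and a query that filters on the partitioning column; the function and parameter names are hypothetical.

```python
def bytes_scanned(daily_partition_bytes: int,
                  days_in_table: int,
                  days_queried: int,
                  partitioned: bool) -> int:
    """Estimate bytes read by a query that filters on a date range."""
    if partitioned:
        # Only the partitions inside the date filter are read.
        return daily_partition_bytes * min(days_queried, days_in_table)
    # Without partitioning, the whole table is scanned.
    return daily_partition_bytes * days_in_table


GIB = 2**30
full = bytes_scanned(10 * GIB, 365, 7, partitioned=False)
pruned = bytes_scanned(10 * GIB, 365, 7, partitioned=True)
print(full // GIB, pruned // GIB)
```

Since on-demand pricing is based on data processed, the smaller scan translates directly into a lower query cost.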

Exam trap

Google Cloud often tests the distinction between cost optimization and performance optimization, and the trap here is that candidates might choose reserved slots (Option A) thinking it always reduces costs, when in fact it is a pricing model that only benefits sustained high usage, not a direct cost-reduction technique for typical query patterns.

491

Multi-Selecteasy

A company deploys a critical application on Google Kubernetes Engine (GKE) and wants to ensure high availability during cluster upgrades. Which TWO practices should they follow?

Select 2 answers

A.Use a single-zone node pool with multiple replicas.

B.Use multiple node pools across different zones within the cluster.

C.Configure PodDisruptionBudgets to allow only a small number of pods to be unavailable during upgrades.

D.Enable cluster autoscaling to add nodes during upgrades.

E.Enable regional clusters for multi-zone control plane.

AnswersB, C

Multi-zone node pools allow pods to be rescheduled in other zones during upgrades.

Why this answer

Option B is correct because deploying multiple node pools across different zones ensures that if one zone fails or is taken down for maintenance, the application can continue serving from the other zones. This aligns with GKE's best practice for high availability by distributing workloads across failure domains. Option C is correct because PodDisruptionBudgets (PDBs) define the minimum number of pods that must remain available during voluntary disruptions like cluster upgrades, preventing the upgrade from taking down too many replicas at once.

Exam trap

The trap here is that candidates often confuse control plane high availability (regional clusters) with application-level high availability, or they assume autoscaling can compensate for disruption during upgrades, when in fact PDBs and multi-zone node pools are the correct mechanisms.

492

MCQeasy

A startup runs a web application on App Engine standard environment. They want to ensure the application can handle sudden traffic spikes without manual intervention. Which App Engine feature should they configure?

A.Manual scaling with a fixed number of instances.

B.Basic scaling with automatic instance creation.

C.Resident instances with a minimum number of always-on instances.

D.Custom scaling based on CPU utilization.

E.Automatic scaling with a maximum number of idle instances.

AnswerE

Automatic scaling dynamically creates instances to handle traffic spikes.

Why this answer

Option E is correct because App Engine's automatic scaling with a maximum number of idle instances is designed to handle sudden traffic spikes by dynamically creating and removing instances based on request load. This configuration allows the application to scale up quickly when traffic increases, ensuring responsiveness without manual intervention, while the maximum idle instances setting prevents over-provisioning and controls costs.

Exam trap

The trap here is that candidates often confuse 'basic scaling' with 'automatic scaling' because both involve dynamic instance creation, but basic scaling does not maintain idle instances and is unsuitable for handling sudden traffic spikes without latency.

How to eliminate wrong answers

Option A is wrong because manual scaling with a fixed number of instances requires manual intervention to adjust capacity, which does not handle sudden traffic spikes automatically. Option B is wrong because basic scaling creates instances only when a request is received and shuts them down after processing, leading to cold starts and latency under sudden spikes, and it does not maintain a pool of idle instances for immediate handling. Option C is wrong because resident instances with a minimum number of always-on instances are a feature of manual scaling, not automatic scaling, and they do not dynamically scale up or down in response to traffic spikes.

Option D is wrong because custom scaling based on CPU utilization is not a native App Engine scaling type; App Engine offers automatic, basic, and manual scaling, and custom scaling is not a supported configuration option.

493

MCQeasy

A company wants to use Cloud Armor to protect their HTTP load balancer from SQL injection attacks. Which rule action should they configure to block malicious requests?

A.Use a pre-configured WAF rule that includes 'evaluatePreconfiguredExpr('sqli-stable')' with action 'deny(403)'.

B.Configure a rate-limiting rule with action 'rateLimit' to throttle traffic from suspicious IPs.

C.Create a rule that redirects traffic to a reCAPTCHA challenge for validation.

D.Set a security policy rule with action 'deny(403)' and a simple condition on the user-agent header.

AnswerA

Cloud Armor's pre-configured WAF rules detect common SQL injection signatures.

Why this answer

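Option A is correct because Cloud Armor's pre-configured WAF rules include the 'sqli-stable' expression set, which detects SQL injection patterns, and the 'deny(403)' action blocks matching requests. Option B is wrong because 'rateLimit' limits request rate but does not inspect requests for SQL injection. Option C is wrong because redirecting to a reCAPTCHA challenge validates the client but does not block malicious payloads.

Option D is wrong because 'deny(403)' with a simple condition on the user-agent header does not inspect the request for SQL injection patterns.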
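
As an illustration, a rule of this kind could be expressed as a security policy rule body like the one below. The field names follow the Compute Engine security policy rule resource; the priority value, description, and helper function name are assumptions chosen for the example.

```python
import json


def sqli_deny_rule(priority: int = 1000, preview: bool = False) -> dict:
    """Build a rule body that denies requests matching the SQLi expression set."""
    return {
        "priority": priority,  # arbitrary example priority
        "description": "Block SQL injection attempts",
        "action": "deny(403)",
        "preview": preview,  # True logs matches without enforcing the action
        "match": {
            "expr": {
                "expression": "evaluatePreconfiguredExpr('sqli-stable')"
            }
        },
    }


print(json.dumps(sqli_deny_rule(), indent=2))
```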

494

MCQeasy

A company runs a web application on Compute Engine instances behind a global HTTP(S) Load Balancer. The application uses Cloud SQL for MySQL for user data. Users report that during peak hours, the page load times increase significantly. The development team notices that the number of database connections exceeds the maximum allowed, causing some requests to fail. The application is designed to use connection pooling with a maximum pool size of 100 connections per instance. There are currently 10 instances. The Cloud SQL instance is configured with 4 vCPUs and 15 GB memory, and the maximum connections is set to 400. The application team wants to minimize cost while resolving the issue. What should the architect recommend?

A.Reduce the max pool size per instance to 40 connections.

B.Increase the Cloud SQL instance tier to have more vCPUs and memory.

C.Implement connection pooling at the global HTTP(S) Load Balancer level.

D.Use Cloud SQL Proxy with connection pooling.

AnswerA

This reduces total connections to 400, matching the Cloud SQL max and resolving the issue at no extra cost.

Why this answer

Option A is correct. With 10 instances each using a pool of 100 connections, the potential connection count is 1,000, far exceeding the Cloud SQL maximum of 400. Reducing the pool size to 40 per instance brings the total to 400, fitting within the limit without additional cost.

Option B increases cost unnecessarily, as the current tier can handle the load when connections are properly sized. Option C is not feasible because load balancers do not manage database connection pooling. Option D does not reduce the total connection count and adds complexity without solving the core issue.
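
The sizing arithmetic can be checked with a short sketch. The helper name and the optional `reserved_connections` parameter (headroom for admin or maintenance sessions) are assumptions, not part of the question.

```python
def max_pool_size_per_instance(db_max_connections: int,
                               app_instances: int,
                               reserved_connections: int = 0) -> int:
    """Largest per-instance pool that keeps total connections within the DB limit."""
    if app_instances < 1:
        raise ValueError("app_instances must be at least 1")
    usable = db_max_connections - reserved_connections
    return max(0, usable // app_instances)


instances = 10
current_pool_size = 100
db_max_connections = 400

print(instances * current_pool_size)  # potential connections today
print(max_pool_size_per_instance(db_max_connections, instances))
```

If the instance group can scale beyond 10 instances, the same calculation would need to use the maximum instance count rather than the current one.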

495

MCQhard

An organization runs a Kubernetes cluster on GKE with cluster autoscaling enabled. They notice that pods are frequently in 'Pending' state due to insufficient CPU, but the cluster autoscaler does not add nodes quickly enough. What is the most likely cause?

A.The cluster autoscaler is using the 'least-waste' expander.

B.The horizontal pod autoscaler (HPA) is misconfigured.

C.The pod disruption budget (PDB) is too restrictive.

D.The node pool has reached the maximum node count limit.

AnswerD

Cluster autoscaler cannot exceed max node limit.

Why this answer

Option D is correct because the cluster autoscaler cannot add new nodes if the node pool has already reached its maximum node count limit. This limit is configured at the node pool level in GKE, and once reached, the autoscaler will not scale up further, leaving pods in 'Pending' state due to insufficient CPU resources.
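
A toy model of this cap, ignoring bin-packing and per-pod constraints, might look like the following. The function name and the millicore figures are hypothetical.

```python
def scale_up(current_nodes: int,
             max_nodes: int,
             pending_cpu_millicores: int,
             allocatable_cpu_per_node: int) -> tuple[int, int]:
    """Return (node count after scale-up, CPU millicores still unschedulable)."""
    # Ceiling division: nodes needed to fit all pending CPU requests.
    needed = -(-pending_cpu_millicores // allocatable_cpu_per_node)
    target = min(current_nodes + needed, max_nodes)
    added = max(0, target - current_nodes)
    leftover = max(0, pending_cpu_millicores - added * allocatable_cpu_per_node)
    return max(target, current_nodes), leftover


# Pool already at its maximum: nothing is added and pods stay Pending.
print(scale_up(10, 10, 4000, 1930))
# Pool with headroom for two more nodes: most of the pending CPU fits.
print(scale_up(8, 10, 4000, 1930))
```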

Exam trap

Google Cloud often tests the distinction between pod-level scaling (HPA) and node-level scaling (cluster autoscaler), and the trap here is that candidates confuse a restrictive PDB with a node pool limit, or assume the expander strategy directly causes scaling delays.

How to eliminate wrong answers

Option A is wrong because the 'least-waste' expander selects a node pool that minimizes resource waste after scaling, but it does not prevent the autoscaler from adding nodes; it only affects which node pool is chosen. Option B is wrong because the HPA scales pods based on CPU or memory utilization, not nodes; a misconfigured HPA would cause incorrect pod scaling, not a delay in node addition by the cluster autoscaler. Option C is wrong because a pod disruption budget (PDB) controls the number of pods that can be voluntarily disrupted during maintenance or upgrades, not the ability of the cluster autoscaler to add nodes.

496

Multi-Selecteasy

Which TWO methods can be used to encrypt data at rest in BigQuery?

Select 2 answers

A.Use a Cloud Storage bucket with bucket-level default encryption.

B.Use Customer-Managed Encryption Keys (CMEK) via Cloud KMS.

C.Use Cloud SQL with encryption at rest.

D.Use Customer-Supplied Encryption Keys (CSEK).

E.Use Cloud Bigtable with encryption at rest.

AnswersB, D

BigQuery tables can use CMEK.

Why this answer

BigQuery supports both CMEK (Cloud KMS key) and CSEK (customer-supplied encryption key) for data at rest encryption. Option A is for Cloud Storage, not BigQuery. Option C is for Cloud SQL.

Option E is for Bigtable.

497

MCQmedium

Refer to the exhibit. A user alice@example.com is unable to list objects in bucket 'bucket-b'. What is the most likely reason?

A.The condition restricts access only to bucket-a.

B.The IAM policy is missing the roles/storage.objectAdmin role.

C.The condition expression is invalid.

D.The user needs the roles/storage.legacyBucketReader role.

AnswerA

The condition expression limits the role to buckets with names starting with 'bucket-a'.

Why this answer

Option A is correct. The IAM condition restricts the storage.objectViewer role to buckets whose names start with 'bucket-a'. Since alice is trying to list objects in bucket-b, the condition prevents access.

Option B is incorrect because the objectViewer role is sufficient to list objects. Option C is incorrect because the condition expression is valid. Option D is incorrect because legacyBucketReader is not required for listing.
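
The exhibit is not reproduced here, so the following is only a sketch of how a prefix condition of this kind behaves. It assumes an expression along the lines of `resource.name.startsWith("projects/_/buckets/bucket-a")`; the prefix constant and function name are hypothetical.

```python
# Assumed prefix; the real exhibit may use a different expression.
CONDITION_PREFIX = "projects/_/buckets/bucket-a"


def condition_allows(resource_name: str) -> bool:
    """Mimic a startsWith check on the full resource name."""
    return resource_name.startswith(CONDITION_PREFIX)


print(condition_allows("projects/_/buckets/bucket-a/objects/report.csv"))
print(condition_allows("projects/_/buckets/bucket-b"))
```

Note that a bare prefix like this would also match a bucket named `bucket-abc`, which is why such conditions often include a trailing delimiter.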

498

MCQeasy

A development team uses Cloud Build for their CI/CD pipeline. They want to reduce build times. Which action is most effective?

A.Store build artifacts in a Cloud Storage bucket and reuse them

B.Enable parallel builds by separating build steps into multiple jobs

C.Use more powerful build machines by specifying larger machine types

D.Use a Cloud Run service to run builds asynchronously

AnswerB

Parallelizing independent steps reduces overall build duration.

Why this answer

Enabling parallel builds by separating build steps into multiple jobs reduces total build time by running independent steps concurrently. Other options are less effective or add complexity.

499

MCQhard

Refer to the exhibit. The SLO for the payments-api service is 99.9% availability over 30 days. The current compliance is 99.89% and the error budget is exhausted. Which action should the SRE team take FIRST?

A.Increase the SLO target to 99.99% to reduce future burn rate.

B.Pause all non-critical deployments and investigate the cause of the increased error rate.

C.Trigger a rollback of the latest deployment to stabilize the service.

D.Scale up the service to handle more traffic and reduce error rate.

AnswerB

This aligns with error budget policy: when budget is exhausted, slow down or stop deployments to prevent further errors.

Why this answer

With the error budget exhausted and a high burn rate, the team should immediately stop all non-critical deployments to prevent further degradation and allow the error budget to recover.
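
The numbers in the question can be worked through as follows. The sketch assumes availability is measured as a fraction of time in the window; the function names are illustrative.

```python
from fractions import Fraction


def error_budget_minutes(slo_percent: str, window_days: int) -> Fraction:
    """Total allowed 'bad' minutes in the window for a given SLO."""
    allowed_bad_fraction = 1 - Fraction(slo_percent) / 100
    return allowed_bad_fraction * window_days * 24 * 60


def budget_consumed(slo_percent: str, measured_percent: str) -> Fraction:
    """Share of the error budget used (1 means fully spent)."""
    return (100 - Fraction(measured_percent)) / (100 - Fraction(slo_percent))


print(float(error_budget_minutes("99.9", 30)))   # budget in minutes
print(float(budget_consumed("99.9", "99.89")))   # fraction of budget used
```

Exact fractions are used instead of floats so that values such as 99.89 do not pick up rounding error. A result above 1 means the budget is overspent, which is the condition that triggers the deployment freeze.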

500

MCQhard

A company runs an e-commerce platform on Google Cloud. The application is deployed on Google Kubernetes Engine (GKE) with a regional cluster (us-central1, three zones). The frontend service is exposed via an HTTP Load Balancer with Cloud CDN. Recently, during a flash sale, users experienced high latency and occasional 502 errors. The backend service is a Java application that reads from Cloud Spanner. The team has observed that Spanner CPU utilization averaged 65% during the sale, with a few spikes to 80%. The number of frontend pods was auto-scaled to 50, each running on n1-standard-2 nodes. The node pool is set to autoscale up to 100 nodes. The errors appear to correlate with periods of high CPU on the nodes, but not always. What is the most likely cause and recommended action?

A.Scale up the Cloud Spanner instance to handle higher peak CPU, as the 80% spikes indicate insufficient capacity.

B.Change the backend service to use a multi-zone NEG that includes endpoints from all three zones, and ensure the load balancer is configured for cross-zone load balancing.

C.Increase the CPU request for the frontend pods and set a higher target CPU utilization for the Horizontal Pod Autoscaler.

D.Increase the health check interval and timeout settings to give pods more time to respond before being marked unhealthy.

AnswerB

This ensures traffic is distributed evenly across zones, reducing cross-zone latency and preventing a single zone from being overloaded.

Why this answer

The high latency and 502 errors are likely caused by the HTTP Load Balancer sending requests to unhealthy backend pods due to zone-imbalanced traffic. A regional GKE cluster with a multi-zone NEG and cross-zone load balancing ensures that the load balancer distributes requests evenly across all pods in all three zones, preventing node CPU spikes in a single zone from causing errors. Option B directly addresses this by enabling proper traffic distribution, which is the most probable root cause given that node CPU spikes correlate with errors but not always.

Exam trap

The trap here is that candidates focus on scaling the database (Spanner) or adjusting pod-level configurations, when the real issue is zone-imbalanced traffic distribution from the HTTP Load Balancer, a common misdiagnosis in multi-zone GKE setups.

How to eliminate wrong answers

Option A is wrong because Spanner CPU at 65% average with spikes to 80% is well within acceptable limits (Spanner can handle up to 65-70% sustained CPU before needing scaling, and 80% spikes are transient); the errors correlate with node CPU, not Spanner CPU, so scaling Spanner would not resolve the issue. Option C is wrong because increasing CPU requests and HPA target utilization would reduce pod density per node, potentially worsening node CPU spikes and not addressing the load balancer's zone-imbalanced traffic distribution. Option D is wrong because increasing health check intervals and timeouts would make the load balancer slower to detect unhealthy pods, increasing the chance of routing traffic to failing pods and exacerbating 502 errors, not reducing them.

501

Multi-Selecthard

A team is designing a disaster recovery (DR) plan for a critical application. Which THREE components are essential for a robust DR plan? (Choose 3)

Select 3 answers

A.Failover procedures and runbooks

B.Regular backups to a separate region

C.A single-region deployment for consistency

D.Monitoring and alerting for disaster events

E.Load testing to validate performance

AnswersA, B, D

Well-documented failover steps ensure quick recovery.

Why this answer

Failover procedures and runbooks (A) are essential because they provide step-by-step instructions for executing a controlled transition to the secondary site, ensuring minimal downtime and consistent recovery actions. Without documented runbooks, teams risk misconfigurations during a disaster, which can extend recovery time objectives (RTO) beyond acceptable limits.

Exam trap

Google Cloud often tests the misconception that a single-region deployment is acceptable for DR if it has high availability within that region, but the exam emphasizes that DR requires geographic separation to survive a full regional failure.

502

MCQmedium

A company is using Cloud Functions (2nd gen) for event-driven processing of uploaded images in Cloud Storage. Each image is resized to multiple sizes and stored back in different buckets. Recently, the number of uploads has increased 10x, and the team notices that some images are not being processed, and logs show function execution timeouts after 60 seconds. The function's timeout is set to 60 seconds. The code processes images sequentially. The team needs to reliably process all images with minimal code changes. What should they do?

A.Increase the memory allocation for the function.

B.Use Cloud Tasks to queue the processing and run the function as a task handler.

C.Increase the Cloud Function timeout to 540 seconds.

D.Split the function into separate functions for each resize operation.

AnswerB

Cloud Tasks allows asynchronous processing with retries, reducing timeouts and enabling parallel execution.

Why this answer

Option B is correct because using Cloud Tasks decouples the processing, allowing retries and better scaling. Option A (increase memory) may speed up processing but may not handle all image sizes reliably. Option C (increase timeout) delays the problem.

Option D (split into multiple functions) requires significant code changes.

503

MCQeasy

A company wants to automatically rotate cryptographic keys on a schedule without manual intervention. Which service should they use?

A.Cloud Key Management Service (KMS)

B.Secret Manager

C.Cloud Audit Logs

D.Cloud IAM

AnswerA

Cloud KMS allows automatic key rotation.

Why this answer

Cloud KMS supports automatic key rotation. Option B is for managing secrets. Option C is for auditing.

Option D is for access control.

504

MCQhard

Your company runs a critical application on Google Kubernetes Engine (GKE) in us-central1. The application consists of a frontend deployment with 3 replicas and a backend statefulset with 5 replicas using persistent volumes (SSD). Recently, the team noticed that during a regional outage in us-central1, the application became completely unavailable. They want to design a multi-region architecture that can survive a regional failure with RPO of 1 hour and RTO of 30 minutes. The application is stateless on the frontend but the backend stores critical data on persistent disks. The backend can operate in a read-only mode from a secondary region if needed. They have a limited budget and want to minimize ongoing costs. Which approach should they take?

A.Migrate the backend to Cloud SQL for MySQL with cross-region replication, and keep the frontend on GKE with multi-region ingress.

B.Deploy the frontend and backend in a regional GKE cluster and use regional persistent disks for the statefulset, replicating data synchronously across zones.

C.Deploy the frontend and backend in a single zonal cluster in us-central1-a, and use scheduled snapshots of persistent disks to a different region.

D.Deploy the frontend and backend in a regional GKE cluster across us-central1, and use a CronJob to take snapshots of persistent volumes every hour and copy them to a secondary region. In disaster, restore the snapshots to a new cluster in the secondary region.

AnswerD

Regional cluster survives zonal failure; snapshots provide cross-region backup with RPO 1 hour and RTO within 30 minutes if restore is automated.

Why this answer

Option D meets the RPO of 1 hour by using a CronJob to take hourly snapshots of PersistentVolume data and copy them to a secondary region. In a disaster, you restore those snapshots to a new GKE cluster in the secondary region, achieving an RTO of 30 minutes by automating the restore process. This approach minimizes ongoing costs because snapshots are incremental and you only pay for storage in the secondary region when needed, while the frontend remains stateless and can be redeployed quickly.

Exam trap

Google Cloud often tests the distinction between zonal, regional, and multi-region resilience; the trap here is that candidates may choose regional persistent disks (Option B) thinking they provide multi-region protection, when in fact they only replicate across zones within a single region.

How to eliminate wrong answers

Option A is wrong because migrating to Cloud SQL for MySQL with cross-region replication introduces significant ongoing costs for a managed database service and may not align with the existing statefulset architecture; it also requires application changes to use Cloud SQL instead of persistent disks. Option B is wrong because regional persistent disks replicate synchronously across zones within a single region, which does not protect against a full regional outage in us-central1. Option C is wrong because a single zonal cluster in us-central1-a cannot survive a regional failure, and scheduled snapshots to a different region without a restore plan in a secondary cluster do not meet the RTO of 30 minutes.

505

MCQeasy

A user wants to store a database password that will be used by a Compute Engine instance. What is the most secure and manageable approach?

A.Use Secret Manager and grant the instance's service account access to the secret

B.Set the password as an environment variable in instance metadata

C.Store the password in Cloud Storage bucket metadata

D.Store the password in a file on the instance's boot disk

AnswerA

Secret Manager is the recommended way to store secrets with fine-grained access control.

Why this answer

Secret Manager is the most secure and manageable approach because it provides encrypted storage, versioning, rotation schedules, and fine-grained access control via IAM. By granting the Compute Engine instance's service account access to the secret, the password is never exposed in plaintext metadata, logs, or disk files, and access can be audited and revoked independently of the instance lifecycle.
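
A minimal sketch of how code on the instance might read the password is shown below. It assumes the `google-cloud-secret-manager` client library is installed and that the instance's service account has been granted access to the secret; the project and secret IDs are placeholders.

```python
def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the resource name of a secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def fetch_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Read a secret payload using the instance's service account credentials."""
    # Imported lazily so the name helper works without the client library.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id, version)}
    )
    return response.payload.data.decode("UTF-8")


# Placeholder identifiers, for illustration only.
print(secret_version_name("my-project", "db-password"))
```

Reading the value at startup and keeping it in memory avoids writing the password to disk or instance metadata.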

Exam trap

Google Cloud often tests the misconception that instance metadata is a secure place for secrets because it is 'internal' to the project, but in reality, metadata is accessible to any process on the instance and is logged, making it unsuitable for sensitive data.

How to eliminate wrong answers

Option B is wrong because setting the password as an environment variable in instance metadata exposes it in the metadata server, which can be accessed by any process on the instance or via the metadata API, and it is logged in Cloud Audit Logs. Option C is wrong because Cloud Storage bucket metadata is not designed for secrets; it is unencrypted at rest by default, accessible via the Storage API, and lacks IAM-level access control for individual metadata entries. Option D is wrong because storing the password in a file on the instance's boot disk persists the secret in the filesystem, making it vulnerable to snapshot exports, disk cloning, and unauthorized OS-level access, and it cannot be centrally managed or rotated.

506

MCQmedium

A company uses Google Cloud Armor to protect their HTTP load balancer from OWASP Top 10 attacks. After deploying a security policy with pre-configured WAF rules, they notice that some legitimate user requests are being blocked because they match a rule incorrectly. The security team wants to fine-tune the rules to reduce false positives while maintaining strong protection. They also want to evaluate the impact of changes before enforcing them. What should they do?

A.Disable the WAF rules entirely and implement IP-based allowlists.

B.Set the WAF rules to 'preview' mode to test their impact without blocking traffic, then adjust thresholds or exclusions based on logs.

C.Add a higher priority allow rule to permit the traffic that is being incorrectly blocked.

D.Remove the WAF rules and rely solely on rate limiting to protect the application.

AnswerB

Preview mode allows safe testing of rule modifications without disrupting legitimate traffic.

Why this answer

Option B is correct because Cloud Armor allows setting a rule to 'preview' mode, which logs matched requests without blocking them. This enables analysis of rule effectiveness before enforcement. Option A is wrong because disabling WAF removes protection entirely.

Option C is wrong because adding a higher priority allow rule could bypass security. Option D is wrong because rate limiting does not address WAF false positives.

507

MCQmedium

A company hosts a web application on Google Kubernetes Engine (GKE) and wants to protect against SQL injection attacks. Which service should they configure?

A.Identity-Aware Proxy (IAP)

B.Cloud Armor

C.Cloud Audit Logs

D.Container Analysis

AnswerB

Cloud Armor provides WAF capabilities to block SQL injection.

Why this answer

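Cloud Armor with pre-configured WAF rules can block SQL injection. Option A is wrong because Identity-Aware Proxy controls who can reach the application based on identity; it does not inspect request payloads. Option C is wrong because Cloud Audit Logs records administrative and data access activity; it does not block traffic.

Option D is wrong because Container Analysis scans container images for vulnerabilities; it does not filter incoming requests.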
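
As a rough illustration, a Cloud Armor rule that applies a pre-configured SQL injection rule set might be described by a request body like the one built below. The helper name `sqli_rule_body` is hypothetical, and the rule set name `sqli-v33-stable` and the sensitivity value are assumptions to check against the current Cloud Armor documentation before use.

```python
import json


def sqli_rule_body(priority: int, preview: bool = True) -> dict:
    """Build a security policy rule body (hypothetical helper, illustrative values)."""
    return {
        "priority": priority,
        "action": "deny(403)",
        # Start in preview so matches are logged before anything is blocked.
        "preview": preview,
        "description": "Block SQL injection with a pre-configured WAF rule set",
        "match": {
            "expr": {
                # Assumed rule set name and sensitivity level.
                "expression": "evaluatePreconfiguredWaf('sqli-v33-stable', {'sensitivity': 1})"
            }
        },
    }


print(json.dumps(sqli_rule_body(1000), indent=2))
```

For a GKE workload, a policy containing such a rule would be attached to the backend service behind the HTTP(S) load balancer that fronts the cluster.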

508

MCQeasy

A startup uses Cloud Functions with a Pub/Sub trigger to process incoming orders. They notice that the function sometimes fails to process messages, and those messages are lost. What is the most likely cause?

A.The subscription has an ackDeadlineSeconds of 600.

B.The Cloud Function has a timeout of 540 seconds.

C.The Pub/Sub topic has a retention duration of 10 minutes.

D.The Cloud Function is configured with retry on failure set to false.

AnswerD

If retry is disabled, failed messages are dropped.

Why this answer

Option D is correct because when a Cloud Function fails to process a Pub/Sub message and retry on failure is set to false, the message is not redelivered. With retries disabled, a failed invocation still ends delivery of that event, so the message is discarded instead of being returned to the subscription for another attempt, which results in message loss.

Exam trap

The trap here is that candidates often confuse the Cloud Function's timeout setting with message delivery guarantees, overlooking that the retry on failure flag is the critical control for preventing message loss in Pub/Sub-triggered functions.

How to eliminate wrong answers

Option A is wrong because a longer ackDeadlineSeconds (600 seconds) gives the function more time to process messages, reducing the chance of premature timeout and message loss, not causing it. Option B is wrong because a Cloud Function timeout of 540 seconds is generous and would not cause message loss unless the function consistently exceeds it; the default timeout is 60 seconds, so 540 seconds is actually a mitigation, not a cause. Option C is wrong because a topic retention duration of 10 minutes means messages that are not acknowledged are retained for 10 minutes before being discarded, which is a reasonable duration and does not cause immediate loss; the issue is about the function failing to process, not the topic discarding messages too quickly.
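
A minimal sketch of a Pub/Sub-triggered handler shows where the retry setting matters. It assumes a background-function style signature with a base64-encoded `data` field; the function name `process_order` and the `order_id` field are hypothetical.

```python
import base64
import json


def process_order(event: dict, context=None) -> str:
    """Hypothetical handler for a Pub/Sub-triggered function."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    order = json.loads(payload)  # raises ValueError on malformed JSON
    if "order_id" not in order:
        # Raising marks the invocation as failed. With retry on failure
        # enabled the event is delivered again; with it disabled the
        # event is dropped.
        raise ValueError("order is missing order_id")
    return order["order_id"]


good = {"data": base64.b64encode(json.dumps({"order_id": "A-1"}).encode()).decode()}
print(process_order(good))  # A-1
```

If retries are enabled, the handler should be idempotent, because the same order can then be delivered more than once.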

509

MCQeasy

A developer wants to monitor the CPU usage of a single Compute Engine VM and receive alerts when it exceeds 80%. What is the simplest way to achieve this?

A.Query the Compute Engine API periodically and check CPU usage.

B.Configure a Cloud Logging sink to BigQuery and set a scheduled query to detect high CPU.

C.Install the Cloud Monitoring agent and create an alerting policy based on the metric 'cpu.utilization'.

D.Use the managed instance group's autoscaling metric to trigger a notification.

AnswerC

The Monitoring agent collects CPU utilization from the OS and sends it to Cloud Monitoring, where you can set alerts.

Why this answer

Option C is correct because the Cloud Monitoring agent (formerly Stackdriver agent) collects CPU utilization metrics from Compute Engine VMs and sends them to Cloud Monitoring. You can then create an alerting policy directly on the metric 'cpu.utilization' with a threshold of 80% without any custom scripting or additional infrastructure. This is the simplest and most native approach for a single VM.

Exam trap

Google Cloud often tests the misconception that you need to export logs to BigQuery or query APIs manually, when in fact the Cloud Monitoring agent provides a built-in, agent-based metric that can be alerted on directly.

How to eliminate wrong answers

Option A is wrong because periodically querying the Compute Engine API for CPU usage is inefficient, requires custom code, and does not provide real-time alerting; the API does not expose high-frequency CPU metrics natively. Option B is wrong because exporting logs to BigQuery and running scheduled queries adds unnecessary complexity, latency, and cost; Cloud Logging sinks are for log data, not for real-time metric-based alerting. Option D is wrong because managed instance group autoscaling metrics are designed for scaling groups of VMs, not for alerting on a single VM's CPU usage; they do not trigger notifications directly.
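
For illustration, the alerting policy could be expressed as a request body along these lines. This is a sketch under assumptions: the helper `cpu_alert_policy` is hypothetical, and the default metric type used here is the Compute Engine CPU utilization metric, which reports a fraction between 0 and 1. An agent-collected metric, as named in the question, would need its own metric type and possibly a threshold expressed in percent.

```python
def cpu_alert_policy(
    instance_id: str,
    threshold: float = 0.8,
    metric_type: str = "compute.googleapis.com/instance/cpu/utilization",
) -> dict:
    """Build an alerting policy body for one VM (hypothetical helper)."""
    metric_filter = (
        f'metric.type="{metric_type}" AND '
        'resource.type="gce_instance" AND '
        f'resource.labels.instance_id="{instance_id}"'
    )
    return {
        "displayName": f"High CPU on instance {instance_id}",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "CPU utilization above threshold",
                "conditionThreshold": {
                    "filter": metric_filter,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    # Assumed values: fire after 5 minutes above the threshold,
                    # using 1-minute mean alignment.
                    "duration": "300s",
                    "aggregations": [
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                    ],
                },
            }
        ],
    }


policy = cpu_alert_policy("1234567890")
print(policy["conditions"][0]["conditionThreshold"]["filter"])
```

A notification channel would still have to be attached to the policy for the developer to receive the alert.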
