Google Professional Cloud DevOps Engineer (PCDOE) — Questions 1–75

500 questions total · 7 pages · All types, answers revealed

Page 1 of 7

1

Multi-Select · easy

A company serves static content using a global HTTP(S) load balancer with Cloud CDN. They want to maximize the cache hit ratio. Which two actions should they take?

Select 2 answers

A. Use signed URLs for all requests.

B. Set Cache-Control: public, max-age=31536000.

C. Enable Cloud CDN with cache key based on URL and host.

D. Set Cache-Control: private.

E. Enable Identity-Aware Proxy (IAP) for the backend.

Answers: B, C

A long max-age allows content to be cached for a year, maximizing cache hits.

Why this answer

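Setting Cache-Control: public, max-age=31536000 instructs browsers and intermediate caches to store the response for one year, maximizing the likelihood that subsequent requests are served from cache. This long max-age reduces the need for revalidation, directly improving cache hit ratio. Option C complements it: a cache key built from the URL and host, rather than also varying on components that do not change the content, maps equivalent requests to the same cached object instead of fragmenting the cache.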

Exam trap

Google Cloud often tests the misconception that signed URLs or IAP improve caching, when in fact they introduce per-request variability that reduces cache hit ratio, and that Cache-Control: private is appropriate for static content when it actually prevents caching entirely.
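To make the directive semantics concrete, here is a minimal Python sketch of how a shared cache such as a CDN might interpret a Cache-Control header. It follows the general HTTP caching rules (RFC 9111) in simplified form; it is not Cloud CDN's actual implementation, and the function name is hypothetical.

```python
from typing import Optional


def shared_cache_ttl(cache_control: str) -> Optional[int]:
    """Return how many seconds a shared cache may store a response,
    or None if it must not store it. Simplified: only no-store, private,
    s-maxage and max-age are considered."""
    directives = {}
    for part in cache_control.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"')

    # no-store and private both forbid storage in a shared cache.
    if "no-store" in directives or "private" in directives:
        return None
    # s-maxage takes precedence over max-age for shared caches.
    for key in ("s-maxage", "max-age"):
        if key in directives:
            try:
                return int(directives[key])
            except ValueError:
                return None
    return None  # no explicit lifetime; real caches may apply heuristics


print(shared_cache_ttl("public, max-age=31536000"))  # 31536000
print(shared_cache_ttl("private"))                    # None
```

Option B's header yields a one-year lifetime, while option D's header makes the response uncacheable at the edge.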

2

Multi-Select · medium

A company runs a nightly batch data processing job on Compute Engine instances. The job runs for approximately 2 hours each night and is fault-tolerant (can resume from checkpoint). The team wants to minimize costs. Which TWO strategies should they implement? (Choose two.)

Select 2 answers

A. Use preemptible VMs for the worker nodes.

B. Purchase 1-year committed use discounts for the worker nodes.

C. Migrate the workload to Cloud Functions.

D. Select custom machine types tailored to the workload's resource needs.

E. Enable sustained use discounts by keeping the instances running 24/7.

Answers: A, D

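Preemptible VMs cost far less than standard VMs (Google advertises discounts of 60% or more) and suit fault-tolerant batch jobs.

Why this answer

Preemptible VMs are significantly cheaper than standard VMs and are ideal for fault-tolerant, short-lived batch jobs that can resume from checkpoints. A preemptible VM can be stopped at any time and runs for at most 24 hours, so a 2-hour job that checkpoints its progress fits comfortably. Custom machine types (D) complement this by letting the team pay only for the vCPUs and memory the job actually needs instead of rounding up to the nearest predefined machine type.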

Exam trap

Google Cloud often tests the misconception that committed use discounts are always the best cost-saving strategy, but candidates must recognize that they are only cost-effective for steady-state workloads, not short-duration batch jobs.
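The arithmetic behind this trap can be sketched in Python. Every price and discount rate below is a hypothetical placeholder, not current Google Cloud pricing; only the shape of the comparison matters.

```python
# All prices and discount rates are hypothetical placeholders.
HOURLY_ON_DEMAND = 1.00      # hypothetical $/hour for the chosen machine type
PREEMPTIBLE_DISCOUNT = 0.70  # hypothetical: 70% off on-demand
CUD_DISCOUNT = 0.37          # hypothetical 1-year commitment discount
HOURS_PER_MONTH = 730        # common approximation: 8,760 hours / 12 months


def monthly_costs(job_hours_per_night: float = 2, nights: int = 30) -> dict:
    job_hours = job_hours_per_night * nights
    return {
        # On-demand: pay only while the job runs.
        "on_demand": round(job_hours * HOURLY_ON_DEMAND, 2),
        # Preemptible: same hours at the discounted rate.
        "preemptible": round(job_hours * HOURLY_ON_DEMAND * (1 - PREEMPTIBLE_DISCOUNT), 2),
        # Commitment: billed for every hour of the month, used or not.
        "committed_use": round(HOURS_PER_MONTH * HOURLY_ON_DEMAND * (1 - CUD_DISCOUNT), 2),
    }


print(monthly_costs())
```

With these placeholder numbers the commitment costs several times more than plain on-demand pricing, because it bills all 730 hours while the job uses only 60.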

3

Multi-Select · hard

A DevOps engineer is designing a CI/CD pipeline using Cloud Build. Which TWO configurations are necessary to ensure secure and reliable deployments? (Choose two.)

Select 2 answers

A. Use manual approval steps for production deployments.

B. Store secrets in Secret Manager.

C. Use Cloud Build triggers with branch filters.

D. Push all artifacts to a public Container Registry.

E. Enable Cloud Build service account with Editor role.

Answers: A, B

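A manual approval gate keeps unreviewed changes from reaching production.

Why this answer

Options A and B are correct. Secret Manager keeps credentials out of source code and build configs, and a manual approval step adds a human checkpoint before production. Branch filters (C) are useful hygiene but do not by themselves make deployments secure, a public registry (D) exposes build artifacts, and granting the Cloud Build service account the Editor role (E) violates least privilege.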
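As a sketch of option B, the Python below builds a Cloud Build config (Cloud Build accepts JSON as well as YAML) that maps a Secret Manager secret into a step through `availableSecrets` and `secretEnv`, then checks that every secret a step references is actually declared. The project ID, secret name, and `deploy.sh` script are hypothetical placeholders.

```python
import json

# Hypothetical project and secret names; substitute your own.
PROJECT_ID = "my-project"
SECRET_ID = "db-password"

build_config = {
    "steps": [
        {
            "name": "gcr.io/cloud-builders/gcloud",
            "entrypoint": "bash",
            # deploy.sh is a hypothetical script that reads $DB_PASSWORD from
            # its environment instead of taking it as a command-line argument.
            "args": ["-c", "./deploy.sh"],
            "secretEnv": ["DB_PASSWORD"],
        }
    ],
    # Maps a Secret Manager version to the environment variable used above.
    "availableSecrets": {
        "secretManager": [
            {
                "versionName": f"projects/{PROJECT_ID}/secrets/{SECRET_ID}/versions/latest",
                "env": "DB_PASSWORD",
            }
        ]
    },
}


def undeclared_secret_envs(config: dict) -> set:
    """Return secretEnv names referenced by steps but not declared in availableSecrets."""
    declared = {
        entry["env"]
        for entry in config.get("availableSecrets", {}).get("secretManager", [])
    }
    used = {name for step in config.get("steps", []) for name in step.get("secretEnv", [])}
    return used - declared


print(undeclared_secret_envs(build_config))  # set()
print(json.dumps(build_config, indent=2))
```

The secret value never appears in the config itself; only its resource name does.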

4

MCQ · hard

You are the DevOps engineer for a large gaming company. Your game backend runs on Compute Engine instances behind a global HTTP(S) Load Balancer. You have set up Cloud Monitoring with an uptime check for the load balancer's IP address, and you are using logging to capture 404 errors. Recently, a new game update caused a surge in traffic, and you started receiving many alerts from your uptime check indicating that the site is down. However, you verify that the backend instances are healthy and the load balancer is responding correctly, though some requests are timing out due to the increased load. Your alerting policy currently triggers when 2 consecutive checks fail. What is the most likely reason for the false positive alerts?

A. The global load balancer's health check is failing due to the surge.

B. The monitoring project has reached its limit for concurrent uptime checks.

C. The uptime check is configured to check a specific URL that is returning a 503 status code.

D. The uptime check's timeout is too short for the current response times.

Answer: D

During traffic surge, response time increases; if timeout is too short, check fails despite site being up.

Why this answer

Option D is correct because the uptime check's timeout is too short for the current response times. When a surge in traffic causes some requests to time out, the load balancer may still respond correctly to most requests, but the uptime check—which has a fixed timeout (default 10 seconds)—fails if the response does not arrive within that window. Since the alert triggers after 2 consecutive failures, the check falsely reports the site as down even though the backend and load balancer are healthy.
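A small simulation shows why the timeout, not site health, drives the alert. This Python sketch models only the question's simplified rule (alert after two consecutive failed checks, where a check fails if the response exceeds the timeout); real Cloud Monitoring alerting also evaluates results across checker locations and time windows. The latency values are hypothetical.

```python
def alert_fires(latencies_s, timeout_s=10.0, consecutive_failures=2):
    """Return True if the alert would fire.

    latencies_s: response time of each uptime check, in order.
    A check fails when its response takes longer than timeout_s.
    """
    streak = 0
    for latency in latencies_s:
        streak = streak + 1 if latency > timeout_s else 0
        if streak >= consecutive_failures:
            return True
    return False


surge = [4.0, 11.5, 12.2, 9.8, 13.0]  # hypothetical latencies during the surge
print(alert_fires(surge, timeout_s=10))  # True: two checks in a row exceed 10 s
print(alert_fires(surge, timeout_s=20))  # False: same traffic, longer timeout
```

The site answered every request in both runs; only the timeout changed.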

Exam trap

Google Cloud often tests the distinction between health checks (which verify backend instance health) and uptime checks (which verify end-to-end availability from a monitoring perspective), leading candidates to confuse a healthy backend with a successful uptime check response.

How to eliminate wrong answers

Option A is wrong because the global load balancer's health check is separate from the uptime check; the health check monitors backend instance health, and the scenario states the backend instances are healthy, so the health check is not failing. Option B is wrong because Cloud Monitoring does not have a hard limit on concurrent uptime checks; the limit is on the number of uptime checks per project (100), not on concurrency, and a surge in traffic would not cause a limit to be reached. Option C is wrong because the scenario mentions capturing 404 errors, not 503 errors; a 503 status code would indicate the backend is unavailable, but the problem states the backend is healthy and the load balancer is responding correctly, so the uptime check is not receiving a 503.

5

MCQ · medium

A team has deployed a Prometheus server on GKE using the configuration above. They expect Prometheus to scrape metrics from pods with the label 'app: my-app' and the annotation 'prometheus.io/scrape: true' on port 8080. However, no metrics are being collected. What is the most likely cause?

A. The kubernetes_sd_configs role is set to 'pod' but should be 'endpoints'.

B. Prometheus needs to be configured to listen on port 9090 for scraping.

C. The keep action for the label 'my-app' is filtering out all pods.

D. The relabel_config for port incorrectly constructs the target address; it should use the annotation value directly without appending ':8080'.

Answer: D

The port annotation usually includes the port number, so appending a fixed port is wrong.

Why this answer

Option D is correct because the relabel_config is incorrectly constructing the target address. The configuration likely reads the port from `__meta_kubernetes_pod_annotation_prometheus_io_port` (the meta label the pod role exposes for that annotation), but then appends ':8080' statically, overriding the annotation value. If the annotation specifies a different port or is missing, this hardcoded port causes Prometheus to scrape the wrong endpoint or fail entirely.

The correct approach is to use the annotation value directly without appending a fixed port, or to use a default port only when the annotation is absent.
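The widely used form of this relabel rule joins `__address__` and the port annotation with `;`, then rewrites the address with the regex `([^:]+)(?::\d+)?;(\d+)` and replacement `$1:$2`. The Python sketch below reproduces that behavior with `re.fullmatch`, since Prometheus anchors relabel regexes at both ends. It handles IPv4 addresses only and the function name is hypothetical.

```python
import re

# Equivalent relabel rule (pod role):
#   source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
#   regex: ([^:]+)(?::\d+)?;(\d+)
#   replacement: $1:$2
#   target_label: __address__
RELABEL_REGEX = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def relabel_address(address: str, port_annotation: str) -> str:
    # Prometheus joins source label values with the default separator ';'.
    joined = f"{address};{port_annotation}"
    match = RELABEL_REGEX.fullmatch(joined)
    if match is None:
        # No match: a 'replace' action leaves the target label unchanged.
        return address
    # Keep the host, swap in the port taken from the annotation.
    return f"{match.group(1)}:{match.group(2)}"


print(relabel_address("10.4.2.7:80", "9102"))  # 10.4.2.7:9102
print(relabel_address("10.4.2.7:80", ""))      # 10.4.2.7:80 (annotation missing)
```

The port comes from the annotation, so no fixed ':8080' is needed; a default applies only when the annotation is absent.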

Exam trap

The trap here is that candidates often overlook the relabel_config port construction and assume the issue is with service discovery roles or label filtering, when in fact the static port override is the subtle cause of scrape failures.

How to eliminate wrong answers

Option A is wrong because the `kubernetes_sd_configs` role set to 'pod' is correct for scraping pods directly; changing it to 'endpoints' would scrape service endpoints, which is not required here and would not fix the issue. Option B is wrong because Prometheus's own listening port (default 9090) is for its web UI and API, not for scraping targets; scraping uses the target's port, not Prometheus's port. Option C is wrong because the `keep` action for the label 'my-app' is likely intended to filter pods with that label, and if it were filtering out all pods, no targets would be discovered at all; the problem is specifically with port construction, not label matching.

6

MCQ · medium

A DevOps team wants to autoscale a GKE Deployment based on a custom metric exposed by the application. The metric is available via an HTTP endpoint. Which approach should they use to integrate this metric with the Horizontal Pod Autoscaler (HPA)?

A. Deploy a Prometheus Operator with the kube-state-metrics adapter and configure the HPA to use the custom metric.

B. Expose the metric via an Ingress and configure HPA to read from the Ingress metrics.

C. Use the standard CPU-based HPA and map the custom metric to CPU usage via a script.

D. Configure the Stackdriver Metrics Adapter to collect the metric from the endpoint.

Answer: A

Prometheus scrapes the custom endpoint, and an adapter exposes the resulting metrics through the custom.metrics.k8s.io API that HPA uses.

Why this answer

Option A is the best fit because a Prometheus-based pipeline can scrape the application's HTTP metrics endpoint and make those metrics available to the HPA. Strictly speaking, kube-state-metrics only exports Kubernetes object state; the component that serves the custom.metrics.k8s.io API is prometheus-adapter, which translates Prometheus queries into that API. HPA then queries custom.metrics.k8s.io natively. This is the standard approach for integrating application-specific HTTP metrics into Kubernetes autoscaling.
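Once the adapter serves the metric, the HPA's core scaling rule is simple arithmetic. This Python sketch follows the formula in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with the default 0.1 tolerance. It deliberately ignores min/max replica bounds, stabilization windows, and unready pods, so it approximates the controller rather than reimplementing it.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Core HPA scaling rule for a single metric."""
    ratio = current_value / target_value
    # Ratios close to 1.0 leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 pods averaging 150 requests/s against a target of 100 requests/s per pod:
print(desired_replicas(4, 150, 100))  # 6
```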

Exam trap

Google Cloud often tests the misconception that any HTTP endpoint can be directly plugged into HPA, but the trap here is that HPA requires a metrics API adapter (like prometheus-adapter or Stackdriver adapter) to bridge the gap between the raw metric source and the Kubernetes custom metrics API.

How to eliminate wrong answers

Option B is wrong because Ingress does not expose application-level custom metrics to the HPA; Ingress metrics are typically for traffic routing and not designed to be consumed by the custom metrics API. Option C is wrong because HPA's CPU-based autoscaling cannot be mapped to custom metrics via a script; HPA requires a dedicated metrics API (custom.metrics.k8s.io) and does not support arbitrary mapping of custom metrics to CPU usage. Option D is wrong because the Stackdriver Metrics Adapter collects metrics from Google Cloud Monitoring, not directly from an HTTP endpoint; it would require the application to push metrics to Stackdriver, not expose them via an HTTP endpoint.

7

MCQ · hard

A company is running a stateful workload on Compute Engine and has configured a TCP health check on port 8080. The health check is failing, but the application is running and responding on port 8080 when tested manually from within the instance. What is the most likely cause of the health check failure?

A. The health check is configured to use port 80 instead of port 8080.

B. The firewall rules are not allowing traffic from the health check probe IP ranges.

C. The instance's DNS resolution is failing, causing the health check to use the wrong IP.

D. The health check response timeout is set too low (e.g., 1 second).

Answer: B

Health check probes use specific IP ranges that must be allowed.

Why this answer

The health check probes originate from Google's health check systems, which use specific source IP ranges (35.191.0.0/16 and 130.211.0.0/22). If the VPC firewall rules do not explicitly allow ingress from these ranges on port 8080, the probes are dropped and the health check fails even though the application is running. A manual test from within the instance succeeds because that traffic never crosses the VPC firewall. This is the most common cause of health check failures when the application itself is healthy.
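The firewall reasoning can be checked mechanically with Python's standard `ipaddress` module. This sketch only asks whether an ingress rule's source ranges cover the two probe ranges cited above; it ignores ports, protocols, target tags, and rule priority, handles IPv4 only, and the function name is hypothetical.

```python
import ipaddress

# Health check probe source ranges cited above.
HEALTH_CHECK_RANGES = [
    ipaddress.ip_network("35.191.0.0/16"),
    ipaddress.ip_network("130.211.0.0/22"),
]


def probes_allowed(firewall_source_ranges) -> bool:
    """True if every probe range falls inside at least one allowed source range."""
    allowed = [ipaddress.ip_network(r) for r in firewall_source_ranges]
    return all(any(hc.subnet_of(a) for a in allowed) for hc in HEALTH_CHECK_RANGES)


print(probes_allowed(["10.0.0.0/8"]))                        # False: probes dropped
print(probes_allowed(["35.191.0.0/16", "130.211.0.0/22"]))  # True
```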

Exam trap

Google Cloud often tests the misconception that health check failures are always due to application misconfiguration or port mismatches, but the trap here is that the health check probes come from external Google IP ranges that must be explicitly allowed in firewall rules, not from within the instance's own network.

How to eliminate wrong answers

Option A is wrong because the health check is explicitly configured to use port 8080, and the question states the application responds on that port; a misconfiguration to port 80 would be a different issue, but the scenario describes the health check as failing despite the correct port being configured. Option C is wrong because health checks in Google Cloud use IP addresses, not DNS names, so DNS resolution is irrelevant to the probe reaching the instance. Option D is wrong because a timeout set too low would cause intermittent failures or timeouts, but the question states the health check is consistently failing, and the application responds instantly from within the instance, indicating the probes are not reaching the instance at all.

8

Drag & Drop · medium

Arrange the steps to recover a Google Cloud SQL instance from a point-in-time backup.

Why this order

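Identify the target recovery time, clone the instance to that point in time (which creates a new instance), verify the recovered data, promote the clone to serve production traffic, and finally update application connections to point to it.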

9

MCQ · easy

Refer to the exhibit. A budget alert has fired for project dev-123 indicating that the cost has exceeded the budget of $1000. What should the team do next to investigate the cost overrun?

A. Disable all APIs in project dev-123 immediately.

B. Open a BigQuery query on the billing export table, filtering by project 'dev-123' and service.

C. Set up a Cloud Function to automatically shut down resources when the budget is exceeded.

D. View the billing account's cost table in the Cloud Console.

E. Create a new budget with a lower threshold to get alerted earlier.

Answer: B

BigQuery billing exports provide detailed cost data for root cause analysis.

Why this answer

The most effective first step is to query the billing export data in BigQuery, which provides detailed cost breakdowns by project, service, and labels. Option B is the recommended approach for granular analysis. Viewing the billing account's cost table (D) gives a high-level view but not the necessary detail.

Creating a new budget with a lower threshold (E) does not address the root cause. Disabling all APIs (A) is too drastic without understanding the cause. Setting up automatic shutdown (C) is a reaction that should follow investigation.
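The grouping that option B implies can be sketched as follows. The comment shows an equivalent query against the standard usage cost export (`project.id`, `service.description`, and `cost` are columns of that export; the dataset and table names are hypothetical placeholders). The Python performs the same aggregation over hypothetical sample rows and ignores credits, which a real analysis should include.

```python
from collections import defaultdict

# Equivalent BigQuery query (dataset and table names are hypothetical):
#   SELECT service.description AS service, SUM(cost) AS total_cost
#   FROM `billing-admin.billing_export.gcp_billing_export_v1_XXXXXX`
#   WHERE project.id = 'dev-123'
#   GROUP BY service
#   ORDER BY total_cost DESC


def cost_by_service(rows, project_id):
    """Sum cost per service for one project, highest first."""
    totals = defaultdict(float)
    for row in rows:
        if row["project"]["id"] == project_id:
            totals[row["service"]["description"]] += row["cost"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical rows shaped like export records (nested project/service fields).
rows = [
    {"project": {"id": "dev-123"}, "service": {"description": "Compute Engine"}, "cost": 640.0},
    {"project": {"id": "dev-123"}, "service": {"description": "BigQuery"}, "cost": 410.5},
    {"project": {"id": "dev-123"}, "service": {"description": "Compute Engine"}, "cost": 95.0},
    {"project": {"id": "prod-9"}, "service": {"description": "Compute Engine"}, "cost": 999.0},
]

print(cost_by_service(rows, "dev-123"))
```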

10

MCQ · easy

A latency-sensitive web application uses Cloud CDN. What configuration change would most directly reduce cache miss rates?

A. Enable Cloud Armor to filter malicious traffic

B. Use signed URLs to restrict access

C. Set TTL to 0 to ensure content is always fresh

D. Enable cache static content and set appropriate cache modes

Answer: D

Caching static files reduces origin requests and cache misses, improving latency.

Why this answer

Enabling cache static content and setting appropriate cache modes directly reduces cache miss rates by ensuring that more requests are served from the Cloud CDN cache rather than forwarded to the origin. This configuration allows the CDN to store and serve static assets (e.g., images, CSS, JavaScript) based on cache keys and TTLs, minimizing the number of cache misses for eligible content.
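A toy model makes the TTL trade-off concrete: one object, one cache, requests at fixed intervals. It ignores eviction, multiple edge locations, and revalidations that return 304, so the numbers only show the direction of the effect. The request pattern is hypothetical.

```python
def hit_ratio(request_times_s, ttl_s):
    """A request is a hit if the copy fetched from the origin is younger
    than ttl_s. With ttl_s == 0, every request goes back to the origin."""
    hits = 0
    fetched_at = None
    for t in request_times_s:
        if fetched_at is not None and t - fetched_at < ttl_s:
            hits += 1
        else:
            fetched_at = t  # miss: fetch (or revalidate) from the origin
    return hits / len(request_times_s)


requests = list(range(0, 600, 10))  # one request every 10 s for 10 minutes
print(hit_ratio(requests, ttl_s=0))     # 0.0
print(hit_ratio(requests, ttl_s=3600))  # 59 of 60 requests are hits
```

This is why option C moves the hit ratio in the wrong direction.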

Exam trap

Google Cloud often tests the misconception that reducing TTL or using security features like signed URLs improves cache performance, when in fact these actions either increase cache misses or have no effect on caching behavior.

How to eliminate wrong answers

Option A is wrong because Cloud Armor filters malicious traffic at the edge but does not affect cache hit ratios; it protects against DDoS and web attacks but does not change caching behavior. Option B is wrong because signed URLs restrict access to content for security purposes and do not reduce cache miss rates; they may even increase misses if unique URLs fragment the cache. Option C is wrong because setting TTL to 0 forces the CDN to treat every request as a cache miss, requiring revalidation with the origin for each request, which drastically increases cache miss rates and defeats the purpose of caching.

11

Multi-Select · medium

Which THREE are best practices when bootstrapping a Google Cloud organization for DevOps? (Choose three.)

Select 3 answers

A. Use separate service accounts for each environment.

B. Use separate folders for development, staging, and production environments.

C. Enable audit logging for all projects at the organization level.

D. Create a single service account for all environments to simplify permissions.

E. Disable audit logging to reduce costs.

Answers: A, B, C

Isolates permissions and limits blast radius.

Why this answer

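Option A is correct because using separate service accounts for each environment enforces the principle of least privilege, ensuring that credentials used in development cannot accidentally or maliciously affect production resources. This isolation aligns with Google Cloud's IAM best practices, where each service account should have only the permissions necessary for its specific environment, reducing the blast radius of a compromised account. Option B is correct because separate folders per environment let IAM and organization policies be applied once per environment and inherited by every project beneath them. Option C is correct because organization-level audit logging gives a consistent record of administrative and data access activity across all projects.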

Exam trap

Google Cloud often tests the misconception that a single service account simplifies management and is acceptable for DevOps, but the trap is that this violates the core security principle of environment isolation, which is a fundamental requirement for bootstrapping a Google Cloud organization.
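For option C, organization-wide audit logging is configured in the `auditConfigs` section of the organization's IAM policy. The minimal Python sketch below adds an `allServices` entry enabling the Data Access log types to a policy dict; the surrounding workflow (fetching the policy with `gcloud organizations get-iam-policy` and applying it with `gcloud organizations set-iam-policy`) is assumed, and the function name is hypothetical. Data Access logs can be high-volume and billable, so many teams enable them selectively.

```python
import copy


def with_all_services_audit_logging(policy: dict) -> dict:
    """Return a copy of an IAM policy dict with Data Access audit logs
    (ADMIN_READ, DATA_READ, DATA_WRITE) enabled for all services.
    Admin Activity logs are always on and need no entry here."""
    updated = copy.deepcopy(policy)
    # Replace any existing allServices entry rather than duplicating it.
    configs = [c for c in updated.get("auditConfigs", []) if c.get("service") != "allServices"]
    configs.append({
        "service": "allServices",
        "auditLogConfigs": [
            {"logType": "ADMIN_READ"},
            {"logType": "DATA_READ"},
            {"logType": "DATA_WRITE"},
        ],
    })
    updated["auditConfigs"] = configs
    return updated


print(with_all_services_audit_logging({"bindings": []}))
```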

12

MCQ · hard

A team implements canary deployments using Cloud Deploy and deploys to GKE. They want to automatically roll back if the canary release's error rate exceeds 5% within 10 minutes. Which approach should they use?

A. Deploy using Spinnaker on GKE with a canary pipeline that includes an automated rollback step.

B. Configure a Cloud Build step to monitor the canary and rollback if needed.

C. Use a GKE rolling update strategy within the deployment manifest.

D. Set up Cloud Deploy with a rollout strategy that uses Cloud Monitoring metrics to automatically rollback.

Answer: D

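Cloud Deploy's canary strategy can verify each phase against metrics and roll back automatically when verification fails.

Why this answer

Option D is correct because Cloud Deploy's canary strategy can run deployment verification at each phase (for example, a verification container that checks Cloud Monitoring error-rate metrics), and an automation rule can roll the rollout back when verification fails. Option B is incorrect: a custom Cloud Build monitoring step is hand-built scripting outside the delivery pipeline, not a built-in rollback mechanism. Option C is incorrect: a GKE rolling update is not a canary and has no metric-based rollback.

Option A is incorrect: Spinnaker is not a native Google Cloud service and is not integrated with Cloud Deploy.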
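To make the "5% within 10 minutes" rule concrete, here is a hypothetical decision function that a Cloud Deploy verification container might run against request and error counts pulled from Cloud Monitoring (the metric query itself is omitted). Cloud Deploy treats a verification container that exits with a non-zero code as a failed verification; the function name and sample format are assumptions.

```python
def should_roll_back(samples, window_s=600, max_error_rate=0.05):
    """samples: (seconds_since_canary_start, total_requests, error_requests).
    Returns True if the error rate over the first window_s seconds
    exceeds max_error_rate."""
    total = errors = 0
    for t, reqs, errs in samples:
        if t <= window_s:
            total += reqs
            errors += errs
    if total == 0:
        return False  # no traffic yet; a real check might report "inconclusive"
    return errors / total > max_error_rate


print(should_roll_back([(60, 500, 10), (300, 500, 50)]))  # True: 6% > 5%
print(should_roll_back([(60, 1000, 20)]))                  # False: 2%
```

A verification script would map True to a non-zero exit code so that the rollout fails verification and the rollback automation takes over.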

13

MCQ · hard

Your company runs a multi-region e-commerce platform on Google Kubernetes Engine (GKE) with services in us-central1 and europe-west1. The application uses a global external HTTP(S) load balancer with Cloud CDN for static assets. Recently, users in Asia report that product images take 5-10 seconds to load, while users in the US and Europe experience sub-second load times. You check the Cloud CDN cache hit ratio and see it is 95% globally. You also notice that the images are served from a backend bucket in us-central1. The load balancer uses the default routing configuration. Your team has implemented client-side caching with Cache-Control headers set to public, max-age=3600. What is the most likely cause of the high latency for Asian users?

A. The load balancer is not using premium tier networking, so traffic from Asia takes a longer path.

B. Client-side caching with max-age=3600 is too short, causing frequent revalidation.

C. Cloud CDN does not have edge caches in Asia, so requests are served from the nearest available edge location, which may be far from users.

D. The cache hit ratio is too low for Asian users due to different traffic patterns.

Answer: D

Hit ratios are measured per edge location, so a strong global ratio can hide poor hit ratios in Asia, where every miss is filled from the us-central1 bucket.

Why this answer

Option D is correct. Cloud CDN caches content separately at each edge location, and the 95% figure is a global, request-weighted average dominated by US and European traffic. If Asian edges see a different mix of images, or less traffic per image, their own hit ratio can be far lower, and each miss is filled from the backend bucket in us-central1 across the Pacific. That cache-fill round trip, repeated across many images on a page, explains multi-second load times that US and European users do not see.

Exam trap

Google Cloud often tests the misconception that a high global cache hit ratio guarantees low latency for all users; an aggregate metric can mask a poorly performing region.

How to eliminate wrong answers

Option A is wrong because the global external HTTP(S) load balancer requires Premium Tier networking, so traffic enters Google's network near the user and travels over Google's backbone. Option B is wrong because max-age=3600 (1 hour) is reasonable for static assets, and a revalidation only adds a conditional request that returns 304 Not Modified; it cannot explain 5-10 second loads. Option C is wrong because its premise is false: Cloud CDN operates cache locations across Asia, so Asian users are served from nearby edges rather than from distant ones.
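A global hit ratio is a request-weighted average of per-location ratios, so a small region with a poor ratio barely moves it. The traffic split below is hypothetical and chosen only to reproduce a 95% global figure.

```python
def global_hit_ratio(regions):
    """regions: mapping of region name -> (requests, hits)."""
    total_requests = sum(req for req, _ in regions.values())
    total_hits = sum(hits for _, hits in regions.values())
    return total_hits / total_requests


# Hypothetical traffic split: Asia is a small share of all requests.
traffic = {
    "us": (600_000, 594_000),      # 99% hit ratio
    "europe": (300_000, 291_000),  # 97%
    "asia": (100_000, 65_000),     # 65%
}

print(global_hit_ratio(traffic))  # 0.95
for name, (req, hits) in traffic.items():
    print(name, hits / req)
```

Breaking the metric down by region, rather than reading only the global value, is what exposes the problem.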

14

MCQ · medium

Refer to the exhibit. You are reviewing an alert policy for CPU utilization. What is a potential problem with this configuration?

A. The autoClose time is too long.

B. The duration is too short, which may cause noise during spikes.

C. The threshold is set too low.

D. The combiner should be 'AND'.

Answer: B

A 60-second window may not filter out short-lived bursts.

Why this answer

Option B is correct because a short duration in a CPU utilization alert policy means the threshold must be breached for only a brief period before triggering an incident. This can cause noise during transient spikes that are not indicative of a sustained problem, leading to false positives and alert fatigue for operations teams.

Exam trap

Google Cloud often tests the distinction between duration and threshold, tricking candidates into thinking a low threshold is the primary cause of noise, when in fact a short duration is the more direct trigger for false positives from spikes.

How to eliminate wrong answers

Option A is wrong because the autoClose time being too long is not inherently a problem; it simply means incidents remain open longer, which may be acceptable or even desirable for tracking. Option C is wrong because the threshold being set too low is not indicated by the exhibit; the question focuses on duration, not threshold value. Option D is wrong because the combiner should be 'OR' (not 'AND') when you want to trigger on any single condition being met; using 'AND' would require all conditions to be true simultaneously, which is less sensitive and could miss issues.

15

MCQ · easy

A DevOps team wants to serve static content from a Cloud Storage bucket with low latency globally. They also need TLS termination. Which load balancer type should they use?

A. External Network Load Balancer

B. SSL Proxy Load Balancer

C. Internal HTTP(S) Load Balancer

D. External HTTP(S) Load Balancer

Answer: D

External HTTP(S) LB supports backend buckets, global anycast IP, and TLS termination.

Why this answer

External HTTP(S) Load Balancer is the correct choice because it provides global anycast IP addresses, TLS termination at the Google Front End (GFE), and integrates directly with Cloud Storage buckets as a backend. This enables low-latency content delivery worldwide while offloading SSL decryption to the load balancer.

Exam trap

Google Cloud often tests the misconception that any load balancer with 'SSL' or 'Proxy' in its name can serve web content from Cloud Storage, but only the External HTTP(S) Load Balancer provides the necessary HTTP protocol support, backend bucket integration, and global anycast routing for static content delivery.

How to eliminate wrong answers

Option A is wrong because External Network Load Balancer operates at Layer 4 (TCP/UDP) and does not support TLS termination or HTTP-based routing to Cloud Storage backends. Option B is wrong because SSL Proxy Load Balancer terminates TLS but is designed for non-HTTP traffic and cannot route directly to Cloud Storage buckets as a backend service. Option C is wrong because Internal HTTP(S) Load Balancer is regional and cannot serve content globally with low latency; it also does not support external clients.

16

MCQ · medium

A company uses Cloud Build to compile a Java application. The build takes 15 minutes due to dependency downloads. They want to cache the Maven dependencies to speed up subsequent builds. What is the best approach?

A. Use a Cloud Storage bucket to store the .m2 directory and restore it at the start of the build.

B. Use a Docker layer caching with a custom image that includes dependencies.

C. Use Cloud Build's built-in caching mechanism by specifying volumes.

D. Use Cloud Build's 'cache' configuration to persist directories.

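Answer: A

Cloud Build volumes are discarded when a build finishes, so a cache that must survive between builds has to live outside the build, for example in Cloud Storage.

Why this answer

Option A is correct: restoring the .m2 directory from a Cloud Storage bucket at the start of the build and saving it back at the end is the approach Google documents for caching directories across builds. Option C is wrong because Cloud Build volumes only share data between steps of the same build; they do not persist to the next build. Option B can help when dependencies rarely change, but it is an indirect workaround that requires maintaining a custom image.

Option D is wrong because Cloud Build has no 'cache' configuration field for persisting directories.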
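A sketch of option A, written as a Python script that emits a JSON build config. The bucket name is a hypothetical placeholder; Maven is pointed at a repository under `/workspace`, which all steps of a build share. The restore step sets `allowFailure` because the cache prefix is empty on the very first build.

```python
import json

# Hypothetical bucket and path; substitute your own.
CACHE_URI = "gs://my-build-cache/m2"
LOCAL_REPO = "/workspace/.m2/repository"

build_config = {
    "steps": [
        {   # Restore the cache; tolerated to fail before the first upload.
            "name": "gcr.io/cloud-builders/gsutil",
            "args": ["-m", "rsync", "-r", CACHE_URI, LOCAL_REPO],
            "allowFailure": True,
        },
        {   # Build against the restored local repository.
            "name": "gcr.io/cloud-builders/mvn",
            "args": [f"-Dmaven.repo.local={LOCAL_REPO}", "package"],
        },
        {   # Save the (possibly updated) repository for the next build.
            "name": "gcr.io/cloud-builders/gsutil",
            "args": ["-m", "rsync", "-r", LOCAL_REPO, CACHE_URI],
        },
    ]
}

print(json.dumps(build_config, indent=2))
```

`rsync` transfers only changed files, so once the cache is warm, the save step is cheap on builds that add no dependencies.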

17

MCQ · hard

Which tool can be used to capture and analyze latency spikes in a distributed application?

A. Cloud Logging

B. Cloud Debugger

C. Cloud Monitoring

D. Cloud Trace

Answer: D

Correct. Trace captures end-to-end latency and identifies spikes.

Why this answer

Cloud Trace is the correct tool for capturing and analyzing latency spikes in a distributed application because it provides end-to-end latency tracking by instrumenting requests as they traverse microservices. It collects trace spans with timing data, allowing you to identify bottlenecks and pinpoint the exact service or operation causing the spike.

Exam trap

Google Cloud often tests the distinction between Cloud Monitoring (metrics) and Cloud Trace (distributed tracing), so the trap here is assuming that latency spikes are a metric problem solvable by Cloud Monitoring, when in fact they require trace-level analysis to identify the root cause across service boundaries.

How to eliminate wrong answers

Option A is wrong because Cloud Logging is designed for centralized log storage and querying, not for capturing distributed latency data or tracing request paths across services. Option B is wrong because Cloud Debugger (now deprecated) was used for inspecting application state and code execution in production without stopping the app; it never captured latency metrics or traced request flows. Option C is wrong because Cloud Monitoring focuses on collecting and alerting on metrics (e.g., CPU, memory) and uptime checks, but it lacks the distributed tracing capability needed to analyze per-request latency spikes across services.

18

MCQ · medium

A team notices that Cloud SQL read replicas are not handling read traffic efficiently, causing high latency for read-heavy queries. What is the best approach to improve read performance?

A. Use a connection pooling proxy like ProxySQL

B. Use Cloud Memorystore to cache frequent query results

C. Enable MySQL query cache

D. Increase the number of read replicas

Answer: B

Caching reduces database load and latency for read-heavy, repetitive queries.

Why this answer

Cloud Memorystore (Redis) caches the results of frequent read queries, reducing the load on Cloud SQL read replicas and lowering latency for repeated queries. This directly addresses the root cause of inefficient read traffic by serving cached data from in-memory storage, which is orders of magnitude faster than querying a replica. It is the best approach because it offloads read-heavy workloads without requiring additional replicas or relying on deprecated features like MySQL query cache.
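The pattern behind option B is usually cache-aside: check the cache first, query the database only on a miss, then store the result with an expiry. In this Python sketch an in-process dict stands in for Redis (with Memorystore you would use a Redis client's get/set with an expiry instead), and `query_replica` is a hypothetical stand-in for a read-replica query.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """In-process stand-in for a Redis cache in front of a database."""

    def __init__(self, loader: Callable[[str], Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.misses = 0

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # hit: no database query
        self.misses += 1
        value = self._loader(key)  # miss: query the replica once
        self._entries[key] = (now + self._ttl_s, value)
        return value


db_calls = []


def query_replica(key):  # hypothetical read-replica query
    db_calls.append(key)
    return f"result-for-{key}"


cache = CacheAside(query_replica, ttl_s=30)
for _ in range(100):
    cache.get("top_products")
print(len(db_calls))  # 1
```

The TTL bounds staleness: results can be up to `ttl_s` seconds old, which is the trade-off for skipping repeated queries.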

Exam trap

Google Cloud often tests the misconception that adding more read replicas is always the best solution for read performance. The trap is that every replica still executes every query it receives, so redundant work is spread out rather than eliminated, whereas caching reduces how often the query runs at all.

How to eliminate wrong answers

Option A is wrong because, as framed here, ProxySQL is used for connection pooling and query routing: it manages connections and can spread reads across replicas, but pooling does not reduce the cost of executing repeated read-heavy queries. (ProxySQL does offer an optional query cache, but it is a self-managed component rather than a managed caching tier.) Option C is wrong because the MySQL query cache was deprecated in MySQL 5.7.20 and removed in 8.0, and even where available it scales poorly under high concurrency because of invalidation overhead and mutex contention. Option D is wrong because adding read replicas increases total read capacity but does not shorten the latency of the repeated queries themselves; each replica still executes them.

19

Multi-Select · medium

A DevOps engineer is reviewing GCP costs and notices high network egress charges. Which two actions can help reduce network egress costs? (Choose two.)

Select 2 answers

A. Use preemptible VMs

B. Migrate all workloads to a single zone

C. Use Cloud CDN to cache content closer to users

D. Use the same region for all GCP resources that communicate with each other

E. Use external load balancers with global access

Answers: C, D

Cloud CDN reduces egress from origin to internet by serving cached content.

Why this answer

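Options C and D are correct. Keeping resources that communicate with each other in the same region avoids inter-region egress charges, and Cloud CDN serves cached content from edge locations, reducing egress from the origin.

Option B is not practical: concentrating every workload in one zone sacrifices availability. Option A saves compute, not network. Option E does not reduce egress and can increase it by exposing services globally.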

20

MCQ · medium

Which Cloud Run setting controls the maximum number of requests a container can handle concurrently?

A. concurrency

B. timeout

C. max-instances

D. min-instances

Answer: A

Correct. Concurrency sets the maximum concurrent requests per container.

Why this answer

Option A is correct because the `concurrency` setting in Cloud Run defines the maximum number of simultaneous requests that a single container instance can process at any given time. This directly controls how many requests are routed to a single container before Cloud Run spins up additional instances, optimizing resource utilization and preventing overload.
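The interaction between `concurrency` and `max-instances` is easy to estimate. This Python sketch computes a rough lower bound on the number of instances needed if each instance accepts at most `concurrency` simultaneous requests (Cloud Run's default is currently 80). The real autoscaler also considers CPU utilization, so treat the result as an estimate; the function name is hypothetical.

```python
import math
from typing import Optional


def instances_needed(concurrent_requests: int, concurrency: int,
                     max_instances: Optional[int] = None) -> int:
    """Rough lower bound on instances for a given in-flight request count."""
    needed = math.ceil(concurrent_requests / concurrency)
    if max_instances is not None:
        # Past this cap, extra requests may queue or fail instead of scaling out.
        needed = min(needed, max_instances)
    return needed


print(instances_needed(1000, 80))                   # 13
print(instances_needed(1000, 1))                    # 1000
print(instances_needed(1000, 80, max_instances=10)) # 10
```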

Exam trap

Google Cloud often tests the distinction between `concurrency` (requests per instance) and `max-instances` (total instances), leading candidates to confuse capacity scaling limits with per-instance request handling.

How to eliminate wrong answers

Option B is wrong because `timeout` controls the maximum duration a request can run before being terminated (default 300 seconds), not how many requests can be handled concurrently. Option C is wrong because `max-instances` limits the total number of container instances that can be created to handle traffic, not the concurrency per instance. Option D is wrong because `min-instances` specifies the minimum number of container instances that must remain warm and ready to serve traffic, which affects cold starts but does not control concurrent request handling.

21

MCQ · hard

You are designing a monitoring strategy for a microservices application running on Google Kubernetes Engine (GKE). You need to create a custom metric that counts the number of failed login attempts from the application logs. The logs are in JSON format and contain a field 'status' with value 'FAILED'. Which approach should you use?

A.Use Cloud Monitoring's Metrics Explorer to create a metric from logs using a filter.

B.Install the Ops Agent on GKE nodes to collect application metrics directly.

C.Configure a logs-based metric in Cloud Logging that filters for the condition and counts.

D.Export logs to BigQuery and then create a custom metric from the exported data.

Answer: C

Logs-based metrics are designed for this purpose – they count log entries that match a filter and expose them as custom metrics in Cloud Monitoring.

Why this answer

Option C is correct because a logs-based metric in Cloud Logging directly counts occurrences of a specific log entry pattern (e.g., 'status' = 'FAILED') without requiring additional agents or data exports. This approach is purpose-built for deriving metrics from log data and integrates seamlessly with Cloud Monitoring for alerting and dashboards.

Exam trap

Google Cloud often tests the misconception that Metrics Explorer can create metrics from logs, when in fact it only queries and charts existing metrics, while logs-based metrics are the correct service for deriving custom metrics from log data.

How to eliminate wrong answers

Option A is wrong because Metrics Explorer is a visualization tool for existing metrics, not a mechanism to create new metrics from log content. Option B is wrong because the Ops Agent collects system and application telemetry from Compute Engine VM instances. It is not supported on GKE nodes, where container logs are already collected into Cloud Logging, so it contributes nothing toward counting failed logins. Option D is wrong because exporting logs to BigQuery adds latency, cost, and complexity; you could query the exported data, but it is not the recommended or direct way to create a custom metric from logs.
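A counter logs-based metric amounts to a filter plus "+1 per matching entry". The sketch below pairs an example filter with a local Python function that mimics the counting. The filter assumes the application writes structured JSON to stdout, so that GKE's logging pipeline places `status` under `jsonPayload`. This only illustrates the semantics and is not a Cloud Logging client. The constant and function names are hypothetical.

```python
import json

# Example filter for the logs-based metric (assumes structured JSON logs from GKE containers).
FAILED_LOGIN_FILTER = 'resource.type="k8s_container" AND jsonPayload.status="FAILED"'


def count_failed_logins(raw_lines):
    """Mimic a counter logs-based metric locally: add 1 for each entry whose status is FAILED."""
    count = 0
    for line in raw_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON lines would be ingested as textPayload and never match the filter
        if isinstance(entry, dict) and entry.get("status") == "FAILED":
            count += 1
    return count


sample = [
    '{"status": "FAILED", "user": "alice"}',
    '{"status": "OK", "user": "bob"}',
    "plain text line",
    '{"status": "FAILED", "user": "carol"}',
]
print(count_failed_logins(sample))  # 2
```

In Cloud Logging, the filter goes into the logs-based metric definition. Cloud Monitoring then exposes the count as a user-defined metric under `logging.googleapis.com/user/`, followed by the metric name.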

22

MCQ · medium

A Cloud Build pipeline uses the above cloudbuild.yaml. When triggered, the deploy step fails with: 'ERROR: (gcloud.run.deploy) PERMISSION_DENIED: Permission 'run.services.update' denied on resource.' The Cloud Build service account has the 'Cloud Run Admin' role. What is the most likely cause?

A.The substitution variable ${_ENV} is not properly passed to the nested build.

B.The nested 'gcloud builds submit' command runs with a different service account that does not have the Cloud Run Admin role.

C.The 'gcr.io/cloud-builders/gcloud' image does not support the 'gcloud run deploy' command.

D.The Cloud Build service account at the top level does not have the 'run.services.update' permission.

Answer: B

A nested build runs under the service account configured for that second build, which may lack the required permissions.

Why this answer

Option B is correct because a nested `gcloud builds submit` command starts a separate build, and that build runs under its own service account. This is the account specified for that build or, if none is specified, the project's default build service account (which, depending on the project, is the legacy Cloud Build service account or the Compute Engine default service account). Even if the top-level service account has the Cloud Run Admin role, the nested build's service account may lack the `run.services.update` permission, so the deploy step fails with PERMISSION_DENIED.

Exam trap

Google Cloud often tests the misconception that all steps in a Cloud Build pipeline share the same service account, when in fact nested builds use a different service account by default, leading to unexpected permission errors.

How to eliminate wrong answers

Option A is wrong because the substitution variable ${_ENV} is a user-defined substitution that is passed to the top-level build; if it were missing, the error would be about an undefined variable or incorrect resource name, not a permission denied error. Option C is wrong because the `gcr.io/cloud-builders/gcloud` image fully supports `gcloud run deploy`; it is the official Google-maintained image for running gcloud commands in Cloud Build. Option D is wrong because the problem statement explicitly says the Cloud Build service account has the 'Cloud Run Admin' role, which includes the `run.services.update` permission; the error arises from a different service account used in the nested build.

23

MCQ · easy

Which GCP service is used to store build artifacts such as Docker images?

A.Cloud Storage

B.Both C and D

C.Container Registry

D.Artifact Registry

Answer: B

Both Artifact Registry and Container Registry can store Docker images.

Why this answer

Option B is correct because both Artifact Registry (option D) and Container Registry (option C) can store Docker images. Artifact Registry is the recommended service, and Container Registry is deprecated. Option A, Cloud Storage, is general-purpose object storage, not a container image registry.

24

Drag & Drop · medium

Order the steps to respond to a Google Cloud security incident involving a compromised service account key.

Why this order

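Revoke (disable or delete) the compromised key, rotate to new credentials, review the audit logs, assess the impact, and then update policies to prevent a recurrence.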

25

MCQ · hard

A DevOps team is setting up SLOs for a service with two critical metrics: availability and latency. They want to measure over a 30-day window. Which approach correctly defines an SLO?

A.Use Cloud Monitoring to create a custom SLI based on logs and set an SLO with a 7-day rolling window

B.Use Cloud Tasks to schedule a cron job that calculates availability

C.Use Cloud Monitoring to create a custom SLI based on request latency metrics and set an SLO with a 30-day rolling window

D.Use Stackdriver Monitoring (deprecated) to set an SLO with a fixed 30-day window

Answer: C

This correctly uses Cloud Monitoring and a 30-day rolling window.

Why this answer

Option C is correct because it uses Cloud Monitoring to create a custom SLI based on request latency metrics, which is a valid SLI type, and sets an SLO with a 30-day rolling window, matching the requirement. Cloud Monitoring's SLO feature natively supports rolling windows, and latency is a standard metric for defining service-level objectives.

Exam trap

Google Cloud often tests the distinction between deprecated services (Stackdriver) and current ones (Cloud Monitoring), and the requirement for a rolling window versus a fixed window, leading candidates to pick D if they are unaware of deprecation or misunderstand window types.

How to eliminate wrong answers

Option A is wrong because it specifies a 7-day rolling window instead of the required 30-day window, and while logs can be used for SLIs, the question explicitly asks for a 30-day window. Option B is wrong because Cloud Tasks is a task queue service for asynchronous work, not a monitoring or SLO calculation tool; using it to calculate availability is architecturally incorrect and not a supported approach for SLOs. Option D is wrong because Stackdriver Monitoring is deprecated and should not be used; the current service is Cloud Monitoring, and a fixed 30-day window is not the standard rolling window approach for SLOs in Cloud Monitoring.
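The length of the rolling window matters mainly because it fixes the size of the error budget. The sketch below shows the arithmetic for a time-based budget and for a request-based latency SLI. The 99.9% target, the 300 ms threshold, and the function names are illustrative assumptions, not values from the question.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Allowed 'bad' time in the window, e.g. 99.9% over 30 days."""
    return window * (1.0 - slo_target)


def latency_sli(latencies_ms, threshold_ms: float) -> float:
    """Request-based SLI: fraction of requests at or under the latency threshold."""
    if not latencies_ms:
        return 1.0  # no traffic: treated as compliant here (a policy choice, not a rule)
    good = sum(1 for v in latencies_ms if v <= threshold_ms)
    return good / len(latencies_ms)


print(error_budget(0.999, timedelta(days=30)))              # 0:43:12
print(latency_sli([120, 180, 250, 900], threshold_ms=300))  # 0.75
```

A 99.9% target over a 30-day rolling window leaves about 43 minutes of budget. With a 7-day window, the same target would leave only about 10 minutes.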

26

MCQ · easy

You are investigating a slow increase in latency for a service running on Compute Engine. You have Cloud Monitoring and Cloud Logging set up. Which tool would best help you identify the cause of the latency?

A.Error Reporting

B.Cloud Profiler

C.Cloud Debugger

D.Cloud Trace

Answer: D

Cloud Trace traces requests and identifies latency contributors.

Why this answer

Cloud Trace is designed to capture latency data from distributed applications by tracing requests as they propagate through services. It provides detailed latency distributions and per-span breakdowns, allowing you to pinpoint which component or operation is causing the slowdown. For a Compute Engine service with Cloud Monitoring and Logging already in place, Cloud Trace is the most direct tool to analyze request-level latency.

Exam trap

Google Cloud often tests the distinction between tools that measure latency (Cloud Trace) versus tools that measure resource utilization (Cloud Profiler) or error rates (Error Reporting), leading candidates to confuse profiling with tracing.

How to eliminate wrong answers

Option A is wrong because Error Reporting aggregates and analyzes application errors (exceptions, crashes), not latency metrics; it would not help identify the cause of a slow increase in latency. Option B is wrong because Cloud Profiler continuously samples CPU and heap usage to identify performance bottlenecks in code, but it focuses on resource consumption rather than request-level latency tracing. Option C is wrong because Cloud Debugger allows you to inspect application state at a specific point in code without stopping execution, but it is meant for debugging logic issues, not for analyzing latency trends over time.
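Cloud Trace presents each request as a tree of spans. The sketch below applies the core idea to hand-written span data. For one trace, it computes each span's self time (its duration minus the time covered by its direct children) and reports the largest contributor. The `Span` type and the sample values are hypothetical. The sketch assumes child spans run sequentially without overlapping and that span names are unique within the trace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration(self) -> float:
        return self.end_ms - self.start_ms


def self_times(spans):
    """Self time = span duration minus durations of its direct children (assumes no overlap)."""
    child_total = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0.0) + s.duration
    return {s.name: s.duration - child_total.get(s.span_id, 0.0) for s in spans}


trace = [
    Span("1", None, "GET /checkout", 0, 480),
    Span("2", "1", "auth-service", 5, 25),
    Span("3", "1", "inventory-db-query", 30, 410),
    Span("4", "1", "render", 415, 470),
]
times = self_times(trace)
print(max(times, key=times.get))  # inventory-db-query
```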

27

MCQ · easy

A DevOps team wants to ensure that all audit logs from projects across the organization are sent to a central project for analysis. Which approach should they use?

A.Enable VPC Flow Logs in each project.

B.Configure each project to export logs to a central BigQuery dataset.

C.Use organization-level log sinks to route audit logs to a central Cloud Storage bucket.

D.Use Cloud Logging's default routing.

Answer: C

Organization-level sinks aggregate logs from all projects.

Why this answer

Option C is correct because organization-level log sinks allow you to aggregate audit logs from all projects within an organization into a single destination, such as a Cloud Storage bucket. This approach ensures centralized analysis and compliance without requiring per-project configuration, leveraging the hierarchical nature of Google Cloud resource management.

Exam trap

The trap here is that candidates confuse VPC Flow Logs (network logs) with audit logs (IAM and resource activity logs), or assume that per-project export is the only option, missing the organization-level sink capability.

How to eliminate wrong answers

Option A is wrong because VPC Flow Logs capture network traffic metadata, not audit logs; they are enabled per subnet and are not designed for cross-project aggregation. Option B is wrong because configuring each project individually to export logs to a central BigQuery dataset is operationally inefficient and error-prone, whereas an organization-level sink is a single, scalable configuration. Option D is wrong because Cloud Logging's default routing only sends each project's logs to that project's own `_Required` and `_Default` log buckets; it does not aggregate logs across projects.
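For reference, an organization-level aggregated sink is usually created with `gcloud logging sinks create` using the `--organization` and `--include-children` flags. The Python sketch below only assembles that command as an argument list and does not run it. The sink name, bucket, and organization ID are placeholders. The filter assumes you want every Cloud Audit Logs type.

```python
import shlex


def org_audit_sink_command(sink_name: str, bucket: str, org_id: str) -> list:
    """Build (but do not execute) a gcloud command for an org-level aggregated audit-log sink."""
    return [
        "gcloud", "logging", "sinks", "create", sink_name,
        f"storage.googleapis.com/{bucket}",  # Cloud Storage bucket destination
        f"--organization={org_id}",
        "--include-children",  # without this, only the organization's own logs are routed
        '--log-filter=logName:"cloudaudit.googleapis.com"',  # matches all audit log types
    ]


cmd = org_audit_sink_command("org-audit-sink", "central-audit-logs", "123456789012")
print(shlex.join(cmd))
```

After creating the sink, grant its writer identity permission to write to the bucket, for example with the Storage Object Creator role.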

28

MCQ · medium

A company's SRE team is designing an incident management process. They want to ensure that alerts are actionable and that on-call engineers are not overwhelmed by false positives. Which approach should they take?

A.Use only critical severity alerts and rely on manual dashboard review for lower severity

B.Create alerting policies for every available metric to ensure nothing is missed

C.Set all alert thresholds to 50% above the average value to avoid false positives

D.Define SLOs and set alert thresholds based on historical error budget consumption

Answer: D

SLO-based alerting focuses on user-facing impact and reduces noise.

Why this answer

Option D is correct because defining SLOs and setting alert thresholds based on historical error budget consumption ensures alerts are directly tied to user-facing reliability. This approach prevents false positives by only triggering when the error budget is being consumed faster than expected, making alerts actionable and reducing noise for on-call engineers.

Exam trap

Google Cloud often tests the misconception that more alerts or higher thresholds equal better reliability, when in fact the key is aligning alerts with SLOs and error budgets to ensure they are actionable and reduce noise.

How to eliminate wrong answers

Option A is wrong because relying solely on critical alerts and manual dashboard review for lower severity risks missing early warning signs of degradation, leading to delayed incident response and potential SLO breaches. Option B is wrong because creating alerting policies for every available metric generates excessive noise and alert fatigue, overwhelming on-call engineers with non-actionable alerts. Option C is wrong because setting all alert thresholds to 50% above the average value is arbitrary and does not account for normal variance or seasonal patterns, which can either miss real issues or still produce false positives during low-traffic periods.
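Alerting on error-budget consumption is usually expressed as a burn rate: the observed error rate divided by the error rate the SLO allows. At a burn rate of 1, the budget is spent exactly over the SLO window. The sketch below computes the burn rate and applies one fast-burn threshold. The 99.9% target and the threshold of 14.4 are example values from common SRE practice, not from the question. Over a 30-day window, a burn rate of 14.4 corresponds to spending 2% of the budget in one hour.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How fast the error budget is being spent relative to plan (1.0 = exactly on budget)."""
    allowed = 1.0 - slo_target
    if allowed <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_rate / allowed


def should_page(errors: int, requests: int, slo_target: float = 0.999, threshold: float = 14.4) -> bool:
    """Page only when the recent burn rate reaches the threshold (example values)."""
    if requests == 0:
        return False
    return burn_rate(errors / requests, slo_target) >= threshold


print(should_page(errors=20, requests=10_000))   # False: 0.2% errors, burn rate about 2
print(should_page(errors=200, requests=10_000))  # True: 2% errors, burn rate about 20
```

Production setups often pair a short window with a long one, so alerts fire quickly on fast burns without flapping on momentary blips.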

29

MCQ · hard

A team uses Cloud Spanner for a global application. Query performance degrades as data grows. They notice that most queries filter on a column 'customer_id' but the primary key is a UUID. What is the best approach to optimize performance?

A.Enable query optimization hints

B.Reorder the primary key to start with customer_id

C.Use interleaved tables

D.Create a secondary index on customer_id

Answer: B

Putting customer_id first in the primary key stores each customer's rows contiguously, so queries that filter on that column become efficient key-range scans.

Why this answer

In Cloud Spanner, the primary key determines the physical ordering of rows, and Spanner divides that key space into splits. When queries filter on `customer_id` but the primary key starts with a UUID and no secondary index exists, Spanner must scan the whole table because the filter cannot use the key order. Reordering the primary key to start with `customer_id` (for example, a composite key of `customer_id` followed by the UUID, which keeps each key unique) lets Spanner use efficient key-range scans, greatly reducing the number of rows read per query.

Exam trap

Google Cloud often tests the misconception that secondary indexes are always the best solution for query performance, when in fact reordering the primary key to match the most common query filter pattern is more efficient because it avoids the extra index lookup and write overhead.

How to eliminate wrong answers

Option A is wrong because query optimization hints (e.g., `@{FORCE_INDEX}` or `@{JOIN_METHOD}`) can influence execution plans but do not fix the fundamental physical design issue of an inefficient primary key ordering; they are a band-aid, not a structural solution. Option C is wrong because interleaved tables are used to store parent-child relationships physically co-located for join performance, not to optimize single-table queries filtering on a non-key column. Option D is wrong because while a secondary index on `customer_id` would help, it introduces additional write amplification and storage overhead, and the question asks for the 'best approach' — reordering the primary key is more efficient as it avoids index lookups entirely and leverages the primary storage order.
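The effect of key order can be illustrated with a sorted Python list standing in for Spanner's key-ordered storage. With a composite key that starts with `customer_id`, one customer's rows form a contiguous range that binary search locates directly. With a UUID-first key, the same filter must examine every row. This is a toy model that uses hypothetical data and Python's `bisect` in place of Spanner's storage engine; it only shows why the key prefix matters.

```python
import bisect
import uuid

# Toy table: 1,000 rows spread over 100 customers (hypothetical data).
rows = [(f"cust-{i % 100:03d}", str(uuid.uuid4())) for i in range(1000)]

# Key (customer_id, uuid): rows sorted by customer first, like a customer_id-prefixed primary key.
by_customer = sorted(rows)


def range_scan(sorted_rows, customer_id):
    """Key-range scan: binary-search the start and end of one customer's contiguous rows."""
    lo = bisect.bisect_left(sorted_rows, (customer_id, ""))
    hi = bisect.bisect_left(sorted_rows, (customer_id + "\x00", ""))
    return sorted_rows[lo:hi]


def full_scan(all_rows, customer_id):
    """UUID-first key: the customer filter cannot use key order, so every row is examined."""
    return [r for r in all_rows if r[0] == customer_id]


hits = range_scan(by_customer, "cust-042")
print(len(hits), "rows found via key range; a full scan examines", len(rows), "rows")
```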

30

MCQ · medium

Your team deploys a microservice on Google Kubernetes Engine (GKE) that serves an API with low latency requirements. Users report that the API occasionally times out during peak hours. You check the GKE metrics and see that CPU utilization is below 50% but memory is near 100% on the nodes. What is the most likely cause and what should you do?

A.The nodes are under-provisioned; add more nodes to the cluster.

B.The application is memory-constrained; increase memory resource limits for the pod.

C.The application is CPU-bound; increase CPU resource limits for the pod.

D.The network bandwidth is insufficient; increase the machine type for nodes.

Answer: B

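Node memory near 100% with low CPU points to memory pressure. Containers that hit their memory limit are OOM-killed, and pods can be evicted when nodes run short, so in-flight requests fail or time out. Giving the pods more memory removes that bottleneck.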

Why this answer

Option B is correct because the symptoms (low CPU utilization, near-100% memory usage on the nodes, and API timeouts during peak hours) indicate that the application is running out of memory. A container that exceeds its memory limit is OOM-killed. When a node runs low on memory, pods using more than their memory requests are the first candidates for eviction. In either case, in-flight requests fail or time out. Increasing the pod's memory limits, together with its requests so the scheduler reserves that memory, gives it room for its heap or cache at peak load, preventing OOM kills and stabilizing latency.

Exam trap

Google Cloud often tests the misconception that high memory usage on nodes always means you need more nodes, but the correct action is to first check pod-level resource limits and adjust them, as adding nodes only masks the real issue of memory-constrained pods.

How to eliminate wrong answers

Option A is wrong because adding more nodes spreads the memory load but does not address the root cause: each pod needs more memory than it is configured for. With CPU low and memory saturated, the bottleneck is the pods' memory configuration rather than overall cluster capacity. Option C is wrong because CPU utilization is below 50%, so the application is not CPU-bound; increasing CPU limits would not resolve memory exhaustion and could waste resources. Option D is wrong because network bandwidth issues would show up as packet loss or high network latency, not as near-100% node memory; a larger machine type might add memory, but it is a less targeted fix than adjusting pod resource limits.

31

MCQ · medium

A DevOps engineer is bootstrapping a CI/CD pipeline using Cloud Build. They need to ensure that only specific service accounts can trigger builds on certain branches. What is the recommended approach?

A.Use Cloud Functions to validate branch names before triggering builds.

B.Use Cloud Source Repositories with branch protection rules.

C.Store the service account keys in Secret Manager and use them in build steps.

D.Use Cloud Build triggers with regular expressions on branch patterns and restrict access via IAM.

Answer: D

Cloud Build triggers support branch patterns and IAM can limit who can trigger.

Why this answer

Option D is correct because Cloud Build triggers natively support regular expressions on branch patterns, and IAM conditions can restrict which service accounts are allowed to invoke specific triggers. This provides a declarative, auditable, and least-privilege approach without custom code or external services.

Exam trap

Google Cloud often tests the distinction between branch protection (which controls Git pushes) and trigger authorization (which controls build invocation), leading candidates to confuse Cloud Source Repositories branch protection rules with Cloud Build trigger access control.

How to eliminate wrong answers

Option A is wrong because using Cloud Functions to validate branch names adds unnecessary complexity, latency, and cost; Cloud Build triggers already support branch pattern filtering natively. Option B is wrong because Cloud Source Repositories branch protection rules only prevent direct pushes to branches, not trigger invocations; they do not control which service accounts can trigger builds. Option C is wrong because storing service account keys in Secret Manager and using them in build steps does not restrict who can trigger builds; it only manages authentication for build steps, not authorization for trigger invocation.
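Cloud Build trigger branch filters are regular expressions matched against the branch name. The Python sketch below previews which branches a pattern would accept. It assumes the pattern is applied as an unanchored search, which is why anchors such as `^` and `$` matter. For simple patterns like these, Python's `re` module behaves the same as RE2. The patterns are examples, not values from the question. Note that the filter only decides which pushes start a build; who may run or edit the trigger is still governed by IAM.

```python
import re

# Example trigger patterns (hypothetical): only main, or only versioned release branches.
PROD_PATTERN = r"^main$"
RELEASE_PATTERN = r"^release/v\d+\.\d+$"


def trigger_fires(pattern: str, branch: str) -> bool:
    """Approximate a trigger's branch filter as an unanchored regex search."""
    return re.search(pattern, branch) is not None


for branch in ["main", "main-hotfix", "release/v1.4", "feature/login"]:
    print(branch, trigger_fires(PROD_PATTERN, branch), trigger_fires(RELEASE_PATTERN, branch))
```

Without the anchors, a pattern such as `main` would also match `main-hotfix`.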

32

MCQ · hard

A large enterprise is bootstrapping a Google Cloud organization with strict security requirements. They need to: (1) enforce multi-factor authentication (MFA) for all users, (2) prevent any new project from using default VPCs, (3) require customer-managed encryption keys (CMEK) for all Cloud Storage buckets, (4) automatically revoke access for offboarded employees within 24 hours. They have an existing Active Directory and plan to use Google Cloud's Identity Platform for SSO. Which combination of Google Cloud services and policies should they implement?

A.Configure MFA via organization policy, use a project creation Cloud Function to disable default VPC and enforce CMEK, and use a Cloud Scheduler job to scan for offboarded users daily.

B.Set up a Cloud VPN to Active Directory, use Cloud Run to enforce MFA, apply a custom organization policy for CMEK, and use Cloud Monitoring alerts for offboarding.

C.Use Identity Platform with MFA enforced in OIDC, apply organization policy 'compute.skipDefaultNetworkCreation' and 'storage.cmekRequired', and use Cloud Audit Logs to detect offboarded users.

D.Use Cloud Identity with MFA, apply organization policy for default VPC and CMEK, and use Cloud Functions to deactivate IAM accounts offboarded in HR system via a custom integration.

Answer: C

Identity Platform handles MFA; org policies enforce technical requirements.

Why this answer

Option C is correct because it uses Identity Platform with OIDC to enforce MFA via the existing Active Directory SSO, applies the organization policy 'compute.skipDefaultNetworkCreation' to prevent default VPCs, uses 'storage.cmekRequired' to enforce CMEK on Cloud Storage buckets, and leverages Cloud Audit Logs to detect offboarded users by monitoring identity changes, enabling automated revocation within 24 hours. This combination directly addresses all four requirements using native Google Cloud policies and services without custom code.

Exam trap

The trap here is that candidates often confuse organization policies with identity policies, thinking MFA can be enforced via organization policies, or they overcomplicate the solution with custom code when native services like Cloud Audit Logs and organization policies suffice.

How to eliminate wrong answers

Option A is wrong because MFA cannot be configured via an organization policy; organization policies do not enforce authentication methods like MFA. Option B is wrong because Cloud Run cannot enforce MFA, and Cloud Monitoring alerts do not automatically revoke access for offboarded users. Option D is wrong because Cloud Functions deactivating IAM accounts is not a native or recommended approach; Cloud Audit Logs should be used to detect offboarded users, and the integration with HR systems is overly complex and not directly supported.

33

MCQ · medium

Based on the exhibit, what does the duration of 300s mean in this alerting policy?

A.The alert fires if CPU utilization is above 80% for at least 5 consecutive minutes.

B.The alert fires after 300 seconds of sustained CPU utilization above 80% with a count of 1.

C.The alert fires if CPU utilization averaged over 5 minutes exceeds 80%.

D.The alert fires if CPU utilization is above 80% for any 5-minute window in the last hour.

Answer: A

Correct: duration is the minimum continuous time for which the condition must hold.

Why this answer

In the Cloud Monitoring alerting policy shown in the exhibit, the duration of 300s (5 minutes) is the minimum period during which CPU utilization must continuously exceed the 80% threshold before the alert fires. This prevents transient spikes from triggering unnecessary alerts. Option A correctly states that the alert fires only if CPU utilization is above 80% for at least 5 consecutive minutes, which matches the policy's configuration.

Exam trap

Google Cloud often tests the distinction between 'sustained threshold' (all data points above the threshold for the duration) and 'average-based threshold' (mean over the window), and candidates frequently confuse the duration as a sliding window or a count-based condition instead of a consecutive period.

How to eliminate wrong answers

Option B is wrong because it treats "a count of 1" as part of the sustained condition. In a Cloud Monitoring alerting policy, the trigger count is the number of time series that must violate the condition, and the duration alone defines the sustained period. Option C is wrong because it describes an average-based threshold (CPU utilization averaged over 5 minutes exceeds 80%), whereas the exhibit shows a threshold condition with a duration: the alert fires only when every data point in the 300s window is above 80%, not when the average is. Option D is wrong because it introduces a "last hour" window that is not part of the policy; the 300s duration is a fixed consecutive period, not a sliding window within an hour.
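The difference between a sustained-duration condition and an averaged one shows up clearly on sample data. This sketch assumes one aligned CPU value per minute, so a 300s duration covers the last 5 points, and uses the 80% threshold from the question. It mimics the semantics described above and is not Cloud Monitoring's evaluation engine. The function names are hypothetical.

```python
def fires_on_duration(points, threshold=0.80, duration_points=5):
    """Sustained condition: every one of the last N aligned points must exceed the threshold."""
    recent = points[-duration_points:]
    return len(recent) == duration_points and all(p > threshold for p in recent)


def fires_on_average(points, threshold=0.80, window_points=5):
    """Averaged condition: the mean of the last N points must exceed the threshold."""
    recent = points[-window_points:]
    return len(recent) == window_points and sum(recent) / len(recent) > threshold


spiky = [0.60, 0.99, 0.99, 0.70, 0.99]      # two dips below 80%
sustained = [0.85, 0.90, 0.88, 0.86, 0.91]  # above 80% throughout

print(fires_on_duration(spiky), fires_on_average(spiky))          # False True
print(fires_on_duration(sustained), fires_on_average(sustained))  # True True
```

The spiky series averages above 80%, yet it never stays above the threshold for all 5 minutes. That gap is exactly what separates option A from option C.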

34

MCQ · easy

A data processing company runs nightly batch jobs that can tolerate interruptions. They want to minimize compute costs. Which compute option should they use?

A.VMs with committed use discounts

B.Preemptible VMs

C.Standard VMs with sustained use discounts

D.Sole-tenant nodes

Answer: B

Preemptible VMs offer significant cost savings (up to 80%) and are ideal for interruptible batch jobs.

Why this answer

Option B (Preemptible VMs) is correct because preemptible VMs are significantly cheaper and suit fault-tolerant workloads that can handle interruptions. Options A, C, and D are more expensive or require commitments.

35

Multi-Select · medium

An alerting policy for high CPU utilization on a VM is firing even when CPU is not high. The team suspects a misconfiguration. Which two possible issues should they check? (Choose two.)

A.The alert condition is using the average aggregation with a short alignment period.

B.The threshold is set too low compared to actual CPU usage.

C.The metric is being duplicated because multiple agents are running.

D.The alerting policy was created in a different project and not imported.

E.The VM is reporting metrics from a custom namespace instead of the standard agent.

Answers: A, C

A short alignment period makes the alert sensitive to brief spikes, causing false positives.

Why this answer

Option A is correct because using a short alignment period with average aggregation can cause the alert to fire on brief CPU spikes that do not represent sustained high usage. If the alignment period is too short (e.g., 1 minute), the alerting policy may trigger on transient bursts even when the overall CPU load is low. This is a common misconfiguration in Google Cloud Monitoring (formerly Stackdriver), where the alignment period and aggregator settings must match the expected workload pattern. Option C is also correct: if more than one agent on the VM (for example, the legacy Monitoring agent alongside the Ops Agent) writes the same metric, the duplicated or conflicting data points can distort the values the alert condition evaluates.

Exam trap

Google Cloud often tests the misconception that a low threshold (Option B) is the cause of false positives, but the real issue is the alignment period and aggregation settings that amplify transient spikes, not the threshold value itself.
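The effect of the alignment period can be reproduced with a short series. This sketch assumes CPU samples every 10 seconds, so a 60s alignment period covers 6 samples and a 300s one covers 30. It computes mean-aligned values and checks which of them would cross an 80% threshold. The sampling interval, the data, and the helper name are illustrative assumptions.

```python
def align_mean(samples, period):
    """Mean-align raw samples into consecutive, non-overlapping buckets of `period` samples."""
    return [
        sum(samples[i:i + period]) / period
        for i in range(0, len(samples) - period + 1, period)
    ]


# 5 minutes of 10-second CPU samples: mostly 30%, with a 1-minute burst to 95%.
samples = [0.30] * 12 + [0.95] * 6 + [0.30] * 12

one_minute = align_mean(samples, 6)    # 60 s alignment period
five_minute = align_mean(samples, 30)  # 300 s alignment period

print(any(v > 0.80 for v in one_minute))   # True: the burst alone crosses the threshold
print(any(v > 0.80 for v in five_minute))  # False: the burst is averaged away (about 43%)
```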

36

MCQ · hard

Refer to the exhibit. The team observes that some requests are fast while others are slow. Both requests have identical payload and response. What is the most likely cause of the latency difference?

A.The fast request hit a cached response

B.The slow request had a larger response size

C.The fast request used a different load balancer

D.The slow request used a different HTTP method

Answer: A

The cacheHit field shows true for the fast request, indicating a cache hit reduced latency.

Why this answer

The fast request hit a cached response, meaning the reverse proxy or CDN served the response from its cache without forwarding the request to the origin server. This eliminates the round-trip time to the backend and the processing time on the origin, resulting in significantly lower latency. Since both requests have identical payloads and responses, caching is the most plausible explanation for the observed difference.

Exam trap

Google Cloud often tests the misconception that latency differences must be caused by network or server-side factors, when in fact caching is the most common and simplest explanation for identical requests with different response times.

How to eliminate wrong answers

Option B is wrong because the question explicitly states both requests have identical payload and response, so the response size cannot differ. Option C is wrong because using a different load balancer would not inherently cause a latency difference for identical requests; load balancers typically add minimal, consistent overhead. Option D is wrong because the HTTP method (e.g., GET vs POST) does not affect latency for identical payloads and responses; the method only changes semantics, not the network or processing time for the same data.
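To confirm this kind of diagnosis from logs, you can group request latencies by the cache-hit flag. This sketch assumes load balancer log entries exported as JSON, with `httpRequest.cacheHit` and `httpRequest.latency` fields, where latency is a duration string such as `"0.012s"`. It treats a missing `cacheHit` field as a miss. The sample entries are made up.

```python
from statistics import median


def parse_latency(value: str) -> float:
    """Convert a JSON duration string such as '0.012s' to seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unexpected duration format: {value!r}")
    return float(value[:-1])


def latency_by_cache(entries):
    """Group latencies (seconds) by httpRequest.cacheHit and return the median of each group."""
    groups = {True: [], False: []}
    for entry in entries:
        req = entry.get("httpRequest", {})
        hit = bool(req.get("cacheHit", False))  # absent cacheHit treated as a miss
        if "latency" in req:
            groups[hit].append(parse_latency(req["latency"]))
    return {hit: median(vals) for hit, vals in groups.items() if vals}


sample_entries = [
    {"httpRequest": {"requestUrl": "/logo.png", "cacheHit": True, "latency": "0.008s"}},
    {"httpRequest": {"requestUrl": "/logo.png", "latency": "0.240s"}},
    {"httpRequest": {"requestUrl": "/logo.png", "cacheHit": True, "latency": "0.012s"}},
]
print(latency_by_cache(sample_entries))
```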

37

MCQ · hard

A DevOps team is bootstrapping a Google Cloud organization. They have created a folder for a business unit and want to prevent users from moving projects out of that folder to other folders. Which organization policy constraint should they apply?

A.constraints/resourcemanager.allowedResourceRestrictions

B.constraints/resourcemanager.allowedPolicyMemberDomains

C.constraints/resourcemanager.allowedProjectParent

D.constraints/resourcemanager.disableProjectMove

Answer: D

This prevents projects from being moved.

Why this answer

Option D is correct because the `constraints/resourcemanager.disableProjectMove` organization policy constraint explicitly prevents users from moving projects out of a specified folder or organization. When applied at the folder level, this constraint blocks any move operation that would relocate a project to a different parent, ensuring projects remain within the designated business unit folder.

Exam trap

Google Cloud often tests the misconception that `allowedProjectParent` is a real constraint, but Google Cloud uses `disableProjectMove` instead, and candidates may confuse it with similar-sounding but unrelated constraints like `allowedPolicyMemberDomains`.

How to eliminate wrong answers

Option A is wrong because `constraints/resourcemanager.allowedResourceRestrictions` is not a valid Google Cloud organization policy constraint; the correct constraint for restricting resource types is `constraints/resourcemanager.allowedResourceTypes`. Option B is wrong because `constraints/resourcemanager.allowedPolicyMemberDomains` restricts which external domains can be added as members in IAM policies, not project move operations. Option C is wrong because `constraints/resourcemanager.allowedProjectParent` is not a real organization policy constraint; the actual constraint for controlling project parent is `constraints/resourcemanager.disableProjectMove`.

38

MCQ · hard

A company's Cloud SQL for PostgreSQL instance is experiencing performance degradation. They observe a high number of idle connections and slow transaction commit times. Which combination of actions will most effectively address this issue?

A.Add a read replica and route read-only queries to it.

B.Configure statement timeout and use PgBouncer for connection pooling.

C.Increase the storage size and enable automatic backup.

D.Drop unused indexes and run VACUUM.

Answer: B

Statement timeout cancels long-running queries, and connection pooling reduces the number of idle connections.

Why this answer

The high number of idle connections and slow transaction commit times indicate connection management and resource contention issues. PgBouncer reduces overhead by pooling and reusing database connections, while statement timeout prevents long-running queries from holding locks and consuming resources, directly addressing both symptoms.

Exam trap

Google Cloud often tests the misconception that performance degradation is always solved by scaling storage or adding replicas, when the real issue is connection management and query timeout configuration.

How to eliminate wrong answers

Option A is wrong because adding a read replica only offloads read traffic, but does not reduce idle connections or improve commit latency for write transactions. Option C is wrong because increasing storage size addresses disk space or I/O throughput, not connection overhead or transaction commit delays. Option D is wrong because dropping unused indexes and running VACUUM reclaims storage and improves query planning, but does not manage idle connections or reduce commit wait times caused by connection churn.
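PgBouncer itself runs as a separate process. Its core idea, many clients sharing a small set of long-lived server connections, can still be sketched in a few lines of Python. `FakeConnection` is a hypothetical stand-in for a real PostgreSQL connection. A production pool would also validate connections and reset session state between clients.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database connection (hypothetical); counts how many were opened."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1


class ConnectionPool:
    """Fixed-size pool: callers borrow an existing connection instead of opening a new one."""

    def __init__(self, size: int):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = self._idle.get(timeout=timeout)  # blocks, then raises queue.Empty, if all are busy
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it for reuse rather than closing it


pool = ConnectionPool(size=5)
for _ in range(1000):  # 1,000 short "transactions"
    with pool.connection():
        pass
print(FakeConnection.opened)  # 5
```

However many transactions run, the database only sees the pool's fixed number of connections.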

39

Matching · medium

Match each Cloud Monitoring metric type to its description.

Cumulative value that only increases

Instantaneous measurement at a point in time

Statistical summary of values over time

Change in a counter over a time interval

Running total from start of observation

Why these pairings

Metric types used in Cloud Monitoring.
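The difference between a running total and a per-interval change is the distinction that Cloud Monitoring aligners such as ALIGN_DELTA and ALIGN_RATE work with. The sketch below converts a cumulative counter series into per-interval deltas. It treats a drop in value as a counter reset; that reset handling and the sample numbers are assumptions for illustration.

```python
def cumulative_to_deltas(values):
    """Turn a cumulative counter series into per-interval deltas.

    A decrease is treated as a counter reset (e.g. a process restart), so the
    new value itself is taken as the delta for that interval.
    """
    deltas = []
    for prev, curr in zip(values, values[1:]):
        deltas.append(curr - prev if curr >= prev else curr)
    return deltas


requests_total = [100, 130, 190, 5, 40]  # reset between the 3rd and 4th sample
print(cumulative_to_deltas(requests_total))  # [30, 60, 5, 35]
```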

40

MCQ · hard

A large enterprise manages costs across multiple GCP projects using a shared billing account. They want to apply a 10% discount on all committed use discounts (CUDs) to specific departments based on resource labels. How should they allocate the CUD savings?

A.Use the Cloud Billing report's 'Cost Attribution' feature to assign CUD savings based on labels

B.Enable 'CUD cost attribution' in the billing account settings to automatically distribute savings based on project labels

C.Use the Cloud Billing cost table in BigQuery with the CUD cost data and allocate using custom queries

D.Manually calculate CUD savings and distribute via journal entries

AnswerB

Google Cloud provides built-in CUD cost attribution that splits benefits across projects using labels.

Why this answer

Option B is correct because enabling CUD cost attribution in the billing account settings automatically distributes CUD benefits based on labels. Option A is incorrect because 'Cost Attribution' is for custom cost allocation, not CUDs. Option C is possible, using the billing data in BigQuery with custom queries, but it is not a built-in feature. Option D is manual and error-prone.

41

MCQmedium

A web application frequently reads the same set of reference data from Cloud SQL. This causes high database load and slow responses. Which design change would most improve performance?

A.Implement caching with Memorystore for Redis

B.Increase Cloud SQL machine size

C.Add a read replica

D.Use Cloud Spanner for higher throughput

AnswerA

Caching frequently read data in memory drastically reduces database load and latency.

Why this answer

Implementing caching with Memorystore reduces database load by serving repeated reads from memory, which is faster than SQL queries. Increasing machine size or adding read replicas helps but still involves database I/O; Cloud Spanner is overkill for reference data.
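The pattern behind option A is usually cache-aside: check the cache first, and only on a miss read from the database and store the result with a TTL. Below is a minimal Python sketch in which a plain dict stands in for Memorystore for Redis and `fetch_from_cloud_sql` is a hypothetical loader, so it illustrates the control flow rather than a Redis client.

```python
import time


class CacheAside:
    """Cache-aside lookups with a per-entry TTL (a dict stands in for Redis)."""

    def __init__(self, loader, ttl_s=300.0, clock=time.monotonic):
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: no database round trip
        value = self._loader(key)  # miss or expired: read from the database
        self._store[key] = (value, now + self._ttl_s)
        return value


db_reads = []


def fetch_from_cloud_sql(key):
    """Hypothetical database read; records each call so reads can be counted."""
    db_reads.append(key)
    return {"country_code": key, "currency": "EUR"}


cache = CacheAside(fetch_from_cloud_sql, ttl_s=60.0)
for _ in range(100):
    cache.get("DE")
print(len(db_reads))  # 1: one miss, then 99 hits served from memory
```

The TTL bounds how stale reference data can get; for data that changes rarely, a long TTL keeps the hit ratio high.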

42

Multi-Selectmedium

A team is bootstrapping a new Google Cloud organization. Which TWO practices are recommended for managing project creation and resource hierarchy? (Choose two.)

Select 2 answers

A.Use a centralized service account to create projects via API.

B.Create a folder for each department to isolate resources.

C.Use a single project for all development environments.

D.Assign project creator role to all users by default.

E.Use organization policies to enforce naming conventions on projects.

AnswersB, E

Folders provide logical isolation.

Why this answer

Option B is correct because creating a folder for each department allows you to isolate resources, apply IAM policies at the folder level, and enforce organizational boundaries. This follows Google Cloud's recommended resource hierarchy best practices for multi-team environments, enabling delegated administration and cost tracking per department.

Exam trap

The trap here is that candidates often confuse 'centralized automation' (Option A) with best practices, but Google Cloud recommends using a project factory with a dedicated service account per automation scope rather than a single shared service account to avoid blast radius and credential management issues.

43

MCQeasy

Which service is commonly used for time-series data and real-time analytics?

A.Bigtable

B.Cloud SQL

C.Firestore

D.Cloud Spanner

AnswerA

Correct. Bigtable handles time-series and real-time analytics at scale.

Why this answer

Bigtable is a fully managed, scalable NoSQL database optimized for large analytical and operational workloads, including time-series data and real-time analytics. It provides sub-10ms latency for high-throughput reads and writes, making it ideal for IoT, monitoring, and financial data streams. Its column-oriented storage and automatic sharding support efficient time-based queries and aggregation.
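Much of Bigtable's fit for time-series data comes from row-key design: rows are stored sorted by key, so a key such as `device_id#timestamp` keeps one device's readings contiguous for range scans, whereas leading with the timestamp would send all new writes to the same part of the key space. The sketch below imitates a sorted table with a Python list and `bisect`; the key layout and all names are illustrative assumptions, not a Bigtable client-library API.

```python
import bisect


def row_key(device_id, epoch_s):
    """Illustrative key layout: device first, zero-padded timestamp second."""
    return f"{device_id}#{epoch_s:010d}"


class SortedTable:
    """Toy stand-in for a Bigtable table: rows kept in lexicographic key order."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def write(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_key, end_key):
        """Return rows with start_key <= key < end_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


table = SortedTable()
for ts, temp in [(1700000000, 21.5), (1700000060, 21.7), (1700000120, 22.0)]:
    table.write(row_key("sensor-42", ts), temp)
table.write(row_key("sensor-07", 1700000030), 19.9)

# One contiguous range scan returns a single device's readings for a time window.
window = table.scan(row_key("sensor-42", 1700000000), row_key("sensor-42", 1700000100))
print(window)  # the two sensor-42 readings inside the window
```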

Exam trap

Google Cloud often tests the misconception that any NoSQL database (like Firestore) is suitable for time-series analytics, but Bigtable's specific architecture for high-throughput, low-latency time-ordered data is the key differentiator.

How to eliminate wrong answers

Option B (Cloud SQL) is wrong because it is a relational database for OLTP workloads, not designed for the high write throughput or time-series-specific optimizations needed for real-time analytics. Option C (Firestore) is wrong because it is a document-oriented NoSQL database for mobile and web apps, lacking native time-series features and optimized for real-time sync rather than analytical queries. Option D (Cloud Spanner) is wrong because it is a globally distributed relational database with strong consistency, but its overhead and cost make it unsuitable for the high-frequency, append-heavy patterns of time-series data.

44

Multi-Selecteasy

Which TWO are benefits of using a shared VPC in a Google Cloud organization? (Choose 2)

Select 2 answers

A.Centralized management of network resources.

B.Eliminates the need for project administrators to have any IAM roles.

C.Ensures compliance with organizational policies.

D.Separation of network administration from project administration.

E.Automatically enables required APIs in all service projects.

AnswersA, D

Shared VPC allows network administrators to manage a common network.

Why this answer

Option A is correct because a shared VPC allows network resources (subnets, routes, firewalls) to be defined in a host project and consumed by multiple service projects, enabling centralized management. This reduces administrative overhead by having a single network team control the VPC configuration rather than each project managing its own.

Exam trap

The trap here is that candidates confuse 'separation of network administration from project administration' (Option D) with 'eliminating IAM roles' (Option B), or assume shared VPC automatically enforces compliance (Option C) or enables APIs (Option E), when in fact these require separate configuration.

45

MCQeasy

During a Cloud Build execution, a step fails due to timeout. What is the first thing to check?

A.Check network connectivity

B.Check the build logs for errors

C.Increase machine type for the build

D.Increase the timeout in cloudbuild.yaml

AnswerB

Logs provide details on what caused the step to hang or fail.

Why this answer

When a Cloud Build step fails due to a timeout, the build logs are the first and most authoritative source of diagnostic information. They contain the exact error messages, exit codes, and step output that reveal why the step exceeded its timeout — for example, a hanging command, a missing dependency, or a resource contention issue. Checking logs before making any configuration changes ensures you address the root cause rather than treating symptoms.

Exam trap

Google Cloud often tests the misconception that a timeout is always a performance or configuration issue, leading candidates to jump to increasing resources or timeout values, when the correct first step is always to inspect the build logs for the actual error.

How to eliminate wrong answers

Option A is wrong because network connectivity issues typically manifest as connection refused or DNS resolution errors, not as a generic timeout; the logs would still show those errors, so checking logs first is still the correct step. Option C is wrong because increasing the machine type addresses performance bottlenecks but does not fix a timeout caused by an infinite loop, a stuck process, or a misconfigured command — the logs must be examined first to determine if the timeout is due to resource starvation. Option D is wrong because increasing the timeout in cloudbuild.yaml only masks the underlying problem; if a step is hanging indefinitely, a longer timeout will just delay the failure, and the logs must be checked to understand why the step is not completing within the original limit.

46

MCQeasy

Your SLO for availability is 99.9% over a 30-day window. You want an alert that fires when the error budget burn rate is high, leaving less than 5% of the error budget remaining in the next 6 hours. What type of alerting policy should you configure?

A.A custom log-based metric.

B.A static threshold alert based on the error rate.

C.A burn rate alert based on the forecasted consumption.

D.An exponential decay alert.

AnswerC

Burn rate alerts track how fast the error budget is being used and can forecast depletion.

Why this answer

Option C is correct because a burn rate alert based on forecasted consumption directly monitors the rate at which the error budget is being consumed and triggers when the projected remaining budget falls below 5% within the next 6 hours. This aligns with Google SRE best practices for proactive error budget management, using a multi-window, multi-burn-rate approach to detect rapid consumption before the budget is exhausted.
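The arithmetic behind a burn-rate condition is short. With a 99.9% SLO the error budget is 0.1% of requests over 30 days (720 hours); assuming steady traffic, a burn rate of 1 spends exactly that budget in 30 days, and a burn rate of B spends B/720 of it per hour. The Python sketch below forecasts the budget left after 6 hours. The function names and sample error ratios are illustrative assumptions; in Cloud Monitoring the equivalent condition is configured on an SLO rather than coded by hand.

```python
SLO_TARGET = 0.999
WINDOW_HOURS = 30 * 24  # 30-day SLO window = 720 hours


def burn_rate(observed_error_ratio, slo_target=SLO_TARGET):
    """How many times faster than 'exactly on budget' errors are accruing."""
    return observed_error_ratio / (1.0 - slo_target)


def forecast_remaining_budget(remaining_fraction, rate, horizon_hours):
    """Fraction of the 30-day error budget left after horizon_hours at `rate`."""
    spent = rate * horizon_hours / WINDOW_HOURS
    return remaining_fraction - spent


def should_alert(remaining_fraction, observed_error_ratio, horizon_hours=6, floor=0.05):
    """Fire if the budget is forecast to drop below `floor` within the horizon."""
    rate = burn_rate(observed_error_ratio)
    return forecast_remaining_budget(remaining_fraction, rate, horizon_hours) < floor


# 40% of the budget is left and 4.5% of requests are failing: roughly a 45x burn rate.
print(round(burn_rate(0.045), 6))  # 45.0
print(should_alert(remaining_fraction=0.40, observed_error_ratio=0.045))  # True
```

At 45x, six hours consume 45 × 6 / 720 = 37.5% of the budget, leaving about 2.5%, which is below the 5% floor.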

Exam trap

Google Cloud often tests the distinction between static error rate thresholds and dynamic burn rate alerts, trapping candidates who think a simple error rate threshold can adequately protect an SLO without considering the time-based consumption of the error budget.

How to eliminate wrong answers

Option A is wrong because a custom log-based metric extracts and measures specific events from logs (e.g., counting 5xx errors), but it does not by itself provide burn rate forecasting or alerting on the remaining budget over a time window. Option B is wrong because a static threshold alert fires when the error rate exceeds a fixed value, but it does not account for the error budget consumption rate or the percentage of budget remaining over the next 6 hours, leading to either premature or missed alerts. Option D is wrong because Cloud Monitoring has no "exponential decay" alert type; error budget consumption is tracked with burn rate conditions on an SLO.

47

MCQeasy

A team is experiencing increased latency in their microservices application after a new deployment. They suspect a specific service is the bottleneck. Which tool should they use to identify the slowest service in the request path?

A.Cloud Profiler

B.Cloud Logging

C.Cloud Monitoring

D.Cloud Trace

AnswerD

Cloud Trace enables distributed tracing to identify slow services.

Why this answer

Cloud Trace is the correct tool because it provides end-to-end latency analysis by capturing trace spans from each microservice in a request path. It aggregates and visualizes the time spent in each service, allowing you to pinpoint the slowest service causing the bottleneck. This directly addresses the need to identify the specific service responsible for increased latency after a deployment.
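Conceptually, a trace is a tree of timed spans, and finding the bottleneck means comparing how much time each service spends itself rather than waiting on its callees. The sketch below ranks services by self time (span duration minus the time of its child spans) in one hand-written trace. The `Span` fields and timings are assumptions for illustration, not the Cloud Trace data model or API, and the calculation assumes child spans do not overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def self_time_by_service(spans):
    """Sum each service's span time, excluding time spent in (non-overlapping) children."""
    child_time = {}
    for s in spans:
        if s.parent_id is not None:
            child_time[s.parent_id] = child_time.get(s.parent_id, 0.0) + s.duration_ms
    totals = {}
    for s in spans:
        own = s.duration_ms - child_time.get(s.span_id, 0.0)
        totals[s.service] = totals.get(s.service, 0.0) + own
    return totals


# Hand-written trace for one request: frontend -> cart -> pricing.
trace = [
    Span("a", None, "frontend", 0, 480),
    Span("b", "a", "cart", 20, 460),
    Span("c", "b", "pricing", 40, 420),
]
ranking = sorted(self_time_by_service(trace).items(), key=lambda kv: kv[1], reverse=True)
print(ranking[0])  # ('pricing', 380.0)
```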

Exam trap

Google Cloud often tests the distinction between profiling (Cloud Profiler) and tracing (Cloud Trace), where candidates mistakenly choose Cloud Profiler because they confuse 'profiling' with 'tracing' or think it can analyze request paths across services.

How to eliminate wrong answers

Option A is wrong because Cloud Profiler is designed for continuous profiling of CPU and memory usage to identify performance hotspots within a single service's code, not for tracing request latency across multiple services. Option B is wrong because Cloud Logging collects and stores log entries but does not provide distributed tracing or latency breakdowns across service boundaries. Option C is wrong because Cloud Monitoring focuses on metrics, alerts, and dashboards for overall system health and resource utilization, but it lacks the trace-level detail needed to isolate the slowest service in a request path.

48

Multi-Selecthard

Which THREE steps are typically part of a formal incident postmortem according to Google SRE best practices?

Select 3 answers

A.Identify the person responsible for the incident.

B.Assign action items with deadlines.

C.Summarize the incident timeline.

D.Implement a solution immediately.

E.Determine contributing factors.

AnswersB, C, E

Action items ensure follow-up on improvements.

Why this answer

Option B is correct because Google SRE postmortems emphasize creating actionable follow-ups to prevent recurrence. Assigning action items with deadlines ensures that identified issues are systematically addressed, which is a core principle of blameless postmortems focused on process improvement rather than punishment.

Exam trap

Google Cloud often tests the misconception that postmortems are about assigning blame or immediate fixes, when in reality the PCDOE exam expects knowledge of blameless, data-driven retrospectives with actionable follow-ups as per Google SRE best practices.

49

MCQmedium

You are a DevOps engineer for a SaaS company that provides a REST API. The API is deployed on Google Cloud Run. You have configured Cloud Monitoring alerts for 5xx errors. Recently, you received an alert that the error rate exceeded 5% for 5 minutes. You investigated and found that the errors were HTTP 503 (Service Unavailable) from a specific endpoint. The endpoint calls an internal Cloud SQL database. The database CPU utilization was at 90% during that period. You suspect the database is the bottleneck. Which action should you take to reduce the error rate without over-provisioning?

A.Implement connection pooling and retry logic with exponential backoff in the API service

B.Increase the max instances per revision in Cloud Run to handle more concurrent requests

C.Reduce the min instances of Cloud Run to decrease load on the database

D.Add a Cloud SQL read replica and route read queries to it

AnswerA

This reduces the number of simultaneous connections to the database and handles transient failures gracefully.

Why this answer

Option A is correct because implementing connection pooling and retry logic with exponential backoff directly addresses the database bottleneck without over-provisioning. Connection pooling reduces the number of concurrent connections to Cloud SQL, lowering CPU contention, while exponential backoff prevents thundering herd retries that could further overwhelm the database. This approach optimizes existing resources rather than scaling infrastructure.
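The retry half of option A can be sketched as capped exponential backoff with full jitter: the wait before retry n is a random value up to `base * 2**n`, capped, so clients that failed together do not retry in lockstep. `TransientDbError` and the delay values are assumptions; a real service would retry only errors it knows are transient and would pair this with a connection pool.

```python
import random
import time


class TransientDbError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. database busy)."""


def call_with_backoff(fn, max_attempts=5, base_s=0.1, cap_s=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying TransientDbError with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientDbError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(cap_s, base_s * (2 ** attempt))
            sleep(rng() * ceiling)  # full jitter: uniform in [0, ceiling)


# A fake query that fails twice, then succeeds; sleeps are recorded, not performed.
attempts = {"n": 0}
delays = []


def flaky_query():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientDbError("database busy")
    return "ok"


result = call_with_backoff(flaky_query, sleep=delays.append)
print(result, attempts["n"], len(delays))  # ok 3 2
```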

Exam trap

Google Cloud often tests the misconception that scaling application instances (Cloud Run) is the default fix for backend bottlenecks, but the trap here is that increasing concurrency without addressing database connection limits can exacerbate the problem.

How to eliminate wrong answers

Option B is wrong because increasing Cloud Run max instances would amplify concurrent requests to the database, worsening CPU saturation and potentially increasing 503 errors. Option C is wrong because reducing min instances would decrease baseline capacity, causing cold starts and potentially increasing latency or errors under load, not reducing database CPU. Option D is wrong because adding a read replica only helps with read-heavy workloads, but the endpoint in question likely performs writes or mixed operations; moreover, replicas do not reduce write-related CPU load on the primary instance.

50

MCQhard

Refer to the exhibit. After applying the shown firewall rule, users report increased latency to a web application. What is the most likely cause?

A.The rule priority is set to 1000, which is too low.

B.The rule contains both allow and deny for the same traffic, creating a conflict.

C.The source range covers all IPs, causing excessive traffic.

D.The firewall rule has logging enabled, which adds overhead.

AnswerB

A rule cannot have both allow and deny; this misconfiguration likely causes packets to be dropped or processed incorrectly.

Why this answer

Correct: the rule specifies both allow and deny for the same ports, which is contradictory. In VPC firewall rules each rule has exactly one action, allow or deny, so this is an invalid combination, and the resulting unpredictable handling of matching packets (such as drops) is the most likely cause of the reported latency.

Option A is wrong because 1000 is the default priority (lower numbers take precedence), and priority alone does not add latency. Option C is wrong because a source range covering all IPs only determines which sources the rule matches; it does not generate extra traffic. Option D is wrong because logging alone does not cause latency.
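The evaluation order behind options A and B can be sketched in Python. In VPC firewall rules a lower priority number means higher precedence (1000 is the default), a deny beats an allow at the same priority, and ingress traffic that matches no rule hits the implied deny. Because one rule cannot carry both actions, the closest valid equivalent of the exhibit is two rules at the same priority. The rule structure below is a simplified model (ingress only, source range and port checks only), not the Compute Engine API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class FirewallRule:
    name: str
    action: str  # exactly one of "allow" or "deny"
    priority: int  # 0 is highest precedence, 65535 lowest; default is 1000
    source_range: str
    ports: frozenset

    def matches(self, src_ip, port):
        in_range = ipaddress.ip_address(src_ip) in ipaddress.ip_network(self.source_range)
        return in_range and port in self.ports


def evaluate_ingress(rules, src_ip, port):
    """Return the action applied to an ingress packet under this simplified model."""
    matching = [r for r in rules if r.matches(src_ip, port)]
    if not matching:
        return "deny"  # implied deny-ingress rule
    # Lowest priority number wins; at equal priority, deny sorts before allow.
    best = min(matching, key=lambda r: (r.priority, 0 if r.action == "deny" else 1))
    return best.action


rules = [
    FirewallRule("allow-web", "allow", 1000, "0.0.0.0/0", frozenset({80, 443})),
    FirewallRule("deny-web", "deny", 1000, "0.0.0.0/0", frozenset({443})),
]
print(evaluate_ingress(rules, "203.0.113.7", 443))  # deny
print(evaluate_ingress(rules, "203.0.113.7", 80))   # allow
print(evaluate_ingress(rules, "203.0.113.7", 22))   # deny (no rule matches)
```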

51

MCQeasy

Which service provides built-in dashboards for Google Cloud services?

A.Cloud Shell

B.Cloud Console

C.Cloud Logging

D.Cloud Monitoring

AnswerD

Cloud Monitoring has built-in dashboards for Google Cloud services.

Why this answer

Cloud Monitoring (option D) provides built-in dashboards for Google Cloud services, offering pre-configured visualizations of metrics, logs, and alerts without requiring manual setup. These dashboards aggregate data from services like Compute Engine, Cloud SQL, and Kubernetes Engine, enabling real-time monitoring of resource utilization and performance. This aligns with the PCDOE domain of implementing service monitoring strategies by providing out-of-the-box observability.

Exam trap

Google Cloud often tests the distinction between Cloud Logging (for logs) and Cloud Monitoring (for metrics and dashboards), so the trap here is confusing the log storage and analysis service with the visualization and alerting service.

How to eliminate wrong answers

Option A is wrong because Cloud Shell is a browser-based command-line interface for managing Google Cloud resources, not a monitoring or dashboard service. Option B is wrong because Cloud Console is the web-based GUI for managing Google Cloud projects, but it does not provide built-in dashboards for monitoring; it relies on Cloud Monitoring for that functionality. Option C is wrong because Cloud Logging is a service for storing, searching, and analyzing log data, not for providing dashboards; it integrates with Cloud Monitoring for visualization.

52

MCQeasy

A batch data processing job on Cloud Dataflow is running slower than expected. Which action will most directly increase throughput?

A.Enable Streaming Engine for the pipeline

B.Enable autoscaling

C.Increase the number of workers in the pipeline

D.Use FlexRS pricing model

AnswerC

More workers enable greater parallelism, increasing the processing rate.

Why this answer

Increasing the number of workers directly increases the parallelism of the pipeline, allowing more data to be processed concurrently. In Cloud Dataflow, throughput is bounded by the total processing capacity of the worker pool, so adding workers raises that capacity. This is the most direct action to increase throughput when the pipeline is CPU-bound or I/O-bound and the existing workers are already fully utilized.

Exam trap

Google Cloud often tests the misconception that autoscaling (Option B) is a direct performance lever, when in fact it is a reactive mechanism that adjusts resources based on current utilization, not a proactive action to immediately boost throughput.

How to eliminate wrong answers

Option A is wrong because Streaming Engine is designed to improve streaming pipeline performance by offloading state storage to backend services, but it does not directly increase throughput for batch jobs; it may even add latency for batch pipelines. Option B is wrong because enabling autoscaling adjusts the number of workers dynamically based on utilization, but it does not guarantee an immediate increase in throughput—it only reacts to current load and may take time to scale up. Option D is wrong because FlexRS pricing model is a cost-saving option that provides discounts for flexible resource scheduling, but it does not affect pipeline throughput or performance.

53

MCQeasy

A DevOps engineer needs to verify if a load balancer's health check is behaving normally by examining historical trends. Where should they look?

A.Cloud Monitoring Metrics Explorer

B.Cloud Logging

C.Cloud Console health check page

D.Cloud Load Balancing logs

AnswerA

Metrics Explorer stores health check metrics for historical analysis.

Why this answer

Cloud Monitoring Metrics Explorer is the correct place to examine historical trends of a load balancer's health check because it provides time-series metrics such as `https/backend_request_count`, `https/backend_latencies`, and health check probe success/failure counts. These metrics can be queried over custom time ranges and aggregated to detect anomalies or degradation patterns, which is exactly what the question asks for — verifying historical trends of health check behavior.

Exam trap

The trap here is that candidates confuse Cloud Logging (which shows raw events) with Cloud Monitoring Metrics (which shows aggregated trends), leading them to choose Cloud Logging or the health check page when the question explicitly asks for historical trends.

How to eliminate wrong answers

Option B is wrong because Cloud Logging stores discrete log entries (e.g., individual health check probe results or request logs), not aggregated time-series metrics; it is designed for troubleshooting specific events, not for analyzing historical trends over time. Option C is wrong because the Cloud Console health check page shows the current status and recent results of health checks, but it does not provide historical trend data or allow you to examine patterns over days or weeks. Option D is wrong because Cloud Load Balancing logs (e.g., access logs) record individual requests and responses, not health check probe metrics; they are useful for traffic analysis but not for monitoring the health check mechanism itself over time.

54

Multi-Selecthard

You are designing SLO monitoring for a high-traffic e-commerce platform. Which three best practices should you follow? (Choose three.)

Select 3 answers

A.Use multiple SLOs for different critical user journeys.

B.Monitor error budgets and alert when depletion is imminent.

C.Use SLI metrics that align with user experience, like request latency and errors.

D.Use a single global SLO for all customer segments.

E.Set the SLO to 100% to ensure maximum reliability.

AnswersA, B, C

Separate SLOs allow targeted monitoring and alerting for each journey's requirements.

Why this answer

Option A is correct because using multiple SLOs for different critical user journeys (e.g., checkout, product search, login) allows you to tailor reliability targets to the specific performance and availability needs of each workflow. This granularity prevents a single, coarse SLO from masking issues that affect only a subset of users, enabling more precise monitoring and faster incident response.
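A quick calculation shows how a single global SLI can hide a failing journey. With the made-up request counts below, aggregate availability stays above a 99.5% target while checkout alone falls well below it. The journey names, counts, and target are assumptions for illustration.

```python
# Made-up (good_requests, total_requests) per critical user journey.
journeys = {
    "product_search": (1_998_000, 2_000_000),
    "login": (499_800, 500_000),
    "checkout": (48_500, 50_000),
}
SLO_TARGET = 0.995


def availability(good, total):
    """Request-based availability SLI: fraction of requests that succeeded."""
    return good / total


per_journey = {name: availability(g, t) for name, (g, t) in journeys.items()}
global_good = sum(g for g, _ in journeys.values())
global_total = sum(t for _, t in journeys.values())
global_sli = availability(global_good, global_total)

print(f"global: {global_sli:.4%} meets SLO: {global_sli >= SLO_TARGET}")
for name, sli in per_journey.items():
    print(f"{name}: {sli:.4%} meets SLO: {sli >= SLO_TARGET}")
```

Because checkout carries only about 2% of traffic, its 97% availability barely moves the global number.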

Exam trap

Google Cloud often tests the misconception that a single, high-level SLO is sufficient for monitoring, when in reality multiple SLOs aligned to user journeys are required to detect partial outages that affect specific critical paths.

55

MCQhard

Refer to the exhibit. A team is troubleshooting a pod crash loop. Based on the exhibit, which infrastructure change should be prioritized to resolve the issue and optimize service performance?

A.Increase the pod's CPU request

B.Increase the maximum number of pods per node

C.Mount a ConfigMap or volume containing the missing file

D.Enable pod anti-affinity

AnswerC

Correct. Providing the missing file resolves the error.

Why this answer

The exhibit indicates a pod crash loop caused by a missing file, which is a configuration issue rather than a resource or scheduling problem. Mounting a ConfigMap or volume that provides the missing file directly resolves the crash by ensuring the pod has the required configuration at startup. This fix also optimizes service performance by eliminating unnecessary restarts and allowing the pod to serve traffic consistently.

Exam trap

The trap here is that candidates often assume a crash loop is always due to resource constraints or scheduling issues, but Google Cloud tests the ability to identify configuration-related failures by reading pod logs or events that explicitly mention a missing file.

How to eliminate wrong answers

Option A is wrong because increasing the CPU request does not address a missing file; it only guarantees CPU resources, which is irrelevant to a configuration-related crash loop. Option B is wrong because increasing the maximum number of pods per node does not fix the missing file; it only allows more pods on a node, which could worsen resource contention without resolving the root cause. Option D is wrong because enabling pod anti-affinity affects pod placement and distribution across nodes, but it does not provide the missing configuration file needed to prevent the crash loop.

56

MCQmedium

A Cloud Spanner database is experiencing slow query performance. Which approach should be taken to optimize read performance without compromising consistency?

A.Increase the number of Spanner nodes to boost throughput

B.Use interleaved tables to store related rows together

C.Add secondary indexes and use read-only transactions for read queries

D.Migrate the data to Cloud Bigtable for better read performance

AnswerC

Secondary indexes avoid full table scans, and read-only transactions bypass lock overhead, improving read performance.

Why this answer

Option C is correct because secondary indexes in Cloud Spanner allow efficient lookup of rows by non-key columns, avoiding full table scans. Read-only transactions provide a consistent snapshot of data without locking, which optimizes read performance while maintaining strong consistency.

Exam trap

Google Cloud often tests the misconception that adding nodes always improves read performance, when in fact read optimization in Spanner relies on proper indexing and transaction type selection, not just scaling compute resources.

How to eliminate wrong answers

Option A is wrong because increasing Spanner nodes primarily improves write throughput and storage capacity, not read performance for individual queries; read performance is more dependent on index usage and query design. Option B is wrong because interleaved tables optimize for parent-child row locality and reduce join costs, but they do not directly improve read performance for arbitrary queries on non-key columns. Option D is wrong because migrating to Cloud Bigtable would sacrifice Spanner's strong consistency and transactional capabilities, which contradicts the requirement to not compromise consistency.

57

Multi-Selecteasy

Which TWO options are valid methods to trigger a Cloud Build build when code is pushed to a Cloud Source Repository?

Select 2 answers

A.Use a Cloud Pub/Sub message sent to the Cloud Build topic to create a build.

B.Have Cloud Logging monitor the repository and create a build via a logs-based metric.

C.Set up a Cloud Pub/Sub subscription that forwards events to Cloud Build.

D.Configure a Cloud Build trigger with a push event and a branch filter.

E.Configure Cloud Scheduler to call the Cloud Build API periodically.

AnswersA, D

Cloud Build can be triggered by Pub/Sub messages via a trigger.

Why this answer

Options A and D are correct. Cloud Build triggers can fire on a branch or tag push, and can also be invoked by a Pub/Sub message. Option B is incorrect because builds are not automatically triggered by Cloud Logging. Option C is incorrect because Cloud Build does not directly listen to Cloud Pub/Sub subscriptions without a trigger.

Option E is incorrect because Cloud Scheduler cannot directly trigger a build without a Pub/Sub or HTTP intermediary.

58

Multi-Selectmedium

Which THREE are benefits of migrating from Jenkins to Cloud Build?

Select 3 answers

A.No server maintenance

B.Support for any VCS

C.Pay per use

D.Built-in security scanning

E.Native Google Cloud integration

AnswersA, C, E

Cloud Build is serverless.

Why this answer

Option A is correct because Cloud Build is a fully managed CI/CD service that runs on Google Cloud's infrastructure, eliminating the need for users to provision, configure, or maintain build servers. Unlike Jenkins, which requires ongoing administration of the Jenkins master and agent nodes, Cloud Build automatically scales resources and handles underlying server maintenance, including OS patches and hardware failures.

Exam trap

Google Cloud often tests the misconception that Cloud Build supports any VCS or includes built-in security scanning, when in reality it has specific VCS integrations and relies on external tools for security scanning.

59

MCQmedium

A government agency is bootstrapping a Google Cloud organization with strict compliance requirements. They must: (1) store all logs in a centralized project with retention of 7 years, (2) ensure no data leaves the United States, (3) use customer-managed encryption keys (CMEK) for all persistent disks and buckets, (4) automatically reject any resource creation outside allowed regions (us-central1 and us-east1). They have an existing on-premises SIEM that needs to receive logs via Pub/Sub. The network team wants to use Shared VPC. What is the correct order of steps to implement this?

A.Set up Pub/Sub for SIEM first, then create log sink, then apply org policies.

B.Create all projects first, then move them into folders, then apply org policies.

C.Create Shared VPC first, then set up org policies, then create log sink, then create folders.

D.Create folder hierarchy, set org policies for allowed regions and CMEK, configure aggregated log sink to centralized project, set up Shared VPC, and create Pub/Sub topic for SIEM.

AnswerD

Logical order: hierarchy, policies, logging, network, then external integration.

Why this answer

Option D is correct because it follows the recommended Google Cloud landing zone bootstrap order: first establish the folder hierarchy so that organization policies (such as constraints/gcp.resourceLocations for allowed regions and constraints/gcp.restrictNonCmekServices for CMEK) can be applied at the correct level, then create the aggregated log sink to the centralized project with a 7-year retention bucket, set up Shared VPC for network isolation, and finally create the Pub/Sub topic for the SIEM. This sequence ensures that policies are inherited before resources are created, preventing non-compliant resource creation and ensuring logs are captured from the start.
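Why policies must come before projects can be sketched with a toy resource hierarchy: a location policy set on a folder is inherited by every project created beneath it, so those projects are checked from their first resource. The classes and `create_resource` check below are hypothetical and deliberately simplified (the nearest policy wins, with no merging); in Google Cloud the enforcement is done by the `constraints/gcp.resourceLocations` organization policy, not by application code.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Node:
    """A folder or project in a toy resource hierarchy."""

    name: str
    parent: Optional["Node"] = None
    allowed_regions: Optional[Set[str]] = None  # None = no policy set on this node

    def effective_allowed_regions(self):
        # Walk up the hierarchy until a node with a policy is found (inheritance).
        node = self
        while node is not None:
            if node.allowed_regions is not None:
                return node.allowed_regions
            node = node.parent
        return None  # no policy anywhere: every region allowed


def create_resource(project, region):
    """Hypothetical creation call that enforces the inherited location policy."""
    allowed = project.effective_allowed_regions()
    if allowed is not None and region not in allowed:
        raise PermissionError(f"{region} violates location policy for {project.name}")
    return f"{project.name}/{region}"


org = Node("org")
prod = Node("prod-folder", parent=org, allowed_regions={"us-central1", "us-east1"})
app = Node("app-project", parent=prod)

print(create_resource(app, "us-east1"))  # app-project/us-east1
try:
    create_resource(app, "europe-west1")
except PermissionError as err:
    print("rejected:", err)
```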

Exam trap

Google Cloud often tests the order of operations in a Google Cloud landing zone, and the trap here is that candidates think they can create projects or Shared VPC first, not realizing that organization policies must be applied to the folder hierarchy before any resources are created to enforce compliance from the start.

How to eliminate wrong answers

Option A is wrong because it attempts to set up Pub/Sub and log sinks before applying organization policies, which would allow non-compliant resources to be created and logs to be stored without the required CMEK or regional constraints. Option B is wrong because creating projects before folders and org policies means the projects are not under the correct folder hierarchy, so they cannot inherit the required allowed regions and CMEK policies, leading to potential compliance violations. Option C is wrong because creating Shared VPC before org policies and log sinks would allow network resources to be created in unallowed regions and logs to be stored without CMEK, and creating folders after policies is out of order since folders must exist to scope policies correctly.

60

Multi-Selectmedium

When bootstrapping a Google Cloud organization for DevOps, which THREE steps are essential to set up a secure CI/CD foundation using Cloud Build?

Select 3 answers

A.Create a Cloud Source Repository for each application's code.

B.Set up a trigger that automatically builds on each commit to the main branch.

C.Disable default service accounts in all projects.

D.Enable the Cloud Build API and grant the Cloud Build service account the necessary roles (e.g., Cloud Run Admin, Artifact Registry Writer).

E.Configure VPC Service Controls for all projects.

AnswersA, B, D

Repositories are needed to host source code and trigger builds.

Why this answer

Creating a Cloud Source Repository for each application's code is essential because it provides a dedicated, version-controlled repository that integrates natively with Cloud Build triggers. This ensures that each application has its own isolated codebase, enabling precise CI/CD pipelines without cross-application interference.

Exam trap

Google Cloud often tests the distinction between 'essential' steps for a specific service (Cloud Build) versus general security best practices, leading candidates to select VPC Service Controls or disabling default service accounts as mandatory when they are actually optional or later-stage configurations.


61

MCQ · hard

A team is implementing a CI/CD pipeline for a Kubernetes application. They want to use canary deployments with traffic splitting. Which tool or service is best suited for this?

A.Cloud Run

B.Cloud Build

C.Cloud Deploy with Skaffold

D.Anthos Config Management

Answer: C

Cloud Deploy provides native support for canary deployments with traffic splitting, and it uses Skaffold to render and deploy the manifests.

Why this answer

Cloud Deploy with Skaffold is best suited for canary deployments with traffic splitting. Cloud Deploy has built-in support for progressive delivery with a canary strategy and uses Skaffold to render and apply the Kubernetes manifests. Cloud Deploy then orchestrates the rollout phase by phase, shifting traffic through Gateway API service mesh or Kubernetes Service networking, with promotion and rollback at each phase. That makes it the only option that directly addresses the requirement.
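The sketch below makes the phase structure concrete. It models a canary rollout as a series of traffic percentages that ends in a stable phase at 100%. The phase IDs follow the canary-<percent> / stable naming that Cloud Deploy uses for canary rollouts, as we understand it. The `canary_phases` helper is hypothetical and calls no Cloud Deploy API.

```python
def canary_phases(percentages: list[int]) -> list[tuple[str, int]]:
    """Model the phase sequence of a canary rollout (hypothetical helper).

    Each listed percentage becomes a canary phase that receives that share of
    traffic; a final 'stable' phase then sends 100% to the new version.
    """
    # Percentages must be strictly increasing and strictly between 0 and 100.
    if list(percentages) != sorted(set(percentages)) or not all(0 < p < 100 for p in percentages):
        raise ValueError("percentages must be strictly increasing values between 0 and 100")
    phases = [(f"canary-{p}", p) for p in percentages]
    phases.append(("stable", 100))
    return phases


print(canary_phases([25, 50]))
# [('canary-25', 25), ('canary-50', 50), ('stable', 100)]
```

Each phase is a checkpoint. Promotion moves traffic to the next percentage, and a rollback returns traffic to the previous release.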

Exam trap

The trap here is that candidates often confuse Cloud Build (a build tool) with a full deployment orchestrator, or assume Cloud Run can handle Kubernetes canary deployments, but Cloud Deploy with Skaffold is the only option that provides native, automated traffic splitting for Kubernetes applications.

How to eliminate wrong answers

Option A is wrong because Cloud Run is a serverless platform for stateless containers, not a CI/CD pipeline tool. Although it can split traffic between its own revisions, it does not orchestrate canary rollouts for an application running on Kubernetes. Option B is wrong because Cloud Build is a CI/CD build and test service that compiles source code and creates artifacts, but it lacks native support for orchestrating canary deployments or traffic splitting on Kubernetes. Option D is wrong because Anthos Config Management is a policy and configuration management tool for enforcing cluster state and GitOps workflows, not a deployment pipeline service that handles traffic splitting or canary rollouts.


62

MCQ · medium

A company has an application that experiences intermittent errors. They want to be notified immediately when the error rate exceeds 1% of total requests. What should they implement?

A.Create an uptime check pointing to the application endpoint

B.Create a log-based metric counting error logs and set an alerting policy in Cloud Monitoring

C.Use Cloud Trace to analyze latency and set an alert on trace spans

D.Create a dashboard showing error count over time

Answer: B

This directly monitors error rate from logs and alerts on threshold.

Why this answer

Option B is correct because the requirement is to be notified when errors exceed 1% of total requests, which needs both an error count and a total request count. A log-based counter metric in Cloud Logging can count error entries (e.g., responses with 5xx status codes). A second count of all requests can come from another log-based metric or an existing request-count metric. With both counts, an alerting policy in Cloud Monitoring can use a ratio condition that fires when errors divided by total requests exceed 0.01. This gives precise, threshold-based alerting on an intermittent error rate.
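A ratio condition compares two counts taken over the same alignment window. The sketch below shows only that arithmetic. The `WindowCounts` shape and the helper names are hypothetical. In Cloud Monitoring, the two counts would come from metrics such as log-based counters, not from Python objects.

```python
from dataclasses import dataclass

ERROR_RATE_THRESHOLD = 0.01  # 1% of total requests


@dataclass
class WindowCounts:
    """Counts aggregated over one alignment window (hypothetical shape)."""
    errors: int  # e.g. from a log-based counter filtered on 5xx entries
    total: int   # e.g. from a counter over all request entries


def error_rate(window: WindowCounts) -> float:
    # An idle window (no requests) is treated as a 0% error rate, avoiding division by zero.
    return window.errors / window.total if window.total else 0.0


def should_alert(window: WindowCounts, threshold: float = ERROR_RATE_THRESHOLD) -> bool:
    # Fire only when the ratio is strictly above the threshold.
    return error_rate(window) > threshold


print(should_alert(WindowCounts(errors=5, total=1000)))   # False (0.5%)
print(should_alert(WindowCounts(errors=15, total=1000)))  # True (1.5%)
```

Using a ratio rather than a raw error count keeps the alert meaningful as traffic changes. Fifteen errors is alarming out of 1,000 requests but negligible out of 1,000,000.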

Exam trap

Google Cloud often tests the distinction between monitoring availability (uptime checks) and monitoring error rates (log-based metrics), leading candidates to mistakenly choose uptime checks for error rate detection.

How to eliminate wrong answers

Option A is wrong because an uptime check sends periodic synthetic probes to the endpoint and verifies that it responds (e.g., with HTTP 200). It never sees the real user requests that fail between probes, so it cannot calculate an error rate as a percentage of total requests. Option C is wrong because Cloud Trace analyzes latency and trace spans, not error counts or error rates; it is designed for performance troubleshooting, not real-time error rate alerting. Option D is wrong because a dashboard showing error count over time provides visualization but no alerting or notification. It requires manual monitoring and cannot send an immediate notification when the error rate exceeds 1%.


63

MCQ · hard

An organization uses Cloud CDN with an HTTP(S) Load Balancer to serve static content. They observe that cache hit ratio is lower than expected. The content is immutable and has long Cache-Control headers. What is the most likely cause?

A.The requests include unique query parameters like session IDs.

B.The Cache-Control max-age is set too short.

C.The load balancer is configured with SSL termination.

D.The content is served using signed URLs with expiration.

Answer: A

Query parameters create different cache keys, reducing hits.

Why this answer

By default, Cloud CDN's cache key includes the query string. When unique query parameters such as session IDs are appended to each request, every URL therefore becomes a distinct cache entry. Nearly every request misses and is forwarded to the origin, even though the content is immutable and carries long Cache-Control headers. The correct answer is A because this behavior undermines cache efficiency despite proper cache headers. The usual fix is a cache key policy that excludes the query string, or includes only the parameters that actually change the response.
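Here is a minimal simulation of why session IDs defeat caching. It assumes a key built from scheme, host, path and, optionally, the query string, which roughly mirrors Cloud CDN's default. The helpers `cache_key` and `hit_ratio` are hypothetical and do not reproduce Cloud CDN's internal key format.

```python
from urllib.parse import urlsplit


def cache_key(url: str, include_query: bool = True) -> str:
    """Simplified cache key: scheme, host, path and (optionally) the query string."""
    parts = urlsplit(url)
    key = f"{parts.scheme}://{parts.netloc}{parts.path}"
    if include_query and parts.query:
        key += f"?{parts.query}"
    return key


def hit_ratio(urls: list[str], include_query: bool) -> float:
    """Replay requests against an empty cache and return the fraction served as hits."""
    cached, hits = set(), 0
    for url in urls:
        key = cache_key(url, include_query)
        if key in cached:
            hits += 1
        else:
            cached.add(key)  # a miss fetches from the origin and fills the cache
    return hits / len(urls)


sample_urls = [
    "https://cdn.example.com/img/logo.png?sessionid=a1",
    "https://cdn.example.com/img/logo.png?sessionid=b2",
    "https://cdn.example.com/img/logo.png?sessionid=c3",
]
print(hit_ratio(sample_urls, include_query=True))   # 0.0: every session ID is a new key
print(hit_ratio(sample_urls, include_query=False))  # about 0.67: only the first request misses
```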

Exam trap

Google Cloud often tests the misconception that cache hit ratio is solely determined by Cache-Control headers, ignoring that URL uniqueness (especially query parameters) overrides caching behavior.

How to eliminate wrong answers

Option B is wrong because the question states the content has long Cache-Control headers, so a short max-age is not the issue. Option C is wrong because SSL termination at the load balancer does not affect cache hit ratio; it only handles encryption/decryption. Option D is wrong because signed URLs with expiration are used for access control, not caching; they do not inherently reduce cache hits unless the URL changes per request, which is not implied.


64

Drag & Drop · medium

Order the steps to configure a VPC Network Peering between two projects.


Why this order

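Create the VPC network in each project and add firewall rules that allow the intended traffic. Then create the peering connection from each side, because a peering becomes active only after both networks have configured it. Finally, verify the exchanged routes and connectivity.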
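Two rules sit behind these steps. The peered networks' subnet ranges must not overlap, and the peering stays inactive until both sides create their half. The sketch below models both. The helpers `overlapping_ranges` and `peering_state` are hypothetical checks, not calls to the Compute Engine API. The INACTIVE/ACTIVE strings mirror the states that peering reports.

```python
import ipaddress


def overlapping_ranges(net_a: list[str], net_b: list[str]) -> list[tuple[str, str]]:
    """Return every pair of CIDR ranges from the two networks that overlap."""
    conflicts = []
    for a in net_a:
        for b in net_b:
            if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b)):
                conflicts.append((a, b))
    return conflicts


def peering_state(a_to_b_created: bool, b_to_a_created: bool) -> str:
    # Each project configures its own half; traffic flows only when both exist.
    return "ACTIVE" if a_to_b_created and b_to_a_created else "INACTIVE"


print(overlapping_ranges(["10.0.0.0/24"], ["10.1.0.0/24"]))  # []
print(overlapping_ranges(["10.0.0.0/16"], ["10.0.1.0/24"]))  # [('10.0.0.0/16', '10.0.1.0/24')]
print(peering_state(True, False))                            # INACTIVE
```

Running the overlap check before creating the peering catches an addressing conflict while it is still cheap to fix.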


65

MCQ · hard

During the bootstrapping of a Google Cloud organization, the DevOps team wants to implement a policy that prevents the deletion of certain resources, such as Cloud Storage buckets or Cloud SQL instances, unless a specific approval process is followed. Which approach best achieves this goal?

A.Configure Cloud Source Repositories to require code review for any changes to Terraform configurations that delete resources.

B.Implement Binary Authorization to require approvals for any delete commands.

C.Use Resource Manager locks on projects and set up a Cloud Function that triggers on audit logs to require approval before removing the lock.

D.Use VPC Service Controls to block delete operations on specific services.

Answer: C

Locks prevent deletion; Cloud Functions can automate approval workflows.

Why this answer

Option C is correct because Resource Manager locks (called liens) guard against accidental deletion by placing a deletion-prevention hold on the project that contains the critical resources. A Cloud Function can monitor audit logs for attempts to remove the lien and route them through an approval workflow. Together, these give the team a controlled approval process before the lock can be lifted, which meets the policy requirement.

Exam trap

The trap here is that candidates may confuse Binary Authorization (which handles container deployment approvals) with a general-purpose approval system, or assume VPC Service Controls can block deletion when they are actually focused on data exfiltration prevention.

How to eliminate wrong answers

Option A is wrong because Cloud Source Repositories and code review only control changes to Terraform configurations, not the actual deletion of resources; a user could still delete resources via the console or API without touching Terraform. Option B is wrong because Binary Authorization is designed for container image deployment approvals, not for resource deletion operations; it cannot intercept or approve delete commands on Cloud Storage buckets or Cloud SQL instances. Option D is wrong because VPC Service Controls are used to define security perimeters around data access and exfiltration, not to block delete operations; they restrict data movement but do not prevent resource deletion.


66

Multi-Select · easy

A company wants to implement a DevOps culture in their new Google Cloud organization. Which THREE practices align with Google's DevOps principles? (Choose three.)

Select 3 answers

A.Monitor systems with telemetry and logs.

B.Implement continuous integration and delivery.

C.Use a monolithic architecture to simplify deployments.

D.Centralize all operations in a single team.

E.Conduct post-mortems without blame.

Answers: A, B, E

Observability is key.

Why this answer

Option A is correct because Google's DevOps principles emphasize observability through telemetry and logs to gain insight into system behavior, enabling data-driven decisions and rapid troubleshooting. In Google Cloud, this aligns with services like Cloud Monitoring and Cloud Logging, which collect metrics and logs from resources such as Compute Engine instances and GKE clusters, supporting the 'monitor and improve' feedback loop central to DevOps.

Exam trap

Google Cloud often tests the misconception that monolithic architectures simplify deployments in a DevOps context, but the trap here is that Google's principles favor microservices and decoupled releases to reduce deployment risk and enable continuous delivery.


67

Drag & Drop · medium

Order the steps to deploy a new version of a microservice to Google Kubernetes Engine using a rolling update.


Why this order

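Update the container image in the Deployment manifest and apply it. Monitor the rollout as Kubernetes gradually replaces pods, and verify that the new version is healthy. If problems appear, roll back (for example, with kubectl rollout undo).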
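A Deployment's maxSurge and maxUnavailable settings bound how fast a rolling update can replace pods. The arithmetic below assumes percentage values (25% each, the Kubernetes defaults). Kubernetes rounds a percentage maxSurge up and a percentage maxUnavailable down. The `rolling_update_bounds` helper is hypothetical.

```python
def rolling_update_bounds(replicas: int, max_surge_pct: int = 25,
                          max_unavailable_pct: int = 25) -> dict[str, int]:
    """Pod-count limits during a rolling update with percentage-based settings."""
    surge = -((-replicas * max_surge_pct) // 100)         # ceiling division: round up
    unavailable = (replicas * max_unavailable_pct) // 100  # floor division: round down
    return {
        "max_pods_total": replicas + surge,            # old + new pods at any moment
        "min_pods_available": replicas - unavailable,  # ready pods never drop below this
    }


print(rolling_update_bounds(10))  # {'max_pods_total': 13, 'min_pods_available': 8}
```

With 10 replicas, the update can run up to 13 pods at once while keeping at least 8 available. That is why monitoring the rollout, rather than only the final state, matters.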


68

MCQ · easy

During an incident, a DevOps engineer needs to temporarily increase the capacity of a Google Kubernetes Engine (GKE) cluster to handle the traffic surge. Which approach minimizes manual intervention and follows Google best practices?

A.Enable cluster autoscaler and update the horizontal pod autoscaler to scale faster.

B.Manually add a node pool with larger machines via the Google Cloud Console.

C.Create a new node pool and migrate pods using kubectl drain.

D.Scale the existing node pool by increasing the maximum node count in the cluster.

Answer: A

Cluster autoscaler adds nodes automatically; HPA scales pods.

Why this answer

Option A is correct because the cluster autoscaler automatically adjusts the number of nodes in the node pool based on resource demands. Tuning the horizontal pod autoscaler (HPA) to scale faster lets pods replicate more quickly, for example by lowering the target CPU utilization so scale-out starts earlier, or by loosening the scale-up policy in the HPA's behavior settings. The combination automates both pod-level and node-level scaling, which minimizes manual intervention and aligns with Google's best practices for handling traffic surges in GKE.
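The HPA's core rule, as documented for Kubernetes, is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The HPA skips scaling when the ratio is within a tolerance, 0.1 by default. The sketch below applies that rule to show how the target utilization changes the result for the same load. It deliberately omits min/max replica bounds, stabilization windows and behavior policies.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the HPA keeps the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Same surge (90% average CPU across 4 pods), two different targets:
print(desired_replicas(4, current_utilization=90, target_utilization=80))  # 5
print(desired_replicas(4, current_utilization=90, target_utilization=60))  # 6
```

A lower target makes the HPA add more pods for the same load. Those extra pods are what can push the cluster autoscaler to add nodes.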

Exam trap

The trap here is that candidates often confuse simply increasing the maximum node count (Option D) with enabling the cluster autoscaler, assuming that raising the cap alone triggers automatic scaling, when in fact the cluster autoscaler must be explicitly enabled to add nodes based on demand.

How to eliminate wrong answers

Option B is wrong because manually adding a node pool with larger machines via the Google Cloud Console requires human intervention and does not automate scaling, violating the goal of minimizing manual effort. Option C is wrong because creating a new node pool and migrating pods using kubectl drain is a manual, multi-step process that does not leverage GKE's native autoscaling capabilities and introduces operational overhead. Option D is wrong because simply increasing the maximum node count in the cluster does not trigger automatic scaling; it only sets an upper limit, and without the cluster autoscaler enabled, nodes will not be added automatically in response to traffic surges.


69

MCQ · easy

A team wants to receive a notification when their monthly spending exceeds 80% of the budget. Which GCP feature should they configure?

A.Committed use discounts.

B.Budget alert with threshold rule at 0.8.

C.Cloud Storage bucket for cost exports.

D.Cloud Monitoring cost insights dashboard.

Answer: B

Budget alerts can be configured with threshold percentages to notify when spend reaches that level.

Why this answer

Budget alerts with threshold rules trigger notifications when spending reaches specified percentages, such as 80%.
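A budget threshold rule expresses its threshold as a fraction of the budget amount, so 0.8 means 80%. The sketch below shows that arithmetic with a hypothetical $1,000 budget and an example set of rules at 50%, 80% and 100%. The `crossed_thresholds` helper is hypothetical, and it assumes that reaching a threshold (spend >= amount × fraction) counts as crossing it.

```python
def crossed_thresholds(budget_amount: float, spend: float,
                       threshold_fractions: tuple[float, ...] = (0.5, 0.8, 1.0)) -> list[float]:
    """Return the threshold fractions (1.0 = 100%) that this spend has reached."""
    return [t for t in threshold_fractions if spend >= budget_amount * t]


print(crossed_thresholds(1000, 850))  # [0.5, 0.8]: the 80% alert has fired
print(crossed_thresholds(1000, 400))  # []: no alert yet
```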


70

MCQ · easy

A small team is setting up a Google Cloud organization for their DevOps pipeline. They have zero existing projects. Their planned architecture uses Cloud Build for CI/CD, Cloud Source Repositories for code, and Artifact Registry for images. They want to ensure that developers can only deploy to the production environment after code review and approval. They also want to automatically trigger builds on commits to the main branch. Which of the following is the most efficient way to implement this?

A.Use a single trigger and a build step that checks for a label in the commit to decide if deployment is allowed.

B.Use a Cloud Function to deploy code after a Pub/Sub message from a code review tool.

C.Create separate Cloud Build triggers for development and production, and use manual approval steps in the production trigger.

D.Set up a Cloud Build trigger on the main branch without any approval, and rely on code review outside Google Cloud.

Answer: C

Allows approval for production only.

Why this answer

Option C is correct because it uses separate Cloud Build triggers for development and production, allowing the production trigger to include a manual approval step. This enforces that code review and approval occur before deployment to production, while the development trigger can automatically build on commits to the main branch. Cloud Build's approval gates are the native way to require human sign-off before executing a build, aligning with the requirement for controlled production deployments.

Exam trap

Google Cloud often tests the misconception that a single trigger with conditional logic (such as checking a label or commit message) can replace native approval mechanisms. A commit label is not an approval. Cloud Build's approval gates are the native way to enforce mandatory human sign-off before a build runs.

How to eliminate wrong answers

Option A is wrong because relying on a commit label for deployment decisions is insecure and bypasses the required code review and approval process; labels can be set by any developer with push access, and Cloud Build does not enforce review before the build runs. Option B is wrong because using a Cloud Function triggered by a Pub/Sub message from a code review tool adds unnecessary complexity and does not integrate natively with Cloud Build's CI/CD pipeline; it also lacks the automatic trigger on commits to the main branch that the team wants. Option D is wrong because setting up a Cloud Build trigger on the main branch without any approval violates the requirement that developers can only deploy to production after code review and approval; relying on external code review outside Google Cloud does not enforce the approval step within the pipeline.


71

MCQ · medium

Your organization has multiple teams that need to deploy infrastructure using Terraform. You want to enforce that all Terraform state files are stored in a central Cloud Storage bucket with versioning enabled. You also need to ensure that only the CI/CD pipeline can write to the bucket. What is the best way to enforce this?

A.Use VPC Service Controls to restrict access to the bucket from only the CI/CD pipeline's VPC.

B.Create a custom IAM role with permissions to write to the bucket and assign it to the CI/CD service account.

C.Grant the Storage Object Admin role to the CI/CD service account at the bucket level.

D.Use IAM conditions to restrict access to the bucket only when the requester is the CI/CD service account.

Answer: D

IAM conditions can enforce that only the specific service account can write.

Why this answer

Option D is correct because IAM conditions let you attach attribute-based restrictions to a role binding. You bind the write role to the CI/CD service account with a condition, for example on `resource.name` so that the grant applies only to the central state bucket. That service account then gets exactly the write access it needs and nothing broader. This enforces the policy without relying on network constructs or overly broad role assignments.

Exam trap

Google Cloud often tests the distinction between identity-based access control (IAM conditions) and network-based controls (VPC Service Controls), leading candidates to choose VPC SC when the requirement is to restrict by identity, not network.

How to eliminate wrong answers

Option A is wrong because VPC Service Controls restrict access based on network origin (VPC perimeter), not the identity of the requester; the CI/CD pipeline might run outside the VPC or use a different network path, and VPC SC does not enforce that only the CI/CD service account can write. Option B is wrong because creating a custom IAM role with write permissions and assigning it to the CI/CD service account does not prevent other principals from being granted the same role or similar permissions on the bucket; it lacks a condition to exclusively limit writes to that service account. Option C is wrong because granting the Storage Object Admin role to the CI/CD service account at the bucket level gives that service account full write access, but it does not restrict writes to only that service account; any other principal with the same role or inherited permissions could also write.


72

MCQ · medium

A cloud operations team is implementing monitoring for a microservices application deployed on Compute Engine. They want to create a custom dashboard in Cloud Monitoring that shows the 99th percentile latency of a specific service over the last hour. Which combination of Cloud Monitoring features should they use?

A.Use a gauge metric with the max alignment function in a Metrics Explorer chart.

B.Use a distribution metric with the 99th percentile alignment function in a Metrics Explorer chart.

C.Use an uptime check metric and configure the latency percentile in the chart.

D.Create a logs-based metric from application logs and use the count alignment.

Answer: B

Distribution metrics support percentile alignments like 99th percentile.

Why this answer

Option B is correct because Cloud Monitoring's distribution metrics inherently store a histogram of values, allowing percentile calculations like the 99th percentile. By selecting the 99th percentile alignment function in a Metrics Explorer chart, the dashboard directly computes and displays the desired latency threshold from the distribution data over the specified time window.
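A distribution value carries bucket counts rather than raw samples, so a percentile has to be estimated from the histogram. The function below is a minimal sketch of that idea using linear interpolation inside the bucket that contains the target rank. The bucket layout, the lower bound of 0 and the interpolation rule are assumptions, not Cloud Monitoring's exact algorithm. The latency data is invented for illustration.

```python
def percentile_from_buckets(bounds: list[float], counts: list[int], q: float) -> float:
    """Estimate the q-th percentile (0-100) of an explicit-bucket histogram.

    bounds: ascending upper bounds of the finite buckets (e.g. latency in ms).
    counts: len(bounds) + 1 counts; the first bucket starts at 0 and the
            last one is the overflow bucket above bounds[-1].
    """
    if len(counts) != len(bounds) + 1:
        raise ValueError("expected one more count than bounds (overflow bucket)")
    total = sum(counts)
    if total == 0:
        raise ValueError("empty distribution")
    rank = q * total / 100  # number of samples at or below the percentile
    cumulative, lower = 0, 0.0
    for i, count in enumerate(counts):
        # The overflow bucket has no finite upper bound, so clamp it to the last bound.
        upper = bounds[i] if i < len(bounds) else bounds[-1]
        if count and cumulative + count >= rank:
            # Linear interpolation inside the bucket that contains the target rank.
            return lower + (rank - cumulative) / count * (upper - lower)
        cumulative += count
        lower = upper
    return float(bounds[-1])


latency_bounds_ms = [100, 200, 400, 800]
latency_counts = [500, 300, 150, 45, 5]  # 1,000 requests in total
print(percentile_from_buckets(latency_bounds_ms, latency_counts, 50))            # 100.0
print(round(percentile_from_buckets(latency_bounds_ms, latency_counts, 99), 1))  # 755.6
```

A max aligner over the same data would report only the single slowest request in the overflow bucket. The 99th percentile describes the latency that 1% of requests exceed.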

Exam trap

Google Cloud often tests the distinction between metric types (gauge vs. distribution) and the specific alignment functions available in Cloud Monitoring, trapping candidates who confuse max alignment with percentile calculation or assume uptime checks can measure internal service latency.

How to eliminate wrong answers

Option A is wrong because gauge metrics represent a single instantaneous value and cannot compute percentiles; the max alignment function only returns the maximum value, not a percentile. Option C is wrong because uptime check metrics measure availability and response time from external probes, not the internal 99th percentile latency of a specific microservice, and they lack a built-in percentile configuration. Option D is wrong because a logs-based metric with count alignment counts log entries, not latency values, and cannot derive percentile latency from application logs without additional distribution metric configuration.


73

MCQ · hard

An organization is bootstrapping their Google Cloud environment and wants to implement a shared VPC for DevOps workloads. The network team manages the host project, while DevOps teams have service projects. They need to ensure that DevOps teams can create resources in their service projects that use the shared VPC, but they cannot change the host project's network configuration. Which IAM roles should be granted to the DevOps team's service account on the host project?

A.roles/compute.securityViewer

B.roles/compute.admin

C.roles/compute.networkAdmin

D.roles/compute.networkUser

Answer: D

This role allows using existing networks and subnets but not modifying them.

Why this answer

The DevOps team needs to use the shared VPC's resources (e.g., subnets) from their service projects without modifying the host project's network configuration. The `roles/compute.networkUser` role grants permission to use existing networks and subnets in the host project, but not to create, modify, or delete them. This aligns with the principle of least privilege for a shared VPC setup.

Exam trap

The trap here is that candidates often confuse `roles/compute.networkUser` with `roles/compute.networkAdmin`, assuming that using a shared VPC requires administrative privileges, when in fact the `networkUser` role is the minimal permission needed to consume network resources without managing them.

How to eliminate wrong answers

Option A is wrong because `roles/compute.securityViewer` only allows read-only access to security policies and firewall rules, not the ability to use subnets or create resources in the shared VPC. Option B is wrong because `roles/compute.admin` grants full control over all Compute Engine resources, including the ability to modify the host project's network configuration, which violates the requirement that DevOps teams cannot change the host project's network. Option C is wrong because `roles/compute.networkAdmin` allows creating, modifying, and deleting networks and subnets in the host project, which is excessive and would let DevOps teams alter the shared VPC configuration.


74

MCQ · hard

Refer to the exhibit. A DevOps engineer configured a budget alert with Pub/Sub notifications. However, the team is not receiving alerts. The Pub/Sub subscription is set up correctly and messages can be published manually. What is the most likely reason the budget alerts are not being sent?

A.The notificationsRule does not include email recipients

B.The budget amount is specified in units without considering fractional cents

C.The threshold rules use different spend basis (CURRENT vs FORECASTED), causing inconsistency

D.The budget filter excludes credits, so actual spend might be lower than threshold

Answer: D

Credits can offset spend, so the monitored spend (excluding credits) might be below the threshold even if total spend is above.

Why this answer

Option D is correct because the budget filter excludes all credits. If credits reduce the actual spend below the thresholds, alerts may not fire. Option C is not a problem, because different spend bases are allowed.

Option A is irrelevant because the alerts are delivered through Pub/Sub, which does not need email recipients. Option B is not related to whether alerts are sent.


75

MCQ · hard

A company uses Cloud Build with a custom service account that has minimal permissions. The build needs to deploy to Cloud Run. After configuring the service account with roles/run.admin, the build fails with 'Permission denied' on gcloud run deploy. What is the most likely cause?

A.The build service account lacks roles/iam.serviceAccountUser on the Cloud Run runtime service account.

B.The Cloud Run service account is not set.

C.Cloud Build requires the roles/cloudbuild.serviceAgent role.

D.The build configuration file has syntax errors.

Answer: A

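The identity that deploys a Cloud Run service must be able to act as the service's runtime service account. roles/iam.serviceAccountUser grants the required iam.serviceAccounts.actAs permission; roles/run.admin does not.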

Why this answer

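Option A is correct because deploying to Cloud Run requires the iam.serviceAccountUser role on the runtime service account, so that the build can act as it. Option B is not relevant. If no runtime service account is set, Cloud Run uses the Compute Engine default service account, and the same permission is still required. Option D would cause a different error, a build configuration failure rather than Permission denied.

Option C is not required, because roles/cloudbuild.serviceAgent is intended for the Google-managed Cloud Build service agent, not for a user-managed build service account.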
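Two separate grants are involved in this failure. The sketch below models them. `ROLE_PERMISSIONS` is an illustrative, truncated subset: the permission names are real, but each role grants many more. The helpers `permissions` and `missing_for_deploy` are hypothetical and call no IAM API.

```python
# Illustrative, truncated permission subsets; the real roles contain many more.
ROLE_PERMISSIONS = {
    "roles/run.admin": {"run.services.create", "run.services.update", "run.services.get"},
    "roles/iam.serviceAccountUser": {"iam.serviceAccounts.actAs"},
}


def permissions(roles: list[str]) -> set[str]:
    """Union of the (illustrative) permissions granted by a list of roles."""
    granted: set[str] = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def missing_for_deploy(project_roles: list[str], runtime_sa_roles: list[str]) -> list[str]:
    """List what the build identity lacks to deploy with a runtime service account.

    project_roles: roles the build service account holds on the project.
    runtime_sa_roles: roles it holds on the runtime service account itself.
    """
    missing = []
    if "run.services.create" not in permissions(project_roles):
        missing.append("run.services.create on the project")
    if "iam.serviceAccounts.actAs" not in permissions(runtime_sa_roles):
        missing.append("iam.serviceAccounts.actAs on the runtime service account")
    return missing


print(missing_for_deploy(["roles/run.admin"], []))
# ['iam.serviceAccounts.actAs on the runtime service account']
print(missing_for_deploy(["roles/run.admin"], ["roles/iam.serviceAccountUser"]))  # []
```

The point of the model is that roles/run.admin on the project does not imply actAs on the runtime service account. That second grant is checked on a different resource.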



Practice PCDOE by domain


Bootstrapping a Google Cloud organization for DevOps
Managing service incidents
Managing Google Cloud costs
Building and implementing CI/CD pipelines
Implementing service monitoring strategies
Optimizing service performance
