Knowledge + Practice

Google Professional Cloud DevOps Engineer (PCDOE) — Questions 226–300

500 questions total · 7 pages · All types, answers revealed

226

Multi-Select · hard

Which THREE strategies can reduce API latency in Apigee?

Select 3 answers

A. Disable TLS termination to reduce overhead.

B. Enable response compression in the API proxy.

C. Use connection pooling for backend services.

D. Implement caching for frequently accessed responses.

E. Increase the analytics data collection frequency.

Answers: B, C, D

Compresses responses to reduce payload size and transfer time.

Why this answer

Option B is correct because enabling response compression (e.g., gzip) reduces the size of payloads transferred over the network, which lowers latency by shortening transmission time. Apigee can return compressed responses when the client sends an Accept-Encoding header, and the benefit is greatest for text-based responses such as JSON or XML. Options C and D help for related reasons: connection pooling avoids repeated connection setup to the backend, and caching avoids the backend call entirely for frequently requested responses.
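As a rough illustration of why compression shortens transfer time, the sketch below gzips a repetitive JSON payload with Python's standard library and compares sizes. The payload shape and field names are invented for the example; it says nothing about how Apigee implements compression internally.

```python
import gzip
import json

# Hypothetical API response: 200 similar records, typical of a list endpoint.
records = [{"id": i, "status": "active", "region": "us-central1"} for i in range(200)]
raw = json.dumps(records).encode("utf-8")

# gzip is the encoding a client usually requests with "Accept-Encoding: gzip".
compressed = gzip.compress(raw)

ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2f}")

# Decompression must return the original bytes unchanged.
assert gzip.decompress(compressed) == raw
```

Repetitive text formats such as JSON tend to compress well, which is why the gain is larger for them than for already-compressed binary content.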

Exam trap

Google Cloud often tests the misconception that disabling security features like TLS reduces latency, but the exam expects you to recognize that security is non-negotiable and that the real performance gains come from compression, connection reuse, and caching.

227

MCQ · easy

You are setting up alerting for a batch processing job that runs daily on Compute Engine. The job must complete within 2 hours. Which metric and alert condition should you use to ensure you are notified if the job is still running after 90 minutes?

A. Alert on CPU utilization greater than 80% for the instance running the job

B. Create a custom metric that emits 1 when the job starts and 0 when it finishes; alert if the metric is 1 for more than 90 minutes

C. Use a heartbeat metric that reports every 5 minutes; alert if no heartbeat for 90 minutes

D. Set up a log-based metric that counts job completion log entries; alert if the count is zero after 90 minutes

Answer: B

This directly measures job duration and triggers an alert if it exceeds 90 minutes.

Why this answer

Option B is correct because it directly monitors the job's running state using a custom metric that emits 1 at job start and 0 at completion. By alerting when the metric remains at 1 for more than 90 minutes, you are notified if the job exceeds the 90-minute threshold, ensuring you catch failures before the 2-hour deadline. This approach is precise and avoids false positives from indirect signals like CPU or logs.

Exam trap

Google Cloud often tests the distinction between direct state monitoring (custom metric with start/end signals) and indirect signals (CPU, heartbeats, log counts), where candidates mistakenly choose an indirect metric that seems plausible but fails to accurately capture the specific condition of 'still running after 90 minutes'.

How to eliminate wrong answers

Option A is wrong because CPU utilization above 80% does not reliably indicate that the job is still running: the job could be waiting on I/O with low CPU, or CPU could be high because of unrelated processes, leading to missed alerts or false alarms. Option C is wrong because a heartbeat alert fires on the absence of a signal: a job that is still running at 90 minutes keeps sending heartbeats and never triggers it, while a job that finished normally stops sending heartbeats and eventually does. Option D is wrong because a zero count of completion log entries only shows that the job has not completed; it cannot distinguish a job that is still running from one that never started or failed silently, and log ingestion latency may delay it.
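The condition in Option B can be sketched in a few lines of Python. This is only a model of the alert logic, not Cloud Monitoring code: the `running_too_long` function and the sample format (a time-ordered list of timestamp and 0/1 value pairs) are assumptions made for the illustration.

```python
from datetime import datetime, timedelta

def running_too_long(samples, now, limit=timedelta(minutes=90)):
    """Return True if the job gauge has stayed at 1 for longer than `limit`.

    `samples` is a time-ordered list of (timestamp, value) pairs, where the
    value is 1 while the job runs and 0 once it has finished.
    """
    run_started = None
    for ts, value in samples:
        if value == 1 and run_started is None:
            run_started = ts      # first 1 after a 0 marks the job start
        elif value == 0:
            run_started = None    # job finished, so reset
    return run_started is not None and now - run_started > limit

start = datetime(2024, 1, 1, 2, 0)
still_running = [(start, 1)]
finished = [(start, 1), (start + timedelta(minutes=60), 0)]

print(running_too_long(still_running, start + timedelta(minutes=91)))
print(running_too_long(finished, start + timedelta(minutes=91)))
```

The first call models a job that never reported completion, the second a job that finished after an hour; only the first should alert.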

228

Multi-Select · medium

A DevSecOps team is configuring Cloud Monitoring alerts for proactive incident response. Which two practices are recommended for effective alerting? (Choose two.)

Select 2 answers

A. Define clear escalation paths for different alert severities.

B. Alert on every microsecond of latency increase.

C. Use a single high-level alert that covers all symptoms.

D. Set alert thresholds based on arbitrary guesses.

E. Create separate alerts for different symptom classes.

Answers: A, E

Clear escalation ensures the right team is notified based on severity.

Why this answer

Recommended practices include defining clear escalation paths for different severities and creating separate alerts for different symptom classes to reduce noise and ensure proper routing.

229

MCQ · medium

A financial company is bootstrapping their Google Cloud organization for DevOps. They have strict compliance requirements: all projects must be under a folder hierarchy based on business units, and each project must have a Cloud Storage bucket with a retention policy of at least 1 year. They have 50 existing projects that need to be migrated into this hierarchy, and all future projects must comply. The team wants to automate as much as possible using Google Cloud services. Currently, projects are created manually with various ad-hoc permissions. What is the best approach to meet these requirements?

A. Write a Terraform config to create folders and projects, but allow any user to create projects.

B. Create folders for each business unit, use a Cloud Function to move existing projects into folders, and set up an organization policy to require retention policies on buckets.

C. Define a resource hierarchy with folders, use a Cloud Build trigger to run a script that creates new projects and applies bucket retention, and use an organization policy to restrict project creation to a service account.

D. Use a single folder for all projects, apply a bucket retention policy at the folder level using a custom organization policy.

Answer: C

Cloud Build can automate creation; org policy limits creation to service account for control.

Why this answer

Option C is correct because it uses Cloud Build triggers to automate project creation and bucket retention policy application, while restricting project creation to a service account via an organization policy ensures only authorized automation can create projects. This enforces the folder hierarchy and compliance requirements without manual intervention, aligning with DevOps automation principles.

Exam trap

The trap here is that candidates may think organization policies can apply retention policies to buckets. Organization policy constraints only restrict which configurations are allowed; they do not create buckets or set a retention policy on them, and retention is configured per bucket, not per folder.

How to eliminate wrong answers

Option A is wrong because allowing any user to create projects leaves project creation uncontrolled, so compliance cannot be guaranteed, and it does not address migrating the existing projects. Option B is wrong because a Cloud Function that moves existing projects does nothing for future projects, and an organization policy can only constrain bucket settings; it does not create the required bucket or apply a retention policy to it. Option D is wrong because a single folder for all projects does not meet the requirement for a hierarchy based on business units, and a retention policy cannot be applied at the folder level; retention policies are set on individual buckets. Object lifecycle rules are a separate feature that deletes or transitions objects and do not provide retention.
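The compliance check that the automation script in Option C would need can be sketched as follows. The input is assumed to be a list of dicts shaped like the Cloud Storage JSON API bucket resource, where `retentionPolicy.retentionPeriod` is a string of seconds; the function name and the sample bucket names are hypothetical.

```python
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000 seconds

def non_compliant_buckets(buckets):
    """Return names of buckets whose retention period is missing or under one year."""
    failing = []
    for bucket in buckets:
        # A bucket without a retention policy is treated as a period of 0.
        period = int(bucket.get("retentionPolicy", {}).get("retentionPeriod", 0))
        if period < ONE_YEAR_SECONDS:
            failing.append(bucket["name"])
    return failing

# Hypothetical inventory: one compliant bucket, one with no policy, one too short.
sample = [
    {"name": "bu-finance-logs", "retentionPolicy": {"retentionPeriod": "31536000"}},
    {"name": "bu-retail-tmp"},
    {"name": "bu-hr-archive", "retentionPolicy": {"retentionPeriod": "2592000"}},
]
print(non_compliant_buckets(sample))
```

Running a check like this against the 50 existing projects would show which buckets still need a retention policy applied after migration.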

230

MCQ · medium

A DevOps team is bootstrapping a new organization. They want to ensure that all projects created within the organization have a specific set of APIs enabled, such as Compute Engine, Cloud Storage, and Cloud Resource Manager. What is the most efficient way to achieve this?

A. Create a Cloud Function that triggers on project creation events and enables the required APIs.

B. Define an organization policy with a constraint that requires the APIs to be enabled.

C. Use Cloud Foundation Toolkit to deploy a project template that includes API enablement.

D. Create a shared VPC and enable the APIs in the host project only.

Answer: B

Organization policies can enforce API enablement via constraints.

Why this answer

Option B is the keyed answer because Organization Policy constraints, including custom constraints, are declarative, centrally managed, and inherited automatically by every new project in the organization, without additional infrastructure or manual intervention. Be aware that service-related constraints such as `constraints/gcp.restrictServiceUsage` govern which services a project may use rather than switching APIs on, so the wording of this option is looser than the product's actual behavior.

Exam trap

Google Cloud often tests the distinction between reactive automation (Cloud Functions) and proactive policy enforcement (Organization Policies), leading candidates to choose the more familiar event-driven approach over the declarative, built-in governance mechanism.

How to eliminate wrong answers

Option A is wrong because Cloud Functions triggered on project creation events introduce latency, require maintaining a function and its dependencies, and are not a native enforcement mechanism — they rely on event-driven remediation rather than proactive policy. Option C is wrong because Cloud Foundation Toolkit (CFT) is a deployment framework for creating projects, but it does not enforce API enablement across all projects; it only applies to projects created via the template, leaving manually or otherwise created projects ungoverned. Option D is wrong because enabling APIs in a Shared VPC host project does not automatically enable those APIs in service projects — each project must have its own API enablement, and Shared VPC only shares network resources, not API states.

231

MCQ · easy

Which service should be used to monitor the health of HTTP endpoints from multiple locations?

A. Cloud Trace

B. Cloud Logging

C. Cloud Monitoring

D. Cloud Debugger

Answer: C

Correct. Uptime checks in Cloud Monitoring monitor endpoint health.

Why this answer

Cloud Monitoring (formerly Stackdriver Monitoring) includes uptime checks that can be configured to probe HTTP endpoints from multiple global locations, measuring latency, availability, and response content. This makes it the correct service for monitoring the health of HTTP endpoints from diverse geographic regions.

Exam trap

Google Cloud often tests the distinction between passive monitoring (Cloud Logging) and active synthetic monitoring (Cloud Monitoring uptime checks), leading candidates to mistakenly choose Cloud Logging because they think 'health' implies analyzing logs rather than proactively probing endpoints.

How to eliminate wrong answers

Option A is wrong because Cloud Trace is a distributed tracing service that captures latency data from applications to diagnose performance bottlenecks, not a tool for monitoring HTTP endpoint health from multiple locations. Option B is wrong because Cloud Logging aggregates and stores log data from various sources, but it does not natively perform active health checks or synthetic monitoring of HTTP endpoints. Option D is wrong because Cloud Debugger allows you to inspect the state of a running application without stopping it, but it does not monitor endpoint health or availability from multiple locations.

232

MCQ · medium

A company has a monorepo with multiple microservices. They want to only build and deploy the service that changed. What CI/CD practice should they implement?

A. Use a multi-branch pipeline where each branch represents a service.

B. Use git diff to conditionally run steps for changed services.

C. Use Cloud Build triggers with inline substitution for each service.

D. Use separate repositories for each service.

Answer: B

Cloud Build steps can be guarded with bash commands checking git diff output.

Why this answer

Using git diff to detect changes in specific paths allows conditional build steps in Cloud Build, triggering only the affected service.
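A minimal sketch of the path-based detection, assuming a hypothetical monorepo layout in which each microservice lives under `services/<name>/`. The `git diff --name-only` call is standard Git; the directory layout, function names, and default commit range are assumptions for the example.

```python
import subprocess

def changed_files(base="HEAD~1", head="HEAD"):
    """List files changed between two commits using `git diff --name-only`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]

def changed_services(paths, services_root="services"):
    """Map changed file paths to the service directories they belong to."""
    prefix = services_root.rstrip("/") + "/"
    services = set()
    for path in paths:
        if path.startswith(prefix):
            rest = path[len(prefix):]
            if "/" in rest:  # ignore files sitting directly under services/
                services.add(rest.split("/", 1)[0])
    return sorted(services)

# Example with a hard-coded change list instead of a real repository.
paths = ["services/cart/main.go", "services/cart/Dockerfile",
         "services/auth/app.py", "README.md"]
print(changed_services(paths))
```

A build step could call `changed_services(changed_files())` and skip any service not in the result. In a CI checkout the clone may be shallow, so the build may first need to fetch enough history for the diff to work.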

233

Multi-Select · hard

A company uses BigQuery on-demand pricing. They want to reduce costs without changing usage patterns. Which THREE strategies are effective? (Choose three.)

Select 3 answers

A. Use clustered tables.

B. Use materialized views.

C. Use legacy SQL instead of standard SQL.

D. Use flat-rate pricing.

E. Use partitioned tables.

Answers: A, B, E

Clustering colocates related data, allowing queries to scan fewer blocks.

Why this answer

Partitioning and clustering reduce data scanned per query. Materialized views precompute results to reduce scanned data on repeated queries.

234

Multi-Select · easy

Which TWO of the following are best practices for managing incident response on Google Cloud?

Select 2 answers

A. Automatically escalate all incidents to the engineering manager.

B. Establish a clear escalation path and ensure that on-call engineers are aware of their roles.

C. Assign only one engineer to be on call to reduce confusion.

D. Use only one notification channel (e.g., email) to keep the team focused.

E. Create a written incident response plan that defines severities, roles, and communication channels.

Answers: B, E

Clarity reduces response time.

Why this answer

Option B is correct because a clear escalation path with defined roles ensures that on-call engineers know exactly whom to contact for different severity levels, reducing response time and preventing miscommunication. Google Cloud's operations suite (formerly Stackdriver) supports structured escalation policies through alerting channels and notification routing, making this a foundational best practice for incident management.

Exam trap

Google Cloud often tests the misconception that simplicity (e.g., single on-call engineer or single notification channel) is a best practice, when in reality redundancy and multi-channel communication are critical for reliability.

235

MCQ · easy

A team uses Cloud SQL for PostgreSQL. They receive an alert that the database's CPU utilization is above 95% for the past 30 minutes. Queries are taking longer than usual. They want to investigate without causing further impact. What should they do first?

A. Increase the number of vCPUs of the Cloud SQL instance

B. Restart the Cloud SQL instance to clear the cache

C. Migrate the database to Cloud Spanner

D. Use Cloud SQL Query Insights to find the most time-consuming queries

Answer: D

Query Insights shows top queries by CPU and latency.

Why this answer

Cloud SQL Query Insights is a managed monitoring tool that automatically captures and analyzes query performance metrics, including CPU consumption, latency, and execution plans. In this scenario, it allows the team to identify the specific queries causing high CPU utilization without making any changes to the instance, thus avoiding further impact. This is the first and safest diagnostic step before any remediation.

Exam trap

Google Cloud often tests the principle that the first step in incident management is always to gather diagnostic data without making changes, and the trap here is that candidates may jump to scaling (Option A) or restarting (Option B) as quick fixes, ignoring that these actions can cause further disruption and do not provide root-cause analysis.

How to eliminate wrong answers

Option A is wrong because increasing vCPUs is a scaling action that may temporarily reduce performance during the resize operation and does not address the root cause—it only masks symptoms without identifying the problematic queries. Option B is wrong because restarting the instance clears the buffer cache, which can actually worsen performance temporarily as caches rebuild, and it does not provide any diagnostic information about what caused the high CPU. Option C is wrong because migrating to Cloud Spanner is a major architectural change that is not appropriate for initial investigation; it is a costly, complex migration that should only be considered after identifying that the workload fundamentally requires a horizontally scalable database.

236

MCQ · easy

An application running on GKE experiences high latency during traffic spikes. The team wants to scale pods based on request latency. Which metric should they use in the HorizontalPodAutoscaler?

A. Custom metric: request count

B. Custom metric: request latency percentile (e.g., p95)

C. Memory utilization

D. CPU utilization

Answer: B

A custom latency metric directly reflects the performance issue and can trigger scaling.

Why this answer

To scale based on request latency, the HorizontalPodAutoscaler (HPA) must use a custom metric that directly reflects the application's response time, such as the p95 latency percentile. CPU or memory utilization are system-level metrics that do not capture the user-perceived performance degradation caused by traffic spikes. Custom metrics like request count correlate with load but not directly with latency, making the p95 latency percentile the correct choice for latency-based autoscaling.

Exam trap

Google Cloud often tests the misconception that CPU or memory utilization are sufficient for scaling based on performance metrics, but the trap here is that resource metrics do not capture user-facing latency, which is the actual performance indicator the team wants to optimize.

How to eliminate wrong answers

Option A is wrong because request count is a measure of traffic volume, not latency; scaling on request count may trigger scaling before latency increases, but it does not directly address the goal of reducing high latency during spikes. Option C is wrong because memory utilization is a resource metric that does not reflect request latency; an application can have high latency without high memory usage, and scaling on memory would not solve latency issues. Option D is wrong because CPU utilization is also a resource metric that does not directly correlate with request latency; an application may experience high latency due to I/O waits or lock contention without high CPU usage, so scaling on CPU would be ineffective.
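To make the latency-based scaling concrete, the sketch below computes a nearest-rank p95 from raw samples and applies the core HorizontalPodAutoscaler scaling rule documented by Kubernetes, `ceil(currentReplicas * currentMetric / targetMetric)`. The latency samples, the 300 ms target, and the replica count are invented, and the real controller also applies a tolerance and stabilization settings that are left out here.

```python
import math

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def desired_replicas(current_replicas, current_value, target_value):
    """Core HPA rule: scale replicas in proportion to metric / target."""
    return math.ceil(current_replicas * current_value / target_value)

# Hypothetical traffic spike: 10% of requests are slow.
samples = [120] * 90 + [900] * 10
current_p95 = p95(samples)
print(current_p95, desired_replicas(4, current_p95, 300))
```

With these invented numbers the p95 is three times the target, so the rule asks for three times the current replica count. A CPU-based autoscaler would not react if the slow requests were waiting on I/O.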

237

MCQ · medium

A team uses Cloud Build with a Kaniko builder to containerize their application. The build fails with the error: 'failed to push to destination: failed to get credentials: failed to get credential from metadata service: failed to fetch metadata...' What is the most likely cause?

A. Kaniko requires a running Docker daemon in the build step.

B. The base image specified in the Dockerfile is not accessible from the build environment.

C. The Dockerfile has an invalid instruction causing Kaniko to fail.

D. The Cloud Build service account does not have the storage.objectAdmin role on the Container Registry bucket.

Answer: D

Missing push permissions cause credential failures.

Why this answer

The error indicates that Kaniko cannot authenticate to push the built image to Container Registry. Kaniko uses the Cloud Build service account's credentials to authenticate with the registry. By default, the Cloud Build service account has the storage.objectViewer role on the Container Registry bucket, which allows pulling images but not pushing.

To push, the service account needs the storage.objectAdmin or storage.objectCreator role on the bucket. Option D correctly identifies this missing permission as the most likely cause.

Exam trap

Google Cloud often tests the misconception that Kaniko requires a Docker daemon (Option A), but the real issue is almost always a missing IAM permission on the target registry bucket.

How to eliminate wrong answers

Option A is wrong because Kaniko is specifically designed to build container images without requiring a Docker daemon; it runs entirely in userspace. Option B is wrong because the error message is about pushing credentials, not about pulling a base image; an inaccessible base image would produce a 'failed to pull' or 'image not found' error. Option C is wrong because an invalid Dockerfile instruction would cause a build-time syntax or execution error, not a credential failure during the push phase.

238

MCQ · medium

Refer to the exhibit. A DevOps engineer observes that a GKE cluster's node performance is degraded during high I/O workloads. Based on the exhibit, which change would most likely improve disk I/O performance?

A. Change machineType to n2-standard-4

B. Change imageType to UBUNTU

C. Change serviceAccount to a custom one

D. Change diskType to pd-ssd

Answer: D

Correct. pd-ssd offers significantly better I/O performance.

Why this answer

The exhibit shows a GKE node pool using the default pd-standard disk type, which is backed by HDDs and has lower IOPS and throughput compared to SSDs. Changing diskType to pd-ssd directly improves disk I/O performance by providing higher IOPS and lower latency, which is critical for high I/O workloads.

Exam trap

Google Cloud often tests the misconception that changing machine type or OS image can fix disk I/O issues, when the real bottleneck is the underlying disk type (pd-standard vs. pd-ssd).

How to eliminate wrong answers

Option A is wrong because changing machineType to n2-standard-4 increases CPU and memory resources but does not address the underlying disk I/O bottleneck caused by the disk type. Option B is wrong because changing imageType to UBUNTU only alters the OS image, not the disk performance characteristics; the disk type remains pd-standard. Option C is wrong because changing serviceAccount to a custom one affects authentication and authorization, not disk I/O performance.

239

MCQ · easy

Your company runs a stateless web application on Google Kubernetes Engine (GKE). You have configured Cloud Monitoring to track request latency and set up an alert when p95 latency exceeds 500ms for 5 minutes. Recently, the alert has been firing frequently during peak hours. You examine the metrics and see that p95 latency spikes to 600ms for short periods. The application's SLO is 99.9% availability with a latency threshold of 1 second. What should you do to reduce alert noise without compromising the SLO?

A. Disable the alert during peak hours.

B. Implement a multi-window burn-rate alerting approach.

C. Increase the alert threshold to 1 second to match the SLO.

D. Change the alert to use a longer evaluation window, e.g., 30 minutes.

Answer: B

Burn-rate alerts use multiple windows to detect fast consumption while filtering out brief spikes.

Why this answer

The best approach is to implement a multi-window burn-rate alerting strategy, which is less sensitive to short spikes and directly tracks error budget consumption, aligning with the SLO.
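The multi-window idea can be sketched as a single predicate: page only when both a long window and a short window are burning budget faster than a threshold. The 14.4 threshold and the idea of pairing a long window with a much shorter one are commonly cited defaults and are assumptions here, as are the function names and sample error ratios.

```python
def burn_rate(error_ratio, slo=0.999):
    """How many times faster than 'exactly on budget' errors are occurring."""
    return error_ratio / (1 - slo)

def should_page(long_window_error_ratio, short_window_error_ratio,
                threshold=14.4, slo=0.999):
    """Page only if both the long and the short window exceed the threshold."""
    return (burn_rate(long_window_error_ratio, slo) > threshold
            and burn_rate(short_window_error_ratio, slo) > threshold)

# Sustained problem: both windows show 2% of requests violating the SLO.
print(should_page(0.02, 0.02))
# Brief spike: the short window is bad but the long window is still healthy.
print(should_page(0.001, 0.05))
# Recovered: the long window still remembers the incident, the short one is clean.
print(should_page(0.02, 0.0005))
```

Requiring both windows is what filters out the short 600 ms spikes in the scenario: they never accumulate enough in the long window, and once an incident ends the short window stops the alert quickly.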

240

MCQ · easy

Refer to the exhibit. The DevOps team is trying to create a new service account key. The operation fails with a permission error. What is the most likely cause?

A. The project-level IAM denies the action.

B. The service account is disabled.

C. The organization policy prevents uploading service account keys.

D. The service account lacks the iam.serviceAccountKeys.create permission.

Answer: C

The policy is enforced with LIST_POLICY: true.

Why this answer

The correct answer is C because organization policies can explicitly restrict the creation of service account keys at the organization, folder, or project level. When a key upload fails with a permission error despite the user having the necessary IAM roles, the most likely cause is an organization policy constraint such as `iam.disableServiceAccountKeyUpload` that blocks the operation. This policy overrides any IAM permissions granted to the user.

Exam trap

Google Cloud often tests the distinction between IAM permissions and organization policy constraints, where candidates mistakenly assume a missing IAM permission is the cause when a higher-level policy override is actually blocking the action.

How to eliminate wrong answers

Option A is wrong because project-level IAM denies are evaluated after organization policies; if the organization policy blocks key upload, the IAM deny is never reached. Option B is wrong because a disabled service account would cause a different error (e.g., 'disabled service account') rather than a permission error, and the question specifies a permission error. Option D is wrong because the user likely has the required `iam.serviceAccountKeys.create` permission (e.g., via the Service Account Key Admin role), but the organization policy overrides that permission, making the permission check irrelevant.

241

MCQ · easy

During the bootstrapping of a Google Cloud organization, you need to create a shared CI/CD pipeline that can deploy resources to multiple projects. The pipeline must use a service account with minimal permissions. What is the recommended way to grant the pipeline service account permissions to deploy resources across projects?

A. Configure the pipeline to impersonate a project-level service account in each project.

B. Grant the service account the necessary roles on each target project individually.

C. Grant the service account the Project Editor role at the organization level.

D. Use the Cloud Build service account and grant it permissions on each project.

Answer: B

This provides least privilege by scoping permissions to each project.

Why this answer

Option B is correct because granting the service account the appropriate roles on each target project is the standard method for cross-project deployments and keeps permissions scoped to where they are needed. Option C is wrong because an organization-level role would grant unnecessarily broad permissions. Option D is wrong because the Cloud Build service account is not needed.

Option A is wrong because impersonation is not required; direct grants are simpler.

242

MCQ · medium

A company has multiple projects in an organization. They want to set a budget of $5000 per month on a specific project and receive notifications when spending reaches 50% and 90%. What should they do?

A. Use the Cloud Billing API to monitor spending and send custom alerts

B. Create a budget at the organization level with alert thresholds of 50% and 90%

C. Configure a Pub/Sub topic and set up a cloud function to check spending daily

D. Create a budget at the project level with alert thresholds of 50% and 90%

Answer: D

Budgets can be created per project with threshold alerts for email or Pub/Sub notifications.

Why this answer

Option D is correct because a budget can be scoped to a single project with alert thresholds at 50% and 90%. Option B is incorrect because an organization-level budget would cover all projects rather than the one in question. Option A is overengineering.

Option C is possible but not the simplest approach.
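For the numbers in this question, the thresholds work out to $2,500 and $4,500. The small sketch below shows the comparison a budget alert performs; the `crossed_thresholds` function is a hypothetical helper, not part of any Google Cloud client library.

```python
def crossed_thresholds(spend, budget=5000.0, thresholds=(0.5, 0.9)):
    """Return the threshold fractions that the current spend has reached."""
    return [t for t in thresholds if spend >= budget * t]

print(crossed_thresholds(2400))  # below 50% of the $5000 budget
print(crossed_thresholds(2500))  # exactly at the 50% threshold
print(crossed_thresholds(4600))  # past the 90% threshold
```

A managed budget does this comparison for you, which is why hand-rolled polling with the Billing API or a scheduled function is unnecessary for this requirement.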

243

MCQ · hard

A team defines an SLO of 99.9% availability over a 30-day window. They use a multi-window, multi-burn-rate alerting approach. Which alerting condition should trigger a page based on fast burn rate?

A. 1% error budget consumed in 1 hour.

B. 10% error budget consumed in 1 hour.

C. 2% error budget consumed in 10 minutes.

D. 5% error budget consumed in 6 hours.

Answer: C

This is a fast burn rate, consuming budget quickly; should trigger a page.

Why this answer

Option C is correct because a multi-window, multi-burn-rate alerting approach uses a short window (e.g., 10 minutes) to detect fast burn rates that could rapidly exhaust the error budget. Consuming 2% of the error budget in 10 minutes corresponds to a burn rate of about 86x (2% of the budget in 1/4320 of the 30-day window). Sustained, that rate would exhaust the entire budget in roughly 8 hours, so it requires immediate paging to prevent a breach of the 99.9% SLO.

Exam trap

Google Cloud often tests the distinction between burn rate thresholds and time windows, where candidates mistakenly associate a larger percentage consumed (like 10% or 5%) with fast burn, without considering the window duration and the resulting burn rate.

How to eliminate wrong answers

Option A is wrong because consuming 1% of the error budget in 1 hour is a burn rate of about 7.2x (1% of the budget in 1/720 of the 30-day window), far below Option C's rate; it is better suited to a slower-burn alert. Option B is wrong because 10% in 1 hour is a burn rate of about 72x, which is severe but still slower than Option C and measured over a longer window, so it is not the fastest-burn condition offered. Option D is wrong because 5% in 6 hours is a burn rate of about 6x, a slow burn over a long window that belongs to a lower-urgency alert rather than a fast-burn page.
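The burn rates for the four options can be checked with a short calculation: burn rate is the share of the error budget consumed divided by the share of the SLO window that has elapsed. The function name below is a hypothetical helper; the 30-day window and the percentages come from the question.

```python
from datetime import timedelta

def burn_rate_from_budget(budget_fraction, elapsed, slo_window=timedelta(days=30)):
    """Share of error budget used divided by share of the SLO window elapsed."""
    return budget_fraction / (elapsed / slo_window)

options = {
    "A": (0.01, timedelta(hours=1)),
    "B": (0.10, timedelta(hours=1)),
    "C": (0.02, timedelta(minutes=10)),
    "D": (0.05, timedelta(hours=6)),
}
for label, (fraction, elapsed) in options.items():
    print(label, round(burn_rate_from_budget(fraction, elapsed), 1))
```

A burn rate of 1 means the budget would last exactly the 30-day window; Option C has the highest rate of the four, which is what makes it the fast-burn page.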

244

Multi-Select · medium

A company uses Cloud Deploy to manage rollouts to GKE. They need to implement a deployment strategy where a new version receives 10% of traffic for 30 minutes, then automatically rolls forward to 100% if no issues are detected. Which THREE Cloud Deploy features are required? (Choose three.)

Select 3 answers

A. A canary deployment strategy with a 10% phase.

B. An automated promotion policy that promotes after the wait phase.

C. A manual approval gate before the 100% phase.

D. An Istio VirtualService configuration for traffic splitting.

E. A 30-minute wait phase in the canary strategy.

Answers: A, B, E

Canary enables traffic splitting.

Why this answer

Option A is correct because a canary deployment strategy in Cloud Deploy allows you to define phases that gradually shift traffic to a new revision. A 10% phase is the first step in this strategy, ensuring only a small subset of users experience the new version initially, which aligns with the requirement to start with 10% traffic.

Exam trap

Google Cloud often tests the distinction between required Cloud Deploy features and optional external tools like Istio, leading candidates to mistakenly include Istio-specific configurations when Cloud Deploy's native traffic management suffices.

245

Multi-Select · hard

A company runs a microservices architecture on GKE and notices high network latency between services. Which THREE actions can improve inter-service communication performance?

Select 3 answers

A. Use Istio mTLS for all service-to-service communication

B. Increase the number of nodes in the cluster

C. Implement caching at the API gateway to reduce redundant requests

D. Enable HTTP/2 and gRPC for inter-service communication

E. Use headless services for direct pod-to-pod communication

Answers: C, D, E

Caching at the gateway avoids repeated processing, lowering latency for frequent requests.

Why this answer

Option C is correct because implementing caching at the API gateway reduces redundant requests, which directly decreases network round trips and lowers latency for repeated data retrievals. This is a common performance optimization in microservices architectures, as it offloads backend services and minimizes inter-service chatter.

Exam trap

Google Cloud often tests the misconception that security features like mTLS always improve performance, when in reality they add computational cost, and that scaling nodes always reduces latency, ignoring the potential for increased cross-node traffic.

246

MCQ · easy

A company runs a stateful workload on Compute Engine VMs with persistent disks. They observe that disk I/O latency spikes periodically. The workload is sensitive to latency. What should they do to improve performance?

A. Increase the size of the persistent disk.

B. Migrate to local SSDs for better performance.

C. Use SSD persistent disks instead of standard persistent disks.

D. Configure a snapshot schedule to offload I/O.

Answer: C

SSD offers lower latency and higher IOPS.

Why this answer

Option C is correct because SSD persistent disks provide consistent, low-latency I/O performance compared to standard persistent disks, which use spinning media and can exhibit periodic latency spikes under sustained load. For latency-sensitive stateful workloads, SSD persistent disks offer predictable IOPS and throughput, directly addressing the periodic spikes observed.

Exam trap

Google Cloud often tests the misconception that increasing disk size or using local SSDs is the universal fix for latency, but the trap here is failing to recognize that the workload is stateful and requires persistent storage, making local SSDs inappropriate despite their performance.

How to eliminate wrong answers

Option A is wrong because increasing the size of a persistent disk raises its IOPS and throughput limits but does not remove the latency variability of standard persistent disks; the periodic spikes come from the disk type, not its capacity. Option B is wrong because local SSDs provide very low latency but are ephemeral: data is lost when the VM is stopped or deleted, which makes them unsuitable for stateful workloads that need data to persist across VM lifecycles. Option D is wrong because a snapshot schedule does not offload I/O from the disk; snapshots are a backup mechanism, not a performance feature, and they do nothing to prevent latency spikes during normal operation.

247

MCQhard

An organization has a policy that all projects must have Cloud Logging enabled and logs must be retained for at least 365 days. What is the most efficient way to enforce this across all projects?

A.Create a custom role with logging permissions and assign to all projects.

B.Use an organization policy to enforce logging requirements.

C.Configure a sink at the organization level to aggregate logs and set retention.

D.Use Cloud Asset Inventory to monitor logging configurations.

AnswerC

Aggregated sinks enforce logging and retention across all projects.

Why this answer

Option C is correct because configuring a sink at the organization level allows you to aggregate logs from all projects into a single destination (e.g., a Cloud Storage bucket or BigQuery dataset) and set a retention policy of 365 days on that destination. This is the most efficient method as it enforces the logging and retention requirements centrally without needing to configure each project individually.

Exam trap

The trap here is that candidates often confuse organization policies (which are for resource constraints) with log sinks (which are for routing and retention), leading them to choose option B, but organization policies cannot enforce log retention or enablement directly.

How to eliminate wrong answers

Option A is wrong because creating a custom role with logging permissions and assigning it to all projects only grants users the ability to view or manage logs, but does not enforce that logging is enabled or that logs are retained for 365 days. Option B is wrong because organization policies in Google Cloud (using constraints) cannot directly enforce Cloud Logging enablement or log retention settings; they are used for restrictions like resource locations or service disablement, not for configuring logging sinks or retention. Option D is wrong because Cloud Asset Inventory is a monitoring and discovery tool that can alert on configuration drift, but it does not actively enforce logging or retention policies; it only reports on the current state.

248

MCQhard

Your company has recently migrated to Google Cloud and has set up an organization with three folders: Development, Staging, and Production. Each folder contains multiple projects. The DevOps team has established a centralized CI/CD pipeline using Cloud Build and Artifact Registry in a tools project under the Development folder. They want to ensure that only images built by the CI/CD pipeline are allowed to be deployed to the Production environment. They have configured Binary Authorization with a policy that requires attestations from the Cloud Build service account. However, a developer accidentally pushes a container image directly from their local machine to Artifact Registry using their personal IAM permissions, and then deploys that image to a Production project by bypassing the CI/CD pipeline. How can you prevent this from happening in the future?

A.Enable Cloud Audit Logs for Artifact Registry and set up alerts to detect unauthorized pushes.

B.Remove the Artifact Registry Writer role from all developers and only grant it to the Cloud Build service account.

C.Create a VPC Service Controls perimeter around Artifact Registry to restrict access.

D.Configure a Cloud Function to automatically delete images pushed outside of Cloud Build.

AnswerB

Directly prevents developers from pushing images.

Why this answer

Option B is correct because the root cause of the bypass is that developers have the Artifact Registry Writer (roles/artifactregistry.writer) IAM role, which allows them to push images directly. By removing this role from all developers and granting it exclusively to the Cloud Build service account, you enforce that only the CI/CD pipeline can write to the registry. Binary Authorization then requires attestations from that same service account, ensuring that only pipeline-built images can be deployed to Production.
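The two preventive controls described above can be modeled in a few lines. This is a conceptual sketch only, not the Binary Authorization or IAM API: the `Image` type, the identity strings, and the attestor name `cloud-build-attestor` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Image:
    digest: str  # hypothetical digest value
    pushed_by: str  # identity that pushed the image


def can_push(identity: str, writers: Set[str]) -> bool:
    """Control 1: only identities holding the registry writer role may push."""
    return identity in writers


def can_deploy(image: Image, attestations: Dict[str, Set[str]], required_attestor: str) -> bool:
    """Control 2: deploy only if the required attestor has attested this digest."""
    return required_attestor in attestations.get(image.digest, set())


# After the fix, only the pipeline's service account holds the writer role.
writers = {"cloud-build-sa"}
attestations = {"sha256:pipeline-image": {"cloud-build-attestor"}}

pipeline_image = Image("sha256:pipeline-image", pushed_by="cloud-build-sa")
local_image = Image("sha256:local-image", pushed_by="developer@example.com")

developer_push_allowed = can_push(local_image.pushed_by, writers)
pipeline_deploy_allowed = can_deploy(pipeline_image, attestations, "cloud-build-attestor")
local_deploy_allowed = can_deploy(local_image, attestations, "cloud-build-attestor")
```

In this model the developer's push is rejected at the registry, and an image without an attestation would be rejected again at deploy time, so the two controls back each other up.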

Exam trap

Google Cloud often tests the distinction between preventive vs. detective controls, and candidates mistakenly choose audit logging or reactive deletion because they focus on detecting the breach rather than fixing the root cause (excessive IAM permissions).

How to eliminate wrong answers

Option A is wrong because Cloud Audit Logs and alerts are detective controls, not preventive; they notify you after an unauthorized push has already occurred, but do not block the action. Option C is wrong because VPC Service Controls restrict network access based on context (e.g., IP, VPC source), but they do not prevent a developer with valid IAM permissions from pushing an image from their local machine if they are within the allowed perimeter. Option D is wrong because a Cloud Function that deletes images post-push is a reactive, not preventive, measure; it introduces a race condition where the image could be deployed before deletion, and it adds complexity without addressing the underlying IAM misconfiguration.

249

MCQeasy

Which Google Cloud tool automatically captures and visualizes traces for applications running on App Engine?

A.Cloud Debugger

B.Cloud Monitoring

C.Cloud Logging

D.Cloud Trace

AnswerD

Cloud Trace captures and displays latency data.

Why this answer

Cloud Trace is the correct Google Cloud tool for automatically capturing and visualizing latency traces from applications running on App Engine. It provides end-to-end latency insights by collecting trace data from distributed systems, enabling you to analyze request performance and identify bottlenecks without manual instrumentation.

Exam trap

The trap here is that candidates confuse Cloud Trace with Cloud Monitoring or Cloud Logging because all three are part of Google Cloud's operations suite, but only Cloud Trace is designed for distributed tracing and latency visualization.

How to eliminate wrong answers

Option A is wrong because Cloud Debugger is used for inspecting application state and code execution in real time without stopping the application, not for capturing or visualizing traces. Option B is wrong because Cloud Monitoring focuses on collecting metrics, uptime checks, and alerting policies, not trace data or distributed request visualization. Option C is wrong because Cloud Logging handles log data storage, search, and analysis, but does not capture or visualize trace spans or latency distributions.

250

MCQmedium

Your team wants to create a dashboard that shows request latency broken down by API version. Which approach is most efficient?

A.Use Cloud Monitoring Metrics Explorer to query the latency metric, group by API version, and save the chart as a dashboard

B.Write a custom application to output metrics to a file and send to Cloud Monitoring

C.Export Cloud Logging logs to BigQuery and create a dashboard in Data Studio

D.Enable Cloud Trace and create a dashboard based on trace data

AnswerA

Metrics Explorer directly supports grouping and creating dashboards.

Why this answer

Cloud Monitoring Metrics Explorer allows you to directly query the latency metric (e.g., `request_latencies`) and group by the `version` label, then save the resulting chart as a dashboard widget. This is the most efficient approach because it requires no data export, no custom code, and no additional services — the metric is already available in Cloud Monitoring if your API is instrumented with the appropriate label.
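The same chart can also be described as dashboard JSON. The sketch below builds such a definition as a Python dictionary; the metric type `custom.googleapis.com/api/request_latencies` and the `version` label are assumptions, and the field names reflect my understanding of the Cloud Monitoring dashboard format, so check them against the API reference before relying on them.

```python
import json

# Hypothetical metric type and label name; substitute the ones your API emits.
LATENCY_METRIC = "custom.googleapis.com/api/request_latencies"
VERSION_LABEL = "version"


def latency_by_version_dashboard(title: str) -> dict:
    """Build a one-chart dashboard definition grouped by the version label."""
    aggregation = {
        "alignmentPeriod": "60s",
        # Reduce each distribution-valued series to its 99th percentile.
        "perSeriesAligner": "ALIGN_PERCENTILE_99",
        # Combine series that share a version, keeping the worst value.
        "crossSeriesReducer": "REDUCE_MAX",
        "groupByFields": [f"metric.label.{VERSION_LABEL}"],
    }
    data_set = {
        "plotType": "LINE",
        "timeSeriesQuery": {
            "timeSeriesFilter": {
                "filter": f'metric.type="{LATENCY_METRIC}"',
                "aggregation": aggregation,
            }
        },
    }
    widget = {
        "title": "p99 request latency by API version",
        "xyChart": {"dataSets": [data_set]},
    }
    return {"displayName": title, "gridLayout": {"widgets": [widget]}}


dashboard = latency_by_version_dashboard("API latency")
dashboard_json = json.dumps(dashboard, indent=2)
```

The key part is `groupByFields`: it plays the role of the "group by API version" step in Metrics Explorer, producing one line per version on the chart.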

Exam trap

Google Cloud often tests the misconception that exporting logs to BigQuery or using Cloud Trace is always better for analysis, but here the question specifically asks for the most efficient approach to display a pre-existing metric broken down by a label, which Metrics Explorer does directly without extra steps or cost.

How to eliminate wrong answers

Option B is wrong because writing a custom application to output metrics to a file and send to Cloud Monitoring is unnecessarily complex and inefficient; Cloud Monitoring already ingests metrics natively via the Monitoring API or agent, and a file-based pipeline adds latency and operational overhead. Option C is wrong because exporting Cloud Logging logs to BigQuery and creating a dashboard in Data Studio is a roundabout, slower approach — logs are not metrics, and you would need to parse and aggregate log entries, incurring BigQuery costs and delays, whereas the metric is already pre-aggregated in Cloud Monitoring. Option D is wrong because Cloud Trace is designed for distributed tracing (individual request spans), not for aggregating latency metrics by API version; creating a dashboard from trace data would require custom aggregation and is not the most efficient way to show a pre-aggregated metric broken down by a label.

251

Drag & Dropmedium

Arrange the steps to troubleshoot a high latency issue on a Google Cloud HTTP(S) Load Balancer.

Why this order

Start with health checks, then logs, metrics, identify culprit, and take action.

252

MCQmedium

During bootstrapping, a DevOps engineer wants to ensure that all new projects automatically have a set of APIs enabled, such as Cloud Resource Manager API and Cloud Billing API. They also want to enforce that certain APIs cannot be disabled accidentally. What is the most efficient way to achieve this?

A.Use a custom Cloud Function that runs every time a project is created and enables the APIs.

B.Grant the roles/serviceusage.serviceUsageAdmin role to the DevOps team and have them manually enable APIs when creating projects.

C.Use organization policies to define the constraints 'compute.requireOsLogin' and 'serviceuser.services' to restrict/enable APIs.

D.Create a folder with a 'Required APIs' setting that applies to all child projects.

AnswerC

Organization policies can enforce required services across the hierarchy.

Why this answer

Option C is correct because organization policies in Google Cloud allow you to enforce constraints at the organization, folder, or project level. The `serviceuser.services` list constraint governs which services and APIs may be enabled on every project under the policy scope, so API usage is controlled centrally through the resource hierarchy rather than through per-project manual steps or additional infrastructure.

Exam trap

The trap here is that candidates confuse folder-level settings with organization policies, assuming folders have a built-in 'Required APIs' feature, when in reality only organization policies can enforce API enablement and disablement restrictions across projects.

How to eliminate wrong answers

Option A is wrong because a custom Cloud Function triggered on project creation introduces latency, potential failure points, and additional cost; it also cannot prevent APIs from being disabled after creation. Option B is wrong because granting the `roles/serviceusage.serviceUsageAdmin` role and relying on manual enabling is inefficient, error-prone, and does not enforce that APIs remain enabled. Option D is wrong because Google Cloud folders do not have a 'Required APIs' setting; API enablement is controlled via organization policies or service usage constraints, not a folder-level configuration.

253

MCQeasy

An application running on App Engine is throwing exceptions. The DevOps team wants to be notified when a new type of exception appears. Which Cloud Monitoring feature should they use?

A.Error Reporting

B.Uptime Checks

C.Custom alerts

D.Logs-based metrics

AnswerA

Error Reporting analyzes error logs, groups similar exceptions, and can send notifications for new error groups.

Why this answer

Error Reporting is the correct choice because it is a Google Cloud operations suite service designed to count, analyze, and aggregate the exceptions raised by running applications. It automatically groups exceptions by type and can send a notification when a new type of exception (one not seen before) appears, which directly meets the DevOps team's requirement.
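To make the difference between counting errors and detecting new error groups concrete, here is a small conceptual model. It is not how Error Reporting is implemented; the `NewErrorDetector` class and the fingerprint of exception type plus top stack frame are assumptions chosen for illustration.

```python
from typing import Dict, Tuple


class NewErrorDetector:
    """Groups exceptions by a fingerprint and flags the first occurrence of each group."""

    def __init__(self) -> None:
        self.counts: Dict[Tuple[str, str], int] = {}

    def observe(self, exception_type: str, top_frame: str) -> bool:
        """Record one exception; return True only if its group has not been seen before."""
        fingerprint = (exception_type, top_frame)
        is_new = fingerprint not in self.counts
        self.counts[fingerprint] = self.counts.get(fingerprint, 0) + 1
        return is_new


detector = NewErrorDetector()
events = [
    ("ValueError", "parse_order"),
    ("ValueError", "parse_order"),  # same group: counted, no new notification
    ("KeyError", "load_profile"),  # new group: would trigger a notification
]
notifications = [event for event in events if detector.observe(*event)]
```

A plain log-based metric would only report that three errors occurred. The grouping step is what allows a notification to fire for the first `KeyError` and stay quiet for the repeated `ValueError`.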

Exam trap

Google Cloud often tests the distinction between features that passively log data (Logs-based metrics) versus features that actively analyze and classify errors (Error Reporting), leading candidates to choose Logs-based metrics because they think 'any exception can be captured in logs.'

How to eliminate wrong answers

Option B is wrong because Uptime Checks monitor the availability and response latency of a service by sending synthetic requests (e.g., HTTP GET) from Google Cloud locations, not application-level exceptions. Option C is wrong because Custom alerts are generic alerting policies that can be based on any metric, but they do not inherently detect or classify new exception types; they require a pre-defined metric or condition. Option D is wrong because Logs-based metrics extract numerical data (e.g., count of log entries matching a filter) from logs, but they do not automatically identify or notify on new exception types without manual configuration of a metric and alert.

254

MCQhard

A team wants to optimize a batch processing job that is CPU-bound. Which Compute Engine machine family should they use?

A.C2

B.E2

C.N2

D.M2

AnswerA

Correct. C2 is Compute-optimized for CPU-bound workloads.

Why this answer

C2 is the correct machine family because it is specifically designed for compute-intensive, CPU-bound workloads. It offers the highest clock speed and per-core performance among Compute Engine machine families, making it ideal for batch processing jobs that are limited by CPU throughput rather than memory or I/O.

Exam trap

Google Cloud often tests the distinction between general-purpose (N2, E2) and specialized families (C2, M2), and the trap here is that candidates pick N2 thinking it is 'balanced' for all workloads, failing to recognize that CPU-bound jobs require the dedicated high-frequency compute of the C2 family.

How to eliminate wrong answers

Option B (E2) is wrong because it is a general-purpose, cost-optimized machine family that uses shared-core or smaller dedicated cores, which cannot sustain the high CPU utilization required for CPU-bound batch processing. Option C (N2) is wrong because it is a general-purpose machine family balanced for memory and CPU, but it does not provide the high-frequency CPUs or optimized compute performance of the C2 family. Option D (M2) is wrong because it is a memory-optimized machine family designed for large in-memory databases and memory-intensive workloads, not for CPU-bound tasks.

255

MCQmedium

A team is using Cloud Run for a containerized application. They notice that requests have high latency due to cold starts. Which configuration change would most effectively reduce cold start latency?

A.Set the min-instances parameter to a value > 0

B.Enable concurrent requests

C.Increase the function timeout

D.Reduce the container image size

AnswerA

Correct. Min-instances keeps instances warm.

Why this answer

Setting the `min-instances` parameter to a value greater than 0 keeps a baseline number of container instances warm and ready to serve requests. Requests routed to those instances avoid the cold start penalty because the container and its dependencies are already initialized and listening for traffic. Cloud Run never scales below this minimum, so cold starts occur only when traffic exceeds what the warm instances can absorb and additional instances must be started.
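A minimal sketch of applying this setting from a script follows. It only builds the argument list for `gcloud run services update` with the `--min-instances` flag; the service name `checkout` and the region are hypothetical, and actually running the command requires an authenticated gcloud installation.

```python
from typing import List


def min_instances_command(service: str, region: str, min_instances: int) -> List[str]:
    """Build the gcloud invocation that keeps at least `min_instances` instances warm."""
    if min_instances < 1:
        raise ValueError("min_instances must be at least 1 to keep an instance warm")
    return [
        "gcloud", "run", "services", "update", service,
        f"--region={region}",
        f"--min-instances={min_instances}",
    ]


# Hypothetical service and region names.
command = min_instances_command("checkout", "us-central1", 2)
# To apply it: subprocess.run(command, check=True)
```

Warm instances are billed while idle, so the minimum is usually sized to cover baseline traffic rather than peak load.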

Exam trap

The trap here is that candidates often think reducing container image size (Option D) is the most effective solution, but while it reduces startup time, it does not prevent cold starts entirely, whereas min-instances eliminates them for the baseline count.

How to eliminate wrong answers

Option B is wrong because enabling concurrent requests allows a single container instance to handle multiple requests simultaneously, which improves throughput and resource utilization but does not address the latency caused by starting a new container from scratch. Option C is wrong because increasing the function timeout only extends the maximum duration a request can run; it does not reduce the time it takes for the first request to be processed after a period of inactivity. Option D is wrong because reducing the container image size can speed up the image pull and startup process, but it does not eliminate the cold start entirely; the instance still needs to be provisioned and the runtime initialized, whereas min-instances keeps instances pre-provisioned.

256

Multi-Selectmedium

Which TWO metrics should be included in a comprehensive monitoring strategy for a production Kubernetes workload to detect performance degradation and capacity issues?

Select 2 answers

A.Disk read IOPS per pod

B.Container CPU utilization

C.Number of nodes in the cluster

D.Network bytes received per second

E.Request latency percentiles (e.g., p99)

AnswersB, E

High CPU utilization can indicate capacity pressure and performance issues.

Why this answer

Container CPU utilization (Option B) is a direct indicator of resource pressure and potential performance degradation in a Kubernetes workload. Sustained high CPU utilization leads to CPU throttling and increased request latency, making it essential for detecting capacity issues. Request latency percentiles (Option E) are the gold standard for measuring user-facing performance degradation, as they reflect the actual experience of end users and can reveal subtle slowdowns before resource metrics show saturation.
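The following sketch shows why a percentile such as p99 exposes slow requests that an average hides. It uses the nearest-rank definition of a percentile, and the latency samples are made-up values for illustration.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical data: 98 fast requests and 2 very slow ones, in milliseconds.
latencies_ms = [20.0] * 98 + [900.0, 1200.0]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
```

With these samples the median stays at 20 ms and the mean is only about 41 ms, while p99 is 900 ms, which is the experience of the slowest users and the signal an alert should watch.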

Exam trap

Google Cloud often tests the distinction between infrastructure-level metrics (like node count or network bytes) and application-level metrics (like latency percentiles) that directly measure user experience and workload health.

257

MCQmedium

Refer to the exhibit. A Cloud Function (2nd gen) is timing out. The function's timeout is set to 60 seconds. The function queries a Cloud SQL database. What is the most likely cause and the best action?

A.Reduce the function's allocated memory to decrease cold start time

B.Add indexes to the database tables queried by the function

C.Increase the function timeout to 120 seconds

D.Increase the Cloud SQL max connections setting

AnswerB

Slow queries often indicate missing indexes; adding them reduces query time.

Why this answer

The most likely cause of the timeout is that the database queries are slow due to missing indexes, causing the function to wait longer than its 60-second timeout for results. Adding indexes to the queried columns reduces query execution time, resolving the timeout without changing the function's configuration. This aligns with best practices for optimizing Cloud SQL queries in serverless environments.
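The effect of an index on a query plan can be demonstrated locally. The sketch below uses SQLite from the Python standard library as a stand-in; Cloud SQL runs MySQL, PostgreSQL, or SQL Server, whose `EXPLAIN` output differs, and the `orders` table and its columns are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan details for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index the engine must scan every row of the table.
plan_before = query_plan(conn, sql, (42,))

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index the engine can seek directly to the matching rows.
plan_after = query_plan(conn, sql, (42,))
```

Comparing the two plans before and after creating the index is the same workflow you would follow with `EXPLAIN` on the production database engine to confirm that the slow query now uses the index.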

Exam trap

Google Cloud often tests the misconception that increasing timeouts or resources is the default fix for timeouts, when the real issue is almost always unoptimized queries or missing indexes in database-backed serverless functions.

How to eliminate wrong answers

Option A is wrong because reducing memory typically increases cold start time (as less CPU is allocated), which would worsen performance, not fix a timeout caused by slow queries. Option C is wrong because increasing the timeout to 120 seconds only masks the symptom; it does not address the root cause of slow database queries, and the function may still fail if queries remain unoptimized. Option D is wrong because increasing Cloud SQL max connections does not speed up individual queries; it only allows more concurrent connections, which could even increase database load and worsen latency.

258

Multi-Selecthard

Which THREE actions should be taken to ensure compliance with the principle of least privilege when bootstrapping a Google Cloud organization? (Choose 3)

Select 3 answers

A.Use service accounts for automated processes and grant them the minimum required roles.

B.Use custom roles that include only the necessary permissions.

C.Grant roles at the project level rather than at the organization level when possible.

D.Assign the Owner role at the organization level to a small group of administrators.

E.Use primitive roles (Owner, Editor, Viewer) to simplify management.

AnswersA, B, C

Service accounts should have least privilege.

Why this answer

Option A is correct because service accounts are the recommended identity for automated processes in Google Cloud, and granting them only the minimum required roles directly implements the principle of least privilege. This prevents over-permissioning and reduces the attack surface for automated workflows.

Exam trap

Google Cloud often tests the misconception that assigning the Owner role at the organization level to a small group is acceptable for least privilege, when in fact the Owner role should be reserved for emergency break-glass accounts and never used for routine administration.

259

Drag & Dropmedium

Order the steps to set up a log-based metric in Cloud Logging for error tracking.

Why this order

Filter logs, create metric, set alert, test, verify.

260

MCQhard

A company is bootstrapping a Google Cloud organization for DevOps. They have multiple teams that need to deploy infrastructure using a shared CI/CD pipeline. The security team requires that all deployments be reviewed and approved before production rollout. However, they also want to maintain a fast feedback loop for developers. What is the best way to balance these requirements?

A.Use Cloud Build with a manual approval step triggered via Cloud Pub/Sub

B.Use Spinnaker on GKE with a manual judgment stage between test and production

C.Use Cloud Run for production and Cloud Functions for testing, with IAM roles controlling access

D.Use Cloud Source Repositories with branch restrictions requiring code review

AnswerB

Correct: Spinnaker provides manual judgment stages for approval gates in the pipeline.

Why this answer

Spinnaker on GKE provides a native manual judgment stage that can be inserted between test and production deployments, enabling mandatory approval gates without sacrificing the fast feedback loop for developers. This balances security requirements with DevOps velocity by allowing automated testing to proceed quickly while blocking production rollout until explicit approval is given.

Exam trap

Google Cloud often tests the distinction between code review (e.g., branch restrictions) and deployment approval (e.g., manual judgment stage), leading candidates to choose Option D because they conflate code review with deployment gating.

How to eliminate wrong answers

Option A is wrong because Cloud Build with a manual approval step triggered via Cloud Pub/Sub does not natively support a manual judgment stage; it requires custom logic and lacks the built-in pipeline orchestration for approval gates that Spinnaker provides. Option C is wrong because using Cloud Run for production and Cloud Functions for testing with IAM roles does not introduce any approval or review mechanism for deployments; it only controls access, not the deployment pipeline itself. Option D is wrong because Cloud Source Repositories with branch restrictions requiring code review only enforces code review before merging, not a deployment approval gate after testing; it does not provide a manual judgment stage in the CI/CD pipeline.

261

MCQmedium

Refer to the exhibit. A Cloud Build pipeline using this configuration fails on the third step with a permission error. The Cloud Build service account has the 'Cloud Run Admin' role. What is the most likely missing permission?

A.run.routes.invoke

B.iam.serviceAccounts.actAs on the Cloud Run runtime service account

C.storage.objects.list on the artifact bucket

D.resourcemanager.projects.get

AnswerB

Required to deploy revisions; the service account acts as the runtime service account.

Why this answer

The Cloud Build pipeline fails on the third step with a permission error because the Cloud Build service account, despite having the 'Cloud Run Admin' role, lacks the `iam.serviceAccounts.actAs` permission on the Cloud Run runtime service account. This permission is required for Cloud Build to impersonate the runtime service account when deploying to Cloud Run, as the deployment step needs to act on behalf of that service account to create or update the service.
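One common way to grant `iam.serviceAccounts.actAs` is to give the build identity the `roles/iam.serviceAccountUser` role on the runtime service account. The sketch below edits an IAM policy document in that way; the e-mail address is hypothetical, conditional bindings are ignored for simplicity, and writing the policy back to Google Cloud is left out.

```python
from typing import Dict, List

ACT_AS_ROLE = "roles/iam.serviceAccountUser"  # includes iam.serviceAccounts.actAs


def grant_act_as(policy: Dict, build_sa_email: str) -> Dict:
    """Return a copy of the runtime service account's policy with the build SA added."""
    member = f"serviceAccount:{build_sa_email}"
    bindings: List[Dict] = [
        dict(binding, members=list(binding.get("members", [])))
        for binding in policy.get("bindings", [])
    ]
    for binding in bindings:
        if binding["role"] == ACT_AS_ROLE:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No existing binding for the role: add one.
        bindings.append({"role": ACT_AS_ROLE, "members": [member]})
    return dict(policy, bindings=bindings)


# Hypothetical Cloud Build service account address.
build_sa = "build-sa@example-project.iam.gserviceaccount.com"
original_policy: Dict = {"bindings": []}
updated_policy = grant_act_as(original_policy, build_sa)
```

Note that the binding is placed on the runtime service account itself, not on the project, which keeps the build identity from impersonating other service accounts.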

Exam trap

Google Cloud often tests the misconception that the 'Cloud Run Admin' role alone is sufficient for Cloud Build to deploy to Cloud Run, when in fact the `iam.serviceAccounts.actAs` permission on the runtime service account is a separate, required permission that is not included in the Cloud Run Admin role.

How to eliminate wrong answers

Option A is wrong because `run.routes.invoke` is used to invoke a Cloud Run service (e.g., making HTTP requests), not for deploying or managing the service; it is unrelated to the deployment permission error. Option C is wrong because `storage.objects.list` on the artifact bucket is needed for reading artifacts, but the error occurs on the third step (likely the deployment step), not during artifact retrieval; the Cloud Build service account already has access to the bucket via its default permissions. Option D is wrong because `resourcemanager.projects.get` is a project-level read permission used for retrieving project metadata, not for deploying to Cloud Run; it is not required for the deployment step.

262

MCQhard

A company uses a Shared VPC and wants to enforce a set of firewall rules across all projects in a folder. They want these rules to be immutable by project owners. Which approach should they use?

A.Use hierarchical firewall policies at the folder level.

B.Use network tags and service accounts to enforce rules.

C.Use organization policy constraints to prevent project owners from modifying firewall rules.

D.Create firewall rules at the VPC level and assign them to the folder.

AnswerA

Hierarchical policies enforce rules across projects in a folder.

Why this answer

Hierarchical firewall policies allow you to enforce firewall rules at the folder level, which are inherited by all projects within that folder. These policies are immutable by project owners because they are managed at the folder level, not the project or VPC level, and cannot be overridden or deleted by lower-level roles unless explicitly granted permission.

Exam trap

The trap here is that candidates confuse hierarchical firewall policies with organization policy constraints, thinking that preventing modifications is the same as enforcing rules, but hierarchical policies both enforce and lock the rules in a single mechanism.

How to eliminate wrong answers

Option B is wrong because network tags and service accounts are used to apply firewall rules to specific VM instances, not to enforce rules across all projects in a folder or make them immutable. Option C is wrong because organization policy constraints can prevent project owners from modifying firewall rules, but they do not enforce the rules themselves; they only restrict changes, leaving the actual rules to be defined elsewhere. Option D is wrong because VPC firewall rules are created on an individual VPC network inside a project and cannot be assigned to a folder; hierarchical firewall policies are the correct mechanism for folder-level enforcement.

263

Multi-Selectmedium

A company wants to control costs by limiting the maximum number of Compute Engine instances a project can create. Which TWO methods can achieve this? (Choose two.)

Select 2 answers

A.Use IAM conditions to restrict instance creation.

B.Enable preemptible VMs.

C.Create budget alerts with a threshold at 80% of budget.

D.Set resource quotas for compute instances.

E.Apply committed use discounts.

AnswersA, D

IAM conditions can restrict who may create instances and under which conditions, which indirectly limits how many are created.

Why this answer

Resource quotas limit the number of instances that can be created. IAM conditions can restrict instance creation based on attributes like machine type or region, effectively limiting the number created.

264

MCQhard

A multinational company runs an application on Google Cloud with an SLO of 99.99% monthly availability. They use a multi-region deployment with Cloud Load Balancing and Cloud Spanner. During a regional outage in us-central1, traffic fails over to us-east1. However, the incident response team is not alerted because the error budget burn rate remained below the alert threshold. What should the team change to ensure timely alerting for such regional failures?

A.Shorten the SLO compliance window from 30 days to 7 days.

B.Create a custom dashboard and alert for regional unavailability using Cloud Monitoring metrics like load_balancing/backend_request_count and region health checks.

C.Change the SLO to 99.9% to allow more error budget.

D.Reduce the error budget burn rate alert threshold from 10% to 5% per hour.

AnswerB

Direct alerts for regional failures catch issues early.

Why this answer

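Option B is correct because a dedicated dashboard and alerts for regional unavailability act as signals of possible trouble: they give early warning of a regional failure even when failover keeps the overall error budget burn rate below the SLO alert threshold.

How to eliminate wrong answers

Option A is wrong because shortening the compliance window only changes how the error budget is calculated; it may make burn rate alerts more sensitive, but at the cost of more noise, and it still does not detect a regional failure directly. Option C is wrong because relaxing the SLO to 99.9% enlarges the error budget, which makes burn rate alerts less likely to fire and does nothing to improve alerting. Option D is wrong because lowering the burn rate alert threshold might catch the outage but adds noise, and it is not the best practice for detecting regional failures.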
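A short calculation shows how a regional outage with successful failover can stay under a burn rate alert. The numbers are assumptions for illustration: 0.05% of requests fail for one hour during failover, and the alert threshold of 10% per hour is read as "10% of the 30-day error budget consumed within one hour".

```python
def burn_rate(error_ratio: float, slo: float) -> float:
    """How many times faster than sustainable the error budget is being spent."""
    return error_ratio / (1.0 - slo)


def budget_consumed(error_ratio: float, slo: float, window_hours: float,
                    period_hours: float = 30 * 24) -> float:
    """Fraction of the whole period's error budget spent during the window."""
    return burn_rate(error_ratio, slo) * window_hours / period_hours


SLO = 0.9999  # 99.99% availability over 30 days
FAILOVER_ERROR_RATIO = 0.0005  # hypothetical: 0.05% of requests fail during failover
ALERT_THRESHOLD = 0.10  # assumed meaning: 10% of the budget consumed in one hour

rate = burn_rate(FAILOVER_ERROR_RATIO, SLO)
consumed = budget_consumed(FAILOVER_ERROR_RATIO, SLO, window_hours=1)
alert_fires = consumed >= ALERT_THRESHOLD
```

Under these assumptions the burn rate is about 5, yet only about 0.7% of the monthly budget is spent in the hour, so the alert stays silent while an entire region is down. A direct alert on regional health does not depend on the error budget at all.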

265

MCQhard

An e-commerce platform is using Cloud Load Balancing with a backend service that has a custom health check. The health check is failing intermittently, causing traffic to be routed away from healthy instances. The team has enabled Cloud Logging and wants to diagnose the issue. Which log view should they examine to see the health check probe results?

A.VPC flow logs

B.Cloud Audit Logs (Admin Activity)

C.Instance serial port output logs

D.Load balancer logs (type: 'loadbalancing.googleapis.com')

AnswerD

Load balancer logs contain health check probe results.

Why this answer

Load balancer logs (type: 'loadbalancing.googleapis.com') contain detailed records of health check probes, including the probe source IP, target instance, response code, and latency. This is the exact log view that captures health check probe results, enabling the team to identify intermittent failures by correlating probe timestamps with instance health status changes.

Exam trap

The trap here is that candidates confuse VPC flow logs (which show network-level traffic) with load balancer logs (which show application-level health check results), or assume that audit logs or serial console logs would contain runtime health check data.

How to eliminate wrong answers

Option A is wrong because VPC flow logs capture network traffic metadata (source/destination IP, ports, protocols) but do not include health check probe results or application-layer health check responses. Option B is wrong because Cloud Audit Logs (Admin Activity) record administrative actions like creating or modifying load balancer configurations, not the runtime results of health check probes. Option C is wrong because instance serial port output logs contain OS-level boot and kernel messages, not health check probe data from the load balancer.

266

MCQhard

During a post-mortem, you identify that an incident was caused by a configuration change that was not reviewed. Which of the following is the most effective preventive action?

A.Add more monitoring alerts.

B.Implement a change management process with mandatory peer review.

C.Schedule weekly meetings to review changes.

D.Use a configuration management database (CMDB).

AnswerB

Peer review catches misconfigurations before deployment.

Why this answer

Option B is correct because a change management process with mandatory peer review directly addresses the root cause: a configuration change was made without oversight. By requiring at least one additional engineer to review and approve changes before implementation, the process catches misconfigurations, policy violations, or unintended side effects before they reach production. This is a preventive control, not a detective or corrective one, and aligns with ITIL best practices for change management.

Exam trap

Google Cloud often tests the distinction between preventive controls (like peer review) and detective controls (like monitoring), leading candidates to mistakenly choose monitoring alerts because they seem proactive, when in fact they only detect failures after they happen.

How to eliminate wrong answers

Option A is wrong because adding more monitoring alerts is a detective control that only notifies you after the incident has already occurred; it does not prevent the unreviewed configuration change from causing the incident. Option C is wrong because scheduling weekly meetings to review changes is a reactive, after-the-fact review that does not prevent the change from being applied without review; the damage is already done by the time the meeting occurs. Option D is wrong because a configuration management database (CMDB) is a repository for storing configuration item data and relationships; it does not enforce any review or approval workflow, so an unreviewed change can still be made without any preventive barrier.

267

MCQhard

A service is deployed on Cloud Run. You need to monitor memory usage per revision. How can you create an alert?

A.Deploy a sidecar container that collects memory metrics and pushes to Cloud Monitoring

B.Configure Cloud Run to send metrics to Cloud Monitoring and create an alert in Cloud Monitoring

C.Use Cloud Console and navigate to Cloud Run services to set an alert directly

D.Use Cloud Logging to parse container logs for memory usage

AnswerB

Cloud Run metrics are automatically available in Cloud Monitoring.

Why this answer

Cloud Run automatically exports built-in metrics, including memory usage per revision, to Cloud Monitoring without any additional configuration. By creating an alerting policy in Cloud Monitoring based on the `run.googleapis.com/container/memory/utilizations` metric, you can monitor memory usage per revision directly. Option B correctly identifies this native integration, making it the most efficient and reliable approach.

Exam trap

Google Cloud often tests the misconception that you must manually configure metric export or use sidecars for Cloud Run monitoring, when in fact Cloud Run's native integration with Cloud Monitoring handles this automatically.

How to eliminate wrong answers

Option A is wrong because Cloud Run already sends memory metrics to Cloud Monitoring natively; deploying a sidecar container to collect and push metrics is unnecessary, adds complexity, and violates the principle of using built-in observability. Option C is wrong because the Cloud Run console does not provide a direct interface to create alerting policies; alerts must be configured in Cloud Monitoring, not within the Cloud Run service page. Option D is wrong because Cloud Logging is designed for log analysis, not metric-based alerting; parsing container logs for memory usage is inefficient, unreliable, and not the intended method for monitoring resource utilization metrics.
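
As a rough sketch of what such an alerting policy might look like, the snippet below builds a request body for the Cloud Monitoring API. The function name, service name, threshold, and durations are hypothetical, and the field names follow one reading of the v3 `AlertPolicy` resource, so verify them against the API reference before use.

```python
import json


def memory_alert_policy(service_name: str, threshold: float = 0.9) -> dict:
    """Build an AlertPolicy body for Cloud Run memory utilization per revision."""
    metric_filter = (
        'resource.type = "cloud_run_revision" AND '
        f'resource.labels.service_name = "{service_name}" AND '
        'metric.type = "run.googleapis.com/container/memory/utilizations"'
    )
    return {
        "displayName": f"{service_name} memory utilization",
        "combiner": "OR",
        "conditions": [{
            "displayName": "p99 memory utilization per revision",
            "conditionThreshold": {
                "filter": metric_filter,
                "comparison": "COMPARISON_GT",
                "thresholdValue": threshold,  # assumed: 90% of the memory limit
                "duration": "300s",           # assumed: sustained for 5 minutes
                "aggregations": [{
                    "alignmentPeriod": "60s",
                    # The metric is a distribution, so align on a percentile.
                    "perSeriesAligner": "ALIGN_PERCENTILE_99",
                    "crossSeriesReducer": "REDUCE_MAX",
                    "groupByFields": ["resource.label.revision_name"],
                }],
            },
        }],
    }


if __name__ == "__main__":
    print(json.dumps(memory_alert_policy("checkout"), indent=2))
```

The printed JSON could then be saved to a file and supplied when creating the policy, for example through `gcloud alpha monitoring policies create --policy-from-file`. Notification channels are omitted here and would need to be attached separately.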

268

Multi-Selecthard

Which THREE are valid methods to enforce resource location restrictions in a Google Cloud organization? (Choose three.)

Select 3 answers

A.Use VPC Service Controls to limit resource access to specific regions.

B.Use folder-level IAM policies to restrict locations.

C.Apply a folder-level policy with the same organization policy constraint.

D.Apply an organization policy with the constraint 'constraints/gcp.resourceLocations'.

E.Use Cloud Audit Logs to detect and alert on resources created in non-compliant locations.

AnswersA, C, D

VPC SC can restrict access based on location.

Why this answer

Option A is correct because VPC Service Controls allow you to define perimeters that restrict resource access based on attributes such as region, preventing data exfiltration and ensuring resources are only accessible from allowed locations. This is a dedicated security feature that enforces location restrictions at the network level, complementing organization policies.

Exam trap

Google Cloud often tests the distinction between proactive enforcement (organization policies, VPC Service Controls) and reactive detection (Audit Logs), leading candidates to mistakenly select Cloud Audit Logs as a valid enforcement method.

269

MCQeasy

A team needs to monitor the availability of an HTTPS endpoint that requires a Bearer token in the request header. What is the simplest way to configure this with Cloud Monitoring?

A.Deploy a sidecar container that handles the authentication and exposes a plain endpoint.

B.Use a synthetic monitor from Cloud Monitoring that handles authentication via a script.

C.Export the endpoint logs to Cloud Logging and set up a log-based metric for availability.

D.Configure the Uptime Check to include a custom header with the Bearer token.

AnswerD

Uptime Checks allow custom headers, so you can directly set the Authorization header with the token.

Why this answer

Option D is correct because Cloud Monitoring's Uptime Checks natively support custom HTTP headers, including Authorization headers with Bearer tokens. This allows you to directly monitor an authenticated HTTPS endpoint without any additional infrastructure, scripting, or log-based workarounds. It is the simplest and most straightforward configuration for this requirement.

Exam trap

Google Cloud often tests the misconception that Uptime Checks cannot handle authentication, leading candidates to overcomplicate the solution with sidecars or scripts, when in fact custom headers are a built-in feature.

How to eliminate wrong answers

Option A is wrong because deploying a sidecar container adds unnecessary complexity and defeats the purpose of a simple configuration; it also introduces an extra point of failure and maintenance overhead. Option B is wrong because synthetic monitors are designed for multi-step or browser-based scenarios, not for simple single-request availability checks; using a script for a basic header-based authentication is overkill and not the simplest approach. Option C is wrong because exporting logs and setting up a log-based metric for availability is indirect, introduces latency, and does not provide proactive monitoring; it relies on the endpoint already generating logs, which is not guaranteed for simple availability checks.
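
A minimal sketch of an uptime check with a custom header is shown below, written as a request body for the Cloud Monitoring API. The function name, host, path, and token are placeholders, and the field names follow one reading of the v3 `UptimeCheckConfig` resource, so treat them as assumptions to verify.

```python
import json


def uptime_check_config(project_id: str, host: str, path: str, token: str) -> dict:
    """Build an UptimeCheckConfig body that sends a Bearer token header."""
    return {
        "displayName": f"authenticated uptime check for {host}",
        "monitoredResource": {
            "type": "uptime_url",
            "labels": {"project_id": project_id, "host": host},
        },
        "httpCheck": {
            "requestMethod": "GET",
            "path": path,
            "port": 443,
            "useSsl": True,
            "validateSsl": True,
            # The custom header carries the token on every probe.
            "headers": {"Authorization": f"Bearer {token}"},
            # Ask the API to hide header values when the config is read back.
            "maskHeaders": True,
        },
        "period": "60s",   # assumed probe interval
        "timeout": "10s",  # assumed per-probe timeout
    }


if __name__ == "__main__":
    body = uptime_check_config("my-project", "api.example.com", "/healthz", "PLACEHOLDER")
    print(json.dumps(body, indent=2))
```

Because the check sends the header verbatim, the token has to remain valid for as long as the check runs; a short-lived token would need to be rotated in the configuration.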

270

Multi-Selectmedium

Which TWO deployment strategies are directly supported by Cloud Deploy for GKE?

Select 2 answers

A.Rolling update

B.A/B testing

C.Canary

D.Shadow

E.Blue-green

AnswersC, E

Cloud Deploy supports canary with incremental traffic shifting.

Why this answer

Options C and E are correct. Cloud Deploy supports canary and blue-green deployment strategies for GKE. Option A is incorrect: rolling updates are managed by GKE directly, not offered by Cloud Deploy as a strategy.

Option D is incorrect: shadow deployments are not supported. Option B is incorrect: A/B testing is not a built-in strategy.

271

Multi-Selectmedium

Which TWO practices are recommended for implementing CI/CD pipelines on Google Cloud?

Select 2 answers

A.Store service account keys in the build configuration file for authentication.

B.Deploy to production directly after a successful build without any approval gate.

C.Create a single build pipeline that handles all microservices to reduce complexity.

D.Use a Dockerfile to define the build process for containerized applications.

E.Use Cloud Build substitutions to parameterize build configurations for different environments.

AnswersD, E

Dockerfile is the standard way to define container builds.

Why this answer

Option D is correct because using a Dockerfile to define the build process for containerized applications is a recommended practice in CI/CD pipelines on Google Cloud. It ensures that the application is built consistently across all environments, leveraging Cloud Build's native support for Dockerfiles to produce container images that can be stored in Container Registry or Artifact Registry.

Exam trap

Google Cloud often tests the misconception that a single monolithic pipeline is simpler and thus better, but the correct approach is to decouple microservices into separate pipelines for isolation and independent release cycles.

272

Drag & Dropmedium

Arrange the steps to set up a Google Cloud Monitoring alerting policy for a Compute Engine instance.

Why this order

First create a notification channel, then define the condition, set evaluation parameters, attach the channel, and save.

273

MCQmedium

During a canary deployment of a new version of a microservice, the engineer notices increased error rates in the canary instances. What is the best immediate action?

A.Continue the rollout to see if errors stabilize.

B.Perform a rollback of the canary to the previous version.

C.Scale up the canary instances to handle load.

D.Pause the rollout and investigate the errors.

AnswerB

Rolling back immediately stops the errors and protects users.

Why this answer

When error rates increase in a canary, the safest immediate action is to roll back the canary to prevent further impact on users. Pausing and investigating is reasonable but allows continued errors. Scaling up the canary would worsen the issue.

Continuing the rollout would be irresponsible.

274

Multi-Selecthard

An application running on GKE experiences high tail latency. The team is optimizing performance. Which THREE techniques should they consider? (Choose three.)

Select 3 answers

A.Use pod anti-affinity to spread pods across nodes

B.Set proper resource requests and limits to avoid resource contention

C.Enable Istio sidecar injection for all pods

D.Use pod affinity to pack pods on same node

E.Increase the number of replicas for stateless services

AnswersA, B, E

Spread reduces resource competition and improves performance.

Why this answer

Pod anti-affinity spreads pods across different nodes, reducing the risk of a single node becoming a hotspot and causing contention for resources like CPU, memory, or network bandwidth. By distributing pods, you minimize queuing delays and improve tail latency, as no single node is overloaded with too many pods competing for the same resources.

Exam trap

Google Cloud often tests the misconception that packing pods together (affinity) improves performance by reducing network hops, but in practice, it increases contention and tail latency, while anti-affinity spreads load and improves predictability.

275

MCQeasy

An application running on App Engine standard environment has high instance startup latency, leading to slow first requests. What is the most effective configuration change to reduce cold starts?

A.Use manual scaling instead of automatic scaling

B.Use Cloud Endpoints for request authentication

C.Set min_idle_instances to a value greater than 0

D.Increase the memory limit per instance

AnswerC

This ensures that a pool of warm instances is always available, reducing cold start latency.

Why this answer

Setting a minimum number of idle instances keeps warm instances ready to serve traffic, so incoming requests avoid startup delays. Manual scaling (A) requires manual capacity planning; Cloud Endpoints (B) is for API management; increasing memory (D) may not reduce startup time.

276

MCQeasy

A startup is bootstrapping their Google Cloud organization with the following constraints: they have a small team of 10 developers, each with varying levels of expertise. They want a simple setup that allows developers to experiment in their own projects but prevents them from deleting production resources. They also want to enforce a budget limit on each project to avoid unexpected costs. The team has no prior Google Cloud experience and wants minimal operational overhead. Which of the following approaches best meets their needs?

A.Use a single project with VPC Service Controls to isolate resources.

B.Create a single project for all developers, use budget alerts, and give everyone Owner role.

C.Create a project per developer, give them Owner role, and set a budget on each project.

D.Create a folder for production and a folder for development, assign developers Editor on dev projects and Viewer on prod, set budget alerts on both folders.

AnswerD

Isolation of environments and budget control.

Why this answer

Option D provides clear separation and cost control with minimal complexity.

277

MCQhard

A team wants to ensure zero-downtime deployments for a Cloud Run service. They plan to gradually shift traffic from the current revision to the new one. What should they configure?

A.Use Cloud Build to directly replace the revision

B.Set traffic to 100% on the new revision gradually using multiple updates

C.Configure a rolling update strategy in Cloud Deploy

D.Use Cloud Run's traffic splitting feature to slowly increase traffic to the new revision

AnswerD

Cloud Run allows you to assign traffic percentages to revisions, enabling gradual rollouts.

Why this answer

Cloud Run supports traffic splitting, allowing you to gradually send a percentage of traffic to the new revision to monitor health before fully switching.
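
One way to picture a gradual shift is as a sequence of traffic updates, each followed by a health check before the next. The sketch below only generates the command strings. The function name, step percentages, and service and revision names are hypothetical, and it assumes `gcloud run services update-traffic` with `--to-revisions` leaves the remaining share on the revisions already serving.

```python
def rollout_commands(service: str, new_revision: str,
                     steps: tuple = (10, 25, 50, 100)) -> list:
    """Return one traffic-update command per rollout step."""
    commands = []
    for pct in steps:
        if not 0 < pct <= 100:
            raise ValueError(f"invalid traffic percentage: {pct}")
        commands.append(
            f"gcloud run services update-traffic {service} "
            f"--to-revisions={new_revision}={pct}"
        )
    return commands


if __name__ == "__main__":
    for cmd in rollout_commands("web", "web-00002-abc"):
        # In practice, pause here and review error rates and latency
        # for the new revision before running the next step.
        print(cmd)
```

If a step shows elevated errors, the same command with the previous revision at 100 would send traffic back. Region and project flags are omitted for brevity.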

278

MCQhard

You are using Cloud CDN with an external HTTPS load balancer. Users in Asia report slow load times for static assets. The origin is in us-central1. What should you do to improve performance?

A.Switch the load balancer to an internal HTTPS load balancer with gRPC.

B.Use premium tier networking for the load balancer.

C.Enable Cloud CDN and configure cache modes for static content.

D.Configure a serverless NEG to route traffic to Cloud Functions.

AnswerC

CDN caches content at edge locations, reducing latency.

Why this answer

Cloud CDN caches static assets at Google's global edge locations, reducing latency for users in Asia by serving content from a nearby point of presence instead of the us-central1 origin. Enabling Cloud CDN and configuring cache modes for static content (e.g., setting Cache-Control headers or using origin cache policies) ensures that frequently requested assets are served from cache, dramatically improving load times for geographically distant users.

Exam trap

Google Cloud often tests the distinction between network-level optimizations (like premium tier) and application-level caching (like CDN), and the trap here is assuming that faster routing alone can solve latency for repeated static asset requests, when in fact caching eliminates the need for those requests to reach the origin at all.

How to eliminate wrong answers

Option A is wrong because switching to an internal HTTPS load balancer with gRPC would restrict traffic to within a VPC network, making it inaccessible to external users in Asia, and gRPC does not inherently improve latency for static assets. Option B is wrong because premium tier networking optimizes routing between Google Cloud regions and the internet but does not cache content; it reduces network hop latency but cannot eliminate the round-trip time to the origin for every request. Option D is wrong because configuring a serverless NEG to route traffic to Cloud Functions would introduce additional compute overhead and is designed for dynamic content or API backends, not for caching or accelerating static asset delivery.

279

MCQhard

Refer to the exhibit. A team uses these Compute Engine instances to run a batch processing job. The job frequently gets killed on instance-3. What is the most likely cause?

A.Instance-3 has insufficient CPU.

B.Instance-3 has a corrupted disk.

C.Instance-3 is preemptible and gets terminated by Google Cloud.

D.Instance-3 has insufficient memory.

AnswerC

Preemptible instances can be shut down within 24 hours, causing job interruptions.

Why this answer

Instance-3 is a preemptible VM, which means Google Cloud can terminate it at any time if its resources are needed elsewhere. Preemptible instances have a maximum runtime of 24 hours and are subject to termination with a 30-second warning. The batch processing job being frequently killed on instance-3 is a classic symptom of preemptible VM termination, not a resource exhaustion or disk issue.

Exam trap

Google Cloud often tests the distinction between preemptible VM termination and resource exhaustion (CPU/memory/disk) by presenting a scenario where a job is 'killed' — candidates may incorrectly attribute this to insufficient resources rather than recognizing the preemptible VM behavior.

How to eliminate wrong answers

Option A is wrong because insufficient CPU would cause the job to run slowly or fail with a resource-exhausted error, not be killed abruptly by the system. Option B is wrong because a corrupted disk would manifest as I/O errors, data corruption, or boot failures, not as the job being killed without disk-related error messages. Option D is wrong because insufficient memory would cause out-of-memory (OOM) kills logged in the system, which would show as the process being terminated by the kernel, not as the instance being stopped by Google Cloud.

280

MCQeasy

Refer to the exhibit. The output shows a recommendation from the Cloud Cost Optimization recommender for an instance in us-central1-a. The instance is a production web server that consistently runs at 25% CPU utilization during peak hours. What should the DevOps engineer do to implement this recommendation with minimal risk?

A.Stop the instance, change the machine type to n2-standard-4, and start it again.

B.Ignore the recommendation because the instance is production and any change might cause downtime.

C.Add a second n2-standard-4 instance behind a load balancer to distribute load.

D.Use a custom machine type with 4 vCPUs and 32 GB memory to ensure enough RAM.

AnswerA

This directly implements the recommendation with minimal risk.

Why this answer

Option A is correct because the Cloud Cost Optimization recommender has identified that the current instance is over-provisioned for its actual workload (25% CPU utilization during peak hours). Stopping the instance, changing the machine type to n2-standard-4 (4 vCPUs and 16 GB RAM, close to the recommended 4 vCPUs and 15 GB memory), and starting it again rightsizes the instance at the cost of a brief, planned interruption. This approach minimizes risk because the instance is stopped only for the duration of the machine type change, and the new machine type is a standard, well-tested configuration that aligns with the recommender's analysis.

Exam trap

Google Cloud often tests the misconception that production instances should never be modified, but the correct approach is to use the recommender's suggestion with a controlled stop/start process, which minimizes risk and is a standard cost optimization practice.

How to eliminate wrong answers

Option B is wrong because ignoring the recommendation is not a valid cost optimization strategy; the recommender is designed to identify safe rightsizing opportunities, and the instance can be changed with minimal downtime by stopping and restarting it. Option C is wrong because adding a second n2-standard-4 instance behind a load balancer would increase costs and complexity, not reduce them, and the single instance is already underutilized at 25% CPU. Option D is wrong because using a custom machine type with 4 vCPUs and 32 GB memory would double the recommended memory (15 GB), leading to unnecessary cost and not addressing the over-provisioning issue identified by the recommender.

281

MCQmedium

A development team is using Cloud Build to build and push Docker images to Artifact Registry. The builds are taking longer than expected, and the team wants to reduce build time and cost. They use a Dockerfile that installs many dependencies. Which approach should they recommend?

A.Increase the machine type to use more vCPUs and memory for the build.

B.Use Kaniko cache in Cloud Build with a persistent volume claim to cache base layers.

C.Switch to Docker build with --privileged flag and use a local Docker daemon.

D.Reduce the number of steps in the Cloud Build config to a single step that installs and builds everything.

AnswerB

Kaniko's cache stores intermediate layers in a persistent volume, dramatically reducing build time for unchanged dependencies.

Why this answer

Option B is correct because using Kaniko with a persistent cache for base layers reuses layers from previous builds, speeding up builds without requiring privileged mode. Option A increases cost by adding more vCPUs without addressing inefficient caching. Option C uses Docker with privileged mode, which is slower and less secure.

Option D reduces parallelism, likely increasing build time.

282

Multi-Selecteasy

Which TWO Organization Policy constraints are commonly used to enhance security in a DevOps environment?

Select 2 answers

A.constraints/cloudbuild.enableBuildManager

B.constraints/storage.uniformBucketLevelAccess

C.constraints/iam.disableServiceAccountKeyCreation

D.constraints/appengine.disableCodeDownload

E.constraints/compute.disablePublicIpAddress

AnswersC, E

Prevents creation of service account keys, reducing risk of key compromise.

Why this answer

Option C is correct because the `constraints/iam.disableServiceAccountKeyCreation` organization policy constraint prevents the creation of long-lived service account keys, which are a common security risk in DevOps pipelines. By enforcing this constraint, you force the use of short-lived credentials (e.g., workload identity federation or OAuth 2.0 access tokens) instead of static JSON keys that could be leaked or misused.

Exam trap

Google Cloud often tests the distinction between organization policy constraints and IAM roles or service-level settings, so candidates mistakenly select options like `constraints/storage.uniformBucketLevelAccess` or `constraints/cloudbuild.enableBuildManager` because they sound security-related but are not specifically designed to enhance DevOps security through credential management.

283

MCQhard

A financial services company runs a real-time trading application on GKE with 10 microservices. The application uses Cloud Spanner as the database. Recently, the team noticed increased latency during peak trading hours. Cloud Monitoring shows high CPU utilization on the Spanner nodes (averaging 80%) and increased locking contention. The team has already added secondary indexes and tuned queries. The application's latency budget is 50ms for writes and 20ms for reads. The team must reduce latency while maintaining strong consistency and meeting the budget. What should they do?

A.Increase the number of Spanner nodes to reduce contention and CPU load

B.Change the application to use eventual consistency for read operations

C.Migrate the database to Cloud Bigtable for higher throughput

D.Implement a write buffer using Cloud Pub/Sub and batch writes to Spanner

AnswerA

More nodes improve throughput and reduce locking contention, meeting latency budgets without sacrificing consistency.

Why this answer

Increasing the number of Spanner nodes directly addresses the root cause: high CPU utilization (80%) and locking contention. More nodes distribute the read/write load, reducing per-node CPU and contention, which lowers latency. This maintains strong consistency and meets the 50ms write / 20ms read budget without architectural changes.

Exam trap

Google Cloud often tests the misconception that adding nodes only helps with storage or throughput, not latency; in Spanner, more nodes reduce CPU contention and lock waits, directly improving latency under high load.

How to eliminate wrong answers

Option B is wrong because changing to eventual consistency violates the requirement for strong consistency, which is non-negotiable for a real-time trading application. Option C is wrong because Cloud Bigtable does not support strong consistency or SQL queries, and it is optimized for analytical workloads, not transactional trading with strict latency budgets. Option D is wrong because a write buffer with Pub/Sub and batch writes would increase write latency beyond the 50ms budget and could introduce data staleness, violating strong consistency.

284

MCQmedium

A DevOps engineer is setting up a CI/CD pipeline for a Python application using Cloud Build. The build takes too long because pip install is downloading packages every time. What is the best approach to speed up the build?

A.Use a custom base image that includes all dependencies pre-installed.

B.Increase the machine type to a higher CPU and memory instance.

C.Use Kaniko cache in Cloud Build with a remote cache location.

D.Configure a volume mount to a Cloud Storage bucket for pip cache and set PIP_CACHE_DIR.

AnswerD

Caching pip downloads across builds is the most direct optimization.

Why this answer

Option D is correct because storing the pip cache in a Cloud Storage bucket and restoring it in subsequent builds reduces download time. Option C is incorrect: Docker layer caching helps, but a pip cache is more effective for Python. Option B is incorrect: a larger machine type does not guarantee faster builds.

Option A is incorrect: pre-built images may introduce more complexity and maintenance.

285

Multi-Selectmedium

A DevOps team is investigating performance issues in their GKE cluster. They want to use Cloud Profiler to identify the bottleneck. Which three steps are required to start profiling? (Select THREE)

Select 3 answers

A.Configure IAM permissions

B.Deploy the profiler agent to the application container

C.Enable Cloud Profiler API

D.Install a sidecar proxy

E.Modify the application code to include profiling endpoints

AnswersA, B, C

The agent needs roles/cloudprofiler.agent.

Why this answer

A is correct because Cloud Profiler requires the `roles/cloudprofiler.agent` IAM role (or equivalent permissions) on the service account used by the GKE node or application, so that the agent can write profiling data to the Cloud Profiler API. Without this permission, the agent cannot upload profiles, and no data will appear in the console.

Exam trap

Google Cloud often tests the misconception that Cloud Profiler requires code modifications or sidecar proxies, when in fact it uses a lightweight agent that requires only API enablement, IAM permissions, and agent deployment.

286

MCQhard

Refer to the exhibit. A DevOps engineer assigned this custom role to a service account used in Cloud Build. The pipeline fails when trying to access a secret stored in Secret Manager. Which permission is missing?

A.cloudbuild.builds.update

B.run.services.get

C.secretmanager.versions.access

D.iam.serviceAccounts.actAs

AnswerC

Required to access the latest version of a secret.

Why this answer

The custom role assigned to the Cloud Build service account lacks the `secretmanager.versions.access` permission, which is required to access the payload of a secret version in Secret Manager. Without this permission, any attempt to read the secret value during a build step will fail with a permission denied error, even if the service account has other roles on the project.

Exam trap

Google Cloud often tests the distinction between permissions that manage resources (e.g., `get`, `update`) and permissions that access data (e.g., `access`), leading candidates to pick a generic read permission like `get` instead of the specific `access` permission required for secret payloads.

How to eliminate wrong answers

Option A is wrong because `cloudbuild.builds.update` allows updating Cloud Build builds, not accessing secrets in Secret Manager. Option B is wrong because `run.services.get` grants read access to Cloud Run service metadata, not to secret payloads. Option D is wrong because `iam.serviceAccounts.actAs` is needed to impersonate a service account (e.g., for Cloud Build to deploy on behalf of another SA), but it does not grant access to secret data.

287

MCQeasy

A company wants to ensure that all projects in the organization have Cloud Resource Manager API enabled. What is the most efficient method?

A.Use a Cloud Scheduler job to enable the API in new projects.

B.Enable the API manually in each project.

C.Use a Terraform script that iterates over all projects.

D.Set an organization policy to require the API.

AnswerD

Automatically enforced for all projects.

Why this answer

Option D is correct because organization policies allow you to enforce constraints across all projects in the organization, ensuring the Cloud Resource Manager API is enabled automatically and cannot be disabled. This is the most efficient method as it requires no manual intervention or scripting, and it leverages the native Google Cloud policy framework to enforce compliance at scale.

Exam trap

The trap here is that candidates often choose a reactive automation solution like Terraform or Cloud Scheduler, missing that organization policies provide proactive, declarative enforcement that works at the infrastructure layer without requiring custom code or periodic runs.

How to eliminate wrong answers

Option A is wrong because Cloud Scheduler is a cron job service that triggers actions on a schedule, but it cannot proactively enable APIs in new projects before they are created; it would require a custom script and still not prevent projects from being created without the API. Option B is wrong because manually enabling the API in each project is not scalable, error-prone, and violates the principle of infrastructure as code and automation expected in a DevOps organization. Option C is wrong because a Terraform script that iterates over all projects is reactive and requires periodic execution; it cannot enforce the API being enabled at project creation time and may miss projects created outside the Terraform workflow.

288

Multi-Select · hard

A company needs to monitor custom application metrics from Compute Engine instances. Which TWO methods can be used?

Select 2 answers

A. Use the deprecated Stackdriver Agent

B. Install Cloud Monitoring agent on instances

C. Use Cloud Trace

D. Use OpenTelemetry to send metrics to Cloud Monitoring

E. Install Cloud Logging agent

Answers: B, D

The agent collects custom metrics and sends to Cloud Monitoring.

Why this answer

Option B is correct because the Cloud Monitoring agent is specifically designed to collect custom application metrics from Compute Engine instances and send them to Cloud Monitoring. It supports both third-party applications and custom metrics via its built-in integration with collectd and a configuration interface for defining custom metrics. This is the standard, supported method for monitoring custom metrics from VMs.

Exam trap

Google Cloud often tests the distinction between agents for logging versus monitoring, and candidates mistakenly think the Cloud Logging agent can also handle metrics, or they confuse Cloud Trace (tracing) with metric collection.

289

MCQ · easy

A startup is bootstrapping a Google Cloud organization for DevOps. They need to create a project for their CI/CD tooling and a separate project for logging and monitoring. What is the recommended way to structure the resource hierarchy?

A. Create a single project for all workloads and use labels to differentiate environments.

B. Create both projects directly under the organization node, with separate billing accounts.

C. Create a separate organization for each project to ensure isolation.

D. Create a folder called 'DevOps' and place both projects inside it, sharing a billing account.

Answer: D

Using a folder allows inheritance of IAM policies and organization policies, simplifying management.

Why this answer

Option D is correct because the recommended Google Cloud resource hierarchy for DevOps bootstrapping is to create a folder (e.g., 'DevOps') under the organization node and place both projects inside it. This structure allows centralized policy inheritance (e.g., IAM, org policies) and shared billing via a single billing account, while maintaining logical separation between CI/CD and logging/monitoring workloads. It aligns with Google's best practices for multi-project isolation without unnecessary organizational complexity.

Exam trap

Google Cloud often tests the misconception that projects must be placed directly under the organization node or that separate billing accounts are required for isolation, but the correct approach is to use folders for grouping and a shared billing account to maintain centralized control and policy inheritance.

How to eliminate wrong answers

Option A is wrong because using a single project with labels for environment differentiation violates the principle of workload isolation; labels are metadata for filtering, not a security or policy boundary, and cannot enforce separate IAM roles or resource quotas for CI/CD vs. logging. Option B is wrong because creating both projects directly under the organization node with separate billing accounts introduces unnecessary billing overhead and loses the ability to apply common folder-level policies; Google recommends using folders for grouping related projects. Option C is wrong because creating a separate organization for each project is excessive and unsupported—Google Cloud organizations are designed to contain multiple projects, and creating multiple organizations would require separate domains and break centralized management.
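
The policy inheritance that makes the folder layout attractive can be illustrated with a small model. This is only a sketch: the node names, groups, and the `effective_bindings` helper are hypothetical, and it assumes nothing beyond the additive behavior of IAM allow policies, where a resource's effective grants are the union of its own bindings and those of its ancestors.

```python
from typing import Dict, Optional, Set

# Hypothetical hierarchy: child -> parent (None marks the organization root).
PARENTS: Dict[str, Optional[str]] = {
    "org": None,
    "folder/devops": "org",
    "project/cicd": "folder/devops",
    "project/observability": "folder/devops",
}

# Hypothetical bindings set directly on each node, written as "member=role".
BINDINGS: Dict[str, Set[str]] = {
    "org": {"group:admins@example.com=roles/resourcemanager.organizationAdmin"},
    "folder/devops": {"group:devops@example.com=roles/logging.viewer"},
    "project/cicd": {"group:devops@example.com=roles/cloudbuild.builds.editor"},
}


def effective_bindings(node: str) -> Set[str]:
    """Return the union of bindings on the node and on every ancestor."""
    result: Set[str] = set()
    current: Optional[str] = node
    while current is not None:
        result |= BINDINGS.get(current, set())
        current = PARENTS[current]
    return result


if __name__ == "__main__":
    for project in ("project/cicd", "project/observability"):
        print(project, sorted(effective_bindings(project)))
```

In this model, a binding placed once on the folder reaches both projects, while the CI/CD-specific binding stays scoped to its own project.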

290

MCQ · medium

A DevOps team is bootstrapping CI/CD pipelines that need access to API keys stored in Secret Manager. The pipelines run on Cloud Build. What is the best practice for granting access to secrets?

A. Use a custom service account with roles/secretmanager.admin and run Cloud Build as that account.

B. Store the API keys as build substitutions.

C. Grant the Cloud Build service account roles/secretmanager.secretAccessor on the project containing secrets.

D. Use Cloud KMS to encrypt secrets and pass them as environment variables.

Answer: C

This provides least-privilege access to secrets.

Why this answer

Option C is correct because granting the Cloud Build service account roles/secretmanager.secretAccessor on the project containing the secrets provides fine-grained, read-only access to secret payloads. Option A is wrong because roles/secretmanager.admin grants excessive permissions. Option B is wrong because storing API keys as build substitutions is insecure and exposes them in logs.

Option D is wrong because using Cloud KMS adds complexity without being a best practice for secret access.
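
As an illustration of what the least-privilege grant looks like, the sketch below adds a single `roles/secretmanager.secretAccessor` binding to an IAM policy represented as a plain dictionary. It makes no API calls. The helper name and project number are hypothetical, and it assumes the legacy default Cloud Build service account address (`PROJECT_NUMBER@cloudbuild.gserviceaccount.com`); a project may be configured to run builds as a different service account.

```python
import copy
from typing import Any, Dict

ROLE = "roles/secretmanager.secretAccessor"


def grant_secret_accessor(policy: Dict[str, Any], project_number: str) -> Dict[str, Any]:
    """Return a copy of the policy with the accessor role bound to the build account."""
    # Assumption: legacy default Cloud Build service account naming.
    member = f"serviceAccount:{project_number}@cloudbuild.gserviceaccount.com"
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == ROLE:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": ROLE, "members": [member]})
    return updated


if __name__ == "__main__":
    # "123456789012" is a placeholder project number.
    print(grant_secret_accessor({"bindings": []}, "123456789012"))
```

The function is idempotent, so applying it twice leaves a single binding with a single member.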

291

MCQ · medium

Refer to the exhibit. A team uses this cloudbuild.yaml to deploy a service to Cloud Run. They notice that the deployment fails intermittently with a 'permission denied' error. Which is the most likely cause?

A. The image tag $SHORT_SHA is invalid because it contains a variable

B. The Cloud Build service account does not have the `roles/run.admin` or `roles/run.developer` role

C. The region in the gcloud run deploy command does not match the region where Cloud Run is enabled

D. The Cloud Build service account does not have permission to push images to Artifact Registry

Answer: B

These roles grant permission to deploy Cloud Run services.

Why this answer

The Cloud Build service account (default or custom) must have the `roles/run.admin` or `roles/run.developer` IAM role to execute `gcloud run deploy`. Without these roles, the deployment fails with a 'permission denied' error because the service account lacks the `run.services.create` and `run.services.update` permissions required to deploy or update a Cloud Run service. The intermittent nature suggests the service account may have been granted the role after some failures, or the error only surfaces when the service account's cached credentials expire.

Exam trap

Google Cloud often tests the distinction between permissions needed for different stages of a CI/CD pipeline; the trap here is that candidates assume the error is about image pushing (Artifact Registry) rather than the deployment step (Cloud Run), because both involve 'permission denied' but at different phases.

How to eliminate wrong answers

Option A is wrong because `$SHORT_SHA` is a valid Cloud Build substitution variable that resolves to the short commit SHA; it does not cause a 'permission denied' error. Option C is wrong because if the region in the `gcloud run deploy` command does not match where Cloud Run is enabled, the error would be a region mismatch or 'not found', not a 'permission denied' error. Option D is wrong because the Cloud Build service account typically has the `roles/artifactregistry.writer` role by default in many setups, and even if it lacked push permission, the error would occur during the `docker push` step, not during the `gcloud run deploy` step.

292

MCQ · hard

Refer to the exhibit. Your team deployed a new revision to Cloud Run. After deployment, error rates increased. You want to roll back to the previous revision, which is still serving. Which command should you use?

A. gcloud run services update-traffic my-service --to-revisions=my-service-00001-caz=100

B. gcloud run services rollback my-service

C. gcloud run revisions delete my-service-00002-caw

D. gcloud run deploy my-service --image gcr.io/my-project/my-image:v1

Answer: A

This command sends 100% traffic to the previous revision.

Why this answer

Option A is correct because `gcloud run services update-traffic` allows you to precisely control traffic splitting between revisions. By setting `--to-revisions=my-service-00001-caz=100`, you direct 100% of incoming requests to the previous revision, effectively rolling back without deleting the current revision. This command is the standard method for traffic-based rollbacks in Cloud Run.

Exam trap

Google Cloud often tests the misconception that a 'rollback' command exists for Cloud Run, but the correct approach is to use traffic management commands like `update-traffic` to shift traffic away from the problematic revision.

How to eliminate wrong answers

Option B is wrong because `gcloud run services rollback` is not a valid command in the gcloud CLI; Cloud Run does not have a built-in rollback subcommand, so this would result in an error. Option C is wrong because deleting the current revision (`my-service-00002-caw`) does not automatically route traffic to the previous revision; it would cause a service outage until traffic is explicitly redirected, and Cloud Run requires at least one revision serving traffic. Option D is wrong because `gcloud run deploy` with a previous image creates a new revision (e.g., `my-service-00003`) rather than reverting to the existing previous revision, which may introduce additional changes and does not leverage the already-serving revision.
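
For teams that script rollbacks, the traffic shift in option A can be assembled programmatically. The sketch below only builds the argument list and does not invoke `gcloud`. The helper name `rollback_command` is hypothetical, and the service and revision names are the ones used in the question.

```python
import shlex
from typing import List


def rollback_command(service: str, revision: str, percent: int = 100) -> List[str]:
    """Build the gcloud argument list that routes traffic to an existing revision."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return [
        "gcloud", "run", "services", "update-traffic", service,
        f"--to-revisions={revision}={percent}",
    ]


cmd = rollback_command("my-service", "my-service-00001-caz")

if __name__ == "__main__":
    # Print the command for review; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems if it is later passed to `subprocess.run`.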

293

MCQ · hard

In Google's incident management process, which role is responsible for communication with stakeholders and users during an incident?

A. Incident Commander.

B. Communications Lead.

C. Technical Lead.

D. Operations Lead.

Answer: B

The Communications Lead manages all communication with stakeholders.

Why this answer

In Google's incident management process, the Communications Lead is explicitly responsible for managing all external and internal communications, including updates to stakeholders and users. This role ensures that accurate, timely information is disseminated while the Incident Commander focuses on coordinating the response. The Communications Lead does not engage in technical troubleshooting or operational tasks, which are handled by other roles.

Exam trap

Google Cloud often tests the misconception that the Incident Commander handles all aspects of an incident, including communication, but in Google's model, the Incident Commander delegates communication to a dedicated Communications Lead to maintain focus on coordination.

How to eliminate wrong answers

Option A is wrong because the Incident Commander is responsible for overall coordination and decision-making during the incident, not for direct stakeholder communication; they delegate that to the Communications Lead. Option C is wrong because the Technical Lead focuses on diagnosing and resolving the technical issue, not on communicating with stakeholders or users. Option D is wrong because the Operations Lead handles operational tasks such as resource allocation and infrastructure management, not stakeholder communication.

294

MCQ · medium

A DevOps team uses Cloud Run for a containerized application that processes real-time financial data. The service has a concurrency setting of 80, and instances are scaled based on CPU usage. During market volatility, the service experiences high latency and some requests timeout. Cloud Monitoring shows that the average CPU utilization is 40%, but the instance count spikes to the maximum allowed. What is the most likely cause?

A. The concurrency setting is too low, causing many instances to be created.

B. The max instances limit is set too low, causing requests to queue.

C. The service uses too much memory, causing cold starts.

D. The CPU utilization target for autoscaling is set too high, causing slow scaling.

Answer: A

Low concurrency increases instance count, each handling few requests, causing underutilization.

Why this answer

With a concurrency setting of 80, each instance can handle up to 80 simultaneous requests. When the number of in-flight requests exceeds 80 per instance, Cloud Run starts new instances. During market volatility, the request volume spikes, causing the instance count to hit the maximum even though average CPU utilization is only 40%.

This indicates that the concurrency limit is too low for the burst traffic, forcing excessive instance creation and leading to high latency and timeouts due to instance startup overhead.

Exam trap

Google Cloud often tests the misconception that CPU utilization is the primary driver of Cloud Run scaling, when in fact concurrency settings and request queuing are the dominant factors in burst scenarios.

How to eliminate wrong answers

Option B is wrong because if the max instances limit were set too low, requests would be queued or rejected, but the symptom here is that instance count spikes to the maximum allowed, not that it is capped prematurely. Option C is wrong because memory issues or cold starts would manifest as increased startup latency or out-of-memory errors, not as high instance count with low average CPU utilization. Option D is wrong because a CPU utilization target set too high would cause the autoscaler to be slow to add instances, leading to sustained high CPU and potential queuing, whereas here instances are being added aggressively despite low average CPU.
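
A rough way to see why the instance count can hit its cap while CPU stays low is to estimate instances from in-flight requests alone. The sketch below is a simplification, not Cloud Run's actual autoscaling algorithm, which also weighs CPU and other signals. The function name, the 8,000 in-flight requests, and the cap of 100 instances are all hypothetical.

```python
import math


def instances_needed(in_flight_requests: int, concurrency: int, max_instances: int) -> int:
    """Estimate instance count from concurrent requests only, capped at max_instances."""
    if concurrency < 1 or max_instances < 1:
        raise ValueError("concurrency and max_instances must be at least 1")
    wanted = math.ceil(max(in_flight_requests, 0) / concurrency)
    return min(wanted, max_instances)


if __name__ == "__main__":
    # Hypothetical burst of 8,000 simultaneous requests with a cap of 100 instances.
    print(instances_needed(8000, 80, 100))   # concurrency 80 reaches the cap
    print(instances_needed(8000, 250, 100))  # higher concurrency stays below it
```

Under these assumed numbers, a concurrency of 80 needs all 100 instances, while a concurrency of 250 needs 32, which leaves headroom before the cap.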

295

MCQ · medium

During a Cloud Build execution, the step fails with 'Error: could not find a valid 'Dockerfile' in context '.''. The build configuration file is located in a subdirectory called 'build/' and the Dockerfile is in the root of the repository. How should the team fix this?

A. Create a symbolic link.

B. Move the Cloud Build configuration file to the root.

C. Specify the 'dir' field in the build step to point to the root.

D. Use the 'substitutions' to change context.

Answer: C

Setting `dir: '.'` or `dir: '/'` will make Docker use the root context.

Why this answer

Option C is correct because the Cloud Build step's `dir` field explicitly sets the working directory for the step. By specifying `dir: '.'` (or the repository root), Cloud Build will look for the Dockerfile in the root context, even though the build configuration file (`cloudbuild.yaml`) resides in the `build/` subdirectory. This ensures the Docker build context points to the correct location where the Dockerfile exists.

Exam trap

Google Cloud often tests the misconception that the build configuration file's location dictates the Docker build context, leading candidates to incorrectly choose moving the config file or using substitutions, when the `dir` field is the correct and intended mechanism to control the working directory for a step.

How to eliminate wrong answers

Option A is wrong because creating a symbolic link is an unnecessary workaround that adds complexity and fragility; Cloud Build does not require or recommend symlinks for context resolution. Option B is wrong because moving the Cloud Build configuration file to the root is not required and would break the intended project structure; the `dir` field exists precisely to decouple the config file location from the build context. Option D is wrong because substitutions in Cloud Build are used for variable replacement (e.g., `$_TAG`), not for changing the build context or working directory of a step.
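
Cloud Build accepts its configuration as JSON as well as YAML, so the shape of a step that sets `dir` can be sketched by generating the config from Python. This is an illustration under assumptions: the helper name is hypothetical, the image name is a placeholder, and the step relies on the source being uploaded from the repository root so that `.` resolves to the directory holding the Dockerfile.

```python
import json
from typing import Any, Dict


def docker_build_config(image: str, context_dir: str = ".") -> Dict[str, Any]:
    """Return a Cloud Build config whose docker step runs in context_dir."""
    return {
        "steps": [
            {
                "name": "gcr.io/cloud-builders/docker",
                "args": ["build", "-t", image, "."],
                # Working directory for this step, relative to the uploaded source.
                "dir": context_dir,
            }
        ],
        "images": [image],
    }


# Placeholder image name, used for illustration only.
config = docker_build_config("gcr.io/my-project/my-image:v1")

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

The same fields (`steps`, `name`, `args`, `dir`, `images`) map one-to-one onto a `cloudbuild.yaml` file.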

296

Multi-Select · medium

Which THREE of the following are valid techniques for mitigating a denial-of-service (DoS) attack against a Google Cloud HTTP(S) Load Balancer?

Select 3 answers

A. Increase the number of backend instances to absorb traffic.

B. Enable autoscaling on the backend services to handle increased load.

C. Modify VPC firewall rules to block all traffic from the source IP.

D. Configure rate limiting per client IP using Cloud Armor or the load balancer's settings.

E. Enable Cloud Armor and create a security policy to block suspicious IP addresses.

Answers: B, D, E

Helps absorb legitimate traffic surge.

Why this answer

Option B is correct because enabling autoscaling on backend services allows the load balancer to dynamically add more backend instances in response to increased traffic, helping to absorb a DoS attack by scaling out capacity. This is a valid mitigation technique as it leverages Google Cloud's managed scaling to maintain service availability under load.

Exam trap

Google Cloud often tests the misconception that manually increasing backend instances (Option A) is a valid real-time mitigation technique, but in practice, autoscaling (Option B) is the correct automated approach, and candidates may overlook that firewall rules (Option C) cannot block application-layer attacks on a load balancer.

297

MCQ · medium

A DevOps engineer notices that a Cloud Build trigger is not firing when commits are pushed to a Cloud Source Repositories repository. The trigger is configured with an invert regex for the branch filter. What could be the issue?

A. The repository is in a different region.

B. The branch name matches the exclude pattern; the trigger ignores matching branches.

C. The commit was made by a service account.

D. The trigger's service account lacks read access to the repository.

Answer: B

Invert regex means the trigger skips branches that match the pattern; a push to a matching branch will not start a build.

Why this answer

When a Cloud Build trigger is configured with an invert regex for the branch filter, it means the trigger will fire only for branches that do NOT match the specified regex pattern. If the branch name matches the exclude pattern, the trigger ignores commits on that branch, which is why the trigger is not firing. This is the intended behavior of the invert_regex flag in Cloud Build triggers.

Exam trap

The trap here is that candidates often confuse 'invert regex' with 'regex match' and assume the trigger should fire when the pattern matches, whereas invert_regex causes the trigger to fire only when the pattern does NOT match.

How to eliminate wrong answers

Option A is wrong because Cloud Source Repositories and Cloud Build triggers are global resources; region does not affect trigger invocation. Option C is wrong because commits made by a service account still trigger Cloud Build triggers normally, as the trigger watches repository events regardless of the committer identity. Option D is wrong because the trigger's service account requires permissions to start the build, not to read the repository; the trigger itself uses the repository's IAM permissions to detect the push event.
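
The inverted branch filter can be modeled in a few lines. This sketch uses Python's `re` module as a stand-in for the regex engine Cloud Build uses, and it assumes an unanchored search, so the example pattern is anchored explicitly. The function name and branch names are hypothetical.

```python
import re


def trigger_fires(branch: str, pattern: str, invert_regex: bool = False) -> bool:
    """Return True if a push to `branch` would start a build under this filter."""
    matched = re.search(pattern, branch) is not None
    # With invert_regex set, only branches that do NOT match start a build.
    return not matched if invert_regex else matched


if __name__ == "__main__":
    for branch in ("main", "feature/login"):
        print(branch, trigger_fires(branch, r"^main$", invert_regex=True))
```

With the pattern `^main$` and the inversion enabled, a push to `main` is ignored while a push to `feature/login` starts a build, which matches the symptom in the question.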

298

MCQ · easy

A company is bootstrapping a Google Cloud organization for the first time. They want to set up Cloud Identity to manage users and groups. What is the correct order of steps?

A. Add users and groups directly in Google Cloud without Cloud Identity.

B. Sign up for Cloud Identity, create the Google Cloud organization node, add users and groups, then enable Google Cloud services and set up billing.

C. Create the organization node first, then sign up for Cloud Identity, then add users.

D. Create the organization node, set up billing, then add Cloud Identity.

Answer: B

Cloud Identity provides the user directory needed for the organization.

Why this answer

Option B is correct because Cloud Identity is the foundation for managing users and groups in a Google Cloud organization. You must first sign up for Cloud Identity to create the identity realm, then create the organization node (which requires a Cloud Identity account), add users and groups, and finally enable services and set up billing. This order ensures that the organization node is linked to the correct Cloud Identity tenant and that users exist before they are granted access to resources.

Exam trap

Google Cloud often tests the misconception that the organization node can be created independently of Cloud Identity, leading candidates to choose option C or D, but in reality, Cloud Identity must be provisioned first as the identity backbone for the entire organization.

How to eliminate wrong answers

Option A is wrong because Cloud Identity is required to manage users and groups at the organization level; adding users directly in Google Cloud without Cloud Identity is not possible for organization-level identity management. Option C is wrong because the organization node cannot be created without first having a Cloud Identity account; Cloud Identity must be set up before the organization node is created. Option D is wrong because Cloud Identity must be established before the organization node is created, and billing setup typically occurs after the organization node exists and users are added.

299

Multi-Select · medium

A team uses Google Kubernetes Engine (GKE) with cluster telemetry enabled. During an incident, they notice that a deployment's pods are repeatedly crashing with Exit Code 137. The team wants to investigate the root cause. Which two Google Cloud services should they use together to correlate resource usage and logs?

Select 2 answers

A. Cloud Monitoring and Cloud Logging

B. Security Command Center and Cloud Logging

C. Cloud Trace and Cloud Monitoring

D. Cloud Error Reporting and Cloud Logging

Answers: A, C

Monitoring shows resource usage; Logging shows container logs and OOM events.

Why this answer

Exit Code 137 indicates that a container was killed by SIGKILL (signal 9), typically due to an out-of-memory (OOM) condition. Cloud Monitoring provides metrics such as memory usage and OOM kill counts, while Cloud Logging captures the container's termination logs and system events. By correlating these two services, the team can identify when memory usage spiked and confirm that the pod was OOM-killed, enabling root cause analysis.
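
The link between exit code 137 and SIGKILL follows from the common convention that a process terminated by signal N reports exit code 128 + N. A minimal sketch of that arithmetic, assuming a POSIX system and using a hypothetical helper name:

```python
import signal
from typing import Optional


def signal_from_exit_code(exit_code: int) -> Optional[str]:
    """Map a 128 + N exit code to the name of signal N, or None if it does not apply."""
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # The remainder is not a signal number known on this platform.
        return None


if __name__ == "__main__":
    for code in (137, 143, 1):
        print(code, signal_from_exit_code(code))
```

Here 137 - 128 = 9, which is SIGKILL. The exit code alone does not prove an OOM kill, so the memory metrics and termination logs are still needed to confirm the cause.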

Exam trap

Google Cloud often tests the distinction between services that handle metrics (Cloud Monitoring) versus logs (Cloud Logging) versus errors (Cloud Error Reporting), and the trap here is that candidates may confuse Cloud Error Reporting with Cloud Logging, not realizing that Error Reporting only surfaces application-level exceptions, not system-level OOM kills or resource metrics.

300

MCQ · easy

Refer to the exhibit. A team runs a batch processing job on these instances. The job is CPU-bound and can tolerate interruptions. Which instance is the most cost-effective for this workload?

A. instance-3

B. None, they should use a different machine type

C. instance-1

D. instance-2

Answer: C

Correct. Preemptible instance with sufficient CPU at low cost.

Why this answer

Instance-1 is the most cost-effective because it is a preemptible (or spot) VM, which is significantly cheaper than standard on-demand instances. Since the batch processing job is CPU-bound and can tolerate interruptions, preemptible instances are ideal for this workload, offering 60-91% cost savings while still providing the necessary compute capacity.

Exam trap

Google Cloud often tests the misconception that any preemptible instance is automatically the best choice, but the trap here is that candidates might overlook whether the workload can actually tolerate interruptions or whether the specific instance type (e.g., with GPUs or high memory) is over-provisioned for a CPU-bound job.

How to eliminate wrong answers

Option A is wrong because instance-3 is likely a standard on-demand or reserved instance, which costs more than preemptible options and is not the most cost-effective for an interruption-tolerant, CPU-bound batch job. Option B is wrong because preemptible instances (like instance-1) are specifically designed for fault-tolerant, batch workloads, so a different machine type is unnecessary. Option D is wrong because instance-2 might be a preemptible instance with a higher machine type or additional resources (e.g., GPUs or more vCPUs) that are not needed for a CPU-bound job, leading to unnecessary cost.
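
To put the quoted 60-91% savings range in concrete terms, the sketch below computes the cost band for a job at a given on-demand rate. The hourly price and job length are hypothetical and not taken from any price list, and the function name is illustrative.

```python
from typing import Tuple


def spot_cost_range(
    on_demand_hourly: float,
    hours: float,
    min_discount: float = 0.60,
    max_discount: float = 0.91,
) -> Tuple[float, float]:
    """Return (lowest, highest) job cost after applying the discount range."""
    full_price = on_demand_hourly * hours
    lowest = round(full_price * (1 - max_discount), 2)
    highest = round(full_price * (1 - min_discount), 2)
    return lowest, highest


if __name__ == "__main__":
    # Hypothetical job: 100 hours at an assumed on-demand rate of 1.00 per hour.
    print(spot_cost_range(1.00, 100))
```

Under these assumed numbers, a job that would cost 100 at on-demand rates lands between 9 and 40 on a preemptible or spot instance.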
