Google Professional Cloud DevOps Engineer (PCDOE): Questions 301-375

500 questions total · 7 pages · All types, answers revealed

301
MCQ · medium

A team uses Cloud Load Balancing with backend NEGs. Users report intermittent high latency. How should they diagnose the root cause effectively?

A. Increase the number of backend instances immediately
B. Enable Cloud Monitoring latency histogram for the load balancer
C. Check Cloud CDN cache hit ratio
D. Use Cloud Trace to analyze per-request latency spans
Answer: D

Cloud Trace captures latency for each request across distributed services, enabling identification of slow components.

Why this answer

Cloud Trace provides end-to-end latency analysis by capturing per-request spans as each request passes through the load balancer, backend NEGs, and other services. This lets you pinpoint which hop (e.g., load balancer processing, backend queuing, or application code) is causing the intermittent high latency, rather than relying on aggregate metrics or caching assumptions. Spans from your own backends appear only if those services are instrumented, for example with OpenTelemetry.
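To make the per-request idea concrete, here is a small Python sketch that works on trace data after it has been exported. The data layout, the span names (`lb`, `backend-queue`, `app`), and the threshold are hypothetical and are not the Cloud Trace API; the point is that inspecting each slow request's spans identifies the dominant hop, which an aggregate histogram cannot do.

```python
from collections import Counter

def slowest_span(spans):
    """Return the name of the longest span in one trace ({name: duration_ms})."""
    return max(spans, key=spans.get)

def culprits(traces, threshold_ms):
    """Count which span dominated each request slower than threshold_ms.

    traces maps a trace ID to {"total_ms": float, "spans": {name: duration_ms}}.
    """
    counts = Counter()
    for trace in traces.values():
        if trace["total_ms"] > threshold_ms:
            counts[slowest_span(trace["spans"])] += 1
    return dict(counts)

# Hypothetical exported traces: one fast request and two slow ones.
traces = {
    "t1": {"total_ms": 120, "spans": {"lb": 5, "backend-queue": 10, "app": 100}},
    "t2": {"total_ms": 2300, "spans": {"lb": 6, "backend-queue": 2100, "app": 150}},
    "t3": {"total_ms": 1900, "spans": {"lb": 4, "backend-queue": 1700, "app": 160}},
}
print(culprits(traces, threshold_ms=1000))  # {'backend-queue': 2}
```

An average over all three requests would blur the queuing delay; the per-trace view points straight at it.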

Exam trap

Google Cloud often tests the distinction between aggregate monitoring (like histograms or cache ratios) and distributed tracing for diagnosing intermittent, per-request performance issues, leading candidates to choose a simpler metric-based option instead of the more precise tracing tool.

How to eliminate wrong answers

Option A is wrong because blindly increasing backend instances treats a symptom (high latency) without diagnosing its cause; it may waste resources if the latency is due to network congestion, misconfigured timeouts, or a specific backend bottleneck. Option B is wrong because Cloud Monitoring latency histograms show aggregate latency distributions but cannot isolate which specific request or component is responsible for intermittent spikes; they lack per-request span-level granularity. Option C is wrong because Cloud CDN cache hit ratio only affects cacheable content; intermittent high latency for dynamic or uncacheable requests would not be explained by cache misses, and CDN metrics do not reveal backend processing delays.

302
MCQ · hard

Refer to the exhibit. The team wants to reduce the service's p50 latency from 2 seconds to under 500ms. Which optimization would have the most impact?

A. Increase the number of service instances
B. Optimize processOrder() by reducing logging
C. Optimize getCustomerData() by caching customer data
D. Optimize saveToDatabase() by using batch writes
Answer: C

Caching removes most of the 1200 ms getCustomerData() call, cutting total time by up to 60%.

Why this answer

The exhibit shows that getCustomerData() is the most time-consuming operation, taking 1.2 seconds of the 2-second p50 latency. Caching customer data avoids repeated expensive lookups (e.g., database queries or external API calls) and shortens the critical path directly. Even a perfect cache leaves about 800 ms, so reaching a sub-500 ms p50 will take further work, but this targets the largest bottleneck and is the single most impactful optimization among the options.
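The arithmetic behind that conclusion fits in a few lines of Python. This sketch assumes the calls run sequentially on the critical path and that a cache hit costs effectively nothing; the optional `cache_hit_ms` parameter is a hypothetical knob for a non-zero hit cost. Only the 2000 ms total, the 1200 ms figure, and the 500 ms target come from the question.

```python
def remaining_p50_ms(total_ms, removed_ms, cache_hit_ms=0):
    """p50 after replacing a sequential call of removed_ms with a cache hit.

    Assumes calls run one after another, so time removed from one call is
    removed from the total.
    """
    return total_ms - removed_ms + cache_hit_ms

TOTAL_MS = 2000          # current p50 from the question
CUSTOMER_DATA_MS = 1200  # getCustomerData() share, per the explanation
TARGET_MS = 500

best_case = remaining_p50_ms(TOTAL_MS, CUSTOMER_DATA_MS)
print(best_case)                          # 800
print(100 * CUSTOMER_DATA_MS / TOTAL_MS)  # 60.0
print(best_case - TARGET_MS)              # 300 ms still to find elsewhere
```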

Exam trap

Google Cloud often tests the misconception that horizontal scaling or optimizing non-critical paths (like logging) can significantly reduce p50 latency, when in fact the largest single bottleneck must be addressed first.

How to eliminate wrong answers

Option A is wrong because increasing service instances (horizontal scaling) relieves throughput bottlenecks but does not reduce per-request latency; it may even add network overhead. Option B is wrong because reducing logging in processOrder() is likely to save only a few milliseconds, far short of the roughly 1.5 seconds the target requires. Option D is wrong because batch writes in saveToDatabase() improve throughput for bulk operations but do not reduce the latency of a single request's synchronous write path.

303
MCQ · easy

A company notices increased latency for their web application running on Compute Engine. They suspect a database bottleneck. Which Google Cloud service should they use to identify slow queries?

A. Cloud Logging
B. Cloud Debugger
C. Cloud Trace
D. Cloud SQL Query Insights
Answer: D

Cloud SQL Query Insights provides self-service, intelligent query diagnostics.

Why this answer

Cloud SQL Query Insights is the correct choice, assuming the database runs on Cloud SQL. It is built into Cloud SQL specifically to detect and diagnose query performance problems: it shows per-query load and latency along with sampled query plans, so the team can find the slow queries behind the web application's latency.

Exam trap

The trap here is that candidates often confuse Cloud Trace (which traces request-level latency across services) with database-specific query analysis, but Cloud Trace does not provide the granular SQL-level insights needed to identify slow queries in a database.

How to eliminate wrong answers

Option A is wrong because Cloud Logging aggregates and stores log data from various sources, but it does not provide built-in query analysis or performance insights for databases; it would require manual log parsing to identify slow queries. Option B is wrong because Cloud Debugger (since deprecated and shut down) was used to inspect application code state at runtime, not to analyze database query performance or identify slow queries. Option C is wrong because Cloud Trace is a distributed tracing service that captures latency data across microservices and HTTP requests, but it does not offer database-specific query analysis or insights into slow SQL queries.

304
Multi-Select · medium

A DevOps team wants to implement a CI/CD pipeline for a microservices application deployed on Google Kubernetes Engine (GKE). They need to ensure that each service is built, tested, and deployed independently with minimal manual intervention. Which TWO practices should they implement?

Select 2 answers
A. Use Cloud Deploy to manage progressive delivery (e.g., canary, blue/green) to GKE clusters.
B. Use Cloud Source Repositories integrated with Cloud Build for version control and triggering builds.
C. Use a monolithic repository and deploy all services simultaneously to ensure consistency.
D. Use Cloud Build triggers to build and test each service independently on pull request.
E. Use a single Cloud Build configuration file for all services with conditional steps to handle different services.
Answers: A, D

Cloud Deploy provides deployment strategies that reduce risk and allow independent releases.

Why this answer

Option D uses Cloud Build triggers to build and test each service automatically on pull request, enabling independent CI. Option A uses Cloud Deploy for progressive delivery, making each service's releases safer. Option E is not best practice because a single configuration with conditional steps for every service becomes complex and couples the pipelines.

Option C works against microservices independence. Option B concerns source control and build triggering, not the independent build, test, and deploy practices the question asks about.
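When several services share one repository, one common way to keep their pipelines independent is a Cloud Build trigger per service, each limited to that service's directory with the trigger's included-files filter. The Python sketch below only emulates that selection logic; the trigger names and directory layout are hypothetical, and `fnmatchcase` semantics differ from Cloud Build's glob syntax (here `*` also matches `/`).

```python
from fnmatch import fnmatchcase

# Hypothetical per-service triggers: trigger name -> included-files pattern.
TRIGGERS = {
    "orders-ci": "services/orders/*",
    "payments-ci": "services/payments/*",
}

def triggers_to_fire(changed_files, triggers=TRIGGERS):
    """Return the (sorted) triggers whose pattern matches at least one changed file."""
    return sorted(
        name
        for name, pattern in triggers.items()
        if any(fnmatchcase(path, pattern) for path in changed_files)
    )

print(triggers_to_fire(["services/orders/src/main.go"]))  # ['orders-ci']
print(triggers_to_fire(["README.md"]))                     # []
```

A change to one service builds and tests only that service, which is the independence the question is after.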

305
MCQ · easy

A team is monitoring a production service on Google Kubernetes Engine (GKE) and notices that a deployment is occasionally returning HTTP 503 errors. The team has set up a ServiceMonitor in Prometheus to scrape metrics from the pods. What is the most likely cause of the intermittent 503 errors?

A. The pods are crashing and restarting frequently.
B. The Prometheus scrape interval is too long, causing missed metrics.
C. The readiness probes are failing, causing the pods to be removed from the service endpoints.
D. The container resource limits are set too low, causing out-of-memory errors.
Answer: C

Readiness probe failures remove pods from service endpoints, causing 503s if all replicas fail.

Why this answer

Intermittent HTTP 503 errors in a GKE deployment typically indicate that the service's endpoints are temporarily unavailable. When a readiness probe fails, Kubernetes removes the pod from the Service's endpoints, causing traffic to be routed to remaining healthy pods. If multiple pods fail their readiness probes simultaneously or in quick succession, the Service may have no available endpoints, resulting in 503 errors for incoming requests.
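The mechanism can be shown with a toy model in Python: only pods whose latest readiness probe passed count as endpoints, and when none remain there is nothing to route to. Returning 503 in that case mirrors the scenario in the question; the exact status an Ingress or load balancer returns depends on the implementation. The pod names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pod:
    name: str
    ready: bool  # result of the pod's most recent readiness probe

def endpoints(pods):
    """Only pods that pass their readiness probe are listed as Service endpoints."""
    return [p.name for p in pods if p.ready]

def route(pods, request_id):
    """Round-robin a request over ready endpoints; (503, None) if there are none."""
    ready = endpoints(pods)
    if not ready:
        return 503, None
    return 200, ready[request_id % len(ready)]

pods = [Pod("web-a", ready=True), Pod("web-b", ready=False)]
print(route(pods, 1))  # (200, 'web-a'): web-b is out of rotation
pods[0].ready = False  # now every readiness probe is failing
print(route(pods, 2))  # (503, None)
```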

Exam trap

Google Cloud often tests the distinction between liveness probes (which restart pods) and readiness probes (which control traffic routing), and candidates mistakenly attribute 503 errors to pod crashes or resource limits rather than the readiness probe's role in endpoint management.

How to eliminate wrong answers

Option A is wrong because pods crashing and restarting frequently would cause more persistent errors or connection resets, not intermittent 503 errors, and the ServiceMonitor would still scrape metrics from the restarted pods. Option B is wrong because the Prometheus scrape interval affects metric collection, not the availability of the service endpoints; a long scrape interval may cause gaps in monitoring data but does not directly cause HTTP 503 errors. Option D is wrong because out-of-memory errors typically cause pod crashes (OOMKilled) and restarts, which would manifest as connection timeouts or 502 errors rather than intermittent 503 errors from the service endpoint perspective.

306
MCQ · medium

A company has a stateful application deployed on a GKE cluster using StatefulSets with persistent volumes. The application is experiencing higher than expected latency for write operations. The team uses SSD persistent disks. Cloud Monitoring shows high disk queue depth on the nodes where the stateful pods are scheduled. Which of the following is the most effective optimization?

A. Configure a separate node pool with local SSDs for the stateful workloads.
B. Increase the number of replicas of the stateful set.
C. Enable disk caching on the persistent disks.
D. Use regional persistent disks for higher throughput.
Answer: C

Disk caching can significantly reduce I/O latency if supported by the workload.

Why this answer

Option C is correct because a cache in front of the persistent disks absorbs bursty I/O, so fewer requests wait in the disk queue. That directly addresses the high queue depth observed in Cloud Monitoring and lowers the write latency the application sees, while the StatefulSet keeps its durable persistent volumes and the underlying disk type stays the same. Note that any cache that acknowledges writes before they reach the persistent disk trades some durability for latency, so check how the chosen caching mode handles writes.
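Why high queue depth means high latency follows from Little's law: the average number of outstanding I/Os equals throughput (IOPS) times average latency. Once a disk is at its IOPS limit, every extra queued request adds waiting time. A minimal Python check, using hypothetical figures rather than measured ones:

```python
def avg_io_latency_ms(queue_depth, iops):
    """Little's law rearranged: latency = outstanding I/Os / throughput."""
    return 1000.0 * queue_depth / iops

# Hypothetical disk saturated at 16,000 IOPS: doubling the queue doubles latency.
print(avg_io_latency_ms(32, 16_000))  # 2.0
print(avg_io_latency_ms(64, 16_000))  # 4.0
```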

Exam trap

Google Cloud often tests the misconception that local SSDs are always better for performance, but the trap here is that local SSDs lack data persistence, making them inappropriate for stateful sets that require durable storage across pod lifecycle events.

How to eliminate wrong answers

Option A is wrong because local SSDs are ephemeral and do not persist data across pod rescheduling or node failures, making them unsuitable for StatefulSets that require durable persistent volumes. Option B is wrong because adding replicas does not reduce write latency for an existing stateful pod; each new replica gets its own volume, while the existing pods' disks keep the same queue depth. Option D is wrong because regional persistent disks provide higher availability through synchronous replication across zones, but they do not inherently improve throughput or reduce write latency compared to zonal persistent disks; in fact, replication adds write latency overhead.

307
Multi-Select · medium

A team is optimizing the performance of their application running on Cloud Run. They want to reduce cold starts. Which two actions would help? (Select TWO)

Select 2 answers
A. Enable min instances
B. Increase the maximum number of container instances
C. Increase the CPU limit
D. Use a custom container base image with reduced size
E. Enable HTTP/2
Answers: A, D

Keeps a baseline of warm instances, avoiding cold starts.

Why this answer

Enabling min instances (option A) keeps a baseline number of container instances warm and ready, so requests served by that baseline never wait for a new instance to start. Using a smaller custom base image (option D) can shorten the cold starts that still happen when traffic grows beyond the baseline, because there is less to load and initialize when a new instance starts.
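A toy model in Python shows why min instances help: instances kept warm by the setting absorb demand that would otherwise trigger new starts. The model is a deliberate simplification (instant scale-in, one instance per unit of demand, initial warm pool not counted as cold starts) and is not how Cloud Run's autoscaler actually decides.

```python
def cold_starts(demand, min_instances):
    """Count new instance starts in a toy scaling model.

    demand[t] is the number of instances needed in interval t. The model assumes
    instant scale-in to max(min_instances, demand) and counts every instance
    added beyond those already running as a cold start. The initial warm pool
    is not counted.
    """
    running = min_instances
    starts = 0
    for needed in demand:
        if needed > running:
            starts += needed - running
        running = max(min_instances, needed)
    return starts

bursty = [0, 3, 0, 3]  # hypothetical traffic pattern
print(cold_starts(bursty, min_instances=0))  # 6
print(cold_starts(bursty, min_instances=3))  # 0
```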

Exam trap

Google Cloud often tests the misconception that raising scaling limits (like max instances) or enabling protocol features (like HTTP/2) reduces cold starts; among the options here, keeping instances pre-warmed (min instances) and shrinking the container image (option D) are the ones that target startup latency.

308
MCQ · medium

A company uses 1-year committed use discounts for Compute Engine. They need to reduce costs further. What should they do?

A. Use sustained use discounts instead.
B. Use preemptible VMs for all workloads.
C. Rightsize their VMs based on recommender recommendations.
D. Increase committed use discount term to 3 years.
Answer: C

Rightsizing reduces resource usage, directly lowering costs.

Why this answer

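Rightsizing VMs based on Recommender's machine-type recommendations reduces resource usage without changing the pricing model, which lowers costs on any usage not already covered by the commitment. Keep in mind that a commitment is billed whether or not it is used, so shrinking below the committed amount does not reduce the commitment charge. Switching to sustained use discounts would trade a deeper discount for a shallower one (at most 30%), preemptible VMs are unsuitable for workloads that cannot tolerate interruption, and a 3-year term only deepens the commitment.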
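The interaction between a commitment and rightsizing is easy to see in a small Python cost model. It assumes a resource-based vCPU commitment billed in full every month plus on-demand billing for anything above it; the rates and usage figures are hypothetical, and sustained use discounts on the on-demand portion are ignored.

```python
def monthly_vcpu_cost(usage_vcpus, committed_vcpus, cud_rate, on_demand_rate, hours=730):
    """Commitment is billed in full; only usage above it is billed on demand.

    Rates are hypothetical per-vCPU-hour prices; SUDs on the overflow are ignored.
    """
    committed = committed_vcpus * cud_rate * hours
    overflow = max(0, usage_vcpus - committed_vcpus) * on_demand_rate * hours
    return committed + overflow

# Toy rates (1.0 committed, 2.0 on demand) and hours=1 keep the numbers readable.
print(monthly_vcpu_cost(150, 100, 1.0, 2.0, hours=1))  # 200.0
print(monthly_vcpu_cost(120, 100, 1.0, 2.0, hours=1))  # 140.0: rightsizing saves on overflow
print(monthly_vcpu_cost(80, 100, 1.0, 2.0, hours=1))   # 100.0: below the commitment, no further saving
```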

309
MCQ · medium

A company has a steady-state workload of 100 vCPUs running 24/7. They want to get the maximum discount possible without long-term commitment. What discount should they expect?

A. Committed use discount of up to 57%
B. Sustained use discount of up to 20%
C. Sustained use discount of up to 30%
D. No discount available
Answer: C

Sustained use discounts automatically provide up to 30% for running instances the entire month.

Why this answer

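Option C is correct because sustained use discounts apply automatically to eligible VMs that run for more than 25% of a month, reaching up to 30% for a full month on N1 machine types (N2 and N2D top out at 20%, and E2 VMs are not eligible). Option B understates the N1 maximum. Option A is incorrect because committed use discounts require a 1- or 3-year commitment, and option D is incorrect because sustained use discounts require no commitment.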
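The 30% figure comes from incremental pricing tiers: for N1, each successive quarter of the month is billed at 100%, 80%, 60%, and 40% of the base rate. A minimal Python sketch of that calculation, covering the N1 tiers only (other machine families use different tiers or are not eligible):

```python
# N1 incremental sustained use tiers: (share of the month, price multiplier).
N1_TIERS = [(0.25, 1.0), (0.25, 0.8), (0.25, 0.6), (0.25, 0.4)]

def sud_discount(fraction_of_month, tiers=N1_TIERS):
    """Effective discount for a resource used for fraction_of_month (0 to 1)."""
    if fraction_of_month <= 0:
        return 0.0
    remaining = fraction_of_month
    billed = 0.0
    for share, multiplier in tiers:
        used = min(share, remaining)
        billed += used * multiplier
        remaining -= used
        if remaining <= 0:
            break
    return 1.0 - billed / fraction_of_month

print(round(sud_discount(1.0), 4))   # 0.3: the "up to 30%" full-month figure
print(round(sud_discount(0.5), 4))   # 0.1
print(round(sud_discount(0.25), 4))  # 0.0: no discount at 25% usage or below
```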

310
Multi-Select · easy

Which TWO are best practices for bootstrapping a Google Cloud organization for DevOps?

Select 2 answers
A. Share a single service account key across multiple projects for simplicity.
B. Disable Organization Policies to allow maximum flexibility for DevOps teams.
C. Use a separate project to host shared CI/CD tools and artifacts.
D. Set up Organization Policies to enforce compliance requirements across projects.
E. Create a single service account with broad permissions to be used by all projects.
Answers: C, D

Isolating CI/CD tools in a dedicated project improves security and manageability.

Why this answer

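Option C is correct because hosting shared CI/CD tools and artifacts in a dedicated project follows the principle of resource isolation and centralized management. This approach simplifies access control, cost tracking, and lifecycle management for DevOps pipelines, as the project acts as a single source of truth for build outputs and deployment tools. Option D is correct because Organization Policies set constraints at the organization or folder level that the projects beneath inherit, so compliance requirements are enforced consistently rather than project by project.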

Exam trap

Google Cloud often tests the misconception that simplifying management by sharing credentials or disabling policies is a best practice, when in reality it undermines security and compliance in a multi-project organization.

311
Multi-Select · easy

Which THREE of the following are best practices for securing a CI/CD pipeline using Cloud Build? (Choose 3.)

Select 3 answers
A. Configure Cloud Build triggers to run only from protected branches (e.g., main, release).
B. Store secrets and credentials in Secret Manager and access them via the 'availableSecrets' field.
C. Grant the Cloud Build service account the Storage Admin role for the project to allow pushing images.
D. Enable Container Analysis on the Artifact Registry repository to automatically scan images for vulnerabilities after build.
E. Disable build cache to ensure fresh builds and avoid using potentially compromised cached layers.
Answers: A, B, D

This prevents injection of malicious code from feature branches.

Why this answer

Option A is correct because restricting Cloud Build triggers to protected branches (e.g., main, release) prevents unauthorized or untested code changes from initiating builds, which is a fundamental security control for CI/CD pipelines. This ensures that only code that has passed review and is merged into stable branches can trigger automated builds, reducing the risk of malicious or erroneous code being deployed.

Exam trap

Google Cloud often tests the principle of least privilege by including overly broad IAM roles (like Storage Admin) as distractors, and candidates may mistakenly think granting full access is acceptable for simplicity, when in fact specific roles like Artifact Registry Writer or Cloud Build Service Account should be used.

312
MCQ · easy

You are a DevOps engineer at a media streaming company. Your application runs on Google Kubernetes Engine (GKE) and serves video content to users worldwide. The application uses a microservices architecture with a frontend service that handles user requests and a backend transcoding service that converts video files. Recently, you noticed that the transcoding service is causing performance bottlenecks during peak hours, leading to increased latency for users. You have enabled Cloud Monitoring and Cloud Trace and observed that the transcoding service's CPU utilization is consistently above 90% during peak times, and the queue of video transcoding tasks is growing. The current deployment has 5 replicas of the transcoding service with no autoscaling. You need to optimize the performance of the transcoding service to reduce latency. Your company has a limited budget and wants to minimize costs. What should you do?

A. Enable Horizontal Pod Autoscaling (HPA) on the transcoding service based on CPU utilization, targeting 70% utilization.
B. Upgrade the transcoding service to a larger machine type with more CPU and memory.
C. Increase the number of replicas of the transcoding service to 10 and keep it static.
D. Refactor the frontend to push transcoding tasks to a Cloud Pub/Sub topic, and create a separate deployment of workers that subscribe to the topic and perform transcoding. Configure HPA on the worker deployment based on the Pub/Sub subscription backlog.
Answer: D

This decouples the frontend from the transcoding, preventing blocking. Workers can scale based on queue depth, optimizing cost and performance.

Why this answer

Option D is correct because it decouples the transcoding workload from user-facing requests using Cloud Pub/Sub, allowing the worker deployment to scale independently based on the backlog of tasks. This pattern reduces latency by preventing the frontend from being blocked by transcoding, and HPA on Pub/Sub backlog ensures cost-efficient scaling only when demand increases, aligning with the limited budget.
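Backlog-based scaling follows the Kubernetes HPA formula `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. With an external metric such as the subscription's undelivered-message count (commonly exposed to GKE as `pubsub.googleapis.com|subscription|num_undelivered_messages` through the Custom Metrics Stackdriver Adapter) and an `AverageValue` target, that reduces to the backlog divided by the per-pod target. A Python sketch of just that arithmetic, ignoring the HPA's tolerance band and stabilization windows; the numbers are hypothetical:

```python
import math

def desired_replicas(backlog, target_per_replica, min_replicas, max_replicas):
    """HPA-style replica count for an external metric with an AverageValue target."""
    desired = math.ceil(backlog / target_per_replica)
    return max(min_replicas, min(max_replicas, desired))

# Hypothetical: 500 undelivered messages, target of 20 per worker, 1-50 workers.
print(desired_replicas(500, 20, 1, 50))   # 25
print(desired_replicas(5000, 20, 1, 50))  # 50 (capped by max_replicas)
print(desired_replicas(0, 20, 1, 50))     # 1 (floor of min_replicas)
```

Because the worker count tracks queued work rather than CPU, the deployment shrinks back to the minimum off-peak, which is where the cost saving comes from.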

Exam trap

Google Cloud often tests the misconception that CPU-based HPA is sufficient for all performance bottlenecks, but the trap here is that CPU-bound services with growing queues require decoupling and backlog-based scaling, not just more replicas or larger machines.

How to eliminate wrong answers

Option A is wrong because CPU-based HPA reacts to CPU utilization rather than to the number of transcoding tasks waiting, so the replica count does not track the growing backlog, and the frontend stays coupled to the transcoding work. Option B is wrong because upgrading to a larger machine type increases cost significantly without improving scalability or handling the queue backlog, and it does not address the architectural coupling between frontend and transcoding. Option C is wrong because increasing replicas to 10 statically raises costs and does not adapt to variable demand, leading to over-provisioning during off-peak hours and still failing to handle peak loads efficiently.

313
MCQ · medium

An organization uses Cloud Armor to protect their web application. After enabling the service, they notice increased latency on some requests. Which Cloud Armor feature is most likely causing this?

A. Rate limiting
B. IP blacklist/whitelist
C. Pre-configured WAF rules
D. Geo-based access control
Answer: D

Checking geographic location involves IP database lookup, which can increase latency.

Why this answer

Geo-based access control (D) is the most likely cause of increased latency because it requires Cloud Armor to perform a GeoIP lookup on every request to determine the geographic origin. This lookup adds processing overhead, especially if the organization has a large or complex set of geo-based rules, which can introduce measurable delay.

Exam trap

The trap here is that candidates often assume all security features add latency equally, but this question expects you to recognize GeoIP lookups as more expensive than simple IP or rate-limit checks.

How to eliminate wrong answers

Option A is wrong because rate limiting typically reduces latency by dropping or throttling excess requests, not increasing it. Option B is wrong because IP blacklist/whitelist checks are simple, fast lookups in a small list that add negligible latency. Option C is wrong because pre-configured WAF rules (e.g., OWASP Top 10) are evaluated efficiently by Cloud Armor's edge infrastructure and are not a primary source of added latency.

314
MCQ · hard

A company runs a batch processing workload on Compute Engine that runs for 3 hours every night. They want to minimize costs while ensuring the job completes reliably. Which recommendation should they follow?

A. Use sole-tenant nodes to isolate the workload.
B. Use standard (on-demand) VMs and enable sustained use discounts.
C. Use preemptible VMs and design the job to handle interruptions gracefully.
D. Purchase 1-year committed use discounts for the VMs.
Answer: C

Preemptible VMs cost 60-91% less than standard VMs and suit fault-tolerant batch jobs.

Why this answer

Preemptible VMs cost 60-91% less than standard VMs and are ideal for batch workloads that can tolerate interruptions. Since the job runs for only 3 hours nightly, it can be designed to checkpoint progress and restart from the last checkpoint if the VM is preempted, which can happen at any time (preemptible VMs also always stop after 24 hours of runtime). This minimizes cost while ensuring reliability through graceful interruption handling.
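A minimal Python sketch of the checkpoint-and-resume pattern: progress is saved after each item and written atomically with `os.replace`, so a restarted job skips finished work. The checkpoint path, the item list, and the `stop_after` parameter that simulates a preemption are hypothetical; in practice the checkpoint should live on storage that outlives the VM, such as Cloud Storage.

```python
import json
import os
import tempfile

def load_checkpoint(path):
    """Return the index of the next item to process (0 if no checkpoint yet)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]

def save_checkpoint(path, next_index):
    """Write to a temp file, then rename, so an interrupted write never corrupts it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)

def run_job(items, path, process, stop_after=None):
    """Process items from the last checkpoint; stop_after simulates a preemption."""
    start = load_checkpoint(path)
    for i in range(start, len(items)):
        if stop_after is not None and i - start >= stop_after:
            return False  # "preempted": exit before finishing
        process(items[i])
        save_checkpoint(path, i + 1)
    return True

workdir = tempfile.mkdtemp()
ckpt = os.path.join(workdir, "checkpoint.json")  # hypothetical location
done = []
run_job(list(range(10)), ckpt, done.append, stop_after=4)  # stops after 4 items
run_job(list(range(10)), ckpt, done.append)                # resumes at item 4
print(done)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```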

Exam trap

The trap here is that candidates may choose sustained use discounts (Option B) thinking they apply to any usage pattern, but they actually require sustained usage over a month (e.g., 25% of a month) to trigger, which a 3-hour nightly job does not meet.

How to eliminate wrong answers

Option A is wrong because sole-tenant nodes are used for workload isolation and compliance, not cost reduction; they actually increase costs due to dedicated hardware. Option B is wrong because sustained use discounts apply only once a VM runs for more than 25% of a month, and a 3-hour nightly job totals only about 90 hours per month (roughly 12% of a 730-hour month), so no discount applies. Option D is wrong because 1-year committed use discounts require a 1-year commitment and are cost-effective only for workloads running continuously (e.g., 24/7), not for a short 3-hour nightly batch job.

315
MCQ · medium

You are monitoring a microservices application deployed on Google Kubernetes Engine (GKE) that uses Cloud Monitoring for observability. You notice that the error rate for a critical service has increased, but the CPU and memory usage remain normal. The service uses gRPC and logs are structured. Which Cloud Monitoring tool should you use first to diagnose the root cause of the increased error rate?

A. Logs Explorer to filter logs by error status codes
B. Service Monitoring to create a custom dashboard
C. Error Reporting to automatically group error occurrences
D. Metrics Explorer to view error rate and latency charts
Answer: A

Logs Explorer allows you to examine structured logs, including gRPC status codes, to find error patterns.

Why this answer

Option A is correct because Logs Explorer allows you to directly query structured gRPC logs by filtering on error status codes (e.g., gRPC status codes like `UNAVAILABLE`, `INTERNAL`, or `DEADLINE_EXCEEDED`). Since the service uses structured logging, you can quickly isolate the exact error messages and stack traces without needing to pre-configure dashboards or wait for automated grouping. This is the fastest first step to identify the root cause of an increased error rate when CPU and memory are normal, as it points to application-level or dependency issues.

Exam trap

Google Cloud often tests the distinction between monitoring (Metrics Explorer, dashboards) and logging (Logs Explorer) — the trap here is assuming that aggregated metrics or automated error grouping are the fastest path to root cause, when in fact direct log inspection is required to see the specific error details and status codes.

How to eliminate wrong answers

Option B is wrong because creating a custom dashboard with Service Monitoring is a longer-term visualization setup, not a diagnostic tool for immediate root cause analysis; it does not provide the granular log-level filtering needed to inspect individual error occurrences. Option C is wrong because Error Reporting automatically groups error occurrences based on stack traces, but it requires the errors to be sent to Cloud Logging and may take time to aggregate; it is better for ongoing monitoring after the initial diagnosis, not the first tool to use. Option D is wrong because Metrics Explorer shows aggregated error rate and latency charts, which can confirm the problem but cannot drill into individual log entries or specific gRPC status codes to identify the root cause.

316
Multi-Select · easy

Which TWO are benefits of using Cloud Build triggers to implement CI/CD pipelines?

Select 2 answers
A. Start a build automatically when changes are pushed to a repository
B. Deploy to a specific Google Cloud region based on the trigger
C. Support only a single branch per trigger
D. Integrate with Cloud Source Repositories, GitHub, and Bitbucket
E. Automatically provision infrastructure as part of the build
Answers: A, D

Triggers automate builds on source code changes.

Why this answer

Option A is correct because Cloud Build triggers can be configured to automatically start a build in response to events such as a push to a repository branch or the creation of a pull request. This event-driven automation is the foundation of a CI/CD pipeline, eliminating the need for manual build initiation and ensuring that every code change is validated immediately.

Exam trap

Google Cloud often tests the misconception that triggers can directly control deployment regions or infrastructure provisioning, when in fact triggers only respond to events and start builds, with all deployment logic residing in the build configuration file.
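Option C is wrong because a push trigger's branch filter is a regular expression (RE2 syntax), so one trigger can cover many branches. The Python sketch below approximates that matching with `re.search`, assuming an unanchored match, which is why patterns are usually anchored with `^` and `$`; the branch names are hypothetical.

```python
import re

def trigger_fires(branch_regex, pushed_branch):
    """Approximate a push trigger's branch filter (unanchored regex search assumed)."""
    return re.search(branch_regex, pushed_branch) is not None

print(trigger_fires(r"^release-.*$", "release-1.4"))  # True: one trigger, many branches
print(trigger_fires(r"^main$", "main-backup"))         # False: anchored pattern
print(trigger_fires(r"main", "not-main"))              # True: unanchored pattern
```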

317
MCQ · medium

A team has configured an uptime check with a 5xx threshold alert. During an incident, the alert fires with severity 'critical'. The team mitigates the issue, but the alert keeps firing for 15 more minutes due to a slow-responding downstream dependency. What should the team do to avoid false alarms in future incidents?

A. Add a second notification channel to send alerts to a different team.
B. Increase the 'duration' field in the alerting policy to require the condition to be true for a longer time before alerting.
C. Modify the alert condition to check only for 5xx errors and ignore other status codes.
D. Decrease the check frequency to every 30 seconds to get faster feedback.
Answer: B

A longer duration reduces false alerts from transient issues.

Why this answer

Option B is correct because increasing the 'duration' field in the alerting policy ensures that the condition (e.g., 5xx errors) must persist for a longer, defined period before the alert fires. This prevents false alarms from transient issues like a slow-responding downstream dependency that temporarily triggers the threshold but resolves before the alert duration expires. In Google Cloud Monitoring, the duration parameter specifies the minimum time the condition must be true, filtering out short-lived spikes.
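The effect of the duration setting can be sketched in Python: the alert fires only once the condition has held for an unbroken stretch at least as long as the duration. This is a simplified model (each sample is treated as covering one full interval), not Cloud Monitoring's exact evaluation logic, and the sample data is hypothetical.

```python
def alert_fires(samples, duration_s, interval_s):
    """Return True once the condition has held continuously for duration_s.

    samples are booleans (condition met or not) taken every interval_s seconds;
    each True sample is treated as covering one full interval.
    """
    streak = 0
    for met in samples:
        streak = streak + 1 if met else 0
        if met and streak * interval_s >= duration_s:
            return True
    return False

blip = [True, True, True, False, True, True]  # hypothetical 1-minute samples
print(alert_fires(blip, duration_s=120, interval_s=60))  # True
print(alert_fires(blip, duration_s=300, interval_s=60))  # False: no 5-minute streak
```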

Exam trap

Google Cloud often tests the misconception that increasing alert sensitivity (e.g., faster checks) or adding more notification channels improves incident response, when the correct approach is to tune the alert duration to match the expected persistence of the underlying issue.

How to eliminate wrong answers

Option A is wrong because adding a second notification channel does not address the root cause of false alarms; it merely duplicates alerts to another team, increasing noise without reducing false positives. Option C is wrong because the alert already checks for 5xx errors (as stated in the question), and ignoring other status codes would not prevent false alarms caused by a slow downstream dependency that still returns 5xx errors. Option D is wrong because checking every 30 seconds means checking more often, despite the word "decrease"; that makes the alert more sensitive to transient conditions and likely increases false alarms rather than reducing them.

318
MCQ · easy

A developer wants to view logs from all pods in a GKE namespace in real time. Which command-line tool should they use?

A. gcloud logging read
B. Cloud Console Logs Viewer
C. kubectl logs --tail=100
D. gcloud logging tail
Answer: D

This streams logs in real time across resources.

Why this answer

The `gcloud logging tail` command streams log entries from Cloud Logging in real time, so it can show logs from all pods in a GKE namespace at once, unlike `kubectl logs`, which reads from a single pod or a label-selected set of pods. The command takes a positional Logging query filter, such as `'resource.type="k8s_container" AND resource.labels.namespace_name="NAMESPACE"'`, to scope the output to a specific namespace. Depending on the Cloud SDK version, it may be available only under the `gcloud alpha` or `gcloud beta` command groups.

Exam trap

Google Cloud often tests the distinction between historical log retrieval (`gcloud logging read`) and real-time streaming (`gcloud logging tail`), and the trap here is that candidates mistakenly choose `kubectl logs` because they are familiar with its `-f` flag, but they overlook that it cannot aggregate logs from all pods in a namespace without complex scripting.

How to eliminate wrong answers

Option A is wrong because `gcloud logging read` returns a one-time snapshot of stored entries matching a filter over a time window (by default the last day, set with `--freshness`), not a live stream. Option B is wrong because Cloud Console Logs Viewer is a web-based UI, not a command-line tool. Option C is wrong because `kubectl logs --tail=100` prints the last 100 lines from one pod (or from the pods matched by a `-l` label selector) and exits; it streams only with `-f` (follow), and following several pods at once requires a label selector and is capped by `--max-log-requests` (default 5).

319
MCQ · medium

Which Cloud Storage class provides the lowest cost for data accessed less than once a year?

A. Nearline
B. Archive
C. Standard
D. Coldline
Answer: B

Archive is designed for data accessed less than once a year and has the lowest storage cost.

Why this answer

Archive storage class is the correct answer because it is specifically designed for long-term data retention where access is extremely infrequent, such as less than once a year. It offers the lowest storage cost among Google Cloud Storage classes, but with higher retrieval costs and a minimum storage duration of 365 days, making it ideal for data that is rarely accessed.

Exam trap

Google Cloud often tests the distinction between Coldline and Archive by making candidates confuse the minimum storage duration (90 days for Coldline vs. 365 days for Archive) and the access frequency thresholds (once a quarter vs. once a year), leading them to pick Coldline when Archive is the correct lowest-cost option for data accessed less than once a year.

How to eliminate wrong answers

Option A (Nearline) is wrong because it is optimized for data accessed less than once a month, not less than once a year, and has a higher storage cost than Archive. Option C (Standard) is wrong because it is designed for frequently accessed data (hot data) and has the highest storage cost among the options. Option D (Coldline) is wrong because it is intended for data accessed less than once a quarter (90 days), still more frequent than once a year, and its storage cost is higher than Archive.
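As a rough rule of thumb, the access thresholds line up with each class's minimum storage duration (30 days for Nearline, 90 for Coldline, 365 for Archive). The Python sketch below encodes only that heuristic; it ignores retrieval and operation charges, which can change the answer for a real workload. The class names match Cloud Storage's storage class identifiers.

```python
# Minimum storage durations in days, coldest class first.
CLASSES = [("ARCHIVE", 365), ("COLDLINE", 90), ("NEARLINE", 30), ("STANDARD", 0)]

def cheapest_class(days_between_reads):
    """Pick the coldest class whose minimum storage duration fits the access interval."""
    for name, min_days in CLASSES:
        if days_between_reads >= min_days:
            return name
    return "STANDARD"  # fallback for nonsensical (negative) input

print(cheapest_class(400))  # ARCHIVE
print(cheapest_class(120))  # COLDLINE
print(cheapest_class(7))    # STANDARD
```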

320
MCQmedium

A DevOps engineer is troubleshooting a production incident where users are getting 502 errors from a Google Cloud HTTP(S) Load Balancer. The backend service is a GKE deployment. Initial checks show the backend pods are healthy and responding. What is the most likely cause?

A.The load balancer's health check is failing on the backend instance group due to mismatch between health check port and backend port.
B.The backend pods are out of memory and crashing.
C.The IAM permissions for the load balancer service account are misconfigured.
D.The backend service has been accidentally deleted by another engineer.
AnswerA

A 502 while the pods themselves are healthy points to the load balancer treating the backends as unhealthy.

Why this answer

A 502 error from an HTTP(S) Load Balancer means the load balancer could not get a valid response from a backend. When every backend fails its health check, the load balancer has no healthy endpoint to send traffic to and returns 502s. The pods can be healthy and responding while the health check still fails, because the check probes a port the pods are not serving on (for example, port 8080 while the container listens on port 80). The load balancer then marks the backends unhealthy, and users see 502 errors.
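The mismatch can be reproduced locally with a minimal Python sketch: a stand-in "pod" serves HTTP on one port, and an HTTP probe succeeds against that port but fails against a different one, even though the application never changed. The local server, the `health_check` helper, and the ports are all illustrative assumptions; a real Google Cloud health check is configured on the load balancer, not in application code.

```python
import http.client
import http.server
import socket
import threading

class OkHandler(http.server.BaseHTTPRequestHandler):
    """Stands in for a healthy pod: answers 200 on any GET."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging

def start_backend():
    """Start the stand-in backend on a free local port; return (server, port)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]

def unused_port():
    """Find a local port with nothing listening (small race, fine for a demo)."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def health_check(port, path="/", timeout=1.0):
    """Mimic an HTTP health check: healthy only on a 200 from port/path."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False  # refused or timed out: the backend is marked unhealthy
    finally:
        conn.close()

server, serving_port = start_backend()
print("probe on serving port:", health_check(serving_port))
print("probe on a different port:", health_check(unused_port()))
server.shutdown()
server.server_close()
```

The application is identical in both probes; only the port the check targets differs, which is the situation the correct answer describes.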

Exam trap

Google Cloud often tests the distinction between backend health and health check configuration, where candidates assume that if pods are healthy, the load balancer must also see them as healthy, ignoring the port mismatch or firewall rules that block health check probes.

How to eliminate wrong answers

Option B is wrong because pods that were running out of memory and crashing would not be 'healthy and responding'; the engineer's initial checks would have caught that. Option C is wrong because IAM permissions on a load balancer service account do not govern the data-plane HTTP connection between the load balancer and the pods, so they do not explain backends being unreachable. Option D is wrong because a deleted backend service does not fit the scenario: the engineer's checks show the backend pods serving traffic, and the symptom points to the load balancer's view of backend health rather than to missing configuration.

321
Multi-Selectmedium

Which THREE actions can help optimize Cloud Storage costs? (Choose three.)

Select 3 answers
A.Use Nearline or Coldline storage classes for infrequently accessed data.
B.Enable Object Lifecycle management to transition objects to colder storage classes.
C.Compress objects before uploading to reduce storage size.
D.Enable versioning on all buckets to protect against accidental deletion.
E.Use Standard storage class for all objects to ensure low latency.
AnswersA, B, C

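All three reduce storage spend: colder classes lower the per-GB price, lifecycle rules move aging data into them automatically, and compression shrinks the bytes stored.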

Why this answer

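Option A is correct because Nearline (for data accessed less than once a month) and Coldline (less than once a quarter) have lower per-GB storage prices than Standard, in exchange for retrieval fees and minimum storage durations of 30 and 90 days. Option B is correct because Object Lifecycle Management can apply a SetStorageClass action that moves objects to a colder class once they reach a given age, so the savings happen without manual work. Option C is correct because Cloud Storage bills by bytes stored, so compressing compressible data (logs, CSV, JSON) before upload lowers the storage charge. These choices save money only when access patterns match each class's intended use.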
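Option B can be sketched as a lifecycle configuration built in Python. The rule shape (`action` with type `SetStorageClass`, `condition` with `age` and `matchesStorageClass`) follows Cloud Storage's lifecycle configuration format; the specific ages are illustrative assumptions, and the exact file wrapper expected by a given CLI (for example `gsutil lifecycle set`) should be checked against current documentation before use.

```python
import json

def lifecycle_config(nearline_after_days=30, coldline_after_days=90,
                     archive_after_days=365):
    """Build lifecycle rules that step objects down through colder classes
    as they age. The day values are illustrative, not recommendations."""
    steps = [
        ("NEARLINE", nearline_after_days, ["STANDARD"]),
        ("COLDLINE", coldline_after_days, ["STANDARD", "NEARLINE"]),
        ("ARCHIVE", archive_after_days, ["STANDARD", "NEARLINE", "COLDLINE"]),
    ]
    rules = []
    for target_class, age_days, from_classes in steps:
        rules.append({
            "action": {"type": "SetStorageClass", "storageClass": target_class},
            # Only move objects that are still in a warmer class.
            "condition": {"age": age_days, "matchesStorageClass": from_classes},
        })
    return {"rule": rules}

print(json.dumps(lifecycle_config(), indent=2))
```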

Exam trap

The trap here is that candidates may confuse data protection features (like versioning) or performance choices (like Standard class) with cost optimization actions, overlooking that versioning increases storage costs and Standard is the most expensive class.

322
Multi-Selecthard

Which THREE of the following are valid approaches to monitor a custom application metric in Cloud Monitoring? (Choose 3)

Select 3 answers
A.Install the Stackdriver Monitoring agent on a Windows VM and configure custom metric collection in the agent configuration file.
B.Use the Cloud Monitoring API to write time series data directly.
C.Create a logs-based metric from application logs that contain the metric value.
D.Use the built-in JMX plugin in the Cloud Monitoring agent to collect Java application metrics.
E.Use the OpenTelemetry Collector with the Google Cloud Monitoring exporter.
AnswersB, C, E

B, C, and E are all supported ways to get custom application metrics into Cloud Monitoring.

Why this answer

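Option B is correct because the Cloud Monitoring API allows you to write custom metric data directly via the `projects.timeSeries.create` endpoint. This is the most direct programmatic approach, supporting arbitrary metric descriptors and time series data without requiring any agent or intermediary. Option C is correct because a logs-based metric turns matching log entries (or a value extracted from them) into a metric you can chart and alert on. Option E is correct because the OpenTelemetry Collector can export application metrics to Cloud Monitoring through its Google Cloud exporter.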
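As a sketch of Option B, the Python function below builds a request body for `projects.timeSeries.create` writing one GAUGE point of a custom metric under the `custom.googleapis.com/` prefix to the `global` monitored resource. The project ID and metric name are hypothetical placeholders, and sending the request (with OAuth credentials) is left out; a client library such as `google-cloud-monitoring` can do the same thing with typed objects.

```python
import json
from datetime import datetime, timezone

def time_series_request(project_id, metric_name, value, labels=None, when=None):
    """Build a body for POST .../v3/projects/{project_id}/timeSeries that
    writes one double-valued GAUGE point for a custom metric."""
    when = when or datetime.now(timezone.utc)
    return {
        "timeSeries": [{
            "metric": {
                # Custom metrics live under the custom.googleapis.com/ prefix.
                "type": f"custom.googleapis.com/{metric_name}",
                "labels": labels or {},
            },
            # "global" is a simple resource for metrics not tied to a VM.
            "resource": {"type": "global", "labels": {"project_id": project_id}},
            "points": [{
                # A GAUGE point needs only an end time (RFC 3339, UTC).
                "interval": {"endTime": when.strftime("%Y-%m-%dT%H:%M:%SZ")},
                "value": {"doubleValue": float(value)},
            }],
        }]
    }

body = time_series_request("my-project", "checkout/queue_depth", 42,
                           when=datetime(2024, 1, 1, tzinfo=timezone.utc))
print(json.dumps(body, indent=2))
```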

Exam trap

Google Cloud often tests the distinction between predefined agent-collected metrics (like JMX plugin metrics) and custom metrics that require explicit API or logs-based creation, leading candidates to incorrectly select agent-based options for custom metric monitoring.

323
MCQmedium

A team is running a stateful application on Compute Engine VMs. They notice that the application performance degrades over time as the disk fills up. They want to proactively alert before performance degrades. Which metric should they monitor?

A.Disk usage percentage
B.Disk read/write latency
C.Network sent bytes
D.CPU utilization
AnswerA

Monitors disk capacity, enabling early alerts.

Why this answer

Disk usage percentage is the correct metric because the application performance degrades as the disk fills up, which is a capacity issue. Monitoring disk usage percentage allows the team to set an alert threshold (e.g., 80% or 90%) to proactively take action (e.g., clean up logs or resize disks) before the disk becomes full and causes performance degradation. This directly addresses the root cause described in the scenario.
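The alerting idea can be sketched locally in Python: compute percent used and compare it to a threshold well below 100%. This is only a stand-in; on Compute Engine, guest disk usage comes from the Ops Agent (for example the `agent.googleapis.com/disk/percent_used` metric), and the alert itself would be a Cloud Monitoring policy on that metric. The 80% threshold is an illustrative assumption.

```python
import shutil

def disk_percent_used(path="/"):
    """Percent of the filesystem at `path` that is used."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total

def should_alert(percent_used, threshold=80.0):
    """Fire before the disk is actually full, leaving time to act."""
    return percent_used >= threshold

pct = disk_percent_used("/")
print(f"/ is {pct:.1f}% used; alert: {should_alert(pct)}")
```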

Exam trap

The trap here is that candidates may confuse a capacity metric (disk usage percentage) with a performance metric (disk latency), assuming latency is the best indicator of degradation, but the question explicitly ties the degradation to the disk filling up, making capacity the direct cause to monitor.

How to eliminate wrong answers

Option B is wrong because disk read/write latency measures the time taken for I/O operations, which can indicate performance issues but does not directly reflect the disk filling up; latency may increase due to other factors like contention or hardware failure, not solely capacity. Option C is wrong because network sent bytes tracks outbound traffic, which is unrelated to disk space consumption and performance degradation from a full disk. Option D is wrong because CPU utilization measures processor load, which can degrade application performance but is not the specific cause described—the problem is explicitly tied to the disk filling up, not CPU saturation.

324
MCQhard

Refer to the exhibit. A GKE pod is repeatedly crashing with the error shown. The deployment has resource requests of 512 MiB memory and limits of 1 GiB. What is the most likely cause and the best remediation?

A.The Java heap size exceeds the container memory limit; reduce the JVM heap size or increase the container memory limit
B.The node is under memory pressure; add more nodes to the cluster
C.The container needs more CPU; increase CPU request and limit
D.The application has a memory leak; refactor the DataProcessor class
AnswerA

JVM heap must fit within the container limit to avoid OOM.

Why this answer

The error indicates that the Java application ran out of memory inside its container. With a 1 GiB limit, a JVM heap configured close to or above that limit leaves no room for non-heap memory (metaspace, thread stacks, direct buffers), so the process exceeds the limit and the kernel OOM-kills the container; Kubernetes reports this as OOMKilled and restarts the pod, which produces the crash loop. Reducing the JVM heap size or increasing the container memory limit removes the mismatch.
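The sizing arithmetic can be checked with a small Python sketch. It assumes the JVM is sized with `-XX:MaxRAMPercentage` (available on JDK 10+ and 8u191+), which derives the maximum heap from the detected container limit, and uses a hypothetical 200 MiB estimate for non-heap memory; real non-heap usage varies by application.

```python
def max_heap_mib(container_limit_mib: int, max_ram_percentage: float = 75.0) -> int:
    """Approximate max heap the JVM picks with -XX:MaxRAMPercentage."""
    return int(container_limit_mib * max_ram_percentage / 100)

def fits_in_limit(heap_mib: int, non_heap_mib: int, container_limit_mib: int) -> bool:
    """True if heap plus estimated non-heap memory stays under the limit."""
    return heap_mib + non_heap_mib < container_limit_mib

LIMIT = 1024    # 1 GiB container limit from the scenario
NON_HEAP = 200  # hypothetical estimate: metaspace, thread stacks, buffers

print(fits_in_limit(1024, NON_HEAP, LIMIT))                 # -Xmx1g leaves no headroom
print(fits_in_limit(max_heap_mib(LIMIT), NON_HEAP, LIMIT))  # 75% of 1 GiB = 768 MiB
```

Expressing the heap as a percentage of the limit keeps the two in step if the container limit is later raised.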

Exam trap

Google Cloud often tests the distinction between application-level errors (like JVM OOM) and infrastructure-level issues (like node pressure), tempting candidates to choose a cluster scaling solution when the root cause is a misconfigured application resource limit.

How to eliminate wrong answers

Option B is wrong because node memory pressure would cause pod eviction or scheduling failures, not a Java-specific OOM error within a running container; adding nodes does not fix the application's memory configuration. Option C is wrong because the error is an OutOfMemoryError, not a CPU starvation issue; increasing CPU resources would not prevent the JVM from exceeding the memory limit. Option D is wrong because while a memory leak could cause OOM over time, the immediate error message points to heap size exceeding limits, and refactoring the DataProcessor class is a speculative fix that does not address the explicit memory limit configuration.

325
MCQhard

You are a DevOps engineer for a large e-commerce platform running on Google Kubernetes Engine (GKE). The platform consists of 15 microservices, each with its own code repository. Your team uses Cloud Build for CI and Cloud Deploy for CD. Recently, the deployment to production has been failing intermittently because the new version of the 'payment' service is not compatible with the current version of the 'order' service. This causes a production outage every few weeks. The team wants to implement a strategy to catch such incompatibilities before promoting to production, without slowing down development velocity. Currently, the pipeline builds each service independently, runs unit tests, deploys to a shared staging environment, runs integration tests, and then promotes to production after manual approval. What should you do?

A.Define strict version compatibility matrices between services and enforce them in the pipeline by locking versions.
B.Implement canary deployments in staging: deploy the new payment service alongside the current version, route a percentage of test traffic to the new version, and run integration tests before promoting. If tests pass, promote to production.
C.Add a manual testing phase after staging deployment where QA engineers manually test the integration before production promotion.
D.Combine all microservice builds into a single pipeline that builds and tests all services together before deploying to staging.
AnswerB

Canary deployments in staging catch incompatibilities early without slowing development.

Why this answer

Option B is correct because it introduces canary deployments in the staging environment, allowing the new payment service to be tested with a subset of realistic traffic alongside the current order service. This catches incompatibilities early by running integration tests against the canary, without blocking the pipeline or slowing development velocity. Cloud Deploy supports canary deployment strategies natively, making this a practical and automated solution.
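The core of a canary split is routing a fixed share of requests to the new version. The Python sketch below shows one common way to do that deterministically, by hashing a request ID into 100 buckets, so a given test request always reaches the same version. This is purely illustrative: when Cloud Deploy runs a canary, the platform performs the traffic split, and this function is not how it is implemented.

```python
import hashlib

def routes_to_canary(request_id: str, canary_percent: int) -> bool:
    """Send roughly canary_percent% of requests to the canary, stickily:
    the same request_id always lands in the same bucket."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return bucket < canary_percent

ids = [f"test-req-{i}" for i in range(10_000)]
share = sum(routes_to_canary(r, 10) for r in ids) / len(ids)
print(f"canary share at 10%: {share:.3f}")
```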

Exam trap

The trap here is that candidates may choose option A (version locking) because it seems like a straightforward dependency management solution, but it ignores the need for dynamic testing under realistic traffic patterns and the requirement to maintain development velocity.

How to eliminate wrong answers

Option A is wrong because locking versions with strict compatibility matrices reduces flexibility and slows development velocity, contradicting the requirement to avoid slowing down the team. Option C is wrong because adding a manual QA testing phase introduces human delay and does not scale, failing to maintain development velocity. Option D is wrong because combining all 15 microservices into a single pipeline creates a monolithic build that increases build times, reduces parallelism, and violates the principle of independent service deployment, which is a core tenet of microservices architecture.

326
MCQeasy

You are the DevOps engineer for a social media platform. After a recent code rollout, you receive multiple user complaints about failed logins. The service logs show a sharp increase in 5xx errors from the authentication service. However, the existing alerting policy for the authentication service did not fire. The policy is configured to trigger if the error rate exceeds 5% for 5 minutes. Upon checking Cloud Monitoring, you see that the error rate spiked to 15% for 3 minutes, then dropped back to normal. What is the most likely reason the alert did not fire?

A.The error rate threshold of 5% was too low, causing the alert to be suppressed.
B.The alignment period for the metric was set to 5 minutes, hiding the spike.
C.The duration condition of 5 minutes was not satisfied.
D.The notification channel was incorrectly configured.
AnswerC

The spike lasted 3 minutes, less than required 5 minutes.

Why this answer

The alert did not fire because the policy requires the error rate to exceed 5% for a continuous duration of 5 minutes. The spike only lasted 3 minutes, which is shorter than the configured duration condition, so the alerting policy's condition was never fully met. In Google Cloud Monitoring, alerting policies evaluate both the threshold and the duration window before transitioning to a firing state.
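The duration rule can be modeled in a few lines of Python: the condition is met only when the threshold has been exceeded for every sample in the duration window. The sketch assumes a 1-minute alignment period, so a 5-minute duration means 5 consecutive samples; the error-rate series mirrors the scenario.

```python
def condition_met(samples, threshold, duration_points):
    """True once `duration_points` consecutive aligned samples exceed
    the threshold (one sample per minute assumed)."""
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak >= duration_points:
            return True
    return False

# Error rate per minute: a 3-minute spike to 15%, then back to normal.
error_rate = [0.01, 0.15, 0.15, 0.15, 0.01, 0.01, 0.01]
print(condition_met(error_rate, threshold=0.05, duration_points=5))  # False
```

With `duration_points=3` the same series would fire, which is why the duration, not the threshold, explains the missing alert.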

Exam trap

Google Cloud often tests the distinction between threshold-based alerts and duration-based conditions, tricking candidates into focusing on the threshold value or notification channels when the real issue is the unmet time window requirement.

How to eliminate wrong answers

Option A is wrong because a lower threshold (5%) would make the alert more sensitive, not suppress it; the spike exceeded 5%, so the threshold was not the issue. Option B is wrong because the alignment period only controls how raw data points are aggregated into each sample of the time series; the question states that Cloud Monitoring shows the error rate at 15% for 3 minutes, so the breach was visible, and the condition still did not fire because the breach did not last for the 5-minute duration. Option D is wrong because the notification channel configuration only affects delivery of the alert, not whether the alert fires; if the policy's conditions are not met, no alert is generated regardless of channel settings.

327
MCQhard

Refer to the exhibit. The company received an alert when the threshold was triggered. What does this alert indicate?

A.The actual spend has reached 50% of the budget.
B.The spend has exceeded the budget amount.
C.The forecasted spend is projected to exceed 50% of the budget.
D.Both actual and forecasted spend thresholds have been crossed.
AnswerA

CURRENT_SPEND triggers on actual spend.

Why this answer

The threshold rule in the exhibit uses the CURRENT_SPEND spend basis, so the alert fires when actual costs (not forecasted costs) reach 50% of the budget amount.
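The difference between the two spend bases can be sketched in Python. The field names `thresholdPercent` and `spendBasis` (with `CURRENT_SPEND` and `FORECASTED_SPEND`) follow the Cloud Billing Budget API's threshold rules; the spend figures are hypothetical, and the evaluation function is an illustration, not Google's implementation.

```python
def threshold_crossed(rule, actual_spend, forecasted_spend, budget_amount):
    """Evaluate one budget threshold rule. thresholdPercent is a fraction
    (0.5 means 50%); spendBasis selects actual or forecasted spend."""
    if rule["spendBasis"] == "CURRENT_SPEND":
        spend = actual_spend
    else:  # "FORECASTED_SPEND"
        spend = forecasted_spend
    return spend >= rule["thresholdPercent"] * budget_amount

rule = {"thresholdPercent": 0.5, "spendBasis": "CURRENT_SPEND"}
print(threshold_crossed(rule, actual_spend=520, forecasted_spend=1300, budget_amount=1000))
print(threshold_crossed(rule, actual_spend=300, forecasted_spend=1300, budget_amount=1000))
```

In the second call the forecast is far over budget, but a CURRENT_SPEND rule ignores it and does not fire.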

328
MCQeasy

A DevOps engineer needs to assign IAM roles at the organization level. Which built-in role is specifically designed for managing IAM policies across the organization?

A.roles/resourcemanager.organizationAdmin
B.roles/owner
C.roles/editor
D.roles/iam.securityAdmin
AnswerD

This role is focused on managing IAM policies only.

Why this answer

The role `roles/iam.securityAdmin` is the built-in IAM role specifically designed for managing IAM policies across the organization. It grants permissions to get and set IAM policies at the organization, folder, and project levels, without granting other resource management permissions. This makes it the correct choice for a DevOps engineer who needs to assign IAM roles organization-wide.
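Granting a role at the organization level is usually a read-modify-write on the IAM policy: get the policy, add a binding, and set it back with the same `etag` so concurrent edits are detected. The Python sketch below performs only the "modify" step on a policy dict with the standard `bindings`/`role`/`members`/`etag` fields; the members and etag are placeholders, and conditional bindings are deliberately left untouched.

```python
import copy

def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy with `member` granted `role`."""
    updated = copy.deepcopy(policy)  # leave the fetched policy untouched
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Merge only into an unconditional binding for the same role.
        if binding["role"] == role and "condition" not in binding:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated

policy = {
    "bindings": [{"role": "roles/viewer", "members": ["group:ops@example.com"]}],
    "etag": "BwExampleEtag=",  # placeholder; a real etag comes from getIamPolicy
}
new_policy = add_binding(policy, "roles/iam.securityAdmin", "user:alice@example.com")
print(new_policy)
```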

Exam trap

The trap here is that candidates often pick `roles/resourcemanager.organizationAdmin` because it sounds like the organization-wide role. That role can also set IAM policies on the organization, folders, and projects, but it bundles broader administration of the resource hierarchy, whereas `roles/iam.securityAdmin` is the role whose purpose is IAM policy management.

How to eliminate wrong answers

Option A is wrong because `roles/resourcemanager.organizationAdmin` is a broad administrative role for the organization and its resource hierarchy; although it includes setIamPolicy permissions, it is not the role designed specifically for IAM policy management. Option B is wrong because `roles/owner` is a basic (formerly primitive) role that grants full access to all resources, including IAM management, which is far more than this task needs. Option C is wrong because `roles/editor` is a basic role that can modify most resources but cannot change IAM policies (it has no setIamPolicy permissions).

329
MCQeasy

A DevOps engineer is optimizing a Cloud Run service that experiences cold starts. The service is written in Python and uses several large libraries. Which change is most effective to reduce cold start latency?

A.Increase the maximum number of concurrent requests per container.
B.Set a minimum number of instances to keep containers warm.
C.Set a longer request timeout.
D.Increase the CPU allocation for the service.
AnswerB

Minimum instances keep a baseline of containers warm, so requests served by those instances avoid cold starts.

Why this answer

Setting a minimum number of instances (option B) ensures that a baseline of container instances is always warm and ready to serve requests, eliminating cold starts for those instances. Cold starts occur when a new container must be initialized, including loading large Python libraries, which adds significant latency. By keeping a minimum number of instances running, the service avoids the initialization delay for the first request to each instance.
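A toy Python model shows why a minimum-instance floor removes cold starts for traffic within that baseline. The demand series is hypothetical, the floor instances are assumed to be started ahead of traffic, and idle instances above the floor are assumed to scale in immediately; real Cloud Run autoscaling keeps idle instances around for a while, so this only illustrates the direction of the effect.

```python
def count_cold_starts(demand, min_instances=0):
    """Count instance start-ups for a sequence of demand levels.

    demand[i] is how many instances traffic needs at step i (hypothetical).
    """
    warm = min_instances  # floor instances are already running
    cold_starts = 0
    for needed in demand:
        if needed > warm:
            cold_starts += needed - warm  # each extra instance starts cold
            warm = needed
        else:
            warm = max(min_instances, needed)  # scale in, never below the floor
    return cold_starts

spiky = [1, 3, 1, 3]
for floor in (0, 1, 3):
    print(floor, count_cold_starts(spiky, min_instances=floor))
```

Once the floor covers peak demand, no request waits on container initialization; demand above the floor still triggers cold starts.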

Exam trap

Google Cloud often tests the misconception that increasing CPU or concurrency directly reduces cold start latency, but the key insight is that cold starts are caused by the initialization of new containers, not by processing speed or request handling capacity.

How to eliminate wrong answers

Option A is wrong because increasing the maximum number of concurrent requests per container does not reduce cold start latency; it only allows each container to handle more requests simultaneously, which can improve throughput but does not prevent the initial startup delay. Option C is wrong because setting a longer request timeout does not address cold starts; it only gives the service more time to respond, which might mask latency but does not reduce the initialization time. Option D is wrong because more CPU can shorten initialization somewhat (importing large Python libraries is partly CPU-bound, which is why Cloud Run offers startup CPU boost), but every new instance still pays the startup cost; only minimum instances keep instances initialized ahead of incoming requests.

330
MCQmedium

A team is using Cloud Build to build and deploy to multiple environments (dev, staging, prod) using Cloud Deploy. They want to ensure that only builds from the main branch are promoted to prod. How should they configure this?

A.Use Cloud Build tags to mark builds from the main branch and filter in Cloud Deploy.
B.Set IAM policies on the Container Registry or Artifact Registry to restrict access to the prod image.
C.Set the Cloud Build trigger to only run on the main branch.
D.Configure a Cloud Deploy promotion with an approval gate required for the prod target.
AnswerD

Approval gating prevents automatic promotion to prod.

Why this answer

Option D is correct because Cloud Deploy lets you mark a target as requiring approval (`requireApproval: true`), so a rollout to prod waits until an authorized approver approves it. The approver can confirm from the release's metadata or source that it was built from the main branch before approving, which puts a controlled, auditable gate on the promotion step itself rather than relying on build-time filtering or registry IAM.
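An approver's check can be sketched in Python under one stated assumption: the CI step records the source branch on the release as an annotation, for example with `gcloud deploy releases create ... --annotations=git-branch=$BRANCH_NAME`. The `git-branch` key, the target names, and the decision function are hypothetical conventions for illustration, not Cloud Deploy defaults.

```python
def review_rollout(release_annotations: dict, target: str) -> str:
    """Decide how an approver might treat a rollout awaiting approval.

    Assumes a hypothetical 'git-branch' annotation set at release creation,
    and that only the 'prod' target has requireApproval enabled.
    """
    if target != "prod":
        return "no approval needed"
    branch = release_annotations.get("git-branch")
    return "approve" if branch == "main" else "reject"

print(review_rollout({"git-branch": "main"}, "prod"))
print(review_rollout({"git-branch": "feature/login"}, "prod"))
```

A release with no recorded branch is rejected, so the gate fails closed.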

Exam trap

Google Cloud often tests the misconception that a Cloud Build trigger restriction alone is sufficient to control promotions, but the trigger only controls build creation, not the subsequent deployment promotion, which requires a separate gate like an approval gate in Cloud Deploy.

How to eliminate wrong answers

Option A is wrong because Cloud Build tags are metadata attached to builds, but Cloud Deploy does not have a native filter to promote releases based on tags; tags are not propagated or evaluated during promotion. Option B is wrong because IAM policies on Container Registry or Artifact Registry control who can pull or push images, not which builds are promoted to prod; they cannot enforce a branch-based promotion policy. Option C is wrong because setting the Cloud Build trigger to only run on the main branch ensures that only main branch builds are created, but it does not prevent a release from that build from being promoted to prod; the trigger alone does not gate the promotion step.

331
MCQeasy

A team notices that the 'cpu-high' alert fires frequently even for short bursts. The 'disk-full' alert never sends notifications. Based on the exhibit, what is the issue with each?

A.The cpu-high uses email which is unreliable; the disk-full condition is too low.
B.Both alerts have misconfigured durations.
C.The cpu-high alert duration is too short; the disk-full alert has no notification channel.
D.The cpu-high threshold is too high; the disk-full duration is too long.
AnswerC

Duration of 0s causes firing on any transient spike; missing notification channel means no alerts are delivered.

Why this answer

Option C is correct because the 'cpu-high' alert fires frequently for short bursts due to its duration being set too short, causing it to trigger on transient spikes. The 'disk-full' alert never sends notifications because it lacks a configured notification channel, so even when the condition is met, no alert is dispatched.
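Both misconfigurations can be caught by inspecting the policy definitions. The Python sketch below lints alert policies shaped like the Cloud Monitoring AlertPolicy resource (`conditions[].conditionThreshold.duration` and `notificationChannels`); the 60-second minimum and the sample policies are illustrative assumptions, not the exhibit's exact contents.

```python
def lint_alert_policy(policy: dict, min_duration_s: float = 60) -> list:
    """Flag near-zero condition durations and missing notification channels."""
    problems = []
    for cond in policy.get("conditions", []):
        threshold = cond.get("conditionThreshold", {})
        duration_s = float(threshold.get("duration", "0s").rstrip("s"))
        if duration_s < min_duration_s:
            problems.append(f"{cond.get('displayName', '?')}: duration "
                            f"{duration_s:g}s fires on transient spikes")
    if not policy.get("notificationChannels"):
        problems.append(f"{policy.get('displayName', '?')}: no notification "
                        "channels, so nothing is delivered")
    return problems

cpu_high = {
    "displayName": "cpu-high",
    "conditions": [{"displayName": "CPU > 80%", "conditionThreshold": {"duration": "0s"}}],
    "notificationChannels": ["projects/my-project/notificationChannels/123"],
}
disk_full = {
    "displayName": "disk-full",
    "conditions": [{"displayName": "Disk > 90%", "conditionThreshold": {"duration": "300s"}}],
    "notificationChannels": [],
}
for p in (cpu_high, disk_full):
    print(lint_alert_policy(p))
```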

Exam trap

Google Cloud often tests the distinction between alert condition configuration (threshold/duration) and notification delivery, leading candidates to confuse a missing notification channel with a threshold or duration misconfiguration.

How to eliminate wrong answers

Option A is wrong because email is not inherently unreliable in this context; the cpu-high problem is its duration setting, not the channel, and a 'disk-full' condition that is too low would cause extra alerts, not silence. Option B is wrong because only the 'cpu-high' alert has a duration issue; the 'disk-full' alert's problem is its missing notification channel. Option D is wrong because a threshold that is too high would reduce false alarms, not increase them, and a 'disk-full' duration that is too long would delay alerts, not prevent them entirely.

332
MCQmedium

An SRE team needs to define an SLI for a web service's availability SLO of 99.9%. Which metric should they use?

A.Error budget
B.CPU utilization
C.Request latency (p99)
D.Uptime check success rate
AnswerD

Uptime checks measure the fraction of successful probes, directly reflecting availability.

Why this answer

Option D is correct because an uptime check success rate directly measures the proportion of time the service is reachable and responding, which aligns with the definition of availability for a 99.9% SLO. This metric is typically derived from synthetic probes or health check endpoints (e.g., HTTP 200 responses) and reflects the binary state of the service being up or down, making it the appropriate SLI for availability.
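The SLI and its error budget are simple ratios, sketched here in Python with exact fractions so the 99.9% boundary is not blurred by floating-point rounding. The one-probe-per-minute, 30-day window is a hypothetical assumption; real uptime checks run from several locations, which changes the probe count.

```python
from fractions import Fraction

def availability_sli(successful_probes: int, total_probes: int) -> Fraction:
    """Fraction of uptime-check probes that succeeded in the window."""
    if total_probes <= 0:
        raise ValueError("need at least one probe")
    return Fraction(successful_probes, total_probes)

def meets_slo(successful_probes: int, total_probes: int, target: str = "0.999") -> bool:
    # Fraction("0.999") is exact, avoiding float rounding at the boundary.
    return availability_sli(successful_probes, total_probes) >= Fraction(target)

def error_budget(total_probes: int, target: str = "0.999") -> Fraction:
    """Number of failed probes the SLO tolerates over the window."""
    return total_probes * (1 - Fraction(target))

probes_30d = 30 * 24 * 60  # one probe per minute for 30 days (hypothetical)
print(float(error_budget(probes_30d)))  # 43.2 failed probes allowed
```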

Exam trap

Google Cloud often tests the distinction between availability (whether the service is up and answering successfully) and performance (how quickly it answers), so candidates mistakenly choose latency metrics like p99 for availability SLOs, conflating responsiveness with uptime.

How to eliminate wrong answers

Option A is wrong because error budget is a derived concept (the allowed amount of downtime or errors before violating the SLO), not a raw metric used as an SLI; it is calculated from the SLI and SLO, not measured directly. Option B is wrong because CPU utilization is a resource-level metric that does not directly measure service availability; a service can have high CPU usage but still be available, or low CPU usage but be unresponsive due to other failures. Option C is wrong because request latency (p99) measures performance (e.g., the 99th percentile of response times), not availability; a service could be available but slow, or unavailable but not captured by latency metrics if requests fail entirely.

333
MCQeasy

A development team wants to automatically run unit tests and static code analysis on every push to a Cloud Source Repository, but only run integration tests on merges to the main branch. Which Cloud Build trigger configuration should they use?

A.Use a single trigger with a substitution variable like '_BRANCH' and set it to 'main' for integration tests.
B.Create one trigger with a build config that uses the 'branchName' substitution to conditionally skip integration test steps.
C.Create two triggers: one with a branch filter for '^main$' that runs integration tests, and another with a branch filter for '^.*$' that runs unit tests.
D.Configure one trigger with no branch filter and rely on developers to manually trigger integration tests.
AnswerC

Correct: separate triggers with branch filters allow different pipelines per branch.

Why this answer

Option C is correct because Cloud Build triggers allow you to define separate triggers with branch filters to execute different build configurations based on the branch. By creating one trigger with a branch filter of '^main$' for integration tests and another with '^.*$' for unit tests, you ensure unit tests run on every push to any branch, while integration tests run only on merges to main. This approach directly maps the desired behavior without requiring conditional logic or manual intervention.
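How the two branch filters route pushes can be checked in Python. Cloud Build evaluates branch filters as RE2 regular expressions; these simple anchored patterns behave the same under Python's `re`. The trigger names are hypothetical.

```python
import re

# Trigger name -> branch filter regex, as in option C (names are hypothetical).
TRIGGERS = {
    "unit-tests-and-lint": r"^.*$",
    "integration-tests": r"^main$",
}

def triggers_for(branch: str) -> list:
    """Return the triggers whose branch filter matches the pushed branch."""
    return sorted(name for name, pattern in TRIGGERS.items()
                  if re.search(pattern, branch))

for branch in ("main", "feature/login", "maintenance"):
    print(branch, triggers_for(branch))
```

The anchors matter: without `^` and `$`, a pattern like `main` would also match `maintenance` and run integration tests on it.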

Exam trap

The trap here is that candidates mistakenly think a single trigger with conditional steps or substitution variables can handle branch-specific logic, but Cloud Build triggers are designed to be event-filtered at the trigger level, not at the build step level.

How to eliminate wrong answers

Option A is wrong because a substitution variable such as '_BRANCH' only supplies a value to the build; the single trigger still fires on every matching push, and the variable cannot decide which steps run. Option B is wrong because, although Cloud Build provides a built-in $BRANCH_NAME substitution, build configs have no native per-step condition; skipping integration tests would require scripting a branch check inside a step, which is more brittle and less visible than separate triggers with their own branch filters. Option D is wrong because relying on developers to manually trigger integration tests defeats the purpose of automation and introduces human error, violating CI/CD best practices.

334
MCQhard

A microservices application on GKE with Istio service mesh experienced performance degradation after a recent update. Which optimization technique is most effective for improving inter-service communication performance?

A.Increase Istio sidecar resource limits
B.Implement request collapsing to merge identical requests
C.Use Istio traffic mirroring to offload requests
D.Enable gRPC for inter-service communication
AnswerD

gRPC leverages HTTP/2 and binary serialization, reducing overhead and latency compared to JSON-based REST.

Why this answer

Option D is correct because gRPC uses HTTP/2 as its transport protocol, which enables multiplexed streams over a single TCP connection, reducing latency and improving throughput for inter-service communication. In a GKE environment with Istio, gRPC also leverages Istio's native support for HTTP/2-based traffic, allowing efficient load balancing and connection reuse. This directly addresses performance degradation caused by chatty or high-frequency service calls.
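Part of gRPC's advantage is compact binary encoding. The Python sketch below gives only a rough sense of the size gap by comparing a JSON object to a fixed binary layout packed with `struct`; it is not Protocol Buffers (which uses tagged, variable-length encoding), the message fields are hypothetical, and HTTP/2 multiplexing is not shown.

```python
import json
import struct

# One hypothetical "order line" message.
order_id, quantity, unit_price = 123456789, 3, 19.99

as_json = json.dumps({"order_id": order_id, "quantity": quantity,
                      "unit_price": unit_price}).encode()
# Fixed little-endian layout: uint64, uint32, float64 (8 + 4 + 8 bytes).
as_binary = struct.pack("<QId", order_id, quantity, unit_price)

print(len(as_json), "bytes as JSON vs", len(as_binary), "bytes packed")
```

JSON repeats field names in every message and spells numbers as text; binary formats replace both with compact encodings, which adds up for chatty service-to-service calls.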

Exam trap

Google Cloud often tests the misconception that adding resources (Option A) or deduplicating requests (Option B) is the primary fix for performance issues, when the bottleneck is often the communication protocol itself; in a service mesh, gRPC over HTTP/2 gets first-class support for load balancing and connection reuse.

How to eliminate wrong answers

Option A is wrong because increasing Istio sidecar resource limits only addresses resource contention, not the underlying inefficiency in inter-service communication protocols; it may mask symptoms without improving protocol-level performance. Option B is wrong because request collapsing merges identical requests at a proxy or cache layer, which is typically used for read-heavy workloads and does not optimize the communication protocol itself; it can introduce additional latency for dynamic service calls. Option C is wrong because traffic mirroring duplicates requests for testing or observability, not for offloading production traffic; it actually increases load on the system and degrades performance further.

335
MCQhard

A company is migrating from Jenkins to Cloud Build for their CI/CD pipeline. They have a large Java monorepo with multiple modules that take over 2 hours to build and test sequentially. They want to reduce build time by running module builds in parallel. The current Jenkins pipeline uses a single Jenkinsfile that builds all modules. They have a Cloud Build config that runs 'mvn clean package' for the entire project, which is slow. They have a 2-hour Cloud Build timeout. The architecture requires that some modules depend on others. Which approach should they take to minimize build time while correctly handling dependencies?

A.Break the monolith into separate Cloud Build triggers per module and run them independently on every push.
B.Create a single build config that defines parallel steps for independent modules, using 'waitFor' to sequence dependent modules, and uses Maven's incremental compilation with caching.
C.Use a build step that runs 'mvn -pl moduleA,moduleB -am' to build only changed modules and their dependencies.
D.Increase the Cloud Build timeout to 4 hours and keep a single build step.
AnswerB

This models the dependency graph and runs independent modules in parallel, plus caching speeds up subsequent builds.

Why this answer

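Option B is correct: using Cloud Build's 'waitFor' to model the module dependency graph lets independent modules build in parallel while dependent modules wait only for the modules they need, and caching Maven artifacts between builds speeds up later runs. Option A is incorrect because separate per-module triggers that run independently ignore inter-module dependencies, so dependent modules can build against stale or missing artifacts. Option C is incorrect because 'mvn -pl moduleA,moduleB -am' narrows what is built but still runs as a single step, so it does not use Cloud Build's step parallelism. Option D is incorrect because a longer timeout with a single build step is essentially what they have now and does nothing to shorten the build.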
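In a Cloud Build config, each step can carry an `id` and a `waitFor` list of step ids, and `waitFor: ['-']` starts a step as soon as the build begins. The Python sketch below computes the resulting schedule for a hypothetical module graph, assuming unlimited parallelism; in practice, parallel steps share one worker VM, so the real speed-up depends on the machine type.

```python
def finish_times(durations, deps):
    """Earliest finish time of each step when a step starts as soon as all
    steps it waits for have finished (unlimited parallelism assumed)."""
    finish = {}

    def visit(step, path=()):
        if step in finish:
            return finish[step]
        if step in path:
            raise ValueError(f"dependency cycle through {step}")
        start = max((visit(d, path + (step,)) for d in deps.get(step, [])), default=0)
        finish[step] = start + durations[step]
        return finish[step]

    for step in durations:
        visit(step)
    return finish

# Hypothetical module build times (minutes) and waitFor relationships.
durations = {"common": 30, "api": 40, "web": 50, "batch": 20}
deps = {"api": ["common"], "web": ["common"]}  # batch waits for nothing ('-')

times = finish_times(durations, deps)
print("parallel:", max(times.values()), "min; sequential:", sum(durations.values()), "min")
```

The parallel build takes as long as its longest dependency chain (common, then web) instead of the sum of all modules.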

336
MCQmedium

A multinational corporation has multiple development teams working on microservices deployed to GKE clusters. They want to implement a CI/CD pipeline that ensures every container image is scanned for vulnerabilities, passes unit tests, and gets a security approval before deployment to production. They are using Cloud Build for CI and Cloud Deploy for CD. The current pipeline triggers on code push to any branch. The security team requires that all production deployments be reviewed and approved by the security team. Which set of actions best meets these requirements?

A.Run all tests and scans in a single Cloud Build step and use Cloud Build's built-in approval feature to require a reviewer before pushing to Artifact Registry.
B.Run vulnerability scans in the Cloud Build step before building the image, and add a security team member to the project as an editor to approve deployments.
C.Configure Cloud Build triggers only for the main branch. Use Cloud Build to build and push images, then rely on Artifact Registry's automatic Container Analysis scanning. In Cloud Deploy, add a manual approval gate for the production phase.
D.Use Cloud Build to run tests and scans, then have Cloud Build send a notification to a Cloud Pub/Sub topic that triggers a Cloud Function to approve the deployment.
AnswerC

This meets all requirements: scanning, tests in Cloud Build, and approval in Cloud Deploy.

Why this answer

Option C is correct: restricting Cloud Build triggers to the main branch avoids unnecessary builds, Container Analysis automatically scans images when they are pushed to Artifact Registry, and Cloud Deploy can require a manual approval before the production phase. Option B is incorrect because scanning before the image is built cannot catch vulnerabilities introduced during the build, and granting a security team member the Editor role does not create an approval gate. Option A is incorrect because an approval in Cloud Build gates the build, not the production rollout; security review of production deployments is a CD responsibility handled in Cloud Deploy.

Option D is incorrect because a Cloud Function that approves deployments automatically removes the human security review the security team requires.

337
Multi-Selectmedium

Which TWO are benefits of using Cloud Build private pools?

Select 2 answers
A.Lower cost
B.Dedicated VMs for builds
C.Custom machine types
D.No internet access
E.Faster builds compared to public pools
AnswersB, C

Private pools use VMs not shared with other projects.

Why this answer

Option B is correct because Cloud Build private pools provide dedicated VMs that are not shared with other Google Cloud projects. This isolation ensures consistent performance and eliminates the 'noisy neighbor' effect that can occur in public pools, where build resources are shared across multiple tenants.

Exam trap

Google Cloud often tests the misconception that private pools are always faster or cheaper than public pools, but the real benefits are isolation, custom machine types, and network control, not performance or cost.

338
MCQeasy

A company uses BigQuery on-demand pricing. To control costs, they want to prevent any single query from scanning more than 1 TB of data. How can they enforce this?

A.Set a custom cost budget with alert at 1 TB
B.Use the maximum bytes billed parameter in the query settings
C.Use BigQuery reservations with 1 TB slot capacity
D.Set a query quota in the GCP Console quota page
AnswerB

This parameter limits the amount of data a query can scan.

Why this answer

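Option B is correct because BigQuery lets you set a maximum bytes billed limit on a query; a query that would exceed the limit fails without incurring a charge. Option C is incorrect because reservations purchase slot capacity for capacity-based pricing; slots are not measured in bytes and do not cap a single on-demand query. Option D is incorrect because BigQuery custom quotas limit total bytes processed per day for a project or user, not the bytes scanned by a single query.

Option A is incorrect because a budget alert only sends notifications; it does not stop a query from running.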
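The behavior of the limit can be modeled as follows: the bytes a query would bill are compared against the cap, and the query is rejected, without charge, if the cap would be exceeded. This is a simplified stdlib simulation; the exception class and the `run_query` helper are hypothetical. With the real tools, the cap is passed as `bq query --maximum_bytes_billed=...` or as `QueryJobConfig(maximum_bytes_billed=...)` in the BigQuery Python client.

```python
TB = 10**12  # 1 TB as a decimal terabyte; use 2**40 if you mean 1 TiB


class BytesBilledLimitExceeded(Exception):
    """Hypothetical stand-in for the error BigQuery returns."""


def run_query(estimated_bytes: int, maximum_bytes_billed: int) -> int:
    """Return the bytes billed, or raise before anything is billed."""
    if estimated_bytes > maximum_bytes_billed:
        raise BytesBilledLimitExceeded(
            f"query would bill {estimated_bytes} bytes, limit is {maximum_bytes_billed}"
        )
    return estimated_bytes


print(run_query(300 * 10**9, TB))  # a 300 GB scan is allowed
try:
    run_query(2 * TB, TB)  # a 2 TB scan is rejected
except BytesBilledLimitExceeded as err:
    print("rejected:", err)
```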

339
Multi-Selecthard

A company uses Cloud Monitoring to set up alerting for their production system. They want to reduce alert fatigue while ensuring critical issues are caught quickly. Which two strategies should they implement? (Select TWO)

Select 2 answers
A.Use notification channels with escalation policies
B.Use low threshold values to catch issues early
C.Implement alert aggregation and deduplication
D.Disable alerts during off-hours
E.Set up separate alerts for each microservice
AnswersA, C

Ensures the right people are notified and issues are escalated.

Why this answer

Option A is correct because notification channels with escalation policies ensure that alerts are routed to the appropriate responders based on severity and time thresholds, reducing noise by preventing low-severity issues from repeatedly notifying the same person. Escalation policies automatically escalate unacknowledged critical alerts to higher-level teams, ensuring critical issues are caught quickly without overwhelming on-call staff.
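Deduplication, the other half of the correct answer, can be sketched as collapsing repeated alerts that share a key (here, policy name plus resource) within a time window, so responders receive one notification per ongoing problem instead of one per evaluation. The event tuples, the key choice, and the 300-second window are hypothetical illustration values.

```python
def deduplicate(events, window_s=300):
    """Keep the first alert per (policy, resource) key and drop repeats that
    arrive within window_s seconds of the last notified alert for that key."""
    last_sent = {}
    notified = []
    for ts, policy, resource in sorted(events):
        key = (policy, resource)
        if key not in last_sent or ts - last_sent[key] >= window_s:
            notified.append((ts, policy, resource))
            last_sent[key] = ts
    return notified


events = [
    (0, "high-cpu", "vm-1"),
    (60, "high-cpu", "vm-1"),   # duplicate within the window: suppressed
    (90, "high-cpu", "vm-2"),   # different resource: sent
    (400, "high-cpu", "vm-1"),  # outside the window: sent again
]
for event in deduplicate(events):
    print(event)
```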

Exam trap

Google Cloud often tests the misconception that lowering thresholds or disabling alerts improves fatigue, when in fact these actions either increase noise or create dangerous gaps in coverage; the correct approach is to use escalation policies and aggregation to intelligently manage alert volume.

340
MCQmedium

After a recent deployment, the mean latency of a user-facing service increased from 200ms to 500ms. The engineer uses Cloud Trace to analyze traces. Which trace characteristic should the engineer focus on to identify the bottleneck?

A.Timestamps of the trace ID.
B.Distribution of span latencies across services.
C.Error count per span.
D.Total number of spans in the trace.
AnswerB

Span latencies show how long each service took, pinpointing the slowest.

Why this answer

The engineer should focus on the distribution of span latencies across services (Option B) because Cloud Trace captures the latency of each span in a distributed trace. By examining the histogram or distribution of span latencies, the engineer can identify which specific service or component contributes most to the overall increase from 200ms to 500ms. In distributed tracing, end-to-end latency is governed by the critical path, the chain of spans the request must wait on, so the span whose latency grew the most is the first place to look.
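A minimal sketch of that comparison, assuming span latencies have already been exported from Cloud Trace and grouped by service (the services and numbers below are hypothetical, and the spans are assumed to run sequentially so their medians roughly add up to the 200ms and 500ms totals): compare each service's median span latency before and after the deployment and rank the increases.

```python
from statistics import median


def latency_regression(before, after):
    """Return (service, increase in median span latency in ms), largest first."""
    deltas = {}
    for service, samples in after.items():
        deltas[service] = median(samples) - median(before.get(service, [0]))
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical span latencies (ms) per service, sampled from traces
before = {"frontend": [20, 22, 21], "cart": [80, 85, 82], "db": [90, 95, 92]}
after = {"frontend": [21, 22, 20], "cart": [81, 84, 83], "db": [380, 395, 390]}
print(latency_regression(before, after))
```

Here the database span accounts for almost all of the regression, which is exactly the signal the span-latency distribution surfaces and that error counts or span counts would miss.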

Exam trap

Google Cloud often tests the misconception that timestamps or error counts are the primary indicators of performance bottlenecks, but in distributed tracing, the distribution of span latencies is the key to identifying which service is the root cause of increased latency.

How to eliminate wrong answers

Option A is wrong because timestamps of the trace ID only indicate when the trace started and ended, not the relative performance of individual services; they cannot reveal which service caused the latency increase. Option C is wrong because error count per span focuses on failures, not performance degradation; a service can have zero errors yet still be the bottleneck due to high latency. Option D is wrong because the total number of spans in the trace reflects the complexity or depth of the request path, not the latency contribution of any single service; a trace with many spans can still have a single slow span causing the bottleneck.

341
MCQhard

A DevOps team is using Cloud Build to build and push container images. The build times have increased significantly. They suspect that the build cache is not being used effectively. Which build configuration change would likely improve cache usage?

A.Increase the machine type
B.Use a private pool
C.Use kaniko instead of Docker
D.Enable parallel builds
AnswerC

Kaniko leverages fine-grained layer caching, reducing rebuild time.

Why this answer

Kaniko is a cache-aware container image builder that can leverage a remote image registry as a cache layer, unlike the default Docker builder which relies on a local Docker daemon and its local layer cache. By using Kaniko with a configured cache repository, the team can reuse previously built layers across builds, even when builds run on different Cloud Build workers, significantly reducing build times.
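Why a persistent cache matters can be shown with a simplified model of layer caching: each layer's cache key is derived from the previous layer's key plus the instruction that creates it, so changing a late instruction still reuses every earlier layer, provided the cache survives between builds. The set standing in for a remote cache repository is an assumption of this sketch, and real Kaniko cache keys also account for the files copied into a layer. In Cloud Build, remote caching is enabled with Kaniko's `--cache=true` flag on the `gcr.io/kaniko-project/executor` step.

```python
import hashlib


def layer_keys(instructions):
    """Chain hashes so each layer key depends on every instruction before it."""
    keys, prev = [], ""
    for instr in instructions:
        prev = hashlib.sha256((prev + instr).encode()).hexdigest()
        keys.append(prev)
    return keys


def build(instructions, cache):
    """Return (hits, misses); `cache` stands in for a remote cache repository."""
    hits = misses = 0
    for key in layer_keys(instructions):
        if key in cache:
            hits += 1
        else:
            misses += 1
            cache.add(key)  # push the newly built layer to the cache
    return hits, misses


v1 = ["FROM python:3.12", "COPY requirements.txt .",
      "RUN pip install -r requirements.txt", "COPY app.py [v1 contents]"]
v2 = v1[:-1] + ["COPY app.py [v2 contents]"]  # only the app code changed

remote = set()
print(build(v1, remote))  # first build: every layer is built
print(build(v2, remote))  # persistent cache: only the last layer is rebuilt
print(build(v2, set()))   # ephemeral worker with an empty local cache: full rebuild
```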

Exam trap

The trap here is that candidates often assume 'more resources' (larger machine) or 'dedicated resources' (private pool) will fix caching issues, when in fact the problem is the ephemeral nature of the build environment and the need for a persistent, remote cache mechanism like Kaniko provides.

How to eliminate wrong answers

Option A is wrong because increasing the machine type (e.g., from e2-standard-2 to e2-highcpu-8) provides more CPU and memory, which can speed up the build execution but does not address the root cause of cache misses or ineffective cache usage. Option B is wrong because using a private pool provides dedicated compute resources and reduces contention, but it does not change the caching mechanism; the Docker builder still uses a local cache that is not persisted across builds. Option D is wrong because enabling parallel builds runs multiple build steps concurrently, which can reduce overall wall-clock time but does not improve cache hit rates; it may even cause cache conflicts if steps share dependencies.

342
MCQeasy

Your organization requires that all new Google Cloud projects are automatically configured with a common set of VPC networks and subnets, and that these networks must be created before any resources are deployed. What is the best approach to enforce this requirement across the organization?

A.Create a Cloud Deployment Manager template and share it with all project owners.
B.Use Organization Policies with a custom constraint to enforce that all projects must have a specific VPC network configuration.
C.Set up VPC Network Peering between all projects to enforce network connectivity.
D.Configure a shared VPC host project and attach all new service projects to it.
AnswerB

Organization Policies can enforce requirements across all projects in the organization.

Why this answer

Organization Policies with custom constraints allow you to enforce that all new projects automatically include specific VPC networks and subnets before any resources are deployed. This is the only approach that provides mandatory, organization-wide enforcement at the project creation level, ensuring compliance without relying on manual templates or post-creation configuration.

Exam trap

The trap here is that candidates often confuse 'enforcing a configuration' with 'providing a tool or connectivity'—they choose Shared VPC or Deployment Manager because those are common networking or automation tools, but they fail to recognize that only Organization Policies can mandate the presence of specific resources at project bootstrap time.

How to eliminate wrong answers

Option A is wrong because Cloud Deployment Manager templates are not enforceable; sharing a template relies on project owners to manually apply it, which does not guarantee automatic or mandatory configuration. Option C is wrong because VPC Network Peering only establishes connectivity between existing VPCs, it does not create or enforce the presence of specific VPC networks or subnets in new projects. Option D is wrong because Shared VPC attaches service projects to a host project but does not automatically create the required VPC networks and subnets in each new project; it only provides network access from the host project.

343
MCQeasy

A startup wants to reduce their Google Cloud costs for a batch processing job that runs nightly for 3 hours. The job is fault-tolerant and can tolerate interruptions. What is the most cost-effective compute option?

A.Standard VM
B.Shielded VM
C.Sole-tenant node
D.Preemptible VM
AnswerD

Preemptible VMs cost 60-91% less than standard VMs and are ideal for fault-tolerant workloads.

Why this answer

Option D is correct because preemptible VMs are discounted 60-91% relative to standard VMs and suit fault-tolerant batch jobs; a preemptible VM runs for at most 24 hours, which comfortably covers a 3-hour job. Option B, Shielded VM, adds security features but no cost savings. Option C, Sole-tenant node, provides hardware isolation and costs more.

Option A, Standard VM, is more expensive than a preemptible VM.
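A back-of-the-envelope comparison, using a hypothetical on-demand hourly rate (not a published price) and the low end of the 60-91% discount range:

```python
def nightly_cost(hourly_rate, hours, discount=0.0):
    """Cost of one run; `discount` is the fraction saved versus on-demand."""
    return hourly_rate * hours * (1 - discount)


RATE = 0.20  # hypothetical on-demand $/hour for the chosen machine type
standard = nightly_cost(RATE, 3)
preemptible = nightly_cost(RATE, 3, discount=0.60)
print(f"standard ${standard:.2f}/night, preemptible ${preemptible:.2f}/night")
print(f"yearly saving: ${(standard - preemptible) * 365:.2f}")
```

In practice a preempted run has to be retried or resumed from a checkpoint, so the effective saving depends on how often preemption actually happens during the 3-hour window.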

344
Multi-Selectmedium

You are responding to an incident where a new release has caused increased error rates. Which TWO actions should you take immediately?

Select 2 answers
A.Disable the alert.
B.Notify stakeholders.
C.Push a hotfix without testing.
D.Roll back the release.
E.Create a post-mortem document.
AnswersB, D

Keeping stakeholders informed is critical during an incident.

Why this answer

Option B is correct because immediately notifying stakeholders (such as product owners, support teams, and affected users) is a critical first step in incident management. It ensures transparency, sets expectations, and allows a coordinated response. In Google's SRE incident-management practice, stakeholder communication is a priority because it maintains trust and aligns the business response with the technical remediation.

Exam trap

Google Cloud often tests the distinction between immediate response actions (rollback, stakeholder notification), post-incident tasks (post-mortem), and harmful actions (disabling alerts, untested hotfixes) to see whether candidates prioritize stopping user impact over documentation or process.

345
MCQhard

Your team manages a CI/CD pipeline for a microservices application deployed on Google Kubernetes Engine (GKE). The pipeline uses Cloud Build to build container images and push them to Artifact Registry, then uses a Cloud Build step with kubectl to apply Kubernetes manifests stored in a separate 'manifests' repository. Recently, the team has experienced issues: sometimes a new image is deployed to production even though the corresponding pull request (PR) has not been merged into the main branch of the manifests repository. Also, rollbacks are slow because the previous image tag is overwritten. The team wants to ensure that only code that passes all tests and is merged to main is deployed, and that each deployment uses a unique immutable image tag. What should the team do?

A.Keep the current architecture but modify Cloud Build triggers to only run on the main branch of both repositories. Use the short SHA ($SHORT_SHA) as the image tag.
B.Consolidate application code and Kubernetes manifests into a single repository. Configure Cloud Build triggers to build and run tests on all branches, but only deploy to GKE when changes are merged to the main branch. Use the full commit SHA as the image tag.
C.Move all source code and manifests into a single repository. Use Cloud Build triggers to build and test on every push, and deploy only on pushes to the main branch. Use the commit SHA ($COMMIT_SHA) as the image tag.
D.Keep application and manifests in separate repositories. Use Cloud Build triggers to build on changes to the app repo, and use a separate trigger on the manifests repo to deploy. Use the 'latest' tag for the image.
AnswerB

This ensures that only merged code triggers deployments, and the full commit SHA provides an immutable unique tag for easy rollback.

Why this answer

Option B is correct because consolidating the application code and Kubernetes manifests into a single repository ensures that the image tag (full commit SHA) is uniquely tied to the exact code and manifest changes that passed all tests. By configuring Cloud Build triggers to deploy only on merges to the main branch, the team guarantees that only fully tested, merged code reaches production. Using the full commit SHA as the image tag provides immutability and enables fast, precise rollbacks by referencing the exact image from Artifact Registry.
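The tagging-and-rollback scheme can be sketched as follows: every build of main produces an image reference tagged with the full 40-character commit SHA, and a rollback redeploys the previous reference from the deployment history instead of rebuilding anything. The registry path and the in-memory history list are hypothetical; in Cloud Build the SHA would come from the `$COMMIT_SHA` substitution.

```python
import re

REPO = "us-docker.pkg.dev/my-project/apps/web"  # hypothetical Artifact Registry path
SHA_RE = re.compile(r"[0-9a-f]{40}")  # full SHA-1 commit hash


def image_ref(commit_sha: str) -> str:
    """Build an immutable image reference from a full commit SHA."""
    if not SHA_RE.fullmatch(commit_sha):
        raise ValueError(f"not a full commit SHA: {commit_sha!r}")
    return f"{REPO}:{commit_sha}"


def rollback_target(history: list[str]) -> str:
    """Return the image that was deployed before the current one."""
    if len(history) < 2:
        raise ValueError("no previous deployment to roll back to")
    return history[-2]


history = [image_ref("a" * 40), image_ref("b" * 40)]  # oldest first
print(rollback_target(history))
```

Because each tag names exactly one commit, rolling back is a lookup, and a mutable tag such as 'latest' is rejected outright.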

Exam trap

Google Cloud often tests the misconception that separate repositories with branch-based triggers are sufficient for deployment integrity, when in reality the atomicity of code and manifest changes in a single repository is required to prevent untested code from reaching production.

How to eliminate wrong answers

Option A is wrong because keeping separate repositories with triggers on the main branch of both does not solve the root cause: a PR merged into the manifests repo could reference an image tag (short SHA) that was built from unmerged app code, leading to deployment of untested code. Option C is wrong because deploying on every push to main (rather than only on merges) could still deploy code that hasn't passed all tests if the trigger is misconfigured or if tests are run in parallel; also, using $COMMIT_SHA is correct but the trigger condition is insufficiently strict. Option D is wrong because using the 'latest' tag violates immutability and makes rollbacks impossible, and separate repositories with separate triggers do not enforce the atomicity of code and manifest changes, allowing mismatched deployments.

346
MCQhard

An organization is using Cloud Source Repositories and wants to enforce that all commits are signed with a verified GPG key. How can they enforce this?

A.Use a branch protection rule in Cloud Source Repositories.
B.Use Cloud Functions to validate commits after push.
C.Enable the Signed Commits policy in the repository settings.
D.Use a pre-receive hook in Cloud Source Repositories.
AnswerC

Native feature to require GPG-signed commits.

Why this answer

Option C is correct because Cloud Source Repositories provides a built-in 'Signed Commits' policy in the repository settings that, when enabled, rejects any push containing commits that are not signed with a verified GPG key. This policy is enforced server-side at the repository level, ensuring that only signed commits are accepted without requiring external tools or custom scripts.

Exam trap

The trap here is that candidates confuse branch protection rules (which control merge behavior) with commit signing enforcement, or assume pre-receive hooks are available in Cloud Source Repositories when they are not supported in this managed service.

How to eliminate wrong answers

Option A is wrong because branch protection rules in Cloud Source Repositories control merge requirements (e.g., required reviews, status checks) but do not enforce commit signing; they operate on pull request merges, not on individual commits pushed directly. Option B is wrong because Cloud Functions can validate commits after push, but this is an asynchronous, post-hoc approach that cannot prevent the push from being accepted; the commits would already be in the repository, violating the enforcement requirement. Option D is wrong because Cloud Source Repositories does not support pre-receive hooks; this feature is available in self-managed Git servers (e.g., GitHub Enterprise, GitLab) but not in Google Cloud's managed repository service.

347
MCQmedium

An alerting policy triggers frequently for a spike in CPU utilization on a Compute Engine instance, but the spike lasts only a few seconds. The SRE team wants to reduce false positives. Which change should they make?

A.Increase the notification channel threshold.
B.Decrease the alerting duration to 0s.
C.Increase the evaluation period and duration.
D.Change the aggregation to mean instead of max.
AnswerC

Longer evaluation period and duration require the condition to persist, reducing false positives from short-lived spikes.

Why this answer

Option C is correct because increasing the evaluation period and duration ensures that the alerting policy fires only when high CPU utilization persists over a longer window, filtering out transient spikes that last a few seconds. In Cloud Monitoring terms, the alignment period sets the window over which data points are aggregated, and the duration sets how long the condition must hold before the policy fires; requiring sustained high utilization directly reduces false positives.
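The effect of the duration setting can be shown with a small simulation in which the condition fires only when every aligned data point across the duration exceeds the threshold. The 60-second sampling period, 80% threshold, and utilization data are hypothetical.

```python
def fires(samples, threshold, duration_s, period_s=60):
    """True if some run of consecutive samples spanning at least duration_s
    is entirely above threshold. Samples are evenly spaced, period_s apart."""
    needed = max(1, duration_s // period_s)
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= needed:
            return True
    return False


spike = [30, 35, 95, 32, 30, 31]      # one short spike
sustained = [30, 85, 90, 92, 88, 35]  # four minutes above 80%
print(fires(spike, 80, duration_s=0))        # any single point fires
print(fires(spike, 80, duration_s=180))      # the short spike is ignored
print(fires(sustained, 80, duration_s=180))  # sustained load still alerts
```

This also shows why Option B backfires: a duration of 0s turns every momentary spike into an alert.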

Exam trap

Google Cloud often tests the misconception that reducing the duration or changing aggregation will solve false positives, but the correct approach is to increase the evaluation window to require sustained anomalous behavior, not to react to every momentary deviation.

How to eliminate wrong answers

Option A is wrong because notification channels have no threshold; a channel only defines where and how notifications are delivered, so changing it does not affect when the condition fires. Option B is wrong because decreasing the alerting duration to 0s would cause the policy to trigger on any single data point, making false positives worse by reacting to every momentary spike. Option D is wrong because changing the aggregation to mean instead of max would smooth out spikes, potentially masking real issues, and still does not address the transient nature of the spike; the mean could still be elevated if the spike is high enough, but the core problem is the short duration, not the aggregation method.

348
MCQmedium

A team uses Cloud Build with a trigger on Cloud Source Repository. The build fails intermittently with error 'Failed to pull builder image 'gcr.io/cloud-builders/gcloud'' but sometimes succeeds. What is the most likely cause?

A.The Cloud Build worker pool is in a different region.
B.The builder image is too large.
C.Network egress from Cloud Build is throttled due to high concurrency.
D.The build service account lacks permissions to access Container Registry.
AnswerC

When many builds run concurrently, Cloud Build may throttle egress, causing timeouts pulling images. Reducing concurrency or using a private pool can resolve this.

Why this answer

The intermittent failure to pull the builder image 'gcr.io/cloud-builders/gcloud' indicates a transient network issue rather than a permanent misconfiguration. Cloud Build uses a shared pool of network resources, and under high concurrency, egress traffic to Container Registry can be throttled, causing pull operations to time out or fail. This explains why the build sometimes succeeds and sometimes fails, as throttling depends on the current load.

Exam trap

Google Cloud often tests the distinction between consistent misconfiguration errors (e.g., permissions, region) and transient network throttling issues, where the 'intermittent' keyword is the critical hint to choose throttling over permanent configuration problems.

How to eliminate wrong answers

Option A is wrong because Cloud Build worker pools are regional resources, but the region does not affect the ability to pull a public image from gcr.io; the error is intermittent, not a permanent region mismatch. Option B is wrong because the size of the builder image is not the cause of intermittent failures; if the image were too large, it would consistently fail or time out, not succeed sometimes. Option D is wrong because if the build service account lacked permissions to access Container Registry, the failure would be consistent (e.g., a 403 Forbidden error), not intermittent.

349
MCQhard

Refer to the exhibit. A DevOps engineer is trying to create a new project using the Cloud Console. The project creation fails with a policy violation. The engineer has permissions on folders/12345678 and folders/87654321 but not on any other folders. They select folders/87654321 as the parent. What is the most likely reason for the failure?

A.The engineer is missing the resourcemanager.projects.create permission.
B.The policy is enforced at the organization level but the engineer's IAM role does not allow creating projects in that folder.
C.The policy is set at the folder level, and folders/87654321 has a different policy.
D.The policy requires the project parent to be one of the allowed folders, and folders/87654321 is not listed.
AnswerD

The allowedValues list only includes folders/12345678.

Why this answer

Option D is correct because the policy violation indicates that an organization policy restricts which folders can be used as project parents. The engineer selected folders/87654321, but the policy's allowed values list only folders/12345678. A list constraint like this restricts project creation to approved folders regardless of the engineer's IAM permissions on the selected folder.
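The difference between the IAM check and the organization policy check can be sketched as two independent gates that must both pass. The policy dictionary mirrors the shape of a list constraint's allowedValues, but the evaluation function and the error strings are hypothetical illustrations, not the Resource Manager implementation.

```python
def can_create_project(parent, iam_parents, policy):
    """Return (allowed, reason); IAM and org policy are evaluated separately."""
    if parent not in iam_parents:
        return False, "PERMISSION_DENIED: missing resourcemanager.projects.create"
    if parent not in policy["allowedValues"]:
        return False, f"policy violation: {parent} is not an allowed parent"
    return True, "ok"


iam_parents = {"folders/12345678", "folders/87654321"}  # where the engineer holds the role
policy = {"allowedValues": ["folders/12345678"]}        # hypothetical list constraint
print(can_create_project("folders/87654321", iam_parents, policy))
```

The engineer passes the IAM gate for folders/87654321 but fails the policy gate, which is why the error reads as a policy violation rather than a permission error.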

Exam trap

Google Cloud often tests the distinction between IAM permission errors (e.g., missing resourcemanager.projects.create) and organization policy constraint violations (e.g., list constraints), where candidates mistakenly attribute the failure to missing IAM roles rather than a policy that restricts allowed parent resources.

How to eliminate wrong answers

Option A is wrong because the error is a 'policy violation', not a permissions error; a missing resourcemanager.projects.create permission would produce a 'permission denied' error instead. Option B is wrong because the engineer does have permissions on folders/87654321, so IAM is not what blocks the request; an IAM gap would also surface as a permission error, not a policy violation. Option C is wrong because nothing points to a separate folder-level policy: the policy lists the allowed parent folders, and folders/87654321 is simply absent from that list.

350
MCQmedium

Refer to the exhibit. The build fails with error: 'invalid tag format' for the image. What is the issue?

A.The project ID substitution is not present.
B.The substitution $SHORT_SHA is not defined and the tag becomes empty.
C.The build step must explicitly push the image.
D.The image name must include a tag.
AnswerB

If $SHORT_SHA is empty, the image name ends in 'myimage:', which has no valid tag. $SHORT_SHA is populated automatically only for builds started by a trigger, so a manually submitted build leaves it empty.

Why this answer

Option B is correct because $SHORT_SHA is empty when it is not populated, which produces an invalid tag. Option D is incorrect because a tag is already provided. Option C is incorrect because listing the image in the images array makes Cloud Build push it automatically.

Option A is irrelevant: the error concerns the tag, not the project ID in the image path.
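The failure can be reproduced outside Cloud Build. Docker's image reference grammar allows a tag of 1 to 128 characters matching `[\w][\w.-]{0,127}`, so an empty `$SHORT_SHA` leaves a trailing colon and no valid tag. The `expand` helper below is a simplified, hypothetical stand-in for Cloud Build's substitution step, and the project ID and SHA values are made up.

```python
import re

# Tag grammar from Docker's image reference spec: 1-128 chars, no leading '.' or '-'
TAG_RE = re.compile(r"[\w][\w.-]{0,127}")


def expand(image: str, substitutions: dict[str, str]) -> str:
    """Hypothetical stand-in for substitution: undefined $VARS expand to ''."""
    return re.sub(r"\$(\w+)", lambda m: substitutions.get(m.group(1), ""), image)


def tag_is_valid(image: str) -> bool:
    """Validate the tag after the last ':'; a ':' before a '/' is a registry port."""
    _, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:
        return True  # no tag at all: Docker falls back to ':latest'
    return TAG_RE.fullmatch(tag) is not None


template = "gcr.io/$PROJECT_ID/myimage:$SHORT_SHA"
trigger_build = expand(template, {"PROJECT_ID": "my-project", "SHORT_SHA": "abc1234"})
manual_build = expand(template, {"PROJECT_ID": "my-project"})  # no SHORT_SHA
print(trigger_build, tag_is_valid(trigger_build))
print(manual_build, tag_is_valid(manual_build))
```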

351
MCQmedium

A company is migrating a batch processing workload to Google Cloud. The workload is CPU-intensive and runs for a few hours each day. Which Compute Engine machine family should they choose to optimize performance and cost?

A.Compute-optimized (C2/C2D)
B.Burstable (T2D)
C.GPU-accelerated (A2)
D.Memory-optimized (M2)
AnswerA

C2 family offers high CPU performance per core, ideal for batch processing.

Why this answer

A is correct because the workload is CPU-intensive and runs for a few hours each day, which is a sustained, compute-bound task. Compute-optimized machines (C2/C2D) are designed for high-performance computing (HPC) and batch processing and offer high per-core performance. Shorter runtimes reduce the cost per job compared to general-purpose or burstable families.
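Why a pricier machine can still be cheaper per job is simple arithmetic: cost per run is the hourly rate times the runtime, and faster cores shorten the runtime. Both hourly rates and the 1.4x speedup below are hypothetical placeholders, not published prices or benchmarks.

```python
def cost_per_run(hourly_rate, baseline_hours, speedup=1.0):
    """Cost of one run on a machine that finishes `speedup` times faster."""
    return hourly_rate * baseline_hours / speedup


general = cost_per_run(hourly_rate=1.00, baseline_hours=4)                   # hypothetical
compute_opt = cost_per_run(hourly_rate=1.15, baseline_hours=4, speedup=1.4)  # hypothetical
print(f"general-purpose ${general:.2f}, compute-optimized ${compute_opt:.2f}")
```

The faster machine wins only when its speedup exceeds its price ratio (here 1.4 versus 1.15), which is why this has to be measured for the actual workload.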

Exam trap

The trap here is that candidates often confuse 'cost optimization' with choosing the cheapest machine family (burstable), ignoring that sustained CPU-intensive workloads on burstable instances incur performance throttling and longer runtimes, ultimately increasing total cost of ownership (TCO).

How to eliminate wrong answers

Option B is wrong because Burstable (T2D) machines are designed for workloads with low average CPU utilization and occasional spikes, not sustained CPU-intensive batch processing; they throttle performance when credits are exhausted, leading to unpredictable job completion times. Option C is wrong because GPU-accelerated (A2) machines are optimized for parallel processing workloads like ML training and 3D rendering, not general CPU-intensive batch jobs, and would incur unnecessary cost for unused GPU resources. Option D is wrong because Memory-optimized (M2) machines are designed for memory-intensive workloads such as large in-memory databases and SAP HANA, not CPU-bound tasks, and their higher memory-to-vCPU ratio would waste resources and increase cost.

352
Multi-Selecteasy

A company is moving large amounts of data between regions. Which two actions can reduce network egress costs? (Choose two.)

Select 2 answers
A.Consolidate resources to a single region
B.Use Cloud CDN to cache static content
C.Use Cloud Interconnect to connect to on-premises
D.Use a VPN connection to on-premises
E.Use VPC Network Peering within the same VPC network
AnswersA, B

Moving all resources to one region eliminates cross-region egress costs entirely.

Why this answer

Options A and B are correct. Consolidating resources in a single region eliminates cross-region egress, and Cloud CDN reduces egress from the origin by serving cached content from edge locations.

Options C, D, and E do not reduce inter-region egress costs: Cloud Interconnect mainly lowers the cost of egress to on-premises, a VPN still sends traffic over the public internet, and VPC Network Peering connects two separate VPC networks (not parts of the same one) without changing cross-region egress pricing.
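The two effects can be estimated separately. The per-GB rates and the 80% cache hit ratio below are hypothetical placeholders, not Google Cloud list prices, and the model ignores Cloud CDN cache-fill and request charges.

```python
RATES = {"cross_region": 0.02, "internet": 0.12, "cdn": 0.08}  # hypothetical $/GB


def cross_region_cost(gb, same_region):
    """Cross-region transfer cost; zero once producer and consumer share a region."""
    return 0.0 if same_region else gb * RATES["cross_region"]


def delivery_cost(total_gb, cache_hit_ratio=0.0):
    """Delivery cost: cache hits are served from CDN edges, misses from the origin."""
    hits = total_gb * cache_hit_ratio
    misses = total_gb - hits
    return hits * RATES["cdn"] + misses * RATES["internet"]


print(cross_region_cost(10_000, same_region=False), cross_region_cost(10_000, same_region=True))
print(delivery_cost(10_000), delivery_cost(10_000, cache_hit_ratio=0.8))
```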

353
MCQmedium

A company runs a production web application on Google Compute Engine behind an HTTP(S) load balancer. The application is deployed across multiple managed instance groups in three regions (us-east1, europe-west1, asia-east1). Recently, users report slow page load times. Monitoring shows that CPU utilization on instances is consistently low (around 30%) but memory usage is high (over 80%). The application uses a self-managed in-memory cache per instance to store session data and frequently accessed objects. The team is considering adding more instances to the instance groups to distribute the load. However, they notice that the load balancer's latency is spiking and the cache hit ratio is low. What is the most likely issue and what should the engineer do?

A.Add more instances to the instance groups to increase total memory capacity.
B.Migrate to a managed in-memory cache like Memorystore for Redis to serve as a centralized cache shared by all instances.
C.Increase the machine type of instances to have more memory per instance (e.g., n1-highmem-4).
D.Enable autoscaling based on memory utilization.
AnswerB

A centralized cache eliminates duplication, reduces per-instance memory pressure, and improves cache hit ratio, reducing latency.

Why this answer

The low cache hit ratio and high memory usage indicate that each instance's self-managed in-memory cache is fragmented and inefficient: session data and frequently accessed objects are duplicated across instances instead of shared, and a request that lands on a different instance misses the cache. Those misses force instances to fetch data from the backend repeatedly, which shows up as latency spikes at the load balancer. Migrating to a centralized managed cache like Memorystore for Redis eliminates per-instance duplication, improves the cache hit ratio, and lowers request latency by serving data from a single, consistent cache.
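The fragmentation effect can be reproduced with a toy simulation: the same request stream is spread round-robin across several instances, each with its own cache, and then served from one shared cache. The key stream and instance count are hypothetical, and the caches are unbounded so that only the effect of sharing is measured.

```python
def hit_ratio(requests, instances, shared):
    """Fraction of requests served from cache under round-robin load balancing."""
    caches = [set()] if shared else [set() for _ in range(instances)]
    hits = 0
    for i, key in enumerate(requests):
        cache = caches[0] if shared else caches[i % instances]
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # miss: fetch from the backing store, then cache it
    return hits / len(requests)


# 10 distinct keys, each requested 6 times, interleaved
requests = [f"key{k}" for _ in range(6) for k in range(10)]
print(hit_ratio(requests, instances=6, shared=False))  # per-instance caches
print(hit_ratio(requests, instances=6, shared=True))   # one shared cache
```

With per-instance caches each key is fetched once per instance it lands on, so the hit ratio drops, and adding instances only spreads keys across more caches.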

Exam trap

Google Cloud often tests the misconception that scaling horizontally (adding more instances) or vertically (increasing instance size) can solve performance issues caused by architectural inefficiencies like cache fragmentation, rather than addressing the root cause with a shared caching layer.

How to eliminate wrong answers

Option A is wrong because adding more instances only increases total memory capacity but does not solve the fundamental issue of cache fragmentation; each new instance would still maintain its own isolated cache, leading to continued low cache hit ratios and latency spikes. Option C is wrong because increasing the machine type per instance (e.g., n1-highmem-4) provides more memory per instance but does not address the lack of cache sharing; the cache hit ratio remains low as each instance still caches independently, and the load balancer latency persists. Option D is wrong because enabling autoscaling based on memory utilization would only add more instances when memory is high, but this does not fix the root cause of cache inefficiency; it may even worsen the problem by increasing the number of fragmented caches.

354
Multi-Selecthard

Which THREE are key considerations when setting up a Google Cloud organization for DevOps?

Select 3 answers
A.Use a single project for development, staging, and production to reduce overhead.
B.Enable audit logging and set up log sinks to a centralized logging project.
C.Implement a shared VPC to enable network connectivity across projects.
D.Design a folder hierarchy that mirrors the organizational structure.
E.Store secrets directly in code repositories for easy access by CI/CD pipelines.
AnswersB, C, D

Centralized logging is essential for security and compliance.

Why this answer

Option B is correct because audit logging is essential for security and compliance in a DevOps environment. By enabling audit logs and setting up log sinks to a centralized logging project, you ensure that all API calls and administrative actions across the organization are captured in a single, immutable location, which is critical for incident response and forensic analysis.

Exam trap

Google Cloud often tests the misconception that consolidating all environments into a single project reduces complexity, but the correct approach is to use separate projects with a shared VPC and centralized logging to maintain isolation and compliance.

355
Multi-Selectmedium

A web application experiences high latency during peak hours. Which TWO actions should the team take to optimize performance?

Select 2 answers
A.Implement autoscaling based on CPU utilization
B.Enable Cloud CDN with the origin as the backend bucket
C.Use Cloud Memorystore to cache frequently accessed data
D.Increase the size of the instances serving the application
E.Reduce the number of backend services to simplify routing
AnswersA, C

Autoscaling adjusts capacity to match demand, preventing overload during peak hours.

Why this answer

Option A is correct because implementing autoscaling based on CPU utilization allows the application to dynamically add compute instances during peak hours when CPU load increases, thereby distributing the request load and reducing latency. This aligns with Google Cloud's managed instance groups and autoscaling policies that scale out based on metrics like CPU utilization, ensuring sufficient resources are available to handle traffic spikes without manual intervention.
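The scale-out decision can be approximated with the proportional rule that the Kubernetes Horizontal Pod Autoscaler documents, desired = ceil(current × observed / target). The managed instance group autoscaler pursues the same goal of holding average CPU near the target, but its exact algorithm is not reproduced here. Group sizes, bounds, and utilization percentages are hypothetical.

```python
import math


def recommended_size(current_size, observed_pct, target_pct, min_size=1, max_size=100):
    """Proportional scaling: size the group so average utilization returns to target."""
    desired = math.ceil(current_size * observed_pct / target_pct)
    return max(min_size, min(max_size, desired))


print(recommended_size(4, observed_pct=90, target_pct=60))  # peak hours: scale out
print(recommended_size(6, observed_pct=20, target_pct=60))  # off-peak: scale in
```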

Exam trap

Google Cloud often tests the misconception that vertical scaling (increasing instance size) is equivalent to horizontal scaling (autoscaling) for handling peak loads, but the exam expects candidates to recognize that horizontal scaling is more cost-effective, resilient, and aligned with cloud-native best practices for optimizing performance under variable traffic.

356
Multi-Selectmedium

Which TWO practices help reduce Mean Time to Resolve (MTTR) for production incidents?

Select 2 answers
A.Conduct postmortems only after major incidents.
B.Implement runbooks for common incident types.
C.Use a shared on-call rotating schedule.
D.Establish a war room procedure for critical incidents.
E.Increase logging verbosity for all services.
AnswersB, D

Runbooks provide step-by-step guidance, speeding up resolution.

Why this answer

B is correct because runbooks provide step-by-step, pre-approved procedures for common incident types, enabling engineers to follow a consistent, repeatable process without needing to diagnose from scratch. This reduces the time spent on investigation and decision-making, directly lowering Mean Time to Resolve (MTTR) by standardizing the response for known issues.
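MTTR is the average time from detection to resolution across incidents. A minimal sketch of the arithmetic, using made-up timestamps (some teams measure from incident start rather than detection):

```python
from datetime import datetime, timedelta

def mean_time_to_resolve(incidents):
    """Average of (resolved - detected) over closed incidents."""
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)

incidents = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 45)),  # 45 min
    (datetime(2024, 5, 3, 2, 10), datetime(2024, 5, 3, 3, 25)),   # 75 min
    (datetime(2024, 5, 9, 14, 0), datetime(2024, 5, 9, 14, 30)),  # 30 min
]
print(mean_time_to_resolve(incidents))  # 0:50:00
```

Runbooks and war rooms shorten the individual durations that feed this average.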

Exam trap

Google Cloud often tests the distinction between practices that directly reduce MTTR (like runbooks and war rooms) versus practices that improve reliability or team health (like postmortems and on-call schedules) but do not directly shorten the resolution time during an incident.

357
Multi-Selectmedium

Which THREE actions should be taken when bootstrapping a CI/CD pipeline on Google Cloud? (Select exactly 3)

Select 3 answers
A.Store secrets in Cloud Source Repositories.
B.Use Cloud Build with a Dockerfile.
C.Enable the Cloud Run API.
D.Create a service account with necessary permissions.
E.Configure triggers for automated builds.
AnswersB, D, E

Building container images from a Dockerfile with Cloud Build is a common pattern.

Why this answer

Option B is correct because Cloud Build can use a Dockerfile to build a container image from source code, which is a fundamental step in bootstrapping a CI/CD pipeline. This allows automated builds triggered by code changes, enabling continuous integration and delivery to services like Cloud Run or GKE.
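A minimal sketch of that build step: a Python function that emits a Cloud Build config (JSON is accepted alongside YAML) using the standard `gcr.io/cloud-builders/docker` builder. The Artifact Registry path is hypothetical, and `$SHORT_SHA` is only populated for builds started by a trigger:

```python
import json

def cloudbuild_config(image_repo: str) -> dict:
    """Build-and-push config using the docker builder and a Dockerfile."""
    image = f"{image_repo}:$SHORT_SHA"
    return {
        "steps": [
            {
                # Builds the image from the Dockerfile in the repository root.
                "name": "gcr.io/cloud-builders/docker",
                "args": ["build", "-t", image, "."],
            }
        ],
        # Images listed here are pushed to the registry after all steps succeed.
        "images": [image],
    }

repo = "us-central1-docker.pkg.dev/$PROJECT_ID/app-repo/web"  # hypothetical repository
print(json.dumps(cloudbuild_config(repo), indent=2))
```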

Exam trap

Google Cloud often tests the misconception that enabling specific APIs (like Cloud Run API) is a mandatory step for bootstrapping any CI/CD pipeline, when in fact it is only required if that service is the deployment target.

358
MCQhard

An organization uses Cloud Deploy with Skaffold to manage progressive delivery on GKE. After a rollout, the new revision shows a higher error rate in Cloud Monitoring (formerly Stackdriver), but the Cloud Deploy pipeline did not automatically roll back. What is the most likely cause?

A.The rollout strategy includes a manual approval step before advancing to the next phase.
B.The release was created with a '--disable-rollback' flag.
C.The Cloud Deploy pipeline does not have a stackdriverMetrics verification job defined to check error rates.
D.The rollout has not yet reached 100% traffic, so Cloud Deploy waits for full completion before evaluating health.
AnswerA

Cloud Deploy pauses at approval steps; automatic rollback can happen only in phases that do not require approval, and only when metrics are checked by a verification job.

Why this answer

Option A is correct because Cloud Deploy can wait for manual approval or for an external verification job; if the rollout strategy requires approval, automatic rollback is not triggered. Option B is incorrect because the release configuration does not affect automatic rollback behavior. Option C is incorrect because the error rate metric is not part of the pipeline unless a custom verification job is configured. Option D is incorrect because the rollout does not need to reach completion for a rollback to be triggered when metrics are monitored.
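For context, Cloud Deploy's verification step runs a container, and verification passes only if that container exits successfully. The sketch below is a hypothetical verify script that turns an error-rate check into an exit code (its entrypoint would pass the value to `sys.exit`); the threshold and request counts are made up, and querying Cloud Monitoring for real counts is not shown:

```python
# Hypothetical threshold; tune it to the service's SLO.
MAX_ERROR_RATE = 0.01

def error_rate(total_requests: int, error_responses: int) -> float:
    """Fraction of requests that failed; 0.0 when no traffic was observed."""
    if total_requests == 0:
        return 0.0
    return error_responses / total_requests

def verify(total_requests: int, error_responses: int) -> int:
    """Return a process exit code: 0 passes verification, 1 fails it."""
    rate = error_rate(total_requests, error_responses)
    print(f"error rate {rate:.2%} (limit {MAX_ERROR_RATE:.2%})")
    return 0 if rate <= MAX_ERROR_RATE else 1

# Hard-coded counts for illustration: 300 errors out of 12,000 requests (2.5%).
exit_code = verify(total_requests=12000, error_responses=300)
```

A failed verification marks the rollout as failed; whether it is then rolled back depends on how the pipeline is configured.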

359
MCQhard

A team uses Cloud Monitoring alerting policies with multiple conditions. They want an incident to fire only when both conditions are met simultaneously. What should they configure?

A.Set the alerting policy combiner to 'AND'
B.Create two separate alerting policies
C.Create a single condition with a ratio metric
D.Use a log-based metric condition
AnswerA

The combiner 'AND' ensures all conditions must be met.

Why this answer

Option A is correct because Cloud Monitoring alerting policies support a 'combiner' field that can be set to 'AND' to require that all conditions are met simultaneously before an incident is opened. This ensures the alert triggers only when both conditions are true in the same evaluation window, rather than when either condition is met.
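A sketch of such a policy as a Cloud Monitoring API v3 `AlertPolicy` body, built in Python. The metric choices, thresholds, and display names are illustrative assumptions, and the memory metric requires the Ops Agent:

```python
import json

def threshold_condition(name: str, metric_type: str, threshold: float,
                        extra_filter: str = "") -> dict:
    """One conditionThreshold that must hold for 5 minutes to be met."""
    flt = f'metric.type="{metric_type}" AND resource.type="gce_instance"'
    if extra_filter:
        flt += f" AND {extra_filter}"
    return {
        "displayName": name,
        "conditionThreshold": {
            "filter": flt,
            "comparison": "COMPARISON_GT",
            "thresholdValue": threshold,
            "duration": "300s",
            "aggregations": [
                {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
            ],
        },
    }

policy = {
    "displayName": "High CPU and high memory",
    # AND: an incident opens only when every condition is met.
    "combiner": "AND",
    "conditions": [
        # CPU utilization is a ratio in [0, 1].
        threshold_condition("CPU > 80%",
                            "compute.googleapis.com/instance/cpu/utilization", 0.8),
        # Memory percent_used is in percent and split by a "state" label.
        threshold_condition("Memory > 90%",
                            "agent.googleapis.com/memory/percent_used", 90.0,
                            'metric.label.state="used"'),
    ],
}
print(json.dumps(policy, indent=2))
```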

Exam trap

Google Cloud often tests the misconception that creating multiple alerting policies or using a ratio metric can achieve multi-condition AND logic, but the correct approach is to use the combiner field within a single alerting policy.

How to eliminate wrong answers

Option B is wrong because creating two separate alerting policies would result in each condition firing its own independent incident, not a single incident that requires both conditions to be met simultaneously. Option C is wrong because a ratio metric condition calculates a ratio of two metrics but does not enforce that two separate conditions must both be true at the same time; it is a single condition, not a multi-condition AND logic. Option D is wrong because a log-based metric condition is a type of condition that uses logs to create a metric, but it does not provide a mechanism to combine multiple conditions with an AND operator.

360
Matchingmedium

Match each Kubernetes resource to its role in a DevOps pipeline.



Manages desired state for Pods

Stable network endpoint for Pods

External HTTP/S load balancing

Non-sensitive configuration data

Sensitive data like passwords

Why these pairings

Key Kubernetes objects for application management.

361
MCQhard

A DevOps team is troubleshooting a Cloud Build pipeline that fails intermittently when building a container image. The build step uses a custom build step that runs a vulnerability scan. The error log shows: 'Step #1: Error: failed to scan image: context deadline exceeded'. The build configuration includes 'timeout: 600s'. Which is the most likely cause and solution?

A.The scan tool requires a specific dependency; add an installation step before scanning.
B.There is network latency between Cloud Build and the container registry; use VPC Service Controls.
C.The build step is running out of memory; increase the machine type to e2-highcpu-8.
D.The scan step is taking longer than the build timeout; increase the timeout value in the build configuration.
AnswerD

The error 'context deadline exceeded' indicates the step timed out.

Why this answer

The error 'context deadline exceeded' indicates that the custom vulnerability scan step ran past the build's configured timeout of 600 seconds. Cloud Build applies the `timeout` value to the build as a whole (individual steps can also set their own `timeout`), and when that limit is reached the build is terminated. Increasing the timeout value in the build configuration gives the scan time to complete, directly addressing the root cause.
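A sketch of the fix, written as a Python function that emits a Cloud Build configuration with a longer build-level `timeout` and an explicit step-level `timeout` for the scan. The scanner image and its arguments are hypothetical, and the values are examples rather than recommendations:

```python
import json

def scan_build_config(image: str, build_timeout_s: int, scan_timeout_s: int) -> dict:
    """Cloud Build config with a build-level timeout and a per-step scan timeout."""
    if scan_timeout_s >= build_timeout_s:
        raise ValueError("leave headroom: the step timeout must be below the build timeout")
    return {
        "steps": [
            {"name": "gcr.io/cloud-builders/docker", "args": ["build", "-t", image, "."]},
            {
                # Hypothetical scanner image; replace with your own custom build step.
                "name": "us-docker.pkg.dev/$PROJECT_ID/tools/vuln-scanner",
                "args": ["scan", image],
                "timeout": f"{scan_timeout_s}s",
            },
        ],
        # Applies to the whole build (the default is 600s).
        "timeout": f"{build_timeout_s}s",
    }

print(json.dumps(scan_build_config("app:latest", 1800, 1200), indent=2))
```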

Exam trap

Google Cloud often tests the distinction between resource exhaustion (memory/CPU) and timeout errors, leading candidates to mistakenly select machine type upgrades when the error message explicitly indicates a deadline exceeded.

How to eliminate wrong answers

Option A is wrong because the error is a timeout, not a missing dependency; a missing dependency would produce a 'command not found' or similar error. Option B is wrong because network latency would typically cause connection timeouts or retries, not a 'context deadline exceeded' from the scan tool itself; VPC Service Controls address data exfiltration risks, not latency. Option C is wrong because an out-of-memory error would manifest as an OOM kill or exit code 137, not a 'context deadline exceeded' message.

362
MCQmedium

A company stores historical log data in Cloud Storage. The logs are accessed rarely after 30 days. They want to reduce storage costs while maintaining immediate access for occasional audits over the next year. What should they use?

A.Standard class storage with a lifecycle policy to transition to Nearline after 30 days and to Coldline after 90 days
B.Nearline class storage
C.Coldline class storage with a lifecycle policy to delete after 1 year
D.Archive class storage
AnswerA

This leverages different storage classes based on access patterns, optimizing cost while retaining immediate access after 30 days.

Why this answer

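Option A is correct because it uses lifecycle policies to transition objects to cheaper storage classes as they age, balancing cost and accessibility; every class still serves reads immediately. Option D (Archive) lacks a lifecycle policy, has the highest retrieval costs, and carries a 365-day minimum storage duration. Options B and C place all data in a single colder class from the start and do not provide optimal cost savings.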
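A sketch of the option A lifecycle configuration, in the JSON shape used by the Cloud Storage lifecycle documentation. Object age is counted from creation, and the `matchesStorageClass` filters are an added assumption so each rule only touches objects still in the previous class:

```python
import json

def tiered_lifecycle() -> dict:
    """Lifecycle rules: Standard -> Nearline at 30 days -> Coldline at 90 days."""
    return {
        "lifecycle": {
            "rule": [
                {
                    "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                    "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]},
                },
                {
                    "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                    "condition": {"age": 90, "matchesStorageClass": ["NEARLINE"]},
                },
            ]
        }
    }

print(json.dumps(tiered_lifecycle(), indent=2))
```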

363
MCQmedium

An organization wants to be alerted when the total size of a Cloud Storage bucket exceeds 1 TB. Which metric should they monitor?

A.storage.googleapis.com/storage/total_bytes
B.storage.googleapis.com/storage/object_count
C.storage.googleapis.com/storage/network_sent_bytes
D.storage.googleapis.com/storage/request_count
AnswerA

This metric measures total bucket size.

Why this answer

The metric `storage.googleapis.com/storage/total_bytes` directly measures the total size of the objects stored in a Cloud Storage bucket. Monitoring this metric allows the organization to set an alert threshold at 1 TB (10^12 bytes, or 1,099,511,627,776 bytes if a binary tebibyte, 1 TiB, is intended) that triggers when the bucket exceeds that size. This is the correct metric for tracking storage capacity usage.
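Because the threshold depends on whether 1 TB is read as decimal or binary, the sketch computes both values and builds a threshold condition on `total_bytes` for a single bucket, in the Cloud Monitoring API's alert-condition JSON shape. The bucket name is hypothetical:

```python
TB = 10**12   # decimal terabyte
TIB = 2**40   # binary tebibyte

def bucket_size_condition(bucket: str, threshold_bytes: int) -> dict:
    """Threshold condition on total_bytes for one bucket."""
    return {
        "displayName": f"{bucket} larger than {threshold_bytes} bytes",
        "conditionThreshold": {
            "filter": (
                'metric.type="storage.googleapis.com/storage/total_bytes" '
                'AND resource.type="gcs_bucket" '
                f'AND resource.label.bucket_name="{bucket}"'
            ),
            "comparison": "COMPARISON_GT",
            "thresholdValue": threshold_bytes,
            "duration": "0s",
        },
    }

print(TB, TIB)  # 1000000000000 1099511627776
```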

Exam trap

Google Cloud often tests the distinction between metrics that measure capacity (total_bytes) versus metrics that measure activity (object_count, request_count) or throughput (network_sent_bytes), leading candidates to confuse object count with total size.

How to eliminate wrong answers

Option B is wrong because `storage.googleapis.com/storage/object_count` tracks the number of objects in the bucket, not their total size; a bucket could have millions of small objects that total far less than 1 TB. Option C is wrong because `storage.googleapis.com/storage/network_sent_bytes` measures outbound network traffic from the bucket, which is unrelated to the stored data size. Option D is wrong because `storage.googleapis.com/storage/request_count` counts API requests made to the bucket, which does not reflect the total storage consumed.

364
MCQhard

A team has set up the alerting policies shown in the exhibit. They receive an alert for High Memory but not for High CPU. What is the most likely reason?

A.The Cloud Monitoring agent is not installed or not reporting on the instance, so the memory metric is missing.
B.The CPU alert's duration of 300 seconds prevents it from firing before the memory alert.
C.The memory alert has a higher threshold value, making it easier to trigger.
D.The CPU metric is not available because the instance does not have the Cloud Monitoring agent installed.
AnswerA

The agent is required for agent.googleapis.com metrics.

Why this answer

Option A is correct because the High Memory alert fires while the High CPU alert does not, indicating that data for the memory policy is arriving but data for the CPU policy is missing. This typically happens when the Cloud Monitoring agent is not installed or is not reporting the metric the CPU policy is built on: metrics under agent.googleapis.com exist only while the agent is running and reporting, so a condition built on one of them has nothing to evaluate and never fires.

Exam trap

Google Cloud often tests the assumption that every CPU metric comes from the hypervisor. Basic CPU utilization (compute.googleapis.com/instance/cpu/utilization) does, but agent.googleapis.com metrics, including memory utilization and the agent's own CPU metrics, exist only while the agent is installed and reporting, so a policy built on them goes silent when the agent is missing or unhealthy.

How to eliminate wrong answers

Option B is wrong because the duration of 300 seconds (5 minutes) for the CPU alert does not prevent it from firing before the memory alert; it simply means the CPU condition must persist for 5 minutes before the alert fires, but if the CPU metric is missing entirely, no alert will ever fire regardless of duration. Option C is wrong because a higher threshold value makes an alert harder to trigger, not easier; the memory alert having a higher threshold would require a more extreme condition to fire, contradicting the scenario where it fires while the CPU alert does not. Option D is wrong because if the instance did not have the Cloud Monitoring agent installed, both CPU and memory metrics would be unavailable, not just the CPU metric; the fact that the memory alert fires indicates that at least some metrics are being reported, so the agent must be present and functional for memory.

365
MCQmedium

Refer to the exhibit. A DevOps engineer is bootstrapping a Google Cloud organization and wants to ensure that no Compute Engine VM instances can have external IP addresses. The engineer applies this Terraform configuration. What is the effect of this configuration on the organization?

A.It blocks external IP access for all VMs in all projects under the organization.
B.It requires a separate script to enforce the policy on existing VMs.
C.It blocks external IP access only for the first project created in the organization.
D.It blocks both internal and external IP access for all VMs.
AnswerA

The policy is set at the organization level, so it is inherited by every folder and project beneath it.

Why this answer

The Terraform configuration sets the list constraint `compute.vmExternalIpAccess` at the organization level so that all values are denied. Organization policies are inherited by all child folders and projects unless overridden, so the constraint blocks external IP addresses for Compute Engine VM instances in every project within the organization. The constraint is evaluated when instances are created or modified, so it also governs existing VMs whenever a change would give them an external IP address.
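The exhibit's configuration is not reproduced here. As a sketch only, an organization-wide deny-all policy for this list constraint could be written in Terraform's JSON syntax (a `.tf.json` file) using the `google_organization_policy` resource; the organization ID and resource label are placeholders:

```python
import json

def no_external_ip_policy(org_id: str) -> dict:
    """Terraform JSON for an org-level list policy that denies all external IPs."""
    return {
        "resource": {
            "google_organization_policy": {
                "deny_vm_external_ip": {
                    "org_id": org_id,
                    "constraint": "compute.vmExternalIpAccess",
                    # Deny every value: no VM in any folder or project may use an external IP.
                    "list_policy": {"deny": {"all": True}},
                }
            }
        }
    }

print(json.dumps(no_external_ip_policy("123456789012"), indent=2))
```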

Exam trap

The trap here is that candidates often confuse organization policy inheritance with project-level overrides, thinking the policy only applies to the first project or requires manual reapplication, when in fact it is automatically inherited by all projects and is checked whenever existing VMs are modified.

How to eliminate wrong answers

Option B is wrong because organization policies are enforced on all VMs, including existing ones, at the time of API calls (e.g., start, modify, or create), so no separate script is needed to retroactively apply the policy. Option C is wrong because organization policies apply to all projects under the organization, not just the first project created; they are inherited by all child resources. Option D is wrong because the constraint `compute.vmExternalIpAccess` specifically targets external IP access only, not internal IP access; internal IP communication remains unaffected.

366
MCQhard

Your team uses a canary deployment strategy on Google Kubernetes Engine (GKE). During a rollback, you notice that the rollback caused a brief period of downtime because the previous version's readiness probe was not properly configured. Which of the following best prevents this issue in the future?

A.Perform a gradual rollback with a managed instance group.
B.Use a blue/green deployment instead.
C.Ensure that the readiness probe is tested as part of the pre-deployment validation.
D.Use a Kubernetes Job to run a post-deployment validation.
AnswerC

Validating probes before deployment ensures both new and old versions are ready.

Why this answer

Option C is correct because the root cause of the downtime was a misconfigured readiness probe on the previous version. By testing the readiness probe as part of pre-deployment validation, you ensure that the probe correctly reflects the application's ability to serve traffic before it is used in a rollback. This prevents the scenario where a rollback deploys a version that fails its readiness check, causing the service to be removed from the load balancer and resulting in downtime.
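One way to make probe validation part of the pipeline is a pre-deployment check that runs against both the new manifest and the rollback target. The hypothetical function below only verifies that each container declares a readinessProbe handler; a fuller check would also deploy to a staging namespace and confirm the Pods become Ready:

```python
def missing_readiness_probes(deployment: dict) -> list:
    """Names of containers in a Deployment manifest that lack a readinessProbe handler."""
    handlers = ("httpGet", "tcpSocket", "exec", "grpc")
    containers = deployment["spec"]["template"]["spec"]["containers"]
    missing = []
    for c in containers:
        probe = c.get("readinessProbe") or {}
        # A probe without any handler cannot report readiness.
        if not any(h in probe for h in handlers):
            missing.append(c["name"])
    return missing

manifest = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}}},
        {"name": "sidecar"},
    ]}}}
}
print(missing_readiness_probes(manifest))  # ['sidecar']
```

A CI step would fail the pipeline whenever the returned list is non-empty.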

Exam trap

Google Cloud often tests the misconception that changing deployment strategies (like blue/green or canary) solves all rollback issues, when in fact the real problem is a misconfigured health check that must be validated before the rollback is executed.

How to eliminate wrong answers

Option A is wrong because a managed instance group (MIG) is a Compute Engine concept, not a native GKE resource; gradual rollbacks in GKE are handled by Deployment strategies (e.g., maxSurge/maxUnavailable), and a MIG does not address readiness probe misconfiguration. Option B is wrong because blue/green deployment is a release strategy that reduces risk during rollouts, but it does not inherently validate readiness probes; a rollback in blue/green still relies on the same probe configuration, so a misconfigured probe would cause the same downtime. Option D is wrong because a Kubernetes Job runs a post-deployment validation, which occurs after the deployment is live; it cannot prevent the downtime that happens during the rollback itself, as the probe failure causes immediate traffic disruption before the Job executes.

367
Drag & Dropmedium

Arrange the steps to implement a canary deployment for a Cloud Run service.



Why this order

Deploy new revision, shift traffic, monitor, increase, remove old.
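This order maps onto Cloud Run's revision tags and traffic splitting. A sketch that only builds the `gcloud` command lines for such a rollout (it does not run them); the service name, image path, and traffic percentages are hypothetical, a default region is assumed to be configured, and the monitoring between steps is left to you:

```python
def canary_commands(service: str, image: str, steps=(10, 50)) -> list:
    """gcloud argument lists for a tag-based Cloud Run canary (not executed here)."""
    cmds = [
        # 1. Deploy the new revision without sending it traffic, tagged "canary".
        ["gcloud", "run", "deploy", service, "--image", image,
         "--no-traffic", "--tag", "canary"],
    ]
    # 2-4. Shift traffic in increments; monitor error rate and latency between steps.
    for percent in steps:
        cmds.append(["gcloud", "run", "services", "update-traffic", service,
                     "--to-tags", f"canary={percent}"])
    # 5. Send all traffic to the latest revision once the canary looks healthy.
    cmds.append(["gcloud", "run", "services", "update-traffic", service, "--to-latest"])
    return cmds

for cmd in canary_commands("web", "us-docker.pkg.dev/my-proj/app/web:v2"):
    print(" ".join(cmd))
```

Retiring the old revision afterwards (for example with `gcloud run revisions delete`) is not shown.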

368
MCQhard

You are designing a globally distributed application using Cloud Spanner. The application has a write-heavy workload. You notice that write latency increases as the number of nodes increases. What is the most likely cause?

A.The instance is using a multi-region configuration with too many read-only replicas.
B.The workload has many cross-node transactions due to split rows.
C.The application is using stale reads for write transactions.
D.The number of splits is too low, causing hotspots.
AnswerB

Cross-split transactions require coordination, increasing latency.

Why this answer

Option B is correct because in Cloud Spanner, write-heavy workloads with many cross-node transactions cause increased write latency as nodes are added. This occurs because Spanner splits rows across nodes, and transactions that span multiple splits require two-phase commit (2PC) coordination between nodes, which adds network overhead and latency. Adding more nodes increases the likelihood that a transaction touches multiple splits, exacerbating the coordination cost.
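A toy model can make the coordination cost concrete. It assumes contiguous key-range splits placed round-robin on nodes, a simplification of Spanner's dynamic placement, and counts how many nodes a single transaction must coordinate with:

```python
def participants(keys, split_size: int, num_nodes: int) -> int:
    """Toy model: count distinct nodes a transaction's keys land on.

    Keys are grouped into contiguous splits of `split_size`, and split i is
    placed on node i % num_nodes. This is not Spanner's real placement logic.
    """
    splits = {k // split_size for k in keys}
    return len({s % num_nodes for s in splits})

txn = [5, 1_005, 2_005, 3_005]  # one write in each of four splits
for n in (1, 2, 4):
    print(n, participants(txn, split_size=1_000, num_nodes=n))
```

In this model the same four-row write involves one node when everything sits on a single node and four once the splits are spread across four nodes. Keeping rows that are written together in the same split, for example with interleaved tables or careful key design, reduces that fan-out.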

Exam trap

The trap here is that candidates often assume adding nodes always improves performance, but Google Cloud tests the counterintuitive behavior where cross-node coordination overhead in distributed databases like Spanner can degrade write latency with scale.

How to eliminate wrong answers

Option A is wrong because multi-region configurations with read-only replicas do not directly affect write latency; read-only replicas serve reads and do not participate in write quorums, so they do not cause write latency to increase with node count. Option C is wrong because stale reads are used for read-only transactions, not write transactions; write transactions always require strong reads to ensure consistency, so stale reads cannot be applied to writes. Option D is wrong because too few splits cause hotspots (uneven load on nodes), which would increase latency as load grows, but the question states latency increases as nodes increase, which is the opposite of hotspot behavior—hotspots are mitigated by adding nodes, not worsened.

369
MCQmedium

An SRE team needs to implement an incident management workflow that automatically creates a ticket in their ITSM tool when a critical alert fires. They use Cloud Monitoring. Which approach should they use?

A.Configure the alerting policy to send notifications via email to the ITSM system's email-to-ticket feature.
B.Create a webhook notification channel directly to the ITSM tool.
C.Use a Cloud Pub/Sub notification channel and a Cloud Function that receives the alert and calls the ITSM API.
D.Use the Cloud Monitoring API to periodically pull alerts and create tickets.
AnswerC

Pub/Sub ensures reliable delivery, and Cloud Function can transform and forward alerts to the ITSM tool.

Why this answer

Option C is correct because Cloud Monitoring can send alert notifications to a Cloud Pub/Sub topic, which then triggers a Cloud Function. The Cloud Function can parse the alert payload and call the ITSM tool's API to create a ticket, providing a reliable, scalable, and decoupled integration that supports custom logic and error handling.
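A minimal sketch of the Cloud Function side, assuming the Pub/Sub message carries the same JSON notification that Cloud Monitoring sends to webhooks (an `incident` object with fields such as `incident_id`, `policy_name`, `state`, `summary`, and `url`). The ticket fields and the `send_to_itsm` stub are hypothetical stand-ins for your ITSM API:

```python
import base64
import json

def alert_to_ticket(data_b64: str):
    """Decode a Cloud Monitoring Pub/Sub notification into a ticket payload.

    Returns None for non-open notifications so only new incidents open tickets.
    """
    payload = json.loads(base64.b64decode(data_b64))
    incident = payload["incident"]
    if incident.get("state") != "open":
        return None
    return {
        # Ticket-side field names are hypothetical; map them to your ITSM API.
        "title": f"[{incident.get('policy_name')}] {incident.get('summary', '')}".strip(),
        "external_id": incident["incident_id"],
        "link": incident.get("url"),
        "priority": "P1",
    }

def send_to_itsm(ticket: dict) -> None:
    """Hypothetical stub: replace with an authenticated call to the ITSM API."""
    print("would create ticket:", json.dumps(ticket))

def handle_alert(event, context):
    """Pub/Sub-triggered entry point (1st gen background function signature)."""
    ticket = alert_to_ticket(event["data"])
    if ticket is not None:
        send_to_itsm(ticket)
```

Pub/Sub delivers messages at least once, so using `incident_id` as the ticket's external ID lets the ITSM side ignore duplicates.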

Exam trap

Google Cloud often tests the misconception that a direct webhook (Option B) is sufficient for ITSM integration. Cloud Monitoring webhooks send a fixed JSON payload and support only basic or token authentication, so they cannot reshape the payload or supply the headers and authentication schemes that many enterprise ITSM APIs require.

How to eliminate wrong answers

Option A is wrong because email-to-ticket features are unreliable for critical alerts due to potential delays, spam filtering, and lack of guaranteed delivery; they also do not support structured data or automated acknowledgment. Option B is wrong because a direct webhook notification channel in Cloud Monitoring sends a fixed JSON payload and offers only basic or token authentication, so it cannot transform the payload or supply the headers and authentication schemes most ITSM APIs require, leading to failed or malformed requests. Option D is wrong because periodically pulling alerts via the Cloud Monitoring API introduces latency, misses real-time alerting requirements, and adds unnecessary complexity compared to event-driven push notifications.

370
MCQeasy

To securely manage secrets (e.g., API keys) used in Cloud Build pipelines, which service should be used?

A.Secret Manager
B.Cloud KMS
C.Cloud Key Management Service (duplicate)
D.Cloud Storage
AnswerA

Designed for storing secrets; integrates with Cloud Build through the `availableSecrets` and `secretEnv` fields of the build configuration.

Why this answer

Option A is correct because Secret Manager is the recommended service for storing and accessing secrets such as API keys. Options B and C both refer to Cloud KMS, which manages encryption keys rather than storing secrets, and Cloud Storage (D) is not designed for secret storage.
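As a sketch of that integration, the build configuration below (generated from Python as JSON) maps a Secret Manager secret to an environment variable for one step. The secret name, API URL, and variable name are hypothetical, and the build's service account needs the Secret Manager Secret Accessor role (`roles/secretmanager.secretAccessor`):

```python
import json

def build_with_secret(secret_name: str) -> dict:
    """Cloud Build config that exposes one Secret Manager secret to a single step."""
    return {
        "steps": [{
            "name": "gcr.io/cloud-builders/gcloud",
            "entrypoint": "bash",
            # $$ escapes Cloud Build substitution so bash expands the secret at runtime.
            "args": ["-c",
                     'curl -sf -H "Authorization: Bearer $$API_KEY" https://api.example.com/deploy'],
            # Only this step receives the secret as an environment variable.
            "secretEnv": ["API_KEY"],
        }],
        "availableSecrets": {
            "secretManager": [{
                "versionName": f"projects/$PROJECT_ID/secrets/{secret_name}/versions/latest",
                "env": "API_KEY",
            }]
        },
    }

print(json.dumps(build_with_secret("deploy-api-key"), indent=2))
```

Pinning a numbered version instead of `latest` makes builds reproducible.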

371
MCQmedium

A team is using Cloud Monitoring to track the performance of a microservices application. They set up an uptime check for each service, but they notice that some checks are failing intermittently without actual service degradation. What is the most likely cause?

A.The services are behind a load balancer that occasionally returns 503 during scaling.
B.The timeout setting is too short for the service's typical latency.
C.Uptime checks are deployed in a single region, causing false positives.
D.The project's quota for uptime checks has been exceeded.
AnswerB

A short timeout can cause the check to fail even when the service is healthy, especially during transient latency spikes.

Why this answer

The most likely cause is that the timeout setting is too short, so the check fails whenever the service's response time briefly exceeds it, even though the service is healthy. The other options are less plausible: uptime checks typically run from multiple regions, load balancer 503 errors would indicate a real issue, and exceeding the quota would prevent checks from running rather than cause intermittent failures.
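To see how a short timeout produces intermittent failures on a healthy service, the sketch below counts how many responses from an illustrative (not measured) latency distribution would exceed a given check timeout, and reports the p99 latency that a timeout should comfortably exceed:

```python
import math

def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile of latency samples (seconds)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def false_failure_rate(samples, timeout_s: float) -> float:
    """Share of healthy responses that would still exceed the check timeout."""
    return sum(1 for s in samples if s > timeout_s) / len(samples)

latencies = [0.2] * 90 + [0.8] * 8 + [2.5] * 2   # illustrative, not measured
print(false_failure_rate(latencies, 1.0))          # 0.02
print(percentile(latencies, 99))                   # 2.5
```

With a 1-second timeout, 2% of these healthy responses fail the check; setting the timeout above the observed tail removes those false positives.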

372
Multi-Selectmedium

Which TWO options are best practices when bootstrapping a Google Cloud organization for DevOps? (Choose 2)

Select 2 answers
A.Grant the Owner role to a group of DevOps engineers to manage all projects.
B.Store service account keys in the source code repository for ease of use.
C.Create a single VPC network for all environments to simplify management.
D.Use folders to separate environments (e.g., dev, staging, prod) and apply policies at the folder level.
E.Use resource tags to enable conditional access policies and cost tracking.
AnswersD, E

Folders provide hierarchical policy enforcement and organization.

Why this answer

Option D is correct because using folders to separate environments (e.g., dev, staging, prod) allows you to apply IAM policies and organization policies at the folder level, which are inherited by all projects within that folder. This enforces consistent security controls and resource governance across each environment, a key DevOps practice for managing lifecycle and access boundaries.

Exam trap

Google Cloud often tests the misconception that a single VPC network simplifies management, but the trap here is that it sacrifices the network isolation required for safe multi-environment DevOps workflows, which is a core principle of Google Cloud's resource hierarchy design.

373
MCQmedium

Refer to the exhibit. A DevOps engineer wants to reduce compute costs immediately. Which action is most effective?

A.Delete the terminated instance.
B.Rightsize instance-2 from n1-standard-8 to n1-standard-4.
C.Change instance-1 to a non-preemptible instance.
D.Move instance-1 to a different zone.
AnswerB

Reducing the machine size saves cost.

Why this answer

Rightsizing instance-2 from n1-standard-8 to n1-standard-4 halves its vCPUs (8 to 4) and memory (30 GB to 15 GB), directly lowering cost. The terminated instance incurs no vCPU or memory charges, and the preemptible instance is already cost-effective. Moving instance-1 to another zone does not reduce its cost.
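Assuming on-demand N1 prices scale linearly with vCPUs and memory, halving both halves the instance's hourly cost whatever the actual regional rates are. The rates below are placeholders, not current prices:

```python
# N1 predefined machine shapes: (vCPUs, memory in GB)
N1_SHAPES = {"n1-standard-4": (4, 15), "n1-standard-8": (8, 30)}

def hourly_cost(machine: str, vcpu_rate: float, gb_rate: float) -> float:
    """Linear cost model: price per vCPU-hour plus price per GB-hour."""
    vcpus, mem_gb = N1_SHAPES[machine]
    return vcpus * vcpu_rate + mem_gb * gb_rate

# Placeholder rates for illustration only; look up current regional pricing.
vcpu_rate, gb_rate = 0.03, 0.004
ratio = (hourly_cost("n1-standard-4", vcpu_rate, gb_rate)
         / hourly_cost("n1-standard-8", vcpu_rate, gb_rate))
print(round(ratio, 6))  # 0.5
```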

374
MCQmedium

Your GKE cluster runs a batch job that processes large files from Cloud Storage. The job uses CPUs inefficiently, with low utilization. You want to reduce cost while maintaining throughput. Which approach should you take?

A.Use Cloud Storage FUSE to stream files directly into containers, avoiding local storage.
B.Configure the node pool to use spot VMs.
C.Use local SSDs for faster file access.
D.Increase the CPU request for the job pods.
AnswerA

Streaming reduces latency and cost by eliminating disk.

Why this answer

Option A is correct because Cloud Storage FUSE allows containers to stream files directly from Cloud Storage without first downloading them to a local disk. This eliminates the I/O bottleneck of writing to local storage and reduces CPU overhead from disk operations, enabling the batch job to process files more efficiently and maintain throughput while using fewer CPU resources.
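A sketch of how the batch Pod could mount the bucket with the Cloud Storage FUSE CSI driver on GKE, written as a Python dict that serializes to a manifest `kubectl apply` accepts. It assumes the driver is enabled on the cluster and Workload Identity grants the Kubernetes service account read access to the bucket; the Pod, service account, image, and bucket names are hypothetical:

```python
import json

def batch_pod(bucket: str) -> dict:
    """Pod manifest that mounts a bucket through the Cloud Storage FUSE CSI driver."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "file-processor",
            # Asks GKE to inject the gcsfuse sidecar into this Pod.
            "annotations": {"gke-gcsfuse/volumes": "true"},
        },
        "spec": {
            # Hypothetical KSA bound via Workload Identity to an account with bucket access.
            "serviceAccountName": "batch-ksa",
            "restartPolicy": "Never",
            "containers": [{
                "name": "processor",
                "image": "us-docker.pkg.dev/my-proj/batch/processor:1.0",
                "volumeMounts": [{"name": "input", "mountPath": "/data", "readOnly": True}],
            }],
            "volumes": [{
                "name": "input",
                "csi": {
                    "driver": "gcsfuse.csi.storage.gke.io",
                    "readOnly": True,
                    "volumeAttributes": {"bucketName": bucket},
                },
            }],
        },
    }

print(json.dumps(batch_pod("my-input-bucket"), indent=2))
```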

Exam trap

Google Cloud often tests the misconception that faster storage (local SSDs) or cheaper compute (spot VMs) always reduces cost, when the real issue is inefficient resource utilization that must be addressed at the application or data access layer.

How to eliminate wrong answers

Option B is wrong because spot VMs reduce cost but do not address the root cause of low CPU utilization; they may even increase cost if preemptions cause job restarts and wasted cycles. Option C is wrong because local SSDs improve disk I/O speed, but the problem is CPU inefficiency, not disk latency; faster storage does not fix underutilized CPUs. Option D is wrong because increasing CPU requests for pods will allocate more CPU resources but will not improve utilization if the job is not CPU-bound; it may actually increase cost without improving throughput.

375
Multi-Selecteasy

Which TWO actions can reduce startup latency for a Cloud Run service?

Select 2 answers
A.Use a regional Cloud Run with separate service per region.
B.Increase the maximum instances limit.
C.Optimize the container image to reduce size.
D.Increase the container concurrency setting.
E.Set a minimum number of instances to keep warm.
AnswersC, E

Smaller images are pulled faster from Artifact Registry or Container Registry.

Why this answer

Option C is correct because a smaller container image reduces the time required to pull the image from the registry to the compute node during cold starts. Image download and filesystem extraction are part of Cloud Run's cold-start path, so optimizing the image (e.g., using distroless base images, multi-stage builds, or removing unnecessary layers) shortens it.
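Minimum instances (option E) are a revision setting; besides `gcloud run deploy --min-instances`, they can be set through the `autoscaling.knative.dev/minScale` annotation in a service manifest. A sketch that builds such a manifest (the service name and image are hypothetical); instances kept warm this way are billed even while idle:

```python
import json

def cloud_run_service(name: str, image: str, min_instances: int) -> dict:
    """Knative-style Cloud Run service manifest that keeps instances warm."""
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Revision-level setting: instances kept ready for requests.
                        "autoscaling.knative.dev/minScale": str(min_instances),
                    }
                },
                "spec": {"containers": [{"image": image}]},
            }
        },
    }

print(json.dumps(cloud_run_service("web", "us-docker.pkg.dev/my-proj/app/web:v2", 1), indent=2))
```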

Exam trap

Google Cloud often tests the misconception that scaling limits or concurrency settings affect startup latency, when in reality only image optimization and pre-warming (minimum instances) directly reduce cold-start time.
