Google Professional Cloud DevOps Engineer PCDOE Questions 451–500 | Page 7/7

451

MCQ (medium)

A company uses BigQuery flat-rate pricing with 1000 slots. During peak hours, queries are queued, but during off-peak, many slots are idle. What is the most cost-effective way to handle the idle slots?

A. Purchase additional slots to reduce queuing

B. Do nothing; idle slots are already paid for

C. Use flex slots to add capacity only during peak

D. Change to on-demand pricing to pay only for data scanned

Answer: C

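Flex slots are short-term commitments (as short as 60 seconds) that provide extra capacity only when it is needed. BigQuery editions with slot autoscaling have since replaced flex slots, but the principle of adding capacity only for peaks is the same.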

Why this answer

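Option C is correct because flex slots add short-term capacity only during peak hours, so the base commitment does not have to be sized for the peak and then sit idle off-peak. Option A adds more permanently committed slots, which would also sit idle off-peak. Option B leaves the peak-hour queuing unsolved.

Option D may not be cheaper overall, because on-demand pricing bills for every byte scanned, and a steady analytical workload can cost more that way than under a slot commitment.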


452

MCQ (hard)

A large stateful service running on Compute Engine experiences variable performance due to CPU throttling from noisy neighbors. Which solution provides the most consistent performance?

A. Enable live migration for the VMs

B. Use sole-tenant nodes to isolate the VMs

C. Use preemptible VMs for stateful workloads

D. Purchase committed use discounts for lower cost

Answer: B

Sole-tenant nodes ensure your VMs are the only ones on the physical machine, eliminating neighbor noise.

Why this answer

Sole-tenant nodes ensure that your VMs are the only ones running on the underlying physical server, eliminating resource contention from other tenants (noisy neighbors). Performance is more consistent because the host's physical cores, caches, and memory bandwidth are shared only among your own VMs.

Exam trap

The trap here is that candidates confuse live migration (which maintains availability during host maintenance) with performance isolation, or assume that committing to a discount (CUD) implies dedicated resources.

How to eliminate wrong answers

Option A is wrong because live migration moves a running VM to another host without downtime but does not prevent noisy neighbor contention on either the source or destination host. Option C is wrong because preemptible VMs are designed for fault-tolerant, stateless batch workloads and can be terminated at any time, making them unsuitable for stateful services that require persistent data and consistent performance. Option D is wrong because committed use discounts reduce cost in exchange for a 1- or 3-year commitment but do not affect CPU throttling or noisy neighbor isolation.


453

MCQ (hard)

A company uses Cloud Storage with standard storage class for all data. They want to automatically move data that has been accessed more than 30 days ago to a lower-cost storage class, and after 90 days to archive. What should they configure?

A. Lifecycle management rules.

B. Bucket lock.

C. Retention policy.

D. Object versioning.

Answer: A

Lifecycle management enables automated transitions to reduce costs.

Why this answer

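Lifecycle management rules can automatically transition objects to colder storage classes (for example, Standard to Nearline or Coldline, and later to Archive) when conditions such as object age are met. The age condition counts days since the object was created, not since it was last accessed; lifecycle rules have no last-access condition. If transitions must follow actual access patterns, Autoclass is the access-based alternative.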

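A minimal sketch of the lifecycle configuration this scenario calls for, generated in Python so it can be checked programmatically. It assumes Nearline as the "lower-cost" class and keys both rules on object age (days since creation), since lifecycle rules cannot key on last access. The `{"rule": [...]}` shape follows the format accepted by `gsutil lifecycle set`; treat it as an illustration rather than a drop-in file.

```python
import json


def build_lifecycle_config(cool_after_days: int = 30, archive_after_days: int = 90) -> dict:
    """Lifecycle rules keyed on object age (days since creation), not last access."""
    return {
        "rule": [
            {
                # Standard objects older than cool_after_days move to Nearline.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": cool_after_days, "matchesStorageClass": ["STANDARD"]},
            },
            {
                # Anything still in Standard or Nearline after archive_after_days moves to Archive.
                "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
                "condition": {"age": archive_after_days,
                              "matchesStorageClass": ["STANDARD", "NEARLINE"]},
            },
        ]
    }


print(json.dumps(build_lifecycle_config(), indent=2))
```

Saved to `lifecycle.json`, the output could be applied with `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME`, where `BUCKET_NAME` is a placeholder.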

454

MCQ (hard)

Your organization is bootstrapping a new Google Cloud environment for a DevOps team. The team consists of 15 engineers who will be working on multiple microservices deployed across several projects. You have created a folder called 'devops' under the organization node. Within this folder, you plan to create three projects: 'devops-dev', 'devops-staging', and 'devops-prod'. You want to enforce that all resources in these projects are created in a specific region (us-central1) and that no external IP addresses can be assigned to Compute Engine instances. Additionally, you want to ensure that all service accounts used by the applications have minimal permissions. After setting up the organization policies, you notice that a developer was able to create a Compute Engine instance with an external IP in the 'devops-dev' project. You check the organization policy constraints and find that the constraint 'compute.vmExternalIpAccess' is set to 'Deny' at the organization level, but the developer bypassed it. What is the most likely reason?

A. The project 'devops-dev' has a policy that overrides the organization-level deny.

B. The organization policy has not propagated to all projects yet.

C. The developer used the wrong constraint name; the correct constraint is 'compute.restrictExternalIp'.

D. The developer tagged the instance with a tag that exempts it from the organization policy.

Answer: A

A policy set lower in the hierarchy (on a folder or project) can replace the policy inherited from the organization, including with a less restrictive one.

Why this answer

Option A is correct because organization policies can be overridden at a lower level in the resource hierarchy. Even though the constraint 'compute.vmExternalIpAccess' is set to 'Deny' at the organization level, a policy set on the folder or project that does not inherit from its parent (for example, one that allows all values) replaces the inherited organization-level policy for that resource. Organization policies are inherited by default, but any principal who holds the Organization Policy Administrator role (roles/orgpolicy.policyAdmin) with access to the project can create such an override.

The developer's project most likely had a project-level policy that allowed external IPs, bypassing the organization-level deny.

Exam trap

Google Cloud often tests the misconception that organization policies are absolute and cannot be overridden. In reality, a policy set at a lower level of the hierarchy can replace an inherited one, so the safeguard is to tightly restrict who holds the Organization Policy Administrator role rather than to rely on the organization-level setting alone.

How to eliminate wrong answers

Option B is wrong because organization policy changes take effect within minutes; a propagation delay would not explain a bypass that persists after the policy has been in place. Option C is wrong because the correct constraint name for controlling external IPs on Compute Engine instances is 'compute.vmExternalIpAccess'; 'compute.restrictExternalIp' is not a valid Google Cloud constraint. Option D is wrong because instance network tags are used for firewall rules and routes, not for policy exemptions; conditional organization policies can reference Resource Manager tags, but only when a policy administrator writes that condition into the policy itself.

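To make the override mechanism concrete, here is a deliberately simplified Python model of how an effective list-constraint policy is resolved down the hierarchy. `ListPolicy` and `external_ip_allowed` are hypothetical names, and the model captures only the replace-versus-inherit behavior described above, not the full merge rules of the Organization Policy Service.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ListPolicy:
    """Hypothetical stand-in for a policy on a list constraint such as
    compute.vmExternalIpAccess (a model, not the real API)."""
    allow_all: bool                   # True models an allowAll rule, False a denyAll rule
    inherit_from_parent: bool = True  # False makes this policy replace the parent's result


def external_ip_allowed(chain: Sequence[Optional[ListPolicy]]) -> bool:
    """Evaluate policies from the organization down to the project.

    None means no policy is set on that node, so the parent's result stands.
    A policy with inherit_from_parent=False discards whatever was inherited.
    When inheriting, this simplified model lets a deny anywhere above keep winning.
    """
    allowed = True  # no policy anywhere: the constraint is not enforced
    for policy in chain:
        if policy is None:
            continue
        if policy.inherit_from_parent:
            allowed = allowed and policy.allow_all
        else:
            allowed = policy.allow_all
    return allowed


org_deny = ListPolicy(allow_all=False)
project_override = ListPolicy(allow_all=True, inherit_from_parent=False)

# Chain order: organization -> folder 'devops' -> project 'devops-dev'
print(external_ip_allowed([org_deny, None, None]))              # False
print(external_ip_allowed([org_deny, None, project_override]))  # True
```

In a real project, the effective policy can be inspected with a command such as `gcloud org-policies describe compute.vmExternalIpAccess --project=devops-dev --effective`, which shows whether a lower-level policy has replaced the organization's.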

455

MCQ (easy)

A team uses Cloud Build to deploy a Cloud Run service. The build fails with: 'ERROR: (gcloud.run.services.update) PERMISSION_DENIED: Permission 'run.services.update' denied on resource.' The Cloud Build service account has the Cloud Run Admin role. What is missing?

A. The build config must use the Cloud Run deployer step instead of the gcloud command.

B. The Cloud Build service account should have the Owner role on the project.

C. The Cloud Run service must be deployed in the same region as the build.

D. The Cloud Build service account needs the 'run.services.update' permission or the Cloud Run Admin role.

Answer: D

The error indicates missing permissions; Cloud Run Admin includes it.

Why this answer

Option D is correct because the error message explicitly states that the 'run.services.update' permission is denied, which means the identity running the build does not effectively hold this permission on the target resource. Although the Cloud Run Admin role includes 'run.services.update', the error indicates the role is not properly assigned: for example, it may have been granted in a different project, or the build may run as a different service account than the one that received the role. Granting the Cloud Run Admin role (or another role containing 'run.services.update') to the service account the build actually uses, on the project that hosts the service, resolves the issue.

Exam trap

Google Cloud often tests the misconception that using a specific step type (like Cloud Run deployer) bypasses IAM requirements, when in fact all deployment methods require the same underlying permissions.

How to eliminate wrong answers

Option A is wrong because the Cloud Run deployer step is a convenience wrapper that still requires the same underlying IAM permissions; using it instead of the gcloud command does not bypass permission checks. Option B is wrong because the Owner role is overly permissive and unnecessary; the Cloud Run Admin role (roles/run.admin) already includes all required Cloud Run permissions, including 'run.services.update'. Option C is wrong because Cloud Run deployments are not region-restricted by the build's region; the service can be deployed to any region regardless of where Cloud Build runs.


456

Multi-Select (hard)

Which TWO are best practices for implementing CI/CD on Google Cloud?

Select 2 answers

A. Use Cloud Run for all services.

B. Use Artifact Registry for storing container images.

C. Use Cloud Build for all deployments, including infrastructure changes.

D. Use Cloud Deploy for Kubernetes deployments.

E. Use GitHub Actions instead of Cloud Build.

Answers: B, D

Artifact Registry is the recommended registry for Google Cloud.

Why this answer

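Option B is correct because Artifact Registry is the recommended service for storing, managing, and securing container images and other artifacts in Google Cloud. It integrates natively with Cloud Build, Cloud Run, and GKE, providing vulnerability scanning and IAM-based access control, which are essential for a secure CI/CD pipeline.

Option D is correct because Cloud Deploy is Google Cloud's managed continuous delivery service for GKE, handling promotion through targets such as dev, staging, and prod, with approvals and rollbacks. Option A is wrong because Cloud Run is a runtime, not a pipeline component, and no single runtime fits all services. Option C is wrong because Cloud Build is primarily a CI service, and routing every deployment through it bypasses dedicated delivery tooling such as Cloud Deploy. Option E is wrong because replacing Cloud Build with a third-party CI system is a tooling preference, not a Google Cloud best practice.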

Exam trap

Google Cloud often tests the distinction between CI/CD tools and compute services, so candidates mistakenly select Cloud Run as a CI/CD best practice because it is a popular Google Cloud service, but it is a runtime environment, not a pipeline component.


457

Multi-Select (easy)

What security checks can be integrated into a Cloud Build CI/CD pipeline? (Select TWO)

Select 2 answers

A. Manual code review

B. Container scanning with Artifact Analysis

C. Network penetration testing

D. Dynamic application security testing (DAST)

E. Static application security testing (SAST) with Cloud Build custom steps

Answers: B, E

Artifact Analysis can scan container images for vulnerabilities as part of the pipeline.

Why this answer

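Options B and E are correct. Container scanning with Artifact Analysis (B) is built in and scans images pushed to Artifact Registry for known vulnerabilities. SAST can be added as a custom build step (E) that fails the build when it finds issues.

Option A (manual code review) is a human process, not an automated pipeline check. Option C (network penetration testing) is performed against deployed infrastructure, outside the build. Option D (DAST) tests a running application rather than the source code or the built image.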

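As a sketch of what a custom SAST step can look like, the Python script below scans source files for two obvious secret patterns. The rule set, the `scan_text` and `main` names, and the `*.py` file filter are all hypothetical; a real pipeline would normally run an established SAST tool in the step instead. What carries over is the contract: a step that exits with a non-zero status fails the Cloud Build build.

```python
import re
from pathlib import Path

# Hypothetical rules for this sketch; real SAST tools ship far larger rule sets.
RULES = {
    "private-key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_text(text: str) -> list:
    """Return (line_number, rule_name) pairs for every rule that matches a line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def main(root: str = ".") -> int:
    """Scan *.py files under root; return a non-zero status if anything matched."""
    total = 0
    for path in Path(root).rglob("*.py"):
        for lineno, name in scan_text(path.read_text(errors="ignore")):
            print(f"{path}:{lineno}: {name}")
            total += 1
    # Cloud Build marks a step, and therefore the build, as failed on a non-zero exit.
    return 1 if total else 0
```

To run it as a build step, the script would end with `sys.exit(main())` (after importing `sys`) and run in an image that has Python installed.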

458

MCQ (medium)

Your company runs a multi-region application on Google Kubernetes Engine. You have implemented Cloud Monitoring dashboards to track cluster resource utilization and application SLIs. After a recent upgrade, you notice that the dashboard shows a sudden drop in CPU utilization for all nodes in one zone, but the application is still serving traffic normally. You suspect a monitoring issue. What should you investigate first?

A. Check if the nodes in that zone have been cordoned.

B. Check if the application's resource requests and limits have changed.

C. Check if the Kubernetes Metrics Server is running correctly in that zone.

D. Check if the Cloud Monitoring agent has been updated incorrectly.

Answer: C

Metrics Server is responsible for collecting resource usage; if it's down, CPU data would drop.

Why this answer

The Kubernetes Metrics Server collects resource metrics (CPU and memory) from the kubelet on each node and exposes them through the Metrics API. A sudden drop in reported CPU utilization across all nodes in a single zone, while the application continues to serve traffic normally, strongly indicates that metrics are no longer being collected from those nodes rather than an actual change in workload. Checking the Metrics Server's health, its logs, and whether it can still reach the kubelets in that zone is the correct first step to confirm whether the monitoring pipeline is broken.

Exam trap

Google Cloud often tests whether candidates read a dashboard anomaly as a real workload change. When traffic is healthy, a sudden drop in reported utilization usually means metrics are not being collected or exported, so the metrics pipeline should be checked before the workload.

How to eliminate wrong answers

Option A is wrong because cordoning a node prevents new pods from being scheduled but does not affect CPU usage reporting for pods already running there; their metrics are still collected. Option B is wrong because changes to resource requests and limits affect pod scheduling and resource guarantees, not how actual CPU usage is measured; a sudden drop in reported utilization across every node in one zone is not explained by request or limit changes. Option D is wrong because GKE nodes do not use a separately installed Cloud Monitoring agent (the Ops Agent is for Compute Engine VMs); GKE collects node and pod system metrics with components it manages itself.


459

MCQ (easy)

During a Cloud Build pipeline, a build step fails because the Docker image tag already exists in Container Registry. The team wants to avoid overwriting tags. What is the best practice to resolve this?

A. Use the commit SHA as the image tag in the build step.

B. Specify the :latest tag and always push to that tag.

C. Add a step to pull the image before building to ensure it's present.

D. Configure the build to retry on failure with a backoff.

Answer: A

Commit SHA is unique per change, avoiding collisions.

Why this answer

Using the commit SHA as the image tag guarantees uniqueness because each commit produces a distinct SHA. This prevents tag collisions in Container Registry without overwriting, as the SHA is immutable for that commit. It also provides traceability back to the exact source code version that produced the image.

Exam trap

Google Cloud often tests the misconception that retries or pulling images can resolve tag conflicts, when in fact only a unique tag strategy (like commit SHA) prevents the collision at the source.

How to eliminate wrong answers

Option B is wrong because using the :latest tag encourages overwriting, which directly violates the team's requirement to avoid overwriting tags. Option C is wrong because pulling an image before building does not prevent tag conflicts; it only ensures the image is cached locally, and the build step will still fail if the tag already exists in the registry. Option D is wrong because retrying with backoff does not resolve the underlying tag collision; it will simply fail again on each retry since the tag still exists.
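A minimal sketch, in Python, that emits a Cloud Build config (Cloud Build accepts JSON as well as YAML) tagging the image with the triggering commit. `my-app` is a hypothetical image name; `$PROJECT_ID` and `$COMMIT_SHA` are Cloud Build substitutions, and `$COMMIT_SHA` is populated only for builds started by a trigger.

```python
import json

# 'my-app' is a hypothetical image name; $PROJECT_ID and $COMMIT_SHA are
# substituted by Cloud Build when the build runs.
IMAGE = "gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA"


def build_config(image: str = IMAGE) -> dict:
    """Cloud Build config that builds and pushes an image tagged with the commit SHA."""
    return {
        "steps": [
            {"name": "gcr.io/cloud-builders/docker", "args": ["build", "-t", image, "."]},
        ],
        # Images listed here are pushed after all steps succeed.
        "images": [image],
    }


print(json.dumps(build_config(), indent=2))
```

Because every commit yields a new tag, two builds of different commits can never target the same tag, and each image traces back to the exact source revision.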


460

MCQ (hard)

A multinational corporation is bootstrapping a Google Cloud organization with multiple subsidiaries. Each subsidiary needs its own folder with IAM policies that are managed locally, but the parent company wants to enforce a global policy that restricts the use of certain machine types (e.g., N2D) for cost control. However, one subsidiary has a legitimate need for those machine types in a specific project. What is the best way to handle this exception while maintaining the global policy?

A. Create a custom organization policy with a condition that excludes the exception project from the restriction.

B. Set an organization policy that denies N2D machine types, then create a separate policy at the project level to allow them for the exception project.

C. Use an audit-only policy and rely on a team to review and approve machine type usage.

D. Place each subsidiary in its own folder and set the machine type restriction only on folders that require it.

Answer: A

Custom policies with conditions allow fine-grained exceptions.

Why this answer

Option A is correct because custom organization policies with conditions can selectively exclude a project from a restriction while keeping it enforced everywhere else. Option B is wrong because standard machine type constraints do not support per-project allowlists. Option C is wrong because it only audits usage and does not enforce the policy.

Option D is wrong because restricting only some folders no longer enforces a global policy across subsidiaries.


461

MCQ (hard)

A company uses Cloud Run for a stateless API service with concurrency set to 80. During a traffic spike, some requests return HTTP 500 errors and latency spikes. Cloud Monitoring shows container CPU utilization at 100% and memory usage at 70%. What is the most likely cause and the best first step?

A. Concurrency per container is too high; reduce concurrency to 10

B. Maximum instances limit is too low; increase from 10 to 100

C. Min idle instances is too low; set min idle to 5 to reduce cold starts

D. Memory limit is too low; increase memory from 256 MiB to 512 MiB

Answer: A

Lowering concurrency reduces CPU contention, preventing timeouts and 500s.

Why this answer

The correct answer is A because with CPU at 100% and memory at only 70%, the bottleneck is CPU, not memory. Cloud Run containers handle requests concurrently; setting concurrency to 80 means each container processes up to 80 requests simultaneously. When CPU is saturated, requests queue up, causing latency spikes and eventual HTTP 500 errors as the container becomes unresponsive.

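Reducing concurrency to 10 lowers the per-container request load, so each request gets a larger share of the container's CPU and Cloud Run adds instances to absorb the spike instead of piling more requests onto already saturated containers.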

Exam trap

Google Cloud often tests the misconception that HTTP 500 errors during spikes are always due to insufficient instances or memory, but the key diagnostic clue here is CPU at 100% with memory well below limit, pointing to concurrency overload as the root cause.

How to eliminate wrong answers

Option B is wrong because increasing the maximum instances limit would add more containers, but each new container would also be configured with concurrency 80 and would immediately hit the same CPU bottleneck, spreading the problem without solving it. Option C is wrong because min idle instances addresses cold start latency for new containers, but the issue here is CPU saturation during a traffic spike, not cold starts; idle instances would still be overwhelmed by the high concurrency setting. Option D is wrong because memory usage is at 70%, not 100%, so memory is not the bottleneck; increasing memory would not resolve CPU saturation and could even increase per-container cost without benefit.
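A back-of-the-envelope check of what lowering concurrency implies, assuming each instance serves at most `concurrency` requests at once and a hypothetical peak of 800 in-flight requests. `instances_needed` is an illustrative helper, not a Cloud Run API.

```python
import math


def instances_needed(in_flight_requests: int, concurrency: int) -> int:
    """Minimum instances if each one accepts at most `concurrency` requests at a time."""
    return math.ceil(in_flight_requests / concurrency)


peak = 800  # hypothetical number of requests in flight during the spike
for concurrency in (80, 10):
    print(concurrency, instances_needed(peak, concurrency))
# 80 10
# 10 80
```

The arithmetic shows the trade-off: at concurrency 10 the same spike needs eight times as many instances, so the service's maximum-instances setting must leave room for that scale-out.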


462

Multi-Select (hard)

Which TWO metrics should be monitored to detect a potential memory leak in a Compute Engine VM?

Select 2 answers

A. CPU utilization

B. Process count

C. Memory usage (percentage)

D. Disk read IOPS

E. Network bytes sent

Answers: B, C

A memory leak may cause the application to spawn more processes.

Why this answer

Option B is correct because a growing process count can accompany a memory leak, for example when worker processes are spawned but never exit and each one keeps holding memory; tracking it helps detect abnormal growth that correlates with memory exhaustion. Option C is correct because memory usage percentage directly reflects how much of the VM's RAM is consumed; a steady upward trend without a corresponding increase in workload indicates a leak. On Compute Engine, in-guest memory and process metrics are reported only when the Ops Agent is installed.

Exam trap

Google Cloud often tests the misconception that CPU utilization is a primary indicator of memory leaks, but in reality, a leak can silently consume memory without spiking CPU until the system is critically low on memory.
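One simple way to turn "a steady upward trend" into an alert condition is to fit a least-squares slope to recent memory samples. This Python sketch assumes evenly spaced percentage samples (for example, read from the Ops Agent's memory utilization metric) and uses a hypothetical threshold of 0.5 percentage points per sample; `slope_per_sample` and `looks_like_leak` are illustrative names.

```python
from statistics import mean


def slope_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def looks_like_leak(memory_percent, min_slope=0.5):
    """Flag a steady climb; min_slope is in percentage points per sample (hypothetical)."""
    return slope_per_sample(memory_percent) >= min_slope


print(looks_like_leak([40, 42, 44, 46, 48]))  # True: +2 points per sample
print(looks_like_leak([50, 49, 51, 50, 50]))  # False: flat with noise
```

A slope fit tolerates noise better than comparing two single readings, which is why it suits the slow, monotonic growth a leak produces.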


463

MCQ (easy)

A DevOps team is setting up a CI/CD pipeline using Cloud Build. They want the Cloud Build service account to have permission to deploy to Cloud Run within a specific project. Which IAM role should be granted to the Cloud Build service account?

A. roles/run.admin

B. roles/run.invoker

C. roles/cloudbuild.builds.editor

D. roles/iam.serviceAccountUser

Answer: A

This allows deploying and managing Cloud Run services.

Why this answer

The Cloud Build service account needs the `roles/run.admin` role to deploy services to Cloud Run. This role grants full control over Cloud Run resources, including creating, updating, and deleting services, which is required for a CI/CD pipeline to perform deployments. Without this role, the service account would lack the necessary permissions to modify Cloud Run configurations.

Exam trap

Google Cloud often tests the distinction between roles that grant management permissions (like `roles/run.admin`) versus roles that only grant invocation or build management, leading candidates to mistakenly choose `roles/run.invoker` or `roles/cloudbuild.builds.editor` when deployment is required.

How to eliminate wrong answers

Option B is wrong because `roles/run.invoker` only allows invoking (calling) Cloud Run services, not deploying or managing them. Option C is wrong because `roles/cloudbuild.builds.editor` grants permissions to manage Cloud Build triggers and builds, but does not include any Cloud Run deployment permissions. Option D is wrong because `roles/iam.serviceAccountUser` allows a principal to act as a service account but does not grant any Cloud Run deployment permissions on its own; in practice it is usually granted alongside `roles/run.admin` so that the build can deploy a service that runs as its runtime service account.


464

MCQ (hard)

Refer to the exhibit. A rollout to dev succeeds, but when promoting to prod, it fails with 'Target 'prod' not found'. What is the issue?

A. The prod target does not have a required approval rule.

B. The prod target has not been created in the same region.

C. The prod target exists but is in a different project.

D. The delivery pipeline must be redeployed to include the prod target.

Answer: B

Cloud Deploy targets must exist in the delivery pipeline's region before a release can be rolled out to them. The error indicates the target does not exist there.

Why this answer

Option B is correct because delivery pipelines and targets are regional resources, and a pipeline stage can only use a target that has been created in the same region as the pipeline. Option D is incorrect because redeploying the pipeline does not create the target. Option C is incorrect because the error indicates the target does not exist, not that it is in a different project.

Option A is incorrect because approval requirements are unrelated to a "not found" error.

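A hypothetical pre-flight check in Python that mirrors the failure: it lists pipeline stages whose target ID has no target in the pipeline's region. The data structures stand in for what `gcloud deploy targets list --region=REGION` would report; `missing_targets` is an illustrative helper, not part of the Cloud Deploy API.

```python
from typing import Iterable, List, Set, Tuple


def missing_targets(stage_target_ids: Iterable[str],
                    created: Set[Tuple[str, str]],
                    pipeline_region: str) -> List[str]:
    """Return the stage target IDs that have no created target in the pipeline's region.

    `created` holds (region, target_id) pairs for targets that actually exist.
    """
    return [t for t in stage_target_ids if (pipeline_region, t) not in created]


# Hypothetical setup: the pipeline lives in us-central1, but prod was created in us-east1.
stages = ["dev", "prod"]  # serialPipeline stages in promotion order
created = {("us-central1", "dev"), ("us-east1", "prod")}
print(missing_targets(stages, created, "us-central1"))  # ['prod']
```

Running a check like this before promotion surfaces the missing target at setup time instead of at the prod rollout.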

465

Multi-Select (easy)

You are designing a monitoring strategy for a cloud-native application. Which THREE components are essential for observability?

Select 3 answers

A. Metrics

B. Alerts

C. Dashboards

D. Traces

E. Logs

Answers: A, D, E

Metrics provide quantitative data about system performance.

Why this answer

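Metrics (A) are essential because they provide quantitative, time-series data about system health and performance, such as CPU utilization, request latency, and error rates. In cloud-native systems, metrics are typically collected with Cloud Monitoring, Prometheus, or similar systems, enabling trend analysis and threshold-based alerting. Logs (E) record discrete events with their context, which is what you need to explain why a specific failure happened. Traces (D) follow a single request across services and show where its latency accumulates. Alerts and dashboards are built on top of these three signals rather than being signals themselves.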

Exam trap

Google Cloud often tests the distinction between the core observability data types (metrics, logs, traces) and the operational tools built on them (alerts, dashboards), trapping candidates who confuse 'observability components' with 'monitoring tools'.


466

MCQ (medium)

A company is using BigQuery for analytics and wants to control costs. They have many queries that scan large amounts of data. Which approach is most effective in reducing query costs?

A. Switch to flat-rate pricing to cap costs.

B. Partition tables by date and use partition pruning in queries.

C. Reserve BigQuery slots for dedicated capacity.

D. Use clustering to organize data within partitions.

Answer: B

Partitioning limits the data scanned, reducing query costs.

Why this answer

Partitioning tables by date and using partition pruning in queries directly reduces the amount of data scanned by BigQuery, which is the primary driver of on-demand query costs. By filtering on the partition column, BigQuery can skip entire partitions that do not match the query criteria, minimizing the bytes processed. This is the most effective cost-control measure because it addresses the root cause of high costs—excessive data scanning—without requiring a pricing model change or additional resource commitments.

Exam trap

Google Cloud often tests the misconception that clustering alone is sufficient for cost reduction, but clustering only optimizes data within a partition and cannot skip entire partitions, making partitioning the primary mechanism for cost control.

How to eliminate wrong answers

Option A is wrong because switching to flat-rate pricing caps the maximum cost but does not reduce the amount of data scanned; it simply changes the billing model from per-byte to a fixed monthly fee, which could be more expensive if usage is low or variable. Option C is wrong because reserving BigQuery slots for dedicated capacity provides predictable performance and cost but does not inherently reduce the amount of data scanned; it is a capacity-based pricing model that still charges for the slots regardless of query efficiency. Option D is wrong because clustering organizes data within partitions to improve query performance and reduce costs by limiting the data scanned within a partition, but it is secondary to partitioning; without partition pruning, clustering alone cannot skip entire partitions and is less effective at reducing overall data scanned.
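A toy Python model of why pruning matters: bytes scanned is the sum of the partitions a query touches. The table (365 daily partitions of 2 GB each) is hypothetical, and the model ignores column pruning, which also reduces bytes scanned because BigQuery reads only the referenced columns.

```python
from datetime import date, timedelta


def bytes_scanned(partitions, start, end):
    """Sum the sizes of daily partitions whose date falls within [start, end]."""
    return sum(size for day, size in partitions.items() if start <= day <= end)


# Hypothetical table: 365 daily partitions of 2 GB each.
GB = 10**9
first = date(2024, 1, 1)
table = {first + timedelta(days=i): 2 * GB for i in range(365)}

full_scan = sum(table.values())  # no filter on the partition column
last_week = bytes_scanned(table, first + timedelta(days=358), first + timedelta(days=364))
print(full_scan // GB, last_week // GB)  # 730 14
```

Under on-demand pricing, cost scales with those bytes, so the filtered query in this model costs about 2% of the full scan. In practice, a dry run of the query reports the bytes it would process before it runs.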


467

MCQ (easy)

Based on the exhibit, which Cloud Logging query filter will return all logs of this type?

A. severity>=ERROR

B. severity:"ERROR"

C. severity=ERROR

D. jsonPayload.severity:ERROR

Answer: C

This is the correct syntax to match logs with severity exactly 'ERROR'.

Why this answer

Option C is correct because Cloud Logging uses the `severity=ERROR` syntax to filter logs by exact severity level. The `=` operator performs an exact match on the severity field, which is a standard LogEntry field with predefined values (DEFAULT, DEBUG, INFO, NOTICE, WARNING, ERROR, CRITICAL, ALERT, EMERGENCY). This filter returns all logs where the severity is exactly ERROR, matching the requirement.

Exam trap

Google Cloud often tests the distinction between the exact match (`=`), text search (`:`), and range comparison (`>=`) operators in Cloud Logging, and between the top-level severity field and a field with the same name inside a JSON payload.

How to eliminate wrong answers

Option A is wrong because `severity>=ERROR`, although valid syntax, is a range comparison: it also returns CRITICAL, ALERT, and EMERGENCY entries, so it matches more than logs of this type. Option B is wrong because `severity:"ERROR"` uses the `:` (has) operator, which is intended for substring and token matching rather than exact comparison of the severity value. Option D is wrong because `jsonPayload.severity:ERROR` references a nested field under `jsonPayload`, but severity is a top-level LogEntry field, not part of the JSON payload; this filter would look for a custom field and miss the standard severity logs.

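A small Python model of the difference between the two comparison operators, using the numeric values of the LogSeverity enum. It illustrates the semantics of `severity=ERROR` versus `severity>=ERROR`; it is not how Cloud Logging evaluates queries internally, and `matches_eq` and `matches_ge` are illustrative names.

```python
# Numeric values of the LogSeverity enum used by Cloud Logging.
SEVERITY = {
    "DEFAULT": 0, "DEBUG": 100, "INFO": 200, "NOTICE": 300, "WARNING": 400,
    "ERROR": 500, "CRITICAL": 600, "ALERT": 700, "EMERGENCY": 800,
}


def matches_eq(entry_severity: str, level: str) -> bool:
    """Models severity=LEVEL: exact match only."""
    return entry_severity == level


def matches_ge(entry_severity: str, level: str) -> bool:
    """Models severity>=LEVEL: the level and everything more severe."""
    return SEVERITY[entry_severity] >= SEVERITY[level]


entries = ["INFO", "ERROR", "CRITICAL", "ERROR", "WARNING"]
print([s for s in entries if matches_eq(s, "ERROR")])  # ['ERROR', 'ERROR']
print([s for s in entries if matches_ge(s, "ERROR")])  # ['ERROR', 'CRITICAL', 'ERROR']
```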

468

Matching (medium)

Match each Google Cloud DevOps capability to its benefit.


Matches

Cloud Deploy: Managed continuous delivery to GKE

Artifact Registry: Centralized container and package storage

Cloud Source Repositories: Private Git repositories integrated with Cloud Build

Cloud Code: IDE plugins for Kubernetes and Cloud Run

Skaffold: CLI for continuous development on Kubernetes

Why these pairings

Tools that accelerate DevOps workflows.


469

Multi-Select (medium)

Which TWO actions can reduce tail latency in a microservices architecture deployed on GKE? (Choose 2)

Select 2 answers

A. Run multiple replicas of each service and use a load balancer with least-request algorithm.

B. Use a single large machine type for all services.

C. Enable session affinity to keep users on the same pod.

D. Increase the batch size for processing requests.

E. Implement request hedging by sending duplicate requests to multiple replicas.

Answers: A, E

Distributes load and reduces queuing delay.

Why this answer

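Option A is correct because running multiple replicas and using a load balancer with a least-request algorithm sends each incoming request to the replica with the fewest outstanding requests, reducing queuing delay and keeping any single replica from becoming a hotspot. This directly lowers tail latency because slow or overloaded pods receive less new work, which narrows the spread of response times across replicas.

Option E is correct because request hedging sends a duplicate of a slow request to another replica and uses whichever response arrives first, so a single slow replica no longer determines the tail. The cost is extra load, so hedging is usually limited to idempotent requests and triggered only after a delay.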

Exam trap

Google Cloud often tests the misconception that session affinity (sticky sessions) improves performance, but in reality it harms tail latency by preventing even load distribution and causing pod overload under variable traffic.
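A minimal asyncio sketch of request hedging, with simulated replicas standing in for real pods. `hedged_call`, `replica`, and the `hedge_after` delay are illustrative; production systems usually hedge only idempotent requests and set the delay near a high latency percentile so that duplicates stay rare.

```python
import asyncio
from functools import partial


async def replica(name: str, latency: float) -> str:
    """Simulated replica that answers with its name after `latency` seconds."""
    await asyncio.sleep(latency)
    return name


async def hedged_call(primary, backup, hedge_after: float) -> str:
    """Call primary; if it has not answered within hedge_after seconds, also call
    backup and return whichever response arrives first."""
    first = asyncio.ensure_future(primary())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        return first.result()  # fast path: no duplicate request sent
    second = asyncio.ensure_future(backup())
    done, pending = await asyncio.wait({first, second}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # abandon the slower duplicate
    return done.pop().result()


async def demo() -> str:
    slow = partial(replica, "slow-replica", 1.0)
    fast = partial(replica, "fast-replica", 0.05)
    return await hedged_call(slow, fast, hedge_after=0.1)


print(asyncio.run(demo()))  # fast-replica
```

Here the caller waits about 0.15 seconds instead of the slow replica's full second, which is exactly the tail that hedging trims.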


470

MCQ (hard)

A Cloud Deploy pipeline fails during a rollout with: 'FAILED_PRECONDITION: The release is not in a state that can be promoted.' The Cloud Build service account has the IAM roles shown in the exhibit. What is the missing role or permission?

A. The service account is missing the 'roles/clouddeploy.jobRunner' role.

B. The service account is missing the 'roles/cloudbuild.builds.builder' role.

C. The service account is missing the 'roles/clouddeploy.operator' role.

D. The service account is missing the 'roles/clouddeploy.approver' role, which includes the 'clouddeploy.releases.promote' permission.

Answer: D

Approver role is needed for promotion.

Why this answer

The error 'FAILED_PRECONDITION: The release is not in a state that can be promoted' occurs when a Cloud Deploy pipeline attempts to promote a release but the service account lacks the `clouddeploy.releases.promote` permission. This permission is included in the `roles/clouddeploy.approver` role, which is required to trigger a promotion from one target to the next in the pipeline. Without this role, the release cannot be promoted even if other deployment permissions are present.

Exam trap

Google Cloud often tests the distinction between the `clouddeploy.operator` role (which manages releases and rollouts) and the `clouddeploy.approver` role (which specifically allows promotion), leading candidates to mistakenly choose the operator role for promotion actions.

How to eliminate wrong answers

Option A is wrong because the `roles/clouddeploy.jobRunner` role is used for executing deployment jobs (e.g., running Skaffold render/apply) and does not include the `clouddeploy.releases.promote` permission needed for promotion. Option B is wrong because the `roles/cloudbuild.builds.builder` role is for Cloud Build execution, not for Cloud Deploy release promotion; it does not grant `clouddeploy.releases.promote`. Option C is wrong because the `roles/clouddeploy.operator` role provides broader management permissions (e.g., creating releases, rollbacks) but does not include the `clouddeploy.releases.promote` permission, which is exclusive to the `roles/clouddeploy.approver` role.


471

MCQ (hard)

An organization has a service that must meet a 99.99% SLO. The service runs on GKE and uses Cloud SQL. The team notices that during a major incident, the error budget is consumed rapidly. They want to implement a mechanism to automatically rollback deployments that cause sustained error budget consumption above a threshold. What is the best approach?

A. Use Cloud Scheduler to run a script that checks error budget and rolls back if needed.

B. Set up a deployment pipeline with Cloud Deploy that includes a predeployment validation step that checks the current error budget burn rate and blocks the release if the burn rate exceeds 10% per hour.

C. Implement a canary deployment strategy with manual approval steps.

D. Configure Cloud Build to automatically revert the last commit if error budget is consumed.

Answer: B

Automated policy prevents deployments that would consume error budget quickly.

Why this answer

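Option B is correct because a validation step in the Cloud Deploy pipeline that checks the current error budget burn rate automatically stops releases while the service is already burning budget too quickly, with no human in the loop. Option D is wrong because Cloud Build is CI, not deployment orchestration. Option C is wrong because canary deployments reduce blast radius, but with manual approval steps they do not act automatically.

Option A is wrong because a scheduled script polls on a fixed interval from outside the deployment pipeline, making it a custom, after-the-fact mechanism rather than part of the release process.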

472

MCQhard

An organization has a strict compliance requirement that all CI/CD pipelines must use customer-managed encryption keys (CMEK) for any artifacts stored in Cloud Storage. How can this be enforced at the organization level?

A.Use IAM conditions on storage buckets to require CMEK.

B.Create an Organization Policy with constraint `constraints/gcp.restrictNonCmekServices` that denies `storage.googleapis.com`.

C.Use Cloud Asset Inventory to scan for non-compliant buckets.

D.Configure Cloud Audit Logs to monitor and alert on non-CMEK usage.

AnswerB

Enforces CMEK on all new Cloud Storage resources in projects under the organization.

Why this answer

Option B is correct because the Organization Policy list constraint `constraints/gcp.restrictNonCmekServices`, with `storage.googleapis.com` on its deny list, requires that new Cloud Storage resources be created with a customer-managed encryption key (CMEK). The policy can be applied at the organization, folder, or project level and blocks creation of non-CMEK resources, which meets the compliance requirement at the organization level.

Exam trap

The trap here is that candidates confuse IAM conditions (which control access) with Organization Policy constraints (which enforce creation-time requirements), leading them to choose Option A instead of the correct policy-based enforcement.

How to eliminate wrong answers

Option A is wrong because IAM conditions on storage buckets can restrict access based on encryption key type, but they cannot enforce the requirement that buckets must be created with CMEK; IAM conditions control access, not creation policies. Option C is wrong because Cloud Asset Inventory can identify non-compliant buckets after they are created, but it does not prevent their creation or enforce the policy proactively. Option D is wrong because Cloud Audit Logs can monitor and alert on non-CMEK usage, but they are reactive and do not enforce the requirement at the time of bucket creation.
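
For reference, the list constraint Google documents for requiring CMEK per service is `constraints/gcp.restrictNonCmekServices`; putting `storage.googleapis.com` on its deny list requires CMEK for new Cloud Storage resources. The sketch below only builds a policy body shaped like the Organization Policy API v2 `Policy` resource and prints it as JSON (a file that could then be applied, for example, with `gcloud org-policies set-policy`). The organization ID is a placeholder; verify field names against the API reference before use.

```python
import json


def require_cmek_policy(org_id: str, services: list[str]) -> dict:
    """Build an Org Policy v2 body that denies non-CMEK use of the given services.

    Services on the deny list of gcp.restrictNonCmekServices must use CMEK
    for newly created resources.
    """
    return {
        "name": f"organizations/{org_id}/policies/gcp.restrictNonCmekServices",
        "spec": {
            "rules": [
                {"values": {"deniedValues": sorted(services)}},
            ],
        },
    }


if __name__ == "__main__":
    # "123456789012" is a placeholder organization ID.
    policy = require_cmek_policy("123456789012", ["storage.googleapis.com"])
    print(json.dumps(policy, indent=2))
```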

473

MCQhard

A large enterprise uses Cloud Build across multiple projects for different microservices. They want to create a centralized CI/CD governance where a single trigger can initiate builds across multiple projects, but each project's artifacts must be stored in a shared Artifact Registry. What is the best way to achieve this?

A.Use a shared VPC and a single Cloud Build private pool accessible to all projects, and configure triggers in each project.

B.Create a Cloud Build trigger in the governance project that uses a service account with permissions to send build requests to other projects.

C.Use a single Cloud Build trigger in the governance project and configure triggers in each project to listen to Pub/Sub notifications from the governance trigger.

D.Deploy a Cloud Function that listens to Cloud Source Repo events and creates Cloud Build builds in each project.

AnswerB

Cloud Build triggers can invoke builds in other projects using the 'projects/{projectId}/builds' resource with appropriate IAM.

Why this answer

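Option B is correct because a single trigger in the governance project can run under a service account that has been granted permission to create builds in the other projects, keeping control in one place while each build pushes to the shared Artifact Registry. Option A is incorrect because it still requires configuring triggers in every project, which is not centralized; a shared private pool only shares worker infrastructure. Option C is incorrect because chaining per-project triggers through Pub/Sub still spreads trigger configuration across projects and adds moving parts.

Option D is incorrect because a Cloud Function adds custom code and unnecessary complexity.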
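
A minimal sketch of what the governance trigger's service account would send: one Cloud Build `builds.create` request per service project, each tagging its image into a single shared Artifact Registry repository. The project IDs, repository, and image names are hypothetical; authentication and the build `source` are omitted.

```python
CLOUD_BUILD_API = "https://cloudbuild.googleapis.com/v1"


def shared_image(location: str, registry_project: str, repo: str,
                 image: str, tag: str) -> str:
    """Artifact Registry Docker path: LOCATION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG."""
    return f"{location}-docker.pkg.dev/{registry_project}/{repo}/{image}:{tag}"


def build_requests(services: dict[str, str], location: str, registry_project: str,
                   repo: str, tag: str) -> list[tuple[str, dict]]:
    """Return (URL, body) pairs for a POST to builds.create in each service project.

    `services` maps a service project ID to the image name it produces.
    """
    requests = []
    for project_id, image in services.items():
        ref = shared_image(location, registry_project, repo, image, tag)
        body = {
            "steps": [
                {"name": "gcr.io/cloud-builders/docker",
                 "args": ["build", "-t", ref, "."]},
            ],
            # Listing the image asks Cloud Build to push it after the steps succeed.
            "images": [ref],
        }
        requests.append((f"{CLOUD_BUILD_API}/projects/{project_id}/builds", body))
    return requests
```

Because every build pushes under the same repository, access can be granted once on that repository (for example `roles/artifactregistry.writer`) to each project's build service account.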

474

MCQeasy

A company uses Cloud Build and wants to trigger builds only from the master branch. Which configuration is required?

A.Create separate triggers for each branch.

B.Set the branch filter to 'master' in the trigger.

C.Use a custom Cloud Build step to check the branch name.

D.Use a Cloud Function to call Cloud Build for master only.

AnswerB

Directly filters on branch name.

Why this answer

Option B is correct because Cloud Build triggers let you specify a branch filter as a regular expression. Setting the filter to match master (usually written anchored as `^master$`, so that branches such as `master-hotfix` do not also match) ensures that only pushes or pull requests targeting the master branch start a build. This is the native, supported method for branch-based triggering without additional overhead.

Exam trap

Google Cloud often tests the misconception that you need external services or custom logic to filter branches, when in fact Cloud Build's built-in trigger branch filter is the simplest and most efficient solution.

How to eliminate wrong answers

Option A is wrong because creating separate triggers for each branch would cause builds for all branches, not just master, and adds unnecessary complexity. Option C is wrong because using a custom Cloud Build step to check the branch name would still trigger the build for every branch, wasting resources and time; the branch check should happen before the build starts, not during it. Option D is wrong because using a Cloud Function to call Cloud Build for master only introduces an unnecessary intermediary, adding latency and complexity when the native branch filter in the trigger already achieves the goal directly.
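
The difference between `master` and `^master$` as a branch filter can be checked with Python's `re` module. This sketch assumes the trigger fires when the pattern matches anywhere in the branch name (an unanchored search), which is why anchored patterns are the usual form; Python's `re` and RE2-style syntax agree on these simple patterns.

```python
import re


def trigger_fires(branch_pattern: str, branch: str) -> bool:
    """Assumed semantics: fire if the regex matches anywhere in the branch name."""
    return re.search(branch_pattern, branch) is not None


for pattern in ("master", "^master$"):
    for branch in ("master", "master-hotfix", "feature/not-master"):
        print(f"{pattern!r:12} {branch!r:22} {trigger_fires(pattern, branch)}")
```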

475

Multi-Selectmedium

Which TWO practices should be implemented to optimize query performance in Cloud Spanner?

Select 2 answers

A.Split large tables into multiple smaller tables to distribute load.

B.Create as many indexes as possible on all columns.

C.Use interleaved tables to co-locate related rows.

D.Use globally distributed interleaving across regions.

E.Define secondary indexes on columns used in WHERE clauses.

AnswersC, E

Interleaving ensures parent and child rows are stored together, reducing cross-node reads.

Why this answer

Option C is correct because interleaved tables in Cloud Spanner physically co-locate parent and child rows on the same split, reducing cross-node round trips and improving join performance. This design leverages Spanner's hierarchical storage model to minimize latency for queries that access related data together.

Exam trap

Google Cloud often tests the misconception that more indexes always improve query performance, but in Spanner, excessive indexes degrade write performance and storage efficiency, while interleaving and selective secondary indexes are the correct optimization strategies.

476

MCQhard

A company wants to enforce that all service accounts are created with a specific naming convention (e.g., prefix 'sa-'). What is the most efficient way to enforce this?

A.Use a custom role that restricts service account creation to users who follow the naming convention.

B.Use a Cloud Function that monitors and remediates non-compliant service accounts.

C.Use an organization policy constraint with a condition on the service account name.

D.Use a folder-level attribute with a policy on service account names.

AnswerB

A Cloud Function can detect violations and automatically delete or rename.

Why this answer

Option B is correct because a Cloud Function can be triggered by `google.iam.admin.v1.CreateServiceAccount` audit log entries, either through an Eventarc Cloud Audit Logs trigger or through a Cloud Logging sink that routes those entries to a Pub/Sub topic. The function can then immediately delete or disable service accounts whose IDs do not start with the 'sa-' prefix, enforcing the convention automatically after creation without blocking legitimate creation attempts. Because it is event-driven, this approach avoids the latency and complexity of periodic scanning.

Exam trap

The trap here is that candidates often assume organization policy constraints can enforce naming conventions because they are familiar with resource location or domain restrictions, but Google Cloud's organization policies do not support regex or prefix matching on IAM resource names.

How to eliminate wrong answers

Option A is wrong because custom roles cannot enforce naming conventions at creation time; they only control permissions to call the API, not validate input parameters like the account name. Option C is wrong because organization policy constraints (e.g., `constraints/iam.allowedPolicyMemberDomains`) do not support conditions on service account names; they only restrict resource locations, domains, or specific IAM conditions, not naming patterns. Option D is wrong because folder-level attributes and policies in Google Cloud do not have a native mechanism to enforce naming conventions on service accounts; folder policies apply to resource hierarchies but cannot validate string patterns on IAM resources.
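
The compliance check such a function would run can be sketched separately from the event plumbing. This sketch assumes the function has already pulled the new account's email out of the audit log entry (that extraction is omitted), and the ID pattern reflects the documented rule that account IDs are 6 to 30 characters of lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen. Deleting or disabling the account is left to the caller.

```python
import re

# Service account ID: 6-30 chars, starts with a lowercase letter,
# lowercase letters/digits/hyphens in between, does not end with a hyphen.
ACCOUNT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")
REQUIRED_PREFIX = "sa-"


def account_id(email: str) -> str:
    """'sa-build@my-project.iam.gserviceaccount.com' -> 'sa-build'."""
    return email.split("@", 1)[0]


def is_compliant(email: str, prefix: str = REQUIRED_PREFIX) -> bool:
    """True if the account ID is well-formed and carries the required prefix."""
    ident = account_id(email)
    return ACCOUNT_ID_RE.fullmatch(ident) is not None and ident.startswith(prefix)
```

Default accounts that Google creates (for example the Compute Engine default service account) do not follow a custom prefix, so a real remediation function would need an allowlist before deleting anything.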

477

MCQmedium

A DevOps team wants to automate the deployment of a microservice application to Google Kubernetes Engine (GKE) using Cloud Build. They have a Cloud Build configuration file that builds a container image and deploys it to GKE. However, the deployment step fails with an authorization error. What is the most likely cause?

A.The Cloud Build service account does not have the Kubernetes Engine Developer IAM role.

B.The user triggering the build does not have IAM permissions to deploy to GKE.

C.Cloud Build does not have permission to access the source code repository.

D.The Docker image build step failed due to missing dependencies.

AnswerA

To deploy to GKE, the Cloud Build service account needs a role such as roles/container.developer (or the broader roles/container.admin).

Why this answer

Cloud Build executes build steps, including the GKE deployment, as its service account (by default, the Cloud Build service account), not as the user who started the build. The Kubernetes Engine Developer IAM role (roles/container.developer) grants the permissions needed to deploy and manage workloads on GKE clusters. Without it (and without equivalent Kubernetes RBAC bindings), the service account is not authorized to create or update workloads through the cluster's Kubernetes API, so the deployment step fails with an authorization error.

Exam trap

Google Cloud often tests the distinction between the identity that triggers a build (user) and the identity that executes build steps (Cloud Build service account), leading candidates to incorrectly blame user permissions when the service account lacks the necessary Kubernetes Engine IAM role.

How to eliminate wrong answers

Option B is wrong because the user triggering the build only needs permission to start the Cloud Build execution; the actual deployment to GKE is performed by the Cloud Build service account, not the user's identity. Option C is wrong because an authorization error during the deployment step is distinct from source code repository access; if Cloud Build lacked repository permissions, the error would occur earlier during the source fetch step, not during deployment. Option D is wrong because a Docker build failure due to missing dependencies would cause a build step failure, not an authorization error; the error message specifically indicates an authorization issue, not a build failure.

478

MCQmedium

A company is bootstrapping a Google Cloud organization. They have created a Shared VPC host project. They want to allow a service project's default compute service account to launch instances that use the Shared VPC's subnets. Which IAM role should be granted to that service account at the host project level?

A.roles/compute.xpnAdmin

B.roles/compute.securityAdmin

C.roles/compute.networkUser

D.roles/compute.networkAdmin

AnswerC

This role allows using subnets in the host project.

Why this answer

The correct answer is C because the `roles/compute.networkUser` role grants a service account the necessary permissions to use the subnets of a Shared VPC host project. Specifically, this role includes the `compute.subnetworks.use` permission, which allows the service account to launch instances in the host project's subnets without granting broader network management rights.

Exam trap

The trap here is that candidates often confuse the `networkUser` role with the `networkAdmin` role, mistakenly thinking that launching instances requires full network administration privileges, when in fact only the `compute.subnetworks.use` permission is needed.

How to eliminate wrong answers

Option A is wrong because `roles/compute.xpnAdmin` is used for administering the Shared VPC (XPN) configuration itself, such as attaching or detaching service projects, not for granting a service account the ability to use subnets. Option B is wrong because `roles/compute.securityAdmin` provides permissions to manage firewall rules and SSL certificates, but does not include the `compute.subnetworks.use` permission required to launch instances in Shared VPC subnets. Option D is wrong because `roles/compute.networkAdmin` grants full control over network resources, including creating and deleting subnets, which is overly permissive and not the least-privilege role needed for simply using existing subnets.

479

MCQhard

When bootstrapping a new Google Cloud organization for DevOps, which set of initial IAM roles should be assigned to the DevOps team to enable them to create and manage projects, folders, and billing accounts?

A.Folder Admin, Billing Admin, Security Reviewer

B.Org Admin, Project Creator, Billing Admin

C.Project Creator, Billing Account User, Organization Policy Administrator

D.Project Creator, Billing Admin, Folder Admin

AnswerC

These roles provide the minimum required to create projects, link billing, and set policies.

Why this answer

Option C is correct because the DevOps team needs the Project Creator role to create new projects, the Billing Account User role to link billing accounts to those projects, and the Organization Policy Administrator role to set organization-wide policies that control resource constraints. These three roles together provide the minimum necessary permissions for bootstrapping a Google Cloud organization without granting excessive administrative privileges.

Exam trap

The trap here is that candidates often confuse Billing Admin (roles/billing.admin) with Billing Account User (roles/billing.user), mistakenly thinking full billing management is needed when only the ability to link projects to a billing account is required.

How to eliminate wrong answers

Option A is wrong because Folder Admin allows management of folder hierarchy but not project creation or billing account linking, and Security Reviewer only provides read-only access to IAM policies, lacking the permissions needed to create projects or manage billing. Option B is wrong because Org Admin grants broad organization-level management permissions that are too permissive for a DevOps team, and Project Creator alone cannot link billing accounts without the Billing Account User role. Option D is wrong because Billing Admin provides full billing account management (including modifying billing account settings) which is excessive, and Folder Admin is not required for initial project creation; the correct role for linking billing is Billing Account User, not Billing Admin.

480

MCQeasy

A startup runs a batch data processing job every night. The job processes large datasets and takes about 6 hours to complete. The job is designed to handle interruptions gracefully by saving checkpoint files to Cloud Storage every few minutes. The startup wants to minimize compute costs. The current setup uses a managed instance group with 10 n1-standard-4 VMs running for the entire 6-hour window. They are considering using preemptible VMs. However, they are concerned about cost stability and potential preemption. What should they do?

A.Use sole-tenant nodes to improve performance.

B.Use a mix of preemptible and regular VMs to ensure at least some progress if preempted.

C.Use preemptible VMs for all instances, as the job can resume from checkpoints.

D.Use regular VMs with committed use discounts for a 1-year term.

AnswerC

Preemptible VMs offer cost savings of 60-80% and are ideal for fault-tolerant batch jobs that can checkpoint progress.

Why this answer

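Option C is correct because the job is fault-tolerant and can resume from checkpoints, making preemptible VMs suitable and significantly cheaper. Option B adds unnecessary complexity and cost: a partial pool of regular VMs is not needed when every worker can resume from a checkpoint. Option D requires a 1-year commitment and pays for capacity around the clock for a job that runs only 6 hours a night.

Option A increases cost without addressing the goal, since sole-tenant nodes provide isolation, not savings.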
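
The property option C relies on is that a restarted worker resumes from its last checkpoint instead of starting over. A minimal sketch of that loop, using a local JSON file where the real job would write to Cloud Storage; the function and file names are hypothetical.

```python
import json
import os
from typing import Callable, Sequence


def load_next_index(checkpoint_path: str) -> int:
    """Return the index to resume from, or 0 when no checkpoint exists yet."""
    if not os.path.exists(checkpoint_path):
        return 0
    with open(checkpoint_path) as f:
        return json.load(f)["next_index"]


def save_next_index(checkpoint_path: str, next_index: int) -> None:
    """Write the checkpoint to a temp file, then rename it over the old one."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp_path, checkpoint_path)  # atomic on POSIX within one filesystem


def process(items: Sequence, work: Callable, checkpoint_path: str,
            every: int = 100) -> int:
    """Process items from the last checkpoint onward; return how many were processed."""
    start = load_next_index(checkpoint_path)
    for i in range(start, len(items)):
        work(items[i])
        if (i + 1) % every == 0 or i + 1 == len(items):
            save_next_index(checkpoint_path, i + 1)
    return len(items) - start
```

Items handled after the last checkpoint (20 to 24 in a run preempted at item 25 with `every=10`) are processed again on restart, so the per-item work should be idempotent or the checkpoint interval kept short.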

481

Multi-Selecteasy

Which TWO are best practices for implementing service monitoring strategies in Google Cloud?

Select 2 answers

A.Monitor the four golden signals (latency, traffic, errors, saturation) for every service.

B.Rely solely on synthetic monitoring to measure user experience.

C.Define Service Level Objectives (SLOs) and use them to drive alerting.

D.Use multiple monitoring tools to cover all aspects of the system.

E.Manually analyze logs and metrics to identify issues.

AnswersA, C

The four golden signals provide a high-level overview of service health.

Why this answer

A is correct because the four golden signals (latency, traffic, errors, saturation) are the foundational metrics recommended by Google SRE practices for monitoring any service. Monitoring these signals provides a comprehensive view of service health and user experience, enabling rapid detection of issues like high latency or resource exhaustion.

Exam trap

Google Cloud often tests the misconception that synthetic monitoring is sufficient for user experience measurement, but the correct approach combines synthetic and real user monitoring to capture the full picture.

482

MCQhard

A company uses Cloud CDN to deliver content globally. They notice increasing egress costs. Which change will most effectively reduce egress costs?

A.Switch to a premium tier network for lower egress rates.

B.Enable gzip compression for all responses.

C.Use Cloud Armor to block malicious traffic.

D.Configure Cloud CDN to cache more content and increase cache hit ratio.

AnswerD

Higher cache hit ratio reduces the amount of data fetched from the origin, lowering egress costs.

Why this answer

Increasing the cache hit ratio reduces the number of requests that reach the origin, which directly lowers the volume of cache-fill traffic from the origin to the CDN edges and the origin egress billed for it. Bytes served from the edges to users are billed whether or not they were cached, so caching cuts cost by eliminating repeated origin fetches rather than by shrinking what users download.

Exam trap

The trap here is that candidates often confuse reducing data size (compression) with reducing data volume (caching), or assume that blocking traffic (Cloud Armor) is the primary cost-control mechanism, when in fact increasing cache efficiency is the most direct and effective method to lower egress costs in a CDN architecture.

How to eliminate wrong answers

Option A is wrong because switching to a premium tier network typically increases egress rates (higher performance, higher cost) and does not reduce egress costs; it may even increase them. Option B is wrong because enabling gzip compression reduces the size of responses, which can lower bandwidth usage, but it does not address the root cause of egress costs—the volume of data served from the CDN—and compression is often already applied by default or has limited impact on already-compressed content like images and video. Option C is wrong because Cloud Armor blocks malicious traffic at the edge, which can reduce some egress from attacks, but legitimate user traffic still generates egress costs; blocking malicious traffic does not significantly reduce overall egress costs for a globally distributed content delivery service.
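
The effect of the hit ratio on origin traffic is simple arithmetic: every byte served on a cache miss must first be fetched from the origin, while bytes delivered to users leave the edge either way. A back-of-the-envelope sketch using byte hit ratio; the monthly volume is a made-up example and no prices are assumed.

```python
def cache_fill_bytes(bytes_served: float, byte_hit_ratio: float) -> float:
    """Bytes fetched from the origin to serve `bytes_served` at a given byte hit ratio."""
    if not 0.0 <= byte_hit_ratio <= 1.0:
        raise ValueError("byte_hit_ratio must be between 0 and 1")
    return bytes_served * (1.0 - byte_hit_ratio)


TB = 10**12
monthly_served = 500 * TB  # hypothetical monthly volume delivered to users
for ratio in (0.70, 0.90, 0.95):
    fetched = cache_fill_bytes(monthly_served, ratio) / TB
    print(f"hit ratio {ratio:.0%}: origin fetch = {fetched:.0f} TB")
```

Moving from a 90% to a 95% byte hit ratio halves origin fetches in this example (50 TB to 25 TB), while the 500 TB delivered to users is unchanged.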

483

MCQhard

A company is bootstrapping their Google Cloud organization for DevOps. They want to implement a least-privilege model for service accounts used by CI/CD pipelines. The pipelines need to deploy resources in multiple projects. What is the best practice for managing service account keys?

A.Use a user account for the CI/CD pipeline and assign it the necessary roles.

B.Store service account keys in Secret Manager and have the pipeline retrieve them at runtime.

C.Generate a single service account key and securely distribute it to the CI/CD system.

D.Use workload identity federation to allow the CI/CD system to impersonate a service account without keys.

AnswerD

Eliminates the need for keys and follows least privilege.

Why this answer

Option D is correct because workload identity federation allows an external CI/CD system (e.g., Jenkins, GitHub Actions) to impersonate a Google Cloud service account without managing or storing any long-lived keys. This eliminates the security risk of key leakage and aligns with the least-privilege principle by enabling short-lived, scoped credentials via the Security Token Service (STS) and OAuth 2.0 token exchange.

Exam trap

Google Cloud often tests the misconception that storing keys in a secure vault like Secret Manager is the best practice, but the trap here is that any long-lived key — even if encrypted at rest — introduces a persistent secret that can be exfiltrated, whereas workload identity federation eliminates the key entirely.

How to eliminate wrong answers

Option A is wrong because using a user account violates the least-privilege model — user accounts have persistent, broad permissions and are not designed for automated pipelines, creating a security risk and auditability gap. Option B is wrong because storing service account keys in Secret Manager still requires managing a long-lived, static secret that can be compromised; the key itself is a high-value target and must be rotated, which adds operational overhead. Option C is wrong because generating a single service account key and distributing it securely still introduces a long-lived credential that can be leaked, rotated only with manual effort, and violates the principle of using short-lived, just-in-time credentials.
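
To make "without keys" concrete, here is a sketch of the first step of workload identity federation: the CI system exchanges its own OIDC token for a short-lived federated token at the Security Token Service. The sketch only builds the request body; the project number, pool, and provider IDs are placeholders, sending the request and the follow-up service account impersonation call are omitted, and in practice Google's client libraries perform this exchange from a credential configuration file.

```python
STS_TOKEN_URL = "https://sts.googleapis.com/v1/token"


def provider_audience(project_number: str, pool_id: str, provider_id: str) -> str:
    """Resource name of the workload identity pool provider, used as the STS audience."""
    return (f"//iam.googleapis.com/projects/{project_number}/locations/global/"
            f"workloadIdentityPools/{pool_id}/providers/{provider_id}")


def sts_exchange_body(oidc_token: str, audience: str) -> dict:
    """RFC 8693 token-exchange request: external OIDC token in, federated token out."""
    return {
        "grantType": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requestedTokenType": "urn:ietf:params:oauth:token-type:access_token",
        "subjectTokenType": "urn:ietf:params:oauth:token-type:jwt",
        "subjectToken": oidc_token,
    }
```

Every credential in this flow expires: the CI system's OIDC token, the federated token, and the service account access token obtained by impersonation, so there is no key to store, rotate, or leak.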

484

MCQhard

Refer to the exhibit. What is the effect of the metricRelabelings section in this ServiceMonitor?

A.It adds a label called 'container_' to all metrics.

B.It only keeps metrics whose name starts with 'container_'.

C.It renames all metrics with the prefix 'container_' to remove the prefix.

D.It drops all metrics whose name starts with 'container_'.

AnswerD

The drop action with regex 'container_.*' removes matching metrics.

Why this answer

The `metricRelabelings` section in a ServiceMonitor applies Prometheus relabeling rules to samples after they are scraped and before they are ingested. An `action: drop` rule whose `regex: 'container_.*'` is matched against the metric name (the `__name__` label) discards every series whose name starts with 'container_', so those metrics are scraped but never stored. This is a common way to exclude unwanted metrics and reduce cardinality.

Exam trap

Google Cloud often tests the distinction between `drop` and `keep` actions in relabeling configs, and the trap here is that candidates confuse 'drop' with 'keep' or mistakenly think `metricRelabelings` adds or renames labels instead of filtering metrics.

How to eliminate wrong answers

Option A is wrong because `metricRelabelings` does not add labels; it modifies or filters existing metric names or labels, and the `action: drop` specifically removes metrics, not adds. Option B is wrong because the `action: drop` with a regex that matches metrics starting with 'container_' drops those metrics, not keeps them; keeping would require `action: keep`. Option C is wrong because renaming metrics (e.g., removing a prefix) would use `action: replace` with a `replacement` field, not `action: drop`.
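
The exhibit is not reproduced here, but the behavior described above (a `drop` rule with regex `container_.*` applied to the metric name) can be emulated in a few lines. This sketch assumes the rule's source label is `__name__` and mirrors Prometheus's convention that relabel regexes are fully anchored; it is an illustration, not Prometheus code.

```python
import re


def apply_drop_rule(series: list[dict[str, str]], regex: str,
                    source_label: str = "__name__") -> list[dict[str, str]]:
    """Emulate a relabel rule with action: drop on one source label.

    Prometheus anchors relabel regexes at both ends, so fullmatch is used.
    A missing label is treated as an empty string.
    """
    pattern = re.compile(regex)
    return [s for s in series
            if pattern.fullmatch(s.get(source_label, "")) is None]


scraped = [
    {"__name__": "container_cpu_usage_seconds_total", "pod": "web-1"},
    {"__name__": "http_requests_total", "pod": "web-1"},
    {"__name__": "my_container_restarts", "pod": "web-1"},
]
kept = apply_drop_rule(scraped, "container_.*")
print([s["__name__"] for s in kept])
```

`my_container_restarts` survives because the anchored regex must match the whole name; switching the action to `keep` would instead retain only `container_cpu_usage_seconds_total`.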

485

MCQhard

A company wants to enforce that all projects in the organization have a specific VPC Service Controls perimeter. What is the most efficient way to achieve this?

A.Use folder-level VPC Service Controls perimeters.

B.Use project-level VPC Service Controls perimeters.

C.Use organization policies to set the perimeter.

D.Use a custom script to monitor and alert on non-compliant projects.

AnswerA

Folder-level perimeters apply to all projects in the folder, ensuring consistent enforcement.

Why this answer

Folder-level VPC Service Controls perimeters allow you to apply a single perimeter configuration to all projects within a folder, ensuring consistent enforcement across the organization without needing to configure each project individually. This is the most efficient method because it leverages the resource hierarchy to inherit the policy, reducing administrative overhead and preventing misconfigurations.

Exam trap

The trap here is that candidates often confuse organization policies with VPC Service Controls, assuming that an organization policy can directly set a perimeter, but in reality, organization policies are for different constraints and cannot define perimeters.

How to eliminate wrong answers

Option B is wrong because project-level perimeters require manual attachment to each project, which is inefficient and error-prone for enforcing a policy across many projects. Option C is wrong because organization policies (e.g., constraints/compute.restrictVpcPeering) cannot directly define VPC Service Controls perimeters; they are used for different types of restrictions like resource location or service usage. Option D is wrong because a custom script only monitors and alerts on non-compliance but does not enforce the perimeter, leaving a window of non-compliance and requiring additional remediation steps.

486

Multi-Selectmedium

A site reliability engineer is defining SLOs for a microservice application running on Google Kubernetes Engine. The application serves user-facing API requests. Which TWO approaches should the engineer take to effectively monitor the service's performance?

Select 2 answers

A.Monitor average latency because it is most representative of typical user experience.

B.Monitor container CPU utilization as a proxy for application latency.

C.Monitor the 99th percentile of request latency directly using Cloud Monitoring custom metrics.

D.Use logs-based metrics to count error rates (e.g., HTTP 5xx responses).

E.Use the number of running pods as the primary SLO indicator.

AnswersC, D

Direct latency measurement at the 99th percentile accurately reflects the experience of slow requests and is a standard SLO indicator.

Why this answer

Option C is correct because the 99th percentile (p99) of request latency captures the experience of the slowest 1% of requests, which matters for user-facing APIs where tail latency directly affects user satisfaction. Cloud Monitoring custom metrics let the engineer instrument the application to emit latency distributions, enabling accurate SLO tracking instead of relying on averages that mask outliers.

Exam trap

Google Cloud often tests the misconception that average latency or infrastructure metrics like CPU/pod count are sufficient for SLOs, when in fact user-facing SLOs must directly measure the user experience via tail latency and error rates.
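
Why averages fall short, in numbers: a handful of slow requests barely move the mean but dominate the 99th percentile. The sketch uses the nearest-rank percentile definition on made-up latencies; Cloud Monitoring derives percentiles from bucketed distribution metrics, so its figures approximate this exact calculation.

```python
import math


def nearest_rank_percentile(samples: list[float], p: float) -> float:
    """Smallest sample such that at least p percent of samples are <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# 1,000 hypothetical requests: 980 fast ones and 20 very slow ones (ms).
latencies_ms = [50.0] * 980 + [2000.0] * 20
mean_ms = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean_ms:.0f} ms  p50={nearest_rank_percentile(latencies_ms, 50):.0f} ms  "
      f"p99={nearest_rank_percentile(latencies_ms, 99):.0f} ms")
```

Here the mean (89 ms) and median (50 ms) look healthy while p99 is 2000 ms, which is exactly the slow-request experience an SLO on averages would hide.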

487

Multi-Selecthard

Which TWO of the following are essential elements of a comprehensive incident post-mortem document according to Google's Site Reliability Engineering (SRE) best practices?

Select 2 answers

A.Exact code line numbers and commits that caused the incident.

B.A timeline of events leading to and during the incident.

C.A detailed analysis of the root cause only.

D.An attribution of blame to the individual or team responsible.

E.A list of action items with owners and deadlines to prevent recurrence.

AnswersB, E

Timeline helps understand sequence.

Why this answer

Option B is correct because a timeline of events is a core element of an incident post-mortem in Google SRE practice, as it provides a chronological reconstruction of the incident's progression, enabling teams to understand the sequence of failures and responses. This timeline is essential for identifying contributing factors and evaluating the effectiveness of mitigation actions, not just the root cause.

Exam trap

Google Cloud often tests the misconception that a post-mortem is solely about finding the root cause or assigning blame, but SRE best practices emphasize a blameless culture and a comprehensive review that includes a timeline and actionable follow-ups.

488

MCQhard

A company runs a critical application on Compute Engine instances behind a TCP/UDP Network Load Balancer. They notice intermittent high latency for a subset of users. The application logs show no errors, and instance CPU is below 50%. Which next step is most effective to diagnose the latency?

A.Increase the number of instances behind the load balancer.

B.Enable VPC Flow Logs and analyze for dropped packets.

C.Switch to an HTTP(S) Load Balancer for better visibility.

D.Analyze Cloud Monitoring metrics for the load balancer, including backend latency and request counts.

AnswerD

These metrics pinpoint where latency occurs.

Why this answer

Option D is correct because Cloud Monitoring provides detailed metrics for TCP/UDP Network Load Balancers, including backend latency and request counts, which directly help identify whether the latency originates from the load balancer itself or the backend instances. Since instance CPU is below 50% and application logs show no errors, the issue is likely at the network or load balancer level, and these metrics offer the most targeted diagnostic data without changing the architecture.

Exam trap

The trap here is that candidates assume VPC Flow Logs (Option B) are the go-to tool for diagnosing this kind of problem, but flow logs are sampled connection records that do not record dropped packets, whereas Cloud Monitoring's load balancer metrics provide the performance data needed for this scenario.

How to eliminate wrong answers

Option A is wrong because increasing the number of instances does not diagnose the root cause of latency; it only masks the symptom and may not help if the issue is due to network congestion, load balancer configuration, or regional user distribution. Option B is wrong because VPC Flow Logs capture sampled metadata about network flows (e.g., source/destination IP, ports, packet and byte counts) and do not record dropped packets, so the analysis the option proposes is not possible; they are better suited to security analysis and connection tracking than to pinpointing where latency arises. Option C is wrong because switching to an HTTP(S) Load Balancer would change the architecture and introduce Layer 7 processing, which is unnecessary for a TCP/UDP application and does not diagnose the existing latency issue; the current load balancer type is appropriate for the protocol, and the problem should be investigated with the existing monitoring tools.

489

MCQhard

A media streaming service uses Cloud Storage to store video files and serves them via Cloud CDN. Users in Asia report buffering issues. The team notices that the cache hit ratio is low in that region. The origin is a single Cloud Storage bucket in us-central1. Which set of actions would best improve performance for Asian users?

A.Use Cloud Load Balancing with Cloud Armor to protect the origin.

B.Enable HTTP/2 on Cloud CDN and increase the TTL for video content.

C.Configure a custom domain on Cloud CDN with SSL and enable request collapsing.

D.Create a new Cloud Storage bucket in an Asian region and use dual-region bucket with Cloud CDN.

AnswerD

A closer origin reduces latency for cache misses, improving performance.

Why this answer

Option D is correct because adding an origin in Asia gives Cloud CDN a geographically closer source for cache misses. Cloud CDN caches content at edge locations, but while the only origin is in us-central1, every cache miss in Asia must be filled from the United States before the first byte reaches the user. A bucket located in Asia keeps cache-fill round trips short, which reduces buffering even while the hit ratio in that region remains low.

Exam trap

Google Cloud often tests the misconception that Cloud CDN alone solves all latency issues, but the trap here is that user-perceived performance depends on both the cache hit ratio and origin proximity; without a nearby origin, every cache miss still incurs high latency for distant users.

How to eliminate wrong answers

Option A is wrong because Cloud Armor adds security policies such as DDoS and WAF protection in front of the load balancer; it does not change where content is cached or how far cache misses travel to reach the origin. Option B is wrong because HTTP/2 improves connection efficiency between clients and the edge, and a longer TTL can keep popular objects cached longer, but neither moves the origin: every miss (a first request at an edge, or an evicted video segment) still travels to us-central1. Option C is wrong because a custom domain with SSL does not affect caching, and request collapsing only reduces duplicate origin fetches for the same object; neither addresses the geographic distance between Asian users and the us-central1 origin, so cache misses still suffer high latency.


490

MCQeasy

Which Google Cloud service provides a fully managed, private Git repository that integrates with Cloud Build for continuous integration?

A.Cloud Deployment Manager

B.Cloud Source Repositories

C.Cloud Storage

D.Container Registry

AnswerB

Designed for hosted Git repositories; native integration with Cloud Build.

Why this answer

Cloud Source Repositories is the correct answer because it provides fully managed, private Git repositories hosted on Google Cloud. It integrates natively with Cloud Build, enabling automatic triggers for continuous integration (CI) pipelines whenever code is pushed to a repository branch or tag, without requiring external Git hosting.

Exam trap

The trap here is confusing Cloud Source Repositories with Container Registry, as both integrate with Cloud Build, but only Cloud Source Repositories provides Git repository hosting, while Container Registry stores built container images.

How to eliminate wrong answers

Option A is wrong because Cloud Deployment Manager is an infrastructure-as-code service for managing Google Cloud resources using declarative templates (YAML/Python), not a Git repository service. Option C is wrong because Cloud Storage is an object storage service for unstructured data (blobs) accessed over HTTP(S), not a Git repository with version control or CI integration. Option D is wrong because Container Registry is a private registry for storing and managing Docker container images, not a Git repository; Cloud Build can push the images it builds to Container Registry, but Container Registry does not host source code.


491

MCQmedium

A mid-size company has multiple Google Cloud projects for different departments. The finance team wants to set up budget alerts to track spending across all projects. They have enabled billing export to BigQuery. The budget should trigger alerts when total cumulative spending exceeds 80% and 100% of the monthly budget of $10,000. The budget must be applied across all projects in the organization. They created a budget at the billing account level with amount $10,000, set alert thresholds at 80% and 100%, and added email recipients. However, after two weeks, spending in one project alone has exceeded $8,000 but no alert was triggered. What is the most likely cause?

A.The budget was set at the organization level, not billing account.

B.The budget was scoped to a single project instead of the billing account.

C.The budget thresholds were set as percentage of current spending not cumulative.

D.The budget amount includes credits or discounts.

AnswerB

A budget scoped to a single project only monitors that project's spending, missing spending from other projects.

Why this answer

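Option B is correct. The team intended a budget covering the whole billing account, but if it was actually scoped to a single project (for example, a different project from the one that spent $8,000), it tracks only that project's costs, so spending in other projects never counts toward the 80% threshold. Option A is incorrect because Cloud Billing budgets are created on a billing account and can cover all of its projects; there is no separate organization-level budget that would behave differently.

Option C is incorrect because thresholds are percentages of the budget amount, compared against the spend accumulated over the budget period. Option D is unlikely because including credits only lowers the net cost compared against the budget, and nothing in the scenario suggests credits large enough to offset $8,000 of spend.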
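A simplified sketch of the scoping issue, assuming a budget simply compares in-scope spend against each threshold's share of the amount. Real Cloud Billing budgets also account for credits, forecasts, and cost-data delays, which this ignores; the `Budget` class, project names, and costs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Budget:
    amount: float                          # monthly budget amount, e.g. 10_000
    thresholds: Tuple[float, ...]          # fractions of the amount, e.g. (0.8, 1.0)
    projects: Optional[frozenset] = None   # None = every project on the billing account


def tracked_spend(budget: Budget, spend_by_project: Dict[str, float]) -> float:
    """Sum only the costs that fall inside the budget's scope."""
    return sum(
        cost
        for project, cost in spend_by_project.items()
        if budget.projects is None or project in budget.projects
    )


def crossed_thresholds(budget: Budget, spend_by_project: Dict[str, float]) -> List[float]:
    """Return the thresholds whose trigger amount the in-scope spend has reached."""
    spend = tracked_spend(budget, spend_by_project)
    return [t for t in budget.thresholds if spend >= t * budget.amount]


if __name__ == "__main__":
    # Hypothetical month-to-date costs for two projects on the same billing account.
    spend = {"analytics-prod": 8_200.0, "web-dev": 300.0}

    account_wide = Budget(amount=10_000, thresholds=(0.8, 1.0))
    single_project = Budget(amount=10_000, thresholds=(0.8, 1.0), projects=frozenset({"web-dev"}))

    print(crossed_thresholds(account_wide, spend))    # [0.8]
    print(crossed_thresholds(single_project, spend))  # []
```

The account-wide budget sees $8,500 and crosses 80%; the budget accidentally scoped to `web-dev` sees only $300 and stays silent, matching the symptom in the question.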


492

MCQhard

A team uses Cloud SQL for MySQL. They have read-heavy traffic and want to reduce costs. Which strategy is most effective?

A.Use read replicas to offload read queries.

B.Use committed use discounts.

C.Use High Availability with a standby instance.

D.Use vertical scaling to increase instance size.

AnswerA

Read replicas distribute reads, reducing primary load and enabling potential downsizing.

Why this answer

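Read replicas serve read queries in place of the primary instance, so the primary can be downsized to a smaller machine type or lower tier. This reduces cost only if the savings on the primary outweigh the replicas' own cost, because each read replica is billed as a separate instance.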
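A minimal sketch of how an application might offload reads, assuming it connects to the primary and replicas by IP address. The class, addresses, and keyword-based statement classification are hypothetical and far simpler than a real SQL proxy. Because replicas can lag behind the primary, locking reads such as `SELECT ... FOR UPDATE` (and any read that must see the latest writes) stay on the primary.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Hypothetical helper: send writes to the Cloud SQL primary, spread plain reads over replicas."""

    READ_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # Round-robin over replica endpoints; None when no replicas are configured.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql: str) -> str:
        normalized = " ".join(sql.split()).upper()
        first = normalized.split(" ", 1)[0]
        is_plain_read = first in self.READ_KEYWORDS and "FOR UPDATE" not in normalized
        if self._replicas is not None and is_plain_read:
            return next(self._replicas)
        # Writes, locking reads, and anything unrecognized go to the primary.
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("10.0.0.2", ["10.0.0.3", "10.0.0.4"])
    for stmt in (
        "SELECT * FROM orders",
        "SELECT * FROM customers",
        "INSERT INTO orders VALUES (1)",
        "SELECT * FROM orders WHERE id = 1 FOR UPDATE",
    ):
        print(f"{router.target_for(stmt):<10} <- {stmt}")
```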


493

MCQmedium

A GKE cluster node fails, causing pods to be rescheduled. However, some pods remain in 'CrashLoopBackOff' state. After examining logs, you find the application has a dependency on local SSD that was ephemeral. What is the best long-term solution?

A.Use PersistentVolumes with ReadWriteOnce access mode.

B.Configure pod anti-affinity.

C.Increase the node pool size.

D.Use a DaemonSet to run the application.

AnswerA

PersistentVolumes retain data across pod rescheduling and node failures.

Why this answer

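The correct answer is A. The application depends on data stored on a local SSD, which is ephemeral: when the node fails, that data is lost, and the rescheduled pods crash because it is missing. A PersistentVolume backed by network-attached Persistent Disk (typically requested through a PersistentVolumeClaim) keeps the data independently of any node, so a rescheduled pod can reattach it on another node. ReadWriteOnce lets one node mount the volume read-write at a time, which suits a single stateful pod; a zonal disk can only attach to nodes in its own zone, while a regional Persistent Disk can attach in either of its two zones. This is the best long-term solution because it decouples storage from the node lifecycle, removing the CrashLoopBackOff caused by missing local data.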
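A sketch of the storage change as Kubernetes manifests, generated with Python and printed as JSON (which `kubectl apply -f -` accepts). The names, image, size, mount path, and the `standard-rwo` StorageClass are assumptions; run `kubectl get storageclass` to see which Persistent Disk-backed classes your cluster actually offers.

```python
import json


def stateful_app_manifests(name: str, image: str, size_gi: int,
                           storage_class: str = "standard-rwo",
                           mount_path: str = "/var/lib/app") -> dict:
    """Build a PVC plus a single-replica Deployment that mounts it (all names are hypothetical)."""
    claim_name = f"{name}-data"
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],    # one node mounts it read-write at a time
            "storageClassName": storage_class,   # assumed Persistent Disk-backed class
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            # Recreate removes the old pod before starting a new one, so the RWO disk is released first.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                    }],
                    "volumes": [{"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}],
                },
            },
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pvc, deployment]}


if __name__ == "__main__":
    # Pipe the output to `kubectl apply -f -`.
    print(json.dumps(stateful_app_manifests("orders-cache", "example.com/orders-cache:1.0", 50), indent=2))
```

For several replicas that each need their own disk, a StatefulSet with `volumeClaimTemplates` is the usual pattern instead of a single shared claim.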

Exam trap

Google Cloud often tests the misconception that scaling resources (e.g., node pool size) or controlling pod placement (e.g., anti-affinity) can fix data persistence issues, but the trap here is that ephemeral storage is tied to the node's lifecycle, so only persistent storage solutions like PersistentVolumes address the root cause.

How to eliminate wrong answers

Option B is wrong because pod anti-affinity controls pod placement (for example, spreading replicas across nodes) but does nothing about data lost with an ephemeral local SSD, so the pods would still crash when that data is missing. Option C is wrong because adding nodes does not make local SSD data survive a node failure; rescheduled pods land on nodes whose local SSDs never held that data. Option D is wrong because a DaemonSet runs one pod per eligible node and does not provide persistent storage; when a node fails, its DaemonSet pod and that node's local SSD data are gone together, and the pods on other nodes only see their own local disks.


494

MCQmedium

A team uses Spanner for a global database. They notice increased read latency and high CPU utilization on some nodes. The workload is read-heavy with occasional writes. Which action is most likely to improve performance?

A.Create read-only replicas in each region.

B.Split the most frequently read tables into smaller tables.

C.Increase the number of nodes in the instance.

D.Add more nodes to the instance and ensure read requests are distributed evenly.

AnswerD

More nodes spread read load and reduce CPU per node.

Why this answer

In a read-heavy Spanner workload with high CPU utilization on some nodes, Option D addresses both parts of the bottleneck: adding nodes increases the instance's compute capacity, and distributing reads evenly keeps a few servers from doing most of the work. Spanner divides each table into splits (contiguous key ranges) that are served across the instance's compute capacity; if the access pattern concentrates on a narrow key range, the servers for those splits become hot spots. Spanner can split busy ranges based on load, but schema and key design (for example, avoiding monotonically increasing primary keys) and client access patterns determine how evenly the load can actually spread, which is what reduces latency and per-node CPU spikes.
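A toy illustration of why key design decides whether extra nodes help, assuming the table is cut into four equal key ranges. Real Spanner splits are created and moved based on size and load, so this only shows the direction of the effect: monotonically increasing keys pile into one range, while keys derived from random version-4 UUIDs spread across all of them.

```python
import uuid
from collections import Counter
from typing import List

KEY_SPACE = 2 ** 64   # treat primary keys as unsigned 64-bit integers
NUM_SPLITS = 4        # hypothetical: the table divided into 4 equal key ranges


def split_for(key: int) -> int:
    """Index of the equal-width key range (toy 'split') that owns this key."""
    return key * NUM_SPLITS // KEY_SPACE


def sequential_keys(n: int, start: int = 10 ** 18) -> List[int]:
    """Monotonically increasing keys, like auto-increment IDs or timestamps."""
    return [start + i for i in range(n)]


def uuid4_keys(n: int) -> List[int]:
    """Top 64 bits of random version-4 UUIDs, spread across the key space."""
    return [uuid.uuid4().int >> 64 for _ in range(n)]


def load_per_split(keys: List[int]) -> List[int]:
    """How many keys each toy split owns."""
    counts = Counter(split_for(k) for k in keys)
    return [counts.get(i, 0) for i in range(NUM_SPLITS)]


if __name__ == "__main__":
    print("sequential:", load_per_split(sequential_keys(1000)))  # all 1000 land in one split
    print("uuid4:     ", load_per_split(uuid4_keys(1000)))       # roughly 250 per split
```

Adding nodes helps the UUID-keyed table because work is already spread out; for the sequential table, the hot range stays hot no matter how many nodes sit idle elsewhere.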

Exam trap

Google Cloud often tests the misconception that adding nodes alone (Option C) solves performance issues, but the trap is that without even distribution of read requests, hot spots persist, making Option D the only complete solution.

How to eliminate wrong answers

Option A is wrong because read-only replicas come from the instance configuration (a multi-region or custom instance configuration) rather than being created per region on demand, and their main benefit is lower latency for reads issued far from the read-write regions, especially stale reads; strong reads may still need a round trip to the leader, and adding replicas elsewhere does nothing about the hot spots driving high CPU on specific nodes. Option B is wrong because splitting frequently read tables into smaller tables does not inherently reduce CPU utilization or read latency; it adds complexity and cross-table joins, and Spanner already partitions data into splits automatically. Option C is wrong because simply increasing the number of nodes without ensuring even distribution of read requests can leave hot spots unresolved; the key issue is uneven load, not just insufficient capacity.


495

Multi-Selectmedium

A team is optimizing a Cloud Run service. Which two actions can reduce request latency? (Select TWO.)

Select 2 answers

A.Increase max-instances

B.Enable HTTP/2

C.Reduce container image size

D.Use a regional endpoint

E.Enable min-instances

AnswersC, E

Both options cut cold-start latency: a smaller image can start faster, and min-instances keeps warm instances ready to serve requests.

Why this answer

Enabling min-instances reduces cold starts, and reducing container image size lowers startup time, both reducing latency.


496

MCQmedium

After deploying a new version of a Cloud Run service, the team notices an increase in 5xx errors. They want to quickly revert to the previous version while minimizing user impact. What is the recommended approach?

A.Set the minimum number of instances of the new revision to 0.

B.Redeploy the previous version from the container registry.

C.Modify the ingress settings to restrict traffic to the new revision.

D.Use Cloud Run's traffic management to set 100% of traffic to the previous revision.

AnswerD

Traffic splitting achieves instant rollback.

Why this answer

Cloud Run supports traffic splitting between revisions, allowing you to instantly route 100% of traffic to the previous revision without redeploying. This minimizes user impact because the rollback is immediate and does not require rebuilding or re-pulling container images. Option D is correct because it leverages Cloud Run's built-in traffic management feature for zero-downtime rollbacks.
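Outside the console, `gcloud run services update-traffic SERVICE --to-revisions=REVISION=100` performs this rollback. The Python sketch below builds the equivalent traffic list as plain data and checks that the percentages add up; the field and enum names are assumed to follow the `traffic` field of the Cloud Run Admin API v2 `Service` resource, and the revision names are hypothetical.

```python
from typing import Dict, List

REVISION_TYPE = "TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION"  # assumed v2 enum value


def traffic_targets(split: Dict[str, int]) -> List[dict]:
    """Turn {revision_name: percent} into traffic targets, checking that percentages add up to 100."""
    if any(p < 0 or p > 100 for p in split.values()):
        raise ValueError("each percent must be between 0 and 100")
    if sum(split.values()) != 100:
        raise ValueError(f"percentages must sum to 100, got {sum(split.values())}")
    return [
        {"type": REVISION_TYPE, "revision": revision, "percent": percent}
        for revision, percent in split.items()
        if percent > 0  # revisions with 0% are simply left out
    ]


def rollback(previous_revision: str) -> List[dict]:
    """Send all traffic back to a revision that is already deployed; nothing is rebuilt."""
    return traffic_targets({previous_revision: 100})


if __name__ == "__main__":
    canary = traffic_targets({"checkout-00007-new": 10, "checkout-00006-old": 90})
    print(canary)
    print(rollback("checkout-00006-old"))
```

A canary and a rollback use the same shape, so reverting is just replacing the list; no image is rebuilt or pulled from the registry.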

Exam trap

Google Cloud often tests the misconception that you must redeploy or delete a revision to roll back, when in fact Cloud Run's traffic management allows instant, traffic-level rollbacks without any deployment action.

How to eliminate wrong answers

Option A is wrong because setting the minimum number of instances of the new revision to 0 only affects scaling; the revision still serves any traffic routed to it. Option B is wrong because redeploying the previous version from the container registry is unnecessary and slower; Cloud Run already retains the previous revision, so you can simply shift traffic to it. Option C is wrong because ingress settings apply to the whole service, not to individual revisions, and control which network sources can reach it; tightening ingress would cut users off from every revision instead of routing them to the previous one.


497

MCQmedium

An engineer wants to ensure that an alert is escalated if not acknowledged within 5 minutes. Which feature of Cloud Monitoring can achieve this?

A.Incident management tool like PagerDuty.

B.Notification channel with escalation configuration.

C.Using a webhook notification channel.

D.Alerting policy with a condition that checks for acknowledgment.

AnswerB

Escalation channels in notification settings allow sending to different recipients after a delay.

Why this answer

Notification channels in Google Cloud Monitoring can be configured with an escalation policy that defines a sequence of recipients or notification methods to be contacted if an alert is not acknowledged within a specified time. By setting the escalation duration to 5 minutes, the system automatically escalates the alert to the next tier or channel if no acknowledgment is received, directly meeting the engineer's requirement.

Exam trap

Google Cloud often tests the distinction between alerting policy conditions (what triggers an alert) and notification channel escalation (what happens after the alert fires), leading candidates to incorrectly select option D because they confuse the condition's 'duration' with the escalation timeout.

How to eliminate wrong answers

Option A is wrong because PagerDuty is an external incident management tool that integrates with Cloud Monitoring via notification channels, but it is not a built-in feature of Cloud Monitoring itself; the question asks for a feature of Cloud Monitoring, not an external tool. Option C is wrong because a webhook notification channel simply sends HTTP POST requests to a specified URL when an alert fires; it does not inherently support escalation logic or acknowledgment tracking. Option D is wrong because an alerting policy condition defines the metric or log criteria that trigger an alert, not the escalation behavior after the alert fires; acknowledgment and escalation are handled by the notification channel's configuration, not the policy condition.


498

MCQmedium

Your organization is adopting DevOps practices and needs to bootstrap a Google Cloud organization with multiple projects. You want to enforce consistent resource naming conventions and apply common organization policies across all projects. Which two services should you use together to achieve this?

A.Cloud Shell and Cloud Source Repositories

B.Cloud Deployment Manager and Cloud Audit Logs

C.Service Accounts and IAM roles

D.Organization Policies and Resource Manager folders

AnswerD

Organization Policies enforce constraints across the resource hierarchy, and folders group projects so the same policies apply to every project beneath them.

Why this answer

Organization Policies allow you to centrally constrain actions across all projects in the hierarchy, while Resource Manager folders let you group projects and apply policies consistently. Together, they enable you to enforce naming conventions (e.g., via a custom constraint) and common policies (e.g., disabling external IPs) across multiple projects without manual per-project configuration.

Exam trap

Google Cloud often tests the distinction between identity/access management (IAM) and organization-level policy enforcement, leading candidates to mistakenly choose Service Accounts and IAM roles when the question specifically asks for consistent naming conventions and common policies across projects.


499

MCQeasy

Which GCP tool provides recommendations for rightsizing Compute Engine VMs?

A.Cloud Console Cost Table.

B.Cloud Monitoring.

C.Cloud Billing Reports.

D.Recommender.

AnswerD

Recommender provides rightsizing and other cost optimization recommendations.

Why this answer

The Recommender tool in GCP uses machine learning to analyze historical usage patterns and provides actionable recommendations, including rightsizing suggestions for Compute Engine VMs. It identifies underutilized VMs and suggests optimal machine types to reduce costs without compromising performance, making it the correct choice for this task.
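A sketch of pulling the same rightsizing suggestions programmatically with the `google-cloud-recommender` Python client, assuming the library is installed and credentials are configured. `google.compute.instance.MachineTypeRecommender` is the Recommender ID for VM machine-type recommendations, which are listed per zone; the project ID and zone below are placeholders.

```python
MACHINE_TYPE_RECOMMENDER = "google.compute.instance.MachineTypeRecommender"


def recommender_parent(project_id: str, zone: str) -> str:
    """Resource name the Recommender API expects for per-zone VM machine-type recommendations."""
    return f"projects/{project_id}/locations/{zone}/recommenders/{MACHINE_TYPE_RECOMMENDER}"


def list_rightsizing_recommendations(project_id: str, zone: str) -> list:
    """Return (name, description) pairs; needs `pip install google-cloud-recommender` and credentials."""
    # Imported lazily so the helper above works without the client library installed.
    from google.cloud import recommender_v1

    client = recommender_v1.RecommenderClient()
    parent = recommender_parent(project_id, zone)
    return [(rec.name, rec.description) for rec in client.list_recommendations(parent=parent)]


if __name__ == "__main__":
    # Only builds the resource name; calling list_rightsizing_recommendations() hits the live API.
    print(recommender_parent("my-project", "us-central1-a"))
```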

Exam trap

The trap here is that candidates often confuse Cloud Monitoring's alerting capabilities with proactive recommendations, assuming that monitoring data alone provides rightsizing suggestions, but Recommender is the dedicated tool for that purpose.

How to eliminate wrong answers

Option A is wrong because the Cloud Billing cost table report shows a detailed, tabular breakdown of costs (for example, by project, service, and SKU), but it does not generate VM rightsizing recommendations. Option B is wrong because Cloud Monitoring collects metrics, builds dashboards, and sends alerts; its CPU and memory data could inform a manual sizing decision, but it does not produce rightsizing recommendations itself. Option C is wrong because Cloud Billing Reports show cost breakdowns and usage trends over time; they support historical cost analysis, not VM-specific rightsizing suggestions.


500

Multi-Selectmedium

Which TWO tools should be used for real-time incident collaboration and communication?

Select 2 answers

A.Google Meet only.

B.Jira.

C.Cloud Monitoring Incident Response (beta).

D.Google Chat with a dedicated incident room.

E.Cloud Trace.

AnswersC, D

This tool centralizes incident management and collaboration.

Why this answer

Cloud Monitoring Incident Response (beta) provides a dedicated interface for real-time incident management, including automated notifications, escalation policies, and a centralized timeline for collaboration. Google Chat with a dedicated incident room enables real-time communication and coordination among incident responders, allowing them to share updates, logs, and runbooks in a structured channel. Together, these tools fulfill the need for both incident orchestration and synchronous collaboration during active incidents.

Exam trap

Google Cloud often tests the distinction between tools that support real-time collaboration (like Chat rooms) versus tools that are for asynchronous tracking (like Jira) or monitoring-only (like Cloud Trace), leading candidates to select familiar but incorrect options like Jira for incident communication.

