Knowledge + Practice

Google Professional Cloud DevOps Engineer (PCDOE) — Questions 76–150

500 questions total · 7 pages · All types, answers revealed

76

MCQ · medium

A fintech company deploys a critical payment service on GKE using Cloud Deploy with a canary deployment strategy. They want to automatically roll back if the canary release causes an increase in error rates over 1%. They have set up Cloud Monitoring to expose a custom metric 'error_rate' from the service. They want Cloud Deploy to evaluate this metric during the canary phase and roll back if the threshold is exceeded. What is the minimal configuration needed?

A.In the Skaffold configuration, define a 'verify' section with a 'stackdriverMetrics' verification job that queries the 'error_rate' metric and sets a threshold. In Cloud Deploy, ensure the rollout strategy is 'canary' and no manual approval is required.

B.Install the Cloud Operations agent on the GKE nodes and configure Cloud Deploy to read the 'error_rate' metric from Cloud Monitoring by default.

C.Set the Cloud Deploy pipeline's 'strategy' field to 'canary' and set 'autoRollback: true' on the release.

D.Create a Cloud Deploy rollout with multiple phases and add a 'stackdriverMetrics' job to the 'postDeploy' phase.

Answer: A

This configures automatic metric-based verification and rollback in the canary phase.

Why this answer

Option A is correct: Cloud Deploy's canary strategy rolls out in phases, but it evaluates a metric only when a verification job is defined. Defining a 'stackdriverMetrics' verification job in the 'verify' section of the Skaffold configuration supplies that check. Option B is incorrect because Cloud Deploy does not read Cloud Monitoring metrics by default; a verification job must be defined.

Option C is incorrect because 'strategy' is not a top-level field but part of the pipeline's stage definition, and selecting 'canary' alone does not evaluate any metric. Option D is incorrect because additional phases are not needed and don't enable metric evaluation.

77

Multi-Select · hard

You are the DevOps engineer for a large e-commerce platform running on Google Kubernetes Engine (GKE). During a flash sale, you observe that the payments service is experiencing high latency and intermittent errors. The service is deployed with HorizontalPodAutoscaler (HPA) based on CPU utilization. You need to quickly diagnose and mitigate the issue. Which TWO actions should you take?

Select 2 answers

A.Use Cloud Monitoring to examine the payments service's request latency and error rate metrics, and create a custom dashboard for real-time monitoring.

B.Check the GKE node's network performance using VPC Flow Logs and increase the node pool size.

C.Modify the HPA to use memory utilization instead of CPU, as memory is more indicative of the service's performance.

D.Configure a custom metric in Cloud Monitoring for the payments service's request queue depth and use it for HPA.

E.Manually scale up the payments service deployment to more replicas to handle the increased load.

Answers: A, E

Cloud Monitoring provides latency and error metrics via Istio or GKE metrics; a custom dashboard helps visualize the issue.

Why this answer

Option A is correct because Cloud Monitoring provides the necessary observability to diagnose the root cause of high latency and intermittent errors by examining request latency and error rate metrics. Creating a custom dashboard enables real-time monitoring, allowing you to correlate performance degradation with traffic spikes during the flash sale. This is the first step in incident management: observe before acting.

Exam trap

Google Cloud often tests the misconception that scaling actions (like manual scaling or changing HPA metrics) are the first step in incident response, when in fact observability and diagnosis must precede any mitigation to avoid making the problem worse.
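The scenario's autoscaler reacts only to CPU utilization. As a rough sketch of why that can lag behind a latency problem, the function below applies the replica formula documented for the Kubernetes HorizontalPodAutoscaler: desired = ceil(current * currentMetric / targetMetric). The metric values are invented for illustration, and the real controller also applies a tolerance band, min/max replica limits, and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Simplified HPA calculation: scale replicas by the ratio of observed to target metric."""
    return math.ceil(current_replicas * current_value / target_value)


# Hypothetical flash-sale snapshot: 4 replicas, CPU at 45% against a 60% target.
# CPU looks healthy, so a CPU-based HPA would not add replicas.
cpu_based = desired_replicas(4, 45, 60)

# Same moment, but measured by an assumed per-pod request queue depth (300 vs target 100).
queue_based = desired_replicas(4, 300, 100)

print(cpu_based)    # 3
print(queue_based)  # 12
```

In this made-up snapshot the CPU signal suggests fewer replicas while the queue signal suggests three times as many, which is why examining latency and error metrics first, and scaling manually if needed, is the safer response during the incident.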

78

MCQ · hard

A team uses Spinnaker on GKE for deployment. They notice that deployments are taking too long because of manual judgment gates. They want to automatically approve deployments if the canary analysis passes predefined thresholds. What Spinnaker feature should they use?

A.Automated rollback

B.Policy engine with OPA

C.Pipeline expressions

D.Automated canary analysis

Answer: D

ACA with Kayenta automates canary evaluation based on metrics thresholds.

Why this answer

Option D is correct because Spinnaker's Automated Canary Analysis (ACA) feature allows teams to define canary analysis thresholds and automatically promote or roll back deployments based on the analysis results, eliminating the need for manual judgment gates. This directly addresses the requirement to automatically approve deployments when predefined thresholds are met, without human intervention.

Exam trap

The trap here is that candidates may confuse the Policy Engine (OPA) with automated approval logic, but OPA is for policy enforcement (e.g., 'only deploy from master branch'), not for statistical canary analysis and automated promotion.

How to eliminate wrong answers

Option A is wrong because automated rollback is a reactive mechanism that reverts a deployment after a failure is detected, not a proactive feature to automatically approve deployments based on canary analysis. Option B is wrong because the Policy Engine with OPA (Open Policy Agent) is used for enforcing governance and compliance policies (e.g., restricting which accounts can deploy), not for automating canary analysis approvals. Option C is wrong because Pipeline Expressions are used for dynamic parameterization and conditional logic within pipeline stages, but they cannot perform the statistical analysis of canary metrics required for automated approval.
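To make the idea of threshold-based promotion concrete, here is a minimal sketch of a canary judgment. It is not Kayenta's algorithm: real canary judges use statistical tests, while this sketch substitutes a simple relative-difference comparison. The metric names, the 10% tolerance, and the pass/marginal scores of 75 and 50 are all assumptions chosen for illustration.

```python
from typing import Dict


def canary_score(baseline: Dict[str, float], canary: Dict[str, float],
                 max_relative_increase: float = 0.10) -> float:
    """Percentage of metrics where the canary is within the allowed increase over baseline."""
    passed = 0
    for name, base_value in baseline.items():
        limit = base_value * (1.0 + max_relative_increase)
        if canary[name] <= limit:
            passed += 1
    return 100.0 * passed / len(baseline)


def decide(score: float, pass_score: float = 75.0, marginal_score: float = 50.0) -> str:
    """Map a score onto an action using two thresholds (hypothetical values)."""
    if score >= pass_score:
        return "promote"
    if score < marginal_score:
        return "rollback"
    return "manual_judgment"


baseline = {"error_rate": 0.010, "p99_latency_ms": 250.0, "cpu": 0.60, "memory": 0.50}
good_canary = {"error_rate": 0.0105, "p99_latency_ms": 260.0, "cpu": 0.70, "memory": 0.50}
bad_canary = {"error_rate": 0.050, "p99_latency_ms": 400.0, "cpu": 0.90, "memory": 0.50}

print(decide(canary_score(baseline, good_canary)))  # promote
print(decide(canary_score(baseline, bad_canary)))   # rollback
```

The point of the two thresholds is that only the ambiguous middle band would still need a human; clear passes and clear failures proceed without a manual judgment gate.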

79

Matching · medium

Match each Google Cloud service to its primary purpose.

Concepts

Matches

CI/CD pipeline

Metrics and alerting

Log management and analysis

Application error aggregation

Real-time code inspection

Why these pairings

These are core DevOps services on Google Cloud.

80

MCQ · easy

A company is bootstrapping a new Google Cloud organization for DevOps. They want to separate development, staging, and production environments using folders. Which folder structure follows Google-recommended best practices?

A.Create a single folder called 'Environments' with subfolders Dev, Staging, Prod, and place projects in the subfolders.

B.Create a folder for each team, then subfolders for environments (e.g., Team1/Dev, Team1/Prod).

C.Use labels on projects to denote environment (e.g., env=dev) instead of folders.

D.Create folders under the Organization node: Development, Staging, Production. Place all projects in the appropriate folder.

Answer: D

This is the recommended structure for environment separation.

Why this answer

Option D is correct because Google Cloud best practices recommend creating top-level folders under the Organization node for each environment (Development, Staging, Production) to enforce consistent policy inheritance and resource isolation. This structure allows you to apply organization policies and IAM roles at the environment level, ensuring that production resources are strictly separated from development and staging resources, which is critical for compliance and security.

Exam trap

The trap here is that candidates often think a single 'Environments' parent folder (Option A) is cleaner or more organized, but Google specifically recommends against unnecessary nesting because it complicates policy inheritance and violates the principle of least privilege.

How to eliminate wrong answers

Option A is wrong because placing environment subfolders under a single 'Environments' folder adds an unnecessary nesting level that complicates policy inheritance and does not align with Google's recommended flat hierarchy for environments. Option B is wrong because organizing by team first, then by environment, makes it difficult to apply consistent policies across all environments (e.g., a production-wide policy would need to be applied to each team's Prod subfolder individually), increasing administrative overhead and risk of misconfiguration. Option C is wrong because labels are metadata tags and do not provide the structural isolation or policy inheritance that folders offer; labels cannot enforce resource hierarchy or prevent cross-environment access, making them unsuitable for environment separation.

81

Multi-Select · medium

A company uses Cloud SQL for their transactional database. They are experiencing slow read performance. Which THREE actions can improve read throughput? (Choose three.)

Select 3 answers

A.Enable query caching

B.Use Cloud SQL Proxy for connections

C.Add a read replica

D.Use connection pooling

E.Increase the tier size of the primary instance

Answers: C, D, E

Read replicas serve read queries, reducing load on the primary.

Why this answer

Adding a read replica offloads read queries from the primary Cloud SQL instance, distributing the read workload and improving read throughput. This is a common pattern for scaling read-heavy workloads in Cloud SQL, as replicas serve read traffic asynchronously without impacting the primary instance's write performance.

Exam trap

Google Cloud often tests the misconception that connection pooling or caching alone can solve read throughput issues. Read replicas directly scale read capacity, whereas connection pooling only reduces connection overhead, and query caching is often unavailable or ineffective for transactional workloads (MySQL, for example, removed its query cache in version 8.0).

82

Multi-Select · medium

A team is running a stateful application on Compute Engine with high disk I/O. They want to optimize disk performance. Which TWO actions should they take? (Choose two.)

Select 2 answers

A.Use standard persistent disk for cost savings

B.Enable disk encryption

C.Use SSD persistent disk for data

D.Use local SSD for temporary data

E.Increase disk size

Answers: C, D

SSD persistent disk offers significantly higher IOPS compared to standard disk.

Why this answer

Option C is correct because SSD persistent disks provide higher IOPS and lower latency than standard persistent disks, making them suitable for high disk I/O workloads. Option D is correct because local SSDs offer even higher performance by attaching directly to the host VM, but their data is ephemeral, making them ideal for temporary data like caches or scratch space.

Exam trap

Google Cloud often tests the misconception that increasing disk size is the primary way to improve disk performance, but the correct approach for high I/O is to choose the right disk type (SSD) and use local SSDs for ephemeral data, not just resize the disk.

83

MCQ · medium

A DevOps engineer is setting up a Cloud Build trigger that builds a container image and deploys it to Cloud Run. The build fails with a permission error when trying to access resources in a different project. The engineer has created a service account in the project where Cloud Build runs and granted it roles/run.admin and roles/storage.objectViewer on the target project. What is the most likely cause of the failure?

A.The service account has been deleted or disabled.

B.Cloud Build’s default compute engine service account is being used instead of the custom one.

C.The Cloud Build service account lacks the iam.serviceAccounts.actAs permission on the Cloud Run runtime service account.

D.The service account must be created in the target project instead of the source project.

Answer: C

To deploy to Cloud Run, Cloud Build must act as the Cloud Run runtime service account.

Why this answer

The error occurs because the identity running the build needs the `iam.serviceAccounts.actAs` permission on the Cloud Run runtime service account in order to deploy a revision that runs as that account. Even though the custom service account has `roles/run.admin` and `roles/storage.objectViewer` on the target project, without `actAs` it cannot attach the runtime service account to the new revision, so the deployment is rejected. This permission is typically granted via `roles/iam.serviceAccountUser` on the runtime service account itself.

Exam trap

The trap here is that candidates assume granting high-level roles like `roles/run.admin` on the target project is sufficient, overlooking the separate requirement for the `iam.serviceAccounts.actAs` permission on the specific runtime service account that Cloud Run uses.

How to eliminate wrong answers

Option A is wrong because the question states the service account was created and granted roles, and a deleted/disabled account would produce a 'not found' or 'disabled' error, not a generic permission error when accessing resources. Option B is wrong because the engineer explicitly created a custom service account; Cloud Build would use that custom account if it was properly configured in the trigger, and the default compute engine service account is only used if no custom service account is specified. Option D is wrong because service accounts can be created in any project and granted IAM roles on resources in other projects; the service account does not need to reside in the target project to access its resources.

84

MCQ · hard

A multinational corporation is bootstrapping their Google Cloud organization. They have multiple business units in different countries, each with its own compliance requirements (e.g., data residency, encryption keys). The organization structure must support: (1) each business unit as a separate folder with its own admin; (2) projects within each folder must have a label 'bu-<businessunit>'; (3) all resources must be created in regions allowed by the business unit; (4) audit logging must be centralized. They have 200 existing projects and 10,000 VMs. The team wants to use Google Cloud's native tools to enforce these policies without third-party software. What is the most effective first step?

A.Apply organization policies at the root level to restrict regions and labeling, and ignore folders.

B.Use a single folder for all projects and rely on IAM roles to enforce compliance per business unit.

C.Create a separate Cloud Function for each business unit to monitor resources and enforce compliance.

D.Create folders for each business unit, move all projects into corresponding folders using a script, and apply organization policies for allowed regions and labeling.

Answer: D

Folders provide isolation; org policies enforce regions and labels centrally.

Why this answer

Option D is correct because it establishes a folder hierarchy that mirrors the business units, enabling hierarchical inheritance of organization policies. By moving existing projects into the correct folders and applying organization policies for allowed regions and labeling at the folder level, each business unit's compliance requirements are enforced natively without third-party tools. This approach also supports centralized audit logging by enabling audit logs at the organization level, which aggregate logs from all folders and projects.

Exam trap

Google Cloud often tests the misconception that IAM roles alone can enforce compliance policies, when in fact IAM governs who can do what, not what resources can be created or where; organization policies are required for resource-level constraints.

How to eliminate wrong answers

Option A is wrong because applying organization policies at the root level ignores the need for per-business-unit compliance; it would enforce the same region and labeling constraints globally, violating data residency and encryption key requirements. Option B is wrong because using a single folder with IAM roles cannot enforce resource-level constraints like allowed regions or mandatory labels; IAM controls access, not resource configuration. Option C is wrong because creating separate Cloud Functions for monitoring introduces unnecessary complexity, latency, and potential for drift; Google Cloud's native organization policies provide real-time, preventive enforcement at resource creation time, which is more reliable and scalable.

85

MCQ · hard

A DevOps engineer notices that developers are accidentally deleting Cloud Storage buckets. The organization wants to prevent accidental deletion while still allowing developers to manage bucket objects. What is the best practice?

A.Set a bucket retention policy with deletion lock.

B.Enable Cloud Audit Logging and set up alerts on bucket deletion.

C.Set a bucket IAM policy denying storage.objects.delete for developers.

D.Use an organization policy to disable bucket deletion across the org.

Answer: A

A locked retention policy blocks bucket deletion until every object in the bucket has met its retention period.

Why this answer

A bucket retention policy with a lock prevents deletion of the bucket itself, even by users with owner permissions, until every object in it has met the retention period. Developers can still upload and read objects, although an object cannot be deleted or overwritten before its own retention period ends. Of the options listed, this is the only one that directly enforces a hard, irreversible block on bucket deletion.

Exam trap

The trap here is that candidates often confuse preventing object deletion with preventing bucket deletion, leading them to choose IAM policies that restrict object-level actions (Option C) instead of using the bucket-level retention lock mechanism.

How to eliminate wrong answers

Option B is wrong because Cloud Audit Logging with alerts only notifies administrators after a deletion occurs; it does not prevent the deletion from happening in the first place. Option C is wrong because denying storage.objects.delete prevents developers from deleting objects, which is not the requirement—the organization wants to allow object management while only preventing bucket deletion. Option D is wrong because an organization policy to disable bucket deletion across the org would block all bucket deletions for all users, including authorized administrators, which is overly restrictive and not a best practice for targeted accidental deletion prevention.

86

MCQ · medium

An organization is bootstrapping its Google Cloud environment and needs to establish a secure CI/CD pipeline that deploys infrastructure using Terraform. The pipeline must run in a dedicated project, and Terraform state must be stored in a Cloud Storage bucket. What is the most secure way to grant the CI/CD service account the minimal permissions required to manage the state bucket?

A.Grant the service account the Storage Object Admin role on the service account itself.

B.Grant the service account the Storage Object Admin role at the project level.

C.Grant the service account the Storage Admin role at the project level.

D.Grant the service account the Storage Object Admin role on the specific Cloud Storage bucket.

Answer: D

This grants minimal permissions to manage objects in that bucket only.

Why this answer

Option D is correct because granting the Storage Object Admin role (roles/storage.objectAdmin) on the specific Cloud Storage bucket follows the principle of least privilege. This allows the CI/CD service account to manage objects (including Terraform state files) within that bucket without granting broader permissions to other buckets or project-level storage operations. The pipeline runs in a dedicated project, but the state bucket itself is the resource that needs access control, making a resource-level role assignment the most secure approach.

Exam trap

The trap here is that candidates often confuse granting roles at the project level versus the resource level, mistakenly thinking project-level permissions are required for a single bucket, or they incorrectly assume that roles can be granted on a service account itself rather than on the resource being accessed.

How to eliminate wrong answers

Option A is wrong because granting the Storage Object Admin role on the service account itself is nonsensical—IAM roles are granted to identities (like service accounts) on resources, not on the identity itself; this would not grant any permissions to manage the bucket. Option B is wrong because granting the Storage Object Admin role at the project level would apply the permission to all Cloud Storage buckets in the project, violating least privilege by allowing the service account to manage objects in any bucket, not just the Terraform state bucket. Option C is wrong because the Storage Admin role (roles/storage.admin) at the project level grants full control over all buckets and objects, including the ability to create, delete, and modify bucket configurations, which far exceeds the minimal permissions needed to manage state files and introduces unnecessary security risk.
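The difference between a bucket-level and a project-level grant can be sketched with the shape of an IAM policy, which is a list of bindings that each pair a role with members. The helper below only manipulates plain dictionaries; in practice the policy would be read and written through the Cloud Storage or Resource Manager IAM APIs. The service account address and the empty starting policies are hypothetical.

```python
from typing import Dict, List


def add_binding(policy: Dict, role: str, member: str) -> Dict:
    """Return a copy of an IAM policy with `member` added to `role`."""
    bindings: List[Dict] = [
        {"role": b["role"], "members": list(b["members"])}
        for b in policy.get("bindings", [])
    ]
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No existing binding for this role, so create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


# Hypothetical identity used by the pipeline.
ci_sa = "serviceAccount:terraform-ci@example-cicd-project.iam.gserviceaccount.com"

project_policy = {"bindings": []}       # policy on the project (left untouched)
state_bucket_policy = {"bindings": []}  # policy on the state bucket only

state_bucket_policy = add_binding(state_bucket_policy, "roles/storage.objectAdmin", ci_sa)

print(state_bucket_policy["bindings"])
print(project_policy["bindings"])  # []
```

Because the binding lives on the bucket's policy and the project policy stays empty, the service account gains object access to the state bucket and nothing else in the project.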

87

Multi-Select · hard

A Cloud Build pipeline that deploys a container to Cloud Run fails with the error: `Missing required permission run.routes.invoke`. The Cloud Build service account has the 'Cloud Run Invoker' role. Which TWO additional steps should be taken?

Select 2 answers

A.Ensure the service account has the `iam.serviceAccounts.actAs` permission on the Cloud Run runtime service account.

B.Enable the Cloud Run API in the project.

C.Grant the 'Cloud Run Developer' role to the service account.

D.Use a different service account for deployment.

E.Add the `run.routes.invoke` permission to a custom role.

Answers: A, C

Required to deploy revisions because the service account acts as the runtime SA.

Why this answer

Option A is correct because the Cloud Build service account needs the `iam.serviceAccounts.actAs` permission on the Cloud Run runtime service account (the service account that Cloud Run uses to run the container). Without this permission, Cloud Build cannot impersonate the runtime service account to deploy the container, even if it has the Cloud Run Invoker role. The `run.routes.invoke` error occurs because the deployment process requires the ability to invoke the route, which is tied to the runtime service account's permissions.

Exam trap

Google Cloud often tests the misconception that granting the `run.routes.invoke` permission directly to the Cloud Build service account (via a custom role) will fix the error, when in fact the error arises because the runtime service account lacks the permission and the Cloud Build service account cannot impersonate it without the `actAs` permission.

88

MCQ · medium

Your organization is migrating a monolithic application to microservices on Cloud Run. You need to monitor the health of each microservice and aggregate logs and metrics in a central dashboard. You have set up Cloud Monitoring custom dashboards and logs-based metrics. After the initial deployment, you notice that the dashboards show data only for some services, while others appear to have no metrics. You verify that all services are running and emitting logs. What is the most likely cause?

A.The services are not exporting metrics to Cloud Monitoring via the Monitoring API.

B.The logs-based metrics are not configured to parse logs from all services.

C.The services are running in different GCP projects and you are viewing only one project.

D.The Cloud Monitoring agent is not installed on the Cloud Run instances.

Answer: C

Dashboards in Cloud Monitoring are project-scoped unless configured with a metrics scope.

Why this answer

Option C is correct because Cloud Monitoring dashboards and logs-based metrics are scoped to a single GCP project. If microservices are deployed across multiple projects, metrics from services in other projects will not appear unless the dashboard is configured to aggregate data from all relevant projects. Since the question states that some services have no metrics while all are running and emitting logs, the most likely cause is that those services reside in different projects.

Exam trap

Google Cloud often tests the misconception that all GCP services automatically share monitoring data across projects, when in fact Cloud Monitoring is inherently project-scoped and requires explicit cross-project configuration via metrics scopes.

How to eliminate wrong answers

Option A is wrong because Cloud Run services automatically export built-in metrics (e.g., request count, latency) to Cloud Monitoring without requiring explicit use of the Monitoring API; the issue is not about manual metric export. Option B is wrong because logs-based metrics are defined using filters that apply to all logs ingested into Cloud Logging within the project; if logs are being emitted, the metrics would appear unless the filter explicitly excludes those services, but the question states logs are emitted and the dashboard shows no metrics at all for some services, which points to a project-scope issue. Option D is wrong because Cloud Run is a serverless platform; there is no Cloud Monitoring agent to install on instances—the agent is used for Compute Engine VMs, not Cloud Run.

89

Matching · medium

Match each SRE term to its description.

Concepts

Matches

SLI: Actual measured service performance

SLO: Target reliability level for a service

SLA: Contractual commitment to customers

Error budget: Allowed downtime before SLO breach

Toil: Manual, repetitive operational work

Why these pairings

Core SRE concepts from Google's SRE book.
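The "allowed downtime before SLO breach" description is the error budget, and it follows directly from the SLO target. A short worked sketch, assuming a 30-day window and an availability SLO expressed as a fraction (both values are examples, not taken from the question):

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over the window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining_fraction(slo: float, window_days: int, downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent after the observed downtime."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


print(round(error_budget_minutes(0.999, 30), 1))             # 43.2
print(round(budget_remaining_fraction(0.999, 30, 10.8), 2))  # 0.75
```

A 99.9% SLO over 30 days therefore allows about 43.2 minutes of downtime, and 10.8 minutes of measured downtime (the SLI falling short) leaves roughly three quarters of the budget.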

90

Multi-Select · medium

Which TWO of the following are best practices for implementing service monitoring in Google Cloud? (Choose 2)

Select 2 answers

A.Set static alert thresholds without considering historical baselines.

B.Use Cloud Monitoring uptime checks to verify that services are reachable from external locations.

C.Use the USE method (Utilization, Saturation, Errors) for service-level monitoring.

D.Define service level indicators (SLIs) using the RED method (Rate, Errors, Duration).

E.Alert on cause-based metrics (e.g., CPU utilization) rather than symptom-based metrics (e.g., latency).

Answers: B, D

Uptime checks verify external accessibility.

Why this answer

Option B is correct because Cloud Monitoring uptime checks verify that a service is reachable from external locations by sending HTTP, HTTPS, or TCP requests from Google Cloud's global vantage points. This validates external availability and helps detect regional outages or DNS resolution issues, which is a core best practice for service monitoring.

Exam trap

Google Cloud often tests the distinction between resource-level monitoring (USE method) and service-level monitoring (RED method), trapping candidates who apply the USE method to services or confuse cause-based alerts with symptom-based alerts.
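As an illustration of RED-style service indicators, the sketch below derives Rate, Errors, and Duration from a list of request records. The record shape, the choice of HTTP 5xx as "error", and the nearest-rank 95th percentile are assumptions made for the example; a real service would take these from its request metrics in Cloud Monitoring.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # time taken to serve the request


def red_metrics(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Compute Rate (req/s), Errors (ratio of 5xx), and Duration (p95 latency)."""
    total = len(requests)
    if total == 0:
        return {"rate_rps": 0.0, "error_ratio": 0.0, "p95_latency_ms": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)
    latencies = sorted(r.latency_ms for r in requests)
    p95_index = math.ceil(0.95 * total) - 1  # nearest-rank percentile
    return {
        "rate_rps": total / window_s,
        "error_ratio": errors / total,
        "p95_latency_ms": latencies[p95_index],
    }


# 20 synthetic requests observed over a 10-second window.
sample = [Request(200, 10.0 * (i + 1)) for i in range(18)]
sample += [Request(500, 190.0), Request(500, 200.0)]

print(red_metrics(sample, window_s=10))
# {'rate_rps': 2.0, 'error_ratio': 0.1, 'p95_latency_ms': 190.0}
```

All three values describe what users experience, which is what makes them suitable as SLIs, whereas USE-style figures such as CPU utilization describe the resources underneath the service.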

91

MCQ · easy

A startup wants to implement infrastructure as code for their Google Cloud environment to ensure reproducibility. They are using Terraform and want to manage state securely. What is the recommended approach?

A.Store Terraform state in a file in Cloud Shell home directory.

B.Store Terraform state in Cloud Source Repositories as a YAML file.

C.Store Terraform state locally on each developer's machine and use git to sync.

D.Store Terraform state in a Cloud Storage bucket with object versioning enabled.

Answer: D

A versioned Cloud Storage bucket provides a shared remote backend with locking and state history.

Why this answer

Storing Terraform state in a Cloud Storage bucket with object versioning enabled is the recommended approach because it provides a centralized, remote backend that supports state locking (the `gcs` backend writes a lock object next to the state file) and version history for rollback. This ensures consistency across team members and protects against state corruption or accidental deletion, aligning with infrastructure as code best practices for reproducibility.

Exam trap

The trap here is that candidates may assume Git-based version control is sufficient for state management, overlooking that Terraform state requires a backend with native locking and versioning support to prevent corruption from concurrent operations.

How to eliminate wrong answers

Option A is wrong because state stored in the Cloud Shell home directory is tied to one user's environment and is not shared across team members, leading to state drift and potential data loss if that environment is reset. Option B is wrong because Cloud Source Repositories is a Git-based version control service for source code, not a state backend; storing state as a YAML file there would bypass Terraform's locking mechanism and risk corruption from concurrent writes. Option C is wrong because storing state locally on each developer's machine and using git to sync introduces merge conflicts and race conditions: Terraform state is a single machine-written JSON document that Git cannot merge safely, and Git provides no locking between concurrent runs.

92

MCQ · medium

You need to monitor a multi-step login flow that involves calling an API, validating a token, and redirecting. Which type of uptime check should you use?

A.Cloud Endpoint check

B.TCP check

C.HTTP GET check

D.Synthetic Monitor (Cloud Functions)

Answer: D

Synthetic monitors can script multi-step flows.

Why this answer

A synthetic monitor using Cloud Functions is the correct choice because it can simulate a multi-step login flow by executing custom code that calls an API, validates a token, and performs a redirect. Unlike simple HTTP GET or TCP checks, synthetic monitors can handle stateful interactions and conditional logic, making them ideal for complex transaction monitoring.

Exam trap

Google Cloud often tests the misconception that an HTTP GET check can handle multi-step flows because it can follow redirects, but in reality, HTTP GET checks in uptime monitoring tools (like Google Cloud Monitoring) do not execute JavaScript or manage session state, making them unsuitable for token validation and conditional redirects.

How to eliminate wrong answers

Option A is wrong because Cloud Endpoint checks are designed for monitoring specific endpoints (e.g., IP addresses or URLs) with basic health probes, not for executing multi-step workflows with token validation and redirects. Option B is wrong because a TCP check only verifies that a port is open and accepting connections; it cannot validate application-layer logic like token authentication or redirect handling. Option C is wrong because an HTTP GET check only performs a single request and checks the response (e.g., 200 OK); it cannot chain requests, maintain session state, or validate tokens across multiple steps.

93

MCQ · medium

Refer to the exhibit. If the error rate spikes to 2% for only 2 minutes, why does the alert not fire?

A.The notification rate limit prevents the alert from firing.

B.The duration of 300s requires the condition to be met for 5 minutes.

C.The threshold value of 1% is too high.

D.The alignment period of 60s is too short.

Answer: B

The spike lasted only 2 minutes, insufficient to meet the duration requirement.

Why this answer

Option B is correct because the alert condition requires the error rate to exceed 1% for a duration of 300 seconds (5 minutes). A spike lasting only 2 minutes does not meet the minimum duration requirement, so the condition is never satisfied and the alert does not fire. In Cloud Monitoring alerting policies, the duration parameter defines how long the condition must be continuously true before an incident is opened.

Exam trap

Google Cloud often tests the distinction between the threshold value and the duration parameter, tricking candidates into thinking a spike above the threshold should always trigger an alert, when in fact the duration requirement must also be satisfied.

How to eliminate wrong answers

Option A is wrong because the notification rate limit controls how often alerts can send notifications after they have fired, not whether the alert fires in the first place. Option C is wrong because the threshold value of 1% is actually appropriate—the error rate spikes to 2%, which exceeds the threshold, so the threshold itself is not the issue. Option D is wrong because the alignment period of 60s defines how often the metric is evaluated (e.g., averaging over 60-second windows), but it does not affect the duration requirement; a shorter alignment period would not cause the alert to fire if the 300s duration is not met.
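The interaction between threshold, alignment period, and duration can be sketched in a few lines. This is a simplified model, not Cloud Monitoring's evaluation engine: it assumes one aligned data point per alignment period and fires only when enough consecutive points exceed the threshold to cover the duration. The sample series are invented to mirror the 2-minute spike in the question.

```python
import math
from typing import List


def alert_fires(aligned_values: List[float], threshold: float,
                alignment_period_s: int, duration_s: int) -> bool:
    """True if values stay above the threshold for a continuous run covering duration_s."""
    points_needed = max(1, math.ceil(duration_s / alignment_period_s))
    run = 0
    for value in aligned_values:
        run = run + 1 if value > threshold else 0  # reset on any point at or below threshold
        if run >= points_needed:
            return True
    return False


# Error rate as a fraction, one point per 60-second alignment period.
two_minute_spike = [0.002, 0.02, 0.02, 0.002, 0.002, 0.002, 0.002]
five_minute_spike = [0.002, 0.02, 0.02, 0.02, 0.02, 0.02, 0.002]

print(alert_fires(two_minute_spike, 0.01, 60, 300))   # False
print(alert_fires(five_minute_spike, 0.01, 60, 300))  # True
```

With a 60-second alignment period and a 300-second duration, five consecutive points above 1% are needed; the 2-minute spike supplies only two, so the run resets before the alert can fire.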

Full explanation →

94

MCQmedium

Refer to the exhibit. The Cloud Build fails with a permission error. The Cloud Build service account has roles/cloudbuild.builds.builder and roles/cloudfunctions.developer on the project. What is the missing permission?

A.cloudfunctions.functions.setIamPolicy

B.cloudfunctions.functions.get

C.iam.serviceAccounts.actAs

D.cloudfunctions.functions.sourceCodes.set

AnswerA

Required to set IAM policy for unauthenticated access.

Why this answer

Option A is correct because the --allow-unauthenticated flag requires the cloudfunctions.functions.setIamPolicy permission to make the function publicly accessible. The roles/cloudfunctions.developer does not include this permission. Option B is wrong because cloudfunctions.functions.get is included in the developer role.

Option C is wrong because iam.serviceAccounts.actAs is not needed for this deployment. Option D is wrong because cloudfunctions.functions.sourceCodes.set is part of the developer role.

Full explanation →

95

MCQeasy

A startup wants to implement Infrastructure as Code (IaC) using Terraform for their Google Cloud environment. They need to manage state files securely. What is the best practice?

A.Use a Cloud SQL database to store state.

B.Store state in a Git repository.

C.Store Terraform state in Cloud Storage with uniform bucket-level access using a dedicated bucket.

D.Store state locally on the developer's machine.

AnswerC

Secure and collaborative.

Why this answer

Storing Terraform state in a dedicated Cloud Storage bucket with uniform bucket-level access is the best practice because it provides a centralized, durable backend that supports state locking, and enabling object versioning on the bucket keeps a recoverable history of state files. Locking prevents concurrent modifications and keeps state consistent across team members, while uniform bucket-level access simplifies access control by disabling object ACLs so that only IAM policies apply.

Exam trap

Google Cloud often tests the misconception that Git-based state management is acceptable for team collaboration, but the trap here is that Git lacks state locking and exposes sensitive data, making Cloud Storage with uniform bucket-level access the only secure and collaborative option.

How to eliminate wrong answers

Option A is wrong because Cloud SQL is a relational database not designed for Terraform state storage; it lacks native state locking and versioning support, and introduces unnecessary latency and cost. Option B is wrong because storing state in a Git repository exposes sensitive data (e.g., plaintext secrets, resource IDs) in version history, and Git does not provide state locking, leading to race conditions when multiple users run `terraform apply` simultaneously. Option D is wrong because storing state locally on a developer's machine creates a single point of failure, prevents team collaboration, and violates the principle of shared state required for consistent infrastructure management.

Full explanation →

96

MCQmedium

A company uses Cloud Build to deploy applications and wants to ensure that builds from forked repositories cannot access sensitive environment variables. What is the best practice?

A.Disable builds from forks entirely.

B.Use Cloud Build's 'encrypted variables' and mark them as 'not available for pull requests from forks'.

C.Use Cloud Build's 'substitutions' instead of environment variables.

D.Use Cloud Build's 'secrets' manager instead of environment variables.

AnswerB

This option explicitly hides the variables from forked PR builds.

Why this answer

Option B is correct because Cloud Build allows you to mark encrypted variables as 'not available for pull requests from forks', which prevents forked PRs from accessing sensitive environment variables. This is the most direct and secure method to protect secrets in CI/CD pipelines involving external contributions.

Exam trap

The trap here is that candidates often assume Cloud Secret Manager is always the best choice for secrets, but the question specifically asks about preventing access from forked repositories, which requires the fork-specific restriction that only encrypted variables with the 'not available for pull requests from forks' flag provide.

How to eliminate wrong answers

Option A is wrong because disabling builds from forks entirely is overly restrictive and prevents legitimate contributions from external developers. Option C is wrong because substitutions are not designed for secrets; they are plain-text variables that can be overridden at build time and are still accessible to forked PRs if not explicitly restricted. Option D is wrong because while Cloud Secret Manager is a secure way to store secrets, it does not inherently restrict access based on fork status; you would still need additional controls to prevent forked PRs from accessing the secret.

Full explanation →

97

MCQeasy

A backend service receives bursts of requests that cause timeouts. The team wants to smooth out the load while ensuring all requests are processed eventually. Which strategy should they use?

A.Use Cloud Tasks to queue incoming requests and process at a controlled rate

B.Implement client-side rate limiting

C.Use Cloud Load Balancing with connection draining

D.Increase the number of backend instances

AnswerA

Cloud Tasks decouples request submission from processing, allowing smooth rate-controlled execution.

Why this answer

Cloud Tasks is designed to decouple request processing by queuing incoming requests and then dispatching them to a target handler at a controlled rate. This allows the backend to process requests smoothly, preventing timeouts during bursts, while ensuring every request is eventually processed through retry and dead-letter mechanisms.
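The buffering effect described above can be sketched with a toy simulation. This does not call the Cloud Tasks API; `drain_queue` and the sample burst are hypothetical, and the model assumes a fixed per-second dispatch cap with no retries or failures.

```python
from typing import List


def drain_queue(arrivals_per_second: List[int], max_dispatch_per_second: int) -> List[int]:
    """Simulate a queue that buffers bursts and dispatches at a capped rate.

    Returns how many requests were dispatched in each second until the
    backlog is empty.
    """
    backlog = 0
    dispatched = []
    second = 0
    while second < len(arrivals_per_second) or backlog > 0:
        if second < len(arrivals_per_second):
            backlog += arrivals_per_second[second]  # enqueue new arrivals
        sent = min(max_dispatch_per_second, backlog)  # never exceed the cap
        backlog -= sent
        dispatched.append(sent)
        second += 1
    return dispatched


# Hypothetical burst: 50 requests at once, then a few stragglers.
burst = [50, 0, 0, 5]
per_second = drain_queue(burst, max_dispatch_per_second=10)

print(per_second)                     # [10, 10, 10, 10, 10, 5]
print(sum(per_second) == sum(burst))  # True: every request is eventually processed
```

The backend never sees more than 10 requests per second, yet nothing is dropped; the cost is added latency for requests that wait in the queue.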

Exam trap

Google Cloud often tests the distinction between load balancing (which distributes traffic) and queuing (which buffers traffic); candidates mistakenly choose connection draining or scaling because they think smoothing load is about distributing or adding capacity, not about buffering and rate-limiting.

How to eliminate wrong answers

Option B is wrong because client-side rate limiting only controls the rate at which individual clients send requests, but it does not smooth out load from multiple clients or guarantee that all requests are eventually processed; it simply drops or delays requests at the source. Option C is wrong because Cloud Load Balancing with connection draining is used to gracefully terminate existing connections during instance shutdown, not to queue or rate-limit incoming requests; it does not provide a buffer for burst traffic. Option D is wrong because increasing the number of backend instances improves capacity but does not smooth out load; bursts can still overwhelm the system if the rate of incoming requests exceeds the aggregate processing capacity, and it does not guarantee eventual processing of all requests without a queue.

Full explanation →

98

MCQmedium

A DevOps team is setting up a Google Cloud organization and wants to ensure that all billing alerts are centrally managed. What should they do?

A.Use Cloud Monitoring to create custom metrics for billing.

B.Set up billing alerts individually for each project.

C.Create a project dedicated to billing and manage alerts there.

D.Set up billing alerts at the organization level using Cloud Billing.

AnswerD

Centralized management of alerts.

Why this answer

Setting up billing alerts at the organization level using Cloud Billing (Option D) is correct because it allows centralized management of billing thresholds, notifications, and budget policies across all projects within the organization. This approach leverages the Cloud Billing budget feature, which can be configured at the billing account level to send alerts to Pub/Sub topics or email recipients, ensuring consistent oversight without per-project configuration.

Exam trap

The trap here is that candidates often confuse project-level billing alerts with organization-level billing alerts, assuming a dedicated billing project (Option C) provides central management, when in fact billing alerts must be configured at the billing account level to apply across all projects in the organization.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring custom metrics are used for monitoring application or infrastructure performance, not for billing alerts; billing alerts are managed through Cloud Billing budgets, not Monitoring. Option B is wrong because setting up billing alerts individually for each project defeats centralized management, leading to administrative overhead and potential inconsistencies in alert thresholds. Option C is wrong because creating a dedicated project for billing does not centralize alerts; billing alerts are tied to the billing account, not to a specific project, and alerts must be configured at the billing account or organization level to apply across all projects.

Full explanation →

99

MCQeasy

Refer to the exhibit. The output shows three folders created directly under the organization node. Which gcloud command was most likely executed to produce this output?

A.gcloud resource-manager folders list --organization=123456789012

B.gcloud organizations list

C.gcloud projects list --filter='parent.id:123456789012'

D.gcloud resource-manager folders list --folder=123456789012

AnswerA

This command lists folders under the given organization, matching the exhibit.

Why this answer

Option A is correct because the `gcloud resource-manager folders list --organization=123456789012` command lists all folders directly under the specified organization node. The output shows three folders (e.g., folder1, folder2, folder3) at the top level, which matches the behavior of listing folders with the organization ID as the parent. This command is specifically designed to retrieve immediate child folders of an organization, not nested folders.

Exam trap

Google Cloud often tests the distinction between listing resources under an organization vs. under a folder, and the trap here is that candidates confuse `--organization` with `--folder` or assume `gcloud projects list` can list folders, when in fact it only lists projects.

How to eliminate wrong answers

Option B is wrong because `gcloud organizations list` lists all organizations accessible to the authenticated user, not folders under a specific organization. Option C is wrong because `gcloud projects list --filter='parent.id:123456789012'` lists projects (not folders) that have the specified organization as their parent; it would show projects, not folder names. Option D is wrong because `gcloud resource-manager folders list --folder=123456789012` lists folders under a specific folder (using a folder ID), not under an organization; using an organization ID as a folder ID would either fail or return incorrect results because the resource type is different.

Full explanation →

100

MCQmedium

You are a DevOps engineer tasked with bootstrapping a Google Cloud organization for a company that develops a SaaS product. The company has three teams: Platform, Application, and Data. Each team needs to manage their own projects, but the network should be centrally managed. You decide to use a shared VPC. You create a host project 'shared-vpc-host' and attach three service projects: 'platform-service', 'app-service', and 'data-service'. You grant the Network Admin role to the Platform team for the host project. The Application team needs to deploy Compute Engine instances in their service project, but they should not be able to modify network resources. You grant them the Compute Instance Admin role at the service project level. However, the Application team reports that they cannot create instances because they don't have permission to use the subnets in the shared VPC. What is the most likely missing step?

A.Grant the Application team the Compute Network Admin role on the host project.

B.Grant the Application team the Compute Network User role on the service project.

C.Grant the Application team the Compute Network User role on the host project or the specific subnets.

D.Grant the Application team the roles/compute.subnetUser on the subnet.

AnswerC

Compute Network User allows using subnets without managing them.

Why this answer

The Application team needs the Compute Network User role (roles/compute.networkUser) on the host project or the specific subnets to use the shared VPC subnets. This role allows them to attach instances to existing subnets without granting permission to modify network resources. Granting it at the service project level (Option B) is insufficient because the subnet permissions are inherited from the host project in a shared VPC setup.

Exam trap

Google Cloud often tests the misconception that granting a role on the service project is sufficient for shared VPC subnet access, when in fact the Compute Network User role must be granted on the host project or the specific subnets to allow instance attachment.

How to eliminate wrong answers

Option A is wrong because the Compute Network Admin role (roles/compute.networkAdmin) grants full control over network resources, which violates the requirement that the Application team should not be able to modify network resources. Option B is wrong because the Compute Network User role must be granted on the host project or the specific subnets, not on the service project, as subnet permissions are managed at the host project level in a shared VPC. Option D is wrong because roles/compute.subnetUser is not a valid predefined role; the correct role is roles/compute.networkUser, granted either on the host project or on individual subnets in the host project.

Full explanation →

101

MCQeasy

You are bootstrapping a Google Cloud organization for a DevOps team. You need to set up a shared VPC host project that will be used by multiple service projects. What is the minimal set of roles required for the DevOps team to create and manage service projects in the host project?

A.Project Creator and Service Project Admin

B.Compute Network Admin and Service Project Admin

C.Compute Shared VPC Admin

D.Owner and Service Project Admin

AnswerB

Compute Network Admin manages networks; Service Project Admin attaches service projects.

Why this answer

Option B is correct because the minimal set of roles required for a DevOps team to create and manage service projects in a shared VPC host project is Compute Network Admin (roles/compute.networkAdmin) on the host project and Service Project Admin at the organization or folder level. Compute Network Admin grants permissions to manage networking resources, while Service Project Admin allows attaching service projects to the host project. Without both, the team cannot configure shared VPC networking or associate service projects.

Exam trap

Google Cloud often tests the misconception that a single role like Compute Shared VPC Admin is sufficient, but the exam trap is that you need both Compute Network Admin (for network management) and Service Project Admin (for project association) to fully bootstrap and manage shared VPC service projects.

How to eliminate wrong answers

Option A is wrong because Project Creator (roles/resourcemanager.projectCreator) only allows creating new projects but does not grant permissions to manage shared VPC networking or attach service projects to a host project. Option C is wrong because Compute Shared VPC Admin (roles/compute.xpnAdmin) alone allows attaching service projects but lacks the network management permissions (e.g., to create subnets or firewall rules) needed to fully manage the shared VPC environment. Option D is wrong because Owner (roles/owner) is a highly privileged role that grants full control over all resources, which is excessive for the minimal set required; it violates the principle of least privilege and is not minimal.

Full explanation →

102

MCQmedium

A team deploys a microservice on Google Kubernetes Engine (GKE) that processes user uploads. The service latency has increased over time. Monitoring shows that CPU utilization is low, but memory usage is high and garbage collection (GC) pauses are frequent. Which action is most likely to reduce latency?

A.Scale out the deployment by increasing the number of replicas.

B.Reduce the number of replicas to concentrate load.

C.Increase the CPU limit to allow faster processing.

D.Increase the memory limit and requests for the container.

AnswerD

More memory reduces GC frequency and pauses.

Why this answer

Frequent GC pauses and high memory usage with low CPU indicate that the runtime's heap (for example, a JVM heap) is too small, causing the garbage collector to run more often. Increasing the memory limit and requests gives the runtime more headroom (provided the heap is sized to use it), which reduces GC frequency and directly lowers latency. This is a classic heap-tuning scenario in containerized environments like GKE.

Exam trap

Google Cloud often tests the misconception that scaling out or adding CPU fixes all performance issues, but here the symptom of low CPU and high memory with GC pauses specifically points to a memory constraint, not a throughput or compute bottleneck.

How to eliminate wrong answers

Option A is wrong because scaling out replicas does not address the root cause of GC thrashing; it distributes the same memory-constrained workload across more pods, each still suffering from frequent GC pauses. Option B is wrong because reducing replicas concentrates the load on fewer pods, worsening memory pressure and GC frequency. Option C is wrong because increasing CPU limits does not help when CPU utilization is already low; the bottleneck is memory, not compute.

Full explanation →

103

Multi-Selecthard

Which THREE approaches can help reduce egress costs while improving performance for a multi-region application using Cloud Load Balancing? (Choose 3)

Select 3 answers

A.Optimize data transfer by compressing responses.

B.Use internal load balancers for traffic between regions.

C.Increase the number of instances in each region.

D.Use Cloud CDN to cache content at edge locations.

E.Use premium tier networking for lower latency.

AnswersA, B, D

Compression reduces data transferred and egress cost.

Why this answer

Option A is correct because compressing responses reduces the amount of data transferred over the network, which directly lowers egress costs charged by cloud providers. Smaller payloads also reduce latency and improve perceived performance for end users, as less data needs to travel across regions.
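As a rough illustration of how compression shrinks the bytes that leave a region, the sketch below gzip-compresses a hypothetical JSON response using only the Python standard library. The payload shape is invented for the example, and real savings depend heavily on how repetitive the content is.

```python
import gzip
import json

# Hypothetical API response: repetitive JSON tends to compress well.
payload = json.dumps(
    [{"id": i, "status": "ok", "region": "us-central1"} for i in range(500)]
).encode("utf-8")

compressed = gzip.compress(payload)

# Fraction of the original size that would actually cross the network.
ratio = len(compressed) / len(payload)

print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.2%}")

# The client recovers the exact original bytes after decompression.
restored = gzip.decompress(compressed)
```

Since egress is billed by volume, a smaller `ratio` translates into proportionally fewer billable bytes for that response, at the cost of some CPU time on both ends.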

Exam trap

Google Cloud often tests the misconception that adding more instances or using premium networking always improves performance and reduces costs, but in reality, these actions increase costs without addressing egress charges, while compression, CDN, and internal load balancers directly target data transfer volume and routing.

Full explanation →

104

MCQhard

A DevOps engineer is troubleshooting a Cloud Build failure. The build log shows the error: 'Permission denied for resource projects/my-project/locations/us-central1/repositories/my-repo'. The Cloud Build service account (PROJECT_NUMBER@cloudbuild.gserviceaccount.com) is used. What is the most likely missing role?

A.roles/artifactregistry.reader

B.roles/artifactregistry.admin

C.roles/cloudbuild.builds.builder

D.roles/artifactregistry.writer

AnswerD

This allows pushing artifacts to repositories.

Why this answer

The error 'Permission denied for resource projects/my-project/locations/us-central1/repositories/my-repo' indicates that the Cloud Build service account lacks permission to write artifacts to Artifact Registry. The Cloud Build service account needs the `roles/artifactregistry.writer` role to upload build artifacts (e.g., container images) to the repository. Without this role, the build fails at the step that attempts to push artifacts.

Exam trap

The trap here is that candidates often confuse `roles/artifactregistry.reader` (which only allows pulling) with the write permission needed for pushing artifacts, or they mistakenly think the default Cloud Build service account role includes Artifact Registry access.

How to eliminate wrong answers

Option A is wrong because `roles/artifactregistry.reader` only allows reading artifacts (e.g., pulling images), not writing them, so it would not resolve a permission denied error during artifact upload. Option B is wrong because `roles/artifactregistry.admin` grants full administrative control over repositories, which is excessive and violates the principle of least privilege; the build only needs write access, not admin rights. Option C is wrong because `roles/cloudbuild.builds.builder` is the default role for Cloud Build service accounts and provides permissions to execute builds, but it does not include Artifact Registry write permissions; the error is specifically about Artifact Registry, not Cloud Build itself.

Full explanation →

105

MCQhard

A team wants to implement multi-cluster monitoring for GKE using Managed Service for Prometheus. Which configuration is required?

A.Enable Managed Service for Prometheus in one cluster and have other clusters forward metrics to it

B.Enable Managed Service for Prometheus in each cluster and configure a single Cloud Monitoring workspace to collect metrics from all clusters

C.Use Cloud Monitoring agent on nodes in each cluster

D.Set up a separate workspace per cluster

AnswerB

This aggregates metrics from multiple clusters in one workspace.

Why this answer

Managed Service for Prometheus is a Google Cloud-managed, multi-cluster monitoring solution. To collect metrics from multiple GKE clusters, you must enable the service in each cluster individually and then configure a single Cloud Monitoring workspace to aggregate the data. This ensures each cluster runs its own collection pipeline, while the workspace provides a unified view across all clusters.

Exam trap

The trap here is that candidates assume a single 'central' cluster can aggregate metrics from others (like a traditional Prometheus federation), but Managed Service for Prometheus requires each cluster to independently send metrics to a shared Cloud Monitoring workspace.

How to eliminate wrong answers

Option A is wrong because Managed Service for Prometheus does not support a hub-and-spoke forwarding model; each cluster must run its own collection pipeline and cannot simply forward metrics to another cluster's service. Option C is wrong because the Cloud Monitoring agent (formerly Stackdriver agent) is a legacy solution for collecting system metrics and logs, not Prometheus metrics; Managed Service for Prometheus uses its own managed collector based on the Prometheus server, not the Cloud Monitoring agent. Option D is wrong because using separate workspaces per cluster defeats the purpose of multi-cluster monitoring, which requires a single workspace to aggregate and visualize metrics from all clusters in one place.

Full explanation →

106

Multi-Selecteasy

Which THREE are required steps when setting up a CI/CD pipeline with Cloud Build for the first time? (Choose three.)

Select 3 answers

A.Create a custom base image for builds.

B.Enable the Cloud Build API.

C.Grant the Cloud Build service account the roles/cloudbuild.builds.builder role.

D.Set up a Cloud Router for network connectivity.

E.Create a trigger with a repository connection.

AnswersB, C, E

APIs must be enabled for the service to work.

Why this answer

Option B is correct because the Cloud Build API must be enabled in your Google Cloud project before you can use any Cloud Build features, including triggers and builds. Without enabling the API, Cloud Build services are unavailable, and any attempt to create or run builds will fail with an API not found error.

Exam trap

Google Cloud often tests the distinction between mandatory prerequisites (like enabling the API and granting service account roles) and optional enhancements (like custom base images or network configuration), leading candidates to select unnecessary steps as required.

Full explanation →

107

MCQmedium

Refer to the exhibit. A DevOps engineer notices that instance-1 runs on older CPU platform. The application is sensitive to CPU features that are only available on Skylake or newer. Which action should be taken to optimize performance?

A.Live migrate instance-1 to a different host.

B.Use Terraform to add a lifecycle rule to ignore changes.

C.Terminate instance-1 and recreate it with a newer machine type.

D.Stop instance-1 and update the minimum CPU platform to Skylake.

AnswerD

Setting min-cpu-platform ensures the instance runs on at least Skylake.

Why this answer

Option D is correct because stopping the instance and updating the minimum CPU platform to Skylake ensures that the instance is rescheduled onto a host that meets the required CPU feature set. This action directly addresses the application's sensitivity to CPU features available only on Skylake or newer, without requiring a full recreation or risking compatibility issues during live migration.

Exam trap

Google Cloud often tests the distinction between live migration (which preserves the current CPU platform) and stop/start actions (which can change the host and CPU platform), leading candidates to incorrectly choose live migration as a quick fix.

How to eliminate wrong answers

Option A is wrong because live migration does not change the underlying CPU platform; it moves the instance to another host of the same or similar CPU generation, so the older CPU platform issue persists. Option B is wrong because adding a lifecycle rule in Terraform to ignore changes only prevents infrastructure-as-code drift and does not affect the actual CPU platform of the running instance. Option C is wrong because terminating and recreating the instance with a newer machine type is unnecessarily disruptive; stopping and updating the minimum CPU platform achieves the same result without losing the instance's metadata, attached disks, or network configuration.

Full explanation →

108

MCQhard

A CI/CD pipeline must deploy to multiple environments (dev, staging, prod) with manual approval required before prod deployment. Which Google Cloud service is best for orchestrating this?

A.Spinnaker

B.Cloud Deploy with promotion

C.Jenkins on GKE

D.Cloud Build with manual steps

AnswerB

Cloud Deploy supports manual approvals and promotion across targets.

Why this answer

Cloud Deploy provides a delivery pipeline with sequential stages and manual approval gates, making it ideal for multi-environment deployments.

Full explanation →

109

Multi-Selecthard

A team uses Cloud Build with a cloudbuild.yaml that deploys to multiple environments. They want to ensure that the production deployment step only runs when the build is triggered by a tag matching 'v*.*.*'. Which TWO configurations achieve this? (Choose two.)

Select 2 answers

A.In the cloudbuild.yaml, use a 'waitFor' condition that only runs the production step when the substitution variable $TAG_NAME matches 'v*.*.*'.

B.Create a Cloud Build trigger with a tag filter '^v[0-9]+\.[0-9]+\.[0-9]+$' and use that trigger for production deployments.

C.In the cloudbuild.yaml, add a condition that checks if the branch name matches 'v*.*.*'.

D.Create a separate cloudbuild.yaml for production and use a branch filter '^main$' to trigger it.

E.Configure a manual approval step in Cloud Build that requires a production manager to approve before running the production deployment.

AnswersA, B

Conditional step execution based on tag substitution.

Why this answer

Option A is correct because Cloud Build supports substitution variables like $TAG_NAME, which can be used in a 'waitFor' condition or as part of a step's entrypoint logic to gate execution. By checking if $TAG_NAME matches the glob pattern 'v*.*.*', the production deployment step will only run when the build is triggered by a matching tag, ensuring environment-specific control within a single cloudbuild.yaml.
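The two patterns in the options are not equivalent, which the following sketch makes concrete. It uses Python's `re` and `fnmatch` modules as stand-ins; Cloud Build's own matching engine may differ in details, and the helper names and sample tags are hypothetical.

```python
import fnmatch
import re

# Regex from option B: strict three-part numeric version.
TAG_REGEX = re.compile(r"^v[0-9]+\.[0-9]+\.[0-9]+$")
# Glob from the question text: anything with 'v' and at least two dots.
TAG_GLOB = "v*.*.*"


def matches_regex(tag: str) -> bool:
    """True only for tags like v1.2.3."""
    return TAG_REGEX.match(tag) is not None


def matches_glob(tag: str) -> bool:
    """True for any tag fitting the looser shell-style pattern."""
    return fnmatch.fnmatchcase(tag, TAG_GLOB)


# Hypothetical tag names to compare both checks.
for tag in ["v1.2.3", "v1.2", "v1.2.3-rc1", "vx.y.z", "main"]:
    print(f"{tag:12} regex={matches_regex(tag)} glob={matches_glob(tag)}")
```

Only `v1.2.3` passes both checks; `v1.2.3-rc1` and `vx.y.z` pass the glob but fail the regex, so the stricter regex is the safer gate if pre-release tags should not reach production.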

Exam trap

Google Cloud often tests the distinction between branch-based and tag-based triggers, and candidates mistakenly apply branch filters (like '^main$') or branch-name checks when the requirement explicitly specifies tag-based triggers, leading them to select options C or D.

Full explanation →

110

MCQhard

Your company is bootstrapping a Google Cloud organization for DevOps. The organization consists of three folders: Dev, Staging, and Prod. Each folder contains multiple projects for different microservices. You have been tasked with setting up a centralized CI/CD pipeline using Cloud Build and Cloud Deploy. The pipeline must deploy to multiple environments in sequence: Dev → Staging → Prod. Each environment requires approval from a different approver group. You have set up Cloud Deploy delivery pipelines with targets pointing to each environment. However, during testing, you notice that after a successful deployment to Dev, the pipeline automatically proceeds to Staging without waiting for approval. What is the most likely cause and solution?

A.Cause: The delivery pipeline is defined with a single target instead of multiple targets. Solution: Create separate delivery pipelines for each environment.

B.Cause: The delivery pipeline has a single promotion sequence that includes all targets. Solution: Remove Staging and Prod from the pipeline and create separate pipelines.

C.Cause: The Cloud Deploy service account lacks the `clouddeploy.approver` role on Staging and Prod projects. Solution: Grant the role to the service account.

D.Cause: The targets in Staging and Prod are missing the `require_approval` attribute set to `true`. Solution: Add `require_approval: true` to the Staging and Prod target definitions.

AnswerD

Correct: Without `require_approval: true`, Cloud Deploy proceeds automatically.

Why this answer

The correct answer is D because Cloud Deploy requires explicit `require_approval: true` on each target to enforce manual approval gates. Without this attribute, the pipeline treats the target as automatically approved and proceeds to the next target in the promotion sequence. The behavior described—automatic progression after Dev—indicates that Staging and Prod targets lack this flag, causing the pipeline to skip the approval step.
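The gating behavior described above can be modeled in a few lines. This is a toy model of the scenario in the question, not the Cloud Deploy API: the `Target` class, the `furthest_target_reached` function, and the assumption that promotion continues automatically until it hits an unapproved gate are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Target:
    name: str
    require_approval: bool = False  # mirrors the attribute named in option D


def furthest_target_reached(targets: List[Target], approved: Set[str]) -> str:
    """Walk the promotion sequence and stop at the first unapproved gate."""
    reached = "none"
    for target in targets:
        if target.require_approval and target.name not in approved:
            break  # rollout waits here until someone approves
        reached = target.name
    return reached


# Misconfigured pipeline: no target asks for approval.
ungated = [Target("dev"), Target("staging"), Target("prod")]
# Fixed pipeline: Staging and Prod both require approval.
gated = [Target("dev"), Target("staging", True), Target("prod", True)]

print(furthest_target_reached(ungated, approved=set()))        # prod
print(furthest_target_reached(gated, approved=set()))          # dev
print(furthest_target_reached(gated, approved={"staging"}))    # staging
```

With no gates, the sequence runs through to Prod; with the flag set, each environment waits for its own approver group.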

Exam trap

Google Cloud often tests the distinction between IAM roles (like `clouddeploy.approver`) and target-level configuration (`require_approval`), leading candidates to mistakenly focus on permissions when the real issue is a missing attribute in the target definition.

How to eliminate wrong answers

Option A is wrong because having a single target would prevent deployment to multiple environments, not cause automatic progression; the pipeline already has multiple targets. Option B is wrong because removing targets from the pipeline would break the sequential deployment requirement; a single pipeline with multiple targets is the correct approach for sequential promotions. Option C is wrong because the `clouddeploy.approver` role is used for approving rollouts, not for triggering automatic progression; the issue is about approval enforcement, not permissions.

Full explanation →

111

Multi-Selecthard

Which TWO are valid methods to manage service account keys securely? (Select exactly 2)

Select 2 answers

A.Rotate keys manually every 90 days.

B.Automate key rotation using a Cloud Function.

C.Use workload identity federation to avoid keys.

D.Embed keys in application code.

E.Store keys in Cloud Storage with public access.

AnswersB, C

Automates rotation, reducing risk.

Why this answer

Option B is correct because automating key rotation with a Cloud Function ensures service account keys are rotated on a schedule without manual intervention, reducing the risk of key exposure. This aligns with Google Cloud's best practices for key management, as it enforces rotation policies programmatically and can integrate with Secret Manager to securely store and version keys.
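The core decision an automated rotation job has to make is which keys are past their maximum age. The sketch below shows only that selection step with the standard library. The key records, the `keys_due_for_rotation` function, and the 90-day default (borrowed from option A) are assumptions; a real Cloud Function would list keys through the IAM API and then create and delete keys, which is not shown here.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def keys_due_for_rotation(keys: List[Dict], max_age_days: int = 90,
                          now: Optional[datetime] = None) -> List[str]:
    """Return the names of keys created more than max_age_days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [key["name"] for key in keys if key["created"] < cutoff]


# Hypothetical key inventory with a fixed "now" so the result is reproducible.
fixed_now = datetime(2024, 6, 1, tzinfo=timezone.utc)
inventory = [
    {"name": "key-old", "created": datetime(2024, 1, 15, tzinfo=timezone.utc)},
    {"name": "key-recent", "created": datetime(2024, 5, 20, tzinfo=timezone.utc)},
]

due = keys_due_for_rotation(inventory, now=fixed_now)
print(due)  # ['key-old']
```

Running this check on a schedule removes the reliance on someone remembering to rotate, which is the weakness of the manual approach in option A.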

Exam trap

Google Cloud often tests the misconception that manual rotation is acceptable, but the PCDOE exam emphasizes automation and keyless authentication as the only secure methods for managing service account keys at scale.

112

Multi-Selectmedium

Which TWO options are valid methods to create a custom metric descriptor in Cloud Monitoring?

Select 2 answers

A.Creating a log-based metric

B.Using the gcloud commands: gcloud monitoring metrics create

C.Using Terraform resource google_monitoring_metric_descriptor

D.Deploying a Prometheus exporter with the Ops Agent

E.Using the Cloud Monitoring API CreateMetricDescriptor

AnswersB, E

gcloud CLI can create custom metric descriptors.

Why this answer

Option B is correct because the `gcloud monitoring metrics create` command directly invokes the Cloud Monitoring API to create a custom metric descriptor. Option E is correct because the Cloud Monitoring API's `CreateMetricDescriptor` method is the programmatic way to define a custom metric, specifying its type, unit, and labels. Both methods result in a new metric descriptor being registered in the project's metric store.
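
As a rough illustration of what a `CreateMetricDescriptor` request carries, the sketch below builds a descriptor body in Python. The project ID, metric type, and label are hypothetical, and the field names follow the Cloud Monitoring v3 REST `MetricDescriptor` resource as commonly documented, so check them against the API reference before relying on them.

```python
import json

# Hypothetical project and metric names, used only for illustration.
PROJECT_ID = "my-project"
METRIC_TYPE = "custom.googleapis.com/checkout/error_rate"


def build_metric_descriptor(metric_type: str) -> dict:
    """Return a MetricDescriptor body: the schema only, no data points."""
    return {
        "type": metric_type,
        "metricKind": "GAUGE",   # a value measured at a point in time
        "valueType": "DOUBLE",
        "unit": "1",             # dimensionless ratio
        "description": "Fraction of requests that failed.",
        "labels": [
            {
                "key": "endpoint",
                "valueType": "STRING",
                "description": "API endpoint that served the request.",
            },
        ],
    }


descriptor = build_metric_descriptor(METRIC_TYPE)
# The body would be POSTed to this collection URL with OAuth credentials.
url = f"https://monitoring.googleapis.com/v3/projects/{PROJECT_ID}/metricDescriptors"
print(url)
print(json.dumps(descriptor, indent=2))
```

The body describes only the schema (type, kind, unit, labels); writing time series data against it is a separate call.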

Exam trap

Google Cloud often tests the distinction between creating a metric descriptor (the schema) and writing metric data (time series); candidates confuse log-based metrics (which auto-create a descriptor) with the explicit creation of a custom descriptor, or assume that any tool that produces metrics (like Prometheus exporters) directly creates a descriptor, when in fact the Ops Agent handles descriptor creation automatically for Prometheus endpoints.

113

MCQmedium

A DevOps engineer is setting up alerting policies for a critical API service. They want to receive an alert if the error rate exceeds 5% for at least 5 minutes, but only during business hours (9 AM to 5 PM). Which approach should they use?

A.Create a log-based metric for errors and use a condition with a threshold, then set the alert policy to only run during business hours using the 'condition' schedule.

B.Create an alerting policy with a condition that triggers when the error rate is above 5% for 5 minutes, and configure the notification channel to only send notifications during business hours using a webhook receiver that checks time.

C.Create two separate alert policies, one for business hours and one for off-hours, each with different thresholds.

D.Use Cloud Scheduler to enable and disable the alerting policy at the start and end of business hours.

AnswerB

This approach uses a custom notification channel to filter by time.

Why this answer

Option B is correct because it uses a single alerting policy with a condition that triggers when the error rate exceeds 5% for 5 minutes, and then controls notification delivery via a webhook receiver that checks the current time. This approach ensures the alert is evaluated continuously (so the 5-minute window is respected) but only notifications are suppressed outside business hours, which is the most reliable way to meet the requirement without missing alert evaluations or relying on external scheduling.
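
The time check inside such a webhook receiver can be very small. The following is a minimal sketch, assuming the receiver converts the alert's arrival time to the team's local time first; the function name `should_forward` is hypothetical and the HTTP handling around it is omitted.

```python
from datetime import datetime, time

BUSINESS_START = time(9, 0)   # 9 AM
BUSINESS_END = time(17, 0)    # 5 PM


def should_forward(received_at: datetime) -> bool:
    """Return True when an alert arriving at this local time should be forwarded."""
    return BUSINESS_START <= received_at.time() < BUSINESS_END


print(should_forward(datetime(2024, 3, 4, 10, 30)))  # True
print(should_forward(datetime(2024, 3, 4, 22, 15)))  # False
```

Because the filter lives in the receiver, the alerting policy itself keeps evaluating around the clock.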

Exam trap

Google Cloud often tests the distinction between 'when the alert is evaluated' versus 'when notifications are sent' — candidates mistakenly think scheduling the condition itself (Option A) or toggling the entire policy (Option D) is valid, but the correct approach is to keep evaluation always on and only filter notifications.

How to eliminate wrong answers

Option A is wrong because log-based metrics and conditions do not have a 'condition schedule' to restrict evaluation to business hours; alert conditions evaluate continuously, and scheduling is applied at the policy level, not the condition level. Option C is wrong because creating two separate policies with different thresholds would either trigger alerts off-hours (if thresholds are the same) or miss the 5% threshold requirement (if thresholds differ), and it unnecessarily duplicates management overhead. Option D is wrong because using Cloud Scheduler to enable/disable the alerting policy would stop all evaluation during off-hours, meaning the 5-minute window would not be maintained across the boundary (e.g., an error spike starting at 4:58 PM would not be detected until 9 AM the next day, breaking the 'at least 5 minutes' requirement).

114

MCQmedium

A gaming company runs a real-time multiplayer server on GKE. They want to minimize latency between players worldwide. Which approach should they use?

A.Increase the number of nodes in the cluster

B.Use a single regional cluster

C.Use Cloud Functions

D.Use a multi-cluster setup with clusters in multiple regions and use a global load balancer

AnswerD

Deploys servers near players, reducing round-trip time.

Why this answer

A multi-cluster setup with clusters in multiple regions, fronted by a global load balancer, minimizes latency by placing game servers physically closer to players worldwide. The global load balancer uses Anycast IP and Google Front Ends (GFEs) to route traffic to the nearest healthy backend cluster, reducing round-trip time (RTT). This approach is specifically designed for real-time multiplayer workloads that require low latency across geographic regions.

Exam trap

Google Cloud often tests the misconception that scaling nodes (Option A) or using a single regional cluster (Option B) can solve global latency issues, when in fact geographic distribution is required for low-latency worldwide coverage.

How to eliminate wrong answers

Option A is wrong because increasing the number of nodes in a single cluster does not reduce geographic latency; it only increases compute capacity within the same region, leaving distant players with high latency. Option B is wrong because a single regional cluster serves only one geographic area, forcing players far from that region to experience high latency, which defeats the goal of minimizing worldwide latency. Option C is wrong because Cloud Functions are stateless, short-lived, and not designed for persistent, real-time multiplayer game sessions; they lack the low-latency, stateful, and long-running connection requirements of a gaming server.

115

Multi-Selecteasy

A DevOps team wants to optimize the performance of a Cloud Run service that experiences sporadic traffic. Which TWO strategies should they implement?

Select 2 answers

A.Set min-instances to 5 to avoid cold starts

B.Use Cloud Run for Anthos on GKE for better performance

C.Use Cloud Scheduler to trigger the service periodically

D.Reduce container image size and use startup probes

E.Enable CPU boost during cold starts

AnswersD, E

Smaller image loads faster, and startup probes delay traffic until the instance is ready, preventing errors.

Why this answer

Option D is correct because reducing the container image size decreases the time required to pull and start the container, directly mitigating cold start latency. Startup probes allow Cloud Run to defer sending traffic until the container is ready, preventing premature request failures and improving perceived performance. Together, these optimizations address the root causes of cold starts in a serverless environment.

Exam trap

Google Cloud often tests the misconception that setting min-instances or using periodic triggers (like Cloud Scheduler) are effective performance optimizations for sporadic traffic, when in fact they are cost-increasing workarounds that do not address the fundamental cold start problem.

116

MCQmedium

Your company runs an e-commerce application on Google Kubernetes Engine (GKE) with a microservice architecture. During a Black Friday sale, the orders service experiences a sudden increase in latency and errors. You notice that the database connection pool in the orders service is exhausted, leading to timeouts. The service is written in Java and uses HikariCP connection pool. You need to mitigate the incident quickly. Which action should you take first?

A.Increase the number of replicas of the orders service.

B.Add more database instances.

C.Enable connection pooling at the database side.

D.Temporarily reduce the maximum connection pool size.

AnswerA

More replicas mean more connection pools, increasing total connections and reducing load per pod.

Why this answer

Increasing the number of replicas of the orders service is the fastest way to mitigate the incident because it horizontally scales the application tier, distributing incoming requests across more pods. This reduces the load per pod, which in turn reduces the number of concurrent database connections each pod attempts to acquire from its HikariCP pool, alleviating pool exhaustion and timeouts without requiring a database or code change.
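
The arithmetic behind this reasoning can be sketched as follows. It is a simplified model that assumes every in-flight request holds exactly one connection and that traffic is spread evenly across pods; the request count and replica numbers are hypothetical, and the pool size of 10 is HikariCP's default `maximumPoolSize`.

```python
import math


def per_pod_demand(concurrent_requests: int, replicas: int) -> int:
    """Connections each pod needs if every in-flight request holds one."""
    return math.ceil(concurrent_requests / replicas)


def pool_exhausted(concurrent_requests: int, replicas: int, pool_size: int) -> bool:
    """True when a pod's demand exceeds what its pool can hand out."""
    return per_pod_demand(concurrent_requests, replicas) > pool_size


def total_db_connections(replicas: int, pool_size: int) -> int:
    """Upper bound on connections the database must be able to accept."""
    return replicas * pool_size


POOL_SIZE = 10
print(pool_exhausted(300, replicas=20, pool_size=POOL_SIZE))  # True
print(pool_exhausted(300, replicas=40, pool_size=POOL_SIZE))  # False
print(total_db_connections(40, POOL_SIZE))                    # 400
```

The last line is the caveat to watch: the upper bound on database connections grows with the replica count, so the database's own connection limit has to accommodate it.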

Exam trap

Google Cloud often tests the misconception that database connection pool exhaustion is always a database-side problem, leading candidates to choose database scaling or connection pooling changes instead of recognizing that the immediate fix is to scale the application tier.

How to eliminate wrong answers

Option B is wrong because adding more database instances addresses database-side capacity but does not directly solve the connection pool exhaustion at the application tier; the existing pods would still attempt to open connections to the new instances, and the pool exhaustion is a client-side issue. Option C is wrong because connection pooling at the database side (e.g., PgBouncer or ProxySQL) is a valid architectural improvement but requires deployment and configuration changes that take time, making it unsuitable for immediate mitigation during an incident. Option D is wrong because temporarily reducing the maximum connection pool size would worsen the bottleneck by allowing even fewer concurrent database operations, increasing queueing and latency rather than resolving the exhaustion.

117

MCQmedium

A team wants to simulate real-world user traffic to identify performance bottlenecks before a launch. Which tool should they use to generate load from multiple regions?

A.Cloud Monitoring

B.gcloud beta load test

C.Cloud Load Testing (Distributed Load Testing on GCP)

D.ab (Apache Benchmark)

AnswerC

This solution generates distributed load from multiple regions using Compute Engine instances.

Why this answer

Cloud Load Testing (Distributed Load Testing on GCP) is the correct choice because it is a managed service that can generate synthetic traffic from multiple geographic regions simultaneously, simulating real-world user distribution. This allows the team to identify performance bottlenecks across different network paths and regional endpoints before launch.

Exam trap

Google Cloud often tests the distinction between monitoring tools (which observe) and load generation tools (which create traffic), leading candidates to mistakenly choose Cloud Monitoring because it sounds related to performance analysis.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring is an observability and alerting service, not a load generation tool; it can monitor performance but cannot generate traffic. Option B is wrong because 'gcloud beta load test' is not a valid gcloud command. Option D is wrong because ab (Apache Benchmark) is a single-threaded, single-origin HTTP benchmarking tool that cannot generate load from multiple regions or simulate distributed user traffic.

118

MCQeasy

Refer to the exhibit. A DevOps engineer needs to view the cost report for a project. Which role should be granted at the organization level to allow viewing billing data for all projects in the organization?

A.roles/billing.admin

B.roles/billing.creator

C.roles/billing.viewer

D.roles/billing.user

AnswerC

Billing viewer has read access to billing data for the entire billing account.

Why this answer

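Option C is correct because roles/billing.viewer provides read-only access to billing data, including cost reports. Option A is too permissive: roles/billing.admin grants full control over the billing account. Option B is wrong because roles/billing.creator only allows creating new billing accounts. Option D is wrong because roles/billing.user allows linking projects to a billing account but not viewing reports.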

119

MCQeasy

A small startup uses Cloud Functions for their backend and wants to monitor function execution times and error rates. They have enabled Cloud Monitoring and are viewing metrics in the Cloud Console. They notice that the execution time metric for a particular function shows an average of 200ms, but occasionally there are spikes to 5 seconds, which correspond to user-reported slow responses. They want to be alerted when the function exceeds 1 second for any invocation. What is the simplest way to achieve this?

A.Create a log-based metric for function duration and set a threshold alert.

B.Configure a Cloud Monitoring uptime check for the function URL.

C.Use the built-in Cloud Functions latency metric and create a metric threshold alert for the max value over 1 minute.

D.Use Cloud Error Reporting to capture slow responses.

AnswerC

This directly uses the existing metric and alerts on the maximum value, catching spikes.

Why this answer

Option C is correct because Cloud Functions automatically emits a built-in execution time metric (`function/execution_times`, reported in nanoseconds) to Cloud Monitoring. By creating a metric threshold alert on the maximum value of this metric over a 1-minute window, you can trigger an alert whenever any single invocation exceeds 1 second, directly matching the requirement to be alerted on each spike.
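
A quick calculation shows why the alert has to look at the maximum and not the average. The sample window below is hypothetical and uses milliseconds for readability.

```python
from statistics import mean

THRESHOLD_MS = 1000  # alert when any invocation exceeds 1 second

# Hypothetical one-minute window: 59 normal invocations and one spike.
durations_ms = [200] * 59 + [5000]

window_mean = mean(durations_ms)
window_max = max(durations_ms)

print(window_mean > THRESHOLD_MS)  # False: the average hides the spike
print(window_max > THRESHOLD_MS)   # True: the maximum exposes it
```

With these numbers the mean is 280 ms, well under the threshold, even though one user waited 5 seconds.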

Exam trap

Google Cloud often tests the distinction between built-in metrics and log-based metrics, and the trap here is that candidates overcomplicate by choosing log-based metrics (Option A) when a simpler built-in metric already satisfies the requirement.

How to eliminate wrong answers

Option A is wrong because log-based metrics require you to parse structured logs and define custom metrics, which adds unnecessary complexity when a built-in metric already exists for execution time. Option B is wrong because an uptime check only verifies that the function URL is reachable and returns a response; it does not measure execution duration or error rates for individual invocations. Option D is wrong because Cloud Error Reporting captures only errors (exceptions, crashes), not slow responses that complete successfully but exceed a latency threshold.

120

Multi-Selectmedium

A team is troubleshooting a slow response time on an App Engine standard environment application. The application uses Cloud SQL as its database. Which TWO actions should the team take to identify the bottleneck?

Select 2 answers

A.Examine App Engine request logs for latency patterns.

B.Increase the number of App Engine instances.

C.Enable Cloud SQL slow query logging and analyze long-running queries.

D.Enable Cloud CDN to cache responses.

E.Disable caching to ensure fresh data.

AnswersA, C

Correlates with slow queries.

Why this answer

Option A is correct because examining App Engine request logs reveals latency patterns, such as which endpoints or operations are slow, helping to pinpoint whether the bottleneck is in the application code, network, or database. Option C is correct because enabling Cloud SQL slow query logging identifies long-running SQL queries that could be causing database contention or inefficient data retrieval, directly addressing a common performance bottleneck in App Engine applications using Cloud SQL.

Exam trap

Google Cloud often tests the misconception that scaling the application tier (more instances) always improves performance, but the trap here is that the bottleneck may be at the database layer, where scaling the app tier without addressing database issues can actually degrade performance due to increased connection contention.

121

MCQmedium

What is the effect of the 'timeshiftDuration' of '3600s' in the dashboard widget?

A.The chart compares current data to data from 1 hour ago

B.The chart shows data from the last 1 hour

C.The metric is aggregated over 1 hour

D.The dashboard updates every hour

AnswerA

Timeshift adds a series shifted by the specified duration.

Why this answer

The 'timeshiftDuration' parameter in a dashboard widget shifts the time range of the comparison data relative to the primary time range. Setting 'timeshiftDuration' to '3600s' means the chart compares the current data (within the selected time range) against data from exactly 1 hour (3600 seconds) earlier. This is commonly used for offset comparisons, such as week-over-week or hour-over-hour analysis, and does not alter the primary data range or aggregation.
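
Conceptually, the comparison series is the same data re-plotted one hour later so that it lines up with the current values. The sketch below models that with plain dictionaries; the helper `timeshifted` and the sample values are hypothetical and are not part of the dashboard API.

```python
SHIFT_SECONDS = 3600  # matches a timeshiftDuration of '3600s'


def timeshifted(series: dict[int, float], shift: int) -> dict[int, float]:
    """Re-plot each point `shift` seconds later so it aligns with current data."""
    return {ts + shift: value for ts, value in series.items()}


# Hypothetical samples keyed by Unix timestamp (seconds).
series = {0: 10.0, 3600: 14.0, 7200: 9.0}
comparison = timeshifted(series, SHIFT_SECONDS)

# At t=3600 the chart shows the current value next to the value from t=0.
print(series[3600], comparison[3600])  # 14.0 10.0
```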

Exam trap

Google Cloud often tests the distinction between 'timeshiftDuration' (comparison offset) and 'timeRange' (primary data window), leading candidates to confuse shifting the comparison data with changing the chart's visible time range.

How to eliminate wrong answers

Option B is wrong because 'timeshiftDuration' does not set the time range of the chart; it only shifts the comparison data. The chart's primary time range is defined separately (e.g., by 'timeRange' or the dashboard's global time picker). Option C is wrong because 'timeshiftDuration' has no effect on metric aggregation; aggregation is controlled by the 'aggregation' or 'rollup' parameter (e.g., 'avg', 'sum', 'count').

Option D is wrong because 'timeshiftDuration' does not control dashboard refresh intervals; refresh behavior is set by the 'refreshInterval' parameter or the dashboard's auto-refresh setting.

122

MCQhard

During a post-incident review, the team discovers that a misconfiguration in Cloud Armor caused legitimate traffic to be blocked, leading to an outage. The misconfiguration was introduced by a junior engineer who had overly permissive IAM roles. What is the best way to prevent similar incidents in the future?

A.Enforce a mandatory peer review for all Cloud Armor configuration changes.

B.Revoke the junior engineer's access to Cloud Armor and grant read-only access.

C.Enable Cloud Armor security policy logs and create alerting for blocked traffic spikes.

D.Use Organization Policy constraints to restrict allowed IP ranges and rules in Cloud Armor security policies.

AnswerD

Prevents creation of overly permissive rules.

Why this answer

Option D is correct because enforcing guardrails with Organization Policies can prevent misconfigurations at scale. Option A is wrong because peer reviews reduce human error but are not automated. Option B is wrong because revoking the engineer's access is punitive and does not stop others from making the same mistake. Option C is wrong because Cloud Armor logs help with detection but not prevention.

123

MCQeasy

A company is using Cloud CDN to deliver static content globally. Some users in Asia report slow load times. Which configuration change would most likely improve performance for these users?

A.Add an additional CDN origin

B.Enable Cloud Armor

C.Use a global external HTTP(S) load balancer with Cloud CDN

D.Increase the cache TTL

AnswerC

Global load balancer with anycast IP and edge caching reduces latency significantly.

Why this answer

Cloud CDN relies on a global external HTTP(S) load balancer to route user requests to the nearest cache node. Without this load balancer, Cloud CDN cannot leverage Google's global network and edge caches, so users in Asia would not be served from a nearby point of presence. Option C correctly identifies that using a global external HTTP(S) load balancer with Cloud CDN enables geographic load balancing and edge caching, directly improving latency for distant users.

Exam trap

Google Cloud often tests the misconception that Cloud CDN works independently of the load balancer type, but Cloud CDN requires a global external HTTP(S) load balancer to route traffic to edge caches; using a regional load balancer or no load balancer at all will not provide global performance improvements.

How to eliminate wrong answers

Option A is wrong because adding an additional CDN origin does not affect the distribution of cache nodes; Cloud CDN uses the same global edge network regardless of origin count, and multiple origins are for redundancy or content separation, not latency reduction. Option B is wrong because Cloud Armor is a web application firewall and DDoS protection service; it does not improve content delivery speed or cache hit ratio, and may add slight processing overhead. Option D is wrong because increasing the cache TTL only extends how long content is stored in cache; it does not bring content closer to users in Asia or change the routing path, so it cannot fix slow load times caused by geographic distance.

124

Matchingmedium

Match each Google Cloud tool to its function in incident management.

Concepts

Matches

End-to-end incident lifecycle tool

Third-party alerting and on-call scheduling

Asynchronous messaging for event-driven alerts

Serverless automation for incident response

Containerized event-driven applications

Why these pairings

Tools used in incident response workflows.

125

MCQeasy

A build step needs to access a secret stored in Secret Manager. How should the secret be passed to the build step?

A.Use an environment variable with the secret directly in cloudbuild.yaml

B.Use gcloud command inside the build step to fetch the secret

C.Use a script to fetch the secret from an encrypted file

D.Use availableSecrets in cloudbuild.yaml

AnswerD

This is the secure method to access secrets in Cloud Build.

Why this answer

Cloud Build's `availableSecrets` field lets you reference Secret Manager secret versions and expose them to a build step as environment variables through the step's `secretEnv` field, so the secret value never appears in cloudbuild.yaml.
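
The shape of such a build config can be sketched as a Python dictionary and printed as JSON, which Cloud Build accepts alongside YAML. The project, secret name, and `publish.sh` script are hypothetical placeholders; the `availableSecrets`, `secretManager`, `versionName`, `env`, and `secretEnv` keys are the ones the build config schema uses.

```python
import json

# Hypothetical secret version path; replace with a real one.
SECRET_VERSION = "projects/my-project/secrets/api-token/versions/latest"

build_config = {
    "steps": [
        {
            "name": "ubuntu",
            "entrypoint": "bash",
            # Hypothetical script that reads API_TOKEN from its environment.
            "args": ["-c", "./scripts/publish.sh"],
            # Only steps that list the variable here can see the secret.
            "secretEnv": ["API_TOKEN"],
        }
    ],
    "availableSecrets": {
        "secretManager": [
            {"versionName": SECRET_VERSION, "env": "API_TOKEN"},
        ]
    },
}

print(json.dumps(build_config, indent=2))
```

Each name in a step's `secretEnv` has to match an `env` entry declared under `availableSecrets`.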

126

MCQeasy

Your incident response team uses a follow-the-sun model. An incident occurs during the Asia-Pacific shift, but the escalation path requires sign-off from the US-based team lead. This causes delays. What change should you recommend?

A.Use a chatbot for automated responses.

B.Implement a global incident commander role with delegated authority.

C.Increase the number of US team members.

D.Schedule the US team lead to work overnight.

AnswerB

This empowers regional leads to make decisions quickly.

Why this answer

Option B is correct because a global incident commander with delegated authority can make escalation decisions without waiting for a specific time-zone-based team lead. This role operates across shifts, ensuring that critical incident response actions are not delayed by geographic handoffs. In a follow-the-sun model, this role provides continuous, authoritative decision-making, aligning with ITIL incident management best practices for global teams.

Exam trap

Google Cloud often tests the misconception that adding more staff or automating responses can solve process delays, when the real issue is a lack of delegated authority across time zones.

How to eliminate wrong answers

Option A is wrong because a chatbot for automated responses cannot replace human judgment for escalation sign-offs; it handles only predefined, low-complexity tasks and lacks the authority to approve critical incident actions. Option C is wrong because increasing the number of US team members does not address the root cause of delayed sign-offs during the Asia-Pacific shift; the US team lead remains unavailable during that time, and more team members do not grant them escalation authority. Option D is wrong because scheduling the US team lead to work overnight is unsustainable, leads to burnout, and violates the follow-the-sun model's intent of balanced global coverage; it also does not scale for incidents in other time zones.

127

MCQhard

Your application runs in two GCP regions. A regional outage occurs in the primary region. You have a Cloud Load Balancer with a failover backend. However, the failover did not trigger because the health check passed on a stale connection. What is the best solution?

A.Use a passive health check.

B.Use a global load balancer with HTTP health checks based on application health.

C.Configure a custom health check that checks the database.

D.Use TCP health checks with a shorter interval.

AnswerB

HTTP health checks test actual application response, reducing false positives.

Why this answer

Option B is correct because a global load balancer with HTTP health checks can probe the actual application endpoint (e.g., /healthz) to verify end-to-end functionality, not just TCP connectivity. This prevents the stale connection issue where a TCP health check passes on an existing but broken session, ensuring failover triggers only when the application is genuinely unhealthy.
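
An application-level health endpoint of the kind described can be sketched with Python's standard library. This is a minimal illustration, not a production server: `app_is_healthy` is a hypothetical placeholder for whatever check proves the service can handle requests, and `/healthz` is the example path from the explanation above.

```python
from http.server import BaseHTTPRequestHandler


def health_response(healthy: bool) -> tuple[int, bytes]:
    """Map application health to an HTTP status code and body."""
    return (200, b"ok") if healthy else (503, b"unhealthy")


def app_is_healthy() -> bool:
    """Hypothetical placeholder; a real service would exercise its own request path."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    """Answers the load balancer's probe on /healthz."""

    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        status, body = health_response(app_is_healthy())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it, pass the handler to `http.server.HTTPServer(("0.0.0.0", 8080), HealthHandler)` and call `serve_forever()`. A 503 from this endpoint marks the backend unhealthy even while its TCP port still accepts connections.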

Exam trap

The trap here is that candidates assume TCP health checks are sufficient for failover, but Google Cloud tests the nuance that stale connections can mask application failure, requiring application-layer (HTTP) health checks to ensure true failover.

How to eliminate wrong answers

Option A is wrong because passive health checks (e.g., outlier detection) rely on observing traffic failures rather than active probing, which would not detect a stale connection that still passes traffic. Option C is wrong because a custom health check that checks the database introduces unnecessary complexity and dependency; the health check should validate the application's own readiness, not an external service that may be unrelated to the outage. Option D is wrong because TCP health checks with a shorter interval still only verify layer-4 connectivity, not application health; a stale TCP connection can persist and pass the check even if the application is unresponsive.

128

MCQeasy

A DevOps team is setting up a Google Cloud organization. They want to centralize logging and monitoring across all projects. What is the recommended approach?

A.Enable logging and monitoring in each project individually and use the Cloud Console to view them.

B.Create a dedicated project for logging and monitoring, and configure all other projects to send logs and metrics to that project.

C.Enable Cloud Audit Logs in the organization and view them from the Organization level.

D.Use Stackdriver (now Operations) to aggregate logs from all projects automatically.

AnswerB

This is the recommended pattern for centralized observability.

Why this answer

Option B is correct because the recommended approach for centralizing logging and monitoring in a Google Cloud organization is to create a dedicated project that acts as a log sink and metric aggregator. By configuring aggregated sinks at the organization or folder level, all projects automatically forward logs to the dedicated project's Cloud Logging bucket, and metrics can be collected via the Monitoring API. This ensures a single pane of glass for operations without manual per-project setup.

Exam trap

The trap here is that candidates confuse Cloud Audit Logs (which are a subset of logs) with full centralized logging, or assume that Google Cloud Operations automatically aggregates logs across projects without explicit configuration of sinks or a metrics scope.

How to eliminate wrong answers

Option A is wrong because enabling logging and monitoring in each project individually and using the Cloud Console to view them does not centralize data; it requires switching between projects and lacks a unified view, violating the principle of centralized observability. Option C is wrong because enabling Cloud Audit Logs at the organization level only captures admin activity and data access logs, not all logs (e.g., application logs, custom metrics), and viewing them from the Organization level does not aggregate metrics or provide monitoring dashboards. Option D is wrong because Stackdriver (now Google Cloud Operations) does not automatically aggregate logs from all projects; it requires explicit configuration of sinks, metrics scopes, or a dedicated project to receive logs and metrics.

129

MCQhard

A company uses Cloud Build with a private pool in a shared VPC to access on-premises resources. Several builds fail intermittently with 'failed to connect to backend' errors when trying to pull from a private npm registry hosted on-premises. The error occurs only during peak hours. What is the most likely cause?

A.The DNS resolution for the private npm registry is failing due to caching issues.

B.The private pool has insufficient worker count or is too small, causing connectivity timeouts.

C.The private pool's service account does not have the correct SSL certificates.

D.The Cloud NAT IP addresses have been blocked by the on-premises firewall.

AnswerB

Peak hours increase concurrent builds, exhausting pool capacity.

Why this answer

The intermittent 'failed to connect to backend' errors during peak hours point to resource exhaustion in the private pool. Private pools have a fixed number of workers; when all workers are busy, new builds must wait, and if the queue or connection timeout is exceeded, the build fails. This is a classic capacity issue, not a DNS, certificate, or firewall problem.

Exam trap

Google Cloud often tests the concept that intermittent failures during peak hours are caused by resource exhaustion (e.g., insufficient workers, concurrency limits) rather than configuration issues like DNS or firewall rules, which would cause consistent failures.

How to eliminate wrong answers

Option A is wrong because DNS caching issues would cause persistent failures, not just during peak hours, and Cloud Build uses the VPC's DNS resolution which is stable. Option C is wrong because SSL certificate issues would cause TLS handshake failures, not generic 'failed to connect' errors, and the service account does not manage certificates for outbound connections. Option D is wrong because if the on-premises firewall were blocking Cloud NAT IPs, the failures would be consistent, not intermittent and tied to peak hours.

130

Drag & Dropmedium

Order the steps to set up a CI/CD pipeline using Cloud Build and Cloud Deploy for a Cloud Run service.

Steps

Order

Why this order

Create build config, set trigger, define delivery pipeline, push code, promote.

131

MCQhard

A DevOps engineer notices that the budget alerts for a project are not triggering even though spending exceeds thresholds. The budget is configured at the billing account level, filtering for that project only. The project is linked to the correct billing account. What is the most likely cause?

A.The budget filter excludes credit types, and the spending includes credits that reduce the effective spend below threshold

B.The budget amount is set in units of 1000, so $5000 is actually 5 units, causing confusion

C.The budget notifications are only sent to Pub/Sub, and no subscriber is active

D.The budget filter includes services that are not used in the project

AnswerC

Without an active subscriber, Pub/Sub messages are not delivered.

Why this answer

Option C is correct because the budget sends notifications only to a Pub/Sub topic. If there is no active subscription, alerts are not delivered. Option A describes credits reducing the effective spend, but the question says spending exceeds thresholds. Options B and D are less likely or incorrect.

132

MCQeasy

Your company runs a microservices application on Google Kubernetes Engine (GKE) with shared Istio service mesh across multiple namespaces. You use Cloud Monitoring and Cloud Logging for observability. At 10:30 AM, you receive an alert that the checkout service is returning high 5xx errors (over 20%) and latency is above 5 seconds. The incident response team is assembled, and you are the incident commander. The team suspects a recent deployment (v2.1) to the checkout service at 10:00 AM. The deployment was a minor configuration update. The team is divided: some want to immediately roll back, others want to analyze traces. You have access to the GCP console. What should you do first to ensure a swift and effective incident response?

A.Review the deployment history of the checkout service alongside Cloud Monitoring metrics and logs to identify the exact time and nature of the change.

B.Check the Error Reporting dashboard to view aggregated error logs and stack traces for the checkout service.

C.Immediately roll back the checkout service to the previous version and monitor if errors decrease.

D.Declare the incident, assign roles, and start a postmortem document.

AnswerA

This correlates the deployment with the incident symptoms, providing evidence for the best course of action.

Why this answer

Option A is correct because comparing the deployment changes with monitoring metrics helps correlate the incident with the deployment, providing evidence to guide next steps. Option C is premature without confirming the rollback will fix the issue and acknowledging potential side effects. Option B is useful but might not pinpoint the root cause as quickly as comparing metrics.

Option D is a good practice but not the first action; you need to understand the impact first.


133

MCQeasy

A developer wants to trigger a Cloud Build execution whenever a pull request is created against the main branch in Cloud Source Repositories. Which Cloud Build trigger configuration should be used?

A.Event: pull request on branch ^main$

B.Event: push on branch ^main$

C.Event: push on tag

D.Event: manual invocation

AnswerA

Cloud Build triggers support pull request events with branch filtering.

Why this answer

Option A is correct because Cloud Build triggers can be configured to respond to pull request events on a specific branch. Option B fires on push events to the main branch. Option C fires on tags.

Option D is a manual trigger.
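The anchors in the branch filter `^main$` matter, and a quick check shows why. The sketch below uses Python's `re` module as a stand-in for the trigger's own regex engine, on the assumption that both behave the same for a simple anchored pattern like this one. The branch names are made up for illustration.

```python
import re

# The branch filter taken from the answer options.
BRANCH_FILTER = re.compile(r"^main$")

# Hypothetical branch names that all contain the substring "main".
candidates = ["main", "main-hotfix", "feature/main", "maintenance"]

# Only an exact match survives the anchored pattern.
matches = [branch for branch in candidates if BRANCH_FILTER.search(branch)]
print(matches)  # ['main']
```

Without the `^` and `$` anchors, all four names would match, and the trigger would run for branches it was never meant to cover.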


134

MCQeasy

An e-commerce platform uses Cloud SQL for its database. The team notices that read queries are slow. They want to improve read performance without significant cost increase. Which action should they take?

A.Add a read replica

B.Increase the number of vCPUs on the instance

C.Increase the storage size

D.Enable binary logging

AnswerA

A read replica distributes read queries, reducing load on the primary.

Why this answer

Adding a read replica offloads read traffic from the primary instance, improving read performance with minimal cost. Increasing vCPUs or storage incurs higher cost. Binary logging is for replication, not read performance.


135

MCQhard

A DevOps team is setting up a new Google Cloud organization. They want to enforce that all projects have a specific set of labels, and that Cloud Logging is enabled. They have written a custom Organization Policy constraint to enforce the labels. However, they are unsure how to enforce Cloud Logging. Which of the following approaches should they use?

A.Create a custom Organization Policy constraint that checks if Cloud Logging is enabled.

B.Disable the ability to turn off Cloud Logging by using the Organization Policy 'compute.disableCloudLogging'.

C.Use the Organization Policy 'constraints/gcp.resourceAuditLogging' to enforce audit logs.

D.Set a project-level default for Cloud Logging using Organization Policies.

AnswerC

This policy ensures audit logs are enabled, which is part of Cloud Logging.

Why this answer

Option C is correct because the Organization Policy constraint `constraints/gcp.resourceAuditLogging` enforces that audit logs (which include Cloud Logging) are enabled for all projects in the organization. This policy ensures that logging cannot be disabled at the project level, meeting the requirement to enforce Cloud Logging across the organization.

Exam trap

The trap here is that candidates may confuse custom Organization Policy constraints with built-in constraints, or mistakenly think that Cloud Logging can be enforced via a custom constraint or a project-level default, when in fact the built-in `constraints/gcp.resourceAuditLogging` is the correct mechanism.

How to eliminate wrong answers

Option A is wrong because custom Organization Policy constraints cannot check if Cloud Logging is enabled; they are limited to enforcing constraints on resource properties like labels, locations, or resource types, not service enablement. Option B is wrong because `compute.disableCloudLogging` is not a valid Organization Policy constraint; the correct constraint for logging is `constraints/gcp.resourceAuditLogging`, and there is no such policy to disable Cloud Logging. Option D is wrong because Organization Policies do not support setting project-level defaults for Cloud Logging; they enforce boolean constraints or lists, not default configurations.


136

MCQeasy

A DevOps engineer runs the command above and gets the output shown. What does this output indicate?

A.The instance's disk is full, causing write errors.

B.An application running on the instance encountered a connection timeout to a backend service.

C.The instance failed to authenticate with the metadata server.

D.A health check probe failed to reach the instance.

AnswerB

The log message explicitly states 'Connection timeout to backend service'.

Why this answer

The output shows a 'Connection timed out' error when attempting to reach a backend service. This indicates that the application on the instance is unable to establish a TCP connection to the specified IP and port, typically due to network issues, firewall rules, or the backend service being down. The error is specific to application-level connectivity, not disk space or authentication.

Exam trap

The trap here is that candidates confuse application-level connection timeouts with infrastructure-level issues like disk full or health check failures, but the specific 'Connection timed out' message directly points to a network connectivity problem to a backend service.

How to eliminate wrong answers

Option A is wrong because a full disk would produce 'No space left on device' or write-related errors, not a connection timeout. Option C is wrong because metadata server authentication failures result in 401 Unauthorized or 'Failed to retrieve metadata' errors, not a TCP connection timeout. Option D is wrong because a health check probe failure would be reported by the load balancer or monitoring system (e.g., 'Unhealthy' status), not as a connection timeout from within the instance's application logs.


137

MCQhard

A data engineering team runs frequent aggregation queries on a large BigQuery table. Query performance is slow and costs are high. Which optimization technique would best improve performance and reduce cost?

A.Use materialized views for pre-aggregated results

B.Convert to non-partitioned table

C.Use clustering on the partition key

D.Increase the number of slots in the reservation

AnswerA

Materialized views pre-compute and store aggregation results, drastically reducing query time and cost.

Why this answer

Materialized views pre-compute and store the results of frequent aggregation queries, allowing BigQuery to serve subsequent queries directly from the cached results rather than scanning the entire base table. This drastically reduces the amount of data processed, lowering both query latency and cost (since BigQuery charges by bytes processed). For repeated aggregation patterns, this is the most effective optimization.

Exam trap

Google Cloud often tests the misconception that clustering on the partition key or adding compute resources (slots) is the primary fix for cost and performance, when in reality reducing the data scanned through pre-aggregation is the most impactful lever.

How to eliminate wrong answers

Option B is wrong because converting to a non-partitioned table would increase the amount of data scanned per query, worsening performance and cost. Option C is wrong because clustering on the partition key provides no additional benefit beyond partitioning itself; clustering is most effective on columns other than the partitioning column, especially high-cardinality columns, where it improves filter pruning. Option D is wrong because increasing slots only improves concurrency and execution speed for queries that are already optimized; it does not reduce the bytes processed, so cost would remain high or increase.
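The effect of pre-aggregation on data scanned can be sketched with a toy row count. This is not BigQuery code: the table contents, the query count, and the `aggregate` helper are hypothetical, and rows stand in for bytes. A real materialized view is refreshed incrementally rather than built once.

```python
from collections import defaultdict

# Hypothetical raw table: (day, amount) rows.
raw_rows = [
    ("2024-01-01", 10), ("2024-01-01", 5),
    ("2024-01-02", 7), ("2024-01-02", 3), ("2024-01-02", 1),
]


def aggregate(rows):
    """Sum amounts per day, as a repeated aggregation query would."""
    totals = defaultdict(int)
    for day, amount in rows:
        totals[day] += amount
    return dict(totals)


queries = 100

# Without pre-aggregation, every query scans every raw row.
rows_scanned_raw = queries * len(raw_rows)

# With pre-aggregation, the raw rows are scanned once to build the summary,
# and each query then reads only the summary rows.
summary = aggregate(raw_rows)
rows_scanned_precomputed = len(raw_rows) + queries * len(summary)

print(summary)
print(rows_scanned_raw, rows_scanned_precomputed)
```

The gap widens as the raw table grows, because the summary stays at one row per day while the raw scan cost grows with every inserted row.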


138

MCQmedium

A company is implementing CI/CD for a microservices application on Google Kubernetes Engine (GKE). The team wants to ensure that each service can be built and deployed independently without affecting other services. They also need to enforce that only successfully tested builds are deployed to production. Which CI/CD approach should they use?

A.Create separate Cloud Build triggers per microservice, each building a container image, and use Cloud Deploy to manage canary deployments to GKE with automated promotion after tests pass.

B.Use Cloud Build to build all services and deploy to Cloud Run, then use traffic splitting to promote new versions.

C.Create a single Cloud Build trigger that builds all services and deploys to a staging cluster, then manually promote to production.

D.Use Spinnaker with a single pipeline that builds all services, and configure manual judgment gates for production promotion.

AnswerA

This approach provides independent builds and automated promotion based on tests.

Why this answer

Option A is correct because it uses separate Cloud Build triggers per microservice to enable independent builds, and Cloud Deploy with canary deployments and automated promotion after tests pass ensures that only successfully tested builds reach production. This approach aligns with the requirements of independent service deployment and gated promotion to production on GKE.

Exam trap

Google Cloud often tests the distinction between independent service pipelines versus monolithic pipelines, and the trap here is assuming that a single pipeline or manual gates can satisfy both independence and automated gating, leading candidates to choose options like C or D.

How to eliminate wrong answers

Option B is wrong because it deploys to Cloud Run instead of GKE, which does not meet the requirement of using GKE for the microservices application. Option C is wrong because a single Cloud Build trigger that builds all services violates the independence requirement, and manual promotion to production does not enforce automated gating based on test success. Option D is wrong because Spinnaker with a single pipeline that builds all services also violates independence, and manual judgment gates are not automated promotion after tests pass.


139

MCQhard

A company is bootstrapping their organization using Terraform and wants to store the Terraform state file in a Cloud Storage bucket with versioning enabled. Which of the following is the best practice for securing the state file?

A.Use a bucket with fine-grained access and grant roles/storage.objectCreator to the Terraform service account.

B.Use a bucket with object versioning and enable VPC Service Controls.

C.Use a bucket with a CMEK key and grant roles/storage.objectAdmin to the Terraform service account.

D.Use a bucket with uniform access and grant roles/storage.objectViewer to the Terraform service account.

AnswerC

CMEK encrypts the state file, and objectAdmin allows full object control.

Why this answer

Option C is correct because using a CMEK (Customer-Managed Encryption Key) ensures the state file is encrypted at rest with a key controlled by the organization, and granting roles/storage.objectAdmin to the Terraform service account provides the necessary read, write, and delete permissions on objects. This combination meets the security requirements for a bootstrapping scenario where the state file contains sensitive infrastructure configuration and must be protected from unauthorized access while allowing Terraform to manage it.

Exam trap

Google Cloud often tests the misconception that VPC Service Controls or uniform bucket access alone are sufficient for state file security, but the exam expects candidates to recognize that encryption key management (CMEK) and the correct IAM role (objectAdmin) are the best practices for protecting sensitive state data.

How to eliminate wrong answers

Option A is wrong because roles/storage.objectCreator only allows creating objects, not reading or deleting them, so Terraform would fail to read the state file for planning or destroy operations. Option B is wrong because VPC Service Controls restrict data exfiltration but do not provide encryption key management or granular access control; they are a network-level control, not a substitute for encryption or proper IAM. Option D is wrong because roles/storage.objectViewer provides read-only access, preventing Terraform from writing or updating the state file, which is required for state management.
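The role comparison can be checked mechanically with a small set difference. The permission lists below are a simplified, illustrative subset of each predefined role, and `STATE_BACKEND_NEEDS` is an assumption about what a remote-state workflow requires, not an authoritative list.

```python
# Simplified subset of object permissions per predefined role (illustrative only).
ROLE_PERMISSIONS = {
    "roles/storage.objectViewer": {"storage.objects.get", "storage.objects.list"},
    "roles/storage.objectCreator": {"storage.objects.create"},
    "roles/storage.objectAdmin": {
        "storage.objects.get", "storage.objects.list",
        "storage.objects.create", "storage.objects.delete",
    },
}

# Assumed needs of a remote-state workflow: read state, write state, delete objects.
STATE_BACKEND_NEEDS = {
    "storage.objects.get",
    "storage.objects.create",
    "storage.objects.delete",
}


def missing_permissions(role):
    """Return the needed permissions that the given role does not grant."""
    return sorted(STATE_BACKEND_NEEDS - ROLE_PERMISSIONS[role])


for role in ROLE_PERMISSIONS:
    print(role, missing_permissions(role))
```

Under these assumptions only `roles/storage.objectAdmin` leaves nothing missing, which is the reasoning behind rejecting the viewer and creator roles.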


140

MCQmedium

You are using Memorystore for Redis as a cache for a high-traffic web application. You observe that cache hit ratio is low, causing high database load. What is the most effective way to improve cache hit ratio?

A.Use the allkeys-lru eviction policy to keep frequently accessed keys.

B.Increase the instance size to store more data.

C.Migrate to Memorystore for Memcached for lower latency.

D.Set a longer TTL for all cache entries.

AnswerA

LRU policy keeps popular items, improving hit ratio.

Why this answer

The allkeys-lru eviction policy is the most effective way to improve cache hit ratio because it automatically evicts the least recently used keys across the entire keyspace when memory is full, retaining the most frequently accessed data. This ensures that the cache always contains the hottest data, directly increasing the likelihood of cache hits without requiring manual intervention or additional resources.

Exam trap

Google Cloud often tests the misconception that increasing memory or TTL alone solves cache performance issues, but the trap here is that without an appropriate eviction policy like allkeys-lru, the cache will still evict hot keys in favor of cold ones, leaving the hit ratio low regardless of size or TTL settings.

How to eliminate wrong answers

Option B is wrong because simply increasing the instance size stores more data but does not address the underlying issue of which data is retained; without an appropriate eviction policy, the cache may still be filled with stale or rarely accessed keys, leaving the hit ratio low. Option C is wrong because migrating to Memorystore for Memcached does not inherently improve cache hit ratio; Memcached lacks built-in LRU eviction across all keys (it uses a slab-based allocator with per-slab LRU) and does not support data persistence or advanced eviction policies like allkeys-lru, making it less suitable for optimizing hit ratios in this scenario. Option D is wrong because setting a longer TTL for all cache entries can cause the cache to fill with stale data that is no longer accessed, reducing the effective cache capacity for hot keys and potentially worsening the hit ratio; TTL should be tuned per key based on access patterns, not uniformly extended.
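A minimal sketch of least-recently-used eviction shows how the policy decides which keys stay. This is a plain Python model, not Redis: `LRUCache` and the key names are hypothetical, and the model evicts the exact least recently used key, whereas Redis approximates LRU by sampling.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # oldest access first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        self.misses += 1
        value = load(key)  # fall back to the database
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = LRUCache(capacity=2)
for key in ["home", "cart", "home", "promo", "home", "cart"]:
    cache.get(key, load=lambda k: f"row-for-{k}")

print(cache.hits, cache.misses, list(cache.data))
```

The frequently requested `home` key is never evicted in this run, while `cart` and `promo` take turns in the remaining slot.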


141

MCQmedium

Your team is using Cloud Monitoring to track the health of a distributed microservices application. You notice that the error rate for the checkout service has increased significantly, but no alerts are firing. The SLO for checkout is 99.9% availability over a 28-day rolling window. You inspect the alerting policy and find it uses a time series aggregation with a 1-minute alignment period and a condition that triggers when the ratio of errors to total requests exceeds 0.001 for 5 consecutive minutes. What is the most likely reason the alert is not firing?

A.The alert condition requires 5 consecutive minutes of breach, but the error rate spikes are intermittent and not sustained.

B.The error budget has been exhausted, so the alert is suppressed.

C.The SLO window is too long, and the alert condition uses a different measurement period.

D.The ratio threshold is too high because the total request count is low.

AnswerA

The alert requires 5 consecutive minutes of the ratio exceeding 0.001; intermittent spikes may not meet this condition.

Why this answer

The alert condition requires the error ratio to exceed 0.001 for 5 consecutive 1-minute alignment periods. If the error rate spikes are intermittent—lasting only a minute or two before returning to normal—the condition of 5 consecutive minutes of breach is never met, so the alert remains silent. This is a classic case where the alerting policy's duration setting is too long relative to the bursty nature of the errors.

Exam trap

Google Cloud often tests the distinction between a threshold being breached and the alert condition's duration requirement being met, leading candidates to overlook that the 'for' parameter (e.g., 5 consecutive minutes) is a separate, critical condition that must be satisfied for the alert to fire.

How to eliminate wrong answers

Option B is wrong because error budget exhaustion does not suppress alerts; it is a separate SLO metric that tracks cumulative availability over the 28-day window, and alerts are independent of budget status. Option C is wrong because the SLO window (28 days) and the alert condition measurement period (1-minute alignment) are intentionally different—the alert uses a short-term metric to detect immediate problems, not the long-term SLO window. Option D is wrong because the ratio threshold of 0.001 (0.1%) is standard for a 99.9% SLO; a low total request count would make the ratio more volatile but does not inherently prevent the alert from firing if the condition is met.
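The duration requirement can be sketched as a streak counter over per-minute error ratios. This is a simplified model of the condition described in the question, not Cloud Monitoring's evaluation engine. `first_firing_minute` and both sample series are hypothetical.

```python
def first_firing_minute(error_ratios, threshold=0.001, duration=5):
    """Return the index of the minute at which the condition fires, or None.

    The condition fires only after `duration` consecutive minutes above
    `threshold`; any minute at or below the threshold resets the streak.
    """
    streak = 0
    for minute, ratio in enumerate(error_ratios):
        streak = streak + 1 if ratio > threshold else 0
        if streak >= duration:
            return minute
    return None


# Bursty errors: severe, but never more than two breaching minutes in a row.
intermittent = [0.02, 0.03, 0.0, 0.0, 0.04, 0.05, 0.0, 0.02, 0.0, 0.03]

# Mild errors that stay above the threshold for five minutes in a row.
sustained = [0.0, 0.002, 0.002, 0.002, 0.002, 0.002, 0.0]

print(first_firing_minute(intermittent))  # None
print(first_firing_minute(sustained))     # 5
```

The intermittent series has far higher error ratios than the sustained one, yet only the sustained series fires, because every recovery minute resets the streak.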


142

MCQeasy

Your application uses Cloud SQL for MySQL and you notice that read replica lag is increasing. Which action would most likely reduce replica lag?

A.Configure automatic failover to the replica.

B.Decrease the memory of the primary instance.

C.Increase the machine type of the replica.

D.Promote the replica to a standalone instance.

AnswerC

More powerful replica can keep up with replication.

Why this answer

Increasing the machine type of the replica (Option C) directly addresses read replica lag by providing more CPU and memory resources to the replica instance. This allows the replica to apply binary log (binlog) events from the primary faster, reducing the replication lag. Cloud SQL for MySQL uses asynchronous replication, so a replica with insufficient resources cannot keep up with the write throughput of the primary.

Exam trap

Google Cloud often tests the misconception that promoting a replica or failing over will resolve lag, but the correct approach is to scale the replica's resources to match the primary's write rate.

How to eliminate wrong answers

Option A is wrong because automatic failover does not reduce replica lag; it only switches traffic to the replica after a primary failure, and the replica must already be caught up for a successful failover. Option B is wrong because decreasing the memory of the primary instance would likely increase write latency and could worsen replication lag by slowing down the primary's ability to commit transactions and generate binlog events. Option D is wrong because promoting the replica to a standalone instance breaks replication entirely, eliminating the replica lag but also removing the read replica's purpose; it does not reduce lag—it stops replication.
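Why replica capacity matters can be seen in a toy backlog model. The rates below are made-up numbers, and `backlog_over_time` is a hypothetical helper: it assumes the primary produces a fixed number of replication events per second and the replica applies a fixed number per second.

```python
def backlog_over_time(write_rate, apply_rate, seconds):
    """Return the unapplied-event backlog at the end of each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # New events arrive from the primary; the replica drains what it can.
        backlog = max(0, backlog + write_rate - apply_rate)
        history.append(backlog)
    return history


# Replica applies fewer events per second than the primary writes: lag grows.
small_replica = backlog_over_time(write_rate=1000, apply_rate=800, seconds=5)

# Replica with headroom over the write rate: no backlog accumulates.
larger_replica = backlog_over_time(write_rate=1000, apply_rate=1200, seconds=5)

print(small_replica)   # [200, 400, 600, 800, 1000]
print(larger_replica)  # [0, 0, 0, 0, 0]
```

As long as the apply rate stays below the write rate, the backlog grows without bound, so lag keeps increasing until the replica gains capacity or the write load drops.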


143

Multi-Selectmedium

Your team is implementing SLO monitoring. Which TWO tools should they use to create and monitor SLIs?

Select 2 answers

A.Error Reporting

B.Cloud Monitoring

C.Cloud Logging

D.Cloud Trace

E.Cloud Profiler

AnswersB, C

Cloud Monitoring is the primary tool for creating metric-based SLIs and SLOs.

Why this answer

Cloud Monitoring (option B) is the correct tool for creating and monitoring Service Level Indicators (SLIs) because it allows you to define custom metrics, set up alerting policies, and create dashboards to track the performance of your services. Cloud Logging (option C) is also correct because SLIs often rely on log-based metrics, such as request latency or error counts, which are extracted from log entries using log-based metrics in Cloud Logging. Together, these two tools provide the data ingestion and monitoring capabilities needed to define and observe SLIs.

Exam trap

Google Cloud often tests the distinction between tools that collect raw data (like Cloud Logging) and tools that aggregate and alert on that data (like Cloud Monitoring), and the trap here is that candidates might think Error Reporting or Cloud Trace are sufficient for SLI monitoring, when they lack the metric aggregation and alerting capabilities required for SLO monitoring.
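The arithmetic behind a request-based availability SLI is short enough to sketch. The counts below are hypothetical, standing in for values that log-based counter metrics might supply, and the two helper names are illustrative rather than part of any Google Cloud API.

```python
def availability_sli(good_requests, total_requests):
    """Fraction of requests that were served successfully."""
    return good_requests / total_requests


def error_budget_remaining(sli, slo_target):
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failure = 1.0 - slo_target
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure


# Hypothetical counts over the SLO window.
total = 1_000_000
good = 999_400

sli = availability_sli(good, total)
remaining = error_budget_remaining(sli, slo_target=0.999)

print(f"SLI: {sli:.4%}")
print(f"Error budget remaining: {remaining:.1%}")
```

Here 600 failed requests against an allowance of 1,000 leave roughly 40% of the error budget, which is the kind of figure an SLO dashboard tracks over the compliance window.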


144

MCQmedium

A company is bootstrapping a new Google Cloud organization. They want to ensure that all projects are created under specific folders and that certain IAM roles are automatically granted to a group for new projects. What is the most efficient approach?

A.Use folder-level IAM roles to grant permissions to projects.

B.Use Terraform to create projects and assign IAM roles.

C.Use a Cloud Function triggered by project creation events to apply IAM roles.

D.Use organization policies to restrict project creation to specific folders.

AnswerC

Cloud Functions can automate IAM assignment on creation.

Why this answer

Option C is correct because it uses a Cloud Function that reacts to project-creation events (for example, the Cloud Audit Logs entry for `CreateProject`, routed through Eventarc), enabling automated, event-driven application of IAM roles to new projects without manual intervention or polling. This approach is the most efficient for bootstrapping because it ensures that every newly created project automatically receives the required IAM bindings, regardless of how the project was created (e.g., via Console, gcloud, or API).

Exam trap

The trap here is that candidates often confuse organization policies (which only restrict placement) with IAM inheritance (which only applies at the folder level) and fail to recognize that an event-driven Cloud Function is the only option that automatically applies IAM roles to new projects without manual or scheduled intervention.

How to eliminate wrong answers

Option A is wrong because folder-level IAM roles grant permissions to all projects within that folder at creation time, but they do not automatically grant specific roles to a group for new projects; they only set inherited permissions on the folder itself, not dynamic role assignments triggered by project creation. Option B is wrong because Terraform is an infrastructure-as-code tool that can create projects and assign IAM roles, but it requires manual execution or a CI/CD pipeline trigger for each new project, making it less efficient for an event-driven, fully automated bootstrapping scenario. Option D is wrong because organization policies restrict what can be configured, such as where projects can be created, but they do not grant IAM roles to a group for new projects; they only enforce constraints.


145

MCQeasy

A company is setting up a new Google Cloud organization. They want to apply a consistent set of IAM roles to all projects within a specific department. What is the most efficient method to achieve this?

A.Use a script to apply roles to all projects periodically.

B.Assign roles directly to each project using the Google Cloud console.

C.Create a folder for the department and assign roles at the folder level.

D.Use a custom role and assign it to the organization node.

AnswerC

IAM policies at folder level are inherited by all projects in the folder.

Why this answer

Option C is correct because Google Cloud IAM supports hierarchical policy inheritance, where roles assigned at the folder level are automatically inherited by all projects within that folder. This eliminates the need for per-project assignments and ensures consistent permissions across the department's projects without manual overhead or scripting.

Exam trap

The trap here is that candidates often assume that roles must be assigned at the project level or organization level, overlooking the folder-level inheritance as the most efficient and scalable method for department-wide consistency.

How to eliminate wrong answers

Option A is wrong because using a script to periodically apply roles is inefficient, introduces potential drift between script runs, and violates the principle of declarative, inheritance-based IAM management. Option B is wrong because assigning roles directly to each project is manual, error-prone, and does not scale; it also bypasses the hierarchical inheritance that folders provide. Option D is wrong because assigning a custom role at the organization node would apply the role to all projects in the organization, not just the specific department, and custom roles are not required for this use case—predefined roles at the folder level suffice.
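Hierarchical inheritance can be sketched as a walk up the resource tree, collecting bindings at each level. The resource names, the group, and the `effective_bindings` helper are all hypothetical. The model covers only additive inheritance of allow bindings.

```python
# Hypothetical resource hierarchy: child -> parent.
PARENT = {
    "proj-payments": "folder-finance",
    "proj-ledger": "folder-finance",
    "proj-web": "folder-marketing",
    "folder-finance": "org",
    "folder-marketing": "org",
    "org": None,
}

# A single binding set once, at the department folder.
BINDINGS = {
    "folder-finance": {("group:finance-devs@example.com", "roles/viewer")},
}


def effective_bindings(resource):
    """Union of the bindings on the resource and on every ancestor."""
    result = set()
    while resource is not None:
        result |= BINDINGS.get(resource, set())
        resource = PARENT[resource]
    return result


print(effective_bindings("proj-payments"))
print(effective_bindings("proj-ledger"))
print(effective_bindings("proj-web"))
```

Both finance projects pick up the binding without any per-project assignment, while the marketing project is unaffected. A project added to the finance folder later would inherit it the same way.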


146

MCQhard

A company runs a large-scale data processing workload on Dataflow. The pipeline processes streaming data from Pub/Sub and writes results to BigQuery. The current monthly Dataflow cost is $50,000, and the company wants to reduce it. They have already optimized the pipeline code and reduced data shuffling. They notice that the Dataflow workers are running at 100% CPU for most of the time, but the job's autoscaling is set to 'throughput-based' and currently uses 20 workers. The job's latency SLAs are tight. They consider switching to 'flexible resource scheduling' (FlexRS) to reduce costs. What should they evaluate before enabling FlexRS?

A.FlexRS reduces costs by using slower disk types, which may impact performance.

B.FlexRS requires using preemptible VMs, which may cause data loss.

C.FlexRS may increase pipeline latency because it is designed for batch jobs with looser latency requirements.

D.FlexRS is only available for batch pipelines, not streaming.

AnswerD

Flexible Resource Scheduling (FlexRS) is a feature for batch Dataflow pipelines to reduce costs; it cannot be enabled for streaming pipelines.

Why this answer

Option D is correct because FlexRS is only available for batch pipelines, not streaming pipelines. The current pipeline is streaming, so FlexRS cannot be used. Option C is partially correct but not the primary reason.

Option B is true for batch but irrelevant here. Option A is incorrect: FlexRS lowers cost through delayed scheduling and a mix of preemptible and standard VMs, not slower disk types.


147

MCQeasy

An organization wants to implement a CI/CD pipeline that automatically deploys to a staging environment on every push to the main branch, and deploys to production only after a manual approval. They use Cloud Build and Cloud Deploy. What is the best way to configure this?

A.Configure a Cloud Deploy delivery pipeline with a staging target (automatic promotion) and a production target (require approval).

B.Create a single Cloud Build pipeline that deploys to both staging and production using conditional steps based on branch name.

C.Use Cloud Build to deploy to Cloud Run, and configure traffic splitting to gradually shift traffic from staging to production.

D.Use Cloud Build triggers with two separate build configs: one for staging (automatic), one for production (manual trigger).

AnswerA

Cloud Deploy supports automatic promotion to staging and manual approval to production.

Why this answer

Option A is correct because Cloud Deploy natively supports delivery pipelines with multiple targets, where you can configure automatic promotion to staging and require manual approval for production. This aligns with the requirement for a CI/CD pipeline that deploys to staging on every push to main and to production only after manual approval, using Cloud Build for the build and Cloud Deploy for the deployment orchestration.

Exam trap

Google Cloud often tests the distinction between Cloud Build (build and test) and Cloud Deploy (deployment orchestration with approval gates), so the trap here is assuming Cloud Build alone can handle manual approvals or multi-environment promotion, when Cloud Deploy is the correct service for that workflow.

How to eliminate wrong answers

Option B is wrong because it suggests using a single Cloud Build pipeline with conditional steps based on branch name, but Cloud Build is a build and test service, not a deployment orchestrator; it lacks native support for manual approval gates and promotion between environments, which is a core requirement. Option C is wrong because it describes traffic splitting on Cloud Run, which is a canary deployment strategy, not a CI/CD pipeline with separate staging and production targets requiring manual approval; Cloud Deploy is the appropriate service for such multi-target pipelines. Option D is wrong because it proposes using two separate Cloud Build triggers with manual trigger for production, but Cloud Build triggers do not provide a built-in manual approval workflow; Cloud Deploy's delivery pipeline with approval targets is the designed solution for this pattern.


148

MCQeasy

You need to monitor the health of an external HTTP endpoint. Which resource should you create?

A.Load balancer with a health probe

B.Internal health check in Compute Engine

C.Uptime check in Cloud Monitoring

D.Cloud Logging log-based metric

AnswerC

Uptime checks monitor external endpoints for availability.

Why this answer

To monitor the health of an external HTTP endpoint, you need a service that can reach the endpoint from outside your VPC and provide alerting and visibility. Cloud Monitoring's Uptime Check is specifically designed for this purpose: it verifies that an external HTTP(S) endpoint is reachable and responsive, and can trigger alerting policies based on the results. This is the correct resource because it operates from Google's infrastructure, not from within your project, and directly supports external endpoint monitoring.

Exam trap

Google Cloud often tests the distinction between internal health checks (for VMs/backends) and external uptime checks (for public endpoints), so the trap here is that candidates confuse a load balancer health probe (which checks backends) with a tool for monitoring an external HTTP endpoint.

How to eliminate wrong answers

Option A is wrong because a Load Balancer with a health probe is used to distribute traffic and check the health of backend instances within your VPC, not to monitor an external HTTP endpoint from outside your network. Option B is wrong because an Internal health check in Compute Engine is designed to verify the health of VM instances within the same VPC using internal IPs, and cannot reach external endpoints. Option D is wrong because Cloud Logging log-based metrics are used to count or measure log entries based on filters, not to actively probe or monitor the availability of an external HTTP endpoint.


149

Multi-Selecthard

Which THREE are best practices for securing a CI/CD pipeline on Google Cloud?

Select 3 answers

A.Use Secret Manager to securely pass credentials to build steps.

B.Enable Binary Authorization to enforce that only signed images are deployed.

C.Run builds on the publicly hosted pool for better scalability.

D.Store service account keys in the source repository for build steps to use.

E.Use Cloud Build private pools to isolate build execution.

AnswersA, B, E

Secret Manager integrates with Cloud Build for secure access.

Why this answer

Options A, B, and E are correct. Storing credentials in Secret Manager avoids hardcoding them, Binary Authorization ensures that only signed container images are deployed, and a private pool isolates build execution and avoids exposure to the public internet. Option D is incorrect: storing service account keys in the source repository exposes them to anyone with repository access.

Option C is incorrect: running builds on the public pool is less secure.


150

MCQ easy

A DevOps engineer notices that a critical service is down, but no alert has been received. The engineer checks Cloud Monitoring and sees that the alerting policy appears to be correctly configured. What is the most likely cause?

A. The metrics for the service are not being collected.

B. The incident was created but automatically closed.

C. The SLO is defined too loosely, so the error budget is not exhausted.

D. The notification channel is misconfigured.

Answer: D

If the notification channel is misconfigured, the alert condition may be met but the alert is not delivered.

Why this answer

Option D is correct because if the notification channel is misconfigured (e.g., invalid webhook URL, incorrect email address, or missing PagerDuty integration key), Cloud Monitoring will correctly detect the incident but fail to deliver the alert. The engineer sees the alerting policy appears correct because the policy itself is valid, but the channel configuration prevents the notification from reaching the recipient.
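The separation between a valid policy and a broken channel can be sketched as a small offline check. The example below assumes the policy and channels have already been exported as dictionaries shaped like the Cloud Monitoring `AlertPolicy` and `NotificationChannel` resources; the function name `find_channel_problems` and the label mapping in `REQUIRED_LABELS` are assumptions made for illustration and should be verified against the channel descriptors in your project.

```python
from typing import Dict, List

# Assumed mapping of channel type to the label that must be set;
# check the notification channel descriptors before relying on it.
REQUIRED_LABELS = {
    "email": "email_address",
    "webhook_tokenauth": "url",
    "pagerduty": "service_key",
}


def find_channel_problems(policy: Dict, channels: Dict[str, Dict]) -> List[str]:
    """Return human-readable problems with a policy's notification channels."""
    problems = []
    names = policy.get("notificationChannels", [])
    if not names:
        problems.append("policy has no notification channels attached")
    for name in names:
        channel = channels.get(name)
        if channel is None:
            # The policy points at a channel that no longer exists.
            problems.append(f"{name}: channel not found")
            continue
        if not channel.get("enabled", True):
            problems.append(f"{name}: channel is disabled")
        label = REQUIRED_LABELS.get(channel.get("type"))
        if label and not channel.get("labels", {}).get(label):
            problems.append(f"{name}: missing label '{label}'")
    return problems


if __name__ == "__main__":
    # Made-up policy and channel data for demonstration only.
    policy = {"notificationChannels": ["channels/1", "channels/2"]}
    channels = {
        "channels/1": {
            "type": "email",
            "enabled": True,
            "labels": {"email_address": ""},
        },
    }
    for problem in find_channel_problems(policy, channels):
        print(problem)
```

A check like this only catches structural mistakes; a well-formed but wrong email address or an expired webhook token would still need an end-to-end test notification.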

Exam trap

Google Cloud often tests the misconception that a correctly configured alerting policy guarantees alert delivery, ignoring that the notification channel is a separate configuration layer that can silently fail.

How to eliminate wrong answers

Option A is wrong because if metrics were not being collected, the alerting policy would show a 'no data' status or the metric would be absent from the dashboard, which the engineer would have noticed when checking Cloud Monitoring. Option B is wrong because an incident that was created and automatically closed would still generate a notification when the incident is created; the engineer would have received an alert for the opening event. Option C is wrong because an SLO and error budget are unrelated to alert delivery; they define service level targets, not whether a notification channel functions correctly.

