Sample questions
Google Professional Cloud DevOps Engineer practice questions
Order the steps to set up a CI/CD pipeline using Cloud Build and Cloud Deploy for a Cloud Run service.
Order the steps to configure a VPC Network Peering between two projects.
Order the steps to respond to a Google Cloud security incident involving a compromised service account key.
Refer to the exhibit. The Cloud Build fails with a permission error. The Cloud Build service account has roles/cloudbuild.builds.builder and roles/cloudfunctions.developer on the project. What is the missing permission?
Exhibit
steps:
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    gcloud functions deploy my-function \
      --runtime nodejs18 \
      --trigger-http \
      --allow-unauthenticated
Trap 1: cloudfunctions.functions.get
This permission is included in cloudfunctions.developer.
Trap 2: iam.serviceAccounts.actAs
Not needed for this step.
Trap 3: cloudfunctions.functions.sourceCodes.set
This is part of cloudfunctions.developer.
- A
cloudfunctions.functions.setIamPolicy
Required to set IAM policy for unauthenticated access.
- B
cloudfunctions.functions.get
Why wrong: This permission is included in cloudfunctions.developer.
- C
iam.serviceAccounts.actAs
Why wrong: Not needed for this step.
- D
cloudfunctions.functions.sourceCodes.set
Why wrong: This is part of cloudfunctions.developer.
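The reasoning here amounts to a set difference: the permissions the build step needs, minus the permissions the granted roles provide. The sketch below is only an illustration of that reasoning. The role table is a deliberately tiny subset based on the explanations in this question (not the full role definitions), and `missing_permissions` is a hypothetical helper, not part of any Google Cloud library.

```python
# Illustrative subset only: real roles contain many more permissions.
# The contents follow the explanations given in this question.
ROLE_PERMISSIONS = {
    "roles/cloudfunctions.developer": {
        "cloudfunctions.functions.get",
    },
}


def missing_permissions(required, granted_roles):
    """Return the required permissions that none of the granted roles provide."""
    granted = set()
    for role in granted_roles:
        # Roles absent from the table contribute nothing in this sketch.
        granted |= ROLE_PERMISSIONS.get(role, set())
    return sorted(set(required) - granted)


# --allow-unauthenticated changes the function's IAM policy, so the step
# needs setIamPolicy in addition to the usual deploy permissions.
required_for_step = [
    "cloudfunctions.functions.get",
    "cloudfunctions.functions.setIamPolicy",
]

print(missing_permissions(required_for_step, ["roles/cloudfunctions.developer"]))
```

Under these assumptions, the only permission left over is the one that option A names.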
A company is setting up a new Google Cloud organization. They want to ensure that all projects inherit common IAM policies. What is the best practice?
Trap 1: Apply IAM policies at the folder level.
Folder-level policies apply to projects under that folder, but not to projects outside it.
Trap 2: Apply IAM policies at the project level.
Project-level policies do not propagate to other projects.
Trap 3: Use multiple organizations to isolate policies.
Using multiple organizations adds complexity and is not recommended for policy inheritance.
- A
Apply IAM policies at the folder level.
Why wrong: Folder-level policies apply to projects under that folder, but not to projects outside it.
- B
Apply IAM policies at the project level.
Why wrong: Project-level policies do not propagate to other projects.
- C
Apply IAM policies at the organization level.
Organization-level policies apply to all projects and folders under the organization.
- D
Use multiple organizations to isolate policies.
Why wrong: Using multiple organizations adds complexity and is not recommended for policy inheritance.
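Allow-policy inheritance can be pictured as a walk up the resource hierarchy, collecting the bindings set at each level. The sketch below is a toy model of that idea, not a call into any IAM API. The resource names, groups, and the `effective_bindings` helper are all hypothetical.

```python
# Hypothetical hierarchy, child -> parent. All names are made up.
PARENT = {
    "project-a": "folder-eng",
    "project-b": "folder-fin",
    "folder-eng": "org",
    "folder-fin": "org",
    "org": None,
}

# Bindings set directly on each resource, as (member, role) pairs.
DIRECT_BINDINGS = {
    "org": {("group:auditors@example.com", "roles/viewer")},
    "folder-eng": {("group:eng@example.com", "roles/editor")},
}


def effective_bindings(resource):
    """Union of the bindings set on the resource and on every ancestor."""
    bindings = set()
    while resource is not None:
        bindings |= DIRECT_BINDINGS.get(resource, set())
        resource = PARENT[resource]
    return bindings


for project in ("project-a", "project-b"):
    print(project, sorted(effective_bindings(project)))
```

In this model the organization-level binding reaches both projects, while the binding on `folder-eng` reaches only `project-a`, which is why the folder-level option does not cover every project.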
A DevOps team is bootstrapping CI/CD pipelines that need access to API keys stored in Secret Manager. The pipelines run on Cloud Build. What is the best practice for granting access to secrets?
Trap 1: Use a custom service account with roles/secretmanager.admin and run…
This grants excessive admin permissions.
Trap 2: Store the API keys as build substitutions.
Substitutions are visible in logs and not secure.
Trap 3: Use Cloud KMS to encrypt secrets and pass them as environment…
This is not a best practice; using Secret Manager is preferred.
- A
Use a custom service account with roles/secretmanager.admin and run Cloud Build as that account.
Why wrong: This grants excessive admin permissions.
- B
Store the API keys as build substitutions.
Why wrong: Substitutions are visible in logs and not secure.
- C
Grant the Cloud Build service account roles/secretmanager.secretAccessor on the project containing secrets.
This provides least-privilege access to secrets.
- D
Use Cloud KMS to encrypt secrets and pass them as environment variables.
Why wrong: This is not a best practice; using Secret Manager is preferred.
A DevOps team is bootstrapping their Google Cloud organization and wants to enable Infrastructure as Code (IaC) using Terraform. They need a service account that Terraform can use to create and manage resources across multiple projects. What is the best practice for creating and managing this service account?
Trap 1: Generate a service account key and store it in a Cloud Storage…
Keys should be managed securely, not stored in a bucket accessible to many.
Trap 2: Use a user account with two-factor authentication for Terraform…
User accounts are for humans, not automated systems; service accounts are for automation.
Trap 3: Use the Compute Engine default service account from the project…
Default SAs have excessive permissions and can't be scoped easily across projects.
- A
Create a service account in a separate 'admin' project and grant it the required roles on each project via IAM.
This provides centralized control and separates credentials from workloads.
- B
Generate a service account key and store it in a Cloud Storage bucket accessible to the team.
Why wrong: Keys should be managed securely, not stored in a bucket accessible to many.
- C
Use a user account with two-factor authentication for Terraform automation.
Why wrong: User accounts are for humans, not automated systems; service accounts are for automation.
- D
Use the Compute Engine default service account from the project where Terraform runs.
Why wrong: Default SAs have excessive permissions and can't be scoped easily across projects.
A multinational corporation is bootstrapping a Google Cloud organization with multiple subsidiaries. Each subsidiary needs its own folder with IAM policies that are managed locally, but the parent company wants to enforce a global policy that restricts the use of certain machine types (e.g., N2D) for cost control. However, one subsidiary has a legitimate need for those machine types in a specific project. What is the best way to handle this exception while maintaining the global policy?
Trap 1: Set an organization policy that denies N2D machine types, then…
Project-level policies cannot override parent denies; they can only add more restrictions.
Trap 2: Use an audit-only policy and rely on a team to review and approve…
Audit-only does not prevent non-compliant resources from being created.
Trap 3: Place each subsidiary in its own folder and set the machine type…
This does not enforce a global policy; some folders may not have the restriction.
- A
Create a custom organization policy with a condition that excludes the exception project from the restriction.
Custom policies with conditions allow fine-grained exceptions.
- B
Set an organization policy that denies N2D machine types, then create a separate policy at the project level to allow them for the exception project.
Why wrong: Project-level policies cannot override parent denies; they can only add more restrictions.
- C
Use an audit-only policy and rely on a team to review and approve machine type usage.
Why wrong: Audit-only does not prevent non-compliant resources from being created.
- D
Place each subsidiary in its own folder and set the machine type restriction only on folders that require it.
Why wrong: This does not enforce a global policy; some folders may not have the restriction.
To securely manage secrets (e.g., API keys) used in Cloud Build pipelines, which service should be used?
Trap 1: Cloud KMS
Cloud KMS is for encryption keys, not for storing secrets like API keys.
Trap 2: Cloud Key Management Service (duplicate)
Same as Cloud KMS; not suitable for secret storage.
Trap 3: Cloud Storage
Not inherently secure; requires additional encryption and access controls; not recommended for secrets.
- A
Secret Manager
Designed for storing secrets; integrates with Cloud Build through the availableSecrets and secretEnv build config fields.
- B
Cloud KMS
Why wrong: Cloud KMS is for encryption keys, not for storing secrets like API keys.
- C
Cloud Key Management Service (duplicate)
Why wrong: Same as Cloud KMS; not suitable for secret storage.
- D
Cloud Storage
Why wrong: Not inherently secure; requires additional encryption and access controls; not recommended for secrets.
A DevOps engineer needs to set up a centralized logging solution for multiple projects. They want to store logs in a BigQuery dataset for analysis. What is the best approach?
Trap 1: Use Cloud Logging's export feature to Pub/Sub and then to BigQuery.
Adds unnecessary complexity.
Trap 2: Use the BigQuery Data Transfer Service for logs.
Not designed for real-time log export.
Trap 3: Create a sink in each project to export logs to the BigQuery…
Manual and repetitive.
- A
Use Cloud Logging's export feature to Pub/Sub and then to BigQuery.
Why wrong: Adds unnecessary complexity.
- B
Use the BigQuery Data Transfer Service for logs.
Why wrong: Not designed for real-time log export.
- C
Create a sink in each project to export logs to the BigQuery dataset.
Why wrong: Manual and repetitive.
- D
Create an aggregated sink at the organization or folder level to export logs to BigQuery.
Centralized and efficient.
An organization is using Cloud Source Repositories and wants to enforce that all commits are signed with a verified GPG key. How can they enforce this?
Trap 1: Use a branch protection rule in Cloud Source Repositories.
Does not enforce signed commits.
Trap 2: Use Cloud Functions to validate commits after push.
Reactive and not enforceable.
Trap 3: Use a pre-receive hook in Cloud Source Repositories.
Pre-receive hooks are not supported in Cloud Source Repositories.
- A
Use a branch protection rule in Cloud Source Repositories.
Why wrong: Does not enforce signed commits.
- B
Use Cloud Functions to validate commits after push.
Why wrong: Reactive and not enforceable.
- C
Enable the Signed Commits policy in the repository settings.
Native feature to require GPG-signed commits.
- D
Use a pre-receive hook in Cloud Source Repositories.
Why wrong: Pre-receive hooks are not supported in Cloud Source Repositories.
A DevOps engineer notices that developers are accidentally deleting Cloud Storage buckets. The organization wants to prevent accidental deletion while still allowing developers to manage bucket objects. What is the best practice?
Trap 1: Enable Cloud Audit Logging and set up alerts on bucket deletion.
Reactive, not preventive.
Trap 2: Set a bucket IAM policy denying storage.objects.delete for…
This restricts object deletion, not bucket deletion.
Trap 3: Use an organization policy to disable bucket deletion across the…
Too restrictive; prevents necessary deletions.
- A
Set a bucket retention policy with deletion lock.
Retention policy prevents deletion until lock expires.
- B
Enable Cloud Audit Logging and set up alerts on bucket deletion.
Why wrong: Reactive, not preventive.
- C
Set a bucket IAM policy denying storage.objects.delete for developers.
Why wrong: This restricts object deletion, not bucket deletion.
- D
Use an organization policy to disable bucket deletion across the org.
Why wrong: Too restrictive; prevents necessary deletions.
A DevOps engineer is designing a CI/CD pipeline using Cloud Build. Which TWO configurations are necessary to ensure secure and reliable deployments? (Choose two.)
Trap 1: Use Cloud Build triggers with branch filters.
Useful for automation but not specifically for security/reliability of deployments.
Trap 2: Push all artifacts to a public Container Registry.
Public registry is insecure.
Trap 3: Enable Cloud Build service account with Editor role.
Too permissive; use least privilege.
- A
Use manual approval steps for production deployments.
Provides a gate for reliability.
- B
Store secrets in Cloud Secret Manager.
Secrets should be stored securely.
- C
Use Cloud Build triggers with branch filters.
Why wrong: Useful for automation but not specifically for security/reliability of deployments.
- D
Push all artifacts to a public Container Registry.
Why wrong: Public registry is insecure.
- E
Enable Cloud Build service account with Editor role.
Why wrong: Too permissive; use least privilege.
A startup is bootstrapping their Google Cloud organization with the following constraints: they have a small team of 10 developers, each with varying levels of expertise. They want a simple setup that allows developers to experiment in their own projects but prevents them from deleting production resources. They also want to enforce a budget limit on each project to avoid unexpected costs. The team has no prior Google Cloud experience and wants minimal operational overhead. Which of the following approaches best meets their needs?
Trap 1: Use a single project with VPC Service Controls to isolate resources.
VPC SC is for preventing data exfiltration, not for simple environment separation.
Trap 2: Create a single project for all developers, use budget alerts, and…
Too permissive; no isolation.
Trap 3: Create a project per developer, give them Owner role, and set a…
No production isolation; each dev can create resources without oversight.
- A
Use a single project with VPC Service Controls to isolate resources.
Why wrong: VPC SC is for preventing data exfiltration, not for simple environment separation.
- B
Create a single project for all developers, use budget alerts, and give everyone Owner role.
Why wrong: Too permissive; no isolation.
- C
Create a project per developer, give them Owner role, and set a budget on each project.
Why wrong: No production isolation; each dev can create resources without oversight.
- D
Create a folder for production and a folder for development, assign developers Editor on dev projects and Viewer on prod, set budget alerts on both folders.
Isolation of environments and budget control.
You are designing alerting policies for a microservice architecture. Which TWO metrics are most suitable for triggering a page to the on-call engineer?
Trap 1: Number of requests per second.
Request volume alone does not indicate a problem.
Trap 2: CPU utilization at 50%.
50% utilization is not critical; might be normal.
Trap 3: Memory usage trend.
Memory trends are not immediate page triggers unless critical.
- A
Latency P99 exceeding the SLO target for 5 minutes.
Breaching latency SLO directly impacts users.
- B
Error budget burn rate exceeding 10x in 1 hour.
High burn rate indicates fast consumption of error budget, requiring action.
- C
Number of requests per second.
Why wrong: Request volume alone does not indicate a problem.
- D
CPU utilization at 50%.
Why wrong: 50% utilization is not critical; might be normal.
- E
Memory usage trend.
Why wrong: Memory trends are not immediate page triggers unless critical.
A multinational company runs an application on Google Cloud with an SLO of 99.99% monthly availability. They use a multi-region deployment with Cloud Load Balancing and Cloud Spanner. During a regional outage in us-central1, traffic fails over to us-east1. However, the incident response team is not alerted because the error budget burn rate remained below the alert threshold. What should the team change to ensure timely alerting for such regional failures?
Trap 1: Shorten the SLO compliance window from 30 days to 7 days.
Shorter window makes SLO more sensitive but may cause false alarms.
Trap 2: Change the SLO to 99.9% to allow more error budget.
Does not improve alerting; weakens SLO.
Trap 3: Reduce the error budget burn rate alert threshold from 10% to 5%…
May increase noise but not specific to regional failures.
- A
Shorten the SLO compliance window from 30 days to 7 days.
Why wrong: Shorter window makes SLO more sensitive but may cause false alarms.
- B
Create a custom dashboard and alert for regional unavailability using Cloud Monitoring metrics like load_balancing/backend_request_count and region health checks.
Direct alerts for regional failures catch issues early.
- C
Change the SLO to 99.9% to allow more error budget.
Why wrong: Does not improve alerting; weakens SLO.
- D
Reduce the error budget burn rate alert threshold from 10% to 5% per hour.
Why wrong: May increase noise but not specific to regional failures.
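One way to see why the burn-rate alert stayed quiet is to compare the global error ratio with the per-region ratios. The request counts below are hypothetical, chosen so that the failed region served little traffic before failover. The sketch only illustrates the arithmetic and does not query Cloud Monitoring.

```python
SLO = 0.9999

# Hypothetical request counts during the outage window.
regions = {
    "us-central1": {"requests": 200, "errors": 200},
    "us-east1": {"requests": 4_000_000, "errors": 100},
}


def error_ratio(errors, requests):
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


total_requests = sum(r["requests"] for r in regions.values())
total_errors = sum(r["errors"] for r in regions.values())

global_ratio = error_ratio(total_errors, total_requests)
global_burn_rate = global_ratio / (1.0 - SLO)
per_region = {
    name: error_ratio(r["errors"], r["requests"]) for name, r in regions.items()
}

print(f"global burn rate: {global_burn_rate:.2f}x")
for name, ratio in per_region.items():
    print(f"{name}: {ratio:.2%} of requests failed")
```

With these numbers the global burn rate stays below 1x even though every request to us-central1 failed, so only a region-scoped signal would have fired.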
An organization has a service that must meet a 99.99% SLO. The service runs on GKE and uses Cloud SQL. The team notices that during a major incident, the error budget is consumed rapidly. They want to implement a mechanism to automatically rollback deployments that cause sustained error budget consumption above a threshold. What is the best approach?
Trap 1: Use Cloud Scheduler to run a script that checks error budget and…
Script solution is brittle; not a managed service.
Trap 2: Implement a canary deployment strategy with manual approval steps.
Manual approval delays rollback; not fully automated.
Trap 3: Configure Cloud Build to automatically revert the last commit if…
Cloud Build is for building containers, not for rollback orchestration.
- A
Use Cloud Scheduler to run a script that checks error budget and rolls back if needed.
Why wrong: Script solution is brittle; not a managed service.
- B
Set up a deployment pipeline with Cloud Deploy that includes a predeployment validation step that checks the current error budget burn rate and blocks the release if the burn rate exceeds 10% per hour.
Automated policy prevents deployments that would consume error budget quickly.
- C
Implement a canary deployment strategy with manual approval steps.
Why wrong: Manual approval delays rollback; not fully automated.
- D
Configure Cloud Build to automatically revert the last commit if error budget is consumed.
Why wrong: Cloud Build is for building containers, not for rollback orchestration.
During a post-incident review, the team discovers that a misconfiguration in Cloud Armor caused legitimate traffic to be blocked, leading to an outage. The misconfiguration was introduced by a junior engineer who had overly permissive IAM roles. What is the best way to prevent similar incidents in the future?
Trap 1: Enforce a mandatory peer review for all Cloud Armor configuration…
Reduces risk but not foolproof; human error can still occur.
Trap 2: Revoke the junior engineer's access to Cloud Armor and grant…
Doesn't prevent mistakes by other engineers.
Trap 3: Enable Cloud Armor security policy logs and create alerting for…
Detects incidents but doesn't prevent.
- A
Enforce a mandatory peer review for all Cloud Armor configuration changes.
Why wrong: Reduces risk but not foolproof; human error can still occur.
- B
Revoke the junior engineer's access to Cloud Armor and grant read-only access.
Why wrong: Doesn't prevent mistakes by other engineers.
- C
Enable Cloud Armor security policy logs and create alerting for blocked traffic spikes.
Why wrong: Detects incidents but doesn't prevent.
- D
Use Organization Policy constraints to restrict allowed IP ranges and rules in Cloud Armor security policies.
Prevents creation of overly permissive rules.
During a canary deployment of a new version of a microservice, the engineer notices increased error rates in the canary instances. What is the best immediate action?
Trap 1: Continue the rollout to see if errors stabilize.
Continuing could affect more users; not advisable.
Trap 2: Scale up the canary instances to handle load.
Scaling up would increase the error rate and impact more users.
Trap 3: Pause the rollout and investigate the errors.
While investigation is good, pausing allows errors to continue; rollback is faster.
- A
Continue the rollout to see if errors stabilize.
Why wrong: Continuing could affect more users; not advisable.
- B
Perform a rollback of the canary to the previous version.
Rolling back immediately stops the errors and protects users.
- C
Scale up the canary instances to handle load.
Why wrong: Scaling up would increase the error rate and impact more users.
- D
Pause the rollout and investigate the errors.
Why wrong: While investigation is good, pausing allows errors to continue; rollback is faster.
An SRE team created the logs-based metric shown in the exhibit. They expect it to count the number of HTTP 500 errors per instance. However, the metric shows no data. What is the most likely cause?
Exhibit
```
"logsBasedMetric": {
  "filter": "resource.type=\"gce_instance\" AND jsonPayload.status=\"500\"",
  "metricDescriptor": {
    "metricKind": "DELTA",
    "valueType": "INT64",
    "name": "custom.googleapis.com/errors/5xx"
  },
  "labelExtractors": {
    "instance_id": "EXTRACT(jsonPayload.instance_id)"
  },
  "description": "Count of 500 errors per instance"
}
```
Trap 1: The metric kind is DELTA but should be CUMULATIVE.
DELTA is appropriate for counting events over time; CUMULATIVE would not change the filter matching issue.
Trap 2: The metric name does not follow the required naming convention.
custom.googleapis.com/errors/5xx is a valid custom metric name.
Trap 3: The labelExtractors must use regex instead of JSON path.
EXTRACT with jsonPayload is valid; regex is not required.
- A
The metric kind is DELTA but should be CUMULATIVE.
Why wrong: DELTA is appropriate for counting events over time; CUMULATIVE would not change the filter matching issue.
- B
The log entries might not have the 'status' field in jsonPayload; it could be in a different location or format.
If the logs are structured differently, the filter will not match, resulting in no data.
- C
The metric name does not follow the required naming convention.
Why wrong: custom.googleapis.com/errors/5xx is a valid custom metric name.
- D
The labelExtractors must use regex instead of JSON path.
Why wrong: EXTRACT with jsonPayload is valid; regex is not required.
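The "different location" case can be illustrated with a toy matcher. This is not Cloud Logging's filter evaluator, only a rough model of a field-path lookup. The log entries and the `get_field` and `matches` helpers are hypothetical.

```python
def get_field(entry, dotted_path):
    """Walk a nested dict by a dotted path; return None if any part is missing."""
    current = entry
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(entry, path, expected):
    """True when the field at `path` exists and equals `expected`."""
    return get_field(entry, path) == expected


# Hypothetical entries: the second one keeps the status under httpRequest.
entries = [
    {"jsonPayload": {"status": "500", "instance_id": "vm-1"}},
    {"httpRequest": {"status": 500}, "jsonPayload": {"instance_id": "vm-2"}},
]

matched = [e for e in entries if matches(e, "jsonPayload.status", "500")]
print(len(matched), "of", len(entries), "entries match")
```

If all of a service's entries look like the second one, a filter on `jsonPayload.status` matches nothing and the metric stays empty, regardless of the metric kind or name.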
A team wants to implement multi-cluster monitoring for GKE using Managed Service for Prometheus. Which configuration is required?
Trap 1: Enable Managed Service for Prometheus in one cluster and have other…
Each cluster must have the managed collection enabled independently.
Trap 2: Use Cloud Monitoring agent on nodes in each cluster
Managed Service for Prometheus uses its own collection, not the legacy agent.
Trap 3: Set up a separate workspace per cluster
This defeats the purpose of multi-cluster monitoring; a single workspace is needed.
- A
Enable Managed Service for Prometheus in one cluster and have other clusters forward metrics to it
Why wrong: Each cluster must have the managed collection enabled independently.
- B
Enable Managed Service for Prometheus in each cluster and configure a single Cloud Monitoring workspace to collect metrics from all clusters
This aggregates metrics from multiple clusters in one workspace.
- C
Use Cloud Monitoring agent on nodes in each cluster
Why wrong: Managed Service for Prometheus uses its own collection, not the legacy agent.
- D
Set up a separate workspace per cluster
Why wrong: This defeats the purpose of multi-cluster monitoring; a single workspace is needed.
A DevOps engineer needs to verify if a load balancer's health check is behaving normally by examining historical trends. Where should they look?
Trap 1: Cloud Logging
Logs are not the primary source for time-series metrics.
Trap 2: Cloud Console health check page
Shows current status, not historical trends.
Trap 3: Cloud Load Balancing logs
These logs contain request logs, not health check metrics.
- A
Cloud Monitoring Metrics Explorer
Metrics Explorer stores health check metrics for historical analysis.
- B
Cloud Logging
Why wrong: Logs are not the primary source for time-series metrics.
- C
Cloud Console health check page
Why wrong: Shows current status, not historical trends.
- D
Cloud Load Balancing logs
Why wrong: These logs contain request logs, not health check metrics.
A team is using Cloud Monitoring to track the performance of a microservices application. They set up an uptime check for each service, but they notice that some checks are failing intermittently without actual service degradation. What is the most likely cause?
Trap 1: The services are behind a load balancer that occasionally returns…
A 503 response indicates an actual scaling issue, not a false positive.
Trap 2: Uptime checks are deployed in a single region, causing false…
If only one region, false negatives are more likely, not false positives.
Trap 3: The project's quota for uptime checks has been exceeded.
If quota is exceeded, checks would not run at all, not fail intermittently.
- A
The services are behind a load balancer that occasionally returns 503 during scaling.
Why wrong: A 503 response indicates an actual scaling issue, not a false positive.
- B
The timeout setting is too short for the service's typical latency.
A short timeout can cause the check to fail even when the service is healthy, especially during transient latency spikes.
- C
Uptime checks are deployed in a single region, causing false positives.
Why wrong: If only one region, false negatives are more likely, not false positives.
- D
The project's quota for uptime checks has been exceeded.
Why wrong: If quota is exceeded, checks would not run at all, not fail intermittently.
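The timeout explanation can be made concrete with a small simulation. The latency samples and timeout values below are invented for illustration, and `check_results` is a hypothetical helper that does not call Cloud Monitoring.

```python
def check_results(latencies_ms, timeout_ms):
    """A check passes when the response arrives within the timeout."""
    return [latency <= timeout_ms for latency in latencies_ms]


# Hypothetical response times from a healthy service with occasional spikes.
latencies = [180, 210, 950, 190, 1200, 205]

for timeout in (500, 2000):
    results = check_results(latencies, timeout)
    failures = results.count(False)
    print(f"timeout {timeout} ms: {failures} of {len(results)} checks failed")
```

Every request in this sample eventually succeeds, so the failures reported with the shorter timeout come from the check configuration, not from the service.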
A DevSecOps team is configuring Cloud Monitoring alerts for proactive incident response. Which two practices are recommended for effective alerting? (Choose two.)
Trap 1: Alert on every microsecond of latency increase.
Alerting on every tiny change generates noise and desensitizes responders.
Trap 2: Use a single high-level alert that covers all symptoms.
A single alert for all symptoms conflates issues and makes triage difficult.
Trap 3: Set alert thresholds based on arbitrary guesses.
Thresholds should be data-driven, not arbitrary.
- A
Define clear escalation paths for different alert severities.
Clear escalation ensures the right team is notified based on severity.
- B
Alert on every microsecond of latency increase.
Why wrong: Alerting on every tiny change generates noise and desensitizes responders.
- C
Use a single high-level alert that covers all symptoms.
Why wrong: A single alert for all symptoms conflates issues and makes triage difficult.
- D
Set alert thresholds based on arbitrary guesses.
Why wrong: Thresholds should be data-driven, not arbitrary.
- E
Create separate alerts for different symptom classes.
Separate alerts for latency, errors, etc., help focus on specific issues.