Google Professional Cloud Developer (PCD) — Questions 76–150

500 questions total · 7 pages · All types, answers revealed

76

MCQhard

A company uses Cloud Monitoring with custom metrics. They have a custom metric called 'requests_total' with labels 'endpoint', 'status_code'. They want to create an alert that fires if the error rate (status_code >=500) for any endpoint exceeds 5% over a 5-minute window. Which MQL query should they use?

A.fetch custom::requests_total | { filter status_code >= 500 ; group_by [endpoint], sum() } / { group_by [endpoint], sum() } | condition gt 0.05

B.fetch custom::requests_total | filter status_code < 500 | ratio | condition gt 0.05

C.fetch custom::requests_total | group_by [endpoint], sum() | filter status_code >= 500 | ratio | condition gt 0.05

D.fetch custom::requests_total | filter status_code >= 500 | ratio | condition gt 0.05

AnswerA

Correct: groups errors and total by endpoint, divides, and applies condition.

Why this answer

Option A is correct because it first filters for error responses (status_code >= 500), then groups by endpoint and sums the error count, and divides that by the total count per endpoint (also grouped and summed). This computes the error rate per endpoint, and the condition fires when that rate exceeds 0.05 (5%) over the 5-minute window. The use of two separate group_by operations within a join (the `{ ... } / { ... }` syntax) is the correct MQL pattern for calculating a ratio per label.

Exam trap

Google Cloud often tests the distinction between `ratio` (which operates on the number of time series) and explicit division with group_by (which operates on metric values per label), leading candidates to incorrectly choose a `ratio`-based query that ignores per-endpoint grouping.

How to eliminate wrong answers

Option B is wrong because it filters for status_code < 500 (successes) instead of errors, and uses `ratio` without the proper group_by to compute per-endpoint rates, which would produce an overall ratio across all endpoints. Option C is wrong because it applies `group_by [endpoint], sum()` before filtering for errors, which sums all requests first and then filters, making it impossible to compute a per-endpoint error rate correctly. Option D is wrong because it uses `ratio` without any group_by, which would compute the overall error rate across all endpoints combined, not per endpoint as required.
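
The per-endpoint division described for this question can be illustrated outside MQL. The following is a minimal Python sketch, not Cloud Monitoring code: the sample tuples, endpoint names, and function names are hypothetical, and it assumes the counts for one 5-minute window have already been collected.

```python
from collections import defaultdict


def error_rates(samples):
    """Return {endpoint: errors / total} for (endpoint, status_code, count) samples."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for endpoint, status_code, count in samples:
        totals[endpoint] += count          # denominator: all requests per endpoint
        if status_code >= 500:
            errors[endpoint] += count      # numerator: server errors per endpoint
    return {ep: errors[ep] / totals[ep] for ep in totals if totals[ep] > 0}


def endpoints_over_threshold(samples, threshold=0.05):
    """List endpoints whose error rate is strictly above the threshold."""
    rates = error_rates(samples)
    return sorted(ep for ep, rate in rates.items() if rate > threshold)


# Hypothetical counts for one 5-minute window.
window = [
    ("/checkout", 200, 90),
    ("/checkout", 503, 10),
    ("/search", 200, 990),
    ("/search", 500, 10),
]

print(endpoints_over_threshold(window))
```

Grouping both the numerator and the denominator by endpoint before dividing is what keeps a noisy endpoint from being hidden by a healthy, high-traffic one.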

77

MCQmedium

A company is developing a microservices application on Google Cloud. Each service is deployed as a Docker container on Cloud Run. The development team wants to ensure that inter-service communication is encrypted and authenticated. What is the best approach?

A.Use Cloud Run's built-in IAM-based authentication and automatic TLS for internal requests.

B.Configure mutual TLS (mTLS) between services using Cloud Endpoints.

C.Deploy a sidecar proxy on each Cloud Run service to handle TLS termination.

D.Assign a service account to each service and use its private key to sign requests.

AnswerA

Cloud Run uses IAM to authenticate requests between services and automatically provisions TLS certificates.

Why this answer

Cloud Run automatically provisions TLS certificates for all incoming requests and supports IAM-based authentication for internal requests between services in the same Google Cloud project. This means inter-service communication is encrypted by default via HTTPS and can be authenticated by configuring the receiving service to require a valid IAM token from the caller, without any additional infrastructure or sidecar proxies.

Exam trap

Google Cloud often tests the misconception that you need to manually configure mTLS or deploy sidecar proxies for encryption and authentication in Cloud Run, when in fact Cloud Run's built-in IAM and automatic TLS handle both requirements natively.

How to eliminate wrong answers

Option B is wrong because Cloud Endpoints is an API management service for external-facing APIs, not designed for internal service-to-service mTLS on Cloud Run; Cloud Run already handles TLS termination natively. Option C is wrong because deploying a sidecar proxy on Cloud Run is unnecessary and adds complexity — Cloud Run automatically terminates TLS at the ingress and supports IAM-based authentication without requiring a separate proxy. Option D is wrong because using a service account's private key to sign requests is not a built-in Cloud Run feature; Cloud Run uses IAM tokens (e.g., OIDC tokens) for authentication, not raw private key signing.
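
As a rough illustration of the IAM-based flow, the sketch below shows how a calling service might obtain an identity token from the metadata server and attach it as a bearer token. It uses only the Python standard library; the helper names and the service URL in the usage note are hypothetical, and `call_service` only works when run on Google Cloud with a service account that holds the invoker role on the receiving service.

```python
import urllib.parse
import urllib.request

# Metadata server endpoint that issues identity (OIDC) tokens for the
# instance's default service account.
METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def build_token_request(audience):
    """Build the metadata-server request for an ID token with the given audience."""
    query = urllib.parse.urlencode({"audience": audience})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        headers={"Metadata-Flavor": "Google"},
    )


def build_service_request(service_url, id_token):
    """Build the HTTPS request to the receiving service with the bearer token."""
    return urllib.request.Request(
        service_url,
        headers={"Authorization": f"Bearer {id_token}"},
    )


def call_service(service_url):
    """Fetch a token and call the service (requires a Google Cloud runtime)."""
    with urllib.request.urlopen(build_token_request(service_url), timeout=5) as resp:
        token = resp.read().decode("utf-8")
    with urllib.request.urlopen(build_service_request(service_url, token), timeout=10) as resp:
        return resp.read()
```

A caller would invoke something like `call_service("https://orders-example.a.run.app")`, where the URL is a placeholder for the receiving service. The audience is set to the receiving service's URL so the token is only valid for that service.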

78

Matchingmedium

Match each command-line tool to its primary use.

Concepts

Matches

Manage Google Cloud resources

Interact with Cloud Storage

Run BigQuery queries and manage datasets

Manage Kubernetes clusters

Continuous development for Kubernetes applications

Why these pairings

These CLI tools are essential for developers working on Google Cloud.

79

MCQmedium

A company uses Cloud Run for a serverless application. They notice that cold starts are causing high latency for some requests. What is the best strategy to reduce cold starts?

A.Increase the max instances setting

B.Set a minimum number of instances to keep containers always warm

C.Migrate the application to Cloud Functions

D.Reduce the container concurrency setting

AnswerB

Min instances ensures pre-warmed containers are always ready.

Why this answer

Option B is correct because setting a minimum number of instances ensures that Cloud Run keeps a baseline of container instances always warm and ready to serve requests. This eliminates cold starts for the first requests that hit those pre-warmed instances, directly addressing the latency issue. Cloud Run automatically scales to zero when idle, but a minimum instance setting overrides that behavior for the specified number of containers.

Exam trap

The trap here is that candidates often confuse 'max instances' with 'min instances,' thinking that raising the upper limit will somehow pre-warm containers, when in fact it only controls the ceiling for scaling out, not the floor for keeping instances alive.

How to eliminate wrong answers

Option A is wrong because increasing the max instances setting only raises the upper scaling limit, which does nothing to prevent cold starts; it can actually increase the number of cold starts if traffic spikes cause new instances to be created. Option C is wrong because migrating to Cloud Functions does not inherently solve cold starts—Cloud Functions also has cold start latency, and the underlying infrastructure is similar; the recommendation would be the same (set a minimum instance count). Option D is wrong because reducing the container concurrency setting limits how many concurrent requests a single container can handle, which may force more instances to be created (increasing cold starts) rather than reducing them.

80

Multi-Selecthard

Which THREE common issues cause deployment failures on App Engine? (Choose 3.)

Select 3 answers

A.Using a runtime version that is not available in the app's region.

B.Exceeding the maximum file size limit for application files.

C.Setting the app to scale to 0 instances.

D.Uploading a configuration file (e.g., cron.yaml) with invalid syntax.

E.Creating a resource with backend type set to 'backend' instead of 'frontend'.

AnswersA, B, D

Some runtimes may not be available everywhere.

Why this answer

Option A is correct because App Engine requires that the runtime version specified in your app.yaml is available in the region where the application is deployed. If you select a runtime version that has been deprecated or is not yet rolled out to that region, the deployment will fail with an error indicating the runtime is unavailable. This is a common issue when using newer runtime versions that are only available in certain regions.

Exam trap

Google Cloud often tests whether candidates mistake scaling to 0 instances for a deployment failure. The standard environment can scale to zero when idle under automatic scaling, so this setting alone does not break a deployment; the flexible environment keeps a minimum of 1 instance by default.

81

MCQhard

Refer to the exhibit. The user developer@example.com tries to create a firewall rule and receives a permission denied error. What is the most likely reason?

A.The user lacks compute.networkAdmin role

B.The user lacks compute.securityAdmin role

C.The user is missing compute.firewalls.create permission

D.All of the above

AnswerD

compute.networkAdmin includes firewall permissions.

Why this answer

The correct answer is D: All of the above. Creating a firewall rule in Google Cloud requires the `compute.securityAdmin` role (which includes `compute.firewalls.create` permission) or the `compute.networkAdmin` role (which also includes `compute.firewalls.create` permission). If the user lacks any of these roles or the specific permission, they will receive a permission denied error.

Option A, B, and C are all individually correct reasons, making D the most comprehensive and accurate choice.

Exam trap

The trap here is that the exam presents three individually correct statements (A, B, C) and candidates tend to pick only one, but the question is designed to test whether you recognize that all three are valid reasons for the same error, making 'All of the above' the correct answer.

How to eliminate wrong answers

Option A is correct because the `compute.networkAdmin` role includes the `compute.firewalls.create` permission, and lacking it would cause a permission denied error. Option B is correct because the `compute.securityAdmin` role also includes the `compute.firewalls.create` permission, and lacking it would also cause the error. Option C is correct because the `compute.firewalls.create` permission is the specific IAM permission required to create firewall rules, and missing it directly results in a permission denied error.

Since all three options are individually valid reasons, the question expects the candidate to recognize that multiple factors can cause the same error, making D the only fully correct answer.

82

MCQhard

A company has a Cloud Run service that ingests messages from a Cloud Pub/Sub subscription. The service uses automatic scaling based on CPU. Recently, the team noticed that when message volume spikes, the service scales up slowly, causing a backlog. What is the most effective solution to reduce the time to scale out?

A.Increase the max instances for the Cloud Run service to 100.

B.Use Cloud Tasks to buffer messages and configure a Cloud Scheduler job to pull from the queue.

C.Set a minimum number of instances on the Cloud Run service to 5.

D.Change the subscription type from pull to push and set the Cloud Run service as the push endpoint.

AnswerD

Push subscriptions invoke the service directly upon message delivery, reducing latency and improving scaling speed.

Why this answer

Option D is correct because a Pub/Sub push subscription delivers each message to the Cloud Run service as an HTTP request, so Cloud Run scales on incoming request load rather than waiting for the CPU utilization of a pull consumer to rise. Option A is wrong because increasing max instances helps during high load but does not speed up initial scaling. Option B is wrong because Cloud Tasks adds another queueing layer without solving the scaling delay.

Option C is wrong because minimum instances add idle cost and only guarantee a baseline; they do not make scale-out beyond that baseline any faster.
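
To make the push model concrete, here is a minimal sketch of how a Cloud Run handler might decode the JSON envelope that a Pub/Sub push subscription sends in the request body. It is standard-library Python only; the function name and the sample payload are hypothetical, and the web framework that receives the request is left out.

```python
import base64
import json


def decode_push_envelope(body):
    """Extract the message ID, decoded data, and attributes from a push request body."""
    envelope = json.loads(body)
    message = envelope["message"]
    # The payload arrives base64-encoded in the "data" field.
    data = base64.b64decode(message.get("data", "")).decode("utf-8")
    return {
        "id": message.get("messageId"),
        "data": data,
        "attributes": message.get("attributes", {}),
    }


# Hypothetical request body, shaped like a push delivery.
sample_body = json.dumps({
    "message": {
        "data": base64.b64encode(b"order-created").decode("ascii"),
        "attributes": {"source": "checkout"},
        "messageId": "1234567890",
    },
    "subscription": "projects/example-project/subscriptions/example-sub",
})

print(decode_push_envelope(sample_body))
```

The handler acknowledges a message by returning a success status code, so each delivery is an ordinary HTTP request that Cloud Run's request-based autoscaling can react to.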

83

MCQhard

Your team is using Cloud Build to build and test a Java application. The build includes unit tests, integration tests, and static code analysis. The build is failing intermittently due to flaky tests. You want to automatically retry the failed steps without rebuilding everything. Which Cloud Build feature should you use?

A.Configure a Cloud Build trigger to rerun the build on failure

B.Set the 'allowFailure: false' and 'retry: 2' options on the test steps in the cloudbuild.yaml

C.Use build substitutions to pass different test parameters on failure

D.Increase the timeout for the build to allow retries

AnswerB

Cloud Build supports step-level retry with 'retry' field.

Why this answer

Option B is correct because Cloud Build supports the `retry` option on individual build steps, which allows a step to be automatically retried a specified number of times upon failure without re-executing previous steps. This is ideal for handling flaky tests, as it only reruns the failed step, preserving build artifacts and avoiding a full rebuild.

Exam trap

Google Cloud often tests the misconception that retrying a build must involve the entire pipeline (trigger or timeout), when in fact Cloud Build provides a step-level retry option that preserves previous step outputs and avoids full rebuilds.

How to eliminate wrong answers

Option A is wrong because configuring a Cloud Build trigger to rerun the entire build on failure would rebuild everything from scratch, including steps that succeeded, which is inefficient and does not target only the flaky test step. Option C is wrong because build substitutions are used to parameterize build configurations at submission time, not to trigger retries on failure; they cannot automatically rerun a failed step. Option D is wrong because increasing the build timeout only extends the maximum duration allowed for the build, it does not provide any retry mechanism for failed steps.

84

MCQmedium

A development team wants to implement a CI/CD pipeline for a containerized application on Google Cloud. They are using Cloud Build and Cloud Deploy. The application requires canary deployments with automatic rollback if the error rate increases by more than 10% within 5 minutes after deployment. Which Cloud Deploy feature should they configure?

A.Define a Cloud Deploy deployment policy with a rollout policy that uses a canary strategy and a verification phase with automated rollback

B.Configure a Pub/Sub notification on the rollout to trigger a rollback via a Cloud Function

C.Use Cloud Monitoring to create an alert policy that triggers a Cloud Function to rollback the deployment

D.Set up a Cloud Build trigger to rebuild the previous image on error

AnswerA

Cloud Deploy deployment policies can automate rollback based on criteria like error rate thresholds.

Why this answer

Option A is correct because Cloud Deploy's deployment policies allow you to define a canary rollout strategy with an automated verification phase. When the verification phase detects that the error rate exceeds the defined threshold (e.g., 10% increase within 5 minutes), Cloud Deploy automatically initiates a rollback to the previous stable revision, meeting the team's requirement without additional custom code.

Exam trap

The trap here is that candidates often assume external monitoring and custom functions (Options B and C) are required for automated rollbacks, overlooking Cloud Deploy's native deployment policy feature that directly supports canary rollouts with automated rollback based on verification phase conditions.

How to eliminate wrong answers

Option B is wrong because while Pub/Sub notifications can be used to trigger external actions, this approach requires a custom Cloud Function to interpret the notification and perform the rollback, which is not a native Cloud Deploy feature and adds unnecessary complexity and latency. Option C is wrong because Cloud Monitoring alert policies can trigger Cloud Functions, but this is an external workaround that does not leverage Cloud Deploy's built-in automated rollback capabilities; it also introduces a dependency on external monitoring and custom rollback logic. Option D is wrong because Cloud Build triggers are designed for building and testing, not for managing deployment rollbacks; rebuilding a previous image does not automatically revert the running deployment and ignores Cloud Deploy's rollout management.

85

MCQmedium

An application running on GKE is experiencing high latency. The team uses Cloud Trace to identify the bottleneck. They notice that a particular service spends most of its time waiting on a database query. How can they optimize performance?

A.Decrease the number of pods to reduce load

B.Use Cloud CDN to cache database results

C.Optimize the database query and add appropriate indexes

D.Increase the number of replicas for the service

AnswerC

Query optimization reduces execution time.

Why this answer

Option C is correct because the bottleneck is identified as a database query causing high latency. Optimizing the query and adding appropriate indexes directly reduces the time spent waiting on the database, which is the root cause. Cloud Trace shows the service is waiting on the database, so improving database performance is the most effective solution.

Exam trap

Google Cloud often tests the misconception that scaling horizontally (adding replicas) solves all performance issues, but here the bottleneck is external to the service (database), so scaling the service does not reduce the per-query wait time.

How to eliminate wrong answers

Option A is wrong because decreasing the number of pods reduces concurrency and can increase latency under load, not decrease it. Option B is wrong because Cloud CDN caches static content at edge locations, not dynamic database query results, and cannot cache database responses that are unique per request. Option D is wrong because increasing replicas spreads the load but does not address the database query latency; the service will still wait the same amount of time per query, and may even increase database contention.
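
The effect of an index on a slow lookup can be seen in a query plan. The sketch below uses SQLite from the Python standard library as a stand-in for the application's database, which the question does not name; the `orders` table, its columns, and the index name are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

sql = "SELECT id, total FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, sql, (42,))   # full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))    # index lookup

print(plan_before)
print(plan_after)
```

Before the index exists the plan reports a scan of the whole table; afterwards it reports a search using `idx_orders_customer`. Adding service replicas would not change either plan.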

86

MCQmedium

An application running on GKE needs to access a Cloud SQL instance. The team wants to avoid using Cloud SQL Auth Proxy to reduce complexity. What is the most secure alternative?

A.Whitelist the GKE node external IPs in Cloud SQL authorized networks.

B.Use a Cloud SQL read replica with a public IP.

C.Use Private Service Connect to connect privately.

D.Configure Cloud SQL to allow all traffic from the VPC.

AnswerC

Private Service Connect offers secure private connectivity.

Why this answer

Option C is correct because Private Service Connect provides private, secure connectivity without the need for a proxy. Option A is wrong because whitelisting node IPs is insecure due to shared IPs. Option B is wrong because read replicas with public IP are less secure.

Option D is wrong because allowing all VPC traffic is too permissive.

87

MCQmedium

Your team manages a service that receives thousands of requests per second. They have set up Cloud Monitoring alerting based on the 99th percentile latency. Recently, they received an alert warning that latency exceeded 1 second, but after investigating, they found it was a false alarm caused by a single very slow request. How can they improve their alert to reduce false positives?

A.Set the alert to fire only if the condition persists for a longer duration.

B.Use a log-based metric instead of latency.

C.Increase the alerting threshold to 2 seconds.

D.Use a different latency metric like median or 95th percentile.

AnswerD

Lower percentiles are less sensitive to outliers, reducing false alarms while still capturing most user experience.

Why this answer

The 99th percentile is sensitive to outliers; switching to a lower percentile like the 95th or median reduces the impact of rare slow requests and provides a more stable indicator of typical performance.
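
A small numeric example shows the sensitivity. The sketch below uses a hand-written nearest-rank percentile and a deliberately small, hypothetical window of 50 requests so that a single slow request is enough to move the 99th percentile; in a window with thousands of requests the effect would need more than one outlier.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    # Integer form of ceil(pct * n / 100), avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical window: 49 fast requests (0.1 s) and one very slow one (5 s).
latencies = [0.1] * 49 + [5.0]

p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)

print(p50, p95, p99)  # prints: 0.1 0.1 5.0
```

Only the 99th percentile crosses a 1-second threshold here, while the median and 95th percentile stay at the typical latency.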

88

MCQeasy

You are setting up Cloud Build to automatically deploy a container to Cloud Run when code is pushed to the main branch of a GitHub repository. What is the minimal configuration required?

A.Create a Cloud Build trigger connected to GitHub, and include a cloudbuild.yaml with steps to build and deploy.

B.Set up GitHub Actions to push images to Container Registry and then use Cloud Run.

C.Create a Cloud Build trigger without a build config file, using the inline builder.

D.Use Artifact Registry to store images and then manually trigger deployment.

AnswerA

A GitHub-connected trigger plus a cloudbuild.yaml with build and deploy steps is all that is required.

Why this answer

Option A is correct because a Cloud Build trigger connected to GitHub, plus a cloudbuild.yaml that builds and deploys to Cloud Run, is the minimal configuration. Option B is wrong because GitHub Actions is a separate CI system. Option C is wrong because a build configuration file is needed.

Option D is wrong because Artifact Registry is recommended but not required, and a manually triggered deployment is not automatic.

89

Multi-Selecteasy

A company stores sensitive user data in Cloud Storage. They want to ensure that only authenticated users with the appropriate permissions can access the data, and that data is encrypted at rest. Which two steps should they take? (Choose TWO.)

Select 2 answers

A.Configure a Customer-Managed Encryption Key (CMEK) in Cloud KMS.

B.Enable default encryption on the bucket using Google-managed keys.

C.Use IAM roles to grant access to specific users and groups.

D.Set bucket-level public access prevention.

E.Enable VPC Service Controls to restrict data access.

AnswersB, C

Default server-side encryption is already enabled.

Why this answer

Option B is correct because Cloud Storage buckets are encrypted at rest by default using Google-managed keys, which satisfies the requirement for data encryption without additional configuration. Option C is correct because IAM roles provide fine-grained access control, ensuring only authenticated users with appropriate permissions can access the data.

Exam trap

Google Cloud often tests the misconception that CMEK is required for encryption at rest, or that public access prevention alone satisfies access control, when in fact IAM is the primary mechanism for user-level authorization and default encryption is already enabled.

90

MCQhard

A company runs a stateful application on Compute Engine instances with local SSDs. They need to perform maintenance that requires stopping the instances. What is the best approach to ensure data durability and minimal downtime?

A.Create a snapshot of the local SSD before stopping the instance

B.Use instance groups with autohealing to automatically recreate instances

C.Enable live migration on the instance

D.Migrate data to persistent disks and configure the application to use persistent disks

AnswerD

Persistent disks are durable and can be detached and reattached to other instances, ensuring data persistence during maintenance.

Why this answer

Local SSDs provide ephemeral storage that is tied to the lifecycle of the Compute Engine instance. When an instance is stopped or terminated, data on local SSDs is permanently lost. To ensure data durability during maintenance that requires stopping the instance, the application must use persistent disks, which are durable network-attached storage that persists independently of the instance.

Option D is correct because migrating the application to persistent disks ensures data survives the stop and allows the instance to be restarted with the same data, minimizing downtime.

Exam trap

The trap here is that candidates assume local SSDs can be snapshotted or that live migration covers this case. Local SSDs do not support snapshots, and live migration handles host maintenance events rather than a stop required by the team's own maintenance, making persistent disks the durable option for stateful workloads requiring maintenance.

How to eliminate wrong answers

Option A is wrong because snapshots cannot be created directly from local SSDs; local SSDs are ephemeral and do not support snapshot creation. Option B is wrong because instance groups with autohealing recreate instances based on health checks, but they do not preserve data on local SSDs, which are lost when instances are terminated or recreated. Option C is wrong because live migration applies to host maintenance events initiated by Google Cloud; it does not help when the maintenance itself requires stopping the instance, at which point the local SSD data is lost.

91

MCQmedium

A team wants to monitor CPU utilization on their Compute Engine instances. They need an alert that sends a notification when the average CPU utilization across all instances in a project exceeds 80% for more than 5 minutes. Which alerting configuration should they use?

A.Use Cloud Scheduler to periodically check CPU and trigger notification

B.Create a log-based alert using metrics from Cloud Logging

C.Use an uptime check to monitor CPU utilization

D.Create an alert policy with a metric threshold condition for compute.googleapis.com/instance/cpu/utilization, aggregated across all instances with alignment period 1 min and duration 5 min

AnswerD

This correctly sets up a threshold alert on CPU utilization.

Why this answer

Option D is correct because Cloud Monitoring alert policies allow you to define a metric threshold condition using the `compute.googleapis.com/instance/cpu/utilization` metric, aggregate it across all instances in the project, and set an alignment period of 1 minute with a duration of 5 minutes. This configuration ensures the alert fires only when the average CPU utilization exceeds 80% for a sustained period of 5 minutes, meeting the exact requirement.

Exam trap

The trap here is that candidates confuse log-based alerts (which work on log entries) with metric-based alerts (which work on numeric time-series data), leading them to incorrectly choose Option B.

How to eliminate wrong answers

Option A is wrong because Cloud Scheduler is a cron job service for triggering actions on a schedule, not a monitoring or alerting tool; it cannot natively evaluate metric thresholds or aggregate CPU utilization across instances. Option B is wrong because log-based alerts are designed for log entries, not for numeric metric thresholds like CPU utilization; they cannot directly monitor `compute.googleapis.com/instance/cpu/utilization` as a metric. Option C is wrong because uptime checks monitor HTTP/HTTPS/TCP endpoint availability and response, not CPU utilization metrics; they are used for service health, not infrastructure resource usage.
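
The interaction between the 1-minute alignment period and the 5-minute duration can be sketched in plain Python. This is an illustrative model under simplifying assumptions, not the Cloud Monitoring implementation: the instance names and utilization values are hypothetical, and each list holds one already-aligned value per minute.

```python
def average_per_minute(series_by_instance):
    """Average the aligned per-minute values across all instances."""
    return [
        sum(points) / len(points)
        for points in zip(*series_by_instance.values())
    ]


def alert_fires(averages, threshold=0.8, duration_points=5):
    """Fire only if the threshold is exceeded for duration_points consecutive points."""
    run = 0
    for value in averages:
        run = run + 1 if value > threshold else 0
        if run >= duration_points:
            return True
    return False


# Hypothetical CPU utilization (0.0 to 1.0), one value per minute per instance.
cpu = {
    "vm-a": [0.90, 0.95, 0.90, 0.85, 0.90, 0.70],
    "vm-b": [0.80, 0.85, 0.90, 0.85, 0.80, 0.50],
}

averages = average_per_minute(cpu)
print(alert_fires(averages))
```

In this data the cross-instance average stays above 80% for the first five minutes, so the condition is met. A short dip in the middle of the window resets the count, which is what keeps brief spikes from triggering a notification.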

92

MCQmedium

A company uses Cloud Deploy for continuous delivery with multiple targets (dev, staging, prod). After a successful promotion to staging, the team discovers a critical bug and needs to roll back the production target to the previous release. The production target has already been promoted to the current release, but the staging target should remain on the current release. How should the team roll back the production target?

A.Create a new release with the same image tag as the previous release and promote it to production.

B.Use the 'gcloud deploy rollback' command targeting the production target.

C.Redeploy the previous release by running the previous Cloud Deploy command.

D.Manually delete the current release and then promote the previous release again.

AnswerB

Rollback creates a new rollout of the previous successful release and deploys it to the target.

Why this answer

Option B is correct because the 'gcloud deploy rollback' command is specifically designed to roll back a Cloud Deploy target to its previous successful release without affecting other targets. This command reverts the production target to the prior release while leaving the staging target on the current release, as required. It operates by redeploying the last known good release to the specified target, ensuring minimal disruption and preserving the promotion history.

Exam trap

Google Cloud often tests the misconception that rolling back a target requires creating a new release or manually manipulating releases, when in fact Cloud Deploy provides a dedicated rollback command that handles the process cleanly without affecting other targets or the release history.

How to eliminate wrong answers

Option A is wrong because creating a new release with the same image tag as the previous release would create a duplicate release in the pipeline, not a true rollback; it would also require a new promotion, which could trigger unintended side effects like re-running tests or approvals. Option C is wrong because rerunning the previous Cloud Deploy command would attempt to create a new release or promotion from scratch, not revert the production target to a prior state, and it could overwrite the current release history. Option D is wrong because manually deleting the current release is not supported in Cloud Deploy—releases are immutable once created—and promoting the previous release again would require it to still exist in the pipeline, which it does, but the manual deletion step is invalid and could break the deployment pipeline.

93

Multi-Selectmedium

Which TWO security best practices should be implemented when using Cloud Build to deploy applications? (Choose 2.)

Select 2 answers

A.Add SSH keys to Cloud Build for private Git repos.

B.Use Cloud KMS to encrypt sensitive environment variables.

C.Use container image tags instead of digests in build configs.

D.Store secrets in Cloud Build's default substitution variables.

E.Restrict Cloud Build trigger creation to specific IAM roles.

AnswersB, E

Encrypted variables are decrypted at build time.

Why this answer

Options B and E are correct. Option B prevents exposure of build secrets. Option E ensures only authorized users can create triggers.

Option A is wrong because Cloud Build does not use SSH keys natively. Option C is wrong because tags are mutable, so images should be referenced by digest, not tag. Option D is wrong because Cloud Build does not encrypt substitution variables by default.

94

MCQhard

A financial services company has a critical application that must survive a regional outage. They deployed on Compute Engine across multiple zones within a single region and now want to redirect traffic to a secondary region if the primary region becomes unavailable. Which load balancing solution should they use?

A.SSL Proxy Load Balancer

B.External HTTP(S) Load Balancer

C.Proxy Network Load Balancer

D.Internal TCP/UDP Load Balancer

E.Network Load Balancer

AnswerB

Global load balancer that can distribute traffic to backends in multiple regions and perform health-check-based failover.

Why this answer

The External HTTP(S) Load Balancer is the correct choice because it supports global load balancing across multiple regions, enabling traffic failover to a secondary region when the primary region becomes unavailable. It uses anycast IP addresses and is designed for HTTP/S traffic, making it suitable for a critical application that must survive a regional outage.

Exam trap

The trap here is that candidates often confuse regional load balancers (like Network Load Balancer or SSL Proxy) with global ones, assuming any load balancer can handle cross-region failover, but only the External HTTP(S) Load Balancer (and the External TCP/UDP Network Load Balancer with global access) supports multi-region failover for HTTP/S traffic.

How to eliminate wrong answers

Option A is wrong because SSL Proxy Load Balancer is a regional load balancer that terminates SSL connections and forwards TCP traffic, but it does not support cross-region failover or global load balancing. Option C is wrong because Proxy Network Load Balancer is a regional load balancer for TCP/UDP traffic and cannot redirect traffic to a secondary region. Option D is wrong because Internal TCP/UDP Load Balancer is a regional internal load balancer used for private traffic within a VPC and cannot handle cross-region failover.

Option E is wrong because Network Load Balancer is a regional passthrough load balancer for TCP/UDP traffic and does not support global load balancing or regional failover.

95

MCQmedium

The developer runs the command above and sees both instances are unhealthy. The instances are running and serving traffic on port 80 when accessed directly. What is the most likely cause?

A.Firewall rules block the health check probe IP ranges

B.The instances have been deleted

C.The instances are not running the specified health check port

D.The load balancer is misconfigured

E.The instances are out of memory and unable to respond

AnswerA

Health check probes originate from Google's health checker IP ranges; they must be allowed in firewall rules.

Why this answer

The most likely cause is that firewall rules are blocking the health check probe IP ranges. Google Cloud Platform (GCP) load balancers use specific, documented IP ranges for health check probes. If a firewall rule denies traffic from these ranges, the load balancer will mark the instances as unhealthy even though the instances are running and serving traffic on port 80 when accessed directly.

This is a common misconfiguration because the health check probes originate from these special IP ranges, not from the load balancer's frontend IP.
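
As a minimal sketch of the check described above, the following Python uses only the standard library to test whether a set of allowed ingress source ranges covers the health check probe ranges. The two CIDR blocks are the ranges Google documents for most load balancer health checks (confirm them against the current documentation for your load balancer type); the function names and the `allowed_source_ranges` input are hypothetical, introduced only for illustration.

```python
import ipaddress
from typing import List

# Probe source ranges documented by Google for most load balancer health
# checks. Assumption: verify against current docs for your load balancer type.
HEALTH_CHECK_RANGES = [
    ipaddress.ip_network("130.211.0.0/22"),
    ipaddress.ip_network("35.191.0.0/16"),
]


def is_health_check_probe(source_ip: str) -> bool:
    """Return True if source_ip falls inside a health check probe range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in HEALTH_CHECK_RANGES)


def probes_allowed(allowed_source_ranges: List[str]) -> bool:
    """Return True if every probe range is covered by an allowed range.

    allowed_source_ranges is a hypothetical list of the CIDR blocks that
    the ingress allow rules for the health check port admit.
    """
    allowed = [ipaddress.ip_network(cidr) for cidr in allowed_source_ranges]
    return all(
        any(probe.subnet_of(rule) for rule in allowed)
        for probe in HEALTH_CHECK_RANGES
    )
```

An instance can be reachable from an office range such as `203.0.113.0/24` and still fail every probe, because `probes_allowed(["203.0.113.0/24"])` is false.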

Exam trap

The exam often tests the misconception that health checks originate from the load balancer's frontend IP or that the instance's direct accessibility implies it will pass health checks, ignoring that health check probes come from specific, separate IP ranges that must be explicitly allowed in firewall rules.

How to eliminate wrong answers

Option B is wrong because the instances are explicitly described as 'running and serving traffic on port 80 when accessed directly,' so they have not been deleted. Option C is wrong because the instances are serving traffic on port 80, which matches the specified health check port (port 80), so the port is correct. Option D is wrong because the load balancer is correctly configured to send health checks to port 80, and the instances respond on that port; the issue is that the health check probes are being blocked, not that the load balancer configuration is incorrect.

Option E is wrong because the instances are serving traffic on port 80 when accessed directly, indicating they are not out of memory and are capable of responding; memory exhaustion would prevent all responses, not just health check responses.

96

Multi-Selecthard

Which THREE are valid ways to create custom metrics in Cloud Monitoring? (Select exactly 3.)

Select 3 answers

A.Use the Cloud Billing pricing calculator to estimate metric costs.

B.Use the Cloud Monitoring API to write time series directly.

C.Install the Cloud Monitoring agent and configure custom metrics in its configuration file.

D.Define a log-based metric in Cloud Logging based on log content.

E.Deploy Ops Agent with default configuration.

AnswersB, C, D

Allows programmatic metric creation.

Why this answer

Option B is correct because the Cloud Monitoring API allows you to write time series data directly via the `projects.timeSeries.create` method, which is a primary mechanism for ingesting custom metrics. This enables you to programmatically send metric data from any source, bypassing the need for an agent.
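
To make the shape of such a write concrete, here is a hedged sketch that only builds the JSON body one might send to `projects.timeSeries.create`; it performs no network call and no authentication. The helper name, the `global` monitored resource, and the example metric and label names are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def build_time_series_body(project_id, metric_name, value, labels=None):
    """Build a JSON-serializable body for projects.timeSeries.create.

    Assumption: a gauge-style point with only an endTime, written against
    the generic "global" monitored resource.
    """
    end_time = datetime.now(timezone.utc).isoformat()
    return {
        "timeSeries": [
            {
                "metric": {
                    # Custom metric types live under custom.googleapis.com/.
                    "type": f"custom.googleapis.com/{metric_name}",
                    "labels": dict(labels or {}),
                },
                "resource": {
                    "type": "global",
                    "labels": {"project_id": project_id},
                },
                "points": [
                    {
                        "interval": {"endTime": end_time},
                        "value": {"doubleValue": float(value)},
                    }
                ],
            }
        ]
    }
```

In practice the body would be posted to the API with authenticated credentials, or the equivalent call would be made through a client library.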

Exam trap

The exam often tests the distinction between agent-based collection of predefined metrics (Ops Agent default) and the explicit creation of custom metrics via API or log-based definitions, leading candidates to mistakenly select the Ops Agent default as a valid method for custom metrics.

97

Multi-Selectmedium

A team uses GitHub for source control. They want to automatically trigger Cloud Build builds on pull request creation. Which two actions are required? (Choose two.)

Select 2 answers

A.Install the Cloud Build GitHub app in the repository

B.Create a Cloud Build trigger that listens to 'pull_request' event

C.Configure a webhook in GitHub to send push events to Cloud Build

D.Use Cloud Source Repositories as a mirror of GitHub

E.In the Cloud Build trigger, set the event to 'push' and branch filter to 'pull-request/*'

AnswersA, B

The app is required to allow Cloud Build to receive webhook events from GitHub.

Why this answer

Installing the Cloud Build GitHub app and creating a trigger with the 'pull_request' event are the two necessary steps. Other options are either not needed or incorrect.
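
As a rough illustration of the second step, the sketch below builds a trigger definition in the shape used by the Cloud Build `BuildTrigger` resource, where a `pullRequest` block (rather than a `push` block) selects pull request events. The function name, the trigger naming scheme, and the owner and repository values are hypothetical; the GitHub app must already be installed for such a trigger to fire.

```python
def build_pr_trigger(owner, repo, base_branch_regex="^main$"):
    """Return a trigger definition that fires on pull requests.

    Assumption: builds are described by a cloudbuild.yaml at the repo root.
    """
    return {
        "name": f"{repo}-pull-request",
        "github": {
            "owner": owner,
            "name": repo,
            # pullRequest (not push) makes the trigger react to PR events
            # whose base branch matches the regex.
            "pullRequest": {"branch": base_branch_regex},
        },
        "filename": "cloudbuild.yaml",
    }
```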

98

MCQeasy

A startup is deploying a Node.js application on App Engine Standard Environment. They have configured the application in app.yaml with runtime: nodejs16. After deploying with gcloud app deploy, the deployment succeeds, but when they access the application, they get a 502 Bad Gateway error. They check the logs and see "Failed to start container" and "Error: Cannot find module 'express'". The application uses Express. The team has confirmed that the package.json file includes express as a dependency. What is the most likely cause?

A.The application is running on a different port than the one specified in the environment variable PORT.

B.The node_modules folder was not uploaded because it is in the .gcloudignore file.

C.The package.json file is missing the express dependency.

D.The request exceeds the 60-second timeout.

AnswerB

If node_modules is ignored, App Engine will install dependencies during deployment, but if there is a lockfile issue or missing package.json fields, it may fail. However, the error indicates express is not installed, so the build process may not have run correctly, possibly due to .gcloudignore preventing upload of a needed file.

Why this answer

Option B is correct because the error indicates that the express module cannot be found at runtime. This typically happens when dependencies were not installed or node_modules was not included in the deployment. In App Engine Standard, dependencies are normally installed automatically from package.json during deployment, so a missing module points to the build not producing a complete node_modules. The cause the question accepts is that node_modules was excluded by .gcloudignore and was not rebuilt correctly.

Option A is incorrect because a port mismatch would cause a different error, not a missing-module error. Option C is incorrect because the team has confirmed that package.json lists express as a dependency.

Option D is incorrect because timeout errors are logged as such.

99

Multi-Selectmedium

Which TWO best practices should be followed when deploying a containerized application to Cloud Run for production?

Select 2 answers

A.Use a minimal base image with only necessary dependencies.

B.Set min-instances to 0 to save costs when idle.

C.Set max-instances to unlimited to handle traffic spikes.

D.Configure CPU to be always allocated to reduce latency.

E.Always use the latest public image from Docker Hub for dependencies.

AnswersA, D

Minimal images reduce attack surface and improve start time.

Why this answer

Options A and D are correct. A: Use container images with minimal surface area to reduce vulnerabilities and speed up cold starts. D: Keep CPU always allocated so instances stay responsive and latency is lower.

Option B is wrong because scaling to zero saves cost but exposes production traffic to cold starts. Option C is wrong because max instances should be limited to avoid unbounded cost. Option E is wrong because pulling the latest public image is a security risk.

100

MCQeasy

A developer wants to store application logs from Compute Engine instances in a centralized logging system. Which service should they use?

A.Cloud Monitoring

B.Cloud Trace

C.Cloud Debugger

D.Cloud Logging

AnswerD

Cloud Logging is designed to store, search, and analyze log data.

Why this answer

Option D is correct because Cloud Logging is the centralized logging service for Google Cloud. Option A (Cloud Monitoring) is for monitoring metrics, option B (Cloud Trace) is for tracing, and option C (Cloud Debugger) is for debugging.

101

MCQeasy

A developer runs the above command and receives a successful deployment. However, the service is not accessible from the internet. The service is intended to be public. What should the developer check next?

A.The region us-central1 is not available

B.The Cloud Run service has a custom domain mapped

C.The container image is healthy

D.The service IAM policy to ensure allUsers has Cloud Run Invoker role

AnswerD

This is the most common reason for a publicly inaccessible Cloud Run service after successful deployment.

Why this answer

Option D is correct because Cloud Run services are private by default; even after a successful deployment, the service will not be accessible from the internet unless the IAM policy explicitly grants the `roles/run.invoker` role to `allUsers`. Without this permission, any HTTP request from outside the project will be denied with a 403 Forbidden error, regardless of the service's health or region.
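
The check can be sketched against the standard IAM policy shape (`bindings`, each with a `role` and `members`). This is only an illustration that inspects a policy already fetched as a dictionary; the function name and the example service account are hypothetical.

```python
def is_publicly_invokable(policy: dict) -> bool:
    """Return True if allUsers holds roles/run.invoker in the policy."""
    for binding in policy.get("bindings", []):
        if binding.get("role") != "roles/run.invoker":
            continue
        if "allUsers" in binding.get("members", []):
            return True
    return False


# A service deployed without public access: only a service account may invoke.
private_policy = {
    "bindings": [
        {
            "role": "roles/run.invoker",
            "members": ["serviceAccount:caller@example-project.iam.gserviceaccount.com"],
        }
    ]
}

# The same service after the public binding has been added.
public_policy = {
    "bindings": [{"role": "roles/run.invoker", "members": ["allUsers"]}]
}
```

A policy with no `allUsers` invoker binding, like `private_policy`, matches the symptom in the question: the deployment succeeds but unauthenticated requests are rejected.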

Exam trap

The exam often tests the misconception that a successful deployment or a healthy container automatically makes a service publicly accessible, when in fact Cloud Run requires an explicit IAM binding to allow unauthenticated invocations.

How to eliminate wrong answers

Option A is wrong because `us-central1` is a standard, fully available Google Cloud region; region unavailability would cause a deployment failure, not a post-deployment accessibility issue. Option B is wrong because a custom domain is optional for public access — Cloud Run automatically provides a `*.run.app` URL that is publicly resolvable; the issue is IAM, not DNS. Option C is wrong because a healthy container image is required for a successful deployment, but it does not control network-level access; the container could be perfectly healthy yet still unreachable if IAM denies unauthenticated invocations.

102

MCQmedium

A team is using Cloud Source Repositories and wants to enforce code reviews before merging. What tool should they use?

A.Cloud Source Repositories pull requests without restrictions.

B.Cloud Deploy with manual approval.

C.Cloud Source Repositories with branch protection rules that require pull request reviews and passing status checks.

D.Cloud Build triggers with approval gates.

AnswerC

Enforces mandatory code reviews and CI checks.

Why this answer

Cloud Source Repositories (CSR) integrates with Cloud Build and Git. To enforce mandatory code reviews before merging, you configure branch protection rules on the CSR repository. These rules require pull request reviews and passing status checks (e.g., from Cloud Build), preventing direct pushes to protected branches.

This is the native Git-based mechanism for enforcing review workflows.

Exam trap

The trap here is confusing deployment approval gates (Cloud Deploy or Cloud Build) with repository-level merge controls, leading candidates to pick a CI/CD tool instead of the correct branch protection feature within Cloud Source Repositories.

How to eliminate wrong answers

Option A is wrong because CSR pull requests without restrictions do not enforce code reviews; they allow merging without any approval, defeating the requirement. Option B is wrong because Cloud Deploy is a continuous delivery service for deploying to GKE, Cloud Run, etc., not a code review or repository management tool; its manual approval gates apply to deployment pipelines, not to merging code. Option D is wrong because Cloud Build triggers with approval gates control whether a build runs after a commit, not whether a pull request can be merged; they do not enforce code review requirements on the repository itself.

103

Multi-Selectmedium

A developer wants to automatically detect and capture application errors in a production environment on Google Cloud. Which two Google Cloud services should be enabled? (Choose two.)

Select 2 answers

A.Cloud Error Reporting

B.Cloud Trace

C.Cloud Profiler

D.Cloud Debugger

E.Cloud Logging

AnswersA, E

Cloud Error Reporting automatically detects and groups application errors.

Why this answer

Cloud Error Reporting aggregates and displays application errors in real time, allowing developers to automatically detect and capture errors in production. Cloud Logging stores all application logs, which Error Reporting uses as a source to identify and analyze error events. Together, they provide a complete solution for error detection and capture without manual intervention.

Exam trap

The exam often tests the distinction between monitoring (Error Reporting, Logging) and debugging/tracing tools (Debugger, Trace, Profiler), leading candidates to select Debugger or Trace for error detection when they are designed for different purposes.

104

MCQmedium

An application uses Cloud SQL and is experiencing slow query performance. The team wants to monitor query latency and identify slow queries. Which Google Cloud tool should they use?

A.Cloud SQL Insights

B.Cloud Debugger

C.Cloud Monitoring

D.Cloud Trace

AnswerA

Cloud SQL Insights is designed for query performance monitoring.

Why this answer

Cloud SQL Insights is the correct tool because it is specifically designed to provide detailed query performance diagnostics for Cloud SQL databases. It captures query latency, execution plans, and wait events, enabling teams to identify and troubleshoot slow queries directly within the Cloud SQL console without additional configuration.

Exam trap

The trap here is that candidates often confuse Cloud Trace (which traces request latency across services) with database query tracing, but Cloud Trace does not provide per-query execution plans or database-specific wait events, making Cloud SQL Insights the only tool that directly addresses slow query identification in Cloud SQL.

How to eliminate wrong answers

Option B (Cloud Debugger) is wrong because it is used for inspecting the state of a running application (e.g., capturing variable values and stack traces) in production, not for monitoring database query latency. Option C (Cloud Monitoring) is wrong because while it can collect metrics and set alerts for Cloud SQL, it does not provide per-query latency breakdowns or execution plan analysis; it is a general monitoring tool, not a query-specific diagnostic tool. Option D (Cloud Trace) is wrong because it focuses on end-to-end request latency across distributed services (e.g., HTTP requests), not on individual database query performance within Cloud SQL.

105

MCQeasy

A developer runs the command above. What is the effect of the --promote flag in this deployment?

A.It creates a new default service version with split traffic.

B.It promotes the previous version to receive traffic.

C.It causes the new version (v2) to receive 100% of traffic after deployment.

D.It enables automatic scaling for the new version.

AnswerC

Correct: --promote directs all traffic to the newly deployed version.

Why this answer

The --promote flag causes the newly deployed version to receive all traffic immediately. Without it, the version is deployed but does not receive traffic until manually migrated.

106

MCQmedium

A developer deploys this Cloud Run service. During a load test, each incoming request starts a new container instance, even though concurrency is set to 80. What is the reason?

A.The memory limit is too low

B.The container is CPU-bound and cannot handle multiple requests concurrently

C.The CPU limit is too low

D.The concurrency setting of 80 is too high and Cloud Run ignores it

E.The container is not designed to handle multiple concurrent requests (single-threaded)

AnswerE

If the container processes one request at a time, Cloud Run will start a new instance per request.

Why this answer

Option E is correct because Cloud Run's concurrency setting controls how many requests the runtime can send to a container instance, but the container itself must be capable of handling those requests concurrently. If the application is single-threaded or uses a blocking I/O model (e.g., a simple Flask or Express server without async workers), it can only process one request at a time. Cloud Run detects that the container is busy and starts a new instance for each incoming request, effectively ignoring the concurrency setting.

Exam trap

The exam often tests the misconception that Cloud Run's concurrency setting is a hard limit that the platform enforces regardless of application design, when in reality the application must be capable of handling concurrent requests for the setting to take effect.

How to eliminate wrong answers

Option A is wrong because a low memory limit would cause out-of-memory errors or container restarts, not the creation of a new container instance per request. Option B is wrong because being CPU-bound does not prevent a container from handling multiple concurrent requests; it may slow down processing, but Cloud Run still sends multiple requests to the same instance if concurrency is set. Option C is wrong because a low CPU limit would throttle CPU usage, not force a new instance per request; the container would still receive concurrent requests, just processed more slowly.

Option D is wrong because Cloud Run does not ignore a concurrency setting of 80; it respects the setting as long as the container can handle the load, but if the container is single-threaded, it effectively becomes a bottleneck.

107

MCQmedium

You are a cloud architect at a financial services company. The company is deploying a new application on Google Kubernetes Engine (GKE) that processes sensitive financial transactions. The application must be highly available across two regions (us-central1 and europe-west1) and must fail over automatically if one region becomes unavailable. The application uses Cloud Spanner as its primary database. Additionally, the application needs to send audit logs to a centralized Cloud Storage bucket for compliance. The current design uses GKE clusters in each region with a global HTTP(S) load balancer. However, during a recent test, when the us-central1 cluster was deliberately taken down, the load balancer continued to send traffic to that region, causing errors. You need to troubleshoot and fix the issue. What is the most likely cause and the best solution?

A.The load balancer is configured with only one backend service. Solution: Create a separate backend service for each region.

B.The load balancer backend lacks a health check that marks the backend as unhealthy when the cluster is down. Solution: Configure a health check on the backend service that points to a readiness endpoint on the GKE cluster.

C.The Cloud Spanner instance is not configured with multi-region replication, causing write failures. Solution: Configure Cloud Spanner as a multi-region instance.

D.The GKE clusters are not configured as network endpoint groups (NEGs). Solution: Create NEGs for each cluster and use them as backends.

AnswerB

Health checks are required for the load balancer to stop routing traffic to unhealthy backends.

Why this answer

The issue is that the global HTTP(S) load balancer continues to send traffic to the us-central1 region because its backend service lacks a health check that can detect when the GKE cluster is down. By configuring a health check on the backend service that probes a readiness endpoint (e.g., /healthz) on the GKE cluster, the load balancer will automatically stop routing traffic to the unhealthy region and fail over to the healthy region. This ensures high availability across the two regions as required.
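
The effect of the health check on routing can be illustrated with a toy model. This is not how the load balancer is implemented; it is a small simulation, with hypothetical names, showing that a backend only drops out of rotation once something marks it unhealthy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    region: str
    healthy: bool  # In the real system this is set by health check probes.


def route(backends: List[Backend], preferred_region: str) -> str:
    """Pick the preferred region if healthy, else any healthy region."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backends available")
    for backend in healthy:
        if backend.region == preferred_region:
            return backend.region
    return healthy[0].region


backends = [
    Backend(region="us-central1", healthy=False),  # cluster taken down
    Backend(region="europe-west1", healthy=True),
]
```

With the us-central1 backend marked unhealthy, `route(backends, "us-central1")` falls back to `europe-west1`. Without a health check, nothing would ever flip `healthy` to false, which is the failure described in the question.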

Exam trap

The trap here is that candidates often confuse infrastructure-level health checks (e.g., instance group health) with application-level health checks, or assume that GKE's built-in ingress controller automatically configures health checks for regional failover, when in fact the load balancer backend service must have an explicit health check configured to detect a complete regional cluster outage.

How to eliminate wrong answers

Option A is wrong because creating separate backend services for each region does not solve the problem; the load balancer already uses separate backends (one per region) via the GKE ingress, but without proper health checks, it cannot detect regional failure. Option C is wrong because the issue is about traffic routing from the load balancer, not database write failures; Cloud Spanner multi-region replication is important for database availability but does not affect load balancer traffic distribution. Option D is wrong because while NEGs are a best practice for GKE with load balancers, the core issue is the absence of health checks, not the use of NEGs; NEGs alone do not enable automatic failover without health checks.

108

Drag & Dropmedium

Drag and drop the steps to set up a Cloud Function triggered by a Cloud Storage event in the correct order.

Steps

Order

Why this order

Cloud Functions can be triggered by Cloud Storage events; deployment includes specifying the bucket trigger.

109

MCQhard

An application running on Compute Engine generates structured logs. The operations team needs to parse a specific field from the logs and create a metric that counts occurrences of a particular value. They want the metric to be available for alerting with minimal delay. What should they do?

A.Export logs to BigQuery and use scheduled queries

B.Write a Cloud Function to process logs from Pub/Sub

C.Create a log-based metric in Cloud Logging

D.Use the Cloud Monitoring agent to collect logs

AnswerC

Log-based metrics are designed for this use case and provide low-latency metrics.

Why this answer

Log-based metrics in Cloud Logging are designed to extract specific fields from structured logs and count occurrences of particular values with near-real-time latency, making them ideal for alerting with minimal delay. They are natively integrated with Cloud Monitoring, so the metric is automatically available for alerting policies without additional infrastructure or data movement.
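
What a counter-type log-based metric computes can be sketched locally. The snippet below is only an analogue run over in-memory entries, not the Cloud Logging service itself; `jsonPayload` is the structured-payload field of a log entry, while the function name and the `status` field are hypothetical.

```python
from typing import Iterable


def count_matching(entries: Iterable[dict], field: str, value) -> int:
    """Count log entries whose structured payload has field == value."""
    return sum(
        1
        for entry in entries
        if entry.get("jsonPayload", {}).get(field) == value
    )


sample_entries = [
    {"jsonPayload": {"status": "FAILED", "job": "export"}},
    {"jsonPayload": {"status": "OK", "job": "export"}},
    {"jsonPayload": {"status": "FAILED", "job": "import"}},
    {"textPayload": "unstructured line"},  # no jsonPayload, never matches
]
```

Here `count_matching(sample_entries, "status", "FAILED")` yields 2; the managed metric does the equivalent counting continuously and exposes the result to alerting policies.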

Exam trap

The exam often tests the distinction between log-based metrics (native, low-latency) and log export to external systems (higher latency, more complex), tempting candidates to choose BigQuery or Pub/Sub because they seem more powerful for analysis, but they are not optimal for real-time alerting.

How to eliminate wrong answers

Option A is wrong because exporting logs to BigQuery and using scheduled queries introduces significant latency (minutes to hours) due to export batching and query scheduling, which is unsuitable for alerting with minimal delay. Option B is wrong because writing a Cloud Function to process logs from Pub/Sub adds unnecessary complexity, latency, and cost; Cloud Functions are event-driven but still require setting up a Pub/Sub sink and custom code, whereas log-based metrics provide a simpler, native solution with lower overhead. Option D is wrong because the Cloud Monitoring agent collects metrics from VM instances, not logs; it cannot parse structured log fields or create count-based metrics from log content.

110

MCQmedium

A team uses Cloud Endpoints to manage their API. They want to monitor API latency for each API method. What is the recommended approach?

A.Parse Cloud Logging endpoint logs to calculate latency.

B.Use Cloud Trace to analyze samples and estimate latency.

C.Instrument the API code with a custom metric for each method.

D.View the built-in Cloud Endpoints latency metrics in Cloud Monitoring.

AnswerD

Endpoints exports per-method latency metrics automatically.

Why this answer

Cloud Endpoints automatically sends metrics including request latency per method to Cloud Monitoring. Cloud Trace can trace individual requests but not aggregate per method easily. Custom metrics require code changes.

Cloud Logging latency is not built-in.

111

Matchingmedium

Match each Cloud Storage class to its typical use case.

Concepts

Matches

Frequently accessed data

Data accessed less than once a month

Data accessed less than once a quarter

Long-term archival data accessed less than once a year

Automatic transition between classes based on access patterns

Why these pairings

Cloud Storage offers different storage classes for cost optimization.

112

MCQmedium

A developer is designing a data pipeline using Pub/Sub and Dataflow. They need to guarantee at-least-once delivery with no duplicates in the sink. Which Dataflow feature should they use?

A.Exactly-once processing

B.Checkpointing

C.Idempotent writes

D.Windowing

AnswerA

Exactly-once processing ensures each record is processed once, eliminating duplicates in the sink.

Why this answer

Option A is correct because Dataflow's exactly-once processing (also known as 'exactly-once semantics' or 'EOS') ensures that each record is processed exactly once, even if the pipeline restarts or fails. This eliminates duplicates in the sink while still guaranteeing at-least-once delivery from Pub/Sub, because Dataflow uses a combination of source-side deduplication (via Pub/Sub message IDs) and sink-side idempotent writes (via the Dataflow sink's commit protocol). The result is that no duplicate records are written to the sink, meeting the requirement of no duplicates.
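
The deduplication idea can be shown with a small stand-in. This is not Dataflow's implementation, only a sketch under the assumption that each message carries a stable ID and that redelivery repeats that ID; all names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def write_once(
    messages: Iterable[Tuple[str, str]],
    sink: List[str],
    seen_ids: Set[str],
) -> None:
    """Append each payload to sink at most once, keyed by message ID."""
    for message_id, payload in messages:
        if message_id in seen_ids:
            continue  # redelivered message: already written, skip it
        seen_ids.add(message_id)
        sink.append(payload)


sink: List[str] = []
seen: Set[str] = set()

# At-least-once delivery: message "m2" arrives twice.
write_once([("m1", "order-1"), ("m2", "order-2"), ("m2", "order-2")], sink, seen)
# A retry after a simulated failure replays "m1" and delivers a new "m3".
write_once([("m1", "order-1"), ("m3", "order-3")], sink, seen)
```

After both calls the sink holds each order exactly once, even though two messages were delivered more than once.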

Exam trap

The trap here is that candidates confuse 'idempotent writes' (a sink-side property) with Dataflow's built-in 'exactly-once processing' feature, or they mistakenly think checkpointing alone eliminates duplicates, when in fact checkpointing only saves state and does not prevent duplicate writes to the sink.

How to eliminate wrong answers

Option B is wrong because checkpointing is a mechanism for saving pipeline state (e.g., snapshots of progress) to enable recovery after failures, but it does not by itself prevent duplicates in the sink — it only ensures that processing can resume from the last checkpoint, which may still cause duplicate writes if the sink is not idempotent. Option C is wrong because idempotent writes are a property of the sink (e.g., BigQuery's insertId or Cloud Storage's generation number) that allows the same write to be applied multiple times without creating duplicates, but the question asks for a Dataflow feature, not a sink feature; Dataflow's exactly-once processing uses idempotent writes as part of its implementation, but the feature itself is exactly-once processing. Option D is wrong because windowing is a Dataflow feature that groups unbounded data into finite windows (e.g., fixed, sliding, session) for aggregation or processing, but it has no direct role in guaranteeing at-least-once delivery or preventing duplicates — it is a time-based grouping mechanism, not a delivery semantics feature.

113

MCQmedium

Your team manages a serverless application deployed on Cloud Run. The application processes image uploads and stores metadata in Firestore. You have set up a Cloud Monitoring alert based on the 'request_count' metric for the Cloud Run service. The alert triggers when the request count exceeds 1000 requests per minute. Recently, the alert has been firing frequently, but the team notices that the application is performing well and there are no errors. The team is concerned about alert fatigue. You review the metric and notice that the request count metric is based on all HTTP requests, including health checks from the Cloud Run system. The health check requests account for about 30% of the total requests. What should you do to reduce unnecessary alerts while still monitoring real user traffic?

A.Increase the alert threshold to 1500 requests per minute

B.Create a new log-based metric that filters out health check requests, and use that in the alert

C.Disable health checks on the Cloud Run service

D.Configure the existing metric to exclude health check logs

AnswerB

This metric will only count user requests, reducing noise.

Why this answer

Option B is correct because creating a new log-based metric that filters out health check requests allows you to monitor only real user traffic. Cloud Run's system health checks (e.g., from the Cloud Run infrastructure) are included in the default 'request_count' metric, inflating the count. By using a log-based metric with a filter that excludes these health check requests, you can set an accurate alert threshold based on actual user demand, reducing alert fatigue without losing visibility into real issues.

Exam trap

The exam often tests the misconception that you can modify built-in metrics or that simply adjusting thresholds is sufficient, when in reality you must create a custom metric to filter out noise like health checks.

How to eliminate wrong answers

Option A is wrong because simply increasing the threshold to 1500 requests per minute does not address the root cause—health check requests are still included, and the threshold may still be exceeded by a combination of real traffic and health checks, or it may be too high to detect real traffic spikes. Option C is wrong because disabling health checks on Cloud Run is not recommended; health checks are essential for ensuring the service is healthy and for routing traffic correctly, and disabling them could cause the service to be marked unhealthy or stop receiving traffic. Option D is wrong because the existing 'request_count' metric is a built-in metric that cannot be configured to exclude specific logs; you must create a new custom log-based metric with a filter to exclude health check requests.

114

MCQmedium

A company is deploying a batch job that runs once a day on Compute Engine. They are using a startup script to install dependencies and run the job. The job writes output to Cloud Storage. Recently, the job started failing intermittently with "No space left on device" errors, even though the persistent disk has 100 GB free. The team has verified that the disk is not fragmented and that the inode usage is low. The job processes large files and creates many temporary files in /tmp. They suspect the /tmp directory is filling up. What is the most likely cause?

A.The /tmp partition is using a small temporary disk that is separate from the persistent disk.

B.The instance's RAM is insufficient, causing swap to fill the disk.

C.The startup script is not cleaning up temporary files.

D.The Cloud Storage bucket quota is exceeded.

AnswerA

Often /tmp is a tmpfs with limited capacity; creating too many temporary files fills it.

Why this answer

Option A is correct because on some Compute Engine images, /tmp is mounted as a tmpfs (in-memory filesystem) which has a limited size, often a fraction of the instance's memory. When the job creates many temporary files, it can fill the tmpfs, causing "No space left on device" even though the persistent disk has ample free space. Option C is incorrect because while cleanup would help, the root cause is limited space in /tmp.

Option B is incorrect because insufficient RAM would cause swapping, not a filesystem-full error. Option D is incorrect because Cloud Storage quota would produce errors when writing to the bucket, not on the local filesystem.
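
One way to confirm this kind of diagnosis is to compare free space on the temporary directory with free space on the root filesystem. The sketch below uses only the standard library; the function name is hypothetical, and the numbers it reports depend entirely on the machine it runs on.

```python
import os
import shutil
import tempfile
from typing import Dict, List


def free_space_report(paths: List[str]) -> Dict[str, int]:
    """Return free bytes for the filesystem that holds each path."""
    return {path: shutil.disk_usage(path).free for path in paths}


# Compare the temp directory with the root of the filesystem.
report = free_space_report([tempfile.gettempdir(), os.sep])
for path, free_bytes in report.items():
    print(f"{path}: {free_bytes / 1024 ** 3:.1f} GiB free")
```

If the temporary directory reports far less free space than the root filesystem, it is on a separate, smaller mount, and pointing the job at a directory on the persistent disk avoids the error.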

115

MCQeasy

A startup expects low and predictable traffic initially but wants to use containers with minimal operational overhead. Which compute service should they choose?

A.App Engine Flexible Environment

B.Google Kubernetes Engine (GKE)

C.Cloud Run

D.Compute Engine

E.Cloud Functions

AnswerC

Fully managed, autoscaling, no infrastructure to manage.

Why this answer

Cloud Run is the correct choice because it runs containers in a fully managed, serverless environment that automatically scales from zero, requires no cluster management, and charges only for resources used during request processing. This matches the startup's need for minimal operational overhead and low, predictable traffic, as Cloud Run abstracts away infrastructure management entirely.

Exam trap

Google Cloud often tests the distinction between serverless containers (Cloud Run) and managed Kubernetes (GKE), where candidates mistakenly choose GKE for container support without considering the operational overhead of cluster management.

How to eliminate wrong answers

Option A is wrong because App Engine Flexible Environment requires managing VM instances and has a minimum of 1 instance running, incurring cost even with no traffic, and does not offer the same zero-scaling efficiency as Cloud Run. Option B is wrong because Google Kubernetes Engine (GKE) requires managing a Kubernetes cluster, including node pools, upgrades, and networking, which adds significant operational overhead unsuitable for minimal management. Option D is wrong because Compute Engine requires full VM management, including OS patching, scaling configuration, and capacity planning, contradicting the goal of minimal operational overhead.

Option E is wrong because Cloud Functions is for event-driven, short-lived code snippets, not for running containers, and has a 9-minute timeout and limited runtime support, making it unsuitable for containerized applications.

116

MCQhard

A team is deploying a microservices application on Cloud Run and needs to implement canary deployments with traffic splitting. They are using Cloud Deploy. What is the correct configuration to gradually shift traffic from the old revision to the new revision?

A.Use Cloud Build to deploy with a script that gradually increases traffic using the Cloud Run API.

B.Use a Cloud Deploy pipeline with a blue-green strategy that swaps all traffic at once.

C.Use a Cloud Deploy delivery pipeline with a canary strategy that specifies percentages like [5, 10, 50, 100] and includes a verification step.

D.Use Cloud Run's built-in traffic splitting with `gcloud run deploy --traffic` and manage manually.

AnswerC

This leverages Cloud Deploy's built-in canary deployment capability with progressive traffic shifting.

Why this answer

Option C is correct because Cloud Deploy natively supports canary deployments with traffic splitting for Cloud Run. By defining a canary strategy with incremental percentages (e.g., [5, 10, 50, 100]) and including a verification step, the pipeline automatically shifts traffic in stages, pausing for verification at each phase to ensure the new revision is healthy before progressing. This approach integrates directly with Cloud Deploy's delivery pipeline, eliminating the need for manual scripts or external API calls.

Exam trap

The trap here is that candidates often confuse Cloud Run's manual traffic splitting (`gcloud run deploy --traffic`) with Cloud Deploy's automated canary pipeline, assuming manual commands are sufficient for gradual shifts, but the exam requires understanding that Cloud Deploy provides the orchestration, verification, and rollback needed for production canary deployments.

How to eliminate wrong answers

Option A is wrong because using Cloud Build with a script to gradually increase traffic via the Cloud Run API bypasses Cloud Deploy's native canary support, adding unnecessary complexity and losing pipeline observability and rollback capabilities. Option B is wrong because a blue-green strategy swaps all traffic at once, which contradicts the requirement for gradual traffic shifting; it does not support incremental percentages. Option D is wrong because using `gcloud run deploy --traffic` manually requires ongoing manual intervention and does not leverage Cloud Deploy's automated pipeline, verification steps, or rollback mechanisms.
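The control flow that a canary strategy automates can be sketched in a few lines. The example below is a plain-Python model of phased traffic shifting with a verification gate; it does not call Cloud Deploy or Cloud Run, and the function name and log messages are illustrative assumptions.

```python
from typing import Callable, List


def run_canary(percentages: List[int], verify: Callable[[int], bool]) -> List[str]:
    """Walk through the canary phases, verifying each one before advancing.

    Returns a log of the actions taken. This models the control flow only.
    """
    log: List[str] = []
    for pct in percentages:
        log.append(f"route {pct}% of traffic to the new revision")
        if not verify(pct):
            # A failed verification stops the rollout and restores the old revision.
            log.append("verification failed: route 100% back to the old revision")
            return log
    log.append("rollout complete")
    return log
```

Calling `run_canary([5, 10, 50, 100], verify)` with a `verify` callback that fails at some phase stops the rollout there, which is the behavior a manual `gcloud run deploy --traffic` workflow would have to reimplement by hand.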

117

MCQhard

A company deploys a stateful application on GKE using a StatefulSet with PersistentVolumeClaims (PVCs). After a node failure, the pod is rescheduled to another node but the PVC remains in 'Pending' state. What is the most likely reason?

A.The PVC is bound to a PV that is still attached to the failed node.

B.The StorageClass has reclaimPolicy: Delete so the PV was deleted.

C.The PV's claimRef still points to the old PVC UID and is in Released state.

D.The StatefulSet's pod management policy prevents reattachment.

AnswerC

With a Retain reclaim policy, a released PV keeps its claimRef, which must be removed before the PV can be reused.

Why this answer

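Option C is correct because the PersistentVolume (PV) is in the 'Released' state: its claimRef still references the UID of the old PVC, so it cannot be bound again without manual intervention. Option A is wrong because a PVC in 'Pending' state is not bound to any PV. Option B is wrong because rescheduling a pod does not delete the PV.

Option D is wrong because the pod management policy only controls the order in which a StatefulSet creates and deletes pods, not volume binding.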
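The manual intervention for a Released PV is to clear `spec.claimRef`, after which the PV becomes Available again. The sketch below only builds the merge-patch body and checks a PV object represented as a plain dictionary; the helper names are hypothetical, and it does not talk to a cluster.

```python
import json


def can_rebind(pv: dict) -> bool:
    """A PV can bind to a new claim only when it is Available and has no claimRef."""
    phase = pv.get("status", {}).get("phase")
    claim_ref = pv.get("spec", {}).get("claimRef")
    return phase == "Available" and claim_ref is None


def clear_claim_ref_patch() -> str:
    """Return a merge-patch body that removes spec.claimRef from a PV."""
    return json.dumps({"spec": {"claimRef": None}})
```

The resulting body is what would be passed to a command such as `kubectl patch pv <pv-name> -p '{"spec":{"claimRef":null}}'`, where `<pv-name>` is a placeholder.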

118

MCQmedium

A stateful service on GKE needs to persist data that must be accessible from any pod in the cluster, regardless of which node the pod runs on. Which volume type should they use?

A.PersistentVolumeClaim with RWX access mode

B.emptyDir

C.ConfigMap

D.hostPath

AnswerA

PersistentVolumeClaim with ReadWriteMany allows multiple pods to access the same volume concurrently, even across nodes.

Why this answer

A PersistentVolumeClaim (PVC) with RWX (ReadWriteMany) access mode is correct because it allows multiple pods across different nodes to read and write to the same persistent volume simultaneously. This is essential for a stateful service where data must be accessible from any pod in the cluster, regardless of which node the pod runs on. RWX is typically backed by network filesystems like NFS or GKE Filestore, which provide shared access across nodes.

Exam trap

Google Cloud often tests the distinction between access modes (RWO, ROX, RWX) and candidates mistakenly choose hostPath or emptyDir because they think local storage is sufficient, overlooking the requirement for cross-node accessibility.

How to eliminate wrong answers

Option B (emptyDir) is wrong because it creates a temporary directory that is tied to the lifecycle of a pod and is not shared across pods on different nodes; data is lost when the pod is deleted. Option C (ConfigMap) is wrong because it is designed for storing non-sensitive configuration data as key-value pairs, not for persistent storage of application data; it cannot be used for read/write operations by pods. Option D (hostPath) is wrong because it mounts a file or directory from the host node's filesystem into a pod, making data inaccessible from pods running on other nodes and violating the requirement for cluster-wide accessibility.
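As an illustration, a claim that requests shared access differs from a typical single-node claim mainly in its `accessModes` field. The sketch below generates such a manifest as JSON, which Kubernetes accepts alongside YAML. The claim name, size, and the `standard-rwx` storage class are assumptions that depend on how the cluster's Filestore or NFS provisioning is set up.

```python
import json


def rwx_claim(name: str, size: str, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim manifest that requests ReadWriteMany access."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany lets pods on different nodes mount the volume read-write.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,  # assumed RWX-capable class
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # All three argument values are hypothetical.
    print(json.dumps(rwx_claim("shared-data", "1Ti", "standard-rwx"), indent=2))
```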

119

MCQeasy

During a code review, a developer notices that the application's Cloud Storage client library is using the default credentials of the Compute Engine instance. What is a more secure alternative for a production environment?

A.Create a dedicated service account with minimal permissions and attach it to the instance

B.Store user credentials in a configuration file

C.Use an API key for Cloud Storage

D.Generate an access token and embed it in the code

AnswerA

This follows the principle of least privilege and avoids using default credentials.

Why this answer

Option A is correct because creating a dedicated service account with minimal permissions and attaching it to the Compute Engine instance follows the principle of least privilege. This avoids using the overly permissive default Compute Engine service account, which often has broad access to many Google Cloud services. By scoping the service account to only the required Cloud Storage permissions (e.g., roles/storage.objectViewer), you reduce the attack surface and adhere to production security best practices.

Exam trap

Google Cloud often tests the misconception that the default Compute Engine service account is acceptable for production, when in fact it is overly permissive and should be replaced with a custom service account scoped to the minimum required roles.

How to eliminate wrong answers

Option B is wrong because storing user credentials in a configuration file on the instance is insecure; credentials can be exposed via file read vulnerabilities or accidental commits, and user credentials are not designed for server-to-server service calls. Option C is wrong because API keys are a simplistic authentication mechanism that do not support fine-grained access control, are tied to the project rather than a specific identity, and are vulnerable to leakage in URLs or logs. Option D is wrong because embedding an access token directly in code is a severe security anti-pattern; tokens expire and require rotation, and hardcoding them makes them impossible to revoke or rotate without redeploying the application.

120

MCQmedium

Refer to the exhibit. A Cloud Build config deploys a new image to GKE. After the build succeeds, the pods restart with the new image but the application configuration is unchanged. What is the most likely cause?

A.The ConfigMap is not updated with the new configuration values.

B.The deployment rollout strategy is set to Recreate, causing downtime.

C.The new image is not being pulled because of imagePullPolicy: IfNotPresent.

D.The GKE cluster does not have sufficient permissions to pull from Container Registry.

AnswerA

Correct; the application config is stored in a ConfigMap that is not refreshed during deployment.

Why this answer

A is correct because a Cloud Build config that deploys a new image to GKE does not automatically update the ConfigMap. The pods restart with the new image, but the application configuration remains unchanged because the ConfigMap still holds the old values. To apply new configuration, the ConfigMap must be updated separately, and the pods must be restarted or redeployed to pick up the changes.

Exam trap

Google Cloud often tests the misconception that deploying a new image automatically updates the application configuration, when in fact ConfigMaps and Secrets must be updated independently.

How to eliminate wrong answers

Option B is wrong because the Recreate rollout strategy would cause downtime, but it would still apply the new image and any updated configuration; the question states the application configuration is unchanged, not that there is downtime. Option C is wrong because imagePullPolicy: IfNotPresent only affects whether the image is pulled if it already exists locally; it does not prevent the new image from being pulled if the tag is different (e.g., a new digest or tag). Option D is wrong because if the GKE cluster lacked permissions to pull from Container Registry, the build would fail or the pods would fail to start with an ImagePullBackOff error, not simply restart with unchanged configuration.

121

MCQeasy

A developer is building a CI/CD pipeline for a microservices application. The pipeline should build a container image, run unit tests, and deploy to Google Kubernetes Engine (GKE) only if all tests pass. Which Google Cloud service is best suited for orchestrating this pipeline?

A.Cloud Build

B.Compute Engine

C.Cloud Run

D.Cloud Functions

AnswerA

Cloud Build is the native CI/CD service for building, testing, and deploying on Google Cloud.

Why this answer

Cloud Build is the correct choice because it is a fully managed CI/CD platform that natively supports building container images, running unit tests, and deploying to GKE. It can be configured with a cloudbuild.yaml file to define steps for building, testing, and deploying, and it only proceeds to the deploy step if all prior steps (including tests) succeed. This makes it the best fit for orchestrating the entire pipeline in a single, integrated service.

Exam trap

The trap here is that candidates may confuse Cloud Run (a deployment target) with a CI/CD orchestrator, or assume Compute Engine is needed for custom CI/CD tools, but Cloud Build is the native, fully managed service for this exact pipeline workflow.

How to eliminate wrong answers

Option B (Compute Engine) is wrong because it provides raw virtual machines, not a CI/CD orchestration service; you would need to manually install and manage CI/CD tools like Jenkins or GitLab Runner, which adds overhead and lacks native integration with GKE. Option C (Cloud Run) is wrong because it is a serverless compute platform for running stateless containers, not a CI/CD pipeline orchestrator; it cannot build images or run tests as part of a pipeline. Option D (Cloud Functions) is wrong because it is an event-driven compute service for single-purpose functions, not designed for multi-step CI/CD workflows; it lacks built-in support for building container images or deploying to GKE.

122

Multi-Selectmedium

A developer is deploying a Python web application to App Engine Flexible Environment. The application requires a specific third-party binary that is not pre-installed on the runtime image. Which two steps should the developer take to ensure the binary is available? (Choose two.)

Select 2 answers

A.Configure a VM-level startup script in the Google Cloud Console.

B.Specify the binary as a dependency in the requirements.txt file.

C.Include the binary in the application's Git repository and reference it in the app.yaml.

D.Use a startup script in the app.yaml to install the binary.

E.Add the binary installation commands to a Dockerfile and use a custom runtime.

AnswersD, E

Startup scripts in app.yaml can run commands to install binaries.

Why this answer

Option D is correct because App Engine Flexible Environment supports a `startup_script` field in `app.yaml` that runs shell commands during instance initialization, allowing installation of third-party binaries. Option E is correct because using a custom runtime with a Dockerfile gives full control over the base image and dependencies, enabling the developer to install any required binary via `RUN` commands.

Exam trap

The trap here is that candidates confuse App Engine Flexible Environment's `startup_script` with Compute Engine's VM-level startup scripts, or assume that `requirements.txt` can handle system dependencies, when in fact it only manages Python packages.

123

MCQeasy

A developer wants to deploy a Cloud Function that connects to a Cloud SQL database. What is the simplest way to securely inject database credentials?

A.Store credentials in the Cloud Function code as environment variables.

B.Use Cloud Key Management Service to encrypt credentials and pass them via HTTP headers.

C.Use Secret Manager to store and access the database password.

D.Embed credentials in the database connection string in the source code.

AnswerC

Secret Manager provides secure storage and access control for secrets.

Why this answer

Option C is correct because Secret Manager provides a secure, centralized service for storing sensitive data like database passwords, and the Cloud Function can access the secret at runtime via the Secret Manager API or by mounting it as a volume. This avoids hardcoding credentials in code or environment variables, which can be exposed in logs or source control. It is the simplest and most secure approach recommended by Google Cloud for injecting database credentials into Cloud Functions.

Exam trap

The trap here is that candidates often regard environment variables (Option A) as a secure method because they are not in source code, but the exam tests the understanding that environment variables in serverless environments can still be exposed through logs or the console, whereas Secret Manager provides dedicated encryption and access control.

How to eliminate wrong answers

Option A is wrong because storing credentials as environment variables in Cloud Function code is not secure; environment variables can be exposed in logs, error messages, or through the Cloud Functions UI, and they do not provide encryption at rest or access control. Option B is wrong because using Cloud KMS to encrypt credentials and passing them via HTTP headers is unnecessarily complex and insecure; HTTP headers are visible in transit unless TLS is used (which is standard), but the decryption key management adds overhead, and this approach does not integrate natively with Cloud Functions' runtime. Option D is wrong because embedding credentials in the database connection string in the source code is a security risk; it exposes secrets in version control, build artifacts, and logs, violating the principle of least privilege and making rotation difficult.
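Reading the password at runtime could look like the sketch below. It assumes the `google-cloud-secret-manager` client library is installed and that the function's service account is allowed to access the secret; the project and secret IDs are hypothetical. The client library is imported inside the function so that the name helper can be used on its own.

```python
def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the resource name of a secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def read_secret(project_id: str, secret_id: str) -> str:
    """Fetch a secret value at runtime instead of hardcoding it."""
    # Lazy import: requires the google-cloud-secret-manager package.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id)}
    )
    return response.payload.data.decode("UTF-8")
```

A function would then call something like `read_secret("my-project", "db-password")` once at startup and reuse the value across invocations, rather than fetching it on every request.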

124

MCQhard

An organization deploys a critical application on GKE with multiple namespaces. They want to enforce that only certain images from approved Artifact Registry repositories can be deployed in the production namespace. Which GKE feature should they use?

A.Binary Authorization

B.Network Policies

C.Workload Identity

D.Pod Security Policies (deprecated)

AnswerA

Binary Authorization enforces policies on container images based on attestations.

Why this answer

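Binary Authorization enforces deploy-time policies based on image attestations. Option B (Network Policies) controls network traffic, not which images can be deployed.

Option C (Workload Identity) is for mapping workloads to service accounts. Option D (Pod Security Policies) is deprecated and not image-based.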

125

MCQhard

You are designing a data pipeline that ingests streaming data from IoT devices using Cloud IoT Core, processes it with Dataflow, and stores results in BigQuery. The data volume is expected to be 10 GB per day with occasional spikes. You need to minimize processing latency and cost. Which configuration should you choose for the Dataflow pipeline?

A.Use streaming mode with autoscaling and maximum workers set to 10.

B.Use Dataflow Prime for automatic optimization.

C.Use streaming mode with streaming engine enabled and 2 workers.

D.Use batch mode with a fixed number of workers to reduce cost.

AnswerC

Streaming engine reduces latency and cost for moderate throughput.

Why this answer

Option C is correct because streaming mode with Streaming Engine is designed for low-latency, continuous data ingestion from IoT Core, and setting 2 workers minimizes cost while handling the expected 10 GB/day volume with occasional spikes through autoscaling. Streaming Engine offloads state management to the backend, reducing worker overhead and improving latency, making it ideal for this use case.

Exam trap

Google Cloud often tests the misconception that batch mode is cheaper for streaming data, but the trap here is that batch mode incurs higher latency and requires manual triggering, making it unsuitable for real-time IoT pipelines despite lower compute cost per GB.

How to eliminate wrong answers

Option A is wrong because setting maximum workers to 10 may over-provision resources for a 10 GB/day workload, increasing cost without latency benefit, and autoscaling alone doesn't guarantee the low-latency optimization that Streaming Engine provides. Option B is wrong because Dataflow Prime is a premium feature that adds cost for automatic optimization, which is unnecessary for this predictable, moderate-volume streaming workload and does not inherently minimize latency or cost compared to Streaming Engine. Option D is wrong because batch mode is designed for finite, bounded data and introduces higher latency (minutes to hours) due to windowing and triggering, which is unsuitable for real-time IoT streaming data that requires low processing latency.

126

MCQmedium

A team is setting up a CI/CD pipeline for a Node.js App Engine application using Cloud Build. The source code is in Cloud Source Repositories. What must be configured to automatically run unit tests before deployment?

A.Enable Cloud Build triggers on the repository

B.Use the App Engine deployment wizard

C.Add a cloudbuild.yaml file with a test step

D.Use a Dockerfile to run tests

AnswerC

The build config defines the steps, including running tests; a trigger can then invoke it on push.

Why this answer

Option C is correct because Cloud Build uses a cloudbuild.yaml file to define build steps, and adding a test step ensures unit tests run automatically before deployment. Without this configuration, Cloud Build will not execute tests; it only runs the steps explicitly defined in the build configuration file.

Exam trap

Google Cloud often tests the misconception that enabling a trigger alone is sufficient to run tests, when in fact the trigger only initiates the build; the actual test execution must be explicitly defined in the build configuration file.

How to eliminate wrong answers

Option A is wrong because enabling Cloud Build triggers on the repository only starts the build process on code changes, but does not define what steps (like tests) to run; triggers alone do not execute tests. Option B is wrong because the App Engine deployment wizard is a manual GUI tool in the Google Cloud Console, not an automated CI/CD pipeline component, and it does not integrate with Cloud Build to run tests. Option D is wrong because a Dockerfile is used to build a container image, not to define CI/CD pipeline steps; Cloud Build ignores Dockerfiles for pipeline logic and requires a cloudbuild.yaml for test execution.
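To make the ordering concrete, the sketch below generates a build configuration with a test step placed before the deploy step. Cloud Build accepts the configuration as JSON as well as YAML, so the output could be saved as `cloudbuild.json`. The choice of the `node` image and the exact arguments are assumptions for a typical Node.js project, not taken from the question.

```python
import json


def build_config() -> dict:
    """Steps run in order; if the test step fails, the deploy step never runs."""
    return {
        "steps": [
            {"name": "node", "entrypoint": "npm", "args": ["install"]},
            {"name": "node", "entrypoint": "npm", "args": ["test"]},
            {"name": "gcr.io/cloud-builders/gcloud", "args": ["app", "deploy"]},
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_config(), indent=2))
```

A trigger on the repository then only decides when this configuration runs; the steps themselves decide what runs.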

127

Multi-Selectmedium

A company is deploying a containerized application on Cloud Run that requires access to a Cloud SQL PostgreSQL instance. The application needs to connect to the database using private IP to minimize latency and avoid public internet exposure. The Cloud Run service and Cloud SQL instance are in the same region and project. The database user and password are stored in Secret Manager. Which two steps should the developer take to enable the connection? (Choose TWO.)

Select 2 answers

A.Grant the Cloud Run service account the Cloud SQL Client role.

B.Set the CLOUD_SQL_CONNECTION_NAME environment variable in the Cloud Run service.

C.Enable the Cloud SQL Admin API.

D.Configure the Cloud Run service to use a VPC connector and set up a private services access connection for Cloud SQL.

E.Deploy the Cloud SQL Auth proxy as a sidecar container in Cloud Run.

AnswersA, D

The Cloud SQL Client role allows the service account to connect to Cloud SQL instances.

Why this answer

Option A is correct because the Cloud Run service account needs the Cloud SQL Client role to authenticate with Cloud SQL. Option D is correct because Cloud Run requires a VPC connector to access resources on a VPC network, and Private Services Access must be configured to allow Cloud Run to reach the Cloud SQL private IP. Option C is incorrect because enabling the Cloud SQL Admin API is a prerequisite but not a direct step for the connection itself; it is often already enabled.

Option E is incorrect because the Cloud SQL Auth proxy is not needed when using private IP. Option B is incorrect because the CLOUD_SQL_CONNECTION_NAME environment variable is used with the Cloud SQL Auth proxy, not with private IP.

128

MCQeasy

A development team is using Cloud Monitoring to set up an alerting policy for a Compute Engine instance. They want to be notified when the instance's CPU utilization exceeds 80% for at least 5 minutes. Which alerting policy configuration should they use?

A.Condition type: Metric Threshold, Trigger: For 5 minutes, Threshold: 80%

B.Condition type: Metric Threshold, Trigger: For most recent value, Threshold: 80%

C.Condition type: Change Rate, Trigger: For 5 minutes, Threshold: 80%

D.Condition type: Metric Absence, Duration: 5 minutes

AnswerA

Triggers when condition holds for 5 minutes.

Why this answer

Option A is correct because Cloud Monitoring alerting policies use a Metric Threshold condition type to evaluate a metric against a static threshold. Setting the trigger to 'For 5 minutes' ensures the condition is met only when the CPU utilization exceeds 80% consistently over the specified duration, preventing false alarms from transient spikes.

Exam trap

Google Cloud often tests the distinction between 'For most recent value' and 'For X minutes' triggers, where candidates mistakenly choose the single-point trigger thinking it's simpler, missing the requirement for sustained threshold crossing.

How to eliminate wrong answers

Option B is wrong because 'For most recent value' triggers an alert based on a single data point, which would fire on any momentary spike above 80% rather than requiring sustained high utilization for 5 minutes. Option C is wrong because 'Change Rate' condition type measures the rate of change of a metric over time, not a static threshold; it is used for detecting anomalies in trends, not for fixed CPU utilization limits. Option D is wrong because 'Metric Absence' condition type triggers when data is missing for a specified duration, not when a metric exceeds a threshold; it is designed for detecting data gaps, not high CPU usage.
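Expressed as an alerting policy resource, the same configuration is a threshold condition with a duration. The sketch below builds such a policy as a dictionary in the shape used by the Cloud Monitoring API; the display names and the 60-second alignment period are illustrative assumptions. The CPU utilization metric is reported as a fraction, so 80% is written as 0.8.

```python
import json


def cpu_alert_policy(threshold: float = 0.8, duration_s: int = 300) -> dict:
    """Build an alerting policy that fires after sustained high CPU utilization."""
    return {
        "displayName": "High CPU utilization",  # hypothetical name
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "CPU above threshold",  # hypothetical name
                "conditionThreshold": {
                    "filter": (
                        'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                        'AND resource.type="gce_instance"'
                    ),
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    # The condition must hold for this long before the alert fires.
                    "duration": f"{duration_s}s",
                    "aggregations": [
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                    ],
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(cpu_alert_policy(), indent=2))
```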

129

Multi-Selectmedium

A company is using Cloud Run for a stateless API. They want to ensure that the service can handle sudden traffic spikes. Which two features should they configure?

Select 2 answers

A.Enable container concurrency.

B.Use Cloud Load Balancing.

C.Enable CPU always on allocation.

D.Set max instances to a high value.

E.Set min instances to zero to save cost.

AnswersA, D

Container concurrency allows multiple requests per container, increasing throughput.

Why this answer

Option A is correct because enabling container concurrency allows a single Cloud Run container instance to handle multiple requests simultaneously, up to the configured concurrency limit (default 80, max 1000). This improves throughput and resource utilization during traffic spikes without requiring additional instances. Option D is correct because setting max instances to a high value ensures the service can scale out to handle sudden load by creating more container instances, up to the configured maximum, so that requests are not queued or rejected because the instance cap was reached.
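The way concurrency and max instances interact can be estimated with simple arithmetic: the number of requests in flight is roughly the request rate multiplied by the average latency, and each instance absorbs up to its concurrency limit. The sketch below is a back-of-the-envelope model with hypothetical traffic numbers, not a description of Cloud Run's actual autoscaler.

```python
import math


def instances_needed(rps: int, avg_latency_ms: int, concurrency: int) -> int:
    """Estimate the instances required to serve a given steady request rate."""
    in_flight = rps * avg_latency_ms / 1000  # requests being processed at once
    return math.ceil(in_flight / concurrency)


def can_absorb(rps: int, avg_latency_ms: int, concurrency: int, max_instances: int) -> bool:
    """True if the estimated demand fits under the max-instances cap."""
    return instances_needed(rps, avg_latency_ms, concurrency) <= max_instances
```

Under these assumptions, a spike of 2000 requests per second at 200 ms average latency needs about 5 instances with a concurrency of 80, but about 400 instances with a concurrency of 1, which a max-instances setting of 100 would not cover.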

Exam trap

Google Cloud often tests the misconception that Cloud Load Balancing is required for scaling Cloud Run, but Cloud Run's built-in autoscaling and managed HTTPS load balancer already handle traffic spikes; the trap is that candidates confuse external load balancing with internal scaling mechanisms.

130

MCQeasy

A startup is building a REST API on Cloud Run. They expect unpredictable traffic spikes and want to ensure the service can scale from 0 to many instances automatically. What scaling configuration should they use?

A.Set max instances to 1 to control costs.

B.Set min instances to 0 and max instances to 1000.

C.Use manual scaling with a fixed number of instances.

D.Set min instances to 5 and max to 100.

AnswerB

This configuration allows the service to scale from zero to a high number as needed, handling spikes while minimizing cost during idle periods.

Why this answer

Option B is correct because Cloud Run's autoscaling allows min instances to be set to 0, enabling the service to scale down to zero when idle (cost-efficient), and max instances to 1000 to handle unpredictable traffic spikes by scaling out horizontally. This configuration ensures the service can start from zero and automatically add instances up to the maximum limit as demand increases, which is ideal for unpredictable workloads.

Exam trap

The trap here is that candidates often confuse 'min instances' with 'max instances' or assume that setting min instances to 0 will cause the service to be unavailable during cold starts, but Cloud Run handles cold starts transparently, and the question specifically asks for scaling from 0 to many instances, which requires min=0 and a high max limit.

How to eliminate wrong answers

Option A is wrong because setting max instances to 1 prevents the service from scaling out beyond a single instance, which cannot handle traffic spikes and defeats the purpose of autoscaling. Option C is wrong because manual scaling with a fixed number of instances does not allow dynamic scaling from 0 or to many instances; it requires manual intervention to adjust capacity, which is unsuitable for unpredictable spikes. Option D is wrong because setting min instances to 5 forces at least 5 instances to run continuously, incurring cost even when there is no traffic, and does not allow scaling down to zero, which contradicts the requirement to scale from 0.

131

Multi-Selectmedium

A developer is building a serverless application that processes user-uploaded images. The images are stored in Cloud Storage, and each upload should trigger a Cloud Function that performs image analysis and stores the result in Firestore. Which TWO Google Cloud services are essential for this integration? (Choose 2)

Select 2 answers

A.Cloud Pub/Sub

B.Cloud Storage

C.Cloud Tasks

D.Eventarc

E.Cloud Scheduler

AnswersA, B

Cloud Pub/Sub receives storage notifications and triggers the Cloud Function.

Why this answer

Cloud Storage is the source of events (uploaded images). Cloud Pub/Sub is used to deliver notifications from Cloud Storage to the Cloud Function. Cloud Tasks, Cloud Scheduler, and Eventarc are not required for this pattern.
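A function on the receiving end of this pattern might look like the sketch below. It assumes a Pub/Sub-triggered background function that receives Cloud Storage notifications, whose message attributes include `eventType`, `bucketId`, and `objectId`. The `analyze` placeholder and the injected `save` callback (standing in for a Firestore write) are hypothetical.

```python
from typing import Callable, Optional


def analyze(bucket: str, name: str) -> dict:
    """Placeholder for the real image analysis (hypothetical)."""
    return {"bucket": bucket, "object": name, "labels": []}


def on_notification(
    event: dict,
    context=None,
    save: Optional[Callable[[str, dict], None]] = None,
) -> Optional[dict]:
    """Handle a Cloud Storage notification delivered through Pub/Sub."""
    attrs = event.get("attributes", {})
    # Only react to newly created objects; ignore deletes and metadata updates.
    if attrs.get("eventType") != "OBJECT_FINALIZE":
        return None
    result = analyze(attrs["bucketId"], attrs["objectId"])
    if save is not None:
        save(attrs["objectId"], result)  # e.g. write the result to Firestore
    return result
```

Passing the storage step in as a callback keeps the handler easy to test locally without any Google Cloud credentials.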

132

Multi-Selectmedium

Which THREE of the following are best practices for building secure applications on Google Cloud?

Select 3 answers

A.Use Secret Manager to manage sensitive configuration values.

B.Disable authentication on a test Cloud Run service for end-user testing.

C.Use a single service account for all Cloud Functions to simplify permissions.

D.Enable VPC Service Controls to prevent data exfiltration.

E.Store source code in Cloud Source Repositories with IAM restrictions.

AnswersA, D, E

Secret Manager securely stores and accesses secrets.

Why this answer

Option A is correct because Secret Manager provides a centralized and secure way to store and manage sensitive configuration values such as API keys, database passwords, and certificates. By using Secret Manager, you avoid hardcoding secrets in source code or configuration files, reducing the risk of exposure. It integrates with IAM for fine-grained access control and supports secret versioning and rotation schedules, and secrets are encrypted at rest and in transit.

Exam trap

Google Cloud often tests the principle of least privilege and the misconception that simplifying permissions by using a single service account is acceptable, when in fact it creates a single point of failure and broad attack surface.

133

Multi-Selecthard

A company is deploying a microservices architecture on Google Cloud using Cloud Run. They need to ensure that services can communicate securely with each other and with other Google Cloud services, such as Cloud Storage and Secret Manager. Which three steps should they take? (Choose three.)

Select 3 answers

A.Enable Cloud Service Mesh for sidecar proxy injection.

B.Configure Cloud Run services to use internal load balancing.

C.Use Cloud Run's direct VPC egress to access resources in a VPC network.

D.Use service accounts with least privilege permissions for each service.

E.Enable VPC Connector for each Cloud Run service.

AnswersC, D, E

Direct VPC egress allows Cloud Run services to send traffic to VPC networks.

Why this answer

Options C, D, and E are correct. A VPC Connector (option E) enables Cloud Run services to communicate with resources in a VPC network, including internal communication between services if they are within the same VPC. Service accounts with least privilege (option D) ensure secure access to Google Cloud services.

Direct VPC egress (option C) allows Cloud Run services to send traffic to a VPC network without a VPC Connector. Option B is incorrect because internal load balancing is not a standard feature for Cloud Run; it is used with GKE. Option A is incorrect because Cloud Service Mesh is primarily for GKE, not Cloud Run (standard Cloud Run does not support sidecar injection).

134

MCQeasy

During a rolling update, the new pods are failing to start because they require more memory than available on nodes. What is the most likely cause?

A.The maxSurge value is too low.

B.The resource requests and limits are misconfigured.

C.The replicas count is too high.

D.The strategy type is wrong.

AnswerB

The requests are too high for the available node memory, causing the new pods to fail to schedule.

Why this answer

The resource requests specify 1Gi memory and 500m CPU. If nodes do not have enough memory to satisfy the request (plus existing pods), new pods will fail to schedule. The requests may be too high for the cluster's node resources.
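The scheduling rule described above can be sketched in a few lines. This is a simplified illustration, not the real scheduler: the node names, allocatable sizes, and existing pod requests are hypothetical values, and only the 1Gi (1024 MiB) request comes from the explanation. The scheduler compares the pod's request, not its actual usage, against what is still unreserved on each node.

```python
def fits_on_node(allocatable_mib, existing_requests_mib, new_request_mib):
    """Return True if the new pod's memory request fits in the unreserved memory."""
    free_mib = allocatable_mib - sum(existing_requests_mib)
    return new_request_mib <= free_mib


def schedulable_nodes(nodes, new_request_mib):
    """List the nodes whose unreserved memory can hold the new request.

    nodes maps a node name to (allocatable MiB, list of existing requests in MiB).
    """
    return [
        name
        for name, (allocatable, existing) in nodes.items()
        if fits_on_node(allocatable, existing, new_request_mib)
    ]


# Hypothetical cluster: both nodes already have most of their memory reserved.
nodes = {
    "node-a": (3800, [1024, 1024, 1024]),
    "node-b": (3800, [1024, 1024, 512, 512]),
}

print(schedulable_nodes(nodes, 1024))  # no node has 1024 MiB unreserved
print(schedulable_nodes(nodes, 512))   # a smaller request fits on both nodes
```

When the first call returns an empty list, the new pods stay Pending during the rollout, which matches the symptom in the question.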

135

MCQhard

You are designing a monitoring strategy for a microservices architecture running on GKE. Each service emits custom business metrics (e.g., order processing time). You want to create a dashboard that shows the 99th percentile latency for each service over the last 7 days. Which approach should you take?

A.Export logs to Cloud Logging and use Log Analytics to compute percentiles.

B.Write custom metrics to Cloud Monitoring and create a dashboard with the 99th percentile aligner.

C.Use Metrics Explorer to view the metrics and manually compute percentiles.

D.Use Prometheus monitoring built into GKE and query the avg() function.

AnswerB

Cloud Monitoring custom metrics support percentile aligners like 99th.

Why this answer

Option B is correct because Cloud Monitoring supports custom metrics and provides built-in aligners, including a 99th percentile aligner, which can be applied directly in a dashboard chart. This allows you to compute the 99th percentile latency for each service over the last 7 days without manual calculation or exporting logs. Custom metrics are the appropriate mechanism for business metrics like order processing time, as they are designed for numeric time-series data.

Exam trap

Google Cloud often tests the distinction between logs and metrics. The trap here is that candidates may think exporting logs to Cloud Logging is a valid way to compute percentiles, overlooking that Cloud Monitoring is the correct service for numeric time-series data and provides native percentile computation.

How to eliminate wrong answers

Option A is wrong because Cloud Logging is designed for log data, not numeric time-series metrics; computing percentiles from logs requires parsing and aggregation, which is inefficient and not the intended use case. Option C is wrong because Metrics Explorer allows you to view and chart metrics, but it does not provide a built-in function to compute percentiles; you would have to export the data and calculate manually, which is not a scalable or recommended approach. Option D is wrong because Prometheus's avg() function computes the average, not the 99th percentile, and while Prometheus can be used with GKE, the question specifies using Cloud Monitoring's native capabilities for a dashboard.

136

Multi-Selecthard

A DevOps team wants to set up custom metrics for a serverless application running on Cloud Run. The application emits metrics using OpenTelemetry. They need to collect these metrics and create an alerting policy that triggers when the 99th percentile latency exceeds 500ms for 5 minutes. Which TWO actions must they take? (Choose two.)

Select 2 answers

A.Create a custom distribution metric for the latency data and set up a metric threshold alert using the 99th percentile value.

B.Deploy the OpenTelemetry Collector as a sidecar or external service and configure it to export metrics to Cloud Monitoring using the Cloud Monitoring exporter.

C.Install the Cloud Monitoring agent on the Cloud Run instance to collect custom metrics.

D.Define a log-based metric from the application logs that captures latency entries.

E.Configure the Cloud Monitoring dashboard to query the metrics using PromQL.

AnswersA, B

Distribution metrics support percentile calculations in alert policies.

Why this answer

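Option A is correct because to alert on the 99th percentile of latency, you must create a custom distribution metric, which stores a histogram of values and allows percentile calculations. A metric-threshold alert policy can then be configured to evaluate the 99th percentile value against the 500ms threshold over a 5-minute window. Option B is also required because the application emits its metrics through OpenTelemetry, so a Collector (running as a sidecar or an external service) with the Cloud Monitoring exporter is needed to get those metrics into Cloud Monitoring in the first place.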
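To see why a distribution (histogram) metric makes percentile alerts possible, the sketch below estimates a percentile from bucket counts. It is an illustration only: the bucket bounds and counts are made-up values, and linear interpolation inside a bucket is an assumption, not a statement of how Cloud Monitoring computes percentiles internally.

```python
def estimate_percentile(bounds, counts, percentile):
    """Estimate a percentile from histogram buckets.

    bounds: ascending upper bounds (ms) of the finite buckets.
    counts: one count per finite bucket plus a final overflow bucket.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    rank = percentile / 100.0 * total
    cumulative = 0
    lower = 0.0
    for i, count in enumerate(counts):
        if i >= len(bounds):
            # Overflow bucket: all we know is the value is at least the last bound.
            return float(bounds[-1])
        upper = bounds[i]
        if count > 0 and cumulative + count >= rank:
            # Assume values are spread evenly inside the bucket (an assumption).
            fraction = (rank - cumulative) / count
            return lower + fraction * (upper - lower)
        cumulative += count
        lower = upper
    return float(bounds[-1])


BOUNDS_MS = [100, 250, 500, 1000]   # hypothetical bucket layout
THRESHOLD_MS = 500                  # threshold from the question

healthy = estimate_percentile(BOUNDS_MS, [900, 80, 15, 5, 0], 99)
degraded = estimate_percentile(BOUNDS_MS, [900, 70, 15, 15, 0], 99)

print(healthy, healthy > THRESHOLD_MS)     # estimate falls in the 250-500 ms bucket
print(degraded, degraded > THRESHOLD_MS)   # estimate falls in the 500-1000 ms bucket
```

A plain counter or gauge carries no bucket information, so there is nothing comparable to interpolate from; that is the practical reason the alert needs a distribution metric.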

Exam trap

Google Cloud often tests the misconception that log-based metrics can replace custom distribution metrics for percentile alerts, but logs lack the histogram structure required for precise percentile calculations.

137

Multi-Selectmedium

A developer is deploying a new version of a microservice to Cloud Run. The developer wants to ensure that the new revision is tested with a small percentage of traffic before rolling out to all users. Which TWO approaches can the developer use?

Select 2 answers

A.Use the 'gcloud run deploy' command with '--no-traffic' and then use 'gcloud run services update-traffic --to-revisions=REVISION=5' to send 5% of traffic.

B.Use the 'gcloud run deploy' command with '--no-traffic' to deploy without serving traffic, then use 'gcloud run services update-traffic' to gradually increase traffic.

C.Set the 'max-instances' parameter to limit the number of instances handling requests.

D.Use the 'gcloud run deploy' command with '--tag' to assign a tag to the new revision, then direct test traffic to that tag.

E.Deploy the new revision with the same revision name as the old one to overwrite it, then roll back if issues occur.

AnswersA, B

This directly sets a specific percentage of traffic to the new revision.

Why this answer

Option A is correct because the '--no-traffic' flag deploys the new revision without serving any traffic, and then 'gcloud run services update-traffic --to-revisions=REVISION=5' allows you to send exactly 5% of traffic to that revision for canary testing. Option B is also correct because it describes the same two-step process: deploy with '--no-traffic' to avoid immediate traffic, then use 'update-traffic' to gradually increase the percentage, which is the standard canary deployment pattern on Cloud Run.

Exam trap

Google Cloud often tests the distinction between traffic splitting (percentage-based routing) and direct access via tags; candidates mistakenly think tagging alone can serve a percentage of production traffic, but tags only provide a separate URL for testing without affecting the main service's traffic distribution.

138

MCQhard

A developer deployed the Kubernetes Deployment shown. The application takes about 45 seconds to fully initialize and respond on the /healthz endpoint. What problem will occur with this configuration?

A.The readiness probe will never succeed, and the pod will be removed from service.

B.The deployment will not create any pods because of a syntax error.

C.The liveness probe will start too early and cause the pod to be restarted before it becomes ready.

D.The pod will be marked ready immediately because the readiness probe uses the same endpoint as liveness.

AnswerC

Correct: Liveness probe at 30s will fail, and after three failures the pod restarts, preventing it from ever becoming ready.

Why this answer

The readiness probe starts after 5 seconds and checks every 10 seconds, so its checks at 5s, 15s, 25s, and 35s fail and the check at 45s is the first that can succeed; until then the pod is not ready and receives no traffic. The liveness probe, however, starts at 30s, while the app is still initializing, so its first check fails. If three consecutive liveness failures accumulate before the app finishes initializing, the kubelet restarts the container, initialization starts over, and the pod enters a crash loop.

The correct answer is that the liveness probe will cause the pod to restart before it becomes ready.
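The timing argument can be checked with a small simulation. The manifest is not reproduced on this page, so the liveness period of 5 seconds below is a hypothetical value; only the 30-second initial delay, the threshold of three failures, and the 45-second startup time come from the explanation. Real kubelet timing also includes jitter, which this sketch ignores.

```python
def first_restart_time(initial_delay, period, failure_threshold, app_ready_at,
                       horizon=300):
    """Return the second at which the container would be restarted, or None.

    Simplified model: a probe fails if it runs before app_ready_at and
    succeeds afterwards; the app is assumed to stay healthy once it is up.
    """
    failures = 0
    t = initial_delay
    while t <= horizon:
        if t >= app_ready_at:
            return None  # the probe succeeds before the threshold is reached
        failures += 1
        if failures >= failure_threshold:
            return t
        t += period
    return None


# Liveness probe from the explanation, with an assumed 5-second period.
print(first_restart_time(initial_delay=30, period=5,
                         failure_threshold=3, app_ready_at=45))

# Same probe with the initial delay raised past the startup time.
print(first_restart_time(initial_delay=60, period=5,
                         failure_threshold=3, app_ready_at=45))
```

Under these assumptions the first call reports a restart before the 45-second mark, while delaying the first liveness check until after startup avoids it. A startup probe is the other common way to protect slow-starting containers.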

139

Multi-Selectmedium

A company uses Cloud Spanner for a global application. They want to improve read performance for point-reads (individual row lookups). Which TWO strategies should they adopt?

Select 2 answers

A.Use read replicas

B.Create secondary indexes

C.Partition the table by time

D.Use batch reads

E.Use interleaved tables

AnswersB, E

Secondary indexes enable efficient point reads on columns other than the primary key.

Why this answer

Secondary indexes in Cloud Spanner allow point-reads to be served directly from the index table, avoiding a full table scan and reducing latency. Interleaved tables store child rows physically adjacent to their parent row, enabling efficient single-row lookups without cross-node coordination.

Exam trap

Google Cloud often tests the misconception that read replicas or batch operations improve point-read latency, when in fact they address throughput or bulk retrieval, not the speed of individual row lookups.

140

MCQmedium

After updating the image to v2, users report that the frontend application returns errors because it cannot reach the backend service. The backend service is running on GKE with the name 'backend-service' in the same namespace. What is the most likely cause?

A.The selector labels do not match the pods.

B.The application expects the backend URL from an environment variable named BACKEND_SERVICE_URL, but the deployment sets BACKEND_URL.

C.The backend service is not listening on port 8080.

D.The termination grace period is too short causing connection drops.

AnswerB

Variable name mismatch is a common cause of application misconfiguration.

Why this answer

The environment variable is set to BACKEND_URL, but the application likely expects a different variable name like BACKEND_SERVICE_URL. This mismatch causes the application to fail to find the backend URL.

141

MCQhard

A company wants to create an SLO for their API with a target of 99.9% availability over a 30-day rolling window. They are using Cloud Monitoring. Which combination of resources and techniques should they use?

A.Manually compute availability using external monitoring tools.

B.Use the Cloud Monitoring SLO service with a request latency SLI.

C.Create an uptime check and a log-based metric for errors. Use the SLI formula: (successful requests / total requests).

D.Use Cloud Trace to measure latency and create a custom metric.

AnswerC

This leverages native Cloud Monitoring SLO capabilities, defining availability as the fraction of successful probes or requests, and automatically tracks the SLO over a rolling window.

Why this answer

Option C is correct because it combines an uptime check (to measure total requests) with a log-based metric for errors (to count failed requests), allowing the SLI formula (successful requests / total requests) to compute availability. This approach directly aligns with the 99.9% availability target over a 30-day rolling window, using Cloud Monitoring's native capabilities without external tools or irrelevant latency metrics.
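The SLI formula and the 99.9% target translate into a concrete error budget. The sketch below works through that arithmetic; the request counts are hypothetical, and the downtime figure assumes failures are spread as whole minutes of unavailability over the 30-day window.

```python
def availability(successful_requests, total_requests):
    """SLI from the explanation: successful requests / total requests."""
    return successful_requests / total_requests


def error_budget(slo_target, window_days, total_requests):
    """Return how much failure the SLO tolerates over the window."""
    allowed_failure_fraction = 1.0 - slo_target
    window_minutes = window_days * 24 * 60
    return {
        "allowed_failed_requests": total_requests * allowed_failure_fraction,
        "allowed_downtime_minutes": window_minutes * allowed_failure_fraction,
    }


SLO_TARGET = 0.999   # 99.9% availability from the question
WINDOW_DAYS = 30     # rolling window from the question

budget = error_budget(SLO_TARGET, WINDOW_DAYS, total_requests=1_000_000)
print(budget)  # roughly 1,000 failed requests, or about 43.2 minutes of downtime

sli = availability(successful_requests=999_100, total_requests=1_000_000)
print(sli, sli >= SLO_TARGET)  # 900 failures out of 1,000,000 stays within budget
```

Framing the target as a budget makes it clear why availability needs a success/failure ratio: a latency number cannot be subtracted from it.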

Exam trap

Google Cloud often tests the distinction between availability and latency SLIs, so the trap here is assuming that any monitoring metric (like latency) can be used for an availability SLO, when in fact availability requires a success/failure ratio, not a performance threshold.

How to eliminate wrong answers

Option A is wrong because manually computing availability using external monitoring tools bypasses Cloud Monitoring's built-in SLO service, which is designed to automate SLI calculation and alerting, and introduces unnecessary manual effort and potential inconsistency. Option B is wrong because a request latency SLI measures response time, not availability; availability is about whether requests succeed or fail, not how fast they respond, so this SLI does not match the 99.9% availability target. Option D is wrong because Cloud Trace is a distributed tracing tool for analyzing latency and request flows, not for counting successful vs. total requests; using it to create a custom metric for availability would be inefficient and misaligned with the purpose of the service.

142

MCQmedium

A company runs a global e-commerce platform on GKE. They need to serve users with low latency from multiple regions. Which load balancing solution should they use?

A.Regional external HTTP(S) Load Balancer

B.Global external HTTP(S) Load Balancer

C.Internal TCP/UDP Load Balancer

D.SSL Proxy Load Balancer

AnswerB

Global load balancer routes users to the nearest region, minimizing latency.

Why this answer

A global external HTTP(S) Load Balancer is the correct choice because it provides a single anycast IP address that routes traffic from users worldwide to the nearest GKE backend, minimizing latency. It supports cross-regional failover and integrates with Cloud CDN for caching static content, making it ideal for a global e-commerce platform. Regional load balancers cannot serve traffic across multiple regions with a single IP, which is required for global low-latency access.

Exam trap

Google Cloud often tests the misconception that a Regional external HTTP(S) Load Balancer can be used for global traffic by simply deploying it in one region, but the trap is that it lacks an anycast IP and cannot route users to the nearest region, causing higher latency for distant users.

How to eliminate wrong answers

Option A is wrong because a Regional external HTTP(S) Load Balancer only distributes traffic within a single GCP region, so it cannot serve users globally with low latency from multiple regions. Option C is wrong because an Internal TCP/UDP Load Balancer is designed for private VPC traffic within a region and does not expose a public endpoint for external users. Option D is wrong because an SSL Proxy Load Balancer terminates SSL/TLS connections but does not provide global anycast IP or HTTP(S) content-based routing; it is limited to TCP traffic and lacks the global scope needed for multi-region user distribution.

143

MCQeasy

A company is designing a global e-commerce platform on Google Cloud. The application requires low-latency access for users worldwide and must be highly available. Which load balancing solution should they use?

A.External TCP/UDP Network Load Balancer

B.External HTTP(S) Load Balancer

C.Cloud CDN

D.Internal TCP/UDP Load Balancer

AnswerB

External HTTP(S) Load Balancer is a global load balancer that provides low latency and high availability for web applications.

Why this answer

The External HTTP(S) Load Balancer is the correct choice because it is a global, proxy-based Layer 7 load balancer that terminates HTTP/HTTPS traffic at Google's edge points of presence (PoPs) and routes requests to the nearest healthy backend. This provides low-latency access for users worldwide by leveraging Google's global network and anycast IPs, while also offering built-in high availability, SSL offloading, and content-based routing.

Exam trap

Google Cloud often tests the misconception that a Layer 4 load balancer (like External TCP/UDP Network Load Balancer) is sufficient for global low-latency access, but candidates must remember that only Layer 7 global load balancers provide anycast IPs and cross-region routing for worldwide users.

How to eliminate wrong answers

Option A is wrong because the External TCP/UDP Network Load Balancer is a regional, Layer 4 load balancer that does not provide global anycast IP or cross-region failover, so it cannot deliver low-latency access for users worldwide. Option C is wrong because Cloud CDN is a content delivery network that caches static content at edge locations, not a load balancer; it can be used in conjunction with a load balancer but does not itself handle traffic distribution or high availability for dynamic requests. Option D is wrong because the Internal TCP/UDP Load Balancer is a regional, private load balancer designed for internal traffic within a VPC, not for global external user access.

144

MCQeasy

A company runs a batch job that processes large files from Cloud Storage every night. The job must complete within a 2-hour window. If the job fails, it should retry automatically. Which Google Cloud service should they use to orchestrate this job?

A.Compute Engine with startup script

B.Cloud Run

C.App Engine Cron

D.Cloud Composer

AnswerD

Cloud Composer is a managed workflow orchestration service that supports scheduling, retries, and complex dependencies, ideal for batch jobs.

Why this answer

Cloud Composer (D) is the correct choice because it is a fully managed workflow orchestration service built on Apache Airflow, designed to schedule, monitor, and retry batch jobs with complex dependencies. It can trigger a Cloud Storage file processing job, enforce a 2-hour execution window, and automatically retry on failure using Airflow's built-in retry mechanisms and SLA monitoring.

Exam trap

Google Cloud often tests the distinction between simple scheduling (App Engine Cron) and full orchestration with retry and dependency management (Cloud Composer), leading candidates to pick App Engine Cron because they overlook the requirement for automatic retry and time-window enforcement.

How to eliminate wrong answers

Option A is wrong because Compute Engine with a startup script is a manual, single-instance solution that lacks built-in scheduling, retry logic, and orchestration capabilities; it would require custom scripting and external cron to handle failures and time windows. Option B is wrong because Cloud Run is a serverless container platform for request-driven or event-driven workloads, not designed for long-running batch orchestration with retry policies and time-window enforcement; it lacks native workflow sequencing and retry orchestration. Option C is wrong because App Engine Cron is a simple scheduling service that triggers HTTP endpoints at fixed intervals, but it does not provide retry logic, dependency management, or execution time-window enforcement; it cannot automatically retry a failed job or ensure completion within a 2-hour window.

145

Multi-Selectmedium

A developer is building an event-driven system using Cloud Pub/Sub. They need to ensure reliable message delivery and processing. Which three practices should they follow?

Select 3 answers

A.Set a minimum number of delivery attempts.

B.Use pull subscriptions with synchronous acknowledgment.

C.Use message ordering.

D.Configure a dead-letter topic.

E.Use exponential backoff for pull subscriptions.

AnswersB, D, E

Sync ack allows you to acknowledge after processing, preventing loss.

Why this answer

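Pull subscriptions with synchronous acknowledgment (option B) ensure that a message is not acknowledged until the subscriber has successfully processed it. This prevents premature acknowledgment and message loss, because Cloud Pub/Sub redelivers the message if the acknowledgment deadline expires without an ack. A dead-letter topic (option D) captures messages that still fail after the configured maximum number of delivery attempts, so they can be inspected instead of being redelivered indefinitely. Exponential backoff (option E) spaces out retries so that a struggling subscriber is not overwhelmed.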
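The three practices fit together as acknowledge only after processing, back off between retries, and dead-letter after too many attempts. The sketch below simulates that flow in plain Python; it does not use the Pub/Sub client library, and the function names, backoff bounds, and attempt limit are hypothetical values chosen for illustration.

```python
def backoff_delays(min_backoff, max_backoff, attempts, factor=2.0):
    """Return the delay (seconds) before each retry, doubling up to a cap."""
    delays = []
    delay = min_backoff
    for _ in range(attempts):
        delays.append(min(delay, max_backoff))
        delay *= factor
    return delays


def deliver(message, handler, max_delivery_attempts=5,
            min_backoff=10, max_backoff=600):
    """Simulate delivery: ack only after the handler succeeds."""
    delays = backoff_delays(min_backoff, max_backoff, max_delivery_attempts)
    waited = []
    for attempt in range(1, max_delivery_attempts + 1):
        try:
            handler(message)
        except Exception:
            # No ack is sent, so the message would be redelivered after a delay.
            waited.append(delays[attempt - 1])
            continue
        return ("acked", attempt, waited)
    # Attempts exhausted: the message would go to the dead-letter topic.
    return ("dead-lettered", max_delivery_attempts, waited)


def make_flaky_handler(failures_before_success):
    """Build a handler that raises a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def handler(message):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("transient processing error")

    return handler


print(backoff_delays(10, 600, 8))
print(deliver("order-1", make_flaky_handler(2)))
print(deliver("order-2", make_flaky_handler(99)))
```

The first message is acknowledged on its third attempt; the second never succeeds and ends up dead-lettered rather than looping forever.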

Exam trap

The trap here is confusing reliability features with ordering or delivery-attempt counts; Google Cloud often tests that reliable processing relies on synchronous acknowledgment and dead-letter topics, not on setting a minimum number of delivery attempts or enabling ordering.

146

MCQhard

Your company runs a multi-tier application on Compute Engine with a Cloud SQL backend. Recently, during peak hours, users report slow page loads. Cloud Monitoring shows high CPU on the app servers, but no memory pressure. Cloud Trace shows that the application spends most of its time waiting for database queries. The Cloud SQL instance is a high-memory machine type with 16 vCPUs and 64 GB RAM, but CPU utilization on the database is only 30%. There are no slow query alerts. What is the most likely cause and what should you do?

A.The database lacks indexes. Use Cloud SQL Query Insights to identify missing indexes.

B.The application is performing unnecessary queries. Add caching with Memorystore.

C.The database connection pool is exhausted. Increase the maximum number of connections.

D.The Cloud SQL instance is under-provisioned. Upgrade to a larger machine type.

AnswerA

Missing indexes force full table scans, causing slow queries. Query Insights can reveal the specific slow queries and suggest indexes.

Why this answer

The symptoms—high app server CPU, low database CPU, and queries consuming most of the application’s wait time—point to inefficient queries due to missing indexes. Cloud SQL Query Insights can identify these missing indexes by analyzing query execution plans and wait events. Adding appropriate indexes reduces query execution time, lowering app server CPU usage and resolving the slow page loads.

Exam trap

Google Cloud often tests the misconception that high app server CPU always means the app server is the bottleneck, when in fact the CPU is consumed waiting for slow database queries caused by missing indexes.

How to eliminate wrong answers

Option B is wrong because the application is already waiting on database queries, not performing unnecessary queries; caching would mask the underlying indexing issue but not fix the root cause. Option C is wrong because connection pool exhaustion would cause connection timeouts or errors, not high app server CPU and low database CPU; Cloud SQL’s 30% CPU utilization indicates connections are not saturated. Option D is wrong because the database CPU is only 30% utilized, so the instance is not under-provisioned; upgrading would not address the query performance bottleneck.

147

MCQeasy

A developer wants to receive notifications when the error rate of their application exceeds 1% over a 5-minute window. What should they create in Cloud Monitoring?

A.Alerting policy with metric threshold condition

B.Log-based metric

C.Dashboard with error rate chart

D.Uptime check

AnswerA

Alerting policies evaluate metrics and send notifications.

Why this answer

An alerting policy with a metric threshold condition is the correct approach because Cloud Monitoring evaluates a metric (e.g., error rate) against a threshold (1%) over a specified window (5 minutes) and triggers a notification when the condition is met. This directly fulfills the requirement to be notified when the error rate exceeds the threshold, as alerting policies are designed for proactive notification based on metric data.

Exam trap

Google Cloud often tests the distinction between alerting policies (which trigger notifications) and other monitoring components like dashboards or log-based metrics, so candidates mistakenly choose a log-based metric or dashboard because they confuse data collection with alerting.

How to eliminate wrong answers

Option B is wrong because a log-based metric is used to extract quantitative data from logs (e.g., count of error log entries) but does not itself trigger notifications; it must be used within an alerting policy to generate alerts. Option C is wrong because a dashboard with an error rate chart provides a visual representation of the metric but does not generate notifications or alerts; it is a passive monitoring tool. Option D is wrong because an uptime check monitors the availability and responsiveness of a resource (e.g., HTTP response codes) and is not designed to track application error rates or trigger alerts based on a percentage threshold over a time window.

148

MCQhard

A company uses Cloud Storage for backups. They need to comply with a regulation requiring immutable storage for 7 years. Which bucket configuration should they use?

A.Use a bucket with a retention policy (not locked)

B.Set a lifecycle rule to archive to Coldline

C.Enable Object Versioning

D.Set a retention policy and lock the bucket

AnswerD

Locking the retention policy makes it permanent, ensuring objects cannot be deleted or overwritten for the specified duration.

Why this answer

Option D is correct because locking a retention policy in Cloud Storage enforces immutable storage for the specified duration (7 years). Once locked, the retention policy cannot be removed or shortened, ensuring compliance with regulations that require data to be preserved in its original state and not modifiable or deletable until the retention period expires.

Exam trap

The trap here is that candidates confuse a simple retention policy (which can be removed) with a locked retention policy (which is immutable), or they assume Object Versioning alone provides sufficient protection against deletion.

How to eliminate wrong answers

Option A is wrong because a retention policy that is not locked can be removed or shortened, which does not provide the immutable guarantee required by regulation. Option B is wrong because a lifecycle rule to archive to Coldline only moves data to a lower-cost storage class; it does not prevent deletion or modification of objects. Option C is wrong because Object Versioning alone does not prevent deletion of object versions; it only preserves previous versions when objects are overwritten or deleted, but versions can still be deleted manually or by lifecycle rules.

149

MCQeasy

A company runs a stateless application on Compute Engine behind a load balancer. They want to monitor the number of active requests per instance without adding custom instrumentation. What is the most straightforward approach?

A.Configure the Cloud Monitoring agent to collect request metrics.

B.Install the Cloud Logging agent and parse access logs.

C.Deploy Prometheus and instrument the application.

D.Use the load balancer's built-in 'request_count' metric.

AnswerD

This metric is available without additional agents.

Why this answer

Option D is correct because the load balancer's built-in 'request_count' metric directly provides the number of active requests per instance without requiring any additional instrumentation or agents. This metric is automatically collected by Cloud Monitoring for Google Cloud HTTP(S) load balancers, making it the most straightforward approach for a stateless application on Compute Engine.

Exam trap

Google Cloud often tests the distinction between agent-based monitoring (Cloud Monitoring agent) and built-in managed service metrics (load balancer metrics), where candidates mistakenly assume an agent is required for any application-level metric, ignoring that Google Cloud's managed services automatically expose relevant metrics.

How to eliminate wrong answers

Option A is wrong because the Cloud Monitoring agent collects system-level metrics (CPU, memory, disk) from VM instances, not application-level request counts; it cannot capture active request counts without custom instrumentation. Option B is wrong because installing the Cloud Logging agent and parsing access logs would require additional log-based metric configuration and processing, which is less straightforward than using the built-in load balancer metric. Option C is wrong because deploying Prometheus and instrumenting the application introduces significant complexity and custom code, which contradicts the requirement of 'without adding custom instrumentation'.

150

MCQmedium

A company needs to build a CI/CD pipeline for a microservices architecture. They want to run unit tests quickly by only testing code that has changed. Which approach should they use?

A.Use Cloud Build with a step that caches test results based on file hashes.

B.Use Cloud Build with a step that runs all tests in parallel.

C.Use Cloud Build with a step that uses `git log` to find changed files and run tests.

D.Use Cloud Build with a step that checks `git diff` against the previous commit and runs tests only on affected modules using a test runner that supports file-based filtering.

AnswerD

This approach directly targets changed files, minimizing test execution time.

Why this answer

Option D is correct because using a custom builder to check the diff and run only the relevant tests is a best practice for fast CI. Option A is wrong because caching test results by file hash is not a built-in Cloud Build feature and can be unreliable. Option B is suboptimal because running all tests in parallel is not selective.

Option C is not as efficient as a custom diff-based approach.
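The diff-based selection in option D can be sketched as a small script that a build step might run. This is one possible shape, not a Cloud Build feature: the module directories and file paths are hypothetical, and the only external command used is `git diff --name-only`, which lists the paths changed between two commits.

```python
import subprocess


def changed_files(base="HEAD~1", head="HEAD"):
    """Return the paths changed between two commits, using git diff."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def affected_modules(files, module_roots):
    """Map changed file paths to the module directories that contain them."""
    affected = set()
    for path in files:
        for root in module_roots:
            prefix = root.rstrip("/") + "/"
            if path.startswith(prefix):
                affected.add(root)
    return sorted(affected)


# Hypothetical repository layout with one directory per microservice.
MODULE_ROOTS = ["services/cart", "services/orders", "services/payments"]

sample_diff = [
    "services/cart/app.py",
    "services/cart/tests/test_app.py",
    "README.md",
]
print(affected_modules(sample_diff, MODULE_ROOTS))
```

In a pipeline, the output of `affected_modules(changed_files(), MODULE_ROOTS)` would be passed to the test runner's path filter so that only those directories are tested. Files outside every module, such as the README above, trigger no tests in this sketch; a real setup would also need a rule for shared libraries.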
