Google Professional Cloud Developer (PCD) — Questions 1–75

500 questions total · 7 pages · All types, answers revealed

1

MCQ · hard

Based on the Cloud Trace exhibit, which service is the primary contributor to the overall request latency?

A. The productcatalog service

B. The frontend service itself

C. The auth service

D. The recommendations service

Answer: C

The auth service has the longest child span duration (800 ms).

Why this answer

The Cloud Trace exhibit shows that the auth service accounts for the largest segment of the overall request latency, as indicated by the longest span duration in the trace waterfall. In Google Cloud Trace, each span represents a service's contribution to the total latency, and the service with the highest cumulative span time is the primary contributor. Since the auth service span is the longest, it is the correct answer.

Exam trap

Google Cloud often tests the misconception that the frontend service (the entry point) is the primary latency contributor, but the trace waterfall clearly shows that downstream service spans, not the root span, account for the majority of the latency.

How to eliminate wrong answers

Option A is wrong because the productcatalog service span shows a shorter duration than the auth service span, indicating it contributes less to the overall latency. Option B is wrong because the frontend service itself is the entry point and its own span duration is minimal compared to the downstream auth service call; the frontend's latency is dominated by waiting for the auth service response. Option D is wrong because the recommendations service span is either absent or has a negligible duration in the trace, meaning it is not a significant contributor to the total request latency.
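The same reasoning can be sketched in Python, assuming the trace has been exported as a flat list of child spans. Only the 800 ms auth duration comes from the answer above; the `Span` type, the helper name, and the other durations are hypothetical placeholders, since the exhibit is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    service: str
    duration_ms: int


def primary_contributor(child_spans: list[Span]) -> str:
    """Return the service whose child spans add up to the most time."""
    totals: dict[str, int] = {}
    for span in child_spans:
        totals[span.service] = totals.get(span.service, 0) + span.duration_ms
    return max(totals, key=totals.get)


# Hypothetical child spans under the frontend root span.
spans = [
    Span("auth", 800),            # duration quoted in the answer
    Span("productcatalog", 120),  # assumed value
    Span("recommendations", 15),  # assumed value
]
print(primary_contributor(spans))
```

Summing per service rather than taking the single longest span matters when a service is called several times within one request.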

2

MCQ · hard

An organization uses Cloud Build to deploy multiple microservices to GKE. They want to ensure that the deployment process can be audited and that each deployment can be rolled back to a previous version. What is the recommended approach?

A. Use Kubernetes Deployment history to rollback by specifying a revision.

B. Use Cloud Deploy to manage deployments with rollback capabilities and audit logs.

C. Store each manifest version in Artifact Registry and manually apply kubectl.

D. Use Cloud Build to redeploy a previous image tag when rollback is needed.

Answer: B

Cloud Deploy provides automated rollback and deployment history.

Why this answer

Cloud Deploy is the recommended service for managing progressive deliveries and rollbacks on GKE, as it provides built-in rollback capabilities, audit logging, and delivery pipeline management. Unlike raw Kubernetes Deployment history, Cloud Deploy integrates with Cloud Build and offers a controlled, auditable deployment process with the ability to roll back to any previous release revision.

Exam trap

Google Cloud often tests the misconception that Kubernetes native rollback mechanisms (like `kubectl rollout undo`) are sufficient for enterprise audit requirements, but the exam expects candidates to recognize that Cloud Deploy provides the necessary audit logs and structured rollback workflows for production environments.

How to eliminate wrong answers

Option A is wrong because Kubernetes Deployment history only supports rollback via `kubectl rollout undo` to a specific revision, but it lacks native audit logging and does not provide a centralized, auditable deployment pipeline across multiple microservices. Option C is wrong because manually applying manifests from Artifact Registry bypasses automated deployment pipelines, introduces human error, and does not provide rollback capabilities or audit trails. Option D is wrong because using Cloud Build to redeploy a previous image tag is a manual workaround that does not offer structured rollback management, release tracking, or audit logs; it also requires rebuilding or re-tagging, which can lead to inconsistencies.

3

Multi-Select · easy

A company is using Cloud Storage to store sensitive customer data. They need to ensure data is encrypted at rest and access is controlled. Which TWO statements are true regarding data protection in Cloud Storage? (Choose two.)

Select 2 answers

A. By default, data in Cloud Storage is encrypted at rest using Google-managed encryption keys.

B. Using a signed URL revokes the underlying object's ACL.

C. Enabling uniform bucket-level access disables encryption at rest.

D. Customer-managed encryption keys (CMEK) can be used to control the encryption keys used to protect data.

E. Bucket-level policies can restrict access to only compute instances with specific service accounts.

Answers: A, D

Cloud Storage automatically encrypts all data at rest with Google-managed keys.

Why this answer

Option A is correct because Cloud Storage automatically encrypts all data at rest using server-side encryption with Google-managed encryption keys by default, without any additional configuration. This ensures that data is protected before it is written to disk and remains encrypted throughout its lifecycle.

Exam trap

Google Cloud often tests the misconception that uniform bucket-level access affects encryption or that signed URLs modify ACLs, when in fact these features operate on separate layers of the security model.

4

Multi-Select · hard

A team is designing a globally distributed application on Google Cloud that requires strong consistency for writes but can tolerate eventual consistency for reads. The application expects millions of concurrent users. Which two strategies should they implement? (Choose two.)

Select 2 answers

A. Use Cloud Spanner for write operations requiring strong consistency.

B. Use Firestore in multi-region mode for all operations.

C. Use global HTTP(S) Load Balancer with Cloud CDN for read-heavy traffic.

D. Deploy Cloud SQL with cross-region replication for read scalability.

E. Use Cloud Bigtable for reading data with strong consistency.

Answers: A, C

Spanner provides global strong consistency and high availability.

Why this answer

Cloud Spanner provides strongly consistent writes globally through synchronous replication using the TrueTime API and Paxos-based consensus. This ensures that all write operations are immediately consistent across regions, meeting the requirement for strong consistency on writes.

Exam trap

Google Cloud often tests the misconception that a single database can handle both strong consistency and high scalability for reads and writes, leading candidates to choose Firestore or Bigtable without considering the specific consistency requirements for writes versus reads.

5

Multi-Select · medium

Which three are valid ways to authenticate a service account when using the Google Cloud client libraries? (Choose three.)

Select 3 answers

A. Attaching a service account to a Compute Engine instance and letting the metadata server provide credentials.

B. Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to a service account key file.

C. Using OAuth 2.0 client IDs for web applications.

D. Using the gcloud CLI default application credentials.

E. Using an API key in the client library initialization.

Answers: A, B, D

Attaching a service account (option A) is the automatic method on Compute Engine.

Why this answer

Options A, B, and D are correct. These are the standard methods for Application Default Credentials (ADC). Option C is wrong because OAuth 2.0 client IDs are for user authentication.

Option E is wrong because API keys are not used for service account authentication.
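To make the ADC lookup order concrete, here is a simplified Python model of it. This is an illustration of the documented search order, not the client library's code: the function name, parameters, and return strings are all hypothetical.

```python
from typing import Mapping, Optional


def resolve_adc_source(
    env: Mapping[str, str],
    gcloud_adc_file_exists: bool,
    has_attached_service_account: bool,
) -> Optional[str]:
    """Simplified model of the order in which ADC looks for credentials."""
    # 1. An explicit key file path set in the environment wins.
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "key file from GOOGLE_APPLICATION_CREDENTIALS"
    # 2. Next, the local file written by `gcloud auth application-default login`.
    if gcloud_adc_file_exists:
        return "gcloud application-default credentials file"
    # 3. Finally, the service account attached to the resource (metadata server).
    if has_attached_service_account:
        return "attached service account via metadata server"
    return None


# On a Compute Engine VM with no key file and no local gcloud credentials:
print(resolve_adc_source({}, False, True))
```

API keys and OAuth 2.0 web client IDs do not appear anywhere in this chain, which is why options C and E are not service account authentication methods.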

6

MCQ · easy

A team deploys a containerized web application on Google Kubernetes Engine (GKE) using a Deployment. They need to expose the application externally via a stable IP address and enable SSL termination. Which resource should they use?

A. HorizontalPodAutoscaler

B. Ingress with Google-managed SSL certificate

C. Service type NodePort

D. Service type LoadBalancer

Answer: B

Provides SSL termination and a stable IP via the load balancer.

Why this answer

An Ingress with a Google-managed SSL certificate is the correct choice because it provides a single stable IP address via a global forwarding rule, terminates SSL at the Google Cloud HTTP(S) load balancer, and routes traffic to the GKE Deployment. This approach offloads SSL decryption from the application pods and uses a managed certificate that auto-renews, meeting both the stable IP and SSL termination requirements.

Exam trap

Google Cloud often tests the misconception that a Service type LoadBalancer provides SSL termination, but it only provides L4 load balancing with a stable IP; SSL termination requires an L7 Ingress or a dedicated SSL proxy.

How to eliminate wrong answers

Option A is wrong because a HorizontalPodAutoscaler only adjusts the number of pod replicas based on CPU/memory metrics and does not expose the application externally or handle SSL termination. Option C is wrong because a Service type NodePort exposes the application on a high-port on each node's IP, which is not a stable IP address and does not provide SSL termination. Option D is wrong because a Service type LoadBalancer creates a regional TCP/UDP load balancer with an ephemeral external IP (unless static IP is manually reserved) and does not natively terminate SSL; it would require additional configuration like a separate SSL proxy or an Ingress.

7

MCQ · hard

Refer to the exhibit. The function returns 'Error' even though the document exists. What is the most likely reason?

A. The document ID in the query parameter is URL-encoded and needs to be decoded using `decodeURIComponent`.

B. The function has insufficient IAM permissions for Firestore.

C. The Firestore emulator is not running.

D. The `update` method requires that the document exists.

Answer: A

Spaces and special characters in query strings are encoded; decoding is necessary to form the correct document path.

Why this answer

The most likely reason is that the document ID in the query parameter is URL-encoded, and the function does not decode it before using it as a Firestore document reference. Firestore document IDs are case-sensitive and must match exactly; a URL-encoded string like 'doc%20name' will not match the actual document 'doc name', causing the update to fail and return 'Error'. Using `decodeURIComponent` on the parameter before passing it to Firestore resolves this.

Exam trap

Google Cloud often tests the subtle distinction between a document not existing and a document ID mismatch due to encoding, leading candidates to incorrectly choose the 'document must exist' option (D) when the real issue is a URL-encoded ID not being decoded.

How to eliminate wrong answers

Option B is wrong because insufficient IAM permissions would typically result in a permission-denied error (HTTP 403) or an exception, not a generic 'Error' return from the function, and the question states the document exists, implying the function can access Firestore. Option C is wrong because if the Firestore emulator were not running, the function would fail to connect entirely, throwing a network or connection error, not a conditional 'Error' after checking document existence. Option D is wrong because the `update` method in Firestore does require the document to exist, but the question explicitly states the document exists, so this is not the cause of the error; the issue is the mismatch due to URL encoding.
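The encoding mismatch is easy to reproduce. The sketch below uses Python's `urllib.parse.unquote`, which plays the role that `decodeURIComponent` plays in JavaScript; the `existing_ids` set is a hypothetical stand-in for the documents stored in Firestore.

```python
from urllib.parse import unquote

raw_id = "doc%20name"         # the ID as it appears in the query string
decoded_id = unquote(raw_id)  # percent-decoding restores the space

existing_ids = {"doc name"}   # hypothetical stand-in for stored document IDs

print(raw_id in existing_ids)      # False: the encoded ID does not match
print(decoded_id in existing_ids)  # True: the decoded ID matches exactly
```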

8

Multi-Select · medium

A company is designing a cloud-native application on Google Kubernetes Engine. They want to ensure high availability and scalability for their microservices. Which two best practices should they follow?

Select 2 answers

A. Use a single cluster per region.

B. Use a single replica for each service to reduce cost.

C. Use horizontal pod autoscaling based on custom metrics.

D. Use stateful sets for all services.

E. Deploy services across multiple zones.

Answers: C, E

HPA allows scaling based on application-specific metrics.

Why this answer

Horizontal Pod Autoscaling (HPA) based on custom metrics allows the application to automatically scale the number of pod replicas in response to application-specific signals (e.g., requests per second, queue depth) rather than just CPU/memory. This ensures that each microservice can handle varying load efficiently, maintaining high availability and scalability without over-provisioning.
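The scaling decision itself follows the replica formula from the Kubernetes HPA documentation, desired = ceil(current replicas × current metric / target metric). The Python sketch below applies it to a hypothetical requests-per-second metric; it leaves out the tolerance band, stabilization window, and min/max replica bounds that the real controller also applies.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA formula: scale replicas in proportion to metric / target."""
    return math.ceil(current_replicas * current_metric / target_metric)


# Hypothetical custom metric: requests per second per pod, with a target of 100.
print(desired_replicas(4, 150, 100))  # load above target, so scale out to 6
print(desired_replicas(4, 50, 100))   # load below target, so scale in to 2
```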

Exam trap

Google Cloud often tests the misconception that high availability requires a single cluster per region, but the trap is that true resilience demands multi-zone or multi-region deployment to survive zonal failures, not just cluster redundancy.

9

Multi-Select · hard

Which TWO actions should a developer take to ensure that a Cloud Run service can access a Cloud SQL instance securely?

Select 2 answers

A. Use a Cloud NAT to provide outbound internet access for the service.

B. Assign a public IP to the Cloud SQL instance and allow all traffic from Cloud Run.

C. Use the Cloud SQL Auth Proxy as a sidecar container in the same pod.

D. Configure the service with a VPC connector and use private IP for Cloud SQL.

E. Create a service account with the cloudsql.instances.connect permission.

Answers: C, D

Correct; Cloud SQL Auth Proxy provides secure IAM-based access.

Why this answer

Option C is correct because the Cloud SQL Auth Proxy, when deployed as a sidecar container in the same pod, provides encrypted connections and IAM-based authentication to Cloud SQL without requiring a public IP or complex network configuration. It automatically handles TLS 1.3 encryption and uses the service account's IAM permissions to authorize connections, ensuring secure access from Cloud Run.

Exam trap

Google Cloud often tests the misconception that a service account permission alone (Option E) is sufficient for secure access, when in reality the permission must be paired with a connectivity method like the Cloud SQL Auth Proxy or a VPC connector to actually establish the encrypted channel.

10

MCQ · easy

A developer sets environment variables for a Cloud Function as shown. What is the security concern?

A. The variable name FOO is too short and does not follow naming conventions.

B. The variable DB_PASS should be set as a build variable instead.

C. There is no encryption applied to environment variables.

D. The password is exposed in plain text and should be stored in Secret Manager.

Answer: D

Secrets should never be in plain text environment variables.

Why this answer

Option D is correct because the password is exposed in plain text in environment variables and should be stored in Secret Manager instead. Option A is wrong because the variable name is acceptable. Option B is wrong because build variables are for build time, not runtime.

Option C is wrong because encryption at rest is not the issue; the problem is visibility.

11

MCQ · hard

An application running on Compute Engine uses Cloud Storage for storing user-uploaded images. During load testing, the application experiences high latency when reading images. The developer suspects that the application is making too many small read requests. Which approach should the developer take to optimize performance?

A. Enable Cloud CDN to cache the images at edge locations.

B. Rewrite the objects to use a different storage class.

C. Increase the read size to reduce the number of API requests.

D. Mount the Cloud Storage bucket using Cloud Storage FUSE and read files from the local filesystem.

Answer: C

Reading larger chunks reduces the number of HTTP requests and improves throughput, especially for sequential access patterns.

Why this answer

Option C is correct because the high latency is caused by many small read requests, each incurring API overhead. By increasing the read size (e.g., reading larger chunks or using range requests), the application reduces the number of API calls, which lowers cumulative latency and improves throughput. This directly addresses the root cause of excessive small reads.

Exam trap

Google Cloud often tests the misconception that caching (Cloud CDN) or filesystem mounting (FUSE) solves performance issues caused by small read patterns, when the real fix is to reduce the number of API calls by increasing the read size.

How to eliminate wrong answers

Option A is wrong because Cloud CDN caches content at edge locations to reduce latency for repeated reads, but it does not reduce the number of small read requests the application makes; it only serves cached responses for subsequent requests, not the initial small-read pattern. Option B is wrong because changing the storage class (e.g., to Nearline or Coldline) affects cost and retrieval latency for infrequently accessed data, but it does not optimize the read size or reduce the number of API requests for small reads. Option D is wrong because Cloud Storage FUSE mounts the bucket as a local filesystem, but it still translates file operations into API calls; small reads from the filesystem still generate many underlying API requests, and FUSE can introduce additional overhead, not reduce it.
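A quick back-of-the-envelope calculation shows how strongly the read size drives the request count. The object size and the two read sizes in this Python sketch are hypothetical; only the relationship between them matters.

```python
def request_count(object_size_bytes: int, read_size_bytes: int) -> int:
    """Number of ranged reads needed to fetch the whole object (ceiling division)."""
    return (object_size_bytes + read_size_bytes - 1) // read_size_bytes


image_size = 8 * 1024 * 1024  # hypothetical 8 MiB image

small_reads = request_count(image_size, 4 * 1024)     # 4 KiB per request
large_reads = request_count(image_size, 1024 * 1024)  # 1 MiB per request

print(small_reads)  # 2048 requests
print(large_reads)  # 8 requests
```

Each request carries a roughly fixed overhead, so cutting 2048 requests to 8 removes most of the cumulative latency even though the bytes transferred are the same.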

12

MCQ · hard

Refer to the exhibit. A developer deployed a Cloud Run service with the above command. They notice that the service's latency is higher than expected under load. The service performs CPU-intensive tasks. What is the most likely reason for the high latency?

A. The service is using gen2, which does not support CPU-intensive workloads

B. The service should be deployed with --max-instances set to a lower number

C. The execution environment is gen2, which only allocates CPU during request processing by default; the high concurrency causes CPU contention

D. The memory is insufficient for the concurrency level

Answer: C

Gen2 CPU is only allocated during request processing unless CPU always on is set.

Why this answer

Cloud Run (gen2) allocates CPU based on request processing. With a concurrency of 80 and CPU-intensive tasks, many requests compete for the same CPU, and the CPU may be throttled between requests. Option C is correct: gen2 only allocates CPU during request processing unless CPU always on is set, and the high concurrency causes CPU contention.

Option A is incorrect because gen2 supports CPU-intensive tasks. Option B is incorrect because scaling to 10 instances could help but does not address the concurrency issue. Option D is incorrect because 4Gi of memory should be sufficient.
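The contention effect can be approximated with an idealized processor-sharing model, sketched below in Python. The concurrency of 80 comes from the explanation above; the single vCPU and the 50 ms of CPU work per request are assumptions, since the exhibit is not reproduced here, and real scheduling is less uniform than this model.

```python
def cpu_bound_latency(cpu_seconds: float, concurrent_requests: int, vcpus: float) -> float:
    """Latency of one request when CPU is shared evenly among concurrent requests."""
    # Each request gets at most one full vCPU, and less when requests outnumber vCPUs.
    share = min(1.0, vcpus / concurrent_requests)
    return cpu_seconds / share


work = 0.05  # assumed CPU time per request, in seconds

print(cpu_bound_latency(work, 1, 1))   # a single request runs at full speed
print(cpu_bound_latency(work, 80, 1))  # 80 concurrent requests: about 80x slower
```

Under these assumptions a 50 ms task takes roughly 4 s at full concurrency, which is why lowering concurrency or adding vCPUs is the usual remedy for CPU-bound services.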

13

MCQ · hard

A company runs a critical application on Compute Engine with a stateful database. They need to achieve 99.99% availability for the database tier. Which architecture should they implement?

A. A Compute Engine instance group with managed instance groups and a regional persistent disk configured for synchronous replication.

B. Use Cloud SQL with automatic failover and read replicas.

C. Two Compute Engine instances in different zones with a shared zonal persistent disk.

D. Single Compute Engine instance with a persistent disk snapshot scheduled every hour.

Answer: A

Regional persistent disks replicate data synchronously across zones, and the managed instance group can automatically fail over to a new instance in another zone on failure, achieving high availability.

Why this answer

Option A is correct because a managed instance group with a regional persistent disk configured for synchronous replication provides the necessary 99.99% availability by ensuring the database runs across two zones with synchronous writes to both replicas. This architecture allows automatic failover within seconds if one zone fails, meeting the high-availability requirement without data loss.

Exam trap

Google Cloud often tests the misconception that a shared zonal persistent disk across two instances provides high availability, but the trap is that a zonal disk is still tied to a single zone and fails if that zone goes down, whereas a regional persistent disk is required for true multi-zone resilience.

How to eliminate wrong answers

Option B is wrong because Cloud SQL with automatic failover and read replicas is a managed database service that offers up to 99.95% availability, not 99.99%, and read replicas are asynchronous, which can lead to data loss during failover. Option C is wrong because two Compute Engine instances in different zones with a shared zonal persistent disk cannot achieve 99.99% availability, as a zonal disk is tied to a single zone and becomes inaccessible if that zone fails, causing a single point of failure. Option D is wrong because a single Compute Engine instance with hourly persistent disk snapshots provides no high availability; recovery from a snapshot can take minutes to hours, far exceeding the downtime allowed for 99.99% availability (approximately 52.56 minutes per year).
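The downtime budget quoted above follows directly from the availability percentage. A small Python sketch of the arithmetic, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


print(round(allowed_downtime_minutes(99.99), 2))  # 52.56 minutes per year
print(round(allowed_downtime_minutes(99.95), 2))  # 262.8 minutes per year
```

The gap between 99.95% and 99.99% is therefore a factor of five in permitted downtime, which is why the two targets call for different architectures.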

14

MCQ · hard

A company has a multi-region Cloud Run service with traffic splitting between revisions. They notice that a newly rolled-out revision is receiving 0% of traffic even though they set traffic to 100% via the console. The revision shows 'Ready: Yes'. What is the most likely cause?

A. The revision has a low CPU limit causing it to be throttled.

B. The revision is not healthy because of a misconfigured health check.

C. The revision has a tag but no traffic percentage assigned; the tag is being used for routing.

D. The revision has a concurrency setting of 0, which is invalid.

Answer: C

If a revision has a tag, it may be accessible only via that URL; without a traffic percentage, it won't serve at the default URL.

Why this answer

When a revision shows 'Ready: Yes' but receives 0% traffic despite setting 100% via the console, the most likely cause is that the revision has a tag assigned but no traffic percentage. In Cloud Run, tags are used for direct URL routing (e.g., for testing) and do not receive any traffic from the service's main URL unless a traffic percentage is explicitly assigned. The console's traffic splitting UI allows setting a tag without a percentage, which can lead to this confusion.

Exam trap

The trap here is that candidates assume setting traffic to 100% in the console automatically distributes traffic to the latest revision, but they overlook that a tag can override this behavior by creating a separate routing path without a traffic percentage.

How to eliminate wrong answers

Option A is wrong because a low CPU limit would cause throttling or performance degradation, not a complete 0% traffic assignment; Cloud Run still routes traffic to the revision even if it is throttled. Option B is wrong because if the revision were unhealthy due to a misconfigured health check, the revision would show 'Ready: No' or be in a failed state, not 'Ready: Yes'. Option D is wrong because a concurrency setting of 0 is invalid and would cause a deployment error or revision failure, not a 0% traffic split with a healthy revision.

15

Multi-Select · hard

A Cloud SQL for PostgreSQL instance is experiencing high query latency. The database has a high number of read replicas and is used for reporting. The team has identified that index scans are not being used effectively. Which THREE actions should they take to improve query performance?

Select 3 answers

A. Analyze table statistics using VACUUM ANALYZE.

B. Increase the number of CPUs on the primary instance.

C. Enable automatic storage increase.

D. Use pg_stat_statements to identify slow queries.

E. Create additional read replicas.

Answers: A, B, D

Updating statistics helps the query planner choose index scans over sequential scans.

Why this answer

Option A is correct because `VACUUM ANALYZE` updates table statistics that the PostgreSQL query planner relies on to choose efficient index scans. Stale statistics can cause the planner to underestimate the selectivity of index conditions, leading to sequential scans instead of index scans, which increases latency. Regular analysis ensures the planner has accurate data distribution information to optimize query execution plans.

Exam trap

Google Cloud often tests the distinction between symptom mitigation (adding replicas or CPUs) and root-cause resolution (updating statistics), leading candidates to choose resource scaling options instead of the correct maintenance operation.

16

Multi-Select · medium

Which TWO are best practices for testing containerized applications on Google Cloud?

Select 2 answers

A. Use Kubernetes for testing only.

B. Use Distroless images for testing.

C. Use Cloud Build to build and test containers.

D. Use a different base image for testing than production.

E. Run tests inside the container as a separate layer using Docker multi-stage builds.

Answers: C, E

Cloud Build integrates seamlessly with container workflows.

Why this answer

Cloud Build is a managed CI/CD platform that can build container images from source code and execute tests as part of the build pipeline. It integrates natively with Google Cloud services like Container Registry and Artifact Registry, and supports custom build steps, making it an ideal tool for building and testing containerized applications in a consistent, automated environment.

Exam trap

Google Cloud often tests the misconception that testing should use a different base image to avoid production bloat, but the correct practice is to use the same base image for testing and production to ensure consistency, while leveraging multi-stage builds to separate build and test dependencies from the final runtime image.

17

MCQ · easy

A company deploys a microservices application on Google Kubernetes Engine (GKE). The operations team needs to monitor API latency between services. Which Google Cloud service should they use to trace requests across services?

A. Error Reporting

B. Cloud Logging

C. Cloud Monitoring

D. Cloud Trace

Answer: D

Cloud Trace provides distributed tracing to analyze latency across services.

Why this answer

Cloud Trace is the correct choice because it is a distributed tracing system designed to capture latency data as requests propagate through microservices. It provides end-to-end visibility by collecting trace spans from each service, allowing the operations team to identify bottlenecks and measure API latency between services in a GKE environment.

Exam trap

The trap here is that candidates confuse Cloud Monitoring (metrics) with Cloud Trace (distributed tracing), assuming that latency metrics alone can trace requests across services, but metrics lack the span-level context needed to follow a single request's path.

How to eliminate wrong answers

Option A is wrong because Error Reporting aggregates and analyzes application errors, not latency traces. Option B is wrong because Cloud Logging stores and queries log data, but it does not provide the distributed trace context needed to follow a request across multiple services. Option C is wrong because Cloud Monitoring focuses on metrics, alerts, and dashboards (e.g., CPU, memory), not on tracing individual request paths or measuring per-hop latency.

18

Multi-Select · hard

A company uses Cloud Spanner for a globally distributed application. They need to design their table schema for maximum scalability and performance. Which two design considerations are critical? (Choose two.)

Select 2 answers

A. Use interleaved tables to colocate related data.

B. Store large binary blobs directly in Spanner.

C. Define secondary indexes on every column.

D. Use monotonically increasing primary keys.

E. Choose primary keys that distribute write load evenly across nodes.

Answers: A, E

Interleaved tables store parent and child rows in the same split, reducing the number of participants in transactions and improving performance.

Why this answer

Interleaved tables in Cloud Spanner physically colocate parent and child rows on the same split, reducing cross-node reads and improving join performance. This design is critical for globally distributed applications because it minimizes latency and ensures that related data is stored together for efficient access.

Exam trap

Google Cloud often tests the misconception that secondary indexes on every column improve query performance, but in Spanner they increase write latency and storage costs without benefiting all queries.

19

MCQ · easy

A developer is using Cloud Spanner for a global application. They need to design a schema to avoid hotspots. Which practice should they follow?

A. Use a UUID primary key

B. Use a monotonically increasing primary key

C. Use a composite primary key with a timestamp

D. Use a hash prefix on the primary key

Answer: D

A hash prefix distributes write load evenly across nodes, avoiding hotspots.

Why this answer

Option D is correct because Cloud Spanner uses a distributed architecture that splits data across splits based on the primary key. A monotonically increasing or timestamp-based key causes all new writes to hit a single split, creating a hotspot. By using a hash prefix on the primary key, writes are distributed uniformly across all splits, avoiding hotspots and maximizing write throughput.

Exam trap

Google Cloud often tests the misconception that any random key (like UUID) automatically avoids hotspots, but in Cloud Spanner, the key distribution must be explicitly designed to avoid sequential patterns, and a hash prefix is the recommended pattern.

How to eliminate wrong answers

Option A is wrong because a UUID primary key, while random, is 128 bits and can still lead to uneven distribution and hotspotting if the UUID generation is not truly random or if the application uses sequential UUIDs (e.g., UUID v1). Option B is wrong because a monotonically increasing primary key (e.g., auto-increment integer) causes all new rows to be written to the last split, creating a severe write hotspot. Option C is wrong because a composite primary key with a timestamp as the leading column causes all writes at the same time to target the same split, again creating a hotspot.
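A hash prefix can be sketched in a few lines of Python. This is one illustrative way to derive a shard column from the natural key, not Spanner's own mechanism; the shard count, function names, and key format are hypothetical.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count, chosen by the application


def shard_prefix(natural_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Derive a stable shard number from a hash of the natural key."""
    digest = hashlib.sha256(natural_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def sharded_key(natural_key: str) -> tuple[int, str]:
    """Composite primary key: (shard prefix, original key)."""
    return (shard_prefix(natural_key), natural_key)


# Sequential order IDs no longer sort next to each other once prefixed.
for order_id in ("order-0001", "order-0002", "order-0003"):
    print(sharded_key(order_id))
```

Because the prefix is computed from the key itself, a point lookup can recompute it, while a range scan over the natural key has to query every shard value.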

20

Multi-Select · hard

Which THREE of the following are valid reasons to use Cloud Deploy instead of manually applying kubectl commands in a CI/CD pipeline?

Select 3 answers

A. Cloud Deploy automatically containerizes applications.

B. Cloud Deploy maintains a deployment history for auditing.

C. Cloud Deploy enforces IAM roles on Kubernetes clusters.

D. Cloud Deploy provides automatic rollbacks on deployment failure.

E. Cloud Deploy supports canary and blue-green deployments out of the box.

Answers: B, D, E

Audit trail is built-in.

Why this answer

Option B is correct because Cloud Deploy automatically maintains a detailed deployment history, including the state of each rollout, approvals, and metadata. This history is stored in the Cloud Deploy API and can be queried for auditing, compliance, and troubleshooting purposes, which is not natively provided by manual kubectl commands in a CI/CD pipeline.

Exam trap

The trap here is that candidates may confuse Cloud Deploy's role in the CI/CD pipeline with containerization or cluster-level security, assuming it handles build or IAM enforcement, when in fact it is a continuous delivery service focused on rollout strategies and auditability.

21

Multi-Select · medium

A team is implementing a CI/CD pipeline for a Cloud Function using Cloud Build. Which three steps should they include in their cloudbuild.yaml? (Choose 3)

Select 3 answers

A. Static code analysis

B. Deploy the function

C. Run unit tests

D. Build a container image

E. Manual approval step

Answers: A, B, C

Static analysis (linting, security scanning) is a good practice to include in the pipeline.

Why this answer

Static code analysis (A) is correct because it helps identify code quality issues, security vulnerabilities, and adherence to coding standards early in the pipeline, which is a best practice for Cloud Functions. Running unit tests (C) is essential to validate function logic before deployment. Deploying the function (B) is the final step that pushes the validated code to Cloud Functions, making it a required step in the CI/CD pipeline.

Exam trap

Google Cloud often tests the misconception that a Cloud Functions pipeline needs an explicit container image build step. Cloud Functions deploys from source and the platform builds the image for you, so option D is unnecessary in cloudbuild.yaml.

Full explanation →

22

MCQeasy

A startup wants to deploy a web application on App Engine standard environment. They need to handle sudden traffic spikes automatically. How should they configure scaling?

A.Use automatic scaling.

B.Use basic scaling with idle timeout.

C.Use manual scaling with a fixed number of instances.

D.Use a combination of manual and automatic scaling.

AnswerA

Automatic scaling adjusts instance count based on traffic.

Why this answer

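Automatic scaling is designed for traffic spikes: App Engine adds and removes instances as request volume changes. Manual scaling keeps a fixed number of instances, so capacity must be adjusted by hand. Basic scaling creates instances on demand and shuts them down after an idle timeout, which suits intermittent work rather than sudden spikes. A service uses exactly one scaling type, so manual and automatic scaling cannot be combined.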

Full explanation →

23

Multi-Selecteasy

Which TWO features are provided by Google Cloud Deploy? (Choose 2.)

Select 2 answers

A.Rollback to a previous deployment revision.

B.Run containers without managing infrastructure.

C.Automated canary analysis based on deployment verification.

D.Build container images from source code.

E.Manage Kubernetes clusters across multi-cloud environments.

AnswersA, C

Cloud Deploy supports rollbacks.

Why this answer

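Options A and C are correct. A: Cloud Deploy provides rollback to a previous deployment revision. C: It supports canary deployments combined with deployment verification.

Option B is wrong because running containers without managing infrastructure describes a serverless container platform, not Cloud Deploy. Option D is wrong because building container images is the job of a CI tool, not Cloud Deploy. Option E is wrong because managing multi-cloud Kubernetes clusters is handled by a different service.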

Full explanation →

24

Multi-Selectmedium

A company is deploying a containerized application on Cloud Run that needs to connect to a Cloud SQL (MySQL) database. The database must not be accessible from the public internet. Which two steps should the company take to secure the connection?

Select 2 answers

A.Enable automatic IAM database authentication for Cloud SQL.

B.Create a Cloud NAT gateway for outbound traffic from Cloud Run.

C.Use Cloud SQL Auth proxy within the Cloud Run container.

D.Set Cloud SQL to have a private IP address only.

E.Configure a Serverless VPC Access connector and attach it to the Cloud Run service.

AnswersD, E

A private IP ensures the database is not exposed to the public internet, meeting the security requirement.

Why this answer

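Setting Cloud SQL to have a private IP address only (Option D) ensures the database is accessible only within the VPC network, not from the public internet. This eliminates exposure to external threats and is a fundamental security best practice for private database access. Attaching a Serverless VPC Access connector to the Cloud Run service (Option E) then gives the service a route into the VPC so it can reach that private IP.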

Exam trap

The trap here is that candidates often think Cloud SQL Auth proxy alone is sufficient for security, but it does not prevent public internet access to the database; the proxy only encrypts the connection, while the database's IP address remains publicly reachable unless explicitly set to private.

Full explanation →

25

MCQeasy

A developer wants to containerize a Node.js application and deploy it to Cloud Run. They need to ensure the container is as small as possible. What should they do?

A.Use a full Ubuntu base image with all dependencies.

B.Use a multi-stage Dockerfile with a distroless base image.

C.Use a node:latest image and remove unnecessary files.

D.Use a simple FROM scratch image.

AnswerB

Multi-stage builds copy only runtime dependencies, and distroless images are minimal.

Why this answer

Option B is correct because a multi-stage Dockerfile allows you to separate the build environment from the runtime environment. By using a distroless base image (e.g., gcr.io/distroless/nodejs), you include only the application and its runtime dependencies, omitting package managers, shells, and other OS utilities. This results in a significantly smaller container image, which reduces attack surface and improves deployment speed on Cloud Run.

Exam trap

Google Cloud often tests the misconception that 'FROM scratch' is the smallest possible image for any application, but candidates must recognize that scratch images lack the runtime libraries required by interpreted languages like Node.js, making distroless the correct minimal choice.

How to eliminate wrong answers

Option A is wrong because using a full Ubuntu base image with all dependencies results in a large image (hundreds of MB) that includes unnecessary OS utilities, increasing attack surface and deployment time. Option C is wrong because using node:latest and removing unnecessary files is inefficient; the image still contains the full OS layer and package manager, and manual removal is error-prone and does not achieve the minimal size of a distroless image. Option D is wrong because a FROM scratch image provides no base filesystem or runtime libraries, and Node.js applications require the Node.js runtime and system libraries (e.g., libc, libstdc++) that are not present in a scratch image, causing the container to fail to start.

Full explanation →

26

MCQeasy

A developer wants to quickly test changes to a containerized web application that will run on Cloud Run, without building and deploying a new container. Which approach should they use?

A.Deploy to a staging Cloud Run service

B.Run locally with Docker

C.Use traffic splitting to test a new revision

D.Use Cloud Run for Anthos

AnswerB

Running the container locally with Docker provides the fastest feedback loop as it avoids deployment steps.

Why this answer

Running locally with Docker allows rapid iteration without the overhead of pushing to a registry and redeploying. Staging deployment is slower. Traffic splitting is for production traffic management. Cloud Run for Anthos is for hybrid deployments.

Full explanation →

27

Multi-Selecthard

Which TWO statements about Cloud Trace are correct?

Select 2 answers

A.Trace can be integrated with Cloud Monitoring for alerting

B.Trace collects latency data from all requests by default

C.Trace automatically creates dashboards for visualization

D.Trace can be used to analyze end-to-end latency across services

E.Trace supports auto-scaling based on latency

AnswersA, D

Trace data can be used with Cloud Monitoring alerts.

Why this answer

Option A is correct because Cloud Trace can be integrated with Cloud Monitoring to create alerting policies based on trace data, such as latency thresholds or error rates. This integration allows you to set up notifications when specific trace conditions are met, enabling proactive performance monitoring.

Exam trap

Google Cloud often tests the misconception that Cloud Trace captures all requests by default, but the key trap is that it uses sampling to manage cost and performance, so you must explicitly configure higher sampling for full visibility.

Full explanation →

28

Multi-Selecteasy

A developer wants to view real-time logs from a running application on Compute Engine. Which two methods can they use to stream logs? (Choose two.)

Select 2 answers

A.Using the Logs Explorer's 'Stream logs' feature

B.Using gcloud compute ssh and running journalctl -f

C.Using gcloud logging tail

D.Using Cloud Monitoring's metrics explorer

E.Using gcloud app logs tail

AnswersA, C

Correct: the Logs Explorer provides a streaming view.

Why this answer

Option A is correct because the Logs Explorer in the Google Cloud Console provides a 'Stream logs' feature that allows you to view real-time log entries as they are ingested by Cloud Logging. This is ideal for monitoring a running Compute Engine instance without needing to SSH into it. Option C is correct because the `gcloud logging tail` command streams log entries from Cloud Logging in real time, using the Logging API's tail method, and can filter by resource type (e.g., `gce_instance`) or log name.

Exam trap

Google Cloud often tests the distinction between streaming logs from the centralized Cloud Logging service and streaming logs directly from the VM's local journal. Candidates mistakenly choose `journalctl -f` (Option B) because they think it provides the same real-time view, but it does not integrate with Cloud Logging's centralized filtering and retention.

Full explanation →

29

MCQhard

A company deploys a stateful application using StatefulSets on GKE. They need to store persistent data on regional persistent disks for high availability. However, during zonal failures, pods are not rescheduled quickly. What is the best approach to improve recovery time?

A.Increase the number of replicas in the StatefulSet.

B.Configure podDisruptionBudget and use persistent disk with regional replication.

C.Use a headless service with external persistent storage like Filestore.

D.Use a Deployment instead of StatefulSet.

AnswerB

Regional PDs replicate across zones and PDB ensures minimum available pods during disruptions.

Why this answer

Option B is correct because configuring a podDisruptionBudget ensures that a minimum number of Pods remain available during voluntary disruptions, while using regional persistent disks (which replicate data across zones) allows the StatefulSet controller to quickly reschedule Pods in another zone without waiting for the failed zone's disk to become available. This combination minimizes downtime during zonal failures by maintaining quorum and ensuring data is already accessible in the surviving zone.

Exam trap

Google Cloud often tests the misconception that increasing replicas alone improves availability during zonal failures, but the real bottleneck is the persistent volume's zonal binding, which requires regional replication to allow cross-zone attachment.

How to eliminate wrong answers

Option A is wrong because increasing the number of replicas does not address the root cause of slow rescheduling during zonal failures; it only spreads Pods across more nodes but still relies on the same regional disk, which may be stuck in the failed zone. Option C is wrong because using a headless service with external persistent storage like Filestore changes the storage architecture but does not inherently improve recovery time for StatefulSets; Filestore is a network file system that introduces latency and does not provide the same zonal failover guarantees as regional PDs. Option D is wrong because using a Deployment instead of StatefulSet would lose the ordered pod identity and stable storage mapping required for stateful applications, and Deployments do not guarantee that each Pod gets its own persistent volume, which can lead to data corruption or loss.

Full explanation →

30

Multi-Selectmedium

A DevOps team is migrating an on-premises monitoring solution to Google Cloud. They need to collect custom application metrics from a batch processing job running on Compute Engine. Which two services can ingest custom metrics into Cloud Monitoring? (Choose two.)

Select 2 answers

A.Cloud Profiler API

B.Stackdriver Monitoring agent with custom plugin

C.Cloud Logging with log-based metrics

D.Cloud Trace API

E.Cloud Monitoring API with custom metric descriptors

AnswersC, E

Correct: log-based metrics can extract numerical values from logs and create custom metrics.

Why this answer

Option C is correct because Cloud Logging can ingest any log entry, and log-based metrics allow you to extract numeric values from log content to create custom metrics that appear in Cloud Monitoring. Option E is correct because the Cloud Monitoring API lets you define custom metric descriptors and then write time-series data directly to those metrics, bypassing any agent or log pipeline.

Exam trap

Google Cloud often tests the misconception that the Stackdriver Monitoring agent (Ops Agent) can ingest arbitrary custom metrics via plugins, when in fact it only collects predefined metrics and custom metrics require either log-based metrics or direct API calls.

Full explanation →

31

MCQhard

You are designing a system that ingests high-velocity event streams from IoT devices using Pub/Sub. Each event must be processed exactly once to update a Firestore database. However, due to the distributed nature, at-least-once delivery is guaranteed by Pub/Sub. Which design pattern should you use to achieve exactly-once processing?

A.Use a Cloud Function with a retry policy to ensure delivery, and deduplicate using a Cloud Bigtable row key.

B.Use Cloud Dataflow with exactly-once processing mode and write to Firestore using a custom sink.

C.Use a Cloud Run service to pull messages and write to Firestore; rely on Firestore's built-in deduplication using document IDs.

D.Make the message processor idempotent by using a unique event ID as the Firestore document ID, and perform upsert operations.

AnswerD

Idempotent processing with upsert ensures exactly-once effect.

Why this answer

Option D is correct because making the processor idempotent, with a unique event ID as the Firestore document ID, ensures that duplicate writes are harmless. Option A is wrong because Bigtable row keys can help with deduplication, but idempotency at the Firestore write is what matters. Option B is wrong because Dataflow's exactly-once mode still requires idempotent sinks. Option C is wrong because Firestore does not provide built-in deduplication.
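The idempotent-upsert idea can be sketched without any cloud dependency. The example below is only an illustration: `FakeDocumentStore`, `Event`, and `handle` are hypothetical names, and the in-memory dictionary stands in for a document database keyed by document ID. It shows that when the event ID is the document ID, a redelivered message rewrites the same document instead of creating a second one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str   # unique ID assumed to be assigned by the publisher
    device_id: str
    reading: float


class FakeDocumentStore:
    """Hypothetical in-memory stand-in for a document database."""

    def __init__(self):
        self.docs = {}
        self.writes = 0

    def upsert(self, doc_id, data):
        # Writing an existing ID overwrites it, so repeats are harmless.
        self.docs[doc_id] = data
        self.writes += 1


def handle(event, store):
    # The event ID doubles as the document ID, which makes the write idempotent.
    store.upsert(
        event.event_id,
        {"device_id": event.device_id, "reading": event.reading},
    )


store = FakeDocumentStore()
delivered = [
    Event("evt-1", "sensor-a", 21.5),
    Event("evt-2", "sensor-b", 19.0),
    Event("evt-1", "sensor-a", 21.5),  # duplicate from at-least-once delivery
]
for event in delivered:
    handle(event, store)
```

Three deliveries produce three writes but only two documents, which is the "exactly-once effect" the answer describes.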

Full explanation →

32

MCQhard

A company uses Cloud Build to build Docker images and push to Artifact Registry. They want to trigger builds automatically when code is pushed to a Cloud Source Repository branch. They also need to ensure that only builds from the repository's main branch are allowed to push to the production Artifact Registry repository. What is the best way to implement this?

A.Use Cloud Source Repository webhooks and a Cloud Function to call Cloud Build

B.Use Cloud Build triggers and configure separate service accounts for each branch

C.Use Cloud Build triggers with branch filter and use IAM conditions on the Artifact Registry to restrict push based on the Cloud Build service account

D.Use Cloud Build triggers with substitution variables and separate build configurations

AnswerC

This approach allows you to create a trigger for the main branch with a dedicated service account that has write access to the production repository.

Why this answer

Option C is correct because creating a Cloud Build trigger with a branch filter for the main branch and using a dedicated service account for that trigger with appropriate permissions on the production repository ensures only main branch builds can push to production. Option A is unnecessarily complex, option B uses multiple service accounts but is less straightforward, and option D does not restrict access per branch.

Full explanation →

33

MCQeasy

A company has a hybrid cloud setup with on-premises applications that need to send messages to a Pub/Sub topic. The on-premises network is connected via Cloud VPN. What is the recommended way to publish messages?

A.Use a Cloud NAT instance to route traffic

B.Expose Pub/Sub publicly and use authentication via OAuth2 tokens

C.Use VPC Service Controls and Private Google Access

D.Use Cloud Router to establish BGP sessions for direct connectivity

AnswerC

Private Google Access enables on-premises to reach Google APIs via VPN, and VPC Service Controls provides security.

Why this answer

Option C is correct because to allow on-premises access to Pub/Sub via VPN, you need to enable Private Google Access and use VPC Service Controls for security. Option A is wrong because Cloud NAT is for outbound internet access, not for private access to Google APIs. Option B is wrong because it sends traffic over the public internet instead of the private VPN path. Option D is wrong because Cloud Router is for dynamic routing, not for accessing Pub/Sub.

Full explanation →

34

MCQeasy

Your Cloud Run service is experiencing 5xx errors. You have enabled Cloud Logging and Cloud Error Reporting. How can you quickly identify the most common error type?

A.Use Cloud Trace to analyze the traces of failing requests.

B.Open Cloud Error Reporting to see grouped error counts.

C.View the logs in Cloud Logging and manually count error messages.

D.Create a Cloud Monitoring alert on 5xx response codes.

AnswerB

Error Reporting aggregates and surfaces top errors.

Why this answer

Cloud Error Reporting automatically groups similar errors (e.g., same stack trace or error message) and shows a count for each group, making it the fastest way to identify the most common 5xx error type without manual log inspection. It is purpose-built for this exact use case, aggregating errors from Cloud Logging and presenting them in a dashboard sorted by frequency.
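To make the grouping idea concrete, here is a minimal sketch of what "group similar errors and count each group" means. It is not how Error Reporting is implemented; the log entries, field names, and `top_errors` helper are all invented for illustration, and grouping here is by exact message only.

```python
from collections import Counter

# Hypothetical log entries; field names are assumptions for this sketch.
log_entries = [
    {"status": 500, "error": "NullPointerException at OrderHandler"},
    {"status": 503, "error": "Upstream timeout calling inventory"},
    {"status": 500, "error": "NullPointerException at OrderHandler"},
    {"status": 200, "error": None},
    {"status": 500, "error": "NullPointerException at OrderHandler"},
    {"status": 503, "error": "Upstream timeout calling inventory"},
]


def top_errors(entries):
    """Group 5xx entries by error message and rank groups by frequency."""
    counts = Counter(
        entry["error"] for entry in entries if 500 <= entry["status"] < 600
    )
    return counts.most_common()


ranked = top_errors(log_entries)
for message, count in ranked:
    print(f"{count:>3}  {message}")
```

The first row of the ranked list is the most common error type, which is the view the question asks for.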

Exam trap

Google Cloud often tests the distinction between monitoring (Cloud Monitoring alerts) and error analysis (Cloud Error Reporting), tempting candidates to choose a monitoring alert when the question explicitly asks for identifying the most common error type, not just detecting that errors exist.

How to eliminate wrong answers

Option A is wrong because Cloud Trace is designed for latency analysis and distributed tracing, not for counting or grouping error types; it would require manual correlation to find the most common error. Option C is wrong because manually counting error messages in Cloud Logging is inefficient and error-prone, defeating the purpose of 'quickly' identifying the most common error type. Option D is wrong because a Cloud Monitoring alert on 5xx response codes only notifies you that errors are occurring, but does not group or identify the most common error type; it lacks the error aggregation and classification that Error Reporting provides.

Full explanation →

35

Multi-Selecthard

Which THREE methods are valid ways to deploy a containerized application to Google Kubernetes Engine (GKE)?

Select 3 answers

A.Use Helm to install a chart.

B.Use gcloud container clusters create to deploy the application.

C.Upload the container image to Cloud Storage and use a trigger to deploy.

D.Use kubectl apply with a Deployment manifest.

E.Use Config Connector to create a KubernetesDeployment resource.

AnswersA, D, E

Helm is a package manager for Kubernetes.

Why this answer

Options A, D, and E are correct. A: Helm charts are commonly used. D: kubectl apply is the direct method. E: Config Connector manages resources through Kubernetes custom resources. Option B is wrong because gcloud container clusters create creates clusters; it does not deploy applications. Option C is wrong because Cloud Storage does not deploy directly to GKE.

Full explanation →

36

MCQmedium

A company uses Cloud SQL for PostgreSQL and wants to run periodic analytical queries on the data without impacting the transactional workload. The data is updated frequently. Which integration approach is most suitable?

A.Use Cloud Composer to schedule ETL jobs that copy data to BigQuery every minute.

B.Migrate the database to Cloud Spanner and use strong reads for analytics.

C.Export the Cloud SQL data to Cloud Storage and then load into BigQuery for analysis.

D.Create a read replica of the Cloud SQL instance and point analytical queries to the replica.

AnswerD

Read replicas handle read traffic without impacting the main database's write performance.

Why this answer

Option D is correct because Cloud SQL read replicas allow offloading read queries without affecting the primary instance's performance. Option A is wrong because Cloud Composer is an orchestration tool, not a direct solution. Option B is wrong because Cloud Spanner is a different database. Option C is wrong because export and import is a batch process and not real-time.

Full explanation →

37

MCQhard

Refer to the exhibit. A developer creates this cloudbuild.yaml for a Cloud Build pipeline. When they run the build, they get an error that the image push failed. What is the most likely cause?

A.The project ID 'my-project' does not exist.

B.The Artifact Registry repository 'my-repo' has not been created.

C.The Dockerfile is missing in the repository.

D.Cloud Run service 'my-service' already exists and needs to be deleted.

E.The gcloud command requires the '--platform managed' flag.

AnswerB

The push step requires the repository to exist; otherwise, the push fails.

Why this answer

The error occurs because the cloudbuild.yaml references an Artifact Registry repository 'my-repo' that does not exist in the project. Cloud Build attempts to push the Docker image to the specified repository, and if the repository has not been created, the push fails with a permission or not-found error. The repository must be created before the build runs, as Cloud Build does not automatically create repositories.

Exam trap

Google Cloud often tests the distinction between build-time errors (e.g., missing Dockerfile) and push-time errors (e.g., missing repository), and candidates may confuse a missing repository with a missing project or a deployment flag issue.

How to eliminate wrong answers

Option A is wrong because if the project ID 'my-project' did not exist, the build would fail earlier with a project-level authentication or resource-not-found error, not specifically an image push failure. Option C is wrong because a missing Dockerfile would cause a build failure during the image build step, not during the push step. Option D is wrong because the Cloud Run service already existing is not an error; Cloud Run deployments can update existing services, and the error is about image push, not deployment.

Option E is wrong because the '--platform managed' flag is required for Cloud Run deployments, not for image pushes to Artifact Registry; the push failure is unrelated to this flag.

Full explanation →

38

MCQhard

A developer uses this Cloud Build configuration to deploy to Cloud Run. The build succeeds but the deployment fails with an error that the service account lacks permission. What is the most likely missing permission?

A.roles/iam.serviceAccountUser on the Compute Engine default service account.

B.roles/iam.serviceAccountUser on the Cloud Build service account.

C.roles/storage.objectViewer on the container registry.

D.roles/run.admin on the Cloud Run service.

AnswerB

The Cloud Build service account needs to impersonate the runtime service account (default Compute Engine service account) to deploy Cloud Run services.

Why this answer

Cloud Build uses its own service account (the Cloud Build service account) to execute deployments. To deploy to Cloud Run, that service account needs the 'roles/iam.serviceAccountUser' role on the runtime service account (default Compute Engine service account) to act as that service account.

Full explanation →

39

Multi-Selecteasy

A company wants to deploy a containerized application to Cloud Run. Which two approaches are supported? (Choose two.)

Select 2 answers

A.Use gcloud beta run deploy with --source flag to build and deploy from source

B.Use Cloud Functions to package the container as a function

C.Upload a Dockerfile to Cloud Run console and let it build

D.Use Kubernetes Engine to deploy the container and then migrate to Cloud Run

E.Build the container locally and push to Artifact Registry, then deploy with gcloud

AnswersA, E

This allows building and deploying directly from source code.

Why this answer

Cloud Run supports source-based deployment with the `--source` flag and building/pushing to Artifact Registry then deploying. Other options are not valid deployment methods.

Full explanation →

40

MCQhard

A company runs a Java microservice on Google Kubernetes Engine (GKE) using a standard cluster with 3 nodes. They use Cloud Build to build the Docker image and push it to Artifact Registry, then apply a Kubernetes Deployment manifest that references the new image tag. The Deployment has a rolling update strategy with maxSurge=1 and maxUnavailable=0. After a recent deployment, the new pods crash with 'CrashLoopBackOff'. The old pods are still running successfully. The application logs show a connection refused error when trying to connect to a Cloud SQL instance. The Cloud SQL instance is in the same project and region. The GKE cluster nodes have the appropriate scopes to access Cloud SQL. The application uses a Cloud SQL proxy sidecar container to establish the connection. The previous deployment worked fine. What is the most likely cause of the failure?

A.The GKE cluster nodes do not have the Cloud SQL Client role.

B.The Cloud SQL proxy sidecar container is not included in the new Deployment revision.

C.The Kubernetes Secret containing the service account key was not updated to include the new pod's service account.

D.The new image tag points to a broken build that has incorrect code for Cloud SQL connection.

AnswerB

Correct. Without the sidecar, the application cannot connect to Cloud SQL, resulting in connection refused.

Why this answer

The correct answer is B because the Cloud SQL proxy sidecar container is missing from the new Deployment revision. Since the application relies on the sidecar to establish a secure connection to Cloud SQL, its absence causes the connection refused error. The old pods continue to run because they still have the sidecar from the previous Deployment revision, while the new pods crash due to the missing proxy.

Exam trap

Google Cloud often tests the misconception that a connection refused error implies a code or permission issue, when in fact it is a missing sidecar container that causes the failure, especially in scenarios where the sidecar is defined in the Deployment manifest and accidentally removed during a revision update.

How to eliminate wrong answers

Option A is wrong because the GKE cluster nodes have the appropriate scopes to access Cloud SQL, and the Cloud SQL Client role is an IAM role assigned to the service account, not a scope on the nodes; the sidecar proxy handles authentication. Option C is wrong because the Kubernetes Secret containing the service account key is not relevant here—the Cloud SQL proxy sidecar typically uses Workload Identity or a service account key mounted as a volume, but the issue is the sidecar container itself being absent, not a missing or outdated secret. Option D is wrong because the new image tag points to a build that likely has correct code; the connection refused error is due to the missing sidecar proxy, not a code defect in the application.

Full explanation →

41

Multi-Selecthard

A company runs a microservices architecture on GKE with gRPC services. They want to implement traffic splitting for canary deployments. Which THREE components should they use?

Select 3 answers

A.ClusterIP service

B.Istio or Anthos Service Mesh

C.Ingress resource

D.Google Cloud Load Balancer

E.Headless service

AnswersB, C, D

Provides advanced traffic management, including weight-based canary deployments.

Why this answer

Option B is correct because Istio or Anthos Service Mesh provides fine-grained traffic splitting capabilities for canary deployments in a GKE environment. It uses Envoy sidecar proxies to route a percentage of traffic to different service versions based on HTTP headers or weight, enabling controlled rollouts without modifying application code.
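Weight-based splitting can be illustrated with a small simulation. This is not service mesh configuration or any mesh API; `pick_version`, the version names, and the 90/10 weights are assumptions chosen for the sketch. It only shows how per-request weighted selection yields roughly the configured share of traffic for the canary.

```python
import random


def pick_version(weights, rng):
    """Pick a version name with probability proportional to its weight."""
    total = sum(weights.values())
    point = rng.random() * total
    cumulative = 0
    for version, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return version
    return version  # guard against floating-point edge cases


rng = random.Random(42)                 # fixed seed so runs are repeatable
weights = {"stable": 90, "canary": 10}  # hypothetical 90/10 split
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    counts[pick_version(weights, rng)] += 1

print(counts)
```

In a real mesh the proxy makes this choice per request, so the application code stays unchanged while the weights are adjusted during the rollout.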

Exam trap

The trap here is that candidates often confuse ClusterIP or Headless services with traffic splitting capabilities, but these are only for basic service discovery and do not provide the advanced routing needed for canary deployments.

Full explanation →

42

MCQhard

A financial services company runs a transaction processing microservice on Google Kubernetes Engine (GKE). The service uses Cloud Spanner as its database. After migrating from Cloud SQL to Spanner to improve scalability, the team notices that a small percentage of transactions fail with an 'ABORTED' error due to deadlock detection. The application currently performs no retries, and the failures cause customer-facing errors. The team also observes that under peak load, transaction latencies are around 500ms, which is acceptable but they want to ensure the system remains reliable. They need to implement a solution that minimizes failures while maintaining acceptable performance. Which course of action should they take?

A.Increase the number of Spanner nodes to reduce the probability of deadlocks.

B.Reduce the size of each transaction by splitting them into smaller ones.

C.Change the transaction isolation level to READ UNCOMMITTED to avoid deadlocks.

D.Implement retry logic with exponential backoff and random jitter for aborted transactions.

AnswerD

Retrying with backoff and jitter is the standard pattern for handling Spanner aborts, ensuring transient conflicts are resolved without significant latency impact.

Why this answer

In Cloud Spanner, 'ABORTED' errors due to deadlock detection are a normal part of the concurrency control mechanism for read-write transactions. The correct solution is to implement retry logic with exponential backoff and random jitter, as recommended by Google's own documentation. This approach transparently handles transient deadlocks without requiring infrastructure changes or sacrificing consistency, and it maintains acceptable latency by spacing out retries.
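The retry pattern can be sketched in a few lines. This is a generic illustration, not Spanner client code: `AbortedError`, `run_with_retry`, and `flaky_txn` are hypothetical names, and the delay values are arbitrary. Client libraries may already retry aborted transactions for you, so treat this as a picture of the pattern rather than something to copy as-is.

```python
import random
import time


class AbortedError(Exception):
    """Hypothetical stand-in for a transient 'ABORTED' transaction error."""


def run_with_retry(txn, max_attempts=5, base_delay=0.05, max_delay=2.0,
                   sleep=time.sleep, rng=random.random):
    """Run txn(), retrying on AbortedError with capped backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return txn()
        except AbortedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential ceiling, capped, then scaled by a random factor.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)


# Simulated transaction that aborts twice and then commits.
calls = {"n": 0}


def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise AbortedError("simulated abort")
    return "committed"


delays = []  # record delays instead of sleeping so the demo runs instantly
result = run_with_retry(flaky_txn, sleep=delays.append, rng=lambda: 1.0)
```

The random factor spreads retries from competing clients over time, so transactions that aborted together are less likely to collide again on the next attempt.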

Exam trap

Google Cloud often tests the misconception that scaling infrastructure (more nodes) or reducing transaction size alone can eliminate deadlocks, when in fact retry logic is the required pattern for handling transient aborts in distributed databases like Cloud Spanner.

How to eliminate wrong answers

Option A is wrong because increasing the number of Spanner nodes improves throughput and storage capacity but does not directly reduce the probability of deadlocks; deadlocks are a function of transaction contention, not node count. Option B is wrong because splitting transactions into smaller ones can reduce the chance of conflicts but does not eliminate the need for retry logic; it also may break application-level atomicity requirements. Option C is wrong because Cloud Spanner does not support READ UNCOMMITTED isolation; it provides serializable isolation (and optional stale reads), and lowering isolation is not possible and would violate consistency guarantees.

Full explanation →

43

MCQhard

You are a developer at a company that runs a critical pricing engine on Compute Engine instances in a managed instance group (MIG) behind an internal TCP load balancer. The pricing engine is a stateful application that stores state in memory and also writes to a Cloud Bigtable instance for persistence. The application uses a custom port 8080. You need to migrate this application to Cloud Run for better scalability and reduced operational overhead. The application must maintain session affinity so that requests from the same client are routed to the same instance (since the in-memory state is not yet fully externalized). The application currently uses a health check on /healthz that returns 200 OK. You have containerized the application. When you deploy to Cloud Run, you notice that traffic is not sticky; every request might go to a different revision. You also need to ensure that Bigtable writes are performed asynchronously to avoid slowing down the pricing calculations. What should you do?

A.Implement a custom health check on TCP port 8080 in Cloud Run to ensure only healthy instances receive traffic.

B.Increase the container concurrency setting to 1 to force each container to handle one request at a time.

C.Use Cloud Run with an HTTP(S) External Load Balancer and enable session affinity on the backend service.

D.Deploy the application on Cloud Run and configure an Internal TCP/UDP Load Balancer in front of it with session affinity.

AnswerC

External load balancer provides session affinity; Cloud Run itself does not.

Why this answer

Option C is correct because Cloud Run does not support session affinity natively; to achieve stickiness, you need to set the session affinity feature on the external HTTP(S) load balancer and place Cloud Run behind it. Option A is wrong because Cloud Run does not support custom health checks with TCP; it only supports HTTP health checks. Option B is wrong because the concurrency setting does not affect stickiness. Option D is wrong because even with an internal load balancer, Cloud Run does not support session affinity directly; you need an external load balancer with session affinity.

Full explanation →

44

MCQmedium

Refer to the exhibit. A company configured an HPA for their deployment. They notice that the HPA is not scaling based on the 'packets-per-second' metric. What is the most likely reason?

A.The metric is not available in the cluster.

B.The metric name 'packets-per-second' is incorrect.

C.The target type should be 'Value' instead of 'AverageValue'.

D.The HPA is using the wrong scaleTargetRef.

AnswerA

Custom metrics must be exposed via an adapter; if not, HPA cannot access it.

Why this answer

The 'packets-per-second' metric is a custom metric. If it is not exposed to the cluster through the custom metrics API (e.g., via a custom metrics adapter), the HPA cannot read it and will not scale on it. The metric name and target type are correct.

The scaleTargetRef matches the deployment. Therefore, the metric being unavailable is the most likely issue.

Full explanation →

45

MCQhard

A developer created a Cloud Function that makes an HTTP request to an external API. The above error occurs intermittently. The external API is working correctly. What is the most likely cause?

A.The request to the external API has incorrect headers or payload

B.The function is not handling network retries properly

C.The Cloud Function is not deployed in the same region as the API

D.The function is timing out due to long response time

AnswerA

An invalid argument error strongly suggests the request parameters are incorrect.

Why this answer

The 'INVALID_ARGUMENT' error indicates the request payload or headers are malformed. Intermittent occurrence suggests a data-dependent issue rather than a permanent config problem.

Full explanation →

46

MCQhard

A company deploys a Java application on Compute Engine with a preemptible VM instance group managed by an instance template. The application writes critical state to local SSD. After a preemption event, the new instance starts fresh and loses state. What is the best practice to ensure state persistence?

A.Modify the startup script to recover state from a snapshot

B.Refactor the application to write state to a persistent service like Cloud Storage

C.Configure the managed instance group as stateful to preserve local SSD data

D.Use a regular (non-preemptible) VM instead of preemptible

AnswerB

This decouples state from the instance, ensuring durability across preemptions.

Why this answer

Option B is correct because local SSD data is ephemeral and lost on VM preemption or termination. Refactoring the application to write critical state to a persistent service like Cloud Storage ensures data durability independent of the VM lifecycle. This aligns with the best practice of designing preemptible workloads to be stateless, where state is stored externally.

Exam trap

Google Cloud often tests the misconception that local SSD can be made persistent through MIG stateful configuration, but stateful MIGs do not protect local SSD data against preemption; they preserve items such as instance names, persistent disks, and metadata, not local SSD data on termination.

How to eliminate wrong answers

Option A is wrong because snapshots capture disk state at a point in time, but they are not designed for real-time state recovery; the startup script would need to restore from a snapshot, which adds latency and complexity, and the snapshot itself may be stale if not taken frequently. Option C is wrong because stateful configuration on a managed instance group (MIG) preserves items such as persistent disks, instance names, and metadata, not local SSD data; local SSD contents are lost when a preemptible VM is terminated and recreated. Option D is wrong because using a non-preemptible VM avoids preemption but increases cost and defeats the purpose of using preemptible VMs for cost savings; the question asks for best practice to ensure state persistence, not to avoid preemption.

Full explanation →

47

MCQmedium

A team is migrating a monolithic application to a microservices architecture on Google Kubernetes Engine (GKE). They want to ensure that failures in one microservice do not cascade to others. Which design pattern should they implement?

A.Implement retry logic with exponential backoff for all inter-service calls.

B.Implement a circuit breaker pattern that opens when failure thresholds are exceeded.

C.Use synchronous HTTP calls with timeouts to detect failures quickly.

D.Use bulkheads to separate thread pools for each service.

AnswerB

Circuit breaker fails fast and prevents unnecessary load on failing services.

Why this answer

The circuit breaker pattern is the correct choice because it prevents cascading failures by monitoring inter-service calls and opening the circuit when failures exceed a threshold, allowing the system to fail fast and recover gracefully. In a GKE-based microservices architecture, this pattern is typically implemented using libraries like Resilience4j or Istio's circuit breaker, which can be configured to trip after a certain number of consecutive failures, thus protecting downstream services from being overwhelmed.
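
The fail-fast behavior described above can be illustrated with a minimal in-process sketch. It is not how Resilience4j or Istio implement the pattern; the class name, the thresholds, and the injectable `clock` parameter are all assumptions chosen for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker (illustrative only)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: reject immediately instead of calling the failing service.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so one trial call is allowed through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()       # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each inter-service request, for example `breaker.call(fetch_price, sku)`, where `fetch_price` stands in for any function that raises on failure.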

Exam trap

Google Cloud often tests the distinction between patterns that isolate failures within a component (bulkheads) versus patterns that prevent failures from propagating across components (circuit breaker), leading candidates to confuse the scope of each pattern.

How to eliminate wrong answers

Option A is wrong because retry logic with exponential backoff alone does not prevent cascading failures; it can actually exacerbate them by continuing to send requests to an already failing service, potentially causing resource exhaustion. Option C is wrong because synchronous HTTP calls with timeouts, while useful for detecting failures, do not provide a mechanism to stop repeated calls to a failing service, leading to thread pool starvation and cascading failures. Option D is wrong because bulkheads separate thread pools to isolate failures within a single service instance, but they do not prevent failures from propagating across different microservices in a distributed system.

Full explanation →

48

Multi-Selectmedium

Which THREE are valid uses of Cloud Trace? (Choose three.)

Select 3 answers

A.Identifying latency bottlenecks in a distributed application

B.Monitoring CPU usage of a Compute Engine instance

C.Viewing the flow of requests through microservices

D.Analyzing the performance of external API calls

E.Exporting traces to Prometheus for long-term storage

AnswersA, C, D

Trace shows where time is spent across services.

Why this answer

Cloud Trace is a distributed tracing system that captures latency data from applications, allowing you to identify performance bottlenecks across services. Option A is correct because Cloud Trace provides detailed traces that show the time spent in each component of a distributed application, enabling you to pinpoint where delays occur.

Exam trap

Google Cloud often tests the distinction between tracing (Cloud Trace) and monitoring (Cloud Monitoring), so candidates mistakenly choose CPU usage monitoring as a valid use of Cloud Trace.

Full explanation →

49

MCQhard

A team uses Cloud Build to deploy applications that need to access a Cloud SQL database in a VPC. They want to avoid exposing the database to the public internet. Which configuration is required?

A.Configure Cloud Build to use a private pool in the same VPC as the database

B.Enable VPC Network Peering between Cloud Build and the database VPC

C.Use Cloud SQL Proxy in a Cloud Build step

D.Use a public IP on Cloud SQL and restrict by IP whitelist

AnswerA

Private pools run inside a VPC, enabling internal access to Cloud SQL.

Why this answer

Cloud Build private pools run in a customer-managed VPC, allowing workers to directly access resources like Cloud SQL instances via private IP without traversing the public internet. This configuration ensures the database is never exposed to the public internet, meeting the security requirement.

Exam trap

Google Cloud often tests the misconception that VPC peering or Cloud SQL Proxy can replace the need for placing Cloud Build workers inside the same VPC, but private pools are the only native way to run Cloud Build in your own VPC without public internet exposure.

How to eliminate wrong answers

Option B is wrong because VPC Network Peering is used to connect two VPC networks, but Cloud Build does not have its own VPC to peer; private pools are the correct mechanism to place Cloud Build workers inside the customer's VPC. Option C is wrong because Cloud SQL Proxy still requires a public IP or a private IP connection; while it can connect via private IP, it does not eliminate the need for the database to be accessible from the Cloud Build environment, and using a proxy in a Cloud Build step does not inherently avoid public exposure if the database has a public IP. Option D is wrong because using a public IP on Cloud SQL and restricting by IP whitelist still exposes the database to the public internet, albeit with access controls, which violates the requirement to avoid public exposure entirely.

Full explanation →

50

MCQeasy

A web application uses Cloud SQL for MySQL. The team expects a sudden spike in read-only traffic from a reporting tool. What should they use to offload read queries?

A.Automatic storage increase

B.Cross-region replication

C.Read replicas

D.Failover replica

AnswerC

Read replicas allow you to offload read queries from the primary instance, improving performance.

Why this answer

Read replicas in Cloud SQL for MySQL allow you to offload read traffic from the primary instance by creating one or more read-only copies. This is the correct approach for handling a sudden spike in read-only queries from a reporting tool, as it distributes the load without affecting write performance or requiring application changes beyond updating the connection string.

Exam trap

Google Cloud often tests the distinction between read replicas (for scaling reads) and failover replicas (for high availability), tempting candidates to choose failover replica because it sounds like it can handle traffic, but it cannot serve reads independently in Cloud SQL for MySQL.

How to eliminate wrong answers

Option A is wrong because automatic storage increase only adds disk space when the instance runs low, which does nothing to offload read queries or reduce CPU/memory load from read traffic. Option B is wrong because cross-region replication is designed for disaster recovery and geographic redundancy, not for scaling read capacity; it introduces latency and does not provide a local read endpoint for the reporting tool. Option D is wrong because a failover replica (also called a standby or HA replica) is a synchronous copy used for high availability and automatic failover, not for offloading read queries; it cannot serve read traffic independently in Cloud SQL for MySQL.

Full explanation →

51

MCQhard

A company is migrating a monolithic application to microservices on Google Cloud. They have strict requirements for service-to-service communication: requests must be authenticated, authorized, and encrypted in transit. They also need to enforce fine-grained access control based on the requesting service identity. Which Google Cloud service should they use to achieve these goals?

A.Cloud Run with ingress control

B.Cloud Armor with IAM

C.Cloud Service Mesh (Anthos Service Mesh)

D.Cloud Endpoints with API keys

AnswerC

Cloud Service Mesh provides mutual TLS and policy-based access control for microservices.

Why this answer

Option C is correct because Cloud Service Mesh (Anthos Service Mesh) provides mutual TLS, fine-grained authorization policies, and service identity. Option A is for serverless ingress control, option B is for DDoS protection, and option D is for API management.

Full explanation →

52

MCQeasy

Refer to the exhibit. You run the above command to build and push a Docker image to Container Registry. The build fails with an error: 'denied: Unauthenticated access'. What should you do to resolve this?

A.Grant the Cloud Build service account the Storage Object Admin role on the project

B.Grant the Cloud Build service account the Project Editor role

C.Grant the Compute Engine default service account the Storage Object Creator role

D.Run gcloud auth login as the project owner before submitting the build

AnswerA

This allows push to Container Registry, which is backed by Cloud Storage.

Why this answer

The error 'denied: Unauthenticated access' indicates that the Cloud Build service account does not have permission to push images to Container Registry. By default, Cloud Build uses the Cloud Build service account ([PROJECT_NUMBER]@cloudbuild.gserviceaccount.com) to execute builds. Granting the Storage Object Admin role (roles/storage.objectAdmin) to this service account provides the necessary permissions to write objects (Docker image layers) to the Container Registry bucket in Cloud Storage, resolving the push failure.

Exam trap

Google Cloud often tests the distinction between the Cloud Build service account and the Compute Engine default service account, leading candidates to incorrectly choose Option C because they confuse the service account used by Cloud Build with the one used by Compute Engine instances.

How to eliminate wrong answers

Option B is wrong because granting the Project Editor role (roles/editor) is overly permissive and violates the principle of least privilege; it includes many unnecessary permissions beyond what is required for pushing images. Option C is wrong because the Compute Engine default service account is not used by Cloud Build; Cloud Build uses its own dedicated service account, and granting roles to the Compute Engine default service account would not resolve the build's authentication error. Option D is wrong because 'gcloud auth login' authenticates the user running the command, not the Cloud Build service account; the build runs in a non-interactive environment and relies on the service account's credentials, not the user's OAuth tokens.

Full explanation →

53

MCQmedium

An application writes structured logs to Cloud Logging. The team wants to create a metric based on the value of a JSON field 'order_total' to alert when totals exceed $1000. What type of metric should they use?

A.Uptime check metric.

B.Log-based metric.

C.Error Reporting metric.

D.Custom metric from Cloud Monitoring agent.

AnswerB

Extracts 'order_total' from logs and creates a metric.

Why this answer

A log-based metric extracts a numeric value from a log entry's JSON payload using a regular expression or a label extractor. By defining a log-based metric on the 'order_total' field and setting an alert threshold of $1000, the team can monitor and alert on high-value orders directly from Cloud Logging without additional instrumentation.
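
For the metric to be extractable, 'order_total' has to arrive in the log entry as a numeric JSON field. The sketch below shows one way an application might emit such an entry; it assumes a runtime (for example Cloud Run or GKE) where JSON lines written to stdout are ingested as structured logs, and the function names and the 'order_id' field are hypothetical.

```python
import json
import sys


def order_log_line(order_id, order_total):
    """Build one structured log line (hypothetical helper)."""
    entry = {
        "severity": "INFO",
        "message": "order placed",
        "order_id": order_id,
        # Keep this numeric so a log-based metric can extract its value.
        "order_total": order_total,
    }
    return json.dumps(entry)


def log_order(order_id, order_total, stream=sys.stdout):
    """Write the entry as a single JSON line."""
    print(order_log_line(order_id, order_total), file=stream)
```

Under that assumption, the field would be addressable in the metric definition as part of the entry's JSON payload, and the alerting policy would then compare the extracted value against 1000.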

Exam trap

Google Cloud often tests the distinction between log-based metrics and custom metrics from agents, where candidates mistakenly think a custom metric agent is required to extract values from logs, but Cloud Logging's built-in log-based metrics handle this directly without any agent.

How to eliminate wrong answers

Option A is wrong because uptime check metrics monitor the availability and response time of a URL or service, not the value of a field in structured logs. Option C is wrong because Error Reporting metrics are designed to count and group application errors (e.g., exceptions, stack traces), not to extract arbitrary numeric fields like 'order_total'. Option D is wrong because custom metrics from the Cloud Monitoring agent require installing and configuring the agent on a VM to collect system-level metrics (e.g., CPU, memory), not to parse log entries.

Full explanation →

54

MCQmedium

A company is deploying a containerized application on Google Kubernetes Engine (GKE). The development team has built a Docker image and pushed it to Artifact Registry. They want to automate the deployment process so that whenever a new image is pushed to the registry, the application is automatically updated in the GKE cluster. Which combination of services should they use to achieve this?

A.Use Cloud Deploy to create a delivery pipeline that watches the Artifact Registry and promotes the image to GKE.

B.Set up a Cloud Build trigger that monitors the Artifact Registry and runs a build step to update the GKE deployment using kubectl.

C.Schedule a Cloud Scheduler job that periodically checks for new images in Artifact Registry and updates the GKE deployment.

D.Configure a Cloud Run service that is automatically deployed when a new image is pushed to Artifact Registry.

AnswerB

Correct: Cloud Build can be triggered by an Artifact Registry push and execute kubectl commands to update the deployment.

Why this answer

Option B is correct because Cloud Build can be configured with a trigger that monitors Artifact Registry for new image pushes. When a new image is pushed, the trigger executes a build step that uses kubectl to update the GKE deployment, enabling continuous deployment without manual intervention.

Exam trap

The trap here is confusing Cloud Deploy's pipeline capabilities with event-driven triggers, leading candidates to choose Option A, but Cloud Deploy requires an explicit trigger (like a Cloud Build invocation) and does not directly watch Artifact Registry for image pushes.

How to eliminate wrong answers

Option A is wrong because Cloud Deploy does not natively watch Artifact Registry for image pushes; it is designed for managing delivery pipelines with Skaffold and requires explicit triggers or integration with Cloud Build. Option C is wrong because Cloud Scheduler is a cron-based job scheduler that does not react to events in real time; it would introduce latency and inefficiency by polling, and it lacks native integration to detect new images. Option D is wrong because Cloud Run is a serverless compute platform for stateless containers, not a deployment automation service for GKE; it cannot update a GKE cluster's deployment.

Full explanation →

55

MCQmedium

A company runs a batch job daily that processes large files from Cloud Storage and stores results in BigQuery. The job requires significant compute for about 10 minutes and is fault-tolerant. Which compute option is most cost-effective?

A.Cloud Run Jobs

B.Always-on Compute Engine VM

C.GKE cluster with a single node

D.Preemptible VM

E.Cloud Functions (9-minute timeout)

AnswerD

Low cost, suitable for fault-tolerant and short-lived workloads.

Why this answer

Option D is correct because Preemptible VMs offer the same compute capacity as regular VMs at a significantly lower cost (up to 80% discount), and since the batch job runs for only 10 minutes daily and is fault-tolerant, it can handle the occasional preemption without data loss. The job's short duration and fault tolerance make preemptible instances ideal, as they can be restarted if terminated.
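
The cost argument can be made concrete with some rough arithmetic. In the sketch below the hourly rate is a made-up placeholder, not a real price, and the 80% figure is simply the "up to" discount quoted above; boot time, disks, and network charges are ignored.

```python
HOURLY_RATE = 0.10            # hypothetical on-demand price per VM-hour
PREEMPTIBLE_DISCOUNT = 0.80   # "up to 80%" figure from the explanation above


def monthly_cost(hours_per_day, hourly_rate, days=30):
    """Compute cost for a VM that runs hours_per_day each day."""
    return hours_per_day * hourly_rate * days


# Always-on VM: billed 24 hours a day.
always_on = monthly_cost(24, HOURLY_RATE)

# Preemptible VM: created for the 10-minute job, then deleted.
preemptible = monthly_cost(10 / 60, HOURLY_RATE * (1 - PREEMPTIBLE_DISCOUNT))

print(f"always-on:   {always_on:.2f} per month")
print(f"preemptible: {preemptible:.2f} per month")
```

Most of the difference comes from running 5 hours a month instead of 720; the discount then reduces the remaining cost further.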

Exam trap

Google Cloud often tests the misconception that Cloud Functions or Cloud Run Jobs are always the cheapest serverless options, but the trap here is that the 9-minute timeout of Cloud Functions disqualifies it, and candidates overlook the cost savings of preemptible VMs for fault-tolerant, short-duration batch jobs.

How to eliminate wrong answers

Option A is wrong because, although Cloud Run Jobs support task timeouts well beyond the 10 minutes this job needs, they are designed for stateless containers and may incur higher costs per vCPU-hour compared to preemptible VMs for compute-heavy batch processing. Option B is wrong because an always-on Compute Engine VM incurs costs 24/7, even when the job is not running, making it far more expensive than a preemptible VM that only runs for 10 minutes daily. Option C is wrong because a GKE cluster with a single node introduces unnecessary orchestration overhead and cost (including cluster management fees) for a simple batch job that does not require container orchestration.

Option E is wrong because Cloud Functions has a 9-minute timeout, which is insufficient for a job requiring 10 minutes of compute, and it is not designed for long-running batch processing.

Full explanation →

56

MCQmedium

A team is migrating a monolithic application to microservices on Google Kubernetes Engine (GKE). They want to ensure that if one microservice fails, it does not cascade to other services. Which design pattern should they implement?

A.Circuit Breaker pattern

B.Event-driven architecture

C.Retry with exponential backoff

D.Bulkhead pattern

AnswerA

Circuit Breaker pattern prevents cascading failures by opening the circuit when failures exceed a threshold.

Why this answer

The Circuit Breaker pattern is correct because it prevents cascading failures by monitoring for failures in a downstream microservice and, once a threshold is exceeded, immediately failing requests to that service without attempting the call. In GKE, this can be implemented using tools like Istio or Envoy sidecar proxies, which can be configured with circuit breaker settings to stop traffic to unhealthy pods, allowing the system to recover gracefully.

Exam trap

Google Cloud often tests the distinction between patterns that prevent cascading failures (Circuit Breaker) versus patterns that handle transient failures (Retry) or isolate resources (Bulkhead), so candidates mistakenly choose Retry or Bulkhead because they sound like they prevent failure spread, but they do not provide the fail-fast mechanism that stops the cascade.

How to eliminate wrong answers

Option B (Event-driven architecture) is wrong because it describes a communication style where services produce and consume events asynchronously, but it does not inherently provide failure isolation or prevent cascading failures; it can actually increase complexity in failure handling. Option C (Retry with exponential backoff) is wrong because it is a technique for handling transient failures by retrying with increasing delays, but it does not stop cascading failures; in fact, retrying a failing service can exacerbate the problem by adding load. Option D (Bulkhead pattern) is wrong because it isolates resources (e.g., thread pools or connections) per service or component to prevent a failure in one from exhausting shared resources, but it does not directly stop a failing service from being called; it limits blast radius but does not provide the fail-fast behavior of a circuit breaker.

Full explanation →

57

MCQeasy

A team is using Cloud Monitoring to set up an alerting policy for a Compute Engine instance that runs a web server. The team wants to be notified if the instance's CPU utilization exceeds 80% for 5 minutes. Which threshold type should they use?

A.Ratio threshold

B.Metric threshold

C.MQL (Monitoring Query Language)

D.Forecast threshold

AnswerB

Correct: a metric threshold directly checks if a metric exceeds a set value over a duration.

Why this answer

Option B is correct because a metric threshold alerting policy directly monitors a numeric metric (e.g., CPU utilization) and triggers when the value exceeds a defined threshold (80%) for a specified duration (5 minutes). This is the standard approach for simple threshold-based alerts on a single metric in Cloud Monitoring.

Exam trap

Google Cloud often tests the distinction between simple metric thresholds and more advanced options like MQL or forecast thresholds, tempting candidates to overcomplicate the solution when a basic metric threshold is sufficient.

How to eliminate wrong answers

Option A is wrong because a ratio threshold is used for comparing two metrics (e.g., errors per request), not for a single metric like CPU utilization. Option C is wrong because MQL is a powerful query language for complex, multi-metric or time-shifted analysis, but it is overkill and unnecessary for a simple static threshold on one metric. Option D is wrong because a forecast threshold predicts future metric values based on historical trends, not for detecting current or recent breaches of a fixed threshold.

Full explanation →

58

MCQhard

A developer is writing unit tests for a Python Cloud Run service that uses Cloud Firestore. They want to avoid hitting the real Firestore during tests. What should they use?

A.Use a real Firestore database but with a test project.

B.Mock the Firestore client using a library like unittest.mock.

C.Disable network access during tests.

D.Use the Firestore emulator for unit tests.

AnswerB

Mocking isolates the unit of code from external services.

Why this answer

Option B is correct because unit tests should isolate the code under test from external dependencies. Using `unittest.mock` to mock the Firestore client allows the developer to simulate Firestore calls and return controlled responses without any network I/O, ensuring tests are fast, deterministic, and independent of the real Firestore service.
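
A minimal sketch of this approach follows. The function under test (`get_order_total`), the "orders" collection, and the "total" field are hypothetical; the mock only stands in for the `collection(...).document(...).get()` call chain that the code under test uses, so no Firestore library or network access is needed.

```python
from unittest.mock import MagicMock


def get_order_total(db, order_id):
    """Hypothetical code under test: reads one order document."""
    snapshot = db.collection("orders").document(order_id).get()
    if not snapshot.exists:
        return None
    return snapshot.to_dict()["total"]


def make_fake_db(doc_data):
    """Build a mock client whose get() returns a canned snapshot."""
    db = MagicMock()
    snapshot = db.collection.return_value.document.return_value.get.return_value
    snapshot.exists = doc_data is not None
    snapshot.to_dict.return_value = doc_data
    return db


def test_returns_total_for_existing_order():
    db = make_fake_db({"total": 1250})
    assert get_order_total(db, "o-1") == 1250
    db.collection.assert_called_once_with("orders")


def test_returns_none_for_missing_order():
    assert get_order_total(make_fake_db(None), "missing") is None
```

Passing the client in as a parameter keeps the function easy to test; code that constructs its own client would instead need `unittest.mock.patch` at the point where the client is created.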

Exam trap

The trap here is that candidates often confuse the Firestore emulator (a local integration testing tool) with a proper unit testing mock, leading them to choose option D even though the emulator is not suitable for isolated unit tests.

How to eliminate wrong answers

Option A is wrong because using a real Firestore database, even in a test project, still incurs network latency, potential costs, and dependency on the Firestore service being available, which violates the principle of unit test isolation. Option C is wrong because disabling network access during tests does not automatically prevent the Firestore client from attempting to connect; it would likely cause connection errors rather than gracefully simulating Firestore behavior. Option D is wrong because the Firestore emulator is intended for integration tests or end-to-end testing, not for pure unit tests; it still requires running a local emulator process and introduces external state management that unit tests should avoid.

Full explanation →

59

MCQeasy

An application deployed on Google Kubernetes Engine (GKE) is experiencing intermittent high latency. The operations team wants to quickly identify which specific code path is causing the delay. What should they use?

A.Enable Cloud Trace and analyze trace spans.

B.Use Cloud Profiler to identify memory leaks.

C.Set up a Cloud Monitoring uptime check.

D.Review Cloud Logging logs to find error messages.

AnswerA

Cloud Trace captures request spans and shows time spent in each component.

Why this answer

Cloud Trace is designed specifically for latency analysis in distributed systems like GKE. It captures end-to-end request latency and breaks it down into individual spans, each representing a specific code path or service call. By analyzing these spans, the operations team can pinpoint which exact code path (e.g., a database query, external API call, or internal function) is causing the intermittent high latency.
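
The idea of breaking a request into timed spans can be illustrated without any tracing library. The sketch below is a standard-library stand-in, not the Cloud Trace API: the `SpanRecorder` class and the span names are hypothetical, and a real service would use a tracing SDK that exports spans to Cloud Trace.

```python
import time
from contextlib import contextmanager


class SpanRecorder:
    """Hypothetical in-process recorder that times named code paths."""

    def __init__(self):
        self.durations = {}  # span name -> total seconds spent

    @contextmanager
    def span(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self):
        """Return the name of the span with the largest total duration."""
        return max(self.durations, key=self.durations.get)
```

Wrapping each step of a request handler, for example `with recorder.span("db_query"): ...`, and then calling `recorder.slowest()` mirrors what reading a trace waterfall does: it points at the code path where the time is actually spent.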

Exam trap

Google Cloud often tests the distinction between tools that measure latency (Cloud Trace) versus tools that measure resource utilization (Cloud Profiler) or availability (Cloud Monitoring uptime checks), leading candidates to confuse profiling with tracing.

How to eliminate wrong answers

Option B is wrong because Cloud Profiler identifies performance bottlenecks related to CPU and memory usage (e.g., memory leaks, hot functions), not intermittent latency caused by specific code paths. Option C is wrong because a Cloud Monitoring uptime check only verifies that the application is reachable and responding within a configured timeout; it does not provide granular latency breakdowns per code path. Option D is wrong because reviewing Cloud Logging logs for error messages would only surface failures or exceptions, not the normal but slow execution paths that cause intermittent high latency.

Full explanation →

60

MCQhard

During a Cloud Build run, a developer sees the error: "Step #0: error: failed to fetch metadata: connection refused". The build is trying to access a private Docker registry in a different project. What is the most likely cause?

A.The registry does not exist

B.The build environment cannot reach the registry due to network restrictions

C.The build service account lacks IAM permissions to the registry

D.The build is using a public pool with no access to internal networks

AnswerB

Connection refused typically means the target is actively refusing the connection, often due to firewalls or VPC Service Controls preventing access.

Why this answer

The error message indicates a network connectivity issue, not authentication. The most common cause is that the build environment cannot reach the registry due to VPC Service Controls, firewall rules, or the registry being in a different network. Authentication errors typically show "denied" or "unauthorized".

Full explanation →

61

MCQhard

An organization runs a critical application on Compute Engine with a regional managed instance group. They want to achieve 99.99% availability. Which architecture should they use?

A.Regional MIG with instances in two zones

B.Single zone MIG with multiple instances

C.Regional MIG with instances in three zones

D.Multi-region deployment with global load balancer

AnswerC

Three zones provide higher availability within a region.

Why this answer

To achieve 99.99% availability, the architecture must tolerate both a zonal failure and a single instance failure. A regional managed instance group (MIG) with instances in three zones ensures that even if one zone becomes unavailable, the remaining two zones can still serve traffic, meeting the 99.99% uptime target. Three zones provide the necessary redundancy because a two-zone regional MIG can survive a single zone failure but not a simultaneous instance failure in the remaining zone, whereas three zones allow for a rolling update or the failure of one zone while the other two still provide serving capacity.

Exam trap

Google Cloud often tests the misconception that two zones are sufficient for 99.99% availability, but the trap here is that two zones only provide 99.9% availability because they cannot tolerate a simultaneous instance failure in the remaining zone during a zonal outage or maintenance event.

How to eliminate wrong answers

Option A is wrong because a regional MIG with instances in only two zones can survive a single zone failure, but if an instance in the remaining zone fails or a rolling update is performed, the application may drop below the required capacity, failing to achieve 99.99% availability. Option B is wrong because a single zone MIG with multiple instances cannot survive a zonal outage; if the entire zone fails, all instances are lost, making 99.99% availability impossible. Option D is wrong because while a multi-region deployment with a global load balancer can provide even higher availability, the question specifically asks for an architecture using Compute Engine with a regional managed instance group, and a multi-region deployment is not a regional MIG architecture; it introduces cross-region latency and complexity not required for the stated 99.99% target.

Full explanation →

62

Multi-Selectmedium

A team is designing a cloud-native application that must be highly available and resilient to zone failures. Which three practices should they follow? (Choose three.)

Select 3 answers

A.Use a single Load Balancer with multiple backends.

B.Deploy resources across multiple zones.

C.Use zonal managed instance groups with 100% target utilization.

D.Store data in regional persistent disks.

E.Implement health checks and autohealing.

AnswersB, D, E

Distributing instances across zones protects against zone-level failures.

Why this answer

Option B is correct because deploying resources across multiple zones ensures that the application remains available even if an entire zone fails. In Google Cloud, zones are independent failure domains, and distributing workloads across them is a fundamental pattern for achieving high availability and resilience to zone-level outages.

Exam trap

The trap here is that candidates may think a single load balancer is sufficient for high availability, but in cloud-native design, the load balancer itself is a managed service that is inherently resilient, while the real risk is having backends in only one zone or no spare capacity to absorb failures.
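The spare-capacity point can be made concrete with a little arithmetic. The following is a minimal sketch, assuming instances are spread evenly across zones and that load is redistributed evenly when a zone is lost; the function names are illustrative and not part of any Google Cloud API.

```python
def max_safe_utilization(zones: int, zones_lost: int = 1) -> float:
    """Highest steady-state utilization that still fits after losing zones.

    Assumes equal capacity per zone and even redistribution of load.
    """
    if zones <= zones_lost:
        raise ValueError("no surviving zones to absorb the load")
    return (zones - zones_lost) / zones


def utilization_after_failure(current: float, zones: int, zones_lost: int = 1) -> float:
    """Utilization on surviving instances once the failed zones' load moves over."""
    if zones <= zones_lost:
        raise ValueError("no surviving zones to absorb the load")
    return current * zones / (zones - zones_lost)
```

Under these assumptions, a group spread over three zones and running at 100% utilization would need 150% of its surviving capacity after one zone fails, which is why option C leaves no headroom.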

Full explanation →

63

Multi-Selectmedium

A team is deploying a new version of an application on GKE using a rolling update. They want to ensure that the update proceeds only if the new pods are healthy. Which two steps should they include? (Choose two.)

Select 2 answers

A.Set the minReadySeconds field in the deployment.

B.Define a readiness probe for the container.

C.Define a liveness probe for the container.

D.Set the revisionHistoryLimit to 10.

E.Use a postStart lifecycle hook to test health.

AnswersA, B

minReadySeconds ensures the pod is ready for that duration before being considered available.

Why this answer

Option A is correct because setting `minReadySeconds` in a Deployment means a newly created Pod is counted as available only after it has been ready, without any container crashing, for that duration; this stops the rolling update from proceeding if the Pod fails shortly after startup. Option B is correct because a readiness probe determines whether a Pod is ready to serve traffic; during a rolling update, the Deployment controller waits for the new Pod's readiness probe to succeed before scaling down old Pods, ensuring the update only continues when new Pods are healthy.

Exam trap

Google Cloud often tests the distinction between readiness and liveness probes, and the trap here is that candidates confuse liveness probes (which restart containers) with readiness probes (which control traffic and rolling update progression), leading them to incorrectly select a liveness probe as a health gate for the update.
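To show where the two settings live, here is a minimal sketch that builds a Deployment manifest as a Python dictionary and prints it as JSON (which `kubectl apply -f -` accepts). The name `web`, the image, the `/healthz` path, and all numeric values are assumptions chosen for illustration.

```python
import json


def deployment_manifest(name: str, image: str, port: int = 8080) -> dict:
    """Build an apps/v1 Deployment that gates a rolling update on pod health."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 3,
            # A new pod must stay ready this long before it counts as available.
            "minReadySeconds": 10,
            "selector": {"matchLabels": {"app": name}},
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # The rollout waits for this probe to succeed.
                        "readinessProbe": {
                            "httpGet": {"path": "/healthz", "port": port},
                            "periodSeconds": 5,
                            "failureThreshold": 3,
                        },
                    }]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image, for illustration only.
    print(json.dumps(deployment_manifest("web", "example/web:2.0"), indent=2))
```

With `maxUnavailable` set to 0, old Pods are removed only as new Pods become available, so a failing readiness probe stalls the rollout instead of reducing capacity.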

Full explanation →

64

Multi-Selecthard

A company wants to automate the deployment of a microservice application to Cloud Run using Cloud Build. They want to ensure zero-downtime deployments and traffic migration. Which three features should they utilize? (Choose three.)

Select 3 answers

A.Cloud Build triggers to build and deploy on code changes.

B.Cloud Run min and max instance settings.

C.Cloud Run managed continuous deployment from a repository.

D.Cloud Run gradual rollout with --no-traffic flag.

E.Cloud Run revision traffic splitting.

AnswersA, D, E

Triggers automate the build and deploy pipeline on code changes.

Why this answer

Cloud Build triggers automate building and deploying, Cloud Run traffic splitting enables gradual rollout, and deploying with --no-traffic allows creating a new revision without serving traffic, then shifting traffic gradually.
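The deploy-then-shift sequence can be sketched as a list of `gcloud` invocations that a pipeline step might run. This is an illustrative sketch only: the service name, image, region, tag, and the 10/50/100 percentage steps are assumptions, and the code only builds argument lists without executing anything.

```python
def deploy_without_traffic(service: str, image: str, region: str, tag: str) -> list:
    """Create a new revision that receives no traffic but is reachable via a tag."""
    return ["gcloud", "run", "deploy", service,
            f"--image={image}", f"--region={region}",
            "--no-traffic", f"--tag={tag}"]


def shift_traffic(service: str, region: str, tag: str, percent: int) -> list:
    """Send the given percentage of traffic to the tagged revision."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return ["gcloud", "run", "services", "update-traffic", service,
            f"--region={region}", f"--to-tags={tag}={percent}"]


def rollout_plan(service, image, region, tag, steps=(10, 50, 100)) -> list:
    """Full command sequence: deploy with no traffic, then shift in stages."""
    plan = [deploy_without_traffic(service, image, region, tag)]
    plan.extend(shift_traffic(service, region, tag, p) for p in steps)
    return plan
```

In practice each shift would be followed by a check of error rates or latency before moving to the next step; that verification is left out of this sketch.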

Full explanation →

65

Multi-Selectmedium

A company is designing a highly available application on Google Cloud using multiple regions. Which TWO strategies should they implement to achieve this?

Select 2 answers

A.Use zonal persistent disks for stateful data.

B.Use a global load balancer to distribute traffic across regions.

C.Deploy a single instance group in one region for simplicity.

D.Configure managed instance groups in multiple regions.

E.Store all data in a single Cloud Storage bucket.

AnswersB, D

Global load balancers route traffic to the closest healthy backend, enabling multi-region high availability.

Why this answer

Option B is correct because a global load balancer (e.g., Google Cloud External HTTPS Load Balancer) can distribute traffic across multiple regions, providing cross-region failover and low-latency routing. This is a fundamental pattern for multi-region high availability, as it allows traffic to be directed to healthy backends in any region, even if an entire region fails.

Exam trap

The trap here is that candidates often confuse zonal resources (like persistent disks) with regional or multi-regional resources, or they assume that a single-region deployment with a load balancer is sufficient for high availability, ignoring the need for geographic redundancy.
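The routing behavior described for option B can be illustrated with a toy selection function. This is a simplified model, not how the load balancer is implemented: the region names and latency figures are hypothetical, and real load balancing also accounts for backend capacity.

```python
def pick_region(latency_ms: dict, healthy: dict) -> str:
    """Return the lowest-latency region among those reporting healthy backends."""
    candidates = {r: ms for r, ms in latency_ms.items() if healthy.get(r, False)}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)
```

With managed instance groups in only one region, the `candidates` dictionary has at most one entry, so a regional outage leaves nothing to fail over to.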

Full explanation →

66

MCQmedium

Refer to the exhibit. A developer runs the command and sees that the Cloud Run service is publicly accessible. The security team requires that only authenticated requests from a specific service account in the same project are allowed. What should the developer do to modify the IAM policy?

A.Add a new binding with the service account as the only member of roles/run.invoker

B.Update the IAM policy to remove the allUsers member from the roles/run.invoker binding

C.Change the service's ingress settings to "Internal and Cloud Load Balancing"

D.Remove the roles/run.viewer binding and add the service account to roles/run.invoker

AnswerB

Removing allUsers revokes public access. Then ensure the service account has invoker role.

Why this answer

Option B is correct because removing the allUsers member from the roles/run.invoker binding revokes public access. The service account already has the viewer role, but it needs roles/run.invoker to actually invoke the service.

Option A adds a binding for the service account but does not remove allUsers, so the service stays publicly accessible. Option C changes the ingress settings, which restrict network paths rather than identities and are not necessary here. Option D removes the viewer role, which is not needed.
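The policy change can be sketched as a transformation of the JSON that `gcloud run services get-iam-policy --format=json` returns. This is a minimal sketch under that assumption: the service account email is hypothetical, and the edited policy would still need to be written back with `gcloud run services set-iam-policy`.

```python
import copy

INVOKER = "roles/run.invoker"


def restrict_invoker(policy: dict, service_account: str) -> dict:
    """Return a copy of the policy where only the service account can invoke.

    Removes allUsers from the invoker binding and adds the service account
    if it is missing. Other bindings are left untouched.
    """
    updated = copy.deepcopy(policy)
    member = f"serviceAccount:{service_account}"
    bindings = updated.setdefault("bindings", [])
    invoker = next((b for b in bindings if b.get("role") == INVOKER), None)
    if invoker is None:
        invoker = {"role": INVOKER, "members": []}
        bindings.append(invoker)
    members = [m for m in invoker.get("members", []) if m != "allUsers"]
    if member not in members:
        members.append(member)
    invoker["members"] = members
    return updated
```

The function returns a new dictionary instead of mutating its input, which makes it easy to diff the before and after policies prior to applying the change.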

Full explanation →

67

MCQmedium

Your company is deploying a web application on Cloud Run using a continuous deployment pipeline from Cloud Build. The application is built as a Docker container and pushed to Container Registry. The Cloud Run service is configured with the '--no-allow-unauthenticated' flag. You have set up Cloud Build triggers to build and deploy on commits to the main branch. The deployment works correctly for the first few commits, but after adding a new environment variable in the Cloud Build configuration file (cloudbuild.yaml), the deployment fails with an error that the Cloud Run service cannot be updated because the new revision fails health checks. The application code has not changed. What is the most likely cause?

A.The Cloud Build service account does not have permission to update the Cloud Run service.

B.The new environment variable exceeds the maximum size limit for environment variables in Cloud Run.

C.The health check configuration in the Cloud Run service was overwritten by the new deployment.

D.The new environment variable causes the application to fail its startup or health check.

AnswerD

A misconfigured environment variable can cause the app to crash.

Why this answer

Option D is correct because the application code has not changed, yet the deployment fails health checks immediately after adding a new environment variable. This indicates that the application is likely reading that variable at startup and crashing or failing its readiness probe due to an invalid value, missing dependency, or misconfiguration. Cloud Run requires the new revision to pass health checks (e.g., HTTP GET on the configured port) before it can serve traffic; if the variable causes the app to exit or hang, the revision is considered unhealthy and the update is rejected.

Exam trap

Google Cloud often tests the misconception that environment variables are harmless metadata and cannot cause deployment failures, when in fact they can break application startup logic or health check responses.

How to eliminate wrong answers

Option A is wrong because the Cloud Build service account already successfully deployed the first few revisions, so permissions are not the issue. Option B is wrong because Cloud Run environment variables have a total size limit of 64 KB for all variables combined, and a single new variable is extremely unlikely to exceed that. Option C is wrong because Cloud Run health check configuration (startup, liveness, readiness probes) is defined in the service YAML or via gcloud flags and is not overwritten by adding an environment variable in cloudbuild.yaml; the health check settings remain unchanged.
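One way to make this failure mode easier to diagnose is to validate configuration at startup and exit with a clear message. The sketch below assumes a hypothetical `MAX_WORKERS` variable; `PORT` is the variable Cloud Run injects to tell the container where to listen.

```python
import os


class ConfigError(Exception):
    """Raised when an environment variable is missing or malformed."""


def load_config(env=None) -> dict:
    """Read and validate settings, failing fast with a readable error."""
    env = os.environ if env is None else env
    raw_workers = env.get("MAX_WORKERS", "4")  # hypothetical setting
    try:
        workers = int(raw_workers)
    except ValueError:
        raise ConfigError(f"MAX_WORKERS must be an integer, got {raw_workers!r}") from None
    if workers < 1:
        raise ConfigError(f"MAX_WORKERS must be at least 1, got {workers}")
    raw_port = env.get("PORT", "8080")
    try:
        port = int(raw_port)
    except ValueError:
        raise ConfigError(f"PORT must be an integer, got {raw_port!r}") from None
    return {"max_workers": workers, "port": port}
```

A message such as "MAX_WORKERS must be an integer" in the revision logs points straight at the new variable, whereas an unhandled exception deep in application code would only surface as a failed health check.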

Full explanation →

68

MCQmedium

Refer to the exhibit. You have the above cloudbuild.yaml file. The build succeeds but the call to the function fails with a permission error. What is the most likely cause?

A.The function is using the wrong trigger type

B.The runtime 'nodejs16' is not supported

C.The '--allow-unauthenticated' flag is not allowed in Cloud Build

D.The function call is occurring before the deployment is fully complete, and the function is not yet ready to serve requests

AnswerD

The function may still be provisioning; add a sleep or check status.

Why this answer

The most likely cause is that the Cloud Build step deploys the function, but the subsequent test call occurs before the function's HTTP endpoint is fully provisioned and serving requests. Cloud Functions deployment is asynchronous; after the `gcloud functions deploy` command returns, the function may still be in a 'DEPLOYING' or 'ACTIVE' state but not yet ready to handle traffic. A permission error in this context typically arises because the function's IAM policy (e.g., `--allow-unauthenticated`) is applied only after the deployment completes, and the function's runtime endpoint may return a 403 until fully ready.

Exam trap

Google Cloud often tests the misconception that a successful `gcloud functions deploy` output means the function is immediately ready to serve requests, when in reality the deployment is asynchronous and the function may not be fully operational for several seconds.

How to eliminate wrong answers

Option A is wrong because the trigger type (HTTP trigger via `--trigger-http`) is correctly specified for a function that is called via HTTP; a permission error is unrelated to trigger type. Option B is wrong because `nodejs16` is a supported runtime in Cloud Functions (deprecated but still functional during the transition period), and a runtime error would manifest as a build failure, not a permission error. Option C is wrong because `--allow-unauthenticated` is a valid flag in `gcloud functions deploy` and is allowed in Cloud Build; it grants allUsers the `roles/cloudfunctions.invoker` role, and its absence would cause a permission error, but the flag itself is not disallowed.
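Instead of a fixed sleep, a test step can poll until the function responds. The following is a generic sketch: `probe` is a caller-supplied function (for example, one that issues the HTTP request and returns True on a 200 response), and the attempt count and delays are arbitrary assumptions.

```python
import time


def wait_until_ready(probe, attempts: int = 10, delay: float = 1.0,
                     backoff: float = 2.0, sleep=time.sleep) -> bool:
    """Call probe() until it returns True, waiting longer after each failure.

    Returns True as soon as the probe succeeds, or False once all attempts
    are used up. The sleep function is injectable so tests can run instantly.
    """
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
            delay *= backoff
    return False
```

Exponential backoff keeps the early retries fast while bounding the total number of calls made against an endpoint that is still provisioning.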

Full explanation →

69

MCQeasy

A team uses Cloud Build to deploy a containerized application to Cloud Run. The build step fails intermittently with the error 'Failed to trigger build: Build timed out'. What is the most likely cause?

A.The build exceeds the default Cloud Build timeout.

B.The build machine has insufficient memory.

C.The Dockerfile contains invalid syntax.

D.The Cloud Build service account lacks permissions to deploy to Cloud Run.

AnswerA

A build that runs longer than the configured timeout (60 minutes by default when none is set) is terminated with a timeout error.

Why this answer

The error 'Failed to trigger build: Build timed out' indicates that the Cloud Build execution exceeded the maximum allowed duration. The limit is set by the build-level `timeout` field, which defaults to 60 minutes when it is not specified; individual steps can also carry their own `timeout`. If the build process (e.g., pulling dependencies, building the container image) sometimes takes longer than the configured limit, the build is automatically terminated, resulting in this intermittent failure.

Increasing the timeout in the build configuration or using a larger machine type can resolve this.

Exam trap

Google Cloud often tests the distinction between timeout errors and resource or permission errors, so candidates mistakenly attribute a timeout to insufficient memory or permissions when the error message explicitly points to duration limits.

How to eliminate wrong answers

Option B is wrong because insufficient memory on the build machine would typically cause an out-of-memory (OOM) error or a build failure with a different message, not a timeout error. Option C is wrong because invalid Dockerfile syntax would cause a build failure during the Docker build step with a syntax error message, not a timeout. Option D is wrong because a lack of permissions for the Cloud Build service account to deploy to Cloud Run would result in a permission denied or authorization error, not a build timeout.
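The two remedies, a longer timeout and a larger machine type, map to the `timeout` and `options.machineType` fields of the build configuration. The sketch below generates a JSON build config (Cloud Build accepts JSON as well as YAML); the image name, the 1800-second value, and the machine type are assumptions for illustration.

```python
import json


def build_config(image: str, timeout_seconds: int = 1800,
                 machine_type: str = "E2_HIGHCPU_8") -> dict:
    """Build a Cloud Build config with an explicit timeout and machine type."""
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return {
        "steps": [{
            "name": "gcr.io/cloud-builders/docker",
            "args": ["build", "-t", image, "."],
        }],
        "images": [image],
        # Durations are written as a number of seconds followed by "s".
        "timeout": f"{timeout_seconds}s",
        "options": {"machineType": machine_type},
    }


if __name__ == "__main__":
    # Hypothetical image path, for illustration only.
    print(json.dumps(build_config("gcr.io/my-project/web:latest"), indent=2))
```

The output could be saved as `cloudbuild.json` and passed to `gcloud builds submit --config=cloudbuild.json`.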

Full explanation →

70

MCQeasy

A developer needs to store configuration parameters for a Cloud Run service, such as database connection strings and API keys. The values must be encrypted at rest and in transit. Which service should be used?

A.Cloud SQL

B.Cloud Storage

C.Firestore

D.Secret Manager

AnswerD

Secret Manager is a secure and convenient storage system for secrets.

Why this answer

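Option D is correct because Secret Manager is designed for storing secrets such as connection strings and API keys, with encryption at rest and in transit. Option A (Cloud SQL) is a relational database, Option B (Cloud Storage) is object storage, and Option C (Firestore) is a document database; none of them is a purpose-built secret store.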
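Reading a secret at runtime might look like the sketch below. It assumes the `google-cloud-secret-manager` client library, whose `SecretManagerServiceClient.access_secret_version` call returns a response carrying the secret bytes in `payload.data`; the project and secret IDs are hypothetical. The client is passed in as a parameter so the function can be exercised without network access.

```python
def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the full resource name of a secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def read_secret(client, project_id: str, secret_id: str, version: str = "latest") -> str:
    """Fetch a secret value and decode it as UTF-8 text.

    `client` is expected to behave like
    google.cloud.secretmanager.SecretManagerServiceClient.
    """
    name = secret_version_name(project_id, secret_id, version)
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")
```

With the real library installed, the client would be created with `secretmanager.SecretManagerServiceClient()` after `from google.cloud import secretmanager`, and the Cloud Run service account would need the Secret Manager Secret Accessor role on the secret.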

Full explanation →

71

MCQeasy

An application running on Cloud Run experiences cold starts causing latency spikes. What is the most cost-effective solution to reduce cold starts?

A.Set a minimum number of instances

B.Increase the container's CPU allocation

C.Enable HTTP keep-alive connections

D.Use a larger container memory size

AnswerA

Minimum instances keep the specified number of instances always warm, eliminating cold starts for those instances.

Why this answer

Setting a minimum number of instances ensures that Cloud Run always keeps at least one instance warm (idle) to serve incoming requests instantly, eliminating cold start latency. This is the most cost-effective solution because you only pay for the minimum instances when they are idle (at a reduced rate), whereas other options increase per-request cost or do not address the root cause of cold starts.

Exam trap

Google Cloud often tests the misconception that scaling resources (CPU or memory) or optimizing network connections can eliminate cold starts, but the only way to prevent cold starts is to keep instances warm, which is achieved by setting a minimum number of instances.

How to eliminate wrong answers

Option B is wrong because increasing CPU allocation does not prevent cold starts; it only speeds up request processing after the instance is already running, and it increases cost per instance without keeping instances warm. Option C is wrong because HTTP keep-alive connections reduce latency for subsequent requests over the same connection but do not eliminate the initial cold start when a new instance is created. Option D is wrong because larger memory size does not prevent cold starts; it may even increase cold start time due to longer container initialization, and it raises the cost per instance without guaranteeing a warm instance.

Full explanation →

72

MCQmedium

A team is deploying a microservices application on Google Kubernetes Engine (GKE). They want to ensure that if a pod fails, Kubernetes automatically replaces it and maintains the desired number of replicas. Which Kubernetes resource should they use?

A.StatefulSet

B.Deployment

C.Job

D.DaemonSet

AnswerB

A Deployment provides declarative updates for pods and ReplicaSets. It ensures that the desired number of pods are running and replaces failed pods automatically.

Why this answer

A Deployment is the correct Kubernetes resource for managing stateless microservices that require automatic pod replacement to maintain a desired replica count. It uses a ReplicaSet to ensure the specified number of pod replicas are running, and if a pod fails, the ReplicaSet controller immediately creates a new pod to restore the desired state.

Exam trap

Google Cloud often tests the distinction between stateless and stateful workloads, where candidates mistakenly choose StatefulSet for any application that needs high availability, overlooking that Deployments are the standard for stateless microservices with automatic replacement.

How to eliminate wrong answers

Option A is wrong because StatefulSet is designed for stateful applications that require stable network identities and persistent storage; it does not automatically replace pods in the same way as a Deployment for stateless workloads, and its pod replacement behavior is ordered and graceful, not immediate. Option C is wrong because a Job is used for batch or one-time tasks that run to completion, not for maintaining a desired number of continuously running replicas. Option D is wrong because a DaemonSet ensures that a copy of a pod runs on every node (or a subset of nodes) in the cluster, which is used for node-level services like logging or monitoring, not for maintaining a specific replica count across the cluster.
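The replacement behavior of a Deployment's ReplicaSet comes down to comparing desired and observed state. The function below is a deliberately simplified model of that reconciliation step, not the controller's actual code; the pod names are hypothetical, and treating only Pending and Running pods as active is an assumption of this sketch.

```python
ACTIVE_PHASES = {"Pending", "Running"}


def reconcile(desired_replicas: int, pod_phases: dict) -> int:
    """Return how many pods to create (positive) or delete (negative).

    pod_phases maps pod name to its phase; pods that have failed or
    finished do not count toward the desired replica total.
    """
    active = sum(1 for phase in pod_phases.values() if phase in ACTIVE_PHASES)
    return desired_replicas - active
```

A controller runs this comparison continuously, so a pod that moves to a failed phase produces a positive difference on the next pass and a replacement is created.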

Full explanation →

73

MCQhard

A company uses Cloud Run for a serverless application that processes user uploads. Users report that sometimes the first request after a period of inactivity takes very long (cold start). The application is stateless. They want to minimize cold start latency while keeping costs low. The application is deployed with default settings: min instances = 0, max instances = 100, CPU always off, and a container image of 1GB. What should they do to reduce cold start latency?

A.Set min instances to 1 to keep a warm instance.

B.Increase container memory from the default to reduce startup time.

C.Use a larger container image to include more dependencies.

D.Enable CPU always on allocation.

AnswerA

Keeping a minimum number of instances eliminates cold starts.

Why this answer

Setting min instances to 1 ensures that at least one instance is always warm and ready to serve requests, eliminating the cold start for the first request after a period of inactivity. Since the application is stateless and the default min instances is 0, Cloud Run scales down to zero, causing a cold start on the next request. By keeping one instance warm, you minimize latency without significantly increasing cost, as you only pay for the single idle instance.

Exam trap

Google Cloud often tests the misconception that increasing resources (memory or CPU) or enabling CPU always on reduces cold start latency, when in fact the root cause is the instance being scaled to zero and the solution is to keep at least one instance warm via min instances.

How to eliminate wrong answers

Option B is wrong because increasing container memory does not reduce startup time; it only affects the CPU and memory resources available during execution, not the time to initialize the container. Option C is wrong because using a larger container image increases the download and extraction time during cold start, worsening the latency problem. Option D is wrong because enabling CPU always on allocation keeps the CPU active even when the instance is idle, which increases cost without addressing the cold start issue—the instance still scales to zero if min instances is 0.
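The effect of min instances on this traffic pattern can be illustrated with a toy model. The sketch assumes a single instance that is reclaimed after a fixed idle period; the `idle_timeout` value and the request timestamps are hypothetical, and concurrency and scale-out are ignored.

```python
def count_cold_starts(request_times, idle_timeout: float, min_instances: int = 0) -> int:
    """Count requests that arrive when no warm instance exists.

    request_times are seconds since an arbitrary start. With at least one
    minimum instance, a warm instance is always present in this model.
    """
    if min_instances > 0:
        return 0
    cold_starts = 0
    last_request = None
    for t in sorted(request_times):
        if last_request is None or t - last_request > idle_timeout:
            cold_starts += 1
        last_request = t
    return cold_starts
```

For bursts of requests separated by long quiet periods, the model reports one cold start per burst with `min_instances=0` and none with `min_instances=1`, matching the reasoning behind option A.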

Full explanation →

74

MCQeasy

A web application hosted on Compute Engine is experiencing slow response times during peak hours. Which Cloud Monitoring metric should be examined first to identify the bottleneck?

A.CPU utilization of backend instances

B.Number of incoming requests per second

C.Memory usage of backend instances

D.95th percentile request latency measured by Cloud Load Balancing

AnswerD

This metric directly measures user-facing response time, and a high latency indicates a performance issue that needs investigation.

Why this answer

The 95th percentile request latency measured by Cloud Load Balancing is the most direct indicator of user-perceived performance degradation. High latency at the load balancer level captures the end-to-end response time, including network, backend processing, and queuing delays, making it the first metric to examine when diagnosing slow response times during peak hours.

Exam trap

Google Cloud often tests the distinction between resource utilization metrics (CPU, memory) and performance metrics (latency), trapping candidates who assume high CPU or memory is always the root cause of slow response times, when in fact latency metrics provide the direct measure of user experience.

How to eliminate wrong answers

Option A is wrong because CPU utilization alone does not capture network latency, queuing delays, or application-level bottlenecks; a backend can have low CPU but still be slow due to I/O waits or database contention. Option B is wrong because the number of incoming requests per second measures throughput, not latency; high request volume can cause slowdowns, but latency is the direct symptom of the bottleneck. Option C is wrong because memory usage is a resource metric that may indicate swapping or OOM risks, but it is not the primary indicator of response time issues; a system can have ample memory yet still experience high latency due to other factors.
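To see why a tail percentile is more revealing than an average or median, consider a nearest-rank percentile over a batch of latency samples. This is a minimal sketch with made-up numbers; Cloud Monitoring computes percentiles from distribution buckets, so its method differs from this one.

```python
import math


def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample covering pct of the data."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: most requests fast, a few very slow.
latencies = [100] * 18 + [2000] * 2
print(percentile(latencies, 50))  # median
print(percentile(latencies, 95))  # tail latency
```

In this sample the median stays at 100 ms while the 95th percentile is 2000 ms, so the slow requests that users complain about are visible only in the tail metric.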

Full explanation →

75

MCQmedium

A company is migrating a monolithic application to a microservices architecture on Google Cloud. They want to decouple services and ensure that a failure in one service does not impact others. Which pattern should they implement?

A.Implement caching with Memorystore

B.Increase the number of instances of each service

C.Use synchronous HTTP calls with retries

D.Implement circuit breaker pattern using a service mesh like Istio

AnswerD

Circuit breaker trips on failures, isolating the fault.

Why this answer

The circuit breaker pattern, implemented via a service mesh like Istio, is the correct approach because it prevents cascading failures by monitoring service health and stopping requests to a failing service until it recovers. Istio's Envoy sidecar proxies enforce circuit breaking at the network layer, allowing the system to degrade gracefully without impacting other services.

Exam trap

Google Cloud often tests the misconception that scaling instances (Option B) or adding caching (Option A) is sufficient for fault isolation, but these patterns address performance and availability, not decoupling or failure containment.

How to eliminate wrong answers

Option A is wrong because caching with Memorystore improves read performance and reduces latency but does not decouple services or prevent failure propagation; a failing service still receives requests. Option B is wrong because increasing instance count improves scalability and fault tolerance through redundancy but does not isolate failures—a failing service can still overwhelm downstream services or cause cascading issues. Option C is wrong because synchronous HTTP calls with retries increase coupling and can exacerbate failures by causing retry storms, overwhelming already failing services and violating the goal of decoupling.
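The mechanics of a circuit breaker can be shown with a small in-process version. This is an illustrative sketch of the pattern only: Istio applies it in the Envoy sidecar through configuration rather than application code, and the thresholds and class names here are assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Reset period elapsed: allow one trial call (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open after a failed trial.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers get an immediate error instead of waiting on a failing dependency, which is what stops one unhealthy service from tying up the resources of the services that call it.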

Full explanation →
