Google Professional Cloud DevOps Engineer (PCDOE) — Questions 826–900

987 questions total · 14 pages · All types, answers revealed

Page 12 of 14

826

MCQ · hard

A company has purchased Compute Engine committed use discounts (CUD) for 1 year for vCPU and memory. After 3 months, they need to upgrade some VMs to a larger machine type. What happens to the CUD coverage?

A. The CUD is automatically adjusted to cover the new machine type

B. The CUD continues to apply to the original resource, and any additional usage is charged at on-demand rates

C. The CUD applies only to the specific machine types in the commitment, so upgrades are not covered

D. The CUD is voided and a refund is issued

Answer: B

CUD covers the committed vCPU and memory; if you upgrade, the CUD still applies to the original amount, and extra usage is on-demand.

Why this answer

Option B is correct because CUDs apply to resource usage (vCPU and memory) in the committed region and machine series, not to specific machine types. The original commitment continues to cover usage up to the committed amount, and any usage beyond it is billed at on-demand rates.

Option A is incorrect because CUDs are not automatically adjusted when machine types change. Option C is incorrect because the commitment is not tied to a specific machine type; an upgrade within the same machine series still draws on the committed vCPU and memory. Option D is incorrect because CUDs are not voided or refunded when you change machine types.
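
A simplified model of the billing split after the upgrade, assuming a resource-based commitment for a single machine series in a single region and ignoring other discounts; all quantities are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    vcpus: int
    memory_gb: int


def split_usage(committed: Resources, used: Resources) -> tuple[Resources, Resources]:
    """Return (covered_by_cud, billed_on_demand) for one machine series in one region."""
    covered = Resources(min(used.vcpus, committed.vcpus),
                        min(used.memory_gb, committed.memory_gb))
    on_demand = Resources(used.vcpus - covered.vcpus,
                          used.memory_gb - covered.memory_gb)
    return covered, on_demand


# Hypothetical: 32 vCPU / 128 GB committed; after the upgrade the VMs use 48 vCPU / 192 GB.
covered, on_demand = split_usage(Resources(32, 128), Resources(48, 192))
print(covered)    # Resources(vcpus=32, memory_gb=128)
print(on_demand)  # Resources(vcpus=16, memory_gb=64)
```

The commitment keeps covering exactly what was committed; only the increment is billed on demand.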

827

MCQ · hard

Based on the log entry, what is the most likely cause of the 404 error?

A. The user does not have permission to invoke the service.

B. The revision is not configured with the correct container port.

C. The Cloud Run service is not autoscaling properly, causing requests to be dropped.

D. The service has run out of memory.

Answer: B

A 404 often means the container is listening on a different port than what Cloud Run expects.

Why this answer

A 404 error on Cloud Run here indicates that the request reached the service but could not be routed to a listening container. Cloud Run forwards requests to the container port configured on the revision (8080 by default) and passes that port to the container in the `PORT` environment variable. If the application actually serves on a different port (e.g., the app listens on 8080 but the revision is configured for 3000), Cloud Run's HTTP ingress fails to route traffic, resulting in a 404. This is the most likely cause because the error is not a permission or resource issue, but a routing mismatch at the container level.
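
On the application side, the usual safeguard is to listen on whatever port Cloud Run passes in `PORT` instead of hard-coding one. A minimal sketch using Python's standard-library HTTP server; the handler is a placeholder.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

DEFAULT_PORT = 8080  # Cloud Run's default container port


def resolve_port(env=None) -> int:
    """Return the port Cloud Run passes in PORT, falling back to 8080."""
    env = os.environ if env is None else env
    return int(env.get("PORT", DEFAULT_PORT))


class OkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve() -> None:
    # Bind on all interfaces (0.0.0.0) so traffic from outside the container can reach it.
    HTTPServer(("0.0.0.0", resolve_port()), OkHandler).serve_forever()
```

Calling `serve()` as the container's entry point starts the server on the injected port, so the application and the revision configuration cannot drift apart.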

Exam trap

Google Cloud often tests the distinction between HTTP status codes (404 vs 403 vs 503 vs 500) and their root causes in serverless environments, trapping candidates who confuse permission errors with routing misconfigurations.

How to eliminate wrong answers

Option A is wrong because a 403 Forbidden error, not a 404, would occur if the user lacks permission to invoke the service (IAM permissions control invocation, not routing). Option C is wrong because autoscaling issues typically cause 503 Service Unavailable or 429 Too Many Requests errors, not 404s; a 404 indicates the service exists but the endpoint is not reachable. Option D is wrong because running out of memory would cause the container to crash or return a 500 Internal Server Error, not a 404; memory limits affect container health, not HTTP routing.

828

Multi-Select · hard

A company is migrating on-premises PostgreSQL databases to Cloud SQL. They need to minimize downtime and ensure data consistency. Which THREE steps should they follow? (Choose 3)

Select 3 answers

A. Use Database Migration Service to set up continuous replication from the on-premises database.

B. Export the on-premises database to a dump file and import it into Cloud SQL.

C. Manually sync data using a custom script with pg_dump and pg_restore.

D. After replication is caught up, promote the Cloud SQL instance to make it the primary.

E. Create a Cloud SQL for PostgreSQL instance with the same version as the source.

Answers: A, D, E

Correct. DMS supports homogeneous PostgreSQL migrations with minimal downtime.

Why this answer

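A typical minimal-downtime migration creates a Cloud SQL for PostgreSQL destination on a matching version (E), uses Database Migration Service (DMS) to set up continuous replication from the source (A), and promotes the Cloud SQL instance once replication has caught up (D). Options B and C rely on dump-and-restore, which requires stopping writes for the duration of the export and import.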
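
Promotion (D) should happen only after replication has caught up and writes to the source have stopped. A minimal sketch of that readiness gate, assuming you already collect replication-lag samples in seconds; the 1-second threshold and 5-sample window are hypothetical, and the promotion itself is done through Database Migration Service, not this code.

```python
def ready_to_promote(lag_seconds: list[float], max_lag_s: float = 1.0, window: int = 5) -> bool:
    """True when each of the last `window` replication-lag samples is at or below max_lag_s."""
    recent = lag_seconds[-window:]
    return len(recent) == window and all(lag <= max_lag_s for lag in recent)


print(ready_to_promote([30.0, 12.0, 0.5, 0.4, 0.2, 0.1, 0.0]))  # True
print(ready_to_promote([0.2, 0.1, 5.0, 0.3, 0.1]))              # False: lag spiked inside the window
```

Requiring several consecutive low samples, rather than a single one, avoids promoting during a brief dip in lag.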

829

MCQ · easy

Refer to the exhibit. What does the alert condition indicate?

A. It alerts when the request count drops below 1000 for 1 minute.

B. It alerts for any Cloud Run revision that has more than 1000 requests in a 1-minute window.

C. It alerts when the average request count across all revisions exceeds 1000 over 1 minute.

D. It alerts when the total request count across all revisions exceeds 1000 per minute.

Answer: B

For each revision, if its request count exceeds 1000 for at least 1 minute, the alert fires.

Why this answer

The alert condition in the exhibit uses a per-revision metric (e.g., `run.googleapis.com/request_count`) with a threshold of 1000 and a 1-minute window. This means the alert fires for any individual Cloud Run revision that exceeds 1000 requests within that window, not for the aggregate across all revisions. Option B correctly identifies this per-revision behavior.
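
The difference between per-revision evaluation and an aggregated condition is easy to see with a few hypothetical numbers. This sketch models a single 1-minute window; revision names and counts are made up.

```python
THRESHOLD = 1000  # requests per 1-minute window, from the alert condition


def firing_per_revision(counts: dict[str, int], threshold: int = THRESHOLD) -> list[str]:
    """Per-time-series evaluation: each revision is compared to the threshold on its own."""
    return sorted(rev for rev, n in counts.items() if n > threshold)


def firing_on_sum(counts: dict[str, int], threshold: int = THRESHOLD) -> bool:
    """What option D describes: one summed series compared to the threshold."""
    return sum(counts.values()) > threshold


window = {"rev-a": 1200, "rev-b": 300, "rev-c": 250}
print(firing_per_revision(window))                     # ['rev-a']
print(firing_per_revision({"rev-b": 600, "rev-c": 600}))  # []: no single revision is over 1000
print(firing_on_sum({"rev-b": 600, "rev-c": 600}))       # True: only the sum is over 1000
```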

Exam trap

Google Cloud often tests the distinction between per-resource and aggregate metrics, so the trap here is assuming that a threshold on a metric like 'request_count' automatically implies a sum across all revisions, when in fact it applies to each individual revision's time series.

How to eliminate wrong answers

Option A is wrong because the alert condition is set to fire when the request count exceeds 1000, not when it drops below 1000; a 'less than' threshold would require a different condition. Option C is wrong because the alert evaluates each revision independently, not the average across all revisions; averaging would require a cross-series reducer such as `REDUCE_MEAN`. Option D is wrong because the alert does not sum request counts across all revisions (that would require a cross-series reducer such as `REDUCE_SUM`); it triggers per revision, so a single revision exceeding 1000 requests in a minute fires the alert regardless of other revisions' counts.

830

MCQ · easy

You are monitoring Cloud Bigtable replication lag. Which metric should you use to determine if replicas are up to date, and what consistency level is typical for Bigtable replication?

A. Use 'replication_lag' metric; consistency is eventually consistent.

B. Use 'cluster_lag' metric; consistency is read-your-writes consistent.

C. Use 'replica_lag' metric; consistency is strongly consistent.

D. Use 'replication_lag' metric; consistency is strongly consistent.

Answer: A

Bigtable replication is asynchronous and eventually consistent; the 'replication_lag' metric measures the delay.
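
A toy model of asynchronous replication, not Bigtable's actual mechanism: a write is visible on its own cluster immediately but reaches the other cluster only after a delay, so reads served by the replica can return older data until it catches up. That delay is what a replication-lag metric tracks; the 2-second lag and cluster labels are hypothetical.

```python
from typing import List, Optional, Tuple


class ReplicatedCell:
    """Toy model of one cell written on cluster A and copied asynchronously to cluster B."""

    def __init__(self, lag_s: float):
        self.lag_s = lag_s  # hypothetical, fixed replication delay in seconds
        self.writes: List[Tuple[float, str]] = []  # (write_time, value), in time order

    def write(self, t: float, value: str) -> None:
        self.writes.append((t, value))

    def read_a(self, t: float) -> Optional[str]:
        """Cluster A serves its own writes immediately."""
        visible = [v for wt, v in self.writes if wt <= t]
        return visible[-1] if visible else None

    def read_b(self, t: float) -> Optional[str]:
        """Cluster B serves a write only after the replication delay has elapsed."""
        visible = [v for wt, v in self.writes if wt + self.lag_s <= t]
        return visible[-1] if visible else None


cell = ReplicatedCell(lag_s=2.0)
cell.write(0.0, "v1")
cell.write(10.0, "v2")
print(cell.read_a(10.5), cell.read_b(10.5))  # v2 v1: replica B is stale
print(cell.read_b(12.0))                     # v2: replica B has caught up
```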

831

MCQ · hard

You are planning a Cloud Bigtable cluster for a workload requiring 100,000 reads per second and 50,000 writes per second. The data will be stored on HDD. How many nodes are needed for the projected throughput? (Assume each node provides 10,000 QPS for reads or writes.)

A. 20 nodes

B. 5 nodes

C. 15 nodes

D. 10 nodes

Answer: D

10 nodes provide 100,000 reads/s and 100,000 writes/s, covering both.

Why this answer

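Under the question's assumption, each Bigtable node provides a separate 10,000 QPS budget for reads and for writes. For 100,000 reads/s, you need 10 nodes; for 50,000 writes/s, you need 5 nodes.

The node count must satisfy both: max(10, 5) = 10 nodes. If the 10,000 QPS figure were instead a shared budget for reads and writes combined, the 150,000 operations/s would require 15 nodes. Storage capacity can also be a factor, but the question focuses on throughput.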
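
Both readings of the per-node figure can be checked with a few lines of arithmetic. The 10,000 QPS figure comes from the question, not from Bigtable's published performance numbers.

```python
import math


def nodes_independent(reads: int, writes: int, per_node_qps: int) -> int:
    """Reads and writes each get their own per-node budget: size for the larger need."""
    return max(math.ceil(reads / per_node_qps), math.ceil(writes / per_node_qps))


def nodes_shared(reads: int, writes: int, per_node_qps: int) -> int:
    """Reads and writes draw on one combined per-node budget."""
    return math.ceil((reads + writes) / per_node_qps)


print(nodes_independent(100_000, 50_000, 10_000))  # 10 (option D)
print(nodes_shared(100_000, 50_000, 10_000))       # 15 (option C)
```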

832

MCQ · medium

A company needs to store petabytes of time-series IoT sensor data and query it with single-digit millisecond latency at millions of reads per second. The data has a simple key-value structure with timestamps. Which Google Cloud database is MOST appropriate?

A. Firestore

B. Cloud Bigtable

C. Cloud Spanner

D. BigQuery

Answer: B

Bigtable is the correct choice: wide-column NoSQL, designed for time-series and IoT workloads, single-digit ms latency, and scales to millions of QPS with additional nodes.

Why this answer

Cloud Bigtable is designed for exactly this use case: petabyte-scale, low-latency (single-digit ms), high-throughput NoSQL storage for time-series, IoT, and financial data. It scales horizontally by adding nodes. BigQuery is optimised for analytics (seconds-to-minutes latency), Cloud Spanner is a relational, strongly consistent database whose transactional features add cost and complexity that a simple key-value workload does not need, and Firestore is for document data with hierarchical structure.

833

MCQ · medium

A DevOps engineer needs to set up a centralized logging solution for multiple projects. They want to store logs in a BigQuery dataset for analysis. What is the best approach?

A. Use Cloud Logging's export feature to Pub/Sub and then to BigQuery.

B. Use the BigQuery Data Transfer Service for logs.

C. Create a sink in each project to export logs to the BigQuery dataset.

D. Create an aggregated sink at the organization or folder level to export logs to BigQuery.

Answer: D

An aggregated sink centralizes log routing for every project under the organization or folder.

Why this answer

Option D is correct because an aggregated sink at the organization or folder level allows you to collect logs from all projects within that hierarchy into a single BigQuery dataset in a centralized project. This approach eliminates the need to configure individual sinks per project, reduces administrative overhead, and ensures consistent log routing across the entire organization.
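
A minimal sketch of the Cloud Logging API request that creates such a sink, assuming the v2 REST `sinks.create` method; it only builds the request (no authentication or HTTP call), and the organization ID, project, dataset, and sink name are placeholders. After the sink exists, its writer identity still needs permission to write to the dataset (for example, the BigQuery Data Editor role).

```python
import json


def aggregated_sink_request(parent: str, sink_name: str, project_id: str,
                            dataset: str, log_filter: str = "") -> tuple[str, dict]:
    """Build (URL, JSON body) for a sink at an org or folder, e.g. parent='organizations/123'."""
    url = f"https://logging.googleapis.com/v2/{parent}/sinks"
    body = {
        "name": sink_name,
        # BigQuery destinations use this resource-name format.
        "destination": f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset}",
        "filter": log_filter,     # an empty filter routes all logs
        "includeChildren": True,  # route logs from every project under `parent`
    }
    return url, body


url, body = aggregated_sink_request(
    "organizations/123456789", "org-to-bigquery", "central-logging", "all_logs",
    log_filter="severity>=WARNING",
)
print(url)
print(json.dumps(body, indent=2))
```

`includeChildren` is what turns an ordinary organization- or folder-level sink into an aggregated one; without it, only that resource's own logs are routed.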

Exam trap

The trap here is that candidates often choose Option C (per-project sinks) because they think each project must independently export its logs, failing to recognize that aggregated sinks at the organization or folder level provide a centralized, scalable solution that reduces management overhead.

How to eliminate wrong answers

Option A is wrong because Cloud Logging's export to Pub/Sub then to BigQuery introduces unnecessary complexity and latency; Pub/Sub is typically used for real-time streaming or fan-out to multiple subscribers, not as a direct path to BigQuery when a sink can write directly. Option B is wrong because the BigQuery Data Transfer Service is designed for scheduled data imports from external sources (e.g., Google Ads, Amazon S3), not for ingesting Cloud Logging logs. Option C is wrong because creating a sink in each project is inefficient and error-prone for a multi-project setup; it requires manual configuration per project and does not scale, whereas an aggregated sink centralizes management.

834

Multi-Select · medium

A team is setting up CI/CD for a microservices architecture. They want to ensure each service is independently buildable and deployable. What practices should they adopt? (Select THREE)

Select 3 answers

A. Use Artifact Registry with separate repositories per service

B. Use Cloud Deploy's multi-target pipeline

C. Use a single repository with separate Cloud Build triggers per service

D. Use separate repositories per service

E. Use Cloud Build's build config with substitutions to build multiple services

Answers: A, C, D

Separate repositories provide isolation and access control per service.

Why this answer

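Options A, C, and D are correct. Separate source repositories per service (D), or a single repository with a separate Cloud Build trigger per service using includedFiles filters (C), let each service build independently. Separate Artifact Registry repositories per service (A) isolate artifacts and allow per-service access control.

Option E builds multiple services from one config, which couples their builds. Option B concerns deployment targets in Cloud Deploy, not independent builds.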

835

Multi-Select · easy

A company is bootstrapping a Google Cloud organization for DevOps. Which TWO practices should be implemented to ensure secure and efficient management of infrastructure as code (IaC) pipelines?

Select 2 answers

A. Store infrastructure secrets (e.g., API keys) directly in Terraform configuration files for simplicity.

B. Use a dedicated project for CI/CD pipelines that houses Cloud Build triggers and Cloud Source Repositories.

C. Use a single project to host all development, staging, and production environments to reduce complexity.

D. Implement separation of duties by using least-privilege service accounts for Terraform and restricting direct human access to production projects.

E. Require manual approval from a security team for every infrastructure change.

Answers: B, D

A separate project isolates CI/CD resources and simplifies IAM management for pipeline service accounts.

Why this answer

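Option B is correct because using a dedicated project for CI/CD pipelines isolates Cloud Build triggers and Cloud Source Repositories from other workloads, preventing accidental interference and simplifying access control. This aligns with Google Cloud's recommended landing zone pattern where pipeline infrastructure is managed separately from application environments.

Option D is correct because running Terraform under least-privilege service accounts, while restricting direct human access to production projects, enforces separation of duties: production changes flow through the reviewed pipeline rather than through individual accounts.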

Exam trap

The trap here is that candidates often confuse 'simplicity' with 'security' and choose a single project for all environments (Option C) or manual approval for every change (Option E), failing to recognize that Google Cloud's recommended architecture emphasizes isolation and automated guardrails over manual processes.

836

MCQ · hard

A financial services company uses Spanner for their core database. They notice that some transactions are taking longer than expected, especially during cross-region writes. They have set up Spanner with regional configuration. What is the most likely cause?

A. The transaction is experiencing contention due to a hot spot

B. The transaction is using stale reads

C. The transaction is not using a read-write transaction

D. The transaction is too large

Answer: A

Contention on popular keys causes retries and delays.

Why this answer

A is correct because cross-region writes in a regional Spanner configuration can lead to increased latency due to hot spotting. A hot spot occurs when many writes are concentrated on a single split (e.g., a monotonically increasing key), causing contention and serialization delays. This is especially pronounced in cross-region scenarios because Spanner's TrueTime and Paxos-based replication require consensus across zones, amplifying the impact of contention.
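
A toy illustration of why monotonically increasing keys create a hot spot: if the key space were cut into equal ranges, sequential keys would all land in the same range, while bit-reversing the same numbers spreads them out. Spanner's real split boundaries are chosen dynamically based on load and size, so the 16 fixed ranges here are an assumption; Spanner's documented mitigations, such as UUIDv4 keys or bit-reversed sequences, follow the same principle of spreading writes across the key space.

```python
from collections import Counter

KEY_BITS = 64  # treat keys as unsigned 64-bit integers
SPLITS = 16    # hypothetical: key space cut into 16 equal-width ranges


def bit_reverse(n: int, bits: int = KEY_BITS) -> int:
    """Reverse the order of the low `bits` bits of n."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (n & 1)
        n >>= 1
    return out


def split_of(key: int, bits: int = KEY_BITS, splits: int = SPLITS) -> int:
    """Index of the equal-width key range that contains `key`."""
    return key * splits >> bits


sequential = Counter(split_of(n) for n in range(1, 1001))
reversed_keys = Counter(split_of(bit_reverse(n)) for n in range(1, 1001))
print(sequential)                                       # Counter({0: 1000}): one range takes every write
print(len(reversed_keys), max(reversed_keys.values()))  # 16 63: writes spread across all ranges
```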

Exam trap

The trap here is the assumption that cross-region latency is due solely to network distance. Even in a regional configuration, each split has a single leader replica that handles its writes, so contention on one split can cause disproportionate delays.

How to eliminate wrong answers

Option B is wrong because stale reads are used to reduce latency, not increase it; they read from a replica without waiting for the latest timestamp, which would not cause longer transaction times. Option C is wrong because not using a read-write transaction would mean using a read-only transaction, which does not involve writes and thus cannot explain slow cross-region writes. Option D is wrong because while large transactions can be slow, the question specifically highlights cross-region writes, and Spanner enforces per-commit limits on mutation count and commit size; the issue is more likely contention from a hot spot than sheer size.

837

MCQ · easy

A web application serves static assets (images, CSS, JavaScript) from Compute Engine instances. Users in different geographic regions report slow page loads. Which Google Cloud service can be used to improve performance for these users?

A. VPC Network Peering

B. Cloud Load Balancing

C. Cloud CDN

D. Cloud NAT

Answer: C

Cloud CDN uses Google's global edge network to cache static content closer to users.

Why this answer

Cloud CDN (Content Delivery Network) caches static assets at Google's globally distributed edge points of presence (PoPs). When users request images, CSS, or JavaScript, the content is served from the nearest edge cache rather than the origin Compute Engine instances, reducing latency and improving page load times for geographically distributed users.
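
Cloud CDN can only serve assets from the edge if responses are cacheable. A minimal sketch of an origin-side policy, assuming the backend sets `Cache-Control` itself and Cloud CDN is configured to honor origin headers; the extension list and one-day `max-age` are hypothetical choices.

```python
from pathlib import PurePosixPath

# Hypothetical set of static asset types worth caching at the edge.
STATIC_SUFFIXES = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".woff2"}


def cache_control_for(path: str, max_age_s: int = 86_400) -> str:
    """Return a Cache-Control value: edge-cacheable for static assets, private otherwise."""
    if PurePosixPath(path).suffix.lower() in STATIC_SUFFIXES:
        return f"public, max-age={max_age_s}"
    return "private, no-store"


print(cache_control_for("/static/logo.png"))  # public, max-age=86400
print(cache_control_for("/api/cart"))         # private, no-store
```

Keeping dynamic, per-user responses `private` prevents them from being cached and served to other users at the edge.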

Exam trap

The trap here is that candidates confuse Cloud Load Balancing (which only distributes traffic) with Cloud CDN (which caches at edge locations), assuming load balancing alone solves geographic latency issues.

How to eliminate wrong answers

Option A is wrong because VPC Network Peering connects two VPC networks for private IP communication; it does not cache content or accelerate delivery to end users. Option B is wrong because Cloud Load Balancing distributes traffic across backend instances but does not cache responses at edge locations; it can be used with Cloud CDN but alone does not reduce latency for static assets. Option D is wrong because Cloud NAT provides outbound internet connectivity for instances without external IPs; it does not cache or accelerate content delivery.

838

MCQ · easy

Which tool is recommended for managing the initial setup of a Google Cloud organization, including creating folders, projects, and IAM policies in an automated and repeatable manner?

A. Terraform

B. Deployment Manager

C. Cloud Console

D. gcloud command line

Answer: A

Terraform is widely adopted and Google recommends it for infrastructure automation.

Why this answer

Terraform is the recommended tool for bootstrapping a Google Cloud organization because it is declarative, idempotent, and supports infrastructure-as-code (IaC) for creating folders, projects, and IAM policies in an automated and repeatable manner. Unlike Google Cloud's Deployment Manager, Terraform is cloud-agnostic and has a mature provider (hashicorp/google) that directly manages organization-level resources such as google_folder, google_project, and google_organization_iam_member. This aligns with DevOps best practices for version-controlled, reproducible infrastructure provisioning.
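
The declarative, idempotent behavior described above can be illustrated with a toy model: compute a plan as the difference between desired and actual state, apply it, and observe that a second plan is empty. This is a simplified sketch, not Terraform's implementation; real Terraform compares configuration against its state file and each provider's view of the resource, and the resource names below are hypothetical.

```python
def plan(desired: set[str], actual: set[str]) -> dict[str, set[str]]:
    """Diff desired against actual state, the way a declarative tool computes a plan."""
    return {"create": desired - actual, "destroy": actual - desired}


def apply(desired: set[str], actual: set[str]) -> set[str]:
    """Execute the plan and return the resulting state."""
    changes = plan(desired, actual)
    return (actual | changes["create"]) - changes["destroy"]


# Hypothetical resource addresses standing in for folders and projects.
desired = {"folder/prod", "folder/dev", "project/prod-app"}
actual = {"folder/dev", "project/legacy-app"}

state = apply(desired, actual)
print(plan(desired, state))  # {'create': set(), 'destroy': set()}: re-running changes nothing
```

An imperative script, by contrast, would need explicit existence checks to avoid failing or duplicating resources on a second run.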

Exam trap

Google Cloud often tests the misconception that Deployment Manager is the best choice because it is Google-native, but the question specifically asks for a tool that is 'recommended' for automated and repeatable bootstrapping, which Terraform achieves through its declarative, stateful, and multi-cloud design.

How to eliminate wrong answers

Option B is wrong because Deployment Manager is a Google Cloud-native IaC tool that uses YAML or Python templates, but it is less portable and lacks the broad community support and multi-cloud capabilities of Terraform; it also does not natively support the same level of modularity and state management for bootstrapping an organization. Option C is wrong because Cloud Console is a manual, click-based web interface that cannot be automated or repeated programmatically, making it unsuitable for initial setup in a DevOps pipeline. Option D is wrong because the gcloud command line is imperative and requires sequential commands, which is error-prone and not designed for idempotent, stateful infrastructure management across multiple environments.

839

MCQ · medium

A team uses a monorepo with multiple microservices in separate directories. They want to build only the changed service(s) when a push occurs to the repo. How can they achieve this efficiently?

A. Use a single Cloud Build trigger with a Dockerfile build step that builds all services.

B. Use Cloud Functions to invoke Cloud Build per changed directory.

C. Create multiple Cloud Build triggers, each with a different includedFiles filter matching the service directory.

D. Use a Cloud Build trigger with a build config that dynamically detects changes using git diff.

Answer: C

The includedFiles and ignoredFiles trigger filters start a build only when files in specific paths change.

Why this answer

Option C is correct because Cloud Build triggers can use includedFiles filters so that each trigger fires only when files in its service directory change. Option A builds all services on every push, which is inefficient. Option B can work but adds a custom Cloud Functions layer instead of using a native trigger feature.

Option D adds complexity, because the build config must compute the set of changed services itself with git diff.
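
A rough simulation of how per-service triggers decide whether to run, assuming one trigger per service directory with a `**` glob in its includedFiles. Python's `fnmatch` lets `*` match `/`, so it only approximates Cloud Build's glob rules; the trigger names and directory layout are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical monorepo layout: one trigger per service, each with an includedFiles glob.
TRIGGERS = {
    "build-api": ["services/api/**"],
    "build-web": ["services/web/**"],
    "build-worker": ["services/worker/**"],
}


def triggers_to_run(changed_files: list[str]) -> list[str]:
    """Names of triggers whose includedFiles patterns match at least one changed file."""
    return sorted(
        name for name, patterns in TRIGGERS.items()
        if any(fnmatch(path, pat) for path in changed_files for pat in patterns)
    )


print(triggers_to_run(["services/api/main.go", "README.md"]))  # ['build-api']
```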

840

Multi-Select · easy

A company is bootstrapping a Google Cloud organization with multiple projects. They want to enable consistent security and compliance across all projects. Which two organization policies should they consider? (Choose TWO.)

Select 2 answers

A. Require all service accounts to have a unique naming convention.

B. Restrict domain of users to the company domain.

C. Enforce that all projects have a Cloud Storage bucket.

D. Allow all projects to use any external IPs.

E. Prevent users from disabling audit logging.

Answers: B, E

Use the constraints/iam.allowedPolicyMemberDomains constraint.

Why this answer

Option B is correct because the 'Restrict domain of users to the company domain' organization policy (constraints/iam.allowedPolicyMemberDomains) ensures that only identities from the specified Google Workspace or Cloud Identity domain can be added as members in IAM policies across all projects. This prevents external users from gaining access, enforcing a consistent security boundary from the outset of bootstrapping.
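
For reference, a sketch of the option B policy expressed as an Organization Policy API (v2) resource, assuming that API's `name` / `spec.rules` format; the organization and customer IDs are placeholders. Note that the constraint's allowed values are Cloud Identity or Google Workspace customer IDs rather than domain names.

```python
import json


def allowed_domains_policy(org_id: str, customer_ids: list[str]) -> dict:
    """Org Policy v2 body that limits IAM members to identities from the given customers."""
    return {
        "name": f"organizations/{org_id}/policies/iam.allowedPolicyMemberDomains",
        "spec": {
            # Values are Cloud Identity / Google Workspace customer IDs, not domain names.
            "rules": [{"values": {"allowedValues": list(customer_ids)}}],
        },
    }


policy = allowed_domains_policy("123456789", ["C0abc1234"])  # placeholder IDs
print(json.dumps(policy, indent=2))
```

Setting the policy at the organization node means every existing and future project inherits it unless a lower-level policy overrides it.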

Exam trap

The trap here is that candidates often confuse organization policies with project-level configurations or best practices, mistakenly thinking that naming conventions or resource creation requirements can be enforced as organization policies, when in reality only specific predefined constraints are available.

841

MCQ · easy

An organization needs to store backup copies of a Cloud Spanner database for a minimum of 365 days to comply with regulatory requirements. Which backup option should they use?

A. Create a Spanner backup and set the expiration to 365 days

B. Enable point-in-time recovery (PITR) with a 365-day retention period

C. Export the database to Cloud Storage using the gcloud command and set a lifecycle policy

D. Use Cloud SQL for PostgreSQL with automated backups set to 365 days

Answer: A

Spanner backups support expiration up to 365 days, meeting the regulatory requirement.

Why this answer

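Cloud Spanner backups can be configured with an expiration of up to 365 days, which meets the requirement with a managed backup. Point-in-time recovery (B) cannot meet it because Spanner's version retention period is limited to at most 7 days. Exporting to Cloud Storage (C) produces an export rather than a managed backup and leaves retention and restore work to you.

Cloud SQL backups (D) apply only to Cloud SQL, not Spanner.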

842

MCQ · medium

You are designing a Cloud SQL for PostgreSQL database. The application has a table with 1 million rows that is frequently queried using equality on the 'email' column and range queries on the 'created_at' column. Which index strategy minimizes query latency?

A. Create a full-text index on email.

B. Create a composite B-tree index on (email, created_at).

C. Create a B-tree index on email only.

D. Create separate B-tree indexes on email and created_at.

Answer: B

This index supports the exact query pattern.

Why this answer

Option B is correct because a composite B-tree index on (email, created_at) allows the database to satisfy both the equality condition on 'email' and the range condition on 'created_at' in a single index scan. PostgreSQL can use the leftmost column for equality filtering and then traverse the index tree to retrieve the range portion efficiently, minimizing random I/O and query latency.
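
A quick way to confirm the planner picks the composite index is to inspect the query plan. This sketch uses SQLite from Python's standard library as a stand-in for PostgreSQL (the planners differ, but the `CREATE INDEX` statement is the same); on Cloud SQL you would run `EXPLAIN` on the real table instead. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT)")
# Equality column first, range column second, matching the query's predicates.
conn.execute("CREATE INDEX idx_users_email_created ON users (email, created_at)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM users WHERE email = ? AND created_at >= ?",
    ("user@example.com", "2024-01-01"),
).fetchall()
for row in plan:
    print(row[-1])  # the plan detail; it should name idx_users_email_created
```

Column order matters: an index on (created_at, email) could not use the equality on email to narrow the scan before applying the range.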

Exam trap

The trap here is the misconception that separate single-column indexes are equivalent to a composite index; in PostgreSQL, separate indexes require bitmap scans or residual filtering, which are slower than a single composite index that matches the query's equality and range predicates.

How to eliminate wrong answers

Option A is wrong because a full-text index is designed for text search (e.g., tsvector/tsquery) and does not support equality or range comparisons on a plain 'email' column; it would be ignored by the query planner for these operations. Option C is wrong because a B-tree index on email only can filter by email efficiently, but then PostgreSQL must perform a separate filter on created_at for each matching row, which can be expensive for large result sets. Option D is wrong because separate B-tree indexes on email and created_at would force the planner to choose one index (likely email) and then apply a residual filter on created_at, or attempt a bitmap scan combining both indexes, which is less efficient than a single composite index that directly supports the query pattern.

843

MCQ · hard

A company runs a microservices architecture on GKE with Istio service mesh. They observe that service-to-service latency has increased after enabling mTLS. What is the most likely cause?

A. mTLS encryption overhead

B. Incorrect load balancer configuration

C. Network policy restriction

D. Sidecar proxy resource limits

Answer: A

Encrypting and decrypting each request adds CPU overhead and latency.

Why this answer

Enabling mTLS in Istio encrypts all service-to-service traffic using mutual TLS. The sidecar proxies must perform TLS handshakes when connections are established and encrypt and decrypt every request and response, which adds CPU overhead. This encryption overhead directly increases latency, especially for high-throughput or small-payload services.

Exam trap

Google Cloud often tests the misconception that mTLS only adds security without performance impact, but candidates must recognize that encryption/decryption at the sidecar proxy level introduces measurable CPU-bound latency.

How to eliminate wrong answers

Option B is wrong because incorrect load balancer configuration would cause traffic routing issues or dropped connections, not a general increase in latency after enabling mTLS. Option C is wrong because network policy restrictions would block or drop traffic, not simply increase latency across all service-to-service calls. Option D is wrong because sidecar proxy resource limits would cause throttling, timeouts, or OOM kills, but the question states latency increased after enabling mTLS, not after changing resource limits.

844

Multi-Select · medium

A service experiences increased latency and HTTP 503 errors. The engineer finds that the backend managed instance group (MIG) is at max instances and CPU utilization is 90%. Which TWO actions should the engineer take to restore the service quickly?

Select 2 answers

A. Enable autoscaling based on HTTP load balancing utilization

B. Increase the autoscaling target CPU utilization to 95%

C. Increase the maximum number of instances in the MIG

D. Reduce the autoscaling target CPU utilization to 50%

E. Reduce the number of instances to avoid resource contention

Answers: A, C

Scales based on request rate, which is more responsive than CPU.

Why this answer

Option A is correct because enabling autoscaling based on HTTP load balancing utilization allows the MIG to scale out based on the actual request load, which directly addresses the 503 errors caused by the backend being at max capacity. This metric is more responsive to traffic spikes than CPU utilization alone, as it reflects the frontend load balancer's view of backend capacity. Option C is also required: the MIG is already at its maximum size, so no scaling signal can add instances until the maximum number of instances is raised.
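
The arithmetic behind option C: a simplified model of utilization-based autoscaling, in which the group is sized so average utilization would drop to the target and the result is clamped to the configured limits. The real Compute Engine autoscaler also applies stabilization and other rules; the instance counts and percentages here are hypothetical.

```python
def recommended_size(current: int, utilization_pct: int, target_pct: int,
                     min_size: int, max_size: int) -> int:
    """Simplified autoscaling: size so average load meets the target, then clamp to limits."""
    needed = -(-current * utilization_pct // target_pct)  # integer ceiling division
    return max(min_size, min(max_size, needed))


# Hypothetical group: 10 instances at 90% CPU with a 60% target.
print(recommended_size(10, 90, 60, min_size=2, max_size=10))  # 10: capped at the maximum
print(recommended_size(10, 90, 60, min_size=2, max_size=20))  # 15: room to scale out
```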

Exam trap

Google Cloud often tests the misconception that adjusting CPU utilization thresholds (either up or down) is a quick fix for capacity issues, when in fact the immediate solution is to increase the maximum instance count or enable a more responsive scaling metric.

845

MCQ · easy

A company uses Cloud Storage to store archival data. They want to minimize storage costs while maintaining availability. Which storage class should they use?

A. Nearline storage class.

B. Standard storage class.

C. Archive storage class.

D. Coldline storage class.

Answer: C

Archive is the lowest-cost storage class for long-term archival data.

Why this answer

The Archive storage class is the correct choice because it offers the lowest storage cost for archival data that is accessed less than once per year, in exchange for a 365-day minimum storage duration and higher retrieval costs than the other classes. It still meets the availability requirement: Archive data is accessible with the same millisecond latency as the other storage classes and is stored redundantly for durability.
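
The access-frequency guidance for the four classes (Nearline for roughly monthly access, Coldline for quarterly, Archive for less than yearly) can be written as a simple decision rule. This sketch uses those thresholds only and ignores retrieval fees, early-deletion charges, and pricing differences between locations.

```python
def pick_storage_class(days_between_accesses: int) -> str:
    """Cheapest class whose minimum storage duration fits the expected access interval."""
    if days_between_accesses >= 365:
        return "ARCHIVE"
    if days_between_accesses >= 90:
        return "COLDLINE"
    if days_between_accesses >= 30:
        return "NEARLINE"
    return "STANDARD"


print(pick_storage_class(400))  # ARCHIVE
```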

Exam trap

The trap here is that candidates often assume Coldline is the cheapest option because of its name, but Archive is the lowest-cost class for truly archival data; the question tests whether you know the access-frequency and minimum-storage-duration differences between Coldline and Archive.

How to eliminate wrong answers

Option A is wrong because Nearline is designed for data accessed less than once per 30 days, not for archival data, and has higher storage costs than Archive. Option B is wrong because Standard is for frequently accessed data (e.g., multiple times per month) and has the highest storage cost, making it unsuitable for minimizing costs on archival data. Option D is wrong because Coldline is for data accessed less than once per 90 days, with storage costs higher than Archive and a 90-day minimum storage duration, which does not provide the lowest cost for long-term archival.

846

MCQ · medium

A team is migrating a large on-premises Oracle database to Cloud SQL for PostgreSQL. They need to minimize downtime and ensure data consistency. Which migration approach is recommended?

A. Use pg_dump and pg_restore

B. Use Database Migration Service with continuous replication

C. Export data as CSV, import to Cloud SQL

D. Create a Cloud SQL read replica from on-premises

Answer: B

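DMS supports heterogeneous Oracle-to-PostgreSQL migrations and keeps downtime low by using change data capture.

Why this answer

Using Database Migration Service (DMS) with continuous replication minimizes downtime and maintains consistency: it loads the existing data, then applies ongoing changes until cutover. pg_dump and pg_restore (A) work only against PostgreSQL sources, a CSV export and import (C) requires downtime while the data is copied, and Cloud SQL cannot act as a read replica of an Oracle database (D).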

847

Multi-Select · medium

A company is migrating an Oracle database to Cloud SQL for PostgreSQL. They need to convert data types. Which TWO data type mappings are correct?

Select 2 answers

A. NUMBER(10,2) → NUMERIC(10,2)

B. CLOB → VARCHAR

C. BLOB → BYTEA

D. DATE → DATE

E. NUMBER(10) → INTEGER

Answers: A, C

NUMBER(10,2) → NUMERIC(10,2) preserves precision and scale, and BLOB → BYTEA is the standard mapping for binary data.

Why this answer

Oracle NUMBER(10) can hold values up to 9,999,999,999, which exceeds the PostgreSQL INTEGER maximum of 2,147,483,647, so it maps to BIGINT rather than INTEGER. CLOB maps to TEXT, BLOB maps to BYTEA, and Oracle DATE maps to TIMESTAMP rather than DATE because Oracle DATE includes a time component.
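
The NUMBER(10) problem comes down to range arithmetic. A minimal sketch that picks the smallest PostgreSQL integer type able to hold every value of an Oracle NUMBER(p) with scale 0; it illustrates the range check only and is not the rule set of any particular migration tool.

```python
# PostgreSQL integer types and their maximum values, smallest first.
PG_INT_TYPES = [("SMALLINT", 2**15 - 1), ("INTEGER", 2**31 - 1), ("BIGINT", 2**63 - 1)]


def pg_type_for_oracle_number(precision: int) -> str:
    """Smallest PostgreSQL integer type that holds every value of Oracle NUMBER(precision)."""
    largest = 10**precision - 1  # e.g. NUMBER(10) holds up to 9,999,999,999
    for name, max_value in PG_INT_TYPES:
        if largest <= max_value:
            return name
    return f"NUMERIC({precision})"


print(pg_type_for_oracle_number(10))  # BIGINT
```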

848

MCQ · easy

A team deploys a Cloud Function that processes user requests. They notice cold starts cause high latency for the first request after a period of inactivity. What is the most effective way to reduce cold starts?

A. Use a larger function timeout

B. Set the minimum instances to 1

C. Increase the memory allocation

D. Deploy the function in multiple regions

Answer: B

Keeping at least one warm instance eliminates cold start latency.

Why this answer

Setting minimum instances to 1 pre-warms a function instance, keeping it idle and ready to serve requests immediately. This eliminates the cold start latency for the first request after inactivity because the runtime environment is already initialized and loaded into memory.

Exam trap

Google Cloud often tests the misconception that increasing resources (memory or timeout) or spreading across regions solves cold starts, when the actual solution is keeping an instance alive via minimum instances or similar warm-start mechanisms.

How to eliminate wrong answers

Option A is wrong because increasing the function timeout does not prevent cold starts; it only allows the function to run longer before being terminated, which does not address the initialization delay. Option C is wrong because increasing memory allocation can improve performance during execution but does not keep an instance alive or reduce the cold start penalty; cold starts still occur after idle periods. Option D is wrong because deploying in multiple regions improves geographic latency and availability but does not reduce cold starts; each regional deployment still experiences cold starts independently after inactivity.

849

Multi-Select · hard

A company runs a global application on Cloud Spanner with a multi-region configuration. They need to test their DR procedures without impacting production. Which THREE actions should they perform? (Choose 3)

Select 3 answers

A. Delete the production instance and recreate it from a backup.

B. Simulate a zone failure by modifying IAM permissions to restrict access to replicas in a zone.

C. Create a backup of the production database and restore it to a separate instance for validation.

D. Promote a read replica to a writable instance in a different region.

E. Perform a failover test by initiating a planned regional outage using Spanner's API.

Answers: B, C, E

These actions test recovery and failover behavior without causing an actual production outage.

Why this answer

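Each selected action is non-destructive. Creating a backup and restoring it to a separate instance (C) validates the backups and measures restore time. Simulating a zone failure by restricting access to replicas (B) exercises the failure path in a controlled way. A planned failover test (E) exercises the failover procedure and lets the team measure RTO. Option A is destructive because it deletes the production instance, and option D does not apply because Spanner read-only replicas cannot be promoted to writable instances.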

850

Multi-Selecthard

A company runs a stateful workload on Compute Engine with local SSDs. They need to improve disk I/O performance without changing the instance type. Which THREE actions should they take?

Select 3 answers

A.Migrate to persistent SSD for better durability.

B.Stripe data across multiple local SSD volumes using RAID 0.

C.Use a filesystem optimized for SSDs, such as ext4 with noatime and nodiratime options.

D.Ensure the instance is in the same zone as the application that accesses the disks.

E.Enable encryption for the local SSDs to reduce I/O overhead.

AnswersB, C, D

Striping across several local SSDs increases aggregate throughput and IOPS.

Why this answer

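Option B is correct because striping data across multiple local SSD volumes with RAID 0 spreads reads and writes across all disks in parallel, increasing aggregate throughput and IOPS without changing the instance type. Option C is correct because mounting with noatime and nodiratime avoids access-time metadata updates on reads, removing unnecessary write I/O. Option D is correct because keeping the accessing application in the same zone avoids cross-zone network latency on the I/O path. Option A is wrong because persistent SSD improves durability, not performance; it has lower IOPS and higher latency than local SSD. Option E is wrong because encryption does not reduce I/O overhead.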
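A minimal sketch of the address arithmetic behind RAID 0 striping, assuming a hypothetical chunk size and disk count. Consecutive chunks are dealt round-robin to consecutive disks, so a large sequential transfer touches every disk in parallel. This is illustrative only; on Linux a tool such as mdadm performs this mapping for you.

```python
def raid0_locate(logical_offset: int, chunk_size: int, num_disks: int) -> tuple[int, int]:
    """Map a logical byte offset on a RAID 0 array to (disk index, byte offset on that disk)."""
    chunk_index, within_chunk = divmod(logical_offset, chunk_size)
    # Chunks are dealt round-robin across the disks.
    disk = chunk_index % num_disks
    # Each disk holds every num_disks-th chunk, packed contiguously.
    disk_offset = (chunk_index // num_disks) * chunk_size + within_chunk
    return disk, disk_offset


CHUNK = 512 * 1024  # hypothetical 512 KiB chunk size
# A 2 MiB sequential read covers four chunks, one on each of four disks.
touched = {raid0_locate(off, CHUNK, 4)[0] for off in range(0, 2 * 1024 * 1024, CHUNK)}
print(sorted(touched))  # [0, 1, 2, 3]
```

The same parallelism is why RAID 0 raises throughput, and also why losing any one disk loses the whole array; that trade-off is acceptable for local SSDs, whose data is ephemeral anyway.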

Exam trap

Google Cloud often tests the misconception that persistent SSDs are always better for performance, but local SSDs provide lower latency and higher IOPS for stateful workloads, and striping them with RAID 0 is the key to maximizing I/O without changing the instance type.

851

MCQhard

A company is transferring large datasets from on-premises to Google Cloud using a VPN. They notice high latency due to packet loss. What is the most effective way to improve throughput?

A.Set up Dedicated Interconnect for a more reliable connection.

B.Enable compression on the VPN tunnel.

C.Increase the number of VPN tunnels and use BGP multipath.

D.Use a multi-region GCP endpoint and distribute traffic.

AnswerA

Dedicated Interconnect provides a direct physical connection, reducing packet loss and latency.

Why this answer

Dedicated Interconnect provides a direct, private physical connection between on-premises and Google Cloud that does not traverse the public internet. This avoids the packet loss and variable latency of VPN tunnels that run over the internet, giving consistent throughput and lower latency for large dataset transfers.
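Why does packet loss hurt throughput so much? A common rule of thumb for a single long-lived TCP flow is the Mathis et al. approximation, throughput ≈ (MSS / RTT) × (C / √p) with C = √(3/2). The sketch below evaluates it for a hypothetical path (1460-byte MSS, 50 ms RTT). It is a back-of-the-envelope model, not a measurement of Cloud VPN or Interconnect.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state throughput of one TCP flow, in bits per second.

    Mathis et al. approximation: (MSS / RTT) * (C / sqrt(p)), with C = sqrt(3/2).
    Only meaningful for small loss rates (p well below 1).
    """
    c = math.sqrt(3 / 2)  # model constant, about 1.22
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# Hypothetical path: 1460-byte MSS and 50 ms round-trip time.
lossy = mathis_throughput_bps(1460, 0.05, 0.01)    # 1% packet loss
clean = mathis_throughput_bps(1460, 0.05, 0.0001)  # 0.01% packet loss
print(f"{lossy / 1e6:.1f} Mbit/s vs {clean / 1e6:.1f} Mbit/s")
```

Cutting loss by 100x raises the per-flow ceiling by 10x, which is why removing the lossy internet path helps more than tuning the tunnel.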

Exam trap

Google Cloud often tests the misconception that adding more VPN tunnels or enabling compression can overcome internet-based packet loss, but the correct solution is to eliminate the unreliable public internet path entirely with a dedicated connection like Interconnect.

How to eliminate wrong answers

Option B is wrong because enabling compression on the VPN tunnel can reduce the amount of data transmitted but does not address the underlying packet loss causing high latency; in fact, compression can increase CPU overhead and may worsen performance if packet loss is present. Option C is wrong because increasing the number of VPN tunnels with BGP multipath can improve bandwidth utilization but still relies on the public internet, so packet loss and latency issues remain; it does not provide a reliable, low-latency path. Option D is wrong because using a multi-region GCP endpoint and distributing traffic does not solve the fundamental problem of packet loss on the VPN connection; it only spreads traffic across regions, which may introduce additional latency and complexity without addressing the unreliable internet path.

852

MCQhard

You need to size a Bigtable cluster for a workload that requires 50,000 reads per second (QPS) and 20,000 writes per second. Each read is about 1 KB, each write is about 1 KB. The data volume is 5 TB and growing. You choose SSD storage. What is the minimum number of nodes?

A.72 nodes

B.50 nodes

C.7 nodes

D.5 nodes

AnswerC

Throughput, not storage, dictates the node count: about 5 nodes for reads plus about 2 nodes for writes.

Why this answer

The correct answer is C (7 nodes). Using the planning figure of roughly 10,000 rows per second per SSD node for 1 KB reads and for 1 KB writes (check current Bigtable performance documentation for up-to-date figures), the read load needs 50,000 / 10,000 = 5 nodes and the write load needs 20,000 / 10,000 = 2 nodes. Because a node's capacity is shared between reads and writes, the two requirements add up: 5 + 2 = 7 nodes.

Storage does not change the result. Bigtable's documented per-node SSD storage limit is measured in terabytes, not tens of gigabytes, so 5 TB fits comfortably within 7 nodes. Bigtable also has no 3-node minimum; a cluster can run on a single node. Because the data is growing, plan to add nodes (or enable autoscaling) as storage utilization per node rises.

Exam trap

Google Cloud often tests whether candidates size Bigtable from both throughput and storage, and whether they add the read and write load instead of sizing only for the larger of the two. Using an incorrect per-node storage figure inflates the node count dramatically.

How to eliminate wrong answers

Option A (72 nodes) is wrong because it assumes each SSD node stores only about 70 GB, which drastically understates per-node storage; 5 TB does not require anywhere near 72 nodes. Option B (50 nodes) is wrong because it provides roughly 500,000 rows per second of capacity, far beyond the 70,000 operations per second required, so it is not the minimum. Option D (5 nodes) is wrong because it covers only the read load (50,000 / 10,000 = 5) and ignores the 20,000 writes per second.
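The sizing arithmetic can be sketched as the larger of a throughput bound and a storage bound. This assumes, as above, that a node's read and write capacity form one shared budget, and it takes the per-node throughput and storage figures as parameters rather than hard-coding them; the values in the usage line are illustrative.

```python
import math


def bigtable_nodes(read_qps: float, write_qps: float, data_gb: float,
                   reads_per_node: float, writes_per_node: float,
                   storage_gb_per_node: float) -> int:
    """Minimum node count: the larger of the throughput bound and the storage bound.

    Read and write load are added as fractions of one node's capacity before
    rounding up, because both compete for the same node resources.
    """
    throughput_nodes = math.ceil(read_qps / reads_per_node + write_qps / writes_per_node)
    storage_nodes = math.ceil(data_gb / storage_gb_per_node)
    return max(throughput_nodes, storage_nodes, 1)


# 50k reads/s and 20k writes/s at ~10k rows/s per node; 5 TB with a 2.5 TB-per-node storage figure.
print(bigtable_nodes(50_000, 20_000, 5_000, 10_000, 10_000, 2_500))  # 7
```

With any per-node storage figure in the terabyte range, the throughput term dominates for this workload; the storage term only takes over once data grows far beyond what 7 nodes can hold.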

853

MCQmedium

A team is migrating a 2 TB MySQL database from on-premises to Cloud SQL. They want to minimize downtime. The source is MySQL 8.0 with InnoDB tables, and the application can be read-only during cutover. Which approach provides the lowest downtime while ensuring data consistency?

A.Use mysqldump with --single-transaction to export, then import to Cloud SQL using mysql client.

B.Use Cloud Dataflow to stream data from MySQL to Cloud SQL.

C.Use gcloud sql import command with a compressed dump file from Cloud Storage.

D.Use Database Migration Service with continuous CDC and promote when lag is zero.

AnswerD

DMS CDC minimizes downtime by replicating changes in real time.

Why this answer

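Database Migration Service (DMS) performs an initial full load and then uses continuous change data capture (CDC) to replicate ongoing changes from the source. Because the destination stays in sync, cutover only requires making the application read-only, waiting for replication lag to reach zero, and promoting the Cloud SQL instance, which keeps downtime short while preserving consistency.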
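The cutover itself follows a simple pattern: make the source read-only, wait for replication lag to reach zero, then promote the destination. The sketch below expresses that pattern with three hypothetical callables standing in for your own tooling (console steps, gcloud commands, or API calls); they are placeholders, not Database Migration Service functions.

```python
import time
from typing import Callable


def cutover(get_lag_seconds: Callable[[], float],
            set_source_read_only: Callable[[], None],
            promote_destination: Callable[[], None],
            poll_interval_s: float = 5.0,
            max_wait_s: float = 600.0) -> bool:
    """Freeze writes, wait for replication lag to drain, then promote the destination.

    Returns True if the destination was promoted, False if the lag never drained
    within max_wait_s (in which case nothing is promoted).
    """
    set_source_read_only()  # no new writes, so the replica can fully catch up
    deadline = time.monotonic() + max_wait_s
    while get_lag_seconds() > 0:
        if time.monotonic() >= deadline:
            return False  # abort: re-enable writes on the source and investigate
        time.sleep(poll_interval_s)
    promote_destination()  # Cloud SQL becomes the primary; repoint the application
    return True


# Usage with fake callables: lag drains from 3 s to 0 s, then promotion happens.
lags = iter([3.0, 1.0, 0.0])
events = []
ok = cutover(lambda: next(lags),
             lambda: events.append("read_only"),
             lambda: events.append("promote"),
             poll_interval_s=0)
print(ok, events)  # True ['read_only', 'promote']
```

The timeout branch matters in practice: if lag does not drain, you want a clear decision point rather than an indefinite read-only window.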

854

Multi-Selectmedium

A company uses Cloud Bigtable with replication across two regions. They want to implement a DR plan that minimizes RPO and RTO. Which TWO steps should they take? (Choose 2)

Select 2 answers

A.Configure multi-cluster routing with read-failover.

B.Regularly perform failover drills using Cloud DNS health checks to update routing.

C.Enable automatic failover for writes in the Bigtable cluster configuration.

D.Set the replication routing policy to any-replica.

E.Set the replication routing policy to single-cluster with failover priority.

AnswersA, B

Automatic failover through the app profile's routing keeps recovery time low, and drills prove that it works.

Why this answer

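With multi-cluster routing in the application profile, Bigtable sends requests to the nearest available cluster and automatically fails over to the other cluster when one becomes unavailable, which minimizes RTO without manual intervention. Replication between clusters is eventually consistent, so RPO is bounded by replication lag. Regular failover drills validate the process, confirm that applications tolerate the switch, and keep runbooks current.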

855

MCQhard

Refer to the exhibit. A DevOps engineer is debugging a Cloud Build pipeline that fails after the second step. The error indicates that the docker push fails with a permission denied error. The service account used by Cloud Build has the roles/storage.objectAdmin role on the project. What is the most likely cause of the failure?

A.The docker push command uses an incorrect repository path.

B.The service account does not have permission to push to Artifact Registry.

C.The Cloud Build service account needs the roles/artifactregistry.writer role.

D.The gcloud auth configure-docker step must be run for Artifact Registry.

AnswerC

Artifact Registry requires specific roles; storage.objectAdmin is insufficient for pushing images.

Why this answer

The service account has roles/storage.objectAdmin, which grants access to Cloud Storage objects but not to Artifact Registry repositories. Pushing to Artifact Registry requires roles/artifactregistry.writer (or a broader Artifact Registry role such as admin).

Option C correctly identifies the missing role. Option B describes the symptom but not the specific fix. Option A is unlikely because the repository path in the exhibit appears correct, and option D is already performed in the first step.

856

MCQhard

A company uses BigQuery for analytics. They notice high costs due to queries scanning large amounts of data. They want to reduce costs without sacrificing performance for urgent queries. Which approach is most cost-effective?

A.Use flat-rate pricing with reservations

B.Partition and cluster tables, and use BI Engine for acceleration

C.Use on-demand pricing with query caching

D.Use materialized views and limit query jobs to interactive priority

AnswerB

Partitioning and clustering limit the data each query scans, and BI Engine serves critical queries from an in-memory reservation for fast, interactive performance.

Why this answer

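Option B is correct because partitioning and clustering reduce the data scanned (and therefore on-demand cost), and BI Engine accelerates latency-sensitive queries. Option A (flat-rate) is costly if the reserved capacity is not fully used. Option C (caching) helps only when identical queries repeat and does not reduce the data scanned by new queries.

Option D (materialized views) incurs storage and refresh costs and helps only queries that match the view; interactive priority does not lower cost.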

857

Multi-Selecteasy

A retail company is designing a new inventory management system on Cloud Spanner. They need to ensure high write throughput for order processing. Which two schema design practices help avoid write hotspots? (Choose TWO.)

Select 2 answers

A.Create secondary indexes on frequently queried columns.

B.Avoid using a monotonically increasing primary key.

C.Store all data in a single table with no interleaving.

D.Use foreign keys to enforce referential integrity.

E.Add a hash prefix to the primary key to distribute writes.

AnswersB, E

Monotonically increasing keys cause hotspotting.

Why this answer

Option B is correct because monotonically increasing primary keys (for example, auto-increment integers or timestamps) send every new insert to the end of the key range. Cloud Spanner partitions data into splits by key range, so sequential keys concentrate the write load on a single split and the server that hosts it, creating a hotspot. Option E is correct because adding a hash prefix to the primary key distributes writes uniformly across splits, so no single server becomes a bottleneck.
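A minimal sketch of the hash-prefix idea, assuming a hypothetical table whose key is `PRIMARY KEY (ShardId, OrderId)` and a hypothetical shard count of 16. The application computes ShardId from a hash of the natural key, so consecutive order ids land in different parts of the key space instead of all appending to one split.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical number of hash buckets


def shard_id(order_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Deterministic bucket for a key, derived from a hash of its value."""
    digest = hashlib.sha256(str(order_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def primary_key(order_id: int) -> tuple[int, int]:
    """Composite key (ShardId, OrderId) for the hypothetical table.

    Because ShardId leads the key, sequential order ids scatter across the key
    range rather than always inserting at its end.
    """
    return shard_id(order_id), order_id


keys = [primary_key(i) for i in range(1, 9)]
print(keys)  # leading ShardId values vary even though order ids are sequential
```

The trade-off is on the read side: a scan over a range of order ids must now query every ShardId value, so this layout suits write-heavy tables that are read by point lookups.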

Exam trap

Google Cloud often tests the misconception that secondary indexes or foreign keys improve write performance. In fact they help reads or data integrity, each secondary index adds work to every write, and neither changes how writes are distributed across splits.

858

MCQeasy

A Cloud SQL for MySQL instance is running low on disk space. You have enabled automatic storage increase, but you also want to be proactively alerted when disk usage exceeds 80%. Which steps should you take?

A.Set the storage size to a fixed large value to avoid alerts.

B.Create a Cloud Logging sink for disk usage logs and set up a Pub/Sub notification.

C.Create a Cloud Monitoring alert on the metric 'disk/utilization' with a threshold of 80%.

D.Use Cloud SQL's built-in email notifications for disk usage.

AnswerC

This is the correct way to get an alert when disk usage exceeds 80%.

Why this answer

Cloud Monitoring lets you create alerting policies based on metrics. Cloud SQL publishes disk usage as the metric cloudsql.googleapis.com/database/disk/utilization, and you can alert when it crosses the 80% threshold. Automatic storage increase is a separate setting that adds storage when the instance runs low; it complements proactive alerting but does not replace it.
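A minimal sketch of such a policy, written as a Python dict in the JSON shape used by the Cloud Monitoring `projects.alertPolicies.create` REST method. The display names, 60-second alignment period, and 5-minute duration are assumptions, and notification channels are left empty for you to fill in. Cloud SQL reports `disk/utilization` as a fraction of the disk, so 80% is expressed as 0.8.

```python
import json

# Hypothetical request body for projects.alertPolicies.create.
alert_policy = {
    "displayName": "Cloud SQL disk usage above 80%",
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "disk/utilization > 0.8 for 5 minutes",
            "conditionThreshold": {
                "filter": (
                    'metric.type = "cloudsql.googleapis.com/database/disk/utilization" '
                    'AND resource.type = "cloudsql_database"'
                ),
                # Average each instance's samples over 60 s before comparing.
                "aggregations": [
                    {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                ],
                "comparison": "COMPARISON_GT",
                "thresholdValue": 0.8,  # fraction, not percent
                "duration": "300s",     # must stay above threshold for 5 minutes
            },
        }
    ],
    "notificationChannels": [],  # add notification channel resource names here
}

print(json.dumps(alert_policy, indent=2))
```

The 5-minute duration is a design choice to avoid paging on brief spikes; shorten it if disk fills quickly.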

859

Multi-Selecthard

A company is migrating a large relational database to Bigtable. The database has a table with columns: user_id (string), event_type (string), timestamp (timestamp), and details (JSON). The access patterns include retrieving all events for a user in a time range, and filtering by event_type. Which THREE row key design strategies should they apply? (Choose 3)

Select 3 answers

A.Include event_type as a column qualifier instead of row key

B.Store all events for a user in a single row

C.Use a hash prefix of user_id to distribute writes

D.Use a monotonically increasing timestamp as the row key

E.Use reverse timestamp to enable recent data first scans

AnswersA, C, E

Storing event_type as a column qualifier lets reads filter on it without fragmenting each user's time-ordered key range.

Why this answer

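A row key such as hash(user_id) + user_id + reverse_timestamp spreads writes across nodes (the hash prefix), keeps all of a user's rows contiguous so that a time-range scan for one user reads a single key range (the hash is deterministic per user), and orders each user's events newest first (the reverse timestamp). Storing event_type as a column qualifier lets reads filter on it with a column filter. Option B is wrong because storing every event for a user in one row produces unbounded wide rows, and option D is wrong because a monotonically increasing timestamp key sends all new writes to one node.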
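A minimal sketch of that row key layout, assuming a hypothetical `<hash prefix>#<user_id>#<reverse timestamp>` format, a 4-character hash prefix, and user ids that never contain `#`. The reverse timestamp is zero-padded so that byte order matches numeric order.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_MICROS = 2**63 - 1  # ceiling for the reverse timestamp (19 decimal digits)


def _micros(ts: datetime) -> int:
    """Microseconds since the Unix epoch for a timezone-aware datetime."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def row_key(user_id: str, ts: datetime, prefix_len: int = 4) -> bytes:
    """Hypothetical layout: <hash prefix>#<user_id>#<reverse timestamp>.

    event_type is deliberately not part of the key; it would be stored as a
    column qualifier instead.
    """
    # Deterministic per user: one user's rows stay contiguous, different users scatter.
    prefix = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:prefix_len]
    # Newer events get smaller numbers, so they sort first; zero-padding keeps
    # lexicographic byte order identical to numeric order.
    reverse_ts = MAX_MICROS - _micros(ts)
    return f"{prefix}#{user_id}#{reverse_ts:019d}".encode("utf-8")


def user_time_range(user_id: str, start: datetime, end: datetime) -> tuple[bytes, bytes]:
    """Key bounds for scanning one user's events between start and end, newest first."""
    return row_key(user_id, end), row_key(user_id, start)


newer = row_key("user-42", datetime(2024, 5, 2, tzinfo=timezone.utc))
older = row_key("user-42", datetime(2024, 5, 1, tzinfo=timezone.utc))
print(newer < older)  # True: the more recent event sorts first
```

Because the prefix depends only on user_id, a per-user time-range query is still a single contiguous scan between the two keys returned by `user_time_range`.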

860

MCQhard

A company uses Spinnaker for continuous delivery across multiple GKE clusters. After a recent infrastructure change, the 'Canary' deployment strategy fails during the 'disable' phase of the old version. The error log shows: 'Unable to disable server group: Not authorized to perform compute.instanceGroups.update.' What is the most likely root cause?

A.The GKE cluster has reached its maximum node quota.

B.The Cloud Deploy pipeline is missing the required IAM role for the Spinnaker service account.

C.The Spinnaker service account lacks the compute.instanceGroups.update permission on the project.

D.The Kayenta canary analysis service is not configured correctly.

AnswerC

Correct: Spinnaker uses this permission to disable old server groups.

Why this answer

The error 'Unable to disable server group: Not authorized to perform compute.instanceGroups.update' directly indicates an IAM permissions issue. In Spinnaker, the service account used to interact with GCP must have the compute.instanceGroups.update permission to manage instance groups during the disable phase of a canary deployment. Option C correctly identifies that the Spinnaker service account lacks this specific permission on the project.

Exam trap

Google Cloud often tests the distinction between permissions errors and resource quota errors, leading candidates to incorrectly select quota-related options when the error message explicitly states 'Not authorized'.

How to eliminate wrong answers

Option A is wrong because reaching the maximum node quota would cause a failure to provision new nodes, not a permissions error during the disable phase. Option B is wrong because Cloud Deploy is a separate Google Cloud service; the error is from Spinnaker's own service account, not from a Cloud Deploy pipeline. Option D is wrong because Kayenta handles canary analysis and metric evaluation, not the disabling of server groups; the error is an IAM authorization failure, not a configuration issue with Kayenta.

861

MCQeasy

A company is using Cloud Firestore to store user profiles. They need to query users by both their 'age' and 'city' fields. What must they configure to perform this query efficiently?

A.Use Firestore in Datastore mode for automatic composite indexes.

B.Configure index exemptions for the fields.

C.No configuration needed; single-field indexes are sufficient.

D.Create a composite index on age and city.

AnswerD

Composite index is required for queries on multiple fields.

Why this answer

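Firestore automatically creates single-field indexes, which serve simple queries. A query that combines filters on 'age' and 'city' (for example, an equality filter on city with a range filter on age) needs a composite index, which must be created explicitly. Option D is correct.

Option C is wrong because single-field indexes alone are not sufficient for this kind of multi-field query. Option B is wrong because index exemptions remove fields from automatic single-field indexing; they do not enable multi-field queries. Option A is wrong because Datastore mode does not remove this requirement.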

862

MCQmedium

A company uses Cloud Build to deploy a microservices application to Google Kubernetes Engine (GKE). They want to integrate Container Analysis to scan images for vulnerabilities before deployment. What is the minimal set of changes needed to achieve this?

A.Enable the Container Analysis API; no changes to the build configuration are needed.

B.Migrate images from Container Registry to Artifact Registry and enable vulnerability scanning there.

C.Add a build step to run a vulnerability scanner CLI tool before pushing the image.

D.Enable Binary Authorization to block deployment of vulnerable images.

AnswerA

Cloud Build pushes images to the configured registry, and Container Analysis scans them automatically once the API is enabled.

Why this answer

Option A is correct because Cloud Build integrates natively with Container Analysis: once the API is enabled, images built and pushed by Cloud Build are scanned automatically. Option C is incorrect because no separate scan step is needed. Option D is incorrect because Binary Authorization enforces deployment policy; it does not perform scanning.

Option B is incorrect because migrating to Artifact Registry is not required to enable scanning.

863

MCQeasy

A startup is building a mobile application and needs a real-time database that automatically scales to handle sudden spikes in user traffic. They want to minimise operational overhead and only pay for the resources they use. Which Google Cloud database should they choose?

A.Firestore

B.Cloud Spanner

C.Cloud SQL

D.Cloud Bigtable

AnswerA

Firestore is serverless, auto-scaling, and pay-per-use, ideal for mobile apps.

Why this answer

Firestore is a fully managed, serverless NoSQL document database that automatically scales horizontally to handle sudden traffic spikes without manual intervention. It offers real-time data synchronization via listeners, and its pay-per-use billing model aligns with the startup's requirement to minimize operational overhead and only pay for consumed resources.

Exam trap

Google Cloud often tests the misconception that Cloud Spanner is the best choice for any scalable database need. Candidates overlook that Spanner requires provisioned compute capacity and is not serverless, while Firestore's automatic scaling and pay-per-use model directly address the startup's need to minimize operational overhead and absorb sudden spikes.

How to eliminate wrong answers

Option B is wrong because Cloud Spanner is a globally distributed, strongly consistent relational database designed for high-throughput OLTP workloads, but it requires upfront capacity planning and incurs costs for provisioned nodes, not a pay-per-use model, making it unsuitable for a startup wanting to minimize overhead and pay only for usage. Option C is wrong because Cloud SQL is a managed relational database (MySQL, PostgreSQL, SQL Server) that does not auto-scale for sudden spikes; it requires manual or scheduled scaling of read replicas and has a fixed instance size billing model, not a consumption-based model. Option D is wrong because Cloud Bigtable is a wide-column NoSQL database optimized for large-scale analytical and operational workloads (e.g., time-series, IoT) with high throughput, but it requires manual cluster sizing and pays for provisioned nodes per hour, not a serverless pay-per-use model, and lacks real-time data synchronization features.

864

MCQmedium

A company uses Cloud Spanner with a multi-region configuration (nam6) for a global user database. They experience a regional outage affecting us-central1, where the leader region is located. What happens to write availability?

A.Writes continue with increased latency because us-central1 is still available for reads

B.The Spanner instance becomes read-only until an operator manually fails over

C.Writes are blocked until the us-central1 region recovers

D.Writes automatically fail over to us-east1 within seconds

AnswerD

Spanner multi-region configurations fail over automatically to the other read-write region, typically within seconds and without losing committed data.

Why this answer

In a multi-region configuration such as nam6, the read-write replicas are in us-central1 and us-east1. If the leader region (us-central1) goes down, Spanner automatically elects a new leader in us-east1 with no operator action. Because writes are committed synchronously through Paxos across regions, no committed data is lost, and write availability returns as soon as the new leader is elected, typically within seconds.

865

Multi-Selecteasy

A DevOps team wants to monitor the performance of a Cloud SQL database. Which two metrics should they track? (Select TWO.)

Select 2 answers

A.Auto-increment counter

B.Query error rate

C.CPU utilization

D.Number of active connections

E.Disk read/write latency

AnswersC, E

High CPU may indicate inefficient queries or need for scaling.

Why this answer

CPU utilization (C) is a critical Cloud SQL metric because sustained high CPU means the instance is struggling to process queries, often because of inefficient queries or insufficient compute capacity; it tells the team when to optimize queries or scale up. Disk read/write latency (E) shows whether storage I/O is slowing queries down, which CPU metrics alone would not reveal.

Exam trap

Google Cloud often tests the distinction between metrics that measure performance (e.g., CPU, latency) versus metrics that measure capacity or configuration (e.g., active connections, auto-increment), leading candidates to select D because they conflate 'active connections' with performance impact.

866

MCQmedium

An organization wants to enforce that all Compute Engine VMs use only specific machine families (e.g., N2, C2). Which mechanism should they use?

A.IAM deny policies

B.Quota management

C.Folders with different owners

D.Organization policy with compute.restrictComputeEngineMachineTypes

AnswerD

Org policies can restrict machine types.

Why this answer

Organization policies in Google Cloud allow administrators to enforce constraints on resources across the entire hierarchy. The `compute.restrictComputeEngineMachineTypes` constraint specifically limits which machine families (e.g., N2, C2) can be used when creating Compute Engine VMs, making it the correct mechanism for this requirement.

Exam trap

The trap here is that candidates often confuse IAM deny policies with organization policy constraints, thinking that deny policies can restrict resource configurations, when in fact they only control identity-based access, not resource properties.

How to eliminate wrong answers

Option A is wrong because IAM deny policies control who can perform actions (e.g., deny a user from creating VMs), not which machine types are allowed; they cannot restrict specific machine families. Option B is wrong because quota management limits the quantity of resources (e.g., number of vCPUs or GPUs) but does not restrict the selection of machine families like N2 or C2. Option C is wrong because folders with different owners are an organizational structure for delegating administration and access control, not a mechanism to enforce technical constraints on machine families.

867

MCQhard

Your organization runs a critical e-commerce platform on Google Kubernetes Engine (GKE). The platform uses Cloud Service Mesh (Anthos Service Mesh) for traffic management and Cloud Monitoring for observability. Recently, after a new release, you observe that the p99 latency of the checkout service has increased from 200ms to 2s. The service's CPU and memory metrics appear normal, and there are no error logs. The release included a change to the Istio VirtualService configuration that added a retry policy: 3 retries with a 500ms timeout per retry. You suspect that the retries are contributing to the latency increase. You want to use Cloud Monitoring to confirm this hypothesis. Which approach should you take?

A.Use Cloud Trace to analyze distributed traces for the checkout service and look for retry spans

B.Check the 'Services' dashboard in Cloud Monitoring, which shows a pre-built latency chart for all services

C.Use Metrics Explorer to query the istio.io/service/server/request_count metric, filtered by response_code_class and destination_service, and include the istio.io/service/server/request_retries metric to see retry counts alongside latency

D.Use Logs Explorer to search for logs containing 'retry' in the checkout service namespace

AnswerC

This directly shows the correlation between retries and latency.

Why this answer

Option C is correct because it directly correlates retry attempts with latency by querying the `istio.io/service/server/request_retries` metric alongside the `istio.io/service/server/request_count` metric in Metrics Explorer. This lets you visualize the retry count for the destination service (checkout) and compare it with the p99 latency increase, confirming whether the retry policy is causing the spike. The policy allows the original attempt plus three retries, each bounded by the 500ms per-try timeout, so in the worst case a request takes about 4 × 500ms = 2s, which matches the observed p99.
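A quick bound on how much latency the retry policy can add. This assumes Istio's `attempts` field counts retries only (so 3 means up to 4 tries in total) and ignores both the backoff Envoy inserts between attempts and any overall route timeout.

```python
def worst_case_latency_s(per_try_timeout_s: float, retries: int) -> float:
    """Upper bound when the first attempt and every retry run to the per-try timeout.

    Total tries = 1 original attempt + `retries`. Backoff between attempts and
    any overall route timeout are ignored.
    """
    return (1 + retries) * per_try_timeout_s


print(worst_case_latency_s(0.5, 3))  # 2.0 seconds
```

If the observed p99 sits near this bound, requests in the tail are likely exhausting every retry, which is exactly what the retry-count metric can confirm.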

Exam trap

Google Cloud often tests the distinction between metrics (which aggregate over time) and traces (which show individual request paths), leading candidates to choose Cloud Trace (Option A) when they should use Metrics Explorer with retry-specific metrics to confirm a latency hypothesis.

How to eliminate wrong answers

Option A is wrong because Cloud Trace shows distributed traces and retry spans, but it does not provide aggregated metrics like p99 latency or retry counts over time; it is more suitable for debugging individual requests rather than confirming a hypothesis about overall latency trends. Option B is wrong because the pre-built 'Services' dashboard in Cloud Monitoring shows latency charts but does not include retry metrics, so you cannot directly correlate retries with latency increases. Option D is wrong because Logs Explorer searching for 'retry' logs is inefficient and unreliable; Istio retries are not always logged by default, and even if they are, logs do not provide the aggregated time-series data needed to confirm a latency hypothesis.

868

MCQhard

An organization is implementing SLO-based alerting for a critical service. They want to alert when the service has consumed 50% of its error budget over a 30-day window. Considering best practices for alert sensitivity and noise reduction, which alerting approach should they use?

A.Alert on the burn rate over a 1-hour window with a threshold of 10.

B.Alert on the burn rate over a 5-minute window with a threshold of 0.5.

C.Alert on the error budget remaining with a threshold of 50%.

D.Alert on the SLI value directly with a threshold of 99.9%.

AnswerA

A sustained burn rate of 10 would exhaust the 30-day error budget in 3 days (30 days / 10) and consume 50% of it in about 1.5 days, so alerting after one hour at that rate gives ample time to respond before the 50% mark.

Why this answer

Option A is correct because alerting on a burn rate of 10 over a 1-hour window directly indicates that the service is consuming error budget at a rate that would exhaust the entire 30-day budget in 3 days (since 30 days / 10 = 3 days). This approach balances sensitivity and noise reduction by using a sufficiently long window (1 hour) to smooth out transient spikes, while the high threshold ensures only significant sustained degradation triggers an alert, aligning with SRE best practices for multi-window, multi-burn-rate alerting.
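The burn-rate arithmetic can be checked in a few lines. Burn rate is the observed error rate divided by the error rate the SLO allows (1 minus the target); the 99.9% target and 1% error rate in the first example line are hypothetical.

```python
SLO_WINDOW_H = 30 * 24  # 30-day compliance window in hours


def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than the sustainable pace the error budget is being spent."""
    return error_rate / (1 - slo_target)


def hours_to_exhaust(rate: float, window_h: float = SLO_WINDOW_H) -> float:
    """Hours until the whole budget is gone if the burn rate stays constant."""
    return window_h / rate


def budget_fraction_spent(rate: float, duration_h: float, window_h: float = SLO_WINDOW_H) -> float:
    """Fraction of the total error budget consumed by burning at `rate` for `duration_h` hours."""
    return rate * duration_h / window_h


print(round(burn_rate(0.01, 0.999), 6))       # 1% errors vs a 99.9% SLO -> 10.0
print(hours_to_exhaust(10) / 24)              # 3.0 days to exhaust the budget
print(0.5 * SLO_WINDOW_H / 10 / 24)           # 1.5 days to spend half of it
print(round(budget_fraction_spent(10, 1), 4))  # 0.0139 of the budget per hour
```

For comparison, the 14.4-over-1-hour threshold often cited in SRE guidance corresponds to spending 2% of a 30-day budget in one hour.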

Exam trap

Google Cloud often tests the misconception that shorter windows (like 5 minutes) are better for fast detection, but the trap here is that overly short windows increase noise and false positives, whereas a 1-hour window with a high burn rate threshold provides the right balance for a 30-day SLO.

How to eliminate wrong answers

Option B is wrong because a 5-minute window with a burn rate threshold of 0.5 is far too sensitive and noisy; it would trigger alerts on minor, transient blips that do not meaningfully consume error budget over the 30-day window, leading to alert fatigue. Option C is wrong because alerting on error budget remaining at 50% is a reactive, threshold-based approach that provides no lead time; by the time 50% is consumed, the service may already be in a critical state, and it does not account for the rate of consumption. Option D is wrong because alerting on the SLI value directly (e.g., 99.9%) is a static threshold that ignores the error budget entirely; it can trigger false positives during normal fluctuations and fails to measure the actual impact on the SLO over the compliance period.

869

MCQmedium

Refer to the exhibit. You see this log entry from a Cloud Run service. The stack trace shows the error occurs in handler.js at line 50. You want to see the state of variables at that point in the production environment without adding logging or redeploying. What should you do?

A.Use Error Reporting to view similar errors.

B.Use Cloud Profiler to capture a heap snapshot.

C.Use Cloud Debugger to set a snapshot location at line 50 in handler.js.

D.Use Cloud Trace to trace the request.

AnswerC

Cloud Debugger can capture local variables at specific lines in live applications.

Why this answer

Option C is correct because Cloud Debugger lets you inspect the state of an application, including local variables and the call stack, at a specific line of code in production without modifying or redeploying it. Setting a snapshot at line 50 in handler.js captures the variable values at the exact point where the error occurs. Note that Cloud Debugger has since been deprecated; its snapshot capability continues in the open-source Snapshot Debugger.

Exam trap

Google Cloud often tests the distinction between debugging tools (Cloud Debugger) and monitoring/observability tools (Error Reporting, Cloud Profiler, Cloud Trace), so the trap here is that candidates may confuse Cloud Debugger with Error Reporting or Cloud Trace, thinking any tool that shows errors or traces can also reveal variable state.

How to eliminate wrong answers

Option A is wrong because Error Reporting aggregates and analyzes errors but does not provide the ability to inspect variable state at a specific line of code; it only shows error logs and stack traces. Option B is wrong because Cloud Profiler is used for continuous profiling of CPU and memory usage to identify performance bottlenecks, not for capturing variable state at a specific code location. Option D is wrong because Cloud Trace is a distributed tracing system that tracks request latency and path through services, but it does not capture local variable values or allow inspection of application state at a specific line of code.

870

MCQeasy

A developer is setting up a Memorystore for Redis instance and needs to restrict access to only a specific Compute Engine VM in the same VPC network. Which configuration should they use?

A.Use Cloud Armor to whitelist the VM's external IP

B.Use the AUTH password and share it only with the VM

C.Configure the Memorystore instance with a firewall rule that allows only the VM's internal IP

D.Place the Redis instance in a separate VPC and peer only with the VM's VPC

AnswerC

Correct. Firewall rules can be applied to the VPC to restrict inbound traffic to the Memorystore instance's IP range from only the VM's IP.

Why this answer

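Memorystore for Redis is reached over a private IP address, through either direct VPC peering or private services access, from its authorized VPC network. By default any VM in that network can connect. The simplest way to narrow access to one VM is to add VPC firewall rules (for example, egress rules keyed on network tags or service accounts) that allow traffic to the Redis IP range only from that VM. AUTH adds a password check but does not limit which hosts can connect, and Cloud Armor protects external load balancers, not Memorystore.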

871

MCQeasy

What is the primary benefit of using preemptible VMs?

A.Higher reliability.

B.Faster performance.

C.Better security.

D.Lower cost.

AnswerD

Preemptible VMs are significantly cheaper than regular VMs.

Why this answer

Preemptible VMs are Compute Engine instances that run for at most 24 hours and can be preempted by Google at any time. In exchange, they cost 60-91% less than standard VMs, which makes them ideal for batch jobs and fault-tolerant workloads where cost savings are the primary benefit.
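The discount range translates into monthly savings as follows. This is a minimal sketch: the hourly rate and the 730-hour month are hypothetical placeholders, not real Compute Engine prices, which vary by machine type and region.

```python
# Hypothetical on-demand rate for illustration only; real prices vary by machine type and region.
ON_DEMAND_HOURLY_USD = 0.20
HOURS_PER_MONTH = 730  # common approximation of the hours in one month


def monthly_cost(hourly_rate: float, discount: float, hours: float = HOURS_PER_MONTH) -> float:
    """Return the monthly cost after a fractional discount (0.60 means 60% off)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return hourly_rate * (1.0 - discount) * hours


if __name__ == "__main__":
    standard = monthly_cost(ON_DEMAND_HOURLY_USD, 0.0)
    # Compare both ends of the 60-91% range against the standard price.
    for discount in (0.60, 0.91):
        preemptible = monthly_cost(ON_DEMAND_HOURLY_USD, discount)
        print(f"{discount:.0%} off: ${preemptible:.2f}/month vs ${standard:.2f}/month standard")
```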

Exam trap

Google Cloud often tests whether candidates confuse the cost benefit of preemptible VMs with other attributes, such as speed or availability, which preemptible VMs do not improve.

How to eliminate wrong answers

Option A is wrong because preemptible VMs have no reliability guarantees; they can be terminated at any time, so they are less reliable than standard VMs. Option B is wrong because preemptible VMs use the same machine types and performance as standard VMs; there is no performance boost. Option C is wrong because preemptible VMs do not provide better security; they share the same security model as standard VMs and are not designed for security enhancements.

872

Multi-Selectmedium

You are designing a disaster recovery plan for a Cloud Bigtable instance. The instance has a single cluster in us-east1. You need to ensure that if the cluster becomes unavailable, the database can still serve read and write requests with minimal downtime. Which THREE steps should you take? (Choose three.)

Select 3 answers

A.Ensure the application can handle eventual consistency between clusters.

B.Enable synchronous replication between clusters.

C.Add a second cluster in a different zone (e.g., us-east1-b) and enable replication.

D.Configure the application to route traffic to the secondary cluster in case of primary failure.

E.Take a full backup of the table to Cloud Storage daily.

AnswersA, C, D

Bigtable replication is asynchronous, so occasional stale reads are possible.

Why this answer

Option A is correct because Cloud Bigtable replicates data between clusters asynchronously and provides eventual consistency. Writes accepted by one cluster are not immediately visible in the other, so the application must tolerate stale reads, especially right after a failover. Option C adds the second cluster that replication requires, and Option D ensures that requests go to the surviving cluster when the primary is unavailable (in Bigtable, this routing is configured through app profiles).
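Asynchronous replication is what makes stale reads possible. The toy model below is not the Bigtable client API; it is a hypothetical in-memory sketch of two clusters where a write is acknowledged locally and copied to the other cluster later. Real Bigtable also resolves conflicting writes to the same cell by timestamp, which this sketch ignores.

```python
from collections import deque
from typing import Optional


class ToyReplicatedTable:
    """Hypothetical two-cluster table with asynchronous replication (not the Bigtable API)."""

    def __init__(self) -> None:
        self.clusters = {"primary": {}, "secondary": {}}
        self._pending = deque()  # writes not yet applied to the other cluster

    def write(self, cluster: str, key: str, value: str) -> None:
        # The write is acknowledged as soon as the local cluster applies it.
        self.clusters[cluster][key] = value
        other = "secondary" if cluster == "primary" else "primary"
        self._pending.append((other, key, value))

    def read(self, cluster: str, key: str) -> Optional[str]:
        return self.clusters[cluster].get(key)

    def replicate(self) -> None:
        # Apply queued writes to the other cluster; in Bigtable this happens in the background.
        while self._pending:
            target, key, value = self._pending.popleft()
            self.clusters[target][key] = value
```

A read from the secondary cluster returns the old value (here, nothing) until replication catches up, which is the window the application must tolerate after a failover.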

Exam trap

Google Cloud often tests the misconception that Cloud Bigtable offers synchronous replication, but the service only supports asynchronous replication. Candidates may also wrongly think that daily backups are sufficient for high availability instead of multi-cluster replication.

873

Matchingmedium

Match each cost optimization practice to its description.

Committed use discounts: Discount for a 1- or 3-year resource commitment

Sustained use discounts: Automatic discounts for instances that run for most of the month

Preemptible VMs: Short-lived, low-cost instances for batch jobs

Rightsizing: Adjusting the machine type to match workload needs

Budget alerts: Notifications when spending exceeds thresholds

Why these pairings

Common cost management techniques on Google Cloud.

874

MCQmedium

An engineer is designing a Bigtable row key for global user events. They want to avoid hotspots and enable efficient queries by user_id and time range. Which row key design is best?

A.hash(user_id) + timestamp

B.reverse(timestamp) + user_id

C.user_id + timestamp

D.timestamp + user_id

AnswerA

Hash prefix distributes writes, timestamp enables time-based queries.

Why this answer

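Hashing user_id spreads writes across the key space, and therefore across Bigtable nodes, which avoids hotspots. Because the hash is deterministic, a query for one user can recompute the same prefix and then scan the timestamp range that follows it.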
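A minimal sketch of the hash-prefixed key, assuming SHA-256 truncated to 8 hex characters as the hash, millisecond timestamps, and `#` separators (all illustrative choices, not Bigtable requirements). It also keeps the raw user_id after the hash, an added assumption, so two users whose truncated hashes collide still get distinct keys.

```python
from __future__ import annotations

import hashlib


def user_prefix(user_id: str, hex_chars: int = 8) -> str:
    """Deterministic hash prefix that spreads different users across the key space."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:hex_chars]


def row_key(user_id: str, epoch_millis: int) -> bytes:
    # Zero-pad the timestamp so byte-wise key order matches numeric time order.
    return f"{user_prefix(user_id)}#{user_id}#{epoch_millis:013d}".encode("utf-8")


def scan_range(user_id: str, start_millis: int, end_millis: int) -> tuple[bytes, bytes]:
    """Start (inclusive) and end (exclusive) keys for one user's events in a time range."""
    return row_key(user_id, start_millis), row_key(user_id, end_millis)
```

Zero-padding matters because Bigtable sorts row keys as bytes: without it, `999` would sort after `1000`. The two bounds can then be passed to whichever range-read call your Bigtable client provides.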

875

Multi-Selectmedium

A company is designing a disaster recovery strategy for their Cloud Spanner database. They need an RPO of less than 30 seconds and an RTO of less than 1 minute. Which two configurations would meet these requirements? (Choose TWO)

Select 2 answers

A.Cross-region backup and restore

B.Multi-region configuration with read-write replicas (e.g., nam3)

C.Point-in-time recovery (PITR)

D.Regional instance

E.Multi-region configuration with read-only replicas

AnswersB, D

A multi-region configuration with read-write replicas provides automatic failover with RPO ~15s and RTO < 1min.

Why this answer

A Spanner regional instance provides RPO = 0 and an RTO of seconds because it replicates synchronously across zones within the region, but it does not protect against a region failure. A multi-region configuration (e.g., nam3) provides automatic failover with RPO ~15s and RTO < 1 min. So options B and D meet the stated requirements.

876

MCQeasy

Your Cloud Bigtable instance is experiencing high latency for certain row key ranges. You suspect a hotspot is developing. Which Google Cloud tool should you use to visualize and diagnose the hotspot?

A.Key Visualizer

B.Cloud Monitoring dashboard to check CPU utilization per node

C.Bigtable's built-in 'hotspot detection' alert in Cloud Monitoring

D.Cloud Logging to review request logs and identify slow queries

AnswerA

Key Visualizer is the correct tool for detecting hotspots in Bigtable.

Why this answer

Key Visualizer is a tool built specifically for Bigtable that identifies hotspots and access patterns by visualizing row key distribution and traffic. It helps pinpoint uneven load distribution.

877

MCQhard

A company runs a critical application on Google Kubernetes Engine (GKE) with 3 nodes (e2-standard-4). To reduce costs, the team is considering right-sizing the nodes. The application is latency-sensitive and experiences periodic traffic spikes. What is the most cost-effective approach that maintains performance during spikes?

A.Use a single larger node (n2-standard-8) to reduce node count and network overhead.

B.Switch to 3 n2-standard-2 nodes to reduce vCPU and memory, and rely on horizontal pod autoscaling.

C.Create a node pool with smaller nodes (e2-standard-2) using preemptible VMs and enable cluster autoscaler.

D.Keep current node size but use committed use discounts for 1 year to reduce per-hour cost.

AnswerC

Preemptible VMs are cheaper, and autoscaler adds nodes during spikes, maintaining performance.

Why this answer

Option C is the most cost-effective approach because it uses smaller e2-standard-2 nodes with preemptible VMs, which are significantly cheaper than regular VMs, and combines this with the cluster autoscaler to automatically add nodes during traffic spikes. This ensures that the latency-sensitive application maintains performance by scaling out horizontally when needed, while minimizing baseline costs with smaller, cheaper nodes.

Exam trap

The trap here is that candidates often assume preemptible VMs are unsuitable for production or latency-sensitive workloads, but the cluster autoscaler mitigates the risk of VM termination by quickly replacing nodes, making this a valid cost-saving strategy for spike-tolerant applications.

How to eliminate wrong answers

Option A is wrong because a single larger node (n2-standard-8) creates a single point of failure and cannot scale horizontally; during spikes the application may still suffer from resource contention on that one node, and the network overhead it claims to save is negligible in GKE. Option B is wrong because three n2-standard-2 nodes halve the total vCPU and memory (6 vCPUs and 24 GB instead of 12 vCPUs and 48 GB), and horizontal pod autoscaling without the cluster autoscaler cannot add nodes during spikes, so new pods fail to schedule and performance degrades. Option D is wrong because committed use discounts only lower the per-hour rate; they do not right-size anything, so the same over-provisioned capacity is kept, which is less cost-effective than smaller nodes with autoscaling.
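The capacity comparison behind options A and B is simple arithmetic over the machine types' published vCPU and memory sizes. This sketch sums raw node capacity only; the allocatable capacity GKE leaves for pods is somewhat lower because of system reservations.

```python
from __future__ import annotations

# vCPU count and memory (GB) of the predefined machine types named in the question.
MACHINE_TYPES = {
    "e2-standard-2": (2, 8),
    "e2-standard-4": (4, 16),
    "n2-standard-2": (2, 8),
    "n2-standard-8": (8, 32),
}


def pool_capacity(machine_type: str, nodes: int) -> tuple[int, int]:
    """Return (total vCPUs, total memory in GB) for a pool of identical nodes."""
    vcpus, memory_gb = MACHINE_TYPES[machine_type]
    return vcpus * nodes, memory_gb * nodes


current = pool_capacity("e2-standard-4", 3)   # (12, 48)
option_a = pool_capacity("n2-standard-8", 1)  # (8, 32)
option_b = pool_capacity("n2-standard-2", 3)  # (6, 24): half of the current capacity
print(current, option_a, option_b)
```

With the cluster autoscaler (option C), capacity is not fixed: the pool starts small and grows by whole nodes when pods cannot be scheduled.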

878

MCQhard

A company has on-premises servers running Linux and GKE clusters. They want to monitor all infrastructure using Cloud Monitoring. Which solution is most scalable and aligned with Google's best practices?

A.Use collectd on on-prem servers to send to Cloud Monitoring via the Stackdriver agent configuration.

B.Deploy Prometheus on both environments and use the PromQL adapter for Cloud Monitoring.

C.Use Google's managed service for Prometheus on GKE and a Prometheus federation for on-prem.

D.Install the Ops Agent on all on-prem servers and use Google's default GKE monitoring.

AnswerC

The managed service for Prometheus on GKE is fully integrated, and federation from on-prem Prometheus scales well.

Why this answer

Option C is correct because it leverages Google's managed service for Prometheus on GKE, which is fully integrated with Cloud Monitoring and eliminates the operational overhead of self-managing Prometheus. For on-premises servers, Prometheus federation allows scraping metrics from the on-prem Prometheus instance and forwarding them to the managed service, providing a unified, scalable monitoring solution that aligns with Google's best practices for hybrid environments.

Exam trap

Google Cloud often tests the misconception that self-managed Prometheus with a custom adapter is the most scalable solution, when in fact Google's managed service eliminates operational overhead and provides native integration with Cloud Monitoring, making it the best practice for hybrid environments.

How to eliminate wrong answers

Option A is wrong because collectd requires manual configuration, and the Stackdriver agent built on it is a legacy agent that has been deprecated in favor of the Ops Agent. Option B is wrong because self-managed Prometheus on both environments plus the PromQL adapter adds unnecessary complexity and forgoes Google's managed service, which provides automatic scaling, high availability, and native integration with Cloud Monitoring. Option D is wrong because the Ops Agent is built for Compute Engine VMs rather than on-premises servers, and Google's default GKE monitoring lacks the flexibility and advanced querying that Prometheus offers for custom metrics.

879

MCQhard

A DevOps team is implementing chaos engineering for their Cloud SQL for PostgreSQL database. They want to simulate a zone failure that triggers automatic failover of their HA instance without causing data loss. Which approach should they use?

A.Stop the Cloud SQL instance using gcloud sql instances patch --activation-policy NEVER

B.Use the gcloud sql instances failover command

C.Restrict network access to the primary instance

D.Delete the primary database

AnswerB

The 'gcloud sql instances failover' command triggers a graceful failover to the standby zone, simulating a zone failure without data loss.

Why this answer

Cloud SQL HA instances have automatic zone failover. To simulate a zone failure, you can use the gcloud command to trigger failover manually. This performs a planned failover that switches to the standby zone without data loss, mimicking a zone outage.
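A chaos drill usually wraps the failover command in a script. This is a hypothetical helper (the function names, instance, and project are illustrative). It defaults to a dry run and only calls the gcloud CLI when `dry_run=False`, which requires an installed and authenticated gcloud; the instance must be configured for high availability for the failover to succeed.

```python
from __future__ import annotations

import shlex
import subprocess


def failover_command(instance: str, project: str) -> list[str]:
    """Build the gcloud command that triggers a manual failover of a Cloud SQL HA instance."""
    # --quiet is gcloud's global flag for skipping interactive confirmation prompts.
    return ["gcloud", "sql", "instances", "failover", instance, f"--project={project}", "--quiet"]


def run_failover_drill(instance: str, project: str, dry_run: bool = True) -> list[str]:
    cmd = failover_command(instance, project)
    if dry_run:
        print("Would run:", shlex.join(cmd))
    else:
        # Blocks until gcloud exits; raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `run_failover_drill("orders-db", "my-project")` prints the command without running it, which makes it easy to review before the real drill.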

880

MCQhard

A large enterprise is designing a centralized DevOps platform across multiple business units. They want to use a shared CI/CD pipeline that deploys to projects in different folders. Which approach ensures secure, auditable deployments while minimizing IAM administration?

A.Use a cross-project service account in the CI/CD project with required roles (e.g., Cloud Run Admin, Compute Admin) on target projects via IAM.

B.Use Cloud Build triggers directly in each target project with separate code repositories.

C.Grant the Cloud Build Editor role to all developers across projects to allow them to create pipelines.

D.Create a separate service account in each target project with the Cloud Build service agent role, and use impersonation from the CI/CD project.

AnswerA

Centralized service account with cross-project IAM is best practice; it simplifies management and audit.

Why this answer

Option A is correct because a cross-project service account in the CI/CD project, granted the necessary roles (e.g., Cloud Run Admin, Compute Admin) on target projects via IAM, allows the shared pipeline to deploy resources across folders without duplicating service accounts. This centralizes IAM administration, ensures auditability through a single identity, and follows the principle of least privilege by granting only required roles on target projects.

Exam trap

The trap here is that candidates often confuse the Cloud Build service agent role (used for internal Cloud Build operations) with the cross-project service account pattern, leading them to choose Option D, which adds unnecessary administrative overhead instead of leveraging IAM's native cross-project delegation.

How to eliminate wrong answers

Option B is wrong because using Cloud Build triggers directly in each target project with separate code repositories defeats the purpose of a centralized DevOps platform, increasing IAM administration overhead and fragmenting audit trails across multiple projects. Option C is wrong because granting the Cloud Build Editor role to all developers across projects violates the principle of least privilege, introduces excessive permissions, and undermines auditable deployments by allowing developers to create arbitrary pipelines. Option D is wrong because creating a separate service account in each target project with the Cloud Build service agent role and using impersonation from the CI/CD project adds unnecessary IAM complexity and administrative burden, as the cross-project service account approach in Option A achieves the same goal more efficiently.

881

Multi-Selecteasy

Which TWO statements about bootstrapping a Google Cloud organization for DevOps are correct?

Select 2 answers

A.After enabling the cloudresourcemanager.googleapis.com API, organization policies are automatically applied.

B.Cloud Asset Inventory can be used to discover all resources in the organization.

C.All projects in an organization automatically share a default VPC network.

D.Cloud Audit Logs are disabled by default and must be enabled for each service.

E.Organization policies can be applied at the organization, folder, or project level.

AnswersB, E

Correct: Cloud Asset Inventory provides a historical view of all resources.

Why this answer

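Option B is correct because Cloud Asset Inventory provides a complete view of all resources (e.g., Compute Engine instances, Cloud Storage buckets, IAM policies) across the organization, including all folders and projects. This is essential during DevOps bootstrapping for auditing, monitoring, and managing resources at scale; the Cloud Asset API exports asset metadata and supports real-time feeds for change detection. Option E is correct because organization policies can be set at the organization, folder, or project level, and policies set higher in the hierarchy are inherited by the resources below them.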

Exam trap

Google Cloud often tests the misconception that organization policies are automatically applied after enabling an API, or that Cloud Audit Logs are disabled by default, when in fact Admin Activity logs are always enabled and Data Access logs require explicit activation.

882

MCQhard

A financial services firm runs a multi-region Spanner instance with the nam-eur-asia1 configuration. They need to ensure that if the leader region (us-central1) becomes unavailable, failover to another region occurs automatically within 1 minute. What is the expected RPO and RTO for this scenario?

A.RPO ~ 15 seconds, RTO < 1 minute

B.RPO ~ 0, RTO ~ 15 seconds

C.RPO ~ 1 hour, RTO ~ 5 minutes

D.RPO = 0, RTO < 1 minute

AnswerA

This matches Spanner's documented RPO and RTO for multi-region failover.

Why this answer

Spanner multi-region configurations provide automatic failover with RTO under 1 minute. For the nam-eur-asia1 configuration, it is a multi-region with multiple read-write replicas. The RPO is approximately 15 seconds because Spanner uses quorum-based replication; in the event of a region failure, a small number of recent transactions may be uncommitted.

883

MCQeasy

A Cloud Run service is experiencing increased cold start latency. The service is written in Python and uses several large dependencies. Which action would most effectively reduce cold start latency?

A.Set concurrency to 1 to ensure each request gets a dedicated container.

B.Increase the CPU allocation to 4 vCPUs.

C.Set a minimum number of instances to keep containers warm.

D.Increase memory to 2 GiB.

AnswerC

Minimum instances keep containers warm, so requests served by those instances avoid cold starts.

Why this answer

Option C is correct because setting a minimum number of instances keeps a pool of warm containers ready to serve requests, so traffic that fits within that pool avoids the cold start penalty; bursts that need additional instances can still trigger cold starts. Cold starts in Python are particularly slow because large dependencies (e.g., NumPy, TensorFlow) must be imported and the runtime initialized. Keeping containers alive skips that initialization for warm instances, directly addressing the root cause of the increased latency.
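A toy model makes the effect of minimum instances concrete. It is a hypothetical simplification, not Cloud Run's scheduler: demand is expressed as the number of instances needed per interval, and any instance above the minimum is assumed to shut down as soon as it is idle, which overstates cold starts compared with Cloud Run's real scale-down behavior.

```python
from __future__ import annotations


def count_cold_starts(demand: list[int], min_instances: int) -> int:
    """Count instance starts needed to meet per-interval demand (toy model)."""
    warm = min_instances  # instances kept running even with no traffic
    cold_starts = 0
    for needed in demand:
        if needed > warm:
            cold_starts += needed - warm  # each extra instance starts cold
        # Simplification: idle instances above the minimum are released immediately.
        warm = max(needed, min_instances)
    return cold_starts


demand = [0, 2, 0, 3]
for minimum in (0, 1, 3):
    print(minimum, count_cold_starts(demand, minimum))  # prints 0 5, then 1 3, then 3 0
```

Raising the minimum to cover typical demand removes cold starts for that demand, while bursts above the minimum still pay the startup cost.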

Exam trap

Google Cloud often tests the misconception that adding CPU or memory is the best fix for cold start latency. Extra CPU can shorten startup somewhat, but cold starts are dominated by initialization work (runtime startup, dependency loading) that still happens on every new instance; only keeping instances warm avoids it.

How to eliminate wrong answers

Option A is wrong because setting concurrency to 1 does not reduce cold start latency; it gives each request a dedicated container, which can actually increase the number of cold starts as the service scales up, and it wastes resources without addressing the initialization delay. Option B is wrong because extra CPU mainly speeds up request processing; it may shorten startup somewhat, but every new container must still import the large Python dependencies and start the application. Option D is wrong because more memory gives the container headroom but does not change the initialization sequence; cold start latency comes from loading dependencies and starting the runtime, not from memory pressure.

884

MCQmedium

A company is setting up a new Google Cloud organization for DevOps. They want to enforce that all projects have a specific set of VPC Service Controls perimeters. Which approach should they use to ensure these perimeters are automatically applied to all new projects?

A.Configure Cloud Shell to run a script that creates a perimeter when a new project is created.

B.Define an organization policy with a constraint that requires all projects to be within a perimeter.

C.Use Deployment Manager to deploy a configuration that creates a perimeter for each new project.

D.Create a VPC Service Controls perimeter and add the organization node as a member.

AnswerB

Organization policies can enforce constraints like 'vpcServiceControls' across projects.

Why this answer

Option B is correct because Google Cloud Organization Policies allow you to define and enforce constraints at the organization, folder, or project level. The `constraints/compute.restrictVpcServiceControls` constraint can be set to require all new projects to be within a specific VPC Service Controls perimeter, ensuring automatic enforcement without manual intervention.

Exam trap

The trap here is that candidates often confuse VPC Service Controls perimeter membership (which is a resource-level attribute) with organization policy enforcement (which is a hierarchical governance mechanism), leading them to choose Option D or A instead of the correct policy-based approach.

How to eliminate wrong answers

Option A is wrong because Cloud Shell scripts are not a scalable or reliable mechanism for enforcing policies on all new projects; they require manual execution or a separate trigger and do not provide automatic, organization-wide enforcement. Option C is wrong because Deployment Manager is an infrastructure-as-code tool for deploying resources, but it does not automatically apply to every new project created outside of its deployment scope; it would require a separate deployment per project. Option D is wrong because adding the organization node as a member to a VPC Service Controls perimeter does not automatically enforce that all projects within the organization are inside the perimeter; it only allows the organization to be a member, but projects must still be explicitly added or constrained via policy.

885

MCQhard

A large enterprise is migrating to Google Cloud and wants to bootstrap their organization for DevOps. They have multiple business units, each needing their own folder with projects. Security requires that all projects in the 'prod' folder must have a specific set of organization policies enforced, such as restricting service account key creation. They also want to allow individual teams to create project-level policies as long as they don't conflict with the organization policies. Which approach ensures this while minimizing administrative overhead?

A.Set the required organization policies on the 'prod' folder and allow teams to set additional policies at the project level as long as they don't conflict.

B.Set organization policies at the organization level and use IAM conditions to apply them only to the prod folder.

C.Create custom roles containing the required constraints and assign them to the team's IAM members.

D.Place all production workloads in a single project and use VPC Service Controls for security.

AnswerA

Folder-level policies are inherited; project policies can add restrictions but cannot relax them.

Why this answer

Option A is correct because Google Cloud Organization Policies can be set at the folder level, allowing the 'prod' folder to inherit constraints like `iam.disableServiceAccountKeyCreation` across all its projects. Teams can then add additional project-level policies that are more restrictive, as long as they do not conflict with the inherited folder-level policies, which is enforced by the policy hierarchy. This minimizes administrative overhead by centralizing mandatory controls at the folder level while delegating flexibility to teams.

Exam trap

The trap here is confusing IAM roles and conditions with organization policy constraints, leading candidates to incorrectly select Option B or C, when in fact organization policies are a separate, hierarchical mechanism that cannot be bypassed by IAM or custom roles.

How to eliminate wrong answers

Option B is wrong because organization policies cannot be applied selectively using IAM conditions; IAM conditions control access to resources, not the enforcement of organization policy constraints. Option C is wrong because custom roles define IAM permissions, not organization policy constraints; constraints like restricting service account key creation are enforced via organization policies, not IAM roles. Option D is wrong because placing all production workloads in a single project violates the requirement for multiple business units to have their own folders and projects, and VPC Service Controls address data exfiltration, not organization policy enforcement.

886

MCQeasy

A company uses Error Budgets for their service. The SLO is 99.9% availability over a 30-day window. The service has been down for 30 minutes in the current window. What is the remaining error budget?

A.43.2 minutes

B.60 minutes

C.13.2 minutes

D.30 minutes

AnswerC

Calculation: 0.001 * 43200 minutes = 43.2 minutes budget, minus 30 = 13.2.

Why this answer

The SLO of 99.9% over a 30-day window allows a total error budget of 43.2 minutes (30 days × 24 hours × 60 minutes × 0.001). The service has already consumed 30 minutes of downtime, so the remaining error budget is 43.2 - 30 = 13.2 minutes. Option C is correct because it reflects this precise calculation.
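The same calculation as a small function. It assumes a time-based availability SLO, where every minute of downtime consumes budget; request-based SLOs count bad requests instead.

```python
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Total allowed downtime, in minutes, for a time-based availability SLO."""
    return (1.0 - slo) * window_days * 24 * 60


def remaining_budget_minutes(slo: float, window_days: int, downtime_minutes: float) -> float:
    """Budget left after subtracting downtime already consumed in the window."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


total = error_budget_minutes(0.999, 30)               # about 43.2
remaining = remaining_budget_minutes(0.999, 30, 30)   # about 13.2
print(f"total={total:.1f} min, remaining={remaining:.1f} min")  # total=43.2 min, remaining=13.2 min
```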

Exam trap

Google Cloud often tests the distinction between total error budget and remaining error budget, trapping candidates who forget to subtract the already consumed downtime from the total allowable downtime.

How to eliminate wrong answers

Option A is wrong because 43.2 minutes is the total error budget for the 30-day window, not the remaining budget after 30 minutes of downtime. Option B is wrong because 60 minutes would correspond to an SLO of approximately 99.86% (43.2 minutes is the correct total for 99.9%), and it does not account for the 30 minutes already consumed. Option D is wrong because 30 minutes is simply the downtime already incurred, not the remaining error budget.

887

MCQeasy

A company needs to choose a Google Cloud database for a globally distributed application that requires strong consistency across continents and an SLA of 99.999% for availability. Which database service meets these requirements?

A.Cloud Bigtable

B.Cloud SQL

C.Firestore

D.Cloud Spanner

AnswerD

Spanner provides global strong consistency and a 99.999% SLA for multi-region configurations.

Why this answer

Cloud Spanner is the only Google Cloud database that provides strong consistency across continents and a 99.999% availability SLA when using a multi-region configuration. Bigtable replication between clusters is only eventually consistent, Cloud SQL is a regional service, and although Firestore is strongly consistent, each of its multi-region locations is confined to a single continent.

888

Multi-Selecteasy

A company is using Cloud Spanner and wants to back up a database to Cloud Storage for long-term retention. They also need to restore the database to a specific point in time within the last 7 days. Which two features should they use? (Choose TWO.)

Select 2 answers

A.Set up a cross-region replica for backup

B.Use Spanner's point-in-time recovery feature to restore to a specific timestamp

C.Enable version management on the Spanner instance

D.Export the database using gcloud spanner databases export

E.Create a backup of the Spanner database using gcloud spanner backups create

AnswersB, E

Spanner allows restoring to a specific time within the retention period.

Why this answer

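Option E is correct because `gcloud spanner backups create` creates a backup that Spanner stores and manages itself (not as files in your Cloud Storage buckets) and that can be retained for up to one year. Option B is correct because point-in-time recovery lets you read or restore data as of a specific timestamp within the database's version retention period, which defaults to 1 hour and can be raised to a maximum of 7 days; covering the last 7 days requires setting it to 7 days.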
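Whether a timestamp can still be recovered depends only on the database's version retention period. The sketch below checks that window under Spanner's documented limits (1 hour default, 7 days maximum); the function name is hypothetical. The retention period itself is set with DDL such as `ALTER DATABASE db SET OPTIONS (version_retention_period = '7d')`.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION = timedelta(hours=1)  # Spanner's default version_retention_period
MIN_RETENTION = timedelta(hours=1)
MAX_RETENTION = timedelta(days=7)


def is_recoverable(target: datetime, now: datetime, retention: timedelta = DEFAULT_RETENTION) -> bool:
    """Return True if `target` lies inside the PITR window [now - retention, now]."""
    if not MIN_RETENTION <= retention <= MAX_RETENTION:
        raise ValueError("retention must be between 1 hour and 7 days")
    return now - retention <= target <= now


now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
three_days_ago = now - timedelta(days=3)
print(is_recoverable(three_days_ago, now))                     # False with the 1-hour default
print(is_recoverable(three_days_ago, now, timedelta(days=7)))  # True once retention is 7 days
```

Backups complement this window: they preserve a consistent copy after the retention period has passed, which is why the question pairs them with point-in-time recovery.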

889

MCQeasy

A team wants to monitor a Google Cloud Run service for application crashes. Which Google Cloud tool automatically captures and notifies on application errors?

A.Cloud Logging

B.Cloud Monitoring

C.Cloud Console

D.Error Reporting

AnswerD

Error Reporting automatically aggregates errors and can send notifications.

Why this answer

Error Reporting (D) is the correct answer because it is a Google Cloud service specifically designed to automatically capture, aggregate, and notify on application errors, including crashes in Cloud Run services. It ingests error events from Cloud Logging and provides real-time alerts and dashboards, making it the dedicated tool for this use case.

Exam trap

Google Cloud often tests the distinction between log storage (Cloud Logging) and error-specific analysis (Error Reporting), leading candidates to mistakenly choose Cloud Logging because they think 'logs contain errors, so that must be the tool.'

How to eliminate wrong answers

Option A is wrong because Cloud Logging is a centralized log storage and querying service; it does not automatically parse or notify on application errors without additional configuration (e.g., log-based metrics or sinks). Option B is wrong because Cloud Monitoring focuses on metrics, uptime checks, and alerting based on performance thresholds, not on automatically capturing and categorizing application crash errors. Option C is wrong because Cloud Console is a web-based UI for managing Google Cloud resources; it provides no automated error capture or notification capabilities.

890

MCQhard

A company uses Cloud Monitoring to track latency for a multi-region web application. The SLO is 99.9% of requests under 500ms over a 30-day rolling window. The error budget has been rapidly depleting over the last week. The operations team wants to understand the impact of recent deployments. Which approach should they use to correlate deployment changes with latency spikes?

A.Use Cloud Logging to search for deployment logs and manually compare with latency metrics

B.Use Cloud Trace to analyze latency distributions for each deployment version

C.Create a custom dashboard in Cloud Monitoring that includes latency charts and use annotation markers to indicate deployment times

D.Configure Error Reporting to alert on latency threshold breaches

AnswerC

Annotation markers allow you to overlay deployment events on time-series charts, making it easy to correlate changes with latency spikes.

Why this answer

Option C is correct because Cloud Monitoring supports custom dashboards with annotation markers that can be programmatically or manually added to indicate deployment events. By overlaying these markers on latency charts, the operations team can visually correlate deployment times with latency spikes, enabling direct root-cause analysis without manual log searching or separate tools.

Exam trap

Google Cloud often tests the distinction between monitoring tools (Cloud Monitoring for dashboards and annotations) versus debugging tools (Cloud Trace for per-request analysis) or logging tools (Cloud Logging for raw logs), leading candidates to choose a tool that addresses part of the problem but not the correlation requirement.

How to eliminate wrong answers

Option A is wrong because manually searching Cloud Logging for deployment logs and comparing them with latency metrics is inefficient, error-prone, and does not provide a real-time or automated correlation; it relies on manual cross-referencing, which is not scalable for a multi-region application. Option B is wrong because Cloud Trace is designed for distributed tracing of individual requests and analyzing latency distributions per version, but it does not natively support overlaying deployment timelines or providing a high-level dashboard view for correlation with deployment events. Option D is wrong because Error Reporting is focused on aggregating and alerting on application errors (e.g., exceptions, crashes), not on latency threshold breaches; configuring it to alert on latency would misuse its purpose, and it lacks the ability to correlate alerts with deployment timelines.

891

MCQeasy

A Cloud SQL for PostgreSQL instance is running low on disk space. The database size is 500 GB and the current storage is 600 GB. The engineer needs to increase storage to 800 GB without downtime. What should they do?

A.Shut down the instance, edit the configuration to increase storage, then restart.

B.Use gcloud sql instances patch to increase storage size to 800 GB.

C.Create a new instance with 800 GB storage and migrate data using pg_dump.

D.Enable auto-storage increase and wait for it to happen automatically.

AnswerB

The patch command can increase storage online without downtime.

Why this answer

Option B is correct because Cloud SQL for PostgreSQL supports online storage increases using the `gcloud sql instances patch` command, which resizes the disk without requiring an instance restart or downtime. The storage can be increased up to the maximum allowed for the instance tier, and the change takes effect immediately while the instance remains available.
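Scripting the resize comes down to building the `gcloud sql instances patch` call with the `--storage-size` flag. This is a hypothetical helper (the function and instance names are illustrative). It refuses to shrink the disk because Cloud SQL storage can be increased but not decreased.

```python
from __future__ import annotations


def storage_patch_command(instance: str, current_gb: int, target_gb: int) -> list[str]:
    """Build a gcloud command that grows a Cloud SQL instance's disk to target_gb."""
    if target_gb <= current_gb:
        # Cloud SQL storage can only grow, so a smaller or equal size is rejected.
        raise ValueError("target size must be larger than the current size")
    return ["gcloud", "sql", "instances", "patch", instance, f"--storage-size={target_gb}GB"]


print(" ".join(storage_patch_command("pg-prod", 600, 800)))
```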

Exam trap

Google Cloud often tests the misconception that storage changes require downtime or an instance restart, leading candidates to choose the shutdown option (A) instead of the online patch command (B).

How to eliminate wrong answers

Option A is wrong because shutting down the instance causes downtime, which violates the zero-downtime requirement; Cloud SQL allows storage increases without stopping the instance. Option C is wrong because creating a new instance and migrating with pg_dump involves significant downtime during the dump, restore, and cutover, and is unnecessarily complex when a simple online resize is available. Option D is wrong because automatic storage increase adds capacity in increments only when free space falls below a threshold; it does not grow storage to a specific target such as 800 GB, and nothing happens until that threshold is reached.

892

MCQmedium

A Memorystore for Redis instance is running out of memory. The application uses Redis as a cache with key expiration. You want to prevent data loss for keys that have not reached their TTL. Which eviction policy should you configure?

A.noeviction

B.volatile-ttl

C.allkeys-lru

D.volatile-lru

AnswerD

volatile-lru evicts only keys that have an expiration (TTL) set, choosing the least recently used among them. Keys without a TTL are never evicted under this policy, so they are protected from data loss.
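To illustrate the policy, here is a toy Python model of volatile-lru. It is a simplified sketch, not Redis itself: capacity is a key count rather than a memory limit, eviction uses exact LRU order instead of Redis's sampled approximation, and TTLs are stored only as markers (keys are never actually expired). When no key with a TTL is left, it refuses the write, mirroring how Redis's volatile policies fall back to noeviction behavior.

```python
from collections import OrderedDict
from typing import Any, Optional


class VolatileLRUCache:
    """Toy model of the volatile-lru eviction policy."""

    def __init__(self, max_keys: int) -> None:
        self.max_keys = max_keys
        # key -> (value, ttl_seconds or None), least recently used first.
        self._data: OrderedDict = OrderedDict()

    def __contains__(self, key: str) -> bool:
        return key in self._data

    def get(self, key: str) -> Any:
        value, _ttl = self._data[key]
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        if key not in self._data and len(self._data) >= self.max_keys:
            self._evict_one()
        self._data[key] = (value, ttl)
        self._data.move_to_end(key)

    def _evict_one(self) -> None:
        # Only keys with a TTL are candidates; take the least recently used one.
        victim = next((k for k, (_v, ttl) in self._data.items() if ttl is not None), None)
        if victim is None:
            # Nothing evictable: refuse the write instead of dropping a key without a TTL.
            raise MemoryError("OOM: no keys with a TTL left to evict")
        del self._data[victim]


cache = VolatileLRUCache(max_keys=2)
cache.set("config", "v1")            # no TTL: never evicted
cache.set("session:1", "a", ttl=60)
cache.set("session:2", "b", ttl=60)  # cache full: evicts session:1
print("config" in cache, "session:1" in cache)  # True False
```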

893

MCQmedium

A company uses BigQuery flat-rate pricing with 1000 slots. During peak hours, queries are queued, but during off-peak, many slots are idle. What is the most cost-effective way to handle the idle slots?

A.Purchase additional slots to reduce queuing

B.Do nothing; idle slots are already paid for

C.Use flex slots to add capacity only during peak

D.Change to on-demand pricing to pay only for data scanned

AnswerC

Flex slots are short-term commitments that provide extra capacity when needed.

Why this answer

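Option C is correct because flex slots add short-term capacity only during peak hours, reducing queuing without a permanent purchase that would sit idle off-peak. Option A adds permanent slots, which increases the capacity left idle off-peak. Option B leaves peak-hour queuing unresolved.

Option D may not be cheaper overall.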
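The trade-off can be sketched in slot-hours. The demand profile below (18 off-peak hours at 400 slots, 6 peak hours at 1500 slots) is entirely hypothetical, and the sketch ignores prices: short-term capacity typically carries a higher hourly rate than a longer commitment, so real costs should be compared at actual rates.

```python
from typing import List, Tuple

# Hypothetical daily demand: (hours per day, slots needed during those hours).
DEMAND: List[Tuple[int, int]] = [(18, 400), (6, 1500)]


def idle_slot_hours(baseline: int, demand: List[Tuple[int, int]]) -> int:
    """Committed slot-hours per day that go unused."""
    return sum(hours * max(0, baseline - slots) for hours, slots in demand)


def burst_slot_hours(baseline: int, demand: List[Tuple[int, int]]) -> int:
    """Slot-hours per day above the baseline, to be covered by short-term capacity."""
    return sum(hours * max(0, slots - baseline) for hours, slots in demand)


for baseline in (1000, 1500):
    print(baseline, idle_slot_hours(baseline, DEMAND), burst_slot_hours(baseline, DEMAND))
# 1000 10800 3000  -> keep 1000 slots, add 3000 flex slot-hours at peak
# 1500 19800 0     -> buy 500 more slots permanently, idle time nearly doubles
```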

894

MCQeasy

A team is planning a one-time migration of a 500 GB MySQL database to Cloud SQL using Database Migration Service. They want to minimize the impact on the source database. Which mysqldump flags should they use when taking the initial snapshot?

A.--master-data=2 and --single-transaction

B.--lock-all-tables and --flush-logs

C.--single-transaction and --skip-lock-tables

D.--all-databases and --routines

AnswerC

These flags allow a consistent snapshot without locking tables.

Why this answer

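`--single-transaction` dumps InnoDB tables from a consistent snapshot inside a single transaction instead of locking them, and `--skip-lock-tables` keeps mysqldump from issuing table locks, so the source keeps serving reads and writes during the dump. `--master-data` records binary log coordinates, which matter for ongoing replication rather than a one-time DMS migration, and `--lock-all-tables` (option B) would block writes on the source for the entire dump.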
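A small Python sketch that assembles the low-impact mysqldump invocation. The host, user, database, and output file names are hypothetical, and the password is deliberately omitted (for example, to be supplied through a MySQL option file). The sketch only builds and prints the command.

```python
import shlex
from typing import List


def build_snapshot_command(host: str, user: str, database: str, outfile: str) -> List[str]:
    """mysqldump invocation that avoids locking tables on the source."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent InnoDB snapshot inside one transaction
        "--skip-lock-tables",    # do not issue LOCK TABLES for each table
        f"--result-file={outfile}",
        "--databases", database,
    ]


# Hypothetical connection details.
print(shlex.join(build_snapshot_command("10.0.0.5", "migrator", "appdb", "appdb.sql")))
```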

895

MCQeasy

A startup needs a database for a global user base with low-latency reads and writes, strong consistency, and the ability to scale horizontally without downtime. They anticipate variable traffic. Which Google Cloud database service meets these requirements?

A.Cloud Bigtable

B.Cloud SQL

C.Firestore

D.Cloud Spanner

AnswerD

Spanner is a globally distributed, strongly consistent, horizontally scalable database.

Why this answer

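Cloud Spanner provides global distribution, strong consistency, horizontal scaling, and no downtime for schema changes or scaling. Cloud SQL is a regional service that scales writes vertically rather than horizontally, Bigtable replication across clusters is eventually consistent, and Firestore, although strongly consistent, offers multi-region locations that span a single continent rather than the globe.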

896

MCQhard

An engineer is designing a Cloud Spanner table for a global user activity tracking system with high write throughput. Which primary key design is BEST to avoid hotspots?

A.Monotonically increasing integer (INT64) with auto-increment

B.Composite key with user_id as first part

C.UUID string (generated by application)

D.Timestamp as primary key

AnswerC

UUIDs are randomly distributed, avoiding write hotspots.

Why this answer

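Using a random key, such as a version 4 UUID, or a hash-prefixed key distributes writes evenly across splits and nodes. Monotonically increasing keys, such as auto-increment integers (A) or timestamps (D), send every new row to the end of the key range, so a single split absorbs all the writes and becomes a hotspot.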
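A toy Python model of why this matters. It assumes, purely for illustration, that the key space is cut into 16 splits by the first hex digit of the key; real Spanner split boundaries are chosen dynamically. Sequential IDs (zero-padded so string order matches numeric order) all land in one split, while version 4 UUIDs spread across the key space.

```python
import uuid
from collections import Counter
from typing import List


def split_of(key: str) -> str:
    # Hypothetical model: 16 splits keyed by the first hex digit.
    return key[0]


def sequential_keys(start: int, count: int) -> List[str]:
    # Zero-padded hex keeps string order equal to numeric order,
    # mimicking an auto-increment ID.
    return [format(i, "032x") for i in range(start, start + count)]


def uuid4_keys(count: int) -> List[str]:
    return [str(uuid.uuid4()) for _ in range(count)]


def split_load(keys: List[str]) -> Counter:
    """Number of new rows landing in each split."""
    return Counter(split_of(k) for k in keys)


print(len(split_load(sequential_keys(1_000_000, 1600))))  # 1: every write hits one split
print(len(split_load(uuid4_keys(1600))))
```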

897

MCQhard

A large stateful service running on Compute Engine experiences variable performance due to CPU throttling from noisy neighbors. Which solution provides the most consistent performance?

A.Enable live migration for the VMs

B.Use sole-tenant nodes to isolate the VMs

C.Use preemptible VMs for stateful workloads

D.Purchase committed use discounts for lower cost

AnswerB

Sole-tenant nodes ensure your VMs are the only ones on the physical machine, eliminating neighbor noise.

Why this answer

Sole-tenant nodes ensure that your VMs are the only ones running on the underlying physical server, eliminating resource contention from other tenants (noisy neighbors). This provides consistent CPU performance because the vCPUs are not oversubscribed and the full physical core capacity is dedicated to your instances.

Exam trap

The trap here is that candidates confuse live migration (which maintains availability during host maintenance) with performance isolation, or assume that committing to a discount (CUD) implies dedicated resources.

How to eliminate wrong answers

Option A is wrong because live migration moves a running VM to another host without downtime but does not prevent noisy neighbor contention on either the source or destination host. Option C is wrong because preemptible VMs are designed for fault-tolerant, stateless batch workloads and can be terminated at any time, making them unsuitable for stateful services that require persistent data and consistent performance. Option D is wrong because committed use discounts reduce cost in exchange for a 1- or 3-year commitment but do not affect CPU throttling or noisy neighbor isolation.

898

MCQhard

A company uses Cloud Storage with the Standard storage class for all data. They want to automatically move data that was last accessed more than 30 days ago to a lower-cost storage class, and to move it to Archive after 90 days. What should they configure?

A.Lifecycle management rules.

B.Bucket lock.

C.Retention policy.

D.Object versioning.

AnswerA

Lifecycle management enables automated transitions to reduce costs.

Why this answer

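Lifecycle management rules can automatically transition objects to lower-cost storage classes, for example from Standard to Nearline and then to Archive. Lifecycle conditions are based on object attributes such as age (days since creation), not on last access time, so the 30-day and 90-day transitions are expressed as age conditions.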
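A minimal Python sketch that emits such a configuration as JSON, in the `rule` / `action` / `condition` shape used by the Cloud Storage JSON API lifecycle field and accepted by `gsutil lifecycle set`. Nearline as the 30-day target is an assumption, since the question only says "a lower-cost storage class".

```python
import json


def lifecycle_config(nearline_age_days: int = 30, archive_age_days: int = 90) -> dict:
    """Standard -> Nearline -> Archive, keyed on object age (days since creation)."""
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_age_days, "matchesStorageClass": ["STANDARD"]},
            },
            {
                "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
                "condition": {"age": archive_age_days, "matchesStorageClass": ["NEARLINE"]},
            },
        ]
    }


print(json.dumps(lifecycle_config(), indent=2))
```

Saved to a file, this could be applied with `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME`, where the file and bucket names are placeholders.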

899

MCQhard

Your organization is bootstrapping a new Google Cloud environment for a DevOps team. The team consists of 15 engineers who will be working on multiple microservices deployed across several projects. You have created a folder called 'devops' under the organization node. Within this folder, you plan to create three projects: 'devops-dev', 'devops-staging', and 'devops-prod'. You want to enforce that all resources in these projects are created in a specific region (us-central1) and that no external IP addresses can be assigned to Compute Engine instances. Additionally, you want to ensure that all service accounts used by the applications have minimal permissions. After setting up the organization policies, you notice that a developer was able to create a Compute Engine instance with an external IP in the 'devops-dev' project. You check the organization policy constraints and find that the constraint 'compute.vmExternalIpAccess' is set to 'Deny' at the organization level, but the developer bypassed it. What is the most likely reason?

A.The project 'devops-dev' has a policy that overrides the organization-level deny.

B.The organization policy has not propagated to all projects yet.

C.The developer used the wrong constraint name; the correct constraint is 'compute.restrictExternalIp'.

D.The developer tagged the instance with a tag that exempts it from the organization policy.

AnswerA

A project-level policy that does not inherit from its parent replaces the organization-level policy, even when it is less restrictive.

Why this answer

Option A is correct because organization policies can be overridden lower in the resource hierarchy. Even though the constraint 'compute.vmExternalIpAccess' is set to 'Deny' at the organization level, a policy set on the project (or on the 'devops' folder) that does not inherit from its parent replaces the inherited policy and can allow external IPs. Organization policies are inherited by default, but a child policy configured to replace, rather than merge with, its parent's policy takes effect instead. Organization policies have no priority ordering; the override comes purely from where the policy is set and whether it inherits.

The developer most likely worked in a project with a project-level policy that allowed external IPs, bypassing the organization-level deny.
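A deliberately simplified Python model of that inheritance rule, loosely based on the `inheritFromParent` setting of list-constraint policies. It is not Google's evaluation algorithm: each level either allows or denies all values. A policy that inherits is merged with its parent, and an inherited deny still wins. A policy that does not inherit replaces its parent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Policy:
    """Simplified list-constraint policy: allow all values or deny all values."""
    allow_all: bool
    inherit_from_parent: bool = True


def external_ip_allowed(chain: List[Optional[Policy]]) -> bool:
    """Resolve a chain ordered organization -> folder -> project (None = no policy set)."""
    allowed = True  # with no policy anywhere, external IPs are allowed
    for policy in chain:
        if policy is None:
            continue  # nothing set at this level: keep the inherited result
        if policy.inherit_from_parent:
            # Merge with the parent: an inherited deny still wins.
            allowed = allowed and policy.allow_all
        else:
            # Replace the parent: this level alone decides.
            allowed = policy.allow_all
    return allowed


org_deny = Policy(allow_all=False)
print(external_ip_allowed([org_deny, None, None]))  # False
print(external_ip_allowed([org_deny, None, Policy(True, inherit_from_parent=False)]))  # True
```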

Exam trap

Google Cloud often tests the misconception that organization policies are absolute and cannot be overridden. In reality, a policy set lower in the hierarchy that does not inherit from its parent replaces the inherited policy, so anyone who can set organization policies on a folder or project can relax an organization-level restriction.

How to eliminate wrong answers

Option B is wrong because, although organization policy changes can take a short time to propagate, the constraint was already in place at the organization level; a propagation delay is a far less likely explanation than a project-level override. Option C is wrong because the correct constraint name for controlling external IPs on Compute Engine instances is 'compute.vmExternalIpAccess', not 'compute.restrictExternalIp', which is not a valid Google Cloud constraint. Option D is wrong because adding a tag to an instance does not by itself exempt it from an organization policy; a tag-based exception would require a conditional policy that references Resource Manager tags, which the scenario does not describe.

900

MCQeasy

A team uses Cloud Build to deploy a Cloud Run service. The build fails with: 'ERROR: (gcloud.run.services.update) PERMISSION_DENIED: Permission 'run.services.update' denied on resource.' The Cloud Build service account has the Cloud Run Admin role. What is missing?

A.The build config must use the Cloud Run deployer step instead of the gcloud command.

B.The Cloud Build service account should have the Owner role on the project.

C.The Cloud Run service must be deployed in the same region as the build.

D.The Cloud Build service account needs the 'run.services.update' permission or the Cloud Run Admin role.

AnswerD

The error shows the permission is not effectively granted to the build's identity, even though the Cloud Run Admin role includes it.

Why this answer

Option D is correct because the error message explicitly states that the 'run.services.update' permission is denied, which means the identity running the build does not effectively hold this permission. Although the Cloud Run Admin role includes 'run.services.update', the error indicates the role is not assigned where it matters. For example, it may have been granted on a different project from the one hosting the service, or the build may run as a different service account from the one holding the role. Granting the Cloud Run Admin role, or a custom role containing 'run.services.update', to the correct account on the correct project resolves the issue.
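A small Python sketch of the diagnosis: it extracts the denied permission from the error text and builds, without running, a `gcloud projects add-iam-policy-binding` command that grants `roles/run.admin`. The project ID and service account below are hypothetical placeholders. The binding must target the project that hosts the Cloud Run service and the account the build actually runs as.

```python
import re
import shlex
from typing import List, Optional

ERROR = ("ERROR: (gcloud.run.services.update) PERMISSION_DENIED: "
         "Permission 'run.services.update' denied on resource.")


def denied_permission(message: str) -> Optional[str]:
    """Extract the permission name from a PERMISSION_DENIED message."""
    match = re.search(r"Permission '([^']+)' denied", message)
    return match.group(1) if match else None


def grant_run_admin_command(project_id: str, service_account: str) -> List[str]:
    """Build (but do not run) a command granting roles/run.admin to a service account."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account}",
        "--role=roles/run.admin",
    ]


print(denied_permission(ERROR))  # run.services.update
# Hypothetical project ID and Cloud Build service account.
print(shlex.join(grant_run_admin_command(
    "my-project", "123456789012@cloudbuild.gserviceaccount.com")))
```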

Exam trap

Google Cloud often tests the misconception that using a specific step type (like Cloud Run deployer) bypasses IAM requirements, when in fact all deployment methods require the same underlying permissions.

How to eliminate wrong answers

Option A is wrong because the Cloud Run deployer step is a convenience wrapper that still requires the same underlying IAM permissions; using it instead of the gcloud command does not bypass permission checks. Option B is wrong because the Owner role is overly permissive and unnecessary; the Cloud Run Admin role (roles/run.admin) already includes all required Cloud Run permissions, including 'run.services.update'. Option C is wrong because Cloud Run deployments are not region-restricted by the build's region; the service can be deployed to any region regardless of where Cloud Build runs.
