Knowledge + Practice

CCNA Service Performance Questions

75 of 113 questions · Page 1/2 · Service Performance topic · Answers revealed

1

Multi-Select · easy

A company serves static content using a global HTTP(S) load balancer with Cloud CDN. They want to maximize the cache hit ratio. Which two actions should they take?

Select 2 answers

A. Use signed URLs for all requests.

B. Set Cache-Control: public, max-age=31536000.

C. Enable Cloud CDN with cache key based on URL and host.

D. Set Cache-Control: private.

E. Enable Identity-Aware Proxy (IAP) for the backend.

Answers: B, C

A long max-age allows content to be cached for a year, maximizing cache hits.

Why this answer

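Setting Cache-Control: public, max-age=31536000 instructs browsers and intermediate caches to store the response for one year, maximizing the likelihood that subsequent requests are served from cache. This long max-age reduces the need for revalidation, directly improving cache hit ratio. Option C matters because nothing is cached at the edge until Cloud CDN is enabled on the backend, and a cache key built only from stable request attributes such as URL and host lets identical requests share a single cache entry.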

Exam trap

Google Cloud often tests the misconception that signed URLs or IAP improve caching, when in fact they introduce per-request variability that reduces cache hit ratio, and that Cache-Control: private is appropriate for static content when it actually prevents caching entirely.
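
As a rough illustration of why options B and D pull in opposite directions, the Python sketch below reads a Cache-Control value the way a shared cache might. The function name `shared_cache_ttl` is made up for this example and the rules are deliberately simplified; Cloud CDN's actual behavior also depends on its cache mode and on other response headers.

```python
def shared_cache_ttl(cache_control: str) -> int:
    """Return how many seconds a shared cache may keep a response (0 = do not cache).

    Simplified assumption: only private, no-store, s-maxage and max-age are considered.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    # private and no-store both forbid storage in a shared cache such as a CDN.
    if "private" in directives or "no-store" in directives:
        return 0

    # For shared caches, s-maxage takes precedence over max-age.
    for key in ("s-maxage", "max-age"):
        value = directives.get(key, "")
        if value.isdigit():
            return int(value)
    return 0


print(shared_cache_ttl("public, max-age=31536000"))  # one year, in seconds
print(shared_cache_ttl("private"))                   # 0: a CDN must not store it
```

A value of 0 for `private` is what makes option D counterproductive for static content, while the one-year lifetime in option B keeps objects in cache as long as possible.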

2

MCQ · medium

A DevOps team wants to autoscale a GKE Deployment based on a custom metric exposed by the application. The metric is available via an HTTP endpoint. Which approach should they use to integrate this metric with the Horizontal Pod Autoscaler (HPA)?

A. Deploy a Prometheus Operator with the kube-state-metrics adapter and configure the HPA to use the custom metric.

B. Expose the metric via an Ingress and configure HPA to read from the Ingress metrics.

C. Use the standard CPU-based HPA and map the custom metric to CPU usage via a script.

D. Configure the Stackdriver Metrics Adapter to collect the metric from the endpoint.

Answer: A

Prometheus adapter can scrape custom endpoints and expose metrics to the custom.metrics.k8s.io API used by HPA.

Why this answer

Option A is the intended answer: a Prometheus server (for example, one managed by the Prometheus Operator) scrapes the application's HTTP endpoint, and a metrics adapter such as prometheus-adapter exposes the scraped values through the custom.metrics.k8s.io API, which HPA natively queries. The option's wording is loose, because kube-state-metrics only exports the state of Kubernetes objects and does not itself serve the custom metrics API; that role belongs to the adapter. This Prometheus-plus-adapter setup is the standard approach for integrating application-specific HTTP metrics into Kubernetes autoscaling.

Exam trap

Google Cloud often tests the misconception that any HTTP endpoint can be directly plugged into HPA, but the trap here is that HPA requires a metrics API adapter (like prometheus-adapter or Stackdriver adapter) to bridge the gap between the raw metric source and the Kubernetes custom metrics API.

How to eliminate wrong answers

Option B is wrong because Ingress does not expose application-level custom metrics to the HPA; Ingress metrics are typically for traffic routing and not designed to be consumed by the custom metrics API. Option C is wrong because HPA's CPU-based autoscaling cannot be mapped to custom metrics via a script; HPA requires a dedicated metrics API (custom.metrics.k8s.io) and does not support arbitrary mapping of custom metrics to CPU usage. Option D is wrong because the Stackdriver Metrics Adapter collects metrics from Google Cloud Monitoring, not directly from an HTTP endpoint; it would require the application to push metrics to Stackdriver, not expose them via an HTTP endpoint.
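
Once a custom metric reaches the HPA through an adapter, the scaling decision follows the formula given in the Kubernetes documentation: desired replicas = ceil(current replicas × current value / target value), with no change while the ratio stays within a tolerance (0.1 by default). The Python sketch below reproduces only that arithmetic; the function name and the example metric values are illustrative, and real HPA behavior adds stabilization windows and min/max replica bounds.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Core HPA scaling arithmetic for a single metric (simplified sketch)."""
    ratio = current_value / target_value
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical custom metric: requests per second per pod, with a target of 100.
print(desired_replicas(4, 150, 100))  # scales out
print(desired_replicas(4, 105, 100))  # within tolerance, unchanged
print(desired_replicas(4, 50, 100))   # scales in
```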

3

MCQ · easy

A latency-sensitive web application uses Cloud CDN. What configuration change would most directly reduce cache miss rates?

A. Enable Cloud Armor to filter malicious traffic

B. Use signed URLs to restrict access

C. Set TTL to 0 to ensure content is always fresh

D. Enable cache static content and set appropriate cache modes

Answer: D

Caching static files reduces origin requests and cache misses, improving latency.

Why this answer

Enabling cache static content and setting appropriate cache modes directly reduces cache miss rates by ensuring that more requests are served from the Cloud CDN cache rather than forwarded to the origin. This configuration allows the CDN to store and serve static assets (e.g., images, CSS, JavaScript) based on cache keys and TTLs, minimizing the number of cache misses for eligible content.

Exam trap

Google Cloud often tests the misconception that reducing TTL or using security features like signed URLs improves cache performance, when in fact these actions either increase cache misses or have no effect on caching behavior.

How to eliminate wrong answers

Option A is wrong because Cloud Armor filters malicious traffic at the edge but does not affect cache hit ratios; it protects against DDoS and web attacks but does not change caching behavior. Option B is wrong because signed URLs restrict access to content for security purposes but do not influence cache miss rates; they may even increase misses if unique URLs bypass cache. Option C is wrong because setting TTL to 0 forces the CDN to treat every request as a cache miss, requiring revalidation with the origin for each request, which drastically increases cache miss rates and defeats the purpose of caching.

4

MCQ · hard

Your company runs a multi-region e-commerce platform on Google Kubernetes Engine (GKE) with services in us-central1 and europe-west1. The application uses a global external HTTP(S) load balancer with Cloud CDN for static assets. Recently, users in Asia report that product images take 5-10 seconds to load, while users in the US and Europe experience sub-second load times. You check the Cloud CDN cache hit ratio and see it is 95% globally. You also notice that the images are served from a backend bucket in us-central1. The load balancer uses the default routing configuration. Your team has implemented client-side caching with Cache-Control headers set to public, max-age=3600. What is the most likely cause of the high latency for Asian users?

A. The load balancer is not using premium tier networking, so traffic from Asia takes a longer path.

B. Client-side caching with max-age=3600 is too short, causing frequent revalidation.

C. Cloud CDN does not have edge caches in Asia, so requests are served from the nearest available edge location, which may be far from users.

D. The cache hit ratio is too low for Asian users due to different traffic patterns.

Answer: C

Cloud CDN edge locations in Asia may be limited; first request latency is high due to cache miss.

Why this answer

Option C is the intended answer. Its premise is that requests from Asian users are served from a distant edge location, so each request pays a long round-trip time (RTT) even when it is a cache hit. A 95% global cache hit ratio says nothing about where those hits are served, and the default routing configuration of the global external HTTP(S) load balancer does not change which edge caches are available. Treat the option's blanket claim as a simplification for the exam: Cloud CDN does operate cache locations in Asia, so in a real investigation you would confirm which edge location is actually serving these users.

Exam trap

Google Cloud often tests the misconception that a high global cache hit ratio guarantees low latency for all users, ignoring the impact of geographic distance between edge caches and end users.

How to eliminate wrong answers

Option A is wrong because the global external HTTP(S) load balancer uses Premium Tier networking by default, which provides a single anycast IP and routes traffic over Google's global network, not the public internet, so the path is already optimized. Option B is wrong because client-side caching with max-age=3600 (1 hour) is reasonable for static assets, and revalidation would only add a conditional request (304 Not Modified) which is fast; the issue is not revalidation frequency but the distance to the serving edge. Option D is wrong because the cache hit ratio is 95% globally, indicating that the majority of requests are served from cache; even if Asian users had a slightly different pattern, the high global ratio suggests cache misses are not the primary cause of latency.

5

MCQ · easy

A DevOps team wants to serve static content from a Cloud Storage bucket with low latency globally. They also need TLS termination. Which load balancer type should they use?

A. External Network Load Balancer

B. SSL Proxy Load Balancer

C. Internal HTTP(S) Load Balancer

D. External HTTP(S) Load Balancer

Answer: D

External HTTP(S) LB supports backend buckets, global anycast IP, and TLS termination.

Why this answer

External HTTP(S) Load Balancer is the correct choice because it provides global anycast IP addresses, TLS termination at the Google Front End (GFE), and integrates directly with Cloud Storage buckets as a backend. This enables low-latency content delivery worldwide while offloading SSL decryption to the load balancer.

Exam trap

Google Cloud often tests the misconception that any load balancer with 'SSL' or 'Proxy' in its name can serve web content to Cloud Storage, but only the External HTTP(S) Load Balancer provides the necessary HTTP protocol support and global anycast routing for static content delivery.

How to eliminate wrong answers

Option A is wrong because External Network Load Balancer operates at Layer 4 (TCP/UDP) and does not support TLS termination or HTTP-based routing to Cloud Storage backends. Option B is wrong because SSL Proxy Load Balancer terminates TLS but is designed for non-HTTP traffic and cannot route directly to Cloud Storage buckets as a backend service. Option C is wrong because Internal HTTP(S) Load Balancer is regional and cannot serve content globally with low latency; it also does not support external clients.

6

MCQ · hard

Which tool can be used to capture and analyze latency spikes in a distributed application?

A. Cloud Logging

B. Cloud Debugger

C. Cloud Monitoring

D. Cloud Trace

Answer: D

Correct. Trace captures end-to-end latency and identifies spikes.

Why this answer

Cloud Trace is the correct tool for capturing and analyzing latency spikes in a distributed application because it provides end-to-end latency tracking by instrumenting requests as they traverse microservices. It collects trace spans with timing data, allowing you to identify bottlenecks and pinpoint the exact service or operation causing the spike.

Exam trap

Google Cloud often tests the distinction between Cloud Monitoring (metrics) and Cloud Trace (distributed tracing), so the trap here is assuming that latency spikes are a metric problem solvable by Cloud Monitoring, when in fact they require trace-level analysis to identify the root cause across service boundaries.

How to eliminate wrong answers

Option A is wrong because Cloud Logging is designed for centralized log storage and querying, not for capturing distributed latency data or tracing request paths across services. Option B is wrong because Cloud Debugger is used for inspecting application state and code execution in production without stopping the app, but it does not capture latency metrics or trace request flows. Option C is wrong because Cloud Monitoring focuses on collecting and alerting on metrics (e.g., CPU, memory) and uptime checks, but it lacks the distributed tracing capability needed to analyze per-request latency spikes across services.
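
To make the idea of trace-level analysis concrete, here is a minimal Python sketch that picks the slowest child span out of one request's trace. The `Span` class, the service names, and the timings are all invented for illustration; this does not use the Cloud Trace API, it only shows the kind of per-request timing data a tracing tool records.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One timed operation inside a single request's trace (hypothetical model)."""
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def slowest_span(spans):
    """Return the span that contributed the most wall-clock time."""
    return max(spans, key=lambda s: s.duration_ms)


# Child spans of one slow request; names and timings are made up.
child_spans = [
    Span("auth-service", 5, 25),
    Span("inventory-db-query", 30, 450),
    Span("render-response", 455, 470),
]

worst = slowest_span(child_spans)
print(worst.name, worst.duration_ms)
```

Aggregate metrics would only show that the request was slow; the per-span timings show which hop was responsible.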

7

MCQ · medium

A team notices that Cloud SQL read replicas are not handling read traffic efficiently, causing high latency for read-heavy queries. What is the best approach to improve read performance?

A. Use a connection pooling proxy like ProxySQL

B. Use Cloud Memorystore to cache frequent query results

C. Enable MySQL query cache

D. Increase the number of read replicas

Answer: B

Caching reduces database load and latency for read-heavy, repetitive queries.

Why this answer

Cloud Memorystore (Redis) caches the results of frequent read queries, reducing the load on Cloud SQL read replicas and lowering latency for repeated queries. This directly addresses the root cause of inefficient read traffic by serving cached data from in-memory storage, which is orders of magnitude faster than querying a replica. It is the best approach because it offloads read-heavy workloads without requiring additional replicas or relying on deprecated features like MySQL query cache.

Exam trap

Google Cloud often tests the misconception that adding more read replicas is always the best solution for read performance, but the trap here is that replicas still execute queries against disk and do not eliminate redundant work, whereas caching directly reduces query execution frequency and latency.

How to eliminate wrong answers

Option A is wrong because ProxySQL is a connection pooling and query routing proxy that manages connections and can distribute reads to replicas, but it does not cache query results; it only improves connection efficiency, not the latency of repeated read-heavy queries. Option C is wrong because the MySQL query cache was deprecated in MySQL 5.7 and removed in MySQL 8.0, and even when available, it is inefficient for high-concurrency workloads due to cache invalidation overhead and mutex contention. Option D is wrong because simply increasing the number of read replicas adds more nodes to handle read traffic, but it does not reduce latency for repeated queries; replicas still execute the same queries against disk, and the underlying inefficiency of redundant reads remains.

8

MCQ · medium

Which Cloud Run setting controls the maximum number of requests a container can handle concurrently?

A. concurrency

B. timeout

C. max-instances

D. min-instances

Answer: A

Correct. Concurrency sets the maximum concurrent requests per container.

Why this answer

Option A is correct because the `concurrency` setting in Cloud Run defines the maximum number of simultaneous requests that a single container instance can process at any given time. This directly controls how many requests are routed to a single container before Cloud Run spins up additional instances, optimizing resource utilization and preventing overload.

Exam trap

Google Cloud often tests the distinction between `concurrency` (requests per instance) and `max-instances` (total instances), leading candidates to confuse capacity scaling limits with per-instance request handling.

How to eliminate wrong answers

Option B is wrong because `timeout` controls the maximum duration a request can run before being terminated (default 300 seconds), not how many requests can be handled concurrently. Option C is wrong because `max-instances` limits the total number of container instances that can be created to handle traffic, not the concurrency per instance. Option D is wrong because `min-instances` specifies the minimum number of container instances that must remain warm and ready to serve traffic, which affects cold starts but does not control concurrent request handling.
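
The relationship between `concurrency` and `max-instances` can be shown with a little arithmetic. The Python sketch below is a deliberately simplified model (the function name and the traffic numbers are assumptions for illustration); Cloud Run's real autoscaler also takes CPU utilization and other signals into account.

```python
import math


def instances_needed(in_flight_requests: int, concurrency: int,
                     max_instances: int) -> int:
    """Estimate instance count from per-instance concurrency, capped by max-instances."""
    wanted = math.ceil(in_flight_requests / concurrency)
    return min(wanted, max_instances)


# 400 simultaneous requests with 80 requests allowed per instance.
print(instances_needed(400, concurrency=80, max_instances=10))
# The same traffic with concurrency=1 hits the max-instances cap.
print(instances_needed(400, concurrency=1, max_instances=10))
```

In this model `concurrency` decides how far each instance is filled, while `max-instances` only caps the total, which is the distinction the question tests.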

9

MCQ · hard

A team uses Cloud Spanner for a global application. Query performance degrades as data grows. They notice that most queries filter on a column 'customer_id' but the primary key is a UUID. What is the best approach to optimize performance?

A. Enable query optimization hints

B. Reorder the primary key to start with customer_id

C. Use interleaved tables

D. Create a secondary index on customer_id

Answer: B

Putting customer_id first in the primary key lets queries that filter on that column use key-range scans instead of full table scans.

Why this answer

In Cloud Spanner, the primary key determines the physical ordering of rows on storage tablets. When queries filter on `customer_id` but the primary key starts with a UUID, Spanner must perform a full table scan because the filter cannot leverage the key order. Reordering the primary key to start with `customer_id` allows Spanner to use efficient key-range scans, dramatically reducing the number of rows read per query.

Exam trap

Google Cloud often tests the misconception that secondary indexes are always the best solution for query performance, when in fact reordering the primary key to match the most common query filter pattern is more efficient because it avoids the extra index lookup and write overhead.

How to eliminate wrong answers

Option A is wrong because query optimization hints (e.g., `@{FORCE_INDEX}` or `@{JOIN_METHOD}`) can influence execution plans but do not fix the fundamental physical design issue of an inefficient primary key ordering; they are a band-aid, not a structural solution. Option C is wrong because interleaved tables are used to store parent-child relationships physically co-located for join performance, not to optimize single-table queries filtering on a non-key column. Option D is wrong because while a secondary index on `customer_id` would help, it introduces additional write amplification and storage overhead, and the question asks for the 'best approach' — reordering the primary key is more efficient as it avoids index lookups entirely and leverages the primary storage order.
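
The difference between a full scan and a key-range scan can be seen in a toy model. The Python sketch below keeps rows in sorted lists to imitate key-ordered storage and counts how many rows each layout has to read; the table shape, row counts, and function names are invented for illustration, and this is not how Spanner's storage engine is implemented.

```python
import bisect
import uuid

# 100 hypothetical customers with 5 orders each: (customer_id, order_uuid).
rows = [(c, str(uuid.uuid4())) for c in range(100) for _ in range(5)]

# Layout 1: primary key is the UUID, so rows are ordered by UUID only.
by_uuid = sorted(rows, key=lambda r: r[1])
# Layout 2: primary key is (customer_id, order_uuid).
by_customer = sorted(rows)


def scan_uuid_keyed(customer_id):
    """Filter on customer_id when the key order does not help: read every row."""
    rows_read, matches = 0, []
    for cid, oid in by_uuid:
        rows_read += 1
        if cid == customer_id:
            matches.append(oid)
    return matches, rows_read


def scan_customer_keyed(customer_id):
    """Filter on customer_id when it leads the key: read one contiguous range."""
    lo = bisect.bisect_left(by_customer, (customer_id,))
    hi = bisect.bisect_left(by_customer, (customer_id + 1,))
    return [oid for _, oid in by_customer[lo:hi]], hi - lo


print(scan_uuid_keyed(7)[1])      # rows read with the UUID-first key
print(scan_customer_keyed(7)[1])  # rows read with the customer_id-first key
```

Both functions return the same orders, but the second reads only the rows that belong to the customer.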

10

MCQ · medium

Your team deploys a microservice on Google Kubernetes Engine (GKE) that serves an API with low latency requirements. Users report that the API occasionally times out during peak hours. You check the GKE metrics and see that CPU utilization is below 50% but memory is near 100% on the nodes. What is the most likely cause and what should you do?

A. The nodes are under-provisioned; add more nodes to the cluster.

B. The application is memory-constrained; increase memory resource limits for the pod.

C. The application is CPU-bound; increase CPU resource limits for the pod.

D. The network bandwidth is insufficient; increase the machine type for nodes.

Answer: B

Memory is near 100% on nodes, causing requests to queue and time out. Increasing memory limits allows more concurrent requests.

Why this answer

Option B is correct because the symptoms (CPU utilization below 50% but memory near 100% on the nodes, with API timeouts during peak hours) indicate that the application is running out of memory. When a container exceeds its memory limit, the kernel OOM-kills it, and pods that use more memory than they requested are prime candidates for eviction when a node runs short; either way, in-flight requests fail or time out. Increasing the memory resource limits for the pod lets it allocate more heap or cache, preventing OOM kills and stabilizing latency.

Exam trap

Google Cloud often tests the misconception that high memory usage on nodes always means you need more nodes, but the correct action is to first check pod-level resource limits and adjust them, as adding nodes only masks the real issue of memory-constrained pods.

How to eliminate wrong answers

Option A is wrong because adding more nodes would distribute the memory load but does not address the root cause: the application itself needs more memory per pod; under-provisioned nodes would show high CPU or memory across nodes, but here memory is near 100% while CPU is low, indicating a memory bottleneck at the pod level. Option C is wrong because CPU utilization is below 50%, so the application is not CPU-bound; increasing CPU limits would not resolve memory exhaustion and could waste resources. Option D is wrong because network bandwidth issues would manifest as packet loss or high latency, not as near-100% node memory; increasing machine type might add memory but is an inefficient fix compared to adjusting pod resource limits.

11

MCQ · hard

Refer to the exhibit. The team observes that some requests are fast while others are slow. Both requests have identical payload and response. What is the most likely cause of the latency difference?

A. The fast request hit a cached response

B. The slow request had a larger response size

C. The fast request used a different load balancer

D. The slow request used a different HTTP method

Answer: A

The cacheHit field shows true for the fast request, indicating a cache hit reduced latency.

Why this answer

The fast request hit a cached response, meaning the reverse proxy or CDN served the response from its cache without forwarding the request to the origin server. This eliminates the round-trip time to the backend and the processing time on the origin, resulting in significantly lower latency. Since both requests have identical payloads and responses, caching is the most plausible explanation for the observed difference.

Exam trap

Google Cloud often tests the misconception that latency differences must be caused by network or server-side factors, when in fact caching is the most common and simplest explanation for identical requests with different response times.

How to eliminate wrong answers

Option B is wrong because the question explicitly states both requests have identical payload and response, so the response size cannot differ. Option C is wrong because using a different load balancer would not inherently cause a latency difference for identical requests; load balancers typically add minimal, consistent overhead. Option D is wrong because the HTTP method (e.g., GET vs POST) does not affect latency for identical payloads and responses; the method only changes semantics, not the network or processing time for the same data.

12

MCQ · hard

A company's Cloud SQL for PostgreSQL instance is experiencing performance degradation. They observe a high number of idle connections and slow transaction commit times. Which combination of actions will most effectively address this issue?

A. Add a read replica and route read-only queries to it.

B. Configure statement timeout and use PgBouncer for connection pooling.

C. Increase the storage size and enable automatic backup.

D. Drop unused indexes and run VACUUM.

Answer: B

Statement timeout kills long-running queries; connection pooling reduces idle connections.

Why this answer

The high number of idle connections and slow transaction commit times indicate connection management and resource contention issues. PgBouncer reduces overhead by pooling and reusing database connections, while statement timeout prevents long-running queries from holding locks and consuming resources, directly addressing both symptoms.

Exam trap

Google Cloud often tests the misconception that performance degradation is always solved by scaling storage or adding replicas, when the real issue is connection management and query timeout configuration.

How to eliminate wrong answers

Option A is wrong because adding a read replica only offloads read traffic, but does not reduce idle connections or improve commit latency for write transactions. Option C is wrong because increasing storage size addresses disk space or I/O throughput, not connection overhead or transaction commit delays. Option D is wrong because dropping unused indexes and running VACUUM reclaims storage and improves query planning, but does not manage idle connections or reduce commit wait times caused by connection churn.
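
The pooling half of option B can be sketched in a few lines. PgBouncer is a separate proxy process, so the Python example below is not how it is implemented; it only illustrates the underlying idea that a small, fixed set of connections is opened once and reused. `ConnectionPool` and `fake_connect` are hypothetical names, and the "connection" is a plain object rather than a real PostgreSQL session.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and handed out on demand."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks while all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)                # return it instead of closing


opened = 0


def fake_connect():
    """Stand-in for opening a real database connection."""
    global opened
    opened += 1
    return object()


pool = ConnectionPool(size=2, connect=fake_connect)

# 100 units of work share the same two connections.
for _ in range(100):
    with pool.connection() as conn:
        pass  # a real client would run its query here

print(opened)  # number of connections ever opened
```

The database sees two long-lived sessions instead of a new or idle connection per client, which is the effect the explanation attributes to PgBouncer.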

13

MCQ · medium

A web application frequently reads the same set of reference data from Cloud SQL. This causes high database load and slow responses. Which design change would most improve performance?

A. Implement caching with Memorystore for Redis

B. Increase Cloud SQL machine size

C. Add a read replica

D. Use Cloud Spanner for higher throughput

Answer: A

Caching frequently read data in memory drastically reduces database load and latency.

Why this answer

Implementing caching with Memorystore reduces database load by serving repeated reads from memory, which is faster than SQL queries. Increasing machine size or adding read replicas helps but still involves database I/O; Cloud Spanner is overkill for reference data.

14

MCQ · easy

Which service is commonly used for time-series data and real-time analytics?

A. Bigtable

B. Cloud SQL

C. Firestore

D. Cloud Spanner

Answer: A

Correct. Bigtable handles time-series and real-time analytics at scale.

Why this answer

Bigtable is a fully managed, scalable NoSQL database optimized for large analytical and operational workloads, including time-series data and real-time analytics. It provides sub-10ms latency for high-throughput reads and writes, making it ideal for IoT, monitoring, and financial data streams. Its wide-column storage model and automatic sharding support efficient time-range scans when row keys are designed around time.

Exam trap

Google Cloud often tests the misconception that any NoSQL database (like Firestore) is suitable for time-series analytics, but Bigtable's specific architecture for high-throughput, low-latency time-ordered data is the key differentiator.

How to eliminate wrong answers

Option B (Cloud SQL) is wrong because it is a relational database for OLTP workloads, not designed for the high write throughput or time-series-specific optimizations needed for real-time analytics. Option C (Firestore) is wrong because it is a document-oriented NoSQL database for mobile and web apps, lacking native time-series features and optimized for real-time sync rather than analytical queries. Option D (Cloud Spanner) is wrong because it is a globally distributed relational database with strong consistency, but its overhead and cost make it unsuitable for the high-frequency, append-heavy patterns of time-series data.

15

MCQ · hard

Refer to the exhibit. After applying the shown firewall rule, users report increased latency to a web application. What is the most likely cause?

A. The rule priority is set to 1000, which is too low.

B. The rule contains both allow and deny for the same traffic, creating a conflict.

C. The source range covers all IPs, causing excessive traffic.

D. The firewall rule has logging enabled, which adds overhead.

Answer: B

A rule cannot have both allow and deny; this misconfiguration likely causes packets to be dropped or processed incorrectly.

Why this answer

A single VPC firewall rule has exactly one action, either allow or deny, so a rule that specifies both for the same ports is an invalid, contradictory configuration. Its behavior is unpredictable: packets may be dropped or processed incorrectly, which users experience as increased latency.

Option A is wrong because a priority of 1000 is the default and does not by itself add latency. Option C is wrong because a source range covering all IPs only widens what the rule matches; it does not slow down matching traffic. Option D is wrong because firewall logging alone does not cause noticeable latency.

16

MCQ · easy

A batch data processing job on Cloud Dataflow is running slower than expected. Which action will most directly increase throughput?

A. Enable Streaming Engine for the pipeline

B. Enable autoscaling

C. Increase the number of workers in the pipeline

D. Use FlexRS pricing model

Answer: C

More workers enable greater parallelism, increasing the processing rate.

Why this answer

Increasing the number of workers directly increases the parallelism of the pipeline, allowing more data to be processed concurrently. In Cloud Dataflow, throughput is bounded by the total capacity of the workers, so adding workers raises that capacity. This is the most direct action to increase throughput when the pipeline is CPU-bound or I/O-bound and the existing workers are already saturated.

Exam trap

Google Cloud often tests the misconception that autoscaling (Option B) is a direct performance lever, when in fact it is a reactive mechanism that adjusts resources based on current utilization, not a proactive action to immediately boost throughput.

How to eliminate wrong answers

Option A is wrong because Streaming Engine is designed to improve streaming pipeline performance by offloading state storage to backend services, but it does not directly increase throughput for batch jobs; it may even add latency for batch pipelines. Option B is wrong because enabling autoscaling adjusts the number of workers dynamically based on utilization, but it does not guarantee an immediate increase in throughput—it only reacts to current load and may take time to scale up. Option D is wrong because FlexRS pricing model is a cost-saving option that provides discounts for flexible resource scheduling, but it does not affect pipeline throughput or performance.

17

MCQ · hard

Refer to the exhibit. A team is troubleshooting a pod crash loop. Based on the exhibit, which infrastructure change should be prioritized to resolve the issue and optimize service performance?

A. Increase the pod's CPU request

B. Increase the maximum number of pods per node

C. Mount a ConfigMap or volume containing the missing file

D. Enable pod anti-affinity

Answer: C

Correct. Providing the missing file resolves the error.

Why this answer

The exhibit indicates a pod crash loop caused by a missing file, which is a configuration issue rather than a resource or scheduling problem. Mounting a ConfigMap or volume that provides the missing file directly resolves the crash by ensuring the pod has the required configuration at startup. This fix also optimizes service performance by eliminating unnecessary restarts and allowing the pod to serve traffic consistently.

Exam trap

The trap here is that candidates often assume a crash loop is always due to resource constraints or scheduling issues, but the exam tests the ability to identify configuration-related failures by reading pod logs or events that explicitly mention a missing file.

How to eliminate wrong answers

Option A is wrong because increasing the CPU request does not address a missing file; it only guarantees CPU resources, which is irrelevant to a configuration-related crash loop. Option B is wrong because increasing the maximum number of pods per node does not fix the missing file; it only allows more pods on a node, which could worsen resource contention without resolving the root cause. Option D is wrong because enabling pod anti-affinity affects pod placement and distribution across nodes, but it does not provide the missing configuration file needed to prevent the crash loop.

18

MCQ · medium

A Cloud Spanner database is experiencing slow query performance. Which approach should be taken to optimize read performance without compromising consistency?

A. Increase the number of Spanner nodes to boost throughput

B. Use interleaved tables to store related rows together

C. Add secondary indexes and use read-only transactions for read queries

D. Migrate the data to Cloud Bigtable for better read performance

Answer: C

Secondary indexes avoid full table scans, and read-only transactions bypass lock overhead, improving read performance.

Why this answer

Option C is correct because secondary indexes in Cloud Spanner allow efficient lookup of rows by non-key columns, avoiding full table scans. Read-only transactions provide a consistent snapshot of data without locking, which optimizes read performance while maintaining strong consistency.

Exam trap

Google Cloud often tests the misconception that adding nodes always improves read performance, when in fact read optimization in Spanner relies on proper indexing and transaction type selection, not just scaling compute resources.

How to eliminate wrong answers

Option A is wrong because adding Spanner nodes raises aggregate throughput and storage capacity but does not make an individual query efficient; a query that scans the whole table stays slow, so read performance depends more on index usage and query design. Option B is wrong because interleaved tables optimize for parent-child row locality and reduce join costs, but they do not directly improve read performance for arbitrary queries on non-key columns. Option D is wrong because migrating to Cloud Bigtable would sacrifice Spanner's strong consistency and transactional capabilities, which contradicts the requirement to not compromise consistency.

19

MCQ · hard

An organization uses Cloud CDN with an HTTP(S) Load Balancer to serve static content. They observe that cache hit ratio is lower than expected. The content is immutable and has long Cache-Control headers. What is the most likely cause?

A.The requests include unique query parameters like session IDs.

B.The Cache-Control max-age is set too short.

C.The load balancer is configured with SSL termination.

D.The content is served using signed URLs with expiration.

AnswerA

Query parameters create different cache keys, reducing hits.

Why this answer

By default, Cloud CDN includes the full request URL, query string included, in the cache key. When a unique query parameter such as a session ID is appended to each request, every URL becomes a distinct cache object, so each request misses the cache and is forwarded to the origin even though the Cache-Control headers are correct. The correct answer is A because this behavior directly undermines cache efficiency despite proper cache headers.

Exam trap

Google Cloud often tests the misconception that cache hit ratio is solely determined by Cache-Control headers, ignoring that URL uniqueness (especially query parameters) overrides caching behavior.

How to eliminate wrong answers

Option B is wrong because the question states the content has long Cache-Control headers, so a short max-age is not the issue. Option C is wrong because SSL termination at the load balancer does not affect cache hit ratio; it only handles encryption/decryption. Option D is wrong because signed URLs with expiration are used for access control, not caching; they do not inherently reduce cache hits unless the URL changes per request, which is not implied.
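The effect of query parameters on the cache key can be modeled in a few lines of Python. This is a simplified model, not Cloud CDN's implementation: the `cache_key` and `hit_ratio` helpers, the `v` and `session` parameter names, and the example URL are all hypothetical. It compares a key built from the full query string with a key built from an allowlist of parameters.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

def cache_key(url, include_params=None):
    """Build a key from host, path, and (optionally) an allowlist of query parameters."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if include_params is not None:
        pairs = [(k, v) for k, v in pairs if k in include_params]
    return f"{parts.netloc}{parts.path}?{urlencode(sorted(pairs))}"

def hit_ratio(urls, include_params=None):
    """Fraction of requests whose key was already seen (an idealized, unbounded cache)."""
    seen = set()
    hits = 0
    for url in urls:
        key = cache_key(url, include_params)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(urls)

# 100 requests for the same object, each carrying a unique session ID.
requests = [f"https://example.com/logo.png?v=3&session={i}" for i in range(100)]

full_key_ratio = hit_ratio(requests)                         # 0.0: every URL is unique
allowlist_ratio = hit_ratio(requests, include_params={"v"})  # 0.99: only the first request misses

print(full_key_ratio, allowlist_ratio)
```

Dropping the session parameter from the key is safe only when it does not change the response body, which holds for the immutable static content described in the question.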

20

Multi-Selectmedium

A company uses Cloud SQL for their transactional database. They are experiencing slow read performance. Which THREE actions can improve read throughput? (Choose three.)

Select 3 answers

A.Enable query caching

B.Use Cloud SQL Proxy for connections

C.Add a read replica

D.Use connection pooling

E.Increase the tier size of the primary instance

AnswersC, D, E

Read replicas serve read queries, reducing load on the primary.

Why this answer

Adding a read replica offloads read queries from the primary Cloud SQL instance, distributing the read workload and improving read throughput. This is a common pattern for scaling read-heavy workloads in Cloud SQL, as replicas serve read traffic asynchronously without impacting the primary instance's write performance.

Exam trap

Google Cloud often tests the misconception that connection pooling or query caching alone can solve read throughput issues. Read replicas scale read capacity directly, whereas connection pooling only reduces connection overhead, and query caching is rarely effective for transactional workloads (the MySQL query cache was deprecated in MySQL 5.7 and removed in 8.0).
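A read replica only helps if read traffic is actually sent to it. The sketch below assumes the application chooses the target itself; `ReadWriteRouter` and the instance names are hypothetical, and the SELECT check is deliberately naive. It sends reads to replicas in round-robin order and everything else to the primary.

```python
import itertools

class ReadWriteRouter:
    """Route SELECT statements to replicas (round-robin) and all other statements to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        # Naive classification: anything that starts with SELECT counts as a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route(query)
    for query in [
        "SELECT * FROM orders",
        "UPDATE orders SET status = 'paid' WHERE id = 1",
        "SELECT * FROM customers",
        "SELECT 1",
    ]
]
print(targets)  # ['replica-1', 'primary', 'replica-2', 'replica-1']
```

Because replication is asynchronous, reads that must see the latest write would still need to go to the primary.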

21

Multi-Selectmedium

A team is running a stateful application on Compute Engine with high disk I/O. They want to optimize disk performance. Which TWO actions should they take? (Choose two.)

Select 2 answers

A.Use standard persistent disk for cost savings

B.Enable disk encryption

C.Use SSD persistent disk for data

D.Use local SSD for temporary data

E.Increase disk size

AnswersC, D

SSD persistent disk offers significantly higher IOPS compared to standard disk.

Why this answer

Option C is correct because SSD persistent disks provide higher IOPS and lower latency than standard persistent disks, making them suitable for high disk I/O workloads. Option D is correct because local SSDs offer even higher performance: they are physically attached to the server that hosts the VM. Their data is ephemeral, however, which makes them a fit for temporary data such as caches or scratch space.

Exam trap

Google Cloud often tests the misconception that increasing disk size is the primary way to improve disk performance, but the correct approach for high I/O is to choose the right disk type (SSD) and use local SSDs for ephemeral data, not just resize the disk.

22

MCQeasy

A backend service receives bursts of requests that cause timeouts. The team wants to smooth out the load while ensuring all requests are processed eventually. Which strategy should they use?

A.Use Cloud Tasks to queue incoming requests and process at a controlled rate

B.Implement client-side rate limiting

C.Use Cloud Load Balancing with connection draining

D.Increase the number of backend instances

AnswerA

Cloud Tasks decouples request submission from processing, allowing smooth rate-controlled execution.

Why this answer

Cloud Tasks is designed to decouple request processing by queuing incoming requests and then dispatching them to a target handler at a controlled rate. This allows the backend to process requests smoothly, preventing timeouts during bursts, while its configurable retry behavior ensures that queued requests are eventually processed.

Exam trap

Google Cloud often tests the distinction between load balancing (which distributes traffic) and queuing (which buffers traffic); candidates mistakenly choose connection draining or scaling because they think smoothing load is about distributing or adding capacity, not about buffering and rate-limiting.

How to eliminate wrong answers

Option B is wrong because client-side rate limiting only controls the rate at which individual clients send requests, but it does not smooth out load from multiple clients or guarantee that all requests are eventually processed; it simply drops or delays requests at the source. Option C is wrong because Cloud Load Balancing with connection draining is used to gracefully terminate existing connections during instance shutdown, not to queue or rate-limit incoming requests; it does not provide a buffer for burst traffic. Option D is wrong because increasing the number of backend instances improves capacity but does not smooth out load; bursts can still overwhelm the system if the rate of incoming requests exceeds the aggregate processing capacity, and it does not guarantee eventual processing of all requests without a queue.
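The difference between buffering and adding capacity can be shown with a small simulation. This is a conceptual model of a rate-limited queue, not the Cloud Tasks API; `drain_at_fixed_rate` and its parameters are hypothetical names. Bursts go into a queue, and the backend never receives more than a fixed number of requests per tick.

```python
from collections import deque

def drain_at_fixed_rate(arrivals_per_tick, max_dispatch_per_tick):
    """Queue bursty arrivals and dispatch at most max_dispatch_per_tick per tick.

    Returns the number of requests dispatched in each tick until the queue is empty.
    Assumes max_dispatch_per_tick is at least 1.
    """
    queue = deque()
    dispatched_per_tick = []
    next_id = 0
    tick = 0
    while tick < len(arrivals_per_tick) or queue:
        arrivals = arrivals_per_tick[tick] if tick < len(arrivals_per_tick) else 0
        for _ in range(arrivals):
            queue.append(next_id)
            next_id += 1
        dispatched = 0
        while queue and dispatched < max_dispatch_per_tick:
            queue.popleft()  # hand one request to the backend
            dispatched += 1
        dispatched_per_tick.append(dispatched)
        tick += 1
    return dispatched_per_tick

# Two bursts (50 and 30 requests) against a backend that handles 10 per tick.
processed = drain_at_fixed_rate([50, 0, 0, 30, 0], max_dispatch_per_tick=10)
print(processed)       # [10, 10, 10, 10, 10, 10, 10, 10]
print(sum(processed))  # 80: every request is eventually processed
```

The backend sees a flat 10 requests per tick regardless of burst size; the cost is that requests at the back of the queue wait longer.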

23

MCQmedium

A team deploys a microservice on Google Kubernetes Engine (GKE) that processes user uploads. The service latency has increased over time. Monitoring shows that CPU utilization is low, but memory usage is high and garbage collection (GC) pauses are frequent. Which action is most likely to reduce latency?

A.Scale out the deployment by increasing the number of replicas.

B.Reduce the number of replicas to concentrate load.

C.Increase the CPU limit to allow faster processing.

D.Increase the memory limit and requests for the container.

AnswerD

More memory reduces GC frequency and pauses.

Why this answer

Frequent GC pauses and high memory usage with low CPU indicate that the runtime's heap is too small for the workload (a JVM-based service is the typical case), so the garbage collector runs more often. Increasing the memory limit and requests gives the heap more headroom and reduces GC frequency, directly lowering latency. This is a classic heap-tuning scenario in containerized environments like GKE.

Exam trap

Google Cloud often tests the misconception that scaling out or adding CPU fixes all performance issues, but here the symptom of low CPU and high memory with GC pauses specifically points to a memory constraint, not a throughput or compute bottleneck.

How to eliminate wrong answers

Option A is wrong because scaling out replicas does not address the root cause of GC thrashing; it distributes the same memory-constrained workload across more pods, each still suffering from frequent GC pauses. Option B is wrong because reducing replicas concentrates the load on fewer pods, worsening memory pressure and GC frequency. Option C is wrong because increasing CPU limits does not help when CPU utilization is already low; the bottleneck is memory, not compute.

24

Multi-Selecthard

Which THREE approaches can help reduce egress costs while improving performance for a multi-region application using Cloud Load Balancing? (Choose 3)

Select 3 answers

A.Optimize data transfer by compressing responses.

B.Use internal load balancers for traffic between regions.

C.Increase the number of instances in each region.

D.Use Cloud CDN to cache content at edge locations.

E.Use premium tier networking for lower latency.

AnswersA, B, D

Compression reduces data transferred and egress cost.

Why this answer

Option A is correct because compressing responses reduces the amount of data transferred over the network, which directly lowers egress costs charged by cloud providers. Smaller payloads also reduce latency and improve perceived performance for end users, as less data needs to travel across regions.
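How much compression saves depends on the payload, but repetitive text formats such as JSON usually shrink considerably. A minimal sketch using Python's standard gzip module follows; the payload shape is made up for illustration.

```python
import gzip
import json

# Hypothetical API response: 500 similar JSON records.
payload = json.dumps(
    [{"id": i, "status": "active", "region": "us-central1"} for i in range(500)]
).encode("utf-8")

compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)

print(f"{len(payload)} bytes -> {len(compressed)} bytes ({ratio:.0%} of original)")
```

Already-compressed content such as JPEG images or video gains little from a second compression pass, so this mainly applies to text responses.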

Exam trap

Google Cloud often tests the misconception that adding more instances or using premium networking always improves performance and reduces costs, but in reality, these actions increase costs without addressing egress charges, while compression, CDN, and internal load balancers directly target data transfer volume and routing.

25

MCQmedium

Refer to the exhibit. A DevOps engineer notices that instance-1 runs on an older CPU platform. The application is sensitive to CPU features that are only available on Skylake or newer. Which action should be taken to optimize performance?

A.Live migrate instance-1 to a different host.

B.Use Terraform to add a lifecycle rule to ignore changes.

C.Terminate instance-1 and recreate it with a newer machine type.

D.Stop instance-1 and update the minimum CPU platform to Skylake.

AnswerD

Setting min-cpu-platform ensures the instance runs on at least Skylake.

Why this answer

Option D is correct because stopping the instance and updating the minimum CPU platform to Skylake ensures that the instance is rescheduled onto a host that meets the required CPU feature set. This action directly addresses the application's sensitivity to CPU features available only on Skylake or newer, without requiring a full recreation or risking compatibility issues during live migration.

Exam trap

Google Cloud often tests the distinction between live migration (which preserves the current CPU platform) and stop/start actions (which can change the host and CPU platform), leading candidates to incorrectly choose live migration as a quick fix.

How to eliminate wrong answers

Option A is wrong because live migration does not change the underlying CPU platform; it moves the instance to another host of the same or similar CPU generation, so the older CPU platform issue persists. Option B is wrong because adding a lifecycle rule in Terraform to ignore changes only prevents infrastructure-as-code drift and does not affect the actual CPU platform of the running instance. Option C is wrong because terminating and recreating the instance with a newer machine type is unnecessarily disruptive; stopping and updating the minimum CPU platform achieves the same result without losing the instance's metadata, attached disks, or network configuration.

26

MCQmedium

A gaming company runs a real-time multiplayer server on GKE. They want to minimize latency between players worldwide. Which approach should they use?

A.Increase the number of nodes in the cluster

B.Use a single regional cluster

C.Use Cloud Functions

D.Use a multi-cluster setup with clusters in multiple regions and use a global load balancer

AnswerD

Deploys servers near players, reducing round-trip time.

Why this answer

A multi-cluster setup with clusters in multiple regions, fronted by a global load balancer, minimizes latency by placing game servers physically closer to players worldwide. The global load balancer uses Anycast IP and Google Front Ends (GFEs) to route traffic to the nearest healthy backend cluster, reducing round-trip time (RTT). This approach is specifically designed for real-time multiplayer workloads that require low latency across geographic regions.

Exam trap

Google Cloud often tests the misconception that scaling nodes (Option A) or using a single regional cluster (Option B) can solve global latency issues, when in fact geographic distribution is required for low-latency worldwide coverage.

How to eliminate wrong answers

Option A is wrong because increasing the number of nodes in a single cluster does not reduce geographic latency; it only increases compute capacity within the same region, leaving distant players with high latency. Option B is wrong because a single regional cluster serves only one geographic area, forcing players far from that region to experience high latency, which defeats the goal of minimizing worldwide latency. Option C is wrong because Cloud Functions are stateless, short-lived, and not designed for persistent, real-time multiplayer game sessions; they lack the low-latency, stateful, and long-running connection requirements of a gaming server.

27

Multi-Selecteasy

A DevOps team wants to optimize the performance of a Cloud Run service that experiences sporadic traffic. Which TWO strategies should they implement?

Select 2 answers

A.Set min-instances to 5 to avoid cold starts

B.Use Cloud Run for Anthos on GKE for better performance

C.Use Cloud Scheduler to trigger the service periodically

D.Reduce container image size and use startup probes

E.Enable CPU boost during cold starts

AnswersD, E

Smaller image loads faster, and startup probes delay traffic until the instance is ready, preventing errors.

Why this answer

Option D is correct because reducing the container image size decreases the time required to pull and start the container, directly mitigating cold start latency. Startup probes allow Cloud Run to defer sending traffic until the container is ready, preventing premature request failures and improving perceived performance. Together, these optimizations address the root causes of cold starts in a serverless environment.

Exam trap

Google Cloud often tests the misconception that setting min-instances or using periodic triggers (like Cloud Scheduler) are effective performance optimizations for sporadic traffic, when in fact they are cost-increasing workarounds that do not address the fundamental cold start problem.

28

MCQmedium

A team wants to simulate real-world user traffic to identify performance bottlenecks before a launch. Which tool should they use to generate load from multiple regions?

A.Cloud Monitoring

B.gcloud beta load test

C.Cloud Load Testing (Distributed Load Testing on GCP)

D.ab (Apache Benchmark)

AnswerC

This solution generates distributed load from multiple regions using Compute Engine instances.

Why this answer

Cloud Load Testing (Distributed Load Testing on GCP) is the correct choice because it can generate synthetic traffic from load generators in multiple geographic regions simultaneously, simulating real-world user distribution. This allows the team to identify performance bottlenecks across different network paths and regional endpoints before launch.

Exam trap

Google Cloud often tests the distinction between monitoring tools (which observe) and load generation tools (which create traffic), leading candidates to mistakenly choose Cloud Monitoring because it sounds related to performance analysis.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring is an observability and alerting service, not a load generation tool; it can monitor performance but cannot generate traffic. Option B is wrong because 'gcloud beta load test' is not a valid gcloud command, and the gcloud CLI does not generate load on its own. Option D is wrong because ab (Apache Benchmark) is a single-threaded, single-origin HTTP benchmarking tool that cannot generate load from multiple regions or simulate distributed user traffic.

29

Multi-Selectmedium

A team is troubleshooting a slow response time on an App Engine standard environment application. The application uses Cloud SQL as its database. Which TWO actions should the team take to identify the bottleneck?

Select 2 answers

A.Examine App Engine request logs for latency patterns.

B.Increase the number of App Engine instances.

C.Enable Cloud SQL slow query logging and analyze long-running queries.

D.Enable Cloud CDN to cache responses.

E.Disable caching to ensure fresh data.

AnswersA, C

Request logs show where latency occurs, and slow query logs show whether the database is responsible.

Why this answer

Option A is correct because examining App Engine request logs reveals latency patterns, such as which endpoints or operations are slow, helping to pinpoint whether the bottleneck is in the application code, network, or database. Option C is correct because enabling Cloud SQL slow query logging identifies long-running SQL queries that could be causing database contention or inefficient data retrieval, directly addressing a common performance bottleneck in App Engine applications using Cloud SQL.

Exam trap

Google Cloud often tests the misconception that scaling the application tier (more instances) always improves performance, but the trap here is that the bottleneck may be at the database layer, where scaling the app tier without addressing database issues can actually degrade performance due to increased connection contention.

30

MCQeasy

A company is using Cloud CDN to deliver static content globally. Some users in Asia report slow load times. Which configuration change would most likely improve performance for these users?

A.Add an additional CDN origin

B.Enable Cloud Armor

C.Use a global external HTTP(S) load balancer with Cloud CDN

D.Increase the cache TTL

AnswerC

Global load balancer with anycast IP and edge caching reduces latency significantly.

Why this answer

Cloud CDN relies on a global external HTTP(S) load balancer to route user requests to the nearest cache node. Without this load balancer, Cloud CDN cannot leverage Google's global network and edge caches, so users in Asia would not be served from a nearby point of presence. Option C correctly identifies that using a global external HTTP(S) load balancer with Cloud CDN enables geographic load balancing and edge caching, directly improving latency for distant users.

Exam trap

Google Cloud often tests the misconception that Cloud CDN works independently of the load balancer type, but Cloud CDN requires a global external HTTP(S) load balancer to route traffic to edge caches; using a regional load balancer or no load balancer at all will not provide global performance improvements.

How to eliminate wrong answers

Option A is wrong because adding an additional CDN origin does not affect the distribution of cache nodes; Cloud CDN uses the same global edge network regardless of origin count, and multiple origins are for redundancy or content separation, not latency reduction. Option B is wrong because Cloud Armor is a web application firewall and DDoS protection service; it does not improve content delivery speed or cache hit ratio, and may add slight processing overhead. Option D is wrong because increasing the cache TTL only extends how long content is stored in cache; it does not bring content closer to users in Asia or change the routing path, so it cannot fix slow load times caused by geographic distance.

31

MCQeasy

An e-commerce platform uses Cloud SQL for its database. The team notices that read queries are slow. They want to improve read performance without significant cost increase. Which action should they take?

A.Add a read replica

B.Increase the number of vCPUs on the instance

C.Increase the storage size

D.Enable binary logging

AnswerA

A read replica distributes read queries, reducing load on the primary.

Why this answer

Adding a read replica offloads read traffic from the primary instance, improving read performance with minimal cost. Increasing vCPUs or storage incurs higher cost. Binary logging is for replication, not read performance.

32

MCQhard

A data engineering team runs frequent aggregation queries on a large BigQuery table. Query performance is slow and costs are high. Which optimization technique would best improve performance and reduce cost?

A.Use materialized views for pre-aggregated results

B.Convert to non-partitioned table

C.Use clustering on the partition key

D.Increase the number of slots in the reservation

AnswerA

Materialized views pre-compute and store aggregation results, drastically reducing query time and cost.

Why this answer

Materialized views pre-compute and store the results of frequent aggregation queries, allowing BigQuery to serve subsequent queries from the precomputed results rather than scanning the entire base table. This drastically reduces the amount of data processed, lowering both query latency and cost (under on-demand pricing, BigQuery charges by bytes processed). For repeated aggregation patterns, this is the most effective optimization.

Exam trap

Google Cloud often tests the misconception that clustering on the partition key or adding compute resources (slots) is the primary fix for cost and performance, when in reality reducing data scanned through pre-aggregation is the most impactful lever.

How to eliminate wrong answers

Option B is wrong because converting to a non-partitioned table would increase the amount of data scanned per query, worsening performance and cost. Option C is wrong because clustering on the partition key provides no additional benefit beyond partitioning itself; clustering is most effective on columns other than the partitioning column, especially high-cardinality columns used in filters. Option D is wrong because increasing slots only improves concurrency and execution speed for queries that are already optimized; it does not reduce the bytes processed, so cost would remain high or increase.
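The saving from pre-aggregation comes from how much data each query has to read. The sketch below is plain Python rather than BigQuery, and the table contents and function names are made up; it compares an aggregation that scans every base row with a lookup in a precomputed summary.

```python
from collections import defaultdict

# Hypothetical base table of (region, amount) rows.
base_table = [("us" if i % 2 == 0 else "eu", i % 10) for i in range(10_000)]

def total_by_scan(region):
    """Aggregate by reading every row, as a query against the base table would."""
    rows_read = 0
    total = 0
    for row_region, amount in base_table:
        rows_read += 1
        if row_region == region:
            total += amount
    return total, rows_read

def build_preaggregate():
    """Compute per-region totals once, standing in for a materialized view."""
    totals = defaultdict(int)
    for row_region, amount in base_table:
        totals[row_region] += amount
    return dict(totals)

preaggregated = build_preaggregate()

def total_from_preaggregate(region):
    """Answer the same question by reading one precomputed row."""
    return preaggregated[region], 1

scan_total, scan_rows = total_by_scan("us")
fast_total, fast_rows = total_from_preaggregate("us")
print(scan_total, scan_rows)  # 20000 10000
print(fast_total, fast_rows)  # 20000 1
```

The sketch ignores refresh: a real materialized view also has to be kept in step with changes to the base table.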

33

MCQmedium

You are using Memorystore for Redis as a cache for a high-traffic web application. You observe that cache hit ratio is low, causing high database load. What is the most effective way to improve cache hit ratio?

A.Use the allkeys-lru eviction policy to keep frequently accessed keys.

B.Increase the instance size to store more data.

C.Migrate to Memorystore for Memcached for lower latency.

D.Set a longer TTL for all cache entries.

AnswerA

LRU policy keeps popular items, improving hit ratio.

Why this answer

The allkeys-lru eviction policy is the most effective way to improve cache hit ratio because it automatically evicts the least recently used keys across the entire keyspace when memory is full, retaining the most recently accessed data. Keys that are read often are also read recently, so the hottest data tends to stay in the cache, directly increasing the likelihood of cache hits without requiring manual intervention or additional resources.

Exam trap

Google Cloud often tests the misconception that increasing memory or TTL alone solves cache performance issues, but the trap here is that without an appropriate eviction policy like allkeys-lru, the cache will still evict hot keys in favor of cold ones, leaving the hit ratio low regardless of size or TTL settings.

How to eliminate wrong answers

Option B is wrong because simply increasing the instance size stores more data but does not address the underlying issue of which data is retained; without an appropriate eviction policy, the cache may still be filled with stale or rarely accessed keys, leaving the hit ratio low. Option C is wrong because migrating to Memorystore for Memcached does not inherently improve cache hit ratio; Memcached lacks built-in LRU eviction across all keys (it uses a slab-based allocator with per-slab LRU) and does not support data persistence or advanced eviction policies like allkeys-lru, making it less suitable for optimizing hit ratios in this scenario. Option D is wrong because setting a longer TTL for all cache entries can cause the cache to fill with stale data that is no longer accessed, reducing the effective cache capacity for hot keys and potentially worsening the hit ratio; TTL should be tuned per key based on access patterns, not uniformly extended.
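Least-recently-used eviction can be illustrated with a small in-process cache. This is a textbook LRU built on `OrderedDict`, not Redis's implementation (Redis approximates LRU by sampling keys); the `LRUCache` class and the loader function are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = load(key)  # fall back to the slow source, e.g. the database
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key
        return value

cache = LRUCache(capacity=2)
for key in ["a", "b", "a", "c", "a", "b"]:
    cache.get(key, load=lambda k: k.upper())

print(cache.hits, cache.misses)  # 2 4
```

In this trace the frequently read key "a" is never evicted, while "b" and "c" take turns in the remaining slot.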

34

MCQeasy

Your application uses Cloud SQL for MySQL and you notice that read replica lag is increasing. Which action would most likely reduce replica lag?

A.Configure automatic failover to the replica.

B.Decrease the memory of the primary instance.

C.Increase the machine type of the replica.

D.Promote the replica to a standalone instance.

AnswerC

More powerful replica can keep up with replication.

Why this answer

Increasing the machine type of the replica (Option C) directly addresses read replica lag by providing more CPU and memory resources to the replica instance. This allows the replica to apply binary log (binlog) events from the primary faster, reducing the replication lag. Cloud SQL for MySQL uses asynchronous replication, so a replica with insufficient resources cannot keep up with the write throughput of the primary.
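Replica lag is essentially a backlog: events arrive at the primary's write rate and leave at the replica's apply rate. The toy model below uses made-up rates and a hypothetical `replication_backlog` function to show the backlog growing when the replica is undersized and staying at zero once it can apply events faster than they arrive.

```python
def replication_backlog(write_rate, apply_rate, seconds):
    """Return the number of unapplied events at the end of each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # Events arrive at write_rate and are applied at apply_rate; backlog cannot go negative.
        backlog = max(0, backlog + write_rate - apply_rate)
        history.append(backlog)
    return history

undersized = replication_backlog(write_rate=1000, apply_rate=800, seconds=5)
resized = replication_backlog(write_rate=1000, apply_rate=1200, seconds=5)

print(undersized)  # [200, 400, 600, 800, 1000]
print(resized)     # [0, 0, 0, 0, 0]
```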

Exam trap

Google Cloud often tests the misconception that promoting a replica or failing over will resolve lag, but the correct approach is to scale the replica's resources to match the primary's write rate.

How to eliminate wrong answers

Option A is wrong because automatic failover does not reduce replica lag; it only switches traffic to the replica after a primary failure, and the replica must already be caught up for a successful failover. Option B is wrong because decreasing the memory of the primary instance would likely increase write latency and could worsen replication lag by slowing down the primary's ability to commit transactions and generate binlog events. Option D is wrong because promoting the replica to a standalone instance breaks replication entirely, which removes the lag measurement but also the read replica's purpose; it does not reduce lag, it stops replication.

35

MCQmedium

Your team has deployed a microservices application on Google Kubernetes Engine (GKE). You notice that one service has high latency during peak hours. The service is CPU-bound and uses a HorizontalPodAutoscaler (HPA) based on CPU utilization. What is the most likely cause of the latency?

A.The GKE cluster uses preemptible nodes that are frequently reclaimed.

B.The HPA's target CPU utilization is set too high, causing the autoscaler to react slowly.

C.The service uses a global external HTTP(S) load balancer with session affinity.

D.The application does not implement request autoscaling at the application layer.

AnswerB

A high target CPU threshold delays scaling, leading to latency.

Why this answer

Option B is correct because when the HPA's target CPU utilization is set too high, the autoscaler waits until the average CPU utilization exceeds that threshold before scaling up. During peak hours, the service becomes CPU-bound and latency increases as pods are overwhelmed, but the HPA reacts slowly because it only triggers when the high threshold is breached, causing a delay in adding new pods to handle the load.

Exam trap

Google Cloud often tests the misconception that HPA scaling is instantaneous or that CPU-bound latency is caused by external factors like load balancers or node preemption, when the real issue is the HPA threshold configuration and its delayed reaction to sustained high utilization.

How to eliminate wrong answers

Option A is wrong because preemptible nodes being reclaimed would cause pod evictions and potential downtime, not a gradual latency increase during peak hours; the scenario describes high latency specifically during peak hours, not intermittent failures. Option C is wrong because a global external HTTP(S) load balancer with session affinity does not inherently cause high latency for a CPU-bound service; session affinity can cause uneven load distribution but the primary issue here is CPU saturation and slow HPA reaction. Option D is wrong because request autoscaling at the application layer (e.g., concurrency limits or queue-based scaling) is not a standard Kubernetes mechanism and does not address the root cause of the HPA's slow reaction to CPU utilization; the HPA is already configured for CPU, so the issue is the threshold setting.
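The effect of the target value follows from the scaling formula in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with a default tolerance of 0.1 around a ratio of 1. The sketch below implements only that formula; the function name is hypothetical and it ignores stabilization windows, min/max replica bounds, and pod readiness.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization, tolerance=0.1):
    """Simplified HPA calculation: scale by the ratio of observed to target utilization."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no scaling
    return math.ceil(current_replicas * ratio)

# Four pods running at 85% average CPU.
print(desired_replicas(4, 85, 90))  # 4: a 90% target sees nothing to do
print(desired_replicas(4, 85, 50))  # 7: a 50% target adds pods
```

With a 90% target, pods are already close to saturation before the autoscaler adds any capacity, which is the delay the question describes.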

36

Multi-Selecthard

A company runs a high-traffic web application on GKE. Which three practices can help optimize performance under load? (Select THREE.)

Select 3 answers

A.Use a global load balancer

B.Use a multi-cluster ingress

C.Store session state in Cloud Memorystore

D.Enable HorizontalPodAutoscaler with custom metrics

E.Increase the number of nodes to maximum

AnswersA, C, D

Routes traffic to nearest backend, reducing latency.

Why this answer

A global load balancer (GLB) distributes incoming traffic across multiple GKE clusters and regions using Google's global network infrastructure. It terminates traffic at the edge, reducing latency by routing users to the closest healthy backend, and provides DDoS protection and SSL offloading. This is essential for high-traffic web applications to handle load spikes and improve response times.

Exam trap

Google Cloud often tests the misconception that simply adding more nodes (Option E) is a valid performance optimization, when in fact it is a costly and inefficient approach that ignores the need for dynamic scaling and resource efficiency.

37

Multi-Selecthard

Which TWO actions should a DevOps engineer take to reduce latency for a global user base accessing a web application hosted on Compute Engine?

Select 2 answers

A.Configure instance groups in multiple regions with a global load balancer.

B.Enable HTTP/2 on the load balancer.

C.Enable Cloud CDN to cache static content.

D.Increase the machine type of the instances.

E.Use Cloud Load Balancing with global anycast IP.

AnswersA, C

Multi-region deployment allows serving users from the closest region, reducing latency.

Why this answer

Option A is correct because deploying instance groups in multiple regions and using a global load balancer (e.g., Google Cloud External HTTP(S) Load Balancer) allows user requests to be routed to the closest healthy backend, reducing network round-trip time. This geo-distribution minimizes latency by serving content from the nearest regional endpoint rather than a single centralized location.

Exam trap

Google Cloud often tests the misconception that a global anycast IP alone (Option E) is sufficient to reduce latency, but the trap is that anycast only optimizes the frontend routing to the edge; without multi-region backends, the request must still travel to the single backend region, negating the latency benefit.

38

MCQmedium

Refer to the exhibit. An App Engine application returns 504 errors. The application calls an external API and processes the result. Which change is most likely to resolve the errors?

A.Reduce the idle timeout in the scaling settings.

B.Increase the App Engine request timeout in app.yaml to 120 seconds.

C.Change the scaling type from automatic to manual.

D.Increase the number of instances to handle the load.

AnswerB

The default timeout is 60 seconds; increasing it allows more time.

Why this answer

A 504 error from App Engine indicates the request exceeded the timeout limit before the application could respond. The default App Engine request timeout is 60 seconds, and since the application calls an external API and processes the result, the total time may exceed this limit. Increasing the request timeout in app.yaml to 120 seconds allows the application more time to complete the external API call and processing, resolving the 504 error.

Exam trap

Google Cloud often tests the distinction between request timeout (which causes 504 errors) and scaling or load-related issues (which cause 503 errors or latency), so candidates mistakenly choose instance count or scaling type changes when the real problem is a timeout threshold.

How to eliminate wrong answers

Option A is wrong because reducing the idle timeout in scaling settings would cause instances to be shut down sooner when idle, potentially increasing cold starts and latency, but it does not address the request timeout that causes 504 errors. Option C is wrong because changing from automatic to manual scaling does not change the per-request timeout limit; manual scaling controls instance count and startup behavior, not the maximum time a single request can take. Option D is wrong because increasing the number of instances helps handle higher concurrency and load, but if a single request already exceeds the timeout, more instances will not prevent that request from timing out.

39

Matching · medium

Match each monitoring concept to its purpose.

Concepts

Matches

Verify external accessibility of a service

Time taken to respond to a request

Percentage of failed requests

Number of requests processed per second

Degree to which a resource is fully utilized

Why these pairings

Four golden signals of monitoring.
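Four of the descriptions above are quantities that can be computed from raw request data. The sketch below shows one way to derive them for a single time window; the `Request` record, the `summarize` helper, and all sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    ok: bool


def summarize(requests, window_s, cpu_used, cpu_capacity):
    """Return latency, error rate, throughput, and saturation for one window."""
    n = len(requests)
    return {
        "avg_latency_ms": sum(r.latency_ms for r in requests) / n,
        "error_rate_pct": 100.0 * sum(not r.ok for r in requests) / n,
        "throughput_rps": n / window_s,
        "saturation_pct": 100.0 * cpu_used / cpu_capacity,
    }


sample = [Request(120, True), Request(80, True), Request(400, False), Request(200, True)]
stats = summarize(sample, window_s=2, cpu_used=1.5, cpu_capacity=2.0)
print(stats)
```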

40

Multi-Select · medium

Your team is running a high-traffic web application on Google Kubernetes Engine (GKE) and has configured Horizontal Pod Autoscaling (HPA) based on CPU utilization. Recently, the application experienced intermittent latency spikes during traffic bursts. You suspect that the HPA is not scaling quickly enough. Which TWO actions would most effectively improve the autoscaling responsiveness?

Select 2 answers

A.Increase the target CPU utilization percentage to 90% so that pods are less likely to be added.

B.Configure a custom metric in HPA based on the application's request latency (e.g., p99 latency).

C.Set the maxReplicas of the HPA to a value lower than the expected peak traffic to force faster scaling.

D.Reduce the --horizontal-pod-autoscaler-upscale-stabilization window in the HPA configuration to a lower value (e.g., 30 seconds).

E.Increase the --horizontal-pod-autoscaler-sync-period flag to 30 seconds and increase the --horizontal-pod-autoscaler-upscale-delay flag to 5 minutes.

Answers: B, D

Custom metrics like request latency provide a more direct and responsive signal to scale based on performance.

Why this answer

Option B is correct because using a custom metric based on request latency (e.g., p99 latency) allows HPA to react to application performance degradation directly, rather than relying solely on CPU utilization, which may lag behind traffic bursts. This provides a more immediate signal for scaling, as latency spikes often precede CPU saturation in web applications. Option D is correct because a shorter scale-up stabilization window (for example, 30 seconds) reduces how long HPA waits before acting on a scale-up recommendation, enabling a faster response to sudden load increases. In the autoscaling/v2 API this window is set per HPA with behavior.scaleUp.stabilizationWindowSeconds; the 5-minute default applies to the scale-down window, not to scale-up.
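The scaling rule documented for the HPA is desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). The sketch below applies that rule to hypothetical numbers; it ignores the tolerance band and stabilization windows, and `desired_replicas` is an illustrative helper rather than a Kubernetes API.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, max_replicas):
    """Core HPA rule: ceil(current * currentMetric / targetMetric), capped."""
    desired = math.ceil(current_replicas * current_value / target_value)
    return min(max(desired, 1), max_replicas)


# Hypothetical numbers: 4 pods, p99 latency of 900 ms against a 300 ms target.
print(desired_replicas(4, current_value=900, target_value=300, max_replicas=20))

# Same CPU load (80%) with a 60% target versus a 90% target.
print(desired_replicas(4, current_value=80, target_value=60, max_replicas=20))
print(desired_replicas(4, current_value=80, target_value=90, max_replicas=20))
```

With the 90% target the rule asks for no additional pods at 80% load, which is why raising the target (option A) makes scaling less responsive rather than more.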

Exam trap

Google Cloud often tests the misconception that increasing target CPU utilization or sync periods improves scaling speed, when in fact these changes reduce responsiveness; the trap is confusing 'stabilization' with 'delay' and assuming higher thresholds or longer intervals help with bursts.

41

MCQ · medium

A DevOps team wants to optimize resource utilization for their GKE deployment. Which built-in Kubernetes resource can automatically adjust CPU and memory requests based on historical usage?

A.HorizontalPodAutoscaler

B.ResourceQuota

C.PodDisruptionBudget

D.VerticalPodAutoscaler

Answer: D

Correct. VPA adjusts resource requests based on usage.

Why this answer

The VerticalPodAutoscaler (VPA) is the correct choice because it automatically adjusts CPU and memory resource requests (and limits) for pods based on historical usage data, optimizing resource utilization without manual intervention. Unlike the HorizontalPodAutoscaler, which scales the number of pods, the VPA modifies the resource specifications of existing pods to match observed demand.
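The idea of sizing requests from historical usage can be sketched as follows. This is a deliberately simplified illustration, not the actual VPA recommender: the percentile, the safety margin, and the usage history are all assumptions chosen for the example.

```python
import math


def recommend_request(usage_samples, percentile=90, margin=0.15):
    """Pick a request near a high percentile of past usage, plus headroom."""
    ordered = sorted(usage_samples)
    rank = math.ceil(percentile * len(ordered) / 100)  # nearest-rank percentile
    return ordered[rank - 1] * (1 + margin)


# Hypothetical CPU usage history in millicores for one container.
history = [120, 150, 135, 400, 160, 145, 155, 170, 140, 165]
print(round(recommend_request(history)))
```

Using a high percentile rather than the maximum keeps a single outlier (the 400 millicore sample) from inflating the request.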

Exam trap

Google Cloud often tests the distinction between scaling the number of replicas (HPA) versus scaling the resources per pod (VPA), and candidates mistakenly choose HPA because they associate 'automatic adjustment' with scaling out, not adjusting requests.

How to eliminate wrong answers

Option A is wrong because HorizontalPodAutoscaler (HPA) adjusts the number of pod replicas based on CPU/memory metrics, not the per-pod resource requests. Option B is wrong because ResourceQuota sets hard limits on total resource consumption within a namespace, preventing overcommitment, but does not dynamically adjust requests based on usage. Option C is wrong because PodDisruptionBudget (PDB) controls the number of pods that can be voluntarily disrupted during maintenance, not resource request adjustments.

42

MCQ · hard

A Cloud Run service experiences high cold start latency. The team has already set min-instances to 1. Which additional optimization can further reduce cold start impact?

A.Use HTTP/2 to speed up request handling

B.Reduce container image size and enable CPU boost during startup

C.Place the service on Cloud Run for Anthos on GKE

D.Increase the container memory limit

Answer: B

Smaller images load faster, and CPU boost provides extra CPU during startup, reducing latency.

Why this answer

Option B is correct because reducing the container image size decreases the time required to pull and unpack the container on a cold start, and enabling CPU boost during startup temporarily allocates additional CPU resources to accelerate the initialization process. Together, these directly address the root causes of cold start latency by minimizing both the image loading time and the application initialization time, even when min-instances is already set to 1.

Exam trap

Google Cloud often tests the misconception that increasing resources (memory or CPU) always improves performance, but in the context of cold starts, the bottleneck is typically image size and initialization speed, not steady-state resource limits.

How to eliminate wrong answers

Option A is wrong because HTTP/2 improves multiplexing and reduces latency for multiple concurrent requests, but it does not affect the cold start latency of a single instance that has not yet been initialized. Option C is wrong because Cloud Run for Anthos on GKE runs on a Kubernetes cluster, which introduces additional orchestration overhead and does not inherently reduce cold start latency compared to the fully managed Cloud Run service. Option D is wrong because increasing the container memory limit may allow the container to use more memory, but it does not speed up the startup process; in fact, larger memory allocations can sometimes increase cold start time due to resource provisioning delays.

43

MCQ · hard

You are troubleshooting a performance issue with a Compute Engine instance that is part of a managed instance group serving a web application. Users report intermittent high latency. You run the command shown in the exhibit. Based on the output, what is the most likely cause of the performance issue?

A.The instance is under-provisioned for CPU.

B.The instance is hitting the network egress bandwidth limit.

C.The service account lacks the necessary scopes for Cloud Monitoring and Cloud Trace.

D.The boot disk is too small, causing I/O contention.

Answer: A

The output does not show the machine type, but the disk size and service account suggest a small instance, likely with 1 vCPU. Insufficient CPU causes high latency under load.

Why this answer

The output shows high CPU utilization (e.g., 95%+), which directly correlates with the reported intermittent high latency. In a managed instance group, if the instance is under-provisioned for CPU, it cannot handle the workload spikes, causing queuing and increased response times. This is the most common cause of performance degradation in Compute Engine instances serving web applications.

Exam trap

Google Cloud often tests the misconception that network bandwidth or disk I/O is the primary bottleneck in web application latency, but the exhibit's focus on CPU utilization is the key clue that the issue is compute-bound, not I/O-bound.

How to eliminate wrong answers

Option B is wrong because network egress bandwidth limits would manifest as packet drops, retransmissions, or a plateau in throughput, not sustained high CPU usage; the output does not show any network-related metrics like dropped packets or bandwidth saturation. Option C is wrong because missing scopes for Cloud Monitoring and Cloud Trace would prevent telemetry data from being sent, but the command shown (likely `top` or `htop`) still runs locally and would not cause the high latency itself; the latency issue is a symptom of resource contention, not a permissions problem. Option D is wrong because a boot disk that is too small causing I/O contention would show high disk I/O wait times or disk queue depth in the output, not high CPU utilization; the exhibit focuses on CPU, not disk metrics.

44

MCQ · medium

A company deploys a microservices application on Google Kubernetes Engine (GKE). They notice increased latency during peak hours. The application uses a Cloud SQL database for state. The team wants to optimize service performance. What should they do first?

A.Implement a caching layer with Memorystore.

B.Enable Cloud SQL connection pooling.

C.Move the database to Cloud Spanner.

D.Increase the number of replicas in the GKE deployment.

Answer: B

Connection pooling minimizes the cost of repeatedly opening and closing database connections, directly addressing latency caused by connection setup.

Why this answer

Option B is correct because the primary bottleneck during peak hours for a microservices application using Cloud SQL is often the number of database connections. Connection pooling reuses a fixed set of connections, reducing the overhead of establishing new connections and preventing connection exhaustion, which directly addresses increased latency without requiring architectural changes.
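A fixed-size pool that reuses connections can be sketched in a few lines. This is an illustration only: `ConnectionPool` is a hypothetical class, and in-memory sqlite3 connections stand in for Cloud SQL connections so the example is self-contained.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size, connect):
        self.opened = 0
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())
            self.opened += 1

    @contextmanager
    def connection(self, timeout_s=5):
        conn = self._idle.get(timeout=timeout_s)  # wait rather than open more
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return the connection for reuse


# sqlite3 in-memory databases stand in for Cloud SQL connections here.
pool = ConnectionPool(size=2, connect=lambda: sqlite3.connect(":memory:"))
for _ in range(10):
    with pool.connection() as conn:
        conn.execute("SELECT 1").fetchone()
print(pool.opened)  # 10 queries served by 2 connections
```

Because the pool size is fixed, adding more callers makes them wait for an idle connection instead of opening new ones, which is what protects the database from connection exhaustion.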

Exam trap

Google Cloud often tests the misconception that scaling out (increasing replicas) always improves performance, but in stateful applications with a centralized database, it can backfire by increasing connection pressure; the trap here is that candidates overlook connection management as the first optimization step and jump to caching or database migration.

How to eliminate wrong answers

Option A is wrong because implementing a caching layer with Memorystore reduces read latency for frequently accessed data but does not address the underlying connection management issue with Cloud SQL; it adds complexity and is not the first optimization step. Option C is wrong because moving the database to Cloud Spanner is a significant architectural change that introduces higher cost and complexity, and it is overkill for optimizing latency caused by connection management; Spanner is designed for global scale and strong consistency, not for fixing connection pooling issues. Option D is wrong because increasing the number of replicas in the GKE deployment can actually worsen latency by creating more pods that each open new connections to Cloud SQL, exacerbating connection exhaustion and database load without addressing the root cause.

45

MCQ · medium

An e-commerce platform uses Cloud Load Balancing with backend services running on Compute Engine managed instance groups. During Black Friday sales, the application experiences high latency and some 503 errors. The team uses autoscaling based on average CPU utilization, but scaling is too slow—Cloud Monitoring shows CPU rises to 90% before new instances are added. The team needs to reduce latency and eliminate 503 errors. What should they do?

A.Use HTTP load balancing with a larger backend timeout

B.Change the autoscaling metric to 'requests per second' and set a lower target value

C.Enable Cloud CDN for all dynamic content

D.Increase the cooldown period for the autoscaling policy

Answer: B

Requests per second scales proactively based on traffic, reacting faster than CPU-based scaling.

Why this answer

Option B is correct because switching the autoscaling metric from average CPU utilization to 'requests per second' (RPS) with a lower target value allows the autoscaler to react more quickly to traffic spikes. CPU utilization is a lagging indicator that rises only after requests have already been queued and processed, whereas RPS directly reflects incoming load. By setting a lower target RPS, the autoscaler can add instances before the backend becomes saturated, reducing latency and eliminating 503 errors.
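The effect of a lower per-instance request target can be shown with simple arithmetic. The sketch below assumes a hypothetical traffic ramp and two candidate targets (100 and 50 requests per second per instance); `instances_needed` is an illustrative helper, not the autoscaler's actual algorithm.

```python
import math


def instances_needed(total_rps, target_rps_per_instance):
    """Instances required so each one stays at or below its target rate."""
    return max(1, math.ceil(total_rps / target_rps_per_instance))


# Hypothetical traffic ramp during a sale, in requests per second.
for rps in (200, 600, 1200):
    print(rps, instances_needed(rps, 100), instances_needed(rps, 50))
```

At every point in the ramp the lower target calls for twice as many instances, so capacity is added earlier and each instance keeps more headroom for bursts.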

Exam trap

Google Cloud often tests the misconception that CPU utilization is the best metric for scaling web applications, but the trap here is that CPU is a lagging indicator, and candidates may overlook that request-based metrics provide faster, more direct feedback for traffic-driven workloads.

How to eliminate wrong answers

Option A is wrong because increasing the backend timeout does not address the root cause of slow scaling; it only allows connections to wait longer before timing out, which can mask the problem but does not prevent 503 errors or reduce latency. Option C is wrong because Cloud CDN caches static content at edge locations, but the question describes dynamic content that cannot be cached; enabling CDN for dynamic content would not reduce latency or eliminate 503 errors caused by backend overload. Option D is wrong because increasing the cooldown period would make autoscaling even slower, as it delays the addition of new instances after a scale-up decision, worsening the latency and 503 errors.

46

MCQ · hard

A company runs a streaming pipeline on Dataflow that reads from Pub/Sub and writes to BigQuery. The pipeline is falling behind due to high volume, and messages are backing up in Pub/Sub. Autoscaling is enabled and workers are running, but utilization is only 30%. Streaming Engine is disabled. What should the engineer do to increase throughput?

A.Use a larger machine type for workers.

B.Enable Streaming Engine for Dataflow.

C.Increase the number of worker machines.

D.Increase the worker disk size.

Answer: B

Streaming Engine reduces shuffle overhead, improving throughput for streaming pipelines.

Why this answer

Option B is correct because enabling Streaming Engine offloads shuffle and state management from worker VMs to the Dataflow service backend, reducing the per-worker overhead. With only 30% utilization, the bottleneck is not compute capacity but the per-worker throughput limit caused by the streaming pipeline's shuffle and state operations. Streaming Engine allows the existing workers to process more data per second by removing these bottlenecks, directly addressing the Pub/Sub backlog without adding more resources.

Exam trap

Google Cloud often tests the misconception that low utilization means you need more or larger workers, when in fact the bottleneck is the streaming shuffle overhead that Streaming Engine resolves, not compute capacity.

How to eliminate wrong answers

Option A is wrong because using a larger machine type increases CPU and memory per worker, but the pipeline is only at 30% utilization, indicating that compute resources are not the bottleneck; the issue is the streaming shuffle and state overhead, which a larger machine type does not alleviate. Option C is wrong because increasing the number of workers would add more machines that would also be underutilized due to the same streaming overhead, failing to address the root cause of low throughput per worker. Option D is wrong because increasing worker disk size helps with buffering and spill-to-disk scenarios, but the pipeline is not disk-bound; the bottleneck is the shuffle and state management performed on the workers, which more disk space does not resolve.

47

MCQ · easy

A DevOps team is troubleshooting a web application that shows high latency during peak hours. The application runs on Google Kubernetes Engine (GKE). They want to identify which specific API calls are causing the delay. Which Google Cloud tool should they use?

A.Cloud Monitoring

B.Cloud Profiler

C.Cloud Logging

D.Cloud Trace

Answer: D

Cloud Trace offers distributed tracing, enabling identification of slow API calls.

Why this answer

Cloud Trace is the correct tool because it is a distributed tracing system that captures latency data from applications, allowing you to trace individual API calls and identify which specific endpoints are causing delays. Unlike Cloud Monitoring, which provides aggregate metrics, Cloud Trace provides per-request latency breakdowns, making it ideal for pinpointing slow API calls in a GKE environment.
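The per-request breakdown that tracing provides can be imitated with a few lines of standard-library code. This sketch does not use the Cloud Trace client; `span`, `handle_checkout`, and the endpoint names are hypothetical, and the sleeps stand in for real downstream calls.

```python
import time
from contextlib import contextmanager

spans = []  # (name, duration in seconds) collected for one request


@contextmanager
def span(name):
    """Record how long the wrapped block takes under the given name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, time.perf_counter() - start))


def handle_checkout():
    with span("GET /cart"):
        time.sleep(0.01)
    with span("POST /payment"):
        time.sleep(0.05)  # the slow call in this made-up request
    with span("GET /inventory"):
        time.sleep(0.01)


handle_checkout()
slowest = max(spans, key=lambda s: s[1])
print(slowest[0])
```

An aggregate metric would report only that the whole request took about 70 ms; the span list shows which call accounts for most of it.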

Exam trap

The trap here is that candidates often confuse Cloud Monitoring (which shows overall latency metrics) with Cloud Trace (which provides per-request tracing), leading them to choose Cloud Monitoring when they need to drill down into specific API calls.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring provides aggregate metrics (e.g., CPU, memory, request count) but does not trace individual API calls or provide per-request latency breakdowns. Option B is wrong because Cloud Profiler is designed for continuous profiling of CPU and memory usage to identify performance bottlenecks in code, not for tracing specific API call latencies. Option C is wrong because Cloud Logging captures log entries and events but does not provide the distributed tracing capability needed to correlate latency across API calls.

48

MCQ · hard

A Cloud Run service experiences high latency during cold starts. The service is memory-intensive. Which configuration change will most effectively reduce cold start latency?

A.Decrease the container concurrency setting.

B.Set a minimum number of instances.

C.Increase the max instances limit.

D.Enable CPU boost for the service.

Answer: B

Min instances keep instances always running, eliminating cold starts.

Why this answer

Setting a minimum number of instances ensures that a baseline of pre-warmed instances is always running, eliminating cold starts for the initial requests. For a memory-intensive service, this avoids the latency penalty of loading large libraries or datasets into memory on first invocation. This is the most direct and effective configuration change to reduce cold start latency.

Exam trap

Google Cloud often tests the misconception that increasing max instances or enabling CPU boost can solve cold start latency, when in fact only setting a minimum number of instances directly prevents the cold start from occurring.

How to eliminate wrong answers

Option A is wrong because decreasing container concurrency reduces the number of simultaneous requests each instance can handle, which may increase the number of instances needed but does not address the root cause of cold start latency. Option C is wrong because increasing the max instances limit only raises the ceiling for scaling out, which does nothing to prevent cold starts when no instances are running. Option D is wrong because CPU boost temporarily increases CPU during cold starts but only reduces latency by a small margin; it does not eliminate the cold start itself, and for memory-intensive services, the bottleneck is often memory allocation, not CPU.

49

MCQ · easy

A company runs a web application on Compute Engine behind a regional HTTP Load Balancer. Users report slow page load times during peak hours. CPU utilization on instances is under 60%, but network egress is near the instance's bandwidth limit. Which action should the engineer take?

A.Increase the number of instances in the instance group.

B.Switch to a global external HTTP Load Balancer.

C.Enable Cloud CDN.

D.Use a larger machine type with higher network throughput.

Answer: D

Larger machine types have higher network egress limits, addressing the bottleneck.

Why this answer

The bottleneck is network egress bandwidth, not CPU. Compute Engine's per-instance egress cap generally rises with the vCPU count of the machine type, so moving to a larger machine type with higher network throughput directly raises the cap and relieves the bandwidth limit. Option D is correct because it addresses the root cause: insufficient network I/O capacity per instance.

Exam trap

Google Cloud often tests the misconception that horizontal scaling (adding instances) always solves performance issues, but here the bottleneck is per-instance network throughput, not request handling capacity.

How to eliminate wrong answers

Option A is wrong because adding more instances distributes load but does not increase the per-instance egress bandwidth; if each instance is already at its egress limit, more instances will also hit the same limit. Option B is wrong because switching to a global external HTTP Load Balancer improves latency through anycast IP and cross-regional routing, but does not increase the egress bandwidth of individual Compute Engine instances. Option C is wrong because Cloud CDN caches static content at edge locations, reducing origin load, but the reported bottleneck is network egress on the instances, which CDN does not address for dynamic or uncacheable traffic.

50

Multi-Select · hard

Which THREE factors should you consider when designing a Cloud Run service for optimal performance under unpredictable traffic patterns? (Choose 3)

Select 3 answers

A.Use HTTP/2 for faster connection reuse.

B.Set a minimum number of instances to reduce cold starts.

C.Configure VPC egress through Cloud NAT for lower latency.

D.Allocate sufficient CPU per instance to handle peak load.

E.Set a maximum number of instances to control concurrency and cost.

Answers: B, D, E

Min instances keep containers warm.

Why this answer

Option B is correct because setting a minimum number of instances ensures that Cloud Run always keeps a baseline of warm instances ready to serve requests. This eliminates cold starts for the first requests during traffic spikes, which is critical for unpredictable traffic patterns where latency spikes from cold starts would degrade user experience.

Exam trap

The trap here is that candidates confuse network optimization features (HTTP/2, Cloud NAT) with instance lifecycle management, mistakenly believing they address cold starts or scaling latency when they only affect connection efficiency or outbound connectivity.

51

MCQ · medium

A team notices that a Cloud Run service occasionally has high latency. They suspect a memory leak or excessive CPU usage. Which tool should they use to identify the bottleneck during those periods?

A.Cloud Trace

B.Cloud Monitoring

C.Cloud Logging

D.Cloud Profiler

Answer: D

Cloud Profiler continuously profiles CPU and memory usage to pinpoint bottlenecks.

Why this answer

Cloud Profiler is the correct tool because it continuously gathers CPU and memory usage data from your Cloud Run service, allowing you to identify which functions or code paths are consuming excessive resources during high-latency periods. Unlike monitoring or logging, Profiler provides a flame graph that pinpoints the exact bottleneck (e.g., a memory leak or CPU spike) at the function level, which is essential for diagnosing performance issues in a serverless environment.
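Function-level profiling can be tried locally with Python's built-in cProfile module, which gives a rough feel for the kind of output a profiler produces. This is a stand-in, not the Cloud Profiler agent, and the handler functions are hypothetical.

```python
import cProfile
import io
import pstats


def parse_payload():
    return sum(i * i for i in range(200_000))  # deliberately CPU-heavy


def format_response():
    return "ok"


def handle_request():
    parse_payload()
    return format_response()


profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Print the five entries with the highest cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report attributes nearly all of the time to `parse_payload`, which is the kind of code-level answer that request traces and aggregate metrics do not give.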

Exam trap

The trap here is that candidates often confuse Cloud Trace (distributed tracing) with Cloud Profiler (code-level profiling), assuming that latency tracing alone can identify resource bottlenecks, when in fact only Profiler can pinpoint the specific function causing excessive CPU or memory usage.

How to eliminate wrong answers

Option A is wrong because Cloud Trace is designed for distributed tracing of request latency across services, not for profiling CPU or memory usage within a single service. Option B is wrong because Cloud Monitoring provides metrics and alerts (e.g., request latency, CPU utilization) but does not offer the granular, code-level insight needed to identify a memory leak or excessive CPU usage. Option C is wrong because Cloud Logging captures log entries and error messages but cannot profile resource consumption or show which specific functions are causing the bottleneck.

52

MCQ · easy

Your company runs a web application on Compute Engine behind a global HTTP(S) Load Balancer. You want to improve performance for users in Europe. You have already enabled Cloud CDN. What is the next best action to reduce latency?

A.Switch to a regional load balancer to route traffic more efficiently.

B.Configure a multi-regional Cloud CDN and enable cache warming for popular content.

C.Add Compute Engine instances in a European region (e.g., europe-west1) and add them to the load balancer's backend.

D.Increase the machine type of existing instances to improve processing speed.

Answer: C

Adding instances in Europe reduces the distance between users and servers, lowering latency.

Why this answer

Adding Compute Engine instances in a European region (e.g., europe-west1) and adding them to the load balancer's backend reduces latency by bringing the origin server physically closer to European users. Even with Cloud CDN enabled, cache misses still require a round trip to the backend; having a backend in Europe minimizes that distance. The global HTTP(S) Load Balancer automatically routes requests to the closest healthy backend, so this directly improves performance for users in Europe.
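The routing decision described above can be reduced to a toy model: among healthy backends, pick the one with the lowest round-trip time to the user. The sketch below is a simplification (the real load balancer also considers backend capacity), and `pick_backend` and the round-trip times are made-up illustrations.

```python
def pick_backend(backends, user_region, rtt_ms):
    """Choose the healthy backend with the lowest round-trip time to the user."""
    healthy = [b for b in backends if b["healthy"]]
    return min(healthy, key=lambda b: rtt_ms[(user_region, b["region"])])


# Made-up round-trip times in ms between a user region and a backend region.
rtt_ms = {("europe", "us-central1"): 110, ("europe", "europe-west1"): 15}

us_only = [{"region": "us-central1", "healthy": True}]
with_eu = us_only + [{"region": "europe-west1", "healthy": True}]

print(pick_backend(us_only, "europe", rtt_ms)["region"])
print(pick_backend(with_eu, "europe", rtt_ms)["region"])
```

With only a US backend, every cache miss from Europe pays the long round trip; once a European backend exists, the same selection rule sends those requests to it.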

Exam trap

The trap here is that candidates assume Cloud CDN alone solves all latency issues, forgetting that cache misses still require backend proximity, and that the global load balancer already supports multi-region backends without needing to switch to a regional load balancer.

How to eliminate wrong answers

Option A is wrong because switching to a regional load balancer would actually increase latency for users outside that single region, as it cannot route traffic globally; the global load balancer is already the correct choice for multi-region distribution. Option B is wrong because multi-regional Cloud CDN is not a configurable setting—Cloud CDN is already global by default, and cache warming does not reduce latency for cache misses or dynamic content that bypasses the cache. Option D is wrong because increasing machine type improves processing speed but does not reduce network latency; the bottleneck for European users is geographic distance, not compute capacity.

53

Multi-Select · hard

Which THREE strategies can reduce API latency in Apigee?

Select 3 answers

A.Disable TLS termination to reduce overhead.

B.Enable response compression in the API proxy.

C.Use connection pooling for backend services.

D.Implement caching for frequently accessed responses.

E.Increase the analytics data collection frequency.

Answers: B, C, D

Response compression reduces payload size and transfer time.

Why this answer

Option B is correct because enabling response compression (e.g., gzip) reduces the size of payloads transferred over the network, directly lowering latency by decreasing transmission time. Apigee supports automatic compression via the AssignMessage policy or by setting the Accept-Encoding header, which is especially effective for text-based responses like JSON or XML.
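The size reduction from gzip on a text-based payload is easy to measure with the standard library. The sketch below uses a made-up, repetitive JSON response; it demonstrates the compression effect only and says nothing about how an Apigee proxy is configured.

```python
import gzip
import json

# A repetitive JSON payload, typical of list-style API responses.
payload = json.dumps(
    [{"id": i, "status": "ACTIVE", "region": "europe-west1"} for i in range(500)]
).encode("utf-8")

compressed = gzip.compress(payload)
print(len(payload), len(compressed))
assert gzip.decompress(compressed) == payload  # lossless round trip
```

Structured text with repeated keys compresses well, which is why the benefit is largest for JSON and XML responses.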

Exam trap

Google Cloud often tests the misconception that disabling security features like TLS reduces latency, but the exam expects you to recognize that security is non-negotiable and that the real performance gains come from compression, connection reuse, and caching.

54

MCQ · easy

Which service should be used to monitor the health of HTTP endpoints from multiple locations?

A.Cloud Trace

B.Cloud Logging

C.Cloud Monitoring

D.Cloud Debugger

Answer: C

Correct. Uptime checks in Cloud Monitoring monitor endpoint health.

Why this answer

Cloud Monitoring (formerly Stackdriver Monitoring) includes uptime checks that can be configured to probe HTTP endpoints from multiple global locations, measuring latency, availability, and response content. This makes it the correct service for monitoring the health of HTTP endpoints from diverse geographic regions.
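The logic of a multi-location uptime check can be sketched without any network access. In the example below, `run_uptime_check` is a hypothetical helper, the location names and URL are placeholders, and `fake_fetch` returns canned results in place of real probes; it is not the Cloud Monitoring API.

```python
def run_uptime_check(url, locations, fetch, max_latency_ms=1000):
    """Probe `url` from each location; `fetch` returns (status, latency_ms)."""
    results = {}
    for location in locations:
        status, latency_ms = fetch(url, location)
        results[location] = status == 200 and latency_ms <= max_latency_ms
    return results


def fake_fetch(url, location):
    # Canned responses instead of real network calls from real probe sites.
    canned = {"usa": (200, 120), "europe": (200, 340), "asia": (503, 90)}
    return canned[location]


checks = run_uptime_check(
    "https://example.com/healthz", ["usa", "europe", "asia"], fake_fetch
)
print(checks)
failing = [loc for loc, ok in checks.items() if not ok]
print("healthy" if not failing else "failing from: " + ", ".join(failing))
```

Probing from several locations distinguishes a regional reachability problem (one location failing) from a full outage (all locations failing).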

Exam trap

Google Cloud often tests the distinction between passive monitoring (Cloud Logging) and active synthetic monitoring (Cloud Monitoring uptime checks), leading candidates to mistakenly choose Cloud Logging because they think 'health' implies analyzing logs rather than proactively probing endpoints.

How to eliminate wrong answers

Option A is wrong because Cloud Trace is a distributed tracing service that captures latency data from applications to diagnose performance bottlenecks, not a tool for monitoring HTTP endpoint health from multiple locations. Option B is wrong because Cloud Logging aggregates and stores log data from various sources, but it does not natively perform active health checks or synthetic monitoring of HTTP endpoints. Option D is wrong because Cloud Debugger allows you to inspect the state of a running application without stopping it, but it does not monitor endpoint health or availability from multiple locations.

55

MCQ · easy

An application running on GKE experiences high latency during traffic spikes. The team wants to scale pods based on request latency. Which metric should they use in the HorizontalPodAutoscaler?

A.Custom metric: request count

B.Custom metric: request latency percentile (e.g., p95)

C.Memory utilization

D.CPU utilization

Answer: B

A custom latency metric directly reflects the performance issue and can trigger scaling.

Why this answer

To scale based on request latency, the HorizontalPodAutoscaler (HPA) must use a custom metric that directly reflects the application's response time, such as the p95 latency percentile. CPU or memory utilization are system-level metrics that do not capture the user-perceived performance degradation caused by traffic spikes. Custom metrics like request count correlate with load but not directly with latency, making the p95 latency percentile the correct choice for latency-based autoscaling.
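A short calculation shows why a high percentile is a better latency signal than an average. The sketch below uses the nearest-rank method on made-up latencies; `percentile` is an illustrative helper, and monitoring systems may compute percentiles differently (for example, from histogram buckets).

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in ms: most requests are fast, a few are very slow.
latencies = [50] * 90 + [60] * 4 + [900] * 6
print(sum(latencies) / len(latencies))  # mean
print(percentile(latencies, 50))        # median
print(percentile(latencies, 95))        # p95
```

The median stays at 50 ms and the mean is about 101 ms, while the p95 is 900 ms, so only the percentile exposes the slow tail that users actually notice.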

Exam trap

Google Cloud often tests the misconception that CPU or memory utilization are sufficient for scaling based on performance metrics, but the trap here is that resource metrics do not capture user-facing latency, which is the actual performance indicator the team wants to optimize.

How to eliminate wrong answers

Option A is wrong because request count is a measure of traffic volume, not latency; scaling on request count may trigger scaling before latency increases, but it does not directly address the goal of reducing high latency during spikes. Option C is wrong because memory utilization is a resource metric that does not reflect request latency; an application can have high latency without high memory usage, and scaling on memory would not solve latency issues. Option D is wrong because CPU utilization is also a resource metric that does not directly correlate with request latency; an application may experience high latency due to I/O waits or lock contention without high CPU usage, so scaling on CPU would be ineffective.

56

MCQ · medium

Refer to the exhibit. A DevOps engineer observes that a GKE cluster's node performance is degraded during high I/O workloads. Based on the exhibit, which change would most likely improve disk I/O performance?

A.Change machineType to n2-standard-4

B.Change imageType to UBUNTU

C.Change serviceAccount to a custom one

D.Change diskType to pd-ssd

Answer: D

Correct. pd-ssd offers significantly better I/O performance.

Why this answer

The exhibit shows a GKE node pool using the default pd-standard disk type, which is backed by HDDs and has lower IOPS and throughput compared to SSDs. Changing diskType to pd-ssd directly improves disk I/O performance by providing higher IOPS and lower latency, which is critical for high I/O workloads.

Exam trap

Google Cloud often tests the misconception that changing machine type or OS image can fix disk I/O issues, when the real bottleneck is the underlying disk type (pd-standard vs. pd-ssd).

How to eliminate wrong answers

Option A is wrong because changing machineType to n2-standard-4 increases CPU and memory resources but does not address the underlying disk I/O bottleneck caused by the disk type. Option B is wrong because changing imageType to UBUNTU only alters the OS image, not the disk performance characteristics; the disk type remains pd-standard. Option C is wrong because changing serviceAccount to a custom one affects authentication and authorization, not disk I/O performance.

57

Multi-Selecthard

A company runs a microservices architecture on GKE and notices high network latency between services. Which THREE actions can improve inter-service communication performance?

Select 3 answers

A.Use Istio mTLS for all service-to-service communication

B.Increase the number of nodes in the cluster

C.Implement caching at the API gateway to reduce redundant requests

D.Enable HTTP/2 and gRPC for inter-service communication

E.Use headless services for direct pod-to-pod communication

AnswersC, D, E

Caching at the gateway avoids repeated processing, lowering latency for frequent requests.

Why this answer

Option C is correct because implementing caching at the API gateway reduces redundant requests, which directly decreases network round trips and lowers latency for repeated data retrievals. This is a common performance optimization in microservices architectures, as it offloads backend services and minimizes inter-service chatter.

Exam trap

Google Cloud often tests the misconception that security features like mTLS always improve performance, when in reality they add computational cost, and that scaling nodes always reduces latency, ignoring the potential for increased cross-node traffic.

58

MCQeasy

A company runs a stateful workload on Compute Engine VMs with persistent disks. They observe that disk I/O latency spikes periodically. The workload is sensitive to latency. What should they do to improve performance?

A.Increase the size of the persistent disk.

B.Migrate to local SSDs for better performance.

C.Use SSD persistent disks instead of standard persistent disks.

D.Configure a snapshot schedule to offload I/O.

AnswerC

SSD offers lower latency and higher IOPS.

Why this answer

Option C is correct because SSD persistent disks provide consistent, low-latency I/O performance compared to standard persistent disks, which use spinning media and can exhibit periodic latency spikes under sustained load. For latency-sensitive stateful workloads, SSD persistent disks offer predictable IOPS and throughput, directly addressing the periodic spikes observed.

Exam trap

Google Cloud often tests the misconception that increasing disk size or using local SSDs is the universal fix for latency, but the trap here is failing to recognize that the workload is stateful and requires persistent storage, making local SSDs inappropriate despite their performance.

How to eliminate wrong answers

Option A is wrong because increasing the size of a persistent disk raises its IOPS and throughput limits but does not remove the latency variability of standard persistent disks; the periodic spikes come from the disk type, not its capacity. Option B is wrong because local SSDs provide very low latency but are ephemeral (data is lost if the VM is stopped or deleted), making them unsuitable for stateful workloads that need data to persist across VM lifecycles. Option D is wrong because a snapshot schedule does not offload any I/O from the disk; snapshots are a backup mechanism, not a performance feature, and they do nothing to prevent latency spikes during normal operation.

59

MCQhard

A team wants to optimize a batch processing job that is CPU-bound. Which Compute Engine machine family should they use?

A.C2

B.E2

C.N2

D.M2

AnswerA

Correct. C2 is Compute-optimized for CPU-bound workloads.

Why this answer

C2 is the correct machine family because it is specifically designed for compute-intensive, CPU-bound workloads. It offers the highest clock speed and per-core performance among Compute Engine machine families, making it ideal for batch processing jobs that are limited by CPU throughput rather than memory or I/O.

Exam trap

Google Cloud often tests the distinction between general-purpose (N2, E2) and specialized families (C2, M2), and the trap here is that candidates pick N2 thinking it is 'balanced' for all workloads, failing to recognize that CPU-bound jobs require the dedicated high-frequency compute of the C2 family.

How to eliminate wrong answers

Option B (E2) is wrong because it is a general-purpose, cost-optimized machine family that uses shared-core or smaller dedicated cores, which cannot sustain the high CPU utilization required for CPU-bound batch processing. Option C (N2) is wrong because it is a general-purpose machine family balanced for memory and CPU, but it does not provide the high-frequency CPUs or optimized compute performance of the C2 family. Option D (M2) is wrong because it is a memory-optimized machine family designed for large in-memory databases and memory-intensive workloads, not for CPU-bound tasks.

60

MCQmedium

A team is using Cloud Run for a containerized application. They notice that requests have high latency due to cold starts. Which configuration change would most effectively reduce cold start latency?

A.Set the min-instances parameter to a value > 0

B.Enable concurrent requests

C.Increase the function timeout

D.Reduce the container image size

AnswerA

Correct. Min-instances keeps instances warm.

Why this answer

Setting the `min-instances` parameter to a value greater than 0 keeps a baseline number of container instances warm and ready to serve requests. Requests routed to those instances avoid the cold start penalty because the container and its dependencies are already initialized and listening for traffic. Cloud Run never scales below this minimum, so traffic within the baseline capacity reaches pre-warmed instances instead of triggering a new container startup.

Exam trap

The trap here is that candidates often think reducing container image size (Option D) is the most effective solution, but while it reduces startup time, it does not prevent cold starts entirely, whereas min-instances eliminates them for the baseline count.

How to eliminate wrong answers

Option B is wrong because enabling concurrent requests allows a single container instance to handle multiple requests simultaneously, which improves throughput and resource utilization but does not address the latency caused by starting a new container from scratch. Option C is wrong because increasing the function timeout only extends the maximum duration a request can run; it does not reduce the time it takes for the first request to be processed after a period of inactivity. Option D is wrong because reducing the container image size can speed up the image pull and startup process, but it does not eliminate the cold start entirely; the instance still needs to be provisioned and the runtime initialized, whereas min-instances keeps instances pre-provisioned.

61

Multi-Selecthard

An application running on GKE experiences high tail latency. The team is optimizing performance. Which THREE techniques should they consider? (Choose three.)

Select 3 answers

A.Use pod anti-affinity to spread pods across nodes

B.Set proper resource requests and limits to avoid resource contention

C.Enable Istio sidecar injection for all pods

D.Use pod affinity to pack pods on same node

E.Increase the number of replicas for stateless services

AnswersA, B, E

Spread reduces resource competition and improves performance.

Why this answer

Pod anti-affinity spreads pods across different nodes, reducing the risk of a single node becoming a hotspot and causing contention for resources like CPU, memory, or network bandwidth. By distributing pods, you minimize queuing delays and improve tail latency, as no single node is overloaded with too many pods competing for the same resources.

Exam trap

Google Cloud often tests the misconception that packing pods together (affinity) improves performance by reducing network hops, but in practice, it increases contention and tail latency, while anti-affinity spreads load and improves predictability.

62

MCQeasy

An application running on App Engine standard environment has high instance startup latency, leading to slow first requests. What is the most effective configuration change to reduce cold starts?

A.Use manual scaling instead of automatic scaling

B.Use Cloud Endpoints for request authentication

C.Set min_idle_instances to a value greater than 0

D.Increase the memory limit per instance

AnswerC

This ensures that a pool of warm instances is always available, reducing cold start latency.

Why this answer

Setting a minimum number of idle instances ensures that instances are always ready to serve traffic, avoiding startup delays on first requests. Manual scaling (A) requires manual capacity planning; Cloud Endpoints (B) is for API management; increasing memory (D) may not reduce startup time.

63

MCQhard

You are using Cloud CDN with an external HTTPS load balancer. Users in Asia report slow load times for static assets. The origin is in us-central1. What should you do to improve performance?

A.Switch the load balancer to an internal HTTPS load balancer with gRPC.

B.Use premium tier networking for the load balancer.

C.Enable Cloud CDN and configure cache modes for static content.

D.Configure a serverless NEG to route traffic to Cloud Functions.

AnswerC

CDN caches content at edge locations, reducing latency.

Why this answer

Cloud CDN caches static assets at Google's global edge locations, reducing latency for users in Asia by serving content from a nearby point of presence instead of the us-central1 origin. Enabling Cloud CDN and configuring cache modes for static content (e.g., setting Cache-Control headers or using origin cache policies) ensures that frequently requested assets are served from cache, dramatically improving load times for geographically distant users.

Exam trap

Google Cloud often tests the distinction between network-level optimizations (like premium tier) and application-level caching (like CDN), and the trap here is assuming that faster routing alone can solve latency for repeated static asset requests, when in fact caching eliminates the need for those requests to reach the origin at all.

How to eliminate wrong answers

Option A is wrong because switching to an internal HTTPS load balancer with gRPC would restrict traffic to within a VPC network, making it inaccessible to external users in Asia, and gRPC does not inherently improve latency for static assets. Option B is wrong because premium tier networking optimizes routing between Google Cloud regions and the internet but does not cache content; it reduces network hop latency but cannot eliminate the round-trip time to the origin for every request. Option D is wrong because configuring a serverless NEG to route traffic to Cloud Functions would introduce additional compute overhead and is designed for dynamic content or API backends, not for caching or accelerating static asset delivery.
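
The effect of caching on distant users can be estimated with a simple weighted average: a fraction of requests is served from the edge and the rest travel to the origin. The sketch below is only a model under stated assumptions; the 20 ms edge and 250 ms origin latencies are hypothetical figures, not measurements from the scenario.

```python
def expected_latency_ms(hit_ratio, edge_ms, origin_ms):
    """Average latency when hit_ratio of requests are served from the edge cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * edge_ms + (1.0 - hit_ratio) * origin_ms


# Hypothetical latencies for a user in Asia and an origin in us-central1.
EDGE_MS = 20.0
ORIGIN_MS = 250.0

for ratio in (0.0, 0.5, 0.9):
    print(f"hit ratio {ratio:.0%}: {expected_latency_ms(ratio, EDGE_MS, ORIGIN_MS):.0f} ms")
```

In this model a faster network path (Option B) only lowers the origin term, whereas a higher cache hit ratio moves most requests onto the much smaller edge term.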

64

MCQhard

Refer to the exhibit. A team uses these Compute Engine instances to run a batch processing job. The job frequently gets killed on instance-3. What is the most likely cause?

A.Instance-3 has insufficient CPU.

B.Instance-3 has a corrupted disk.

C.Instance-3 is preemptible and gets terminated by Google Cloud.

D.Instance-3 has insufficient memory.

AnswerC

Preemptible instances can be terminated at any time and always stop within 24 hours, causing job interruptions.

Why this answer

Instance-3 is a preemptible VM, which means Google Cloud can terminate it at any time if its resources are needed elsewhere. Preemptible instances have a maximum runtime of 24 hours and are subject to termination with a 30-second warning. The batch processing job being frequently killed on instance-3 is a classic symptom of preemptible VM termination, not a resource exhaustion or disk issue.

Exam trap

Google Cloud often tests the distinction between preemptible VM termination and resource exhaustion (CPU/memory/disk) by presenting a scenario where a job is 'killed' — candidates may incorrectly attribute this to insufficient resources rather than recognizing the preemptible VM behavior.

How to eliminate wrong answers

Option A is wrong because insufficient CPU would cause the job to run slowly or fail with a resource-exhausted error, not be killed abruptly by the system. Option B is wrong because a corrupted disk would manifest as I/O errors, data corruption, or boot failures, not as the job being killed without disk-related error messages. Option D is wrong because insufficient memory would cause out-of-memory (OOM) kills logged in the system, which would show as the process being terminated by the kernel, not as the instance being stopped by Google Cloud.
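
A batch job that must run on preemptible VMs can use the 30-second warning to stop cleanly and record where to resume. The sketch below shows one way this might look; the `CheckpointingJob` class and `install_handler` function are hypothetical, and it assumes the guest OS delivers SIGTERM to the process during shutdown.

```python
import signal


class CheckpointingJob:
    """Processes items in order and remembers where to resume after a stop."""

    def __init__(self, items, start_index=0):
        self.items = list(items)
        self.next_index = start_index  # checkpoint: index of the first unprocessed item
        self.stop_requested = False
        self.processed = []

    def request_stop(self, signum=None, frame=None):
        # Signature matches what signal.signal() expects for a handler.
        self.stop_requested = True

    def run(self):
        while self.next_index < len(self.items) and not self.stop_requested:
            self.processed.append(self.items[self.next_index])  # placeholder for real work
            self.next_index += 1
        return self.next_index  # value to persist so a later run can resume here


def install_handler(job):
    # Assumption: the guest OS sends SIGTERM to this process when the VM shuts down.
    signal.signal(signal.SIGTERM, job.request_stop)
```

In a real job the returned index would be written to durable storage so that a replacement VM can construct the job with `start_index` set to the saved value.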

65

MCQhard

A financial services company runs a real-time trading application on GKE with 10 microservices. The application uses Cloud Spanner as the database. Recently, the team noticed increased latency during peak trading hours. Cloud Monitoring shows high CPU utilization on the Spanner nodes (averaging 80%) and increased locking contention. The team has already added secondary indexes and tuned queries. The application's latency budget is 50ms for writes and 20ms for reads. The team must reduce latency while maintaining strong consistency and meeting the budget. What should they do?

A.Increase the number of Spanner nodes to reduce contention and CPU load

B.Change the application to use eventual consistency for read operations

C.Migrate the database to Cloud Bigtable for higher throughput

D.Implement a write buffer using Cloud Pub/Sub and batch writes to Spanner

AnswerA

More nodes improve throughput and reduce locking contention, meeting latency budgets without sacrificing consistency.

Why this answer

Increasing the number of Spanner nodes directly addresses the root cause: high CPU utilization (80%) and locking contention. More nodes distribute the read/write load, reducing per-node CPU and contention, which lowers latency. This maintains strong consistency and meets the 50ms write / 20ms read budget without architectural changes.

Exam trap

Google Cloud often tests the misconception that adding nodes only helps with storage or throughput, not latency; in Spanner, more nodes reduce CPU contention and lock waits, directly improving latency under high load.

How to eliminate wrong answers

Option B is wrong because changing to eventual consistency violates the requirement for strong consistency, which is non-negotiable for a real-time trading application. Option C is wrong because Cloud Bigtable is a NoSQL wide-column store without Spanner's multi-row ACID transactions or relational schema, so migrating would mean redesigning a transactional trading system and weakening its consistency guarantees. Option D is wrong because a write buffer with Pub/Sub and batch writes would increase write latency beyond the 50ms budget and could introduce data staleness, violating strong consistency.

66

Multi-Selectmedium

A DevOps team is investigating performance issues in their GKE cluster. They want to use Cloud Profiler to identify the bottleneck. Which three steps are required to start profiling? (Select THREE)

Select 3 answers

A.Configure IAM permissions

B.Deploy the profiler agent to the application container

C.Enable Cloud Profiler API

D.Install a sidecar proxy

E.Modify the application code to include profiling endpoints

AnswersA, B, C

The agent needs roles/cloudprofiler.agent.

Why this answer

A is correct because Cloud Profiler requires the `cloudprofiler.agent` IAM role (or equivalent permissions) on the service account used by the GKE node or application to allow the agent to write profiling data to the Cloud Profiler API. Without this permission, the agent cannot upload profiles, and no data will appear in the console.

Exam trap

Google Cloud often tests the misconception that Cloud Profiler requires code modifications or sidecar proxies, when in fact it uses a lightweight agent that requires only API enablement, IAM permissions, and agent deployment.

67

MCQmedium

A DevOps team uses Cloud Run for a containerized application that processes real-time financial data. The service has a concurrency setting of 80, and instances are scaled based on CPU usage. During market volatility, the service experiences high latency and some requests timeout. Cloud Monitoring shows that the average CPU utilization is 40%, but the instance count spikes to the maximum allowed. What is the most likely cause?

A.The concurrency setting is too low, causing many instances to be created.

B.The max instances limit is set too low, causing requests to queue.

C.The service uses too much memory, causing cold starts.

D.The CPU utilization target for autoscaling is set too high, causing slow scaling.

AnswerA

Low concurrency increases instance count, each handling few requests, causing underutilization.

Why this answer

With a concurrency setting of 80, each instance handles up to 80 simultaneous requests. Once the number of in-flight requests exceeds 80 per instance, Cloud Run starts new instances. During market volatility the request volume spikes, so the instance count hits the maximum even though average CPU utilization is only 40%.

This indicates that the concurrency limit is too low for the burst traffic, forcing excessive instance creation and leading to high latency and timeouts due to instance startup overhead.

Exam trap

Google Cloud often tests the misconception that CPU utilization is the primary driver of Cloud Run scaling, when in fact concurrency settings and request queuing are the dominant factors in burst scenarios.

How to eliminate wrong answers

Option B is wrong because if the max instances limit were set too low, requests would be queued or rejected, but the symptom here is that instance count spikes to the maximum allowed, not that it is capped prematurely. Option C is wrong because memory issues or cold starts would manifest as increased startup latency or out-of-memory errors, not as high instance count with low average CPU utilization. Option D is wrong because a CPU utilization target set too high would cause the autoscaler to be slow to add instances, leading to sustained high CPU and potential queuing, whereas here instances are being added aggressively despite low average CPU.
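
The relationship between concurrency and instance count can be approximated with Little's law: requests in flight equal arrival rate times average latency. The sketch below is a simplified model, not Cloud Run's actual autoscaling algorithm, and the traffic figures and the alternative concurrency of 200 are hypothetical.

```python
import math


def instances_needed(requests_per_second, avg_latency_s, concurrency):
    """Estimate instances required so no instance exceeds its concurrency limit."""
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    in_flight = requests_per_second * avg_latency_s  # Little's law
    return math.ceil(in_flight / concurrency)


# Hypothetical burst: 4000 requests per second at 0.5 s average latency.
print(instances_needed(4000, 0.5, 80))   # concurrency from the scenario
print(instances_needed(4000, 0.5, 200))  # a higher, hypothetical setting
print(instances_needed(4000, 2.0, 80))   # same load once latency has degraded
```

The third call shows the feedback loop in this model: as latency rises, more requests are in flight at once, which pushes the required instance count toward the configured maximum.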

68

MCQeasy

Refer to the exhibit. A team runs a batch processing job on these instances. The job is CPU-bound and can tolerate interruptions. Which instance is the most cost-effective for this workload?

A.instance-3

B.None, they should use a different machine type

C.instance-1

D.instance-2

AnswerC

Correct. Preemptible instance with sufficient CPU at low cost.

Why this answer

Instance-1 is the most cost-effective because it is a preemptible (or spot) VM, which is significantly cheaper than standard on-demand instances. Since the batch processing job is CPU-bound and can tolerate interruptions, preemptible instances are ideal for this workload, offering 60-91% cost savings while still providing the necessary compute capacity.

Exam trap

Google Cloud often tests the misconception that any preemptible instance is automatically the best choice, but the trap here is that candidates might overlook whether the workload can actually tolerate interruptions or whether the specific instance type (e.g., with GPUs or high memory) is over-provisioned for a CPU-bound job.

How to eliminate wrong answers

Option A is wrong because instance-3 is likely a standard on-demand or reserved instance, which costs more than preemptible options and is not the most cost-effective for an interruption-tolerant, CPU-bound batch job. Option B is wrong because preemptible instances (like instance-1) are specifically designed for fault-tolerant, batch workloads, so a different machine type is unnecessary. Option D is wrong because instance-2 might be a preemptible instance with a higher machine type or additional resources (e.g., GPUs or more vCPUs) that are not needed for a CPU-bound job, leading to unnecessary cost.

69

MCQmedium

A team uses Cloud Load Balancing with backend NEGs. Users report intermittent high latency. How should they diagnose the root cause effectively?

A.Increase the number of backend instances immediately

B.Enable Cloud Monitoring latency histogram for the load balancer

C.Check Cloud CDN cache hit ratio

D.Use Cloud Trace to analyze per-request latency spans

AnswerD

Cloud Trace captures latency for each request across distributed services, enabling identification of slow components.

Why this answer

Cloud Trace provides end-to-end latency analysis by capturing per-request spans as they traverse the load balancer, backend NEGs, and other services. This allows you to pinpoint exactly which hop (e.g., load balancer processing, backend queuing, or application code) is causing the intermittent high latency, rather than relying on aggregate metrics or caching assumptions.

Exam trap

Google Cloud often tests the distinction between aggregate monitoring (like histograms or cache ratios) and distributed tracing for diagnosing intermittent, per-request performance issues, leading candidates to choose a simpler metric-based option instead of the more precise tracing tool.

How to eliminate wrong answers

Option A is wrong because blindly increasing backend instances treats a symptom (high latency) without diagnosing its cause; it may waste resources if the latency is due to network congestion, misconfigured timeouts, or a specific backend bottleneck. Option B is wrong because Cloud Monitoring latency histograms show aggregate latency distributions but cannot isolate which specific request or component is responsible for intermittent spikes; they lack per-request span-level granularity. Option C is wrong because Cloud CDN cache hit ratio only affects cacheable content; intermittent high latency for dynamic or uncacheable requests would not be explained by cache misses, and CDN metrics do not reveal backend processing delays.

70

MCQhard

Refer to the exhibit. The team wants to reduce the service's p50 latency from 2 seconds to under 500ms. Which optimization would have the most impact?

A.Increase the number of service instances

B.Optimize processOrder() by reducing logging

C.Optimize getCustomerData() by caching customer data

D.Optimize saveToDatabase() by using batch writes

AnswerC

Caching eliminates the 1200ms function call, potentially reducing total time by over 50%.

Why this answer

The exhibit shows that getCustomerData() is the most time-consuming operation, taking 1.2 seconds out of the total 2-second p50 latency. Caching customer data eliminates repeated expensive lookups (e.g., database queries or external API calls), directly reducing the critical path latency. This optimization targets the largest bottleneck, making it the most impactful for achieving sub-500ms p50.

Exam trap

Google Cloud often tests the misconception that horizontal scaling or optimizing non-critical paths (like logging) can significantly reduce p50 latency, when in fact the largest single bottleneck must be addressed first.

How to eliminate wrong answers

Option A is wrong because increasing service instances (horizontal scaling) reduces throughput bottlenecks but does not reduce per-request latency; it may even add network overhead. Option B is wrong because optimizing processOrder() by reducing logging saves only a few milliseconds, not the ~1.2 seconds needed to meet the target. Option D is wrong because saveToDatabase() using batch writes improves throughput for bulk operations but does not reduce the latency of a single request's synchronous write path.
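
The arithmetic behind this choice can be checked with the two figures quoted from the exhibit (2000 ms total, 1200 ms in getCustomerData()). The sketch below treats the request as a simple sum of sequential steps, which is an approximation for percentile values, and the 5 ms cache-hit cost is a hypothetical number.

```python
TOTAL_P50_MS = 2000          # from the question
GET_CUSTOMER_DATA_MS = 1200  # from the explanation above
TARGET_MS = 500              # from the question


def latency_after_optimizing(total_ms, step_ms, optimized_step_ms):
    """Total latency if one sequential step is replaced by a faster version."""
    if optimized_step_ms > step_ms:
        raise ValueError("optimized step should not be slower than the original")
    return total_ms - step_ms + optimized_step_ms


CACHE_HIT_MS = 5  # hypothetical cost of a cache lookup

after_cache_ms = latency_after_optimizing(TOTAL_P50_MS, GET_CUSTOMER_DATA_MS, CACHE_HIT_MS)
share_of_total = GET_CUSTOMER_DATA_MS / TOTAL_P50_MS

print(f"getCustomerData() share of latency: {share_of_total:.0%}")
print(f"estimated latency with caching: {after_cache_ms} ms (target {TARGET_MS} ms)")
```

Under this model caching removes roughly 60% of the latency, more than any other single change, although the remaining steps would still need attention to get below the 500 ms target.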

71

MCQeasy

A company notices increased latency for their web application running on Compute Engine. They suspect a database bottleneck. Which Google Cloud service should they use to identify slow queries?

A.Cloud Logging

B.Cloud Debugger

C.Cloud Trace

D.Cloud SQL Query Insights

AnswerD

Cloud SQL Query Insights provides self-service, intelligent query diagnostics.

Why this answer

Cloud SQL Query Insights is the correct choice because it is a Google Cloud managed service specifically designed to identify and analyze database performance issues, including slow queries, in Cloud SQL instances. It provides detailed query performance metrics, execution plans, and recommendations to optimize database queries, directly addressing the bottleneck in a Compute Engine web application.

Exam trap

The trap here is that candidates often confuse Cloud Trace (which traces request-level latency across services) with database-specific query analysis, but Cloud Trace does not provide the granular SQL-level insights needed to identify slow queries in a database.

How to eliminate wrong answers

Option A is wrong because Cloud Logging aggregates and stores log data from various sources, but it does not provide built-in query analysis or performance insights for databases; it would require manual log parsing to identify slow queries. Option B is wrong because Cloud Debugger is used to inspect application code state at runtime for debugging purposes, not for analyzing database query performance or identifying slow queries. Option C is wrong because Cloud Trace is a distributed tracing service that captures latency data across microservices and HTTP requests, but it does not offer database-specific query analysis or insights into slow SQL queries.

72

MCQmedium

A company has a stateful application deployed on a GKE cluster with stateful sets using persistent volumes. The application is experiencing higher than expected latency for write operations. The team uses SSDs for persistent disks. Cloud Monitoring shows high disk queue depth on the nodes where the stateful pods are scheduled. Which of the following is the most effective optimization?

A.Configure a separate node pool with local SSDs for the stateful workloads.

B.Increase the number of replicas of the stateful set.

C.Enable disk caching on the persistent disks.

D.Use regional persistent disks for higher throughput.

AnswerC

Disk caching can significantly reduce I/O latency if supported by the workload.

Why this answer

Option C is correct because enabling read/write caching on persistent disks reduces write latency by buffering writes to the local instance's SSD before acknowledging them to the application. This directly addresses the high disk queue depth observed in Cloud Monitoring, as caching absorbs bursty write I/O and lowers queue depth. For stateful workloads on GKE with SSDs, disk caching is a standard optimization to improve write performance without changing the underlying disk type.

Exam trap

Google Cloud often tests the misconception that local SSDs are always better for performance, but the trap here is that local SSDs lack data persistence, making them inappropriate for stateful sets that require durable storage across pod lifecycle events.

How to eliminate wrong answers

Option A is wrong because local SSDs are ephemeral and do not persist data across pod rescheduling or node failures, making them unsuitable for stateful sets that require durable persistent volumes; they also do not support the same caching mechanisms as persistent disks. Option B is wrong because increasing the number of replicas does not reduce write latency for a single stateful pod; it only distributes read traffic and may increase contention on shared backend storage. Option D is wrong because regional persistent disks provide higher availability through synchronous replication across zones, but they do not inherently improve throughput or reduce write latency compared to zonal persistent disks; in fact, replication adds write latency overhead.

73

Multi-Selectmedium

A team is optimizing the performance of their application running on Cloud Run. They want to reduce cold starts. Which two actions would help? (Select TWO)

Select 2 answers

A.Enable min instances

B.Increase the maximum number of container instances

C.Increase the CPU limit

D.Use a custom container base image with reduced size

E.Enable HTTP/2

AnswersA, D

Keeps a baseline of warm instances, avoiding cold starts.

Why this answer

Enabling min instances (option A) keeps a baseline number of container instances warm and ready to serve requests, so requests handled by those instances incur no cold start. When traffic rises, the pre-warmed instances can serve requests immediately while any additional instances start up.

Exam trap

Google Cloud often tests the misconception that increasing resource limits (like CPU or memory) or scaling parameters (like max instances) can reduce cold starts, when in fact only pre-warming instances (min instances) and reducing container image size (option D) directly address the startup latency.

74

MCQeasy

You are a DevOps engineer at a media streaming company. Your application runs on Google Kubernetes Engine (GKE) and serves video content to users worldwide. The application uses a microservices architecture with a frontend service that handles user requests and a backend transcoding service that converts video files. Recently, you noticed that the transcoding service is causing performance bottlenecks during peak hours, leading to increased latency for users. You have enabled Cloud Monitoring and Cloud Trace and observed that the transcoding service's CPU utilization is consistently above 90% during peak times, and the queue of video transcoding tasks is growing. The current deployment has 5 replicas of the transcoding service with no autoscaling. You need to optimize the performance of the transcoding service to reduce latency. Your company has a limited budget and wants to minimize costs. What should you do?

A.Enable Horizontal Pod Autoscaling (HPA) on the transcoding service based on CPU utilization, targeting 70% utilization.

B.Upgrade the transcoding service to a larger machine type with more CPU and memory.

C.Increase the number of replicas of the transcoding service to 10 and keep it static.

D.Refactor the frontend to push transcoding tasks to a Cloud Pub/Sub topic, and create a separate deployment of workers that subscribe to the topic and perform transcoding. Configure HPA on the worker deployment based on the Pub/Sub subscription backlog.

AnswerD

This decouples the frontend from the transcoding, preventing blocking. Workers can scale based on queue depth, optimizing cost and performance.

Why this answer

Option D is correct because it decouples the transcoding workload from user-facing requests using Cloud Pub/Sub, allowing the worker deployment to scale independently based on the backlog of tasks. This pattern reduces latency by preventing the frontend from being blocked by transcoding, and HPA on Pub/Sub backlog ensures cost-efficient scaling only when demand increases, aligning with the limited budget.

Exam trap

Google Cloud often tests the misconception that CPU-based HPA is sufficient for all performance bottlenecks, but the trap here is that CPU-bound services with growing queues require decoupling and backlog-based scaling, not just more replicas or larger machines.

How to eliminate wrong answers

Option A is wrong because CPU utilization is only an indirect signal for this workload. It saturates near 100% no matter how long the task queue grows, so CPU-based HPA cannot scale in proportion to the actual backlog, and it leaves transcoding coupled to the frontend's request path. Option B is wrong because a larger machine type raises cost at all hours, does not scale with demand, and does not address the architectural coupling between frontend and transcoding. Option C is wrong because a static 10 replicas over-provisions during off-peak hours, cannot adapt to variable demand, and may still fall short at peak.
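For comparison, the CPU-based rule in option A can be worked through numerically. The sketch below uses the replica formula documented for the Kubernetes Horizontal Pod Autoscaler (current replicas multiplied by the ratio of observed to target utilization, rounded up, with a default tolerance of 0.1). The function name is hypothetical, and 90% is taken as a lower bound from the scenario's "consistently above 90%".

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float,
                         tolerance: float = 0.1) -> int:
    """Replica count from the documented HPA ratio formula.

    If the ratio of observed to target utilization is within the
    tolerance of 1.0, the replica count is left unchanged.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Scenario values: 5 replicas at 90% CPU, 70% target.
replicas = hpa_desired_replicas(5, 90, 70)  # ceil(5 * 90 / 70) = 7
```

Since utilization cannot exceed 100%, a single evaluation grows 5 replicas to at most ceil(5 * 100 / 70) = 8, however deep the queue is. The signal says the pods are busy, not how much work is waiting.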

75

MCQ · medium

An organization uses Cloud Armor to protect their web application. After enabling the service, they notice increased latency on some requests. Which Cloud Armor feature is most likely causing this?

A.Rate limiting

B.IP blacklist/whitelist

C.Pre-configured WAF rules

D.Geo-based access control

Answer: D

Checking geographic location involves IP database lookup, which can increase latency.

Why this answer

Geo-based access control (D) is the most likely cause of increased latency because it requires Cloud Armor to perform a GeoIP lookup on every request to determine the geographic origin. This lookup adds processing overhead, especially if the organization has a large or complex set of geo-based rules, which can introduce measurable delay.

Exam trap

The trap here is that candidates often assume all security features add latency equally, but this question treats per-request GeoIP lookups as more expensive than simple IP or rate-limit checks.

How to eliminate wrong answers

Option A is wrong because rate limiting typically reduces latency by dropping or throttling excess requests, not increasing it. Option B is wrong because IP blacklist/whitelist checks are simple, fast lookups in a small list that add negligible latency. Option C is wrong because pre-configured WAF rules (e.g., OWASP Top 10) are evaluated efficiently by Cloud Armor's edge infrastructure and are not a primary source of added latency.
