CCNA Optimizing service performance Questions

76

MCQ · medium

Which storage class provides the lowest cost for data accessed less than once a year?

A. Nearline

B. Archive

C. Standard

D. Coldline

Answer: B

Correct. Archive is for data accessed less than once a year, at lowest cost.

Why this answer

Archive storage class is the correct answer because it is specifically designed for long-term data retention where access is extremely infrequent, such as less than once a year. It offers the lowest storage cost among Google Cloud Storage classes, but with higher retrieval costs and a minimum storage duration of 365 days, making it ideal for data that is rarely accessed.

Exam trap

Google Cloud often tests the distinction between Coldline and Archive by making candidates confuse the minimum storage duration (90 days for Coldline vs. 365 days for Archive) and the access frequency thresholds (once a quarter vs. once a year), leading them to pick Coldline when Archive is the correct lowest-cost option for data accessed less than once a year.

How to eliminate wrong answers

Option A (Nearline) is wrong because it is optimized for data accessed less than once a month, not less than once a year, and has a higher storage cost than Archive. Option C (Standard) is wrong because it is designed for frequently accessed data (hot data) and has the highest storage cost among the options. Option D (Coldline) is wrong because it is intended for data accessed less than once a quarter (90 days), still more frequent than once a year, and its storage cost is higher than Archive.
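
The access-frequency thresholds above can be sketched as a small lookup. This is only an illustration of the rule of thumb in the explanation; the function name `pick_storage_class` and its `accesses_per_year` parameter are hypothetical, and a real decision would also weigh retrieval costs and minimum storage durations.

```python
def pick_storage_class(accesses_per_year: float) -> str:
    """Map an expected access frequency to a Cloud Storage class name.

    Thresholds follow the rule of thumb in the explanation above; they are
    a simplification, not a pricing calculation.
    """
    if accesses_per_year < 1:    # less than once a year
        return "ARCHIVE"
    if accesses_per_year < 4:    # less than once a quarter
        return "COLDLINE"
    if accesses_per_year < 12:   # less than once a month
        return "NEARLINE"
    return "STANDARD"            # frequently accessed (hot) data
```

For example, data read about twice a year maps to Coldline, while data read once every two years maps to Archive.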

77

MCQ · medium

A team is running a stateful application on Compute Engine VMs. They notice that the application performance degrades over time as the disk fills up. They want to proactively alert before performance degrades. Which metric should they monitor?

A. Disk usage percentage

B. Disk read/write latency

C. Network sent bytes

D. CPU utilization

Answer: A

Monitors disk capacity, enabling early alerts.

Why this answer

Disk usage percentage is the correct metric because the application performance degrades as the disk fills up, which is a capacity issue. Monitoring disk usage percentage allows the team to set an alert threshold (e.g., 80% or 90%) to proactively take action (e.g., clean up logs or resize disks) before the disk becomes full and causes performance degradation. This directly addresses the root cause described in the scenario.

Exam trap

The trap here is that candidates may confuse a capacity metric (disk usage percentage) with a performance metric (disk latency), assuming latency is the best indicator of degradation, but the question explicitly ties the degradation to the disk filling up, making capacity the direct cause to monitor.

How to eliminate wrong answers

Option B is wrong because disk read/write latency measures the time taken for I/O operations, which can indicate performance issues but does not directly reflect the disk filling up; latency may increase due to other factors like contention or hardware failure, not solely capacity. Option C is wrong because network sent bytes tracks outbound traffic, which is unrelated to disk space consumption and performance degradation from a full disk. Option D is wrong because CPU utilization measures processor load, which can degrade application performance but is not the specific cause described—the problem is explicitly tied to the disk filling up, not CPU saturation.
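
A capacity threshold check of the kind described above might look like the following sketch. It runs locally with Python's standard library rather than through Cloud Monitoring; the function names and the 80% default are illustrative assumptions.

```python
import shutil


def disk_usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Return used capacity as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def should_alert(used_bytes: int, total_bytes: int,
                 threshold_percent: float = 80.0) -> bool:
    """True once usage reaches the alert threshold (80% by default)."""
    return disk_usage_percent(used_bytes, total_bytes) >= threshold_percent


def check_path(path: str = "/", threshold_percent: float = 80.0) -> bool:
    """Apply the threshold to the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return should_alert(usage.used, usage.total, threshold_percent)
```

Alerting at 80% rather than at 100% is what makes the alert proactive: the team gets time to clean up or resize before the disk is full.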

78

MCQ · easy

A DevOps engineer is optimizing a Cloud Run service that experiences cold starts. The service is written in Python and uses several large libraries. Which change is most effective to reduce cold start latency?

A. Increase the maximum number of concurrent requests per container.

B. Set a minimum number of instances to keep containers warm.

C. Set a longer request timeout.

D. Increase the CPU allocation for the service.

Answer: B

Minimum instances keep containers warm, so requests served by those instances avoid cold starts.

Why this answer

Setting a minimum number of instances (option B) ensures that a baseline of container instances is always warm and ready to serve requests, eliminating cold starts for those instances. Cold starts occur when a new container must be initialized, including loading large Python libraries, which adds significant latency. By keeping a minimum number of instances running, the service avoids the initialization delay for the first request to each instance.

Exam trap

Google Cloud often tests the misconception that increasing CPU or concurrency directly reduces cold start latency, but the key insight is that cold starts are caused by the initialization of new containers, not by processing speed or request handling capacity.

How to eliminate wrong answers

Option A is wrong because increasing the maximum number of concurrent requests per container does not reduce cold start latency; it only allows each container to handle more requests simultaneously, which can improve throughput but does not prevent the initial startup delay. Option C is wrong because setting a longer request timeout does not address cold starts; it only gives the service more time to respond, which might mask latency but does not reduce the initialization time. Option D is wrong because increasing CPU allocation mainly speeds up request processing; it may shorten initialization somewhat, but every new container still has to load the large Python libraries during a cold start, so the startup delay is not eliminated.
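
As a rough illustration of why warm instances matter, the toy model below counts how many containers would have to be started from scratch to absorb a burst. It assumes each instance serves up to `concurrency` requests at once and that only the minimum instances are already warm; the function and parameter names are hypothetical and do not correspond to a Cloud Run API.

```python
import math


def cold_starts(concurrent_requests: int, concurrency: int,
                min_instances: int) -> int:
    """Number of instances that must cold-start to serve a burst.

    Toy model: instances needed = ceil(requests / concurrency), and only
    `min_instances` of them are already warm.
    """
    needed = math.ceil(concurrent_requests / concurrency)
    return max(0, needed - min_instances)
```

In this model, `cold_starts(10, 5, 0)` is 2 and `cold_starts(10, 5, 2)` is 0. With no minimum instances there is always at least one cold start, whatever the concurrency setting, and a burst larger than the warm capacity still triggers cold starts.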

79

MCQ · hard

A microservices application on GKE with Istio service mesh experienced performance degradation after a recent update. Which optimization technique is most effective for improving inter-service communication performance?

A. Increase Istio sidecar resource limits

B. Implement request collapsing to merge identical requests

C. Use Istio traffic mirroring to offload requests

D. Enable gRPC for inter-service communication

Answer: D

gRPC leverages HTTP/2 and binary serialization, reducing overhead and latency compared to JSON-based REST.

Why this answer

Option D is correct because gRPC uses HTTP/2 as its transport protocol, which enables multiplexed streams over a single TCP connection, reducing latency and improving throughput for inter-service communication. In a GKE environment with Istio, gRPC also leverages Istio's native support for HTTP/2-based traffic, allowing efficient load balancing and connection reuse. This directly addresses performance degradation caused by chatty or high-frequency service calls.

Exam trap

Google Cloud often tests the misconception that increasing resources (Option A) or collapsing duplicate requests (Option B) is the primary fix for performance issues, when the real bottleneck is often the communication protocol itself, especially in service mesh environments where gRPC is the recommended approach.

How to eliminate wrong answers

Option A is wrong because increasing Istio sidecar resource limits only addresses resource contention, not the underlying inefficiency in inter-service communication protocols; it may mask symptoms without improving protocol-level performance. Option B is wrong because request collapsing merges identical requests at a proxy or cache layer, which is typically used for read-heavy workloads and does not optimize the communication protocol itself; it can introduce additional latency for dynamic service calls. Option C is wrong because traffic mirroring duplicates requests for testing or observability, not for offloading production traffic; it actually increases load on the system and degrades performance further.

80

Multi-Select · hard

A company uses Cloud Monitoring to set up alerting for their production system. They want to reduce alert fatigue while ensuring critical issues are caught quickly. Which two strategies should they implement? (Select TWO)

Select 2 answers

A. Use notification channels with escalation policies

B. Use low threshold values to catch issues early

C. Implement alert aggregation and deduplication

D. Disable alerts during off-hours

E. Set up separate alerts for each microservice

Answers: A, C

Ensures the right people are notified and issues are escalated.

Why this answer

Option A is correct because notification channels with escalation policies ensure that alerts are routed to the appropriate responders based on severity and time thresholds, reducing noise by preventing low-severity issues from repeatedly notifying the same person. Escalation policies automatically escalate unacknowledged critical alerts to higher-level teams, ensuring critical issues are caught quickly without overwhelming on-call staff. Option C is correct because aggregating and deduplicating related alerts collapses repeated notifications for the same underlying incident into one, which cuts alert volume without removing coverage.

Exam trap

Google Cloud often tests the misconception that lowering thresholds or disabling alerts improves fatigue, when in fact these actions either increase noise or create dangerous gaps in coverage; the correct approach is to use escalation policies and aggregation to intelligently manage alert volume.
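
Alert aggregation and deduplication can be illustrated with a short sketch that groups alerts sharing a key within a time window. The alert dictionary shape (`service`, `condition`, `timestamp`) and the 300-second window are assumptions made for the example, not a Cloud Monitoring data model.

```python
def deduplicate(alerts, window_seconds=300):
    """Collapse alerts with the same (service, condition) key.

    An alert joins the current group for its key if it fires within
    `window_seconds` of that group's first alert; otherwise it opens a
    new group. Returns the groups in order of first appearance.
    """
    groups = []
    open_group = {}  # key -> most recent group for that key
    for alert in sorted(alerts, key=lambda a: a["timestamp"]):
        key = (alert["service"], alert["condition"])
        group = open_group.get(key)
        if group and alert["timestamp"] - group["first_seen"] <= window_seconds:
            group["count"] += 1
        else:
            group = {"key": key, "first_seen": alert["timestamp"], "count": 1}
            open_group[key] = group
            groups.append(group)
    return groups
```

Each group would then produce a single notification carrying its `count`, instead of one notification per raw alert.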

81

MCQ · hard

A DevOps team is using Cloud Build to build and push container images. The build times have increased significantly. They suspect that the build cache is not being used effectively. Which build configuration change would likely improve cache usage?

A. Increase the machine type

B. Use a private pool

C. Use kaniko instead of Docker

D. Enable parallel builds

Answer: C

Kaniko leverages fine-grained layer caching, reducing rebuild time.

Why this answer

Kaniko is a cache-aware container image builder that can leverage a remote image registry as a cache layer, unlike the default Docker builder which relies on a local Docker daemon and its local layer cache. By using Kaniko with a configured cache repository, the team can reuse previously built layers across builds, even when builds run on different Cloud Build workers, significantly reducing build times.

Exam trap

The trap here is that candidates often assume 'more resources' (larger machine) or 'dedicated resources' (private pool) will fix caching issues, when in fact the problem is the ephemeral nature of the build environment and the need for a persistent, remote cache mechanism like Kaniko provides.

How to eliminate wrong answers

Option A is wrong because increasing the machine type (e.g., from e2-standard-2 to e2-highcpu-8) provides more CPU and memory, which can speed up the build execution but does not address the root cause of cache misses or ineffective cache usage. Option B is wrong because using a private pool provides dedicated compute resources and reduces contention, but it does not change the caching mechanism; the Docker builder still uses a local cache that is not persisted across builds. Option D is wrong because enabling parallel builds runs multiple build steps concurrently, which can reduce overall wall-clock time but does not improve cache hit rates; it may even cause cache conflicts if steps share dependencies.

82

MCQ · medium

A company is migrating a batch processing workload to Google Cloud. The workload is CPU-intensive and runs for a few hours each day. Which Compute Engine machine family should they choose to optimize performance and cost?

A. Compute-optimized (C2/C2D)

B. Burstable (T2D)

C. GPU-accelerated (A2)

D. Memory-optimized (M2)

Answer: A

C2 family offers high CPU performance per core, ideal for batch processing.

Why this answer

A is correct because the workload is CPU-intensive and runs for a few hours each day, which is a sustained, compute-bound task. Compute-optimized machines (C2/C2D) are designed specifically for high-performance computing (HPC) and batch processing, offering the highest per-core performance on Compute Engine. This minimizes runtime and cost per job compared to general-purpose or burstable families.

Exam trap

The trap here is that candidates often confuse 'cost optimization' with choosing the cheapest machine family (burstable), ignoring that sustained CPU-intensive workloads on burstable instances incur performance throttling and longer runtimes, ultimately increasing total cost of ownership (TCO).

How to eliminate wrong answers

Option B is wrong because burstable machines are designed for workloads with low average CPU utilization and occasional spikes, not sustained CPU-intensive batch processing; they can exceed their baseline CPU share only for short periods, leading to long and unpredictable job completion times. (On Compute Engine, burstable behavior belongs to shared-core types such as e2-micro, e2-small, and e2-medium; T2D is the Tau scale-out family.) Option C is wrong because GPU-accelerated (A2) machines are optimized for parallel processing workloads like ML training and 3D rendering, not general CPU-intensive batch jobs, and would incur unnecessary cost for unused GPU resources. Option D is wrong because Memory-optimized (M2) machines are designed for memory-intensive workloads such as large in-memory databases and SAP HANA, not CPU-bound tasks, and their higher memory-to-vCPU ratio would waste resources and increase cost.

83

MCQ · medium

A company runs a production web application on Google Compute Engine behind an HTTP(S) load balancer. The application is deployed across multiple managed instance groups in three regions (us-east1, europe-west1, asia-east1). Recently, users report slow page load times. Monitoring shows that CPU utilization on instances is consistently low (around 30%) but memory usage is high (over 80%). The application uses a self-managed in-memory cache per instance to store session data and frequently accessed objects. The team is considering adding more instances to the instance groups to distribute the load. However, they notice that the load balancer's latency is spiking and the cache hit ratio is low. What is the most likely issue and what should the engineer do?

A. Add more instances to the instance groups to increase total memory capacity.

B. Migrate to a managed in-memory cache like Memorystore for Redis to serve as a centralized cache shared by all instances.

C. Increase the machine type of instances to have more memory per instance (e.g., n1-highmem-4).

D. Enable autoscaling based on memory utilization.

Answer: B

A centralized cache eliminates duplication, reduces per-instance memory pressure, and improves cache hit ratio, reducing latency.

Why this answer

The low cache hit ratio and high memory usage indicate that each instance's self-managed in-memory cache is fragmented and inefficient, as session data and frequently accessed objects are not shared across instances. A request that lands on an instance without the cached entry must refetch the data from the backing store, which shows up as latency spikes at the load balancer. Migrating to a centralized managed cache like Memorystore for Redis eliminates per-instance cache duplication, improves cache hit ratio, and reduces latency by serving data from a single, consistent cache.

Exam trap

Google Cloud often tests the misconception that scaling horizontally (adding more instances) or vertically (increasing instance size) can solve performance issues caused by architectural inefficiencies like cache fragmentation, rather than addressing the root cause with a shared caching layer.

How to eliminate wrong answers

Option A is wrong because adding more instances only increases total memory capacity but does not solve the fundamental issue of cache fragmentation; each new instance would still maintain its own isolated cache, leading to continued low cache hit ratios and latency spikes. Option C is wrong because increasing the machine type per instance (e.g., n1-highmem-4) provides more memory per instance but does not address the lack of cache sharing; the cache hit ratio remains low as each instance still caches independently, and the load balancer latency persists. Option D is wrong because enabling autoscaling based on memory utilization would only add more instances when memory is high, but this does not fix the root cause of cache inefficiency; it may even worsen the problem by increasing the number of fragmented caches.
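
The effect of cache fragmentation can be shown with a small deterministic simulation. It is a sketch under simplifying assumptions: requests are spread round-robin across instances, keys repeat in a fixed cycle, and caches never evict. The function name and parameters are hypothetical.

```python
def simulate(num_requests, num_keys, num_instances, shared):
    """Return (hit_ratio, total_cached_entries) for a toy workload.

    Request i asks for key i % num_keys and is routed to instance
    i % num_instances. With shared=True all instances use one cache.
    """
    caches = [set()] if shared else [set() for _ in range(num_instances)]
    hits = 0
    for i in range(num_requests):
        key = i % num_keys
        cache = caches[0] if shared else caches[i % num_instances]
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # miss: fetch from the backend, then cache it
    return hits / num_requests, sum(len(c) for c in caches)
```

With 300 requests, 10 keys, and 3 instances, the per-instance setup misses 30 times and stores 30 entries, while the shared cache misses 10 times and stores 10. Every key is fetched and held once per instance instead of once overall, which is the duplication a centralized cache removes.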

84

Multi-Select · medium

A web application experiences high latency during peak hours. Which TWO actions should the team take to optimize performance?

Select 2 answers

A. Implement autoscaling based on CPU utilization

B. Enable Cloud CDN with the origin as the backend bucket

C. Use Cloud Memorystore to cache frequently accessed data

D. Increase the size of the instances serving the application

E. Reduce the number of backend services to simplify routing

Answers: A, C

Autoscaling adjusts capacity to match demand, preventing overload during peak hours.

Why this answer

Option A is correct because implementing autoscaling based on CPU utilization allows the application to dynamically add compute instances during peak hours when CPU load increases, thereby distributing the request load and reducing latency. This aligns with Google Cloud's managed instance groups and autoscaling policies that scale out based on metrics like CPU utilization, ensuring sufficient resources are available to handle traffic spikes without manual intervention. Option C is correct because caching frequently accessed data in Memorystore serves repeated reads from memory instead of the backing data store, which lowers response times and backend load during peak hours.

Exam trap

Google Cloud often tests the misconception that vertical scaling (increasing instance size) is equivalent to horizontal scaling (autoscaling) for handling peak loads, but the exam expects candidates to recognize that horizontal scaling is more cost-effective, resilient, and aligned with cloud-native best practices for optimizing performance under variable traffic.

85

Drag & Drop · medium

Arrange the steps to implement a canary deployment for a Cloud Run service.

Why this order

Deploy the new revision, shift a small share of traffic to it, monitor it, gradually increase its traffic share, and finally remove the old revision.
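
The ordering above can be sketched as a loop over traffic percentages with a health check between steps. This is an illustrative model only: `canary_rollout`, the step values, and the `is_healthy` callback are hypothetical and do not call any Cloud Run API. The rollback to 0% on failure is an assumption added for completeness.

```python
def canary_rollout(steps, is_healthy):
    """Walk the new revision through increasing traffic percentages.

    `steps` is a list such as [5, 25, 50, 100]; `is_healthy(percent)`
    stands in for the monitoring check after each shift. Returns the final
    traffic percentage for the new revision and the history of shifts.
    """
    current = 0
    history = []
    for percent in steps:
        current = percent            # shift traffic to the new revision
        history.append(current)
        if not is_healthy(current):  # monitor before increasing further
            current = 0              # roll back to the old revision
            history.append(current)
            break
    return current, history
```

Only when the rollout ends at 100% would the old revision be removed; a failed check leaves it serving all traffic.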

86

MCQ · hard

You are designing a globally distributed application using Cloud Spanner. The application has a write-heavy workload. You notice that write latency increases as the number of nodes increases. What is the most likely cause?

A. The instance is using a multi-region configuration with too many read-only replicas.

B. The workload has many cross-node transactions due to split rows.

C. The application is using stale reads for write transactions.

D. The number of splits is too low, causing hotspots.

Answer: B

Cross-split transactions require coordination, increasing latency.

Why this answer

Option B is correct because in Cloud Spanner, write-heavy workloads with many cross-node transactions cause increased write latency as nodes are added. Spanner partitions data into splits (contiguous ranges of rows) and distributes them across nodes, and transactions that span multiple splits require two-phase commit (2PC) coordination between nodes, which adds network overhead and latency. Adding more nodes increases the likelihood that a transaction touches multiple splits, exacerbating the coordination cost.

Exam trap

The trap here is that candidates often assume adding nodes always improves performance, but Google Cloud tests the counterintuitive behavior where cross-node coordination overhead in distributed databases like Spanner can degrade write latency with scale.

How to eliminate wrong answers

Option A is wrong because multi-region configurations with read-only replicas do not directly affect write latency; read-only replicas serve reads and do not participate in write quorums, so they do not cause write latency to increase with node count. Option C is wrong because stale reads are used for read-only transactions, not write transactions; write transactions always require strong reads to ensure consistency, so stale reads cannot be applied to writes. Option D is wrong because too few splits cause hotspots (uneven load on nodes), which would increase latency as load grows, but the question states latency increases as nodes increase, which is the opposite of hotspot behavior—hotspots are mitigated by adding nodes, not worsened.
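
To make the scaling effect concrete, the sketch below estimates how often a transaction spans more than one split. It assumes, purely for illustration, that each row a transaction touches lands in a uniformly random split, independently of the others; real Spanner key ranges are not random, so treat this as a toy model. Under that assumption the probability is 1 - (1/n)^(k-1) for k rows and n splits.

```python
def multi_split_probability(rows_touched: int, num_splits: int) -> float:
    """Probability that a transaction touches more than one split.

    Toy model: each of the `rows_touched` rows falls in one of
    `num_splits` splits uniformly at random and independently.
    """
    if rows_touched < 1 or num_splits < 1:
        raise ValueError("rows_touched and num_splits must be at least 1")
    same_split = (1.0 / num_splits) ** (rows_touched - 1)
    return 1.0 - same_split
```

For a two-row transaction the probability is 0 with one split, 0.5 with two, and 0.75 with four, so the share of writes that need cross-split coordination grows as data is spread more widely.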

87

MCQ · medium

Your GKE cluster runs a batch job that processes large files from Cloud Storage. The job uses CPUs inefficiently, with low utilization. You want to reduce cost while maintaining throughput. Which approach should you take?

A. Use Cloud Storage FUSE to stream files directly into containers, avoiding local storage.

B. Configure the node pool to use spot VMs.

C. Use local SSDs for faster file access.

D. Increase the CPU request for the job pods.

Answer: A

Streaming reduces latency and cost by eliminating disk.

Why this answer

Option A is correct because Cloud Storage FUSE allows containers to stream files directly from Cloud Storage without first downloading them to a local disk. This eliminates the I/O bottleneck of writing to local storage and reduces CPU overhead from disk operations, enabling the batch job to process files more efficiently and maintain throughput while using fewer CPU resources.

Exam trap

Google Cloud often tests the misconception that faster storage (local SSDs) or cheaper compute (spot VMs) always reduces cost, when the real issue is inefficient resource utilization that must be addressed at the application or data access layer.

How to eliminate wrong answers

Option B is wrong because spot VMs reduce cost but do not address the root cause of low CPU utilization; they may even increase cost if preemptions cause job restarts and wasted cycles. Option C is wrong because local SSDs improve disk I/O speed, but the problem is CPU inefficiency, not disk latency; faster storage does not fix underutilized CPUs. Option D is wrong because increasing CPU requests for pods will allocate more CPU resources but will not improve utilization if the job is not CPU-bound; it may actually increase cost without improving throughput.

88

Multi-Select · easy

Which TWO actions can reduce startup latency for a Cloud Run service?

Select 2 answers

A. Use a regional Cloud Run with separate service per region.

B. Increase the maximum instances limit.

C. Optimize the container image to reduce size.

D. Increase the container concurrency setting.

E. Set a minimum number of instances to keep warm.

Answers: C, E

Smaller images pull faster from the container registry.

Why this answer

Option C is correct because a smaller container image reduces the time required to pull the image from the registry to the compute node during cold starts. Image download and filesystem extraction are part of the cold-start path; optimizing the image (e.g., using distroless base images, multi-stage builds, or removing unnecessary layers) shortens it. Option E is correct because minimum instances keep already-initialized containers ready, so the requests they serve skip the startup path.

Exam trap

Google Cloud often tests the misconception that scaling limits or concurrency settings affect startup latency, when in reality only image optimization and pre-warming (minimum instances) directly reduce cold-start time.

89

MCQ · hard

Refer to the exhibit. A payment microservice on GKE logs frequent 'connection closed' errors. The service connects to a backend database. Which approach is most effective to reduce these errors?

A. Implement retry logic with exponential backoff in the service code.

B. Increase the number of pod replicas to distribute load.

C. Adjust the readiness probe to be more aggressive.

D. Increase the CPU and memory limits for the container.

Answer: A

Retries handle transient connection closures.

Why this answer

The 'connection closed' errors indicate transient network failures or database server-side connection drops. Implementing retry logic with exponential backoff in the service code is the most effective approach because it allows the microservice to gracefully recover from intermittent failures without overwhelming the database with immediate retries. This pattern is a standard resilience technique for cloud-native applications on GKE, as it handles temporary issues like network blips or database connection pool exhaustion.

Exam trap

Google Cloud often tests the misconception that scaling resources (pods or limits) fixes all performance issues, but here the trap is that 'connection closed' errors are typically transient network or database-side issues, not resource bottlenecks, so retry logic is the correct resilience pattern.

How to eliminate wrong answers

Option B is wrong because increasing pod replicas distributes load but does not address the root cause of transient connection failures; it may even increase the number of concurrent connections, potentially worsening the problem. Option C is wrong because adjusting the readiness probe to be more aggressive (e.g., shorter interval or lower threshold) could cause pods to be prematurely removed from service during brief hiccups, leading to more instability and connection errors. Option D is wrong because increasing CPU and memory limits addresses resource starvation, not transient network or database connection drops; the errors are not caused by insufficient resources but by connection lifecycle issues.
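
A minimal sketch of retry with exponential backoff follows. The helper name `call_with_backoff`, its defaults, and the use of Python's built-in `ConnectionError` to represent a 'connection closed' failure are assumptions for the example. Random jitter is added on top of the plain exponential schedule, which is a common refinement rather than something the question requires.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call `operation`, retrying on ConnectionError with backoff.

    The delay ceiling doubles after each failed attempt (capped at
    `max_delay`), and the actual wait is a random value up to that
    ceiling so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the delays instead of actually waiting.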

90

MCQ · easy

A company serves static assets (images, CSS) to global users. Users in distant regions experience slow load times. Which service should they use to optimize delivery?

A. Cloud CDN

B. Cloud Load Balancing

C. Cloud NAT

D. Cloud Armor

Answer: A

Cloud CDN caches static content at global edge locations, reducing latency for distant users.

Why this answer

Cloud CDN (Content Delivery Network) caches static assets (images, CSS) at edge locations worldwide, reducing latency for distant users by serving content from a nearby point of presence (PoP). This directly addresses slow load times caused by geographic distance, as the origin server is no longer the sole source of delivery.

Exam trap

Google Cloud often tests the distinction between caching (CDN) and load balancing, where candidates mistakenly think distributing traffic globally (Cloud Load Balancing) will also cache content, but load balancing alone does not reduce latency for static assets without edge caching.

How to eliminate wrong answers

Option B (Cloud Load Balancing) is wrong because it distributes incoming traffic across multiple backend instances to improve availability and fault tolerance, but it does not cache content or reduce latency for static assets globally. Option C (Cloud NAT) is wrong because it provides outbound internet connectivity for private instances (e.g., VMs without public IPs) by translating private IPs to public IPs, and has no role in content delivery or caching. Option D (Cloud Armor) is wrong because it is a web application firewall (WAF) that protects against DDoS and application-layer attacks (e.g., SQL injection, XSS), not a caching or content delivery service.

91

MCQ · hard

You created the above alert policy to detect high CPU utilization in your GKE cluster. However, you are receiving too many false positive alerts. What is the most likely reason?

A. The threshold value of 0.8 is too low; it should be 0.9 for production.

B. The crossSeriesReducer is set to REDUCE_SUM, which sums CPU across containers, so a namespace with many containers can trigger the alert even if each container uses less than 80%.

C. The duration of 300 seconds (5 minutes) is too short; it should be longer to avoid transient spikes.

D. The filter does not specify a specific namespace, causing alerts from all namespaces.

Answer: B

REDUCE_SUM adds up CPU usage of all containers in the namespace/container group. This can exceed 0.8 when many containers are active, even if each is below 80%. Using REDUCE_MAX per container would be more appropriate.

Why this answer

Option B is correct because the crossSeriesReducer set to REDUCE_SUM aggregates CPU utilization across all containers in a namespace. This means that even if each container uses only 20% CPU, a namespace with five containers would show a total of 100%, triggering the alert when the threshold is 0.8 (80%). This causes false positives because the alert fires on the sum, not on individual container utilization.

Exam trap

Google Cloud often tests the misconception that false positives are caused by thresholds being too low or durations too short, when the real issue is an incorrect aggregation reducer that sums metrics across multiple resources.

How to eliminate wrong answers

Option A is wrong because raising the threshold to 0.9 would not fix the root cause—the aggregation issue—and could still trigger false positives if the sum of many low-utilization containers exceeds 0.9. Option C is wrong because the duration of 300 seconds is already long enough to filter transient spikes; extending it further would delay legitimate alerts without addressing the aggregation problem. Option D is wrong because the filter not specifying a namespace is not the primary cause; the alert would still fire on aggregated CPU across all containers, and adding a namespace filter would not prevent false positives from summed utilization within that namespace.
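
The arithmetic behind the false positive can be checked with a few lines. The sketch models only the two reducers named in the explanation as plain Python `sum` and `max`; the function `alert_fires` and the sample utilization values are illustrative, not part of the Cloud Monitoring API.

```python
def alert_fires(per_container_cpu, reducer, threshold=0.8):
    """Apply a cross-series reducer, then compare against the threshold."""
    reducers = {"REDUCE_SUM": sum, "REDUCE_MAX": max}
    return reducers[reducer](per_container_cpu) > threshold


# Five containers, each at 20% CPU utilization.
samples = [0.2, 0.2, 0.2, 0.2, 0.2]
sum_fires = alert_fires(samples, "REDUCE_SUM")  # sum is about 1.0
max_fires = alert_fires(samples, "REDUCE_MAX")  # max is 0.2
```

Here `sum_fires` is true and `max_fires` is false: the summed series crosses 0.8 even though no single container is anywhere near 80%.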

92

Multi-Select · medium

Which TWO metrics from Cloud Monitoring would best indicate that a GKE workload is experiencing CPU throttling due to a resource quota? (Choose 2)

Select 2 answers

A. node/cpu/usage_time

B. container/cpu/throttled_time

C. container/memory/usage_bytes

D. container/cpu/usage_time

E. container/accelerator/duty_cycle

Answers: B, D

Directly shows time spent throttled.

Why this answer

Option B is correct because `container/cpu/throttled_time` directly measures the cumulative time a container's CPU usage was throttled due to exceeding its assigned CPU quota (CFS quota). Option D is correct because `container/cpu/usage_time` shows the actual CPU time used by the container; when compared against the quota limit, a high usage_time relative to the quota indicates that throttling is likely occurring. Together, these two metrics confirm both the occurrence and the cause of CPU throttling.

Exam trap

Google Cloud often tests the distinction between node-level and container-level metrics, and the trap here is that candidates may pick `node/cpu/usage_time` (Option A) thinking it reflects container throttling, when in fact it aggregates all pods on the node and cannot reveal per-container quota enforcement.

93

MCQ · easy

Refer to the exhibit. A GKE node shows MemoryPressure condition. What should the team do to improve performance of pods scheduled on this node?

A. Enable cluster autoscaler to scale up new nodes

B. Increase the node's memory by changing the machine type

C. Adjust pod resource requests to leave more allocatable memory

D. Evict pods and delete the node

Answer: A

Cluster autoscaler adds nodes when pods are unschedulable, giving pods displaced by memory pressure somewhere to run.

Why this answer

When a GKE node reports a MemoryPressure condition, its kubelet may evict pods to reclaim memory, which degrades performance. With cluster autoscaler enabled, pods that cannot be scheduled on the existing nodes trigger the provisioning of new nodes, so the workload is redistributed and the pressure is relieved without manual intervention.

Exam trap

Google Cloud often tests the misconception that MemoryPressure can be resolved by modifying pod requests or node size, when the correct automated solution is cluster autoscaler to add capacity dynamically.

How to eliminate wrong answers

Option B is wrong because changing the machine type requires recreating the node, which is disruptive and not a dynamic solution; cluster autoscaler handles scaling without manual node replacement. Option C is wrong because adjusting pod resource requests only affects future scheduling, not the current memory pressure on the node, and does not free memory for existing pods. Option D is wrong because evicting pods and deleting the node is a manual, reactive action that causes downtime, whereas cluster autoscaler provides automated, proactive scaling.

94

MCQ · medium

A company runs a microservices application on GKE. The checkout service has high tail latency. Using Cloud Profiler, the team finds that most time is spent in database queries. Which action should they take to improve performance?

A. Migrate the database to Cloud Spanner.

B. Increase the number of replicas of the checkout service.

C. Add database connection pooling using a sidecar proxy.

D. Enable Cloud CDN for the checkout API.

Answer: C

Connection pooling reduces overhead of establishing connections, improving latency.

Why this answer

Option C is correct because database connection pooling reduces the overhead of establishing new connections for each request, which is a common cause of high tail latency in microservices. By using a sidecar proxy (e.g., Envoy or a dedicated connection pooler like PgBouncer), the checkout service can reuse existing database connections, minimizing latency spikes from connection setup and teardown. This directly addresses the root cause identified by Cloud Profiler—time spent in database queries—without requiring a database migration or scaling the service itself.

Exam trap

Google Cloud often tests the misconception that scaling out (increasing replicas) or migrating to a different database solves all performance issues, when the real problem is connection management overhead within the existing database layer.

How to eliminate wrong answers

Option A is wrong because migrating to Cloud Spanner does not inherently reduce per-query latency; it provides horizontal scalability and strong consistency, but the bottleneck is connection overhead, not database throughput or consistency. Option B is wrong because increasing replicas of the checkout service does not reduce the latency of individual database queries; it may even increase connection churn and exacerbate the problem. Option D is wrong because Cloud CDN caches static content at edge locations, but the checkout API involves dynamic, transactional database queries that cannot be cached, so CDN provides no benefit for this latency issue.
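
The connection reuse described for Option C can be sketched in a few lines. This is a minimal in-process illustration, not how a sidecar pooler such as PgBouncer is implemented: `sqlite3` stands in for the real database, and `ConnectionPool` and `handle_requests` are hypothetical names introduced only for this example.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Tiny fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size, database=":memory:"):
        self.opened = 0  # how many real connections were ever created
        self._idle = queue.Queue()
        for _ in range(size):
            # check_same_thread=False lets pooled connections cross threads.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))
            self.opened += 1

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks while every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


def handle_requests(pool, n):
    """Serve n hypothetical checkout requests; return how many succeeded."""
    served = 0
    for _ in range(n):
        with pool.connection() as conn:
            served += conn.execute("SELECT 1").fetchone()[0]
    return served


pool = ConnectionPool(size=2)
served = handle_requests(pool, 100)
print(served, pool.opened)
```

One hundred requests are served over the two connections opened at startup, so no request pays the connection setup cost.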

95

MCQ · easy

A company wants to reduce the response time of a globally distributed web application. Which Google Cloud service can cache static content at edge locations to improve performance?

A. Cloud DNS

B. Cloud NAT

C. Cloud Armor

D. Cloud CDN

Answer: D

Correct. Cloud CDN caches content at edge locations to reduce latency.

Why this answer

Cloud CDN (Content Delivery Network) uses Google's globally distributed edge caches to serve static content (e.g., images, CSS, JavaScript) from locations closer to users, reducing latency and offloading origin servers. It integrates with external HTTPS load balancers to automatically cache responses based on cache-control headers, directly addressing the goal of improving response time for a globally distributed web application.

Exam trap

The trap here is that candidates confuse Cloud Armor (a security service) with a content delivery service, or assume Cloud DNS can cache content because it involves 'edge' name servers, but DNS caching is for DNS records, not web content.

How to eliminate wrong answers

Option A is wrong because Cloud DNS is a domain name resolution service that translates domain names to IP addresses; it does not cache or serve static content at edge locations. Option B is wrong because Cloud NAT provides outbound internet connectivity for private instances via network address translation, with no caching or edge delivery capabilities. Option C is wrong because Cloud Armor is a web application firewall (WAF) and DDoS protection service that filters traffic based on security policies; it does not cache static content or accelerate content delivery.

96

MCQ · medium

An application on GKE frequently reads the same data from a Cloud Storage bucket. The data changes rarely. Which solution will best improve read performance and reduce costs?

A. Deploy a sidecar container that caches the data in an emptyDir volume.

B. Configure a Cloud SQL read replica for the data.

C. Increase the number of nodes in the cluster.

D. Use a StatefulSet with a persistent volume claim to store the data.

Answer: A

Sidecar with caching can serve data from local disk, reducing Cloud Storage reads.

Why this answer

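Option A is correct: a sidecar container that shares an emptyDir volume with the application can cache the Cloud Storage data locally, for example with a tool like gcsfuse with caching enabled, so repeated reads are served from the node instead of the bucket. Option B is wrong because read replicas are for databases, not for objects in a Cloud Storage bucket. Option C is wrong because increasing node count does not improve per-pod read speed. Option D is wrong because persistent volumes are for stateful workloads and do not by themselves cache the bucket data.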
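
The caching behaviour behind Option A is a read-through cache. The sketch below assumes a temporary directory in place of the emptyDir volume and a hypothetical `fake_bucket_fetch` function in place of a real Cloud Storage read; `ReadThroughCache` is likewise an illustrative name, not part of any Google Cloud library.

```python
import tempfile
from pathlib import Path


class ReadThroughCache:
    """Serve objects from a local directory, fetching each one only once."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.fetch = fetch          # callable: object name -> bytes
        self.origin_reads = 0       # reads that actually reached the bucket

    def read(self, name):
        local = self.cache_dir / name
        if local.exists():
            return local.read_bytes()   # cache hit: no bucket access
        self.origin_reads += 1
        data = self.fetch(name)         # cache miss: one bucket read
        local.write_bytes(data)
        return data


def fake_bucket_fetch(name):
    # Hypothetical stand-in for downloading an object from Cloud Storage.
    return f"contents of {name}".encode()


with tempfile.TemporaryDirectory() as tmp:
    cache = ReadThroughCache(tmp, fake_bucket_fetch)
    results = [cache.read("config.json") for _ in range(5)]
    origin_reads = cache.origin_reads

print(origin_reads)
```

Five reads of the same object reach the origin once, which is where both the latency and the cost savings come from. A real cache would also need an invalidation rule for the rare updates.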

97

MCQ · easy

A startup runs a mobile app backend on App Engine standard environment. They recently added new features, and the app's response time increased significantly. The team suspects instance startup time is causing cold starts for new users. They have already reduced code size and enabled warmup requests. What is the best next step to improve performance?

A. Migrate to App Engine flexible environment

B. Increase the number of idle instances using automatic scaling settings

C. Implement a latency-based health check to redirect traffic

D. Use Cloud Endpoints to limit traffic and reduce load

Answer: B

Setting min_idle_instances to a higher value keeps instances warm, eliminating cold start delays.

Why this answer

Warmup requests reduce cold starts by initializing the app before live traffic arrives, but they don't eliminate startup latency for new instances. Increasing the number of idle instances via automatic scaling settings ensures that pre-warmed, ready-to-serve instances are always available, so new users never trigger a cold start. This directly addresses the root cause—instance startup time—without changing the environment or adding complexity.

Exam trap

Google Cloud often tests the misconception that warmup requests alone solve cold starts, when in fact they only reduce the impact—idle instances are required to eliminate the latency entirely.

How to eliminate wrong answers

Option A is wrong because migrating to App Engine flexible environment would increase cold start latency (VMs take longer to boot than containers) and adds operational overhead, making performance worse, not better. Option C is wrong because latency-based health checks redirect traffic away from unhealthy instances but do not reduce cold start latency; they only manage traffic routing after an instance is already slow. Option D is wrong because Cloud Endpoints manages API authentication and throttling, not instance startup performance; limiting traffic does not reduce the time it takes for a new instance to become ready.

98

MCQ · easy

Refer to the exhibit. What does the alert condition indicate?

A. It alerts when the request count drops below 1000 for 1 minute.

B. It alerts for any Cloud Run revision that has more than 1000 requests in a 1-minute window.

C. It alerts when the average request count across all revisions exceeds 1000 over 1 minute.

D. It alerts when the total request count across all revisions exceeds 1000 per minute.

Answer: B

For each revision, the alert fires if that revision's request count exceeds 1000 for at least 1 minute.

Why this answer

The alert condition in the exhibit uses a per-revision metric (e.g., `run.googleapis.com/request_count`) with a threshold of 1000 and a 1-minute window. This means the alert fires for any individual Cloud Run revision that exceeds 1000 requests within that window, not for the aggregate across all revisions. Option B correctly identifies this per-revision behavior.

Exam trap

Google Cloud often tests the distinction between per-resource and aggregate metrics, so the trap here is assuming that a threshold on a metric like 'request_count' automatically implies a sum across all revisions, when in fact it applies to each individual revision's time series.

How to eliminate wrong answers

Option A is wrong because the alert condition is set to fire when the request count exceeds 1000, not when it drops below 1000; a 'less than' threshold would require a different condition. Option C is wrong because the alert evaluates each revision independently, not the average across all revisions; averaging would require a different aggregation function like `mean` or `avg`. Option D is wrong because the alert does not sum request counts across all revisions; it triggers per revision, so a single revision exceeding 1000 requests in a minute fires the alert regardless of other revisions' counts.
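
The difference between Options B and D comes down to whether the threshold is applied to each time series or to their sum. A small sketch with made-up counts for one 1-minute window (the revision names and numbers are hypothetical, and this is plain Python, not the Cloud Monitoring API):

```python
THRESHOLD = 1000  # requests in a 1-minute window, as in the question


def per_revision_alerts(counts, threshold=THRESHOLD):
    """Option B: evaluate every revision's own time series separately."""
    return sorted(rev for rev, n in counts.items() if n > threshold)


def aggregate_alert(counts, threshold=THRESHOLD):
    """Option D: evaluate the sum across all revisions."""
    return sum(counts.values()) > threshold


busy = {"rev-a": 700, "rev-b": 600, "rev-c": 1200}
quiet = {"rev-a": 700, "rev-b": 600}

print(per_revision_alerts(busy))   # only rev-c is above 1000 on its own
print(per_revision_alerts(quiet))  # no single revision is above 1000
print(aggregate_alert(quiet))      # the sum, 1300, would be above 1000
```

With the `quiet` data a per-revision condition stays silent while an aggregate condition would fire, which is exactly the distinction the question tests.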

99

MCQ · hard

A financial services company uses Spanner for their core database. They notice that some transactions are taking longer than expected, especially during cross-region writes. They have set up Spanner with regional configuration. What is the most likely cause?

A. The transaction is experiencing contention due to a hot spot

B. The transaction is using stale reads

C. The transaction is not using a read-write transaction

D. The transaction is too large

Answer: A

Contention on popular keys causes retries and delays.

Why this answer

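In a regional Spanner configuration all replicas live in a single region, so writes are never replicated across regions and cross-region replication cannot explain the delay. Contention on a hot spot is the common cause of this latency: transactions that compete for the same popular keys must wait or retry. Stale reads are fast, and transaction size alone rarely causes significant delays.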
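
How a hot spot forms can be shown with a toy model of range-partitioned storage. This is a simplified sketch, not Spanner's actual split logic: the key space, the four equal-width splits, and the function names are assumptions made for illustration.

```python
import hashlib
from collections import Counter

KEY_SPACE = 1000   # hypothetical key range 0..999
NUM_SPLITS = 4     # hypothetical equal-width splits of that range


def split_for(key):
    """Range partitioning: neighbouring keys land on the same split."""
    return key * NUM_SPLITS // KEY_SPACE


def hashed(key):
    """Spread a sequential key over the key space with a hash."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % KEY_SPACE


def writes_per_split(keys):
    return Counter(split_for(k) for k in keys)


recent_ids = range(900, 1000)  # 100 writes with monotonically increasing keys
sequential_load = writes_per_split(recent_ids)
hashed_load = writes_per_split(hashed(k) for k in recent_ids)

print(sequential_load)  # every write hits the last split
print(hashed_load)      # the same writes spread over several splits
```

With sequential keys all 100 writes contend on one split, while hashing the key distributes them, which is why key design is the usual remedy for this kind of contention.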

100

MCQ · easy

A web application serves static assets (images, CSS, JavaScript) from Compute Engine instances. Users in different geographic regions report slow page loads. Which Google Cloud service can be used to improve performance for these users?

A. VPC Network Peering

B. Cloud Load Balancing

C. Cloud CDN

D. Cloud NAT

Answer: C

Cloud CDN uses Google's global edge network to cache static content closer to users.

Why this answer

Cloud CDN (Content Delivery Network) caches static assets at Google's globally distributed edge points of presence (PoPs). When users request images, CSS, or JavaScript, the content is served from the nearest edge cache rather than the origin Compute Engine instances, reducing latency and improving page load times for geographically distributed users.

Exam trap

The trap here is that candidates confuse Cloud Load Balancing (which only distributes traffic) with Cloud CDN (which caches at edge locations), assuming load balancing alone solves geographic latency issues.

How to eliminate wrong answers

Option A is wrong because VPC Network Peering connects two VPC networks for private IP communication; it does not cache content or accelerate delivery to end users. Option B is wrong because Cloud Load Balancing distributes traffic across backend instances but does not cache responses at edge locations; it can be used with Cloud CDN but alone does not reduce latency for static assets. Option D is wrong because Cloud NAT provides outbound internet connectivity for instances without external IPs; it does not cache or accelerate content delivery.

101

MCQ · hard

A company runs a microservices architecture on GKE with Istio service mesh. They observe that service-to-service latency has increased after enabling mTLS. What is the most likely cause?

A. mTLS encryption overhead

B. Incorrect load balancer configuration

C. Network policy restriction

D. Sidecar proxy resource limits

Answer: A

Encrypting and decrypting each request adds CPU overhead and latency.

Why this answer

Enabling mTLS in Istio encrypts all service-to-service traffic using mutual TLS, which adds CPU overhead for encryption and decryption of each request. This overhead directly increases latency, especially for high-throughput or small-payload services, because the sidecar proxies must complete a TLS handshake for every new connection and encrypt and decrypt all traffic that passes through them.

Exam trap

Google Cloud often tests the misconception that mTLS only adds security without performance impact, but candidates must recognize that encryption/decryption at the sidecar proxy level introduces measurable CPU-bound latency.

How to eliminate wrong answers

Option B is wrong because incorrect load balancer configuration would cause traffic routing issues or dropped connections, not a general increase in latency after enabling mTLS. Option C is wrong because network policy restrictions would block or drop traffic, not simply increase latency across all service-to-service calls. Option D is wrong because sidecar proxy resource limits would cause throttling, timeouts, or OOM kills, but the question states latency increased after enabling mTLS, not after changing resource limits.

102

MCQ · easy

A team deploys a Cloud Function that processes user requests. They notice cold starts cause high latency for the first request after a period of inactivity. What is the most effective way to reduce cold starts?

A. Use a larger function timeout

B. Set the minimum instances to 1

C. Increase the memory allocation

D. Deploy the function in multiple regions

Answer: B

Keeping at least one warm instance eliminates cold start latency.

Why this answer

Setting minimum instances to 1 pre-warms a function instance, keeping it idle and ready to serve requests immediately. This eliminates the cold start latency for the first request after inactivity because the runtime environment is already initialized and loaded into memory.

Exam trap

Google Cloud often tests the misconception that increasing resources (memory or timeout) or spreading across regions solves cold starts, when the actual solution is keeping an instance alive via minimum instances or similar warm-start mechanisms.

How to eliminate wrong answers

Option A is wrong because increasing the function timeout does not prevent cold starts; it only allows the function to run longer before being terminated, which does not address the initialization delay. Option C is wrong because increasing memory allocation can improve performance during execution but does not keep an instance alive or reduce the cold start penalty; cold starts still occur after idle periods. Option D is wrong because deploying in multiple regions improves geographic latency and availability but does not reduce cold starts; each regional deployment still experiences cold starts independently after inactivity.

103

Multi-Select · hard

A company runs a stateful workload on Compute Engine with local SSDs. They need to improve disk I/O performance without changing the instance type. Which THREE actions should they take?

Select 3 answers

A. Migrate to persistent SSD for better durability.

B. Stripe data across multiple local SSD volumes using RAID 0.

C. Use a filesystem optimized for SSDs, such as ext4 with noatime and nodiratime options.

D. Ensure the instance is in the same zone as the application that accesses the disks.

E. Enable encryption for the local SSDs to reduce I/O overhead.

Answers: B, C, D

Increases throughput and IOPS.

Why this answer

Option B is correct because striping data across multiple local SSD volumes using RAID 0 increases the aggregate I/O throughput and IOPS by distributing read and write operations across all disks in parallel. This directly improves disk I/O performance without changing the instance type, as local SSDs are physically attached to the host and offer the highest performance when combined.

Exam trap

Google Cloud often tests the misconception that persistent SSDs are always better for performance, but local SSDs provide lower latency and higher IOPS for stateful workloads, and striping them with RAID 0 is the key to maximizing I/O without changing the instance type.
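
The parallelism that RAID 0 provides follows from how logical blocks are mapped to disks. A minimal sketch of that mapping, assuming a stripe unit of exactly one block (real arrays use a configurable chunk size, and `locate` and `disks_touched` are names made up for this example):

```python
def locate(block, num_disks):
    """Map a logical block to (disk index, block offset on that disk)."""
    return block % num_disks, block // num_disks


def disks_touched(first_block, count, num_disks):
    """Which disks serve a sequential request of `count` blocks."""
    blocks = range(first_block, first_block + count)
    return sorted({locate(b, num_disks)[0] for b in blocks})


print(locate(5, 4))            # logical block 5 on a 4-disk stripe
print(disks_touched(0, 4, 4))  # a 4-block read keeps all 4 disks busy
print(disks_touched(0, 4, 1))  # the same read on one disk is serialized
```

Because consecutive blocks land on different local SSDs, a single large request is served by all disks at once, at the cost of losing the whole array if any one disk fails.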

104

MCQ · hard

A company is transferring large datasets from on-premises to Google Cloud using a VPN. They notice high latency due to packet loss. What is the most effective way to improve throughput?

A. Set up Dedicated Interconnect for a more reliable connection.

B. Enable compression on the VPN tunnel.

C. Increase the number of VPN tunnels and use BGP multipath.

D. Use a multi-region GCP endpoint and distribute traffic.

Answer: A

Dedicated Interconnect provides a direct physical connection, reducing packet loss and latency.

Why this answer

Dedicated Interconnect provides a direct, private physical connection between on-premises and Google Cloud, bypassing the public internet entirely. This eliminates the packet loss and high latency inherent in VPN tunnels over the internet, offering consistent throughput and lower latency for large dataset transfers.

Exam trap

Google Cloud often tests the misconception that adding more VPN tunnels or enabling compression can overcome internet-based packet loss, but the correct solution is to eliminate the unreliable public internet path entirely with a dedicated connection like Interconnect.

How to eliminate wrong answers

Option B is wrong because enabling compression on the VPN tunnel can reduce the amount of data transmitted but does not address the underlying packet loss causing high latency; in fact, compression can increase CPU overhead and may worsen performance if packet loss is present. Option C is wrong because increasing the number of VPN tunnels with BGP multipath can improve bandwidth utilization but still relies on the public internet, so packet loss and latency issues remain; it does not provide a reliable, low-latency path. Option D is wrong because using a multi-region GCP endpoint and distributing traffic does not solve the fundamental problem of packet loss on the VPN connection; it only spreads traffic across regions, which may introduce additional latency and complexity without addressing the unreliable internet path.
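
Why packet loss limits throughput so strongly can be estimated with the widely used Mathis approximation for a single loss-based TCP flow, throughput ≈ (MSS / RTT) · C / √p with C = √(3/2). The sketch below is a rough model under that assumption, and the MSS, RTT, and loss rates are illustrative numbers, not measurements from the scenario.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate ceiling on steady-state TCP throughput, in bits per second."""
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


# Hypothetical paths: same MSS and RTT, different packet loss.
lossy = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.05, loss_rate=0.01)
clean = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.05, loss_rate=0.0001)

print(f"lossy path: {lossy / 1e6:.2f} Mbit/s")
print(f"clean path: {clean / 1e6:.2f} Mbit/s")
print(f"ratio: {clean / lossy:.1f}x")
```

Under this model, cutting loss by a factor of 100 raises the per-flow ceiling by a factor of 10, which is why removing the lossy path helps more than compressing or splitting the traffic over it.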

105

Multi-Select · easy

A DevOps team wants to monitor the performance of a Cloud SQL database. Which two metrics should they track? (Select TWO.)

Select 2 answers

A. Auto-increment counter

B. Query error rate

C. CPU utilization

D. Number of active connections

E. Disk read/write latency

Answers: C, E

High CPU may indicate inefficient queries or need for scaling.

Why this answer

CPU utilization (C) is a critical metric for Cloud SQL because high CPU usage indicates that the database instance is struggling to process queries, often due to inefficient queries or insufficient compute capacity. Monitoring CPU utilization helps teams decide when to scale up or optimize query performance.

Exam trap

Google Cloud often tests the distinction between metrics that measure performance (e.g., CPU, latency) versus metrics that measure capacity or configuration (e.g., active connections, auto-increment), leading candidates to select D because they conflate 'active connections' with performance impact.

106

MCQ · easy

A Cloud Run service is experiencing increased cold start latency. The service is written in Python and uses several large dependencies. Which action would most effectively reduce cold start latency?

A. Set concurrency to 1 to ensure each request gets a dedicated container.

B. Increase the CPU allocation to 4 vCPUs.

C. Set a minimum number of instances to keep containers warm.

D. Increase memory to 2 GiB.

Answer: C

Min instances eliminate cold start by keeping containers ready.

Why this answer

Option C is correct because setting a minimum number of instances ensures that the Cloud Run service always has a pool of warm containers ready to serve requests, eliminating the cold start penalty. Cold starts in Python are particularly severe due to the time required to import large dependencies (e.g., NumPy, TensorFlow) and initialize the runtime. By keeping containers alive, you bypass the entire initialization phase, directly addressing the root cause of increased latency.

Exam trap

Google Cloud often tests the misconception that increasing CPU or memory directly reduces cold start latency, when in fact cold starts are primarily caused by initialization overhead (dependency loading, runtime startup) that is not mitigated by resource scaling.

How to eliminate wrong answers

Option A is wrong because setting concurrency to 1 does not reduce cold start latency; it forces each request to have a dedicated container, which can actually increase the number of cold starts if the service scales up, and it wastes resources without addressing the initialization delay. Option B is wrong because increasing CPU allocation speeds up request processing after the container is warm, but it does not reduce the time taken to import large Python dependencies or start the application—cold start latency is dominated by I/O and import overhead, not CPU speed. Option D is wrong because increasing memory provides more headroom for the container but does not affect the initialization sequence; cold start latency is caused by loading dependencies and starting the runtime, not by memory pressure.

107

MCQ · hard

A large stateful service running on Compute Engine experiences variable performance due to CPU throttling from noisy neighbors. Which solution provides the most consistent performance?

A. Enable live migration for the VMs

B. Use sole-tenant nodes to isolate the VMs

C. Use preemptible VMs for stateful workloads

D. Purchase committed use discounts for lower cost

Answer: B

Sole-tenant nodes ensure your VMs are the only ones on the physical machine, eliminating neighbor noise.

Why this answer

Sole-tenant nodes ensure that your VMs are the only ones running on the underlying physical server, eliminating resource contention from other tenants (noisy neighbors). This provides consistent CPU performance because the vCPUs are not oversubscribed and the full physical core capacity is dedicated to your instances.

Exam trap

The trap here is that candidates confuse live migration (which maintains availability during host maintenance) with performance isolation, or assume that committing to a discount (CUD) implies dedicated resources.

How to eliminate wrong answers

Option A is wrong because live migration moves a running VM to another host without downtime but does not prevent noisy neighbor contention on either the source or destination host. Option C is wrong because preemptible VMs are designed for fault-tolerant, stateless batch workloads and can be terminated at any time, making them unsuitable for stateful services that require persistent data and consistent performance. Option D is wrong because committed use discounts reduce cost in exchange for a 1- or 3-year commitment but do not affect CPU throttling or noisy neighbor isolation.

108

Multi-Select · medium

Which TWO actions can reduce tail latency in a microservices architecture deployed on GKE? (Choose 2)

Select 2 answers

A. Run multiple replicas of each service and use a load balancer with least-request algorithm.

B. Use a single large machine type for all services.

C. Enable session affinity to keep users on the same pod.

D. Increase the batch size for processing requests.

E. Implement request hedging by sending duplicate requests to multiple replicas.

Answers: A, E

Distributes load and reduces queuing delay.

Why this answer

Option A is correct because running multiple replicas and using a load balancer with a least-request algorithm distributes incoming requests to the pod with the fewest active connections, reducing queuing delay and preventing any single replica from becoming a hotspot. This directly lowers tail latency by ensuring that slow or overloaded pods are not overwhelmed, and the load balancer's algorithm minimizes the variance in response times across replicas.

Exam trap

Google Cloud often tests the misconception that session affinity (sticky sessions) improves performance, but in reality it harms tail latency by preventing even load distribution and causing pod overload under variable traffic.
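
Request hedging (Option E) can be sketched with `asyncio`: the same request goes to several replicas and the first response wins. The replica names, the latencies, and `fake_call` are hypothetical stand-ins for real service calls.

```python
import asyncio

# Hypothetical per-replica latencies in seconds; pod-a is the slow outlier.
LATENCY_S = {"pod-a": 0.20, "pod-b": 0.01}


async def fake_call(replica):
    """Stand-in for an RPC to one replica."""
    await asyncio.sleep(LATENCY_S[replica])
    return f"response from {replica}"


async def hedged(call, replicas):
    """Send the same request to every replica and return the first response."""
    tasks = [asyncio.create_task(call(r)) for r in replicas]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # drop the slower duplicates
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


winner = asyncio.run(hedged(fake_call, ["pod-a", "pod-b"]))
print(winner)
```

The caller sees the latency of the fastest replica rather than the slowest. Production systems often delay the duplicate until the first attempt exceeds a latency percentile, to limit the extra load, and hedging is only safe for idempotent requests.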

109

Multi-Select · medium

Which TWO practices should be implemented to optimize query performance in Cloud Spanner?

Select 2 answers

A. Split large tables into multiple smaller tables to distribute load.

B. Create as many indexes as possible on all columns.

C. Use interleaved tables to co-locate related rows.

D. Use globally distributed interleaving across regions.

E. Define secondary indexes on columns used in WHERE clauses.

Answers: C, E

Interleaving ensures parent and child rows are stored together, reducing cross-node reads.

Why this answer

Option C is correct because interleaved tables in Cloud Spanner physically co-locate parent and child rows on the same split, reducing cross-node round trips and improving join performance. This design leverages Spanner's hierarchical storage model to minimize latency for queries that access related data together.

Exam trap

Google Cloud often tests the misconception that more indexes always improve query performance, but in Spanner, excessive indexes degrade write performance and storage efficiency, while interleaving and selective secondary indexes are the correct optimization strategies.
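
The co-location that interleaving gives can be pictured through key order: a child table's primary key starts with the parent's primary key, so sorting all rows by key places each child next to its parent. The sketch below is a simplified model of that ordering, with hypothetical singer and album IDs, not Spanner's storage engine.

```python
# Parent rows keyed by (singer_id,), child rows keyed by (singer_id, album_id).
singers = [(2,), (1,)]
albums = [(2, 7), (1, 10), (1, 3)]


def storage_order(parents, children):
    """Order rows by primary key; a child key extends its parent's key."""
    return sorted(parents + children)


layout = storage_order(singers, albums)
print(layout)  # [(1,), (1, 3), (1, 10), (2,), (2, 7)]
```

Each singer is followed directly by that singer's albums, so a query that reads a parent with its children touches one contiguous key range.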

110

MCQ · hard

A company runs a critical application on Compute Engine instances behind a TCP/UDP Network Load Balancer. They notice intermittent high latency for a subset of users. The application logs show no errors, and instance CPU is below 50%. Which next step is most effective to diagnose the latency?

A. Increase the number of instances behind the load balancer.

B. Enable VPC Flow Logs and analyze for dropped packets.

C. Switch to an HTTP(S) Load Balancer for better visibility.

D. Analyze Cloud Monitoring metrics for the load balancer, including backend latency and request counts.

Answer: D

These metrics pinpoint where latency occurs.

Why this answer

Option D is correct because Cloud Monitoring provides detailed metrics for TCP/UDP Network Load Balancers, including backend latency and request counts, which directly help identify whether the latency originates from the load balancer itself or the backend instances. Since instance CPU is below 50% and application logs show no errors, the issue is likely at the network or load balancer level, and these metrics offer the most targeted diagnostic data without changing the architecture.

Exam trap

The trap here is that candidates assume VPC Flow Logs (Option B) are the go-to tool for diagnosing latency, but they only show flow-level metadata and not latency metrics, whereas Cloud Monitoring provides the specific performance data needed for this scenario.

How to eliminate wrong answers

Option A is wrong because increasing the number of instances does not diagnose the root cause of latency; it only masks the symptom and may not help if the issue is due to network congestion, load balancer configuration, or regional user distribution. Option B is wrong because VPC Flow Logs capture metadata about network flows (e.g., source/destination IP, ports, packet count) but do not provide latency or dropped packet analysis for a Network Load Balancer; they are more suited for security auditing or connection tracking, not performance diagnostics. Option C is wrong because switching to an HTTP(S) Load Balancer would change the architecture and introduce Layer 7 processing overhead, which is unnecessary for a TCP/UDP application and does not directly diagnose the existing latency issue; the current load balancer type is appropriate for the protocol, and the problem should be investigated with existing monitoring tools.

111

MCQ · hard

A media streaming service uses Cloud Storage to store video files and serves them via Cloud CDN. Users in Asia report buffering issues. The team notices that the cache hit ratio is low in that region. The origin is a single Cloud Storage bucket in us-central1. Which set of actions would best improve performance for Asian users?

A. Use Cloud Load Balancing with Cloud Armor to protect the origin.

B. Enable HTTP/2 on Cloud CDN and increase the TTL for video content.

C. Configure a custom domain on Cloud CDN with SSL and enable request collapsing.

D. Create a new Cloud Storage bucket in an Asian region and use a dual-region bucket with Cloud CDN.

Answer: D

A closer origin reduces latency for cache misses, improving performance.

Why this answer

Option D is correct because a Cloud Storage bucket in an Asian region, served through Cloud CDN, gives Asian users a geographically closer origin. Cloud CDN caches content at edge locations, but with a low cache hit ratio most requests are cache misses, and each miss must travel to the distant origin in us-central1. A nearby origin for those misses significantly reduces round-trip time.

Exam trap

Google Cloud often tests the misconception that Cloud CDN alone solves all latency issues. The trap here is that with a low cache hit ratio many requests still go to the origin, so without a nearby origin, cache misses cause high latency for distant users.

How to eliminate wrong answers

Option A is wrong because Cloud Load Balancing with Cloud Armor provides DDoS protection and traffic distribution, but does not address the low cache hit ratio or reduce latency for Asian users; it does not change the origin location or caching behavior. Option B is wrong because enabling HTTP/2 and increasing TTL for video content can improve performance generally, but HTTP/2 does not fix the fundamental issue of a distant origin, and longer TTLs only help if content is already cached; they do not improve cache hit ratio for a region with a faraway origin. Option C is wrong because configuring a custom domain with SSL and enabling request collapsing improves security and reduces origin load by collapsing concurrent requests, but does not address the geographic distance between Asian users and the us-central1 origin, so cache misses still suffer high latency.
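
The effect of origin distance at a low cache hit ratio can be estimated as a weighted average of edge and origin latency. The hit ratio and the millisecond figures below are hypothetical values chosen for illustration, not measurements of Cloud CDN.

```python
def expected_latency_ms(hit_ratio, edge_ms, origin_ms):
    """Average latency when hits are served at the edge and misses at the origin."""
    return hit_ratio * edge_ms + (1 - hit_ratio) * origin_ms


# Assumed numbers: 40% hit ratio, 20 ms edge, 250 ms far origin, 60 ms near origin.
far_origin = expected_latency_ms(hit_ratio=0.4, edge_ms=20, origin_ms=250)
near_origin = expected_latency_ms(hit_ratio=0.4, edge_ms=20, origin_ms=60)

print(f"far origin:  {far_origin:.0f} ms")
print(f"near origin: {near_origin:.0f} ms")
```

With these assumed numbers the average drops from roughly 158 ms to roughly 44 ms without any change in hit ratio, because the misses dominate the total.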

112

MCQ · medium

A team uses Spanner for a global database. They notice increased read latency and high CPU utilization on some nodes. The workload is read-heavy with occasional writes. Which action is most likely to improve performance?

A. Create read-only replicas in each region.

B. Split the most frequently read tables into smaller tables.

C. Increase the number of nodes in the instance.

D. Add more nodes to the instance and ensure read requests are distributed evenly.

Answer: D

More nodes spread read load and reduce CPU per node.

Why this answer

In a read-heavy Spanner workload with high CPU utilization on some nodes, adding more nodes and ensuring read requests are distributed evenly (Option D) directly addresses the bottleneck by increasing the instance's compute capacity and spreading the load across all nodes. Spanner's architecture uses a shared-nothing design where each node handles a portion of the data and traffic; uneven distribution can cause hot spots. Adding nodes scales out processing power, and ensuring even distribution (e.g., via proper key design or using Spanner's built-in load balancing) reduces latency and CPU spikes on individual nodes.

Exam trap

Google Cloud often tests the misconception that adding nodes alone (Option C) solves performance issues, but the trap is that without even distribution of read requests, hot spots persist, making Option D the only complete solution.

How to eliminate wrong answers

Option A is wrong because read-only replicas in Spanner are used for improving read latency for stale reads (non-strong reads) and do not reduce CPU utilization on the primary nodes; they also cannot serve strong reads, which are common in read-heavy workloads. Option B is wrong because splitting frequently read tables into smaller tables does not inherently reduce CPU utilization or read latency; it may increase complexity and cross-table joins, and Spanner already partitions data into splits automatically. Option C is wrong because simply increasing the number of nodes without ensuring even distribution of read requests can leave hot spots unresolved; the key issue is uneven load, not just insufficient capacity.

113

Multi-Select · medium

A team is optimizing a Cloud Run service. Which two actions can reduce request latency? (Select TWO.)

Select 2 answers

A. Increase max-instances

B. Enable HTTP/2

C. Reduce container image size

D. Use a regional endpoint

E. Enable min-instances

Answers: C, E

Reduces startup time, lowering latency for new instances.

Why this answer

Enabling min-instances reduces cold starts, and reducing container image size lowers startup time, both reducing latency.
