PCA · topic practice

Ensure solution and operations reliability practice questions

Practise Google Professional Cloud Architect questions for the Ensure solution and operations reliability domain: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations, not to memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security
20 questions · Domain: Ensure solution and operations reliability

What the exam tests

What to know about Ensure solution and operations reliability

Ensure solution and operations reliability questions test whether you can apply the concept in context, not just recognise a definition.

  • How the topic appears in realistic exam-style scenarios.
  • Which detail in the question changes the correct answer.
  • How to eliminate plausible but wrong options.
  • How to connect the question back to the wider exam objective.

Watch out for

Common Ensure solution and operations reliability exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Practice set

Ensure solution and operations reliability questions

20 questions · select your answer, then reveal the explanation

A company runs a critical application on Compute Engine instances in a managed instance group (MIG) with autoscaling. During a traffic spike, some instances become unhealthy but are not automatically replaced. What is the most likely cause?

A company is designing a disaster recovery plan for a Cloud SQL for PostgreSQL instance. They want to fail over to a different region with minimal data loss and a recovery time under 10 minutes. The database is 500 GB and experiences 2,000 write transactions per second. Which solution should they use?

A company uses Cloud Spanner for a global financial application. They experience increased latency and transaction aborts during peak hours. Which measure should they take first to improve reliability?

A company deploys a microservices application on Google Kubernetes Engine (GKE). Pods in one deployment are frequently OOMKilled. The team sets memory requests and limits, but pods still crash. What is the most likely remaining cause?

An organization uses Cloud Functions (2nd gen) for event-driven processing. They notice that some functions fail with 'memory limit exceeded' errors during peak load. The function processes messages from Pub/Sub and writes to Firestore. What should they do to improve reliability without sacrificing throughput?

A company deploys a stateful workload using StatefulSets on GKE. They want to ensure that if a pod is evicted, its persistent volume claim (PVC) is reattached to the replacement pod in the same zone. Which configuration achieves this?

A company monitors their application with Cloud Monitoring. They set up an alerting policy to notify the on-call team when the 99th percentile latency exceeds 500 ms for 5 minutes. However, they receive false positive alerts due to short bursts. How should they refine the policy?
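The difference between a momentary spike and a sustained breach, which this scenario turns on, can be illustrated with a minimal sketch. It is plain Python, not the Cloud Monitoring API: the function name, the one-sample-per-minute spacing, and the sample values are all hypothetical.

```python
from typing import Sequence


def breaches_for_duration(
    samples_ms: Sequence[float], threshold_ms: float, duration_points: int
) -> bool:
    """Return True if `duration_points` consecutive samples exceed the threshold."""
    if duration_points < 1:
        raise ValueError("duration_points must be at least 1")
    run = 0  # length of the current streak of samples above the threshold
    for value in samples_ms:
        run = run + 1 if value > threshold_ms else 0
        if run >= duration_points:
            return True
    return False


# Hypothetical p99 latency samples in milliseconds, one per minute.
burst = [300, 320, 900, 950, 310, 305, 300, 310]      # 2-minute spike
sustained = [300, 600, 650, 700, 620, 640, 610, 300]  # 6-minute breach

print(breaches_for_duration(burst, 500, 1))      # fires on the short spike
print(breaches_for_duration(burst, 500, 5))      # ignores the short spike
print(breaches_for_duration(sustained, 500, 5))  # fires on the sustained breach
```

In this toy model, requiring the breach to hold for several consecutive samples filters out the short burst while still catching the sustained one. How a real alerting policy aligns and evaluates its data depends on its configuration.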

A company runs a web application on Compute Engine behind an HTTP load balancer. They want to improve reliability by implementing failover across two regions. Which TWO actions should they take?

A company uses Cloud CDN to accelerate content delivery. They notice that some users receive stale content even after purging the cache. Which THREE factors could cause this?

A company deploys a critical application on Google Kubernetes Engine (GKE) and wants to ensure high availability during cluster upgrades. Which TWO practices should they follow?

A company runs a multi-tier application on Google Cloud: a frontend on App Engine Standard, a backend on Cloud Run, and a Cloud SQL database. The application experiences intermittent 500 errors when users submit forms. The errors correlate with high CPU usage on the Cloud SQL instance (db-n1-standard-2, 7.5 GB memory). The Cloud Run service has a concurrency setting of 80 and a maximum of 10 instances. The App Engine service uses automatic scaling. The team has verified that the application code is not the issue. They suspect the database is hitting connection limits. Current max_connections on Cloud SQL is 250. The Cloud Run service uses a connection pool of 10 connections per instance. The App Engine service uses a connection pool of 5 connections per instance. They also have a few batch jobs that run occasionally, using up to 10 connections. The team wants to resolve the errors with minimal cost and complexity. Which course of action should they take?
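The connection figures in the scenario above can be added up directly. The sketch below is a rough budget, assuming every pool is fully used and ignoring any connections the database reserves for itself. The constant and function names are illustrative, not part of any Google Cloud API.

```python
# Figures taken from the scenario above.
MAX_CONNECTIONS = 250         # Cloud SQL max_connections
CLOUD_RUN_MAX_INSTANCES = 10  # Cloud Run instance ceiling
CLOUD_RUN_POOL_SIZE = 10      # connections per Cloud Run instance
APP_ENGINE_POOL_SIZE = 5      # connections per App Engine instance
BATCH_CONNECTIONS = 10        # upper bound for the occasional batch jobs


def total_connections(app_engine_instances: int) -> int:
    """Worst-case connection count for a given number of App Engine instances."""
    return (
        CLOUD_RUN_MAX_INSTANCES * CLOUD_RUN_POOL_SIZE
        + app_engine_instances * APP_ENGINE_POOL_SIZE
        + BATCH_CONNECTIONS
    )


def max_app_engine_instances() -> int:
    """Largest App Engine instance count that still fits under MAX_CONNECTIONS."""
    fixed = CLOUD_RUN_MAX_INSTANCES * CLOUD_RUN_POOL_SIZE + BATCH_CONNECTIONS
    return (MAX_CONNECTIONS - fixed) // APP_ENGINE_POOL_SIZE


print(max_app_engine_instances())  # 28
print(total_connections(28))       # 250
print(total_connections(29))       # 255
```

Under these assumptions, Cloud Run and the batch jobs account for at most 110 connections, so the limit is reached only once App Engine's automatic scaling passes 28 instances. The Cloud Run concurrency setting of 80 does not enter the sum, because the pool size, not the request concurrency, bounds the connections each instance opens.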

A company runs a web application on Google Kubernetes Engine (GKE) with Cluster Autoscaler enabled. During a traffic spike, the application becomes slow and some requests time out. The cluster has sufficient CPU and memory headroom. What is the most likely cause and solution?

An organization is migrating a legacy monolithic application to Google Cloud. The application currently runs on a single server with an on-premises database. The application is stateful and requires low-latency access to the database. The migration must minimize downtime and ensure high availability. Which architecture should the company adopt?

A company uses Cloud SQL for MySQL to host its production database. The database experiences high read traffic. The team wants to improve read performance without modifying the application. What should they do?

A company is running a critical application on Compute Engine. The application writes logs to a local persistent disk. The operations team wants to ensure logs are not lost if the VM fails. What should they do?

Which TWO options are best practices for ensuring high availability of an application running on Google Kubernetes Engine (GKE)?

Which THREE options are valid strategies for disaster recovery (DR) in Google Cloud?

A company runs a batch processing workload on Compute Engine that processes financial transactions. The workload runs daily and must complete within a 4-hour window. The application reads input data from Cloud Storage, processes it, and writes output to another Cloud Storage bucket. The current implementation uses a single VM with a 500 GB persistent disk. Recently, the data volume has increased, and the job is now taking over 6 hours, exceeding the SLA. The team is tasked with redesigning the solution to be faster and more reliable. They want to minimize costs and operational overhead. The data is critical and must not be lost. Which approach should they take?
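The timing constraint in the scenario above reduces to simple arithmetic. The sketch below assumes the job can be split evenly across parallel workers and treats the 6-hour figure as a lower bound on the current runtime. The `parallel_efficiency` parameter and the function names are hypothetical, and the scenario does not say how well the workload parallelises.

```python
import math

CURRENT_RUNTIME_HOURS = 6.0  # "over 6 hours" in the scenario, so a lower bound
SLA_WINDOW_HOURS = 4.0       # required completion window


def required_speedup(current_hours: float, window_hours: float) -> float:
    """Minimum speedup factor needed to finish inside the window."""
    return current_hours / window_hours


def min_workers(
    current_hours: float, window_hours: float, parallel_efficiency: float = 1.0
) -> int:
    """Smallest worker count that meets the window at the given efficiency.

    parallel_efficiency is the fraction of ideal linear scaling achieved
    (1.0 means doubling the workers halves the runtime).
    """
    if not 0 < parallel_efficiency <= 1:
        raise ValueError("parallel_efficiency must be in (0, 1]")
    return math.ceil(current_hours / (window_hours * parallel_efficiency))


print(required_speedup(CURRENT_RUNTIME_HOURS, SLA_WINDOW_HOURS))  # 1.5
print(min_workers(CURRENT_RUNTIME_HOURS, SLA_WINDOW_HOURS))       # 2
print(min_workers(CURRENT_RUNTIME_HOURS, SLA_WINDOW_HOURS, 0.7))  # 3
```

Because the real runtime is more than 6 hours and data volume is growing, these numbers are a floor rather than a sizing recommendation.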

A company has deployed a critical application on Google Kubernetes Engine (GKE) with a Regional cluster (us-central1). The application uses a Cloud SQL for PostgreSQL database with a cross-region replica for disaster recovery. The SRE team needs to ensure that the application can survive a regional outage with minimal data loss. Which TWO actions should the team take to improve the reliability of the solution?

You are investigating a Vertex AI Workbench instance (instance-2) that is showing UNHEALTHY status. Based on the exhibit, what is the most likely cause of the issue?

Network Topology
project=my-project
location=us-central1

Refer to the exhibit.

NAME        STATE    HEALTH     UPDATE_TIME
instance-1  RUNNING  HEALTHY    2025-02-15T10:30:00Z
instance-2  RUNNING  UNHEALTHY  2025-02-15T10:35:00Z
...
health:
  state: UNHEALTHY
  reasons:
  - type: CONTAINER_CREATE_FAILED

Focused Ensure solution and operations reliability sessions

Start a practice session limited to Ensure solution and operations reliability

Every question in these sessions is drawn from the Ensure solution and operations reliability domain — nothing else.

Related practice questions

Related PCA topic practice pages

Move into related areas when this topic feels solid.

Frequently asked questions

What does the PCA exam test about Ensure solution and operations reliability?
Ensure solution and operations reliability questions test whether you can apply the concept in context, not just recognise a definition.

How should I use these practice questions?
Select your answer before revealing the explanation. Then read why each option is right or wrong; this active recall approach builds retention far faster than re-reading notes.

Can I practise just Ensure solution and operations reliability questions in a focused session?
Yes. The session launcher on this page draws every question from the Ensure solution and operations reliability domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.

Where can I practise other PCA topics?
Use the topic links above to move to related areas, or go back to the PCA question bank to see all topics.

Are these real exam questions or dumps?
These are original practice questions written to test the same concepts the PCA exam covers. They are not copied from any real exam or dump site.