PCA Exam Questions and Answers

A company is migrating on-premises workloads to Google Cloud. They have a critical application that requires consistent low-latency access to a database, with read replicas in multiple regions for disaster recovery. The application is expected to grow by 10x over the next year. Which database service and configuration should the architect choose to meet these requirements?

Use Cloud Bigtable with multi-region replication

Use Cloud SQL for PostgreSQL with cross-region read replicas

Use Cloud Spanner with multi-region configuration

Cloud Spanner offers global strong consistency, automatic replication, and horizontal scalability.

Use Firestore in native mode with multi-region location

Why: Cloud Spanner with a multi-region configuration is the correct choice because it provides strong global consistency, low-latency reads from replicas close to users, and horizontal scaling that can absorb a 10x growth in workload. Its multi-region configurations replicate data synchronously across regions for disaster recovery while maintaining ACID transactions, which is critical for a database requiring consistent low-latency access.

A financial services company is designing a multi-tier application on Google Cloud. The application must meet PCI DSS compliance, with data encrypted at rest and in transit. They plan to use Cloud SQL for PostgreSQL for transactional data and Cloud Storage for archival data. Which TWO actions should the architect take to meet compliance requirements?

Configure client-side encryption in the application code

Rely on Google-managed default encryption for all data

Enable customer-managed encryption keys (CMEK) on Cloud SQL and Cloud Storage

CMEK provides control over key management required for PCI DSS.

Use VPC Service Controls to restrict data access

VPC Service Controls prevent data exfiltration and help meet compliance.

Use Cloud HSM with a key generated outside of Google Cloud

Why: Options C and D are correct. Enabling CMEK on Cloud SQL and Cloud Storage allows the company to use their own encryption keys, which is often required by PCI DSS to demonstrate control over key management. CMEK ensures data at rest is encrypted with keys managed via Cloud KMS, providing auditability and separation of duties beyond Google-managed default encryption. VPC Service Controls complements this by placing a service perimeter around Cloud SQL and Cloud Storage, which restricts data access and helps prevent data exfiltration.

Refer to the exhibit. An architect created a VM instance using the above command. After the instance starts, the architect tries to access the nginx default page from the internet but gets a timeout. What is the most likely reason?

The VM is in a subnet without a Cloud NAT

The firewall rule allowing HTTP traffic is missing

The startup script failed to install nginx

The VM was created without an external IP address

The --no-address flag omits the external IP, making the VM unreachable from the internet.

Why: The most likely reason for the timeout is that the VM was created without an external (public) IP address. Without an external IP, the VM is not directly reachable from the internet, even if nginx is running and firewall rules allow HTTP traffic. The timeout occurs because the internet has no route to the VM's private IP address.

A media streaming company is deploying a new video transcoding pipeline on Google Cloud. The pipeline receives raw video files uploaded to Cloud Storage, triggers a Cloud Function that submits transcoding jobs to a Compute Engine worker pool, and stores the transcoded output in another Cloud Storage bucket. The workers are managed by a managed instance group (MIG) running a custom container image. Currently, when there is a spike in uploads, the MIG takes 5-7 minutes to scale up new workers, causing processing delays. The architect needs to reduce the time to add new workers to under 2 minutes. The workers are stateless and the container image is about 2 GB. What should the architect do?

Use Cloud Run instead of Compute Engine to run the transcoding workers

Increase the minimum number of instances in the MIG to 10

Replace the Compute Engine workers with Cloud Functions to handle the transcoding

Create a custom Compute Engine image that includes the container runtime and pre-pulled container

A custom image with the container already pulled reduces boot time as the image does not need to be downloaded.

Why: Option D is correct because creating a custom Compute Engine image that includes the container runtime and the pre-pulled 2 GB container image eliminates the need to download the image during scale-up. Because the container is already cached on the boot disk, new instances skip the network pull, which is the step most likely to account for the 5-7 minute startup, and can be ready in under 2 minutes.

A company is migrating a legacy monolithic application to Google Cloud. The application currently runs on a single on-premises server and uses a local MySQL database. The company wants to minimize changes to the application code while improving scalability and reliability. Which migration strategy should the architect recommend?

Refactor the application into microservices and deploy on Google Kubernetes Engine.

Rehost the application on Compute Engine and use Cloud SQL for MySQL as the database.

Rehosting on Compute Engine with Cloud SQL minimizes changes and improves scalability and reliability.

Containerize the application with Docker and run it on Cloud Run.

Migrate the database to Firestore and rewrite the application to use Firestore APIs.

Why: Option B is correct because rehosting (lift-and-shift) the monolithic application to Compute Engine with Cloud SQL for MySQL minimizes code changes while improving scalability and reliability. Cloud SQL provides managed MySQL with automated backups, replication, and failover, addressing the need for reliability without requiring application refactoring.

A global e-commerce platform is experiencing intermittent latency spikes during flash sales. The application is deployed on Google Kubernetes Engine (GKE) with a regional cluster. The architecture includes a frontend service, a product catalog service using Cloud Spanner, and an order processing service using Cloud Pub/Sub. During high load, the catalog service shows increased query latency, and some requests time out. What should the architect prioritize to address the issue?

Use Cloud CDN to cache product catalog responses.

Increase the number of nodes in the GKE node pool.

Enable Cloud Spanner interleaved tables and add secondary indexes for common query filters.

Secondary indexes and interleaved tables optimize query access patterns, reducing latency.

Migrate the catalog service from Cloud Spanner to Cloud Bigtable for better read performance.

Why: The correct answer is C because the issue is specifically with Cloud Spanner query latency under high load. Enabling interleaved tables and adding secondary indexes optimizes data locality and query performance, reducing the need for expensive cross-table joins and full table scans. This directly addresses the root cause of increased latency and timeouts in the catalog service.

Domain 2: Manage and provision cloud infrastructure

A company is deploying a new application on Compute Engine. They need to ensure that the application can automatically recover from a zone failure. What is the best approach?

Create a managed instance group with instances in multiple zones.

MIG auto-heals and distributes across zones.

Use a global load balancer in front of a single instance.

Create a single VM in a single zone and rely on live migration.

Use Cloud Storage to store application state and restore from a snapshot.

Why: A regional managed instance group (MIG) with instances in multiple zones provides automatic recovery from a zone failure by distributing instances across zones and using auto-healing to recreate failed instances. If one zone becomes unavailable, instances in the other zones keep serving, and a load balancer in front of the group routes traffic only to healthy instances, ensuring high availability without manual intervention.

An organization has multiple projects in Google Cloud and wants to centralize logging and monitoring for all projects. They need to aggregate logs from all projects into a single project for analysis. Which approach should they use?

Export logs from each project to a Cloud Storage bucket and then import them into BigQuery.

Enable Cloud Audit Logs for all projects and view them from the central project.

Install the Stackdriver agent on all VMs and point them to the central project.

Create a logs sink in each project that exports logs to a BigQuery dataset in the central project.

Logs sinks can route any log entries to BigQuery.

Why: Option D is correct because Google Cloud's logs sink feature allows you to route logs from multiple source projects to a centralized BigQuery dataset in a single destination project. This approach aggregates logs efficiently without requiring agents or manual import steps, and it supports real-time log export for analysis.

A developer needs to deploy a containerized application on Google Kubernetes Engine (GKE) with minimal operational overhead. They want to automatically scale the number of pods based on CPU utilization. Which GKE feature should they use?

Horizontal Pod Autoscaler.

HPA scales pods based on metrics like CPU.

Node auto-repair.

Vertical Pod Autoscaler.

Cluster Autoscaler.

Why: The Horizontal Pod Autoscaler (HPA) is the correct choice because it automatically scales the number of pod replicas in a GKE deployment based on observed CPU utilization (or other custom metrics). This directly meets the requirement of scaling pods with minimal operational overhead, as HPA is a native Kubernetes resource that requires no manual intervention once configured.
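
As a rough illustration of how the HPA arrives at a replica count, the sketch below applies the ratio of observed to target utilization described in the Kubernetes documentation. The function name and the sample numbers are illustrative assumptions. The real controller also applies a tolerance band, stabilization windows, and the configured minimum and maximum replica counts, all of which are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_cpu: float, target_cpu: float) -> int:
    """Scale the replica count by observed/target utilization, rounded up."""
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    return math.ceil(current_replicas * current_cpu / target_cpu)


# Hypothetical numbers: 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))
```

With these sample values the deployment would grow from 4 pods to 6; when utilization drops below the target, the same ratio produces a smaller replica count.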

A company is deploying a web application on Compute Engine behind a global HTTP(S) load balancer. They want to restrict access to only traffic from specific IP ranges. Which load balancer feature should they use?

Cloud Armor security policies.

Cloud Armor can allow/deny traffic based on IP.

VPC firewall rules.

Identity-Aware Proxy (IAP).

Cloud CDN.

Why: Cloud Armor security policies are the correct choice because they allow you to define IP-based allow/deny rules at the edge of Google's network, directly integrated with the global HTTP(S) load balancer. This provides granular access control based on source IP ranges before traffic reaches your backend instances, which is exactly what the requirement specifies.
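
The priority-ordered evaluation behind IP-based allow/deny rules can be sketched as follows. This is a simplified model and not the Cloud Armor API: the rule list, the IP ranges (taken from the documentation address blocks), and the `evaluate` function are all hypothetical. It assumes that a lower priority number is evaluated first and that a catch-all default rule sits at the lowest priority.

```python
from ipaddress import ip_address, ip_network

# Hypothetical policy: (priority, source range, action). Lower number is evaluated first.
RULES = [
    (1000, "203.0.113.0/24", "allow"),
    (2147483647, "0.0.0.0/0", "deny"),  # catch-all default rule
]


def evaluate(source_ip: str, rules=RULES) -> str:
    """Return the action of the first matching rule in priority order."""
    addr = ip_address(source_ip)
    for _priority, cidr, action in sorted(rules):
        if addr in ip_network(cidr):
            return action
    return "deny"  # no rule matched; treated as denied in this sketch


print(evaluate("203.0.113.7"))
print(evaluate("198.51.100.1"))
```

The first address falls inside the allowed range, while the second only matches the catch-all deny rule.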

A company has a production database running on Cloud SQL. They need to ensure high availability with automatic failover in the event of a zone outage. What should they do?

Export the database to Cloud Storage and import in another region.

Enable Cloud SQL High Availability (HA) configuration.

HA provides automatic failover to standby in another zone.

Create a cross-region read replica.

Configure automated backups.

Why: Enabling Cloud SQL High Availability (HA) configuration provisions a standby instance in a different zone within the same region, using synchronous replication to ensure zero data loss. In the event of a zone outage, Cloud SQL automatically fails over to the standby instance, typically within 60 seconds, providing high availability without manual intervention.

A developer wants to store and retrieve non-relational data with flexible schema and automatic scaling. Which Google Cloud service should they use?

Cloud Bigtable.

Cloud SQL.

Firestore.

Firestore is NoSQL with flexible schema and auto-scaling.

Cloud Spanner.

Why: Firestore is a NoSQL document database that supports flexible schema and automatic scaling, making it ideal for non-relational data. It offers real-time synchronization, offline support, and serverless scaling, which aligns with the requirement for storing and retrieving data without manual sharding or capacity planning.

Domain 3: Design for security and compliance

A company is migrating sensitive customer data to Google Cloud. They need to ensure data is encrypted at rest and in transit. Which Google Cloud service provides a centralized way to manage encryption keys used by Google Cloud services?

Cloud HSM

Cloud External Key Manager (Cloud EKM)

Cloud Key Management Service (Cloud KMS)

Cloud KMS provides centralized management of encryption keys used by Google Cloud services.

Secret Manager

Why: Cloud KMS is the correct choice because it provides a centralized, managed service for creating, rotating, and destroying encryption keys used by Google Cloud services. It integrates directly with services like Cloud Storage, BigQuery, and Compute Engine to enforce encryption at rest, and it supports customer-managed encryption keys (CMEK) for granular control. For data in transit, Google Cloud applies encryption by default when traffic leaves physical boundaries controlled by Google, and Cloud KMS keys can additionally back application-level encryption where the workload needs it.

A financial services company runs a multi-tier application on Compute Engine. They need to restrict network access so that only the web tier can communicate with the application tier, and only the application tier can access the database tier. All VMs are in the same VPC network. What is the most secure way to implement this?

Use Identity-Aware Proxy (IAP) to manage network access between tiers.

Use VPC firewall rules with target tags to allow traffic between specific tiers.

VPC firewall rules with tags are the simplest and most secure way to enforce network segmentation within a VPC.

Create separate VPC networks for each tier and use VPC peering.

Assign a unique service account to each tier and use IAM conditions to restrict traffic.

Why: VPC firewall rules with target tags allow you to precisely control ingress and egress traffic between VM instances based on their assigned tags. By tagging web tier VMs with a tag like 'web-tier' and application tier VMs with 'app-tier', you can create a firewall rule that allows traffic from 'web-tier' to 'app-tier' on the required port (e.g., TCP 8080) and another rule allowing traffic from 'app-tier' to 'db-tier' on the database port (e.g., TCP 3306). This approach enforces the principle of least privilege within a single VPC network without introducing unnecessary complexity or breaking network isolation.
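
The tier-to-tier rules described above can be modeled with a small sketch. It is not the Compute Engine API: the `Rule` class and `is_allowed` function are hypothetical, the tags and ports are the examples from the explanation, and the model assumes that no other allow rules exist, so anything not explicitly allowed is denied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source_tag: str  # tag on the VM sending the traffic
    target_tag: str  # tag on the VM receiving the traffic
    port: int        # allowed TCP port


# Example rules from the explanation: web -> app on 8080, app -> db on 3306.
ALLOW_RULES = [
    Rule("web-tier", "app-tier", 8080),
    Rule("app-tier", "db-tier", 3306),
]


def is_allowed(source_tags, target_tags, port, rules=ALLOW_RULES) -> bool:
    """True if any allow rule matches; everything else is denied."""
    return any(
        r.source_tag in source_tags and r.target_tag in target_tags and r.port == port
        for r in rules
    )


print(is_allowed({"web-tier"}, {"app-tier"}, 8080))  # web to app
print(is_allowed({"web-tier"}, {"db-tier"}, 3306))   # web directly to db
```

The web tier reaches the application tier, but its attempt to reach the database tier directly matches no rule and is denied.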

A healthcare organization uses Cloud Storage to store protected health information (PHI). They have a compliance requirement to ensure that all objects in the bucket are encrypted with a customer-managed key (CMK) that is rotated every 90 days. They also need to log all access to the bucket and detect anomalous access patterns. Which combination of Google Cloud services should they use?

Cloud Storage with default encryption, Cloud Audit Logs, and Security Command Center

Cloud Storage with CMEK via Cloud HSM, Cloud Audit Logs, and Cloud DLP

Cloud Storage with CSEK, Cloud Audit Logs, and Security Command Center

Cloud Storage with CMEK via Cloud KMS, Cloud Audit Logs, and Chronicle

CMEK uses Cloud KMS for key management, Cloud Audit Logs for logging, and Chronicle for anomaly detection.

Why: Option D is correct because Cloud Storage with CMEK via Cloud KMS allows the organization to use a customer-managed key that can be rotated every 90 days, meeting the compliance requirement. Cloud Audit Logs capture all access to the bucket, and Chronicle provides advanced security analytics to detect anomalous access patterns, fulfilling the logging and detection needs.

An e-commerce platform uses Cloud SQL for MySQL to store user profiles and order history. The security team wants to ensure that database administrators (DBAs) cannot view plaintext credit card numbers stored in the database. They also want to minimize application changes. What should they do?

Implement column-level encryption using Cloud KMS in the application layer.

Grant DBAs the Cloud SQL Viewer role to restrict access to data.

Use Cloud SQL Proxy to encrypt connections and limit DBA access.

Use Cloud DLP with de-identification and re-identification transforms on the Cloud SQL database.

Cloud DLP can automatically detect and tokenize sensitive data, with re-identification for authorized apps.

Why: Cloud DLP can be used to de-identify sensitive data like credit card numbers before they are stored in Cloud SQL, using deterministic or reversible transformations (e.g., format-preserving encryption or tokenization) that allow re-identification only by authorized applications. DBAs who query the database see only tokens, not plaintext card numbers. This approach limits application changes because the transformations are performed through the DLP API rather than by custom cryptographic code, and format-preserving tokens can fit the existing column, so authorized applications call the re-identification transform only when they need the original value.

A company wants to ensure that only Compute Engine instances with a specific service account can access a Cloud Storage bucket. Which IAM condition should they use?

Condition: 'request.auth == "serviceAccount:sa@project.iam.gserviceaccount.com"'

Condition: 'origin.serviceAccount == "sa@project.iam.gserviceaccount.com"'

Condition: 'resource.serviceAccount == "sa@project.iam.gserviceaccount.com"'

Condition: 'iam.serviceAccount == "sa@project.iam.gserviceaccount.com"'

This condition is intended to match the service account used by the caller.

Why: Option D is the intended answer: the condition is meant to restrict access to requests authenticated as the named service account, so that only Compute Engine instances running as that account can reach the Cloud Storage bucket. Treat the attribute name as illustrative, because `iam.serviceAccount` does not appear to be an attribute listed in the IAM Conditions reference. The usual way to achieve the same result is to grant the bucket role only to `sa@project.iam.gserviceaccount.com` as the principal of the IAM binding.

A multinational corporation operates in multiple regions and must comply with GDPR. They use Cloud Load Balancing to distribute traffic across regional backends. Their security team wants to block traffic from specific countries (e.g., non-EU countries) at the edge. What should they use?

Configure Cloud CDN to serve content only to EU-based users.

Use Cloud Armor security policies with geographic-based denylist rules.

Cloud Armor can block traffic from specific countries at the Google Cloud edge.

Set VPC firewall rules to allow traffic only from EU IP ranges.

Configure Identity-Aware Proxy (IAP) to require user authentication from allowed countries.

Why: Cloud Armor security policies support geographic-based access control using denylist or allowlist rules that match client IP addresses against country-level geolocation data. This allows the security team to block traffic from specific countries at the edge, before it reaches the backend, which is the most efficient and compliant approach for GDPR enforcement.

Domain 4: Analyze and optimize technical and business processes

A company is migrating its on-premises Oracle database to Cloud SQL for PostgreSQL. The database team wants to minimize downtime during migration. Which approach should they use?

Set up Oracle GoldenGate to replicate to Cloud SQL.

Use Database Migration Service with a continuous, heterogeneous migration from Oracle to Cloud SQL for PostgreSQL.

DMS supports minimal downtime via continuous replication.

Take a physical backup of Oracle and restore to Cloud SQL.

Export the database as a dump file, upload to Cloud Storage, and import into Cloud SQL.

Why: Database Migration Service (DMS) with continuous migration is the correct approach because it supports ongoing change data capture (CDC) from Oracle to Cloud SQL for PostgreSQL, enabling near-zero downtime. Because the source and target engines differ, this is a heterogeneous migration: DMS assists with schema conversion and then replicates data continuously, allowing the target to stay synchronized until cutover, which minimizes downtime compared to batch methods.

An e-commerce platform uses Cloud Spanner for order processing. Recently, latency spikes have occurred during flash sales. The team suspects hot spots due to monotonically increasing order IDs. Which table design change would best solve this?

Remove the primary key and let Spanner auto-generate it.

Use interleaved tables to store orders under customers.

Add a random prefix to the order ID primary key.

Randomizing the first part of the key distributes writes across splits.

Create a secondary index on the timestamp column.

Why: Monotonically increasing primary keys (like sequential order IDs) cause hot spots in Cloud Spanner because all new writes land at the end of the key space, in a single split, overloading the server that hosts that split. Adding a random or hashed prefix (e.g., a hash of the customer ID) distributes writes across multiple splits, eliminating the hot spot and reducing latency spikes during high-throughput flash sales.
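
A minimal sketch of the key-prefix idea follows. It swaps the random prefix for a deterministic hash of the order ID, which is an assumption made here so that a reader can recompute the full key from the order ID alone. The function name `sharded_key` and the shard count of 16 are hypothetical.

```python
import hashlib
from typing import Tuple


def sharded_key(order_id: int, num_shards: int = 16) -> Tuple[int, int]:
    """Build a composite key (shard, order_id) so sequential IDs spread across shards."""
    digest = hashlib.sha256(str(order_id).encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % num_shards
    return (shard, order_id)


# Sequential order IDs no longer sort next to each other in the key space.
for oid in range(1001, 1006):
    print(sharded_key(oid))
```

Because the shard is the leading key column, consecutive order IDs are scattered across the key space instead of all being appended to the last split.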

A startup uses Cloud Functions with a Pub/Sub trigger to process incoming orders. They notice that the function sometimes fails to process messages, and those messages are lost. What is the most likely cause?

The subscription has an ackDeadlineSeconds of 600.

The Cloud Function has a timeout of 540 seconds.

The Pub/Sub topic has a retention duration of 10 minutes.

The Cloud Function is configured with retry on failure set to false.

If retry is disabled, failed messages are dropped.

Why: Option D is correct because when a Cloud Function fails to process a Pub/Sub message and retry on failure is set to false, the event is not redelivered to the function. With retries disabled, a failed execution is treated as final and the event is discarded rather than retried, so any message whose processing fails is lost.

A company uses BigQuery for analytics. They have a large partitioned table that is queried frequently. The query performance has degraded over time. Which optimization should they try first?

Create a materialized view for each frequent query.

Increase the number of slots for the project.

Apply clustering on frequently filtered columns.

Clustering sorts data, reducing scanned data for filters.

Denormalize the table to reduce joins.

Why: Clustering on frequently filtered columns reorganizes the data within partitions based on the values of those columns, which allows BigQuery to prune blocks more effectively during queries. This directly addresses the performance degradation by reducing the amount of data scanned, without requiring additional storage or compute resources.
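
The effect of block pruning can be illustrated with a toy model. This is not how BigQuery is implemented: it simply assumes that each storage block records the minimum and maximum value of the filtered column and that a block is read only if the wanted value falls inside that range. The function name, block size, and data are hypothetical.

```python
def blocks_scanned(values, block_size, wanted):
    """Count blocks whose [min, max] range could contain the wanted value."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return sum(1 for block in blocks if min(block) <= wanted <= max(block))


# 100 rows with 10 distinct values, stored in arrival order versus sorted (clustered).
unclustered = [i % 10 for i in range(100)]
clustered = sorted(unclustered)

print(blocks_scanned(unclustered, 10, 3))
print(blocks_scanned(clustered, 10, 3))
```

In arrival order every block contains the wanted value's range, so all blocks are read; once the rows are sorted on the filtered column, only the block holding that value is read.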

An organization runs a Kubernetes cluster on GKE with cluster autoscaling enabled. They notice that pods are frequently in 'Pending' state due to insufficient CPU, but the cluster autoscaler does not add nodes quickly enough. What is the most likely cause?

The cluster autoscaler is using the 'least-waste' expander.

The horizontal pod autoscaler (HPA) is misconfigured.

The pod disruption budget (PDB) is too restrictive.

The node pool has reached the maximum node count limit.

Cluster autoscaler cannot exceed max node limit.

Why: Option D is correct because the cluster autoscaler cannot add new nodes if the node pool has already reached its maximum node count limit. This limit is configured at the node pool level in GKE, and once reached, the autoscaler will not scale up further, leaving pods in 'Pending' state due to insufficient CPU resources.
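
The effect of the node limit can be shown with a small sketch. It is a simplification under stated assumptions: the function name and the numbers are hypothetical, every node is assumed to offer the same allocatable CPU, and the real cluster autoscaler weighs many more factors when it plans a scale-up.

```python
import math


def nodes_after_scale_up(current_nodes: int, pending_cpu: float,
                         cpu_per_node: float, max_nodes: int) -> int:
    """Add enough nodes for the pending CPU, but never exceed max_nodes."""
    needed = math.ceil(pending_cpu / cpu_per_node)
    return min(current_nodes + needed, max_nodes)


# Hypothetical pool: 4 vCPU per node, 8 vCPU of pending pods, limit of 10 nodes.
print(nodes_after_scale_up(5, 8, 4, 10))   # room to grow
print(nodes_after_scale_up(10, 8, 4, 10))  # already at the limit
```

When the pool is already at its limit, the result equals the current node count, which is the situation in which pods stay Pending.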

A company wants to reduce costs for a Cloud Storage bucket that stores infrequently accessed archival data. The data is accessed roughly once a quarter. Which storage class should they use?

Archive storage class.

Nearline storage class.

Standard storage class.

Coldline storage class.

Coldline is designed for data accessed about once a quarter.

Why: The Coldline storage class is the correct choice because it is designed for data accessed roughly once a quarter and has a 90-day minimum storage duration, which matches the stated access pattern. Archive offers the lowest storage cost but is intended for data accessed less than once a year; its higher retrieval costs and 365-day minimum storage duration make it a poor fit for quarterly reads. Nearline targets data accessed about once a month, and Standard is meant for frequently accessed data.
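
As a rough aid for matching access frequency to a storage class, the sketch below uses the minimum storage durations of each class (30, 90, and 365 days) as thresholds. Treating those durations as selection thresholds is an assumption of this sketch, and the function name is hypothetical; real decisions should also weigh retrieval and operation costs.

```python
# Minimum storage duration in days for each Cloud Storage class.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def pick_storage_class(days_between_accesses: int) -> str:
    """Pick the coldest class whose minimum duration fits the access interval."""
    best = "STANDARD"
    for name, min_days in MIN_STORAGE_DAYS.items():
        if days_between_accesses >= min_days >= MIN_STORAGE_DAYS[best]:
            best = name
    return best


print(pick_storage_class(90))   # accessed about once a quarter
print(pick_storage_class(400))  # accessed less than once a year
```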

Domain 5: Manage implementation of cloud architecture

Your team has deployed a microservices application on Google Kubernetes Engine (GKE) with multiple services communicating via internal ClusterIP services. You notice that some requests between services are failing intermittently with 'connection refused' errors. The services are defined with readiness probes. What is the most likely cause?

The readiness probes are not passing, causing the service endpoints to be removed.

Failing readiness probes cause the pod to be removed from service endpoints, leading to connection refused.

The services are not exposed via a VPC peering connection to the client's VPC.

The services are using NodePort instead of LoadBalancer type, causing port conflicts.

The services are not associated with an Ingress resource.

Why: The 'connection refused' error indicates that the client is attempting to connect to a port on which no process is listening. In GKE, when a readiness probe fails, Kubernetes removes the pod's IP from the corresponding ClusterIP service's endpoints. If all pods for a service fail their readiness probes, the service has no healthy endpoints, and any request to the ClusterIP will be refused because there is no backend to accept the connection. This matches the intermittent nature of the issue, as pods may temporarily fail the probe and then recover.

An organization is running a stateful workload on Compute Engine with a single persistent disk. They want to migrate to a regional persistent disk for higher availability. The disk is 500 GB and currently 80% full. They need zero downtime during the migration. What is the recommended approach?

Attach a new regional disk to the instance and use RAID 1 mirroring.

Create a snapshot of the disk, then create a new regional persistent disk from that snapshot, and attach it to the instance.

This is the recommended migration path; the snapshot can be taken while the disk is attached and in use.

Use rsync to copy data to a new regional disk while the instance is running.

Use gcloud compute disks resize to change the disk type to regional.

Why: Option B is correct because a snapshot can be taken while the existing persistent disk is attached and in use, and a new regional persistent disk can then be created from that snapshot without touching the running instance. Once the regional disk is available, you can attach it to the running instance and detach the original data disk, because Compute Engine supports attaching and detaching non-boot disks without stopping the instance. Writes made after the snapshot are not part of it, so the cutover should follow soon after the snapshot is taken.

A company is planning to deploy a global web application on Google Cloud. They expect low latency for users worldwide and need to serve static content (images, CSS) as well as dynamic API responses. Which architecture should they use?

Use Cloud CDN in front of an external HTTPS Load Balancer with backend services in multiple regions.

Cloud CDN caches static content at edge, and Load Balancer routes dynamic requests to nearest backend.

Use Cloud NAT to allow egress traffic from instances and distribute static content via a shared VPC.

Use Cloud DNS with geo-routing to direct users to the closest regional Cloud Run service.

Use VPC Network Peering to connect multiple regional VPCs and serve content from a central location.

Why: Cloud CDN in front of an external HTTPS Load Balancer with backend services in multiple regions is correct because it provides global anycast IP termination, low-latency delivery of static content from Google's edge cache, and forwarding of dynamic API requests to the nearest healthy backend. This architecture meets both the low-latency requirement for users worldwide and the need to serve static and dynamic content efficiently.

You are designing a CI/CD pipeline for a containerized application on Google Cloud. The application is built with Cloud Build, stored in Container Registry, and deployed to GKE. The team wants to ensure that only images that pass vulnerability scanning are deployed. What should you do?

Add a step in Cloud Build that runs a vulnerability scanner on the image and fails the build if vulnerabilities exceed a threshold.

This integrates scanning into the pipeline, preventing vulnerable images from being pushed.

Configure Container Analysis to automatically scan images in Container Registry and block deployment via a webhook.

Enable Binary Authorization on the GKE cluster and configure a policy to require an attestation from a trusted authority.

Use Security Command Center to detect vulnerabilities and alert the team to manually block deployments.

Why: Option A is correct because Cloud Build can include a custom step that obtains vulnerability findings for the image (e.g., by reading scan occurrences with the gcloud CLI command `gcloud container images list-tags` and its `--show-occurrences-from` flag, or by running a third-party tool like Trivy) and then evaluates the results against a threshold. If the scan finds vulnerabilities exceeding the defined threshold, the build step exits with a non-zero status, causing the Cloud Build pipeline to fail and preventing the image from being promoted or deployed. This directly enforces the requirement that only images passing vulnerability scanning proceed in the CI/CD pipeline.
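
The threshold check that such a build step performs might look like the sketch below. The input format is a hypothetical list of findings, each with a `severity` field; it is not the output format of any particular scanner, and the severity labels, function name, and default threshold are assumptions.

```python
from collections import Counter

# Severity levels ordered from least to most serious (assumed labels).
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def exit_code(findings, min_severity="HIGH", max_allowed=0):
    """Return 1 when findings at or above min_severity exceed max_allowed, else 0."""
    floor = SEVERITY_ORDER.index(min_severity)
    counts = Counter(f["severity"] for f in findings)
    blocking = sum(
        count
        for severity, count in counts.items()
        if severity in SEVERITY_ORDER and SEVERITY_ORDER.index(severity) >= floor
    )
    return 1 if blocking > max_allowed else 0


# Hypothetical scan output containing one HIGH finding.
sample = [{"severity": "LOW"}, {"severity": "HIGH"}]
print(exit_code(sample))  # a build step would pass this value to sys.exit()
```

A non-zero return value is what makes the build step, and therefore the pipeline, fail.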

A company runs a data analytics platform on Google Cloud using BigQuery, Dataflow, and Cloud Storage. They notice that Dataflow jobs are failing with 'out of memory' errors for certain large pipelines. The pipelines process variable amounts of data, sometimes spiking 10x normal. Which strategy should they use to handle these spikes cost-effectively?

Manually monitor the job and increase the number of workers when a spike is detected.

Increase the machine type of the workers to a high-memory type and disable autoscaling.

Configure the Dataflow pipeline to use autoscaling with a higher maximum number of workers and use preemptible VMs for cost savings.

Autoscaling adjusts workers dynamically; preemptible VMs reduce cost for fault-tolerant work.

Use Dataflow Streaming Engine to offload state to persistent storage and reduce memory usage.

Why: Option C is correct because Dataflow's autoscaling can dynamically add workers to handle sudden data spikes, and using preemptible VMs significantly reduces cost for batch pipelines that can tolerate interruptions. This approach avoids manual intervention and over-provisioning, making it cost-effective for variable workloads.

A startup wants to deploy a web application on Google Cloud with a MySQL database. They anticipate low traffic initially but want the ability to scale seamlessly. They also want to minimize operational overhead. Which combination of services should they choose?

Compute Engine with a self-managed MySQL instance.

Cloud Run with Cloud Spanner.

App Engine Standard Environment with Cloud SQL.

App Engine Standard auto-scales and is serverless; Cloud SQL is managed.

Google Kubernetes Engine (GKE) with Cloud SQL.

Why: App Engine Standard Environment provides a fully managed, autoscaling platform for web applications, while Cloud SQL offers a managed MySQL database with automated backups and optional replication. This combination minimizes operational overhead because Google handles infrastructure provisioning, patching, and scaling, and App Engine connects to Cloud SQL through a Unix socket or the Cloud SQL Auth Proxy, so connectivity needs only minimal configuration.

Domain 6: Ensure solution and operations reliability

A company runs a critical application on Compute Engine instances in a managed instance group (MIG) with autoscaling. During a traffic spike, some instances become unhealthy but are not automatically replaced. What is the most likely cause?

The MIG is regional and one zone failed.

The autohealing health check is misconfigured.

MIG autohealing relies on a health check to detect unhealthy instances and replace them; a misconfiguration prevents detection.

The instance template has a startup script error.

The HTTP load balancer's health check is failing.

Why: The most likely cause is that the autohealing health check is misconfigured. In a managed instance group, autohealing relies on a health check to detect unhealthy instances and trigger replacement. If the health check is misconfigured (e.g., wrong port, path, or protocol), the MIG will not recognize instances as unhealthy and will not automatically replace them, even during a traffic spike.
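To make the failure mode concrete, here is a minimal sketch of how a health check with an unhealthy threshold decides that an instance is down. It is a simplified model, not the Compute Engine implementation, and the probe sequences are invented: one check targets the application, the other is pointed at an endpoint that always answers.

```python
def is_unhealthy(probe_results, unhealthy_threshold):
    """Return True once `unhealthy_threshold` consecutive probes have failed."""
    consecutive_failures = 0
    for ok in probe_results:
        # A successful probe resets the failure streak.
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= unhealthy_threshold:
            return True
    return False


# Hypothetical probe outcomes for the same struggling instance.
probes_against_app = [True, False, False, False]
probes_against_wrong_endpoint = [True, True, True, True]

print(is_unhealthy(probes_against_app, 3))
print(is_unhealthy(probes_against_wrong_endpoint, 3))
```

The second check never reports a failure, so under this model the group has no signal that would trigger a replacement.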

A company is designing a disaster recovery plan for a Cloud SQL for PostgreSQL instance. They want to failover to a different region with minimal data loss and recovery time under 10 minutes. The database is 500 GB and experiences 2,000 write transactions per second. Which solution should they use?

Export the database daily using gsutil and import in the other region using pg_restore.

Create a cross-region read replica and promote it to primary during failover.

Configure a cross-region replica instance using Cloud SQL's cross-region replication feature.

Cross-region replication provides a continuously updated standby in another region using asynchronous replication, so data loss is limited to the replication lag and failover takes minutes.

Automated backups with point-in-time recovery to a new instance in the other region.

Why: Cloud SQL for PostgreSQL offers managed cross-region replication, which creates a replica instance in a different region and keeps it nearly in sync with the primary through asynchronous replication. This meets the RPO (minimal data loss, bounded by the replication lag) and RTO (under 10 minutes) requirements because the replica is continuously updated and can be promoted to primary in minutes, without restoring a 500 GB database from a backup or export.
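With asynchronous replication, the data at risk during a regional failure is roughly the write rate multiplied by the replication lag. The sketch below applies that arithmetic to the 2,000 writes per second from the question; the lag values are hypothetical and would need to be read from the replica's monitoring metrics in practice.

```python
def writes_at_risk(writes_per_second, replication_lag_seconds):
    """Estimate how many committed writes may be missing from the replica."""
    return writes_per_second * replication_lag_seconds


# Hypothetical lag values, in seconds.
for lag in (1, 5, 30):
    print(lag, writes_at_risk(2000, lag))
```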

A company uses Cloud Spanner for a global financial application. They experience increased latency and transaction aborts during peak hours. Which measure should they take first to improve reliability?

Increase the number of nodes in the Spanner instance.

Reduce the number of indexes on frequently updated columns.

Optimize transactions to reduce lock contention.

Short, single-partition transactions reduce the chance of conflicts and aborts.

Use interleaved tables to co-locate related data.

Why: Option C is correct because transaction aborts and latency in Cloud Spanner are most commonly caused by lock contention during peak hours. By optimizing transactions—such as reducing their scope, using read-only transactions where possible, and avoiding hot-spot writes—you directly address the root cause of contention without incurring additional cost or schema changes. This aligns with Google's best practices for Spanner reliability.

A company deploys a microservices application on Google Kubernetes Engine (GKE). Pods in one deployment are frequently OOMKilled. The team sets memory requests and limits, but pods still crash. What is the most likely remaining cause?

CPU requests are too low, causing throttling and eventual crash.

The node pool is too small, causing memory pressure on the node.

Memory limits are set higher than the node's allocatable memory.

The application has a memory leak that eventually exceeds the limit.

A memory leak causes continuous memory growth until the limit is hit, resulting in OOMKill.

Why: Option D is correct because OOMKilled errors occur when a container exceeds its memory limit. Setting memory requests and limits prevents unbounded usage, but if the application has a memory leak, it will continue to consume memory until it hits the configured limit, causing the kernel's Out-Of-Memory (OOM) killer to terminate the pod. The fact that pods still crash after setting limits indicates the application itself is the root cause, not resource configuration.
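A short calculation shows why raising the limit does not cure a leak. The sketch assumes a constant leak rate, which real applications rarely have, and every number is made up for illustration.

```python
def seconds_until_limit(start_mib, leak_mib_per_second, limit_mib):
    """Return how long a steadily leaking container takes to reach its memory limit."""
    if leak_mib_per_second <= 0:
        return None  # No growth, so the limit is never reached under this model.
    remaining = max(limit_mib - start_mib, 0)
    return remaining / leak_mib_per_second


# Hypothetical container: starts at 200 MiB and leaks 2 MiB per second.
print(seconds_until_limit(200, 2, 512))
print(seconds_until_limit(200, 2, 1024))
```

Doubling the limit only postpones the OOMKill; the container still reaches it.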

An organization uses Cloud Functions (2nd gen) for event-driven processing. They notice that some functions fail with 'memory limit exceeded' errors during peak load. The function processes messages from Pub/Sub and writes to Firestore. What should they do to improve reliability without sacrificing throughput?

Increase the maximum number of concurrent function instances.

Increase the memory allocated to the Cloud Function.

More memory allows the function to handle larger data per invocation without hitting the limit.

Enable Pub/Sub batching to reduce the number of function invocations.

Split the function into multiple smaller functions, each handling a subset of the data.

Why: The 'memory limit exceeded' error indicates that the function's allocated memory is insufficient for the workload during peak load. Increasing the memory allocation (Option B) directly resolves this by providing more RAM for processing larger messages or concurrent operations, without altering the invocation pattern or throughput. Cloud Functions (2nd gen) allow memory to be set up to 32 GiB, and this change does not reduce the number of events processed per second.

A company deploys a stateful workload using StatefulSets on GKE. They want to ensure that if a pod is evicted, its persistent volume claim (PVC) is reattached to the replacement pod in the same zone. Which configuration achieves this?

Use a StatefulSet with a volumeClaimTemplate referencing a persistent disk in the same zone.

StatefulSet ensures stable pod identity and PVC reattachment; zone affinity ensures the disk is in the same zone.

Use a Deployment with a PVC that has allowedTopologies restricting to the desired zone.

Use a Deployment with a persistent volume that is manually attached after pod creation.

Use a StatefulSet with a persistent disk that has access mode ReadOnlyMany.

Why: StatefulSets are designed for stateful workloads and guarantee stable network identities and persistent storage. When a pod is evicted, the StatefulSet controller creates a replacement with the same name that reuses the same PVC. That PVC is bound to a zonal Compute Engine Persistent Disk, which can only attach to nodes in its own zone, so the replacement pod is scheduled in the same zone as the original. This maintains data locality and avoids cross-zone reattachment.

Frequently asked questions

How many questions are on the PCA exam?

The PCA exam has 60 questions and must be completed in 120 minutes. The passing score is 720/1000.

What types of questions appear on the PCA exam?

Architecture design scenario questions covering GCP services, reliability, security, cost optimisation, and migration strategies.

How are PCA questions organised by domain?

The exam covers 6 domains: Design and plan a cloud solution architecture, Manage and provision cloud infrastructure, Design for security and compliance, Analyze and optimize technical and business processes, Manage implementation of cloud architecture, Ensure solution and operations reliability. Questions are weighted by domain — higher-weight domains appear more on your actual exam.

Are these the actual PCA exam questions?

No. These are original exam-style practice questions written against the official Google Cloud PCA exam objectives. They are not copied from the real exam. Courseiva focuses on genuine understanding, not memorisation of braindumps.

Google Cloud · Free Practice Questions · Last reviewed May 2026

36 real exam-style questions organised by domain, each with the correct answer highlighted and a plain-English explanation of why it's right and why the others are wrong.

60 exam questions

120 min time limit

Pass: 720/1000

6 exam domains

Domain 2: Manage and provision cloud infrastructure

A company is deploying a new application on Compute Engine. They need to ensure that the application can automatically recover from a zone failure. What is the best approach?

Create a managed instance group with instances in multiple zones.

MIG auto-heals and distributes across zones.

Use a global load balancer in front of a single instance.

Create a single VM in a single zone and rely on live migration.

Use Cloud Storage to store application state and restore from a snapshot.

A company has a production database running on Cloud SQL. They need to ensure high availability with automatic failover in the event of a zone outage. What should they do?

Export the database to Cloud Storage and import in another region.

Enable Cloud SQL High Availability (HA) configuration.

HA provides automatic failover to standby in another zone.

Create a cross-region read replica.

Configure automated backups.

A developer wants to store and retrieve non-relational data with flexible schema and automatic scaling. Which Google Cloud service should they use?

Cloud Bigtable.

Cloud SQL.

Firestore.

Firestore is NoSQL with flexible schema and auto-scaling.

Cloud Spanner.

Domain 3: Design for security and compliance

A company wants to ensure that only Compute Engine instances with a specific service account can access a Cloud Storage bucket. Which IAM condition should they use?

Condition: 'request.auth == "serviceAccount:sa@project.iam.gserviceaccount.com"'

Condition: 'origin.serviceAccount == "sa@project.iam.gserviceaccount.com"'

Condition: 'resource.serviceAccount == "sa@project.iam.gserviceaccount.com"'

Condition: 'iam.serviceAccount == "sa@project.iam.gserviceaccount.com"'

The condition 'iam.serviceAccount' matches the service account used by the caller.

Domain 4: Analyze and optimize technical and business processes

A company is migrating its on-premises Oracle database to Cloud SQL for PostgreSQL. The database team wants to minimize downtime during migration. Which approach should they use?

Set up Oracle GoldenGate to replicate to Cloud SQL.

Use Database Migration Service with continuous migration from Oracle to Cloud SQL for PostgreSQL (a heterogeneous migration).

DMS supports minimal downtime via continuous replication.

Take a physical backup of Oracle and restore to Cloud SQL.

Export the database as a dump file, upload to Cloud Storage, and import into Cloud SQL.

A company uses BigQuery for analytics. They have a large partitioned table that is queried frequently. The query performance has degraded over time. Which optimization should they try first?

Create a materialized view for each frequent query.

Increase the number of slots for the project.

Apply clustering on frequently filtered columns.

Clustering sorts data, reducing scanned data for filters.

Denormalize the table to reduce joins.
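The effect of clustering on scanned data can be sketched with a toy model: rows are stored in fixed-size blocks, and a filter only needs to read blocks whose value range could contain the target. This is an illustration of the idea, not BigQuery's storage format, and the customer IDs are invented.

```python
def blocks_scanned(values, block_size, target):
    """Count the blocks whose min/max range could contain `target`."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return sum(1 for block in blocks if min(block) <= target <= max(block))


# Hypothetical column of customer IDs in arrival order.
customer_ids = [5, 1, 9, 2, 8, 4, 7, 3, 6, 1, 9, 5]

unclustered = blocks_scanned(customer_ids, 3, target=3)
clustered = blocks_scanned(sorted(customer_ids), 3, target=3)
print(unclustered, clustered)
```

Sorting the column narrows each block's range, so far fewer blocks qualify for the same filter.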

A company wants to reduce costs for a Cloud Storage bucket that stores infrequently accessed archival data. The data is accessed roughly once a quarter. Which storage class should they use?

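Archive storage class.

Nearline storage class.

Standard storage class.

Coldline storage class.

Coldline is intended for data accessed about once a quarter and has a 90-day minimum storage duration; Archive targets data accessed less than once a year.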
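One way to reason about storage classes is to compare the expected interval between accesses with each class's minimum storage duration (30 days for Nearline, 90 for Coldline, 365 for Archive). The helper below is a hypothetical rule of thumb along those lines; it ignores retrieval fees and data volume, which a real cost comparison should include.

```python
# Minimum storage durations in days, listed from warmest to coldest class.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def suggest_storage_class(days_between_accesses):
    """Pick the coldest class whose minimum duration fits the access interval."""
    suggestion = "STANDARD"
    for name, min_days in MIN_STORAGE_DAYS.items():
        if days_between_accesses >= min_days:
            suggestion = name
    return suggestion


# Roughly one access per quarter.
print(suggest_storage_class(91))
```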
