Cloud Digital Leader Scaling with Google Cloud operations — All Questions With Answers

Question 1 (easy, multiple choice)

A company's web service has a Service Level Objective (SLO) of 99.9% monthly availability. In a 30-day month, how many minutes of downtime are allowed before the SLO is violated?
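
The arithmetic behind this kind of question can be sketched as follows. This is a minimal illustration, assuming the SLO is measured over wall-clock minutes in the window; the function name `allowed_downtime_minutes` is hypothetical and not part of any Google Cloud API.

```python
def allowed_downtime_minutes(slo_percent: float, days: int = 30) -> float:
    """Minutes of downtime the window can absorb before the SLO is missed."""
    total_minutes = days * 24 * 60          # 30 days -> 43,200 minutes
    unavailable_fraction = 1 - slo_percent / 100
    return total_minutes * unavailable_fraction


# 99.9% over 30 days leaves 0.1% of 43,200 minutes.
print(round(allowed_downtime_minutes(99.9), 1))   # 43.2
print(round(allowed_downtime_minutes(99.95), 1))  # 21.6
```

The same helper applies to any availability target or window length mentioned in later questions.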

Question 2 (medium, multiple choice)

An SRE team wants to alert when their service is consuming error budget faster than expected, rather than alerting only when the SLO threshold is crossed. Which Cloud Monitoring alerting strategy supports this approach?
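
As background for "consuming error budget faster than expected", here is a small sketch of how a burn rate is commonly computed in SRE practice: the observed error ratio divided by the error ratio the SLO allows. The function names and the sample numbers are assumptions for illustration, not Cloud Monitoring API calls.

```python
def burn_rate(observed_error_ratio: float, slo: float) -> float:
    """How many times faster than 'exactly on budget' the service is failing."""
    error_budget = 1 - slo                  # e.g. 0.001 for a 99.9% SLO
    return observed_error_ratio / error_budget


def hours_until_exhausted(rate: float, window_days: int = 30) -> float:
    """Time for a full budget to run out if the burn rate stays constant."""
    return window_days * 24 / rate


rate = burn_rate(observed_error_ratio=0.01, slo=0.999)
print(round(rate, 2))                         # 10.0
print(round(hours_until_exhausted(rate), 1))  # 72.0
```

A burn rate of 1 would spend the budget exactly over the window; anything well above 1 is a candidate for alerting before the SLO itself is breached.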

Question 3 (easy, multiple choice)

A company's on-premises IT team spends 70% of their time on routine maintenance tasks: patching servers, replacing failed hardware, and upgrading storage. After migrating to Google Cloud managed services, which operational outcome should they expect?

Question 4 (medium, multiple choice)

A company has deployed a critical application on Google Cloud and wants to understand what happens to their workloads during a Google Cloud data center maintenance event (e.g., host system upgrades). What Google Compute Engine feature handles this automatically for most VMs?

Question 5 (medium, multiple choice)

A company's application experiences traffic spikes every weekday morning when employees log in at 9 AM. The team wants their infrastructure to automatically handle these spikes without manual intervention and without over-provisioning resources all day. Which Google Cloud capability addresses this?

Question 6 (hard, multiple choice)

A digital media company hosts video content globally. They want to reduce origin server load and deliver content faster to viewers worldwide. Their current architecture routes all viewer requests directly to the origin servers in `us-central1`, causing high latency for viewers in Asia and Europe. Which Google Cloud networking capability addresses this?

Question 7 (easy, multiple choice)

Which Google Cloud service provides a centralized view of an application's performance metrics, logs, and traces — enabling teams to monitor system health, set up alerts, and diagnose issues from a single platform?

Question 8 (medium, multiple choice)

A company runs a mission-critical application that must be available 24/7. They want to ensure that if a Google Cloud region becomes unavailable (e.g., due to a natural disaster), the application automatically continues to serve users from another region. Which architecture pattern achieves this?

Question 9 (easy, multiple choice)

A company currently spends $200,000 annually on data center costs (hardware, power, cooling, staff). After migrating to Google Cloud, their cloud bill is $120,000 annually, but they also save $50,000 in data center costs they no longer pay. What is their net annual savings from the migration?

Question 10 (hard, multiple choice)

A company's application is composed of 15 microservices. When a performance issue occurs, the team struggles to determine which service is causing latency since request traces span multiple services. Which Google Cloud service helps identify which specific service in a microservices chain is causing slowdowns?

Question 11 (easy, multiple choice)

What is the difference between a Service Level Indicator (SLI), a Service Level Objective (SLO), and a Service Level Agreement (SLA)?

Question 12 (medium, multiple choice)

After a major production outage, the engineering team conducts a review of what happened, why it happened, and how to prevent it in the future. This document is shared with all engineering teams. What is this practice called, and why does Google's SRE culture emphasize it?

Question 13 (medium, multiple choice)

A company running critical applications on Google Cloud wants access to technical support with a response time under 1 hour for critical issues and a dedicated Technical Account Manager (TAM). Which Google Cloud support tier should they purchase?

Question 14 (easy, multiple choice)

A company wants to optimize their Google Cloud spending. They have baseline compute workloads that run continuously 24/7 for at least one year. Which pricing option provides the greatest savings for these stable, long-running workloads?

Question 15 (medium, multiple choice)

A company has multiple teams deploying to Google Cloud and wants to allocate cloud costs by team. Each team should see only their own costs and be accountable for their spending. Which Google Cloud feature enables this cost allocation and visibility?

Question 16 (hard, multiple choice)

Google Cloud's infrastructure is designed to be highly available across multiple failure domains. What are zones in Google Cloud (called 'availability zones' on some other platforms), and how do they differ from regions?

Question 17 (medium, multiple choice)

A company uses Google Cloud and wants to understand their monthly cloud spend before the invoice arrives, track spending trends, and identify the top cost drivers across all services. Which built-in Google Cloud tool provides this visibility?

Question 18 (easy, multiple choice)

A company exports all their Google Cloud logs to Cloud Storage for long-term retention required by their compliance policy (7-year log retention). Which Cloud Logging feature enables routing logs to Cloud Storage?

Question 19 (hard, multiple choice)

A company's application traffic is served by a Google Cloud global HTTP load balancer. They want to understand how request traffic distributes across backend instances in different regions. Which metric best represents this distribution?

Question 20 (medium, multiple choice)

A company wants to proactively identify underutilized Compute Engine VMs (high provisioned capacity but low actual usage) to reduce costs. Which Google Cloud tool provides recommendations for right-sizing VMs?

Question 21 (medium, multiple choice)

An SRE team has a monthly error budget of about 43 minutes (99.9% SLO). In the first week of the month, a deployment causes a 50-minute outage. What should the SRE team do for the remainder of the month, and why?
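
The budget accounting in this scenario can be sketched as below. The figures come from the question; the `budget_report` helper and its field names are hypothetical, and the sketch only reports the state of the budget without prescribing a response.

```python
def budget_report(budget_minutes: float, downtime_minutes: float) -> dict:
    """Summarize how much of an error budget an outage has consumed."""
    remaining = budget_minutes - downtime_minutes
    return {
        "consumed_pct": round(100 * downtime_minutes / budget_minutes, 1),
        "remaining_minutes": round(remaining, 1),
        "exhausted": remaining <= 0,
    }


report = budget_report(budget_minutes=43, downtime_minutes=50)
print(report)
# {'consumed_pct': 116.3, 'remaining_minutes': -7, 'exhausted': True}
```

A negative remainder means the month's budget is already overspent, which is the situation the question asks the team to react to.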

Question 22 (hard, multiple choice)

A reliability engineering team wants to proactively identify weaknesses in their distributed system by deliberately injecting failures — killing random instances, introducing network latency, and cutting off database connections — to observe how the system responds. What is this practice called?

Question 23 (easy, multiple choice)

Google Cloud's operations suite includes Cloud Monitoring for metrics. What is the difference between 'monitoring' and 'observability' in cloud operations?

Question 24 (medium, multiple choice)

A company wants to set up automated checks that continuously verify their website's homepage, login page, and API endpoints are accessible from multiple global locations. If any endpoint becomes unreachable for more than 2 minutes, the on-call engineer should be alerted. Which Cloud Monitoring feature provides this?

Question 25 (medium, multiple choice)

A company's cloud costs have increased by 40% over the past quarter. The operations team wants to identify and address the root causes. Which cost optimization strategies should they investigate first?

Question 26 (easy, multiple choice)

Google Cloud runs its own infrastructure operations using the Site Reliability Engineering (SRE) model, which Google invented. What is the core principle that distinguishes SRE from traditional IT operations?

Question 27 (hard, multiple choice)

A company uses Google Cloud across 5 teams, 20 projects, and 3 regions. They want to enforce a standard that all resources include specific labels (e.g., `team`, `environment`, `cost-center`) for cost attribution and governance. What is the most scalable way to enforce this labeling standard?

Question 28 (medium, multiple choice)

A company's application experiences a P1 (critical) production incident at 2 AM on a Sunday. The on-call engineer resolves the issue after 3 hours but isn't sure which team members to contact or what steps to follow during an incident. What operational practice and tooling would have helped manage this incident better?

Question 29 (easy, multiple choice)

A company wants to optimize Cloud Storage costs for a bucket containing 100 TB of access logs. The logs from the last 7 days are frequently analyzed; logs from 8–90 days are occasionally reviewed; logs older than 90 days are archived for compliance but rarely accessed. What is the most cost-effective storage class configuration?
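
Tiering by object age is usually expressed as a bucket lifecycle configuration. The sketch below builds one possible configuration as a Python dictionary in the JSON shape used by Cloud Storage lifecycle files. The choice of NEARLINE and ARCHIVE as target classes, the exact day thresholds, and the helper name `lifecycle_rules` are assumptions for illustration; other class combinations (for example COLDLINE) may fit a given access pattern better.

```python
import json


def lifecycle_rules(nearline_after_days: int, archive_after_days: int) -> dict:
    """Build an age-based lifecycle configuration (objects start in STANDARD)."""
    return {
        "rule": [
            {
                # Occasionally reviewed logs move to a cheaper class.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days},
            },
            {
                # Rarely accessed compliance logs move to the coldest class.
                "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
                "condition": {"age": archive_after_days},
            },
        ]
    }


config = lifecycle_rules(nearline_after_days=7, archive_after_days=90)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and applied to a bucket with the usual tooling; the sketch itself makes no API calls.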

Question 30 (medium, multiple choice)

A company's engineering organization wants to share operational knowledge across teams using a 'golden path' — a recommended, pre-configured set of tools, services, and templates that makes the easy path also the correct path. Which Google Cloud concept supports this practice?

Question 31 (easy, multiple choice)

A company's cloud team is asked to demonstrate that their infrastructure changes are repeatable and auditable. They use Terraform configuration files committed to a Git repository to define all cloud resources. Which operational practice does this exemplify?

Question 32 (medium, multiple choice)

A product team is discussing how to handle a planned 48-hour maintenance window for a critical customer-facing service. The SRE team argues the maintenance window is unnecessary with proper cloud architecture. Which cloud capability eliminates the need for planned downtime maintenance windows?

Question 33 (hard, multiple choice)

An operations team tracks the following metrics for their customer portal: request latency p99, error rate, and requests per second. In Site Reliability Engineering terminology, what are these metrics called, and what do they collectively define?

Question 34 (medium, multiple choice)

A company's cloud costs have grown faster than its business. The FinOps team is implementing cloud cost governance. Which practice most effectively ensures that individual teams are accountable for their cloud spending?

Question 35 (easy, multiple choice)

An operations team wants to receive an automated alert when their web application's HTTP error rate exceeds 5% for more than 5 minutes. Which Google Cloud product is used to configure this type of metric-based alert?

Question 36 (hard, multiple choice)

A company is evaluating whether to adopt a multi-cloud strategy (using two or more cloud providers for different workloads). An engineer lists the following arguments: (1) resilience against a single cloud provider outage, (2) negotiating leverage on pricing, (3) using best-of-breed services from each provider. A cloud architect cautions that multi-cloud also introduces significant challenges. What is the most significant operational challenge of a multi-cloud approach?

Question 37 (medium, multiple choice)

A cloud operations team wants to ensure that all cloud resources created in their Google Cloud organization comply with company naming standards and required cost allocation labels. Which Google Cloud capability can automatically enforce these standards on resource creation?

Question 38 (easy, multiple choice)

A company's cloud environment has grown rapidly and the team is struggling to understand what cloud resources exist across dozens of projects. Which Google Cloud product provides a unified inventory of all cloud assets across an organization's projects and folders?

Question 39 (medium, multiple choice)

A DevOps team wants to implement a release process where a new application version is first deployed to 5% of production traffic, monitored for errors, then gradually increased to 100% if metrics remain healthy. Which deployment strategy does this describe?

Question 40 (hard, multiple choice)

An SRE team analyzes that their service had 47 minutes of downtime in the past 30 days. Their SLO is 99.9% monthly availability. How should the team characterize their performance relative to the SLO?

Question 41 (easy, multiple choice)

A company's cloud team is asked to reduce the cost of a batch data processing workload that runs for 4–6 hours each night and can tolerate interruptions. The workload currently uses standard on-demand Compute Engine VMs. Which pricing option should the team evaluate first?

Question 42 (medium, multiple choice)

A cloud team performs a quarterly review of its Compute Engine instances and discovers 15 VMs that have had zero CPU utilization for over 90 days. What is the recommended operational response to these idle resources?

Question 43 (hard, multiple choice)

A company's SRE team is debating whether to automate a frequently performed manual operational task. The automation would take 4 weeks of engineering time to build. The manual task takes 30 minutes per occurrence and happens approximately 20 times per month. Using the SRE concept of 'toil,' how should the team approach this decision?
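
A rough break-even calculation for this trade-off might look like the sketch below. It assumes one engineer working 40-hour weeks on the automation and ignores maintenance of the automation itself; both are assumptions not stated in the question, and `break_even_months` is a hypothetical helper.

```python
def break_even_months(build_hours: float, minutes_per_run: float,
                      runs_per_month: float) -> float:
    """Months of avoided manual work needed to repay the automation effort."""
    monthly_toil_hours = minutes_per_run * runs_per_month / 60
    return build_hours / monthly_toil_hours


build_hours = 4 * 40                       # assumption: 4 weeks x 40 hours
months = break_even_months(build_hours, minutes_per_run=30, runs_per_month=20)
print(months)                              # 16.0
```

Hours alone rarely settle the decision: toil also tends to grow with service size and carries error risk, which a pure time comparison does not capture.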

Question 44 (medium, multiple choice)

A company runs a customer-facing web application with a published SLA of 99.95% monthly availability. In the past month, the application experienced two outages: a 12-minute outage and a 7-minute outage. Did the company meet its SLA?

Question 45 (easy, multiple choice)

A cloud architect is reviewing logs from a production incident. She wants to search all log entries across multiple Google Cloud projects for error messages containing a specific string. Which Google Cloud product enables centralized log searching and analysis across an entire organization?

Question 46 (medium, multiple choice)

A DevOps team wants to adopt GitOps practices for managing their Google Cloud infrastructure. Which combination of tools and practices defines a GitOps approach to cloud infrastructure management?

Question 47 (hard, multiple choice)

An operations team has been asked to estimate the annual cost impact of a proposed new cloud architecture. The architecture would replace 50 on-demand n2-standard-4 VMs (running 24/7) with an autoscaling group that averages 10 VMs under normal load but scales to 50 during peak hours (approximately 8 hours per day). Which analytical approach best estimates the cost impact?
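
One way to frame the comparison is in VM-hours per day, which avoids needing a price. The sketch below assumes the group jumps instantly between 10 and 50 VMs, that the machine type is unchanged, and that no discounts apply; the helper names are hypothetical.

```python
HOURS_PER_DAY = 24


def fixed_vm_hours(vm_count: int) -> int:
    """Daily VM-hours for a fleet that runs around the clock."""
    return vm_count * HOURS_PER_DAY


def autoscaled_vm_hours(base_vms: int, peak_vms: int, peak_hours: int) -> int:
    """Daily VM-hours when the group runs at peak size for part of the day."""
    return peak_vms * peak_hours + base_vms * (HOURS_PER_DAY - peak_hours)


fixed = fixed_vm_hours(50)                    # 1200
scaled = autoscaled_vm_hours(10, 50, 8)       # 400 + 160 = 560
reduction_pct = round(100 * (1 - scaled / fixed), 1)
print(fixed, scaled, reduction_pct)           # 1200 560 53.3
```

Multiplying each figure by the hourly price of the machine type, then by the days in a year, turns the usage estimate into an annual cost estimate.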

Question 48 (medium, multiple choice)

A cloud team receives an alert that a critical production service's error rate has spiked. Following incident response best practices, what is the correct first priority action?

Question 49 (easy, multiple choice)

A company wants to reduce its Google Cloud costs without reducing its workload capacity. The team identifies that several production VMs consistently use less than 30% of their allocated CPU and memory. What is the most straightforward cost optimization action?

Question 50 (hard, multiple choice)

A platform engineering team is designing a self-service cloud environment for development teams. They want developers to be able to provision approved cloud resources quickly without waiting for central IT approval for every request, while still ensuring compliance with security and cost policies. Which architectural approach best balances developer agility with governance?

Question 51 (easy, multiple choice)

A cloud team wants to understand their current Google Cloud resource inventory — specifically, which VMs are running in each region, their machine types, and whether they have public IP addresses. Which approach most efficiently provides this across all projects?

Question 52 (medium, multiple choice)

An operations team is performing a post-incident review after a production outage. The team lead insists that the review must follow a 'blameless postmortem' approach. What does this mean, and why is it important for organizational learning?

Question 53 (hard, multiple choice)

A company's SRE team sets an SLO of 99.5% monthly availability for a non-critical internal tool. A business stakeholder argues the target should be 99.99%. The SRE team pushes back. Which SRE argument best supports keeping the 99.5% target?

Question 54 (medium, multiple choice)

A cloud team wants to automatically enforce that all new Compute Engine VMs are created with a specific label (environment: production) and that no VMs are created with external IP addresses in the production project. Which Google Cloud capability enforces these organizational policies at resource creation time?

Question 55 (easy, multiple choice)

A company has a Google Cloud environment with 50 projects and 200 engineers. The security team wants to ensure that a new security policy — requiring all Cloud Storage buckets to have uniform bucket-level access enabled — applies to all existing and future buckets across all projects. Which approach scales to the entire organization?

Question 56 (medium, multiple choice)

A company uses committed use discounts (CUDs) for its production workload baseline. An engineer proposes also using sustained use discounts (SUDs) for the same VMs. Why is this incorrect?

Question 57 (hard, multiple choice)

An SRE team is practicing 'chaos engineering' by simulating a zone-level failure in their staging environment. They find that their application does not automatically recover — traffic is not redirected and the service remains down. What architectural component is most likely missing?

Question 58 (medium, multiple choice)

A company's cloud operations team is implementing a tagging strategy for cost allocation. They want to ensure that the 'cost-center' label is present on every Compute Engine VM and Cloud Storage bucket created in their Google Cloud organization. Currently, some resources are created without this label. Which combination of controls best enforces and remediates this requirement?

Question 59 (easy, multiple choice)

A company's production database is running on a Compute Engine VM with a 500 GB Persistent Disk. The operations team wants to create a backup they can restore from in case of data corruption or accidental deletion. Which Google Cloud capability provides point-in-time backup for Persistent Disks?

Question 60 (hard, multiple choice)

A company's cloud cost has grown significantly. A FinOps analysis reveals the largest waste category is idle Cloud SQL instances — 12 database instances that were provisioned for projects that have since ended, but were never deleted. What process failure most directly caused this waste?

Question 61 (medium, drag order)

Drag and drop the steps to set up a Cloud SQL for MySQL instance with a private IP address into the correct order.

Question 62 (medium, drag order)

Drag and drop the steps to recover a Compute Engine VM from a snapshot in the correct order.

Question 63 (medium, matching)

Match each Google Cloud security concept to its description.

Matches

Identity and Access Management – fine-grained access control

Key Management Service for encryption keys

DDoS protection and web application firewall

Perimeter security to prevent data exfiltration

Centralized vulnerability and threat monitoring

Question 64 (medium, matching)

Match each Google Cloud serverless compute option to its characteristic.

Matches

Event-driven, short-lived functions

Container-based, scales to zero

Platform as a Service (PaaS) with automatic scaling

Orchestration of services and APIs

Event routing and management service

Question 65 (medium, multiple choice)

A company runs a web application on Compute Engine instances behind a managed instance group with autoscaling based on CPU utilization. After a marketing campaign, traffic spikes and the autoscaler adds instances quickly, but the application becomes slow. What is the most likely cause?

Question 66 (easy, multiple choice)

A developer needs to debug a production issue by analyzing logs from multiple microservices. Which Google Cloud service should they use to filter and search logs in real time?

Question 67 (hard, multiple choice)

A company wants to implement SLOs for their API service. They need to measure the proportion of successful requests over a 30-day window. Which metric should they use?
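
A request-based availability SLI is typically the ratio of good requests to total requests over the window. The sketch below illustrates that ratio with made-up request counts; `availability_sli` is a hypothetical helper, and the 99.9% target is only an example threshold.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Proportion of requests in the window that were served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


# Hypothetical 30-day totals: 1,000,000 requests, 750 of them failed.
sli = availability_sli(good_requests=999_250, total_requests=1_000_000)
meets_slo = sli >= 0.999
print(f"{sli:.5f}", meets_slo)   # 0.99925 True
```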

Question 68 (medium, multiple choice)

A company is migrating to Google Cloud and wants to reduce operational overhead for managing their infrastructure. Which Google Cloud service allows them to define infrastructure as code and automate provisioning?

Question 69 (hard, multiple choice)

An organization has multiple projects and wants to aggregate logs from all projects into a single bucket for long-term retention and compliance. What should they do?

Question 70 (easy, multiple choice)

A developer is troubleshooting a slow response from a Cloud Run service. Which Google Cloud service can they use to trace requests across microservices?

Question 71 (easy, multiple choice)

A company uses Cloud Functions and notices that some functions are taking longer than expected. They want to identify which functions have the highest latency. What should they use?

Question 72 (hard, multiple choice)

A team wants to set up alerts for when the error budget of their service is exhausted. The service has an SLO of 99.9% availability over a 30-day rolling window. Which condition should they use in Cloud Monitoring alerting?

Question 73 (easy, multiple choice)

A company wants to automatically scale their Compute Engine managed instance group based on the number of requests per second. Which metric should they use?

Question 74 (medium, multi select)

A site reliability engineer is implementing SRE practices in Google Cloud. Which TWO of the following are key principles of SRE? (Choose TWO.)

Question 75 (hard, multi select)

A company uses Cloud Monitoring to collect metrics from their applications running on Google Kubernetes Engine (GKE). They want to create custom dashboards and set up alerting policies. Which THREE capabilities are available in Cloud Monitoring? (Choose THREE.)

Question 76 (easy, multi select)

Which THREE of the following are best practices for managing operations in Google Cloud? (Choose THREE.)

Question 77 (medium, multiple choice)

Refer to the exhibit. The autoscaler is configured to maintain a target CPU utilization of 0.6. Currently the group has 10 instances, but the autoscaler is not scaling up even though CPU utilization is above 0.8. What is the most likely reason?

Exhibit

gcloud compute instance-groups managed list --zone us-central1-a
NAME          LOCATION       SCOPE  BASE_INSTANCE_NAME  SIZE  TARGET_SIZE  INSTANCE_TEMPLATE  AUTOSCALED
my-mig        us-central1-a  zone   my-instance         10    20           my-template         yes
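
To reason about what an autoscaler would like to do in a situation like this, a simplified proportional model can help: scale the group so that average utilization returns to the target, then clamp to the configured limits. This is a teaching sketch, not Compute Engine's actual algorithm, and the `min_size`/`max_size` values below are hypothetical because the exhibit does not show the autoscaler's limits.

```python
import math


def recommended_size(current_size: int, observed_utilization: float,
                     target_utilization: float,
                     min_size: int, max_size: int) -> int:
    """Proportional estimate of group size, clamped to the configured limits."""
    raw = math.ceil(current_size * observed_utilization / target_utilization)
    return max(min_size, min(max_size, raw))


# 10 instances at 0.8 utilization against a 0.6 target would like about 14.
print(recommended_size(10, 0.8, 0.6, min_size=1, max_size=20))  # 14
# With a hypothetical limit of 10, the same demand cannot add instances.
print(recommended_size(10, 0.8, 0.6, min_size=1, max_size=10))  # 10
```

Comparing the unclamped estimate with the configured limits is a quick way to tell whether a group is constrained by its settings rather than by the metric.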

Question 78 (hard, multiple choice)

Refer to the exhibit. A DevOps engineer notices that the alert fires even when there is only a single brief spike of errors (about 5 seconds) within one minute. What is the most likely cause?

Exhibit

{
  "displayName": "High Error Rate",
  "condition": {
    "conditionThreshold": {
      "filter": "metric.type=\"logging.googleapis.com/user/myapp_errors\" AND resource.type=\"k8s_container\"",
      "aggregations": [
        {
          "alignmentPeriod": "60s",
          "perSeriesAligner": "ALIGN_RATE"
        }
      ],
      "comparison": "COMPARISON_GT",
      "thresholdValue": 5,
      "trigger": {
        "count": 1
      }
    }
  },
  "alertStrategy": {
    "autoClose": "1800s"
  }
}
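
The effect of requiring a condition to hold for several alignment periods can be illustrated with a toy model. This is a simplified sketch of threshold evaluation, not Cloud Monitoring's implementation: `required_periods` is a hypothetical stand-in for a duration expressed in alignment periods (for example, 300 s of duration with a 60 s alignment period would be 5), and the sample series is invented.

```python
def fires(aligned_values, threshold, required_periods=1):
    """True if `required_periods` consecutive aligned points exceed threshold."""
    streak = 0
    for value in aligned_values:
        # Extend the streak on a violation, reset it otherwise.
        streak = streak + 1 if value > threshold else 0
        if streak >= required_periods:
            return True
    return False


# One aligned point per 60 s period; a single short spike lands in one period.
series = [0, 0, 9, 0, 0]

print(fires(series, threshold=5))                      # True
print(fires(series, threshold=5, required_periods=5))  # False
```

With a single required period, one noisy data point is enough to trigger; requiring several consecutive periods filters out short spikes at the cost of slower detection.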

Question 79 (medium, multiple choice)

Refer to the exhibit. A DevOps engineer wants to create a chart showing the rate of items sold per second over time. What is a limitation of this metric for that purpose?

Exhibit

{
  "metric": {
    "type": "custom.googleapis.com/inventory/items_sold",
    "labels": {}
  },
  "resource": {
    "type": "global",
    "labels": {
      "project_id": "my-project"
    }
  },
  "points": [
    {
      "interval": {
        "endTime": "2023-01-01T12:00:00Z"
      },
      "value": {
        "int64Value": "100"
      }
    }
  ],
  "metricKind": "GAUGE",
  "valueType": "INT64"
}
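
A per-second rate is the change in a running total divided by the elapsed time, so it needs two samples of a counter rather than one instantaneous reading. The sketch below shows that calculation with plain Python values. The second sample is invented for illustration, and treating the exhibit's value as a running total is an assumption, since the exhibit labels it a GAUGE with only an end time.

```python
from datetime import datetime


def per_second_rate(earlier, later):
    """Rate between two (timestamp, running_total) samples."""
    (t0, count0), (t1, count1) = earlier, later
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (count1 - count0) / elapsed


first = (datetime(2023, 1, 1, 12, 0, 0), 100)   # value from the exhibit
second = (datetime(2023, 1, 1, 12, 1, 0), 160)  # hypothetical later sample
rate = per_second_rate(first, second)
print(rate)   # 1.0
```

A metric kind that records a start time and an accumulating value carries exactly the information this calculation needs, which a point-in-time gauge does not.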

Question 80 (easy, multiple choice)

A company runs a web application on Compute Engine. During seasonal sales, traffic spikes unpredictably. The operations team wants to ensure the application scales automatically without manual intervention while minimizing cost. Which solution should they implement?

Question 81 (medium, multiple choice)

An application deployed on Cloud Run is experiencing increased latency. The team suspects it's not scaling quickly enough. They have set a maxScale of 10 and minScale of 0. What should they adjust to reduce cold start latency?

Question 82 (hard, multiple choice)

A global gaming company uses Cloud Spanner for their leaderboard. They notice that write latency spikes during peak hours. The database is currently deployed in a single region. Which scaling strategy should they implement to reduce write latency globally?

Question 83 (easy, multiple choice)

A startup is building a read-heavy mobile backend. They want a database that can scale out reads without downtime. Which database service should they choose?

Question 84 (medium, multiple choice)

You are monitoring Compute Engine instances with Cloud Monitoring. You notice that autoscaling is not triggering even though CPU utilization is above 80% for several minutes. The managed instance group has autoscaling based on CPU utilization with a target of 0.8. What is the most likely cause?

Question 85 (hard, multiple choice)

A company runs batch processing jobs using preemptible VMs to reduce costs. They need to ensure these jobs can scale out significantly during peak hours. Which Compute Engine pricing model should they combine with autoscaling to optimize cost for these workloads?

Question 86 (easy, multiple choice)

A company has a stateful application running on Compute Engine. They want to scale horizontally while preserving state. Which configuration should they use?

Question 87 (medium, multiple choice)

A team deploys microservices on GKE with Horizontal Pod Autoscaler (HPA). They want to scale based on custom metrics from third-party monitoring. What must they do first?

Question 88 (hard, multiple choice)

A company uses Cloud Functions (2nd gen) to process events from Pub/Sub. During traffic spikes, function instances scale but latency increases. They want to maximize throughput per instance. What should they configure?

Question 89 (easy, multi select)

Which TWO are recommended practices when configuring autoscaling for Compute Engine managed instance groups?

Question 90 (medium, multi select)

Which TWO statements correctly describe Cloud Run scaling behavior?

Question 91 (hard, multi select)

Which THREE components should a company include in their architecture to design a global web application with low latency for users worldwide?

Question 92 (medium, multiple choice)

Refer to the exhibit. An operations team configured this Cloud Monitoring alert. They notice that the alert fires, but the associated managed instance group autoscaler does not scale up. What is the most likely reason?

Exhibit

{
  "displayName": "High CPU Alert",
  "condition": {
    "conditionThreshold": {
      "aggregations": [
        {
          "alignmentPeriod": "60s",
          "perSeriesAligner": "ALIGN_RATE",
          "crossSeriesReducer": "REDUCE_NONE"
        }
      ],
      "comparison": "COMPARISON_GT",
      "duration": "300s",
      "filter": "metric.type=\"compute.googleapis.com/instance/cpu/utilization\" resource.type=\"gce_instance\"",
      "thresholdValue": 0.8,
      "trigger": {
        "count": 1
      }
    }
  },
  "enabled": true
}
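
The timing fields in the exhibit can be sketched as a simple check. This is a simplified model, not Cloud Monitoring's actual evaluator: it assumes one aligned sample per `alignmentPeriod` and treats the condition as met only when every sample in the trailing `duration` window exceeds `thresholdValue`. The function name and the sample readings are hypothetical. It models only when the condition would count as met, not what acts on it.

```python
def condition_met(samples, threshold=0.8, duration_s=300, alignment_s=60):
    """Return True if every aligned sample in the trailing window exceeds the threshold.

    Simplified model: one sample per alignment period, newest sample last.
    """
    needed = duration_s // alignment_s  # 300s / 60s -> 5 aligned samples
    if len(samples) < needed:
        return False  # not enough history to cover the full duration
    return all(value > threshold for value in samples[-needed:])


# Hypothetical per-minute CPU utilization readings (fractions of 1.0).
sustained = [0.55, 0.85, 0.90, 0.88, 0.91, 0.86]
brief_spike = [0.55, 0.60, 0.95, 0.62, 0.58, 0.61]

print(condition_met(sustained))
print(condition_met(brief_spike))
```

Under this model, a single one-minute spike does not satisfy the condition, while five consecutive minutes above 0.8 do.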

Question 93hardmultiple choice

Read the full Scaling with Cloud operations explanation →

Refer to the exhibit. A team deployed this Cloud Run service. During a load test, the service receives high traffic, but the number of container instances never exceeds 10. What is the most likely cause?

Exhibit

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: "10"
        autoscaling.knative.dev/minScale: "2"
    spec:
      containerConcurrency: 80
      containers:
      - image: us-docker.pkg.dev/cloudrun/container/hello
        resources:
          limits:
            cpu: "1"
            memory: "256Mi"
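
The scaling values in the exhibit bound how much traffic the service can absorb. The following is a rough sketch of that arithmetic, assuming instance count is driven only by concurrent requests (the real autoscaler also weighs other signals such as CPU utilization). The helper names are hypothetical.

```python
import math

# Values taken from the exhibit.
MIN_INSTANCES = 2    # autoscaling.knative.dev/minScale
MAX_INSTANCES = 10   # autoscaling.knative.dev/maxScale
CONCURRENCY = 80     # containerConcurrency


def max_concurrent_requests(instances, per_instance):
    """Upper bound on requests served at once by the given instance count."""
    return instances * per_instance


def instances_wanted(concurrent_requests, per_instance, lo, hi):
    """Instances needed for the load, clamped to the configured min and max."""
    wanted = math.ceil(concurrent_requests / per_instance)
    return max(lo, min(hi, wanted))


print(max_concurrent_requests(MAX_INSTANCES, CONCURRENCY))
print(instances_wanted(2000, CONCURRENCY, MIN_INSTANCES, MAX_INSTANCES))
```

In this model, any load above 800 concurrent requests still yields 10 instances, because the clamp to the configured maximum applies regardless of demand.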

Question 94hardmultiple choice

Read the full Scaling with Cloud operations explanation →

A company runs an e-commerce platform on Google Kubernetes Engine (GKE) using autoscaling. They have a baseline workload and occasional traffic spikes during promotions. They configured a Horizontal Pod Autoscaler (HPA) for their web application pods and a Cluster Autoscaler for the node pool. The HPA targets 70% CPU utilization. During a recent sales event, traffic exceeded expectations. The operations team observed that the HPA increased the desired number of replicas to 50, but only 20 pods were running. The remaining 30 pods were in 'Pending' status. The Cluster Autoscaler logs show repeated messages: 'no capacity to scale up node pool'. The node pool is configured with a maximum of 10 nodes, each with 4 vCPUs, and currently 8 nodes are running. The team checked the node pool's current utilization and found that nodes are near capacity. What should the team do to ensure the application scales correctly during future events?
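
The numbers in this scenario can be checked with a quick capacity estimate. This sketch assumes pods are packed evenly across nodes and that the current density (20 pods on 8 nodes that are near capacity) is the most a node can hold. Both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def nodes_needed(desired_pods, running_pods, running_nodes):
    """Nodes required for desired_pods at the currently observed pod density."""
    # Integer arithmetic avoids rounding surprises: ceil(desired * nodes / pods).
    return math.ceil(desired_pods * running_nodes / running_pods)


def pods_that_fit(node_count, running_pods, running_nodes):
    """Pods that fit on node_count nodes at the currently observed pod density."""
    return (node_count * running_pods) // running_nodes


# Values taken from the scenario.
DESIRED_PODS = 50
RUNNING_PODS = 20
RUNNING_NODES = 8
MAX_NODES = 10

required = nodes_needed(DESIRED_PODS, RUNNING_PODS, RUNNING_NODES)
print(required)                                             # nodes needed for 50 pods
print(required - MAX_NODES)                                 # shortfall against the pool maximum
print(pods_that_fit(MAX_NODES, RUNNING_PODS, RUNNING_NODES))  # pods that fit at the maximum
```

At the observed density of 2.5 pods per node, the HPA's desired replica count needs more nodes than the node pool's configured maximum allows.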

Question 95mediummultiple choice

Read the full Scaling with Cloud operations explanation →

A company runs a web application on Google Kubernetes Engine (GKE) that experiences sudden traffic spikes. The operations team notices that the application's response time increases significantly during these spikes despite having Horizontal Pod Autoscaler (HPA) configured. They want to ensure consistent performance. What should they do?

Question 96hardmultiple choice

Read the full NAT/PAT explanation →

A financial services company is migrating its on-premises monitoring system to Google Cloud. They need to collect metrics, logs, and traces from multiple projects and provide a unified view for their operations team. Security requires that logs containing sensitive data be stored with additional encryption and access controls. Which combination of services should they use?

Question 97easymulti select

Read the full Scaling with Cloud operations explanation →

An e-commerce platform uses Compute Engine instances in a managed instance group behind a Cloud Load Balancer. During a flash sale, the load balancer reports increased error rates. The operations team suspects the instances are overwhelmed. Which two steps should they take to troubleshoot the issue? (Choose TWO.)

Question 98mediummulti select

Read the full Scaling with Cloud operations explanation →

A gaming company runs a real-time multiplayer game server on Google Kubernetes Engine. They want to optimize costs while ensuring low latency for players across different regions. Which three strategies should they implement? (Choose THREE.)

Question 99hardmultiple choice

Read the full NAT/PAT explanation →

A large enterprise runs a critical application on Google Cloud consisting of Compute Engine instances behind a TCP load balancer. The application experiences intermittent slow response times that last for about 10 minutes before returning to normal. This pattern has been occurring every few days at random times. The operations team has configured Cloud Monitoring alerts for CPU and memory, but no alerts have fired. They have also reviewed the load balancer logs, which show the latency spikes but no errors. The application logs show no errors during these periods. The team suspects a resource bottleneck but cannot find it. Further investigation reveals that the application makes synchronous calls to an external authentication service for each request. What is the most likely cause and corrective action?

Question 100easymultiple choice

Read the full NAT/PAT explanation →

A startup has deployed a Node.js application on Cloud Run. They are seeing a higher-than-expected bill for Cloud Run usage. The application is accessed by users worldwide, and traffic patterns show occasional spikes. They want to reduce costs while maintaining performance. They currently have no concurrency management and use the default Cloud Run settings. What should they do first?

Question 101mediummulti select

Read the full Scaling with Cloud operations explanation →

Which TWO statements about resource monitoring and scaling on Google Cloud are correct?

Question 102hardmultiple choice

Read the full Scaling with Cloud operations explanation →

Refer to the exhibit. A cloud operations engineer notices that the managed instance group 'my-mig' has been scaling up frequently, but the application performance is still degraded. The CPU utilization metric shows high values. What is most likely the issue?

Exhibit

```json
{
  "insertId": "1a2b3c4d5e",
  "jsonPayload": {
    "status": "SCALING_UP",
    "instanceGroup": "my-mig",
    "targetSize": 10,
    "currentSize": 5,
    "reason": "AutoScaler triggered by CPU utilization > 80% for 5 minutes"
  },
  "resource": {
    "type": "gce_instance_group",
    "labels": {
      "instance_group_name": "my-mig",
      "zone": "us-central1-a"
    }
  },
  "severity": "INFO",
  "timestamp": "2025-02-15T14:30:00Z"
}
```
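
The fields in a log entry like this one can be pulled out programmatically when reviewing many scaling events. The following is a minimal sketch that parses the exhibit with the standard library. The `summarize` helper and its output keys are hypothetical, and the entry's field names are taken from the exhibit as shown.

```python
import json

# The exhibit's log entry, reproduced as a JSON string.
RAW_ENTRY = """
{
  "insertId": "1a2b3c4d5e",
  "jsonPayload": {
    "status": "SCALING_UP",
    "instanceGroup": "my-mig",
    "targetSize": 10,
    "currentSize": 5,
    "reason": "AutoScaler triggered by CPU utilization > 80% for 5 minutes"
  },
  "resource": {
    "type": "gce_instance_group",
    "labels": {"instance_group_name": "my-mig", "zone": "us-central1-a"}
  },
  "severity": "INFO",
  "timestamp": "2025-02-15T14:30:00Z"
}
"""


def summarize(entry):
    """Extract the group name, scaling direction, and size change from one entry."""
    payload = entry["jsonPayload"]
    return {
        "group": payload["instanceGroup"],
        "status": payload["status"],
        "instances_to_add": payload["targetSize"] - payload["currentSize"],
        "zone": entry["resource"]["labels"]["zone"],
    }


summary = summarize(json.loads(RAW_ENTRY))
print(summary)
```

Collecting these summaries over time makes it easier to see how often the group scales and by how much, which is the pattern the question describes.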

Question 103easymultiple choice

Read the full Scaling with Cloud operations explanation →

A large online retailer operates a microservices-based e-commerce platform on Google Kubernetes Engine (GKE) across multiple zones. The application consists of several stateless services that handle customer traffic, inventory, and order processing. Recently, the company migrated its relational database to Cloud Spanner to achieve global scalability and strong consistency. After the migration, during peak shopping periods (e.g., Black Friday), the application experiences significant performance degradation. The operations team monitors CPU utilization of the pods and finds it consistently below 60% even under heavy load. However, Cloud Spanner metrics show high query latency and an increased number of transactions waiting on lock conflicts. The team suspects that the bottleneck is now the database, not the compute. The application is designed to scale horizontally by adding more pod replicas. The team wants to ensure that scaling decisions are based on the actual performance bottleneck. What should they do?

gcloud compute instance-groups managed list --zone us-central1-a

NAME    LOCATION       SCOPE  BASE_INSTANCE_NAME  SIZE  TARGET_SIZE  INSTANCE_TEMPLATE  AUTOSCALED
my-mig  us-central1-a  zone   my-instance         10    20           my-template        yes

{
  "displayName": "High Error Rate",
  "condition": {
    "conditionThreshold": {
      "filter": "metric.type=\"logging.googleapis.com/user/myapp_errors\" AND resource.type=\"k8s_container\"",
      "aggregations": [
        {
          "alignmentPeriod": "60s",
          "perSeriesAligner": "ALIGN_RATE"
        }
      ],
      "comparison": "COMPARISON_GT",
      "thresholdValue": 5,
      "trigger": {
        "count": 1
      }
    }
  },
  "alertStrategy": {
    "autoClose": "1800s"
  }
}

{
  "metric": {
    "type": "custom.googleapis.com/inventory/items_sold",
    "labels": {}
  },
  "resource": {
    "type": "global",
    "labels": {
      "project_id": "my-project"
    }
  },
  "points": [
    {
      "interval": {
        "endTime": "2023-01-01T12:00:00Z"
      },
      "value": {
        "int64Value": "100"
      }
    }
  ],
  "metricKind": "GAUGE",
  "valueType": "INT64"
}
