GCDL domain

Scaling with Google Cloud operations

Use this page to practise Scaling with Google Cloud operations questions for the GCDL exam. The goal is not to memorise dumps, but to understand each concept, review the explanations and improve your exam readiness.

60 questions

Focused practice

Start a Scaling with Google Cloud operations session

All sessions draw only from this domain. Pick a length or try interactive practice with inline explanations.


What the exam tests

What to know about Scaling with Google Cloud operations

Cloud concepts questions usually test the service model (IaaS/PaaS/SaaS) and deployment model (public/private/hybrid/community) appropriate for a given scenario.

IaaS, PaaS and SaaS responsibilities and examples.

Public, private, hybrid and community cloud deployment models.

On-premises vs cloud trade-offs: cost, control, scalability.

How Google Cloud connectivity options (Cloud VPN, Cloud Interconnect) work.

Question index

All Scaling with Google Cloud operations questions (60)

Click any question to see the full explanation, or start a practice session above.

1

A company's web service has a Service Level Objective (SLO) of 99.9% monthly availability. In a 30-day month, how many minutes of downtime are allowed before the SLO is violated?

2

An SRE team wants to alert when their service is consuming error budget faster than expected, rather than alerting only when the SLO threshold is crossed. Which Cloud Monitoring alerting strategy supports this approach?

3

A company's on-premises IT team spends 70% of their time on routine maintenance tasks: patching servers, replacing failed hardware, and upgrading storage. After migrating to Google Cloud managed services, which operational outcome should they expect?

4

A company has deployed a critical application on Google Cloud and wants to understand what happens to their workloads during a Google Cloud data center maintenance event (e.g., host system upgrades). What Google Compute Engine feature handles this automatically for most VMs?

5

A company's application experiences traffic spikes every weekday morning when employees log in at 9 AM. The team wants their infrastructure to automatically handle these spikes without manual intervention and without over-provisioning resources all day. Which Google Cloud capability addresses this?

6

A digital media company hosts video content globally. They want to reduce origin server load and deliver content faster to viewers worldwide. Their current architecture routes all viewer requests directly to the origin servers in `us-central1`, causing high latency for viewers in Asia and Europe. Which Google Cloud networking capability addresses this?

7

Which Google Cloud service provides a centralized view of an application's performance metrics, logs, and traces — enabling teams to monitor system health, set up alerts, and diagnose issues from a single platform?

8

A company runs a mission-critical application that must be available 24/7. They want to ensure that if a Google Cloud region becomes unavailable (e.g., due to a natural disaster), the application automatically continues to serve users from another region. Which architecture pattern achieves this?

9

A company currently spends $200,000 annually on data center costs (hardware, power, cooling, staff). After migrating to Google Cloud, their cloud bill is $120,000 annually, but they also save $50,000 in data center costs they no longer pay. What is their net annual savings from the migration?

10

A company's application is composed of 15 microservices. When a performance issue occurs, the team struggles to determine which service is causing latency since request traces span multiple services. Which Google Cloud service helps identify which specific service in a microservices chain is causing slowdowns?

11

What is the difference between a Service Level Indicator (SLI), a Service Level Objective (SLO), and a Service Level Agreement (SLA)?

12

After a major production outage, the engineering team conducts a review of what happened, why it happened, and how to prevent it in the future. This document is shared with all engineering teams. What is this practice called, and why does Google's SRE culture emphasize it?

13

A company running critical applications on Google Cloud wants access to technical support with a response time under 1 hour for critical issues and a dedicated Technical Account Manager (TAM). Which Google Cloud support tier should they purchase?

14

A company wants to optimize their Google Cloud spending. They have baseline compute workloads that run continuously 24/7 for at least one year. Which pricing option provides the greatest savings for these stable, long-running workloads?

15

A company has multiple teams deploying to Google Cloud and wants to allocate cloud costs by team. Each team should see only their own costs and be accountable for their spending. Which Google Cloud feature enables this cost allocation and visibility?

16

Google Cloud's infrastructure is designed to be highly available across multiple failure domains. What are 'availability zones' in Google Cloud, and how do they differ from 'regions'?

17

A company uses Google Cloud and wants to understand their monthly cloud spend before the invoice arrives, track spending trends, and identify the top cost drivers across all services. Which built-in Google Cloud tool provides this visibility?

18

A company exports all their Google Cloud logs to Cloud Storage for long-term retention required by their compliance policy (7-year log retention). Which Cloud Logging feature enables routing logs to Cloud Storage?

19

A company's application traffic is served by a Google Cloud global HTTP load balancer. They want to understand how request traffic distributes across backend instances in different regions. Which metric best represents this distribution?

20

A company wants to proactively identify underutilized Compute Engine VMs (high provisioned capacity but low actual usage) to reduce costs. Which Google Cloud tool provides recommendations for right-sizing VMs?

21

An SRE team has a monthly error budget of 43 minutes (99.9% SLO). In the first week of the month, a deployment causes a 50-minute outage. What should the SRE team do for the remainder of the month, and why?

22

A reliability engineering team wants to proactively identify weaknesses in their distributed system by deliberately injecting failures — killing random instances, introducing network latency, and cutting off database connections — to observe how the system responds. What is this practice called?

23

Google Cloud's operations suite includes Cloud Monitoring for metrics. What is the difference between 'monitoring' and 'observability' in cloud operations?

24

A company wants to set up automated checks that continuously verify their website's homepage, login page, and API endpoints are accessible from multiple global locations. If any endpoint becomes unreachable for more than 2 minutes, the on-call engineer should be alerted. Which Cloud Monitoring feature provides this?

25

A company's cloud costs have increased by 40% over the past quarter. The operations team wants to identify and address the root causes. Which cost optimization strategies should they investigate first?

26

Google Cloud runs its own infrastructure operations using the Site Reliability Engineering (SRE) model, which Google invented. What is the core principle that distinguishes SRE from traditional IT operations?

27

A company uses Google Cloud across 5 teams, 20 projects, and 3 regions. They want to enforce a standard that all resources include specific labels (e.g., `team`, `environment`, `cost-center`) for cost attribution and governance. What is the most scalable way to enforce this labeling standard?

28

A company's application experiences a P1 (critical) production incident at 2 AM on a Sunday. The on-call engineer resolves the issue after 3 hours but isn't sure which team members to contact or what steps to follow during an incident. What operational practice and tooling would have helped manage this incident better?

29

A company wants to optimize Cloud Storage costs for a bucket containing 100 TB of access logs. The logs from the last 7 days are frequently analyzed; logs from 8–90 days are occasionally reviewed; logs older than 90 days are archived for compliance but rarely accessed. What is the most cost-effective storage class configuration?

30

A company's engineering organization wants to share operational knowledge across teams using a 'golden path' — a recommended, pre-configured set of tools, services, and templates that makes the easy path also the correct path. Which Google Cloud concept supports this practice?

31

A company's cloud team is asked to demonstrate that their infrastructure changes are repeatable and auditable. They use Terraform configuration files committed to a Git repository to define all cloud resources. Which operational practice does this exemplify?

32

A product team is discussing how to handle a planned 48-hour maintenance window for a critical customer-facing service. The SRE team argues the maintenance window is unnecessary with proper cloud architecture. Which cloud capability eliminates the need for planned downtime maintenance windows?

33

An operations team tracks the following metrics for their customer portal: request latency p99, error rate, and requests per second. In Site Reliability Engineering terminology, what are these metrics called, and what do they collectively define?

34

A company's cloud costs have grown faster than its business. The FinOps team is implementing cloud cost governance. Which practice most effectively ensures that individual teams are accountable for their cloud spending?

35

An operations team wants to receive an automated alert when their web application's HTTP error rate exceeds 5% for more than 5 minutes. Which Google Cloud product is used to configure this type of metric-based alert?

36

A company is evaluating whether to adopt a multi-cloud strategy (using two or more cloud providers for different workloads). An engineer lists the following arguments: (1) resilience against a single cloud provider outage, (2) negotiating leverage on pricing, (3) using best-of-breed services from each provider. A cloud architect cautions that multi-cloud also introduces significant challenges. What is the most significant operational challenge of a multi-cloud approach?

37

A cloud operations team wants to ensure that all cloud resources created in their Google Cloud organization comply with company naming standards and required cost allocation labels. Which Google Cloud capability can automatically enforce these standards on resource creation?

38

A company's cloud environment has grown rapidly and the team is struggling to understand what cloud resources exist across dozens of projects. Which Google Cloud product provides a unified inventory of all cloud assets across an organization's projects and folders?

39

A DevOps team wants to implement a release process where a new application version is first deployed to 5% of production traffic, monitored for errors, then gradually increased to 100% if metrics remain healthy. Which deployment strategy does this describe?

40

An SRE team analyzes that their service had 47 minutes of downtime in the past 30 days. Their SLO is 99.9% monthly availability. How should the team characterize their performance relative to the SLO?

41

A company's cloud team is asked to reduce the cost of a batch data processing workload that runs for 4–6 hours each night and can tolerate interruptions. The workload currently uses standard on-demand Compute Engine VMs. Which pricing option should the team evaluate first?

42

A cloud team performs a quarterly review of its Compute Engine instances and discovers 15 VMs that have had zero CPU utilization for over 90 days. What is the recommended operational response to these idle resources?

43

A company's SRE team is debating whether to automate a frequently performed manual operational task. The automation would take 4 weeks of engineering time to build. The manual task takes 30 minutes per occurrence and happens approximately 20 times per month. Using the SRE concept of 'toil,' how should the team approach this decision?

44

A company runs a customer-facing web application with a published SLA of 99.95% monthly availability. In the past month, the application experienced two outages: a 12-minute outage and a 7-minute outage. Did the company meet its SLA?

45

A cloud architect is reviewing logs from a production incident. She wants to search all log entries across multiple Google Cloud projects for error messages containing a specific string. Which Google Cloud product enables centralized log searching and analysis across an entire organization?

46

A DevOps team wants to adopt GitOps practices for managing their Google Cloud infrastructure. Which combination of tools and practices defines a GitOps approach to cloud infrastructure management?

47

An operations team has been asked to estimate the annual cost impact of a proposed new cloud architecture. The architecture would replace 50 on-demand n2-standard-4 VMs (running 24/7) with an autoscaling group that averages 10 VMs under normal load but scales to 50 during peak hours (approximately 8 hours per day). Which analytical approach best estimates the cost impact?

48

A cloud team receives an alert that a critical production service's error rate has spiked. Following incident response best practices, what is the correct first priority action?

49

A company wants to reduce its Google Cloud costs without reducing its workload capacity. The team identifies that several production VMs consistently use less than 30% of their allocated CPU and memory. What is the most straightforward cost optimization action?

50

A platform engineering team is designing a self-service cloud environment for development teams. They want developers to be able to provision approved cloud resources quickly without waiting for central IT approval for every request, while still ensuring compliance with security and cost policies. Which architectural approach best balances developer agility with governance?

51

A cloud team wants to understand their current Google Cloud resource inventory — specifically, which VMs are running in each region, their machine types, and whether they have public IP addresses. Which approach most efficiently provides this across all projects?

52

An operations team is performing a post-incident review after a production outage. The team lead insists that the review must follow a 'blameless postmortem' approach. What does this mean, and why is it important for organizational learning?

53

A company's SRE team sets an SLO of 99.5% monthly availability for a non-critical internal tool. A business stakeholder argues the target should be 99.99%. The SRE team pushes back. Which SRE argument best supports keeping the 99.5% target?

54

A cloud team wants to automatically enforce that all new Compute Engine VMs are created with a specific label (environment: production) and that no VMs are created with external IP addresses in the production project. Which Google Cloud capability enforces these organizational policies at resource creation time?

55

A company has a Google Cloud environment with 50 projects and 200 engineers. The security team wants to ensure that a new security policy — requiring all Cloud Storage buckets to have uniform bucket-level access enabled — applies to all existing and future buckets across all projects. Which approach scales to the entire organization?

56

A company uses committed use discounts (CUDs) for its production workload baseline. An engineer proposes also using sustained use discounts (SUDs) for the same VMs. Why is this incorrect?

57

An SRE team is practicing 'chaos engineering' by simulating a zone-level failure in their staging environment. They find that their application does not automatically recover — traffic is not redirected and the service remains down. What architectural component is most likely missing?

58

A company's cloud operations team is implementing a tagging strategy for cost allocation. They want to ensure that the 'cost-center' label is present on every Compute Engine VM and Cloud Storage bucket created in their Google Cloud organization. Currently, some resources are created without this label. Which combination of controls best enforces and remediates this requirement?

59

A company's production database is running on a Compute Engine VM with a 500 GB Persistent Disk. The operations team wants to create a backup they can restore from in case of data corruption or accidental deletion. Which Google Cloud capability provides point-in-time backup for Persistent Disks?

60

A company's cloud cost has grown significantly. A FinOps analysis reveals the largest waste category is idle Cloud SQL instances — 12 database instances that were provisioned for projects that have since ended, but were never deleted. What process failure most directly caused this waste?
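
Several of the questions above (1, 21, 40 and 44) rest on the same availability arithmetic. The following is a minimal sketch of that calculation, assuming a 30-day month and that every minute of outage counts against the target. The function names are hypothetical and chosen only for illustration; they are not part of any Google Cloud API.

```python
def allowed_downtime_minutes(target_percent: float, days: int = 30) -> float:
    """Return the downtime (in minutes) an availability target permits over `days` days."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (100 - target_percent) / 100


def within_budget(target_percent: float, downtime_minutes: float, days: int = 30) -> bool:
    """Return True if the observed downtime still fits inside the allowed budget."""
    return downtime_minutes <= allowed_downtime_minutes(target_percent, days)


if __name__ == "__main__":
    # Figures taken from the question texts above; results are rounded for display.
    print(round(allowed_downtime_minutes(99.9), 1))   # 43.2
    print(round(allowed_downtime_minutes(99.95), 1))  # 21.6
    print(within_budget(99.9, 47))        # False: 47 minutes exceeds the 43.2-minute budget
    print(within_budget(99.95, 12 + 7))   # True: 19 minutes fits within 21.6 minutes
```

The same helper applies to any target and period: a stricter target or a shorter window shrinks the budget proportionally.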

Watch out for

Common Scaling with Google Cloud operations exam traps

  • IaaS gives you infrastructure control; SaaS gives you only the application.
  • Hybrid cloud combines on-premises and public cloud — not two public clouds.
  • Cloud does not automatically mean cheaper or more secure.
  • Management responsibility shifts with each service model (IaaS → PaaS → SaaS).

Frequently asked questions

What does the Scaling with Google Cloud operations domain cover on the GCDL exam?
Cloud concepts questions usually test the service model (IaaS/PaaS/SaaS) and deployment model (public/private/hybrid/community) appropriate for a given scenario.
How many questions are in this domain?
This page lists all 60 Scaling with Google Cloud operations questions in the GCDL question bank. The actual exam draws from this domain proportionally to its weighting in the official exam blueprint.
What is the best way to practise this domain?
Start with a short focused session (10 questions) to identify gaps, then use the interactive practice page to work through explanations. Repeat with a longer session once the weak areas feel solid.
Can I practise only Scaling with Google Cloud operations questions?
Yes — the session launcher on this page filters questions to this domain only. Choose any session length or try the interactive practice page for inline explanations.