Practice KCNA Cloud Native Observability questions with full explanations on every answer.
Click any question to see the full explanation and answer options.
1. A DevOps team notices that a microservice is returning 503 errors intermittently. The service runs in Kubernetes and uses a liveness probe. The team wants to understand the root cause without restarting the pod. Which observability approach should they use first?
2. A platform team is designing a monitoring strategy for a multi-tenant Kubernetes cluster. Each tenant runs workloads in separate namespaces. The team needs to ensure tenant isolation while providing aggregated cluster-wide dashboards. Which approach best meets these requirements?
3. A Kubernetes administrator is troubleshooting a pod that is stuck in CrashLoopBackOff. The pod's restart count is increasing. Which initial step should the administrator take to diagnose the issue?
4. An organization uses Prometheus and Grafana for monitoring. They want to alert when the 99th percentile of request latency exceeds 500ms for more than 5 minutes. Which PromQL query should they use in the alert rule?
5. Which TWO of the following are best practices for structuring log output in cloud-native applications to maximize observability?
6. Which THREE of the following are valid use cases for distributed tracing in a microservices architecture?
7. A company runs a Kubernetes cluster with 50 worker nodes, each hosting multiple microservices. They use Prometheus for metrics collection and Grafana for dashboards. Recently, the Prometheus server has been experiencing out-of-memory (OOM) kills during peak hours, causing gaps in metric collection. The cluster has a dedicated monitoring namespace. The team has already increased the Prometheus pod's memory limits to 8GB, but OOMs still occur. The metrics retention is set to 15 days. The cardinality of certain metrics (e.g., HTTP request labels with user IDs) is very high. The team needs to resolve the OOM issue without losing critical alerting capability for at least the last 7 days of data. Which action should they take first?
8. A company deploys a microservice application on Kubernetes. They notice that one of the services is returning 5xx errors intermittently. Which observability tool should they use to correlate the errors with resource usage across all pods of that service?
9. A platform team wants to implement observability for a Kubernetes cluster running 500+ microservices. They need to reduce the cost of storing logs while retaining the ability to search for specific error patterns. Which strategy best achieves this?
10. A developer wants to monitor the health of a Kubernetes deployment by checking if the number of ready replicas matches the desired replicas. Which metric from kube-state-metrics should they query?
11. Which TWO of the following are best practices for implementing observability in a cloud-native environment?
12. You are an SRE managing a Kubernetes cluster with 200 nodes and 10,000 pods. The cluster runs a critical payment processing application. Users report that transactions are occasionally failing with a 'timeout' error. You have Prometheus and Grafana set up for monitoring, and you use Fluentd with Elasticsearch for logging. You notice that during peak hours, the CPU usage of the payment service pods spikes to 90%, but memory usage remains stable. The pod restart count is low. You also see that the response time of the payment service increases significantly during these spikes. You need to identify the root cause and propose a fix. Which course of action is most appropriate?
13. A company is running a microservices application on a Kubernetes cluster. They have noticed that one of the services, 'payment-api', is experiencing intermittent high latency. The team wants to identify the root cause without modifying the application code. Which approach should they take?
14. Which TWO of the following are recommended practices for achieving observability in a Kubernetes cluster?
15. Drag and drop the steps to perform a backup of etcd in a Kubernetes cluster into the correct order.
16. Match each Kubernetes security concept to its definition.
17. Which of the following is a core component of the three pillars of observability?
18. What is the primary purpose of Prometheus in cloud native observability?
19. Which command retrieves logs from a specific container named 'sidecar' in a multi-container pod?
20. In OpenTelemetry, what is the purpose of the Collector component?
21. Which Prometheus metric type is best suited to count the number of HTTP requests received?
22. What is the purpose of Alertmanager in Prometheus?
23. Which tool is primarily used for distributed tracing in cloud native environments?
24. What is context propagation in distributed tracing?
25. Which component of the metrics-server provides resource metrics like CPU and memory usage?
26. What does SLA stand for in the context of service reliability?
27. Which log aggregation tool is designed specifically for Kubernetes and is often used as a lightweight alternative to Fluentd?
28. A team wants to implement cost monitoring for their Kubernetes clusters. Which approach is most effective?
29. Which TWO are pillars of observability? (Select two.)
30. Which THREE are responsibilities of the OpenTelemetry project? (Select three.)
31. Which TWO are components of a distributed trace? (Select two.)
32. Which of the following is the correct definition of a Service Level Indicator (SLI)?
33. What is the primary purpose of structured logging?
34. Which Prometheus metric type is best suited for counting the total number of HTTP requests received by a service?
35. A developer wants to view the logs of a specific container named 'sidecar' in a pod called 'web-app'. Which command should they use?
36. Which component is responsible for aggregating metrics from Kubernetes nodes and exposing them to the metrics API?
37. In the context of distributed tracing, what is a 'span'?
38. Which tool is specifically designed for log aggregation and is built by Grafana Labs as a lightweight, cost-effective alternative to traditional log systems?
39. A team wants to ensure that at least 99.9% of all requests to their application complete within 500ms over a 30-day window. How should this requirement be classified?
40. Which open-source project provides a unified standard for collecting and exporting telemetry data (metrics, logs, and traces) from applications?
41. In Prometheus, what is the purpose of the Alertmanager component?
42. When using OpenTelemetry, what is the role of the 'Collector'?
43. A company uses Prometheus for monitoring and wants to alert when the average CPU usage over 5 minutes exceeds 80%. Which PromQL query would correctly define this alert rule?
44. Which TWO of the following are valid Prometheus metric types?
45. Which THREE of the following are components of the OpenTelemetry project?
46. Which TWO of the following are examples of context propagation mechanisms used in distributed tracing?
47. Which of the following is considered one of the three pillars of observability?
48. A DevOps team wants to collect logs from all Kubernetes nodes and forward them to a central log storage system. Which tool is specifically designed for lightweight log aggregation and forwarding on Kubernetes nodes?
49. A company uses OpenTelemetry to instrument their microservices. They want to ensure that traces from one service can be correlated with those from another service across network calls. Which OpenTelemetry concept enables this correlation?
50. Which Prometheus metric type is used to represent a value that can increase or decrease over time, such as memory usage?
51. An application is instrumented with OpenTelemetry to export traces to Jaeger. The team notices that some traces are incomplete. What is the most likely cause?
52. A team wants to set up alerts when a Kubernetes pod consumes more than 90% of its memory limit for over 5 minutes. They use Prometheus and Alertmanager. Which Prometheus query would trigger an alert for a specific pod named 'web-app' in the 'default' namespace?
53. Which component of the OpenTelemetry architecture is responsible for receiving data from instrumented applications and processing it before export?
54. What is the purpose of the metrics-server in Kubernetes?
55. A team wants to visualize metrics from Prometheus in a dashboard. Which tool is commonly used for this purpose?
56. A company defines an SLO that 99.9% of requests to a service should complete in under 200ms. Which metric type is used to measure this SLO?
57. Which tool is specifically designed for distributed tracing and is a Cloud Native Computing Foundation (CNCF) graduated project?
58. What does the 'kubectl logs' command retrieve?
59. Which TWO of the following are valid Prometheus metric types? (Select two)
60. Which THREE of the following are components of the OpenTelemetry project? (Select three)
61. Which TWO of the following are common log aggregation tools used in Kubernetes environments? (Select two)
62. Which of the following is NOT one of the three pillars of observability?
63. What is the primary purpose of structured logging?
64. Which tool is commonly used for log aggregation in Kubernetes and is designed to be lightweight?
65. A developer wants to view the logs of a specific container named 'sidecar' inside a pod named 'app-pod'. Which command should they use?
66. What type of Prometheus metric is best suited to count the total number of HTTP requests received by a service?
67. Which of the following is true about Prometheus's pull-based model for collecting metrics?
68. What is the primary role of the OpenTelemetry Collector?
69. In distributed tracing, what is a 'span'?
70. Which tool is specifically designed for distributed tracing and was originally developed by Uber?
71. An SRE team defines an SLO that 99.9% of requests to a service should complete in under 500ms over a 30-day rolling window. If the service receives 10 million requests in a month, what is the maximum number of requests that can exceed the latency threshold while still meeting the SLO?
72. In PromQL, which function would you use to calculate the per-second rate of increase of a counter over a specified time window?
73. What is the main advantage of using OpenTelemetry over vendor-specific instrumentation libraries?
74. Which TWO of the following are valid Prometheus metric types? (Select two.)
75. Which THREE of the following are benefits of using a service mesh for observability? (Select three.)
76. Which TWO of the following are valid components of the Alertmanager configuration? (Select two.)
77. Which of the following is NOT one of the three pillars of observability in cloud-native environments?
78. A DevOps team wants to collect and forward logs from all nodes in a Kubernetes cluster to a centralized logging backend. Which component is specifically designed for lightweight log collection and forwarding?
79. A Prometheus alert rule fires when the error rate exceeds 5% for 5 minutes. The alert is sent to Alertmanager. What must be configured in Alertmanager to ensure the alert is deduplicated, grouped, and routed to the correct team?
80. In OpenTelemetry, which component is responsible for receiving, processing, and exporting telemetry data from multiple sources?
81. Which TWO of the following are Prometheus metric types? (Select two.)
82. Which THREE of the following are core components of the OpenTelemetry specification? (Select three.)
83. Which TWO of the following tools are commonly used for distributed tracing in cloud-native environments? (Select two.)
84. Which TWO of the following are valid PromQL functions? (Select two.)
85. Which THREE of the following are important considerations when defining SLOs (Service Level Objectives)? (Select three.)
86. Which THREE of the following are benefits of structured logging? (Select three.)
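Several questions in the list above turn on SLO error-budget arithmetic, for example the one asking how many of 10 million monthly requests may miss a 99.9% latency target. The short Python sketch below works through that calculation. The function name `error_budget` and the choice to pass the target as a decimal string are illustrative assumptions, not part of the question bank or of any observability tool; exact fractions are used so that floating-point rounding cannot shift the result.

```python
from fractions import Fraction


def error_budget(total_requests: int, slo_target: str) -> int:
    """Largest whole number of requests that may miss the SLO threshold.

    `slo_target` is a decimal string such as "0.999" (hypothetical
    convention for this sketch) so it can be parsed exactly.
    """
    target = Fraction(slo_target)
    if not 0 <= target <= 1:
        raise ValueError("slo_target must be between 0 and 1")
    # The error budget is the fraction of requests allowed to fail the SLI.
    # int() truncates, which is a floor for non-negative values.
    return int(total_requests * (1 - target))


if __name__ == "__main__":
    # 99.9% target over 10 million requests leaves a 0.1% budget.
    print(error_budget(10_000_000, "0.999"))  # → 10000
```

Under these assumptions, a 99.9% target leaves a budget of 0.1% of traffic, so the allowance scales linearly with request volume.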
The Cloud Native Observability domain covers the key concepts tested in this area of the KCNA exam blueprint published by CNCF. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all KCNA domains — no account required.
The Courseiva KCNA question bank contains 86 questions in the Cloud Native Observability domain. Click any question to see the full explanation and answer breakdown.
Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
Yes — the session launcher on this page draws questions exclusively from the Cloud Native Observability domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.