How many Cloud Native Observability questions are on the KCNA exam?

The Cloud Native Observability domain is one of the weighted domains on the KCNA exam. The Courseiva question bank has 86 practice questions for this domain.

Free KCNA Cloud Native Observability Practice Questions (2026)

Q: How can I practice Cloud Native Observability questions for KCNA?

Click any of the 86 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30 or 50 questions drawn only from the Cloud Native Observability domain.

All KCNA Cloud Native Observability questions (86)

Click any question to see the full explanation and answer options, or start a focused practice session above.

A DevOps team notices that a microservice is returning 503 errors intermittently. The service runs in Kubernetes and uses a liveness probe. The team wants to understand the root cause without restarting the pod. Which observability approach should they use first?

A platform team is designing a monitoring strategy for a multi-tenant Kubernetes cluster. Each tenant runs workloads in separate namespaces. The team needs to ensure tenant isolation while providing aggregated cluster-wide dashboards. Which approach best meets these requirements?

A Kubernetes administrator is troubleshooting a pod that is stuck in CrashLoopBackOff. The pod's restart count is increasing. Which initial step should the administrator take to diagnose the issue?

An organization uses Prometheus and Grafana for monitoring. They want to alert when the 99th percentile of request latency exceeds 500ms for more than 5 minutes. Which PromQL query should they use in the alert rule?

Which TWO of the following are best practices for structuring log output in cloud-native applications to maximize observability?

Which THREE of the following are valid use cases for distributed tracing in a microservices architecture?

A company runs a Kubernetes cluster with 50 worker nodes, each hosting multiple microservices. They use Prometheus for metrics collection and Grafana for dashboards. Recently, the Prometheus server has been experiencing out-of-memory (OOM) kills during peak hours, causing gaps in metric collection. The cluster has a dedicated monitoring namespace. The team has already increased the Prometheus pod's memory limits to 8GB, but OOMs still occur. The metrics retention is set to 15 days. The cardinality of certain metrics (e.g., HTTP request labels with user IDs) is very high. The team needs to resolve the OOM issue without losing critical alerting capability for at least the last 7 days of data. Which action should they take first?

A company deploys a microservice application on Kubernetes. They notice that one of the services is returning 5xx errors intermittently. Which observability tool should they use to correlate the errors with resource usage across all pods of that service?

A platform team wants to implement observability for a Kubernetes cluster running 500+ microservices. They need to reduce the cost of storing logs while retaining the ability to search for specific error patterns. Which strategy best achieves this?

A developer wants to monitor the health of a Kubernetes deployment by checking if the number of ready replicas matches the desired replicas. Which metric from kube-state-metrics should they query?

Which TWO of the following are best practices for implementing observability in a cloud-native environment?

You are an SRE managing a Kubernetes cluster with 200 nodes and 10,000 pods. The cluster runs a critical payment processing application. Users report that transactions are occasionally failing with a 'timeout' error. You have Prometheus and Grafana set up for monitoring, and you use Fluentd with Elasticsearch for logging. You notice that during peak hours, the CPU usage of the payment service pods spikes to 90%, but memory usage remains stable. The pod restart count is low. You also see that the response time of the payment service increases significantly during these spikes. You need to identify the root cause and propose a fix. Which course of action is most appropriate?

A company is running a microservices application on a Kubernetes cluster. They have noticed that one of the services, 'payment-api', is experiencing intermittent high latency. The team wants to identify the root cause without modifying the application code. Which approach should they take?

Which TWO of the following are recommended practices for achieving observability in a Kubernetes cluster?

Drag and drop the steps to perform a backup of etcd in a Kubernetes cluster into the correct order.

Match each Kubernetes security concept to its definition.

Which of the following is a core component of the three pillars of observability?

What is the primary purpose of Prometheus in cloud native observability?

Which command retrieves logs from a specific container named 'sidecar' in a multi-container pod?

In OpenTelemetry, what is the purpose of the Collector component?

Which Prometheus metric type is best suited to count the number of HTTP requests received?

What is the purpose of Alertmanager in Prometheus?

Which tool is primarily used for distributed tracing in cloud native environments?

What is context propagation in distributed tracing?

Which component of the metrics-server provides resource metrics like CPU and memory usage?

What does SLA stand for in the context of service reliability?

Which log aggregation tool is designed specifically for Kubernetes and is often used as a lightweight alternative to Fluentd?

A team wants to implement cost monitoring for their Kubernetes clusters. Which approach is most effective?

Which TWO are pillars of observability? (Select two.)

Which THREE are responsibilities of the OpenTelemetry project? (Select three.)

Which TWO are components of a distributed trace? (Select two.)

Which of the following is the correct definition of a Service Level Indicator (SLI)?

What is the primary purpose of structured logging?

Which Prometheus metric type is best suited for counting the total number of HTTP requests received by a service?

A developer wants to view the logs of a specific container named 'sidecar' in a pod called 'web-app'. Which command should they use?

Which component is responsible for aggregating metrics from Kubernetes nodes and exposing them to the metrics API?

In the context of distributed tracing, what is a 'span'?

Which tool is specifically designed for log aggregation and is built by Grafana Labs as a lightweight, cost-effective alternative to traditional log systems?

A team wants to ensure that at least 99.9% of all requests to their application complete within 500ms over a 30-day window. How should this requirement be classified?

Which open-source project provides a unified standard for collecting and exporting telemetry data (metrics, logs, and traces) from applications?

In Prometheus, what is the purpose of the Alertmanager component?

When using OpenTelemetry, what is the role of the 'Collector'?

A company uses Prometheus for monitoring and wants to alert when the average CPU usage over 5 minutes exceeds 80%. Which PromQL query would correctly define this alert rule?

Which TWO of the following are valid Prometheus metric types?

Which THREE of the following are components of the OpenTelemetry project?

Which TWO of the following are examples of context propagation mechanisms used in distributed tracing?

Which of the following is considered one of the three pillars of observability?

A DevOps team wants to collect logs from all Kubernetes nodes and forward them to a central log storage system. Which tool is specifically designed for lightweight log aggregation and forwarding on Kubernetes nodes?

A company uses OpenTelemetry to instrument their microservices. They want to ensure that traces from one service can be correlated with those from another service across network calls. Which OpenTelemetry concept enables this correlation?

Which Prometheus metric type is used to represent a value that can increase or decrease over time, such as memory usage?

An application is instrumented with OpenTelemetry to export traces to Jaeger. The team notices that some traces are incomplete. What is the most likely cause?

A team wants to set up alerts when a Kubernetes pod consumes more than 90% of its memory limit for over 5 minutes. They use Prometheus and Alertmanager. Which Prometheus query would trigger an alert for a specific pod named 'web-app' in the 'default' namespace?

Which component of the OpenTelemetry architecture is responsible for receiving data from instrumented applications and processing it before export?

What is the purpose of the metrics-server in Kubernetes?

A team wants to visualize metrics from Prometheus in a dashboard. Which tool is commonly used for this purpose?

A company defines an SLO that 99.9% of requests to a service should complete in under 200ms. Which metric type is used to measure this SLO?

Which tool is specifically designed for distributed tracing and is a Cloud Native Computing Foundation (CNCF) graduated project?

What does the 'kubectl logs' command retrieve?

Which TWO of the following are valid Prometheus metric types? (Select two)

Which THREE of the following are components of the OpenTelemetry project? (Select three)

Which TWO of the following are common log aggregation tools used in Kubernetes environments? (Select two)

Which of the following is NOT one of the three pillars of observability?

What is the primary purpose of structured logging?

Which tool is commonly used for log aggregation in Kubernetes and is designed to be lightweight?

A developer wants to view the logs of a specific container named 'sidecar' inside a pod named 'app-pod'. Which command should they use?

What type of Prometheus metric is best suited to count the total number of HTTP requests received by a service?

Which of the following is true about Prometheus's pull-based model for collecting metrics?

What is the primary role of the OpenTelemetry Collector?

In distributed tracing, what is a 'span'?

Which tool is specifically designed for distributed tracing and was originally developed by Uber?

An SRE team defines an SLO that 99.9% of requests to a service should complete in under 500ms over a 30-day rolling window. If the service receives 10 million requests in a month, what is the maximum number of requests that can exceed the latency threshold while still meeting the SLO?

In PromQL, which function would you use to calculate the per-second rate of increase of a counter over a specified time window?

What is the main advantage of using OpenTelemetry over vendor-specific instrumentation libraries?

Which TWO of the following are valid Prometheus metric types? (Select two.)

Which THREE of the following are benefits of using a service mesh for observability? (Select three.)

Which TWO of the following are valid components of the Alertmanager configuration? (Select two.)

Which of the following is NOT one of the three pillars of observability in cloud-native environments?

A DevOps team wants to collect and forward logs from all nodes in a Kubernetes cluster to a centralized logging backend. Which component is specifically designed for lightweight log collection and forwarding?

A Prometheus alert rule fires when the error rate exceeds 5% for 5 minutes. The alert is sent to Alertmanager. What must be configured in Alertmanager to ensure the alert is deduplicated, grouped, and routed to the correct team?

In OpenTelemetry, which component is responsible for receiving, processing, and exporting telemetry data from multiple sources?

Which TWO of the following are Prometheus metric types? (Select two.)

Which THREE of the following are core components of the OpenTelemetry specification? (Select three.)

Which TWO of the following tools are commonly used for distributed tracing in cloud-native environments? (Select two.)

Which TWO of the following are valid PromQL functions? (Select two.)

Which THREE of the following are important considerations when defining SLOs (Service Level Objectives)? (Select three.)

Which THREE of the following are benefits of structured logging? (Select three.)
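Worked example: SLO error budget

One of the questions above asks how many requests may exceed a latency threshold while a 99.9% SLO is still met over 10 million requests. The arithmetic behind that kind of question can be sketched as follows. This is a minimal illustration, not the question bank's official explanation; the function name `error_budget` is hypothetical, and the sketch assumes the SLO is evaluated as a simple ratio of good requests to total requests.

```python
from fractions import Fraction
import math


def error_budget(total_requests: int, slo_target: Fraction) -> int:
    """Largest number of requests that may miss the threshold
    while the fraction of good requests still meets slo_target.

    Assumption: the SLO is a plain good/total ratio over the window.
    """
    # Smallest whole number of good requests that satisfies the target.
    required_good = math.ceil(total_requests * slo_target)
    return total_requests - required_good


if __name__ == "__main__":
    # 99.9% expressed exactly as 999/1000 to avoid floating-point rounding.
    print(error_budget(10_000_000, Fraction(999, 1000)))
```

Using `Fraction` keeps the target exact, so the result does not depend on how a float such as 0.999 happens to round.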

Other KCNA exam domains

Kubernetes Fundamentals, Container Orchestration, Cloud Native Architecture, Cloud Native Application Delivery

Frequently asked questions

What does the Cloud Native Observability domain cover on the KCNA exam?

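The Cloud Native Observability domain is part of the KCNA exam blueprint published by CNCF. The practice questions on this page touch on the pillars of observability, Prometheus metric types, PromQL and Alertmanager, OpenTelemetry and its Collector, distributed tracing, structured logging and log aggregation, and SLIs and SLOs. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all KCNA domains, with no account required.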

How many Cloud Native Observability questions are in the KCNA question bank?

The Courseiva KCNA question bank contains 86 questions in the Cloud Native Observability domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Cloud Native Observability for KCNA?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only Cloud Native Observability questions for KCNA?

Yes — the session launcher on this page draws questions exclusively from the Cloud Native Observability domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
