KCNA Cloud Native Observability — All Questions With Answers

Question 1 (medium, multiple choice)

A DevOps team notices that a microservice is returning 503 errors intermittently. The service runs in Kubernetes and uses a liveness probe. The team wants to understand the root cause without restarting the pod. Which observability approach should they use first?

Question 2 (hard, multiple choice)

A platform team is designing a monitoring strategy for a multi-tenant Kubernetes cluster. Each tenant runs workloads in separate namespaces. The team needs to ensure tenant isolation while providing aggregated cluster-wide dashboards. Which approach best meets these requirements?

Question 3 (easy, multiple choice)

A Kubernetes administrator is troubleshooting a pod that is stuck in CrashLoopBackOff. The pod's restart count is increasing. Which initial step should the administrator take to diagnose the issue?

Question 4 (medium, multiple choice)

An organization uses Prometheus and Grafana for monitoring. They want to alert when the 99th percentile of request latency exceeds 500ms for more than 5 minutes. Which PromQL query should they use in the alert rule?

Question 5 (hard, multi select)

Which TWO of the following are best practices for structuring log output in cloud-native applications to maximize observability?

Question 6 (medium, multi select)

Which THREE of the following are valid use cases for distributed tracing in a microservices architecture?

Question 7 (hard, multiple choice)

A company runs a Kubernetes cluster with 50 worker nodes, each hosting multiple microservices. They use Prometheus for metrics collection and Grafana for dashboards. Recently, the Prometheus server has been experiencing out-of-memory (OOM) kills during peak hours, causing gaps in metric collection. The cluster has a dedicated monitoring namespace. The team has already increased the Prometheus pod's memory limits to 8GB, but OOMs still occur. The metrics retention is set to 15 days. The cardinality of certain metrics (e.g., HTTP request labels with user IDs) is very high. The team needs to resolve the OOM issue without losing critical alerting capability for at least the last 7 days of data. Which action should they take first?

Question 8 (medium, multiple choice)

A company deploys a microservice application on Kubernetes. They notice that one of the services is returning 5xx errors intermittently. Which observability tool should they use to correlate the errors with resource usage across all pods of that service?

Question 9 (hard, multiple choice)

A platform team wants to implement observability for a Kubernetes cluster running 500+ microservices. They need to reduce the cost of storing logs while retaining the ability to search for specific error patterns. Which strategy best achieves this?

Question 10 (easy, multiple choice)

A developer wants to monitor the health of a Kubernetes deployment by checking if the number of ready replicas matches the desired replicas. Which metric from kube-state-metrics should they query?

Question 11 (medium, multi select)

Which TWO of the following are best practices for implementing observability in a cloud-native environment?

Question 12 (hard, multiple choice)

You are an SRE managing a Kubernetes cluster with 200 nodes and 10,000 pods. The cluster runs a critical payment processing application. Users report that transactions are occasionally failing with a 'timeout' error. You have Prometheus and Grafana set up for monitoring, and you use Fluentd with Elasticsearch for logging. You notice that during peak hours, the CPU usage of the payment service pods spikes to 90%, but memory usage remains stable. The pod restart count is low. You also see that the response time of the payment service increases significantly during these spikes. You need to identify the root cause and propose a fix. Which course of action is most appropriate?

Question 13 (medium, multiple choice)

A company is running a microservices application on a Kubernetes cluster. They have noticed that one of the services, 'payment-api', is experiencing intermittent high latency. The team wants to identify the root cause without modifying the application code. Which approach should they take?

Question 14 (hard, multi select)

Which TWO of the following are recommended practices for achieving observability in a Kubernetes cluster?

Question 15 (medium, drag order)

Drag and drop the steps to perform a backup of etcd in a Kubernetes cluster into the correct order.

Question 16 (medium, matching)

Match each Kubernetes security concept to its definition.

Matches

Identity for processes running in a pod

Role-based access control to authorize API requests

Specifies how groups of pods are allowed to communicate

Deprecated but formerly controlled security-sensitive pod settings

Stores sensitive data like passwords and tokens

Question 17 (medium, multiple choice)

Which of the following is a core component of the three pillars of observability?

Question 18 (easy, multiple choice)

What is the primary purpose of Prometheus in cloud native observability?

Question 19 (medium, multiple choice)

Which command retrieves logs from a specific container named 'sidecar' in a multi-container pod?

Question 20 (hard, multiple choice)

In OpenTelemetry, what is the purpose of the Collector component?

Question 21 (medium, multiple choice)

Which Prometheus metric type is best suited to count the number of HTTP requests received?

Question 22 (easy, multiple choice)

What is the purpose of Alertmanager in Prometheus?

Question 23 (medium, multiple choice)

Which tool is primarily used for distributed tracing in cloud native environments?

Question 24 (hard, multiple choice)

What is context propagation in distributed tracing?
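As a rough illustration of the idea behind this question, the sketch below carries trace context between services using the W3C Trace Context `traceparent` header layout (`version-traceid-parentid-flags`). The function names `new_traceparent` and `child_traceparent` are hypothetical helpers invented for this example, not part of any tracing library; a real service would normally rely on its tracing SDK's propagator instead.

```python
import re
import secrets

# W3C Trace Context "traceparent" layout: version-traceid-parentid-flags,
# all lowercase hex: 2, 32, 16 and 2 characters respectively.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace (hypothetical helper): fresh trace ID and span ID."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Build the header for an outgoing call made while handling `incoming`.

    The trace ID and flags are kept so both services' spans land in the same
    trace; only the span ID changes. A malformed header starts a new trace.
    """
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

The key point is that the trace ID survives every hop while each hop contributes its own span ID, which is what lets a backend stitch spans from different services into one trace.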

Question 25 (medium, multiple choice)

Which component of the metrics-server provides resource metrics like CPU and memory usage?

Question 26 (easy, multiple choice)

What does SLA stand for in the context of service reliability?

Question 27 (medium, multiple choice)

Which log aggregation tool is designed specifically for Kubernetes and is often used as a lightweight alternative to Fluentd?

Question 28 (hard, multiple choice)

A team wants to implement cost monitoring for their Kubernetes clusters. Which approach is most effective?

Question 29 (medium, multi select)

Which TWO are pillars of observability? (Select two.)

Question 30 (hard, multi select)

Which THREE are responsibilities of the OpenTelemetry project? (Select three.)

Question 31 (medium, multi select)

Which TWO are components of a distributed trace? (Select two.)

Question 32 (easy, multiple choice)

Which of the following is the correct definition of a Service Level Indicator (SLI)?

Question 33 (easy, multiple choice)

What is the primary purpose of structured logging?
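To make the term concrete, here is a minimal sketch of structured logging using only Python's standard `logging` and `json` modules. The `JsonFormatter` class, the `fields` convention for extra key/value pairs, and the `payment-api` logger name are assumptions made for this example rather than an established API.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (illustrative only)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def render_example() -> str:
    """Emit one structured log line into a buffer and return it."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("payment-api")
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False
    logger.error("charge failed", extra={"fields": {"order_id": "o-123", "status": 503}})
    return stream.getvalue().strip()
```

Because every line is a self-describing JSON object, a log backend can filter on fields such as `status` or `order_id` directly instead of relying on fragile text pattern matching.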

Question 34 (easy, multiple choice)

Which Prometheus metric type is best suited for counting the total number of HTTP requests received by a service?

Question 35 (medium, multiple choice)

A developer wants to view the logs of a specific container named 'sidecar' in a pod called 'web-app'. Which command should they use?

Question 36 (medium, multiple choice)

Which component is responsible for aggregating metrics from Kubernetes nodes and exposing them to the metrics API?

Question 37 (medium, multiple choice)

In the context of distributed tracing, what is a 'span'?

Question 38 (medium, multiple choice)

Which tool is specifically designed for log aggregation and is built by Grafana Labs as a lightweight, cost-effective alternative to traditional log systems?

Question 39 (medium, multiple choice)

A team wants to ensure that at least 99.9% of all requests to their application complete within 500ms over a 30-day window. How should this requirement be classified?

Question 40 (medium, multiple choice)

Which open-source project provides a unified standard for collecting and exporting telemetry data (metrics, logs, and traces) from applications?

Question 41 (hard, multiple choice)

In Prometheus, what is the purpose of the Alertmanager component?

Question 42 (hard, multiple choice)

When using OpenTelemetry, what is the role of the 'Collector'?

Question 43 (hard, multiple choice)

A company uses Prometheus for monitoring and wants to alert when the average CPU usage over 5 minutes exceeds 80%. Which PromQL query would correctly define this alert rule?

Question 44 (medium, multi select)

Which TWO of the following are valid Prometheus metric types?

Question 45 (hard, multi select)

Which THREE of the following are components of the OpenTelemetry project?

Question 46 (hard, multi select)

Which TWO of the following are examples of context propagation mechanisms used in distributed tracing?

Question 47 (easy, multiple choice)

Which of the following is considered one of the three pillars of observability?

Question 48 (medium, multiple choice)

A DevOps team wants to collect logs from all Kubernetes nodes and forward them to a central log storage system. Which tool is specifically designed for lightweight log aggregation and forwarding on Kubernetes nodes?

Question 49 (hard, multiple choice)

A company uses OpenTelemetry to instrument their microservices. They want to ensure that traces from one service can be correlated with those from another service across network calls. Which OpenTelemetry concept enables this correlation?

Question 50 (easy, multiple choice)

Which Prometheus metric type is used to represent a value that can increase or decrease over time, such as memory usage?

Question 51 (medium, multiple choice)

An application is instrumented with OpenTelemetry to export traces to Jaeger. The team notices that some traces are incomplete. What is the most likely cause?

Question 52 (hard, multiple choice)

A team wants to set up alerts when a Kubernetes pod consumes more than 90% of its memory limit for over 5 minutes. They use Prometheus and Alertmanager. Which Prometheus query would trigger an alert for a specific pod named 'web-app' in the 'default' namespace?

Question 53 (medium, multiple choice)

Which component of the OpenTelemetry architecture is responsible for receiving data from instrumented applications and processing it before export?

Question 54 (easy, multiple choice)

What is the purpose of the metrics-server in Kubernetes?

Question 55 (medium, multiple choice)

A team wants to visualize metrics from Prometheus in a dashboard. Which tool is commonly used for this purpose?

Question 56 (hard, multiple choice)

A company defines an SLO that 99.9% of requests to a service should complete in under 200ms. Which metric type is used to measure this SLO?
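One way to reason about this kind of latency SLO is with bucketed observations, which is how a Prometheus-style histogram works: each observation increments the bucket whose upper bound ("le") it fits under, and the share of requests within a threshold is a cumulative bucket count divided by the total count. The `ToyHistogram` class below is a simplified stand-in written for this example, not the Prometheus client library, and the bucket bounds are arbitrary.

```python
import bisect


class ToyHistogram:
    """Simplified histogram with upper-bound ("le") buckets, for illustration."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # Per-bucket counts; cumulative counts are derived on demand.
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.count = 0  # total observations, including those above every bound

    def observe(self, value: float) -> None:
        self.count += 1
        # First bucket whose upper bound is >= value (value <= le).
        index = bisect.bisect_left(self.upper_bounds, value)
        if index < len(self.upper_bounds):
            self.bucket_counts[index] += 1

    def cumulative(self, le: float) -> int:
        """Number of observations less than or equal to the bound `le`."""
        index = self.upper_bounds.index(le)
        return sum(self.bucket_counts[: index + 1])

    def fraction_within(self, le: float):
        """Share of observations at or below `le`, or None if empty."""
        if self.count == 0:
            return None
        return self.cumulative(le) / self.count
```

For instance, with bounds of 0.1, 0.2, and 0.5 seconds, comparing `fraction_within(0.2)` against 0.999 would indicate whether the 200ms objective is being met. Note that the threshold has to coincide with a bucket boundary for the answer to be exact.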

Question 57 (medium, multiple choice)

Which tool is specifically designed for distributed tracing and is a Cloud Native Computing Foundation (CNCF) graduated project?

Question 58 (easy, multiple choice)

What does the 'kubectl logs' command retrieve?

Question 59 (medium, multi select)

Which TWO of the following are valid Prometheus metric types? (Select two)

Question 60 (hard, multi select)

Which THREE of the following are components of the OpenTelemetry project? (Select three)

Question 61 (medium, multi select)

Which TWO of the following are common log aggregation tools used in Kubernetes environments? (Select two)

Question 62 (easy, multiple choice)

Which of the following is NOT one of the three pillars of observability?

Question 63 (easy, multiple choice)

What is the primary purpose of structured logging?

Question 64 (easy, multiple choice)

Which tool is commonly used for log aggregation in Kubernetes and is designed to be lightweight?

Question 65 (medium, multiple choice)

A developer wants to view the logs of a specific container named 'sidecar' inside a pod named 'app-pod'. Which command should they use?

Question 66 (medium, multiple choice)

What type of Prometheus metric is best suited to count the total number of HTTP requests received by a service?

Question 67 (medium, multiple choice)

Which of the following is true about Prometheus's pull-based model for collecting metrics?

Question 68 (medium, multiple choice)

What is the primary role of the OpenTelemetry Collector?

Question 69 (medium, multiple choice)

In distributed tracing, what is a 'span'?

Question 70 (medium, multiple choice)

Which tool is specifically designed for distributed tracing and was originally developed by Uber?

Question 71 (hard, multiple choice)

An SRE team defines an SLO that 99.9% of requests to a service should complete in under 500ms over a 30-day rolling window. If the service receives 10 million requests in a month, what is the maximum number of requests that can exceed the latency threshold while still meeting the SLO?
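The arithmetic behind this question is an error-budget calculation: the budget is the fraction of requests allowed to miss the objective (1 minus the target) multiplied by the total request count. The sketch below works it through with exact fractions to avoid floating-point rounding; the function name `allowed_slow_requests` is made up for this example.

```python
import math
from fractions import Fraction


def allowed_slow_requests(total_requests: int, slo_target: str) -> int:
    """Largest number of requests that may miss the threshold while the
    share of good requests stays at or above the SLO target.

    `slo_target` is passed as a decimal string (e.g. "0.999") so that it
    converts to an exact fraction.
    """
    target = Fraction(slo_target)
    budget = (1 - target) * total_requests  # exact error budget
    return math.floor(budget)


# 99.9% target over 10 million requests: (1 - 0.999) * 10_000_000
print(allowed_slow_requests(10_000_000, "0.999"))  # prints 10000
```

So with 10 million requests and a 99.9% target, up to 10,000 requests can exceed the 500ms threshold before the SLO is violated.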

Question 72 (hard, multiple choice)

In PromQL, which function would you use to calculate the per-second rate of increase of a counter over a specified time window?
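To show what a per-second rate over a window means for a counter, the following sketch computes it from raw `(timestamp, value)` samples. It is a deliberate simplification: it handles counter resets by treating a drop in value as a restart from zero, but it does not reproduce the extrapolation to window boundaries that PromQL performs. The function name `per_second_rate` and the sample layout are assumptions for this example.

```python
def per_second_rate(samples):
    """Approximate per-second increase of a counter.

    `samples` is a time-ordered list of (timestamp_seconds, counter_value)
    pairs that fall inside the window of interest. Returns None when there
    are fewer than two samples or no time has elapsed.
    """
    if len(samples) < 2:
        return None

    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None

    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset (e.g. process restart): count from zero again.
            increase += value
        else:
            increase += value - previous
        previous = value

    return increase / elapsed
```

For example, samples of 100, 160, and 220 taken 60 seconds apart give an increase of 120 over 120 seconds, which is 1 request per second.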

Question 73 (hard, multiple choice)

What is the main advantage of using OpenTelemetry over vendor-specific instrumentation libraries?

Question 74 (medium, multi select)

Which TWO of the following are valid Prometheus metric types? (Select two.)

Question 75 (medium, multi select)

Which THREE of the following are benefits of using a service mesh for observability? (Select three.)

Question 76 (hard, multi select)

Which TWO of the following are valid components of the Alertmanager configuration? (Select two.)

Question 77 (easy, multiple choice)

Which of the following is NOT one of the three pillars of observability in cloud-native environments?

Question 78 (medium, multiple choice)

A DevOps team wants to collect and forward logs from all nodes in a Kubernetes cluster to a centralized logging backend. Which component is specifically designed for lightweight log collection and forwarding?

Question 79 (hard, multiple choice)

A Prometheus alert rule fires when the error rate exceeds 5% for 5 minutes. The alert is sent to Alertmanager. What must be configured in Alertmanager to ensure the alert is deduplicated, grouped, and routed to the correct team?

Question 80 (medium, multiple choice)

In OpenTelemetry, which component is responsible for receiving, processing, and exporting telemetry data from multiple sources?

Question 81 (medium, multi select)

Which TWO of the following are Prometheus metric types? (Select two.)

Question 82 (hard, multi select)

Which THREE of the following are core components of the OpenTelemetry specification? (Select three.)

Question 83 (easy, multi select)

Which TWO of the following tools are commonly used for distributed tracing in cloud-native environments? (Select two.)

Question 84 (medium, multi select)

Which TWO of the following are valid PromQL functions? (Select two.)

Question 85 (hard, multi select)

Which THREE of the following are important considerations when defining SLOs (Service Level Objectives)? (Select three.)

Question 86 (medium, multi select)

Which THREE of the following are benefits of structured logging? (Select three.)

Question 1mediummultiple choice

Read the full NAT/PAT explanation →

A DevOps team notices that a microservice is returning 503 errors intermittently. The service runs in Kubernetes and uses a liveness probe. The team wants to understand the root cause without restarting the pod. Which observability approach should they use first?

Question 2hardmultiple choice

Read the full NAT/PAT explanation →

A platform team is designing a monitoring strategy for a multi-tenant Kubernetes cluster. Each tenant runs workloads in separate namespaces. The team needs to ensure tenant isolation while providing aggregated cluster-wide dashboards. Which approach best meets these requirements?

Question 3easymultiple choice

Read the full NAT/PAT explanation →

A Kubernetes administrator is troubleshooting a pod that is stuck in CrashLoopBackOff. The pod's restart count is increasing. Which initial step should the administrator take to diagnose the issue?

Question 4mediummultiple choice

Read the full NAT/PAT explanation →

An organization uses Prometheus and Grafana for monitoring. They want to alert when the 99th percentile of request latency exceeds 500ms for more than 5 minutes. Which PromQL query should they use in the alert rule?

Question 5hardmulti select

Read the full NAT/PAT explanation →

Which TWO of the following are best practices for structuring log output in cloud-native applications to maximize observability?

Question 6mediummulti select

Read the full NAT/PAT explanation →

Which THREE of the following are valid use cases for distributed tracing in a microservices architecture?

Question 7hardmultiple choice

Read the full NAT/PAT explanation →

A company runs a Kubernetes cluster with 50 worker nodes, each hosting multiple microservices. They use Prometheus for metrics collection and Grafana for dashboards. Recently, the Prometheus server has been experiencing out-of-memory (OOM) kills during peak hours, causing gaps in metric collection. The cluster has a dedicated monitoring namespace. The team has already increased the Prometheus pod's memory limits to 8GB, but OOMs still occur. The metrics retention is set to 15 days. The cardinality of certain metrics (e.g., HTTP request labels with user IDs) is very high. The team needs to resolve the OOM issue without losing critical alerting capability for at least the last 7 days of data. Which action should they take first?

Question 8mediummultiple choice

Read the full NAT/PAT explanation →

A company deploys a microservice application on Kubernetes. They notice that one of the services is returning 5xx errors intermittently. Which observability tool should they use to correlate the errors with resource usage across all pods of that service?

Question 9hardmultiple choice

Read the full NAT/PAT explanation →

A platform team wants to implement observability for a Kubernetes cluster running 500+ microservices. They need to reduce the cost of storing logs while retaining the ability to search for specific error patterns. Which strategy best achieves this?

Question 10easymultiple choice

Read the full NAT/PAT explanation →

A developer wants to monitor the health of a Kubernetes deployment by checking if the number of ready replicas matches the desired replicas. Which metric from kube-state-metrics should they query?

Question 11mediummulti select

Read the full NAT/PAT explanation →

Which TWO of the following are best practices for implementing observability in a cloud-native environment?

Question 12hardmultiple choice

Read the full NAT/PAT explanation →

You are an SRE managing a Kubernetes cluster with 200 nodes and 10,000 pods. The cluster runs a critical payment processing application. Users report that transactions are occasionally failing with a 'timeout' error. You have Prometheus and Grafana set up for monitoring, and you use Fluentd with Elasticsearch for logging. You notice that during peak hours, the CPU usage of the payment service pods spikes to 90%, but memory usage remains stable. The pod restart count is low. You also see that the response time of the payment service increases significantly during these spikes. You need to identify the root cause and propose a fix. Which course of action is most appropriate?

Question 13mediummultiple choice

Read the full NAT/PAT explanation →

A company is running a microservices application on a Kubernetes cluster. They have noticed that one of the services, 'payment-api', is experiencing intermittent high latency. The team wants to identify the root cause without modifying the application code. Which approach should they take?

Question 14hardmulti select

Read the full NAT/PAT explanation →

Which TWO of the following are recommended practices for achieving observability in a Kubernetes cluster?

Question 15mediumdrag order

Read the full NAT/PAT explanation →

Drag and drop the steps to perform a backup of etcd in a Kubernetes cluster into the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

1Step 1

2Step 2

3Step 3

4Step 4

5Step 5

Question 16mediummatching

Read the full NAT/PAT explanation →

Match each Kubernetes security concept to its definition.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Identity for processes running in a pod

Role-based access control to authorize API requests

Specifies how groups of pods are allowed to communicate

Deprecated but formerly controlled security-sensitive pod settings

Stores sensitive data like passwords and tokens

Question 17mediummultiple choice

Read the full NAT/PAT explanation →

Which of the following is a core component of the three pillars of observability?

Question 18easymultiple choice

Read the full NAT/PAT explanation →

What is the primary purpose of Prometheus in cloud native observability?

Question 19mediummultiple choice

Read the full NAT/PAT explanation →

Which command retrieves logs from a specific container named 'sidecar' in a multi-container pod?

Question 20hardmultiple choice

Read the full network assurance explanation →

In OpenTelemetry, what is the purpose of the Collector component?

Question 21mediummultiple choice

Read the full NAT/PAT explanation →

Which Prometheus metric type is best suited to count the number of HTTP requests received?

Question 22easymultiple choice

Read the full NAT/PAT explanation →

What is the purpose of Alertmanager in Prometheus?

Question 23mediummultiple choice

Read the full NAT/PAT explanation →

Which tool is primarily used for distributed tracing in cloud native environments?

Question 24hardmultiple choice

Read the full NAT/PAT explanation →

What is context propagation in distributed tracing?

Question 25mediummultiple choice

Read the full NAT/PAT explanation →

Which component of the metrics-server provides resource metrics like CPU and memory usage?

Question 26easymultiple choice

Read the full NAT/PAT explanation →

What does SLA stand for in the context of service reliability?

Question 27mediummultiple choice

Read the full NAT/PAT explanation →

Which log aggregation tool is designed specifically for Kubernetes and is often used as a lightweight alternative to Fluentd?

Question 28hardmultiple choice

Read the full NAT/PAT explanation →

A team wants to implement cost monitoring for their Kubernetes clusters. Which approach is most effective?

Question 29mediummulti select

Read the full NAT/PAT explanation →

Which TWO are pillars of observability? (Select two.)

Question 30hardmulti select

Read the full network assurance explanation →

Which THREE are responsibilities of the OpenTelemetry project? (Select three.)

Question 31mediummulti select

Read the full NAT/PAT explanation →

Which TWO are components of a distributed trace? (Select two.)

Question 32easymultiple choice

Read the full NAT/PAT explanation →

Which of the following is the correct definition of a Service Level Indicator (SLI)?

Question 33easymultiple choice

Read the full NAT/PAT explanation →

What is the primary purpose of structured logging?

Question 34easymultiple choice

Read the full NAT/PAT explanation →

Which Prometheus metric type is best suited for counting the total number of HTTP requests received by a service?

Question 35mediummultiple choice

Read the full NAT/PAT explanation →

A developer wants to view the logs of a specific container named 'sidecar' in a pod called 'web-app'. Which command should they use?

Question 36mediummultiple choice

Read the full NAT/PAT explanation →

Which component is responsible for aggregating metrics from Kubernetes nodes and exposing them to the metrics API?

Question 37mediummultiple choice

Read the full NAT/PAT explanation →

In the context of distributed tracing, what is a 'span'?

Question 38mediummultiple choice

Read the full NAT/PAT explanation →

Which tool is specifically designed for log aggregation and is built by Grafana Labs as a lightweight, cost-effective alternative to traditional log systems?

Question 39mediummultiple choice

Read the full NAT/PAT explanation →

A team wants to ensure that at least 99.9% of all requests to their application complete within 500ms over a 30-day window. How should this requirement be classified?

Question 40mediummultiple choice

Read the full network assurance explanation →

Which open-source project provides a unified standard for collecting and exporting telemetry data (metrics, logs, and traces) from applications?

Question 41hardmultiple choice

Read the full NAT/PAT explanation →

In Prometheus, what is the purpose of the Alertmanager component?

Question 42hardmultiple choice

Read the full network assurance explanation →

When using OpenTelemetry, what is the role of the 'Collector'?

Question 43hardmultiple choice

Read the full NAT/PAT explanation →

A company uses Prometheus for monitoring and wants to alert when the average CPU usage over 5 minutes exceeds 80%. Which PromQL query would correctly define this alert rule?

Question 44mediummulti select

Read the full NAT/PAT explanation →

Which TWO of the following are valid Prometheus metric types?

Question 45hardmulti select

Read the full network assurance explanation →

Which THREE of the following are components of the OpenTelemetry project?

Question 46hardmulti select

Read the full NAT/PAT explanation →

Which TWO of the following are examples of context propagation mechanisms used in distributed tracing?

Question 47easymultiple choice

Read the full NAT/PAT explanation →

Which of the following is considered one of the three pillars of observability?

Question 48mediummultiple choice

Read the full NAT/PAT explanation →

A DevOps team wants to collect logs from all Kubernetes nodes and forward them to a central log storage system. Which tool is specifically designed for lightweight log aggregation and forwarding on Kubernetes nodes?

Question 49hardmultiple choice

Read the full network assurance explanation →

A company uses OpenTelemetry to instrument their microservices. They want to ensure that traces from one service can be correlated with those from another service across network calls. Which OpenTelemetry concept enables this correlation?

Question 50easymultiple choice

Read the full NAT/PAT explanation →

Which Prometheus metric type is used to represent a value that can increase or decrease over time, such as memory usage?

Question 51mediummultiple choice

Read the full network assurance explanation →

An application is instrumented with OpenTelemetry to export traces to Jaeger. The team notices that some traces are incomplete. What is the most likely cause?

Question 52hardmultiple choice

Read the full NAT/PAT explanation →

A team wants to set up alerts when a Kubernetes pod consumes more than 90% of its memory limit for over 5 minutes. They use Prometheus and Alertmanager. Which Prometheus query would trigger an alert for a specific pod named 'web-app' in the 'default' namespace?

Question 53mediummultiple choice

Read the full network assurance explanation →

Which component of the OpenTelemetry architecture is responsible for receiving data from instrumented applications and processing it before export?

Question 54easymultiple choice

Read the full NAT/PAT explanation →

What is the purpose of the metrics-server in Kubernetes?

Question 55mediummultiple choice

Read the full NAT/PAT explanation →

A team wants to visualize metrics from Prometheus in a dashboard. Which tool is commonly used for this purpose?

Question 56hardmultiple choice

Read the full NAT/PAT explanation →

A company defines an SLO that 99.9% of requests to a service should complete in under 200ms. Which metric type is used to measure this SLO?

Question 57mediummultiple choice

Read the full NAT/PAT explanation →

Which tool is specifically designed for distributed tracing and is a Cloud Native Computing Foundation (CNCF) graduated project?

Question 58easymultiple choice

Read the full NAT/PAT explanation →

What does the 'kubectl logs' command retrieve?

Question 59mediummulti select

Read the full NAT/PAT explanation →

Which TWO of the following are valid Prometheus metric types? (Select two)

Question 60hardmulti select

Read the full network assurance explanation →

Which THREE of the following are components of the OpenTelemetry project? (Select three)

Question 61mediummulti select

Read the full NAT/PAT explanation →

Which TWO of the following are common log aggregation tools used in Kubernetes environments? (Select two)

Question 62easymultiple choice

Read the full NAT/PAT explanation →

Which of the following is NOT one of the three pillars of observability?

Question 63easymultiple choice

Read the full NAT/PAT explanation →

What is the primary purpose of structured logging?

Question 64 (easy, multiple choice)

Which tool is commonly used for log aggregation in Kubernetes and is designed to be lightweight?

Question 65 (medium, multiple choice)

A developer wants to view the logs of a specific container named 'sidecar' inside a pod named 'app-pod'. Which command should they use?

Question 66 (medium, multiple choice)

What type of Prometheus metric is best suited to count the total number of HTTP requests received by a service?

Question 67 (medium, multiple choice)

Which of the following is true about Prometheus's pull-based model for collecting metrics?

Question 68 (medium, multiple choice)

What is the primary role of the OpenTelemetry Collector?

Question 69 (medium, multiple choice)

In distributed tracing, what is a 'span'?

Question 70 (medium, multiple choice)

Which tool is specifically designed for distributed tracing and was originally developed by Uber?

Question 71 (hard, multiple choice)

An SRE team defines an SLO that 99.9% of requests to a service should complete in under 500ms over a 30-day rolling window. If the service receives 10 million requests in a month, what is the maximum number of requests that can exceed the latency threshold while still meeting the SLO?
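
The arithmetic behind this question can be checked with a short sketch: the error budget is the share of requests the SLO allows to miss the threshold, multiplied by the total request count. The function name `error_budget_requests` is hypothetical, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction


def error_budget_requests(total_requests: int, slo_percent: str) -> int:
    """Largest whole number of requests that may miss the latency target.

    `slo_percent` is passed as a string (for example "99.9") so that it
    converts to an exact fraction instead of a rounded float.
    """
    allowed_percent = 100 - Fraction(slo_percent)  # e.g. 0.1 percent
    # Floor division: a partial request cannot be "spent" from the budget.
    return (total_requests * allowed_percent) // 100


print(error_budget_requests(10_000_000, "99.9"))  # 10000
```

With the figures from the question, 0.1% of 10 million requests gives a budget of 10,000 slow requests.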

Question 72 (hard, multiple choice)

In PromQL, which function would you use to calculate the per-second rate of increase of a counter over a specified time window?
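
To make "per-second rate of increase of a counter" concrete, the sketch below computes it by hand from raw samples, including the case where the counter resets to a lower value. It is a simplified illustration in the spirit of PromQL's `rate()`, not a reimplementation: Prometheus also extrapolates to the window boundaries, which is omitted here. The name `simple_rate` and the sample layout are hypothetical.

```python
def simple_rate(samples):
    """Average per-second increase of a counter across `samples`.

    `samples` is a time-ordered list of (timestamp_seconds, counter_value).
    Returns None when a rate cannot be computed.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None

    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset: assume it restarted from zero.
            increase += value
        else:
            increase += value - previous
        previous = value
    return increase / elapsed


# A counter scraped every 15 seconds, growing by 30 per scrape.
window = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
print(simple_rate(window))  # 2.0
```

The reset handling is why counters are queried through a rate-style function rather than by subtracting two raw values.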

Question 73 (hard, multiple choice)

What is the main advantage of using OpenTelemetry over vendor-specific instrumentation libraries?

Question 74 (medium, multi-select)

Which TWO of the following are valid Prometheus metric types? (Select two.)

Question 75 (medium, multi-select)

Which THREE of the following are benefits of using a service mesh for observability? (Select three.)

Question 76 (hard, multi-select)

Which TWO of the following are valid components of the Alertmanager configuration? (Select two.)

Question 77 (easy, multiple choice)

Which of the following is NOT one of the three pillars of observability in cloud-native environments?

Question 78 (medium, multiple choice)

A DevOps team wants to collect and forward logs from all nodes in a Kubernetes cluster to a centralized logging backend. Which component is specifically designed for lightweight log collection and forwarding?

Question 79 (hard, multiple choice)

A Prometheus alert rule fires when the error rate exceeds 5% for 5 minutes. The alert is sent to Alertmanager. What must be configured in Alertmanager to ensure the alert is deduplicated, grouped, and routed to the correct team?

Question 80 (medium, multiple choice)

In OpenTelemetry, which component is responsible for receiving, processing, and exporting telemetry data from multiple sources?

Question 81 (medium, multi-select)

Which TWO of the following are Prometheus metric types? (Select two.)

Question 82 (hard, multi-select)

Which THREE of the following are core components of the OpenTelemetry specification? (Select three.)

Question 83 (easy, multi-select)

Which TWO of the following tools are commonly used for distributed tracing in cloud-native environments? (Select two.)

Question 84 (medium, multi-select)

Which TWO of the following are valid PromQL functions? (Select two.)

Question 85 (hard, multi-select)

Which THREE of the following are important considerations when defining SLOs (Service Level Objectives)? (Select three.)

Question 86 (medium, multi-select)
