CCNA Cloud Native Observability Questions — Page 2 of 2

MCQ (medium)

A company is running a microservices application on a Kubernetes cluster. They have noticed that one of the services, 'payment-api', is experiencing intermittent high latency. The team wants to identify the root cause without modifying the application code. Which approach should they take?

A. Monitor CPU and memory metrics from kube-state-metrics and correlate with latency.

B. Increase log verbosity for all services and search for error messages.

C. Implement distributed tracing using tools like Jaeger or Zipkin to trace requests across services.

D. Check node-level metrics using Prometheus Node Exporter.

Answer: C

Distributed tracing tracks request flow and identifies slow components.

Why this answer

Option C is correct because distributed tracing with tools like Jaeger or Zipkin follows a single request as it traverses multiple microservices and shows exactly which service or call introduces the latency. It can be adopted without changing application code when instrumentation is handled outside the application, for example by a service mesh sidecar proxy or an auto-instrumentation agent, although sidecar-based tracing still relies on each service forwarding the incoming trace headers. Tracing is designed to pinpoint performance bottlenecks in distributed systems, whereas CPU/memory metrics and log analysis cannot reconstruct a request's end-to-end path.

Exam trap

CNCF often tests the distinction between observability tools that provide request-level context (distributed tracing) versus aggregate resource metrics (kube-state-metrics, Node Exporter) or unstructured logs, leading candidates to mistakenly choose CPU/memory correlation or log analysis for pinpointing intermittent latency in a microservices architecture.

How to eliminate wrong answers

Option A is wrong because kube-state-metrics reports the state of Kubernetes objects (for example pod phase, replica counts, and configured resource requests and limits), not live CPU or memory usage, which comes from the kubelet/cAdvisor or metrics-server. Even with real usage data, high latency in a microservice is often caused by network delays, database contention, or upstream service failures rather than local resource usage, and a correlation with resource metrics cannot trace the request path. Option B is wrong because increasing log verbosity for all services generates massive volumes of unstructured data and relies on error messages that may not appear during intermittent latency spikes; logs alone lack the context of a specific request's journey across services, which makes root cause identification slow and sometimes impossible. Option D is wrong because node-level metrics from Prometheus Node Exporter only show host-level resource usage (e.g., disk I/O, network bandwidth) and cannot reveal which microservice or request is causing latency within the cluster; they are useful for infrastructure troubleshooting but not for application-level request analysis.
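
To make the idea of "identifying which call introduces latency" concrete, here is a minimal sketch of the kind of analysis a tracing backend performs on the spans of one trace. The `Span` class, the service names other than `payment-api`, and all timings are hypothetical and exist only for illustration; this is not the Jaeger or Zipkin data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_time_by_service(spans):
    """Sum each service's own time: span duration minus its direct children.

    Assumes child spans run sequentially (no overlap), which keeps the
    arithmetic simple for this illustration.
    """
    child_time = {}
    for s in spans:
        if s.parent_id is not None:
            child_time[s.parent_id] = child_time.get(s.parent_id, 0.0) + s.duration_ms
    totals = {}
    for s in spans:
        own = s.duration_ms - child_time.get(s.span_id, 0.0)
        totals[s.service] = totals.get(s.service, 0.0) + own
    return totals


# Hypothetical trace of one slow request through payment-api.
trace = [
    Span("a", None, "gateway", 0, 480),
    Span("b", "a", "payment-api", 10, 470),
    Span("c", "b", "fraud-check", 20, 60),
    Span("d", "b", "ledger-db", 70, 450),
]

totals = self_time_by_service(trace)
slowest = max(totals, key=totals.get)
print(slowest, totals[slowest])  # ledger-db 380.0
```

In this made-up trace, `payment-api` looks slow from the outside, but almost all of the time is spent in a downstream call. Resource metrics on the `payment-api` pod would not show that.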

MCQ (hard)

A company runs a Kubernetes cluster with 50 worker nodes, each hosting multiple microservices. They use Prometheus for metrics collection and Grafana for dashboards. Recently, the Prometheus server has been experiencing out-of-memory (OOM) kills during peak hours, causing gaps in metric collection. The cluster has a dedicated monitoring namespace. The team has already increased the Prometheus pod's memory limits to 8GB, but OOMs still occur. The metrics retention is set to 15 days. The cardinality of certain metrics (e.g., HTTP request labels with user IDs) is very high. The team needs to resolve the OOM issue without losing critical alerting capability for at least the last 7 days of data. Which action should they take first?

A. Implement recording rules to pre-aggregate high-cardinality metrics at a lower granularity

B. Drop high-cardinality metrics like HTTP request labels using relabel_configs

C. Reduce metrics retention to 7 days to free memory

D. Enable vertical pod autoscaler for the Prometheus pod

Answer: A

Recording rules reduce cardinality by aggregating metrics, lowering memory usage while preserving aggregated data for alerting.

Why this answer

Option A is correct because recording rules let Prometheus pre-aggregate high-cardinality metrics (e.g., HTTP request metrics labeled with user IDs) into a small number of lower-granularity series, so alerts and dashboards no longer depend on the per-user series. This targets the actual cause of the OOM kills, which is cardinality explosion, while keeping aggregated data available for alerting over the required 7-day window. Note that a recording rule adds new series and does not by itself delete the raw ones, so the memory saving is realized only once the per-user label is also removed, ideally in the application's instrumentation.

Exam trap

The trap here is confusing memory pressure (caused by cardinality) with storage pressure (caused by retention), leading candidates to incorrectly choose reducing retention (Option C) instead of addressing the root cause of high cardinality via recording rules.

How to eliminate wrong answers

Option B is wrong because dropping the high-cardinality metrics entirely would remove data needed for alerting and debugging, violating the requirement to retain alerting capability for at least 7 days. (In practice, scraped samples are dropped with metric_relabel_configs; relabel_configs acts on targets before the scrape.) Option C is wrong because reducing retention to 7 days mainly frees disk space, not memory; Prometheus memory usage is driven by the number of active in-memory time series, not by how many days of blocks are kept on disk. Option D is wrong because a vertical pod autoscaler only adjusts CPU/memory requests and limits, while the fundamental issue is cardinality: more memory without reducing cardinality will still lead to OOM kills.
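
The effect of a per-user label on series count, and what aggregating it away buys, can be sketched with plain Python. This is only a model of the arithmetic, not Prometheus code; the label names and counts are hypothetical, and `aggregate_without` mimics what a recording rule built on `sum without (user_id) (...)` would compute.

```python
from collections import defaultdict


def make_raw_series(num_users, methods, statuses):
    """Build one series per unique label combination (hypothetical data)."""
    series = {}
    for u in range(num_users):
        for m in methods:
            for s in statuses:
                labels = (("method", m), ("status", s), ("user_id", f"u{u}"))
                series[labels] = 1.0
    return series


def aggregate_without(series, drop_label):
    """Sum series after removing one label, like `sum without (label)`."""
    out = defaultdict(float)
    for labels, value in series.items():
        kept = tuple(kv for kv in labels if kv[0] != drop_label)
        out[kept] += value
    return dict(out)


raw = make_raw_series(10_000, ["GET", "POST"], ["200", "500"])
agg = aggregate_without(raw, "user_id")
print(len(raw), len(agg))  # 40000 4
```

Series count is the product of the label value counts, so a single unbounded label such as a user ID dominates everything else. The aggregated view keeps the totals that alerts typically need with four series instead of forty thousand.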

Multi-Select (hard)

Which THREE of the following are components of the OpenTelemetry project? (Select three)

Select 3 answers

A. OpenTelemetry Agent

B. OpenTelemetry API

C. OpenTelemetry SDK

D. OpenTelemetry Collector

E. OpenTelemetry Exporter

Answers: B, C, D

The API defines data types and interfaces.

Why this answer

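The OpenTelemetry project includes the API, the SDK, and the Collector. "Agent" is a deployment mode of the Collector (running it alongside the application or on each host), and exporters are plug-ins used inside the SDK and the Collector, so neither is a standalone component.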

Multi-Select (hard)

Which TWO of the following are examples of context propagation mechanisms used in distributed tracing?

Select 2 answers

A. HTTP headers

B. Environment variables

C. Database queries

D. Shared filesystem

E. gRPC metadata

Answers: A, E

Headers like traceparent are used to propagate trace context across HTTP calls.

Why this answer

Option A is correct because HTTP headers, such as the `traceparent` and `tracestate` headers defined in the W3C Trace Context specification, are the standard mechanism for propagating trace context across service boundaries in distributed tracing. When a service receives an incoming HTTP request, it extracts the trace ID and span ID from these headers to continue the same trace. This allows trace data to be correlated across multiple microservices as the request flows through the system. Option E is correct for the same reason: gRPC metadata is the set of key-value pairs sent with each RPC, and it carries the same trace context for gRPC calls that headers carry for HTTP calls.

Exam trap

CNCF often tests the distinction between static configuration mechanisms (like environment variables or shared filesystems) and dynamic, in-band propagation mechanisms (like HTTP headers and gRPC metadata) that travel with each request.
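
As an illustration of in-band propagation, the sketch below extracts and re-injects a W3C `traceparent` header using only the standard library. It is a simplified reading of the header layout (`version-traceid-parentid-flags`), not a full implementation of the specification, and the function names `extract` and `inject` are chosen here for illustration. It also assumes header names have already been lower-cased.

```python
import re
import secrets

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def extract(headers):
    """Return (trace_id, parent_span_id, flags), or None if absent/invalid."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # All-zero IDs and version "ff" are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return trace_id, span_id, flags


def inject(trace_id, flags="01"):
    """Build outgoing headers: same trace ID, fresh span ID for this hop."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return {"traceparent": f"00-{trace_id}-{new_span_id}-{flags}"}


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
ctx = extract(incoming)
outgoing = inject(ctx[0], ctx[2])
print(extract(outgoing)[0] == ctx[0])  # True: the trace ID survives the hop
```

The same key-value pair could be placed in gRPC metadata instead of an HTTP header; what matters is that the context travels with each request.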

Multi-Select (medium)

Which TWO of the following are valid Prometheus metric types? (Select two)

Select 2 answers

A. Set

B. Counter

C. Timer

D. Meter

E. Gauge

Answers: B, E

Counter is a Prometheus metric type.

Why this answer

Prometheus has four metric types: Counter, Gauge, Histogram, and Summary. Counter and Gauge are two of them.
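
The practical difference between a Counter and a Gauge shows up in how their samples are read. A Gauge is read as its latest value, while a Counter only goes up (or resets to zero on restart), so what is read is its increase over time. The following is a rough sketch of that idea; it is not the actual implementation of the PromQL `increase()` function, which also extrapolates to the edges of the time window.

```python
def counter_increase(samples):
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    prev = None
    for value in samples:
        if prev is not None:
            if value >= prev:
                total += value - prev
            else:
                total += value  # reset: the counter restarted from zero
        prev = value
    return total


def gauge_current(samples):
    """A gauge can go up or down; the latest sample is the current value."""
    return samples[-1]


# Hypothetical samples; the drop from 160 to 5 models a process restart.
print(counter_increase([100, 130, 160, 5, 25]))  # 85.0
print(gauge_current([100, 130, 160, 5, 25]))     # 25
```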

Multi-Select (medium)

Which TWO of the following are valid Prometheus metric types? (Select two.)

Select 2 answers

A. Log

B. Counter

C. Event

D. Histogram

E. Trace

Answers: B, D

Counter is a valid Prometheus metric type.

Why this answer

Prometheus defines four metric types: Counter, Gauge, Histogram, and Summary. Of the options listed, only Counter (B) and Histogram (D) are among them. Log, Event, and Trace are other kinds of telemetry data, not Prometheus metric types.

MCQ (medium)

Which Kubernetes component provides resource metrics like CPU and memory usage through the Metrics API?

A. kube-apiserver

B. metrics-server

C. kubelet

D. Prometheus

Answer: B

The metrics-server is the component that collects and serves resource metrics.

Why this answer

The metrics-server collects resource metrics from kubelets and exposes them via the Metrics API.

MCQ (medium)

Which tool is specifically designed for distributed tracing and was originally developed by Uber?

A. Prometheus

B. Loki

C. Grafana

D. Jaeger

Answer: D

Jaeger is a distributed tracing system originally built by Uber.

Why this answer

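Jaeger was originally developed by Uber for distributed tracing and was later donated to the CNCF, where it is now a graduated project. Prometheus (metrics), Loki (logs), and Grafana (dashboards) serve other observability roles.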

MCQ (medium)

What is the primary role of the OpenTelemetry Collector?

A. To replace Prometheus for metric collection

B. To receive, process, and export telemetry data

C. To store traces and metrics long-term

D. To generate traces for applications

Answer: B

The collector acts as a pipeline to handle telemetry data from multiple sources and send to one or more backends.

Why this answer

The OpenTelemetry Collector receives, processes, and exports telemetry data to various backends.
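
The receive, process, export flow can be pictured as a small pipeline. The sketch below is purely conceptual: it is not the Collector's API or configuration format, and `run_pipeline`, the processors, and the record fields are hypothetical names used only to show the shape of the data flow.

```python
def run_pipeline(received, processors, exporters):
    """Pass a batch through each processor in order, then to every exporter."""
    batch = list(received)
    for process in processors:
        batch = process(batch)
    for export in exporters:
        export(batch)
    return batch


def drop_debug(batch):
    """Processor: filter out low-value records before export."""
    return [r for r in batch if r.get("severity") != "DEBUG"]


def add_cluster_attribute(batch):
    """Processor: enrich every record with a (hypothetical) cluster name."""
    return [{**r, "cluster": "demo"} for r in batch]


# Two in-memory lists stand in for two different backends.
backend_a, backend_b = [], []
received = [
    {"severity": "INFO", "body": "payment accepted"},
    {"severity": "DEBUG", "body": "cache miss"},
    {"severity": "ERROR", "body": "ledger timeout"},
]

exported = run_pipeline(
    received,
    [drop_debug, add_cluster_attribute],
    [backend_a.extend, backend_b.extend],
)
print(len(exported), len(backend_a), len(backend_b))  # 2 2 2
```

Note that nothing in the pipeline stores data long-term or generates telemetry itself, which is why options C and D do not describe the Collector's role.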

MCQ (hard)

A company defines an SLO that 99.9% of requests to a service should complete in under 200ms. Which metric type is used to measure this SLO?

A. Summary

B. Histogram

C. Gauge

D. Counter

Answer: B

Histograms allow calculating quantiles like p99 latency.

Why this answer

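The SLO is based on latency, which is typically measured with a histogram that counts request durations into buckets. With a bucket boundary at 200ms, the fraction of requests that met the target is the count in that bucket divided by the total count, and bucket counts can be summed across instances. A Summary also tracks latency, but its precomputed quantiles cannot be meaningfully aggregated across instances; Gauges and Counters do not capture a distribution at all.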
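
A short worked example shows how histogram buckets answer the SLO question. The bucket counts below are invented for illustration, and the layout mirrors the cumulative `le` (less-than-or-equal) buckets that Prometheus histograms expose. The sketch assumes the SLO threshold coincides with a bucket boundary.

```python
# Cumulative counts per upper bound in seconds (hypothetical data).
buckets = {
    0.05: 61_200,
    0.1: 93_400,
    0.2: 99_870,
    0.5: 99_990,
    float("inf"): 100_000,  # the +Inf bucket holds the total request count
}


def fraction_within(buckets, threshold):
    """Share of requests that completed within `threshold` seconds."""
    total = buckets[float("inf")]
    return buckets[threshold] / total


ratio = fraction_within(buckets, 0.2)
slo_met = ratio >= 0.999
print(slo_met)  # False: 99,870 of 100,000 requests were under 200ms
```

Here 99.87% of requests finished in under 200ms, so the 99.9% objective is narrowly missed. Choosing bucket boundaries that include the SLO threshold is what makes this calculation exact rather than interpolated.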

MCQ (medium)

A DevOps team wants to collect and forward logs from all nodes in a Kubernetes cluster to a centralized logging backend. Which component is specifically designed for lightweight log collection and forwarding?

A. Fluent Bit

B. Prometheus

C. Jaeger

D. Grafana

Answer: A

Fluent Bit is lightweight and designed for log collection.

Why this answer

Fluent Bit is a lightweight log processor and forwarder, ideal for Kubernetes nodes.
