How should I use these Cloud Native Observability practice questions?

Read each scenario carefully and choose your answer before revealing the explanation. Then check why your choice was right or wrong. Repeat until the reasoning feels automatic.

Can I practise just Cloud Native Observability questions in a focused session?

Yes — use the session launcher on this page to start a 10-, 20-, 30- or 50-question session drawn entirely from the Cloud Native Observability domain.

KCNA · topic practice

Cloud Native Observability practice questions

Practise KCNA Cloud Native Observability questions covering metrics with Prometheus and Grafana, structured logging and log collection, distributed tracing, OpenTelemetry, and first-line troubleshooting with kubectl.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations — not memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security

20 questions · Domain: Cloud Native Observability

What the exam tests

What to know about Cloud Native Observability

Observability questions usually test which signal (metrics, logs or traces) and which tool fits a given troubleshooting or monitoring scenario.

Logs, metrics and traces as the three pillars of observability.

Prometheus for collecting and storing metrics, with Grafana for visualizing them.

Structured logging, correlation IDs and log collection with tools such as Fluentd.

Distributed tracing with tools such as Jaeger or Zipkin, and the OpenTelemetry Collector as a telemetry pipeline.

Why learners struggle

Why Cloud Native Observability questions are commonly missed

Observability questions are missed when learners confuse what each signal or tool is for. An answer can name a real tool and still be wrong if that tool collects a different kind of data than the scenario needs.

·Metrics vs logs vs traces: metrics show behaviour over time, logs carry detailed error text, traces show the time spent in each service for one request
·Prometheus vs Grafana: Prometheus collects and stores metrics, Grafana visualizes data from sources like Prometheus
·Pillars vs outputs: alerting, dashboards and SLIs are built on top of the pillars and are not pillars themselves
·Percentiles from histograms: averaging or taking the max of bucket rates does not give a percentile, histogram_quantile does
·High-cardinality labels such as user IDs multiply the number of series Prometheus must handle
·kubectl logs needs the -c flag to select one container in a multi-container pod

Watch out for

Common Cloud Native Observability exam traps

▸Grafana visualizes metrics; it does not collect or store them.
▸Alerting, dashboards and SLIs are not pillars of observability; logs, metrics and traces are.
▸Tracing shows request flow and per-service latency; it is not the tool for CPU and memory metrics or detailed error text.
▸kubectl exec needs a running container, so start with kubectl logs when a pod is crash-looping.

Practice set

Cloud Native Observability questions

20 questions · select your answer, then reveal the explanation

Question 1 · medium · multiple choice

A DevOps team notices that a microservice is returning 503 errors intermittently. The service runs in Kubernetes and uses a liveness probe. The team wants to understand the root cause without restarting the pod. Which observability approach should they use first?

A. Use kubectl describe pod to check recent events
Why wrong: Events may not capture every probe failure, especially if the failures are short-lived.
B. Query Prometheus for kubelet metrics on probe successes and failures
Correct: Kubelet probe metrics such as prober_probe_total, broken down by its result label, show probe outcomes over time and help identify intermittent failures.
C. Increase log verbosity in the application to capture all requests
Why wrong: Logs may not capture probe failures, and increasing verbosity can impact performance.
D. Enable distributed tracing across the service mesh
Why wrong: Tracing is for request flows, not probe health checks.

Question 2 · hard · multiple choice

A platform team is designing a monitoring strategy for a multi-tenant Kubernetes cluster. Each tenant runs workloads in separate namespaces. The team needs to ensure tenant isolation while providing aggregated cluster-wide dashboards. Which approach best meets these requirements?

A. Deploy a single Prometheus instance with namespace labels on all metrics
Why wrong: A single Prometheus does not enforce access control; any user could query metrics across namespaces.
B. Use a global Prometheus with recording rules to aggregate per-namespace metrics
Why wrong: This does not provide isolation; all metrics are accessible in one place.
C. Have each tenant deploy their own monitoring stack and view separately
Why wrong: Lacks aggregated cluster-wide dashboards, violating the requirement.
D. Deploy a Prometheus instance per tenant and use Thanos to aggregate metrics globally
Correct: A Prometheus instance per tenant keeps each tenant's metrics isolated, and Thanos (a sidecar beside each Prometheus plus a global query layer) provides the cluster-wide view, with access to that view restricted through RBAC.

Question 3 · easy · multiple choice

A Kubernetes administrator is troubleshooting a pod that is stuck in CrashLoopBackOff. The pod's restart count is increasing. Which initial step should the administrator take to diagnose the issue?

A. Run 'kubectl describe pod <pod-name>' to check events
Why wrong: Events may indicate issues but usually lack application-level error details.
B. Check the Prometheus metrics for the pod's CPU usage
Why wrong: CPU metrics are unlikely to reveal crash causes.
C. Run 'kubectl exec -it <pod-name> -- /bin/sh' to inspect the container
Why wrong: kubectl exec needs a running container, so it fails while the container is crashed.
D. Run 'kubectl logs <pod-name>' to view the application logs
Correct: Logs often contain the error messages that explain why the application is crashing. Adding --previous shows the output of the last terminated container.

Question 4 · medium · multiple choice

An organization uses Prometheus and Grafana for monitoring. They want to alert when the 99th percentile of request latency exceeds 500ms for more than 5 minutes. Which PromQL query should they use in the alert rule?

A. histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[1m])) > 0.5
Why wrong: The 1m rate window does not match the 5m window the scenario calls for.
B. histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 0.5
Correct: Calculates the 99th percentile from the histogram buckets over a 5-minute rate window and compares it to 0.5 seconds (500ms). In the alert rule itself, a for: 5m clause is what requires the condition to hold for 5 minutes before the alert fires.
C. avg(rate(http_request_duration_seconds_bucket[5m])) > 0.5
Why wrong: This averages bucket rates; it does not compute a percentile.
D. max(rate(http_request_duration_seconds_bucket[5m])) > 0.5
Why wrong: The max of bucket rates is not a percentile.
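
To see why averaging or taking the max of bucket rates cannot stand in for a percentile, it can help to work through how a quantile is estimated from cumulative buckets. The sketch below is a simplified illustration of that idea (rank lookup plus linear interpolation inside the matching bucket), not Prometheus's actual implementation; the bucket values are made up for the example.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a non-empty list of (upper_bound, cumulative_count) pairs
    whose last upper bound is math.inf, mirroring `le` buckets.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Nothing to interpolate: report the highest finite bound seen.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical request-duration buckets in seconds: (le, cumulative count).
LATENCY_BUCKETS = [(0.1, 50), (0.25, 90), (0.5, 98), (1.0, 100), (math.inf, 100)]

p99 = histogram_quantile(0.99, LATENCY_BUCKETS)
print(p99, p99 > 0.5)  # about 0.75, so the 500ms threshold is exceeded
```

With these sample buckets, 98 of 100 requests finished within 0.5 s, so the 99th-ranked request falls in the 0.5 to 1.0 s bucket and the estimate lands halfway through it.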

Question 5 · hard · multi-select

Which TWO of the following are best practices for structuring log output in cloud-native applications to maximize observability?

A. Include verbose debug-level information in every log line
Why wrong: Too much verbosity increases storage and noise; use appropriate levels.
B. Use multi-line log entries for detailed error information
Why wrong: Multi-line logs are harder to parse; use single-line with escaped newlines if needed.
C. Output logs in structured format such as JSON
Correct: Structured logs are machine-parseable and easily ingested by log aggregators.
D. Include a unique request or correlation ID in each log entry
Correct: Correlation IDs help trace requests across microservices.
E. Avoid timestamps to reduce log size
Why wrong: Timestamps are essential for ordering and debugging.
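
The two correct practices (structured output and a correlation ID on every entry) can be sketched with Python's standard logging module. This is a minimal illustration, not a recommended production setup; the field names, the logger name "checkout" and the JsonFormatter class are assumptions made for the example.

```python
import json
import logging
import uuid
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object (hypothetical field names)."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # The caller attaches correlation_id through logging's `extra=` argument.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(entry)


def build_logger(name="checkout"):
    """Return a logger that writes one JSON object per line to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    request_id = str(uuid.uuid4())  # normally taken from the incoming request
    log.info("payment authorised", extra={"correlation_id": request_id})
    log.error("ledger write failed", extra={"correlation_id": request_id})
```

Because json.dumps escapes newlines, each entry stays on one line, and both entries share the same correlation_id so a log aggregator can group them.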

Question 6 · medium · multi-select

Which THREE of the following are valid use cases for distributed tracing in a microservices architecture?

A. Monitoring CPU and memory usage of each service instance
Why wrong: Resource usage is better tracked with metrics in Prometheus.
B. Understanding the dependency graph between microservices
Correct: Traces reveal service call relationships.
C. Pinpointing the root cause of an error in a distributed transaction
Correct: Tracing shows where errors occur in the flow.
D. Identifying which service contributes the most latency to an end-user request
Correct: Tracing shows time spent in each span.
E. Capturing detailed error messages and stack traces
Why wrong: Logs are more appropriate for detailed error text.
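
As a rough illustration of how span timings point to the slowest service, the sketch below computes each service's self time (its span duration minus the time spent in direct child spans). The span layout, field names and service names are hypothetical, and the calculation assumes child spans run one after another inside their parent.

```python
from collections import defaultdict


def self_time_by_service(spans):
    """Sum per-service self time: span duration minus its direct children.

    Assumes child spans do not overlap each other within a parent.
    """
    child_total = defaultdict(float)
    for span in spans:
        if span["parent_id"] is not None:
            child_total[span["parent_id"]] += span["end_ms"] - span["start_ms"]

    totals = defaultdict(float)
    for span in spans:
        duration = span["end_ms"] - span["start_ms"]
        totals[span["service"]] += duration - child_total[span["span_id"]]
    return dict(totals)


# Hypothetical trace: gateway -> payment-api -> ledger-db.
TRACE = [
    {"span_id": "a", "parent_id": None, "service": "gateway", "start_ms": 0, "end_ms": 480},
    {"span_id": "b", "parent_id": "a", "service": "payment-api", "start_ms": 10, "end_ms": 460},
    {"span_id": "c", "parent_id": "b", "service": "ledger-db", "start_ms": 40, "end_ms": 400},
]

times = self_time_by_service(TRACE)
slowest = max(times, key=times.get)
print(times, slowest)
```

In this made-up trace the gateway span is the longest overall, yet most of the time is spent in the database call, which is the kind of conclusion resource metrics or logs alone would not show.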

Question 7 · hard · multiple choice

A company runs a Kubernetes cluster with 50 worker nodes, each hosting multiple microservices. They use Prometheus for metrics collection and Grafana for dashboards. Recently, the Prometheus server has been experiencing out-of-memory (OOM) kills during peak hours, causing gaps in metric collection. The cluster has a dedicated monitoring namespace. The team has already increased the Prometheus pod's memory limits to 8GB, but OOMs still occur. The metrics retention is set to 15 days. The cardinality of certain metrics (e.g., HTTP request labels with user IDs) is very high. The team needs to resolve the OOM issue without losing critical alerting capability for at least the last 7 days of data. Which action should they take first?

A. Implement recording rules to pre-aggregate high-cardinality metrics at a lower granularity
Correct: Recording rules reduce cardinality by aggregating metrics, lowering memory usage while preserving aggregated data for alerting.
B. Drop high-cardinality metrics like HTTP request labels using relabel_configs
Why wrong: This may remove useful metrics; better to aggregate them.
C. Reduce metrics retention to 7 days to free memory
Why wrong: Retention mainly governs disk usage rather than memory, and shortening it discards older data; it is less targeted than recording rules.
D. Enable vertical pod autoscaler for the Prometheus pod
Why wrong: VPA adjusts resources but does not reduce cardinality; OOM may still occur if the node is saturated.
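
The effect of a high-cardinality label is easy to underestimate, so here is a small sketch that counts distinct series before and after aggregating a label away, which is roughly what a recording rule using sum without (user_id) would produce. The label names and values are invented for the example.

```python
def series_count(samples, keep_labels=None):
    """Count distinct time series, optionally keeping only some labels."""
    seen = set()
    for labels in samples:
        if keep_labels is not None:
            labels = {k: v for k, v in labels.items() if k in keep_labels}
        # A series is identified by its full, order-independent label set.
        seen.add(tuple(sorted(labels.items())))
    return len(seen)


# Hypothetical raw series: 3 paths x 2 status codes x 1000 user IDs.
RAW = [
    {"path": p, "status": s, "user_id": str(u)}
    for p in ("/pay", "/refund", "/status")
    for s in ("200", "500")
    for u in range(1000)
]

print(series_count(RAW))                      # 6000 series with user_id
print(series_count(RAW, {"path", "status"}))  # 6 series once user_id is aggregated away
```

Each extra label value multiplies the series count, which is why a per-user label on a request metric dominates everything else in this example.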

Question 8 · medium · multiple choice

A company deploys a microservice application on Kubernetes. They notice that one of the services is returning 5xx errors intermittently. Which observability tool should they use to correlate the errors with resource usage across all pods of that service?

A. Prometheus
Correct: Prometheus collects metrics and can correlate error rates with resource usage via labels.
B. Grafana
Why wrong: Grafana visualizes data from sources like Prometheus, but does not store or correlate metrics itself.
C. Fluentd
Why wrong: Fluentd is a log collector, not a metrics system.
D. Jaeger
Why wrong: Jaeger is for distributed tracing, not metric correlation.

Question 9 · hard · multiple choice

A platform team wants to implement observability for a Kubernetes cluster running 500+ microservices. They need to reduce the cost of storing logs while retaining the ability to search for specific error patterns. Which strategy best achieves this?

A. Increase log retention to one year for compliance
Why wrong: Increases cost without addressing the need to reduce storage costs.
B. Store all logs in a centralized Elasticsearch cluster with high retention
Why wrong: Storing all logs is costly; not the best cost optimization.
C. Aggregate logs into a single pod for easier indexing
Why wrong: A single pod is a single point of failure and doesn't reduce cost.
D. Use structured logging and sample debug logs, retaining error logs fully
Correct: Sampling reduces volume while keeping critical error logs for search.
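
A minimal sketch of the sampling strategy in the correct answer: keep every non-debug entry and only a fraction of debug entries. The entry shape, the filter_logs function and the default 10% rate are assumptions for illustration; real log pipelines usually apply this in the collector rather than in application code.

```python
import random


def filter_logs(entries, debug_sample_rate=0.1, seed=None):
    """Keep all non-DEBUG entries and roughly `debug_sample_rate` of DEBUG entries."""
    rng = random.Random(seed)
    kept = []
    for entry in entries:
        if entry["level"] == "DEBUG" and rng.random() >= debug_sample_rate:
            continue  # drop this debug line
        kept.append(entry)
    return kept


# Hypothetical structured log entries.
ENTRIES = (
    [{"level": "DEBUG", "message": f"cache lookup {i}"} for i in range(1000)]
    + [{"level": "ERROR", "message": f"payment failed {i}"} for i in range(20)]
)

kept = filter_logs(ENTRIES, debug_sample_rate=0.1, seed=42)
errors_kept = sum(1 for e in kept if e["level"] == "ERROR")
print(len(ENTRIES), len(kept), errors_kept)
```

All 20 error entries survive regardless of the sample rate, so searches for specific error patterns still work while the stored volume drops.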

Question 10 · easy · multiple choice

A developer wants to monitor the health of a Kubernetes deployment by checking if the number of ready replicas matches the desired replicas. Which metric from kube-state-metrics should they query?

A. kube_deployment_status_replicas_ready
Correct: This metric shows ready replicas, enabling comparison with desired replicas.
B. kube_deployment_spec_replicas
Why wrong: This shows desired replicas, but alone doesn't show health.
C. kube_node_status_condition
Why wrong: Node-level metric, not deployment health.
D. kube_pod_container_status_running
Why wrong: This is a pod-level metric, not deployment-level.
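
The health check described in the question is a comparison of two values per deployment. The sketch below shows that comparison on plain dictionaries, as they might look after reading kube_deployment_spec_replicas and kube_deployment_status_replicas_ready; the function name and the sample data are hypothetical.

```python
def unhealthy_deployments(spec_replicas, ready_replicas):
    """Return {deployment: (ready, desired)} where ready is below desired.

    Both arguments map a deployment name to a replica count.
    """
    return {
        name: (ready_replicas.get(name, 0), desired)
        for name, desired in spec_replicas.items()
        if ready_replicas.get(name, 0) < desired
    }


# Hypothetical values scraped from kube-state-metrics.
desired = {"web": 3, "payment-api": 2}
ready = {"web": 3, "payment-api": 1}

print(unhealthy_deployments(desired, ready))
```

The desired count alone says nothing about health, which is why option B is not enough on its own.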

Question 11 · medium · multi-select

Which TWO of the following are best practices for implementing observability in a cloud-native environment?

A. Store all raw observability data indefinitely for forensic analysis
Why wrong: Storing all data indefinitely is costly and not a best practice.
B. Use only metrics and avoid logs to reduce complexity
Why wrong: Metrics alone lack context; logs and traces are also needed.
C. Add unique request IDs to logs for end-to-end tracing correlation
Correct: Request IDs help correlate logs across microservices for tracing.
D. Randomly sample all traces and logs to reduce storage
Why wrong: Random sampling may miss critical errors; targeted sampling is better.
E. Use structured logging (e.g., JSON format) for easier automated parsing
Correct: Structured logging allows tools like Fluentd to parse logs efficiently.

Question 12 · hard · multiple choice

You are an SRE managing a Kubernetes cluster with 200 nodes and 10,000 pods. The cluster runs a critical payment processing application. Users report that transactions are occasionally failing with a 'timeout' error. You have Prometheus and Grafana set up for monitoring, and you use Fluentd with Elasticsearch for logging. You notice that during peak hours, the CPU usage of the payment service pods spikes to 90%, but memory usage remains stable. The pod restart count is low. You also see that the response time of the payment service increases significantly during these spikes. You need to identify the root cause and propose a fix. Which course of action is most appropriate?

A. Add more replicas of the payment service to distribute the load
Why wrong: Horizontal scaling helps but may not address the root cause if the service is inefficient; it also adds complexity.
B. Increase the memory limits for the payment service pods to improve caching
Why wrong: Memory is stable; increasing memory won't help CPU-bound issues.
C. Implement a circuit breaker pattern to fail fast and avoid timeouts
Why wrong: A circuit breaker prevents cascading failures but doesn't fix the underlying CPU issue.
D. Increase the CPU limits for the payment service pods to allow more CPU resources during spikes
Correct: This directly addresses the CPU bottleneck, reducing response time.

Question 13 · medium · multiple choice

A company is running a microservices application on a Kubernetes cluster. They have noticed that one of the services, 'payment-api', is experiencing intermittent high latency. The team wants to identify the root cause without modifying the application code. Which approach should they take?

A. Monitor CPU and memory metrics from kube-state-metrics and correlate with latency.
Why wrong: Resource metrics may not directly indicate request latency.
B. Increase log verbosity for all services and search for error messages.
Why wrong: Logs may not capture latency across services.
C. Implement distributed tracing using tools like Jaeger or Zipkin to trace requests across services.
Correct: Distributed tracing tracks request flow and identifies slow components.
D. Check node-level metrics using Prometheus Node Exporter.
Why wrong: Node metrics are too coarse to explain service-level request latency.

Question 14 · hard · multi-select

Which TWO of the following are recommended practices for achieving observability in a Kubernetes cluster?

A. Use a single centralized logging solution to aggregate logs from all components.
Why wrong: Single point of failure; prefer highly available solutions.
B. Store all debug logs for a minimum of 90 days for compliance.
Why wrong: Debug logs are verbose; store selectively.
C. Include correlation IDs in structured logs to enable tracing across services.
Correct: Correlation IDs help trace requests across microservices.
D. Disable leader election for monitoring components to reduce complexity.
Why wrong: Leader election prevents duplicate metrics in HA setups.
E. Use Prometheus with a pull-based model to scrape metrics from pods.
Correct: The Prometheus pull model is standard for Kubernetes metrics.

Question 15 · medium · drag order

Drag and drop the steps to perform a backup of etcd in a Kubernetes cluster into the correct order.

Question 16 · medium · matching

Match each Kubernetes security concept to its definition.

Matches

Identity for processes running in a pod

Role-based access control to authorize API requests

Specifies how groups of pods are allowed to communicate

Deprecated but formerly controlled security-sensitive pod settings

Stores sensitive data like passwords and tokens

Question 17 · medium · multiple choice

Which of the following is a core component of the three pillars of observability?

A. Alerting
Why wrong: Alerting is not a pillar; it is a downstream action from metrics.
B. SLIs
Why wrong: SLIs are service level indicators, not a pillar.
C. Logs
Correct: Logs are one of the three pillars of observability.
D. Dashboards
Why wrong: Dashboards are visualizations, not a pillar.

Question 18 · easy · multiple choice

What is the primary purpose of Prometheus in cloud native observability?

A. Provide distributed tracing
Why wrong: Tracing is handled by Jaeger or Zipkin, not Prometheus.
B. Visualize data
Why wrong: Visualization is typically done with Grafana, not Prometheus.
C. Collect and store logs
Why wrong: Log collection is not Prometheus's primary function.
D. Collect and store metrics
Correct: Prometheus is a metrics system.

Question 19 · medium · multiple choice

Which command retrieves logs from a specific container named 'sidecar' in a multi-container pod?

A. kubectl logs pod-name sidecar
Why wrong: The container should be selected with the -c flag, not passed as a bare second argument.
B. kubectl logs pod-name --container sidecar
Why not the expected answer: --container is the long form of -c, so this command is also valid; the question expects the more common short form.
C. kubectl logs pod-name -c sidecar
Correct: This is the usual syntax for specifying a container.
D. kubectl logs sidecar pod-name
Why wrong: Order is wrong; pod name comes first.
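
When kubectl is called from a script, building the argument list explicitly avoids ordering mistakes like the ones in the wrong options. The helper below is a hypothetical sketch that only assembles the command; it uses the -c, --previous and -n flags and does not run kubectl.

```python
import shlex


def kubectl_logs_cmd(pod, container=None, previous=False, namespace=None):
    """Build the argument list for `kubectl logs` (does not execute it)."""
    cmd = ["kubectl", "logs", pod]
    if container:
        cmd += ["-c", container]  # select one container in a multi-container pod
    if previous:
        cmd.append("--previous")  # logs from the last terminated container
    if namespace:
        cmd += ["-n", namespace]
    return cmd


print(shlex.join(kubectl_logs_cmd("pod-name", container="sidecar")))
print(shlex.join(kubectl_logs_cmd("pod-name", container="sidecar", previous=True)))
```

The list form can be passed to subprocess.run as is, with the pod name always first and the container always attached to its flag.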

Question 20 · hard · multiple choice

In OpenTelemetry, what is the purpose of the Collector component?

A. Instrument code automatically
Why wrong: Instrumentation is done via SDKs, not the Collector.
B. Receive, process, and export telemetry data
Correct: The Collector is a vendor-agnostic pipeline for telemetry data.
C. Visualize traces and metrics
Why wrong: Visualization is done by tools like Jaeger UI or Grafana.
D. Aggregate logs from multiple sources
Why wrong: That's more for log aggregators like Fluentd.
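
The receive, process and export stages can be pictured as a simple pipeline. The sketch below is a toy model of that flow written with plain Python functions; it is not the OpenTelemetry Collector API or its configuration format, and every name in it is made up for illustration.

```python
def run_pipeline(receivers, processors, exporters):
    """Pull telemetry from receivers, apply processors in order, fan out to exporters."""
    batch = [item for receive in receivers for item in receive()]
    for process in processors:
        batch = process(batch)
    for export in exporters:
        export(batch)
    return batch


def drop_health_checks(batch):
    """Processor: filter out noisy health-check spans."""
    return [span for span in batch if span.get("route") != "/healthz"]


def add_cluster_attribute(batch):
    """Processor: enrich every span with a (hypothetical) cluster name."""
    return [{**span, "cluster": "demo"} for span in batch]


if __name__ == "__main__":
    backend_a, backend_b = [], []
    run_pipeline(
        receivers=[lambda: [{"route": "/pay"}, {"route": "/healthz"}]],
        processors=[drop_health_checks, add_cluster_attribute],
        exporters=[backend_a.extend, backend_b.extend],
    )
    print(backend_a == backend_b, backend_a)
```

The same processed batch reaches both exporters, which reflects why a vendor-agnostic pipeline is useful: the sources do not need to know which backends receive the data.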


Frequently asked questions

What does the KCNA exam test about Cloud Native Observability? Observability questions usually test which signal (metrics, logs or traces) and which tool (for example Prometheus, Grafana, Fluentd, Jaeger or OpenTelemetry) fits a given scenario.
How should I use these practice questions? Select your answer before revealing the explanation. Then read why each option is right or wrong; this active recall approach builds retention far faster than re-reading notes.
Can I practise just Cloud Native Observability questions in a focused session? Yes. The session launcher on this page draws every question from the Cloud Native Observability domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.
Where can I practise other KCNA topics? Use the related topic links on this page to move to related areas, or go back to the KCNA question bank to see all topics.
Are these real exam questions or dumps? These are original practice questions written to test the same concepts the KCNA exam covers. They are not copied from any real exam or dump site.

Related KCNA topic practice pages

Kubernetes Fundamentals practice questions

Container Orchestration practice questions

Cloud Native Architecture practice questions