KCNA Exam Questions and Answers

Domain 1: Kubernetes Fundamentals

A developer deploys a pod that continuously restarts. 'kubectl describe pod' shows the container exits with code 137. What is the most likely cause?

The container is exceeding its memory limit and being OOM-killed.

Exit code 137 indicates SIGKILL, often from OOM.

The liveness probe is failing and restarting the container.

The init container is failing and blocking the main container.

The pod is hitting a resource quota limit at the namespace level.

Why: Exit code 137 (128 + 9) indicates the container was killed by SIGKILL. In Kubernetes, this most commonly occurs when the container exceeds its memory limit, triggering the OOM (Out-Of-Memory) killer. The memory limit from the pod spec is applied to the container's cgroup, and when usage surpasses that limit, the kernel terminates the process with SIGKILL, resulting in exit code 137.
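
The 128 + signal arithmetic above can be checked with a short sketch. The helper name `describe_exit_code` is an illustration, not part of any Kubernetes tooling, and the signal names assume a POSIX system such as Linux.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
print(describe_exit_code(1))
```

An exit code of 137 only says that SIGKILL was delivered. Confirming an OOM kill still requires checking the container's last state, which `kubectl describe pod` reports as the reason `OOMKilled`.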

An application requires a unique identifier per replica, stored in an environment variable. Which Kubernetes resource should be used to inject this identifier into each pod without manual updates?

Deployment with pod anti-affinity to schedule each pod on a different node.

StatefulSet with an environment variable derived from the pod name.

StatefulSet pods have stable, unique names (e.g., myapp-0).

DaemonSet with a node name environment variable.

Job with a completion index environment variable.

Why: A StatefulSet provides stable, unique network identities and persistent storage per replica. The pod name (e.g., myapp-0, myapp-1) can be exposed as an environment variable via the Downward API, or read from the hostname. Option B is correct. Option A is wrong because Deployments create interchangeable pods without stable identities. Option C is wrong because DaemonSets run one pod per node, so the identifier would be tied to the node rather than the replica. Option D is wrong because Jobs are for batch processing.
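
As a rough sketch of the Downward API approach, the container fragment below is written as a Python dict that mirrors the manifest structure. The variable name `POD_NAME`, the image, and the helper `replica_ordinal` are assumptions made for illustration; only the `fieldRef` with `fieldPath: metadata.name` is the actual Kubernetes mechanism.

```python
import json

# Hypothetical container fragment for a StatefulSet pod template.
# The env entry asks Kubernetes to inject the pod's own name.
container = {
    "name": "myapp",
    "image": "example.com/myapp:1.0",  # placeholder image
    "env": [
        {
            "name": "POD_NAME",  # illustrative variable name
            "valueFrom": {"fieldRef": {"fieldPath": "metadata.name"}},
        }
    ],
}


def replica_ordinal(pod_name: str) -> int:
    """Extract the ordinal from a StatefulSet pod name such as 'myapp-0'."""
    return int(pod_name.rsplit("-", 1)[1])


print(json.dumps(container, indent=2))
print(replica_ordinal("myapp-0"))
```

Inside the pod, the application would read `POD_NAME` from its environment and could derive a per-replica index from it in the same way.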

A pod is stuck in 'Pending' state. 'kubectl describe pod' shows '0/4 nodes are available: 4 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate.' What is the most likely cause?

All nodes have disk pressure.

All nodes are unreachable or have been cordoned.

The taint indicates nodes are unreachable.

The pod has a toleration that matches the taint.

The nodes do not have enough CPU or memory.

Why: The taint `node.kubernetes.io/unreachable` is automatically added by the node controller when a node becomes unreachable (e.g., network failure, kubelet stops heartbeating). The error shows all 4 nodes have this taint and the pod has no matching toleration, so the scheduler cannot place the pod. Of the options, only "unreachable or cordoned" fits. Note that cordoning adds a different taint, `node.kubernetes.io/unschedulable`, so the `unreachable` taint seen here points specifically to unreachable nodes.

A team wants to minimize downtime during a Deployment rollout. Which strategy ensures that new pods are created before old pods are terminated?

Set strategy type to 'Recreate'.

Set strategy type to 'RollingUpdate' with maxSurge=0, maxUnavailable=1.

Set strategy type to 'RollingUpdate' with maxSurge=1, maxUnavailable=0.

New pods are created first, ensuring zero downtime.

Set strategy type to 'RollingUpdate' with maxSurge=1, maxUnavailable=1.

Why: Option C is correct because setting `maxSurge=1` and `maxUnavailable=0` in a RollingUpdate strategy ensures that one additional pod is created above the desired replica count before any existing pod is terminated. This guarantees zero downtime by maintaining full capacity during the rollout, as new pods become ready before old ones are removed.
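
The capacity bounds behind this answer can be worked out with a small sketch. The function names are illustrative. The rounding rules (percentages round up for `maxSurge` and down for `maxUnavailable`) follow the documented Deployment behavior.

```python
import math


def resolve(value, replicas: int, round_up: bool) -> int:
    """Resolve an absolute count or a percentage string such as '25%'."""
    if isinstance(value, str) and value.endswith("%"):
        fraction = replicas * int(value[:-1]) / 100
        return math.ceil(fraction) if round_up else math.floor(fraction)
    return int(value)


def rollout_bounds(replicas: int, max_surge, max_unavailable):
    """Return (maximum total pods, minimum available pods) during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable


# maxSurge=1, maxUnavailable=0: one extra pod, never below full capacity.
print(rollout_bounds(3, 1, 0))
# maxSurge=0, maxUnavailable=1: no extra pod, capacity dips by one.
print(rollout_bounds(3, 0, 1))
```

With 3 replicas, the first configuration allows up to 4 pods and never fewer than 3 available, which is why new pods must become ready before old ones are removed.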

A pod in a ReplicaSet is failing with 'CrashLoopBackOff'. 'kubectl logs pod' shows 'Error: listen tcp :8080: bind: address already in use'. What is the most likely cause?

The readiness probe is misconfigured.

The container image is missing the application binary.

The container's process is not terminating quickly enough on SIGTERM, causing a port conflict on restart.

Old process still holds the port.

The pod is using hostPort and two pods on the same node conflict.

Why: The error 'address already in use' on port 8080 indicates that when the container restarts, the previous process is still holding the port. This typically happens when the application does not handle SIGTERM properly and does not shut down within the terminationGracePeriodSeconds (default 30s), so the old process lingers while the new one tries to bind to the same port, causing a CrashLoopBackOff.

Which TWO of the following are valid ways to expose a set of pods as a network service within a Kubernetes cluster?

Create a StatefulSet with pod hostnames.

Create a Service of type ExternalName.

Create a ConfigMap with pod IPs.

Create a Service of type ClusterIP.

ClusterIP exposes pods internally.

Create an Ingress resource that routes to a Service.

Ingress exposes HTTP/HTTPS to Services.

Why: A Service of type ClusterIP exposes a set of pods as a network service within the cluster by assigning a stable virtual IP address and DNS name. Traffic to this IP is load-balanced across the pods matching the Service's label selector, enabling internal cluster communication without external exposure. An Ingress is the second valid answer: it routes HTTP/HTTPS traffic to a Service, which in turn forwards it to the pods.


Domain 2: Container Orchestration

A team deploys a microservice that requires sticky sessions. The service runs on Kubernetes with multiple replicas. Which Kubernetes resource should be used to ensure requests from a client are consistently routed to the same pod?

Headless Service

Service with sessionAffinity: ClientIP

This configuration ensures requests from the same client IP go to the same pod.

Ingress with default settings

Deployment with hostNetwork: true

Why: Option B is correct because setting `sessionAffinity: ClientIP` on a Kubernetes Service ensures that all requests from the same client IP are routed to the same Pod. This is the standard Kubernetes mechanism for implementing sticky sessions without requiring changes to the application or ingress layer.

A Kubernetes cluster is experiencing network latency. The team suspects that the number of services and endpoints is causing iptables performance degradation. Which CNI plugin or network policy approach is most likely to improve performance?

Switch to Flannel with host-gw backend

Use Calico with iptables mode

Use an eBPF-based CNI plugin like Cilium

eBPF bypasses iptables, reducing latency and improving scalability.

Apply a default-deny NetworkPolicy

Why: C is correct because eBPF-based CNI plugins like Cilium bypass the traditional iptables chains entirely, using a kernel-level BPF (Berkeley Packet Filter) program to handle service load balancing and network policy enforcement. This eliminates the O(n) scaling issue of iptables rules with the number of services and endpoints, significantly reducing latency in large clusters.

A developer wants to ensure that a pod runs only on nodes with SSDs. Which mechanism should be used?

Apply a taint to nodes without SSDs and add tolerations to the pod

Use pod anti-affinity

Add a nodeSelector with disktype: ssd

nodeSelector ensures pods are scheduled on nodes with the specified label.

Define a ResourceQuota

Why: Option C is correct because `nodeSelector` is a simple and direct mechanism in Kubernetes to constrain a pod to run only on nodes that have a specific label, such as `disktype=ssd`. By labeling nodes with SSDs and adding the corresponding `nodeSelector` in the pod spec, the scheduler ensures the pod is placed exclusively on those nodes. This approach is straightforward and does not require complex scheduling constraints or resource management.
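
The matching rule is simple enough to sketch: a node qualifies only if it carries every key/value pair listed in the selector. The function and the node data below are hypothetical and only mimic what the scheduler checks for `nodeSelector`.

```python
def matches_node_selector(node_labels: dict, node_selector: dict) -> bool:
    """True when the node has every label required by the selector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Illustrative nodes; real labels are set with `kubectl label nodes`.
nodes = {
    "node-a": {"disktype": "ssd", "zone": "a"},
    "node-b": {"disktype": "hdd", "zone": "a"},
    "node-c": {"zone": "b"},
}
selector = {"disktype": "ssd"}

eligible = [name for name, labels in nodes.items() if matches_node_selector(labels, selector)]
print(eligible)
```

A node without the `disktype` label at all is excluded just like one with a different value, so unlabeled nodes never receive the pod.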

An application running in a Kubernetes pod needs to access a database that is deployed on a VM outside the cluster. The database IP is stable. Which is the best way to expose the database to the pod?

Expose the database via Ingress

Create a Service of type ExternalName pointing to the database hostname

ExternalName service provides a DNS alias to an external resource.

Use a Headless Service

Create an EndpointSlice manually with the pod IP

Why: Option B is correct because a Service of type ExternalName provides a DNS-based abstraction for external resources, mapping a Kubernetes service name to an external DNS name (the database hostname). This allows the pod to access the database via a stable in-cluster DNS name without hardcoding the external address in the application. Note that ExternalName maps to a DNS name rather than an IP address, so this answer assumes the database has a resolvable hostname.

A team notices that a ReplicaSet is not creating the desired number of pods. The ReplicaSet YAML is correctly configured with replicas: 3. The cluster has sufficient resources. What is the most likely cause?

The ReplicaSet is paused

The pod template references an invalid image pull secret

Invalid image pull secret would cause pods to fail with ImagePullBackOff, reducing the ready count.

The nodeSelector does not match any node

A ResourceQuota in the namespace limits the number of pods

Why: Option B is correct because an invalid image pull secret in the pod template prevents the kubelet from authenticating with the container registry. The ReplicaSet controller still creates the pods and the scheduler places them on nodes, but the kubelet cannot pull the image, so the containers never start. The pods stay in ErrImagePull or ImagePullBackOff, and the ready count never reaches the desired 3.

Which TWO of the following are valid ways to expose a set of pods as a network service in Kubernetes?

Service of type NodePort

NodePort exposes the service on each node's IP at a static port.

NetworkPolicy

Service of type ClusterIP

ClusterIP exposes the service on a cluster-internal IP.

Ingress resource

Deployment with replicas

Why: A Service of type NodePort exposes a set of pods on a static port on each node's IP address, making the service accessible from outside the cluster. This is a valid Kubernetes resource for exposing pods as a network service, as it creates a mapping from a node port to the ClusterIP and then to the target pods. A Service of type ClusterIP is the second valid answer: it exposes the same pods on a cluster-internal IP.


Domain 3: Cloud Native Architecture

A company wants to migrate its monolithic application to a cloud-native architecture on Kubernetes. The application currently uses a shared database and communicates via internal HTTP calls. Which design pattern should be applied first to increase resilience and enable independent scaling of components?

Adopt CQRS pattern to separate reads and writes

Use the strangler fig pattern to gradually replace monolith functionality

Allows incremental migration with minimal risk.

Implement database-per-service pattern

Deploy a sidecar container for each service

Why: The strangler fig pattern is the correct first step because it allows the team to incrementally replace specific functionalities of the monolithic application with microservices without disrupting the existing system. This pattern routes requests to either the old monolith or new services, enabling gradual migration, independent scaling of extracted components, and improved resilience by isolating failures. It directly addresses the need to move from a shared-database, HTTP-calling monolith to a cloud-native architecture on Kubernetes.

A cloud-native application is designed with multiple microservices that need to handle a sudden spike in traffic without manual intervention. Which Kubernetes feature best enables this?

VerticalPodAutoscaler

Cluster Autoscaler

HorizontalPodAutoscaler

Automatically scales pod replicas based on CPU/memory or custom metrics.

PodDisruptionBudget

Why: The HorizontalPodAutoscaler (HPA) automatically scales the number of pod replicas in a deployment based on observed CPU/memory utilization or custom metrics. This directly addresses the need to handle a sudden traffic spike without manual intervention by adding more pod instances to distribute the load.
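
The scaling decision follows the formula given in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below applies only that formula. It deliberately ignores the tolerance band, stabilization windows, and min/max replica bounds that a real HPA also applies.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Simplified HPA calculation: scale in proportion to metric / target."""
    return math.ceil(current_replicas * (current_metric / target_metric))


# 3 replicas averaging 90% CPU against a 60% target.
print(desired_replicas(3, 90, 60))
# 4 replicas averaging 200m CPU against a 100m target.
print(desired_replicas(4, 200, 100))
```

During a traffic spike the observed metric rises above the target, the ratio exceeds 1, and the replica count grows without anyone editing the Deployment.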

A team is designing a cloud-native system that must maintain high availability across multiple cloud regions. The application uses Kubernetes clusters in each region. Which approach best ensures that the system can tolerate a full region failure while minimizing complexity?

Deploy a single Kubernetes cluster spanning all regions

Use a global load balancer with active-passive regional failover

Simpler to implement and manage while ensuring failover.

Run active-active in all regions with synchronous data replication

Implement manual failover procedures documented in runbooks

Why: Option B is correct because a global load balancer with active-passive regional failover provides a straightforward way to route traffic to a healthy secondary region when the primary fails, without the complexity of multi-region Kubernetes control planes or synchronous replication. This approach leverages DNS-based or anycast routing to detect region failure and redirect traffic, ensuring high availability while keeping the operational overhead low.

A microservice logs errors when connecting to the database. The logs show 'connection refused'. Which troubleshooting step should be taken first?

Verify the database Service and Endpoints in Kubernetes

Directly checks if the database service is available.

Scale up the microservice deployment

Restart the microservice pod

Check the logs of other microservices

Why: The 'connection refused' error indicates that the microservice is attempting to connect to a TCP port on the database endpoint, but no process is listening there. In Kubernetes, the first step is to verify that the database Service exists and that its Endpoints object contains the correct pod IPs and port. If the Endpoints are empty or missing, the Service is not routing traffic to any healthy database pod, which directly causes the refusal. This aligns with the Kubernetes troubleshooting hierarchy: always check the Service and Endpoints before assuming application-level issues.

Which practice is a key principle of cloud-native architecture?

Automated CI/CD pipelines

Enables rapid and reliable deployments.

Manual configuration management

Tight coupling of services

Preferring stateful applications over stateless

Why: Automated CI/CD pipelines are a key principle of cloud-native architecture because they enable rapid, reliable, and repeatable delivery of microservices. By automating build, test, and deployment stages, teams can achieve continuous integration and continuous delivery, which aligns with the cloud-native goals of agility, scalability, and resilience. This automation reduces human error and accelerates the feedback loop, essential for managing distributed systems in dynamic cloud environments.

A cloud-native application uses a service mesh (Istio) for traffic management. The team notices increased latency in inter-service communication. Which likely cause should be investigated first?

Kubernetes Network Policies blocking traffic

Misconfigured sidecar proxy settings

Can cause significant latency.

Application code is not optimized for the mesh

mTLS encryption overhead

Why: In Istio, the sidecar proxy (Envoy) intercepts all inbound and outbound traffic for the application container. Misconfigured proxy settings, such as incorrect timeouts, retry policies, or circuit breaker thresholds, can introduce significant latency by causing unnecessary retries, connection delays, or queueing. Because the data plane sits directly in the request path, this is a common cause of increased latency in a service mesh and a sensible first thing to check.


Domain 4: Cloud Native Observability

A DevOps team notices that a microservice is returning 503 errors intermittently. The service runs in Kubernetes and uses a liveness probe. The team wants to understand the root cause without restarting the pod. Which observability approach should they use first?

Use kubectl describe pod to check recent events

Query Prometheus for kubelet metrics on probe successes and failures

Kubelet metrics such as `prober_probe_total` record probe results over time, helping identify intermittent failures.

Increase log verbosity in the application to capture all requests

Enable distributed tracing across the service mesh

Why: Option B is correct because Prometheus can scrape kubelet metrics that expose liveness probe success and failure counts directly, allowing the team to see if the probe is failing without restarting the pod. This approach provides historical data on probe behavior, which is essential for diagnosing intermittent 503 errors that stem from the kubelet restarting the container when the liveness probe fails. Unlike other options, it does not require modifying the application or restarting the pod, and it directly surfaces the root cause if the probe is the issue.

A platform team is designing a monitoring strategy for a multi-tenant Kubernetes cluster. Each tenant runs workloads in separate namespaces. The team needs to ensure tenant isolation while providing aggregated cluster-wide dashboards. Which approach best meets these requirements?

Deploy a single Prometheus instance with namespace labels on all metrics

Use a global Prometheus with recording rules to aggregate per-namespace metrics

Have each tenant deploy their own monitoring stack and view separately

Deploy a Prometheus instance per tenant and use Thanos to aggregate metrics globally

Per-tenant Prometheus ensures isolation, and Thanos sidecar allows secure global aggregation with proper RBAC.

Why: Option D is correct because deploying a Prometheus instance per tenant enforces strong tenant isolation by preventing cross-tenant metric access or resource contention, while Thanos provides a global view by querying all tenant instances through their Thanos sidecars from a central query layer. This approach satisfies both isolation and aggregated dashboards without compromising security or scalability.

A Kubernetes administrator is troubleshooting a pod that is stuck in CrashLoopBackOff. The pod's restart count is increasing. Which initial step should the administrator take to diagnose the issue?

Run 'kubectl describe pod <pod-name>' to check events

Check the Prometheus metrics for the pod's CPU usage

Run 'kubectl exec -it <pod-name> -- /bin/sh' to inspect the container

Run 'kubectl logs <pod-name>' to view the application logs

Logs often contain error messages that explain why the application is crashing.

Why: Option D is correct because when a pod is in CrashLoopBackOff, the immediate priority is to inspect the application logs to understand why the container is failing. `kubectl logs <pod-name>` retrieves the stdout/stderr output from the container, which typically contains error messages, stack traces, or configuration issues that caused the crash. This is the most direct and efficient first step before deeper investigation.

An organization uses Prometheus and Grafana for monitoring. They want to alert when the 99th percentile of request latency exceeds 500ms for more than 5 minutes. Which PromQL query should they use in the alert rule?

histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[1m])) > 0.5

histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 0.5

Correctly calculates 99th percentile over 5 minutes, then compares to 0.5 seconds.

avg(rate(http_request_duration_seconds_bucket[5m])) > 0.5

max(rate(http_request_duration_seconds_bucket[5m])) > 0.5

Why: Option B is correct because it uses `histogram_quantile(0.99, rate(...[5m]))` to estimate the 99th percentile request latency from bucket rates averaged over a 5-minute window, and compares the result to 0.5 seconds (500ms). The `rate()` function with a 5m range computes the per-second increase of each bucket counter over that window, which is necessary for quantile estimation in Prometheus. The "for more than 5 minutes" part of the requirement is expressed separately, with a `for: 5m` clause on the alert rule. When the metric has several label sets, the bucket rates are usually aggregated first with `sum by (le)`.
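
To make the quantile estimation less abstract, here is a simplified sketch of the linear interpolation that `histogram_quantile` performs over cumulative buckets. It is an approximation written for illustration, not Prometheus source code, and the bucket values are made up.

```python
def histogram_quantile(q: float, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    assert 0 < q <= 1, "this sketch only handles quantiles in (0, 1]"
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # The quantile lies in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / (count - lower_count)
        lower_bound, lower_count = upper_bound, count


# Made-up per-second bucket rates for http_request_duration_seconds_bucket.
buckets = [(0.1, 50.0), (0.25, 90.0), (0.5, 98.0), (1.0, 100.0), (float("inf"), 100.0)]
p99 = histogram_quantile(0.99, buckets)
print(p99, p99 > 0.5)
```

Because the result is interpolated within a bucket, its accuracy depends on how finely the buckets are spaced around the 0.5 second threshold.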

Which TWO of the following are best practices for structuring log output in cloud-native applications to maximize observability?

Include verbose debug-level information in every log line

Use multi-line log entries for detailed error information

Output logs in structured format such as JSON

Structured logs are machine-parseable and easily ingested by log aggregators.

Include a unique request or correlation ID in each log entry

Correlation IDs help trace requests across microservices.

Avoid timestamps to reduce log size

Why: Option C is correct because structured logging (e.g., JSON) enables automated parsing, filtering, and querying by log aggregation tools like Fluentd, Logstash, or cloud-native observability backends (e.g., Elasticsearch, Loki). This format ensures each log entry has consistent key-value pairs, making it machine-readable and facilitating correlation across distributed services without manual text parsing. Option D is also correct: a unique request or correlation ID in each log entry lets you trace a single request across microservices.
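
Both practices can be combined in a few lines using Python's standard `logging` module. This is a minimal sketch: the field names, the logger name `orders`, and the use of an in-memory stream instead of stdout are choices made for the example.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # Set via `extra=`; None when the caller did not provide one.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(entry)


stream = io.StringIO()  # a real service would log to stdout instead
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

correlation_id = str(uuid.uuid4())
logger.info("payment authorized", extra={"correlation_id": correlation_id})
print(stream.getvalue(), end="")
```

In a real service the correlation ID would typically arrive in an incoming request header and be passed on to downstream calls, so that every service logs the same value for one request.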

Which THREE of the following are valid use cases for distributed tracing in a microservices architecture?

Monitoring CPU and memory usage of each service instance

Understanding the dependency graph between microservices

Traces reveal service call relationships.

Pinpointing the root cause of an error in a distributed transaction

Tracing shows where errors occur in the flow.

Identifying which service contributes the most latency to an end-user request

Tracing shows time spent in each span.

Capturing detailed error messages and stack traces

Why: Distributed tracing is designed to track the flow of a single request across multiple microservices, recording timing and causality. Option B is correct because tracing systems like Jaeger or Zipkin automatically build a dependency graph by analyzing the parent-child relationships between spans, which reveals how services interact. This is a core use case for understanding service topology and identifying bottlenecks in a distributed system. Options C and D are also correct: traces show where in the flow an error occurs and how much time is spent in each span, which identifies the service contributing the most latency.


Domain 5: Cloud Native Application Delivery

A startup wants to minimize downtime during application updates in Kubernetes. Which deployment strategy should they use?

RollingUpdate

Replaces pods incrementally, maintaining availability.

Canary

Blue/Green

Recreate

Why: The RollingUpdate strategy is the default in Kubernetes and minimizes downtime by gradually replacing old Pods with new ones while the application remains available. It uses the configurable `maxSurge` and `maxUnavailable` parameters to control the rate of change, ensuring that a specified number of Pods are always serving traffic. This makes it ideal for startups seeking zero-downtime updates without the complexity of additional tooling or infrastructure.

A DevOps engineer notices that after a Helm upgrade, the new pods are stuck in 'ImagePullBackOff'. What is the most likely cause?

The pod's liveness probe is misconfigured

The Helm chart has a wrong image tag

A mistyped or non-existent tag leads to pull failures.

The service account lacks permissions

The deployment's resource requests exceed node capacity

Why: The 'ImagePullBackOff' error indicates that Kubernetes is unable to pull the container image from the registry. The most common cause during a Helm upgrade is a misconfigured or incorrect image tag in the Helm chart's values or templates, which causes the kubelet to fail when attempting to pull the specified image. This is distinct from runtime issues like probe failures or resource constraints, which would manifest as different error states.

A team wants to implement GitOps for their Kubernetes workloads using Argo CD. They have multiple environments (dev, staging, prod) in separate clusters. What is the best practice for structuring the Git repository?

A single branch with all environment manifests in the same folder

Separate repositories per environment

Store all manifests in a single file with environment labels

A monorepo with a directory per environment and overlays for differences

Standard GitOps pattern; clear separation with shared base and overlays.

Why: Option D is correct because a monorepo with a directory per environment and overlays (e.g., using Kustomize or Helm) allows you to manage environment-specific differences declaratively while keeping a single source of truth. Argo CD can sync each environment's directory to its respective cluster, and overlays minimize duplication by applying only the necessary patches (e.g., replica counts, ingress hosts) on top of a common base. This approach aligns with GitOps best practices for multi-environment deployments.
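
The base-plus-overlay idea can be sketched as a recursive merge of a small patch onto a shared base. This is a deliberately simplified stand-in: Kustomize's real patching handles lists and other cases that this sketch does not, and the manifest contents are invented for the example.

```python
import copy


def apply_overlay(base: dict, patch: dict) -> dict:
    """Return a copy of base with the patch merged in, key by key."""
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_overlay(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Shared base, as it might live in a base/ directory (hypothetical content).
base = {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 1}}

# One small patch per environment directory.
overlays = {
    "dev": {},
    "staging": {"spec": {"replicas": 2}},
    "prod": {"spec": {"replicas": 5}},
}

rendered = {env: apply_overlay(base, patch) for env, patch in overlays.items()}
print(rendered["prod"]["spec"]["replicas"])
```

Each environment's rendered manifest differs from the base only by what its overlay states, which keeps the per-environment diff small and reviewable.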

A user reports that a ConfigMap update is not reflected in running pods. Which action should be taken to ensure pods receive the updated configuration?

Perform a rollout restart of the deployment

Triggers new pods with updated ConfigMap values.

Delete and recreate the ConfigMap

Edit the deployment and change a label

Restart the kubelet on the nodes

Why: A is correct because ConfigMap values consumed as environment variables (or mounted with `subPath`) are read once when the container starts and are never updated in a running pod. ConfigMaps mounted as regular volumes are refreshed by the kubelet after a delay, but many applications read their configuration only at startup. The reliable way to pick up the new configuration is to restart the pods. A rollout restart of the deployment (e.g., `kubectl rollout restart deployment`) triggers a new ReplicaSet, which creates fresh pods that read the updated ConfigMap.

Which TWO of the following are benefits of using Helm for application delivery?

Automatic scaling based on CPU usage

Ability to roll back to previous releases

Helm tracks releases and supports rollback with helm rollback.

Automatic canary deployments

Simplified packaging and templating of Kubernetes resources

Helm charts use Go templates to parameterize manifests.

Built-in monitoring and alerting

Why: Helm manages Kubernetes application releases as packaged charts. The `helm rollback` command allows you to revert to a previous revision of a release, which is a core benefit for safe application delivery and disaster recovery. This capability is built into Helm's release management system, which tracks each deployment as a revision with a unique version number. Option D is also correct: charts package related manifests together and use Go templates to parameterize them.

Which THREE of the following practices are essential for a secure cloud native CI/CD pipeline?

Sign container images and verify signatures during deployment

Ensures image integrity and authenticity.

Store secrets in plain text in the pipeline configuration

Use a single long-lived service account for all pipeline steps

Scan container images for vulnerabilities before deployment

Identifies known CVEs in images.

Apply least-privilege IAM roles to pipeline components

Minimizes blast radius in case of compromise.

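Why: Signing container images (e.g., using Cosign or Notary) and verifying those signatures during deployment ensures that only trusted, unmodified images are deployed, preventing supply chain attacks. This practice enforces image integrity and provenance, which is a core security requirement for cloud native CI/CD pipelines. Options D and E are also correct: scanning images before deployment identifies known CVEs, and least-privilege roles for pipeline components minimize the blast radius if one of them is compromised.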


Frequently asked questions

How many questions are on the KCNA exam?

The KCNA exam consists of 60 multiple-choice questions, completed within 90 minutes. There are no hands-on lab tasks. Courseiva practice questions cover the underlying concepts.

What types of questions appear on the KCNA exam?

Multiple-choice questions that test conceptual knowledge of Kubernetes and the wider cloud native ecosystem.

How are KCNA questions organised by domain?

The exam covers 5 domains: Kubernetes Fundamentals, Container Orchestration, Cloud Native Architecture, Cloud Native Observability, Cloud Native Application Delivery. Questions are weighted by domain — higher-weight domains appear more on your actual exam.

Are these the actual KCNA exam questions?

No. These are original exam-style practice questions written against the official CNCF KCNA exam objectives. They are not copied from the real exam. Courseiva focuses on genuine understanding, not memorisation of braindumps.

Ready to practice all 60 KCNA questions?

Courseiva tracks your accuracy per domain and routes you toward weak areas automatically. Free, no account required.

CNCF · Free Practice Questions · Last reviewed May 2026


30 real exam-style questions organised by domain, each with the correct answer highlighted and a plain-English explanation of why it's right, and why the others are wrong.

60 exam questions

90 min time limit

5 exam domains


Domain 1: Kubernetes Fundamentals

All Kubernetes Fundamentals questions

A developer deploys a pod that continuously restarts. 'kubectl describe pod' shows the container exits with code 137. What is the most likely cause?

The container is exceeding its memory limit and being OOM-killed.

Exit code 137 indicates SIGKILL, often from OOM.

The liveness probe is failing and restarting the container.

The init container is failing and blocking the main container.

The pod is hitting a resource quota limit at the namespace level.

An application requires a unique identifier per replica, stored in an environment variable. Which Kubernetes resource should be used to inject this identifier into each pod without manual updates?

Deployment with pod anti-affinity to schedule each pod on a different node.

StatefulSet with an environment variable derived from the pod name.

StatefulSet pods have stable, unique names (e.g., myapp-0).

DaemonSet with a node name environment variable.

Job with a completion index environment variable.

All nodes have disk pressure.

All nodes are unreachable or have been cordoned.

The taint indicates nodes are unreachable.

The pod has a toleration that matches the taint.

The nodes do not have enough CPU or memory.

A team wants to minimize downtime during a Deployment rollout. Which strategy ensures that new pods are created before old pods are terminated?

Set strategy type to 'Recreate'.

Set strategy type to 'RollingUpdate' with maxSurge=0, maxUnavailable=1.

Set strategy type to 'RollingUpdate' with maxSurge=1, maxUnavailable=0.

New pods are created first, ensuring zero downtime.

Set strategy type to 'RollingUpdate' with maxSurge=1, maxUnavailable=1.

A pod in a ReplicaSet is failing with 'CrashLoopBackOff'. 'kubectl logs pod' shows 'Error: listen tcp :8080: bind: address already in use'. What is the most likely cause?

The readiness probe is misconfigured.

The container image is missing the application binary.

The container's process is not terminating quickly enough on SIGTERM, causing a port conflict on restart.

Old process still holds the port.

The pod is using hostPort and two pods on the same node conflict.

Which TWO of the following are valid ways to expose a set of pods as a network service within a Kubernetes cluster?

Create a StatefulSet with pod hostnames.

Create a Service of type ExternalName.

Create a ConfigMap with pod IPs.

Create a Service of type ClusterIP.

ClusterIP exposes pods internally.

Create an Ingress resource that routes to a Service.

Ingress exposes HTTP/HTTPS to Services.

Want more Kubernetes Fundamentals practice?

All Container Orchestration questions

Domain 2: Container Orchestration

Headless Service

Service with sessionAffinity: ClientIP

This configuration ensures requests from the same client IP go to the same pod.

Ingress with default settings

Deployment with hostNetwork: true

Switch to Flannel with host-gw backend

Use Calico with iptables mode

Use an eBPF-based CNI plugin like Cilium

eBPF bypasses iptables, reducing latency and improving scalability.

Apply a default-deny NetworkPolicy

A developer wants to ensure that a pod runs only on nodes with SSDs. Which mechanism should be used?

Apply a taint to nodes without SSDs and add tolerations to the pod

Use pod anti-affinity

Add a nodeSelector with disktype: ssd

nodeSelector ensures pods are scheduled on nodes with the specified label.

Define a ResourceQuota
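
The nodeSelector rule above is an exact-match test on node labels: every key/value pair in the selector must be present on the node. A minimal sketch of that check follows, assuming hypothetical node names and labels; the real scheduler also weighs resources, taints, and affinity, which this ignores.

```python
def matches_node_selector(node_labels, node_selector):
    """True when every selector key/value pair appears in the node's labels."""
    return all(node_labels.get(key) == value
               for key, value in node_selector.items())


# Hypothetical cluster: node names and labels are made up for illustration.
nodes = {
    "node-a": {"disktype": "ssd", "zone": "eu-1"},
    "node-b": {"disktype": "hdd", "zone": "eu-1"},
    "node-c": {"zone": "eu-2"},  # no disktype label at all
}

selector = {"disktype": "ssd"}
eligible = [name for name, labels in nodes.items()
            if matches_node_selector(labels, selector)]
print(eligible)
```

A node without the label is excluded just like a node with a different value, so unlabeled nodes never receive the pod.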

An application running in a Kubernetes pod needs to access a database that is deployed on a VM outside the cluster. The database IP is stable. Which is the best way to expose the database to the pod?

Expose the database via Ingress

Create a Service of type ExternalName pointing to the database hostname

An ExternalName Service maps a cluster DNS name to an external hostname by returning a CNAME record.

Use a Headless Service

Create an EndpointSlice manually with the pod IP

The ReplicaSet is paused

The pod template references an invalid image pull secret

An invalid image pull secret causes pods to fail with ImagePullBackOff, which reduces the ready count.

The nodeSelector does not match any node

A ResourceQuota in the namespace limits the number of pods

Which TWO of the following are valid ways to expose a set of pods as a network service in Kubernetes?

Service of type NodePort

NodePort exposes the service on each node's IP at a static port.

NetworkPolicy

Service of type ClusterIP

ClusterIP exposes the service on a cluster-internal IP.

Ingress resource

Deployment with replicas

Domain 3: Cloud Native Architecture

Adopt CQRS pattern to separate reads and writes

Use the strangler fig pattern to gradually replace monolith functionality

Allows incremental migration with minimal risk.

Implement database-per-service pattern

Deploy a sidecar container for each service

A cloud-native application is designed with multiple microservices that need to handle a sudden spike in traffic without manual intervention. Which Kubernetes feature best enables this?

VerticalPodAutoscaler

Cluster Autoscaler

HorizontalPodAutoscaler

Automatically scales pod replicas based on CPU/memory or custom metrics.

PodDisruptionBudget
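
How the HorizontalPodAutoscaler picks a replica count can be sketched with the formula from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The function below is a simplified illustration, not the controller's implementation. The name `desired_replicas` is an assumption, and it ignores min/max replica bounds, stabilization windows, and pods with missing metrics. The 0.1 tolerance is the documented default.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     tolerance=0.1):
    """Simplified HPA calculation for a single metric."""
    ratio = current_metric / target_metric
    # No scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * current_metric / target_metric)


# A traffic spike doubles average CPU utilization from the 50% target to 100%.
print(desired_replicas(current_replicas=4, current_metric=100, target_metric=50))
```

With 4 replicas running at twice the target utilization, the sketch asks for 8 replicas, with no manual intervention required.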

Deploy a single Kubernetes cluster spanning all regions

Use a global load balancer with active-passive regional failover

Simpler to implement and manage while ensuring failover.

Run active-active in all regions with synchronous data replication

Implement manual failover procedures documented in runbooks

A microservice logs errors when connecting to the database. The logs show 'connection refused'. Which troubleshooting step should be taken first?

Verify the database Service and Endpoints in Kubernetes

Confirms whether the database Service exists and has ready endpoints behind it.

Scale up the microservice deployment

Restart the microservice pod

Check the logs of other microservices

Which practice is a key principle of cloud-native architecture?

Automated CI/CD pipelines

Enables rapid and reliable deployments.

Manual configuration management

Tight coupling of services

Preferring stateful applications over stateless

A cloud-native application uses a service mesh (Istio) for traffic management. The team notices increased latency in inter-service communication. Which likely cause should be investigated first?

Kubernetes Network Policies blocking traffic

Misconfigured sidecar proxy settings

The sidecar proxy sits on every request path, so a misconfiguration can add significant latency.

Application code is not optimized for the mesh

mTLS encryption overhead

Domain 4: Cloud Native Observability

Use kubectl describe pod to check recent events

Query Prometheus for kubelet metrics on probe successes and failures

Kubelet metrics such as 'prober_probe_total', labeled by probe type and result, show probe outcomes over time and help identify intermittent failures.

Increase log verbosity in the application to capture all requests

Enable distributed tracing across the service mesh

Deploy a single Prometheus instance with namespace labels on all metrics

Use a global Prometheus with recording rules to aggregate per-namespace metrics

Have each tenant deploy their own monitoring stack and view separately

Deploy a Prometheus instance per tenant and use Thanos to aggregate metrics globally

Per-tenant Prometheus instances keep each tenant's metrics isolated, and Thanos aggregates them into a single global query view.

A Kubernetes administrator is troubleshooting a pod that is stuck in CrashLoopBackOff. The pod's restart count is increasing. Which initial step should the administrator take to diagnose the issue?

Run 'kubectl describe pod <pod-name>' to check events

Check the Prometheus metrics for the pod's CPU usage

Run 'kubectl exec -it <pod-name> -- /bin/sh' to inspect the container

Run 'kubectl logs <pod-name>' to view the application logs

Logs often contain error messages that explain why the application is crashing.
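
For context on why the restart count climbs more slowly over time: the kubelet restarts a crashing container with an exponential back-off (10s, 20s, 40s, and so on) capped at 5 minutes. A minimal sketch of that delay sequence follows; the function name `crashloop_backoff_delays` is an assumption for illustration, and the sketch leaves out the reset that happens after a container runs cleanly for a while.

```python
def crashloop_backoff_delays(restarts, base_seconds=10, cap_seconds=300):
    """Delay before each restart: doubles from 10s and is capped at 300s."""
    return [min(base_seconds * 2 ** attempt, cap_seconds)
            for attempt in range(restarts)]


print(crashloop_backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

The growing gaps explain why 'kubectl logs <pod-name> --previous' is often useful: the current container may not have started yet, while the previous attempt's logs still hold the error.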

histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[1m])) > 0.5

histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 0.5

Calculates the 99th percentile from the per-second bucket rates over a 5-minute window, then compares it to 0.5 seconds.

avg(rate(http_request_duration_seconds_bucket[5m])) > 0.5

max(rate(http_request_duration_seconds_bucket[5m])) > 0.5
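
To see why histogram_quantile is the right function here while avg and max over bucket rates are not, it helps to sketch what it does: it finds the bucket that contains the requested rank and interpolates linearly inside it. The code below is a simplified approximation of that behavior, not Prometheus's implementation, and the bucket bounds and counts are made-up sample data.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
    with the final bucket being +Inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation inside the matching bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical request-duration buckets: (le in seconds, cumulative count).
sample = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p99 = histogram_quantile(0.99, sample)
print(p99, p99 > 0.5)
```

For this sample the 99th percentile lands in the 0.5s to 1.0s bucket, so the alert condition `> 0.5` would be true. Averaging or taking the maximum of the bucket rates yields a request rate, not a latency, so comparing it to 0.5 seconds is meaningless.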

Which TWO of the following are best practices for structuring log output in cloud-native applications to maximize observability?

Include verbose debug-level information in every log line

Use multi-line log entries for detailed error information

Output logs in structured format such as JSON

Structured logs are machine-parseable and easily ingested by log aggregators.

Include a unique request or correlation ID in each log entry

Correlation IDs help trace requests across microservices.

Avoid timestamps to reduce log size
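
The two recommended practices above, structured output and a correlation ID, can be combined in a few lines. This is a minimal sketch using only the standard library; the field names and the helper `make_log_line` are assumptions, not a standard schema.

```python
import json
import uuid
from datetime import datetime, timezone


def make_log_line(level, message, correlation_id, **fields):
    """Build one single-line JSON log entry with a timestamp and correlation ID."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "correlation_id": correlation_id,
    }
    record.update(fields)  # extra context such as service name or order ID
    return json.dumps(record)


# One ID is generated per incoming request and passed to every downstream call.
correlation_id = str(uuid.uuid4())
print(make_log_line("error", "payment failed", correlation_id, service="checkout"))
```

Because each entry is a single JSON object on one line, a log aggregator can parse it without multi-line handling, and filtering on the correlation ID returns every entry for one request across services.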

Which THREE of the following are valid use cases for distributed tracing in a microservices architecture?

Monitoring CPU and memory usage of each service instance

Understanding the dependency graph between microservices

Traces reveal service call relationships.

Pinpointing the root cause of an error in a distributed transaction

Tracing shows where errors occur in the flow.

Identifying which service contributes the most latency to an end-user request

Tracing shows time spent in each span.

Capturing detailed error messages and stack traces

Domain 5: Cloud Native Application Delivery

A startup wants to minimize downtime during application updates in Kubernetes. Which deployment strategy should they use?

RollingUpdate

Replaces pods incrementally, maintaining availability.

Canary

Blue/Green

Recreate

A DevOps engineer notices that after a Helm upgrade, the new pods are stuck in 'ImagePullBackOff'. What is the most likely cause?

The pod's liveness probe is misconfigured

The Helm chart has a wrong image tag

A mistyped or non-existent tag leads to pull failures.

The service account lacks permissions

The deployment's resource requests exceed node capacity

A single branch with all environment manifests in the same folder

Separate repositories per environment

Store all manifests in a single file with environment labels

A monorepo with a directory per environment and overlays for differences

Standard GitOps pattern; clear separation with shared base and overlays.

A user reports that a ConfigMap update is not reflected in running pods. Which action should be taken to ensure pods receive the updated configuration?

Perform a rollout restart of the deployment

Replaces the pods with new ones, which read the updated ConfigMap values at startup.

Delete and recreate the ConfigMap

Edit the deployment and change a label

Restart the kubelet on the nodes
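
A rollout restart works because any change to the pod template triggers a new rollout. 'kubectl rollout restart' achieves this by stamping the 'kubectl.kubernetes.io/restartedAt' annotation on the template. The sketch below builds an equivalent patch body as a plain dictionary; the helper name `rollout_restart_patch` and the exact timestamp format are assumptions, and nothing is sent to a cluster.

```python
import json
from datetime import datetime, timezone


def rollout_restart_patch(now=None):
    """Patch body that changes the pod template so the Deployment rolls its pods."""
    now = now or datetime.now(timezone.utc)
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Changing this value alters the pod template hash.
                        "kubectl.kubernetes.io/restartedAt":
                            now.isoformat(timespec="seconds"),
                    }
                }
            }
        }
    }


print(json.dumps(rollout_restart_patch(), indent=2))
```

Deleting and recreating the ConfigMap, by contrast, leaves the pod template untouched, so no new pods are created.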

Which TWO of the following are benefits of using Helm for application delivery?

Automatic scaling based on CPU usage

Ability to roll back to previous releases

Helm tracks release revisions and supports rollback with 'helm rollback'.

Automatic canary deployments

Simplified packaging and templating of Kubernetes resources

Helm charts use Go templates to parameterize manifests.

Built-in monitoring and alerting

Which THREE of the following practices are essential for a secure cloud native CI/CD pipeline?

Sign container images and verify signatures during deployment

Ensures image integrity and authenticity.

Store secrets in plain text in the pipeline configuration

Use a single long-lived service account for all pipeline steps

Scan container images for vulnerabilities before deployment

Identifies known CVEs in images.

Apply least-privilege IAM roles to pipeline components

Minimizes blast radius in case of compromise.
