CCNA Rolling Updates Questions

45 questions · Rolling Updates topic · All types, answers revealed

1
MCQ · medium

An administrator uses a rolling update strategy with serial: 3 and max_fail_percentage: 20. They have 10 hosts in the inventory. The first batch of 3 hosts: 2 succeed, 1 fails. What happens next?

A. The playbook continues with the next batch of 3 hosts.
B. The playbook retries the failed host, then continues.
C. The playbook marks the failed host as unreachable and continues.
D. The playbook aborts and no further hosts are updated.
Answer: D

Correct. The failure percentage in the batch exceeded max_fail_percentage.

Why this answer

Option D is correct because, when `max_fail_percentage` is used together with `serial`, Ansible evaluates the threshold against each batch rather than against the whole inventory. In the first batch of 3 hosts, 1 failure is about 33% of the batch, which exceeds the 20% limit.

Ansible aborts the play as soon as the failure percentage in a batch exceeds `max_fail_percentage`, so the remaining 7 hosts are never updated. The percentage must be exceeded, not merely equaled, for the abort to trigger.
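The batch arithmetic can be checked with a small Python sketch. This is only a model of the documented rule (the threshold applies per batch and must be strictly exceeded), not Ansible's own code; the function name `batch_exceeds_threshold` is made up for illustration.

```python
def batch_exceeds_threshold(failed: int, batch_size: int, max_fail_percentage: float) -> bool:
    """Return True when failures in one batch strictly exceed the allowed percentage."""
    return failed / batch_size > max_fail_percentage / 100.0


# Question scenario: serial: 3, max_fail_percentage: 20, one failure in the batch.
print(batch_exceeds_threshold(1, 3, 20))   # 1/3 is about 33%, above 20%

# Equaling the threshold is not enough: 2 of 4 hosts is exactly 50%.
print(batch_exceeds_threshold(2, 4, 50))
print(batch_exceeds_threshold(2, 4, 49))
```

The last two calls illustrate why a threshold of 49 rather than 50 is needed to abort when half of a four-host batch fails.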

Exam trap

The trap here is that candidates measure the single failure against the whole inventory (1 of 10 hosts, or 10%) and conclude that the playbook can continue, when `max_fail_percentage` is evaluated per batch and 1 of 3 hosts already exceeds 20%.

How to eliminate wrong answers

Option A is wrong because 1 failure out of 3 hosts (about 33%) already exceeds the `max_fail_percentage` of 20% for the batch, so the play does not move on to the next batch. Option B is wrong because Ansible does not automatically retry failed hosts in a rolling update; retries happen only when a task defines them explicitly. Option C is wrong because a task failure is not the same as an unreachable host, and relabeling the host would not change the fact that the batch exceeded the threshold, so the play still aborts.

2
MCQ · medium

During a rolling update using an Ansible playbook with serial: 2, one host in the first batch becomes unreachable. The playbook fails with an unreachable host error. How should the administrator proceed to complete the update on the remaining hosts while excluding the problematic host?

A. Use 'ansible-playbook playbook.yml --forks 1' to slow down the update.
B. Use 'ansible-playbook playbook.yml --limit all:!hostname' to exclude the unreachable host.
C. Add 'any_errors_fatal: false' to the playbook and rerun.
D. Rerun the playbook with the same command; it will skip the unreachable host automatically.
Answer: B

Correct. --limit can exclude the failed host using the '!' operator.

Why this answer

Option B is correct because the `--limit` flag with the pattern `all:!hostname` uses Ansible's inventory host pattern syntax to exclude a specific host from the playbook run. This allows the administrator to rerun the playbook against all hosts except the unreachable one, completing the rolling update without re-attempting the failed host. The `serial: 2` setting still applies on the rerun; it simply batches the remaining hosts two at a time. In most shells the pattern should be quoted (for example `'all:!hostname'`) so that `!` is not interpreted by the shell.
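As a rough illustration of the exclusion pattern, the Python sketch below builds the rerun command as an argument list and prints a shell-safe rendering. The host name `web3` and the file name `playbook.yml` are placeholders, and the command is only printed, never executed.

```python
import shlex

# Hypothetical host to exclude; substitute the real inventory name.
excluded_host = "web3"

# Pattern "all:!web3" means: every host in the inventory except web3.
args = ["ansible-playbook", "playbook.yml", "--limit", f"all:!{excluded_host}"]

# shlex.join quotes any argument containing shell-special characters such as "!".
command = shlex.join(args)
print(command)
```

Passing the list form (`args`) to `subprocess.run` would avoid shell quoting altogether, which is one reason to prefer it when a wrapper script launches the playbook.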

Exam trap

The trap here is that candidates assume Ansible automatically retries or skips unreachable hosts on subsequent runs, when in fact it will fail again unless the host is explicitly excluded using `--limit` or the connectivity issue is resolved.

How to eliminate wrong answers

Option A is wrong because `--forks 1` reduces the number of parallel connections to 1, which slows down execution but does not exclude the unreachable host; the playbook will still fail when it attempts to connect to that host. Option C is wrong because `any_errors_fatal: false` is already the default, so adding it changes nothing; the unreachable host is still attempted and still reported as unreachable. Option D is wrong because Ansible does not automatically skip unreachable hosts on a rerun; the playbook will fail again on the same host unless it is explicitly excluded or the connectivity issue is resolved.

3
Drag & Drop · medium

Drag and drop the steps to configure a firewall rule using firewalld to allow HTTPS traffic in the correct order.

Why this order

Firewalld commands: check zone, add service with --permanent, reload, verify, test.

4
MCQ · easy

In OpenShift, a DeploymentConfig uses the RollingUpdate strategy. Which parameter controls the maximum number of pods that can be unavailable during an update?

A. minReadySeconds
B. maxSurge
C. revisionHistoryLimit
D. maxUnavailable
E. progressDeadlineSeconds
Answer: D

maxUnavailable sets the maximum number of pods that can be unavailable during the update.

Why this answer

In OpenShift, the RollingUpdate strategy for a DeploymentConfig uses the `maxUnavailable` parameter to specify the maximum number or percentage of pods that can be unavailable during the update process. This ensures that the desired number of pods remain available to serve traffic while the update rolls out, controlling the trade-off between update speed and availability.

Exam trap

The trap here is that candidates often confuse `maxUnavailable` with `maxSurge`, mistakenly thinking that controlling how many extra pods are created is the same as controlling how many can be unavailable, but `maxSurge` limits overshoot while `maxUnavailable` limits undershoot.

How to eliminate wrong answers

Option A is wrong because `minReadySeconds` controls how long a pod must be ready before it is considered available, not the number of unavailable pods during an update. Option B is wrong because `maxSurge` controls the maximum number of pods that can be created above the desired count during an update, not the number that can be unavailable. Option C is wrong because `revisionHistoryLimit` controls how many old ReplicationControllers are retained for rollback, not the update availability threshold.

Option E is wrong because `progressDeadlineSeconds` sets the maximum time for the deployment to make progress before it is considered failed, not the number of unavailable pods.

5
MCQ · hard

You are managing a critical web application deployed on OpenShift with 12 replicas. The application must maintain at least 10 replicas available during updates to meet an SLA. You initiate a rolling update using the default strategy, but the rollout is progressing slowly because only 2 new pods are created at a time, causing a prolonged update duration. You need to speed up the rollout without violating the SLA (10 available replicas). The current Deployment configuration has maxSurge: 25% (3 pods) and maxUnavailable: 25% (3 pods). You have permission to update the DeploymentConfig during the active rollout. Which action should you take?

A. Cancel the current rollout, set maxSurge to 4 and maxUnavailable to 2, then start a new rollout.
B. Set maxSurge to 50% and maxUnavailable to 50%, then run 'oc rollout resume'.
C. Set maxSurge to 5 and maxUnavailable to 2, then run 'oc rollout retry'.
D. Set maxSurge to 10 and maxUnavailable to 2, then run 'oc rollout resume'.
Answer: D

maxUnavailable=2 ensures at most 2 pods down, keeping 10 available. maxSurge=10 allows more new pods in parallel, speeding the rollout. Modifying during rollout is valid; resume continues the update.
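The availability arithmetic behind this answer can be sketched in Python. The helper below is an illustrative model with absolute pod counts, not part of any OpenShift tooling, and the name `rollout_bounds` is invented for the example.

```python
from typing import Tuple


def rollout_bounds(replicas: int, max_surge: int, max_unavailable: int) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rolling update."""
    min_available = replicas - max_unavailable
    max_total = replicas + max_surge
    return min_available, max_total


# Starting configuration from the question: 12 replicas, 3 surge, 3 unavailable.
print(rollout_bounds(12, 3, 3))

# Option D: maxSurge 10, maxUnavailable 2.
print(rollout_bounds(12, 10, 2))
```

Under this model the starting configuration can dip to 9 available pods, while option D keeps the floor at 10 and raises the ceiling on concurrently running pods, which is what lets more new pods start in parallel.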

6
Multi-Select · medium

A company uses Ansible to perform a rolling update of 10 web servers behind an HAProxy load balancer. The playbook uses the `serial` keyword and includes tasks to disable a host from the load balancer, update the web server package, and re-enable the host. Which TWO best practices should the administrator apply to minimize downtime and ensure a successful rolling update?

Select 2 answers
A. Use `any_errors_fatal: true` to stop the playbook if any host fails.
B. Set `serial: 1` to update one host at a time.
C. Use `throttle: 1` to limit the number of concurrent tasks across all hosts.
D. Ensure the load balancer draining timeout is longer than the maximum expected update time per host.
E. Use `async` and `poll` to run the update tasks in the background while proceeding to the next host immediately.
Answers: B, D

Updating one host at a time minimizes the impact on the load balancer pool and ensures continuous service availability.

Why this answer

Option B is correct because setting `serial: 1` updates one host at a time, ensuring that at most one host is out of the pool, preserving capacity. Option D is correct because ensuring the load balancer draining timeout is longer than the update time prevents the host from being prematurely re-enabled before the update completes. Option A is wrong because `any_errors_fatal: true` would stop the entire update on the first failure, which is too aggressive for a rolling update where failures on individual hosts can be tolerated.

Option C is wrong because `throttle: 1` limits concurrency of a single task across hosts, but does not control batch size; it can coexist with `serial` but is not a best practice for rolling update orchestration. Option E is wrong because `async` and `poll` are used for long-running tasks that should run in the background, not for sequential batch updates.

7
MCQ · easy

The deployment has 4 replicas. During a rolling update, what is the maximum number of pods that can be unavailable at any single time?

A. 2
B. 4
C. 0
D. 3
E. 1
Answer: E

25% of 4 replicas is 1 pod.

Why this answer

Option E is correct because the default rolling update strategy for a Deployment in Kubernetes sets `maxUnavailable` to 25% of the desired replicas, rounded down. With 4 replicas, 25% equals exactly 1, so at most 1 pod can be unavailable during the update. This ensures minimal disruption while allowing the update to proceed.

Exam trap

The trap here is that candidates often assume the default `maxUnavailable` is a fixed 1 regardless of replica count, but it is actually 25% of the desired replicas, rounded down: with 4 replicas it is 1, with 5 replicas it is still 1 (25% of 5 = 1.25, rounded down to 1), and with 8 replicas it is 2. The default `maxSurge` is also 25% but is rounded up instead.

How to eliminate wrong answers

Option A is wrong because 2 would correspond to a `maxUnavailable` of 50%, which is not the default and would require explicit configuration; the default is 25%. Option B is wrong because 4 would mean all pods could be unavailable simultaneously, which contradicts the rolling update strategy's goal of maintaining availability; this would be a recreate strategy. Option C is wrong because 0 would require `maxUnavailable` to be set to 0, which is possible but not the default; the default allows some unavailability to speed up the update.

Option D is wrong because 3 would correspond to 75% unavailability, which is not the default and would cause excessive disruption; the default is 25%.
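The percentage-to-pod conversion can be sketched in Python. This models the rounding rules described in the Kubernetes documentation (`maxUnavailable` rounds down, `maxSurge` rounds up); the function names are invented for illustration and are not a Kubernetes API.

```python
def resolve_max_unavailable(percent: int, replicas: int) -> int:
    """Percentage of desired replicas, rounded down."""
    return (percent * replicas) // 100


def resolve_max_surge(percent: int, replicas: int) -> int:
    """Percentage of desired replicas, rounded up (ceiling via negated floor division)."""
    return -((-percent * replicas) // 100)


# Default 25% applied to a few replica counts.
for replicas in (4, 5, 12):
    print(replicas, resolve_max_unavailable(25, replicas), resolve_max_surge(25, replicas))
```

Integer arithmetic is used so that the rounding direction is explicit and no floating-point edge cases creep in.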

8
Multi-Select · medium

Which THREE commands can verify the status of a rolling update on an OpenShift deployment?

Select 3 answers
A. oc describe deployment/<name>
B. oc logs deployment/<name>
C. oc get events
D. oc get pods -l <selector>
E. oc rollout status deployment/<name>
Answers: A, C, E

Shows conditions like Progressing and Available.

Why this answer

Option A is correct because `oc describe deployment/<name>` provides detailed information about the deployment, including the current rollout status, revision history, and conditions such as Progressing or Available. This allows you to verify whether a rolling update is in progress, has completed, or has stalled by examining fields like `Replicas`, `UpdatedReplicas`, and `Conditions`.

Exam trap

The trap here is that candidates often confuse `oc logs` with `oc rollout status` or assume that listing pods alone is sufficient to determine rollout health, but only `oc rollout status` and `oc describe deployment` provide the authoritative rollout state, while `oc get events` reveals cluster-level events that may indicate rollout issues.

9
MCQ · easy

An Ansible playbook uses the 'deployment' resource with a 'rolling_update' strategy. Which module is typically used to manage this in a Kubernetes/OpenShift environment?

A. ansible.builtin.copy
B. ansible.builtin.command
C. kubernetes.core.k8s
D. ansible.builtin.shell
E. ansible.builtin.service
Answer: C

The k8s module manages Kubernetes resources including deployments with rollout strategies.

Why this answer

The `kubernetes.core.k8s` module is the correct choice because it directly manages Kubernetes/OpenShift resources, including Deployments, and supports rolling update strategies natively. It interacts with the Kubernetes API to apply declarative configurations, handling the rolling update logic (e.g., `maxSurge`, `maxUnavailable`) without requiring manual orchestration via shell commands or file copies.

Exam trap

The trap here is that candidates may confuse managing system services (via `service`) or executing kubectl commands (via `command`/`shell`) with the proper Ansible module designed for Kubernetes resource orchestration, leading them to pick a generic module instead of the domain-specific `kubernetes.core.k8s`.

How to eliminate wrong answers

Option A is wrong because `ansible.builtin.copy` is used to copy files to the managed node's filesystem, not to interact with Kubernetes API resources. Option B is wrong because `ansible.builtin.command` executes arbitrary commands on the target host but lacks idempotency and native Kubernetes resource management, making it unsuitable for rolling updates. Option D is wrong because `ansible.builtin.shell` is similar to `command` but runs through a shell, introducing parsing risks and still not managing Kubernetes objects.

Option E is wrong because `ansible.builtin.service` manages system services (e.g., systemd) on the managed node, not container orchestration resources like Deployments.

10
MCQ · hard

You are managing a rolling update of a 10-node web application cluster using Ansible. The application requires that at least 8 nodes remain available during the update to handle traffic. You have written a playbook that uses serial: 2 (updates 2 nodes at a time). During a test run, the playbook updates the first batch of 2 nodes successfully, but when it proceeds to the second batch, one of the nodes fails to restart the web service. However, the playbook continues and updates the remaining nodes. At the end, only 7 nodes are healthy, causing performance degradation. You need to ensure that if a batch fails to meet the minimum health requirements, the entire rollout is stopped and no further updates are applied. Which course of action should you take?

A. Add a retry loop to the service restart task with a delay and count of 5.
B. Set ignore_errors: yes on the service restart task to avoid failures stopping the playbook.
C. Use the 'throttle' keyword with a rolling update strategy that includes a post-task health check and set max_fail_percentage to a value that aborts if the healthy node count drops below 8.
D. Increase serial to 3 to complete the update faster and reduce the chance of node failures.
Answer: C

throttle and max_fail_percentage combined can enforce health thresholds and abort the rollout when conditions are not met.

Why this answer

Option C is correct because it uses the `throttle` keyword with a rolling update strategy that includes a post-task health check and sets `max_fail_percentage` to abort the playbook if the healthy node count drops below 8. This ensures that if a batch fails to meet the minimum health requirements, the entire rollout is stopped and no further updates are applied, preventing performance degradation.

Exam trap

The trap here is that candidates often confuse retry mechanisms or error handling with the need for a batch-level health check and abort logic, assuming that retrying a failed task or ignoring errors will somehow prevent the overall rollout from continuing when health thresholds are breached.

How to eliminate wrong answers

Option A is wrong because a retry loop on the service restart task only retries the failed task on the same node; it does not stop the overall rollout or check the cluster-wide health status after each batch. Option B is wrong because `ignore_errors: yes` would cause the playbook to continue despite the failure, which is the opposite of stopping the rollout when health requirements are not met. Option D is wrong because increasing `serial` to 3 would update more nodes per batch, potentially causing even more nodes to be unhealthy at once and increasing the risk of dropping below the minimum of 8 healthy nodes.

11
Multi-Select · easy

Which TWO of the following are valid 'serial' values for an Ansible rolling update playbook?

Select 2 answers
A. serial: [1,2,3]
B. serial: 50%
C. serial: 1
D. serial: '25%'
E. serial: 'batch'
Answers: B, C

A percentage is a valid value.

Why this answer

Option B is correct because Ansible's `serial` keyword accepts a percentage value to control how many hosts are updated at a time during a rolling update. `serial: 50%` tells Ansible to run each batch against half of the hosts in the play, which is a valid and commonly used pattern for gradual rollouts. Option C is correct because a positive integer is the most basic form: `serial: 1` updates one host per batch.

Exam trap

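The trap here is that candidates confuse valid `serial` syntax with arbitrary strings such as 'batch', which Ansible cannot interpret as a batch size. Note that Ansible also accepts a quoted percentage such as `'25%'` (YAML reads the bare and quoted forms as the same string) and a list of batch sizes such as `[1, 2, 3]`, so options A and D work in practice even though the answer key lists only B and C.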
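How the different `serial` forms translate into batches can be sketched in Python. This is a simplified model written for illustration, assuming percentages are taken of the total host count, rounded down with a minimum of one host, and that the last entry of a list repeats until all hosts are covered; it is not Ansible's implementation, and the function names are hypothetical.

```python
from typing import List, Union

SerialValue = Union[int, str]


def resolve_batch_size(value: SerialValue, total_hosts: int) -> int:
    """Turn one serial entry (integer or 'N%') into a host count."""
    if isinstance(value, str) and value.endswith("%"):
        # Percentage of all hosts, rounded down, but never less than 1.
        return (int(value[:-1]) * total_hosts) // 100 or 1
    return int(value)


def batch_sizes(total_hosts: int, serial: Union[SerialValue, List[SerialValue]]) -> List[int]:
    """Return the size of each batch for a given serial setting."""
    spec = serial if isinstance(serial, list) else [serial]
    sizes: List[int] = []
    remaining = total_hosts
    index = 0
    while remaining > 0:
        size = resolve_batch_size(spec[index], total_hosts)
        if size <= 0:
            # Non-positive value: everything left goes into one batch.
            sizes.append(remaining)
            break
        size = min(size, remaining)
        sizes.append(size)
        remaining -= size
        # The last list entry is reused for all later batches.
        index = min(index + 1, len(spec) - 1)
    return sizes


print(batch_sizes(10, 1))
print(batch_sizes(10, "50%"))
print(batch_sizes(10, "25%"))
print(batch_sizes(10, [1, 2, 3]))
```

A value such as `'batch'` has no numeric interpretation in this model (`int("batch")` raises `ValueError`), which mirrors why option E is not usable.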

12
MCQ · hard

An organization uses Ansible Automation Platform to perform rolling updates on a 5-node PostgreSQL replication cluster. The playbook uses `serial: 1` and includes tasks to promote a standby, demote the primary, update PostgreSQL packages, and then re-elect the original primary. The cluster health check task verifies that replication lag is under 10 seconds before proceeding to the next node. Recently, during an update of the primary node (node1), the health check after re-election fails because replication lag is 15 seconds due to a large write load. The playbook aborts, leaving the cluster in a degraded state with node1 updated but not serving as primary. The administrator needs to ensure that the update continues while still maintaining cluster integrity. Which action should the administrator take?

A. Wrap the update tasks in a block with a rescue handler that reverts the update on the failed host and then continues with the next host.
B. Add `ignore_errors: yes` to the health check task so the playbook continues despite the failure.
C. Remove the health check task to allow the update to proceed without interruption.
D. Set `max_fail_percentage: 20` to allow up to one failure per update run.
Answer: A

This ensures that if any node fails the health check, the changes on that node are rolled back, and the update proceeds with the next node, preserving cluster integrity.

Why this answer

Option A is correct because using a rescue block allows reverting the update on node1 (e.g., demote it back to standby, re-promote the original standby) so the cluster returns to its original state, and then continue with the next node. This ensures no node remains in an inconsistent state and the update can proceed. Option C is wrong because removing the health check could allow a broken cluster state to go undetected, risking data integrity.

Option B is wrong because `ignore_errors` would continue without reverting the failed node, leaving node1 in a half-updated state that could cause issues. Option D is wrong because with `serial: 1` each batch contains a single host, so one failure is 100% of the batch and still exceeds `max_fail_percentage: 20`, aborting the play; even if it continued, it would not revert the failed node, leading to inconsistencies.

13
MCQ · medium

A team uses Ansible to update a database cluster with one primary and two replicas. The goal is zero downtime. Which update order is the safest?

A. Update replicas first, then the primary.
B. Update in random order.
C. Update all nodes simultaneously.
D. Update the primary first, then replicas.
Answer: A

Correct. Replicas are updated first, then the primary after confirmation.

Why this answer

Updating replicas first ensures that if the update introduces a regression, it affects only the read-only replicas, which can be quickly rolled back without impacting write availability. Once the replicas are confirmed healthy, a controlled switchover (e.g., using `patronictl switchover` or `repmgr standby switchover`) promotes an updated replica to primary so that the former primary can be updated, keeping write downtime to seconds. This order aligns with the principle of reducing blast radius and maintaining quorum in a cluster.

Exam trap

The trap here is that candidates assume updating the primary first is safer because it is the 'source of truth,' but in a clustered environment with zero-downtime requirements, updating replicas first is the standard practice to preserve write availability and allow safe rollback.

How to eliminate wrong answers

Option B is wrong because updating in random order risks updating the primary first, causing a write outage if the update fails, and leaves no point at which the replicas can be verified before the primary is touched. Option C is wrong because updating all nodes simultaneously can cause a complete cluster outage if the update introduces a bug, and it violates the zero-downtime requirement by potentially losing quorum or causing split-brain scenarios. Option D is wrong because updating the primary first forces a failover to a replica that still runs the old version, which may be incompatible with the updated primary's data format or replication protocol, leading to replication lag or cluster instability.

14
MCQ · easy

An administrator runs this playbook against a group of 10 web servers. The update fails on the third host (host3) due to a yum error. What is the most likely outcome?

A. Only the first batch (host1 and host2) are updated successfully; the remaining hosts are skipped.
B. All hosts are disabled in HAProxy and the playbook fails.
C. The playbook continues with the remaining hosts because the number of failures (1) is below the 25% threshold.
D. The playbook halts immediately after the failure on host3.
Answer: C

With `max_fail_percentage: 25`, up to 2 failures are allowed out of 10 hosts. One failure does not stop the playbook.

Why this answer

Option C is correct because `max_fail_percentage: 25` allows up to 25% of hosts to fail before aborting. With 10 hosts, 2 failures are allowed (25% of 10 = 2.5, so 2 failures). After host3 fails, only 1 failure has occurred, which is below the threshold, so the playbook continues with the remaining batches. This reasoning measures the threshold against all 10 hosts; Ansible's documentation states that `max_fail_percentage` applies to each batch when `serial` is used, so with batches of two hosts a single failure is 50% of the batch and would exceed 25%.

Option A is wrong because the first batch (hosts 1-2) completes successfully, but the playbook continues to the next batch (hosts 3-4), and even though host3 fails, host4 may still be updated (unless it also fails). Option B is wrong because only host3's disable/enable tasks would be affected; the playbook does not disable all hosts. Option D is wrong because `max_fail_percentage` prevents an immediate halt unless the failure threshold is exceeded.

15
MCQ · hard

An organization uses Ansible Tower (AWX) for rolling updates. They have a job template that runs a playbook with serial: 5. The inventory contains 50 hosts. The update fails after the first batch due to a syntax error in a playbook. After fixing the error, the administrator wants to resume updating from where it left off without updating already successful hosts. Which approach achieves this?

A. Use the job template survey to input a list of hosts to skip, and pass it as --limit.
B. Create a new job template with a dynamic inventory subset excluding the first batch hosts.
C. Modify the playbook to check if a host has already been updated using a fact and skip it.
D. Rerun the entire playbook; Ansible will skip hosts that are already in the desired state.
Answer: A

Correct. A survey variable can be used in the extra variables or limit field to exclude specific hosts.

Why this answer

Option A is correct because Ansible Tower's job template survey can collect a list of hosts to skip, which is then passed as the `--limit` option to the playbook. This allows the administrator to resume the rolling update from the next batch by excluding the first five already-successful hosts, avoiding re-running the playbook on them.

Exam trap

The trap here is that candidates assume Ansible's idempotency will automatically skip already-updated hosts, but in practice, idempotency depends on task design and does not prevent re-execution on successful hosts, which can cause unnecessary load or side effects in rolling updates.

How to eliminate wrong answers

Option B is wrong because creating a new job template with a dynamic inventory subset is cumbersome and error-prone; it requires manual inventory management and does not leverage Tower's built-in survey mechanism. Option C is wrong because modifying the playbook to check a fact for update status is unreliable and violates idempotency best practices; it adds complexity and may not accurately reflect the host state after a failed batch. Option D is wrong because rerunning the entire playbook with serial: 5 would re-execute on the first batch hosts, potentially causing unintended side effects or requiring them to be idempotent; Ansible does not automatically skip hosts based on desired state unless the tasks are idempotent, which is not guaranteed for all operations.

16
MCQ · medium

Refer to the exhibit. The playbook uses serial: 1 (one host at a time). The update failed on web3.example.com. Based on the output, what is the most likely reason the play did not abort the rollout and how should the playbook be modified to stop on failure?

A. Add retries: 3 to the 'Update Apache config' task.
B. Set ignore_errors: yes on the 'Update Apache config' task.
C. Add max_fail_percentage: 0 to the play to abort on any failure.
D. Increase the serial value to update multiple hosts at once.
Answer: C

max_fail_percentage: 0 aborts the play if any host fails, preventing inconsistent state.

Why this answer

Option C is correct because the play uses `serial: 1` to update one host at a time, but by default Ansible continues to the next host even if a task fails on the current host. Setting `max_fail_percentage: 0` at the play level tells Ansible to abort the entire play immediately if any host fails, which is the intended behavior for a rolling update where a single failure should stop the rollout.

Exam trap

Red Hat often tests the distinction between per-task error handling (`ignore_errors`, `retries`) and play-level failure thresholds (`max_fail_percentage`), and the trap here is that candidates mistakenly think retrying a task or ignoring errors will stop the rollout, when in fact only `max_fail_percentage` controls whether the play aborts across hosts.

How to eliminate wrong answers

Option A is wrong because adding `retries: 3` to the 'Update Apache config' task would cause Ansible to retry that task up to three times on the same host, but it does not change the default behavior of continuing to the next host after a failure; the play would still proceed to web4.example.com after exhausting retries on web3.example.com. Option B is wrong because `ignore_errors: yes` would cause Ansible to treat the failure as a success and continue the rollout, which is the opposite of what is needed to stop on failure. Option D is wrong because increasing the `serial` value would update more hosts concurrently, but it does not address the core issue of aborting on failure; in fact, it could make the problem worse by allowing multiple hosts to fail before the play stops.

17
MCQ · hard

A company wants to implement a rolling update for a stateful application where hosts cannot be updated in parallel due to data consistency. They also need to ensure that if any host fails, the entire update is rolled back. Which strategy meets these requirements?

A. Use serial: 2 and any_errors_fatal: yes
B. Use serial: 1 and ignore_errors: yes
C. Use serial: 0 and max_fail_percentage: 0
D. Use serial: 1 and any_errors_fatal: yes
Answer: D

Correct. Single host updates and stop on error.

Why this answer

serial: 1 ensures one host at a time; any_errors_fatal: true causes the playbook to stop on first failure on any host, allowing rollback. The other options either allow parallel updates or don't stop on failure.

18
Multi-Select · medium

Which TWO options are valid techniques for rolling out updates to a subset of hosts before updating the rest? (Choose exactly two.)

Select 2 answers
A. Use serial: 3 to update hosts in batches of 3.
B. Use a canary group with a separate playbook run and manual verification before updating the full fleet.
C. Use inventory host variables to mark hosts for early update and use conditional tasks.
D. Use the --forks=1 option to update one host at a time.
E. Use the 'strategy: random' directive in the playbook.
Answers: A, B

Serial is the built-in rolling update mechanism.

Why this answer

Canary deployments and serial batches are common techniques. Rolling update by serial is inherent in Ansible. Random selection is not a controlled technique.

19
MCQhard

A large enterprise manages thousands of servers grouped by data center. They are designing a rolling update that must complete within a maintenance window. Which combination of Ansible strategies best minimizes total update time while maintaining safety?

A. Set serial: 0 to update all hosts simultaneously.
B. Set serial to 10% and max_fail_percentage to 25%.
C. Set forks to 100 and max_fail_percentage to 50.
D. Set serial to 1 to update one host at a time with max_fail_percentage: 0.
Answer: B

Correct. Batch size of 10% updates many hosts in parallel, and 25% failure threshold allows some failures without aborting.

Why this answer

Option B is correct because setting `serial: 10%` updates hosts in batches of 10% of the inventory, which parallelizes the update across many hosts to minimize total time, while `max_fail_percentage: 25%` provides a safety net by aborting the play if more than 25% of the batch fails, preventing a cascade of failures from taking down the entire data center. This combination balances speed and safety for large-scale rolling updates within a maintenance window.

Exam trap

The trap here is that candidates confuse `serial` with `forks` or think that `serial: 0` is a sensible way to run a rolling update, when in fact a rolling update needs `serial` set to a positive integer, a percentage, or a list of these, and `forks` only controls parallelism within a batch, not the batch size itself.

How to eliminate wrong answers

Option A is wrong because `serial: 0` does not produce a rolling update: a non-positive `serial` leaves every host in a single batch, which updates all hosts at once and eliminates rolling update safety entirely. Option C is wrong because `forks` controls how many hosts Ansible works on in parallel within a batch, not the batch size; without `serial`, the play treats all hosts as one batch, so there is no rolling update at all, and `max_fail_percentage: 50` is too permissive, allowing half of the hosts to fail before aborting. Option D is wrong because `serial: 1` updates only one host at a time, which is the slowest possible approach and will not complete within a maintenance window for thousands of servers, and `max_fail_percentage: 0` aborts on any single failure, which is overly restrictive and not necessary for safety in a rolling update.
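The time trade-off between options B and D can be estimated with a short Python sketch. The fleet size of 3000 hosts and the 5 minutes per batch are invented figures for illustration only; real per-batch time also depends on `forks` and on the tasks themselves.

```python
import math


def batch_count(total_hosts: int, batch_size: int) -> int:
    """Number of batches needed to cover all hosts."""
    return math.ceil(total_hosts / batch_size)


hosts = 3000             # hypothetical fleet size
minutes_per_batch = 5    # hypothetical duration of one batch

for label, size in (("serial: 1", 1), ("serial: 10%", hosts * 10 // 100)):
    batches = batch_count(hosts, size)
    print(f"{label}: {batches} batches, about {batches * minutes_per_batch} minutes")
```

Under these assumed numbers, one-host batches need thousands of sequential passes, while 10% batches need ten, which is the gap the question is pointing at.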

20
Multi-Select · easy

Which TWO conditions cause an Ansible rolling update playbook to abort immediately? (Choose exactly two.)

Select 2 answers
A.The failure count in a batch exceeds max_fail_percentage.
B.A task returns 'changed' when 'changed_when: false' is used.
C.A task fails on a host, and 'any_errors_fatal' is set to yes.
D.The playbook runs with the --check flag.
E.A host is unreachable due to network issues.
AnswersA, C

Correct. The playbook aborts when batch failure percentage is exceeded.

Why this answer

Option A is correct because Ansible's rolling update logic uses `max_fail_percentage` to control batch failure tolerance. If the number of failed hosts in a batch exceeds this percentage, the playbook aborts immediately to prevent cascading failures. Option C is correct because setting `any_errors_fatal: yes` causes the entire playbook to abort as soon as any task fails on any host, regardless of batch boundaries.

Exam trap

The trap here is that candidates often confuse 'unreachable' (Option E) with a fatal error, but Ansible treats unreachable hosts as failed hosts that are skipped, not as an immediate abort condition unless explicitly configured with `any_errors_fatal` or `max_fail_percentage`.

21
MCQeasy

You are managing a Kubernetes cluster running a critical stateful application deployed as a StatefulSet with 3 replicas. The application uses persistent volumes with ReadWriteOnce access mode. You need to update the container image from version 1.0 to 1.1. The application's performance degrades if more than one replica is unavailable at any time. The StatefulSet is configured with the default RollingUpdate strategy (partition=0). You have a maintenance window of 30 minutes. The update must be completed within the window with minimal risk. Which of the following approaches should you take?

A.Manually delete each pod and let the StatefulSet recreate them with the new image.
B.Use the default rolling update with partition=0; the StatefulSet will update one pod at a time automatically.
C.Set maxUnavailable to 2 to speed up the rollout, then update the image.
D.Perform a canary update by setting partition to 2, updating one pod, validating, then setting partition to 0.
AnswerB

Correct: The default RollingUpdate strategy for StatefulSet updates one pod at a time, ensuring only one replica is unavailable at any moment, meeting the requirement.

Why this answer

Option B is correct because the default RollingUpdate strategy with partition=0 updates one pod at a time, ensuring that no more than one replica is unavailable at any moment. This matches the application's constraint that performance degrades if more than one replica is unavailable, and the 30-minute window is sufficient for a rolling update of 3 pods.

Exam trap

The trap here is that candidates may pick the partition-based canary (Option D) because it sounds safer, overlooking that the default RollingUpdate strategy already updates StatefulSet pods one at a time in reverse ordinal order and waits for each pod to become Ready before moving on. The extra manual partition steps add time inside a fixed window without improving availability.

How to eliminate wrong answers

Option A is wrong because manually deleting pods bypasses the controller's ordered, readiness-gated rollout and leaves the pacing to the operator; deleting more than one pod at a time would violate the requirement that no more than one replica be unavailable. Option C is wrong because setting maxUnavailable to 2 would allow two pods to be unavailable during the update, directly violating the application's constraint. Option D is not the best choice because setting partition to 2 updates only the pod with ordinal 2, and lowering partition to 0 afterwards still updates the remaining pods one at a time; the result matches the default rollout but requires extra manual steps and validation pauses within the 30-minute window.
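
The effect of `partition` on which pods are eligible for an update can be sketched as follows. This is an illustrative model only: the function name and the `db` StatefulSet name are hypothetical, and it assumes the documented rule that pods with an ordinal greater than or equal to the partition are updated, highest ordinal first.

```python
def pods_to_update(name, replicas, partition=0):
    """Pods eligible for update: ordinals >= partition, highest ordinal first."""
    return [f"{name}-{i}" for i in range(replicas - 1, partition - 1, -1)]


# Hypothetical StatefulSet "db" with 3 replicas.
print(pods_to_update("db", 3))               # ['db-2', 'db-1', 'db-0']
print(pods_to_update("db", 3, partition=2))  # ['db-2']
```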

22
MCQhard

A DevOps engineer is responsible for coordinating a rolling update of a Red Hat OpenShift Container Platform 4.12 cluster with 10 worker nodes. The cluster hosts a stateful application that uses persistent volumes with ReadWriteOnce access mode. The update involves a minor version upgrade of the cluster from 4.12.0 to 4.12.5. The engineer uses the recommended `oc adm upgrade` command. During the update, after the first worker node is updated, the engineer notices that the node's status shows 'NotReady' and the cluster version operator reports a degraded status. A check of the node logs reveals 'kubelet: Failed to run kubelet: Could not get kubelet config from cluster: could not get config from cluster: context deadline exceeded'. Which action should the engineer take first?

A.Rebuild the node from scratch using the machine config operator.
B.Check the network connectivity between the updated node and the control plane nodes on port 6443.
C.Roll back the entire cluster to version 4.12.0 using the `oc adm upgrade --to=4.12.0` command.
D.Increase the kubelet's `--node-status-update-frequency` parameter on the updated node.
AnswerB

The error suggests a timeout, so network connectivity is the likely cause.

Why this answer

Option B is correct because the error indicates the kubelet cannot communicate with the control plane on port 6443, likely due to a network issue (e.g., firewall, DNS, or routing). Option A is wrong because rebuilding the node from scratch is unnecessary and time-consuming; the issue is likely resolvable by restoring network connectivity. Option C is wrong because rolling back the entire cluster is a drastic first step; the issue is likely isolated to one node.

Option D is wrong because changing kubelet parameters is not a standard resolution for connectivity problems.
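
A first connectivity check of the kind described above can be as simple as attempting a TCP connection to the API port. The sketch below is a minimal example using only the Python standard library; the function name is hypothetical, and it only confirms that a TCP handshake succeeds, not that TLS or authentication work.

```python
import socket


def can_reach(host, port=6443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run from the affected node against the control plane address (for example `can_reach("api-int.cluster.example.com")`, where the hostname is a placeholder), a `False` result points toward a firewall, DNS, or routing problem rather than a kubelet misconfiguration.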

23
MCQeasy

A systems administrator is performing a rolling update of a three-node Red Hat Enterprise Linux 8 cluster running a load-balanced web application. The update involves upgrading the httpd package. The administrator uses Ansible to update one node at a time. After updating the first node, the administrator checks the application health and finds that the node is serving requests correctly. The administrator proceeds to update the second node. However, after the second node update completes, the load balancer reports that both the first and second nodes are unavailable. What is the most likely cause?

A.The httpd service on both nodes experienced a segmentation fault after the update.
B.The first node was inadvertently excluded from the load balancer pool by the Ansible playbook.
C.The load balancer was configured to only allow one node to be active at a time.
D.The second node's update triggered a configuration change that disabled the health check endpoint on both nodes.
AnswerD

A shared configuration or package dependency could impact both nodes.

Why this answer

Option D is correct because the update on the second node may have triggered a configuration change (e.g., via a shared config file or package dependency) that also affected the health check endpoint on the first node. Option A is wrong because a segmentation fault is unlikely to affect both nodes simultaneously, and the first node was verified healthy after its update. Option B is wrong because if the first node had been excluded from the pool, it would have been unavailable immediately after its own update.

Option C is wrong because load balancers typically do not limit active nodes to one; they manage pools of multiple nodes.

24
MCQhard

An Ansible rolling update playbook has 'serial: 1' and 'max_fail_percentage: 0'. During the update of a 5-host group, the first host fails. What is the outcome?

A.The play pauses for manual intervention
B.The play retries the failed host
C.The play aborts immediately
D.The play continues with the remaining 4 hosts
E.The play marks the host as unreachable and continues
AnswerC

Any failure with max_fail_percentage: 0 aborts the entire play.
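
The outcome can be traced with a small simulation. This is a simplified model, not Ansible code: it assumes the threshold is checked once per batch and must be exceeded (not equaled) to abort, and the host names and function name are made up for the example.

```python
def simulate_rolling_update(hosts, serial, max_fail_percentage, failing):
    """Simulate batches; return (updated, failed, untouched) host lists."""
    updated, failed = [], []
    for start in range(0, len(hosts), serial):
        batch = hosts[start:start + serial]
        batch_failed = [h for h in batch if h in failing]
        updated += [h for h in batch if h not in failing]
        failed += batch_failed
        # Abort when the batch failure percentage exceeds the threshold.
        if len(batch_failed) / len(batch) * 100 > max_fail_percentage:
            return updated, failed, hosts[start + serial:]
    return updated, failed, []


hosts = ["web1", "web2", "web3", "web4", "web5"]
print(simulate_rolling_update(hosts, serial=1, max_fail_percentage=0,
                              failing={"web1"}))
# ([], ['web1'], ['web2', 'web3', 'web4', 'web5'])
```

With `serial: 1` and a threshold of 0, the first batch is 100% failed, so the remaining four hosts are never touched.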

25
Multi-Selecthard

Which TWO of the following are best practices when coordinating rolling updates with Ansible?

Select 2 answers
A.Define a 'max_fail_percentage' to abort the update if too many hosts fail.
B.Use the 'serial' keyword to update a subset of hosts at a time.
C.Use 'strategy: free' to allow hosts to run tasks independently.
D.Use 'gather_facts: no' to speed up the playbook.
E.Set 'any_errors_fatal: true' to stop the update on the first failure.
AnswersA, B

This ensures the update stops if a critical number of hosts fail, preventing widespread issues.

Why this answer

Option A is correct because 'max_fail_percentage' allows you to define a threshold of host failures (as a percentage of the batch size) that, when exceeded, causes Ansible to abort the entire rolling update. This prevents the update from continuing when too many hosts have failed, which could lead to an inconsistent or degraded state across the infrastructure. Option B is correct because the 'serial' keyword controls the number of hosts (or percentage) that Ansible updates in each batch, ensuring that only a subset of hosts is taken out of service at a time, which maintains overall service availability during the rolling update.

Exam trap

The trap here is that candidates often confuse 'any_errors_fatal' (which stops on the first failure globally) with 'max_fail_percentage' (which aborts only after a threshold of failures in a batch), leading them to select option E instead of A.

26
MCQeasy

An administrator notices that during a rolling update, the playbook seems to hang after updating the first host. The playbook uses serial: 5. What is the most likely cause?

A.The playbook has an infinite loop.
B.One of the hosts in the batch is taking too long to complete its tasks.
C.The SSH control path is exhausted.
D.The max_fail_percentage is set too high.
AnswerB

Correct. A slow host delays the entire batch because Ansible waits for all hosts in the batch.

Why this answer

When `serial: 5` is set, Ansible processes hosts in batches of five. If one host in the batch takes an unusually long time to complete its tasks (e.g., due to a slow network, a hanging service restart, or a long-running command), the entire batch will appear to hang because Ansible waits for all hosts in the current batch to finish before proceeding to the next batch. This is the most likely cause of the observed behavior during a rolling update.

Exam trap

Red Hat often tests the misconception that `serial` controls parallelism across all hosts (like `forks`), but the trap here is that `serial` batches hosts sequentially, so a single slow host in a batch blocks the entire batch from completing, causing the playbook to appear to hang.

How to eliminate wrong answers

Option A is wrong because an infinite loop would cause the playbook to run indefinitely on a single host, not hang after updating the first host in a batch; the playbook would continue looping on that host without progressing. Option C is wrong because SSH control path exhaustion would typically manifest as SSH connection failures or errors, not a hang after the first host completes; it is a connection pooling issue, not a batch processing delay. Option D is wrong because `max_fail_percentage` controls how many hosts can fail before Ansible aborts the playbook; a high value would allow more failures, not cause a hang, and it does not affect the timing of task completion within a batch.
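
The blocking effect is easy to see with numbers. In the sketch below the per-host durations are invented, and the model assumes enough forks for the whole batch to run in parallel, so the batch finishes only when its slowest host does.

```python
def batch_duration(host_seconds):
    """A batch completes when its slowest host completes."""
    return max(host_seconds.values())


# Hypothetical timings (seconds) for one batch of five hosts.
batch = {"web1": 40, "web2": 35, "web3": 42, "web4": 38, "web5": 900}
print(batch_duration(batch))  # 900
```

Four hosts are done in under a minute, yet the next batch cannot start for 15 minutes, which looks like a hang from the operator's terminal.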

27
Multi-Selecthard

Which TWO Ansible playbook parameters directly control the number of host failures allowed before aborting a rolling update?

Select 2 answers
A.max_fail_percentage
B.any_errors_fatal
C.throttle
D.serial
E.ignore_errors
AnswersA, B

Sets the maximum percentage of failed hosts before abort.

Why this answer

Option A is correct because `max_fail_percentage` directly specifies the maximum percentage of hosts in a batch that can fail during a rolling update before Ansible aborts the play. Option B is correct because `any_errors_fatal` causes the play to stop as soon as any host in the current batch fails, effectively limiting the allowed failures to zero.

Exam trap

Red Hat often tests the distinction between `serial` (batch size) and `max_fail_percentage` (failure threshold), causing candidates to mistakenly think `serial` controls failure tolerance when it only controls concurrency.

28
MCQeasy

A team uses Ansible to update a web application across 10 servers with minimal downtime. Which playbook directive achieves one-at-a-time updates?

A.run_once: true
B.delegate_to: localhost
C.serial: 1
D.throttle: 1
E.forks: 10
AnswerC

Updates one host at a time, ensuring minimal downtime.

Why this answer

C is correct because the `serial: 1` directive in an Ansible playbook controls the number of hosts that are updated simultaneously. Setting `serial: 1` forces Ansible to execute the playbook on one host at a time, ensuring that the web application is updated sequentially across the 10 servers, which minimizes downtime by keeping the other 9 servers available during each individual update.

Exam trap

The trap here is that candidates confuse `serial` with `forks` or `throttle`, mistakenly thinking that limiting parallel connections (`forks: 1`) or task concurrency (`throttle: 1`) achieves the same sequential host behavior as `serial`, but only `serial` controls the batch size of hosts processed by the playbook.

How to eliminate wrong answers

Option A is wrong because `run_once: true` executes a task on only one host in the batch, not sequentially across all hosts, and is typically used for one-time setup tasks like generating a shared secret. Option B is wrong because `delegate_to: localhost` runs a task on the Ansible control node instead of the target servers, which does not control the order or batch size of host updates. Option D is wrong because `throttle: 1` limits the number of concurrent forks for a specific task but does not enforce sequential host processing across the entire play; it can still allow parallel execution of other tasks.

Option E is wrong because `forks: 10` sets the maximum number of parallel connections Ansible can make, but it does not guarantee one-at-a-time updates; with 10 forks, Ansible could attempt to update all 10 servers simultaneously.

29
Multi-Selectmedium

An operations team is planning a rolling update of a production OpenShift cluster running Red Hat Enterprise Linux CoreOS nodes. Which three practices should be followed to ensure minimal downtime and proper rollback capability?

Select 3 answers
A.Use canary nodes to validate the update before full rollout.
B.Monitor cluster health via `oc get clusterversion` after each node update.
C.Use the `oc adm upgrade` command to orchestrate updates across nodes.
D.Configure a maximum surge of 25% to prevent resource exhaustion.
E.Place nodes in the same Availability Zone to simplify rollback.
AnswersA, C, D

Canary nodes help detect issues early, supporting rollback decisions.

Why this answer

Option A is correct because canary nodes allow validation before full rollout. Option C is correct because `oc adm upgrade` is the standard command for orchestrating cluster updates. Option D is correct because a maximum surge of 25% prevents resource exhaustion and ensures a controlled rollout.

Option B is wrong because, while monitoring is important, it is not in itself a rolling update practice for minimizing downtime or enabling rollback. Option E is wrong because placing nodes in the same Availability Zone is not necessary and may reduce resilience.

30
MCQeasy

A team runs the playbook shown in the exhibit. They notice that during the update, some requests are still being sent to servers that have been disabled. What is the most likely cause?

A.The disable task should use 'state: maintenance' instead of 'state: disabled'.
B.The 'serial: 2' setting allows two hosts to be disabled simultaneously, and the load balancer may not have drained connections.
C.The 'delegate_to' should be set to localhost, not lb01.
D.The 'serial' keyword is incorrectly used as a global variable.
AnswerB

With serial:2, both hosts are disabled at once, potentially causing connection issues.

Why this answer

Option B is correct because the `serial: 2` setting causes Ansible to update two hosts at a time. When a host is disabled in the load balancer, existing connections may not be fully drained before the next batch of hosts is updated, allowing traffic to still reach disabled servers. The load balancer needs time to drain active sessions, and a serial batch of 2 can overlap with that drain window.

Exam trap

Red Hat often tests the misconception that `serial` only controls parallelism and has no impact on load balancer connection draining, leading candidates to overlook the need for synchronization between disabling hosts and allowing connections to drain.

How to eliminate wrong answers

Option A is wrong because `state: disabled` is the correct parameter to mark a backend server as disabled in a load balancer module such as `community.general.haproxy`; `state: maintenance` is not a valid state in standard Ansible load balancer modules. Option C is wrong because `delegate_to: lb01` is appropriate for running the disable task on the load balancer host; setting it to localhost would run the task on the control node, which would not affect the actual load balancer. Option D is wrong because `serial` is a valid play-level keyword that controls batch size, not a global variable; it is correctly used in the playbook to define rolling update behavior.

31
MCQhard

An operations team is designing a rolling update for a stateful application that requires quorum (minimum 3 out of 5 nodes online). They plan to use Ansible's serial keyword. Which serial value ensures the update proceeds without breaking quorum while still being efficient?

A.serial: 2
B.serial: 1
C.serial: 3
D.serial: 5
AnswerA

Updating 2 nodes leaves 3 online, maintaining quorum, and is efficient.

Why this answer

Option A is correct because setting serial: 2 ensures that only 2 nodes are taken down at a time during the rolling update. With a quorum requirement of 3 out of 5 nodes, taking down 2 nodes leaves 3 online, maintaining quorum. This is the most efficient value that does not risk breaking quorum.

Exam trap

The trap here is that candidates may confuse 'quorum' with 'majority' and incorrectly choose serial: 3, thinking that 3 out of 5 is a majority, but fail to realize that taking down 3 nodes leaves only 2 online, which is below the quorum threshold of 3.

How to eliminate wrong answers

Option B is wrong because serial: 1 would take down only 1 node at a time, which is safe but less efficient than serial: 2 since it increases the total update time. Option C is wrong because serial: 3 would take down 3 nodes at once, leaving only 2 online, which breaks the quorum requirement of 3 out of 5 nodes. Option D is wrong because serial: 5 would take down all 5 nodes simultaneously, completely breaking quorum and causing the application to fail.
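
The arithmetic behind the four options can be checked directly. The function names below are illustrative, and the model assumes every node in a batch is offline for the duration of that batch.

```python
def max_safe_serial(total_nodes, quorum):
    """Largest batch size that still leaves `quorum` nodes online."""
    return total_nodes - quorum


def keeps_quorum(total_nodes, quorum, serial):
    """True if taking `serial` nodes offline leaves at least `quorum` online."""
    return total_nodes - serial >= quorum


print(max_safe_serial(5, 3))  # 2
for serial in (1, 2, 3, 5):
    print(serial, keeps_quorum(5, 3, serial))
# 1 True
# 2 True
# 3 False
# 5 False
```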

32
MCQmedium

An OpenShift deployment is stuck in a degraded state after a rolling update. Which command helps diagnose the root cause by showing status conditions?

A.oc rollout history
B.oc get pods
C.oc logs deployment/myapp
D.oc get events
E.oc describe deployment/myapp
AnswerE

Describes the deployment and shows status conditions like Progressing and Available.

Why this answer

Option E is correct because `oc describe deployment/myapp` displays the full status conditions of the Deployment, including `Progressing`, `Available`, and `ReplicaFailure` conditions. When a rolling update leaves the Deployment in a degraded state, the `Conditions` field under `Status` explicitly shows the reason (e.g., `ProgressDeadlineExceeded` or `MinimumReplicasUnavailable`), which directly points to the root cause of the degradation.

Exam trap

The trap here is that candidates often reach for `oc logs` or `oc get events` to debug failures, but the exam specifically tests whether you know that `oc describe` on the Deployment resource reveals the structured status conditions that define the degraded state.

How to eliminate wrong answers

Option A is wrong because `oc rollout history` only shows revision history and change causes, not current status conditions or degradation reasons. Option B is wrong because `oc get pods` lists pod statuses but does not aggregate Deployment-level conditions or explain why the update failed. Option C is wrong because `oc logs deployment/myapp` returns container logs from one of the Deployment's pods; those logs can be useful for debugging the application, but they do not show Deployment status conditions.

Option D is wrong because `oc get events` shows cluster events but does not present the structured status conditions of a specific Deployment resource.
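
The same conditions are available in machine-readable form, for instance from `oc get deployment/myapp -o json`. The sketch below filters them for unhealthy entries; the sample document is hand-written for illustration (not captured from a cluster), and the function name and the healthy-status table are assumptions of this example.

```python
import json

# Hand-written sample resembling the status section of a Deployment.
SAMPLE = """
{"status": {"conditions": [
  {"type": "Available", "status": "False", "reason": "MinimumReplicasUnavailable"},
  {"type": "Progressing", "status": "False", "reason": "ProgressDeadlineExceeded"}
]}}
"""

# Status value that indicates a healthy state for each condition type.
HEALTHY_WHEN = {"Available": "True", "Progressing": "True", "ReplicaFailure": "False"}


def unhealthy_conditions(deployment_json):
    """Return (type, reason) for every condition that is not in its healthy state."""
    conditions = json.loads(deployment_json).get("status", {}).get("conditions", [])
    return [
        (c["type"], c.get("reason", ""))
        for c in conditions
        # Unknown condition types are treated as healthy in this sketch.
        if c["status"] != HEALTHY_WHEN.get(c["type"], c["status"])
    ]


print(unhealthy_conditions(SAMPLE))
# [('Available', 'MinimumReplicasUnavailable'), ('Progressing', 'ProgressDeadlineExceeded')]
```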

33
MCQmedium

There are 5 hosts in the webservers group. All updates succeed on the first batch of 2 hosts. On the second batch, one host fails. What is the result?

A.The play retries the failed host automatically.
B.The play marks the host as failed and continues with the next batch.
C.The play continues with the remaining 1 host in the third batch.
D.The play aborts due to max_fail_percentage being exceeded.
E.The play aborts immediately after the failure.
AnswerD

With max_fail_percentage: 0, any failure causes the play to abort.

Why this answer

The play runs in batches of 2, and Ansible evaluates `max_fail_percentage` against each batch when it is combined with `serial`. The first batch succeeds, so the play moves on. In the second batch, 1 of 2 hosts fails, a 50% failure rate that exceeds the configured threshold, so the play aborts and the fifth host is never updated.

This is why option D is correct.

Exam trap

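Red Hat often tests the misconception that a single failed host is simply dropped while the play carries on with the remaining batches. With `max_fail_percentage` set, Ansible compares the failure rate of the current batch against the threshold, and the value must be exceeded (not merely equaled) to abort; here one failure in a batch of 2 is enough.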

How to eliminate wrong answers

Option A is wrong because Ansible does not automatically retry failed hosts in a rolling update; retries must be configured explicitly on the task with `retries` and `until`. Option B is wrong because the play does not simply mark the host as failed and move to the next batch; the batch's failure percentage exceeds `max_fail_percentage`, which aborts the entire play. Option C is wrong because the remaining host in the third batch is never reached; the failure in the second batch already triggers the abort condition.

Option E is wrong because the abort is not an unconditional reaction to a single failure; Ansible compares the batch's failure percentage against `max_fail_percentage`, and the play stops only because that threshold is exceeded.

34
MCQeasy

An administrator wants to update a web server fleet with minimal downtime. They need to update each server one at a time. Which Ansible playbook directive should be used?

A.throttle: 1
B.forks: 1
C.serial: 1
D.max_fail_percentage: 0
AnswerC

Correct. serial sets the batch size for rolling updates.

Why this answer

Option C is correct because the `serial: 1` directive in an Ansible playbook controls the batch size of hosts that are updated simultaneously. Setting it to 1 ensures that only one host is updated at a time, which minimizes downtime by allowing the rest of the fleet to remain available while each server is sequentially updated.

Exam trap

The trap here is that candidates often confuse `serial` with `forks` or `throttle`, mistakenly thinking that limiting parallel task execution (`forks: 1`) or task concurrency (`throttle: 1`) achieves the same one-at-a-time host update behavior, but only `serial` controls the batch size of hosts processed sequentially.

How to eliminate wrong answers

Option A is wrong because `throttle: 1` limits how many hosts run a particular task or block at the same time; it does not define host batches, so the play as a whole still advances task by task across every host rather than finishing one server before starting the next. Option B is wrong because `forks` is a configuration or command-line setting (`--forks`) rather than a playbook directive, and even with `forks: 1` Ansible runs each task on every host in the play before moving to the next task, so no server is fully updated before the next one is touched. Option D is wrong because `max_fail_percentage: 0` defines the maximum percentage of hosts that can fail before the playbook aborts, but it does not control the order or batch size of updates; it is a failure threshold, not a sequencing mechanism.
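
The difference between batching hosts and limiting parallelism shows up in the order in which host and task pairs are reached. The sketch below models the default linear strategy in a simplified way; the function name, host names, and task names are invented, and the ordering of hosts within a single task is illustrative only, since hosts in a batch actually run in parallel up to the `forks` limit.

```python
def execution_order(hosts, tasks, serial=None):
    """Return (host, task) pairs in the order a linear, batched play reaches them."""
    size = serial or len(hosts)  # no serial: all hosts form one batch
    order = []
    for start in range(0, len(hosts), size):
        batch = hosts[start:start + size]
        for task in tasks:        # each task finishes on the whole batch...
            for host in batch:    # ...before the next task begins
                order.append((host, task))
    return order


hosts = ["web1", "web2"]
tasks = ["stop", "upgrade", "start"]

print(execution_order(hosts, tasks, serial=1))
# [('web1', 'stop'), ('web1', 'upgrade'), ('web1', 'start'),
#  ('web2', 'stop'), ('web2', 'upgrade'), ('web2', 'start')]

print(execution_order(hosts, tasks))
# [('web1', 'stop'), ('web2', 'stop'), ('web1', 'upgrade'),
#  ('web2', 'upgrade'), ('web1', 'start'), ('web2', 'start')]
```

Without `serial`, both servers are stopped before either is upgraded, which is exactly the outage a rolling update is meant to avoid.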

35
MCQhard

An OpenShift rolling update is failing because new pods crash immediately. Which parameter automatically triggers a rollback if no progress is made?

A.revisionHistoryLimit
B.maxSurge
C.progressDeadlineSeconds
D.maxUnavailable
E.minReadySeconds
AnswerC

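If the deployment does not progress within this time, it is marked as failed, which is the signal for rolling back.

Why this answer

The `progressDeadlineSeconds` parameter specifies the maximum duration (in seconds) that a Deployment may go without making progress before it is considered failed. When the deadline is exceeded, the controller sets the `Progressing` condition to `False` with reason `ProgressDeadlineExceeded`. Of the listed parameters, this is the only one that detects a stalled rollout, such as new pods crashing immediately. Note that for a Kubernetes Deployment the controller only reports the condition; the rollback itself is carried out by whatever reacts to it, for example an operator running `oc rollout undo` or higher-level automation.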

Exam trap

The trap here is that candidates confuse `progressDeadlineSeconds` with `minReadySeconds`, thinking that a readiness check alone will trigger a rollback, but `minReadySeconds` only delays availability without initiating a rollback.

How to eliminate wrong answers

Option A is wrong because `revisionHistoryLimit` controls how many old ReplicaSets are retained for rollback, not the timing or automatic rollback trigger. Option B is wrong because `maxSurge` defines the maximum number of pods that can be created above the desired replica count during an update, not a rollback mechanism. Option D is wrong because `maxUnavailable` specifies the maximum number of pods that can be unavailable during the update process, not a progress deadline.

Option E is wrong because `minReadySeconds` determines how long a pod must be ready before it is considered available, but it does not trigger a rollback if no progress is made.

36
Multi-Selectmedium

An administrator needs to update a web application that runs as a Kubernetes Deployment with 5 replicas. The application is stateless, but the update must not cause any downtime. Which TWO strategies ensure zero-downtime rolling updates?

Select 2 answers
A.Omit the liveness probe from the pod spec.
B.Set strategy type to RollingUpdate with maxUnavailable=0 and maxSurge=1.
C.Set maxUnavailable=1 and maxSurge=0.
D.Use the Recreate strategy.
E.Configure a readiness probe that checks the application's health endpoint.
AnswersB, E

Correct: Ensures no pod is terminated until a new one is ready, and an extra pod can be created during the update to maintain capacity.

Why this answer

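Option B is correct because setting `maxUnavailable=0` ensures that no pods are taken down before new ones are ready, while `maxSurge=1` allows one extra pod to be created above the desired replica count, enabling a rolling update without any downtime. This combination guarantees that at least 5 pods are always available during the update process. Option E is correct because a readiness probe keeps a new pod out of the Service endpoints until the application reports healthy, so traffic is never routed to a pod that is still starting and the rollout only counts a pod as available once the probe passes.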

Exam trap

The trap here is that candidates often confuse `maxUnavailable` and `maxSurge` values, mistakenly thinking that allowing one unavailable pod (maxUnavailable=1) is acceptable for zero-downtime, when in fact it can cause a temporary capacity deficit if the readiness probe is not fast enough.
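
The capacity limits implied by the two settings can be computed directly. This sketch handles absolute values only (percentages are out of scope), and the function name is hypothetical.

```python
def rollout_bounds(replicas, max_unavailable, max_surge):
    """Capacity limits the rollout must respect at every step."""
    return {
        "min_available": replicas - max_unavailable,
        "max_total": replicas + max_surge,
    }


print(rollout_bounds(5, max_unavailable=0, max_surge=1))
# {'min_available': 5, 'max_total': 6}
print(rollout_bounds(5, max_unavailable=1, max_surge=0))
# {'min_available': 4, 'max_total': 5}
```

Only the first configuration keeps all 5 replicas serving throughout; the second drops to 4 while each pod is replaced.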

37
Multi-Selecthard

Which THREE statements correctly describe the behavior of the 'serial' keyword in Ansible? (Choose exactly three.)

Select 3 answers
A.It can be set as a percentage of the total hosts.
B.It causes the playbook to run on a subset of hosts at a time.
C.It can be combined with max_fail_percentage to control failure thresholds.
D.It guarantees that only one task runs across all hosts at any time.
E.It applies globally to all plays in the playbook.
AnswersA, B, C

Correct. serial: 10% is valid.

Why this answer

Option A is correct because the 'serial' keyword in Ansible can be specified as a percentage (e.g., 'serial: 50%'), which tells Ansible to size each batch as that percentage of the hosts targeted by the play. This is useful for controlling the rollout pace across a dynamic inventory where the exact host count may vary.

Exam trap

The trap here is that candidates often confuse 'serial' with a task-level concurrency control or assume it applies globally across all plays, when in fact it is a per-play batch size setting that controls how many hosts execute the entire play simultaneously.

38
MCQeasy

What is the effect of setting 'serial: 5' in an Ansible playbook that targets a group of 20 hosts?

A.The playbook runs on 5 hosts per task, then moves to next task.
B.The playbook runs on 5 hosts at a time, sequentially.
C.The playbook runs only on the first 5 hosts.
D.The playbook runs on all 20 hosts at once.
AnswerB

Correct. serial: 5 splits into batches of 5.

Why this answer

Setting 'serial: 5' in an Ansible playbook configures the playbook to execute each task on a batch of 5 hosts at a time, moving to the next batch only after the current batch completes all tasks. This ensures that no more than 5 hosts are processed concurrently, providing controlled, sequential rolling updates across the 20-host group.

Exam trap

The trap here is confusing 'serial' with 'forks' or per-task parallelism, leading candidates to think it limits hosts per task rather than per batch across all tasks.

How to eliminate wrong answers

Option A is wrong because 'serial' controls the number of hosts processed per batch across all tasks, not per task; the playbook runs all tasks on the first batch of 5 hosts before moving to the next batch. Option C is wrong because 'serial: 5' does not limit execution to only the first 5 hosts; it processes all 20 hosts in batches of 5. Option D is wrong because 'serial: 5' explicitly limits concurrency to 5 hosts at a time, preventing all 20 hosts from running simultaneously.

39
MCQmedium

An Ansible Engineer is planning a rolling update for a web application deployed across 10 nodes. The playbook uses the 'delegate_to' directive to manage load balancer health checks. Which of the following best describes the recommended approach to minimize downtime?

A.Use 'serial: 1' and delegate load balancer disable/enable tasks to localhost, ensuring each node is taken out of rotation before updating.
B.Run the update playbook with 'serial: 10' to update all nodes at once, then run a separate playbook to update the load balancer.
C.Run the update on each node manually using 'ansible-playbook --limit' and skip load balancer management to save time.
D.Use 'strategy: free' to allow nodes to update independently without controlling the load balancer.
AnswerA

This ensures each node is removed from the load balancer, updated, and then re-added, minimizing downtime.

Why this answer

Option A is correct because using 'serial: 1' ensures that only one node is updated at a time, and delegating load balancer disable/enable tasks to localhost (or the Ansible control node) allows the playbook to interact with the load balancer API to remove the node from the pool before the update and re-add it after. This minimizes downtime by ensuring traffic is not sent to a node being updated, while other nodes continue serving requests.

Exam trap

The trap here is that candidates may think 'serial: 10' is efficient because it updates all nodes quickly, but they overlook that it causes a full outage, whereas the correct approach prioritizes availability over speed.

How to eliminate wrong answers

Option B is wrong because 'serial: 10' updates all nodes simultaneously, which would cause a complete outage during the update window, defeating the purpose of a rolling update. Option C is wrong because manually running with '--limit' and skipping load balancer management does not automate the process and leaves nodes in the load balancer pool while they are being updated, causing traffic to be sent to an unavailable node and increasing downtime. Option D is wrong because 'strategy: free' allows nodes to run tasks independently without any serialization or load balancer coordination, leading to potential race conditions and no guarantee of minimizing downtime.
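
The remove, update, re-add cycle from Option A can be modeled in a few lines. Everything here is a stand-in: `FakeLoadBalancer` is a hypothetical in-memory pool rather than a real load balancer API, and the loop plays the role of `serial: 1`.

```python
class FakeLoadBalancer:
    """Hypothetical in-memory stand-in for a load balancer pool."""

    def __init__(self, nodes):
        self.pool = set(nodes)

    def disable(self, node):
        self.pool.discard(node)

    def enable(self, node):
        self.pool.add(node)


def update_one_at_a_time(nodes, lb, update):
    """Drain, update, and restore each node in turn; return the lowest pool size seen."""
    lowest = len(lb.pool)
    for node in nodes:            # equivalent of serial: 1
        lb.disable(node)          # take the node out of rotation first
        lowest = min(lowest, len(lb.pool))
        update(node)              # node receives no traffic while updating
        lb.enable(node)           # put it back before touching the next node
    return lowest


nodes = [f"web{i}" for i in range(1, 11)]
updated = []
print(update_one_at_a_time(nodes, FakeLoadBalancer(nodes), updated.append))  # 9
```

At no point do fewer than 9 of the 10 nodes remain in rotation, which is the availability property the question is testing.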

40
Matchingmedium

Match each Ansible playbook directive to its purpose.


Concepts
Matches

Specify target hosts or groups

Enable privilege escalation

Define variables

List of modules to execute

Special tasks run on notification

Why these pairings

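These are core Ansible playbook keywords: `hosts` specifies the target hosts or groups, `become` enables privilege escalation, `vars` defines variables, `tasks` lists the module calls to execute, and `handlers` are special tasks that run only when notified.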
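As an illustration, a short play that uses each of the five directives described above might look like this. The group name and package are assumptions chosen only for the example.

```yaml
- name: Install and start a web server
  hosts: webservers              # specify target hosts or groups (hypothetical group)
  become: true                   # enable privilege escalation
  vars:                          # define variables
    web_package: nginx           # assumed package and service name
  tasks:                         # list of modules to execute
    - name: Install the web server package
      ansible.builtin.package:
        name: "{{ web_package }}"
        state: present
      notify: Restart web server # triggers the handler only if this task changed something
  handlers:                      # special tasks run on notification
    - name: Restart web server
      ansible.builtin.service:
        name: "{{ web_package }}"
        state: restarted
```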

41
Multi-Selecteasy

Which TWO options are best practices for coordinating rolling updates with Ansible? (Choose exactly two.)

Select 2 answers
A.Set ignore_errors: yes to ensure the playbook continues even if some hosts fail.
B.Use the serial keyword to update hosts in batches.
C.Use the default serial setting (all hosts) for simplicity.
D.Set max_fail_percentage to limit the number of failed hosts before aborting.
E.Run all hosts in parallel to minimize total update time.
AnswersB, D

serial enables batching, which is the core of rolling updates.

Why this answer

Options B and D are correct. The `serial` keyword controls how many hosts are updated at a time, so hosts are updated in batches and the rest of the fleet stays in service while each batch is worked on. `max_fail_percentage` complements it: if the share of failed hosts in a batch exceeds the threshold, Ansible aborts the play instead of rolling a bad change out to the remaining batches.

Exam trap

The trap here is that candidates often confuse `ignore_errors` with error handling for rolling updates, not realizing that it bypasses failure detection, whereas `max_fail_percentage` is the correct way to control abort behavior during batch updates.
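The two best practices can be combined in a single play. The following is a sketch only; the group name, package name, and the specific numbers are assumptions.

```yaml
- name: Rolling update in batches with a failure budget
  hosts: webservers              # hypothetical inventory group
  serial: 4                      # update four hosts per batch
  max_fail_percentage: 25        # 1 of 4 failing (25%) is tolerated; 2 of 4 (50%) aborts the play
  tasks:
    - name: Update the application package
      ansible.builtin.package:
        name: myapp              # hypothetical package name
        state: latest
```

Note that no task sets `ignore_errors`, so a failed host is counted against the threshold rather than silently passed over.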

42
MCQmedium

A company uses Ansible to manage rolling updates of a web server fleet. During a deployment, the playbook fails on one host due to a transient network error, and the rest of the fleet is left in an inconsistent state. Which strategy would best minimize the risk of inconsistency in future rolling updates?

A.Add retries to each task so transient errors are automatically retried.
B.Use a larger serial batch size to complete the rollout faster.
C.Set ignore_errors: yes on all tasks to continue despite failures.
D.Set max_fail_percentage to 0 alongside serial to abort the rollout on any failure.
AnswerD

max_fail_percentage aborts the play when the failure rate exceeds the threshold, which stops a failing rollout from spreading.

Why this answer

Option D is correct because setting `max_fail_percentage: 0` on a play that uses `serial` tells Ansible to abort the play when any host in the current batch fails. Later batches are never started, so a failing change cannot spread across the fleet and the mixed state is confined to the batch that was in progress. It directly addresses the risk of partial rollouts caused by transient errors.

Exam trap

The trap here is that candidates often confuse `max_fail_percentage` with `serial`, or think that retries (option A) are sufficient to guarantee consistency, when in fact only aborting the rollout on any failure prevents the fleet from reaching an inconsistent mixed state.

How to eliminate wrong answers

Option A is wrong because adding retries to each task only handles transient errors on a per-task basis, but if the retries are exhausted or the error occurs at a higher level (e.g., host unreachable), the playbook still fails on that host, leaving the fleet inconsistent. Option B is wrong because using a larger serial batch size increases the number of hosts updated simultaneously, which amplifies the risk of inconsistency if a failure occurs — it does not mitigate it. Option C is wrong because setting `ignore_errors: yes` on all tasks causes Ansible to continue despite failures, which can silently leave the failed host in an unknown or partially updated state while the rest of the fleet continues, worsening inconsistency.
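A sketch of option D in a play, with a retried health check added to show that retries can absorb transient errors on a task while `max_fail_percentage: 0` still stops the rollout if a host ultimately fails. The group, package, and `/health` endpoint are hypothetical.

```yaml
- name: Rolling update that stops on the first failed host
  hosts: webservers              # hypothetical inventory group
  serial: 2
  max_fail_percentage: 0         # any failed host in a batch aborts the play
  tasks:
    - name: Update the application package
      ansible.builtin.package:
        name: myapp              # hypothetical package name
        state: latest

    - name: Wait for the health endpoint to respond
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}/health"   # hypothetical endpoint
        status_code: 200
      register: health
      until: health.status == 200
      retries: 3                 # absorbs short transient errors on this task only
      delay: 5                   # seconds between attempts
```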

43
MCQmedium

An Ansible rolling update playbook includes 'max_fail_percentage: 20'. If more than 20% of hosts fail during any batch, what happens?

A.The play pauses and waits for user input
B.The failed hosts are removed from inventory
C.The play retries failed hosts
D.The play aborts immediately
E.The play continues with remaining hosts
AnswerD

If the failure percentage in a batch exceeds max_fail_percentage, the play stops.
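The threshold has to be exceeded, not merely reached. A sketch with assumed numbers makes the boundary concrete; the group and package names are placeholders.

```yaml
- name: Rolling update with a 20% failure budget per batch
  hosts: webservers              # hypothetical inventory group
  serial: 10                     # ten hosts per batch
  max_fail_percentage: 20
  # 2 failed hosts in a batch = 20%: threshold not exceeded, the play continues
  # 3 failed hosts in a batch = 30%: threshold exceeded, the play aborts
  tasks:
    - name: Update the application package
      ansible.builtin.package:
        name: myapp              # hypothetical package name
        state: latest
```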

44
MCQhard

In OpenShift, a deployment must gradually shift traffic to new pods during a rolling update. Which default strategy achieves this?

A.Blue-green deployment
B.RollingUpdate
C.Canary deployment
D.Custom strategy
E.Recreate
AnswerB

RollingUpdate gradually replaces old pods with new ones, shifting traffic.
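For reference, a Kubernetes-style Deployment manifest (which OpenShift also accepts) might declare the strategy as follows. The names, replica count, and image are hypothetical, and the `maxSurge` and `maxUnavailable` values are example choices rather than recommendations.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # hypothetical name
spec:
  replicas: 4
  strategy:
    type: RollingUpdate          # the default strategy for Deployment objects
    rollingUpdate:
      maxSurge: 1                # at most one extra pod above the desired count
      maxUnavailable: 1          # at most one pod unavailable during the update
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:2.0   # hypothetical image
```

OpenShift's older DeploymentConfig resource names the equivalent strategy `Rolling`.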

45
MCQmedium

An Ansible playbook sets 'serial: 20%' for rolling updates, but the inventory contains 5 hosts. How many hosts are updated simultaneously?

A.1
B.2
C.3
D.0
E.5
AnswerA

20% of 5 is 1 host per batch.

Why this answer

When 'serial: 20%' is set in an Ansible playbook, the percentage is calculated against the number of hosts targeted by the play. With 5 hosts, 20% of 5 is exactly 1, so no rounding is needed and each batch contains 1 host. Therefore, only 1 host is updated at a time during the rolling update.

Exam trap

The trap here is that candidates often assume percentages are rounded up, but Ansible truncates fractional batch sizes (for example, 30% of 5 hosts is 1.5, which becomes 1). Here 20% of 5 is exactly 1, so the batch size is 1, not 2.

How to eliminate wrong answers

Option B is wrong because 2 would represent 40% of 5 hosts, not 20%. Option C is wrong because 3 would be 60% of the inventory, far exceeding the 20% specification. Option D is wrong because a percentage never produces a batch of 0 hosts: when the computed value truncates to zero, Ansible uses a minimum of 1 host per batch, and in any case 20% of 5 is exactly 1.

Option E is wrong because 5 would represent 100% of the hosts, which would be a serial value of '100%' or 'serial: 5', not '20%'.
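A sketch of the play from the question, assuming a hypothetical `webservers` group that contains exactly 5 hosts:

```yaml
- name: Rolling update, 20% of hosts per batch
  hosts: webservers              # hypothetical group, assumed to contain 5 hosts
  serial: "20%"                  # 20% of 5 = 1 host per batch, so 5 batches in total
  tasks:
    - name: Update the application package
      ansible.builtin.package:
        name: myapp              # hypothetical package name
        state: latest
```

The percentage is quoted so that YAML passes it to Ansible as a string.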
