A company uses Ansible to manage rolling updates of a web server fleet. During a deployment, the playbook fails on one host due to a transient network error, and the rest of the fleet is left in an inconsistent state. Which strategy would best minimize the risk of inconsistency in future rolling updates?
Trap 1: Add retries to each task so transient errors are automatically…
Retries may succeed but do not stop the rollout on persistent failures; inconsistency can still occur.
Trap 2: Use a larger serial batch size to complete the rollout faster.
Larger batches increase the number of hosts affected by a failure, worsening inconsistency.
Trap 3: Set ignore_errors: yes on all tasks to continue despite failures.
ignore_errors would mask failures, leading to inconsistency.
- A
Add retries to each task so transient errors are automatically retried.
Why wrong: Retries may succeed but do not stop the rollout on persistent failures; inconsistency can still occur.
- B
Use a larger serial batch size to complete the rollout faster.
Why wrong: Larger batches increase the number of hosts affected by a failure, worsening inconsistency.
- C
Set ignore_errors: yes on all tasks to continue despite failures.
Why wrong: ignore_errors would mask failures, leading to inconsistency.
- D
Set max_fail_percentage to 0 in the serial block to abort the rollout on any failure.
max_fail_percentage aborts the playbook if failure rate exceeds threshold, preventing inconsistency.