I am looking to write a playbook that updates and reboots each system one at a time. It needs to verify that the node comes back healthy before starting on the next one. If a node fails that check, the whole run should stop.

I think I can just wait for the reboot to finish and then check a health condition before moving on to the next node. Has anyone done this before? I think it is doable, but I am curious what others think.

For context, I am trying not to kill my Kubernetes cluster.
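
Here is roughly what I have in mind, as an untested sketch rather than a finished playbook. The inventory group `k8s_nodes` is a made-up name. I am also assuming Debian/Ubuntu hosts (hence `apt`), that `kubectl` is configured on the machine running Ansible, and that each node's Kubernetes name matches its inventory hostname. The timeouts are guesses.

```yaml
# Sketch only: group name, package manager, and timeouts are assumptions.
- name: Update and reboot nodes one at a time
  hosts: k8s_nodes            # hypothetical inventory group
  become: true
  serial: 1                   # process a single host per batch
  any_errors_fatal: true      # stop the whole play on the first failure
  tasks:
    - name: Upgrade all packages (assumes Debian/Ubuntu)
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist

    - name: Reboot and wait for the host to come back
      ansible.builtin.reboot:
        reboot_timeout: 600   # seconds to wait for the host to respond again

    - name: Wait for the node to report Ready in Kubernetes
      ansible.builtin.command:
        cmd: kubectl wait --for=condition=Ready node/{{ inventory_hostname }} --timeout=300s
      delegate_to: localhost  # run kubectl from the control machine
      become: false
      changed_when: false     # a health check does not change anything
```

The idea is that `serial: 1` keeps Ansible from touching the next node until every task has passed on the current one, and a failed reboot or a `kubectl wait` timeout fails the task and ends the run.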