oneswig left a comment
Nice work, Bartosz. Just a couple of questions/suggestions.
Stop Ceph
---------
Procedure based on `Red Hat documentation <https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/administration_guide/understanding-process-management-for-ceph#powering-down-and-rebooting-a-red-hat-ceph-storage-cluster_admin>`__
If there's something equivalent in the community docs, that would be better, but the closest I found was https://docs.ceph.com/en/latest/rados/operations/operating/ and it doesn't cover setting all the flags below.
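For comparison with the community docs, here is a minimal sketch of the flag handling. It assumes the `ceph` CLI can be run with admin credentials, for example on a monitor host. The six flags are my reading of the Red Hat procedure linked above, so double-check them against it. Clearing them in reverse order on power-on is a choice made for this sketch, not something taken from that document.

```bash
# OSD flags set before a full Ceph shutdown (my reading of the Red Hat
# procedure; verify against the linked documentation).
CEPH_SHUTDOWN_FLAGS="noout norecover norebalance nobackfill nodown pause"

# Set each flag in turn, stopping at the first failure.
ceph_set_shutdown_flags() {
    local flag
    for flag in $CEPH_SHUTDOWN_FLAGS; do
        ceph osd set "$flag" || return 1
    done
}

# Clear the flags again on power-on, in reverse order (a choice of this sketch).
ceph_unset_shutdown_flags() {
    local flag reversed=""
    # Build the list back to front.
    for flag in $CEPH_SHUTDOWN_FLAGS; do
        reversed="$flag $reversed"
    done
    for flag in $reversed; do
        ceph osd unset "$flag" || return 1
    done
}
```

Run `ceph_set_shutdown_flags` once clients have stopped using Ceph and before the OSD hosts go down. Run `ceph_unset_shutdown_flags` once all OSDs are back up.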
.. code-block:: bash

   systemctl poweroff
There might be a serialised way to invoke the shutdown using Kayobe's tools (https://docs.openstack.org/kayobe/latest/administration/overcloud.html#running-commands), perhaps with a small delay on the shutdown command so that it doesn't immediately cut off the Ansible connection.
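A rough sketch of what the serialised variant could look like. It assumes `kayobe overcloud host command run` accepts `--become`, `--command` and `--limit`, as in the Kayobe docs linked above. It also assumes the inventory groups are named `compute`, `controllers` and `storage`; adjust both to your deployment. Each group gets `shutdown -h +1`, which returns straight away and powers the hosts off a minute later, so Ansible can finish the task before the connection drops. Storage hosts go last, so Ceph stays available while the rest shut down.

```bash
# Groups are powered off in this order (assumed Kayobe inventory group names).
# The seed VM, seed hypervisor and Ansible control host are deliberately
# not listed: shut those down by hand once this has finished.
OVERCLOUD_SHUTDOWN_ORDER="compute controllers storage"

shutdown_overcloud_in_order() {
    # Pause between groups; in real use keep it well above the 1 minute delay.
    local gap="${1:-180}" group
    for group in $OVERCLOUD_SHUTDOWN_ORDER; do
        echo "Scheduling power-off of group: $group"
        # "shutdown -h +1" returns immediately and powers off a minute later,
        # so the Ansible task completes before the SSH connection is cut.
        kayobe overcloud host command run --become \
            --command "shutdown -h +1" --limit "$group" || return 1
        sleep "$gap"
    done
}
```

For example, `shutdown_overcloud_in_order 240` waits four minutes between groups.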
markgoddard left a comment
Nice addition. Looks like there is some room for automation here, but that can be added iteratively.
.. code-block:: bash

   for i in $(openstack server list --all-projects -c ID -f value); do
       openstack server stop "$i"
   done
Is this asynchronous? Should we check for success?
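As far as I know it is: `openstack server stop` only submits the request, and Nova stops the instance asynchronously, so success needs checking separately. A possible sketch follows. It stops only instances that are ACTIVE, because Nova rejects a stop request for an instance that is already stopped. It then polls until none are left ACTIVE and fails after a timeout. The default timeout and poll interval are arbitrary.

```bash
# Stop every ACTIVE instance, then poll until none remain ACTIVE.
# Arguments: timeout in seconds (default 600), poll interval (default 10).
stop_all_servers() {
    local timeout="${1:-600}" interval="${2:-10}" waited=0 ids
    ids="$(openstack server list --all-projects --status ACTIVE -c ID -f value)" || return 1
    if [ -n "$ids" ]; then
        # Unquoted on purpose: pass each server ID as its own argument.
        openstack server stop $ids || return 1
    fi
    while :; do
        ids="$(openstack server list --all-projects --status ACTIVE -c ID -f value)" || return 1
        if [ -z "$ids" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "Still ACTIVE after ${timeout}s: $ids" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

For example, `stop_all_servers 900 15` would wait up to 15 minutes, checking every 15 seconds.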
- Stop the Ceph clients from using any Ceph resources (RBD, RADOS Gateway, CephFS)
- Check that the cluster is in a healthy state

.. code-block:: bash
Does it need to be indented more to be part of the bullet?
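The health check could also wait for the cluster to settle rather than checking once. The sketch below assumes the `ceph` CLI is available with admin credentials. The timeout and interval are arbitrary defaults. Anything other than `HEALTH_OK`, including `HEALTH_WARN`, is treated as not ready.

```bash
# Poll "ceph health" until it reports HEALTH_OK, or give up after a timeout.
# Arguments: timeout in seconds (default 300), poll interval (default 10).
wait_for_ceph_health_ok() {
    local timeout="${1:-300}" interval="${2:-10}" waited=0 status
    while :; do
        # A failing "ceph" call simply counts as "not healthy yet".
        status="$(ceph health 2>&1)" || true
        case "$status" in
            HEALTH_OK*) return 0 ;;
        esac
        if [ "$waited" -ge "$timeout" ]; then
            echo "Ceph not healthy after ${timeout}s: $status" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```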
- Stop CephFS (if applicable)

Stop the CephFS cluster by reducing the number of ranks to 1, setting the ``cluster_down`` flag, and then failing the last rank.
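Here are the three steps above as a shell function, for illustration. It assumes a Ceph release that still accepts the `cluster_down` flag. To my knowledge, newer releases replace it with `ceph fs set <fs> joinable false` and also offer `ceph fs set <fs> down true` to do the whole sequence, so check the docs for the deployed version. The sketch does not wait for the extra ranks to stop after `max_mds` is reduced. In practice you would watch `ceph fs status` before moving on.

```bash
# Take a CephFS file system down: single rank, cluster_down, fail rank 0.
# Argument: the file system name.
stop_cephfs() {
    local fs="$1"
    if [ -z "$fs" ]; then
        echo "usage: stop_cephfs <fs_name>" >&2
        return 2
    fi
    # 1. Reduce the file system to a single active MDS rank.
    #    (In practice, wait here until only rank 0 is active.)
    ceph fs set "$fs" max_mds 1 || return 1
    # 2. Set the cluster_down flag so no MDS takes over the rank again.
    ceph fs set "$fs" cluster_down true || return 1
    # 3. Fail the last remaining rank.
    ceph mds fail "$fs:0"
}
```

For a file system named `cephfs` (a hypothetical name), the call would be `stop_cephfs cephfs`.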
----------------------------

Set maintenance mode in Bifrost to prevent nodes from automatically
powering back on
Another option is to power off via Bifrost.
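Both options are sketched below. They assume the commands are run inside the `bifrost_deploy` container after `export OS_CLOUD=bifrost`, with the `baremetal` CLI available there.

As I understand it, maintenance mode matters because Ironic's periodic power-state sync would otherwise power nodes back on after they shut themselves down. Powering off through Ironic instead keeps its recorded power state in line with reality. `--soft` asks for a graceful power-off and may not be supported by every driver.

```bash
# Run inside the bifrost_deploy container after "export OS_CLOUD=bifrost".
# Assumption: the "baremetal" CLI is on PATH there.

# List every node known to Bifrost by UUID.
bifrost_node_uuids() {
    baremetal node list -f value -c UUID
}

# Option 1: maintenance mode, so Ironic leaves the nodes' power state alone.
bifrost_set_maintenance_all() {
    local reason="${1:-planned full shutdown}" node
    for node in $(bifrost_node_uuids); do
        baremetal node maintenance set "$node" --reason "$reason" || return 1
    done
}

# Power-on counterpart of option 1.
bifrost_unset_maintenance_all() {
    local node
    for node in $(bifrost_node_uuids); do
        baremetal node maintenance unset "$node" || return 1
    done
}

# Option 2: graceful power-off through Ironic itself.
bifrost_power_off_all() {
    local node
    for node in $(bifrost_node_uuids); do
        baremetal node power off --soft "$node" || return 1
    done
}
```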
Full Power on Procedure
-----------------------
This needs to be a different heading style. Alternatively (preferably?) this section could go in another page called cold_start.rst.
Or rename the page to "Shutdown and power on procedures".
* Shut down controllers
* Shut down Ceph nodes (if applicable)
* Shut down seed VM
* Shut down Ansible control host
We probably shouldn't make any assumptions about what or where this is. It may not be the seed hypervisor, which should also be called out explicitly.
* Perform a graceful shutdown of all virtual machine instances
* Stop Ceph (if applicable)
* Put all nodes into maintenance mode in Bifrost
* Shut down compute nodes
nit: this lists shutting down different types of nodes separately, but the procedure only stops the services separately, then shuts down all nodes at once.
* Remove nodes from maintenance mode in Bifrost
* Recover MariaDB cluster
* Start Ceph (if applicable)
* Check that all Docker containers are running
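For the final check, here is a sketch to run on each host, assuming Docker is the container engine. It prints any container in the `created`, `restarting`, `exited` or `dead` state and returns non-zero if there are any.

```bash
# Report containers on this host that are not running.
check_containers_running() {
    local not_running
    # Repeated --filter flags with the same key are ORed by Docker.
    not_running="$(docker ps -a --format '{{.Names}}' \
        --filter status=created --filter status=restarting \
        --filter status=exited --filter status=dead)" || return 1
    if [ -n "$not_running" ]; then
        echo "Containers not running:" >&2
        echo "$not_running" >&2
        return 1
    fi
    echo "All containers are running"
}
```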
.. code-block:: bash

   kayobe# kayobe overcloud database recover
Wondering if it would be cleaner to stop the containers before shutdown, to avoid them starting up in a broken state.
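If the containers are stopped before shutdown, this could be as simple as the sketch below, run on each host just before it is powered off. It assumes the containers use Docker's `unless-stopped` restart policy, under which a manually stopped container is not started again when the host boots. Starting them again during power-on would then be an explicit step, which this sketch does not cover.

```bash
# Hypothetical pre-shutdown step: stop all running containers on this host.
# Assumes Docker's "unless-stopped" restart policy, so containers stopped
# here are NOT restarted automatically when the host boots again.
stop_all_containers() {
    local ids
    ids="$(docker ps -q)" || return 1
    if [ -z "$ids" ]; then
        return 0
    fi
    # Unquoted on purpose: one argument per container ID.
    docker stop $ids
}
```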
It looks like there are quite a few comments still to be addressed. It's quite hard to review larger changes when they are force-pushed. Could you add commits, then squash at the end?
following order:

* Perform a graceful shutdown of all virtual machine instances
* Stop Ceph (if applicable)
This might be too early to stop Ceph, in case the OpenStack services are still using it (e.g. image uploads). Perhaps stop Ceph at the point where the Ceph nodes are shut down.
Sure, makes perfect sense; that was a Gerrit habit ;)
This would be nice to complete and merge.