2 Salt Cluster Administration #
After you deploy a Ceph cluster, you will occasionally need to modify it, for example by adding or removing nodes, disks, or services. This chapter describes how you can achieve these administration tasks.
2.1 Adding New Cluster Nodes #
The procedure of adding new nodes to the cluster is almost identical to the initial cluster node deployment described in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”:
Tip: Prevent Rebalancing
When adding an OSD to the existing cluster, bear in mind that the cluster will be rebalancing for some time afterward. To minimize the rebalancing periods, add all OSDs you intend to add at the same time.
An additional way is to set the osd crush initial weight =
0 option in the ceph.conf file before adding
the OSDs:
Add osd crush initial weight = 0 to /srv/salt/ceph/configuration/files/ceph.conf.d/global.conf.

Create the new configuration on the Salt master node:

root@master # salt 'SALT_MASTER_NODE' state.apply ceph.configuration.create

Or:

root@master # salt-call state.apply ceph.configuration.create

Apply the new configuration to the targeted OSD minions:

root@master # salt 'OSD_MINIONS' state.apply ceph.configuration

Note
If this is not a new node, but you want to proceed as if it were, ensure that you remove the /etc/ceph/destroyedOSDs.yml file from the node. Otherwise, any devices from the first attempt will be restored with their previous OSD ID and reweight.

Run the following commands:

root@master # salt-run state.orch ceph.stage.1
root@master # salt-run state.orch ceph.stage.2
root@master # salt 'node*' state.apply ceph.osd

After the new OSDs are added, adjust their weights as required with the ceph osd crush reweight command in small increments. This allows the cluster to rebalance and become healthy between increments, so that the process does not overwhelm the cluster or the clients accessing it.
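For example, a minimal sketch of such a gradual reweight, assuming the new OSD received the ID 7 and should end up with a CRUSH weight of 1.0 (both values are illustrative):

cephadm@adm > ceph osd crush reweight osd.7 0.25
cephadm@adm > ceph -s    # wait until the cluster is healthy again
cephadm@adm > ceph osd crush reweight osd.7 0.5
cephadm@adm > ceph -s    # wait again, then continue up to the final weight
cephadm@adm > ceph osd crush reweight osd.7 1.0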
Install SUSE Linux Enterprise Server 15 SP1 on the new node and configure its network settings so that it resolves the Salt master host name correctly. Verify that it has a proper connection to both the public and cluster networks, and that time synchronization is correctly configured. Then install the salt-minion package:

root@minion > zypper in salt-minion

If the Salt master's host name is different from salt, edit /etc/salt/minion and add the following:

master: DNS_name_of_your_salt_master

If you performed any changes to the configuration files mentioned above, restart the salt-minion service:

root@minion > systemctl restart salt-minion.service

On the Salt master, accept the Salt key of the new node:

root@master # salt-key --accept NEW_NODE_KEY

Verify that /srv/pillar/ceph/deepsea_minions.sls targets the new Salt minion and/or set the proper DeepSea grain. Refer to Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.2.2.1 “Matching the Minion Name” or Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”, Running Deployment Stages for more details.

Run the preparation stage. It synchronizes modules and grains so that the new minion can provide all the information DeepSea expects:
root@master # salt-run state.orch ceph.stage.0

Important: Possible Restart of DeepSea stage 0
If the Salt master rebooted after its kernel update, you need to restart DeepSea stage 0.
Run the discovery stage. It will write new file entries in the /srv/pillar/ceph/proposals directory, where you can edit relevant .yml files:

root@master # salt-run state.orch ceph.stage.1

Optionally, change /srv/pillar/ceph/proposals/policy.cfg if the newly added host does not match the existing naming scheme. For details, refer to Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.1 “The policy.cfg File”.

Run the configuration stage. It reads everything under /srv/pillar/ceph and updates the pillar accordingly:

root@master # salt-run state.orch ceph.stage.2

Pillar stores data which you can access with the following command:

root@master # salt target pillar.items

Tip: Modifying the Default OSD Layout
If you want to modify the default OSD layout and change the drive groups configuration, follow the procedure described in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.2 “DriveGroups”.

The configuration and deployment stages include the newly added nodes:

root@master # salt-run state.orch ceph.stage.3
root@master # salt-run state.orch ceph.stage.4
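To confirm that the new node and its OSDs have joined the cluster, you can inspect the CRUSH tree and the cluster health, for example:

cephadm@adm > ceph osd tree
cephadm@adm > ceph -s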
2.2 Adding New Roles to Nodes #
You can deploy all types of supported roles with DeepSea. See Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.1.2 “Role Assignment” for more information on supported role types and examples of matching them.
To add a new service to an existing node, follow these steps:
Adapt /srv/pillar/ceph/proposals/policy.cfg to match the existing host with a new role. For more details, refer to Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.1 “The policy.cfg File”. For example, if you need to run an Object Gateway on a MON node, the line is similar to:

role-rgw/xx/x/example.mon-1.sls

Run stage 2 to update the pillar:

root@master # salt-run state.orch ceph.stage.2

Run stage 3 to deploy core services, or stage 4 to deploy optional services. Running both stages does not hurt.
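After the stages finish, you can check that the role was picked up in the target minion's pillar. A sketch, assuming the minion from the example above is named example.mon-1 and that DeepSea exposes the assigned roles under the roles pillar key (as it normally does):

root@master # salt 'example.mon-1*' pillar.get roles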
2.3 Removing and Reinstalling Cluster Nodes #
Tip: Removing a Cluster Node Temporarily
The Salt master expects all minions to be present in the cluster and responsive. If a minion breaks and is no longer responsive, it causes problems for the Salt infrastructure, mainly for DeepSea and the Ceph Dashboard.
Before you fix the minion, delete its key from the Salt master temporarily:
root@master # salt-key -d MINION_HOST_NAME

After the minion is fixed, add its key to the Salt master again:
root@master # salt-key -a MINION_HOST_NAME
To remove a role from a cluster, edit
/srv/pillar/ceph/proposals/policy.cfg and remove the
corresponding line(s). Then run stages 2 and 5 as described in
Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”.
Note: Removing OSDs from Cluster
In case you need to remove a particular OSD node from your cluster, ensure that your cluster has more free disk space than the disk you intend to remove. Bear in mind that removing an OSD results in rebalancing of the whole cluster.
Before running stage 5 to do the actual removal, always check which OSDs are going to be removed by DeepSea:
root@master # salt-run rescinded.ids

When a role is removed from a minion, the objective is to undo all changes related to that role. For most of the roles, the task is simple, but there may be problems with package dependencies. If a package is uninstalled, its dependencies are not.
Removed OSDs appear as blank drives. The related tasks overwrite the beginning of the file systems and remove backup partitions in addition to wiping the partition tables.
Note: Preserving Partitions Created by Other Methods
Disk drives previously configured by other methods, such as
ceph-deploy, may still contain partitions. DeepSea
will not automatically destroy these. The administrator must reclaim these
drives manually.
Example 2.1: Removing a Salt minion from the Cluster #
If your storage minions are named, for example, 'data1.ceph', 'data2.ceph'
... 'data6.ceph', and the related lines in your
policy.cfg are similar to the following:
[...]
# Hardware Profile
role-storage/cluster/data*.sls
[...]
Then to remove the Salt minion 'data2.ceph', change the lines to the following:
[...]
# Hardware Profile
role-storage/cluster/data[1,3-6]*.sls
[...]
Also remember to adapt your drive_groups.yml file to match the new targets.
[...]
drive_group_name:
target: 'data[1,3-6]*'
[...]

Then run stage 2, check which OSDs are going to be removed, and finish by running stage 5:

root@master # salt-run state.orch ceph.stage.2
root@master # salt-run rescinded.ids
root@master # salt-run state.orch ceph.stage.5
Example 2.2: Migrating Nodes #
Assume the following situation: during the fresh cluster installation, you (the administrator) allocated one of the storage nodes as a stand-alone Object Gateway while waiting for the gateway's hardware to arrive. Now the permanent hardware for the gateway has arrived, and you can finally assign the intended role to the backup storage node and have the gateway role removed.
After running stages 0 and 1 (see Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”, Running Deployment Stages) for the
new hardware, you named the new gateway rgw1. If the
node data8 needs the Object Gateway role removed and the storage
role added, and the current policy.cfg looks like
this:
# Hardware Profile
role-storage/cluster/data[1-7]*.sls

# Roles
role-rgw/cluster/data8*.sls
Then change it to:
# Hardware Profile
role-storage/cluster/data[1-8]*.sls

# Roles
role-rgw/cluster/rgw1*.sls
Run stages 2 to 4, check which OSDs are going to be possibly removed, and
finish by running stage 5. Stage 3 will add data8 as a
storage node. For a moment, data8 will have both roles.
Stage 4 will add the Object Gateway role to rgw1 and stage 5 will
remove the Object Gateway role from data8:
root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.3
root@master # salt-run state.orch ceph.stage.4
root@master # salt-run rescinded.ids
root@master # salt-run state.orch ceph.stage.5
Example 2.3: Removal of a Failed Node #
If the Salt minion is not responding and the administrator is unable to resolve the issue, we recommend removing the Salt key:
root@master # salt-key -d MINION_ID

Example 2.4: Removal of a Failed Storage Node #
When a server fails (due to network, power, or other issues), it means that all the OSDs are dead. Issue the following commands for each OSD on the failed storage node:
cephadm@adm > ceph osd purge-actual $id --yes-i-really-mean-it
cephadm@adm > ceph auth del osd.$id
Running the ceph osd purge-actual command is equivalent
to the following:
cephadm@adm > ceph osd destroy $id
cephadm@adm > ceph osd rm $id
cephadm@adm > ceph osd crush remove osd.$id
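If the failed node hosted several OSDs, you can wrap the commands above in a small shell loop. A sketch in which the OSD IDs 10, 11, and 12 are purely illustrative:

cephadm@adm > for id in 10 11 12 ; do ceph osd purge-actual $id --yes-i-really-mean-it ; ceph auth del osd.$id ; done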
2.4 Redeploying Monitor Nodes #
When one or more of your monitor nodes fail and are not responding, you need to remove the failed monitors from the cluster and then possibly re-add them to the cluster.
Important: The Minimum Is Three Monitor Nodes
The number of monitor nodes must not be less than three. If a monitor node fails, and as a result your cluster has only two monitor nodes, you need to temporarily assign the monitor role to other cluster nodes before you redeploy the failed monitor nodes. After you redeploy the failed monitor nodes, you can uninstall the temporary monitor roles.
For more information on adding new nodes/roles to the Ceph cluster, see Section 2.1, “Adding New Cluster Nodes” and Section 2.2, “Adding New Roles to Nodes”.
For more information on removing cluster nodes, refer to Section 2.3, “Removing and Reinstalling Cluster Nodes”.
There are two basic degrees of a Ceph node failure:
The Salt minion host is broken either physically or on the OS level, and does not respond to the salt 'minion_name' test.ping call. In this case you need to redeploy the server completely by following the relevant instructions in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”.

The monitor-related services failed and refuse to recover, but the host responds to the salt 'minion_name' test.ping call. In this case, follow these steps:
Edit /srv/pillar/ceph/proposals/policy.cfg on the Salt master, and remove or update the lines that correspond to the failed monitor nodes so that they now point to the working monitor nodes. For example:

[...]
# MON
#role-mon/cluster/ses-example-failed1.sls
#role-mon/cluster/ses-example-failed2.sls
role-mon/cluster/ses-example-new1.sls
role-mon/cluster/ses-example-new2.sls
[...]
Run DeepSea stages 2 to 5 to apply the changes:
root@master # deepsea stage run ceph.stage.2
root@master # deepsea stage run ceph.stage.3
root@master # deepsea stage run ceph.stage.4
root@master # deepsea stage run ceph.stage.5
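After stage 5 completes, it is worth verifying that the remaining and redeployed monitors form a quorum, for example:

cephadm@adm > ceph mon stat
cephadm@adm > ceph -s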
2.5 Verify an Encrypted OSD #
After using DeepSea to deploy an OSD, you may want to verify that the OSD is encrypted.
Check the output of ceph-volume lvm list (it should be run as root on the node where the OSDs in question are located):

root@master # ceph-volume lvm list

====== osd.3 =======

  [block]       /dev/ceph-d9f09cf7-a2a4-4ddc-b5ab-b1fa4096f713/osd-data-71f62502-4c85-4944-9860-312241d41bb7

      block device              /dev/ceph-d9f09cf7-a2a4-4ddc-b5ab-b1fa4096f713/osd-data-71f62502-4c85-4944-9860-312241d41bb7
      block uuid                m5F10p-tUeo-6ZGP-UjxJ-X3cd-Ec5B-dNGXvG
      cephx lockbox secret
      cluster fsid              413d9116-e4f6-4211-a53b-89aa219f1cf2
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  f8596bf7-000f-4186-9378-170b782359dc
      osd id                    3
      type                      block
      vdo                       0
      devices                   /dev/vdb

====== osd.7 =======

  [block]       /dev/ceph-38914e8d-f512-44a7-bbee-3c20a684753d/osd-data-0f385f9e-ce5c-45b9-917d-7f8c08537987

      block device              /dev/ceph-38914e8d-f512-44a7-bbee-3c20a684753d/osd-data-0f385f9e-ce5c-45b9-917d-7f8c08537987
      block uuid                1y3qcS-ZG01-Y7Z1-B3Kv-PLr6-jbm6-8B79g6
      cephx lockbox secret
      cluster fsid              413d9116-e4f6-4211-a53b-89aa219f1cf2
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  0f9a8002-4c81-4f5f-93a6-255252cac2c4
      osd id                    7
      type                      block
      vdo                       0
      devices                   /dev/vdc

Note the line that says encrypted 0. This means the OSD is not encrypted. The possible values are as follows:

encrypted 0 = not encrypted
encrypted 1 = encrypted
If you get the following error, it means the node where you are running the command does not have any OSDs on it:

root@master # ceph-volume lvm list
No valid Ceph lvm devices found

If you have deployed a cluster with an OSD for which ceph-volume lvm list shows encrypted 1, the OSD is encrypted. If you are unsure, proceed to step two.

Ceph OSD encryption-at-rest relies on the Linux kernel's dm-crypt subsystem and the Linux Unified Key Setup ("LUKS"). When creating an encrypted OSD, ceph-volume creates an encrypted logical volume and saves the corresponding dm-crypt secret key in the Ceph Monitor data store. When the OSD is to be started, ceph-volume ensures the device is mounted, retrieves the dm-crypt secret key from the Ceph Monitor, and decrypts the underlying device. This creates a new device containing the unencrypted data, and this is the device the Ceph OSD daemon is started on.

The OSD does not know whether the underlying logical volume is encrypted or not; there is no ceph osd command that returns this information. However, it is possible to query LUKS for it, as follows.

First, get the device of the OSD logical volume you are interested in. This can be obtained from the ceph-volume lvm list output:

block device    /dev/ceph-d9f09cf7-a2a4-4ddc-b5ab-b1fa4096f713/osd-data-71f62502-4c85-4944-9860-312241d41bb7
Then, dump the LUKS header from that device:
root@master # cryptsetup luksDump OSD_BLOCK_DEVICE

If the OSD is not encrypted, the output is as follows:
Device /dev/ceph-38914e8d-f512-44a7-bbee-3c20a684753d/osd-data-0f385f9e-ce5c-45b9-917d-7f8c08537987 is not a valid LUKS device.
If the OSD is encrypted, the output is as follows:
root@master # cryptsetup luksDump /dev/ceph-1ce61157-81be-427d-83ad-7337f05d8514/osd-data-89230c92-3ace-4685-97ff-6fa059cef63a
LUKS header information for /dev/ceph-1ce61157-81be-427d-83ad-7337f05d8514/osd-data-89230c92-3ace-4685-97ff-6fa059cef63a

Version:        1
Cipher name:    aes
Cipher mode:    xts-plain64
Hash spec:      sha256
Payload offset: 4096
MK bits:        256
MK digest:      e9 41 85 f1 1b a3 54 e2 48 6a dc c2 50 26 a5 3b 79 b0 f2 2e
MK salt:        4c 8c 9d 1f 72 1a 88 6c 06 88 04 72 81 7b e4 bb
                b1 70 e1 c2 7c c5 3b 30 6d f7 c8 9c 7c ca 22 7d
MK iterations:  118940
UUID:           7675f03b-58e3-47f2-85fc-3bafcf1e589f

Key Slot 0: ENABLED
        Iterations:             1906500
        Salt:                   8f 1f 7f f4 eb 30 5a 22 a5 b4 14 07 cc da dc 48
                                b5 e9 87 ef 3b 9b 24 72 59 ea 1a 0a ec 61 e6 42
        Key material offset:    8
        AF stripes:             4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

Since decrypting the data on an encrypted OSD disk requires knowledge of the corresponding dm-crypt secret key, OSD encryption provides protection for cases when a disk drive that was used as an OSD is decommissioned, lost, or stolen.
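If you prefer a terse yes/no answer instead of the full header, cryptsetup isLuks can be run against the same block device; a small sketch (the && / || reporting is only for readability):

root@master # cryptsetup isLuks OSD_BLOCK_DEVICE && echo encrypted || echo "not encrypted"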
2.6 Adding an OSD Disk to a Node #
To add a disk to an existing OSD node, verify that any partition on the disk
was removed and wiped. Refer to Step 12 in
Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment” for more details. Adapt
/srv/salt/ceph/configuration/files/drive_groups.yml
accordingly (refer to Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.2 “DriveGroups” for details). After
saving the file, run DeepSea's stage 3:
root@master # deepsea stage run ceph.stage.3
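To verify that the new disk was picked up, you can check ceph-volume on the OSD node before and after stage 3: before, the disk should be reported as available; afterward, it should appear in the LVM listing. For example:

cephadm@osd > ceph-volume inventory
cephadm@osd > ceph-volume lvm list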
2.7 Removing an OSD #
You can remove a Ceph OSD from the cluster by running the following command:
root@master # salt-run osd.remove OSD_ID
OSD_ID needs to be the number of the OSD without
the osd. prefix. For example, from
osd.3 only use the digit 3.
2.7.1 Removing Multiple OSDs #
Use the same procedure as mentioned in Section 2.7, “Removing an OSD” but simply supply multiple OSD IDs:
root@master # salt-run osd.remove 2 6 11 15
Removing osd 2 on host data1
Draining the OSD
Waiting for ceph to catch up.
osd.2 is safe to destroy
Purging from the crushmap
Zapping the device
Removing osd 6 on host data1
Draining the OSD
Waiting for ceph to catch up.
osd.6 is safe to destroy
Purging from the crushmap
Zapping the device
Removing osd 11 on host data1
Draining the OSD
Waiting for ceph to catch up.
osd.11 is safe to destroy
Purging from the crushmap
Zapping the device
Removing osd 15 on host data1
Draining the OSD
Waiting for ceph to catch up.
osd.15 is safe to destroy
Purging from the crushmap
Zapping the device
2:
True
6:
True
11:
True
15:
True

2.7.2 Removing All OSDs on a Host #
To remove all OSDs on a specific host, run the following command:
root@master # salt-run osd.remove OSD_HOST_NAME

2.7.3 Removing Broken OSDs Forcefully #
There are cases when removing an OSD gracefully (see Section 2.7, “Removing an OSD”) fails. This may happen, for example, if the OSD or its journal, WAL or DB are broken, when it suffers from hanging I/O operations, or when the OSD disk fails to unmount.
In such cases, force the OSD removal:

root@master # salt-run osd.remove OSD_ID force=True

Tip: Hanging Mounts
If a partition is still mounted on the disk being removed, the command will exit with the 'Unmount failed - check for processes on DEVICE' message. You can then list all processes that access the file system with the fuser -m DEVICE command. If fuser returns nothing, try unmounting the device manually (umount DEVICE) and watch the output of the dmesg or journalctl commands.
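A short sketch of that sequence, with /dev/sdd standing in for the affected device (illustrative only):

root@minion > fuser -m /dev/sdd
root@minion > umount /dev/sdd
root@minion > dmesg | tail
root@minion > journalctl -n 20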
2.7.4 Validating OSD LVM Metadata #
After removing an OSD using the salt-run osd.remove
ID or through other ceph commands, LVM
metadata may not be completely removed. This means that if you want to
re-deploy a new OSD, old LVM metadata would be used.
First, check if the OSD has been removed:
cephadm@osd > ceph-volume lvm list

Even if one of the OSDs has been removed successfully, it can still be listed. For example, if you removed osd.2, the following would be the output:

====== osd.2 =======

  [block]       /dev/ceph-a2189611-4380-46f7-b9a2-8b0080a1f9fd/osd-data-ddc508bc-6cee-4890-9a42-250e30a72380

      block device              /dev/ceph-a2189611-4380-46f7-b9a2-8b0080a1f9fd/osd-data-ddc508bc-6cee-4890-9a42-250e30a72380
      block uuid                kH9aNy-vnCT-ExmQ-cAsI-H7Gw-LupE-cvSJO9
      cephx lockbox secret
      cluster fsid              6b6bbac4-eb11-45cc-b325-637e3ff9fa0c
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  aac51485-131c-442b-a243-47c9186067db
      osd id                    2
      type                      block
      vdo                       0
      devices                   /dev/sda
In this example, you can see that osd.2 is still located in /dev/sda.

Validate the LVM metadata on the OSD node:

cephadm@osd > ceph-volume inventory

The output from running ceph-volume inventory marks the /dev/sda availability as False. For example:

Device Path               Size         rotates available Model name
/dev/sda                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdb                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdc                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdd                  40.00 GB     True    False     QEMU HARDDISK
/dev/sde                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdf                  40.00 GB     True    False     QEMU HARDDISK
/dev/vda                  25.00 GB     True    False
Run the following command on the OSD node to remove the LVM metadata completely:
cephadm@osd > ceph-volume lvm zap --osd-id ID --destroy

Run the inventory command again to verify that the /dev/sda availability returns True. For example:

cephadm@osd > ceph-volume inventory
Device Path               Size         rotates available Model name
/dev/sda                  40.00 GB     True    True      QEMU HARDDISK
/dev/sdb                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdc                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdd                  40.00 GB     True    False     QEMU HARDDISK
/dev/sde                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdf                  40.00 GB     True    False     QEMU HARDDISK
/dev/vda                  25.00 GB     True    False

LVM metadata has been removed. You can safely run the dd command on the device.

The OSD can now be re-deployed without needing to reboot the OSD node:

root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.3
2.8 Replacing an OSD Disk #
There are several reasons why you may need to replace an OSD disk, for example:
The OSD disk failed or is soon going to fail based on SMART information, and can no longer be used to store data safely.
You need to upgrade the OSD disk, for example to increase its size.
You need to change the OSD disk layout.
You plan to move from a non-LVM to a LVM-based layout.
The replacement procedure is the same for all of these cases. It is also valid for both default and customized CRUSH Maps.
Suppose that, for example, '5' is the ID of the OSD whose disk needs to be replaced. The following command marks it as destroyed in the CRUSH Map but leaves its original ID:
root@master # salt-run osd.replace 5

Tip: osd.replace and osd.remove
Salt's osd.replace and osd.remove (see Section 2.7, “Removing an OSD”) commands are identical, except that osd.replace leaves the OSD as 'destroyed' in the CRUSH Map while osd.remove removes all traces from the CRUSH Map.

Manually replace the failed/upgraded OSD drive.
If you want to modify the default OSD's layout and change the DriveGroups configuration, follow the procedure described in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.2 “DriveGroups”.
Run the deployment stage 3 to deploy the replaced OSD disk:
root@master # salt-run state.orch ceph.stage.3
Important: Shared device failure
If a shared device for DB/WAL fails, you need to perform the replacement procedure for all OSDs that share the failed device.
2.9 Recovering a Reinstalled OSD Node #
If the operating system breaks and is not recoverable on one of your OSD nodes, follow these steps to recover it and redeploy its OSD role with cluster data untouched:
Reinstall the base SUSE Linux Enterprise operating system on the node where the OS broke. Install the salt-minion packages on the OSD node, delete the old Salt minion key on the Salt master, and register the new Salt minion's key with the Salt master. For more information on the initial deployment, see Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”.
Instead of running the whole of stage 0, run the following parts:
root@master # salt 'osd_node' state.apply ceph.sync
root@master # salt 'osd_node' state.apply ceph.packages.common
root@master # salt 'osd_node' state.apply ceph.mines
root@master # salt 'osd_node' state.apply ceph.updates

Copy the ceph.conf to the OSD node, and then activate the OSD:

root@master # salt 'osd_node' state.apply ceph.configuration
root@master # salt 'osd_node' cmd.run "ceph-volume lvm activate --all"

Verify activation with one of the following commands:

root@master # ceph -s
# OR
root@master # ceph osd tree

To ensure consistency across the cluster, run the DeepSea stages in the following order:

root@master # salt-run state.orch ceph.stage.1
root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.3
root@master # salt-run state.orch ceph.stage.4
root@master # salt-run state.orch ceph.stage.5

Run DeepSea stage 0:

root@master # salt-run state.orch ceph.stage.0

Reboot the relevant OSD node. All OSD disks will be rediscovered and reused.
Get Prometheus' node exporter installed/running:
root@master # salt 'RECOVERED_MINION' \
 state.apply ceph.monitoring.prometheus.exporters.node_exporter

Remove unnecessary Salt grains (best after all OSDs have been migrated to LVM):

root@master # salt -I roles:storage grains.delkey ceph
2.10 Moving the Admin Node to a New Server #
If you need to replace the Admin Node host with a new one, you need to move the
Salt master and DeepSea files. Use your favorite synchronization tool for
transferring the files. In this procedure, we use rsync
because it is a standard tool available in SUSE Linux Enterprise Server 15 SP1 software repositories.
Stop the salt-master and salt-minion services on the old Admin Node:

root@master # systemctl stop salt-master.service
root@master # systemctl stop salt-minion.service

Configure Salt on the new Admin Node so that the Salt master and Salt minions communicate. Find more details in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”.
Tip: Transition of Salt Minions
To ease the transition of Salt minions to the new Admin Node, remove the original Salt master's public key from each of them:
root@minion > rm /etc/salt/pki/minion/minion_master.pub
root@minion > systemctl restart salt-minion.service

Verify that the deepsea package is installed and install it if required:

root@master # zypper install deepsea

Customize the policy.cfg file by changing the role-master line. Find more details in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.1 “The policy.cfg File”.

Synchronize the /srv/pillar and /srv/salt directories from the old Admin Node to the new one.

Tip: rsync Dry Run and Symbolic Links
If possible, try synchronizing the files in a dry run first to see which files will be transferred (rsync's option -n). Also, include symbolic links (rsync's option -a). For rsync, the synchronization command will look as follows:

root@master # rsync -avn /srv/pillar/ NEW-ADMIN-HOSTNAME:/srv/pillar
/srv/pillarand/srv/salt, for example in/etc/salt/masteror/etc/salt/master.d, synchronize them as well.Now you can run DeepSea stages from the new Admin Node. Refer to Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.2 “Introduction to DeepSea” for their detailed description.
2.11 Automated Installation via Salt #
The installation can be automated by using the Salt reactor. For virtual environments or consistent hardware environments, this configuration will allow the creation of a Ceph cluster with the specified behavior.
Warning
Salt cannot perform dependency checks based on reactor events. There is a real risk of putting your Salt master into a death spiral.
The automated installation requires the following:
A properly created /srv/pillar/ceph/proposals/policy.cfg.

A prepared custom global.yml placed in the /srv/pillar/ceph/stack directory.
The default reactor configuration will only run stages 0 and 1. This allows testing of the reactor without waiting for subsequent stages to complete.
When the first salt-minion starts, stage 0 will begin. A lock prevents multiple instances. When all minions complete stage 0, stage 1 will begin.
If the operation is performed properly, edit the file
/etc/salt/master.d/reactor.conf
and replace the following line
- /srv/salt/ceph/reactor/discovery.sls
with
- /srv/salt/ceph/reactor/all_stages.sls
Verify that the line is not commented out.
2.12 Updating the Cluster Nodes #
Keep the Ceph cluster nodes up-to-date by applying rolling updates regularly.
2.12.1 Software Repositories #
Before patching the cluster with the latest software packages, verify that all the cluster's nodes have access to the relevant repositories. Refer to Book “Deployment Guide”, Chapter 6 “Upgrading from Previous Releases”, Section 6.5.1 “Manual Node Upgrade Using the Installer DVD” for a complete list of the required repositories.
2.12.2 Repository Staging #
If you use a staging tool—for example, SUSE Manager, Subscription Management Tool, or Repository Mirroring Tool—that serves software repositories to the cluster nodes, verify that stages for both 'Updates' repositories for SUSE Linux Enterprise Server and SUSE Enterprise Storage are created at the same point in time.
We strongly recommend using a staging tool to apply patches which have frozen or staged patch levels. This ensures that new nodes joining the cluster have the same patch level as the nodes already running in the cluster. This way you avoid the need to apply the latest patches to all the cluster's nodes before new nodes can join the cluster.
2.12.3 zypper patch or zypper dup #
By default, cluster nodes are upgraded using the zypper
dup command. If you prefer to update the system using
zypper patch instead, edit
/srv/pillar/ceph/stack/global.yml and add the
following line:
update_method_init: zypper-patch
2.12.4 Cluster Node Reboots #
During the update, cluster nodes may be optionally rebooted if their kernel was upgraded by the update. If you want to eliminate the possibility of a forced reboot of potentially all nodes, either verify that the latest kernel is installed and running on Ceph nodes, or disable automatic node reboots as described in Book “Deployment Guide”, Chapter 7 “Customizing the Default Configuration”, Section 7.1.5 “Updates and Reboots during Stage 0”.
2.12.5 Downtime of Ceph Services #
Depending on the configuration, cluster nodes may be rebooted during the update as described in Section 2.12.4, “Cluster Node Reboots”. If there is a single point of failure for services such as Object Gateway, Samba Gateway, NFS Ganesha, or iSCSI, the client machines may be temporarily disconnected from services whose nodes are being rebooted.
2.12.6 Running the Update #
To update the software packages on all cluster nodes to the latest version, follow these steps:
Update the deepsea, salt-master, and salt-minion packages and restart relevant services on the Salt master:
root@master # salt -I 'roles:master' state.apply ceph.updates.master

Update the salt-minion package and restart the related services on all cluster nodes:

root@master # salt -I 'cluster:ceph' state.apply ceph.updates.salt

Update all other software packages on the cluster:

root@master # salt-run state.orch ceph.stage.0

Restart Ceph related services:

root@master # salt-run state.orch ceph.restart
2.13 Halting or Rebooting Cluster #
In some cases it may be necessary to halt or reboot the whole cluster. We recommend carefully checking for dependencies of running services. The following steps provide an outline for stopping and starting the cluster:
Tell the Ceph cluster not to mark OSDs as out:
cephadm@adm > ceph osd set noout

Stop daemons and nodes in the following order (see the systemd target sketch after this procedure):
Storage clients
Gateways, for example NFS Ganesha or Object Gateway
Metadata Server
Ceph OSD
Ceph Manager
Ceph Monitor
If required, perform maintenance tasks.
Start the nodes and servers in the reverse order of the shutdown process:
Ceph Monitor
Ceph Manager
Ceph OSD
Metadata Server
Gateways, for example NFS Ganesha or Object Gateway
Storage clients
Remove the noout flag:
cephadm@adm > ceph osd unset noout
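As referenced above, if you prefer to stop the Ceph daemons of a given type on a node by hand rather than shutting the whole node down, the systemd targets shipped with Ceph can be used for that; a minimal sketch of the stop part of the order above (run on the respective nodes):

root@minion > systemctl stop ceph-mds.target
root@minion > systemctl stop ceph-osd.target
root@minion > systemctl stop ceph-mgr.target
root@minion > systemctl stop ceph-mon.target

Starting the daemons again works the same way with systemctl start, in the reverse order.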
2.14 Adjusting ceph.conf with Custom Settings #
If you need to put custom settings into the ceph.conf
file, you can do so by modifying the configuration files in the
/srv/salt/ceph/configuration/files/ceph.conf.d
directory:
global.conf
mon.conf
mgr.conf
mds.conf
osd.conf
client.conf
rgw.conf
Note: Unique rgw.conf
The Object Gateway offers a lot of flexibility and is unique compared to the other
ceph.conf sections. All other Ceph components have
static headers such as [mon] or
[osd]. The Object Gateway has unique headers such as
[client.rgw.rgw1]. This means that the
rgw.conf file needs a header entry. For examples, see
/srv/salt/ceph/configuration/files/rgw.conf or
/srv/salt/ceph/configuration/files/rgw-ssl.conf

See Section 26.7, “Enabling HTTPS/SSL for Object Gateways” for more examples.
Important: Run stage 3
After you make custom changes to the above mentioned configuration files, run stages 3 and 4 to apply these changes to the cluster nodes:
root@master # salt-run state.orch ceph.stage.3
root@master # salt-run state.orch ceph.stage.4
These files are included from the
/srv/salt/ceph/configuration/files/ceph.conf.j2
template file, and correspond to the different sections that the Ceph
configuration file accepts. Putting a configuration snippet in the correct
file enables DeepSea to place it into the correct section. You do not need
to add any of the section headers.
Tip
To apply any configuration options only to specific instances of a daemon,
add a header such as [osd.1]. The following
configuration options will only be applied to the OSD daemon with the ID 1.
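For example, a hedged sketch that raises the log verbosity of only the OSD daemon with ID 1 (debug osd is a standard Ceph option; the value 10 is arbitrary), added to osd.conf:

[osd.1]
debug osd = 10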
2.14.1 Overriding the Defaults #
Later statements in a section overwrite earlier ones. Therefore it is
possible to override the default configuration as specified in the
/srv/salt/ceph/configuration/files/ceph.conf.j2
template. For example, to turn off cephx authentication, add the following
three lines to the
/srv/salt/ceph/configuration/files/ceph.conf.d/global.conf
file:
auth cluster required = none
auth service required = none
auth client required = none
When redefining the default values, Ceph related tools such as
rados may issue warnings that specific values from the
ceph.conf.j2 were redefined in
global.conf. These warnings are caused by one
parameter assigned twice in the resulting ceph.conf.
As a workaround for this specific case, follow these steps:
Change the current directory to /srv/salt/ceph/configuration/create:

root@master # cd /srv/salt/ceph/configuration/create

Copy default.sls to custom.sls:

root@master # cp default.sls custom.sls

Edit custom.sls and change ceph.conf.j2 to custom-ceph.conf.j2.

Change the current directory to /srv/salt/ceph/configuration/files:

root@master # cd /srv/salt/ceph/configuration/files

Copy ceph.conf.j2 to custom-ceph.conf.j2:

root@master # cp ceph.conf.j2 custom-ceph.conf.j2

Edit custom-ceph.conf.j2 and delete the following line:

{% include "ceph/configuration/files/rbd.conf" %}

Edit global.yml and add the following line:

configuration_create: custom

Refresh the pillar:

root@master # salt target saltutil.pillar_refresh

Run stage 3:

root@master # salt-run state.orch ceph.stage.3
Now you should have only one entry for each value definition. To re-create the configuration, run:
root@master # salt-run state.orch ceph.configuration.create
and then verify the contents of
/srv/salt/ceph/configuration/cache/ceph.conf.
2.14.2 Including Configuration Files #
If you need to apply a lot of custom configurations, use the following
include statements within the custom configuration files to make file
management easier. Following is an example of the
osd.conf file:
[osd.1]
{% include "ceph/configuration/files/ceph.conf.d/osd1.conf" ignore missing %}
[osd.2]
{% include "ceph/configuration/files/ceph.conf.d/osd2.conf" ignore missing %}
[osd.3]
{% include "ceph/configuration/files/ceph.conf.d/osd3.conf" ignore missing %}
[osd.4]
{% include "ceph/configuration/files/ceph.conf.d/osd4.conf" ignore missing %}
In the previous example, the osd1.conf,
osd2.conf, osd3.conf, and
osd4.conf files contain the configuration options
specific to the related OSD.
Tip: Runtime Configuration
Changes made to Ceph configuration files take effect after the related Ceph daemons restart. See Section 25.1, “Runtime Configuration” for more information on changing the Ceph runtime configuration.
2.15 Enabling AppArmor Profiles #
AppArmor is a security solution that confines programs by a specific profile. For more details, refer to https://documentation.suse.com/sles/15-SP1/html/SLES-all/part-apparmor.html.
DeepSea provides three states for AppArmor profiles: 'enforce', 'complain', and 'disable'. To activate a particular AppArmor state, run:
salt -I "deepsea_minions:*" state.apply ceph.apparmor.default-STATE
To put the AppArmor profiles in an 'enforce' state:
root@master # salt -I "deepsea_minions:*" state.apply ceph.apparmor.default-enforceTo put the AppArmor profiles in a 'complain' state:
root@master # salt -I "deepsea_minions:*" state.apply ceph.apparmor.default-complainTo disable the AppArmor profiles:
root@master # salt -I "deepsea_minions:*" state.apply ceph.apparmor.default-disableTip: Enabling the AppArmor Service
Each of these three calls verifies whether AppArmor is installed, installs it if not, and starts and enables the related systemd service. DeepSea warns you if AppArmor was installed and started/enabled by other means and is therefore running without DeepSea profiles.
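To double-check the resulting state on a minion, you can inspect the loaded AppArmor profiles directly; a sketch, assuming the aa-status utility is installed on that node:

root@minion > aa-status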
2.16 Deactivating Tuned Profiles #
By default, DeepSea deploys Ceph clusters with tuned profiles active on
Ceph Monitor, Ceph Manager, and Ceph OSD nodes. In some cases, you may need to permanently
deactivate tuned profiles. To do so, put the following lines in
/srv/pillar/ceph/stack/global.yml and re-run stage 3:
alternative_defaults:
  tuned_mgr_init: default-off
  tuned_mon_init: default-off
  tuned_osd_init: default-off
root@master # salt-run state.orch ceph.stage.3

2.17 Removing an Entire Ceph Cluster #
The ceph.purge runner removes the entire Ceph cluster.
This way you can clean the cluster environment when testing different
setups. After the ceph.purge completes, the Salt
cluster is reverted back to the state at the end of DeepSea stage 1. You
can then either change the policy.cfg (see
Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.1 “The policy.cfg File”), or proceed to DeepSea stage 2
with the same setup.
To prevent accidental deletion, the orchestration checks if the safety is disengaged. You can disengage the safety measures and remove the Ceph cluster by running:
root@master # salt-run disengage.safety
root@master # salt-run state.orch ceph.purge
Tip: Disabling Ceph Cluster Removal
If you want to prevent anyone from running the
ceph.purge runner, create a file named
disabled.sls in the
/srv/salt/ceph/purge directory and insert the
following line in the
/srv/pillar/ceph/stack/global.yml file:
purge_init: disabled
Important: Rescind Custom Roles
If you previously created custom roles for Ceph Dashboard (refer to
Section 6.6, “Adding Custom Roles” and
Section 14.2, “User Roles and Permissions” for detailed information), you need
to take manual steps to purge them before running the
ceph.purge runner. For example, if the custom role for
Object Gateway is named 'us-east-1', then follow these steps:
root@master # cd /srv/salt/ceph/rescind
root@master # rsync -a rgw/ us-east-1
root@master # sed -i 's!rgw!us-east-1!' us-east-1/*.sls