2 Salt Cluster Administration #
After you deploy a Ceph cluster, you will occasionally need to modify it, for example by adding or removing nodes, disks, or services. This chapter describes how you can achieve these administration tasks.
2.1 Adding New Cluster Nodes #
The procedure of adding new nodes to the cluster is almost identical to the initial cluster node deployment described in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”:
Tip: Prevent Rebalancing
When adding an OSD to the existing cluster, bear in mind that the cluster will be rebalancing for some time afterward. To minimize the rebalancing periods, add all OSDs you intend to add at the same time.
An additional way is to set the osd crush initial weight =
0 option in the ceph.conf file before adding
the OSDs:
Add osd crush initial weight = 0 to /srv/salt/ceph/configuration/files/ceph.conf.d/global.conf.

Create the new configuration on the Salt master node:

root@master # salt 'SALT_MASTER_NODE' state.apply ceph.configuration.create

Or:

root@master # salt-call state.apply ceph.configuration.create

Apply the new configuration to the targeted OSD minions:

root@master # salt 'OSD_MINIONS' state.apply ceph.configuration

Note
If this is not a new node, but you want to proceed as if it were, ensure that you remove the /etc/ceph/destroyedOSDs.yml file from the node. Otherwise, any devices from the first attempt will be restored with their previous OSD ID and reweight.

Run the following commands:

root@master # salt-run state.orch ceph.stage.1
root@master # salt-run state.orch ceph.stage.2
root@master # salt 'node*' state.apply ceph.osd

After the new OSDs are added, adjust their weights as required with the ceph osd crush reweight command in small increments. This allows the cluster to rebalance and become healthy between increments, so that the process does not overwhelm the cluster or the clients accessing it.
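For example, a minimal sketch of such a gradual reweight, assuming the new OSD received the ID 7 and should end up with a CRUSH weight of 1.0 (both values are illustrative):

cephadm@adm > ceph osd crush reweight osd.7 0.25
cephadm@adm > ceph -s    # wait until the cluster is healthy again
cephadm@adm > ceph osd crush reweight osd.7 0.5
cephadm@adm > ceph -s    # wait again, then continue up to the final weight
cephadm@adm > ceph osd crush reweight osd.7 1.0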
Install SUSE Linux Enterprise Server 15 SP1 on the new node and configure its network settings so that it resolves the Salt master host name correctly. Verify that it has a proper connection to both the public and cluster networks, and that time synchronization is correctly configured. Then install the salt-minion package:

root@minion > zypper in salt-minion

If the Salt master's host name is different from salt, edit /etc/salt/minion and add the following:

master: DNS_name_of_your_salt_master

If you performed any changes to the configuration files mentioned above, restart the salt-minion service:

root@minion > systemctl restart salt-minion.service

On the Salt master, accept the Salt key of the new node:

root@master # salt-key --accept NEW_NODE_KEY

Verify that /srv/pillar/ceph/deepsea_minions.sls targets the new Salt minion and/or set the proper DeepSea grain. Refer to Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.2.2.1 “Matching the Minion Name” or Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”, Running Deployment Stages for more details.

Run the preparation stage. It synchronizes modules and grains so that the new minion can provide all the information DeepSea expects:
root@master # salt-run state.orch ceph.stage.0

Important: Possible Restart of DeepSea stage 0
If the Salt master rebooted after its kernel update, you need to restart DeepSea stage 0.
Run the discovery stage. It will write new file entries in the /srv/pillar/ceph/proposals directory, where you can edit relevant .yml files:

root@master # salt-run state.orch ceph.stage.1

Optionally, change /srv/pillar/ceph/proposals/policy.cfg if the newly added host does not match the existing naming scheme. For details, refer to Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.1 “The policy.cfg File”.

Run the configuration stage. It reads everything under /srv/pillar/ceph and updates the pillar accordingly:

root@master # salt-run state.orch ceph.stage.2

Pillar stores data which you can access with the following command:

root@master # salt target pillar.items

Tip: Modifying the Default OSD Layout
If you want to modify the default OSD layout and change the drive groups configuration, follow the procedure described in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.2 “DriveGroups”.

The configuration and deployment stages include the newly added nodes:

root@master # salt-run state.orch ceph.stage.3
root@master # salt-run state.orch ceph.stage.4
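To confirm that the new node and its OSDs have joined the cluster, you can inspect the CRUSH tree and the cluster health, for example:

cephadm@adm > ceph osd tree
cephadm@adm > ceph -s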
2.2 Adding New Roles to Nodes #
You can deploy all types of supported roles with DeepSea. See Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.1.2 “Role Assignment” for more information on supported role types and examples of matching them.
To add a new service to an existing node, follow these steps:
Adapt /srv/pillar/ceph/proposals/policy.cfg to match the existing host with a new role. For more details, refer to Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.1 “The policy.cfg File”. For example, if you need to run an Object Gateway on a MON node, the line is similar to:

role-rgw/xx/x/example.mon-1.sls

Run stage 2 to update the pillar:

root@master # salt-run state.orch ceph.stage.2

Run stage 3 to deploy core services, or stage 4 to deploy optional services. Running both stages does not hurt.
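After the stages finish, you can check that the role was picked up in the target minion's pillar. A sketch, assuming the minion from the example above is named example.mon-1 and that DeepSea exposes the assigned roles under the roles pillar key (as it normally does):

root@master # salt 'example.mon-1*' pillar.get roles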
2.3 Removing and Reinstalling Cluster Nodes #
Tip: Removing a Cluster Node Temporarily
The Salt master expects all minions to be present in the cluster and responsive. If a minion breaks and is no longer responsive, it causes problems for the Salt infrastructure, mainly for DeepSea and the Ceph Dashboard.
Before you fix the minion, delete its key from the Salt master temporarily:
root@master # salt-key -d MINION_HOST_NAME

After the minion is fixed, add its key to the Salt master again:
root@master # salt-key -a MINION_HOST_NAME
To remove a role from a cluster, edit
/srv/pillar/ceph/proposals/policy.cfg and remove the
corresponding line(s). Then run stages 2 and 5 as described in
Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”.
Note: Removing OSDs from Cluster
In case you need to remove a particular OSD node from your cluster, ensure that your cluster has more free disk space than the disk you intend to remove. Bear in mind that removing an OSD results in rebalancing of the whole cluster.
Before running stage 5 to do the actual removal, always check which OSDs are going to be removed by DeepSea:
root@master # salt-run rescinded.ids

When a role is removed from a minion, the objective is to undo all changes related to that role. For most of the roles, the task is simple, but there may be problems with package dependencies. If a package is uninstalled, its dependencies are not.
Removed OSDs appear as blank drives. The related tasks overwrite the beginning of the file systems and remove backup partitions in addition to wiping the partition tables.
Note: Preserving Partitions Created by Other Methods
Disk drives previously configured by other methods, such as
ceph-deploy, may still contain partitions. DeepSea
will not automatically destroy these. The administrator must reclaim these
drives manually.
Example 2.1: Removing a Salt minion from the Cluster #
If your storage minions are named, for example, 'data1.ceph', 'data2.ceph'
... 'data6.ceph', and the related lines in your
policy.cfg are similar to the following:
[...]
# Hardware Profile
role-storage/cluster/data*.sls
[...]
Then to remove the Salt minion 'data2.ceph', change the lines to the following:
[...]
# Hardware Profile
role-storage/cluster/data[1,3-6]*.sls
[...]
Also remember to adapt your drive_groups.yml file to match the new targets.
[...]
drive_group_name:
target: 'data[1,3-6]*'
[...]

Then run stage 2, check which OSDs are going to be removed, and finish by running stage 5:

root@master # salt-run state.orch ceph.stage.2
root@master # salt-run rescinded.ids
root@master # salt-run state.orch ceph.stage.5
Example 2.2: Migrating Nodes #
Assume the following situation: during the fresh cluster installation, you (the administrator) allocated one of the storage nodes as a stand-alone Object Gateway while waiting for the gateway's hardware to arrive. Now the permanent hardware for the gateway has arrived, and you can finally assign the intended role to the backup storage node and have the gateway role removed.
After running stages 0 and 1 (see Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”, Running Deployment Stages) for the
new hardware, you named the new gateway rgw1. If the
node data8 needs the Object Gateway role removed and the storage
role added, and the current policy.cfg looks like
this:
# Hardware Profile
role-storage/cluster/data[1-7]*.sls

# Roles
role-rgw/cluster/data8*.sls
Then change it to:
# Hardware Profile
role-storage/cluster/data[1-8]*.sls

# Roles
role-rgw/cluster/rgw1*.sls
Run stages 2 to 4, check which OSDs are going to be possibly removed, and
finish by running stage 5. Stage 3 will add data8 as a
storage node. For a moment, data8 will have both roles.
Stage 4 will add the Object Gateway role to rgw1 and stage 5 will
remove the Object Gateway role from data8:
root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.3
root@master # salt-run state.orch ceph.stage.4
root@master # salt-run rescinded.ids
root@master # salt-run state.orch ceph.stage.5
Example 2.3: Removal of a Failed Node #
If the Salt minion is not responding and the administrator is unable to resolve the issue, we recommend removing the Salt key:
root@master # salt-key -d MINION_ID

Example 2.4: Removal of a Failed Storage Node #
When a server fails (due to network, power, or other issues), it means that all the OSDs are dead. Issue the following commands for each OSD on the failed storage node:
cephadm@adm > ceph osd purge-actual $id --yes-i-really-mean-it
cephadm@adm > ceph auth del osd.$id
Running the ceph osd purge-actual command is equivalent
to the following:
cephadm@adm > ceph osd destroy $id
cephadm@adm > ceph osd rm $id
cephadm@adm > ceph osd crush remove osd.$id
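If the failed node hosted several OSDs, you can wrap the commands above in a small shell loop. A sketch in which the OSD IDs 10, 11, and 12 are purely illustrative:

cephadm@adm > for id in 10 11 12 ; do ceph osd purge-actual $id --yes-i-really-mean-it ; ceph auth del osd.$id ; done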
2.4 Redeploying Monitor Nodes #
When one or more of your monitor nodes fail and are not responding, you need to remove the failed monitors from the cluster and then possibly re-add them to the cluster.
Important: The Minimum Is Three Monitor Nodes
The number of monitor nodes must not be less than three. If a monitor node fails, and as a result your cluster has only two monitor nodes, you need to temporarily assign the monitor role to other cluster nodes before you redeploy the failed monitor nodes. After you redeploy the failed monitor nodes, you can uninstall the temporary monitor roles.
For more information on adding new nodes/roles to the Ceph cluster, see Section 2.1, “Adding New Cluster Nodes” and Section 2.2, “Adding New Roles to Nodes”.
For more information on removing cluster nodes, refer to Section 2.3, “Removing and Reinstalling Cluster Nodes”.
There are two basic degrees of a Ceph node failure:
The Salt minion host is broken either physically or on the OS level, and does not respond to the salt 'minion_name' test.ping call. In this case you need to redeploy the server completely by following the relevant instructions in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”.

The monitor-related services failed and refuse to recover, but the host responds to the salt 'minion_name' test.ping call. In this case, follow these steps:
Edit /srv/pillar/ceph/proposals/policy.cfg on the Salt master, and remove or update the lines that correspond to the failed monitor nodes so that they now point to the working monitor nodes. For example:

[...]
# MON
#role-mon/cluster/ses-example-failed1.sls
#role-mon/cluster/ses-example-failed2.sls
role-mon/cluster/ses-example-new1.sls
role-mon/cluster/ses-example-new2.sls
[...]
Run DeepSea stages 2 to 5 to apply the changes:
root@master # deepsea stage run ceph.stage.2
root@master # deepsea stage run ceph.stage.3
root@master # deepsea stage run ceph.stage.4
root@master # deepsea stage run ceph.stage.5
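After stage 5 completes, it is worth verifying that the remaining and redeployed monitors form a quorum, for example:

cephadm@adm > ceph mon stat
cephadm@adm > ceph -s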
2.5 Verify an Encrypted OSD #
After using DeepSea to deploy an OSD, you may want to verify that the OSD is encrypted.
Check the output of ceph-volume lvm list (it should be run as root on the node where the OSDs in question are located):

root@master # ceph-volume lvm list

====== osd.3 =======

  [block]       /dev/ceph-d9f09cf7-a2a4-4ddc-b5ab-b1fa4096f713/osd-data-71f62502-4c85-4944-9860-312241d41bb7

      block device              /dev/ceph-d9f09cf7-a2a4-4ddc-b5ab-b1fa4096f713/osd-data-71f62502-4c85-4944-9860-312241d41bb7
      block uuid                m5F10p-tUeo-6ZGP-UjxJ-X3cd-Ec5B-dNGXvG
      cephx lockbox secret
      cluster fsid              413d9116-e4f6-4211-a53b-89aa219f1cf2
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  f8596bf7-000f-4186-9378-170b782359dc
      osd id                    3
      type                      block
      vdo                       0
      devices                   /dev/vdb

====== osd.7 =======

  [block]       /dev/ceph-38914e8d-f512-44a7-bbee-3c20a684753d/osd-data-0f385f9e-ce5c-45b9-917d-7f8c08537987

      block device              /dev/ceph-38914e8d-f512-44a7-bbee-3c20a684753d/osd-data-0f385f9e-ce5c-45b9-917d-7f8c08537987
      block uuid                1y3qcS-ZG01-Y7Z1-B3Kv-PLr6-jbm6-8B79g6
      cephx lockbox secret
      cluster fsid              413d9116-e4f6-4211-a53b-89aa219f1cf2
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  0f9a8002-4c81-4f5f-93a6-255252cac2c4
      osd id                    7
      type                      block
      vdo                       0
      devices                   /dev/vdc

Note the line that says encrypted 0. This means the OSD is not encrypted. The possible values are as follows:

encrypted 0 = not encrypted
encrypted 1 = encrypted
If you get the following error, it means the node where you are running the command does not have any OSDs on it:

root@master # ceph-volume lvm list
No valid Ceph lvm devices found

If you have deployed a cluster with an OSD for which ceph-volume lvm list shows encrypted 1, the OSD is encrypted. If you are unsure, proceed to step two.

Ceph OSD encryption-at-rest relies on the Linux kernel's dm-crypt subsystem and the Linux Unified Key Setup ("LUKS"). When creating an encrypted OSD, ceph-volume creates an encrypted logical volume and saves the corresponding dm-crypt secret key in the Ceph Monitor data store. When the OSD is to be started, ceph-volume ensures the device is mounted, retrieves the dm-crypt secret key from the Ceph Monitor, and decrypts the underlying device. This creates a new device containing the unencrypted data, and this is the device the Ceph OSD daemon is started on.

The OSD does not know whether the underlying logical volume is encrypted or not; there is no ceph osd command that returns this information. However, it is possible to query LUKS for it, as follows.

First, get the device of the OSD logical volume you are interested in. This can be obtained from the ceph-volume lvm list output:

block device    /dev/ceph-d9f09cf7-a2a4-4ddc-b5ab-b1fa4096f713/osd-data-71f62502-4c85-4944-9860-312241d41bb7
Then, dump the LUKS header from that device:
root@master # cryptsetup luksDump OSD_BLOCK_DEVICE

If the OSD is not encrypted, the output is as follows:
Device /dev/ceph-38914e8d-f512-44a7-bbee-3c20a684753d/osd-data-0f385f9e-ce5c-45b9-917d-7f8c08537987 is not a valid LUKS device.
If the OSD is encrypted, the output is as follows:
root@master # cryptsetup luksDump /dev/ceph-1ce61157-81be-427d-83ad-7337f05d8514/osd-data-89230c92-3ace-4685-97ff-6fa059cef63a
LUKS header information for /dev/ceph-1ce61157-81be-427d-83ad-7337f05d8514/osd-data-89230c92-3ace-4685-97ff-6fa059cef63a

Version:        1
Cipher name:    aes
Cipher mode:    xts-plain64
Hash spec:      sha256
Payload offset: 4096
MK bits:        256
MK digest:      e9 41 85 f1 1b a3 54 e2 48 6a dc c2 50 26 a5 3b 79 b0 f2 2e
MK salt:        4c 8c 9d 1f 72 1a 88 6c 06 88 04 72 81 7b e4 bb
                b1 70 e1 c2 7c c5 3b 30 6d f7 c8 9c 7c ca 22 7d
MK iterations:  118940
UUID:           7675f03b-58e3-47f2-85fc-3bafcf1e589f

Key Slot 0: ENABLED
        Iterations:             1906500
        Salt:                   8f 1f 7f f4 eb 30 5a 22 a5 b4 14 07 cc da dc 48
                                b5 e9 87 ef 3b 9b 24 72 59 ea 1a 0a ec 61 e6 42
        Key material offset:    8
        AF stripes:             4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

Since decrypting the data on an encrypted OSD disk requires knowledge of the corresponding dm-crypt secret key, OSD encryption provides protection for cases when a disk drive that was used as an OSD is decommissioned, lost, or stolen.
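If you prefer a terse yes/no answer instead of the full header, cryptsetup isLuks can be run against the same block device; a small sketch (the && / || reporting is only for readability):

root@master # cryptsetup isLuks OSD_BLOCK_DEVICE && echo encrypted || echo "not encrypted"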
2.6 Adding an OSD Disk to a Node #
To add a disk to an existing OSD node, verify that any partition on the disk
was removed and wiped. Refer to Step 12 in
Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment” for more details. Adapt
/srv/salt/ceph/configuration/files/drive_groups.yml
accordingly (refer to Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.2 “DriveGroups” for details). After
saving the file, run DeepSea's stage 3:
root@master # deepsea stage run ceph.stage.3
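To verify that the new disk was picked up, you can check ceph-volume on the OSD node before and after stage 3: before, the disk should be reported as available; afterward, it should appear in the LVM listing. For example:

cephadm@osd > ceph-volume inventory
cephadm@osd > ceph-volume lvm list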
2.7 Removing an OSD #
You can remove a Ceph OSD from the cluster by running the following command:
root@master # salt-run osd.remove OSD_ID
OSD_ID needs to be the number of the OSD without
the osd. prefix. For example, from
osd.3 only use the digit 3.
2.7.1 Removing Multiple OSDs #
Use the same procedure as mentioned in Section 2.7, “Removing an OSD” but simply supply multiple OSD IDs:
root@master # salt-run osd.remove 2 6 11 15
Removing osd 2 on host data1
Draining the OSD
Waiting for ceph to catch up.
osd.2 is safe to destroy
Purging from the crushmap
Zapping the device
Removing osd 6 on host data1
Draining the OSD
Waiting for ceph to catch up.
osd.6 is safe to destroy
Purging from the crushmap
Zapping the device
Removing osd 11 on host data1
Draining the OSD
Waiting for ceph to catch up.
osd.11 is safe to destroy
Purging from the crushmap
Zapping the device
Removing osd 15 on host data1
Draining the OSD
Waiting for ceph to catch up.
osd.15 is safe to destroy
Purging from the crushmap
Zapping the device
2:
True
6:
True
11:
True
15:
True

2.7.2 Removing All OSDs on a Host #
To remove all OSDs on a specific host, run the following command:
root@master # salt-run osd.remove OSD_HOST_NAME

2.7.3 Removing Broken OSDs Forcefully #
There are cases when removing an OSD gracefully (see Section 2.7, “Removing an OSD”) fails. This may happen, for example, if the OSD or its journal, WAL or DB are broken, when it suffers from hanging I/O operations, or when the OSD disk fails to unmount.
In such cases, force the OSD removal:

root@master # salt-run osd.remove OSD_ID force=True

Tip: Hanging Mounts
If a partition is still mounted on the disk being removed, the command will exit with the 'Unmount failed - check for processes on DEVICE' message. You can then list all processes that access the file system with the fuser -m DEVICE command. If fuser returns nothing, try unmounting the device manually (umount DEVICE) and watch the output of the dmesg or journalctl commands.
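A short sketch of that sequence, with /dev/sdd standing in for the affected device (illustrative only):

root@minion > fuser -m /dev/sdd
root@minion > umount /dev/sdd
root@minion > dmesg | tail
root@minion > journalctl -n 20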
2.7.4 Validating OSD LVM Metadata #
After removing an OSD using the salt-run osd.remove
ID or through other ceph commands, LVM
metadata may not be completely removed. This means that if you want to
re-deploy a new OSD, old LVM metadata would be used.
First, check if the OSD has been removed:
cephadm@osd > ceph-volume lvm list

Even if one of the OSDs has been removed successfully, it can still be listed. For example, if you removed osd.2, the following would be the output:

====== osd.2 =======

  [block]       /dev/ceph-a2189611-4380-46f7-b9a2-8b0080a1f9fd/osd-data-ddc508bc-6cee-4890-9a42-250e30a72380

      block device              /dev/ceph-a2189611-4380-46f7-b9a2-8b0080a1f9fd/osd-data-ddc508bc-6cee-4890-9a42-250e30a72380
      block uuid                kH9aNy-vnCT-ExmQ-cAsI-H7Gw-LupE-cvSJO9
      cephx lockbox secret
      cluster fsid              6b6bbac4-eb11-45cc-b325-637e3ff9fa0c
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  aac51485-131c-442b-a243-47c9186067db
      osd id                    2
      type                      block
      vdo                       0
      devices                   /dev/sda
In this example, you can see that osd.2 is still located in /dev/sda.

Validate the LVM metadata on the OSD node:

cephadm@osd > ceph-volume inventory

The output from running ceph-volume inventory marks the /dev/sda availability as False. For example:

Device Path               Size         rotates available Model name
/dev/sda                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdb                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdc                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdd                  40.00 GB     True    False     QEMU HARDDISK
/dev/sde                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdf                  40.00 GB     True    False     QEMU HARDDISK
/dev/vda                  25.00 GB     True    False
Run the following command on the OSD node to remove the LVM metadata completely:
cephadm@osd > ceph-volume lvm zap --osd-id ID --destroy

Run the inventory command again to verify that the /dev/sda availability returns True. For example:

cephadm@osd > ceph-volume inventory
Device Path               Size         rotates available Model name
/dev/sda                  40.00 GB     True    True      QEMU HARDDISK
/dev/sdb                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdc                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdd                  40.00 GB     True    False     QEMU HARDDISK
/dev/sde                  40.00 GB     True    False     QEMU HARDDISK
/dev/sdf                  40.00 GB     True    False     QEMU HARDDISK
/dev/vda                  25.00 GB     True    False

LVM metadata has been removed. You can safely run the dd command on the device.

The OSD can now be re-deployed without needing to reboot the OSD node:

root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.3
2.8 Replacing an OSD Disk #
There are several reasons why you may need to replace an OSD disk, for example:
The OSD disk failed or is soon going to fail based on SMART information, and can no longer be used to store data safely.
You need to upgrade the OSD disk, for example to increase its size.
You need to change the OSD disk layout.
You plan to move from a non-LVM to a LVM-based layout.
The replacement procedure is the same for all of these cases. It is also valid for both default and customized CRUSH Maps.
Suppose that, for example, '5' is the ID of the OSD whose disk needs to be replaced. The following command marks it as destroyed in the CRUSH Map but leaves its original ID:
root@master # salt-run osd.replace 5

Tip: osd.replace and osd.remove
Salt's osd.replace and osd.remove (see Section 2.7, “Removing an OSD”) commands are identical, except that osd.replace leaves the OSD as 'destroyed' in the CRUSH Map while osd.remove removes all traces from the CRUSH Map.

Manually replace the failed/upgraded OSD drive.
If you want to modify the default OSD's layout and change the DriveGroups configuration, follow the procedure described in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.2 “DriveGroups”.
Run the deployment stage 3 to deploy the replaced OSD disk:
root@master # salt-run state.orch ceph.stage.3
Important: Shared device failure
If a shared device for DB/WAL fails, you need to perform the replacement procedure for all OSDs that share the failed device.
2.9 Recovering a Reinstalled OSD Node #
If the operating system breaks and is not recoverable on one of your OSD nodes, follow these steps to recover it and redeploy its OSD role with cluster data untouched:
Reinstall the base SUSE Linux Enterprise operating system on the node where the OS broke. Install the salt-minion packages on the OSD node, delete the old Salt minion key on the Salt master, and register the new Salt minion's key with the Salt master. For more information on the initial deployment, see Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”.
Instead of running the whole of stage 0, run the following parts:
root@master # salt 'osd_node' state.apply ceph.sync
root@master # salt 'osd_node' state.apply ceph.packages.common
root@master # salt 'osd_node' state.apply ceph.mines
root@master # salt 'osd_node' state.apply ceph.updates

Copy the ceph.conf to the OSD node, and then activate the OSD:

root@master # salt 'osd_node' state.apply ceph.configuration
root@master # salt 'osd_node' cmd.run "ceph-volume lvm activate --all"

Verify activation with one of the following commands:

root@master # ceph -s
# OR
root@master # ceph osd tree

To ensure consistency across the cluster, run the DeepSea stages in the following order:

root@master # salt-run state.orch ceph.stage.1
root@master # salt-run state.orch ceph.stage.2
root@master # salt-run state.orch ceph.stage.3
root@master # salt-run state.orch ceph.stage.4
root@master # salt-run state.orch ceph.stage.5

Run DeepSea stage 0:

root@master # salt-run state.orch ceph.stage.0

Reboot the relevant OSD node. All OSD disks will be rediscovered and reused.
Get Prometheus' node exporter installed/running:
root@master # salt 'RECOVERED_MINION' \
 state.apply ceph.monitoring.prometheus.exporters.node_exporter

Remove unnecessary Salt grains (best after all OSDs have been migrated to LVM):

root@master # salt -I roles:storage grains.delkey ceph
2.10 Moving the Admin Node to a New Server #
If you need to replace the Admin Node host with a new one, you need to move the
Salt master and DeepSea files. Use your favorite synchronization tool for
transferring the files. In this procedure, we use rsync
because it is a standard tool available in SUSE Linux Enterprise Server 15 SP1 software repositories.
Stop the salt-master and salt-minion services on the old Admin Node:

root@master # systemctl stop salt-master.service
root@master # systemctl stop salt-minion.service

Configure Salt on the new Admin Node so that the Salt master and Salt minions communicate. Find more details in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.3 “Cluster Deployment”.
Tip: Transition of Salt Minions
To ease the transition of Salt minions to the new Admin Node, remove the original Salt master's public key from each of them:
root@minion > rm /etc/salt/pki/minion/minion_master.pub
root@minion > systemctl restart salt-minion.service

Verify that the deepsea package is installed and install it if required:

root@master # zypper install deepsea

Customize the policy.cfg file by changing the role-master line. Find more details in Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.1 “The policy.cfg File”.

Synchronize the /srv/pillar and /srv/salt directories from the old Admin Node to the new one.

Tip: rsync Dry Run and Symbolic Links
If possible, try synchronizing the files in a dry run first to see which files will be transferred (rsync's option -n). Also, include symbolic links (rsync's option -a). For rsync, the synchronization command will look as follows:

root@master # rsync -avn /srv/pillar/ NEW-ADMIN-HOSTNAME:/srv/pillar
/srv/pillarand/srv/salt, for example in/etc/salt/masteror/etc/salt/master.d, synchronize them as well.Now you can run DeepSea stages from the new Admin Node. Refer to Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.2 “Introduction to DeepSea” for their detailed description.
2.11 Automated Installation via Salt #
The installation can be automated by using the Salt reactor. For virtual environments or consistent hardware environments, this configuration will allow the creation of a Ceph cluster with the specified behavior.
Warning
Salt cannot perform dependency checks based on reactor events. There is a real risk of putting your Salt master into a death spiral.
The automated installation requires the following:
A properly created /srv/pillar/ceph/proposals/policy.cfg.

A prepared custom global.yml placed in the /srv/pillar/ceph/stack directory.
The default reactor configuration will only run stages 0 and 1. This allows testing of the reactor without waiting for subsequent stages to complete.
When the first salt-minion starts, stage 0 will begin. A lock prevents multiple instances. When all minions complete stage 0, stage 1 will begin.
If the operation is performed properly, edit the file
/etc/salt/master.d/reactor.conf
and replace the following line
- /srv/salt/ceph/reactor/discovery.sls
with
- /srv/salt/ceph/reactor/all_stages.sls
Verify that the line is not commented out.
2.12 Updating the Cluster Nodes #
Keep the Ceph cluster nodes up-to-date by applying rolling updates regularly.
2.12.1 Software Repositories #
Before patching the cluster with the latest software packages, verify that all the cluster's nodes have access to the relevant repositories. Refer to Book “Deployment Guide”, Chapter 6 “Upgrading from Previous Releases”, Section 6.5.1 “Manual Node Upgrade Using the Installer DVD” for a complete list of the required repositories.
2.12.2 Repository Staging #
If you use a staging tool—for example, SUSE Manager, Subscription Management Tool, or Repository Mirroring Tool—that serves software repositories to the cluster nodes, verify that stages for both 'Updates' repositories for SUSE Linux Enterprise Server and SUSE Enterprise Storage are created at the same point in time.
We strongly recommend using a staging tool to apply patches which have frozen or staged patch levels. This ensures that new nodes joining the cluster have the same patch level as the nodes already running in the cluster. This way you avoid the need to apply the latest patches to all the cluster's nodes before new nodes can join the cluster.
2.12.3 zypper patch or zypper dup #
By default, cluster nodes are upgraded using the zypper
dup command. If you prefer to update the system using
zypper patch instead, edit
/srv/pillar/ceph/stack/global.yml and add the
following line:
update_method_init: zypper-patch
2.12.4 Cluster Node Reboots #
During the update, cluster nodes may be optionally rebooted if their kernel was upgraded by the update. If you want to eliminate the possibility of a forced reboot of potentially all nodes, either verify that the latest kernel is installed and running on Ceph nodes, or disable automatic node reboots as described in Book “Deployment Guide”, Chapter 7 “Customizing the Default Configuration”, Section 7.1.5 “Updates and Reboots during Stage 0”.
2.12.5 Downtime of Ceph Services #
Depending on the configuration, cluster nodes may be rebooted during the update as described in Section 2.12.4, “Cluster Node Reboots”. If there is a single point of failure for services such as Object Gateway, Samba Gateway, NFS Ganesha, or iSCSI, the client machines may be temporarily disconnected from services whose nodes are being rebooted.
2.12.6 Running the Update #
To update the software packages on all cluster nodes to the latest version, follow these steps:
Update the deepsea, salt-master, and salt-minion packages and restart relevant services on the Salt master:
root@master # salt -I 'roles:master' state.apply ceph.updates.master

Update the salt-minion package and restart the related services on all cluster nodes:

root@master # salt -I 'cluster:ceph' state.apply ceph.updates.salt

Update all other software packages on the cluster:

root@master # salt-run state.orch ceph.stage.0

Restart Ceph related services:

root@master # salt-run state.orch ceph.restart
2.13 Halting or Rebooting Cluster #
In some cases it may be necessary to halt or reboot the whole cluster. We recommend carefully checking for dependencies of running services. The following steps provide an outline for stopping and starting the cluster:
Tell the Ceph cluster not to mark OSDs as out:
cephadm@adm > ceph osd set noout

Stop daemons and nodes in the following order (see the systemd target sketch after this procedure):
Storage clients
Gateways, for example NFS Ganesha or Object Gateway
Metadata Server
Ceph OSD
Ceph Manager
Ceph Monitor
If required, perform maintenance tasks.
Start the nodes and servers in the reverse order of the shutdown process:
Ceph Monitor
Ceph Manager
Ceph OSD
Metadata Server
Gateways, for example NFS Ganesha or Object Gateway
Storage clients
Remove the noout flag:
cephadm@adm > ceph osd unset noout
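As referenced above, if you prefer to stop the Ceph daemons of a given type on a node by hand rather than shutting the whole node down, the systemd targets shipped with Ceph can be used for that; a minimal sketch of the stop part of the order above (run on the respective nodes):

root@minion > systemctl stop ceph-mds.target
root@minion > systemctl stop ceph-osd.target
root@minion > systemctl stop ceph-mgr.target
root@minion > systemctl stop ceph-mon.target

Starting the daemons again works the same way with systemctl start, in the reverse order.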
2.14 Adjusting ceph.conf with Custom Settings #
If you need to put custom settings into the ceph.conf
file, you can do so by modifying the configuration files in the
/srv/salt/ceph/configuration/files/ceph.conf.d
directory:
global.conf
mon.conf
mgr.conf
mds.conf
osd.conf
client.conf
rgw.conf
Note: Unique rgw.conf
The Object Gateway offers a lot of flexibility and is unique compared to the other
ceph.conf sections. All other Ceph components have
static headers such as [mon] or
[osd]. The Object Gateway has unique headers such as
[client.rgw.rgw1]. This means that the
rgw.conf file needs a header entry. For examples, see
/srv/salt/ceph/configuration/files/rgw.conf or
/srv/salt/ceph/configuration/files/rgw-ssl.conf

See Section 26.7, “Enabling HTTPS/SSL for Object Gateways” for more examples.
Important: Run stage 3
After you make custom changes to the above mentioned configuration files, run stages 3 and 4 to apply these changes to the cluster nodes:
root@master # salt-run state.orch ceph.stage.3
root@master # salt-run state.orch ceph.stage.4
These files are included from the
/srv/salt/ceph/configuration/files/ceph.conf.j2
template file, and correspond to the different sections that the Ceph
configuration file accepts. Putting a configuration snippet in the correct
file enables DeepSea to place it into the correct section. You do not need
to add any of the section headers.
Tip
To apply any configuration options only to specific instances of a daemon,
add a header such as [osd.1]. The following
configuration options will only be applied to the OSD daemon with the ID 1.
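For example, a hedged sketch that raises the log verbosity of only the OSD daemon with ID 1 (debug osd is a standard Ceph option; the value 10 is arbitrary), added to osd.conf:

[osd.1]
debug osd = 10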
2.14.1 Overriding the Defaults #
Later statements in a section overwrite earlier ones. Therefore it is
possible to override the default configuration as specified in the
/srv/salt/ceph/configuration/files/ceph.conf.j2
template. For example, to turn off cephx authentication, add the following
three lines to the
/srv/salt/ceph/configuration/files/ceph.conf.d/global.conf
file:
auth cluster required = none
auth service required = none
auth client required = none
When redefining the default values, Ceph related tools such as
rados may issue warnings that specific values from the
ceph.conf.j2 were redefined in
global.conf. These warnings are caused by one
parameter assigned twice in the resulting ceph.conf.
As a workaround for this specific case, follow these steps:
Change the current directory to /srv/salt/ceph/configuration/create:

root@master # cd /srv/salt/ceph/configuration/create

Copy default.sls to custom.sls:

root@master # cp default.sls custom.sls

Edit custom.sls and change ceph.conf.j2 to custom-ceph.conf.j2.

Change the current directory to /srv/salt/ceph/configuration/files:

root@master # cd /srv/salt/ceph/configuration/files

Copy ceph.conf.j2 to custom-ceph.conf.j2:

root@master # cp ceph.conf.j2 custom-ceph.conf.j2

Edit custom-ceph.conf.j2 and delete the following line:

{% include "ceph/configuration/files/rbd.conf" %}

Edit global.yml and add the following line:

configuration_create: custom

Refresh the pillar:

root@master # salt target saltutil.pillar_refresh

Run stage 3:

root@master # salt-run state.orch ceph.stage.3
Now you should have only one entry for each value definition. To re-create the configuration, run:
root@master # salt-run state.orch ceph.configuration.create
and then verify the contents of
/srv/salt/ceph/configuration/cache/ceph.conf.
2.14.2 Including Configuration Files #
If you need to apply a lot of custom configurations, use the following
include statements within the custom configuration files to make file
management easier. Following is an example of the
osd.conf file:
[osd.1]
{% include "ceph/configuration/files/ceph.conf.d/osd1.conf" ignore missing %}
[osd.2]
{% include "ceph/configuration/files/ceph.conf.d/osd2.conf" ignore missing %}
[osd.3]
{% include "ceph/configuration/files/ceph.conf.d/osd3.conf" ignore missing %}
[osd.4]
{% include "ceph/configuration/files/ceph.conf.d/osd4.conf" ignore missing %}
In the previous example, the osd1.conf,
osd2.conf, osd3.conf, and
osd4.conf files contain the configuration options
specific to the related OSD.
Tip: Runtime Configuration
Changes made to Ceph configuration files take effect after the related Ceph daemons restart. See Section 25.1, “Runtime Configuration” for more information on changing the Ceph runtime configuration.
2.15 Enabling AppArmor Profiles #
AppArmor is a security solution that confines programs by a specific profile. For more details, refer to https://documentation.suse.com/sles/15-SP1/html/SLES-all/part-apparmor.html.
DeepSea provides three states for AppArmor profiles: 'enforce', 'complain', and 'disable'. To activate a particular AppArmor state, run:
salt -I "deepsea_minions:*" state.apply ceph.apparmor.default-STATE
To put the AppArmor profiles in an 'enforce' state:
root@master # salt -I "deepsea_minions:*" state.apply ceph.apparmor.default-enforceTo put the AppArmor profiles in a 'complain' state:
root@master # salt -I "deepsea_minions:*" state.apply ceph.apparmor.default-complainTo disable the AppArmor profiles:
root@master # salt -I "deepsea_minions:*" state.apply ceph.apparmor.default-disableTip: Enabling the AppArmor Service
Each of these three calls verifies whether AppArmor is installed, installs it if not, and starts and enables the related systemd service. DeepSea warns you if AppArmor was installed and started/enabled by other means and is therefore running without DeepSea profiles.
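To double-check the resulting state on a minion, you can inspect the loaded AppArmor profiles directly; a sketch, assuming the aa-status utility is installed on that node:

root@minion > aa-status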
2.16 Deactivating Tuned Profiles #
By default, DeepSea deploys Ceph clusters with tuned profiles active on
Ceph Monitor, Ceph Manager, and Ceph OSD nodes. In some cases, you may need to permanently
deactivate tuned profiles. To do so, put the following lines in
/srv/pillar/ceph/stack/global.yml and re-run stage 3:
alternative_defaults:
  tuned_mgr_init: default-off
  tuned_mon_init: default-off
  tuned_osd_init: default-off
root@master # salt-run state.orch ceph.stage.3

2.17 Removing an Entire Ceph Cluster #
The ceph.purge runner removes the entire Ceph cluster.
This way you can clean the cluster environment when testing different
setups. After the ceph.purge completes, the Salt
cluster is reverted back to the state at the end of DeepSea stage 1. You
can then either change the policy.cfg (see
Book “Deployment Guide”, Chapter 5 “Deploying with DeepSea/Salt”, Section 5.5.1 “The policy.cfg File”), or proceed to DeepSea stage 2
with the same setup.
To prevent accidental deletion, the orchestration checks if the safety is disengaged. You can disengage the safety measures and remove the Ceph cluster by running:
root@master # salt-run disengage.safety
root@master # salt-run state.orch ceph.purge
Tip: Disabling Ceph Cluster Removal
If you want to prevent anyone from running the
ceph.purge runner, create a file named
disabled.sls in the
/srv/salt/ceph/purge directory and insert the
following line in the
/srv/pillar/ceph/stack/global.yml file:
purge_init: disabled
Important: Rescind Custom Roles
If you previously created custom roles for Ceph Dashboard (refer to
Section 6.6, “Adding Custom Roles” and
Section 14.2, “User Roles and Permissions” for detailed information), you need
to take manual steps to purge them before running the
ceph.purge runner. For example, if the custom role for
Object Gateway is named 'us-east-1', then follow these steps:
root@master # cd /srv/salt/ceph/rescind
root@master # rsync -a rgw/ us-east-1
root@master # sed -i 's!rgw!us-east-1!' us-east-1/*.sls