Changing the Configuration of SBD
- WHAT?
How to change a High Availability cluster's SBD configuration when the SBD service is already running.
- WHY?
You might need to change the cluster's SBD configuration for various reasons, such as increasing resilience, using custom settings instead of the defaults, or switching to a different STONITH mechanism.
- EFFORT
Each task in this article only takes a few minutes and does not require any downtime for cluster resources.
- REQUIREMENTS
An existing SUSE Linux Enterprise High Availability cluster
SBD already configured and running
A hardware watchdog device on all cluster nodes
Shared storage accessible from all nodes (if using disk-based SBD)
1 What is SBD? #
SBD (STONITH Block Device) provides a node fencing mechanism without using an external power-off device. The software component (the SBD daemon) works together with a watchdog device to ensure that misbehaving nodes are fenced. SBD can be used in disk-based mode with shared block storage, or in diskless mode using only the watchdog.
Disk-based SBD uses shared block storage to exchange fencing messages between the nodes. It can be used with one to three devices. One device is appropriate for simple cluster setups, but two or three devices are recommended for more complex setups or critical workloads.
Diskless SBD fences nodes by using only the watchdog, without relying on a shared storage device. A node is fenced if it loses quorum, if any monitored daemon is lost and cannot be recovered, or if Pacemaker determines that the node requires fencing.
1.1 Components #
- SBD daemon
The SBD daemon starts on each node before the rest of the cluster stack and stops in the reverse order. This ensures that cluster resources are never active without SBD supervision.
- SBD device (disk-based SBD)
A small logical unit (or a small partition on a logical unit) is formatted for use with SBD. A message layout is created on the device with slots for up to 255 nodes.
- Messages (disk-based SBD)
The message layout on the SBD device is used to send fencing messages to nodes. The SBD daemon on each node monitors the message slot and immediately complies with any requests. To avoid becoming disconnected from fencing messages, the SBD daemon also fences the node if it loses its connection to the SBD device.
- Watchdog
SBD needs a watchdog on each node to ensure that misbehaving nodes are really stopped. SBD “feeds” the watchdog by regularly writing a service pulse to it. If SBD stops feeding the watchdog, the hardware enforces a system restart. This protects against failures of the SBD process itself, such as becoming stuck on an I/O error.
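For example, to check whether a watchdog device node is already available on a node, you could list the watchdog device files (results vary depending on the hardware and loaded drivers); the sbd query-watchdog command used later in this article provides more detail:

   > ls -l /dev/watchdog*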
1.2 Limitations and recommendations #
- Disk-based SBD
The shared storage can be Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), or iSCSI.
The shared storage must not use host-based RAID, LVM, Cluster MD, or DRBD.
Using storage-based RAID and multipathing is recommended for increased reliability.
If a shared storage device has different /dev/sdX names on different nodes, SBD communication will fail. To avoid this, always use stable device names, such as /dev/disk/by-id/DEVICE_ID (see the example after this list).
An SBD device can be shared between different clusters, up to a limit of 255 nodes.
When using more than one SBD device, all devices must have the same configuration.
- Diskless SBD
Diskless SBD cannot handle a split-brain scenario for a two-node cluster. This configuration should only be used for clusters with more than two nodes, or in combination with QDevice to help handle split-brain scenarios.
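For example, to find a stable name for a shared device that currently appears as /dev/sdb on one node (the device name here is only a placeholder), you could list the by-id symlinks that point to it:

   > ls -l /dev/disk/by-id/ | grep sdb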
1.3 For more information #
For more information, see the sbd man page or run the crm sbd help command.
2 Changing the SBD timeout settings #
SBD relies on several timeout settings to manage node fencing. When you
configure SBD using the CRM Shell, these timeouts are automatically calculated and
adjusted. The automatic values are sufficient for most use cases, but if you need to
change them, you can use the crm sbd configure command.
In this procedure, the script checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually. This allows you to put the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.
SBD is already configured and running.
Perform this procedure on only one node in the cluster:
1. Log in either as the root user or as a user with sudo privileges.

2. Check the current timeout settings:

   > sudo crm sbd configure show

3. Change one or both of the following timeout values as needed:

   > sudo crm sbd configure \
       watchdog-timeout=INTEGER_IN_SECONDS \   (1)
       msgwait-timeout=INTEGER_IN_SECONDS      (2)

   If you change one timeout, the other timeout is automatically adjusted so that the msgwait-timeout is twice the watchdog-timeout. You only need to change both timeouts manually if you want the msgwait-timeout to be more than double the watchdog-timeout. If you try to make the msgwait-timeout less than double the watchdog-timeout, the command fails with a warning.

   (1) The watchdog-timeout defines how long the watchdog waits for a response from SBD before fencing the node. Diskless SBD reads this timeout from /etc/sysconfig/sbd, but disk-based SBD reads it from the device metadata, which takes precedence over the settings in /etc/sysconfig/sbd. For disk-based SBD on a multipath setup, this timeout must be longer than the max_polling_interval in /etc/multipath.conf, to allow enough time to detect a path failure and switch to the next path.

   (2) The msgwait-timeout is only used for disk-based SBD. After the msgwait-timeout is reached, SBD assumes that a message written to the node's slot on the SBD device was delivered successfully. The timeout must be long enough for the node to detect that it needs to self-fence. When you increase this timeout, the script also automatically adjusts the SBD_DELAY_START setting. This helps to avoid a situation where a node reboots too quickly and rejoins the cluster before the fencing action is considered complete, which can cause a split-brain scenario.

   You should not need to change the allocate-timeout or the loop-timeout.

   The script automatically adjusts any other related timeouts in the cluster and displays the new values. The script also checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually.
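   As an illustration (the values are examples only; choose timeouts that suit your storage and watchdog hardware), setting only the watchdog-timeout lets the command derive the msgwait-timeout:

   > sudo crm sbd configure watchdog-timeout=15
   > sudo crm sbd configure show

   After the first command, the show output should report a msgwait-timeout of 30 seconds (double the watchdog-timeout), along with any other timeouts the script adjusted.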
4. If you need to restart the cluster services manually, follow these steps to avoid resource downtime:

   a. Put the cluster into maintenance mode:

      > sudo crm maintenance on

      In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

   b. Restart the cluster services on all nodes:

      > sudo crm cluster restart --all

   c. Check the status of the cluster:

      > sudo crm status

      The nodes will have the status UNCLEAN (offline), but will soon change to Online.

   d. When the nodes are back online, put the cluster back into normal operation:

      > sudo crm maintenance off
5. When you change a timeout with crm sbd configure, the global STONITH timeouts are also adjusted automatically. The automatic values are sufficient for most use cases, but if you need to change them, you can use the crm configure property command:

   > sudo crm configure property stonith-timeout=INTEGER_IN_SECONDS
   > sudo crm configure property stonith-watchdog-timeout=INTEGER_IN_SECONDS

   This command does not automatically adjust any other timeouts, and these settings might be overwritten if you change the SBD configuration again.
6. Confirm that the timeout settings changed:

   > sudo crm sbd configure show
If you need to manually calculate any timeouts, you can use these basic formulas for most use cases:
- Disk-based SBD
  msgwait-timeout >= (watchdog-timeout * 2)
  stonith-timeout >= msgwait-timeout + 20%
- Diskless SBD
  stonith-watchdog-timeout >= (watchdog-timeout * 2)
  stonith-timeout >= stonith-watchdog-timeout + 20%
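As a worked example, with an illustrative watchdog-timeout of 15 seconds for disk-based SBD:

  msgwait-timeout >= 15 * 2 = 30 seconds
  stonith-timeout >= 30 + 20% = 36 seconds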
For more information, run the crm help TimeoutFormulas command.
3 Changing the SBD watchdog device #
Use this procedure to change the watchdog device that SBD uses. This can be useful if your system has multiple hardware watchdogs available or if you need to switch from the software watchdog to a hardware watchdog.
In this procedure, the script checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually. This allows you to put the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.
SBD is already configured and running.
All nodes have the new watchdog device available.
Perform this procedure on only one node in the cluster:
1. Log in either as the root user or as a user with sudo privileges.

2. Check the available watchdog devices:

   > sudo sbd query-watchdog

   The output shows which watchdog is being used by SBD and which other watchdogs are available.

3. Change the SBD watchdog device, specifying either the device name (for example, /dev/watchdog1) or the driver name (for example, iTCO_wdt):

   > sudo crm sbd configure watchdog-device=WATCHDOG

   The script updates the SBD configuration file with the new watchdog device and checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually.
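   For example, to switch SBD to the second watchdog device node (the device name here is illustrative; use one of the devices reported by sbd query-watchdog):

   > sudo crm sbd configure watchdog-device=/dev/watchdog1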
4. If you need to restart the cluster services manually, follow these steps to avoid resource downtime:

   a. Put the cluster into maintenance mode:

      > sudo crm maintenance on

      In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

   b. Restart the cluster services on all nodes:

      > sudo crm cluster restart --all

   c. Check the status of the cluster:

      > sudo crm status

      The nodes will have the status UNCLEAN (offline), but will soon change to Online.

   d. When the nodes are back online, put the cluster back into normal operation:

      > sudo crm maintenance off
5. Check the watchdog devices again:

   > sudo sbd query-watchdog

   The output should now show the new watchdog device being used by SBD.

6. Check that the correct watchdog is configured on all nodes:

   > sudo crm sbd status

   In the Watchdog info section, the watchdog device and driver should be the same on every node.
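If you also want to verify the setting in the configuration file on every node, one option (assuming your crmsh version provides the crm cluster run subcommand) is to check the SBD_WATCHDOG_DEV line of /etc/sysconfig/sbd on all nodes:

   > sudo crm cluster run "grep SBD_WATCHDOG_DEV /etc/sysconfig/sbd"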
4 Changing diskless SBD to disk-based SBD #
Use this procedure to change diskless SBD to disk-based SBD.
In this procedure, the setup script automatically puts the cluster into maintenance mode and restarts the cluster services. In maintenance mode, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running even during the cluster restart. However, be aware that the resources will not have cluster protection while in maintenance mode.
Make sure any device you want to use for SBD does not hold any important data. Configuring a device for use with SBD overwrites the existing data.
Diskless SBD is configured and running.
All nodes can access shared storage.
The path to the shared storage device is consistent across all nodes. Use stable device names such as
/dev/disk/by-id/DEVICE_ID.
Perform this procedure on only one node in the cluster:
1. Log in either as the root user or as a user with sudo privileges.

2. Check the status of SBD:

   > sudo crm sbd status
   # Type of SBD: Diskless SBD configured

3. Configure disk-based SBD. Use --force (or -F) to allow you to reconfigure SBD even when it is already running, and --sbd-device (or -s) to specify the shared storage device:

   > sudo crm --force cluster init sbd --sbd-device /dev/disk/by-id/DEVICE_ID

   You can use --sbd-device (or -s) multiple times to configure up to three SBD devices (see the example below).

   The script initializes SBD on the shared storage device, creates a stonith:fence_sbd cluster resource, and updates the SBD configuration file and timeout settings. The script also puts the cluster into maintenance mode, restarts the cluster services, then puts the cluster back into normal operation.
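   For example, a hypothetical invocation that configures two SBD devices at once (the device IDs are placeholders) might look like this:

   > sudo crm --force cluster init sbd \
       --sbd-device /dev/disk/by-id/FIRST_DEVICE_ID \
       --sbd-device /dev/disk/by-id/SECOND_DEVICE_ID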
4. Check the status of the cluster:

   > sudo crm status

   The nodes should be Online and the resources Started.

5. Check the status of SBD:

   > sudo crm sbd status
   # Type of SBD: Disk-based SBD configured
5 Changing disk-based SBD to diskless SBD #
Use this procedure to change disk-based SBD to diskless SBD.
Diskless SBD cannot handle a split-brain scenario for a two-node cluster. This configuration should only be used for clusters with more than two nodes, or in combination with QDevice to help handle split-brain scenarios.
In this procedure, the setup script automatically puts the cluster into maintenance mode and restarts the cluster services. In maintenance mode, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running even during the cluster restart. However, be aware that the resources will not have cluster protection while in maintenance mode.
Disk-based SBD is configured and running.
Perform this procedure on only one node in the cluster:
1. Log in either as the root user or as a user with sudo privileges.

2. Check the status of SBD:

   > sudo crm sbd status
   # Type of SBD: Disk-based SBD configured

3. Configure diskless SBD. Use --force (or -F) to allow you to reconfigure SBD even when it is already running, and --enable-sbd (or -S) to specify that no device is needed:

   > sudo crm --force cluster init sbd --enable-sbd

   The script stops and removes the stonith:fence_sbd cluster resource, then updates the SBD configuration file and timeout settings. The script also puts the cluster into maintenance mode, restarts the cluster services, then puts the cluster back into normal operation.
4. Check the status of the cluster:

   > sudo crm status

   The nodes should be Online and the resources Started.

5. Check the status of SBD:

   > sudo crm sbd status
   # Type of SBD: Diskless SBD configured
6 Adding another SBD device #
Use this procedure to add more SBD devices to a cluster that already has disk-based SBD configured. The cluster can have up to three SBD devices.
In this procedure, the script checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually. This allows you to put the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.
Disk-based SBD is already configured and running with at least one device.
An additional shared storage device is accessible from all cluster nodes.
Perform this procedure on only one node in the cluster:
1. Log in either as the root user or as a user with sudo privileges.

2. Check which device or devices are already configured for use with SBD:

   > sudo crm sbd configure show sysconfig

   The output shows one or more device IDs in the SBD_DEVICE line.
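   For illustration, on a cluster that currently uses a single device, the line might look similar to the following (the device ID is a placeholder; multiple devices are listed on the same line, separated by semicolons):

   SBD_DEVICE="/dev/disk/by-id/DEVICE_ID"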
3. Add a new device to the existing SBD configuration:

   > sudo crm sbd device add /dev/disk/by-id/DEVICE_ID

   The script initializes SBD on the new device, updates the SBD configuration file, and checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually.
4. If you need to restart the cluster services manually, follow these steps to avoid resource downtime:

   a. Put the cluster into maintenance mode:

      > sudo crm maintenance on

      In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

   b. Restart the cluster services on all nodes:

      > sudo crm cluster restart --all

   c. Check the status of the cluster:

      > sudo crm status

      The nodes will have the status UNCLEAN (offline), but will soon change to Online.

   d. When the nodes are back online, put the cluster back into normal operation:

      > sudo crm maintenance off
5. Check the SBD configuration again:

   > sudo crm sbd configure show sysconfig

   The output should now show more devices.

6. Check the status of SBD to make sure all the nodes can see the new device:

   > sudo crm sbd status
7 Replacing an existing SBD device with a new device #
If you need to replace an SBD device, you can use crm sbd device add
to add the new device and crm sbd device remove to remove the old device.
If the cluster has two SBD devices, you can run these commands in any order. However, if
the cluster has one or three SBD devices, you must run these commands in a specific order:
- One device: crm sbd device remove cannot remove the only device, so you must add the new device before you can remove the old device.
- Three devices: crm sbd device add cannot add a fourth device, so you must remove the old device before you can add the new device.
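For example, replacing the only device of a single-device cluster follows this order (the device IDs are placeholders, and the cluster services must be restarted after each command, as described in the procedure below):

   > sudo crm sbd device add /dev/disk/by-id/NEW_DEVICE_ID
   > sudo crm sbd device remove /dev/disk/by-id/OLD_DEVICE_ID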
In this procedure, the cluster services must be restarted twice: once after adding the new device and once after removing the old device. We recommend putting the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.
Disk-based SBD is already configured and running with at least one device.
An additional shared storage device is accessible from all cluster nodes.
Perform this procedure on only one node in the cluster:
1. Log in either as the root user or as a user with sudo privileges.

2. Put the cluster into maintenance mode:

   > sudo crm maintenance on

   In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

3. Check how many devices are already configured for use with SBD:

   > sudo crm sbd configure show sysconfig

   The output shows one or more device IDs in the SBD_DEVICE line. The number of devices determines the order of the next steps.

4. Add or remove a device, depending on the number of devices shown in Step 3:

   - One device:

     Add the new device to the existing SBD configuration:

     > sudo crm sbd device add /dev/disk/by-id/DEVICE_ID

     The script restarts the cluster services automatically.

   - Two or three devices:

     Remove the old device from the SBD configuration:

     > sudo crm sbd device remove /dev/disk/by-id/DEVICE_ID

     The script warns you to restart the cluster services manually.
5. If you need to restart the cluster services manually, run the following command:

   > sudo crm cluster restart --all

6. Check the status of the cluster:

   > sudo crm status

   The nodes will have the status UNCLEAN (offline), but will soon change to Online.

7. Add or remove a device, depending on the number of devices shown in Step 3:

   - One device:

     Remove the old device from the SBD configuration:

     > sudo crm sbd device remove /dev/disk/by-id/DEVICE_ID

     The script warns you to restart the cluster services manually.

   - Two or three devices:

     Add the new device to the existing SBD configuration:

     > sudo crm sbd device add /dev/disk/by-id/DEVICE_ID

     The script restarts the cluster services automatically.
8. If you need to restart the cluster services manually, run the following command:

   > sudo crm cluster restart --all

9. Check the status of the cluster:

   > sudo crm status

   The nodes will have the status UNCLEAN (offline), but will soon change to Online.

10. When the nodes are back online, put the cluster back into normal operation:

    > sudo crm maintenance off

11. Check the SBD configuration again:

    > sudo crm sbd configure show sysconfig

    The output should now show the new device in the SBD_DEVICE line.

12. Check the status of SBD to make sure the correct device is listed:

    > sudo crm sbd status
8 Removing an SBD device #
Use this procedure to remove an SBD device from a cluster with multiple SBD devices configured. You cannot use this method if there is only one SBD device configured.
In this procedure, the script checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually. This allows you to put the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.
Disk-based SBD is configured with more than one device.
Perform this procedure on only one node in the cluster:
1. Log in either as the root user or as a user with sudo privileges.

2. Check which devices are already configured for use with SBD:

   > sudo crm sbd configure show sysconfig

   The output shows multiple device IDs in the SBD_DEVICE line.

3. Remove a device from the SBD configuration:

   > sudo crm sbd device remove /dev/disk/by-id/DEVICE_ID

   The script removes the device, updates the SBD configuration file, and checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually.
4. If you need to restart the cluster services manually, follow these steps to avoid resource downtime:

   a. Put the cluster into maintenance mode:

      > sudo crm maintenance on

      In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

   b. Restart the cluster services on all nodes:

      > sudo crm cluster restart --all

   c. Check the status of the cluster:

      > sudo crm status

      The nodes will have the status UNCLEAN (offline), but will soon change to Online.

   d. When the nodes are back online, put the cluster back into normal operation:

      > sudo crm maintenance off
5. Check the SBD configuration again:

   > sudo crm sbd configure show sysconfig

   The output should now show fewer devices.

6. Check the status of SBD to make sure the device was removed from all the nodes:

   > sudo crm sbd status
9 Removing all SBD configuration #
Use this procedure to remove all SBD-related configuration from the cluster. You might need to do this if you want to switch from SBD to a physical STONITH device. Keep in mind that to be supported, all SUSE Linux Enterprise High Availability clusters must have either SBD or a physical STONITH device configured.
In this procedure, the script checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually. This allows you to put the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.
Perform this procedure on only one node in the cluster:
1. Log in either as the root user or as a user with sudo privileges.

2. Remove the SBD configuration from the cluster:

   > sudo crm sbd purge

   The script stops the SBD service on all nodes, moves the SBD configuration file to a backup file, and adjusts any SBD-related cluster properties. The script also checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually.
3. If you need to restart the cluster services manually, follow these steps to avoid resource downtime:

   a. Put the cluster into maintenance mode:

      > sudo crm maintenance on

      In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

   b. Restart the cluster services on all nodes:

      > sudo crm cluster restart --all

   c. Check the status of the cluster:

      > sudo crm status

      The nodes will have the status UNCLEAN (offline), but will soon change to Online.

   d. When the nodes are back online, put the cluster back into normal operation:

      > sudo crm maintenance off
4. Confirm that the SBD configuration is gone:

   > sudo crm sbd status
   ERROR: SBD configuration file /etc/sysconfig/sbd not found

5. Check the cluster configuration:

   > sudo crm configure show

   The output should show stonith-enabled=false and no other SBD-related properties.
10 Legal Notice #
Copyright © 2006–2025 SUSE LLC and contributors. All rights reserved.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”.
For SUSE trademarks, see https://www.suse.com/company/legal/. All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.
All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors, nor the translators shall be held liable for possible errors or the consequences thereof.