This is a draft document that was built and uploaded automatically. It may document beta software and be incomplete or even incorrect. Use this document at your own risk.

Changing the Configuration of SBD
SUSE Linux Enterprise High Availability 16.0

Publication Date: 24 Oct 2025
WHAT?

How to change a High Availability cluster's SBD configuration when the SBD service is already running.

WHY?

You might need to change the cluster's SBD configuration for various reasons, such as increasing resilience, using custom settings instead of the defaults, or switching to a different STONITH mechanism.

EFFORT

Each task in this article only takes a few minutes and does not require any downtime for cluster resources.

REQUIREMENTS
  • An existing SUSE Linux Enterprise High Availability cluster

  • SBD already configured and running

  • A hardware watchdog device on all cluster nodes

  • Shared storage accessible from all nodes (if using disk-based SBD)

1 What is SBD?

SBD (STONITH Block Device) provides a node fencing mechanism without using an external power-off device. The software component (the SBD daemon) works together with a watchdog device to ensure that misbehaving nodes are fenced. SBD can be used in disk-based mode with shared block storage, or in diskless mode using only the watchdog.

Disk-based SBD uses shared block storage to exchange fencing messages between the nodes. It can be used with one to three devices. One device is appropriate for simple cluster setups, but two or three devices are recommended for more complex setups or critical workloads.

Diskless SBD fences nodes by using only the watchdog, without relying on a shared storage device. A node is fenced if it loses quorum, if any monitored daemon is lost and cannot be recovered, or if Pacemaker determines that the node requires fencing.

1.1 Components

SBD daemon

The SBD daemon starts on each node before the rest of the cluster stack and stops in the reverse order. This ensures that cluster resources are never active without SBD supervision.

SBD device (disk-based SBD)

A small logical unit (or a small partition on a logical unit) is formatted for use with SBD. A message layout is created on the device with slots for up to 255 nodes.

Messages (disk-based SBD)

The message layout on the SBD device is used to send fencing messages to nodes. The SBD daemon on each node monitors its own message slot and immediately complies with any request it receives. To make sure it can never miss a fencing message, the SBD daemon also fences the node if it loses its connection to the SBD device.

Watchdog

SBD needs a watchdog on each node to ensure that misbehaving nodes are really stopped. SBD feeds the watchdog by regularly writing a service pulse to it. If SBD stops feeding the watchdog, the hardware enforces a system restart. This protects against failures of the SBD process itself, such as becoming stuck on an I/O error.

1.2 Limitations and recommendations

Disk-based SBD
  • The shared storage can be Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), or iSCSI.

  • The shared storage must not use host-based RAID, LVM, Cluster MD, or DRBD.

  • Using storage-based RAID and multipathing is recommended for increased reliability.

  • If a shared storage device has different /dev/sdX names on different nodes, SBD communication will fail. To avoid this, always use stable device names, such as /dev/disk/by-id/DEVICE_ID.

  • An SBD device can be shared between different clusters, up to a limit of 255 nodes.

  • When using more than one SBD device, all devices must have the same configuration.

Diskless SBD
  • Diskless SBD cannot handle a split-brain scenario for a two-node cluster. This configuration should only be used for clusters with more than two nodes, or in combination with QDevice to help handle split-brain scenarios.

1.3 For more information

For more information, see the sbd man page (man sbd) or run the crm sbd help command.
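The components described above correspond to settings in the SBD configuration file. As an illustration only (the values below are examples, not recommendations; the setup scripts generate and maintain the real file for you), a disk-based setup's /etc/sysconfig/sbd might contain lines like these:

```shell
# /etc/sysconfig/sbd -- illustrative fragment, not a complete file.
SBD_DEVICE="/dev/disk/by-id/DEVICE_ID"  # empty for diskless SBD; separate multiple devices with ";"
SBD_WATCHDOG_DEV="/dev/watchdog0"       # watchdog device that the SBD daemon feeds
SBD_WATCHDOG_TIMEOUT="15"               # used by diskless SBD; disk-based SBD reads it from the device metadata
SBD_DELAY_START="75"                    # start delay, adjusted automatically when msgwait-timeout changes
```

Later sections of this article change these settings indirectly through crm commands rather than by editing the file.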

2 Changing the SBD timeout settings

SBD relies on multiple different timeout settings to manage node fencing. When you configure SBD using the CRM Shell, these timeouts are automatically calculated and adjusted. The automatic values are sufficient for most use cases, but if you need to change them, you can use the crm sbd configure command.

Important
Important: Cluster restart required

In this procedure, the script checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually. This allows you to put the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.

Requirements
  • SBD is already configured and running.

Perform this procedure on only one node in the cluster:

  1. Log in either as the root user or as a user with sudo privileges.

  2. Check the current timeout settings:

    > sudo crm sbd configure show
  3. Change one or both of the following timeout values as needed:

    > sudo crm sbd configure \
      watchdog-timeout=INTEGER_IN_SECONDS \  (1)
      msgwait-timeout=INTEGER_IN_SECONDS  (2)

    If you change one timeout, the other timeout is automatically adjusted so that the msgwait-timeout is twice the watchdog-timeout. You only need to change both timeouts manually if you want the msgwait-timeout to be more than double the watchdog-timeout. If you try to make the msgwait-timeout less than double the watchdog-timeout, the command fails with a warning.

    (1)

    The watchdog-timeout defines how long the watchdog waits for a pulse from SBD before resetting the node. Diskless SBD reads this timeout from /etc/sysconfig/sbd, but disk-based SBD reads it from the device metadata, which takes precedence over the settings in /etc/sysconfig/sbd.

    For disk-based SBD on a multipath setup, this timeout must be longer than the max_polling_interval in /etc/multipath.conf, to allow enough time to detect a path failure and switch to the next path.

    (2)

    Only used for disk-based SBD. After the msgwait-timeout is reached, SBD assumes that a message written to the node's slot on the SBD device was delivered successfully. The timeout must be long enough for the node to detect that it needs to self-fence.

    When you increase this timeout, the script also automatically adjusts the SBD_DELAY_START setting. This helps to avoid a situation where a node reboots too quickly and rejoins the cluster before the fencing action is considered complete, which can cause a split-brain scenario.

    Tip

    You should not need to change the allocate-timeout or the loop-timeout.

    The script automatically adjusts any other related timeouts in the cluster and displays the new values. The script also checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually.

  4. If you need to restart the cluster services manually, follow these steps to avoid resource downtime:

    1. Put the cluster into maintenance mode:

      > sudo crm maintenance on

      In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

    2. Restart the cluster services on all nodes:

      > sudo crm cluster restart --all
    3. Check the status of the cluster:

      > sudo crm status

      The nodes will have the status UNCLEAN (offline), but will soon change to Online.

    4. When the nodes are back online, put the cluster back into normal operation:

      > sudo crm maintenance off
  5. When you change a timeout with crm sbd configure, the global STONITH timeouts are also adjusted automatically. The automatic values are sufficient for most use cases, but if you need to change them, you can use the crm configure property command:

    > sudo crm configure property stonith-timeout=INTEGER_IN_SECONDS  (1)
    > sudo crm configure property stonith-watchdog-timeout=INTEGER_IN_SECONDS  (2)

    This command does not automatically adjust any other timeouts, and these settings might be overwritten if you change the SBD configuration again.

    (1)

    The stonith-timeout defines how long to wait for the STONITH action to complete.

    (2)

    Only used for diskless SBD. The stonith-watchdog-timeout defines how long to wait for the watchdog to fence the node.

  6. Confirm that the timeout settings changed:

    > sudo crm sbd configure show

If you need to manually calculate any timeouts, you can use these basic formulas for most use cases:

Disk-based SBD

msgwait-timeout >= (watchdog-timeout * 2)

stonith-timeout >= msgwait-timeout + 20%

Diskless SBD

stonith-watchdog-timeout >= (watchdog-timeout * 2)

stonith-timeout >= stonith-watchdog-timeout + 20%

For more information, run the crm help TimeoutFormulas command.
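The formulas above can be sanity-checked with plain shell arithmetic. The following sketch uses an example watchdog-timeout of 15 seconds; this value is chosen purely for illustration, not as a recommendation:

```shell
# Worked example of the timeout formulas, assuming watchdog-timeout=15.
watchdog=15

# Disk-based SBD: msgwait-timeout must be at least twice the
# watchdog-timeout, and stonith-timeout adds a 20% margin on top.
msgwait=$((watchdog * 2))                       # 30
stonith_disk=$((msgwait + msgwait * 20 / 100))  # 36

# Diskless SBD: stonith-watchdog-timeout takes the place of msgwait-timeout.
stonith_watchdog=$((watchdog * 2))                                    # 30
stonith_diskless=$((stonith_watchdog + stonith_watchdog * 20 / 100))  # 36

echo "msgwait-timeout >= $msgwait"
echo "stonith-timeout (disk-based) >= $stonith_disk"
echo "stonith-watchdog-timeout >= $stonith_watchdog"
echo "stonith-timeout (diskless) >= $stonith_diskless"
```

In practice you rarely need these calculations, because crm sbd configure derives the dependent timeouts for you.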

3 Changing the SBD watchdog device

Use this procedure to change the watchdog device that SBD uses. This can be useful if your system has multiple hardware watchdogs available or if you need to switch from the software watchdog to a hardware watchdog.

Important
Important: Cluster restart required

In this procedure, the script checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually. This allows you to put the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.

Requirements
  • SBD is already configured and running.

  • All nodes have the new watchdog device available.

Perform this procedure on only one node in the cluster:

  1. Log in either as the root user or as a user with sudo privileges.

  2. Check the available watchdog devices:

    > sudo sbd query-watchdog

    The output shows which watchdog is being used by SBD and which other watchdogs are available.

  3. Change the SBD watchdog device, specifying either the device name (for example, /dev/watchdog1) or the driver name (for example, iTCO_wdt):

    > sudo crm sbd configure watchdog-device=WATCHDOG

    The script updates the SBD configuration file with the new watchdog device and checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually.

  4. If you need to restart the cluster services manually, follow these steps to avoid resource downtime:

    1. Put the cluster into maintenance mode:

      > sudo crm maintenance on

      In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

    2. Restart the cluster services on all nodes:

      > sudo crm cluster restart --all
    3. Check the status of the cluster:

      > sudo crm status

      The nodes will have the status UNCLEAN (offline), but will soon change to Online.

    4. When the nodes are back online, put the cluster back into normal operation:

      > sudo crm maintenance off
  5. Check the watchdog devices again:

    > sudo sbd query-watchdog

    The output should now show the new watchdog device being used by SBD.

  6. Check that the correct watchdog is configured on all nodes:

    > sudo crm sbd status

    In the section Watchdog info, the watchdog device and driver should be the same on every node.

4 Changing diskless SBD to disk-based SBD

Use this procedure to change diskless SBD to disk-based SBD.

Important
Important: Cluster restart required

In this procedure, the setup script automatically puts the cluster into maintenance mode and restarts the cluster services. In maintenance mode, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running even during the cluster restart. However, be aware that the resources will not have cluster protection while in maintenance mode.

Warning
Warning: Overwriting existing data

Make sure any device you want to use for SBD does not hold any important data. Configuring a device for use with SBD overwrites the existing data.

Requirements
  • Diskless SBD is configured and running.

  • All nodes can access shared storage.

  • The path to the shared storage device is consistent across all nodes. Use stable device names such as /dev/disk/by-id/DEVICE_ID.

Perform this procedure on only one node in the cluster:

  1. Log in either as the root user or as a user with sudo privileges.

  2. Check the status of SBD:

    > sudo crm sbd status
    # Type of SBD:
    Diskless SBD configured
  3. Configure disk-based SBD. Use --force (or -F) to allow you to reconfigure SBD even when it is already running, and --sbd-device (or -s) to specify the shared storage device:

    > sudo crm --force cluster init sbd --sbd-device /dev/disk/by-id/DEVICE_ID
    Tip

    You can use --sbd-device (or -s) multiple times to configure up to three SBD devices.

    The script initializes SBD on the shared storage device, creates a stonith:fence_sbd cluster resource, and updates the SBD configuration file and timeout settings. The script also puts the cluster into maintenance mode, restarts the cluster services, then puts the cluster back into normal operation.

  4. Check the status of the cluster:

    > sudo crm status

    The nodes should be Online and the resources Started.

  5. Check the status of SBD:

    > sudo crm sbd status
    # Type of SBD:
    Disk-based SBD configured

5 Changing disk-based SBD to diskless SBD

Use this procedure to change disk-based SBD to diskless SBD.

Diskless SBD cannot handle a split-brain scenario for a two-node cluster. This configuration should only be used for clusters with more than two nodes, or in combination with QDevice to help handle split-brain scenarios.

Important
Important: Cluster restart required

In this procedure, the setup script automatically puts the cluster into maintenance mode and restarts the cluster services. In maintenance mode, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running even during the cluster restart. However, be aware that the resources will not have cluster protection while in maintenance mode.

Requirements
  • Disk-based SBD is configured and running.

Perform this procedure on only one node in the cluster:

  1. Log in either as the root user or as a user with sudo privileges.

  2. Check the status of SBD:

    > sudo crm sbd status
    # Type of SBD:
    Disk-based SBD configured
  3. Configure diskless SBD. Use --force (or -F) to allow you to reconfigure SBD even when it is already running, and --enable-sbd (or -S) to specify that no device is needed:

    > sudo crm --force cluster init sbd --enable-sbd

    The script stops and removes the stonith:fence_sbd cluster resource, then updates the SBD configuration file and timeout settings. The script also puts the cluster into maintenance mode, restarts the cluster services, then puts the cluster back into normal operation.

  4. Check the status of the cluster:

    > sudo crm status

    The nodes should be Online and the resources Started.

  5. Check the status of SBD:

    > sudo crm sbd status
    # Type of SBD:
    Diskless SBD configured

6 Adding another SBD device

Use this procedure to add more SBD devices to a cluster that already has disk-based SBD configured. The cluster can have up to three SBD devices.

Important
Important: Cluster restart required

In this procedure, the script checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually. This allows you to put the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.

Requirements
  • Disk-based SBD is already configured and running with at least one device.

  • An additional shared storage device is accessible from all cluster nodes.

Perform this procedure on only one node in the cluster:

  1. Log in either as the root user or as a user with sudo privileges.

  2. Check which device or devices are already configured for use with SBD:

    > sudo crm sbd configure show sysconfig

    The output shows one or more device IDs in the SBD_DEVICE line.

  3. Add a new device to the existing SBD configuration:

    > sudo crm sbd device add /dev/disk/by-id/DEVICE_ID

    The script initializes SBD on the new device, updates the SBD configuration file, and checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually.

  4. If you need to restart the cluster services manually, follow these steps to avoid resource downtime:

    1. Put the cluster into maintenance mode:

      > sudo crm maintenance on

      In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

    2. Restart the cluster services on all nodes:

      > sudo crm cluster restart --all
    3. Check the status of the cluster:

      > sudo crm status

      The nodes will have the status UNCLEAN (offline), but will soon change to Online.

    4. When the nodes are back online, put the cluster back into normal operation:

      > sudo crm maintenance off
  5. Check the SBD configuration again:

    > sudo crm sbd configure show sysconfig

    The output should now show more devices.

  6. Check the status of SBD to make sure all the nodes can see the new device:

    > sudo crm sbd status

7 Replacing an existing SBD device with a new device

If you need to replace an SBD device, you can use crm sbd device add to add the new device and crm sbd device remove to remove the old device. If the cluster has two SBD devices, you can run these commands in any order. However, if the cluster has one or three SBD devices, you must run these commands in a specific order:

  • One device: crm sbd device remove cannot remove the only device, so you must add the new device before you can remove the old device.

  • Three devices: crm sbd device add cannot add a fourth device, so you must remove the old device before you can add the new device.
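The ordering rules above can be sketched as a small helper function. Note that sbd_replace_order is a hypothetical name used here for illustration; it is not a crmsh command:

```shell
# Hypothetical helper (not part of crmsh): given the number of SBD
# devices currently configured, print the order of operations for
# replacing one device with another.
sbd_replace_order() {
  count=$1
  case "$count" in
    1) echo "add new device first, then remove old device" ;;   # cannot remove the only device
    2) echo "add and remove in either order" ;;
    3) echo "remove old device first, then add new device" ;;   # cannot add a fourth device
    *) echo "invalid device count: $count" >&2; return 1 ;;
  esac
}

sbd_replace_order 1   # add new device first, then remove old device
sbd_replace_order 3   # remove old device first, then add new device
```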

Important
Important: Cluster restart required

In this procedure, the cluster services must be restarted twice: once after adding the new device and once after removing the old device. We recommend putting the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.

Requirements
  • Disk-based SBD is already configured and running with at least one device.

  • An additional shared storage device is accessible from all cluster nodes.

Perform this procedure on only one node in the cluster:

  1. Log in either as the root user or as a user with sudo privileges.

  2. Put the cluster into maintenance mode:

    > sudo crm maintenance on

    In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

  3. Check how many devices are already configured for use with SBD:

    > sudo crm sbd configure show sysconfig

    The output shows one or more device IDs in the SBD_DEVICE line. The number of devices determines the order of the next steps.

  4. Add or remove a device, depending on the number of devices shown in Step 3:

    One device:

    Add the new device to the existing SBD configuration:

    > sudo crm sbd device add /dev/disk/by-id/DEVICE_ID

    The script restarts the cluster services automatically.

    Two or three devices:

    Remove the old device from the SBD configuration:

    > sudo crm sbd device remove /dev/disk/by-id/DEVICE_ID

    The script warns you to restart the cluster services manually.

  5. If you need to restart the cluster services manually, run the following command:

    > sudo crm cluster restart --all

    Check the status of the cluster:

    > sudo crm status

    The nodes will have the status UNCLEAN (offline), but will soon change to Online.

  6. Add or remove a device, depending on the number of devices shown in Step 3:

    One device:

    Remove the old device from the SBD configuration:

    > sudo crm sbd device remove /dev/disk/by-id/DEVICE_ID

    The script warns you to restart the cluster services manually.

    Two or three devices:

    Add the new device to the existing SBD configuration:

    > sudo crm sbd device add /dev/disk/by-id/DEVICE_ID

    The script restarts the cluster services automatically.

  7. If you need to restart the cluster services manually, run the following command:

    > sudo crm cluster restart --all

    Check the status of the cluster:

    > sudo crm status

    The nodes will have the status UNCLEAN (offline), but will soon change to Online.

  8. When the nodes are back online, put the cluster back into normal operation:

    > sudo crm maintenance off
  9. Check the SBD configuration again:

    > sudo crm sbd configure show sysconfig

    The output should now show the new device in the SBD_DEVICE line.

  10. Check the status of SBD to make sure the correct device is listed:

    > sudo crm sbd status

8 Removing an SBD device

Use this procedure to remove an SBD device from a cluster with multiple SBD devices configured. You cannot use this method if there is only one SBD device configured.

Important
Important: Cluster restart required

In this procedure, the script checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually. This allows you to put the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.

Requirements
  • Disk-based SBD is configured with more than one device.

Perform this procedure on only one node in the cluster:

  1. Log in either as the root user or as a user with sudo privileges.

  2. Check which devices are already configured for use with SBD:

    > sudo crm sbd configure show sysconfig

    The output shows multiple device IDs in the SBD_DEVICE line.

  3. Remove a device from the SBD configuration:

    > sudo crm sbd device remove /dev/disk/by-id/DEVICE_ID

    The script removes the device, updates the SBD configuration file, and checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually.

  4. If you need to restart the cluster services manually, follow these steps to avoid resource downtime:

    1. Put the cluster into maintenance mode:

      > sudo crm maintenance on

      In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

    2. Restart the cluster services on all nodes:

      > sudo crm cluster restart --all
    3. Check the status of the cluster:

      > sudo crm status

      The nodes will have the status UNCLEAN (offline), but will soon change to Online.

    4. When the nodes are back online, put the cluster back into normal operation:

      > sudo crm maintenance off
  5. Check the SBD configuration again:

    > sudo crm sbd configure show sysconfig

    The output should now show fewer devices.

  6. Check the status of SBD to make sure the device was removed from all the nodes:

    > sudo crm sbd status

9 Removing all SBD configuration

Use this procedure to remove all SBD-related configuration from the cluster. You might need to do this if you want to switch from SBD to a physical STONITH device. Keep in mind that to be supported, all SUSE Linux Enterprise High Availability clusters must have either SBD or a physical STONITH device configured.

Important
Important: Cluster restart required

In this procedure, the script checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually. This allows you to put the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.

Perform this procedure on only one node in the cluster:

  1. Log in either as the root user or as a user with sudo privileges.

  2. Remove the SBD configuration from the cluster:

    > sudo crm sbd purge

    The script stops the SBD service on all nodes, moves the SBD configuration file to a backup file, and adjusts any SBD-related cluster properties. The script also checks whether it is safe to restart the cluster services automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually.

  3. If you need to restart the cluster services manually, follow these steps to avoid resource downtime:

    1. Put the cluster into maintenance mode:

      > sudo crm maintenance on

      In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

    2. Restart the cluster services on all nodes:

      > sudo crm cluster restart --all
    3. Check the status of the cluster:

      > sudo crm status

      The nodes will have the status UNCLEAN (offline), but will soon change to Online.

    4. When the nodes are back online, put the cluster back into normal operation:

      > sudo crm maintenance off
  4. Confirm that the SBD configuration is gone:

    > sudo crm sbd status
    ERROR: SBD configuration file /etc/sysconfig/sbd not found
  5. Check the cluster configuration:

    > sudo crm configure show

    The output should show stonith-enabled=false and no other SBD-related properties.

HA glossary

active/active, active/passive

How resources run on the nodes. Active/passive means that resources only run on the active node, but can move to the passive node if the active node fails. Active/active means that all nodes are active at once, and resources can run on (and move to) any node in the cluster.

arbitrator

An arbitrator is a machine running outside the cluster to provide an additional instance for cluster calculations. For example, QNetd provides a vote to help QDevice participate in quorum decisions.

CIB (cluster information base)

An XML representation of the whole cluster configuration and status (cluster options, nodes, resources, constraints and the relationships to each other). The CIB manager (pacemaker-based) keeps the CIB synchronized across the cluster and handles requests to modify it.

clone

A clone is an identical copy of an existing node, used to make deploying multiple nodes simpler.

In the context of a cluster resource, a clone is a resource that can be active on multiple nodes. Any resource can be cloned if its resource agent supports it.

cluster

A high-availability cluster is a group of servers (physical or virtual) designed primarily to secure the highest possible availability of data, applications and services. Not to be confused with a high-performance cluster, which shares the application load to achieve faster results.

Cluster logical volume manager (Cluster LVM)

The term Cluster LVM indicates that LVM is being used in a cluster environment. This requires configuration adjustments to protect the LVM metadata on shared storage.

cluster partition

A cluster partition occurs when communication fails between one or more nodes and the rest of the cluster. The nodes are split into partitions but are still active. They can only communicate with nodes in the same partition and are unaware of the separated nodes. This is known as a split-brain scenario.

cluster stack

The ensemble of software technologies and components that make up a cluster.

colocation constraint

A type of resource constraint that specifies which resources can or cannot run together on a node.

concurrency violation

A resource that should be running on only one node in the cluster is running on several nodes.

Corosync

Corosync provides reliable messaging, membership and quorum information about the cluster. This is handled by the Corosync Cluster Engine, a group communication system.

CRM (cluster resource manager)

The management entity responsible for coordinating all non-local interactions in a High Availability cluster. SUSE Linux Enterprise High Availability uses Pacemaker as the CRM. It interacts with several components: local executors on its own node and on the other nodes, non-local CRMs, administrative commands, the fencing functionality, and the membership layer.

crmsh (CRM Shell)

The command-line utility crmsh manages the cluster, nodes and resources.

Csync2

A synchronization tool for replicating configuration files across all nodes in the cluster.

DC (designated coordinator)

The pacemaker-controld daemon is the cluster controller, which coordinates all actions. This daemon has an instance on each cluster node, but only one instance is elected to act as the DC. The DC is elected when the cluster services start, or if the current DC fails or leaves the cluster. The DC decides whether a cluster-wide change must be performed, such as fencing a node or moving resources.

disaster

An unexpected interruption of critical infrastructure caused by nature, humans, hardware failure, or software bugs.

disaster recovery

The process by which a function is restored to the normal, steady state after a disaster.

Disaster Recovery Plan

A strategy to recover from a disaster with the minimum impact on IT infrastructure.

DLM (Distributed Lock Manager)

DLM coordinates accesses to shared resources in a cluster, for example, managing file locking in clustered file systems to increase performance and availability.

DRBD

DRBD® is a block device designed for building High Availability clusters. It replicates data on a primary device to secondary devices in a way that ensures all copies of the data remain identical.

existing cluster

The term existing cluster is used to refer to any cluster that consists of at least one node. An existing cluster has a basic Corosync configuration that defines the communication channels, but does not necessarily have resource configuration yet.

failover

Occurs when a resource or node fails on one machine and the affected resources move to another node.

failover domain

A named subset of cluster nodes that are eligible to run a resource if a node fails.

fencing

Prevents access to a shared resource by isolated or failing cluster members. There are two classes of fencing: resource-level fencing and node-level fencing. Resource-level fencing ensures exclusive access to a resource. Node-level fencing prevents a failed node from accessing shared resources and prevents resources from running on a node with an uncertain status. This is usually done by resetting or powering off the node.

GFS2

Global File System 2 (GFS2) is a shared disk file system for Linux computer clusters. GFS2 allows all nodes to have direct concurrent access to the same shared block storage. GFS2 has no disconnected operating mode, and no client or server roles. All nodes in a GFS2 cluster function as peers. GFS2 supports up to 32 cluster nodes. Using GFS2 in a cluster requires hardware to allow access to the shared storage, and a lock manager to control access to the storage.

group

Resource groups contain multiple resources that need to be located together, started sequentially and stopped in the reverse order.
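
For example, a group can be created with the crm shell. The resource and group names below are illustrative and assume the three primitives already exist:

```shell
# Group the resources so they start in this order (IP address, file system,
# web server) and stop in the reverse order. All names are examples.
crm configure group g-web ip-web fs-web srv-web
```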

Hawk (HA Web Konsole)

A user-friendly Web-based interface for monitoring and administering a High Availability cluster from Linux or non-Linux machines. Hawk can be accessed from any machine that can connect to the cluster nodes, using a graphical Web browser.

heuristics

QDevice supports a set of commands (heuristics) that run locally when cluster services start, when cluster membership changes, when a connection to the QNetd server succeeds, or optionally at regular intervals. The result is used in calculations to determine which partition should have quorum.

knet (kronosnet)

A network abstraction layer supporting redundancy, security, fault tolerance, and fast fail-over of network links. In SUSE Linux Enterprise High Availability 16, knet is the default transport protocol for the Corosync communication channels.

local cluster

A single cluster in one location (for example, all nodes are located in one data center). Network latency is minimal. Storage is typically accessed synchronously by all nodes.

local executor

The local executor is located between Pacemaker and the resources on each node. Through the pacemaker-execd daemon, Pacemaker can start, stop and monitor resources.

location

In the context of a whole cluster, location can refer to the physical location of nodes (for example, all nodes might be located in the same data center). In the context of a location constraint, location refers to the nodes on which a resource can or cannot run.

location constraint

A type of resource constraint that defines the nodes on which a resource can or cannot run.
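
A minimal sketch with the crm shell, using illustrative resource and node names:

```shell
# Prefer running the example resource srv-web on node alice with a score
# of 100. A score of -inf would forbid the resource from running there.
crm configure location loc-web-on-alice srv-web 100: alice
```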

meta attributes (resource options)

Parameters that tell the CRM (cluster resource manager) how to treat a specific resource. For example, you might define a resource's priority or target role.
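
As a sketch, meta attributes can be set when defining a primitive with the crm shell. The resource name and IP address below are examples:

```shell
# Create an example IP address resource that stays stopped for now
# (target-role) and has a priority of 10.
crm configure primitive ip-demo ocf:heartbeat:IPaddr2 \
  params ip=192.168.100.50 \
  meta target-role=Stopped priority=10
```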

metro cluster

A single cluster that can stretch over multiple buildings or data centers, with all sites connected by Fibre Channel. Network latency is usually low. Storage is frequently replicated using mirroring or synchronous replication.

network device bonding

Network device bonding combines two or more network interfaces into a single bonded device to increase bandwidth and/or provide redundancy. When using Corosync, the bonded device is not managed by the cluster software. Therefore, the bonded device must be configured on every cluster node that might need to access it.

node

Any server (physical or virtual) that is a member of a cluster.

order constraint

A type of resource constraint that defines the sequence of actions.
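
For example, with the crm shell (resource names are illustrative):

```shell
# Start fs-demo before srv-demo, and stop them in the reverse order.
# "Mandatory" makes the ordering a hard requirement.
crm configure order ord-fs-before-srv Mandatory: fs-demo srv-demo
```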

Pacemaker

Pacemaker is the CRM (cluster resource manager) in SUSE Linux Enterprise High Availability, or the brain that reacts to events occurring in the cluster. Events might be nodes that join or leave the cluster, failure of resources, or scheduled activities such as maintenance, for example. The pacemakerd daemon launches and monitors all other related daemons.

parameters (instance attributes)

Parameters determine which instance of a service the resource controls.

primitive

A primitive resource is the most basic type of cluster resource.

promotable clone

Promotable clones are a special type of clone resource that can be promoted. Active instances of these resources are divided into two states: promoted and unpromoted (also known as active and passive or primary and secondary).

QDevice

QDevice and QNetd participate in quorum decisions. The corosync-qdevice daemon runs on each cluster node and communicates with QNetd to provide a configurable number of votes, allowing a cluster to sustain more node failures than the standard quorum rules allow.

QNetd

QNetd is an arbitrator that runs outside the cluster. The corosync-qnetd daemon provides a vote to the corosync-qdevice daemon on each node to help it participate in quorum decisions.

quorum

A cluster partition is defined to have quorum (be quorate) if it has the majority of nodes (or votes). Quorum distinguishes exactly one partition. This is part of the algorithm to prevent several disconnected partitions or nodes (split brain) from proceeding and causing data and service corruption. Quorum is a prerequisite for fencing, which then ensures that quorum is unique.

RA (resource agent)

A script acting as a proxy to manage a resource (for example, to start, stop or monitor a resource). SUSE Linux Enterprise High Availability supports different kinds of resource agents.

ReaR (Relax and Recover)

An administrator tool set for creating disaster recovery images.

resource

Any type of service or application that is known to Pacemaker, for example, an IP address, a file system, or a database. The term resource is also used for DRBD, where it names a set of block devices that use a common connection for replication.

resource constraint

Resource constraints specify which cluster nodes resources can run on, the order in which resources are started and stopped, and which other resources a specific resource depends on.

See also colocation constraint, location constraint and order constraint.

resource set

As an alternative format for defining location, colocation or order constraints, you can use resource sets, where primitives are grouped together in one set. When creating a constraint, you can specify multiple resources for the constraint to apply to.
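
In the crm shell, listing more than two resources in a single constraint creates a resource set. The resource names below are examples:

```shell
# d1, d2 and d3 form one ordered resource set and are started
# strictly in this sequence.
crm configure order ord-chain Mandatory: d1 d2 d3
```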

resource template

To help create many resources with similar configurations, you can define a resource template. After being defined, it can be referenced in primitives or in certain types of constraints. If a template is referenced in a primitive, the primitive inherits all operations, instance attributes (parameters), meta attributes and utilization attributes defined in the template.
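
A minimal sketch with the crm shell, assuming an Apache setup; the template and primitive names are illustrative:

```shell
# Define a template carrying the shared configuration, then reference it
# from a primitive. The primitive inherits the template's parameters,
# operations and meta attributes.
crm configure rsc_template web-template ocf:heartbeat:apache \
  params configfile=/etc/apache2/httpd.conf \
  op monitor interval=30s
crm configure primitive web1 @web-template
```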

SBD (STONITH Block Device)

SBD provides a node fencing mechanism through the exchange of messages via shared block storage. Alternatively, it can be used in diskless mode. In either case, it needs a hardware or software watchdog on each node to ensure that misbehaving nodes are really stopped.

scheduler

The scheduler is implemented as pacemaker-schedulerd. When a cluster transition is needed, pacemaker-schedulerd calculates the expected next state of the cluster and determines what actions need to be scheduled to achieve the next state.

split brain

A scenario in which the cluster nodes are divided into two or more groups that do not know about each other (either through a software or hardware failure). STONITH prevents a split-brain scenario from badly affecting the entire cluster. Also known as a partitioned cluster scenario.

The term split brain is also used in DRBD but means that the nodes contain different data.

SPOF (single point of failure)

Any component of a cluster that, if it fails, triggers the failure of the entire cluster.

STONITH

An acronym for shoot the other node in the head. It refers to the fencing mechanism that shuts down a misbehaving node to prevent it from causing trouble in a cluster. In a Pacemaker cluster, STONITH is managed by the fencing subsystem pacemaker-fenced.

switchover

The planned moving of resources to other nodes in a cluster. See also failover.

utilization

Tells the CRM what capacity a certain resource requires from a node.

watchdog

SBD (STONITH Block Device) needs a watchdog on each node to ensure that misbehaving nodes are really stopped. SBD feeds the watchdog by regularly writing a service pulse to it. If SBD stops feeding the watchdog, the hardware enforces a system restart. This protects against failures of the SBD process itself, such as becoming stuck on an I/O error.
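
The watchdog device that SBD feeds is set in /etc/sysconfig/sbd. The values below are examples only:

```shell
# Excerpt from /etc/sysconfig/sbd (example values):
SBD_DEVICE="/dev/disk/by-id/scsi-EXAMPLE-sbd"  # omit for diskless SBD
SBD_WATCHDOG_DEV="/dev/watchdog"               # watchdog device SBD feeds
SBD_WATCHDOG_TIMEOUT="5"                       # seconds without a pulse before reset
```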