Configuring Node Fencing in a High Availability Cluster
SUSE Linux Enterprise High Availability 16.0

Publication Date: 31 Oct 2025
WHAT?

How to configure a SUSE Linux Enterprise High Availability cluster to use a physical node fencing device.

WHY?

To be supported, all SUSE Linux Enterprise High Availability clusters must have node fencing configured.

EFFORT

Adding physical fencing devices takes approximately 5-10 minutes, depending on the complexity of the cluster.

GOAL

Protect the cluster from data corruption by fencing failed nodes.

REQUIREMENTS
  • An existing SUSE Linux Enterprise High Availability cluster

  • A physical fencing device, such as a power switch or network switch

To use SBD as the node fencing mechanism instead of a physical device, see Configuring Disk-Based SBD in an Existing High Availability Cluster.

1 What is node fencing?

In a split-brain scenario, cluster nodes are divided into two or more groups (or partitions) that do not know about each other. This might be because of a hardware or software failure, or a failed network connection, for example. A split-brain scenario can be resolved by fencing (resetting or powering off) one or more of the nodes. Node fencing prevents a failed node from accessing shared resources and prevents cluster resources from running on a node with an uncertain status. This helps protect the cluster from data corruption.

To be supported, all SUSE Linux Enterprise High Availability clusters must have at least one node fencing device configured. For critical workloads, we recommend using two or three fencing devices. A fencing device can be either a physical device (a power switch) or a software mechanism (SBD in combination with a watchdog).

1.1 Components

pacemaker-fenced

The pacemaker-fenced daemon runs on every node in the High Availability cluster. It accepts fencing requests from pacemaker-controld. It can also check the status of the fencing device.

Fence agent

Each type of fencing device can be controlled by a specific fence agent, a stonith-class resource agent that acts as an interface between the cluster and the fencing device. Starting or stopping a fencing resource means registering or deregistering the fencing device with the pacemaker-fenced daemon and does not perform any operation on the device itself. Monitoring a fencing resource means logging in to the device to verify that it works.

Fencing device

The fencing device is the actual physical device that resets or powers off a node when requested by the cluster via the fence agent. The device you use depends on your budget and hardware.

1.2 Fencing devices

Physical devices
  • Power Distribution Units (PDU) are devices with multiple power outlets that can provide remote load monitoring and power recycling.

  • Uninterruptible Power Supplies (UPS) provide emergency power to connected equipment in the event of a power failure.

  • Blade power control devices can be used for fencing if the cluster nodes are running on a set of blades. This device must be capable of managing single-blade computers.

  • Lights-out devices are network-connected devices that allow remote management and monitoring of servers.

Software mechanisms
  • Disk-based SBD fences nodes by exchanging messages via shared block storage. It works together with a watchdog on each node to ensure that misbehaving nodes are really stopped.

  • Diskless SBD fences nodes by using only the watchdog, without a shared storage device. Unlike other node fencing mechanisms, diskless SBD does not need a fence agent.

  • The fence_kdump agent checks if a node is performing a kernel dump (kdump). If a kdump is in progress, the cluster acts as if the node was fenced, because the node will reboot after the kdump is complete. If a kdump is not in progress, the next fencing device fences the node. This fence agent must be used together with a physical fencing device. It cannot be used with SBD.

1.3 For more information

For more information, see https://clusterlabs.org/projects/pacemaker/doc/3.0/Pacemaker_Explained/html/fencing.html.

For a full list of available fence agents, run the crm ra list stonith command.

For details about a specific fence agent, run the crm ra info stonith:fence_AGENT command.
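
For example, to inspect one of the commonly packaged agents (fence_ipmilan is used here only as an illustration; the agents available on your system depend on which fence-agents packages are installed):

> sudo crm ra info stonith:fence_ipmilan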

2 Creating fencing resources for a physical device

Each type of fencing device can be controlled by a specific fence agent, a stonith-class resource agent that acts as an interface between the cluster and the fencing device. Starting or stopping a fencing resource means registering or deregistering the fencing device with the pacemaker-fenced daemon and does not perform any operation on the device itself. Monitoring a fencing resource means logging in to the device to verify that it works.

When a node needs to be fenced, the fencing action is usually performed by a different node in the cluster. Therefore, in this procedure you will create multiple fencing resources, each targeting a specific node. Each fencing resource can run on any node in the cluster except for the node it targets.

Requirements
  • An existing High Availability cluster is already running.

  • All cluster nodes can access a physical fencing device.

Perform this procedure on only one node in the cluster:

  1. Log in either as the root user or as a user with sudo privileges.

  2. Show the list of available fence agents:

    > sudo crm ra list stonith
  3. Show the list of required and optional parameters for your device, and make a note of the parameters you need for your specific setup:

    > sudo crm ra info stonith:fence_AGENT
  4. Start the crm interactive shell:

    > sudo crm configure

    This mode lets you make multiple configuration changes before committing all the changes at once.

  5. Create a fencing resource for every node in the cluster. Specify your device type, the parameters for that device type, and a monitor operation:

    crm(live)configure# primitive RESOURCE-NAME stonith:fence_AGENT \
      params KEY=VALUE KEY=VALUE KEY=VALUE [...] \
      op monitor interval=INTEGER timeout=INTEGER
    Example 1: Fencing resources for two nodes with an IBM RSA device

    This example shows a basic resource configuration for an IBM RSA lights-out device on two nodes, alice and bob:

    crm(live)configure# primitive fence-rsa-alice stonith:fence_rsa \
      params pcmk_host_list=alice \
      ip=192.168.1.101 username=root password=secret \
      op monitor interval=30m timeout=120s
    crm(live)configure# primitive fence-rsa-bob stonith:fence_rsa \
      params pcmk_host_list=bob \
      ip=192.168.1.102 username=root password=secret \
      op monitor interval=30m timeout=120s

    pcmk_host_list

    Use pcmk_host_list to specify the node for this resource to target. In this example, the resource fence-rsa-alice fences the node alice.

    ip, username, password

    Provide login details for the fencing device. The required parameters depend on the specific device.

    If you use the password parameter, the password is obscured in the output of crm configure show, but is stored as plain text in the CIB and the command history. Alternatively, you can use a different parameter, such as identity_file (see the sketch after this example).

    op monitor

    Include a monitor operation to check the status of the device. Ideally, fencing devices are not needed very often and are unlikely to fail during a fencing operation. Therefore, a monitoring interval of 30 minutes or more should be sufficient for most devices.
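
    The following variant is a minimal sketch, not part of the example above: it assumes your fence agent accepts an identity_file parameter and that key-based login to the device is already configured (the key path shown is hypothetical). Run crm ra info stonith:fence_AGENT to confirm which authentication parameters your agent supports.

    crm(live)configure# primitive fence-rsa-alice stonith:fence_rsa \
      params pcmk_host_list=alice \
      ip=192.168.1.101 username=root identity_file=/root/.ssh/id_rsa \
      op monitor interval=30m timeout=120s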

  6. Add location constraints so that each fencing resource cannot run on the node it targets:

    crm(live)configure# location CONSTRAINT-NAME RESOURCE-NAME -inf: NODE-NAME
    Example 2: Location constraints for IBM RSA resources on two nodes

    This example shows location constraints for two nodes, alice and bob:

    crm(live)configure# location loc-rsa-alice fence-rsa-alice -inf: alice
    crm(live)configure# location loc-rsa-bob fence-rsa-bob -inf: bob

    The resource fence-rsa-alice must not run on alice, and the resource fence-rsa-bob must not run on bob. In a two-node cluster, this means fence-rsa-alice always runs on bob. In a cluster with more nodes, this means fence-rsa-alice can run on any node except alice.
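
    In a cluster with a hypothetical third node charlie (not part of the example above), the same pattern repeats: create a fencing resource fence-rsa-charlie in step 5, then add a matching constraint:

    crm(live)configure# location loc-rsa-charlie fence-rsa-charlie -inf: charlie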

  7. Enable node fencing for the whole cluster:

    crm(live)configure# property stonith-enabled=true
  8. Add a fencing timeout to define how long to wait for the fencing action to finish:

    crm(live)configure# property stonith-timeout=60

    The default is 60 seconds, but you might need to change it for your specific setup and infrastructure.

  9. Review the updated cluster configuration:

    crm(live)configure# show
  10. Commit the changes:

    crm(live)configure# commit
  11. Exit the crm interactive shell:

    crm(live)configure# quit
  12. Check the status of the cluster to make sure the fencing resources can start:

    > sudo crm status

If the fencing resources have the status Stopped, the nodes might have failed to connect to the fencing device. You can check the connection with the command line tool for your specific fence agent. For more information, run the man fence_AGENT command.

Example 3: Testing a node's connection to an IBM RSA device

This command uses the example details from the previous procedure to check the status of node bob. Adjust this command for your specific configuration and device.

alice> sudo fence_rsa -a 192.168.1.102 -l root -p secret -n bob -o status

If the connection is successful, the output shows Status: ON. If the connection is not successful, the output shows an error message that explains the issue.

3 Preventing node fencing during a kernel dump

Use this procedure if the nodes have kdump configured. If not, you can skip this procedure.

The fence_kdump agent checks if a node is performing a kernel dump (kdump). If a kdump is in progress, the cluster acts as if the node was fenced, because the node will reboot after the kdump is complete. If a kdump is not in progress, the next fencing device fences the node. This fence agent must be used together with a physical fencing device. It cannot be used with SBD.

Requirements
  • The cluster uses a physical node fencing device.

  • Cluster resources for the fencing device are already configured.

  • kdump is installed and configured on all nodes.

Perform this procedure on only one node in the cluster:

  1. Log in either as the root user or as a user with sudo privileges.

  2. Create a fence_kdump resource that can check all the nodes in the cluster. For example:

    > sudo crm configure primitive RESOURCE-NAME stonith:fence_kdump \
      params pcmk_host_list="NODE-LIST" timeout=INTEGER

    The resource is registered with the pacemaker-fenced daemon on all the specified nodes. You do not need to clone this resource.

    For more information, run the crm ra info stonith:fence_kdump command.

    Example 4: fence_kdump resource for two nodes

    This example shows a basic resource configuration for two nodes, alice and bob:

    > sudo crm configure primitive check-kdump stonith:fence_kdump \
      params pcmk_host_list="alice,bob" timeout=60

    pcmk_host_list

    A comma-separated list of the cluster nodes. When a node needs to be fenced, this resource listens for a message from fence_kdump_send on that node. If a message is received, the node is considered fenced. If no message is received, the physical fencing device must fence the node.

    timeout

    How long to wait for a message from a node, in seconds. The default is 60 seconds.

  3. Check that the fence_kdump resource appears on all nodes:

    > sudo crm cluster run "sudo stonith_admin -L"

    You should see output similar to this:

    INFO: [alice]
    check-kdump
    fence-rsa-bob
    2 fence devices found
    
    INFO: [bob]
    check-kdump
    fence-rsa-alice
    2 fence devices found
  4. Specify the order of the fencing devices. This tells the cluster to check if a kdump is in progress before deciding whether to call the physical fencing device. Include all the nodes in one command:

    > sudo crm configure fencing_topology \
      NODE-NAME: KDUMP-RESOURCE FENCING-RESOURCE \
      NODE-NAME: KDUMP-RESOURCE FENCING-RESOURCE \
      [...]

    For more information, run the crm configure help fencing_topology command.

    Example 5: Fencing topology for two nodes

    This example shows the order of the fencing devices for two nodes, alice and bob:

    > sudo crm configure fencing_topology \
      alice: check-kdump fence-rsa-alice \
      bob: check-kdump fence-rsa-bob

    Both nodes have kdump and a physical IBM RSA device configured. If alice needs to be fenced, the cluster first calls the resource check-kdump to check whether alice is performing a kdump. If not, the cluster calls the resource fence-rsa-alice to fence alice.

  5. You might need to increase the fencing timeout so the fencing action has time to finish:

    > sudo crm configure property stonith-timeout=INTEGER

    The appropriate value depends on your specific setup and infrastructure.
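
    For example, a value comfortably larger than the fence_kdump timeout gives the kdump check time to complete before the physical device is used (120 is an illustrative value, not a recommendation):

    > sudo crm configure property stonith-timeout=120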

  6. Open the firewall port for kdump messages on all nodes:

    > sudo crm cluster run "sudo firewall-cmd --add-port=7410/udp --permanent"
    > sudo crm cluster run "sudo firewall-cmd --reload"
  7. Configure fence_kdump_send to send a message to all nodes when the kdump process is finished. In the file /etc/sysconfig/kdump, edit the KDUMP_POSTSCRIPT line:

    KDUMP_POSTSCRIPT="/usr/lib/fence_kdump_send -c 5 -i 10 -p 7410 NODE-LIST"

    -c 5

    Use --count (or -c) to specify how many messages to send. We recommend sending multiple messages in case the first message fails.

    -i 10

    Use --interval (or -i) to specify the interval between messages, in seconds. The default is 10 seconds.

    -p 7410

    Use --port (or -p) to specify the firewall port for kdump messages.

    NODE-LIST

    Replace NODE-LIST with a space-separated list of all the cluster nodes.
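
    Continuing the two-node example with alice and bob, the line would read:

    KDUMP_POSTSCRIPT="/usr/lib/fence_kdump_send -c 5 -i 10 -p 7410 alice bob"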

  8. Copy the /etc/sysconfig/kdump file to the rest of the nodes:

    > sudo crm cluster copy /etc/sysconfig/kdump
  9. Regenerate the kdump initrd on all nodes:

    > sudo crm cluster run "sudo mkdumprd"

4 Testing node fencing

The crm cluster crash_test command simulates cluster failures and reports the results. To test node fencing, you can run one or both of the tests --fence-node and --split-brain-iptables.

The command supports the following checks:

--fence-node NODE

Fences a specific node passed from the command line.

--kill-sbd/--kill-corosync/--kill-pacemakerd

Kills the daemons for SBD, Corosync, or Pacemaker. After running one of these tests, you can find a report in the directory /var/lib/crmsh/crash_test/. The report includes a test case description, action logging, and an explanation of possible results.
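
For example, to test how the cluster reacts when the SBD daemon is killed (only applicable if your cluster uses SBD), run the test and then inspect the report directory:

> sudo crm cluster crash_test --kill-sbd
> ls /var/lib/crmsh/crash_test/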

--split-brain-iptables

Simulates a split-brain scenario by blocking the Corosync port, and checks whether one node can be fenced as expected. You must install iptables before you can run this test.
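
For example, to simulate a split-brain scenario on the current cluster:

> sudo crm cluster crash_test --split-brain-iptables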

For more information, run the crm cluster crash_test --help command.

This example uses nodes called alice and bob, and tests fencing bob. To watch bob change status during the test, you can log in to Hawk and navigate to Status › Nodes, or run crm status from another node.

Example 6: Manually triggering node fencing
admin@alice> sudo crm cluster crash_test --fence-node bob

==============================================
Testcase:          Fence node bob
Fence action:      reboot
Fence timeout:     95

!!! WARNING WARNING WARNING !!!
THIS CASE MAY LEAD TO NODE BE FENCED.
TYPE Yes TO CONTINUE, OTHER INPUTS WILL CANCEL THIS CASE [Yes/No](No): Yes
INFO: Trying to fence node "bob"
INFO: Waiting 95s for node "bob" reboot...
INFO: Node "bob" will be fenced by "alice"!
INFO: Node "bob" was fenced by "alice" at DATE TIME

HA glossary

active/active, active/passive

How resources run on the nodes. Active/passive means that resources only run on the active node, but can move to the passive node if the active node fails. Active/active means that all nodes are active at once, and resources can run on (and move to) any node in the cluster.

arbitrator

An arbitrator is a machine running outside the cluster to provide an additional instance for cluster calculations. For example, QNetd provides a vote to help QDevice participate in quorum decisions.

CIB (cluster information base)

An XML representation of the whole cluster configuration and status (cluster options, nodes, resources, constraints and the relationships to each other). The CIB manager (pacemaker-based) keeps the CIB synchronized across the cluster and handles requests to modify it.

clone

A clone is an identical copy of an existing node, used to make deploying multiple nodes simpler.

In the context of a cluster resource, a clone is a resource that can be active on multiple nodes. Any resource can be cloned if its resource agent supports it.

cluster

A high-availability cluster is a group of servers (physical or virtual) designed primarily to secure the highest possible availability of data, applications and services. Not to be confused with a high-performance cluster, which shares the application load to achieve faster results.

Cluster logical volume manager (Cluster LVM)

The term Cluster LVM indicates that LVM is being used in a cluster environment. This requires configuration adjustments to protect the LVM metadata on shared storage.

cluster partition

A cluster partition occurs when communication fails between one or more nodes and the rest of the cluster. The nodes are split into partitions but are still active. They can only communicate with nodes in the same partition and are unaware of the separated nodes. This is known as a split-brain scenario.

cluster stack

The ensemble of software technologies and components that make up a cluster.

colocation constraint

A type of resource constraint that specifies which resources can or cannot run together on a node.

concurrency violation

A resource that should be running on only one node in the cluster is running on several nodes.

Corosync

Corosync provides reliable messaging, membership and quorum information about the cluster. This is handled by the Corosync Cluster Engine, a group communication system.

CRM (cluster resource manager)

The management entity responsible for coordinating all non-local interactions in a High Availability cluster. SUSE Linux Enterprise High Availability uses Pacemaker as the CRM. It interacts with several components: local executors on its own node and on the other nodes, non-local CRMs, administrative commands, the fencing functionality, and the membership layer.

crmsh (CRM Shell)

The command-line utility crmsh manages the cluster, nodes and resources.

Csync2

A synchronization tool for replicating configuration files across all nodes in the cluster.

DC (designated coordinator)

The pacemaker-controld daemon is the cluster controller, which coordinates all actions. This daemon has an instance on each cluster node, but only one instance is elected to act as the DC. The DC is elected when the cluster services start, or if the current DC fails or leaves the cluster. The DC decides whether a cluster-wide change must be performed, such as fencing a node or moving resources.

disaster

An unexpected interruption of critical infrastructure caused by nature, humans, hardware failure, or software bugs.

disaster recovery

The process by which a function is restored to the normal, steady state after a disaster.

Disaster Recovery Plan

A strategy to recover from a disaster with the minimum impact on IT infrastructure.

DLM (Distributed Lock Manager)

DLM coordinates access to shared resources in a cluster, for example, managing file locking in clustered file systems to increase performance and availability.

DRBD

DRBD® is a block device designed for building High Availability clusters. It replicates data on a primary device to secondary devices in a way that ensures all copies of the data remain identical.

existing cluster

The term existing cluster is used to refer to any cluster that consists of at least one node. An existing cluster has a basic Corosync configuration that defines the communication channels, but does not necessarily have resource configuration yet.

failover

Occurs when a resource or node fails on one machine and the affected resources move to another node.

failover domain

A named subset of cluster nodes that are eligible to run a resource if a node fails.

fencing

Prevents access to a shared resource by isolated or failing cluster members. There are two classes of fencing: resource-level fencing and node-level fencing. Resource-level fencing ensures exclusive access to a resource. Node-level fencing prevents a failed node from accessing shared resources and prevents resources from running on a node with an uncertain status. This is usually done by resetting or powering off the node.

GFS2

Global File System 2 (GFS2) is a shared disk file system for Linux computer clusters. GFS2 allows all nodes to have direct concurrent access to the same shared block storage. GFS2 has no disconnected operating mode, and no client or server roles. All nodes in a GFS2 cluster function as peers. GFS2 supports up to 32 cluster nodes. Using GFS2 in a cluster requires hardware to allow access to the shared storage, and a lock manager to control access to the storage.

group

Resource groups contain multiple resources that need to be located together, started sequentially and stopped in the reverse order.

Hawk (HA Web Konsole)

A user-friendly Web-based interface for monitoring and administering a High Availability cluster from Linux or non-Linux machines. Hawk can be accessed from any machine that can connect to the cluster nodes, using a graphical Web browser.

heuristics

QDevice supports using a set of commands (heuristics) that run locally on start-up of cluster services, cluster membership change, successful connection to the QNetd server, or optionally at regular times. The result is used in calculations to determine which partition should have quorum.

knet (kronosnet)

A network abstraction layer supporting redundancy, security, fault tolerance, and fast fail-over of network links. In SUSE Linux Enterprise High Availability 16, knet is the default transport protocol for the Corosync communication channels.

local cluster

A single cluster in one location (for example, all nodes are located in one data center). Network latency is minimal. Storage is typically accessed synchronously by all nodes.

local executor

The local executor is located between Pacemaker and the resources on each node. Through the pacemaker-execd daemon, Pacemaker can start, stop and monitor resources.

location

In the context of a whole cluster, location can refer to the physical location of nodes (for example, all nodes might be located in the same data center). In the context of a location constraint, location refers to the nodes on which a resource can or cannot run.

location constraint

A type of resource constraint that defines the nodes on which a resource can or cannot run.

meta attributes (resource options)

Parameters that tell the CRM (cluster resource manager) how to treat a specific resource. For example, you might define a resource's priority or target role.

metro cluster

A single cluster that can stretch over multiple buildings or data centers, with all sites connected by Fibre Channel. Network latency is usually low. Storage is frequently replicated using mirroring or synchronous replication.

network device bonding

Network device bonding combines two or more network interfaces into a single bonded device to increase bandwidth and/or provide redundancy. When using Corosync, the bonded device is not managed by the cluster software. Therefore, the bonded device must be configured on every cluster node that might need to access it.

node

Any server (physical or virtual) that is a member of a cluster.

order constraint

A type of resource constraint that defines the sequence of actions.

Pacemaker

Pacemaker is the CRM (cluster resource manager) in SUSE Linux Enterprise High Availability, or the brain that reacts to events occurring in the cluster. Events might be nodes that join or leave the cluster, failure of resources, or scheduled activities such as maintenance, for example. The pacemakerd daemon launches and monitors all other related daemons.

parameters (instance attributes)

Parameters determine which instance of a service the resource controls.

primitive

A primitive resource is the most basic type of cluster resource.

promotable clone

Promotable clones are a special type of clone resource that can be promoted. Active instances of these resources are divided into two states: promoted and unpromoted (also known as active and passive or primary and secondary).

QDevice

QDevice and QNetd participate in quorum decisions. The corosync-qdevice daemon runs on each cluster node and communicates with QNetd to provide a configurable number of votes, allowing a cluster to sustain more node failures than the standard quorum rules allow.

QNetd

QNetd is an arbitrator that runs outside the cluster. The corosync-qnetd daemon provides a vote to the corosync-qdevice daemon on each node to help it participate in quorum decisions.

quorum

A cluster partition is defined to have quorum (be quorate) if it has the majority of nodes (or votes). Quorum distinguishes exactly one partition. This is part of the algorithm to prevent several disconnected partitions or nodes (split brain) from proceeding and causing data and service corruption. Quorum is a prerequisite for fencing, which then ensures that quorum is unique.

RA (resource agent)

A script acting as a proxy to manage a resource (for example, to start, stop or monitor a resource). SUSE Linux Enterprise High Availability supports different kinds of resource agents.

ReaR (Relax and Recover)

An administrator tool set for creating disaster recovery images.

resource

Any type of service or application that is known to Pacemaker, for example, an IP address, a file system, or a database. The term resource is also used for DRBD, where it names a set of block devices that use a common connection for replication.

resource constraint

Resource constraints specify which cluster nodes resources can run on, what order resources load in, and what other resources a specific resource is dependent on.

See also colocation constraint, location constraint and order constraint.

resource set

As an alternative format for defining location, colocation or order constraints, you can use resource sets, where primitives are grouped together in one set. When creating a constraint, you can specify multiple resources for the constraint to apply to.

resource template

To help create many resources with similar configurations, you can define a resource template. After being defined, it can be referenced in primitives or in certain types of constraints. If a template is referenced in a primitive, the primitive inherits all operations, instance attributes (parameters), meta attributes and utilization attributes defined in the template.

SBD (STONITH Block Device)

SBD provides a node fencing mechanism through the exchange of messages via shared block storage. Alternatively, it can be used in diskless mode. In either case, it needs a hardware or software watchdog on each node to ensure that misbehaving nodes are really stopped.

scheduler

The scheduler is implemented as pacemaker-schedulerd. When a cluster transition is needed, pacemaker-schedulerd calculates the expected next state of the cluster and determines what actions need to be scheduled to achieve the next state.

split brain

A scenario in which the cluster nodes are divided into two or more groups that do not know about each other (either through a software or hardware failure). STONITH prevents a split-brain scenario from badly affecting the entire cluster. Also known as a partitioned cluster scenario.

The term split brain is also used in DRBD but means that the nodes contain different data.

SPOF (single point of failure)

Any component of a cluster that, if it fails, triggers the failure of the entire cluster.

STONITH

Another term for the fencing mechanism that shuts down a misbehaving node to prevent it from causing trouble in a cluster. In a Pacemaker cluster, node fencing is managed by the fencing subsystem pacemaker-fenced.

switchover

The planned moving of resources to other nodes in a cluster. See also failover.

utilization

Tells the CRM what capacity a certain resource requires from a node.

watchdog

SBD (STONITH Block Device) needs a watchdog on each node to ensure that misbehaving nodes are really stopped. SBD feeds the watchdog by regularly writing a service pulse to it. If SBD stops feeding the watchdog, the hardware enforces a system restart. This protects against failures of the SBD process itself, such as becoming stuck on an I/O error.