Configuring QDevice and QNetd in an Existing High Availability Cluster
- WHAT?
How to use the CRM Shell to configure QDevice and QNetd in a High Availability cluster that is already installed and running.
- WHY?
QDevice and the arbitrator QNetd participate in quorum calculations in a split-brain scenario. This allows the cluster to sustain more node failures than the standard quorum rules allow.
- EFFORT
Configuring QDevice and QNetd in an existing cluster only takes a few minutes and does not require any downtime for cluster resources.
- GOAL
Help the cluster make quorum calculations more easily. This is recommended for clusters with an even number of nodes, especially two-node clusters.
- REQUIREMENTS
An existing SUSE Linux Enterprise High Availability cluster.
An additional SUSE Linux Enterprise Server to run QNetd.
We recommend having the cluster nodes reach the QNetd server via a different network than the one Corosync uses. Ideally, the QNetd server should be in a separate rack from the cluster, or at least on a separate PSU and not in the same network segment as the Corosync communication channels.
1 What are QDevice and QNetd? #
When communication fails between one or more nodes and the rest of the cluster (a split-brain scenario), a cluster partition occurs. The nodes can only communicate with other nodes in the same partition and are unaware of the separated nodes. A cluster partition has quorum (or is “quorate”) if it has the majority of nodes (or “votes”). This is determined by quorum calculation. Quorum must be calculated so the non-quorate nodes can be fenced.
QDevice and QNetd participate in quorum calculations in a split-brain scenario. QDevice runs on each cluster node and communicates with an arbitrator, QNetd, to provide a configurable number of votes to the cluster. This allows the cluster to sustain more node failures than the standard quorum rules allow. We recommend using QDevice and QNetd for clusters with an even number of nodes, and especially for two-node clusters.
1.1 Components #
- QDevice (corosync-qdevice)
QDevice runs together with Corosync on each cluster node. It communicates with the arbitrator QNetd to provide a configurable number of votes to help with quorum calculation.
- QNetd (corosync-qnetd)
QNetd is an arbitrator that provides a vote to the QDevice service running on the cluster nodes. The QNetd server runs outside the cluster, so you cannot move cluster resources to this server. QNetd can support multiple clusters if each cluster has a unique name.
- Algorithms
QDevice supports different algorithms to determine how votes are assigned. “Fifty-fifty split” is helpful for clusters with an even number of nodes. “Last man standing” is helpful for clusters where only one active node needs to remain quorate.
- Heuristics
QDevice supports a set of commands (or “heuristics”) that run when the cluster services start (or restart), when the cluster membership changes, and when nodes connect to the QNetd server. Optionally, you can also configure the commands to run at regular intervals. The result is sent to QNetd to help with the quorum calculation. Heuristics can be written in any programming language.
- Tiebreaker
This is used as a fallback if the cluster partitions are equal even after the heuristics results are applied. The tie-breaker vote can be configured to go to the node with the lowest node ID, the highest node ID, or a specific node ID.
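As an illustration, a heuristics command is simply any executable whose exit status QDevice can read: status 0 counts as "pass", anything else counts against the partition. The following is a minimal sketch, not taken from the product documentation; the directory check and the default path are hypothetical examples you would replace with a condition that matters to your workload, such as reachability of a router or a shared mount.

```shell
#!/bin/sh
# Sketch of a QDevice heuristics command (hypothetical check).
# QDevice treats exit status 0 as "pass"; any other status counts
# against this partition in the quorum calculation.

# Hypothetical condition: the node must still be able to read its data
# directory (in a real cluster, point this at e.g. a shared NFS mount).
DATA_DIR="${1:-/tmp}"

if [ -d "$DATA_DIR" ] && [ -r "$DATA_DIR" ]; then
    status=0    # heuristics pass
else
    status=1    # heuristics fail
fi
echo "heuristics status: $status"
```

A real script would end with `exit "$status"` so that QDevice sees the result; the echo is only there to make the sketch observable on its own.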
1.2 Benefits #
- Clusters with an even number of nodes can make quorum calculations more easily.
- The cluster can sustain more node failures than the standard quorum rules allow.
- You can write your own heuristics scripts to affect votes. This is especially useful for complex setups.
- Two-node clusters can use diskless SBD if QDevice is also configured.
- One QNetd server can provide votes for multiple clusters.
- QNetd can work with TLS for client certificate checking.
1.3 For more information #
For more information, see the corosync-qdevice and corosync-qnetd man pages.
2 Setting up the QNetd server #
QNetd is an arbitrator that provides a vote to the QDevice service running on the cluster nodes. The QNetd server runs outside the cluster, so you cannot move cluster resources to this server. QNetd can support multiple clusters if each cluster has a unique name.
By default, QNetd runs the corosync-qnetd daemon as the user coroqnetd in the group coroqnetd. This avoids running the daemon as root.
Requirements:
- SUSE Linux Enterprise Server is installed and registered with the SUSE Customer Center.
- You have an additional registration code for SUSE Linux Enterprise High Availability.
- We recommend having the cluster nodes reach the QNetd server via a different network than the one Corosync uses.
Perform this procedure on a server that is not part of the cluster:
Log in either as the root user or as a user with sudo privileges.

Enable the SUSE Linux Enterprise High Availability extension:

> sudo SUSEConnect -p sle-ha/16.0/x86_64 -r HA_REGCODE

Install the corosync-qnetd package:

> sudo zypper install corosync-qnetd

You do not need to manually start the corosync-qnetd service. It starts automatically when you configure QDevice on the cluster.
The QNetd server is ready to accept connections from a QDevice client (corosync-qdevice). Further configuration is handled by crmsh when you connect QDevice clients.
3 Connecting QDevice to the QNetd server #
QDevice runs together with Corosync on each cluster node. It communicates with the arbitrator QNetd to provide a configurable number of votes to help with quorum calculation.
This procedure explains how to configure QDevice after the cluster is already installed and running, not during the initial cluster setup.
The setup script checks if a cluster restart is required and whether it is safe to do so automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually. This allows you to put the cluster into maintenance mode first to avoid resource downtime. However, be aware that the resources will not have cluster protection while in maintenance mode.
- An existing High Availability cluster is already running.
- The latest corosync-qdevice package is installed on all nodes.
- The latest corosync-qnetd package is installed on the QNetd server.
- To connect to the QNetd server as a sudo user: the user must have passwordless sudo permission.
- To connect to the QNetd server as the root user: passwordless SSH authentication must be configured between the nodes and the QNetd server.
Perform this procedure on only one cluster node:
Log in either as the root user or as a user with sudo privileges.

Run the QDevice stage of the cluster setup script:

> sudo crm cluster init qdevice

Confirm with y that you want to configure QDevice and QNetd.

Enter the IP address or host name of the QNetd server, with or without a user name:

- If you include a non-root user name, a later step will prompt you for the user's password and the script will configure passwordless SSH authentication from the nodes to the QNetd server.
- If you omit a user name, the script defaults to the root user, so passwordless SSH authentication must already be configured for the nodes to access the QNetd server.
For the remaining fields, you can accept the default values or change them as required:
Accept the proposed port (5403) or enter a different one.

Choose the algorithm that determines how votes are assigned. The default is ffsplit.

- ffsplit (“fifty-fifty split”): If the cluster splits into two even partitions, one of the partitions gets the vote based on the results of heuristics checks and other factors. This algorithm is helpful for clusters with an even number of nodes.
- lms (“last man standing”): If only one remaining node can still communicate with the QNetd server, that node gets the vote. This algorithm is helpful for clusters where only one active node needs to remain quorate.
Choose the method to use when a tie-breaker is required. The default is lowest.

- lowest: The node with the lowest node ID gets the vote.
- highest: The node with the highest node ID gets the vote.
- Alternatively, you can enter a specific node ID. The designated node always gets the vote.
Choose whether to enable TLS for client certificate checking. The default is on.

- off: TLS is not required and should not be tried.
- on: Attempt to connect with TLS, but connect without TLS if it is not available.
- required: TLS is mandatory, so QDevice exits with an error if TLS is not available.
Enter heuristics commands to assist in quorum calculation, or leave the field blank to skip this step.
You can enter one command, multiple commands separated by semicolons, or the path to a script file. The commands can be written in any programming language.
If you enter heuristics commands, you must also select the mode of operation. The default is sync.

- sync: QDevice runs heuristics when the cluster services start (or restart), when the cluster membership changes, and when nodes connect to the QNetd server.
- on: QDevice runs heuristics in the same scenarios as sync, and also at regular intervals.
If required, the script prompts you for the password of the QNetd server, then configures passwordless SSH authentication between the cluster nodes and the QNetd server.
The script configures QDevice on the nodes and completes the QNetd server's configuration, including generating CA and server certificates and starting the corosync-qnetd service. The script also checks whether a cluster restart is required and whether it is safe to do so automatically. If any non-STONITH resources are running, the script warns you to restart the cluster services manually.

If you need to restart the cluster services manually, follow these steps to avoid resource downtime:

Put the cluster into maintenance mode:

> sudo crm maintenance on

In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running while the cluster restarts. However, be aware that the resources will not have cluster protection while in maintenance mode.

Restart the cluster services on all nodes:

> sudo crm cluster restart --all

Check the status of the cluster:

> sudo crm status

The nodes will have the status UNCLEAN (offline), but will soon change to Online. When the nodes are back online, put the cluster back into normal operation:

> sudo crm maintenance off
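After the script completes, the QDevice settings appear in the quorum section of /etc/corosync/corosync.conf on the nodes. The following is a sketch of what that section can look like with the default values described above and the example QNetd host charlie; the file generated on your cluster may differ in layout and values.

```
quorum {
    provider: corosync_votequorum
    device {
        votes: 1
        model: net
        net {
            tls: on
            host: charlie
            port: 5403
            algorithm: ffsplit
            tie_breaker: lowest
        }
    }
}
```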
4 Checking the QDevice and QNetd setup #
Use the crm corosync status command to check the cluster's quorum status
and the status of QDevice and QNetd. You can run this command from any node in the cluster.
The following examples show a cluster with two nodes (alice and bob) and a QNetd server (charlie).
> sudo crm corosync status quorum
1 alice member
2 bob member
Quorum information
------------------
Date:             [...]
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          2
Ring ID:          1.e
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1    A,V,NMW alice
         2          1    A,V,NMW bob (local)
         0          1            Qdevice
The Membership information section shows the following status codes:

- A (alive) or NA (not alive): Shows the connectivity status between QDevice and Corosync.
- V (vote) or NV (non vote): Shows if the node has a vote. V means that both nodes can communicate with each other. In a split-brain scenario, one node would be set to V and the other node would be set to NV.
- MW (master wins) or NMW (not master wins): Shows if the master_wins flag is set. By default, the flag is not set, so the status is NMW.
- NR (not registered): Shows that the cluster is not using a quorum device.
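If you monitor the cluster with scripts, the flags can be pulled out of the Membership information table. The snippet below is a sketch that parses the sample output shown above; the column layout is an assumption based on that sample, not a documented interface.

```shell
# Sample Membership information table from the quorum status output above.
membership='    Nodeid      Votes    Qdevice Name
         1          1    A,V,NMW alice
         2          1    A,V,NMW bob (local)
         0          1            Qdevice'

# Print "name flags" for each row whose third column holds QDevice flags.
# Flag fields like A,V,NMW always contain a comma; the header row and the
# Qdevice row itself do not, so they are skipped.
echo "$membership" | awk '$3 ~ /,/ { print $4, $3 }'
```

For the sample table, this prints one line per cluster node (alice and bob) with its flag set.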
> sudo crm corosync status qdevice
1 alice member
2 bob member
Qdevice information
-------------------
Model:                  Net
Node ID:                1
HB interval:            10000ms
Sync HB interval:       30000ms
Configured node list:
    0   Node ID = 1
    1   Node ID = 2
Heuristics:             Disabled
Ring ID:                1.e
Membership node list:   1, 2
Quorate:                Yes
Quorum node list:
    0   Node ID = 2, State = member
    1   Node ID = 1, State = member
Expected votes:         3
Last poll call:         [...]

Qdevice-net information
-----------------------
Cluster name:           hacluster
QNetd host:             charlie:5403
Connect timeout:        8000ms
HB interval:            8000ms
VQ vote timer interval: 5000ms
TLS:                    Supported
Algorithm:              Fifty-Fifty split
Tie-breaker:            Node with lowest node ID
KAP Tie-breaker:        Enabled
Poll timer running:     Yes (cast vote)
State:                  Connected
TLS active:             Yes (client certificate sent)
Connected since:        [...]
Echo reply received:    [...]
> sudo crm corosync status qnetd
1 alice member
2 bob member
Cluster "hacluster":
    Algorithm:          Fifty-Fifty split (KAP Tie-breaker)
    Tie-breaker:        Node with lowest node ID
    Node ID 1:
        Client address:         ::ffff:192.168.1.185:45676
        HB interval:            8000ms
        Configured node list:   1, 2
        Ring ID:                1.e
        Membership node list:   1, 2
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   ACK (ACK)
    Node ID 2:
        Client address:         ::ffff:192.168.1.168:55034
        HB interval:            8000ms
        Configured node list:   1, 2
        Ring ID:                1.e
        Membership node list:   1, 2
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   No change (ACK)
5 Changing the QDevice or QNetd configuration #
Use this procedure to change the configuration of QDevice or QNetd (for example, to
change the tie-breaker method from lowest to highest).
Log in either as the root user or as a user with sudo privileges.

Put the cluster into maintenance mode:

> sudo crm maintenance on

In this state, the cluster stops monitoring all resources. This allows the services managed by the resources to keep running even when you stop the cluster services.

Stop the cluster services on all nodes:

> sudo crm cluster stop --all

Open the Corosync configuration file:

> sudo crm corosync edit

Change the required setting in the quorum section, then save and close the file.

Copy the new configuration to all nodes:

> sudo crm corosync push

Start the cluster services on all nodes:

> sudo crm cluster start --all

Check the status of the cluster:

> sudo crm status

The nodes will have the status UNCLEAN (offline), but will soon change to Online. When the nodes are back online, put the cluster back into normal operation:

> sudo crm maintenance off

Verify that the change was successful:

> sudo crm corosync status qnetd
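For example, switching the tie-breaker from the lowest to the highest node ID means editing the tie_breaker option in the quorum section. The sketch below assumes the defaults from the earlier setup and the example host charlie; your file may differ.

```
quorum {
    provider: corosync_votequorum
    device {
        votes: 1
        model: net
        net {
            tls: on
            host: charlie
            port: 5403
            algorithm: ffsplit
            tie_breaker: highest    # changed from "lowest"
        }
    }
}
```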
6 Legal Notice #
Copyright © 2006–2025 SUSE LLC and contributors. All rights reserved.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”.
For SUSE trademarks, see https://www.suse.com/company/legal/. All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.
All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors, nor the translators shall be held liable for possible errors or the consequences thereof.