14 QDevice and QNetd #
            QDevice and QNetd participate in quorum decisions. With
            assistance from the arbitrator corosync-qnetd,
            corosync-qdevice provides
            a configurable number of votes, allowing a cluster to sustain
            more node failures than the standard quorum rules allow. We
            recommend deploying corosync-qnetd
            and corosync-qdevice for
            clusters with an even number of nodes, and especially for two-node clusters.
         
14.1 Conceptual overview #
In comparison to calculating quora among cluster nodes, the QDevice-and-QNetd approach has the following benefits:
- It provides better sustainability in case of node failures. 
- You can write your own heuristics scripts to affect votes. This is especially useful for complex setups. 
- It enables you to configure a QNetd server to provide votes for multiple clusters. 
- It allows using diskless SBD for two-node clusters. 
- It helps with quorum decisions for clusters with an even number of nodes under split-brain situations, especially for two-node clusters. 
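The effect of the extra vote can be checked with a little arithmetic. The following is a minimal sketch, assuming every node carries one vote, QDevice contributes one vote, and quorum is a strict majority of the expected votes (which matches the sample status output in Section 14.6, where 3 expected votes give a quorum of 2). The function name `quorum_threshold` is made up for this illustration.

```shell
#!/bin/sh
# quorum_threshold TOTAL_VOTES
# Prints the number of votes needed for a strict majority.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# Two nodes, one vote each, plain majority rule: both votes are needed,
# so losing either node loses quorum.
echo "2 nodes, no QDevice: quorum = $(quorum_threshold 2) of 2 votes"

# Two nodes plus one QDevice vote: one node plus the QDevice vote suffices.
echo "2 nodes + QDevice:   quorum = $(quorum_threshold 3) of 3 votes"
```

With the QDevice vote, the surviving node of a two-node cluster can still reach 2 of 3 votes, provided it can reach the QNetd server.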
A setup with QDevice/QNetd consists of the following components and mechanisms:
- QNetd (corosync-qnetd)
- A systemd service (a daemon, the “QNetd server”) which is not part of the cluster. The systemd service provides a vote to the corosync-qdevice daemon. To improve security, corosync-qnetd can work with TLS for client certificate checking.
- QDevice (corosync-qdevice)
- A systemd service (a daemon) on each cluster node running together with Corosync. This is the client of corosync-qnetd. Its primary use is to allow a cluster to sustain more node failures than standard quorum rules allow. QDevice is designed to work with different arbitrators. However, currently, only QNetd is supported.
- Algorithms
- QDevice supports different algorithms, which determine how votes are assigned. Currently, the following exist:
  - FFSplit (“fifty-fifty split”) is the default. It is used for clusters with an even number of nodes. If the cluster splits into two similar partitions, this algorithm provides one vote to one of the partitions, based on the results of heuristics checks and other factors.
  - LMS (“last man standing”) allows the only remaining node that can see the QNetd server to get the votes. This algorithm is therefore useful when a cluster with only one active node should remain quorate.
- Heuristics
- QDevice supports a set of commands (“heuristics”). The commands are executed locally on start-up of cluster services, cluster membership change, successful connection to corosync-qnetd, or, optionally, at regular times. The heuristics can be set with the quorum.device.heuristics key (in the corosync.conf file) or with the --qdevice-heuristics-mode option. Both accept the values off (default), sync, and on. The difference between sync and on is that on additionally executes the above commands at regular intervals. The heuristics are considered to have passed only if all commands are executed successfully; otherwise, they have failed. The heuristics' result is sent to corosync-qnetd, where it is used in calculations to determine which partition should be quorate.
- Tiebreaker
- This is used as a fallback if the cluster partitions are completely equal even with the same heuristics results. It can be configured to be the lowest, the highest, or a specific node ID. 
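To see how the algorithm, tiebreaker, TLS, and heuristics settings fit together, it can help to look at how they might appear in corosync.conf. The following is a sketch only: the helper `print_qdevice_fragment` and the host name `qnetd.example.com` are hypothetical, the key names follow the corosync-qdevice(8) man page, and the snippet merely prints a fragment for review without touching any configuration file.

```shell
#!/bin/sh
# print_qdevice_fragment HOST [ALGORITHM] [TIE_BREAKER]
# Hypothetical helper: prints a quorum section for review only.
# It does not read or modify /etc/corosync/corosync.conf.
print_qdevice_fragment() {
    host="$1"
    algorithm="${2:-ffsplit}"      # ffsplit or lms
    tie_breaker="${3:-lowest}"     # lowest, highest, or a node ID
    cat <<EOF
quorum {
    provider: corosync_votequorum
    device {
        model: net
        votes: 1
        net {
            host: ${host}
            algorithm: ${algorithm}
            tie_breaker: ${tie_breaker}
            tls: on
        }
        heuristics {
            mode: off
        }
    }
}
EOF
}

# Example: defaults for algorithm and tiebreaker, placeholder host name.
print_qdevice_fragment qnetd.example.com
```

In practice, the configuration is usually generated by the cluster setup tooling rather than written by hand, so treat this as a reading aid, not as a template to paste.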
14.2 Requirements and prerequisites #
Before setting up QDevice and QNetd, you need to prepare the environment as follows:
- In addition to the cluster nodes, you have a separate machine which will become the QNetd server. See Section 14.3, “Setting up the QNetd server”. 
- We recommend that QDevice reaches the QNetd server over a different physical network than the one that Corosync uses. Ideally, the QNetd server should be in a different rack from the main cluster, or at least on a separate PSU and not in the same network segment as the Corosync ring or rings.
14.3 Setting up the QNetd server #
The QNetd server is not part of the cluster stack, and it is also not a real member of your cluster. As such, you cannot move resources to this server.
         The QNetd server is almost “state free”. Usually, you do not need to
         change anything in the configuration file /etc/sysconfig/corosync-qnetd.
         By default, the corosync-qnetd service runs the daemon
         as user coroqnetd
         in the group coroqnetd. This avoids
         running the daemon as root.
      
To create a QNetd server, proceed as follows:
- On the machine that will become the QNetd server, install SUSE Linux Enterprise Server 15 SP3. 
- Enable SUSE Linux Enterprise High Availability using the command listed in the output of SUSEConnect --list-extensions.
- Install the corosync-qnetd package:
  # zypper install corosync-qnetd
  You do not need to manually start the corosync-qnetd service. It will be started automatically when you configure QDevice on the cluster.
         Your QNetd server is ready to accept connections from a QDevice client
         corosync-qdevice.
         Further configuration is not needed.
      
14.4 Connecting QDevice clients to the QNetd server #
After you set up your QNetd server, you can set up and run the clients. You can connect the clients to the QNetd server during the installation of your cluster, or you can add them later. This procedure documents how to add them later.
- On all nodes, install the corosync-qdevice package:
  # zypper install corosync-qdevice
- On one of the nodes, run the following command to configure QDevice:
  # crm cluster init qdevice
  Do you want to configure QDevice (y/n)? y
  HOST or IP of the QNetd server to be used [] QNETD_SERVER
  TCP PORT of QNetd server [5403]
  QNetd decision ALGORITHM (ffsplit/lms) [ffsplit]
  QNetd TIE_BREAKER (lowest/highest/valid node id) [lowest]
  Whether using TLS on QDevice/QNetd (on/off/required) [on]
  Heuristics COMMAND to run with absolute path; For multiple commands, use ";" to separate []
  Confirm with y that you want to configure QDevice, then enter the host name or IP address of the QNetd server. For the remaining fields, you can accept the default values or change them if required.
  Important: SBD_WATCHDOG_TIMEOUT for diskless SBD and QDevice
  If you use QDevice with diskless SBD, the SBD_WATCHDOG_TIMEOUT value must be greater than QDevice's sync_timeout value, or SBD will time out and fail to start.
  The default value for sync_timeout is 30 seconds. Therefore, in the file /etc/sysconfig/sbd, make sure that SBD_WATCHDOG_TIMEOUT is set to a greater value, such as 35.
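The timeout rule for diskless SBD can be checked before starting the cluster. The following is a minimal sketch, assuming both values are compared in seconds and that sync_timeout is still at the default of 30 seconds mentioned above. The function name `check_sbd_timeout` is made up for this illustration.

```shell
#!/bin/sh
# check_sbd_timeout SBD_WATCHDOG_TIMEOUT [SYNC_TIMEOUT]
# Returns 0 if the watchdog timeout is strictly greater than sync_timeout.
# Both values are taken to be in seconds (an assumption of this sketch).
check_sbd_timeout() {
    watchdog="$1"
    sync_timeout="${2:-30}"    # default stated in the text: 30 seconds
    if [ "$watchdog" -gt "$sync_timeout" ]; then
        echo "OK: SBD_WATCHDOG_TIMEOUT=$watchdog > sync_timeout=$sync_timeout"
        return 0
    fi
    echo "Too low: SBD_WATCHDOG_TIMEOUT=$watchdog <= sync_timeout=$sync_timeout"
    return 1
}

# If the SBD configuration file is readable, check the value it contains.
SBD_CONF=/etc/sysconfig/sbd
if [ -r "$SBD_CONF" ]; then
    value=$(sed -n 's/^SBD_WATCHDOG_TIMEOUT=//p' "$SBD_CONF" | tr -d '"' | tail -n 1)
    if [ -n "$value" ]; then
        check_sbd_timeout "$value" || echo "Adjust $SBD_CONF before starting SBD." >&2
    fi
fi
```

For example, `check_sbd_timeout 35` succeeds, while `check_sbd_timeout 30` fails because the value must be greater than, not equal to, sync_timeout.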
14.5 Setting up a QDevice with heuristics #
If you need additional control over how votes are determined, use heuristics. Heuristics are a set of commands that are executed in parallel.
         For this purpose, the command crm cluster init qdevice
         provides the option --qdevice-heuristics. You can
         pass one or more commands (separated by semicolons) with absolute paths.
      
         For example, if your own command for heuristic checks is located at
         /usr/sbin/my-script.sh you can run it on
         one of your cluster nodes as follows:
      
# crm cluster init qdevice --qnetd-hostname=charlie \
     --qdevice-heuristics=/usr/sbin/my-script.sh \
     --qdevice-heuristics-mode=on
         The command or commands can be written in any language such as Shell, Python, or Ruby.
         If they succeed, they return 0 (zero), otherwise they return an error code.
      
You can also pass a set of commands. The heuristics pass only when all commands finish successfully (return code zero).
         The --qdevice-heuristics-mode=on option lets the heuristics
         commands run regularly.
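A heuristics script only has to signal success or failure through its return code. The following is a sketch of what a script such as /usr/sbin/my-script.sh might contain, assuming you want several checks to count as one heuristic. The helper `run_checks` and the checks themselves are hypothetical; the `true`, `false`, and `test` calls stand in for real probes such as a ping to a gateway.

```shell
#!/bin/sh
# run_checks CMD...
# Runs each check command in turn and returns 0 only if every one of
# them returns 0. The first failing check ends the run with status 1.
run_checks() {
    for check in "$@"; do
        if ! sh -c "$check" >/dev/null 2>&1; then
            echo "heuristic failed: $check" >&2
            return 1
        fi
    done
    return 0
}

# Placeholder checks for illustration. A real script might instead run
# something like: run_checks "ping -c 1 -W 1 <gateway address>"
run_checks "true" "test -d /" && echo "heuristics passed"
run_checks "true" "false" || echo "heuristics failed"
```

A real script would end by exiting with the status of `run_checks`, so that QDevice sees 0 for success and a non-zero code for failure.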
      
14.6 Checking and showing quorum status #
You can query the quorum status on one of your cluster nodes as shown in Example 14.1, “Status of QDevice”. It shows the status of your QDevice nodes.
# corosync-quorumtool (1)
Quorum information
------------------
Date:             ...
Quorum provider:  corosync_votequorum
Nodes:            2 (2)
Node ID:          3232235777 (3)
Ring ID:          3232235777/8
Quorate:          Yes (4)

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
3232235777          1    A,V,NMW 192.168.1.1 (local) (5)
3232235778          1    A,V,NMW 192.168.1.2 (5)
         0          1            Qdevice
(1) As an alternative with an identical result, you can also use the
(2) The number of nodes we are expecting. In this example, it is a two-node cluster.
(3) As the node ID is not explicitly specified in
(4) The quorum status. In this case, the cluster has quorum.
(5) The status for each cluster node means:
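The node IDs in the sample output look like the 32-bit integer form of each node's IPv4 address: 3232235777 is listed next to 192.168.1.1. The arithmetic can be verified with a short sketch. The function name `nodeid_to_ipv4` is made up for this illustration, and the conversion only makes sense when the node ID was derived from an IPv4 address rather than set explicitly.

```shell
#!/bin/sh
# nodeid_to_ipv4 NODEID
# Prints the dotted-quad IPv4 address encoded in a 32-bit node ID by
# extracting the four bytes, most significant first.
nodeid_to_ipv4() {
    id="$1"
    echo "$(( (id >> 24) & 255 )).$(( (id >> 16) & 255 )).$(( (id >> 8) & 255 )).$(( id & 255 ))"
}

nodeid_to_ipv4 3232235777   # prints 192.168.1.1
nodeid_to_ipv4 3232235778   # prints 192.168.1.2
```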
If you query the status of the QNetd server, you get a similar output to that shown in Example 14.2, “Status of QNetd server”:
# corosync-qnetd-tool (1)
Cluster "hacluster": (2)
    Algorithm:          Fifty-Fifty split (3)
    Tie-breaker:        Node with lowest node ID
    Node ID 3232235777: (4)
        Client address:         ::ffff:192.168.1.1:54732
        HB interval:            8000ms
        Configured node list:   3232235777, 3232235778
        Ring ID:                aa10ab0.8
        Membership node list:   3232235777, 3232235778
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   ACK (ACK)
    Node ID 3232235778:
        Client address:         ::ffff:192.168.1.2:43016
        HB interval:            8000ms
        Configured node list:   3232235777, 3232235778
        Ring ID:                aa10ab0.8
        Membership node list:   3232235777, 3232235778
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   No change (ACK)
(1) As an alternative with an identical result, you can also use the
(2) The name of your cluster as set in the configuration file
(3) The algorithm currently used. In this example, it is FFSplit.
(4) This is the entry for the node with the IP address 192.168.1.1.
14.7 For more information #
For additional information about QDevice and QNetd, see the man pages of corosync-qdevice(8) and corosync-qnetd(8).