This is a draft document that was built and uploaded automatically. It may document beta software and be incomplete or even incorrect. Use this document at your own risk.

Highly Available NFS Storage with DRBD and Pacemaker
SUSE Linux Enterprise High Availability 12 SP4

Publication Date: November 08, 2023

This document describes how to set up highly available NFS storage in a two-node cluster, using the following components of SUSE Linux Enterprise High Availability 12 SP4: DRBD* (Distributed Replicated Block Device), LVM (Logical Volume Manager), and Pacemaker, the cluster resource management framework.

Warning: This guide is no longer recommended

The method described in this version of the guide is outdated and may cause issues in some setups. For more information, see https://www.suse.com/support/kb/doc/?id=000020396.

The process for configuring highly available NFS storage has been improved in versions 12 SP5 and 15 SP3.

Copyright © 2006–2023 SUSE LLC and contributors. All rights reserved.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled GNU Free Documentation License.

For SUSE trademarks, see http://www.suse.com/company/legal/. All third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.

All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof.

1 Usage Scenario

This document helps you set up a highly available NFS server. The cluster used for the highly available NFS storage has the following properties:

  • Two nodes: alice (IP: 192.168.1.1) and bob (IP: 192.168.1.2), connected to each other via network.

  • Two floating, virtual IP addresses (192.168.1.10 and 192.168.1.11), allowing clients to connect to the service no matter which physical node it is running on. One IP address is used for cluster administration with Hawk2, the other IP address is used exclusively for the NFS exports.

  • A shared storage device, used as an SBD fencing mechanism. This avoids split brain scenarios.

  • Failover of resources from one node to the other if the active host breaks down (active/passive setup).

  • Local storage on each host. The data is synchronized between the hosts using DRBD on top of LVM.

  • A file system exported through NFS.

After installing and setting up the basic two-node cluster, and extending it with storage and cluster resources for NFS, you will have a highly available NFS storage server.

2 Installing a Basic Two-Node Cluster

Before you proceed, install and set up a basic two-node cluster. This task is described in the Installation and Setup Quick Start, which explains how to use the ha-cluster-bootstrap package to set up a cluster with minimal effort.

3 Creating an LVM Device

LVM (Logical Volume Manager) enables flexible distribution of hard disk space over several file systems.

To prepare your disks for LVM, do the following:

  1. Check if the locking type of LVM2 is cluster-aware. The keyword locking_type in /etc/lvm/lvm.conf must contain the value 3 (the default is 1). Copy the configuration to all nodes, if necessary.

  2. Create an LVM physical volume, replacing /dev/sdbX with your corresponding device for LVM:

    # pvcreate /dev/sdbX
  3. Create an LVM Volume Group nfs that includes this physical volume:

    # vgcreate nfs /dev/sdbX
  4. Create one or more logical volumes in the volume group nfs. This example assumes a 20 gigabyte volume, named work:

    # lvcreate -n work -L 20G nfs
  5. Activate the volume group:

    # vgchange -ay nfs

After you have completed the above steps, the logical volume is available as the device /dev/VOLGROUP/LOGICAL_VOLUME. In this case, it is /dev/nfs/work.
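
The result of these steps can be checked from a script before you continue. The following is a minimal sketch and not part of the original procedure: lvm_locking_type and lv_is_ready are hypothetical helper names, and the awk pattern assumes the setting is written as locking_type = N on a line of its own.

```shell
# Hypothetical helper: print the value of the first uncommented
# locking_type setting in an lvm.conf-style file.
lvm_locking_type() {
    awk -F= '/^[ \t]*locking_type[ \t]*=/ {
        gsub(/[ \t]/, "", $2)   # strip blanks around the value
        print $2
        exit
    }' "${1:-/etc/lvm/lvm.conf}"
}

# Hypothetical helper: succeed if /dev/VOLGROUP/LOGICAL_VOLUME exists
# and is a block device.
lv_is_ready() {
    [ -b "/dev/$1/$2" ]
}

# Example use on a cluster node:
#   [ "$(lvm_locking_type)" = "3" ] || echo "locking_type is not 3" >&2
#   lv_is_ready nfs work || echo "/dev/nfs/work is missing" >&2
```

Both helpers only read state, so they can be run repeatedly on either node without side effects.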

4 Creating a DRBD Device

This section describes how to set up a DRBD device on top of LVM. The configuration of LVM as a back-end of DRBD has some benefits:

  • Easier setup than with LVM on top of DRBD.

  • Easier administration in case the LVM disks need to be resized or more disks are added to the volume group.

As the LVM volume group is named nfs, the DRBD resource uses the same name.

4.1 Creating DRBD Configuration

For consistency reasons, it is highly recommended to follow this advice:

  • Use the directory /etc/drbd.d/ for your configuration.

  • Name the file according to the purpose of the resource.

  • Put your resource configuration in a file with a .res extension. In the following examples, the file /etc/drbd.d/nfs.res is used.

Proceed as follows:

Procedure 1: Creating a DRBD Configuration
  1. Create the file /etc/drbd.d/nfs.res with the following contents:

    resource nfs {
       device /dev/drbd0;        # 1
       disk   /dev/nfs/work;     # 2
       meta-disk internal;       # 3

       net {
          protocol  C;           # 4
          fencing resource-and-stonith;
       }

       handlers {                # 5
          fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
          after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
          # ...
       }

       connection-mesh {         # 6
          hosts     alice bob;
       }
       on alice {                # 7
          address   192.168.1.1:7790;
          node-id   0;
       }
       on bob {                  # 7
          address   192.168.1.2:7790;
          node-id   1;
       }
    }

    1

    The DRBD device that applications are supposed to access.

    2

    The lower-level block device used by DRBD to store the actual data. This is the LVM device that was created in Section 3, “Creating an LVM Device”.

    3

    Where the metadata is stored. With internal, the metadata is stored together with the user data on the same device. See the drbd.conf man page for further information.

    4

    The specified protocol to be used for this connection. For protocol C, a write is considered to be complete when it has reached all disks, be they local or remote.

    5

    Enables resource-level fencing. If the DRBD replication link becomes disconnected, Pacemaker tries to promote the DRBD resource to another node. During this process, the scripts configured in the handlers section are called. See Book “Administration Guide”, Chapter 18 “DRBD”, Section 18.6 “Using Resource-Level Fencing” for more information.

    6

    Defines all nodes of a mesh. The hosts parameter contains all host names that share the same DRBD setup.

    7

    Contains the IP address and a unique identifier for each node.

  2. Open /etc/csync2/csync2.cfg and check whether the following two lines exist:

    include /etc/drbd.conf;
    include /etc/drbd.d/*.res;

    If not, add them to the file.

  3. Copy the file to the other nodes:

    # csync2 -xv

    For information about Csync2, refer to Book “Administration Guide”, Chapter 4 “Using the YaST Cluster Module”, Section 4.7 “Transferring the Configuration to All Nodes”.
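
The check in step 2 can also be scripted. The sketch below is an illustration only: missing_csync2_includes is a hypothetical helper name, and the test is a plain fixed-string search, so it would also accept a line that has been commented out.

```shell
# Hypothetical helper: print each required include line that is absent
# from the given Csync2 configuration file. Prints nothing if both exist.
missing_csync2_includes() {
    cfg="${1:-/etc/csync2/csync2.cfg}"
    for line in 'include /etc/drbd.conf;' 'include /etc/drbd.d/*.res;'; do
        # -F: match the text literally, so "*" and "." are not patterns
        grep -qF -- "$line" "$cfg" || printf '%s\n' "$line"
    done
}

# Example use on a cluster node:
#   missing_csync2_includes
```

Empty output means both lines are present and the file is ready for csync2 -xv.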

4.2 Activating the DRBD Device

After you have prepared your DRBD configuration, proceed as follows:

  1. If you use a firewall in your cluster, open port 7790 in your firewall configuration.

  2. The first time you do this, execute the following commands on both nodes (in our example, alice and bob):

    # drbdadm create-md nfs
    # drbdadm up nfs

    This initializes the metadata storage and creates the /dev/drbd0 device.

  3. If the DRBD devices on all nodes have the same data, skip the initial resynchronization. Use the following command:

    # drbdadm new-current-uuid --clear-bitmap nfs/0
  4. Make alice primary:

    # drbdadm primary --force nfs
  5. Check the DRBD status:

    # drbdadm status nfs

    This returns the following message:

    nfs role:Primary
      disk:UpToDate
      bob role:Secondary
        peer-disk:UpToDate

After the synchronization is complete, you can access the DRBD resource on the block device /dev/drbd0. Use this device for creating your file system. Find more information about DRBD in Book “Administration Guide”, Chapter 18 “DRBD”.
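
To decide from a script whether the synchronization has finished, the status output can be parsed. This is a minimal sketch under the assumption that drbdadm status prints disk: and peer-disk: fields as in the example above. The function name drbd_in_sync is hypothetical.

```shell
# Hypothetical helper: read "drbdadm status" output on stdin and succeed
# only if every disk: and peer-disk: field reports UpToDate.
drbd_in_sync() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^(peer-)?disk:/) {
                    seen = 1
                    if ($i !~ /:UpToDate$/) bad = 1
                }
        }
        # Fail if no disk field was seen at all, or if any was not UpToDate.
        END { exit !(seen && !bad) }
    '
}

# Example use on a cluster node:
#   drbdadm status nfs | drbd_in_sync && echo "nfs is in sync"
```

Because the helper reads standard input, it can be tried against saved status output without touching the DRBD resource.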

4.3 Creating the File System

After you have finished Section 4.2, “Activating the DRBD Device”, you should see a DRBD device on /dev/drbd0. Create an ext3 file system on it from the primary node (alice in this example):

# mkfs.ext3 /dev/drbd0

5 Adjusting Pacemaker's Configuration

A resource might fail back to its original node when that node is back online and in the cluster. To prevent a resource from failing back to the node that it was running on, or to specify a different node for the resource to fail back to, change its resource stickiness value. You can either specify resource stickiness when you are creating a resource or afterward.

To adjust the option, open the crm shell as root (or any non-root user that is part of the haclient group) and run the following commands:

# crm configure
crm(live)configure# rsc_defaults resource-stickiness="200"
crm(live)configure# commit

For more information about global cluster options, refer to Book “Administration Guide”, Chapter 5 “Configuration and Administration Basics”, Section 5.2 “Quorum Determination”.
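
To confirm that the new default is in place, the value can be read back from the output of crm configure show. The following sketch assumes the setting appears as resource-stickiness=VALUE, with or without quotes. configured_stickiness is a hypothetical helper name, and it reports the first match only, which could also be a per-resource meta attribute.

```shell
# Hypothetical helper: read "crm configure show" output on stdin and
# print the first resource-stickiness value found.
configured_stickiness() {
    sed -n 's/.*resource-stickiness="\{0,1\}\([0-9A-Za-z+-]*\).*/\1/p' | head -n 1
}

# Example use on a cluster node:
#   crm configure show | configured_stickiness
```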

6 Creating Cluster Resources

The following sections cover the configuration of the required resources for a highly available NFS cluster. The configuration steps use the crm shell. The following list shows the necessary cluster resources:

Overview of Cluster Resources
DRBD Primitive and Multi-state Resource

These resources are used to replicate data. The multi-state resource is switched from and to the Primary and Secondary roles as deemed necessary by the cluster resource manager.

NFS Kernel Server Resource

With this resource, Pacemaker ensures that the NFS server daemons are always available.

NFS Exports

One or more NFS exports, typically corresponding to the file system.

Example NFS Scenario
  • The following configuration examples assume that 192.168.1.11 is the virtual IP address to use for an NFS server which serves clients in the 192.168.1.x/24 subnet.

  • The service exports data served from /srv/nfs/work.

  • Into this export directory, the cluster will mount an ext3 file system from the DRBD device /dev/drbd0. This DRBD device sits on top of the LVM logical volume work in the volume group nfs.

6.1 DRBD Primitive and Multi-state Resource

To configure these resources, run the following commands from the crm shell:

crm(live)# configure
crm(live)configure# primitive drbd_nfs \
  ocf:linbit:drbd \
    params drbd_resource="nfs" \
  op monitor interval="15" role="Master" \
  op monitor interval="30" role="Slave"
crm(live)configure# ms ms-drbd_nfs drbd_nfs \
  meta master-max="1" master-node-max="1" clone-max="2" \
  clone-node-max="1" notify="true"
crm(live)configure# commit

This will create a Pacemaker multi-state resource corresponding to the DRBD resource nfs. Pacemaker should now activate your DRBD resource on both nodes and promote it to the master role on one of them.

Check the state of the cluster with the crm status command, or run drbdadm status.
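
To find out from a script which node currently holds the master role, the one-shot cluster status can be parsed. This sketch rests on an assumption about the output format: it expects crm_mon -1 to list promoted nodes on a line of the form "Masters: [ NODE ]", which may differ between Pacemaker versions. drbd_master_nodes is a hypothetical helper name.

```shell
# Hypothetical helper: read "crm_mon -1" output on stdin and print the
# node names from the first "Masters: [ ... ]" line.
# Assumes the "Masters: [ node ]" layout; adjust for other versions.
drbd_master_nodes() {
    sed -n 's/^ *Masters: \[ \(.*\) ]$/\1/p' | head -n 1
}

# Example use on a cluster node:
#   crm_mon -1 | drbd_master_nodes
```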

6.2 NFS Kernel Server Resource

In the crm shell, the resource for the NFS server daemons must be configured as a clone of a systemd resource type.

crm(live)configure# primitive nfsserver \
  systemd:nfs-server \
  op monitor interval="30s"
crm(live)configure# clone cl-nfsserver nfsserver \
   meta interleave=true
crm(live)configure# commit

After you have committed this configuration, Pacemaker should start the NFS Kernel server processes on both nodes.

6.3 File System Resource

  1. Configure the file system type resource as follows (but do not commit this configuration yet):

    crm(live)configure# primitive fs_work \
      ocf:heartbeat:Filesystem \
      params device=/dev/drbd0 \
        directory=/srv/nfs/work \
        fstype=ext3 \
      op monitor interval="10s"
  2. Put this resource into a Pacemaker resource group. Further resources are added to the group later:

    crm(live)configure# group g-nfs fs_work
  3. Add the following constraints to make sure that the group is started on the same node on which the DRBD multi-state resource is in the master role:

    crm(live)configure# order o-drbd_before_nfs inf: \
      ms-drbd_nfs:promote g-nfs:start
    crm(live)configure# colocation col-nfs_on_drbd inf: \
      g-nfs ms-drbd_nfs:Master
  4. Commit this configuration:

    crm(live)configure# commit

After these changes have been committed, Pacemaker mounts the DRBD device to /srv/nfs/work on the node where the DRBD resource is in the master role. Confirm this with mount (or by looking at /proc/mounts).
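
The /proc/mounts check can be sketched as a small function. is_mounted_on is a hypothetical helper name. It relies only on the standard layout of /proc/mounts, where the first field is the device and the second is the mount point.

```shell
# Hypothetical helper: succeed if DEVICE is mounted on DIRECTORY according
# to a /proc/mounts-style file (fields: device, mount point, type, ...).
is_mounted_on() {
    awk -v dev="$1" -v dir="$2" '
        $1 == dev && $2 == dir { found = 1 }
        END { exit !found }
    ' "${3:-/proc/mounts}"
}

# Example use on the node that holds the master role:
#   is_mounted_on /dev/drbd0 /srv/nfs/work && echo "mounted"
```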

6.4 NFS Export Resources

When your DRBD, LVM, and file system resources are working properly, continue with the resources managing your NFS exports. To create highly available NFS export resources, use the exportfs resource type.

To export the /srv/nfs/work directory to clients, use the following primitive:

  1. Create NFS exports with the following commands:

    crm(live)configure# primitive exportfs_work \
      ocf:heartbeat:exportfs \
        params directory="/srv/nfs/work" \
          options="rw,mountpoint" \
          clientspec="192.168.1.0/24" \
          wait_for_leasetime_on_stop=true \
          fsid=100 \
      op monitor interval="30s"
  2. After you have created this resource, append it to the existing g-nfs resource group:

    crm(live)configure# modgroup g-nfs add exportfs_work
  3. Commit this configuration:

    crm(live)configure# commit

    Pacemaker will now export the /srv/nfs/work directory.

  4. Confirm that the NFS exports are set up properly:

    # exportfs -v
    /srv/nfs/work   IP_ADDRESS_OF_CLIENT(OPTIONS)
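
The same confirmation can be made from a script. This is a minimal sketch that assumes exportfs -v prints the exported directory as the first field of a line, as in the output above. is_exported is a hypothetical helper name.

```shell
# Hypothetical helper: read "exportfs -v" output on stdin and succeed if
# some line has the given directory as its first field.
is_exported() {
    awk -v dir="$1" '$1 == dir { found = 1 } END { exit !found }'
}

# Example use on the node that runs the g-nfs group:
#   exportfs -v | is_exported /srv/nfs/work && echo "exported"
```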

6.5 Virtual IP Address for NFS Exports

The initial installation creates an administrative virtual IP address for Hawk2. Although you could use this IP address for your NFS exports too, create another one exclusively for NFS exports. This makes it easier to apply security restrictions later. Use the following commands in the crm shell:

crm(live)configure# primitive vip_nfs IPaddr2 \
   params ip=192.168.1.11 cidr_netmask=24 \
   op monitor interval=10 timeout=20
crm(live)configure# modgroup g-nfs add vip_nfs
crm(live)configure# commit
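
After the commit, the virtual IP address should be configured on the node that runs the g-nfs group. The following sketch checks this by parsing the one-line-per-address output of ip -o -4 addr show, where the fourth field is ADDRESS/PREFIX. has_ipv4 is a hypothetical helper name.

```shell
# Hypothetical helper: read "ip -o -4 addr show" output on stdin and
# succeed if the given IPv4 address is configured on any interface.
has_ipv4() {
    awk -v ip="$1" '
        { split($4, parts, "/"); if (parts[1] == ip) found = 1 }
        END { exit !found }
    '
}

# Example use on a cluster node:
#   ip -o -4 addr show | has_ipv4 192.168.1.11 && echo "virtual IP is here"
```

The comparison is an exact match on the address part, so 192.168.1.1 is not mistaken for 192.168.1.11.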

7 Using the NFS Service

This section outlines how to use the highly available NFS service from an NFS client.

To connect to the NFS service, make sure to use the virtual IP address to connect to the cluster rather than a physical IP configured on one of the cluster nodes' network interfaces. For compatibility reasons, use the full path of the NFS export on the server.

In its simplest form, the command to mount the NFS export looks like this:

# mount -t nfs 192.168.1.11:/srv/nfs/work /home/work

To configure a specific transport protocol (proto) and maximum read and write request sizes (rsize and wsize), use:

# mount -o proto=tcp,rsize=32768,wsize=32768 \
    192.168.1.11:/srv/nfs/work /home/work

In case you need to be compatible with NFS version 3, include the value vers=3 after the -o option.

For further NFS mount options, consult the nfs man page.