20 Cluster Multi-device (Cluster MD)
The cluster multi-device (Cluster MD) is a software-based RAID storage solution for a cluster. Cluster MD provides the redundancy of RAID1 mirroring to the cluster. Currently, only RAID1 is supported. This chapter shows you how to create and use Cluster MD.
20.1 Conceptual Overview
Cluster MD provides support for using RAID1 across a cluster environment. Each node accesses the disks or devices used by Cluster MD. If one device of the Cluster MD fails, it can be replaced at runtime by another device, which is then re-synchronized to restore the same amount of redundancy. Cluster MD requires Corosync and the Distributed Lock Manager (DLM) for coordination and messaging.
Unlike regular MD devices, a Cluster MD device is not started automatically on boot. A clustered device must be started by a resource agent, which ensures that the DLM resource has been started first.
20.2 Creating a Clustered MD RAID Device
Before you create a Cluster MD device, make sure that the following requirements are met:
- A running cluster with Pacemaker.
- A resource agent for DLM (for how to configure DLM, see Procedure 15.1, “Configuring a Base Group for DLM”).
- At least two shared disk devices. You can use an additional device as a spare, which takes over automatically if a device fails.
- The package cluster-md-kmp-default installed.
Always use cluster-wide persistent device names, such as /dev/disk/by-id/DEVICE_ID. Unstable device names like /dev/sdX or /dev/dm-X might become mismatched on different nodes, causing major problems across the cluster.
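A small check can catch unstable names before they end up in an mdadm command or in /etc/mdadm.conf. The following is a minimal bash sketch. The function name check_persistent_dev is hypothetical. It accepts only paths under /dev/disk/by-id/, which is stricter than necessary if you rely on another persistent naming scheme.

```shell
# Hypothetical helper: succeed only if every argument is a cluster-wide
# persistent name under /dev/disk/by-id/. Kernel names such as /dev/sdX or
# /dev/dm-X (and any other path) are rejected with a message on stderr.
check_persistent_dev() {
    local dev
    for dev in "$@"; do
        case $dev in
            /dev/disk/by-id/?*) ;;  # persistent name: accepted
            *)
                echo "not a /dev/disk/by-id/ name: $dev" >&2
                return 1
                ;;
        esac
    done
}
```

For example, run check_persistent_dev /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2 before passing the same paths to mdadm --create.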
To create the Cluster MD device, proceed as follows:
- Make sure the DLM resource is up and running on every node of the cluster. Check the resource status with the command:
  # crm_resource -r dlm -W
- Create the Cluster MD device:
  - If you do not have an existing normal RAID device, create the Cluster MD device on the node running the DLM resource with the following command:
    # mdadm --create /dev/md0 --bitmap=clustered \
      --metadata=1.2 --raid-devices=2 --level=mirror \
      /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2
    As Cluster MD only works with version 1.2 of the metadata, it is recommended to specify the version using the --metadata option. For other useful options, refer to the mdadm man page. Monitor the progress of the re-sync in /proc/mdstat.
  - If you already have an existing normal RAID device, first clear the existing bitmap and then create the clustered bitmap:
    # mdadm --grow /dev/mdX --bitmap=none
    # mdadm --grow /dev/mdX --bitmap=clustered
  - Optionally, to create a Cluster MD device with a spare device for automatic failover, run the following command on one cluster node:
    # mdadm --create /dev/md0 --bitmap=clustered --raid-devices=2 \
      --level=mirror --spare-devices=1 --metadata=1.2 \
      /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2 /dev/disk/by-id/DEVICE_ID3
- Get the UUID and the related md path:
  # mdadm --detail --scan
  The UUID must match the UUID stored in the superblock. For details on the UUID, refer to the mdadm.conf man page.
- Open /etc/mdadm.conf and add the md device name and the devices associated with it. Use the UUID from the previous step:
  DEVICE /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2
  ARRAY /dev/md0 UUID=1d70f103:49740ef1:af2afce5:fcf6a489
- Open Csync2's configuration file /etc/csync2/csync2.cfg and add /etc/mdadm.conf:
  group ha_group
  {
     # ... list of files pruned ...
     include /etc/mdadm.conf;
  }
- Copy the configuration file to all nodes:
  # csync2 -xv
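The steps that collect the UUID and write /etc/mdadm.conf can be partly scripted. Below is a sketch of a hypothetical bash helper, make_mdadm_conf. It reads mdadm --detail --scan output on standard input and prints DEVICE and ARRAY lines in the format shown in the mdadm.conf step. It assumes that the scan output contains ARRAY lines with a UUID= field. Review the result before adding it to /etc/mdadm.conf.

```shell
# Hypothetical helper: turn `mdadm --detail --scan` output (stdin) into
# mdadm.conf lines. The arguments are the persistent member device paths
# to list on the DEVICE line.
make_mdadm_conf() {
    local keyword mddev rest field uuid
    printf 'DEVICE %s\n' "$*"
    while read -r keyword mddev rest; do
        [ "$keyword" = "ARRAY" ] || continue
        uuid=""
        # Pick the UUID=... field out of the remaining key=value pairs.
        for field in $rest; do
            case $field in
                UUID=*) uuid=${field#UUID=} ;;
            esac
        done
        if [ -n "$uuid" ]; then
            printf 'ARRAY %s UUID=%s\n' "$mddev" "$uuid"
        fi
    done
}
```

For example, pipe mdadm --detail --scan into make_mdadm_conf /dev/disk/by-id/DEVICE_ID1 /dev/disk/by-id/DEVICE_ID2 and check the output. Only then append it to /etc/mdadm.conf and run csync2 -xv.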
20.3 Configuring a Resource Agent
Configure a CRM resource as follows:
- Create a Raid1 primitive:
  crm(live)configure# primitive raider Raid1 \
    params raidconf="/etc/mdadm.conf" raiddev=/dev/md0 \
    force_clones=true \
    op monitor timeout=20s interval=10 \
    op start timeout=20s interval=0 \
    op stop timeout=20s interval=0
- Add the raider resource to the base group for storage that you have created for DLM:
  crm(live)configure# modgroup g-storage add raider
  The add sub-command appends the new group member by default.
- If not already done, clone the g-storage group so that it runs on all nodes:
  crm(live)configure# clone cl-storage g-storage \
    meta interleave=true target-role=Started
- Review your changes with show.
- If everything seems correct, submit your changes with commit.
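After committing, you can check that both dlm and the new raider resource run on every node. This uses the same crm_resource -r dlm -W check as the start of the procedure in section 20.2. The hypothetical bash helpers below count the nodes reported by crm_resource --locate (-W). They assume that each running instance is reported as a line of the form "resource NAME is running on: NODE". Verify this against the output of your Pacemaker version.

```shell
# Hypothetical helpers. They assume crm_resource --locate (-W) prints one line
# "resource NAME is running on: NODE" per node where the resource is active.

# Print the number of nodes reported as running the resource (stdin).
running_nodes() {
    local count=0 line
    while IFS= read -r line; do
        case $line in
            *" is running on: "*) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}

# Succeed only if the resource runs on exactly EXPECTED nodes (stdin).
running_everywhere() {
    local expected=$1 actual
    actual=$(running_nodes)
    [ "$actual" -eq "$expected" ]
}
```

For a two-node cluster, for example: crm_resource -r raider -W | running_everywhere 2 && echo "raider runs on all nodes".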
20.4 Adding a Device
To add a device to an existing, active Cluster MD device, first make sure that the device is visible on each node, using the command cat /proc/mdstat. If the device is not visible, the following mdadm --add command fails.
Use the following command on one cluster node:
# mdadm --manage /dev/md0 --add /dev/disk/by-id/DEVICE_ID
How the newly added device is used depends on the state of the Cluster MD device:
- If only one of the mirrored devices is active, the new device becomes the second device of the mirrored devices and a recovery is initiated. 
- If both devices of the Cluster MD device are active, the newly added device becomes a spare device.
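You can tell which of the two cases applies from /proc/mdstat before adding the device. The [n/m] field on the array's status line shows the number of configured devices n and the number of working devices m. The following bash sketch, with the hypothetical name md_add_outcome, prints "recovery" if fewer devices are working than configured and "spare" otherwise. It assumes the usual /proc/mdstat layout, in which the status line follows the "md0 : ..." line of the array.

```shell
# Hypothetical helper: predict how `mdadm --manage MD --add DEV` will use the
# new device, based on the [n/m] status of array MD in /proc/mdstat (stdin).
md_add_outcome() {
    local md=$1
    awk -v md="$md" '
        $1 == md && $2 == ":" { found = 1; next }
        found && match($0, /\[[0-9]+\/[0-9]+\]/) {
            # "[2/1]" -> configured = 2, working = 1
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) { print "recovery" } else { print "spare" }
            exit
        }
    '
}
```

For example, md_add_outcome md0 < /proc/mdstat prints "spare" for a healthy two-device mirror shown as [2/2] [UU].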
20.5 Re-adding a Temporarily Failed Device
Failures are often transient and limited to a single node. However, if any of the nodes encounters a failure during an I/O operation, the device is marked as failed for the entire cluster.
This can happen, for example, because of a cable failure on one of the nodes. After correcting the problem, you can re-add the device. Only the outdated parts are then synchronized, whereas adding a new device would synchronize the entire device.
To re-add the device, run the following command on one cluster node:
# mdadm --manage /dev/md0 --re-add /dev/disk/by-id/DEVICE_ID
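To find out which device to re-add, look for the member marked (F) in /proc/mdstat. Note that /proc/mdstat shows kernel names such as sdc, while the commands in this chapter use /dev/disk/by-id/ paths. The bash sketch below contains two hypothetical helpers. md_failed_members prints the kernel names of the failed members of an array. by_id_for lists the /dev/disk/by-id/ links that resolve to a given kernel name on the local node. The sketch assumes the member list is on the line that starts with the array name, for example "md0 : active raid1 sdc[1](F) sdb[0]".

```shell
# Hypothetical helper: print kernel names of failed members of array MD,
# reading /proc/mdstat on stdin. Failed members carry an "(F)" suffix.
md_failed_members() {
    local md=$1
    awk -v md="$md" '
        $1 == md && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    name = $i
                    sub(/\[.*$/, "", name)   # "sdc[1](F)" -> "sdc"
                    print name
                }
            }
        }
    '
}

# Hypothetical helper: list the links in DIR (default /dev/disk/by-id) that
# resolve to the kernel device /dev/NAME on this node.
by_id_for() {
    local name=$1 dir=${2:-/dev/disk/by-id} link
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "/dev/$name" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

For example, md_failed_members md0 < /proc/mdstat might print sdc. by_id_for sdc then shows the persistent path to pass to mdadm --re-add. Kernel names can differ between nodes, so run both helpers on the same node.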
20.6 Removing a Device
To remove a device at runtime, for example to replace it, proceed as follows:
- Check /proc/mdstat to make sure the device has failed. A failed device is marked with (F).
- If the device has not failed, mark it as failed by running the following command on one cluster node:
  # mdadm --manage /dev/md0 --fail /dev/disk/by-id/DEVICE_ID
- Remove the failed device by running the following command on one cluster node:
  # mdadm --manage /dev/md0 --remove /dev/disk/by-id/DEVICE_ID
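If you remove a device to replace it, the remove step and the add step from section 20.4 can be combined. Below is a sketch of a hypothetical bash wrapper, replace_failed_member. It runs the mdadm --manage commands shown in this chapter in order and stops at the first one that fails. It assumes the old device is already marked (F), either after a real failure or after the --fail command above. It also assumes the new device is visible on every node. Run it on one cluster node only.

```shell
# Hypothetical wrapper: remove the already failed member OLD from array MD,
# then add device NEW. Uses only mdadm --manage operations from this chapter;
# the add runs only if the remove succeeded. Run on one cluster node only.
replace_failed_member() {
    if [ $# -ne 3 ]; then
        echo "usage: replace_failed_member MD OLD NEW" >&2
        return 2
    fi
    local md=$1 old=$2 new=$3
    mdadm --manage "$md" --remove "$old" &&
        mdadm --manage "$md" --add "$new"
}
```

For example: replace_failed_member /dev/md0 /dev/disk/by-id/DEVICE_ID /dev/disk/by-id/NEW_DEVICE_ID, where NEW_DEVICE_ID is a placeholder for the replacement disk.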
20.7 Assembling Cluster MD as Normal RAID at the Disaster Recovery Site
During disaster recovery, you might find that no Pacemaker cluster stack is available in the infrastructure at the disaster recovery site. Applications may still need to access the data on the existing Cluster MD disks or from backups.
You can convert a Cluster MD RAID to a normal RAID by using the --assemble operation with the -U no-bitmap option, which changes the metadata of the RAID disks accordingly.
The following example shows how to assemble all arrays at the disaster recovery site:
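# Requires bash (process substitution). For each array reported by mdadm -Es,
# extract the array name (the part after "HOST:" in name=) and the UUID, then
# assemble it as a normal RAID; -U no-bitmap updates the metadata accordingly.
while read -r line; do
   NAME=$(echo "$line" | sed 's/.*name=//' | awk '{print $1}' | sed 's/.*://')
   UUID=$(echo "$line" | sed 's/.*UUID=//' | awk '{print $1}')
   mdadm -AR "/dev/md/$NAME" -u "$UUID" -U no-bitmap &&
      echo "NAME = $NAME, UUID = $UUID, assembled."
done < <(mdadm -Es)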
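Before assembling anything at the recovery site, you may want to preview what the loop would do. The sketch below is a hypothetical bash function, plan_assembly. It applies the same name= and UUID= parsing as the loop above to mdadm -Es output on standard input. It prints the mdadm -AR commands instead of running them. It assumes that each array is reported on a line starting with ARRAY and skips any other lines.

```shell
# Hypothetical dry-run helper: print, without executing, the assemble command
# the loop above would run for each ARRAY line of `mdadm -Es` output (stdin).
plan_assembly() {
    local line name uuid
    while read -r line; do
        case $line in
            ARRAY*) ;;
            *) continue ;;  # skip continuation or other non-ARRAY lines
        esac
        name=$(printf '%s\n' "$line" | sed 's/.*name=//' | awk '{print $1}' | sed 's/.*://')
        uuid=$(printf '%s\n' "$line" | sed 's/.*UUID=//' | awk '{print $1}')
        printf 'mdadm -AR /dev/md/%s -u %s -U no-bitmap\n' "$name" "$uuid"
    done
}
```

For example, mdadm -Es | plan_assembly lists one command per array. You can compare these with the disks you expect before running the real loop.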