17 NVMe-oF #
This chapter describes how to set up an NVMe over Fabrics host and target.
17.1 Overview #
NVM Express® (NVMe®) is an interface standard for accessing non-volatile storage, commonly SSD disks. NVMe supports much higher speeds and has a lower latency than SATA.
NVMe-oF™ is an architecture to access NVMe storage over different networking fabrics, for example, RDMA, TCP, or NVMe over Fibre Channel (FC-NVMe). The role of NVMe-oF is similar to that of iSCSI. To increase fault tolerance, NVMe-oF has built-in support for multipathing. NVMe-oF multipathing is not based on the traditional DM-Multipathing.
The NVMe host is the machine that connects to an NVMe target. The NVMe target is the machine that shares its NVMe block devices.
NVMe is supported on SUSE Linux Enterprise Server 15 SP7. Kernel modules are available for NVMe block storage and for the NVMe-oF target and host.
To see if your hardware requires any special consideration, refer to Section 17.4, “Special hardware configuration”.
17.2 Setting up an NVMe-oF host #
To use NVMe-oF, a target must be available over one of the supported networking methods: NVMe over Fibre Channel, TCP, or RDMA. The following sections describe how to connect a host to an NVMe target.
17.2.1 Installing command line client #
        To use NVMe-oF, you need the nvme command line
        tool. Install it with zypper:
      
> sudo zypper in nvme-cli
        Use nvme --help to list all available subcommands.
        Man pages are available for nvme subcommands.
        Consult them by executing man
        nvme-SUBCOMMAND. For example, to
        view the man page for the discover subcommand, execute
        man nvme-discover.
      
17.2.2 Discovering NVMe-oF targets #
To list available NVMe subsystems on the NVMe-oF target, you need the discovery controller address and service ID.
> sudo nvme discover -t TRANSPORT -a DISCOVERY_CONTROLLER_ADDRESS -s SERVICE_ID
        Replace TRANSPORT with the underlying
        transport medium: loop, rdma,
        tcp, or fc. Replace
        DISCOVERY_CONTROLLER_ADDRESS with the
        address of the discovery controller. For RDMA and TCP, this should be
        an IPv4 address. Replace SERVICE_ID with the
        transport service ID. If the service is IP based, like RDMA or TCP,
        the service ID specifies the port number. For Fibre Channel, the
        service ID is not required.
      
The NVMe hosts only see the subsystems they are allowed to connect to.
Example:
> sudo nvme discover -t tcp -a 10.0.0.3 -s 4420
For Fibre Channel, the example looks as follows:
> sudo nvme discover --transport=fc \
    --traddr=nn-0x201700a09890f5bf:pn-0x201900a09890f5bf \
    --host-traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6
        For more details, see man nvme-discover.
      
17.2.3 Connecting to NVMe-oF targets #
        After you have identified the NVMe subsystem, you can connect to it with
        the nvme connect command.
      
> sudo nvme connect -t TRANSPORT -a DISCOVERY_CONTROLLER_ADDRESS -s SERVICE_ID -n SUBSYSTEM_NQN
        Replace TRANSPORT with the underlying
        transport medium: loop, rdma,
        tcp or fc. Replace
        DISCOVERY_CONTROLLER_ADDRESS with the
        address of the discovery controller. For RDMA and TCP this should be an
        IPv4 address. Replace SERVICE_ID with the
        transport service ID. If the service is IP based, like RDMA or TCP,
        this specifies the port number. Replace
        SUBSYSTEM_NQN with the NVMe qualified name
        of the desired subsystem as found by the discovery command.
        NQN is the abbreviation for NVMe
        Qualified Name. The NQN must be unique.
      
Example:
> sudo nvme connect -t tcp -a 10.0.0.3 -s 4420 -n nqn.2014-08.com.example:nvme:nvm-subsystem-sn-d78432
For Fibre Channel, the example looks as follows:
> sudo nvme connect --transport=fc \
    --traddr=nn-0x201700a09890f5bf:pn-0x201900a09890f5bf \
    --host-traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6 \
    --nqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21
        Alternatively, use nvme connect-all to connect to
        all discovered subsystems. For advanced usage, see man
        nvme-connect and man nvme-connect-all.
      
        In case of a path loss, the NVMe subsystem tries to reconnect for a
        time period, defined by the ctrl-loss-tmo option of
        the nvme connect command. After this time (default
        value is 600s), the path is removed and the upper layers of the block
        layer (file system) are notified. By default, the file system is then
        mounted read-only, which usually is not the expected behavior.
        Therefore, it is recommended to set the
        ctrl-loss-tmo option so that the NVMe subsystem
        keeps trying to reconnect without a limit. To do so, run the following
        command:
      
> sudo nvme connect --ctrl-loss-tmo=-1
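In practice, the ctrl-loss-tmo option is passed together with the transport parameters of the connection. The following is a minimal sketch, assuming a TCP target like the one in the earlier examples. The function name connect_persistent is hypothetical and only serves as an illustration.

```shell
# Hypothetical helper (not part of nvme-cli): connect to a subsystem over TCP
# and keep trying to reconnect without a limit after a path loss.
connect_persistent() {
    local addr="$1" port="$2" nqn="$3"
    sudo nvme connect -t tcp -a "$addr" -s "$port" -n "$nqn" --ctrl-loss-tmo=-1
}

# Example call, using the values from the examples in this section:
# connect_persistent 10.0.0.3 4420 nqn.2014-08.com.example:nvme:nvm-subsystem-sn-d78432
```

Adjust the transport type and the address format if you use RDMA or Fibre Channel instead of TCP.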
        To make an NVMe over Fabrics subsystem available at boot, create a
        /etc/nvme/discovery.conf file on the host with the
        parameters passed to the discover command (as
        described in
        Section 17.2.2, “Discovering NVMe-oF targets”). For
        example, if you use the discover command as follows:
      
> sudo nvme discover -t tcp -a 10.0.0.3 -s 4420
        Add the parameters of the discover command to the
        /etc/nvme/discovery.conf file:
      
echo "-t tcp -a 10.0.0.3 -s 4420" | sudo tee -a /etc/nvme/discovery.conf
Then enable the service:
> sudo systemctl enable nvmf-autoconnect.service
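Because tee -a appends unconditionally, running the command repeatedly adds duplicate entries to the file. The following sketch shows one way to append an entry only when the exact line is not yet present. The function name add_discovery_entry is hypothetical, and the file path is passed as an argument so that the helper can be tried on a scratch file first.

```shell
# Hypothetical helper: append a discovery entry to the given file only if
# an identical line is not already there.
add_discovery_entry() {
    local conf="$1" entry="$2"
    # -x matches whole lines, -F treats the entry as a fixed string,
    # -e is needed because the entry itself starts with a dash.
    if ! grep -qxF -e "$entry" "$conf" 2>/dev/null; then
        printf '%s\n' "$entry" | sudo tee -a "$conf" >/dev/null
    fi
}

# Example call:
# add_discovery_entry /etc/nvme/discovery.conf "-t tcp -a 10.0.0.3 -s 4420"
```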
17.2.4 Multipathing #
        NVMe native multipathing is enabled by default. If the
        CMIC option in the controller identity settings is
        set, the NVMe stack recognizes an NVMe drive as a multipathed device by
        default.
      
To manage the multipathing, you can use the following:
- nvme list-subsys
  Prints the layout of the multipath devices.
- multipath -ll
  The command has a compatibility mode and displays NVMe multipath devices. Bear in mind that you need to enable the enable_foreign option to use the command. For details, refer to Section 18.13, “Miscellaneous options”.
- nvme-core.multipath=N
  When the option is added as a boot parameter, NVMe native multipathing is disabled.
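To check whether native multipathing is currently active, you can read the multipath parameter of the nvme_core module from sysfs. The following sketch assumes that the nvme_core module is loaded and exposes the parameter at the path shown. The function name native_multipath_status is hypothetical.

```shell
# Hypothetical helper: report the state of NVMe native multipathing.
# The kernel prints Y or N for this boolean module parameter.
native_multipath_status() {
    local param="${1:-/sys/module/nvme_core/parameters/multipath}"
    case "$(cat "$param" 2>/dev/null)" in
        Y) echo "enabled" ;;
        N) echo "disabled" ;;
        *) echo "unknown" ;;   # module not loaded or parameter not readable
    esac
}

# Example call:
# native_multipath_status
```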
17.3 Setting up an NVMe-oF target #
17.3.1 Installing command line client #
        To configure an NVMe-oF target, you need the
        nvmetcli command line tool. Install it with
        zypper:
      
> sudo zypper in nvmetcli
        The current documentation for nvmetcli is available
        at
        https://git.infradead.org/users/hch/nvmetcli.git/blob_plain/HEAD:/Documentation/nvmetcli.txt.
      
17.3.2 Configuration steps #
The following procedure provides an example of how to set up an NVMe-oF target.
        The configuration is stored in a tree structure. Use the command
        cd to navigate. Use ls to list
        objects. You can create new objects with create.
      
- Start the nvmetcli interactive shell:
  > sudo nvmetcli
- Create a new port:
  (nvmetcli)> cd ports
  (nvmetcli)> create 1
  (nvmetcli)> ls 1/
  o- 1
    o- referrals
    o- subsystems
- Create an NVMe subsystem:
  (nvmetcli)> cd /subsystems
  (nvmetcli)> create nqn.2014-08.org.nvmexpress:NVMf:uuid:c36f2c23-354d-416c-95de-f2b8ec353a82
  (nvmetcli)> cd nqn.2014-08.org.nvmexpress:NVMf:uuid:c36f2c23-354d-416c-95de-f2b8ec353a82/
  (nvmetcli)> ls
  o- nqn.2014-08.org.nvmexpress:NVMf:uuid:c36f2c23-354d-416c-95de-f2b8ec353a82
    o- allowed_hosts
    o- namespaces
- Create a new namespace and set an NVMe device to it:
  (nvmetcli)> cd namespaces
  (nvmetcli)> create 1
  (nvmetcli)> cd 1
  (nvmetcli)> set device path=/dev/nvme0n1
  Parameter path is now '/dev/nvme0n1'.
- Enable the previously created namespace:
  (nvmetcli)> cd ..
  (nvmetcli)> enable
  The Namespace has been enabled.
- Display the created namespace:
  (nvmetcli)> cd ..
  (nvmetcli)> ls
  o- nqn.2014-08.org.nvmexpress:NVMf:uuid:c36f2c23-354d-416c-95de-f2b8ec353a82
    o- allowed_hosts
    o- namespaces
      o- 1
- Allow all hosts to use the subsystem. Only do this in secure environments.
  (nvmetcli)> set attr allow_any_host=1
  Parameter allow_any_host is now '1'.
  Alternatively, you can allow only specific hosts to connect:
  (nvmetcli)> cd nqn.2014-08.org.nvmexpress:NVMf:uuid:c36f2c23-354d-416c-95de-f2b8ec353a82/allowed_hosts/
  (nvmetcli)> create hostnqn
- List all created objects:
  (nvmetcli)> cd /
  (nvmetcli)> ls
  o- /
    o- hosts
    o- ports
    | o- 1
    |   o- referrals
    |   o- subsystems
    o- subsystems
      o- nqn.2014-08.org.nvmexpress:NVMf:uuid:c36f2c23-354d-416c-95de-f2b8ec353a82
        o- allowed_hosts
        o- namespaces
          o- 1
- Make the target available via TCP. Use trtype=rdma for RDMA:
  (nvmetcli)> cd ports/1/
  (nvmetcli)> set addr adrfam=ipv4 trtype=tcp traddr=10.0.0.3 trsvcid=4420
  Parameter trtype is now 'tcp'.
  Parameter adrfam is now 'ipv4'.
  Parameter trsvcid is now '4420'.
  Parameter traddr is now '10.0.0.3'.
  Alternatively, you can make it available with Fibre Channel:
  (nvmetcli)> cd ports/1/
  (nvmetcli)> set addr adrfam=fc trtype=fc traddr=nn-0x1000000044001123:pn-0x2000000055001123 trsvcid=none
- Link the subsystem to the port:
  (nvmetcli)> cd /ports/1/subsystems
  (nvmetcli)> create nqn.2014-08.org.nvmexpress:NVMf:uuid:c36f2c23-354d-416c-95de-f2b8ec353a82
  Now you can verify that the port is enabled using dmesg:
  # dmesg
  ...
  [ 257.872084] nvmet_tcp: enabling port 1 (10.0.0.3:4420)
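After completing the procedure, you may also want to confirm from an NVMe-oF host that the target exports the new subsystem. The following sketch runs the discover command described in Section 17.2.2 against a TCP target and searches its output for the subsystem NQN. The function name target_exports_subsystem is hypothetical, and the sketch assumes that the NQN appears literally in the discovery output.

```shell
# Hypothetical helper, to be run on the host: return success if the discovery
# output of the TCP target at ADDR:PORT contains the given subsystem NQN.
target_exports_subsystem() {
    local addr="$1" port="$2" nqn="$3"
    sudo nvme discover -t tcp -a "$addr" -s "$port" | grep -qF -e "$nqn"
}

# Example call, using the values from the procedure above:
# target_exports_subsystem 10.0.0.3 4420 \
#     nqn.2014-08.org.nvmexpress:NVMf:uuid:c36f2c23-354d-416c-95de-f2b8ec353a82 \
#     && echo "subsystem is visible"
```

If specific hosts were configured in allowed_hosts instead of allow_any_host, only those hosts see the subsystem.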
17.3.3 Back up and restore target configuration #
You can save the target configuration in a JSON file with the following commands:
> sudo nvmetcli
(nvmetcli)> saveconfig nvme-target-backup.json
To restore the configuration, use:
(nvmetcli)> restore nvme-target-backup.json
You can also wipe the current configuration:
(nvmetcli)> clear
17.4 Special hardware configuration #
17.4.1 Overview #
Some hardware needs special configuration to work correctly. Skim the titles of the following sections to see if you are using any of the mentioned devices or vendors.
17.4.2 Broadcom #
        If you are using the Broadcom Emulex LightPulse Fibre Channel
        SCSI driver, add a Kernel configuration parameter on the
        target and host for the lpfc module:
      
> echo "options lpfc lpfc_enable_fc4_type=3" | sudo tee /etc/modprobe.d/lpfc.conf
Make sure that the Broadcom adapter firmware has at least version 11.4.204.33. Also make sure that you have the current versions of nvme-cli, nvmetcli and the Kernel installed.
        To enable a Fibre Channel port as an NVMe target, an additional
        module parameter needs to be configured:
        lpfc_enable_nvmet=COMMA_SEPARATED_WWPNS. Enter each WWPN with a
        leading 0x, for example
        lpfc_enable_nvmet=0x2000000055001122,0x2000000055003344.
        Only listed WWPNs will be configured for target mode. A Fibre Channel
        port can be configured either as a target or as an
        initiator.
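The two lpfc parameters can be written on a single options line. The following sketch builds such a line from a list of WWPNs and rejects values without the leading 0x. The function name lpfc_options_line is hypothetical, and combining both parameters on one line is an assumption about how you organize the file, not a requirement.

```shell
# Hypothetical helper: print a modprobe options line for the lpfc module that
# sets lpfc_enable_fc4_type=3 and lists the given WWPNs for target mode.
lpfc_options_line() {
    local wwpns="" w
    [ "$#" -gt 0 ] || { echo "at least one WWPN is required" >&2; return 1; }
    for w in "$@"; do
        case "$w" in
            0x*) ;;
            *) echo "WWPN must start with 0x: $w" >&2; return 1 ;;
        esac
        # Join the WWPNs with commas, without a leading comma.
        wwpns="${wwpns:+$wwpns,}$w"
    done
    printf 'options lpfc lpfc_enable_fc4_type=3 lpfc_enable_nvmet=%s\n' "$wwpns"
}

# Example call:
# lpfc_options_line 0x2000000055001122 0x2000000055003344 | sudo tee /etc/modprobe.d/lpfc.conf
```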
      
17.4.3 Marvell #
FC-NVMe is supported on QLE269x, QLE27xx, and QLE28xx adapters. FC-NVMe support is enabled by default in the Marvell® QLogic® QLA2xxx Fibre Channel driver.
To confirm NVMe is enabled, run the following command:
        > cat /sys/module/qla2xxx/parameters/ql2xnvmeenable
        A resulting 1 means NVMe is enabled, a
        0 indicates it is disabled.
      
Next, ensure that the Marvell adapter firmware is at least version 8.08.204 by checking the output of the following command:
        > cat /sys/class/scsi_host/host0/fw_version
Last, ensure that the latest versions of nvme-cli, QConvergeConsoleCLI, and the Kernel available for SUSE Linux Enterprise Server are installed. You can check for updates and patches by running:
        # zypper lu && zypper pchk
For more details on FC-NVMe, configuring NVMe over FC BFS or FPIN Link Integrity Marginal Path Detection Support, refer to the relevant sections in the following Marvell user guides:
- Marvell® QLogic® Fibre Channel Adapters 2800 Series User Guide (part number MA2854601-00) at https://www.marvell.com/content/dam/marvell/en/public-collateral/fibre-channel/marvell-fibre-channel-adapters-qlogic-series-2800-user-guide.pdf 
- Marvell® QLogic® Fibre Channel Adapters 2700 Series User Guide (part number 83270-546-00) at https://www.marvell.com/content/dam/marvell/en/public-collateral/fibre-channel/marvell-fibre-channel-adapters-qlogic-series-2700-user-guide.pdf 
- Marvell® QLogic® Fibre Channel Adapters 2600 Series User Guide (part number FC0054609-00) at https://www.marvell.com/content/dam/marvell/en/public-collateral/fibre-channel/marvell-fibre-channel-adapters-qlogic-series-2600-user-guide.pdf 
- User’s Guide—UEFI Human Interface Infrastructure, 2690 Series 16GFC, 2740 / 2760 / 2770 Series 32GFC, 2870 Series 64GFC Fibre Channel Adapters (part number BK3254602-00) at https://www.marvell.com/content/dam/marvell/en/public-collateral/fibre-channel/marvell-fibre-channel-adapters-qlogic-series-2690-2740-2760-2770-uefi-human-interface-infrastructure-user-guide.pdf 
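The minimum firmware versions mentioned in this section can be compared against an installed version with a small helper. The following sketch relies on the version sort of GNU sort (sort -V). The function name version_at_least is hypothetical, and the exact format of the fw_version file may differ between adapters, so extract the plain version number before passing it in.

```shell
# Hypothetical helper: return success if version VER is at least MIN.
# Uses GNU sort -V, which orders dotted version strings numerically.
version_at_least() {
    local min="$1" ver="$2"
    [ "$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -n 1)" = "$min" ]
}

# Example call with an assumed installed version string:
# version_at_least 8.08.204 8.08.231 && echo "firmware is recent enough"
```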
17.4.3.1 NVMe over Fibre Channel Boot From SAN Setup #
NVMe over Fibre Channel Boot From SAN (FC-NVMe BFS) is currently supported natively from SLES 15 SP4, on systems with UEFI firmware capable of booting from NVMe over Fabrics.
FC-NVMe BFS is only supported on systems with UEFI firmware that can boot from NVMe over FC and not supported through legacy BIOS firmware.
A custom Host NQN configured in a pre-boot environment is not supported for FC-NVMe BFS. Instead, use the default Host NQN string for the FC-NVMe BFS installation.
Once you have configured the UEFI driver for FC-NVMe boot using the BIOS setup menu, and configured the NVMe storage with the Initiator Host NQN, no further FC-NVMe BFS steps are required. The NVMe storage is then detected during the installation of SLES 15 SP4 or later.
17.4.3.2 FPIN Link Integrity Marginal Path Detection Support #
DM Multipath FPIN link integrity (FPIN-LI) marginal path detection is supported from SLES 15 SP4 with Brocade and Cisco fabric switches.
Refer to Marvell FC Adapter, Brocade FOS and Cisco NX-OS documentation for more information on the fabric notification functionality provided by the HBA and switch.
17.4.3.3 Multipath configuration file changes from SLES 15 SP4 onwards #
          Enable the marginal_pathgroups attribute in the
          /etc/multipath.conf file, for example, by adding:
          marginal_pathgroups 1
        
marginal_pathgroups attribute
            If the marginal_pathgroups attribute is not
            enabled, marginal path detection will not work for the Fabric
            Notification event.
          
          See Section 18.8, “Multipath configuration” for more information
          about marginal path setting in multipath.conf.
        
17.5 Booting from NVMe-oF over TCP #
SLES supports booting from NVMe-oF over TCP according to the NVM Express® Boot Specification 1.0.
The UEFI pre-boot environment can be configured to attempt NVMe-oF over TCP connections to remote storage servers and use these for booting. The pre-boot environment creates an ACPI table, the NVMe Boot Firmware Table (NBFT), to store information about the NVMe-oF configuration used for booting. The operating system uses this table at a later boot stage to set up networking and NVMe-oF connections to access the root file system.
17.5.1 System requirements #
To boot the system from NVMe-oF over TCP, the following requirements must be met:
- SLES 15 SP7 or later. 
- A SAN storage array supporting NVMe-oF over TCP. 
- A host system with a BIOS that supports booting from NVMe-oF over TCP. Contact your hardware vendor for information about support for this feature. Booting from NVMe-oF over TCP is currently only supported on UEFI platforms. 
17.5.2 Installation #
To install SLES from NVMe-oF over TCP, proceed as follows:
- Use the host system's UEFI setup menus to configure NVMe-oF connections to be established at boot time. Typically, you need to configure both networking (local IP addresses, gateways, etc.) and NVMe-oF targets (remote IP address, subsystem NQN or discovery NQN). Refer to the hardware documentation for the configuration description. Your hardware vendor may provide means to manage the BIOS configuration centrally and remotely. Please contact your hardware vendor for additional information. 
- Prepare the installation as described in Book “Deployment Guide”. 
- Start the system installation using any supported installation method. You do not need to use any specific boot parameters to enable installation on NVMe-oF over TCP. 
- If the BIOS has been configured correctly, the disk partitioning dialog in YaST will show NVMe namespaces exported by the subsystems configured in the BIOS. They will be displayed as NVMe devices, where the tcp string indicates that the devices are connected via the TCP transport. Install the operating system (in particular the EFI boot partition and the root file system) on these namespaces.
- Complete the installation. 
After installation, the system should boot from NVMe-oF over TCP automatically. If it does not, check if the boot priority is set correctly in the BIOS setup.
        The network interfaces used for booting are named
        nbft0, nbft1 and so on. To get
        information about the NVMe-oF boot, run the command:
      
# nvme nbft show
17.6 More information #
      For more details about the abilities of the nvme
      command, refer to nvme help.
    
The following links provide a basic introduction to NVMe and NVMe-oF: