GPFS NVMesh


This page covers the use of GPFS with NVMesh.

Setup

Making NVMesh volumes visible to GPFS

By default, GPFS only recognises block devices of certain types. The currently known disk types are:

  powerdisk  - EMC power path disk
  vpath      - IBM virtual path disk
  dmm        - Device-Mapper Multipath (DMM)
  dlmfdrv    - Hitachi dlm
  hdisk      - AIX hard disk
  lv         - AIX logical volume.  Historical usage only.
               Not allowed as a new device to mmcrnsd.
  gpt        - GPFS partition on Windows disk
  generic    - Device having no unique failover or multipathing
               characteristic (predominantly Linux devices).
  dasd       - DASD device (for Linux on z Systems)

To list all the currently known devices, run the following command:

[root@excelero-a ~]# mmdevdiscover
sdb generic
sdb1 generic
sdb2 generic
sda generic
sda1 generic
sda2 generic
dm-0 dmm
dm-1 dmm
dm-2 dmm
dm-3 dmm
dm-4 dmm
dm-5 dmm

To use NVMesh block devices with GPFS, they need to be added to the list of devices GPFS knows about. The mmdevdiscover script has a built-in hook that runs a user script, /var/mmfs/etc/nsddevices, during device discovery. We use this to add NVMesh devices to the list of known devices. Below is a simple bash script that finds all attached NVMesh volumes and labels them as generic Linux devices.

[root@excelero-a ~]# cat /var/mmfs/etc/nsddevices
#!/bin/bash

# Report every attached NVMesh volume to GPFS as a generic Linux device.
# Device names are printed relative to /dev, matching mmdevdiscover output.
if [[ -d /dev/nvmesh ]]; then
        cd /dev && for dev in $(ls nvmesh/); do
                echo nvmesh/$dev generic
        done
fi

# Exit 1 so GPFS appends its built-in device discovery to the list above;
# exit 0 would tell GPFS to use only the devices printed by this script.
exit 1

Note that the file /var/mmfs/etc/nsddevices needs to be created, and made executable, on all systems (both servers and clients).
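One way to distribute the script from the node it was written on is a small scp loop; this is just a sketch, assuming passwordless ssh and the node names that appear elsewhere on this page:

chmod +x /var/mmfs/etc/nsddevices
for node in excelero-b excelero-c excelero-d dgx-1; do
        scp -p /var/mmfs/etc/nsddevices ${node}:/var/mmfs/etc/
done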

After adding /var/mmfs/etc/nsddevices, confirm that the NVMesh volumes are visible in the output of mmdevdiscover.

[root@excelero-a ~]# mmdevdiscover
nvmesh/nv01 generic
nvmesh/nv01p1 generic
nvmesh/nv02 generic
nvmesh/nv02p1 generic
nvmesh/nv03 generic
nvmesh/nv03p1 generic
nvmesh/nv04 generic
nvmesh/nv04p1 generic
.
.

NSD Creation

As the NVMesh block devices are available on all servers, we can set up GPFS in a directly attached (shared-all) configuration. In this configuration, every block device used as an NSD appears as a local device on each server. This is the optimal configuration for GPFS: all network traffic happens at the block level, which removes the need for GPFS itself to share devices over the network.

To create the NSDs, create a stanza file with an entry for each NSD. This only needs to be done on one server; GPFS will synchronise the configuration across the cluster.

Since all block devices are attached to each server and client, we do not need to specify any servers in the stanza file.

 
[root@excelero-a ~]# cat nsd.stanza
%nsd:
        nsd=nsd01
        device=/dev/nvmesh/nv01
        usage=dataAndMetadata
%nsd:
        nsd=nsd02
        device=/dev/nvmesh/nv02
        usage=dataAndMetadata
%nsd:
        nsd=nsd03
        device=/dev/nvmesh/nv03
        usage=dataAndMetadata
%nsd:
        nsd=nsd04
        device=/dev/nvmesh/nv04
        usage=dataAndMetadata

This stanza file specifies 4 NSDs to be created.

Create the NSDs by passing the stanza file to the mmcrnsd command:
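[root@excelero-a ~]# mmcrnsd -F nsd.stanza

Once the NSDs are created, confirm that they are mapped to each server and client, and that GPFS sees them as directly attached storage.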

[root@excelero-a ~]# mmlsnsd -M

 Disk name    NSD volume ID      Device         Node name                Remarks
---------------------------------------------------------------------------------------
 nsd01        C0A800015B156123   /dev/nvmesh/nv01 dgx-1.admin
 nsd01        C0A800015B156123   /dev/nvmesh/nv01 excelero-a.admin
 nsd01        C0A800015B156123   /dev/nvmesh/nv01 excelero-b.admin
 nsd01        C0A800015B156123   /dev/nvmesh/nv01 excelero-c.admin
 nsd01        C0A800015B156123   /dev/nvmesh/nv01 excelero-d.admin
 nsd02        C0A800015B156124   /dev/nvmesh/nv02 dgx-1.admin
 nsd02        C0A800015B156124   /dev/nvmesh/nv02 excelero-a.admin
 nsd02        C0A800015B156124   /dev/nvmesh/nv02 excelero-b.admin
 nsd02        C0A800015B156124   /dev/nvmesh/nv02 excelero-c.admin
 nsd02        C0A800015B156124   /dev/nvmesh/nv02 excelero-d.admin
 nsd03        C0A800015B156125   /dev/nvmesh/nv03 dgx-1.admin
 nsd03        C0A800015B156125   /dev/nvmesh/nv03 excelero-a.admin
 nsd03        C0A800015B156125   /dev/nvmesh/nv03 excelero-b.admin
 nsd03        C0A800015B156125   /dev/nvmesh/nv03 excelero-c.admin
 nsd03        C0A800015B156125   /dev/nvmesh/nv03 excelero-d.admin
 nsd04        C0A800015B156126   /dev/nvmesh/nv04 dgx-1.admin
 nsd04        C0A800015B156126   /dev/nvmesh/nv04 excelero-a.admin
 nsd04        C0A800015B156126   /dev/nvmesh/nv04 excelero-b.admin
 nsd04        C0A800015B156126   /dev/nvmesh/nv04 excelero-c.admin
 nsd04        C0A800015B156126   /dev/nvmesh/nv04 excelero-d.admin

[root@excelero-a ~]# mmlsnsd -L

 File system   Disk name    NSD volume ID      NSD servers
---------------------------------------------------------------------------------------------
 gpfs1         nsd01        C0A800015B156123   (directly attached)
 gpfs1         nsd02        C0A800015B156124   (directly attached)
 gpfs1         nsd03        C0A800015B156125   (directly attached)
 gpfs1         nsd04        C0A800015B156126   (directly attached)

mmlsnsd -L should be run on every system as a sanity check of the configuration.
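A quick way to do this from a single node is a simple ssh loop; this sketch assumes passwordless ssh and uses the node names from the output above, with the GPFS commands in their standard location under /usr/lpp/mmfs/bin:

for node in excelero-a excelero-b excelero-c excelero-d dgx-1; do
        echo "== $node =="
        ssh $node /usr/lpp/mmfs/bin/mmlsnsd -L
done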

File system creation

Once all NSDs are created, use the mmcrfs command to create a file system. The minimal invocation of this command is of the form

mmcrfs fs_name -F nsd_stanza_file

It is worth reading the man page of mmcrfs to get an idea of what options are available, as some options cannot be changed after the file system is created. Some common options include:

-A Auto-mount the file system when the GPFS daemon starts
-B File system block size
-j Block allocation map type (scatter is recommended for flash storage)
-m Default metadata replication factor
-M Maximum metadata replication factor
-r Default data replication factor
-R Maximum data replication factor
-n Estimated number of nodes that will mount the file system

A good baseline that has been shown to work is

mmcrfs gpfs1 -F nsd.stanza -A no -B 4m -D posix -j scatter -m 1 -M 1 -r 1 -R 1 -n 1 -E no -k posix -S yes
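Of the remaining flags in this baseline, -D posix and -k posix select POSIX (rather than NFSv4) locking and ACL semantics, -E no relaxes exact mtime reporting, and -S yes suppresses atime updates, avoiding unnecessary metadata writes.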

After the file system is created, mount it on all nodes using the mmmount command.

# Replace gpfs1 with the name of the file system
mmmount gpfs1 -a
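To confirm which nodes have the file system mounted, mmlsmount can be used:

[root@excelero-a ~]# mmlsmount gpfs1 -L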

If at this point any client fails to mount the file system and reports a stale file handle, it is most likely because that client does not recognise the NVMesh volumes as valid devices. Recheck that the /var/mmfs/etc/nsddevices script was added to the failing client, that its contents are correct, and that it is executable. Check the output of mmdevdiscover on the client to confirm that the block devices are visible, and try remounting the GPFS file system locally using

mmmount gpfs1

Optimisations and Performance Tuning