GPFS NVMesh
This page covers the use of GPFS with NVMesh.
Setup
Making NVMesh volumes visible to GPFS
By default GPFS can only see block devices of certain types. Known disk types currently are:
powerdisk - EMC power path disk
vpath - IBM virtual path disk
dmm - Device-Mapper Multipath (DMM)
dlmfdrv - Hitachi dlm
hdisk - AIX hard disk
lv - AIX logical volume. Historical usage only; not allowed as a new device to mmcrnsd.
gpt - GPFS partition on Windows disk
generic - Device having no unique failover or multipathing characteristic (predominantly Linux devices).
dasd - DASD device (for Linux on z Systems)
To list all the currently known devices, run the following command.
[root@excelero-a ~]# mmdevdiscover
sdb generic
sdb1 generic
sdb2 generic
sda generic
sda1 generic
sda2 generic
dm-0 dmm
dm-1 dmm
dm-2 dmm
dm-3 dmm
dm-4 dmm
dm-5 dmm

To use NVMesh block devices with GPFS, an additional known disk type needs to be added. The mmdevdiscover script has a built-in hook to run an arbitrary user script (/var/mmfs/etc/nsddevices) during execution, and we use this to add NVMesh devices to the list of known drives. Below is a simple bash script that finds all attached NVMesh volumes and labels them as generic Linux devices.
[root@excelero-a ~]# cat /var/mmfs/etc/nsddevices
#!/bin/bash
# Report every attached NVMesh volume to GPFS as a generic Linux block device
if [[ -d /dev/nvmesh ]]; then
    cd /dev && for dev in $(ls nvmesh/); do
        echo nvmesh/$dev generic
    done
fi
# Non-zero exit: GPFS also performs its built-in device discovery
# in addition to the devices listed above
exit 1

Note that the file /var/mmfs/etc/nsddevices needs to be created on all systems (both servers and clients) and must be executable.
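One way to push the script out from a node where it already exists is a small loop over the other hosts. This is only a sketch: the node names below are those of the example cluster and should be replaced with the hosts in your environment.
# Copy the nsddevices user exit to the remaining nodes and make it executable
# (node names are illustrative; adjust to your cluster)
for node in excelero-b excelero-c excelero-d dgx-1; do
    scp /var/mmfs/etc/nsddevices root@${node}:/var/mmfs/etc/
    ssh root@${node} chmod +x /var/mmfs/etc/nsddevices
done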
After adding /var/mmfs/etc/nsddevices, confirm that the NVMesh volumes are visible in the output of mmdevdiscover.
[root@excelero-a ~]# mmdevdiscover
nvmesh/nv01 generic
nvmesh/nv01p1 generic
nvmesh/nv02 generic
nvmesh/nv02p1 generic
nvmesh/nv03 generic
nvmesh/nv03p1 generic
nvmesh/nv04 generic
nvmesh/nv04p1 generic
.
.
NSD Creation
As the NVMesh block devices are available on all servers, we can set up GPFS in a directly attached (shared-disk) configuration. In this configuration, all block devices used as NSDs appear as local devices on each server. This is the optimal configuration for GPFS, as all traffic to the disks happens at the block level over NVMesh, which removes the need for GPFS to serve the NSDs over the network through NSD servers.
To create the NSDs, create a stanza file that has an entry for each NSD. This only needs to be done on one server; GPFS will sync the configuration across the cluster.
Since all block devices are attached to each server and client, we do not need to specify any servers in the stanza file.
[root@excelero-a ~]# cat nsd.stanza
%nsd:
nsd=nsd01
device=/dev/nvmesh/nv01
usage=dataAndMetadata
%nsd:
nsd=nsd02
device=/dev/nvmesh/nv02
usage=dataAndMetadata
%nsd:
nsd=nsd03
device=/dev/nvmesh/nv03
usage=dataAndMetadata
%nsd:
nsd=nsd04
device=/dev/nvmesh/nv04
usage=dataAndMetadata

This stanza file specifies 4 NSDs to be created.
Create the NSDs using the mmcrnsd command, passing the stanza file with the -F option (a minimal invocation is shown below). Once created, confirm that the NSDs are mapped to each server and client, and that GPFS sees them as directly attached storage.
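For the stanza file above, creating the NSDs should boil down to a single command, run on one server only:
mmcrnsd -F nsd.stanza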
[root@excelero-a ~]# mmlsnsd -M
Disk name NSD volume ID Device Node name Remarks
---------------------------------------------------------------------------------------
nsd01 C0A800015B156123 /dev/nvmesh/nv01 dgx-1.admin
nsd01 C0A800015B156123 /dev/nvmesh/nv01 excelero-a.admin
nsd01 C0A800015B156123 /dev/nvmesh/nv01 excelero-b.admin
nsd01 C0A800015B156123 /dev/nvmesh/nv01 excelero-c.admin
nsd01 C0A800015B156123 /dev/nvmesh/nv01 excelero-d.admin
nsd02 C0A800015B156124 /dev/nvmesh/nv02 dgx-1.admin
nsd02 C0A800015B156124 /dev/nvmesh/nv02 excelero-a.admin
nsd02 C0A800015B156124 /dev/nvmesh/nv02 excelero-b.admin
nsd02 C0A800015B156124 /dev/nvmesh/nv02 excelero-c.admin
nsd02 C0A800015B156124 /dev/nvmesh/nv02 excelero-d.admin
nsd03 C0A800015B156125 /dev/nvmesh/nv03 dgx-1.admin
nsd03 C0A800015B156125 /dev/nvmesh/nv03 excelero-a.admin
nsd03 C0A800015B156125 /dev/nvmesh/nv03 excelero-b.admin
nsd03 C0A800015B156125 /dev/nvmesh/nv03 excelero-c.admin
nsd03 C0A800015B156125 /dev/nvmesh/nv03 excelero-d.admin
nsd04 C0A800015B156126 /dev/nvmesh/nv04 dgx-1.admin
nsd04 C0A800015B156126 /dev/nvmesh/nv04 excelero-a.admin
nsd04 C0A800015B156126 /dev/nvmesh/nv04 excelero-b.admin
nsd04 C0A800015B156126 /dev/nvmesh/nv04 excelero-c.admin
nsd04 C0A800015B156126 /dev/nvmesh/nv04 excelero-d.admin
[root@excelero-a ~]# mmlsnsd -L
File system Disk name NSD volume ID NSD servers
---------------------------------------------------------------------------------------------
gpfs1 nsd01 C0A800015B156123 (directly attached)
gpfs1 nsd02 C0A800015B156124 (directly attached)
gpfs1 nsd03 C0A800015B156125 (directly attached)
gpfs1 nsd04 C0A800015B156126 (directly attached)

mmlsnsd -L should be checked on every system as a sanity check of the configuration.
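One way to run that check from a single shell is to loop over the nodes with ssh. The hostnames below are those of the example cluster and the command path assumes the default GPFS install location; adjust both as needed.
# Run mmlsnsd -L on every node (hostnames are illustrative)
for node in excelero-a excelero-b excelero-c excelero-d dgx-1; do
    echo "=== ${node} ==="
    ssh root@${node} /usr/lpp/mmfs/bin/mmlsnsd -L
done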
File system creation
Once all NSDs are created, use the mmcrfs command to create a file system. The minimum invocation of this command is of the form
mmcrfs fs_name -F nsd_stanza_file
It is worth reading the man page of mmcrfs to get an idea of what options are available, as some of them cannot be changed after the file system is created. Some common options include
-A Auto-mount the file system when the GPFS daemon starts
-B File system block size
-j Block allocation map (scatter is recommended for flash storage)
-m Default metadata replication factor
-M Maximum metadata replication factor
-r Default data replication factor
-R Maximum data replication factor
-n Estimated number of nodes that will mount the file system
A good baseline that has been shown to work is
mmcrfs gpfs1 -F nsd.stanza -A no -B 4m -D posix -j scatter -m 1 -M 1 -r 1 -R 1 -n 1 -E no -k posix -S yes
After the file system is created, mount it on all nodes using the mmmount command.
# Replace gpfs1 with the name of the file system
mmmount gpfs1 -a
If at this point any client fails to mount the file system and reports a stale file handle, it is most likely because that client does not recognise the NVMesh volume as a valid target. Recheck that the /var/mmfs/etc/nsddevices script was added to the failing client and that its contents are correct. Check the output of mmdevdiscover on the client to confirm that the block devices are visible, and try remounting the GPFS file system locally using
mmmount gpfs1
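For example, a quick sanity check of both points on the failing client might look like the following (this assumes GPFS commands live in the default /usr/lpp/mmfs/bin location):
# The nsddevices user exit must exist and be executable on the client
ls -l /var/mmfs/etc/nsddevices
# The NVMesh volumes should appear in the device discovery output
/usr/lpp/mmfs/bin/mmdevdiscover | grep nvmesh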