Lustre SFF
Introduction
Lustre SFF (Small Form Factor) is a compact deployment of ZFS-backed (Zettabyte File System) Lustre, intended as an alternative to NFS at comparable capacity and scalability.
http://lustre.ornl.gov/ecosystem-2016/documents/tutorials/Stearman-LLNL-ZFS.pdf
Intel Enterprise Edition for Lustre White Paper
The Intel January 2014 white paper "Architecting a high performance storage system" serves as a good starting point for optimizing Lustre SFF.
Backend Storage
smartctl
smartctl (from smartmontools; SMART stands for Self-Monitoring, Analysis and Reporting Technology) is used to uniquely identify devices, run device self-tests, and assess device health.
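For example, assuming a device at /dev/sda (the device name is illustrative):

```shell
# Identify the device (model, serial, WWN) -- useful for mapping
# /dev/sdX names to physical slots.
smartctl -i /dev/sda

# Overall SMART health self-assessment.
smartctl -H /dev/sda

# Start a short self-test, then read the self-test log.
smartctl -t short /dev/sda
smartctl -l selftest /dev/sda
```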
sgpdd-survey
sgpdd-survey (requires sg3_utils{,-libs} and the Lustre iokit, https://downloads.hpdd.intel.com/public/lustre ) is used to characterize backend storage (dd is not suitable because the response to multiple concurrent IO threads is of interest).
- rszlo-rszhi
- record size in KB
- Affects how many blocks are transferred in each transaction; simulates the Lustre RPC size.
- crglo-crghi
- number of regions
- Simulates multiple Lustre clients per OST. More regions require more seeking and hence lower performance.
- thrlo-thrhi
- number of threads
- Simulates OSS threads.
- size
- total size in MB
- Default is 8 GB; 32 GB (or twice system memory) is recommended.
- blocksize
- block size in bytes
- Default is 512 B; a 1 MB blocksize is recommended to simulate a Lustre sequential workload.
Recommended parameters: rszhi=1024, thrhi=16, crghi=16, size=32768 (or twice RAM), dio=1; for comparable dd runs use oflag=direct, iflag=direct, bs=1048576.
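A sketch of a run with the recommended parameters, passed as environment variables as the iokit script expects (the device list is an assumption for your hardware):

```shell
# DANGER: sgpdd-survey writes directly to the listed devices and
# destroys their contents. The devices below are illustrative only.
size=32768 rszhi=1024 crghi=16 thrhi=16 \
  scsidevs="/dev/sdb /dev/sdc" \
  ./sgpdd-survey
```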
obdfilter-survey
- case
- local-disk, network-echo, network-disk
- Run the survey against disk-backed local obdfilter instances, the network loopback, or remote disk instances.
- thrlo-thrhi
- Number of threads
- nobjlo-nobjhi
- Number of objects to read/write.
- rszlo-rszhi
- Record size in KB.
- size
- Total IO size in MB.
- targets
- Names of obdfilter instances.
Recommended parameters: rszlo=rszhi=1024, nobjhi=128, thrhi=128
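A sketch of a local-disk run with the recommended parameters; the target names, and the use of case=disk for the local-disk case, are assumptions based on the iokit script:

```shell
# Local-disk survey against two OSTs; results are written to a
# timestamped .summary file in the working directory.
nobjhi=128 thrhi=128 rszlo=1024 rszhi=1024 size=32768 \
  case=disk targets="lustre-OST0000 lustre-OST0001" \
  ./obdfilter-survey
```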
http://wiki.lustre.org/images/4/40/Wednesday_shpc-2009-benchmarking.pdf
Manually Installing Production Intel Enterprise Edition for Lustre
cd ee-3*
./create_installer zfs
tar xzvpf lustre-zfs*.tar.gz
cd lustre-zfs
./install # Takes some time as custom modules are compiled against the kernel.
reboot # Boots into the new kernel if one was installed; not applicable for ZFS.
modprobe spl # Implements Solaris kernel compatibility interfaces.
modprobe zfs
Official documentation points to http://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml for installing manually.
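Once loaded, the modules can be confirmed before any pools are created (a quick sanity check; on a fresh host zpool status simply reports that no pools are available):

```shell
# Verify the SPL and ZFS kernel modules are present.
lsmod | grep -E '^(spl|zfs)'
# On a fresh host this reports "no pools available".
zpool status
```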
A mirror is created to be used as a combined MGS/MDT file system following "10.1 Configuring a simple Lustre file system".
zpool create -f mgt mirror /dev/disk/by-id/wwn-0x5001* # Replace device names as required.
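Following that section of the manual, the mirror can then be formatted as the combined MGS/MDT target; the fsname lustre, index 0, and dataset name mdt0 are assumptions to adjust for your site:

```shell
# Format the mgt pool as a combined MGS/MDT with the ZFS backend.
# --fsname, --index, and the mgt/mdt0 dataset name are illustrative.
mkfs.lustre --mgs --mdt --fsname=lustre --index=0 --backfstype=zfs mgt/mdt0
```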
zpool create -f ost0001 raidz2 /dev/disk/by-id/wwn-0x5000c5002* # Replace device names as required.
zpool create -f ost0002 raidz2 /dev/disk/by-id/wwn-0x5000c5005* # Replace device names as required.
Installing Non-Production Test Lustre (ZFS provided)
This will conflict with IEEL.
yum install epel-release # Provides DKMS.
gpg --quiet --with-fingerprint /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
# pub 4096R/352C64E5 2013-12-16 Fedora EPEL (7) <epel@fedoraproject.org>
# Key fingerprint = 91E9 7D7C 4A5E 96F1 7F3E 888F 6A2F AEA2 352C 64E5
yum install http://download.zfsonlinux.org/epel/zfs-release$(rpm -E %dist).noarch.rpm
gpg --quiet --with-fingerprint /etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux
# pub 2048R/F14AB620 2013-03-21 ZFS on Linux <zfs@zfsonlinux.org>
# Key fingerprint = C93A FFFD 9F3F 7B03 C310 CEB6 A9D5 A1C0 F14A B620
# sub 2048R/99685629 2013-03-21
https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS
yum install lustre-dkms-* lustre-osd-zfs-mount* # Downloaded from HPDD.
Installing ZFS
http://lustre.ornl.gov/ecosystem-2016/documents/tutorials/Stearman-LLNL-ZFS.pdf
yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum install kernel-devel zfs
ZFS Pools
ZFS Best Practices: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
Example pool creation:
zpool create scratchZ -o cachefile=none -o ashift=12 -O recordsize=1M -f $(lsscsi -i | grep ST1 | awk '{printf " /dev/disk/by-id/scsi-"$7 }')
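The command substitution above can be sanity-checked in isolation. Given a sample lsscsi -i line (the Seagate device shown is illustrative), the grep/awk pipeline emits the /dev/disk/by-id/scsi-* path that zpool create consumes; column 7 of lsscsi -i is the SCSI identifier the by-id symlinks are named after:

```shell
# Simulated `lsscsi -i` output line; field 7 is the SCSI identifier.
printf '[0:0:0:0] disk SEAGATE ST10000NM0096 E002 /dev/sda 35000c500a1b2c3d4\n' \
  | grep ST1 \
  | awk '{printf " /dev/disk/by-id/scsi-"$7}'
# -> " /dev/disk/by-id/scsi-35000c500a1b2c3d4"
```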