Lustre: Using lustre-iokit
Lustre IO Kit is available from: http://downloads.lustre.org/public/tools/lustre-iokit/
The instructions below are based on version 1.2. Tests were conducted on standard compute nodes over 1Gb Ethernet links; the disks are capable of ~130 MB/s read/write. The test filesystem has one MDT and four OSSs, with one OST per OSS.
lustre-iokit requires the following packages to be installed (available in the CentOS repository shipped with Platform HPC); an example install command follows the list.
- sg3_utils
- sg3_utils-libs
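A minimal way to install these prerequisites, assuming a yum-based CentOS node:
# Install the SCSI generic utilities needed by sgpdd-survey
yum install -y sg3_utils sg3_utils-libs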
Download/Build lustre-iokit
# Get src RPM
wget http://downloads.lustre.org/public/tools/lustre-iokit/lustre-iokit-1.2-200709210921.src.rpm
# Install src RPM
rpm -ivh lustre-iokit-1.2-200709210921.src.rpm
# Build binary RPM package
rpmbuild -bb /usr/src/redhat/SPECS/lustre-iokit.spec
# Install built package
rpm -ivh /usr/src/redhat/RPMS/noarch/lustre-iokit-1.2-200709210921.noarch.rpm
lustre-iokit Contents
The following files are provided as part of the package
[root@atass ~]$ rpm -qpl /usr/src/redhat/RPMS/noarch/lustre-iokit-1.2-200709210921.noarch.rpm
/usr/bin/config.sh
/usr/bin/gather_stats_everywhere.sh
/usr/bin/ior-survey
/usr/bin/libecho
/usr/bin/lstats.sh
/usr/bin/obdfilter-survey
/usr/bin/ost-survey
/usr/bin/parse-ior
/usr/bin/plot-obdfilter
/usr/bin/plot-ost
/usr/bin/plot-sgpdd
/usr/bin/sgpdd-survey
/usr/share/doc/lustre-iokit-1.2
/usr/share/doc/lustre-iokit-1.2/README.ior-survey
/usr/share/doc/lustre-iokit-1.2/README.lstats.sh
/usr/share/doc/lustre-iokit-1.2/README.obdfilter-survey
/usr/share/doc/lustre-iokit-1.2/README.ost-survey
/usr/share/doc/lustre-iokit-1.2/README.sgpdd-survey
The toolkit contains the following tests:
- sgpdd-survey
- obdfilter-survey
- ost-survey
- stats-collect
sgpdd-survey
This is a bare-metal sgp_dd performance test, run at block level directly against the OST disk devices. Running sgp_dd will ERASE the contents of the disk devices; it must NOT be run against any OST holding data you care about!
Note: this requires the sg kernel module to be loaded, and the target device must appear in the output of sg_map (see the check below).
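A quick pre-flight check (a sketch; sg and sg_map come from the sg3_utils packages installed earlier, and /dev/sda is the example device used below):
# Load the SCSI generic driver and confirm the target device is mapped
modprobe sg
sg_map -x | grep /dev/sda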
[root@atass07 ~]$ scsidevs=/dev/sda size=128 crghi=16 thrhi=21 /usr/bin/sgpdd-survey
Wed Aug 17 17:37:25 BST 2011 sgpdd-survey on /dev/sda from atass07
total_size 131072K rsz 1024 crg 1 thr 1 write 135.23 MB/s 1 x 135.98 = 135.98 MB/s read 135.35 MB/s 1 x 136.08 = 136.08 MB/s
total_size 131072K rsz 1024 crg 1 thr 2 write 140.37 MB/s 1 x 141.12 = 141.12 MB/s read 131.26 MB/s 1 x 131.92 = 131.92 MB/s
total_size 131072K rsz 1024 crg 1 thr 4 write 138.47 MB/s 1 x 139.25 = 139.25 MB/s read 131.55 MB/s 1 x 132.27 = 132.27 MB/s
total_size 131072K rsz 1024 crg 1 thr 8 write 139.05 MB/s 1 x 139.86 = 139.86 MB/s read 132.54 MB/s 1 x 133.28 = 133.28 MB/s
total_size 131072K rsz 1024 crg 1 thr 16 write 135.06 MB/s 1 x 135.78 = 135.78 MB/s read 131.55 MB/s 1 x 132.24 = 132.24 MB/s
total_size 131072K rsz 1024 crg 2 thr 2 write 124.71 MB/s 2 x 62.69 = 125.39 MB/s read 120.18 MB/s 2 x 60.41 = 120.81 MB/s
total_size 131072K rsz 1024 crg 2 thr 4 write 116.48 MB/s 2 x 58.53 = 117.05 MB/s read 126.02 MB/s 2 x 63.33 = 126.67 MB/s
total_size 131072K rsz 1024 crg 2 thr 8 write 122.01 MB/s 2 x 61.32 = 122.64 MB/s read 126.48 MB/s 2 x 63.57 = 127.14 MB/s
total_size 131072K rsz 1024 crg 2 thr 16 write 120.14 MB/s 2 x 60.38 = 120.75 MB/s read 126.68 MB/s 2 x 63.66 = 127.32 MB/s
total_size 131072K rsz 1024 crg 4 thr 4 write 108.01 MB/s 4 x 27.13 = 108.53 MB/s read 126.74 MB/s 4 x 31.85 = 127.41 MB/s
total_size 131072K rsz 1024 crg 4 thr 8 write 110.71 MB/s 4 x 27.84 = 111.35 MB/s read 120.44 MB/s 4 x 30.28 = 121.12 MB/s
total_size 131072K rsz 1024 crg 4 thr 16 write 126.18 MB/s 4 x 31.75 = 126.99 MB/s read 134.38 MB/s 4 x 33.79 = 135.15 MB/s
total_size 131072K rsz 1024 crg 8 thr 8 write 122.43 MB/s 8 x 15.42 = 123.37 MB/s read 131.48 MB/s 8 x 16.57 = 132.52 MB/s
total_size 131072K rsz 1024 crg 8 thr 16 write 133.87 MB/s 8 x 16.86 = 134.89 MB/s read 144.50 MB/s 8 x 18.17 = 145.34 MB/s
total_size 131072K rsz 1024 crg 16 thr 16 write 131.38 MB/s 16 x 8.30 = 132.75 MB/s read 140.90 MB/s 16 x 8.86 = 141.75 MB/s
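The kit also ships a plot-sgpdd helper (listed above) for turning these results into gnuplot data. A hedged sketch; the invocation and the summary-file path are assumptions, so check README.sgpdd-survey for the exact usage:
# Parse the survey summary into gnuplot .dat files (file path is an assumption)
plot-sgpdd /tmp/sgpdd_survey_*.summary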
obdfilter-survey
The Lustre IO Kit provides the obdfilter-survey script, which exercises the obdfilter layer of the Lustre IO stack by reading, writing and rewriting Lustre objects. It is primarily used for sizing OST throughput, locally or over the network.
The Lustre OSS must be configured before the survey is run against its OSTs. Check the output of the lctl dl command on each OSS node to verify that the obdfilter instances exist:
[root@atass04 ~]$ lctl dl
0 UP mgc MGC172.20.0.4@tcp cfc518c4-e41d-5dc5-240f-86f24981edfb 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter lustrewt-OST0003 lustrewt-OST0003_UUID 31
Ensure the obdecho module is loaded:
modprobe obdecho
obdfilter-survey can be invoked with the following parameters:
- case: the survey mode - local disk (disk-backed obdfilter instances on the OSS), network echo (network loopback), or network disk (disk instances accessed over the network); the examples below use case=disk and case=network
- thrlo, thrhi: low and high thread counts
- nobjlo, nobjhi: low and high counts of objects to read/write
- rszlo, rszhi: low and high record sizes in KB
- size: total IO size in MB
- targets: names of the obdfilter instances to test; host:target pairs address OSTs on remote servers (as in the script below)
Recommended parameters are:
- rszlo=1024, rszhi=1024, nobjhi=128, thrhi=128
Run locally against an OST (on the OSS)
[root@oss01 ~]$ rszlo=1024 rszhi=1024 nobjhi=128 thrhi=128 size=1024 case=disk sh obdfilter-survey
Thu Aug 18 11:57:16 BST 2011 Obdfilter-survey for case=disk from atass04
ost 1 sz 1048576K rsz 1024 obj 1 thr 1 write 22.27 [ 20.96, 22.98]
ost 1 sz 1048576K rsz 1024 obj 1 thr 2 write 22.28 [ 20.98, 22.98]
ost 1 sz 1048576K rsz 1024 obj 1 thr 4 write 38.06 [ 32.94, 39.96]
ost 1 sz 1048576K rsz 1024 obj 1 thr 8 write 53.47 [ 47.91, 55.95]
ost 1 sz 1048576K rsz 1024 obj 1 thr 16 write 76.96 [ 64.94, 79.93]
ost 1 sz 1048576K rsz 1024 obj 1 thr 32 write 96.43 [ 94.91, 96.91]
ost 1 sz 1048576K rsz 1024 obj 1 thr 64 write 100.21 [ 91.91, 127.88]
ost 1 sz 1048576K rsz 1024 obj 1 thr 128 write 113.79 [ 127.88, 127.88]
... [etc]
Run over the network from a lustre client
Unmount the /mnt/lustre filesystem on the client before running ($ umount /mnt/lustre):
[root@comp10 ~]$ rszlo=1024 rszhi=1024 nobjhi=8 thrhi=8 size=1024 targets="oss01" case=network sh obdfilter-survey
Warning: Permanently added 'atass01,172.20.0.5' (RSA) to the list of known hosts.
Thu Aug 18 12:27:36 BST 2011 Obdfilter-survey for case=network from atass10
ost 1 sz 1048576K rsz 1024 obj 1 thr 1 write 45.35 [ 44.91, 45.96]
ost 1 sz 1048576K rsz 1024 obj 1 thr 2 write 88.53 [ 86.71, 90.92]
ost 1 sz 1048576K rsz 1024 obj 1 thr 4 write 100.02 [ 98.91, 100.91]
ost 1 sz 1048576K rsz 1024 obj 1 thr 8 write 105.18 [ 103.91, 107.90]
ost 1 sz 1048576K rsz 1024 obj 2 thr 2 write 90.30 [ 89.83, 91.82]
ost 1 sz 1048576K rsz 1024 obj 2 thr 4 write 100.06 [ 99.81, 100.91]
ost 1 sz 1048576K rsz 1024 obj 2 thr 8 write 104.68 [ 103.90, 106.90]
ost 1 sz 1048576K rsz 1024 obj 4 thr 4 write 99.62 [ 98.91, 100.91]
ost 1 sz 1048576K rsz 1024 obj 4 thr 8 write 104.81 [ 102.91, 109.90]
ost 1 sz 1048576K rsz 1024 obj 8 thr 8 write 104.54 [ 103.87, 104.90]
done!
Script for running obdfilter-survey
#!/bin/bash
#TARGETS="oss-3-0.lustre3.net:bb-OST0000 oss-3-0.lustre3.net:bb-OST0001 oss-3-0.lustre3.net:bb-OST0002 oss-3-0.lustre3.net:bb-OST0003 oss-3-1.lustre3.net:bb-OST0004 oss-3-1.lustre3.net:bb-OST0005 oss-3-1.lustre3.net:bb-OST0006 oss-3-1.lustre3.net:bb-OST0007 oss-3-2.lustre3.net:bb-OST0008 oss-3-2.lustre3.net:bb-OST0009 oss-3-2.lustre3.net:bb-OST000a oss-3-2.lustre3.net:bb-OST000b oss-3-3.lustre3.net:bb-OST000c oss-3-3.lustre3.net:bb-OST000d oss-3-3.lustre3.net:bb-OST000e oss-3-3.lustre3.net:bb-OST000f"
TARGETS="
lustre02-oss1.boston.co.uk:lfs2-OST0003
lustre02-oss2.boston.co.uk:lfs2-OST0000
lustre02-oss3.boston.co.uk:lfs2-OST0002
lustre02-oss4.boston.co.uk:lfs2-OST0001
"
NOBJLO=1
NOBJHI=256
THRLO=1
THRHI=256
#OUTPUT="/root/test_results/obd/12lun_00_01"
OUTPUT="/root/test_results/obd/short_peak"
# The test dataset size (MB) for each LUN. The total dataset size must be
# larger than twice the RAM size in order to avoid caching effects.
# Size calculation:
#   (RAM size * 2) / number of LUNs
# For example, with 24GB of server RAM and 5 LUNs:
#   (24GB * 2) / 5 ~= 10GB = 10240 MB
# It is a good idea to do a very short trial run first to make sure the test
# configuration is working properly before scheduling a complete test;
# SIZE=100 is a good value for the trial run. The commented-out sketch below
# shows one way to derive SIZE automatically.
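# Untested sketch (an assumption, not part of the original script): derive
# SIZE from this node's RAM. Assumes every OSS has the same amount of RAM
# and the same number of LUNs (NLUNS).
# NLUNS=1
# RAM_MB=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
# SIZE=$(( RAM_MB * 2 / NLUNS ))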
#SIZE="46000"
SIZE="32000"
#SIZE="100"
ssh lustre02-oss1 mkdir -p $OUTPUT
ssh lustre02-oss2 mkdir -p $OUTPUT
ssh lustre02-oss3 mkdir -p $OUTPUT
ssh lustre02-oss4 mkdir -p $OUTPUT
thrhi=$THRHI thrlo=$THRLO nobjhi=$NOBJHI nobjlo=$NOBJLO size=$SIZE case="disk" targets="$TARGETS" rslt_loc=$OUTPUT obdfilter-survey
ost-survey
Run ost-survey on a Lustre client with the filesystem mounted.
TBC: why only ~60 MB/s rather than ~100 MB/s, when both the disks and the network are capable of that?
[root@atass07 ~]$ ost-survey /mnt/lustre/
/usr/bin/ost-survey: 08/30/11 OST speed survey on /mnt/lustre/ from 172.20.0.11@tcp
Number of Active OST devices : 4
warning: deprecated usage of setstripe positional parameters. Use -c, -i, -s instead.
warning: deprecated usage of setstripe positional parameters. Use -c, -i, -s instead.
warning: deprecated usage of setstripe positional parameters. Use -c, -i, -s instead.
warning: deprecated usage of setstripe positional parameters. Use -c, -i, -s instead.
Worst Read OST indx: 0 speed: 56.321735
Best Read OST indx: 1 speed: 59.597370
Read Average: 57.899567 +/- 1.248223 MB/s
Worst Write OST indx: 2 speed: 59.103396
Best Write OST indx: 0 speed: 68.555924
Write Average: 63.212332 +/- 3.600104 MB/s
Ost# Read(MB/s) Write(MB/s) Read-time Write-time
----------------------------------------------------
0 56.322 68.556 0.533 0.438
1 59.597 60.915 0.503 0.492
2 57.187 59.103 0.525 0.508
3 58.493 64.275 0.513 0.467
# Increase the per-OST test file size to 500 MB with -s 500:
[root@blade1 ~]# ost-survey -s 500 /mnt/lustre/
Ost# Read(MB/s) Write(MB/s) Read-time Write-time
----------------------------------------------------
0 4006.794 90.196 0.125 5.543
1 3972.007 89.683 0.126 5.575
2 4010.817 92.355 0.125 5.414
3 3996.448 89.458 0.125 5.589
4 4055.085 92.344 0.123 5.415
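Note that the ~4 GB/s read figures above are far beyond what a 1Gb link or these disks can deliver, so the reads are clearly being served from the client page cache. One way to get disk-bound read numbers between runs is the standard Linux drop_caches mechanism (not part of lustre-iokit):
# Flush dirty pages, then drop the page cache, dentries and inodes (run as root)
sync
echo 3 > /proc/sys/vm/drop_caches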