GPFS NVMesh
This page covers the use of GPFS with NVMesh.
= Setup =

=== Making NVMesh volumes visible to GPFS ===
By default GPFS can only see block devices of certain types. Known disk types currently are:
<syntaxhighlight>
powerdisk - EMC power path disk
vpath     - IBM virtual path disk
dmm       - Device-Mapper Multipath (DMM)
dlmfdrv   - Hitachi dlm
hdisk     - AIX hard disk
lv        - AIX logical volume. Historical usage only.
            Not allowed as a new device to mmcrnsd.
gpt       - GPFS partition on Windows disk
generic   - Device having no unique failover or multipathing
            characteristic (predominantly Linux devices).
dasd      - DASD device (for Linux on z Systems)
</syntaxhighlight>
To list all the currently known devices, run the following command.
<syntaxhighlight>
[root@excelero-a ~]# mmdevdiscover
sdb generic
sdb1 generic
sdb2 generic
sda generic
sda1 generic
sda2 generic
dm-0 dmm
dm-1 dmm
dm-2 dmm
dm-3 dmm
dm-4 dmm
dm-5 dmm
</syntaxhighlight>
To use NVMesh block devices with GPFS, an additional known disk type needs to be added. The mmdevdiscover script has a built-in hook for running an arbitrary user script during execution, and we use this to add NVMesh devices to the list of known drives. Below is a simple bash script that finds all attached NVMesh volumes and labels them as generic Linux devices.
<syntaxhighlight>
[root@excelero-a ~]# cat /var/mmfs/etc/nsddevices
#!/bin/bash
# Report every NVMesh volume to GPFS as a generic Linux device
if [[ -d /dev/nvmesh ]]; then
    cd /dev && for dev in $(ls nvmesh/); do
        echo nvmesh/$dev generic
    done
fi
# A non-zero exit tells mmdevdiscover to continue with its built-in discovery
exit 1
</syntaxhighlight>
Note that the file /var/mmfs/etc/nsddevices needs to be created on all systems (both servers and clients) and must be executable.
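One way to distribute the script is a simple loop over the cluster nodes; the sketch below assumes passwordless ssh and uses the node names from this test setup purely for illustration.
<syntaxhighlight>
# Node names are illustrative; substitute your own cluster's nodes
for host in excelero-a excelero-b excelero-c excelero-d dgx-1; do
    scp /var/mmfs/etc/nsddevices ${host}:/var/mmfs/etc/nsddevices
    ssh ${host} chmod +x /var/mmfs/etc/nsddevices
done
</syntaxhighlight>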
After adding /var/mmfs/etc/nsddevices, confirm that the NVMesh volumes are visible in the output of mmdevdiscover.
<syntaxhighlight>
[root@excelero-a ~]# mmdevdiscover
nvmesh/nv01 generic
nvmesh/nv01p1 generic
nvmesh/nv02 generic
nvmesh/nv02p1 generic
nvmesh/nv03 generic
nvmesh/nv03p1 generic
nvmesh/nv04 generic
nvmesh/nv04p1 generic
.
.
</syntaxhighlight>

=== NSD Creation ===
As the NVMesh block devices are available on all servers, we can set up GPFS in a direct-attached (share-all) configuration. In this configuration, all block devices used as NSDs appear as local devices on each server. This is the optimal configuration for GPFS, as all network traffic occurs at the block level, which removes the need for GPFS to share devices over its own protocol.
To create the NSDs, create a stanza file with an entry for each NSD. This only needs to be done on one server; GPFS will sync the configuration across the cluster. Since all block devices are attached to every server and client, we do not need to specify any servers in the stanza file.
<syntaxhighlight>
[root@excelero-a ~]# cat nsd.stanza
%nsd:
  nsd=nsd01
  device=/dev/nvmesh/nv01
  usage=dataAndMetadata
%nsd:
  nsd=nsd02
  device=/dev/nvmesh/nv02
  usage=dataAndMetadata
%nsd:
  nsd=nsd03
  device=/dev/nvmesh/nv03
  usage=dataAndMetadata
%nsd:
  nsd=nsd04
  device=/dev/nvmesh/nv04
  usage=dataAndMetadata
</syntaxhighlight>
This stanza file specifies 4 NSDs to be created.
Create the NSDs using the mmcrnsd command, as shown below. Once created, confirm that the NSDs are mapped to each server and client, and that GPFS sees them as directly attached storage.
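The only required argument is the stanza file; mmcrnsd accepts further options (see its man page), but for the stanza file above a minimal invocation is
<syntaxhighlight>
mmcrnsd -F nsd.stanza
</syntaxhighlight>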
<syntaxhighlight>
[root@excelero-a ~]# mmlsnsd -M

 Disk name  NSD volume ID     Device            Node name         Remarks
---------------------------------------------------------------------------------------
 nsd01      C0A800015B156123  /dev/nvmesh/nv01  dgx-1.admin
 nsd01      C0A800015B156123  /dev/nvmesh/nv01  excelero-a.admin
 nsd01      C0A800015B156123  /dev/nvmesh/nv01  excelero-b.admin
 nsd01      C0A800015B156123  /dev/nvmesh/nv01  excelero-c.admin
 nsd01      C0A800015B156123  /dev/nvmesh/nv01  excelero-d.admin
 nsd02      C0A800015B156124  /dev/nvmesh/nv02  dgx-1.admin
 nsd02      C0A800015B156124  /dev/nvmesh/nv02  excelero-a.admin
 nsd02      C0A800015B156124  /dev/nvmesh/nv02  excelero-b.admin
 nsd02      C0A800015B156124  /dev/nvmesh/nv02  excelero-c.admin
 nsd02      C0A800015B156124  /dev/nvmesh/nv02  excelero-d.admin
 nsd03      C0A800015B156125  /dev/nvmesh/nv03  dgx-1.admin
 nsd03      C0A800015B156125  /dev/nvmesh/nv03  excelero-a.admin
 nsd03      C0A800015B156125  /dev/nvmesh/nv03  excelero-b.admin
 nsd03      C0A800015B156125  /dev/nvmesh/nv03  excelero-c.admin
 nsd03      C0A800015B156125  /dev/nvmesh/nv03  excelero-d.admin
 nsd04      C0A800015B156126  /dev/nvmesh/nv04  dgx-1.admin
 nsd04      C0A800015B156126  /dev/nvmesh/nv04  excelero-a.admin
 nsd04      C0A800015B156126  /dev/nvmesh/nv04  excelero-b.admin
 nsd04      C0A800015B156126  /dev/nvmesh/nv04  excelero-c.admin
 nsd04      C0A800015B156126  /dev/nvmesh/nv04  excelero-d.admin

[root@excelero-a ~]# mmlsnsd -L

 File system   Disk name  NSD volume ID     NSD servers
---------------------------------------------------------------------------------------------
 gpfs1         nsd01      C0A800015B156123  (directly attached)
 gpfs1         nsd02      C0A800015B156124  (directly attached)
 gpfs1         nsd03      C0A800015B156125  (directly attached)
 gpfs1         nsd04      C0A800015B156126  (directly attached)
</syntaxhighlight>
The output of mmlsnsd -L should be checked on every system as a sanity check of the configuration.
=== File system creation ===

Once all NSDs are created, use the mmcrfs command to create a file system. The minimum invocation of this command is of the form
<syntaxhighlight>
mmcrfs fs_name -F nsd_stanza_file
</syntaxhighlight>
It is worth reading the mmcrfs man page to get an idea of what options are available, as some of them cannot be changed after the file system is created. Some common options include:
<syntaxhighlight>
-A  Auto-mount the file system when GPFS daemon starts
-B  File system block size
-j  Block allocation map (scatter is recommended for flash storage)
-m  Default metadata replication factor
-M  Maximum metadata replication factor
-r  Default data replication factor
-R  Maximum data replication factor
-n  Estimated number of nodes that will mount the file system
</syntaxhighlight>
A good baseline that has been shown to work is
<syntaxhighlight>
mmcrfs gpfs1 -F nsd.stanza -A no -B 4m -D posix -j scatter -m 1 -M 1 -r 1 -R 1 -n 1 -E no -k posix -S yes
</syntaxhighlight>
After the file system is created, mount it on all nodes using the mmmount command.
<syntaxhighlight>
# Replace gpfs1 with the name of the file system
mmmount gpfs1 -a
</syntaxhighlight>
If at this point any client fails to mount the file system and reports a stale file handle, it is most likely because that client is not recognising the NVMesh volume as a valid target. Recheck that the /var/mmfs/etc/nsddevices script was added to the failing client and that its contents are correct. Check the output of mmdevdiscover on the client to confirm that the block devices are visible, and then try remounting the GPFS file system locally using
<syntaxhighlight>
mmmount gpfs1
</syntaxhighlight>
= Optimisations and Performance Tuning =

=== Multiple NSDs ===
During testing it was found that having one large NVMesh volume striped across all servers limited throughput. To get the most throughput possible, it is recommended to create a separate NVMesh volume for each Excelero server and let GPFS stripe across them. In our test configuration that meant that each Excelero server exported a volume consisting of 4 NVMe drives in RAID 0, and each of these volumes was presented to GPFS as a separate NSD.
=== GPFS Tuning ===
Since Excelero is responsible for sharing all of the block devices, the required GPFS tuning is reduced to a small set of parameters.

GPFS is a smart file system that tries to auto-tune itself for the best possible performance, but we can override some defaults to help the tuning algorithm find optimal values. It is important to note that any values set in GPFS are treated as guidelines rather than hard limits; GPFS may change them based on the tuning of other parameters.
The first thing to look at is caching and prefetching. With NVMesh we want to avoid any caching or prefetching of files. The parameter 'prefetchAggressiveness' determines the prefetching behaviour of GPFS. By default it has a value of 2, which means GPFS prefetches a file if the first access occurs at offset zero in the file, or if the second access is sequential. To tell GPFS not to prefetch any files, set 'prefetchAggressiveness=0'.

GPFS also tries to limit the amount of IO going to each server, to avoid overloading servers and causing IO requests to queue. This limit is controlled by 'maxMBpS'. The recommendation is to set it to twice the network rate, up to its maximum of 100000MB/s. This is not a hard limit, so even if this maximum value is lower than what the network or servers are actually capable of, it won't affect the overall throughput.

Along with the maximum bandwidth, GPFS tries to estimate the expected throughput based on the number of LUNs a server has attached. This won't work with NVMesh, because each LUN that GPFS sees in reality consists of multiple drives. It is recommended to set 'ignorePrefetchLUNCount=yes', which instructs GPFS not to rely on the LUN count when estimating throughput.
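All three settings can be applied cluster-wide with mmchconfig. A minimal sketch, using the 100000MB/s cap mentioned above for maxMBpS (substitute twice your own network rate) and the -i flag so the change is immediate and permanent:
<syntaxhighlight>
# Disable prefetching, raise the per-node throughput hint,
# and stop GPFS estimating throughput from the LUN count
mmchconfig prefetchAggressiveness=0,maxMBpS=100000,ignorePrefetchLUNCount=yes -i
</syntaxhighlight>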
GPFS uses multiple threads to handle IO requests in parallel, organised into IO queues; each queue is dedicated to processing either small or large IO requests. By default a large IO request is any IO larger than 64k. We can control the total number of queues, the ratio of small to large queues, and the number of threads in each queue to fine-tune the system for specific workloads. The following parameters control this:
<syntaxhighlight>
nsdSmallThreadRatio: Ratio of small IO queues to large IO queues
nsdThreadsPerQueue:  Number of threads in each IO queue
nsdMaxWorkerThreads: Total number of NSD worker threads
</syntaxhighlight>
For reference, IBM recommend the following as a guideline for a general-use system:
<syntaxhighlight>
nsdSmallThreadRatio=1
nsdThreadsPerQueue=12
nsdMaxWorkerThreads=480
</syntaxhighlight>
This configuration results in a total of 40 queues (480 threads / 12 threads per queue), 20 dedicated to handling small IO requests and 20 to large ones.
For an NVMesh system that is optimised for large IO and throughput, the following configuration was used:
<syntaxhighlight>
nsdSmallThreadRatio=0
nsdThreadsPerQueue=24
nsdMaxWorkerThreads=2040
</syntaxhighlight>
This provides 85 queues (2040 / 24), each with 24 threads, all dedicated to handling large IO requests. This is good for a system that needs to deliver high throughput, but it sacrifices IOPS and small IO performance.
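Applied with mmchconfig, the throughput-optimised configuration above would look like the sketch below; in our experience the NSD queue parameters only take effect once the GPFS daemon is restarted, so no -i flag is used.
<syntaxhighlight>
mmchconfig nsdSmallThreadRatio=0,nsdThreadsPerQueue=24,nsdMaxWorkerThreads=2040
# Restart GPFS on all nodes for the queue settings to take effect
mmshutdown -a && mmstartup -a
</syntaxhighlight>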
Unfortunately there are no real guidelines on how these should be set, and it is not always predictable how changing them will affect the file system, so it is necessary to test performance after altering them to ensure that the desired performance levels are still being met.
The final tunable parameter is workerThreads. This sets the total number of threads that the GPFS daemon should use, and changing it will change the value of several other parameters. The maximum value is 8192, and any value between 4096 and 8192 proved to perform well during testing. If it is set too high, GPFS may auto-tune it to a lower value to better suit the other parameters. For reference, the best benchmark result in testing was achieved with
<syntaxhighlight>
workerThreads=6141
</syntaxhighlight>
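Because GPFS may silently retune these values, it is worth checking what is actually in effect after a change; a quick sketch:
<syntaxhighlight>
mmlsconfig workerThreads                   # configured value
mmdiag --config | grep -i workerThreads   # value in effect on the local daemon
</syntaxhighlight>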
=== NVMesh Client Tuning ===
NVMesh allows for some tuning through kernel module parameters. Despite GPFS being configured not to prefetch, it will more than likely still perform some form of read-ahead, which causes a drop in performance. The max_ios_per_cpu module parameter can be used to throttle IO requests, which effectively makes it impossible for GPFS to prefetch.

By default NVMesh allows each CPU to queue 64 IO requests at any one time. At first glance this looks beneficial, but it ultimately degrades performance, as it enables GPFS to read ahead, which causes caching. Ideally we want NVMesh processing blocks at the same rate as GPFS, preventing GPFS from even attempting to prefetch data. NVMesh recommend setting max_ios_per_cpu to 8 as a base figure. In testing we found that this is still too high, particularly on a client with a high core count. It is a good idea to start with a value of 1 and increment from there until performance starts to drop off again. In a test configuration with a single DGX-1 as the client, optimal performance was achieved with a value of 3.
This only needs to be set on NVMesh clients. It can be set on the fly with
<syntaxhighlight>
echo 3 > /sys/module/nvmeibc/parameters/max_ios_per_cpu
</syntaxhighlight>
which allows for easy retesting, as the client does not need to be restarted.

Once a good value is found, it can be set permanently (substituting the value found during testing) by running
<syntaxhighlight>
echo "options nvmeibc max_ios_per_cpu=3" >> /etc/modprobe.d/nvmesh.conf
</syntaxhighlight>
on each client. The modprobe option takes effect the next time the nvmeibc module is loaded.
=== Benchmark ===

The following is an example fio job file that was used to benchmark IO from a single DGX-1 client, along with the achieved throughput.
<syntaxhighlight>
[global]
ioengine=libaio
direct=1
iodepth=1
invalidate=1
time_based
runtime=300
norandommap
randrepeat=0
log_avg_msec=1000
group_reporting

[gpfs1-read]
rw=read
blocksize=4m
size=2T
filename=/gpfs/gpfs1/fio
numjobs=128
stonewall
</syntaxhighlight>

<syntaxhighlight>
gpfs1-read: (g=0): rw=read, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=1
...
fio-2.2.10
Starting 128 processes
Jobs: 128 (f=128): [R(128)] [100.0% done] [44100MB/0KB/0KB /s] [11.3K/0/0 iops] [eta 00m:00s]
gpfs1-read: (groupid=0, jobs=128): err= 0: pid=58735: Tue Jun 5 10:47:22 2018
read : io=12869GB, bw=43923MB/s, iops=10980, runt=300014msec
slat (usec): min=92, max=686135, avg=1035.58, stdev=3522.99
clat (usec): min=2, max=80858, avg=10616.33, stdev=4816.44
lat (msec): min=1, max=692, avg=11.65, stdev= 5.90
clat percentiles (usec):
| 1.00th=[ 3824], 5.00th=[ 5344], 10.00th=[ 6176], 20.00th=[ 7264],
| 30.00th=[ 8096], 40.00th=[ 8768], 50.00th=[ 9536], 60.00th=[10432],
| 70.00th=[11456], 80.00th=[13120], 90.00th=[16064], 95.00th=[19584],
| 99.00th=[28800], 99.50th=[33024], 99.90th=[44288], 99.95th=[48896],
| 99.99th=[58624]
bw (KB /s): min= 5818, max=498714, per=0.78%, avg=351823.75, stdev=28459.03
lat (usec) : 4=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.06%, 4=1.16%, 10=53.67%, 20=40.53%, 50=4.54%
lat (msec) : 100=0.04%
cpu : usr=0.07%, sys=2.75%, ctx=28334724, majf=0, minf=132935
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=3294347/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: io=12869GB, aggrb=43923MB/s, minb=43923MB/s, maxb=43923MB/s, mint=300014msec, maxt=300014msec
</syntaxhighlight>