CEPH: Ceph on the Blades

Environment

Dependencies

LINUX KERNEL

Ceph Kernel Client: We currently recommend:

  • v3.6.6 or later in the v3.6 stable series
  • v3.4.20 or later in the v3.4 stable series
  • btrfs: If you use the btrfs file system with Ceph, we recommend using a recent Linux kernel (v3.5 or later).
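
To check that each node's running kernel meets these recommendations, a quick loop over the blades can be used (a minimal sketch; it assumes the blade hostnames used later on this page and passwordless SSH between them):

for h in blade3 blade4 blade5 blade8; do
    echo -n "$h: "; ssh $h uname -r
done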

Testing Environment

OS: CentOS release 6.3 (Final)
Firewall Disabled

Server nodes
uname -a

Linux Blade3 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Client node
uname -a

Linux Blade8 3.8.8-1.el6.elrepo.x86_64 #1 SMP Wed Apr 17 16:47:58 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

Install the Ceph Bobtail release

On all the nodes

rpm --import 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
su -c 'rpm -Uvh http://ceph.com/rpm-bobtail/el6/x86_64/ceph-release-1-0.el6.noarch.rpm'
yum -y install ceph
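
To confirm the packages landed on each node, something like the following can be run (a simple check, not part of the original procedure):

rpm -q ceph
ceph --version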

Configuration

Create the Ceph Configuration File

  • Location: /etc/ceph/ceph.conf
  • To be copied to all the nodes (server nodes and clients); a copy loop is sketched after the file below.
[global]
	auth cluster required = none
	auth service required = none
	auth client required = none
[osd]
	osd journal size = 1000
	filestore xattr use omap = true
	osd mkfs type = ext4
	osd mount options ext4 = user_xattr,rw,noexec,nodev,noatime,nodiratime
[mon.a]
	host = blade3
	mon addr = <IP of blade3>:6789
[mon.b]
	host = blade4
	mon addr = <IP of blade4>:6789
[mon.c]
	host = blade5
	mon addr = <IP of blade5>:6789
[osd.0]
	host = blade3
[osd.1]
	host = blade4
[osd.2]
	host = blade5
[mds.a]
	host = blade3
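
One way to push this file out to every node, assuming passwordless SSH from the node where it was written (the hostnames match the ones used on this page):

for h in blade3 blade4 blade5 blade8; do
    scp /etc/ceph/ceph.conf root@$h:/etc/ceph/ceph.conf
done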

Create the Ceph daemon working directories

  • The location and naming convention of the directories should be strictly followed.
ssh blade3 mkdir -p /var/lib/ceph/osd/ceph-0
ssh blade4 mkdir -p /var/lib/ceph/osd/ceph-1
ssh blade5 mkdir -p /var/lib/ceph/osd/ceph-2
ssh blade3 mkdir -p /var/lib/ceph/mon/ceph-a
ssh blade4 mkdir -p /var/lib/ceph/mon/ceph-b
ssh blade5 mkdir -p /var/lib/ceph/mon/ceph-c
ssh blade3 mkdir -p /var/lib/ceph/mds/ceph-a

Run the mkcephfs command from a server node

  • Execute the following from a server node
mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring

Start the Ceph Cluster

  • Execute the following from a node that has passwordless SSH access to the other server nodes.
service ceph -a start

Issues

  • The following issue has been seen quite often when starting the cluster:
[root@Blade3 ~]# service ceph -a start
=== mon.a ===
=== mon.b ===
=== mon.c ===
=== mds.a ===
=== osd.0 ===
Starting Ceph osd.0 on blade3...
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
=== osd.1 ===
Starting Ceph osd.1 on blade4...
global_init: unable to open config file from search list /tmp/ceph.conf.33b54ef1fee10259e92480001532cf78
failed: 'ssh blade4 ulimit -n 8192;  /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /tmp/ceph.conf.33b54ef1fee10259e92480001532cf78 '
  • The init script is not picking up the correct temporary config file name on the remote server node.
  • Executing the failed command manually should start the OSD daemon; e.g., in this case run the following for blade4.
  • Substitute the correct name of the ceph.conf file from the /tmp directory.
ssh blade4 ulimit -n 8192;  /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /tmp/ceph.conf.<XXX>
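
Alternatively, since /etc/ceph/ceph.conf has already been copied to every node (see Configuration above), the OSD can be started against the permanent config file instead of the generated temp file; a hedged sketch for this example:

ssh blade4 'ulimit -n 8192; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf'

The quotes make sure both ulimit and ceph-osd run on blade4 rather than on the node issuing the command.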

Verify Cluster Health

ceph status

 health HEALTH_OK
   monmap e1: 3 mons at {a=100.100.0.3:6789/0,b=100.100.0.4:6789/0,c=100.100.0.5:6789/0}, election epoch 8,  quorum 0,1,2 a,b,c
   osdmap e22: 3 osds: 3 up, 3 in
    pgmap v1198: 1544 pgs: 1544 active+clean; 15766 MB data, 40372 MB used, 100 GB / 147 GB avail
   mdsmap e9: 1/1/1 up {0=a=up:active}
ceph osd tree

# id    weight  type name       up/down reweight
-1      3       root default
-3      3               rack unknownrack
-2      1                       host blade3
0       1                               osd.0   up      1
-4      1                       host blade4
1       1                               osd.1   up      1
-5      1                       host blade5
2       1                               osd.2   up      1
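
When scripting against the cluster it can be convenient to block until it reports healthy; a minimal sketch using the same CLI:

until ceph health | grep -q HEALTH_OK; do
    echo "waiting for HEALTH_OK..."
    sleep 5
done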

Performance

Local disk benchmark

1G Ethernet node (Blade6)

[root@Blade6 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.47646 s, 480 MB/s
[root@Blade6 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.53849 s, 473 MB/s
[root@Blade6 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.50215 s, 477 MB/s

10G Ethernet node (Blade8)

[root@Blade8 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.58078 s, 469 MB/s
[root@Blade8 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 3.85319 s, 557 MB/s
[root@Blade8 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.51147 s, 476 MB/s

Evaluating the network

IPERF

1G Ethernet

From the server:
iperf -s

[root@Blade3 ceph]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 172.28.0.232 port 5001 connected with 172.28.0.101 port 33826
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.1 sec  1.12 GBytes    948 Mbits/sec

From the client:
iperf -c 172.28.0.232 -i1 -t 10

[root@Blade6 test]# iperf -c 172.28.0.232 -i1 -t 10
------------------------------------------------------------
Client connecting to 172.28.0.232, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  3] local 172.28.0.101 port 33826 connected with 172.28.0.232 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec    120 MBytes  1.01 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  1.0- 2.0 sec    118 MBytes    993 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  2.0- 3.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  3.0- 4.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  4.0- 5.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  5.0- 6.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  6.0- 7.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  7.0- 8.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  8.0- 9.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  9.0-10.0 sec    113 MBytes    950 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.12 GBytes    956 Mbits/sec

10G Ethernet

From the server:
iperf -s

[root@Blade3 ceph]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 100.100.0.3 port 5001 connected with 100.100.0.8 port 51166
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  10.8 GBytes  9.26 Gbits/sec

From the client:
iperf -c 100.100.0.3 -i1 -t 10

[root@Blade8 test]# iperf -c 100.100.0.3 -i1 -t 10
------------------------------------------------------------
Client connecting to 100.100.0.3, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  3] local 100.100.0.8 port 51166 connected with 100.100.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec    886 MBytes  7.43 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  1.0- 2.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  2.0- 3.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  3.0- 4.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  4.0- 5.0 sec  1.10 GBytes  9.46 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  5.0- 6.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  6.0- 7.0 sec  1.10 GBytes  9.46 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  7.0- 8.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  8.0- 9.0 sec  1.10 GBytes  9.46 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  9.0-10.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.8 GBytes  9.26 Gbits/sec
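
If a single TCP stream does not saturate the link, iperf can also be run with several parallel streams via the -P flag, for example (not part of the original test run):

iperf -c 100.100.0.3 -P 4 -i 1 -t 10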

NETCAT

1G Ethernet

From the server:
nc -v -v -l -n 2222 >/dev/null

From the client:
time dd if=/dev/zero | nc -v -v -n 172.28.0.232 2222

[root@Blade6 test]# time dd if=/dev/zero | nc -v -v -n 172.28.0.232 2222
Connection to 172.28.0.232 2222 port [tcp/*] succeeded!
^C5101694+0 records in
5101694+0 records out
2612067328 bytes (2.6 GB) copied, 21.9019 s, 119 MB/s

real    0m21.904s
user    0m1.011s
sys     0m10.891s

10G Ethernet

From the server:
nc -v -v -l -n 2222 >/dev/null

From the client:
time dd if=/dev/zero | nc -v -v -n 100.100.0.3 2222

[root@Blade8 test]# time dd if=/dev/zero | nc -v -v -n 100.100.0.3 2222
Connection to 100.100.0.3 2222 port [tcp/*] succeeded!
^C10481314+0 records in
10481314+0 records out
5366432768 bytes (5.4 GB) copied, 30.1598 s, 178 MB/s

real    0m30.163s
user    0m2.491s
sys     0m42.221s

Ceph Benchmarks

Rados internal benchmark

ceph osd pool create pbench 768

Clean the disk cache on Ceph nodes

sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

OSD Writes

rados bench -p pbench <no of seconds> write --no-cleanup

1G Ethernet

Total time run:         63.893136
Total writes made:      314
Write size:             4194304
Bandwidth (MB/sec):     19.658
Stddev Bandwidth:       15.0298
Max bandwidth (MB/sec): 60
Min bandwidth (MB/sec): 0
Average Latency:        3.25449
Stddev Latency:         2.84939
Max latency:            15.3193
Min latency:            0.109046

10G Ethernet

Total time run:         1.674745
Total writes made:      42
Write size:             4194304
Bandwidth (MB/sec):     100.314
Stddev Bandwidth:       70.7107
Max bandwidth (MB/sec): 100
Min bandwidth (MB/sec): 0
Average Latency:        0.632132
Stddev Latency:         0.702116
Max latency:            1.66335
Min latency:            0.049563

OSD Reads:

rados bench -p pbench <no of seconds> seq

1G Ethernet

Total time run:        62.247466
Total reads made:      242
Read size:             4194304
Bandwidth (MB/sec):    15.551
Average Latency:       4.10843
Max latency:           15.2291
Min latency:           0.049004

10G Ethernet

Total time run:        1.691854
Total reads made:      42
Read size:             4194304
Bandwidth (MB/sec):    99.299
Average Latency:       0.627681
Max latency:           1.60354
Min latency:           0.023299
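
The steps above can be wrapped into one small script run from a server node; this is only a sketch built from the commands already shown on this page (60-second runs and the pbench pool name are assumed):

#!/bin/bash
# Create the benchmark pool with 768 placement groups
ceph osd pool create pbench 768

# Flush dirty pages and drop the page cache on every Ceph node before benchmarking
for h in blade3 blade4 blade5; do
    ssh $h 'sync && echo 3 > /proc/sys/vm/drop_caches'
done

# 60-second write benchmark; keep the objects so they can be read back
rados bench -p pbench 60 write --no-cleanup

# 60-second sequential read benchmark over the objects written above
rados bench -p pbench 60 seq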

OSDs

for j in `seq 10`; do for id in 0 1 2; do ceph osd tell $id bench ; done ; done

From the log files:

1G Ethernet

[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.476351 sec at 228 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.238013 sec at 241 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.347836 sec at 235 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.455193 sec at 229 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.436797 sec at 230 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.459931 sec at 229 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.480481 sec at 228 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.310322 sec at 237 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.377688 sec at 233 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.388466 sec at 233 MB/sec

10G Ethernet

[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.581096 sec at 223 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.804200 sec at 213 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.968500 sec at 206 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.667843 sec at 219 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.233254 sec at 241 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.466140 sec at 229 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.575729 sec at 223 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.906584 sec at 208 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 5.392144 sec at 189 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.473388 sec at 228 MB/sec
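
The results of ceph osd tell <id> bench are reported through the cluster log rather than on stdout; they can be watched live with ceph -w, or pulled from the log files afterwards (the path below assumes the default /var/log/ceph location):

grep 'bench: wrote' /var/log/ceph/*.log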

RBD Mapped Devices

The following commands were run on the client machine:

sudo dd if=/dev/zero of=/root/ceph/test/rbdtest/1 bs=1G count=1 oflag=direct

1G Ethernet client (Blade6)

[root@Blade6 rbdtest]# sudo dd if=/dev/zero of=/root/ceph/test/rbdtest/1 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 95.8529 s, 11.2 MB/s

[root@Blade6 rbdtest]# sudo dd if=/dev/zero of=/root/ceph/test/rbdtest/1 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 86.0203 s, 12.5 MB/s

10G Ethernet client (Blade8)

[root@Blade8 rbdtest]# sudo dd if=/dev/zero of=/root/ceph/test/rbdtest/1 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 6.56818 s, 163 MB/s

[root@Blade8 rbdtest]# sudo dd if=/dev/zero of=/root/ceph/test/rbdtest/1 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 9.56429 s, 112 MB/s
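
For reference, the RBD image behind /root/ceph/test/rbdtest had to be created, mapped and mounted first. The exact commands were not recorded on this page; with the rbd CLI the sequence looks roughly like the following (image name, size, device node and filesystem are assumptions):

rbd create rbdtest --size 20480          # 20 GB image in the default rbd pool
rbd map rbdtest                          # shows up as a block device, e.g. /dev/rbd0
mkfs.ext4 /dev/rbd0                      # put a filesystem on the mapped device
mkdir -p /root/ceph/test/rbdtest
mount /dev/rbd0 /root/ceph/test/rbdtest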