CEPH: Ceph on the Blades

From Define Wiki

Revision as of 11:04, 5 May 2013

Environment

Dependencies

LINUX KERNEL

Ceph Kernel Client: We currently recommend:

  • v3.6.6 or later in the v3.6 stable series
  • v3.4.20 or later in the v3.4 stable series
  • btrfs: If you use the btrfs file system with Ceph, we recommend using a recent Linux kernel (v3.5 or later).

Testing Environment

OS: CentOS release 6.3 (Final)
Firewall Disabled

Server nodes
uname -a

Linux Blade3 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Client node
uname -a

Linux Blade8 3.8.8-1.el6.elrepo.x86_64 #1 SMP Wed Apr 17 16:47:58 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

Install the Ceph Bobtail release

On all the nodes

rpm --import 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
su -c 'rpm -Uvh http://ceph.com/rpm-bobtail/el6/x86_64/ceph-release-1-0.el6.noarch.rpm'
yum -y install ceph
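The three commands above have to be repeated on every node. Below is a minimal sketch that loops over the nodes via SSH. It assumes root SSH access from an admin node. It also assumes the node list is the three servers plus the 10G client used on this page (blade3, blade4, blade5, blade8); the `NODES` variable is a hypothetical name, so adjust it to your setup. With `DRY_RUN=1` the sketch only prints the commands.

```shell
# Sketch: run the Bobtail install on every node over SSH.
# ASSUMPTION: NODES lists the servers (blade3-5) and the client (blade8);
# override it from the environment for other setups.
NODES="${NODES:-blade3 blade4 blade5 blade8}"

install_ceph_all() {
  local key='https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
  local repo='http://ceph.com/rpm-bobtail/el6/x86_64/ceph-release-1-0.el6.noarch.rpm'
  local h
  for h in $NODES; do
    # The remote command is one quoted string, so every step runs on $h.
    # With DRY_RUN set, "echo" is prepended and the command is only printed.
    ${DRY_RUN:+echo} ssh "$h" "rpm --import '$key' && rpm -Uvh $repo && yum -y install ceph"
  done
}
```

Run `DRY_RUN=1 install_ceph_all` first to review the commands, then `install_ceph_all` to execute them.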

Configuration

Create the Ceph Configuration File

  • Location: /etc/ceph/ceph.conf
  • To be copied to all the nodes (server nodes and clients)
[global]
	auth cluster required = none
	auth service required = none
	auth client required = none
[osd]
	osd journal size = 1000
	filestore xattr use omap = true
	osd mkfs type = ext4
	osd mount options ext4 = user_xattr,rw,noexec,nodev,noatime,nodiratime
[mon.a]
	host = blade3
	mon addr = <IP of blade3>:6789
[mon.b]
	host = blade4
	mon addr = <IP of blade4>:6789
[mon.c]
	host = blade5
	mon addr = <IP of blade5>:6789
[osd.0]
	host = blade3
[osd.1]
	host = blade4
[osd.2]
	host = blade5
[mds.a]
	host = blade3
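Copying the file by hand is error-prone once several hosts are involved. Below is a sketch that reads the `host = ...` lines from the file and copies it to each distinct host, plus any extra hosts (such as clients) given as arguments. It assumes root scp access from an admin node. The helper names `conf_hosts` and `push_conf` are hypothetical.

```shell
# Sketch: copy ceph.conf to every host named in it, plus any extra hosts
# (for example clients) passed as arguments.
# ASSUMPTION: root SSH/scp access from the admin node to all hosts.

# Print each distinct "host = NAME" value in order of first appearance.
conf_hosts() {
  awk '$1 == "host" && $2 == "=" && !seen[$3]++ { print $3 }' "$1"
}

push_conf() {
  local conf="$1"; shift
  local h
  for h in $(conf_hosts "$conf") "$@"; do
    # With DRY_RUN set, the scp command is only printed.
    ${DRY_RUN:+echo} scp "$conf" "$h:/etc/ceph/ceph.conf"
  done
}
```

For example, `push_conf ./ceph.conf blade8` copies a master copy kept in a working directory to blade3, blade4, blade5 and the client blade8. Using a copy outside /etc/ceph avoids asking the local node to scp a file onto itself.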

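Create the Ceph daemon working directories

  • The directory locations and names must follow this convention exactly: /var/lib/ceph/<type>/ceph-<id>, where <type> is osd, mon or mds and <id> is the daemon ID from ceph.conf. These are the default data paths the daemons look for.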
ssh blade3 mkdir -p /var/lib/ceph/osd/ceph-0
ssh blade4 mkdir -p /var/lib/ceph/osd/ceph-1
ssh blade5 mkdir -p /var/lib/ceph/osd/ceph-2
ssh blade3 mkdir -p /var/lib/ceph/mon/ceph-a
ssh blade4 mkdir -p /var/lib/ceph/mon/ceph-b
ssh blade5 mkdir -p /var/lib/ceph/mon/ceph-c
ssh blade3 mkdir -p /var/lib/ceph/mds/ceph-a
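Because every directory follows the same /var/lib/ceph/<type>/ceph-<id> pattern, the commands above can be derived from ceph.conf itself. Below is a sketch that reads each daemon section ([osd.N], [mon.X], [mds.X]) together with its `host =` line and prints the matching command. It only prints the commands and runs nothing; `ceph_dir_cmds` is a hypothetical helper name.

```shell
# Sketch: derive the per-daemon mkdir commands from ceph.conf, using the
# /var/lib/ceph/<type>/ceph-<id> convention shown above.
# It only prints the commands; pipe the output to "sh" to run them.
ceph_dir_cmds() {
  awk '
    # A daemon section such as [osd.0], [mon.a] or [mds.a]: remember type and id.
    /^\[(osd|mon|mds)\./ {
      sec  = substr($0, 2, index($0, "]") - 2)
      type = substr(sec, 1, index(sec, ".") - 1)
      id   = substr(sec, index(sec, ".") + 1)
      next
    }
    # Any other section ([global], [osd], ...) is not a single daemon.
    /^\[/ { type = ""; next }
    # The "host = NAME" line of a daemon section yields one command.
    type != "" && $1 == "host" {
      print "ssh " $3 " mkdir -p /var/lib/ceph/" type "/ceph-" id
    }
  ' "$1"
}
```

With the configuration file shown earlier, `ceph_dir_cmds /etc/ceph/ceph.conf` prints the same seven commands as above, in file order. Append `| sh` to execute them.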

Run the mkcephfs command from a server node

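  • Run this on one server node. With -a, mkcephfs performs the initialization on every host listed in ceph.conf over SSH, so that node needs passwordless SSH access to the other server nodes.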
mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring

Start the Ceph Cluster

  • Execute the following from a node that has passwordless SSH access to the other server nodes.
service ceph -a start

Issues

  • The following error was seen frequently when starting the cluster:
[root@Blade3 ~]# service ceph -a start
=== mon.a ===
=== mon.b ===
=== mon.c ===
=== mds.a ===
=== osd.0 ===
Starting Ceph osd.0 on blade3...
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
=== osd.1 ===
Starting Ceph osd.1 on blade4...
global_init: unable to open config file from search list /tmp/ceph.conf.33b54ef1fee10259e92480001532cf78
failed: 'ssh blade4 ulimit -n 8192;  /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /tmp/ceph.conf.33b54ef1fee10259e92480001532cf78 '
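  • The start script is not picking up the correct configuration file name on the remote server node.
  • Running the failed command by hand should start the OSD daemon on the failed node (blade4 in this example). Substitute the correct name of the ceph.conf.<XXX> file found in blade4's /tmp directory.
  • Keep the remote part of the command in quotes. Without them, the local shell splits the line at the semicolon, and ceph-osd runs on the local node instead of blade4.
ssh blade4 'ulimit -n 8192; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /tmp/ceph.conf.<XXX>'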
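Instead of retyping the failed command, a small helper can rewrite the `failed: '...'` line. This is a hypothetical sketch, not part of Ceph:

- it wraps everything after `ssh HOST` in single quotes, so both `ulimit` and `ceph-osd` run on the remote node;
- it replaces the `-c` argument with a configuration path you supply.

The default path `/etc/ceph/ceph.conf` is an assumption. It is only reasonable if the same ceph.conf was copied to every node, as the configuration step asks.

```shell
# Sketch: turn a "failed: '...'" line from "service ceph -a start" into a
# command that can be re-run by hand from the admin node.
# ASSUMPTION: the second argument (default /etc/ceph/ceph.conf) is a valid
# config path on the remote node.
# The sed steps: drop the "failed: '" prefix; drop the closing quote and any
# space before it; point -c at $conf; quote everything after "ssh HOST".
retry_cmd() {
  local conf="${2:-/etc/ceph/ceph.conf}"
  printf '%s\n' "$1" | sed \
    -e "s/^failed: '//" \
    -e "s/ *'\$//" \
    -e "s|-c [^ ]*|-c $conf|" \
    -e "s/^ssh \([^ ]*\) \(.*\)\$/ssh \1 '\2'/"
}
```

For the blade4 failure above, `retry_cmd "$line" /tmp/ceph.conf.<XXX>` prints the quoted command with the chosen /tmp file. Review the printed command before running it.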

Verify Cluster Health

ceph status

 health HEALTH_OK
   monmap e1: 3 mons at {a=100.100.0.3:6789/0,b=100.100.0.4:6789/0,c=100.100.0.5:6789/0}, election epoch 8,  quorum 0,1,2 a,b,c
   osdmap e22: 3 osds: 3 up, 3 in
    pgmap v1198: 1544 pgs: 1544 active+clean; 15766 MB data, 40372 MB used, 100 GB / 147 GB avail
   mdsmap e9: 1/1/1 up {0=a=up:active}
ceph osd tree

# id    weight  type name       up/down reweight
-1      3       root default
-3      3               rack unknownrack
-2      1                       host blade3
0       1                               osd.0   up      1
-4      1                       host blade4
1       1                               osd.1   up      1
-5      1                       host blade5
2       1                               osd.2   up      1
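For scripting, the two checks that matter here can be automated: the health line reads HEALTH_OK, and the osdmap line shows as many OSDs up as exist ("3 osds: 3 up, 3 in"). Below is a sketch that reads `ceph status` text on stdin. It assumes the Bobtail-era output layout shown above; `check_status` is a hypothetical helper name.

```shell
# Sketch: succeed only if "ceph status" text on stdin reports HEALTH_OK and
# every OSD in the osdmap line is up (e.g. "3 osds: 3 up, 3 in").
# ASSUMPTION: the Bobtail-era output layout shown above.
check_status() {
  awk '
    $1 == "health" && $2 == "HEALTH_OK" { ok = 1 }
    $1 == "osdmap" {
      for (i = 2; i <= NF; i++) {
        if ($i == "osds:") total = $(i - 1)
        if ($i == "up,")   up    = $(i - 1)
      }
    }
    END { exit !(ok && total != "" && up == total) }
  '
}
```

Usage: `ceph status | check_status && echo "cluster healthy"`.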

Performance

Local disk benchmark

Results for Blade6 (1G Ethernet client) are shown first, followed by Blade8 (10G Ethernet client).
[root@Blade6 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.47646 s, 480 MB/s
[root@Blade6 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.53849 s, 473 MB/s
[root@Blade6 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.50215 s, 477 MB/s
[root@Blade8 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.58078 s, 469 MB/s
[root@Blade8 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 3.85319 s, 557 MB/s
[root@Blade8 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.51147 s, 476 MB/s
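Averaging the three runs per node gives a single local-disk figure to compare against the network numbers. Below is a sketch that averages the MB/s values from dd summary lines on stdin. It assumes GNU dd's summary format shown above and ignores lines reported in other units; `dd_avg` is a hypothetical helper name.

```shell
# Sketch: average the MB/s figures from a series of dd summary lines on stdin.
# ASSUMPTION: GNU dd's summary format "... copied, <secs> s, <rate> MB/s";
# lines reporting other units (kB/s, GB/s) are ignored.
dd_avg() {
  awk '/copied/ && $NF == "MB/s" { n++; sum += $(NF - 1) }
       END { if (n) printf "%.1f\n", sum / n }'
}
```

For the runs above this gives 476.7 MB/s for Blade6 and 500.7 MB/s for Blade8. dd prints its summary on stderr, so a live run needs `2>&1`:

`for i in 1 2 3; do dd if=/dev/zero of=here bs=1G count=2 oflag=direct 2>&1; done | dd_avg`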

Evaluating the network

IPERF

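The 1G Ethernet test (Blade6 to Blade3, 172.28.0.232) is shown first, followed by the 10G Ethernet test (Blade8 to Blade3, 100.100.0.3).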
From server
iperf -s
[root@Blade3 ceph]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 172.28.0.232 port 5001 connected with 172.28.0.101 port 33826
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.1 sec  1.12 GBytes    948 Mbits/sec

From client

iperf -c 172.28.0.232 -i1 -t 10
[root@Blade6 test]# iperf -c 172.28.0.232 -i1 -t 10
------------------------------------------------------------
Client connecting to 172.28.0.232, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  3] local 172.28.0.101 port 33826 connected with 172.28.0.232 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec    120 MBytes  1.01 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  1.0- 2.0 sec    118 MBytes    993 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  2.0- 3.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  3.0- 4.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  4.0- 5.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  5.0- 6.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  6.0- 7.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  7.0- 8.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  8.0- 9.0 sec    113 MBytes    949 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  9.0-10.0 sec    113 MBytes    950 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.12 GBytes    956 Mbits/sec
From server
iperf -s
[root@Blade3 ceph]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 100.100.0.3 port 5001 connected with 100.100.0.8 port 51166
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  10.8 GBytes  9.26 Gbits/sec

From client

iperf -c 100.100.0.3 -i1 -t 10
[root@Blade8 test]# iperf -c 100.100.0.3 -i1 -t 10
------------------------------------------------------------
Client connecting to 100.100.0.3, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  3] local 100.100.0.8 port 51166 connected with 100.100.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec    886 MBytes  7.43 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  1.0- 2.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  2.0- 3.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  3.0- 4.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  4.0- 5.0 sec  1.10 GBytes  9.46 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  5.0- 6.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  6.0- 7.0 sec  1.10 GBytes  9.46 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  7.0- 8.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  8.0- 9.0 sec  1.10 GBytes  9.46 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  9.0-10.0 sec  1.10 GBytes  9.47 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.8 GBytes  9.26 Gbits/sec
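With `-i1`, iperf (version 2) prints one line per interval and then a whole-run summary as the last line. Below is a sketch that extracts just that final figure from client output on stdin. It assumes the column layout shown above; `iperf_summary` is a hypothetical helper name.

```shell
# Sketch: print the final bandwidth figure from iperf (version 2) client output
# on stdin. With -i, the last interval line is the whole-run summary.
# ASSUMPTION: the column layout shown above ("... sec <transfer> <unit> <rate> <unit>").
iperf_summary() {
  awk '/ sec / { last = $0 }
       END { n = split(last, f, " "); if (n >= 2) print f[n - 1], f[n] }'
}
```

Usage: `iperf -c 100.100.0.3 -i1 -t 10 | iperf_summary`. Relative to line rate, the 956 Mbits/sec above is about 96% of 1 Gbit/s, and 9.26 Gbits/sec is about 93% of 10 Gbit/s.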

NETCAT

The 1G Ethernet test (Blade6 to 172.28.0.232) is shown first, followed by the 10G Ethernet test (Blade8 to 100.100.0.3).
From server
nc -v -v -l -n 2222 >/dev/null

From the client

time dd if=/dev/zero | nc -v -v -n 172.28.0.232 2222
[root@Blade6 test]# time dd if=/dev/zero | nc -v -v -n 172.28.0.232 2222
Connection to 172.28.0.232 2222 port [tcp/*] succeeded!
^C5101694+0 records in
5101694+0 records out
2612067328 bytes (2.6 GB) copied, 21.9019 s, 119 MB/s

real    0m21.904s
user    0m1.011s
sys     0m10.891s
From server
nc -v -v -l -n 2222 >/dev/null

From client

time dd if=/dev/zero | nc -v -v -n 100.100.0.3 2222
[root@Blade8 test]# time dd if=/dev/zero | nc -v -v -n 100.100.0.3 2222
Connection to 100.100.0.3 2222 port [tcp/*] succeeded!
^C10481314+0 records in
10481314+0 records out
5366432768 bytes (5.4 GB) copied, 30.1598 s, 178 MB/s

real    0m30.163s
user    0m2.491s
sys     0m42.221s
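Because dd was run without `bs=`, it used its default 512-byte block size, so the record counts above are 512-byte blocks (5101694 × 512 = 2612067328 bytes). Writing such small blocks into a pipe costs a lot of CPU; note the 42 s of sys time on the 10G run. The 178 MB/s figure therefore probably reflects dd/nc overhead more than the 10G link. A larger block size with a fixed count, for example `dd if=/dev/zero bs=1M count=4096 | nc -v -v -n 100.100.0.3 2222`, would likely be a fairer test.

The sketch below recomputes dd's rate from the record count and elapsed time, which makes the block-size dependence explicit. `dd_rate` is a hypothetical helper name.

```shell
# Sketch: recompute dd's throughput from "N+0 records out" and elapsed time.
# dd uses 512-byte blocks unless bs= is given, and reports MB as 10^6 bytes.
dd_rate() {
  local records="$1" secs="$2" bs="${3:-512}"
  awk -v r="$records" -v s="$secs" -v b="$bs" \
    'BEGIN { printf "%.0f bytes, %.0f MB/s\n", r * b, r * b / s / 1e6 }'
}
```

`dd_rate 5101694 21.9019` reproduces the 1G figure (2612067328 bytes, 119 MB/s). `dd_rate 10481314 30.1598` reproduces the 10G one (5366432768 bytes, 178 MB/s).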

Ceph Benchmarks

Rados internal benchmark

ceph osd pool create pbench 768
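The page does not say how 768 placement groups were chosen. For reference, the Ceph documentation commonly cites this guideline: (number of OSDs × 100) / replica count, rounded up to the next power of two. Below is a sketch of that arithmetic. The replica count is an input you must supply, since this page does not state it. `pg_count` is a hypothetical helper name.

```shell
# Sketch of the common PG-count guideline: (OSDs * 100) / replicas,
# rounded up to the next power of two. Not necessarily how 768 above was chosen.
pg_count() {
  local osds="$1" replicas="$2"
  local target=$(( (osds * 100 + replicas - 1) / replicas ))  # ceiling division
  local p=1
  while [ "$p" -lt "$target" ]; do
    p=$(( p * 2 ))
  done
  echo "$p"
}
```

For the three OSDs here, `pg_count 3 2` gives 256 and `pg_count 3 3` gives 128.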

Drop the page cache on the Ceph nodes

  • Run sync first so that dirty pages are written back; drop_caches only frees clean pages.
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
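The cache has to be dropped on every Ceph server node, not just the one you are logged in to. Below is a minimal sketch that does this over SSH. It assumes root SSH access and that the servers are blade3, blade4 and blade5 as on this page; `SERVERS` and `drop_caches_all` are hypothetical names. With `DRY_RUN=1` it only prints the commands.

```shell
# Sketch: flush dirty pages and drop the page cache on each Ceph server node.
# ASSUMPTION: root SSH access; SERVERS defaults to the three servers on this page.
SERVERS="${SERVERS:-blade3 blade4 blade5}"

drop_caches_all() {
  local h
  for h in $SERVERS; do
    # sync first so dirty pages are written back, then drop clean caches.
    ${DRY_RUN:+echo} ssh "$h" 'sync && echo 3 > /proc/sys/vm/drop_caches'
  done
}
```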

OSD Writes

rados bench -p pbench <no of seconds> write --no-cleanup
1G Ethernet:
Total time run:         63.893136
Total writes made:      314
Write size:             4194304
Bandwidth (MB/sec):     19.658
Stddev Bandwidth:       15.0298
Max bandwidth (MB/sec): 60
Min bandwidth (MB/sec): 0
Average Latency:        3.25449
Stddev Latency:         2.84939
Max latency:            15.3193
Min latency:            0.109046

10G Ethernet:
Total time run:         1.674745
Total writes made:      42
Write size:             4194304
Bandwidth (MB/sec):     100.314
Stddev Bandwidth:       70.7107
Max bandwidth (MB/sec): 100
Min bandwidth (MB/sec): 0
Average Latency:        0.632132
Stddev Latency:         0.702116
Max latency:            1.66335
Min latency:            0.049563
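The bandwidth figure follows from the other fields. Each operation moves one 4194304-byte (4 MiB) object, and the reported MB/sec values match MB taken as 2^20 bytes. For the 1G run, 314 × 4 / 63.893136 ≈ 19.658. Below is a sketch that recomputes the figure from a saved summary on stdin, as a sanity check on copied results. It assumes the field names shown above; `rados_bw` is a hypothetical helper name.

```shell
# Sketch: recompute "Bandwidth (MB/sec)" from a rados bench summary on stdin,
# as ops * op_size / time, with MB taken as 2^20 bytes (which matches the
# figures above). Works for both "Total writes made" and "Total reads made".
rados_bw() {
  awk -F: '
    /Total time run/            { t = $2 + 0 }
    /Total (writes|reads) made/ { n = $2 + 0 }
    /(Write|Read) size/         { sz = $2 + 0 }
    END { if (t > 0) printf "%.3f\n", n * sz / 1048576 / t }'
}
```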

OSD Reads

1G Ethernet:
Total time run:        62.247466
Total reads made:     242
Read size:            4194304
Bandwidth (MB/sec):    15.551
Average Latency:       4.10843
Max latency:           15.2291
Min latency:           0.049004

10G Ethernet:
Total time run:        1.691854
Total reads made:     42
Read size:            4194304
Bandwidth (MB/sec):    99.299
Average Latency:       0.627681
Max latency:           1.60354
Min latency:           0.023299
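The command used for the read test is not recorded on this page. Sequential reads like these are normally produced by rados bench's `seq` mode. That mode reads back the objects a previous write run left behind with `--no-cleanup`, which fits the 10G numbers: 42 objects were written and 42 read. Below is a sketch that runs both phases back to back. It assumes the pbench pool from above and `seq` mode; the 60-second default is also an assumption. `bench_pool` is a hypothetical name, and `DRY_RUN=1` only prints the commands.

```shell
# Sketch: write benchmark followed by a sequential-read benchmark on a pool.
# ASSUMPTION: the reads above came from "rados bench ... seq"; the page does
# not record the exact read command. DRY_RUN=1 only prints the commands.
# Consider dropping the page cache on the OSD nodes between the two phases.
bench_pool() {
  local pool="${1:-pbench}" secs="${2:-60}"
  # Keep the written objects (--no-cleanup) so the seq phase has data to read.
  ${DRY_RUN:+echo} rados bench -p "$pool" "$secs" write --no-cleanup
  ${DRY_RUN:+echo} rados bench -p "$pool" "$secs" seq
}
```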

OSDs

1G Ethernet 10G Ethernet

RBD Mapped Devices

1G Ethernet 10G Ethernet