CEPH: Ceph on the Blades
Environment
Dependencies
- LINUX KERNEL
Ceph Kernel Client: We currently recommend:
- v3.6.6 or later in the v3.6 stable series
- v3.4.20 or later in the v3.4 stable series
- btrfs: If you use the btrfs file system with Ceph, we recommend using a recent Linux kernel (v3.5 or later).
Testing Environment
OS: CentOS release 6.3 (Final)
Firewall Disabled
| Server nodes |
|---|
| uname -a
Linux Blade3 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux |
| Client node |
|---|
| uname -a
Linux Blade8 3.8.8-1.el6.elrepo.x86_64 #1 SMP Wed Apr 17 16:47:58 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux |
Install the Ceph Bobtail release
On all the nodes
rpm --import 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
su -c 'rpm -Uvh http://ceph.com/rpm-bobtail/el6/x86_64/ceph-release-1-0.el6.noarch.rpm'
yum -y install ceph
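To confirm the packages installed correctly and that every node runs the same release, the version can be checked on each node (a quick sanity check, not part of the original procedure):
ceph --version
rpm -q ceph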
Configuration
Create the Ceph Configuration File
- Location: /etc/ceph/ceph.conf
- To be copied to all the nodes (server nodes and clients)
[global]
auth cluster required = none
auth service required = none
auth client required = none
[osd]
osd journal size = 1000
filestore xattr use omap = true
osd mkfs type = ext4
osd mount options ext4 = user_xattr,rw,noexec,nodev,noatime,nodiratime
[mon.a]
host = blade3
mon addr = <IP of blade3>:6789
[mon.b]
host = blade4
mon addr = <IP of blade4>:6789
[mon.c]
host = blade5
mon addr = <IP of blade5>:6789
[osd.0]
host = blade3
[osd.1]
host = blade4
[osd.2]
host = blade5
[mds.a]
host = blade3
Create the Ceph daemon working directories
- The location and naming convention of the directories should be strictly followed.
ssh blade3 mkdir -p /var/lib/ceph/osd/ceph-0
ssh blade4 mkdir -p /var/lib/ceph/osd/ceph-1
ssh blade5 mkdir -p /var/lib/ceph/osd/ceph-2
ssh blade3 mkdir -p /var/lib/ceph/mon/ceph-a
ssh blade4 mkdir -p /var/lib/ceph/mon/ceph-b
ssh blade5 mkdir -p /var/lib/ceph/mon/ceph-c
ssh blade3 mkdir -p /var/lib/ceph/mds/ceph-a
Run the mkcephfs command from a server node
- Execute the following from a server node
mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring
Start the Ceph Cluster
- Execute the following from a node that has passwordless SSH to the other server nodes.
service ceph -a start
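The same init script also accepts status and stop, which is convenient when re-running the benchmarks below (standard behaviour of the Bobtail sysvinit script, not specific to this setup):
service ceph -a status
service ceph -a stop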
Issues
- We have seen the following issue quite often when starting the cluster:
[root@Blade3 ~]# service ceph -a start
=== mon.a ===
=== mon.b ===
=== mon.c ===
=== mds.a ===
=== osd.0 ===
Starting Ceph osd.0 on blade3...
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
=== osd.1 ===
Starting Ceph osd.1 on blade4...
global_init: unable to open config file from search list /tmp/ceph.conf.33b54ef1fee10259e92480001532cf78
failed: 'ssh blade4 ulimit -n 8192; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /tmp/ceph.conf.33b54ef1fee10259e92480001532cf78 '
- It is not picking up the correct configuration file name on the remote server node.
- Executing the following on the failed node should start the OSD daemon. For example, in this case run the following on blade4:
- Substitute the correct name of the ceph.conf file from the /tmp directory.
ssh blade4 ulimit -n 8192; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /tmp/ceph.conf.<XXX>
Verify Cluster Health
ceph status
health HEALTH_OK
monmap e1: 3 mons at {a=100.100.0.3:6789/0,b=100.100.0.4:6789/0,c=100.100.0.5:6789/0}, election epoch 8, quorum 0,1,2 a,b,c
osdmap e22: 3 osds: 3 up, 3 in
pgmap v1198: 1544 pgs: 1544 active+clean; 15766 MB data, 40372 MB used, 100 GB / 147 GB avail
mdsmap e9: 1/1/1 up {0=a=up:active}
ceph osd tree
# id weight type name up/down reweight
-1 3 root default
-3 3 rack unknownrack
-2 1 host blade3
0 1 osd.0 up 1
-4 1 host blade4
1 1 osd.1 up 1
-5 1 host blade5
2 1 osd.2 up 1
Performance
Local disk benchmark
| 1G Ethernet | 10G Ethernet |
|---|---|
[root@Blade6 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.47646 s, 480 MB/s
[root@Blade6 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.53849 s, 473 MB/s
[root@Blade6 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.50215 s, 477 MB/s |
[root@Blade8 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.58078 s, 469 MB/s
[root@Blade8 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 3.85319 s, 557 MB/s
[root@Blade8 test]# dd if=/dev/zero of=here bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 4.51147 s, 476 MB/s |
Evaluating the network
IPERF
| 1G Ethernet | 10G Ethernet |
|---|---|
From server
iperf -s
[root@Blade3 ceph]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 172.28.0.232 port 5001 connected with 172.28.0.101 port 33826
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.1 sec 1.12 GBytes 948 Mbits/sec
From client:
iperf -c 172.28.0.232 -i1 -t 10
[root@Blade6 test]# iperf -c 172.28.0.232 -i1 -t 10
------------------------------------------------------------
Client connecting to 172.28.0.232, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[ 3] local 172.28.0.101 port 33826 connected with 172.28.0.232 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 120 MBytes 1.01 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 1.0- 2.0 sec 118 MBytes 993 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 2.0- 3.0 sec 113 MBytes 949 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 3.0- 4.0 sec 113 MBytes 949 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 4.0- 5.0 sec 113 MBytes 949 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 5.0- 6.0 sec 113 MBytes 949 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 6.0- 7.0 sec 113 MBytes 949 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 7.0- 8.0 sec 113 MBytes 949 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 8.0- 9.0 sec 113 MBytes 949 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 9.0-10.0 sec 113 MBytes 950 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.12 GBytes 956 Mbits/sec |
From server
iperf -s
[root@Blade3 ceph]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 100.100.0.3 port 5001 connected with 100.100.0.8 port 51166
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 10.8 GBytes 9.26 Gbits/sec
From client:
iperf -c 100.100.0.3 -i1 -t 10
[root@Blade8 test]# iperf -c 100.100.0.3 -i1 -t 10
------------------------------------------------------------
Client connecting to 100.100.0.3, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[ 3] local 100.100.0.8 port 51166 connected with 100.100.0.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 886 MBytes 7.43 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 1.0- 2.0 sec 1.10 GBytes 9.47 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 2.0- 3.0 sec 1.10 GBytes 9.47 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 3.0- 4.0 sec 1.10 GBytes 9.47 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 4.0- 5.0 sec 1.10 GBytes 9.46 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 5.0- 6.0 sec 1.10 GBytes 9.47 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 6.0- 7.0 sec 1.10 GBytes 9.46 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 7.0- 8.0 sec 1.10 GBytes 9.47 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 8.0- 9.0 sec 1.10 GBytes 9.46 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 9.0-10.0 sec 1.10 GBytes 9.47 Gbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 10.8 GBytes 9.26 Gbits/sec |
NETCAT
| 1G Ethernet | 10G Ethernet |
|---|---|
From server
nc -v -v -l -n 2222 >/dev/null
From the client:
time dd if=/dev/zero | nc -v -v -n 172.28.0.232 2222
[root@Blade6 test]# time dd if=/dev/zero | nc -v -v -n 172.28.0.232 2222
Connection to 172.28.0.232 2222 port [tcp/*] succeeded!
^C5101694+0 records in
5101694+0 records out
2612067328 bytes (2.6 GB) copied, 21.9019 s, 119 MB/s
real 0m21.904s
user 0m1.011s
sys 0m10.891s |
From server
nc -v -v -l -n 2222 >/dev/null
From client:
time dd if=/dev/zero | nc -v -v -n 100.100.0.3 2222
[root@Blade8 test]# time dd if=/dev/zero | nc -v -v -n 100.100.0.3 2222
Connection to 100.100.0.3 2222 port [tcp/*] succeeded!
^C10481314+0 records in
10481314+0 records out
5366432768 bytes (5.4 GB) copied, 30.1598 s, 178 MB/s
real 0m30.163s
user 0m2.491s
sys 0m42.221s |
Ceph Benchmarks
Rados internal benchmark
ceph osd pool create pbench 768
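To confirm that the pool was created with the requested number of placement groups, the OSD map can be dumped and filtered (an illustrative check, not part of the original steps):
ceph osd dump | grep pbench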
Clean the disk cache on Ceph nodes
sudo echo 3 | sudo tee /proc/sys/vm/drop_caches && sudo sync
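Assuming the same passwordless root SSH used elsewhere in this guide, the cache drop can be applied to all three OSD nodes in one loop (a convenience sketch, not part of the original steps):
for node in blade3 blade4 blade5; do ssh $node 'sync && echo 3 > /proc/sys/vm/drop_caches'; done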
OSD Writes:
rados bench -p pbench <no of seconds> write --no-cleanup
| 1G Ethernet | 10G Ethernet |
|---|---|
Total time run: 63.893136
Total writes made: 314
Write size: 4194304
Bandwidth (MB/sec): 19.658
Stddev Bandwidth: 15.0298
Max bandwidth (MB/sec): 60
Min bandwidth (MB/sec): 0
Average Latency: 3.25449
Stddev Latency: 2.84939
Max latency: 15.3193
Min latency: 0.109046 |
Total time run: 1.674745
Total writes made: 42
Write size: 4194304
Bandwidth (MB/sec): 100.314
Stddev Bandwidth: 70.7107
Max bandwidth (MB/sec): 100
Min bandwidth (MB/sec): 0
Average Latency: 0.632132
Stddev Latency: 0.702116
Max latency: 1.66335
Min latency: 0.049563 |
OSD Reads:
rados bench -p pbench <no of seconds> seq
| 1G Ethernet | 10G Ethernet |
|---|---|
Total time run: 62.247466
Total reads made: 242
Read size: 4194304
Bandwidth (MB/sec): 15.551
Average Latency: 4.10843
Max latency: 15.2291
Min latency: 0.049004 |
Total time run: 1.691854
Total reads made: 42
Read size: 4194304
Bandwidth (MB/sec): 99.299
Average Latency: 0.627681
Max latency: 1.60354
Min latency: 0.023299 |
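Because the write benchmark was run with --no-cleanup (so that the sequential read benchmark has objects to read), the benchmark objects remain in the pbench pool afterwards. Depending on the rados version, they can be removed with the cleanup subcommand once benchmarking is finished:
rados -p pbench cleanup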
OSDs
for j in `seq 10`; do for id in 0 1 2; do ceph osd tell $id bench ; done ; done
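The results of the osd tell bench command are reported through the cluster log, so they can be watched live while the loop runs, or pulled from the monitor's log afterwards (the log path below is the usual default and may differ on other setups):
ceph -w
grep 'bench: wrote' /var/log/ceph/ceph.log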
From the log files:
| 1G Ethernet | 10G Ethernet |
|---|---|
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.476351 sec at 228 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.238013 sec at 241 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.347836 sec at 235 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.455193 sec at 229 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.436797 sec at 230 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.459931 sec at 229 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.480481 sec at 228 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.310322 sec at 237 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.377688 sec at 233 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.388466 sec at 233 MB/sec |
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.581096 sec at 223 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.804200 sec at 213 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.968500 sec at 206 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.667843 sec at 219 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.233254 sec at 241 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.466140 sec at 229 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.575729 sec at 223 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.906584 sec at 208 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 5.392144 sec at 189 MB/sec
[INF] bench: wrote 1024 MB in blocks of 4096 KB in 4.473388 sec at 228 MB/sec |
RBD Mapped Devices
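These tests assume an RBD image has already been created, mapped on the client with the kernel RBD driver, formatted, and mounted at /root/ceph/test/rbdtest. A minimal sequence to reach that state might look like the following (the image name and size are illustrative, and /dev/rbd0 assumes this is the first mapped image):
rbd create rbdtest --size 10240
rbd map rbdtest
mkfs.ext4 /dev/rbd0
mkdir -p /root/ceph/test/rbdtest
mount /dev/rbd0 /root/ceph/test/rbdtest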
The following command was run on the client machine:
sudo dd if=/dev/zero of=/root/ceph/test/rbdtest/1 bs=1G count=1 oflag=direct
| 1G Ethernet | 10G Ethernet |
|---|---|
[root@Blade6 rbdtest]# sudo dd if=/dev/zero of=/root/ceph/test/rbdtest/1 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 95.8529 s, 11.2 MB/s
[root@Blade6 rbdtest]# sudo dd if=/dev/zero of=/root/ceph/test/rbdtest/1 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 86.0203 s, 12.5 MB/s |
[root@Blade8 rbdtest]# sudo dd if=/dev/zero of=/root/ceph/test/rbdtest/1 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 6.56818 s, 163 MB/s
[root@Blade8 rbdtest]# sudo dd if=/dev/zero of=/root/ceph/test/rbdtest/1 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 9.56429 s, 112 MB/s |