CEPH: Ceph installation using ceph-deploy on CentOS 7


Initial Setup

  • 10 nodes, all with CentOS 7
  • SSH keys set up between hosts (a quick sketch follows this list)
  • Firewall disabled (out of laziness)
  • SELinux disabled
  • /etc/hosts synced across the servers (node names reflect their purpose below: mon, osd1, etc.)
  • Had to add: yum install redhat-lsb-core
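
A minimal sketch of the SSH key distribution, assuming a fresh passwordless key on the admin node and the hostnames used throughout this page:

 # generate a key on the admin node and push it to every host
 ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
 for host in ceph-mon1 ceph-osd{1,2,3,4,5}; do
     ssh-copy-id root@${host}
 done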

Install ceph-deploy

wget http://download.ceph.com/rpm/el7/noarch/ceph-release-1-1.el7.noarch.rpm
rpm -ivh ceph-release-1-1.el7.noarch.rpm
yum install ceph-deploy

Set up the Systems with Ceph

Before we go deploying and configuring Ceph, we need to install the RPMs on the nodes we'll be using. This just installs Ceph on each node; you'll see a lot of debug output as the process runs.

# Note on OpenHPC: perform the following if you've already installed OpenStack/Liberty
# $ yum-config-manager --enable epel
# $ yum-config-manager --disable centos-openstack-liberty
# Also, make sure the hostname on each node matches ceph-mon1, ceph-osd{1,2,3} etc.

# not even sure I needed the release arg; works sequentially, room for improvement (see the parallel sketch below)
 ceph-deploy install --release hammer ceph-osd{1,2,3,4,5}
 ceph-deploy install --release hammer ceph-mon1
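
One possible improvement, sketched here and untested: run the installs as background jobs. Whether concurrent ceph-deploy runs coexist cleanly is an assumption (they share a working directory and log file); fall back to sequential if they fight.

 # run the Ceph installs on all nodes in parallel (untested sketch)
 for host in ceph-mon1 ceph-osd{1,2,3,4,5}; do
     ceph-deploy install --release hammer "$host" &
 done
 wait  # block until every background install finishes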

# Note: if you don't change the node names, the services will start up incorrectly; stop/disable them and run ceph-deploy again. Your process IDs will be different
# hostname ceph-mon1
# systemctl | grep ceph
# systemctl stop ceph-mon.head.1461319199.566181162.service
# systemctl stop ceph-mon.ceph-mon1.1461320175.884595648.service 
# systemctl disable ceph-mon.ceph-mon1.1461320175.884595648.service 
# systemctl disable ceph-mon.head.1461319199.566181162.service
# ceph-deploy --overwrite-conf mon create ceph-mon1

Create the cluster and set up the mon node(s)

 ceph-deploy new ceph-mon1
 ceph-deploy mon create ceph-mon1
# if I had multiple mons:
 ceph-deploy --cluster rbdcluster new ceph-mon{1,2,3}
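
If you start with a single mon and want to grow the quorum later, ceph-deploy can also add monitors to a running cluster; a sketch, assuming hypothetical hosts ceph-mon2 and ceph-mon3 are already reachable on the public network:

 # add monitors to the existing cluster one at a time
 ceph-deploy mon add ceph-mon2
 ceph-deploy mon add ceph-mon3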

Before you can provision a host to run OSDs or metadata servers, you must gather monitor keys and the OSD and MDS bootstrap keyrings. To gather keys, enter the following:

ceph-deploy gatherkeys ceph-mon1
# when no longer using ceph-deploy, or when restarting the install: ceph-deploy forgetkeys

Create the OSDs

Check the disks and prepare them all for Ceph OSD installation

  ceph-deploy disk list ceph-osd1
  ceph-deploy disk zap ceph-osd1:sd{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,aa,ab,ac,ad,ae,af,ag,ah,ai}
  ceph-deploy osd prepare ceph-osd1:sd{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v}:/dev/sdd # last entry is the journal for the OSDs, usually an SSD. 
# on one occasion, the prepare didn't work (firewall issue), so follow-up commands had to be issued:
# ceph-deploy osd activate ceph-osd1:sdc1:sda1 # note: prepare would have stated /dev/sda (block device), and activate takes /dev/sda1 (partition)
#  ceph-deploy osd activate ceph-osd2:sdc1:sda1
#  ceph-deploy osd activate ceph-osd3:sdc1:sda1
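
Zapping and preparing the same layout on every OSD host could be looped rather than typed per host; a hypothetical sketch (the sd{a,b,c} disk list and the /dev/sdd journal device are placeholders, match them to your hardware):

 # repeat the zap/prepare cycle across the OSD hosts
 for host in ceph-osd{1,2,3,4,5}; do
     ceph-deploy disk zap ${host}:sd{a,b,c}
     ceph-deploy osd prepare ${host}:sd{a,b,c}:/dev/sdd
 done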

Monitor the system while loading;

[root@ceph-mon1 ~]# ceph -w
    cluster 9180ea1b-1342-479c-b7b4-63296f195d1c
     health HEALTH_WARN
            too few PGs per OSD (1 < min 30)
     monmap e1: 1 mons at {ceph-mon1=172.28.55.6:6789/0}
            election epoch 2, quorum 0 ceph-mon1
     osdmap e1001: 122 osds: 121 up, 121 in
      pgmap v3263: 64 pgs, 1 pools, 0 bytes data, 0 objects
            10714 MB used, 659 TB / 659 TB avail
                  64 active+clean
# lots of output about balancing
2016-03-03 17:19:51.843294 mon.0 [INF] pgmap v3272: 4160 pgs: 440 creating, 2844 creating+peering, 789 creating+activating, 87 active+clean; 0 bytes data, 10737 MB used, 659 TB / 659 TB avail
2016-03-03 17:19:53.261868 mon.0 [INF] pgmap v3273: 4160 pgs: 2 active, 1421 creating+peering, 2507 creating+activating, 230 active+clean; 0 bytes data, 10772 MB used, 659 TB / 659 TB avail
2016-03-03 17:19:54.332496 mon.0 [INF] pgmap v3274: 4160 pgs: 2 active, 1351 creating+peering, 2573 creating+activating, 234 active+clean; 0 bytes data, 10773 MB used, 659 TB / 659 TB avail

You'll need to give Ceph a while to balance. Check the output of the following to verify all OSDs are up;

[root@ceph-mon1 ~]# ceph osd tree
ID  WEIGHT    TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY 
 -1 659.89941 root default                                         
 -2 120.11981     host ceph-osd1                                   
  0   5.45999         osd.0           up  1.00000          1.00000 
  1   5.45999         osd.1           up  1.00000          1.00000 
  2   5.45999         osd.2           up  1.00000          1.00000 
  3   5.45999         osd.3           up  1.00000          1.00000 
  4   5.45999         osd.4           up  1.00000          1.00000 
  5   5.45999         osd.5           up  1.00000          1.00000 
  6   5.45999         osd.6           up  1.00000          1.00000 
  7   5.45999         osd.7           up  1.00000          1.00000 
  8   5.45999         osd.8           up  1.00000          1.00000 
  9   5.45999         osd.9           up  1.00000          1.00000 
 10   5.45999         osd.10          up  1.00000          1.00000 
 11   5.45999         osd.11          up  1.00000          1.00000 
 12   5.45999         osd.12          up  1.00000          1.00000 
 13   5.45999         osd.13          up  1.00000          1.00000 
 14   5.45999         osd.14          up  1.00000          1.00000 
 15   5.45999         osd.15          up  1.00000          1.00000 
 16   5.45999         osd.16          up  1.00000          1.00000 
 17   5.45999         osd.17          up  1.00000          1.00000 
 18   5.45999         osd.18          up  1.00000          1.00000 
 19   5.45999         osd.19          up  1.00000          1.00000 
 20   5.45999         osd.20          up  1.00000          1.00000 
 21   5.45999         osd.21          up  1.00000          1.00000 
 -3 125.57980     host ceph-osd2                                   
 22   5.45999         osd.22          up  1.00000          1.00000 
 23   5.45999         osd.23          up  1.00000          1.00000 
 24   5.45999         osd.24          up  1.00000          1.00000 
 25   5.45999         osd.25          up  1.00000          1.00000 
 26   5.45999         osd.26          up  1.00000          1.00000 
 27   5.45999         osd.27          up  1.00000          1.00000 
 28   5.45999         osd.28          up  1.00000          1.00000 
 29   5.45999         osd.29          up  1.00000          1.00000 
 30   5.45999         osd.30          up  1.00000          1.00000 
 31   5.45999         osd.31          up  1.00000          1.00000 
 32   5.45999         osd.32          up  1.00000          1.00000 
 33   5.45999         osd.33          up  1.00000          1.00000 
 34   5.45999         osd.34          up  1.00000          1.00000 
 35   5.45999         osd.35          up  1.00000          1.00000 
 36   5.45999         osd.36          up  1.00000          1.00000 
 39   5.45999         osd.39          up  1.00000          1.00000 
 42   5.45999         osd.42          up  1.00000          1.00000 
 46   5.45999         osd.46          up  1.00000          1.00000 
 49   5.45999         osd.49          up  1.00000          1.00000 
 52   5.45999         osd.52          up  1.00000          1.00000 
 55   5.45999         osd.55          up  1.00000          1.00000 
 58   5.45999         osd.58          up  1.00000          1.00000 
 61   5.45999         osd.61          up  1.00000          1.00000 
 -4 136.24992     host ceph-osd3                                   
 37   5.45000         osd.37          up  1.00000          1.00000 
 40   5.45000         osd.40          up  1.00000          1.00000 
 43   5.45000         osd.43          up  1.00000          1.00000 
 45   5.45000         osd.45          up  1.00000          1.00000 
 48   5.45000         osd.48          up  1.00000          1.00000 
 51   5.45000         osd.51          up  1.00000          1.00000 
 54   5.45000         osd.54          up  1.00000          1.00000 
 57   5.45000         osd.57          up  1.00000          1.00000 
 60   5.45000         osd.60          up  1.00000          1.00000 
 63   5.45000         osd.63          up  1.00000          1.00000 
 65   5.45000         osd.65          up  1.00000          1.00000 
 67   5.45000         osd.67          up  1.00000          1.00000 
 69   5.45000         osd.69          up  1.00000          1.00000 
 71   5.45000         osd.71          up  1.00000          1.00000 
 88   5.45000         osd.88          up  1.00000          1.00000 
 91   5.45000         osd.91          up  1.00000          1.00000 
 94   5.45000         osd.94          up  1.00000          1.00000 
 96   5.45000         osd.96          up  1.00000          1.00000 
 97   5.45000         osd.97          up  1.00000          1.00000 
 98   5.45000         osd.98          up  1.00000          1.00000 
 99   5.45000         osd.99          up  1.00000          1.00000 
101   5.45000         osd.101         up  1.00000          1.00000 
103   5.45000         osd.103         up  1.00000          1.00000 
105   5.45000         osd.105         up  1.00000          1.00000 
107   5.45000         osd.107         up  1.00000          1.00000 
 -5 141.69992     host ceph-osd4                                   
 38   5.45000         osd.38          up  1.00000          1.00000 
 41   5.45000         osd.41          up  1.00000          1.00000 
 44   5.45000         osd.44          up  1.00000          1.00000 
 47   5.45000         osd.47          up  1.00000          1.00000 
 50   5.45000         osd.50          up  1.00000          1.00000 
 53   5.45000         osd.53          up  1.00000          1.00000 
 56   5.45000         osd.56          up  1.00000          1.00000 
 59   5.45000         osd.59          up  1.00000          1.00000 
 62   5.45000         osd.62          up  1.00000          1.00000 
 64   5.45000         osd.64          up  1.00000          1.00000 
 66   5.45000         osd.66          up  1.00000          1.00000 
 68   5.45000         osd.68          up  1.00000          1.00000 
 70   5.45000         osd.70          up  1.00000          1.00000 
 72   5.45000         osd.72          up  1.00000          1.00000 
 73   5.45000         osd.73          up  1.00000          1.00000 
 75   5.45000         osd.75          up  1.00000          1.00000 
 76   5.45000         osd.76          up  1.00000          1.00000 
 77   5.45000         osd.77          up  1.00000          1.00000 
 79   5.45000         osd.79          up  1.00000          1.00000 
 81   5.45000         osd.81          up  1.00000          1.00000 
 83   5.45000         osd.83          up  1.00000          1.00000 
 85   5.45000         osd.85          up  1.00000          1.00000 
 87   5.45000         osd.87          up  1.00000          1.00000 
 90   5.45000         osd.90          up  1.00000          1.00000 
 93   5.45000         osd.93          up  1.00000          1.00000 
 95   5.45000         osd.95          up  1.00000          1.00000 
 -6 136.24992     host ceph-osd5                                   
 78   5.45000         osd.78          up  1.00000          1.00000 
 80   5.45000         osd.80          up  1.00000          1.00000 
 82   5.45000         osd.82          up  1.00000          1.00000 
 84   5.45000         osd.84          up  1.00000          1.00000 
 86   5.45000         osd.86          up  1.00000          1.00000 
 89   5.45000         osd.89          up  1.00000          1.00000 
 92   5.45000         osd.92          up  1.00000          1.00000 
100   5.45000         osd.100         up  1.00000          1.00000 
102   5.45000         osd.102         up  1.00000          1.00000 
104   5.45000         osd.104         up  1.00000          1.00000 
106   5.45000         osd.106         up  1.00000          1.00000 
108   5.45000         osd.108         up  1.00000          1.00000 
109   5.45000         osd.109         up  1.00000          1.00000 
110   5.45000         osd.110         up  1.00000          1.00000 
111   5.45000         osd.111         up  1.00000          1.00000 
112   5.45000         osd.112         up  1.00000          1.00000 
113   5.45000         osd.113         up  1.00000          1.00000 
114   5.45000         osd.114         up  1.00000          1.00000 
115   5.45000         osd.115         up  1.00000          1.00000 
116   5.45000         osd.116         up  1.00000          1.00000 
117   5.45000         osd.117         up  1.00000          1.00000 
118   5.45000         osd.118         up  1.00000          1.00000 
119   5.45000         osd.119         up  1.00000          1.00000 
120   5.45000         osd.120         up  1.00000          1.00000 
121   5.45000         osd.121         up  1.00000          1.00000 
 74         0 osd.74                down        0          1.00000

And check ceph status for the health of the ceph cluster

[root@ceph-mon1 ~]# ceph status
    cluster 9180ea1b-1342-479c-b7b4-63296f195d1c
     health HEALTH_WARN
            too few PGs per OSD (1 < min 30)
     monmap e1: 1 mons at {ceph-mon1=172.28.55.6:6789/0}
            election epoch 2, quorum 0 ceph-mon1
     osdmap e1001: 122 osds: 121 up, 121 in
      pgmap v3263: 64 pgs, 1 pools, 0 bytes data, 0 objects
            10714 MB used, 659 TB / 659 TB avail
                  64 active+clean

Set up the PGs (Placement Groups)

At this stage we are still getting a warning about the PGs (placement groups). The default pg_num is 64 and we have over 120 OSDs in this configuration. The placement group page, http://docs.ceph.com/docs/master/rados/operations/placement-groups/, suggests we need a much larger PG number, so let's increase it (or add another pool).

How many PGs you need for a pool:

Total PGs = (OSDs * 100) / Replicas

[root@ceph-mon1 ~]# ceph osd stat
     osdmap e1004: 122 osds: 121 up, 121 in

Applying the formula gives us (121 * 100) / 3 = 4033

Now round this value up to the next power of 2; this gives the number of PGs you should have for a pool with a replication size of 3 and 121 OSDs in the cluster.

Final Value = 4096 PG
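
The same arithmetic as a one-liner, using the stock Python on CentOS 7:

 # (osds * 100 / replicas) rounded up to the next power of two
 osds=121; replicas=3
 python -c "import math; pgs=$osds*100/$replicas; print(2**int(math.ceil(math.log(pgs,2))))"
 # -> 4096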

Check the current (default) pools

[root@ceph-mon1 ~]# ceph osd lspools
0 rbd,
[root@ceph-mon1 ~]# ceph osd dump  | grep rbd
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
[root@ceph-mon1 ~]# ceph osd pool get rbd pg_num
pg_num: 64

From the above we can see:

  1. replication factor 3 (default value)
  2. pg_num 64 (default)
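
The "increase it" route mentioned earlier would grow the existing rbd pool in place instead of adding a pool; a sketch (large jumps in pg_num may need to be done in stages):

 ceph osd pool set rbd pg_num 4096
 ceph osd pool set rbd pgp_num 4096  # pgp_num must follow pg_num or the data won't rebalance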

Create a new pool (with the optimal PGs)

Let's create another pool (with a large pg_num), then give it a few minutes and check the health again:

[root@ceph-mon1 ~]# ceph osd pool create pooldp 4096
pool 'pooldp' created
[root@ceph-mon1 ~]# ceph -w 
    cluster 9180ea1b-1342-479c-b7b4-63296f195d1c
     health HEALTH_WARN
            2527 pgs peering
            4096 pgs stuck inactive
            4096 pgs stuck unclean
     monmap e1: 1 mons at {ceph-mon1=172.28.55.6:6789/0}
            election epoch 2, quorum 0 ceph-mon1
     osdmap e1004: 122 osds: 121 up, 121 in
      pgmap v3271: 4160 pgs, 2 pools, 0 bytes data, 0 objects
            10721 MB used, 659 TB / 659 TB avail
                2527 creating+peering
                1219 creating
                 350 creating+activating
                  64 active+clean

2016-03-03 17:19:48.468333 mon.0 [INF] pgmap v3270: 4160 pgs: 1389 creating, 2371 creating+peering, 336 creating+activating, 64 active+clean; 0 bytes data, 10720 MB used, 659 TB / 659 TB avail
2016-03-03 17:19:50.586989 mon.0 [INF] pgmap v3271: 4160 pgs: 1219 creating, 2527 creating+peering, 350 creating+activating, 64 active+clean; 0 bytes data, 10721 MB used, 659 TB / 659 TB avail
2016-03-03 17:19:51.843294 mon.0 [INF] pgmap v3272: 4160 pgs: 440 creating, 2844 creating+peering, 789 creating+activating, 87 active+clean; 0 bytes data, 10737 MB used, 659 TB / 659 TB avail
2016-03-03 17:19:53.261868 mon.0 [INF] pgmap v3273: 4160 pgs: 2 active, 1421 creating+peering, 2507 creating+activating, 230 active+clean; 0 bytes data, 10772 MB used, 659 TB / 659 TB avail
2016-03-03 17:19:54.332496 mon.0 [INF] pgmap v3274: 4160 pgs: 2 active, 1351 creating+peering, 2573 creating+activating, 234 active+clean; 0 bytes data, 10773 MB used, 659 TB / 659 TB avail
2016-03-03 17:19:56.689413 mon.0 [INF] pgmap v3275: 4160 pgs: 2 active, 756 creating+peering, 3084 creating+activating, 318 active+clean; 0 bytes data, 10784 MB used, 659 TB / 659 TB avail
2016-03-03 17:19:58.209627 mon.0 [INF] pgmap v3276: 4160 pgs: 38 active, 148 creating+peering, 2457 creating+activating, 1517 active+clean; 0 bytes data, 10823 MB used, 659 TB / 659 TB avail
2016-03-03 17:19:59.274988 mon.0 [INF] pgmap v3277: 4160 pgs: 50 active, 4 creating+peering, 2401 creating+activating, 1705 active+clean; 0 bytes data, 10829 MB used, 659 TB / 659 TB avail
2016-03-03 17:20:01.569975 mon.0 [INF] pgmap v3278: 4160 pgs: 50 active, 2 creating+peering, 2009 creating+activating, 2099 active+clean; 0 bytes data, 10834 MB used, 659 TB / 659 TB avail
2016-03-03 17:20:02.814976 mon.0 [INF] pgmap v3279: 4160 pgs: 13 active, 454 creating+activating, 3693 active+clean; 0 bytes data, 10860 MB used, 659 TB / 659 TB avail
2016-03-03 17:20:03.886496 mon.0 [INF] pgmap v3280: 4160 pgs: 39 creating+activating, 4121 active+clean; 0 bytes data, 10868 MB used, 659 TB / 659 TB avail
2016-03-03 17:20:06.025433 mon.0 [INF] pgmap v3281: 4160 pgs: 34 creating+activating, 4126 active+clean; 0 bytes data, 10869 MB used, 659 TB / 659 TB avail
2016-03-03 17:20:07.187744 mon.0 [INF] pgmap v3282: 4160 pgs: 6 creating+activating, 4154 active+clean; 0 bytes data, 10870 MB used, 659 TB / 659 TB avail
2016-03-03 17:20:08.283504 mon.0 [INF] pgmap v3283: 4160 pgs: 4160 active+clean; 0 bytes data, 10870 MB used, 659 TB / 659 TB avail
^C
[root@ceph-mon1 ~]# ceph status
    cluster 9180ea1b-1342-479c-b7b4-63296f195d1c
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon1=172.28.55.6:6789/0}
            election epoch 2, quorum 0 ceph-mon1
     osdmap e1004: 122 osds: 121 up, 121 in
      pgmap v3283: 4160 pgs, 2 pools, 0 bytes data, 0 objects
            10870 MB used, 659 TB / 659 TB avail
                4160 active+clean
[root@ceph-mon1 ~]#

Good, we now have a healthy Ceph cluster!

Benchmark your Ceph Cluster (RADOS / Objects)

Ceph includes the rados bench command, designed specifically to benchmark a RADOS storage cluster. To use it, create a storage pool and then use rados bench to perform a write benchmark, as shown below.

The rados command is included with Ceph.

[root@ceph-mon1 ~]# ceph osd pool create scbench 100 100
pool 'scbench' created
[root@ceph-mon1 ~]# rados bench -p scbench 60 write --no-cleanup
 Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or 0 objects
 Object prefix: benchmark_data_ceph-mon1_8522
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        24         8   31.9888        32  0.488635  0.679065
     2      16        48        32   63.9853        96  0.925952  0.710474
     3      16        75        59    78.652       108  0.800361  0.705261
     4      16        98        82   81.9867        92   0.89478  0.693056
     5      16       124       108   86.3872       104  0.482141  0.674745
     6      16       146       130   86.6546        88   0.50371  0.670405
     7      16       171       155   88.5595       100   0.38974  0.664332
     8      16       196       180   89.9882       100  0.800197  0.678677
     9      16       216       200   88.8775        80  0.219502  0.662146
    10      16       240       224   89.5888        96  0.560353  0.681194
    11      16       266       250    90.898       104  0.590875  0.674103
    12      16       290       274   91.3225        96  0.362163  0.677343
    13      16       315       299   91.9892       100   0.45172  0.671456
    14      16       341       325   92.8464       104   1.52322  0.668701
    15      16       364       348   92.7895        92  0.342974  0.672035
    16      16       386       370   92.4896        88  0.415546  0.668979
    17      16       414       398   93.6366       112  0.380463  0.671389
    18      16       438       422   93.7675        96  0.507864  0.667457
    19      16       465       449   94.5161       108  0.955241  0.662551
2016-03-03 18:07:07.368978min lat: 0.219502 max lat: 2.02651 avg lat: 0.659791
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    20      16       490       474   94.7898       100  0.326693  0.659791
    21      16       515       499   95.0373       100  0.454321  0.659829
    22      16       542       526    95.626       108  0.484665  0.656761
    23      16       566       550   95.6419        96  0.438458  0.656717
    24      16       591       575   95.8231       100  0.367075  0.653703
    25      16       614       598   95.6689        92  0.344375  0.654561
    26      16       644       628   96.6044       120  0.489909  0.653584
    27      16       668       652   96.5816        96  0.345536  0.653503
    28      16       693       677   96.7033       100  0.440913  0.654526
    29      16       715       699    96.403        88  0.387208  0.652397
    30      16       742       726   96.7892       108   1.08176  0.653314
    31      16       762       746   96.2474        80  0.497605  0.649928
    32      16       789       773   96.6143       108  0.495238   0.65265
    33      16       816       800    96.959       108  0.335065  0.653864
    34      16       839       823   96.8129        92  0.833682  0.650921
    35      16       866       850   97.1322       108  0.750654   0.65168
    36      16       888       872   96.8783        88  0.628799  0.651708
    37      16       911       895   96.7461        92  0.387538  0.651389
    38      16       938       922    97.042       108  0.339795  0.653065
    39      16       964       948   97.2201       104  0.488341  0.652364
2016-03-03 18:07:27.371173min lat: 0.219502 max lat: 2.02651 avg lat: 0.651527
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    40      16       989       973   97.2894       100  0.502734  0.651527
    41      16      1013       997   97.2577        96   1.05962  0.652523
    42      16      1035      1019   97.0371        88  0.470472  0.652811
    43      16      1061      1045   97.1988       104   1.00319  0.652995
    44      16      1085      1069   97.1714        96  0.817875  0.651415
    45      16      1113      1097   97.5006       112   0.58483   0.65208
    46      16      1139      1123   97.6417       104  0.275317  0.650259
    47      16      1162      1146   97.5215        92  0.334055  0.649269
    48      16      1190      1174   97.8229       112  0.486993  0.649918
    49      16      1211      1195   97.5406        84   0.60307  0.649903
    50      16      1237      1221   97.6696       104  0.492317  0.651083
    51      16      1263      1247   97.7936       104   0.74506   0.65062
    52      16      1285      1269   97.6051        88   1.03266  0.649712
    53      16      1310      1294   97.6501       100  0.426457  0.649229
    54      16      1335      1319   97.6934       100  0.552609  0.649796
    55      16      1361      1345   97.8079       104  0.423608  0.649596
    56      16      1380      1364   97.4184        76  0.965162  0.651008
    57      16      1408      1392    97.674       112  0.523899  0.650617
    58      16      1430      1414   97.5071        88  0.417439  0.650343
    59      16      1454      1438   97.4814        96  0.467891  0.650477
2016-03-03 18:07:47.373053min lat: 0.219502 max lat: 2.02651 avg lat: 0.651066
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    60      16      1481      1465   97.6565       108  0.939101  0.651066
 Total time run:         60.510059
Total writes made:      1482
Write size:             4194304
Bandwidth (MB/sec):     97.967 

Stddev Bandwidth:       17.5802
Max bandwidth (MB/sec): 120
Min bandwidth (MB/sec): 0
Average Latency:        0.652469
Stddev Latency:         0.27032
Max latency:            2.02651
Min latency:            0.219502

Which seems about right for a 1 Gbit link; ~98 MB/s works out to roughly 780 Mbit/s on the wire.

This creates a new pool named 'scbench' and then performs a write benchmark for 60 seconds. Notice the --no-cleanup option, which leaves behind some data. The output gives you a good indicator of how fast your cluster can write data.

Two types of read benchmarks are available: seq for sequential reads and rand for random reads. To perform a read benchmark, use the commands below:

 rados bench -p scbench 10 seq # hmm this crashes on hammer?!
 rados bench -p scbench 10 rand # this one runs ok

You can also add the -t parameter to increase the concurrency of reads and writes (defaults to 16 threads), or the -b parameter to change the size of the object being written (defaults to 4 MB). It's also a good idea to run multiple copies of this benchmark against different pools, to see how performance changes with multiple clients.
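
For example (the values here are arbitrary):

 rados bench -p scbench 60 write -t 32 --no-cleanup       # 32 concurrent ops instead of 16
 rados bench -p scbench 60 write -b 1048576 --no-cleanup  # 1 MB objects instead of 4 MB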

You can clean up the benchmark data left behind by the write benchmark with this command:

rados -p scbench cleanup

Benchmark your Ceph Cluster (RBD/Block)

Let's set up the block device benchmark:

[root@ceph-mon1 ~]# ceph osd pool create rbdbench 100 100
pool 'rbdbench' created
[root@ceph-mon1 ~]# rbd create image01 --size 1024 --pool rbdbench
[root@ceph-mon1 ~]# rbd map image01 --pool rbdbench --name client.admin
/dev/rbd0
[root@ceph-mon1 ~]# ls /dev/rbd
rbd/  rbd0  
[root@ceph-mon1 ~]# ls /dev/rbd/rbdbench/image01 -lah
lrwxrwxrwx 1 root root 10 Mar  3 18:25 /dev/rbd/rbdbench/image01 -> ../../rbd0
[root@ceph-mon1 ~]# mkfs.ext4 -m0 /dev/rbd0
# snip
[root@ceph-mon1 ~]# mkdir /mnt/ceph-block-device
[root@ceph-mon1 ~]# mount /dev/rbd/rbdbench/image01 /mnt/ceph-block-device
[root@ceph-mon1 ~]# df -h
# snip
/dev/rbd0                         976M  2.6M  958M   1% /mnt/ceph-block-device

And run the rbd tests;

[root@ceph-mon1 ~]# rbd bench-write image01 --pool=rbdbench
bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern seq
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     13023  13039.19  53408507.58
    2     26663  13116.23  53724063.44
    3     41471  13829.29  56644767.46
    4     55679  13754.19  56337163.38
    5     69292  13695.71  56097642.82
    6     82949  13956.94  57167624.18
    7     96818  14124.95  57855797.69
    8    110524  13692.56  56084731.58
    9    125270  13974.68  57240275.09
   10    137981  13874.42  56829606.72
   11    151636  13722.28  56206447.11
   12    164513  13541.25  55464962.50
   13    177053  13421.49  54974406.85
   14    190050  13032.00  53379086.79
   15    204131  13260.21  54313806.63
   16    218337  13346.96  54669135.15
   17    233748  13847.05  56717512.07
   18    246706  13930.60  57059725.75
   19    260079  13985.05  57282745.63
elapsed:    19  ops:   262144  ops/sec: 13395.91  bytes/sec: 54869667.12
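
When you're done with the block-device tests, the image can be unmounted and unmapped again (cleanup sketch):

 umount /mnt/ceph-block-device
 rbd unmap /dev/rbd0              # release the kernel mapping created by 'rbd map'
 rbd rm image01 --pool rbdbench   # optionally delete the test image too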

Let's try fio (which now has rbd support); we'll need to build the latest version with rbd support:

yum groupinstall 'Development tools'
yum install librbd1.x86_64 librbd1-devel.x86_64
yum install zlib-devel
yum install git
git clone git://git.kernel.dk/fio.git
cd fio/
./configure
# ensure the following appear in the configure output:
Rados Block Device engine     yes
rbd_invalidate_cache          yes
make
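
To double-check that the rbd engine actually got compiled in, fio's --enghelp flag lists the available ioengines:

 ./fio --enghelp | grep rbd  # should print 'rbd'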

With fio built and ready;

[root@ceph-mon1 fio]# cat rbd.fio 
[global]
ioengine=rbd
clientname=admin
pool=rbdbench
rbdname=image01
rw=randwrite
bs=4k

[rbd_iodepth32]
iodepth=32
[root@ceph-mon1 fio]# ./fio ./rbd.fio 
rbd_iodepth32: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.6-21-g8116
Starting 1 process
rbd engine: RBD version: 0.1.9
Jobs: 1 (f=1): [w(1)] [15.6% done] [0KB/3244KB/0KB /s] [0/811/0 iops] [eta 10m:48s]
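
To benchmark reads with the same setup, a hypothetical variant of the job file only needs rw changed (e.g. rw=randread); everything else, including the pool and image names, stays the same:

 [global]
 ioengine=rbd
 clientname=admin
 pool=rbdbench
 rbdname=image01
 rw=randread
 bs=4k

 [rbd_iodepth32]
 iodepth=32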