Lustre: Installing and setting up a RobinHood server (RBH)

From Define Wiki
Revision as of 20:34, 5 September 2016 by Shanal

Enabling the HSM Coordinator for an Existing File System

The HSM Coordinator can be enabled at any time after a file system has been created, provided that the file system is based on Lustre 2.5.0 or newer, the Metadata service is online, and the MDT and MGT are mounted.

  1. Log in to the Intel® Manager for Lustre dashboard as a superuser.
  2. Navigate to the File Systems window: Click Configuration > File Systems.
  3. Select the file system for which HSM support is to be enabled from the list of file systems.
  4. Under Metadata Target, click the name of the metadata target.
  5. Click the Advanced tab and look for the property hsm_control.
  6. Enter the string enabled into the text field.
  7. Click the Apply button. The background will briefly flash to grey then back to white to indicate that the setting has been applied. Click Close when done.

Verifying the Current Status of the Coordinator

The most reliable way to verify the status of the HSM Coordinator process is to log in to the metadata server that currently has the MDT mounted and run the following command:

 lctl get_param mdt.*.hsm_control

For example:

  [root@mds2 ~]#  lctl get_param mdt.*.hsm_control
  mdt.lfs-MDT0000.hsm_control=stopped
  # Enable the coordinator as described in the previous section, then check again:
  [root@mds2 ~]#  lctl get_param mdt.*.hsm_control
  mdt.lfs-MDT0000.hsm_control=enabled
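
On a file system with several MDTs, the check above can be scripted. The following is a minimal sketch, not part of the original procedure: the function name hsm_not_enabled is hypothetical, and it assumes the output has the "mdt.<fsname>-MDT<index>.hsm_control=<state>" shape shown above.

```shell
#!/bin/bash
# Hypothetical helper: reads "lctl get_param mdt.*.hsm_control" output on
# stdin and prints every MDT whose coordinator is not in the "enabled" state.
# Exit status is 0 if all MDTs are enabled, 1 otherwise.
hsm_not_enabled() {
    awk -F'[.=]' '
        # Assumed line shape: mdt.<fsname>-MDT<index>.hsm_control=<state>
        $1 == "mdt" && $3 == "hsm_control" {
            if ($4 != "enabled") { print $2, $4; bad = 1 }
        }
        END { exit bad }
    '
}
```

Typical use on the MDS would be: lctl get_param mdt.*.hsm_control | hsm_not_enabled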

Configure MDS Changelogs

Robinhood relies on the MDT Changelogs feature in Lustre in order to track changes to the file system. The Changelog records changes to the file system's metadata, such as creating and deleting files, modifying a file's content, ownership, permissions, and other attributes. Robinhood captures this data and stores it in a MySQL database for processing by its policy engine. Applications that want to consume the Changelogs content need to register with the MDT, which will return a unique identifier, referred to as a userid. This is not the same as a UNIX user account identifier; it is internal to Lustre and is a token used to distinguish registrations from multiple consumers. Each registered userid has its own view of the Changelog register. Changelog entries are kept until each registered user has acknowledged that the entry has been consumed.

# Run on the server that has the MGT mounted. The * wildcard stands in for the fsname, which assumes there is only one file system.
[root@mds2 ~]# lctl get_param  mdd.*-MDT*.changelog_mask 
mdd.lfs-MDT0000.changelog_mask=    
MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO OPEN LYOUT TRUNC SATTR XATTR HSM MTIME CTIME 
[root@mds2 ~]# lctl set_param  -P mdd.*-MDT*.changelog_mask="all-XATTR-MARK-ATIME"
mdd.lfs-MDT0000.changelog_mask=all-XATTR-MARK-ATIME 
[root@mds2 ~]# lctl get_param  mdd.*-MDT*.changelog_mask
mdd.lfs-MDT0000.changelog_mask= 
CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO OPEN CLOSE LYOUT TRUNC SATTR HSM MTIME CTIME 
# MARK and XATTR have been removed from the mask, and CLOSE (included in "all") has been added.
# Note: without -P the change is not persistent (and without -P the command can be run on the MDS).
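
To confirm that the record types you care about are actually in the mask, the get_param output can be checked with a small function. This is only a sketch: the name missing_changelog_types is hypothetical, and it assumes a single MDT (with several MDTs the masks would be merged into one list).

```shell
#!/bin/bash
# Hypothetical helper: prints each requested record type that is absent
# from the changelog mask.
# stdin: output of "lctl get_param mdd.*-MDT*.changelog_mask"
# args:  record types to look for, e.g. CREAT UNLNK HSM
missing_changelog_types() {
    local mask type
    # Drop the "mdd...changelog_mask=" prefix, then flatten newlines and
    # tabs to spaces so every type is surrounded by spaces.
    mask=" $(sed 's/^[^=]*=//' | tr '\n\t' '  ') "
    for type in "$@"; do
        case "$mask" in
            *" $type "*) ;;           # present in the mask
            *) echo "$type" ;;        # missing from the mask
        esac
    done
}
```

For example: lctl get_param mdd.*-MDT*.changelog_mask | missing_changelog_types CREAT UNLNK HSM
An empty result means every listed type is being recorded.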

Register a Changelog user identifier on the MDS and keep a record of the name of the userid that is returned (normally this will be "cl1", unless there is more than one Changelog registration):

# lctl --device <fsname>-<MDT index> changelog_register
[root@mds2 ~]# lctl --device lfs-MDT0000 changelog_register 
lfs-MDT0000: Registered changelog userid 'cl1'
# Note: this is not a UNIX user account; it is an identifier used to track Lustre changelog events.
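
Since the returned userid is needed later as reader_id in the Robinhood configuration, it can be captured when registering rather than copied by hand. A minimal sketch, assuming the message format shown above; the function name changelog_userid is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extracts the userid from the output of
# "lctl --device <fsname>-<MDT index> changelog_register".
# Prints nothing if the expected message is not found.
changelog_userid() {
    sed -n "s/.*Registered changelog userid '\([^']*\)'.*/\1/p"
}
```

For example: reader_id="$(lctl --device lfs-MDT0000 changelog_register | changelog_userid)"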

Set Up the RobinHood Server (RBH)

Server Setup

Provision the server through the IML interface: Configuration -> Servers -> Add Servers -> enter the server details -> Add -> select the profile to deploy (Robinhood policy engine).

Quick note: I got an error about a missing file initramfs-.img, so I created a symlink to get around it. It is as if the installer is not picking up the installed kernel version.

Verify that RBH has been set up correctly

[root@x10-ft-02 boot]# rpm -ql robinhood-adm
/usr/sbin/rbh-config

Set Up the Robinhood Database

# Basic MySQL setup 
/etc/init.d/mysqld start
/usr/bin/mysql_secure_installation
chkconfig mysqld on

# Once the above is done, check that the MySQL database is ready
[root@x10-ft-02 boot]# rbh-config precheck_db
Checking system configuration...
mysqladmin command OK.
mysql_config command OK.
MySQL version is 5.
mysqld service OK.
mysql command OK.

Create the RBH Database

Without any additional arguments, rbh-config create_db will enter an interactive mode and will ask for the following information:

  1. A unique identifier for the database instance (use the file system name).
  2. A list of hosts that can access the database (this uses SQL formatting, including wildcards, and forms part of the SQL GRANT command; SQL syntax rules apply). For systems where Robinhood and MySQL are co-located, set to localhost. Set to % to allow any host access to the database.
  3. A password for the database user. Robinhood will connect to the database as user robinhood with the password you enter here.
  4. The root user password for the MySQL database (required in order to be able to create the new database and grant privileges).
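
The host list in item 2 uses SQL wildcards, which can be surprising if you are used to shell globs. The sketch below illustrates how a pattern such as "cluster%" matches host names by translating it to a shell glob. It is an approximation for illustration only: the function name host_matches_grant is hypothetical, the comparison is case-sensitive here, and netmask-style host specifications are not handled.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if host name $2 matches the SQL-style
# host pattern $1 ("%" = any sequence of characters, "_" = any single one).
host_matches_grant() {
    local glob
    # Translate the SQL wildcards to shell globs: % -> *, _ -> ?
    glob="$(printf '%s' "$1" | sed 's/%/*/g; s/_/?/g')"
    case "$2" in
        $glob) return 0 ;;   # unquoted on purpose so it is used as a pattern
        *)     return 1 ;;
    esac
}
```

For example, host_matches_grant 'cluster%' cluster01 succeeds, while host_matches_grant 'cluster%' localhost fails.
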

# Create the RBH database
[root@x10-ft-02 boot]# rbh-config create_db
Checking system configuration...
mysqladmin command OK.
mysql_config command OK.
MySQL version is 5.
mysqld service OK.
mysql command OK.

Enter a custom identifier for your filesystem. E.g. lustre
fsname (max 8 chars): lfs

Enter hosts where robinhood commands will run. E.g. localhost
You can use '%' as wildcard: "%" for all hosts, "cluster%" for nodes starting with 'cluster'...
hosts: localhost

Choose a password for connecting to the database (user 'robinhood'). 
password: 
confirm password: 
Write this password to /etc/robinhood.d/.dbpassword file

Configuration summary:
- Database name: 'robinhood_lfs'
- Client hosts: 'localhost'
- Database user name: 'robinhood'

Do you agree? [y/N]y

Enter password for root's database account (leave blank if none is set):
root's DB password: 

Creating database 'robinhood_lfs'...
done

Setting access right for user 'robinhood'@'localhost'...
(notice: user robinhood must have SUPER privilege to create triggers)
Grants for robinhood@localhost
GRANT SUPER ON *.* TO 'robinhood'@'localhost' IDENTIFIED BY PASSWORD '*A3D1A5853B73542DD7CFBF300F4873BAD075D819'
GRANT ALL PRIVILEGES ON `robinhood_lfs`.* TO 'robinhood'@'localhost'

Testing connection to 'robinhood_lfs'...

Database sucessfully created!

Verify that the database is set up correctly

[root@x10-ft-02 boot]# mysql -p -h localhost -u robinhood
Enter password: 
mysql> show grants; 
mysql> use robinhood_lfs;
mysql> \q

# Worth making sure the password file is in order:
[root@x10-ft-02 ~]# vi /etc/robinhood.d/.dbpassword
[root@x10-ft-02 ~]# chmod 400 /etc/robinhood.d/.dbpassword 
[root@x10-ft-02 ~]# cat /etc/robinhood.d/.dbpassword 
robinhood
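
Because the file holds a database password in clear text, it is worth checking its permissions whenever it is edited. A minimal sketch, assuming GNU stat (as found on RHEL/CentOS); the function name check_dbpassword_file is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the password file exists, is not
# empty, and grants no access to group or others (e.g. mode 400 or 600).
check_dbpassword_file() {
    local file mode
    file="$1"
    [ -s "$file" ] || { echo "$file: missing or empty" >&2; return 1; }
    mode="$(stat -c '%a' "$file")"    # GNU stat: octal permission bits
    case "$mode" in
        [0-7]00) return 0 ;;
        *) echo "$file: mode $mode, expected owner-only access such as 400" >&2
           return 1 ;;
    esac
}
```

For example: check_dbpassword_file /etc/robinhood.d/.dbpassword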

Robinhood Initial Configuration

The configuration files for Robinhood Lustre HSM are kept in the directory /etc/robinhood.d/lhsm. An annotated sample configuration file is supplied as a template with the Robinhood package and can be found in /etc/robinhood.d/lhsm/templates. On startup, the init script for Robinhood will scan the configuration directory for any files ending in ".conf" and will create a new Robinhood management instance for each configuration file found. This means that a single Robinhood server might be running a policy engine daemon for several file systems simultaneously. It also means that care must be taken when managing the configuration files to ensure that a single file system is not accidentally managed by two conflicting configurations.
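
One way to guard against two configurations managing the same file system is to look for repeated fs_path values across the ".conf" files. The following is a sketch under the assumption that each configuration names its file system with an fs_path line in the General block; the function name duplicate_fs_paths is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: lists fs_path values that appear in more than one
# *.conf file under the given directory. Prints nothing if there are none.
duplicate_fs_paths() {
    local dir
    dir="$1"
    # Pull the quoted value out of lines such as:  fs_path = "/mnt/lfs" ;
    # Commented-out lines (starting with #) are not matched.
    sed -n 's/^[[:space:]]*fs_path[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' \
        "$dir"/*.conf 2>/dev/null | sort | uniq -d
}
```

For example: duplicate_fs_paths /etc/robinhood.d/lhsm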


The default template is large and documents many of the available options: /etc/robinhood.d/lhsm/templates/lhsm_detailed.conf

If this file is missing, a new one can be generated as follows:

 rbh-lhsm -T <filename>

For example:

 rbh-lhsm -T /var/tmp/rhtemplate.conf

The minimum configuration requires four sections:

  • General
  • Log
  • ListManager
  • ChangeLog

See the Intel documentation for an explanation of the configuration options. A fairly basic configuration based on our environment is shown below:

[root@x10-ft-02 lhsm]# pwd
/etc/robinhood.d/lhsm
[root@x10-ft-02 lhsm]# cat boston-lfs-lustre-hsm.conf 
General
{
    # File system to be monitored
    fs_path = "/mnt/lfs" ;
}

Log
{
    # Log file
    log_file = "/var/log/robinhood/lustre_hsm_demo.log" ;
    # File for reporting purge events
    report_file = "/var/log/robinhood/lustre_hsm_demo_reports.log" ;
    # Alerts file
    alert_file = "/var/log/robinhood/lustre_hsm_demo_alerts.log" ;
}

ListManager
{
    commit_behavior = transaction ;
    MySQL
    {
        server = "localhost" ;
        db = "robinhood_lfs" ;
        user = "robinhood" ;
        password_file = "/etc/robinhood.d/.dbpassword" ;
        engine = InnoDB ;
    }
}

ChangeLog
{
    MDT
    {
        mdt_name  = "MDT0000" ;
        reader_id = "cl1" ;
    }
    force_polling = ON ;
}
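
Before starting Robinhood, a quick check that a configuration file contains the four required blocks (General, Log, ListManager, ChangeLog) can catch copy-and-paste mistakes. This is a rough sketch, not a real parser: the function name missing_config_blocks is hypothetical, and it assumes each block name is spelled as in the sample and sits on its own line, optionally followed by "{".

```shell
#!/bin/bash
# Hypothetical helper: prints the required top-level blocks that are
# missing from a Robinhood configuration file.
missing_config_blocks() {
    local conf block
    conf="$1"
    for block in General Log ListManager ChangeLog; do
        # A block starts with its name on a line of its own; the opening
        # brace may follow on the same line or on the next one.
        grep -Eq "^[[:space:]]*$block[[:space:]]*(\{.*)?\$" "$conf" || echo "$block"
    done
}
```

For example: missing_config_blocks /etc/robinhood.d/lhsm/boston-lfs-lustre-hsm.conf
No output means all four blocks were found.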

Starting Robinhood for the first time

  • Log into the Robinhood policy engine host either as root or as a user that has superuser privileges granted by sudo.
  • Mount the Lustre file system.
  • Make sure that the directory for the log files exists before starting Robinhood:
   mkdir -m 0700 -p /var/log/robinhood
  • If the directory does not exist or is not writable by the Robinhood process, all log entries will be written to stderr.
  • Scan the target Lustre file system. This is necessary in order to populate the database with information about files that already exist on the file system. This only needs to be done the first time that Robinhood is started and should ideally be run when the file system is idle (not being used by any other processes):
   rbh-lhsm --scan --once -f /etc/robinhood.d/lhsm/boston-lfs-lustre-hsm.conf 
  • By default, the rbh-lhsm command will select the first configuration file it finds in the /etc/robinhood.d/lhsm directory. If there is more than one configuration, specify the individual file using the -f flag, as shown above.
  • Start up the Robinhood service:
   service robinhood-lhsm start
  • Verify that the service is running:
   service robinhood-lhsm status
  • Make sure that the Robinhood service is configured to start automatically on system boot:
   chkconfig robinhood-lhsm on
  • If there are any problems, examine the log files in /var/log/robinhood.
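
The first-start steps above can be collected into a small wrapper. This is a sketch only, assuming the Lustre file system is already mounted; the function names run and first_start and the DRY_RUN variable are hypothetical, while the commands themselves are the ones listed above. With DRY_RUN=1 the commands are printed instead of executed, which is a safe way to review them first.

```shell
#!/bin/bash
# Hypothetical wrapper: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: runs the first-start steps in order and stops at the
# first failure. $1 is the Robinhood configuration file to use.
first_start() {
    local conf
    conf="$1"
    run mkdir -m 0700 -p /var/log/robinhood &&
    run rbh-lhsm --scan --once -f "$conf" &&
    run service robinhood-lhsm start &&
    run service robinhood-lhsm status &&
    run chkconfig robinhood-lhsm on
}
```

For example: DRY_RUN=1 first_start /etc/robinhood.d/lhsm/boston-lfs-lustre-hsm.conf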

Using Robinhood to Create Reports

Create an activity report:

[root@x10-ft-02 lhsm]# rbh-lhsm-report -a
Using config file '/etc/robinhood.d/lhsm/boston-lfs-lustre-hsm.conf'.

Filesystem scan activity:

    Last filesystem scan:
            status:          done
            start:           2016/03/22 16:14:16
            end:             2016/03/22 16:14:16
            duration:        00s

         Statistics:
            entries scanned: 737
            errors:          0
            timeouts:        0
            # threads:       2
            average speed:   3636.36 entries/sec


Changelog stats:

        Last read record id:      1
        Last read record time:    2016/03/18 17:24:36.601480
        Last receive time:        2016/03/22 16:05:11
        Last committed record id: 0
        Changelog stats:
                 type            total 	(diff)	(rate)
                 MARK:               1
                CREAT:               0
                MKDIR:               0
                HLINK:               0
                SLINK:               0
                MKNOD:               0
                UNLNK:               0
                RMDIR:               0
                RENME:               0
                RNMTO:               0
                 OPEN:               0
                CLOSE:               0
                LYOUT:               0
                TRUNC:               0
                SATTR:               0
                XATTR:               0
                  HSM:               0
                MTIME:               0
                CTIME:               0
                ATIME:               0

Storage usage has never been checked

No migration was performed on this filesystem

High-Level Report

[root@x10-ft-02 lhsm]# rbh-lhsm-report --fs-info
Using config file '/etc/robinhood.d/lhsm/boston-lfs-lustre-hsm.conf'.
status    ,     type,      count,     volume,   avg_size
n/a       ,      dir,         17,  132.00 KB,    7.76 KB
new       ,     file,        723,    3.15 MB,    4.46 KB

Total: 740 entries, 3436970 bytes (3.28 MB)


Identify the Largest File System Consumers

# Some examples 
# Display largest files.
rbh-lhsm-report --top-size[=<count>], -s <count> 
# Display oldest entries eligible for purge. 
rbh-lhsm-report --top-purge[=<count>], -p <count> 
# Display top disk space consumers. 
rbh-lhsm-report --top-users[=<count>], -U <count>

[root@x10-ft-02 lhsm]# rbh-lhsm-report --top-users
Using config file '/etc/robinhood.d/lhsm/boston-lfs-lustre-hsm.conf'.
rank, user      ,     volume,      count,   avg_size
   1, lfsuser   ,    3.14 MB,        722,    4.45 KB
   2, root      ,   12.00 KB,          1,   12.00 KB
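
The comma-separated report output is easy to post-process. As a sketch, the function below lists the users that own at least a given number of files. The name users_over_count is hypothetical, and the column order (rank, user, volume, count, avg_size) is assumed to match the output shown above.

```shell
#!/bin/bash
# Hypothetical helper: prints "<user> <count>" for every user that owns at
# least $1 entries.
# stdin: output of "rbh-lhsm-report --top-users"
users_over_count() {
    awk -F',' -v min="$1" '
        # Data rows start with a numeric rank; the header line and the
        # "Using config file" line do not, so they are skipped.
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            user = $2; count = $4
            gsub(/[ \t]/, "", user)
            gsub(/[ \t]/, "", count)
            if (count + 0 >= min + 0) print user, count
        }
    '
}
```

For example: rbh-lhsm-report --top-users | users_over_count 100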