Excelero NVMesh


Prerequisites

Hardware

Check that everything is performing as it should be. Before installing NVMesh, it is worth testing NVMe drive performance at the block level.

Install "fio" and "nvme-cli"

Target each NVMe device and check read/write performance at a 1M block size. During our testing, performance was found to be 60% lower than it should have been.
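As a starting point, a sequential read test at a 1M block size might look like the following. This is a minimal sketch: the device path, run time and queue depth are illustrative assumptions to adjust for your hardware.

fio --name=seqread --filename=/dev/nvme0n1 --rw=read --bs=1M --ioengine=libaio --iodepth=32 --direct=1 --runtime=60 --time_based --group_reporting

Repeat with --rw=write (on a drive whose contents you can afford to lose) to cover the write side.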

To work around this, we secure erased each drive and then retested.

nvme-cli secure erase command (the device address may differ):

nvme format /dev/nvme0n1 --ses=2
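If every drive in the node needs erasing, the same command can be looped over the namespaces. The sketch below is hypothetical and destroys all data on the matched devices, so confirm the glob matches only the drives you intend to wipe:

for d in /dev/nvme[0-9]n1; do nvme format "$d" --ses=2; done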

Ethernet Network requirements

The servers were configured properly - the interfaces had global pause on, which is the required configuration on the host side when using the Global Pause flow control method. How can you check if pause parameters are enabled on interfaces? Using ethtool:

ethtool -a enp139s0f0
Pause parameters for enp139s0f0:
Autonegotiate: off
RX: on
TX: on

We can see that interface 'enp139s0f0' has RX and TX pause parameters on; this status is as needed in the Global Pause method. If we want to change it, we should do the following:

ethtool -A enp139s0f0 rx off tx off

This will turn off the pause parameters on the interfaces on the host side, which is the configuration we want to use when running PFC (Priority Flow Control). PFC also requires VLAN tagging and egress priority mapping on the host side. If more details about PFC are needed, I will happily share.
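For reference, host-side VLAN tagging with an egress priority map can be set up with iproute2 along these lines. Apart from the interface name, the VLAN ID (100) and 802.1p priority (4) are illustrative assumptions, not values from our setup:

# Create a VLAN interface and map skb priority 0 to 802.1p priority 4
ip link add link enp139s0f0 name enp139s0f0.100 type vlan id 100 egress-qos-map 0:4
ip link set dev enp139s0f0.100 up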

About the switch: there are two options to turn on Global Pause, through the GUI or the CLI.

GUI - go to the "Ports" tab in the control bar, then select the port you want to check/change. After you have selected the required port, scroll down to the window called "Port Configuration", where you will see a field called "FlowControlMode" with 3 options: None, Global and PFC. Choose the relevant option; in our case it was configured as PFC while it should have been Global.

CLI - connect to the switch and then do the following:

ena
conf t
interface ethernet 1/1   (choose the relevant port here)
flowcontrol receive on force
flowcontrol send on force

Software

Excelero supports only certain OS and kernel versions. The details below are accurate as of 11/04/2018.

DNS/Hosts

Add the hostnames of all the nodes within your cluster to the hosts file on every node.
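For example, /etc/hosts on each node might carry entries along these lines (the addresses here are hypothetical placeholders):

172.28.0.181 excelero-a
172.28.0.182 excelero-b
172.28.0.183 excelero-c
172.28.0.184 excelero-d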

You may want to look at using pdsh to allow for remote execution across all nodes at once, which saves time. If you are using GPFS, it has a built-in tool called mmdsh.

For example

[root@excelero-a tmp]# pdsh -w excelero-a,excelero-b,excelero-c,excelero-d uptime
excelero-a:  11:57:24 up 4 days, 19:25,  4 users,  load average: 0.21, 0.11, 0.14
excelero-d:  11:57:24 up 4 days, 19:24,  1 user,  load average: 0.18, 0.15, 0.22
excelero-b:  11:57:24 up 4 days, 19:25,  1 user,  load average: 0.33, 0.32, 0.30
excelero-c:  11:57:24 up 4 days, 19:25,  1 user,  load average: 0.08, 0.03, 0.13

To run pdsh and save time you will, of course, need to generate an SSH key and ssh-copy-id it to all nodes within your NVMesh cluster that you wish to admin.
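A minimal sketch of that setup, run from the admin node and assuming root access to the four demo hostnames above:

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for node in excelero-a excelero-b excelero-c excelero-d; do
    ssh-copy-id root@$node
done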

Centos

7.3

Tried and tested (10/04/2018) for the NVMesh/GPFS demo.

OFED: MLNX_OFED_LINUX-4.3-1.0.1.0-rhel7.3-x86_64.tgz

Kernel: 3.10.0-514.el7.x86_64. Distro ISO: CentOS-7-x86_64-Minimal-1611


You will need the kernel headers and devel packages during the install; however, the version in the repo did not match the kernel installed, so the matching RPMs were staged here:

\\10.0.0.222\software\Excelero\kernel-headers-3.10.0-514.el7.x86_64.rpm

\\10.0.0.222\software\Excelero\kernel-devel-3.10.0-514.el7.x86_64.rpm
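Assuming the two RPMs have been copied onto the node (the working directory is an assumption), they can be installed with:

yum localinstall kernel-headers-3.10.0-514.el7.x86_64.rpm kernel-devel-3.10.0-514.el7.x86_64.rpm -y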

7.4

OFED and Kernels that can be used in 7.4

MLNX_OFED_LINUX-4.2-1.0.0.0-rhel7.4-x86_64.tgz

"3.10.0-693.5.2.el7.x86_64"

"3.10.0-693.11.1.el7.x86_64"

MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64.tgz

"3.10.0-693.5.2.el7.x86_64"

"3.10.0-693.el7.x86_64"

No OFED

"3.10.0-693.11.1.el7.x86_64"

"3.10.0-693.5.2.el7.x86_64"

Ubuntu

Version: Ubuntu 16.04.3

Kernels supported: 4.4.0-103-generic, 4.4.0-108-generic and 4.4.0-116-generic.

NVMesh packages/repo

We had a mixed experience, since their repo is really slow (45 KB/s), so I would just install manually from the packages listed below or create your own repo on site.

Centos/RHEL 1.2.1-217

\\10.0.0.222\software\Excelero\NVMesh-target-1.2.1-217.x86_64.rpm \\10.0.0.222\software\Excelero\NVMesh-client-1.2.1-217.x86_64.rpm

Centos/RHEL 1.2.1-194

\\10.0.0.222\software\Excelero\NVMesh-target-1.2.1-194.x86_64.rpm \\10.0.0.222\software\Excelero\NVMesh-client-1.2.1-194.x86_64.rpm


Installation of NVMesh onto CentOS 7

Tried and tested

Choose which node of your chosen infrastructure will host the NVMesh management GUI.

To install the web-based administration GUI you need to install MongoDB and nodejs 6.* or 8.*.

Create a file called mongo.repo and insert the content below

vim /etc/yum.repos.d/mongo.repo

Here is an example repo entry for MongoDB:

[mongodb-org-3.4]
name=MongoDB Repository
baseurl=http://repo.mongodb.org/yum/redhat/7/mongodb-org/3.4/x86_64/
gpgcheck=0
enabled=1


To install MongoDB:

yum install mongodb-org -y

Install epel repo and nodejs:

yum install epel-release
yum install nodejs

However, when we tried to install this, we found that the EPEL release of nodejs would install correctly but NVMesh management would break and proceed to never start. It is not an error per se; the nvmeshmanagement service simply never starts with nodejs from EPEL. Using the alternative way below gets around this.

Alternative way - add the nodejs repo:

curl --silent --location https://rpm.nodesource.com/setup_6.x | bash -

Install nodejs:

yum install nodejs -y
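To confirm that nodejs came from the nodesource repo rather than EPEL, check the reported version; given the setup_6.x script above it should be a 6.x release:

node --version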

Verify mongodb is running

service mongod status

Open ports 4000 and 4001; 4001 is only required when you have more than one management server.

sudo iptables -I INPUT 1 -m state --state NEW -m tcp -p tcp --dport 4000 -j ACCEPT -m comment --comment Excelero-Management
sudo firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -p tcp --dport 4000 -j ACCEPT -m comment --comment Excelero-Management

Alternatively, just disable firewalld completely if this is not important to you.
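If you go that route, the following stops firewalld now and keeps it from starting on boot:

systemctl stop firewalld
systemctl disable firewalld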

Install NVMesh from the repo below -

[NVMesh] 
name=NVMesh repository 
baseurl=https://bostonuk:bostonuk@repo.excelero.com/repos/NVMesh/redhat/7.3
gpgcheck=0 
enabled=1
 

or install the packages listed above manually (we found Excelero's repo to be painfully slow).

Install management first and check functionality.

yum install NVMesh-management-1.2.1-56.x86_64.rpm -y

In a browser, go to https://172.28.0.181:4000/ (the address of your management node) and this should resolve the NVMesh GUI. If at this stage you get nothing, check that the nvmeshmanagement service is running and that the ports are cleared, or shut off firewalld. If the service is up but showing errors, this is most likely a nodejs issue: if you installed nodejs from EPEL, remove it and install with the alternative method above.

Default login details

login: admin@excelero.com pass: admin


Once the management interface is up and working you will need to install at least the target package on the nodes you wish to use as storage nodes.

Install the client RPM onto each server/node/workstation that you want to have access to NVMesh.

In our testing our nodes were hyperconverged, i.e. both clients and servers.

So we installed both packages:

yum install NVMesh-target-1.2.1-217.x86_64.rpm -y
yum install NVMesh-client-1.2.1-217.x86_64.rpm -y

To get both target and client working you will need to edit the following config file:

/etc/opt/NVMesh/nvmesh.conf

Here are the only sections you will need to edit, at least to begin with:

# Define the management protocol
# MANAGEMENT_PROTOCOL="<https/http>"
# Example
# MANAGEMENT_PROTOCOL="https"

MANAGEMENT_PROTOCOL="https"

# Define the location of the NVMesh Management Websocket servers
# MANAGEMENT_SERVERS="<server name or IP>:<port>,<server name or IP>:<port>,..."
# Example:
# MANAGEMENT_SERVERS="nvmesh-management1:4001,nvmesh-management2:4001"

MANAGEMENT_SERVERS="excelero-a:4001"

# Define the nics that will be available for NVMesh Client/Target to work with
# CONFIGURED_NICS="<interface name;interface name;...>"
# To allow all nics to be available leave empty. Example: CONFIGURED_NICS=""
# Example:
# CONFIGURED_NICS="ib0;eno1;eth0"

CONFIGURED_NICS="enp139s0f0;enp139s0f1"

The configuration file will come with MANAGEMENT_SERVERS=nvmesh-management1:4001; change this to the hostname of the node you wish to make your NVMesh management server.

Add the device IDs of the high-bandwidth NICs you wish Excelero to use to CONFIGURED_NICS=.

Restart the nvmeshmgr service.
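On CentOS 7 that is presumably (assuming the systemd unit is named nvmeshmgr, matching the service name above):

systemctl restart nvmeshmgr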

Install the NVMesh client and target RPMs (note: the client RPM contains the common module, so you will get prompted to install this dependency even if you don't want to run the client service. We will change this in a later release; it is just "cosmetic" for the time being).

Lastly, run nvmesh_format.py to format the disks you plan to use.

Centos installation with GPFS on top

For general GPFS install and usage look here:

This section is more around NVmesh and GPFS working together.

Once GPFS has been installed and the packages and the portability layer have been built, you can start configuring GPFS.

Normally within GPFS, mmdevdiscover will list all the known devices; however, you may notice that no nvmesh devices get listed.

To work around this, run the below and populate the nsddevices user exit:

[root@excelero-a ~]# vim /var/mmfs/etc/nsddevices
#!/bin/bash

# Report each NVMesh volume under /dev/nvmesh to GPFS as a "generic" device
cd /dev && for dev in $(ls nvmesh/); do
        echo nvmesh/$dev generic
done

# A non-zero exit tells GPFS to continue with its built-in device discovery as well
exit 1
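GPFS only invokes this user exit if it is executable, so make it so:

chmod +x /var/mmfs/etc/nsddevices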

You should then rerun mmdevdiscover and see your nvmesh volume listed. If you still get nothing, I would check that NVMesh is working and that you have volumes assigned to your node.

Create an nsd.stanza file with your nvmesh device within. Here is an example below.

%nsd:
        nsd=nsd01
        device=/dev/nvmesh/test
        usage=dataAndMetadata
        pool=system
        failureGroup=1

In our demo we had the data and metadata in one, i.e. our NVMesh volume was just the one, labelled "test".
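From here the stanza file is consumed in the usual GPFS way; a minimal sketch, assuming it was saved as /tmp/nsd.stanza:

mmcrnsd -F /tmp/nsd.stanza
mmlsnsd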