Excelero Nvmesh
Prerequisites
Hardware
Before installing NVMesh, check that everything is performing as it should be: it is worth testing NVMe drive performance at the block level first.
Install "fio" and "nvme-cli"
Target each NVMe device and check read/write performance at a 1M block size. During our testing, performance was found to be 60% lower than it should have been.
To work around this, we secure erased each drive and then retested.
nvme-cli secure erase command: "nvme format /dev/nvme0n1 --ses=2" (the device address may differ)
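The benchmark-then-erase workflow above can be sketched as a small script. The device list, queue depth and runtime below are assumptions, not values from our testing; commands are echoed for review — remove the leading "echo" to actually run them (the secure erase is destructive).

```shell
#!/bin/sh
# Sequential 1M read/write test per NVMe drive, with a secure erase as
# the fallback if throughput is well below spec. Echoed, not executed.
for dev in /dev/nvme0n1 /dev/nvme1n1; do
    echo fio --name=seqread --filename="$dev" --direct=1 --rw=read --bs=1m --iodepth=32 --runtime=30 --time_based
    echo fio --name=seqwrite --filename="$dev" --direct=1 --rw=write --bs=1m --iodepth=32 --runtime=30 --time_based
    # If results are well below the drive's spec, secure erase and retest:
    echo nvme format "$dev" --ses=2
done
```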
Ethernet Network requirements
The servers were configured properly - the interfaces had global pause on, which is the required host-side configuration when using the Global Pause flow control method. To check whether pause parameters are enabled on an interface, use ethtool:

ethtool -a enp139s0f0
Pause parameters for enp139s0f0:
Autonegotiate: off
RX: on
TX: on
We can see that interface 'enp139s0f0' has the RX and TX pause parameters on, which is the status needed for the Global Pause method. If we want to change it, we should do the following:

ethtool -A enp139s0f0 rx off tx off
This turns off the pause parameters on the host-side interfaces, which is the configuration we want when running PFC (Priority Flow Control). PFC also requires VLAN tagging and egress priority mapping on the host side. If more details about PFC are needed, I will happily share them.
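The host-side PFC pieces (pause off, a tagged VLAN with an egress priority map) can be sketched as below. The VLAN id and priority value are assumptions — use whatever your switch's PFC configuration expects; commands are echoed for review.

```shell
#!/bin/sh
# Host-side PFC sketch: disable global pause, then create a tagged VLAN
# with an egress priority map (skb priority 0 -> VLAN priority 3 here).
IFACE=enp139s0f0
VLAN=100
PRIO=3
echo ethtool -A "$IFACE" rx off tx off
echo ip link add link "$IFACE" name "$IFACE.$VLAN" type vlan id "$VLAN" egress-qos-map 0:"$PRIO"
echo ip link set "$IFACE.$VLAN" up
```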
About the switch: there are two options to turn on Global Pause, through the GUI or the CLI.
GUI - go to the "Ports" tab in the control bar, then select the port you want to check/change on the switch. Scroll down to the "Port Configuration" window; the "FlowControlMode" field has three options: None, Global and PFC. Choose the relevant option - in our case it was configured as PFC when it should have been Global.
CLI - connect to the switch and run the following (choose the relevant port):

ena
conf t
interface ethernet 1/1
flowcontrol receive on force
flowcontrol send on force
Software
Excelero requires certain OS and kernel versions to be supported; the lists below are accurate as of 11/04/2018.
DNS/Hosts
Add the hostnames of every node in your cluster to the hosts file on all nodes.
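For example, /etc/hosts entries for the four demo nodes might look like the fragment below (only the 172.28.0.181 address appears elsewhere in these notes; the others are assumptions):

```
172.28.0.181  excelero-a
172.28.0.182  excelero-b
172.28.0.183  excelero-c
172.28.0.184  excelero-d
```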
You may want to look at using pdsh to allow remote execution across all nodes at once, which saves time. If you are using GPFS, it has a built-in tool called mmdsh.
For example
[root@excelero-a tmp]# pdsh -w excelero-a,excelero-b,excelero-c,excelero-d uptime
excelero-a: 11:57:24 up 4 days, 19:25, 4 users, load average: 0.21, 0.11, 0.14
excelero-d: 11:57:24 up 4 days, 19:24, 1 user, load average: 0.18, 0.15, 0.22
excelero-b: 11:57:24 up 4 days, 19:25, 1 user, load average: 0.33, 0.32, 0.30
excelero-c: 11:57:24 up 4 days, 19:25, 1 user, load average: 0.08, 0.03, 0.13
To run pdsh and actually save time you will, of course, need to generate an SSH key and ssh-copy-id it to every node in your NVMesh cluster that you wish to administer.
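The key distribution step can be sketched as below, using the demo node names from the example above; commands are echoed for review (ssh-copy-id will prompt for each node's password once).

```shell
#!/bin/sh
# Passwordless SSH for pdsh: one keypair, copied to every cluster node.
NODES="excelero-a excelero-b excelero-c excelero-d"
echo ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for n in $NODES; do
    echo ssh-copy-id "root@$n"
done
```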
Centos
7.3
Tried and tested (10/04/2018) for NVmesh/GPFS Demo.
OFED: MLNX_OFED_LINUX-4.3-1.0.1.0-rhel7.3-x86_64.tgz
3.10.0-514.el7.x86_64. Distro ISO: CentOS-7-x86_64-Minimal-1611
You will need the kernel headers and devel packages during the install; however, the version in the repo did not match the installed kernel, so we used the copies below:
\\10.0.0.222\software\Excelero\kernel-headers-3.10.0-514.el7.x86_64.rpm
\\10.0.0.222\software\Excelero\kernel-devel-3.10.0-514.el7.x86_64.rpm
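Before building the OFED/NVMesh modules it is worth confirming that the headers/devel packages match the running kernel. The RPM file names below mirror the share above and are assumptions; the yum/rpm lines are echoed for review.

```shell
#!/bin/sh
# Check the running kernel, then install the matching headers/devel RPMs.
KVER=$(uname -r)
echo "running kernel: $KVER"
echo yum install -y "kernel-headers-$KVER.rpm" "kernel-devel-$KVER.rpm"
# Verify after install:
echo rpm -q "kernel-devel-$KVER"
```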
7.4
OFED and Kernels that can be used in 7.4
MLNX_OFED_LINUX-4.2-1.0.0.0-rhel7.4-x86_64.tgz
"3.10.0-693.5.2.el7.x86_64"
"3.10.0-693.11.1.el7.x86_64"
MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64.tgz
"3.10.0-693.5.2.el7.x86_64"
"3.10.0-693.el7.x86_64"
No OFED
• 0 : "3.10.0-693.11.1.el7.x86_64"
• 1 : "3.10.0-693.5.2.el7.x86_64"
Ubuntu
Version: Ubuntu 16.04.3
Kernels supported: 4.4.0-103-generic, 4.4.0-108-generic and 4.4.0-116-generic.
NVmesh packages/ repo
We had a mixed experience, since their repo is really slow (45 KB/s), so I would just manually install the packages listed below or create your own repo on site.
Centos/RHEL 1.2.1-217
\\10.0.0.222\software\Excelero\NVMesh-target-1.2.1-217.x86_64.rpm
\\10.0.0.222\software\Excelero\NVMesh-client-1.2.1-217.x86_64.rpm
Centos/RHEL 1.2.1-194
\\10.0.0.222\software\Excelero\NVMesh-target-1.2.1-194.x86_64.rpm
\\10.0.0.222\software\Excelero\NVMesh-client-1.2.1-194.x86_64.rpm
Installation of NVmesh onto Centos 7
Tried and tested
Choose which node in your chosen infrastructure will host the NVMesh management GUI.
To install the web-based administration GUI you need MongoDB and nodejs 6.x or 8.x.
Create a file called mongo.repo and insert the content below
vim /etc/yum.repos.d/mongo.repo
Here is an example repo entry for MongoDB:

[mongodb-org-3.4]
name=MongoDB Repository
baseurl=http://repo.mongodb.org/yum/redhat/7/mongodb-org/3.4/x86_64/
gpgcheck=0
enabled=1
To install Mongodb:
yum install mongodb-org -y
Install epel repo and nodejs:
yum install epel-release
yum install nodejs
However, when we tried this, nodejs from epel-release would install correctly but NVMesh management would break and never start. It is not an error per se - the nvmeshmanagement service simply never starts with nodejs from EPEL. The alternative method below gets around this.
Alternative method - add the nodejs repo:
curl --silent --location https://rpm.nodesource.com/setup_6.x | bash -
Install nodejs:
yum install nodejs -y
Verify mongodb is running
service mongod status
Open ports 4000 and 4001. Port 4001 is only required when you have more than one management server.
sudo iptables -I INPUT 1 -m state --state NEW -m tcp -p tcp --dport 4000 -j ACCEPT -m comment --comment Excelero-Management
sudo firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -p tcp --dport 4000 -j ACCEPT -m comment --comment Excelero-Management
Alternatively just disable firewalld completely if this is not important to you.
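A simpler firewalld-only sketch of the same choice - open both management ports, or drop the firewall entirely in a lab. Commands are echoed for review.

```shell
#!/bin/sh
# Open the NVMesh management ports, or (lab only) disable firewalld.
for port in 4000 4001; do
    echo firewall-cmd --permanent --add-port="$port/tcp"
done
echo firewall-cmd --reload
# Lab-only alternative:
echo systemctl disable --now firewalld
```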
Install NVMesh from the repo below -
[NVMesh]
name=NVMesh repository
baseurl=https://bostonuk:bostonuk@repo.excelero.com/repos/NVMesh/redhat/7.3
gpgcheck=0
enabled=1
or install the packages listed above manually (we found Excelero's repo to be painfully slow).
Install management first and check functionality.
yum install NVMesh-management-1.2.1-56.x86_64.rpm -y
In a browser, go to https://172.28.0.181:4000/ and the NVMesh GUI should load. If at this stage you get nothing, check that the nvmeshmanagement service is running and that the ports are open (or shut off firewalld). If the service is up but showing errors, this is most likely a nodejs issue: if you installed nodejs from EPEL, remove it and install via the alternative method.
Default login details
login: admin@excelero.com pass: admin
Once the management interface is up and working you will need to install at least the target package on the nodes you wish to use as storage nodes.
Install the client RPM onto the server/node/workstation that you want to have access to NVmesh.
In our testing our nodes were hyperconverged, i.e. each node was both client and server.
So we installed both packages:
yum install NVMesh-target-1.2.1-217.x86_64.rpm -y
yum install NVMesh-client-1.2.1-217.x86_64.rpm -y
To get both target and client working you will need to edit the following config file:
/etc/opt/NVMesh/nvmesh.conf
Here are the only sections you will need to edit, at least to begin with:
# Define the management protocol
# MANAGEMENT_PROTOCOL="<https/http>"
# Example
# MANAGEMENT_PROTOCOL="https"
MANAGEMENT_PROTOCOL="https"

# Define the location of the NVMesh Management Websocket servers
# MANAGEMENT_SERVERS="<server name or IP>:<port>,<server name or IP>:<port>,..."
# Example:
# MANAGEMENT_SERVERS="nvmesh-management1:4001,nvmesh-management2:4001"
MANAGEMENT_SERVERS="excelero-a:4001"

# Define the nics that will be available for NVMesh Client/Target to work with
# CONFIGURED_NICS="<interface name;interface name;...>"
# To allow all nics to be available leave empty. Example: CONFIGURED_NICS=""
# Example:
# CONFIGURED_NICS="ib0;eno1;eth0"
CONFIGURED_NICS="enp139s0f0;enp139s0f1"
The configuration file ships with MANAGEMENT_SERVERS="nvmesh-management1:4001"; change this to the hostname of the node you chose as your NVMesh management server.
Add the device IDs of the high-bandwidth NICs you wish to use for Excelero to CONFIGURED_NICS=.
Restart nvmeshmgr service.
Install the NVMesh client and target RPMs (note: the client RPM contains the common module, so you get prompted to install this dependency even if you don't want to run the client service. Excelero say they will change this in a later release; it is just cosmetic for the time being).
Lastly, run nvmesh_format.py to format the disks you plan to use.
Centos installation with GPFS on top
For general GPFS install and usage look here:
This section is more around NVmesh and GPFS working together.
Once GPFS has been installed and the packages and the portability layer have been built, you can start configuring GPFS.
Normally, within GPFS, mmdevdiscover will list all known devices; however, you may notice that no nvmesh devices are listed.
To work around this, create the script below and make it executable (chmod +x /var/mmfs/etc/nsddevices). Exiting with 1 tells GPFS to continue its built-in device discovery as well.

[root@excelero-a ~]# vim /var/mmfs/etc/nsddevices
#!/bin/bash
cd /dev && for dev in $(ls nvmesh/); do
echo nvmesh/$dev generic
done
exit 1
You should then rerun mmdevdiscover and see your nvmesh volumes listed. If you still get nothing, I would check that nvmesh is working and that you have volumes created and assigned to your node.
Create an nsd.stanza file with your nvmesh device in it. Here is an example:
%nsd:
nsd=nsd01
device=/dev/nvmesh/test
usage=dataAndMetadata
pool=system
failureGroup=1
In our demo we kept data and metadata in one pool, i.e. our NVMesh volume was a single volume labelled "test".
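From here, the stanza file is turned into NSDs and a filesystem with the standard GPFS commands. The filesystem name, stanza path and mount point below are assumptions; the mm* commands are echoed for review.

```shell
#!/bin/sh
# Create NSDs from the stanza file, build a filesystem on them, mount it
# everywhere.
FS=gpfs1
STANZA=/tmp/nsd.stanza
echo mmcrnsd -F "$STANZA"
echo mmcrfs "$FS" -F "$STANZA" -T /"$FS"
echo mmmount "$FS" -a
```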