IBM Spectrum Scale
The General Parallel File System (GPFS) is a high-performance, clustered, parallel file system developed by IBM; it was recently renamed IBM Spectrum Scale. GPFS is parallel in the sense that data is broken into blocks and striped across multiple disks, so it can be read and written in parallel. GPFS has many enterprise features such as mirroring, high availability, replication, and disaster recovery.
GPFS can be purchased as software or as an appliance. As software, there are three editions (Express, Standard, and Advanced), depending on the features needed. As an appliance, options include GSS from Lenovo, ESS from IBM, Seagate ClusterStor, DDN, and NEC.
Models of Deployment
There are basically three deployment models: the Shared Storage model (SAN), the Client-Server model (SAN or NAS), and the Shared-Nothing Cluster model. The latter is especially suitable for Big Data, because IBM also provides a Hadoop plugin so that GPFS can be used instead of HDFS.
GPFS Entities
There are three basic entities in the GPFS world: the NSD (or GPFS) client, the NSD (or GPFS) server, and the NSDs themselves, which stands for Network Shared Disks. NSDs are basically the disks where the data and metadata will be stored; they only have to be given a cluster-wide unique name.
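For illustration, NSDs are typically defined through a stanza file passed to mmcrnsd; the following minimal sketch uses hypothetical device paths, NSD names, and server names.
# nsd_stanzas.txt - one %nsd stanza per disk; the names, devices, and servers are examples
%nsd: device=/dev/sdb nsd=nsd01 servers=nsdserver1,nsdserver2 usage=dataAndMetadata failureGroup=1
%nsd: device=/dev/sdc nsd=nsd02 servers=nsdserver2,nsdserver1 usage=dataAndMetadata failureGroup=2
# Register the disks as NSDs with cluster-wide unique names
mmcrnsd -F nsd_stanzas.txt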
Install Notes
It should be noted that the same GPFS packages must be installed on all nodes of a GPFS cluster. After the installation, the license has to be set depending on whether the node is a GPFS server or a client node. Please note that the storage side consists of metadata disks and data disks, so NSD servers serve both metadata and data requests.
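The license designation mentioned above is done with mmchlicense; a minimal example, with placeholder node names, would be:
# Designate server licenses on the NSD server nodes and client licenses on the rest
mmchlicense server --accept -N nsdserver1,nsdserver2
mmchlicense client --accept -N client1,client2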
The NSD servers can replicate metadata and data (up to 3 copies) if configured. Replication is based on failure groups; at least as many failure groups as replicas are required for this configuration. When configured, an Active-Active failover mechanism is used between failure groups.
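As a rough sketch, assuming the NSDs from the hypothetical stanza file above were placed in different failure groups, two copies of both data and metadata can be requested when the file system is created with mmcrfs; the file system name and mount point below are examples.
# Create a file system with 2 default copies of metadata (-m) and data (-r),
# allowing up to 3 copies later (-M/-R), mounted at /gpfs/gpfs0
mmcrfs gpfs0 -F nsd_stanzas.txt -m 2 -M 3 -r 2 -R 3 -T /gpfs/gpfs0 -A yes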
The GPFS daemon is a multi-threaded user-mode daemon. However, a special kernel extension is also needed; it makes GPFS appear to applications as just another file system, using the so-called Virtual File System (VFS) concept.
GPFS Node Architecture
A GPFS node consists of the Linux kernel, the GPFS portability layer on top of it, the GPFS kernel extension on top of that, and the GPFS daemon in userland.
- GPFS portability layer: a loadable kernel module that enables communication between the Linux kernel and the GPFS daemon. This kernel module must be compiled after GPFS installation (see the example after this list).
- GPFS kernel extension: It provides the interfaces to the kernel’s virtual file system (VFS) in order to add the GPFS file system. So the kernel thinks of GPFS as another local file-system like ext3 or xfs.
- GPFS daemon: The GPFS daemon performs all I/O and buffer management for GPFS.
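On recent GPFS/Spectrum Scale releases, the portability layer is typically built with the mmbuildgpl helper; a short sketch, assuming the kernel development packages for the running kernel are installed:
# Build the GPFS portability layer against the running kernel
/usr/lpp/mmfs/bin/mmbuildgpl
# Once the cluster is configured, starting GPFS on the node loads the kernel modules
mmstartup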
GPFS Cluster Configuration
The GPFS cluster configuration file is stored in /var/mmfs/gen/mmsdrfs; it contains information such as the list of nodes, the available disks, the file systems, and other cluster settings. There are two ways to store the configuration file: the first is on a primary/secondary server pair, and the second is on all Quorum nodes. To store it on servers, specify a primary and a secondary server so that a copy of the file is kept on each; any change to the configuration then requires both the primary and secondary server to be available. To do this, use the following command
mmchcluster {[--ccr-disable] [-p PrimaryServer] [-s SecondaryServer]}
To store a copy of the configuration file on all Quorum nodes, also known as the cluster configuration repository (CCR), use the following command
mmchcluster --ccr-enable
In the CCR case, any change to the configuration requires the majority of Quorum nodes to be available.
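Either way, the cluster definition and the active configuration can be verified with the standard query commands; the output will differ per cluster.
# Show cluster information, including the configuration repository servers
mmlscluster
# Show the current cluster-wide configuration parameters
mmlsconfig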
Install dependencies
GPFS requires the following packages to be installed first: 1. Development Tools 2. kernel-devel 3. kernel-headers
yum -y groupinstall "Development Tools"
yum -y install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Installation Steps
cd /etc/yum.repos.d
wget https://packages.quobyte.com/repo/9/<YOUR_REPO_ID>/rpm/CentOS_7/quobyte.repo
Install Quobyte packages
yum -y install quobyte-server quobyte-client
Server Configuration
Prepare Drives
Any drives to be used by Quobyte need to be formatted and mounted before Quobyte can use them. Currently only ext4 and XFS are supported. Each server in our testbed has 3 available drives: 2 SSDs (/dev/sdb and /dev/sdc) and 1 HDD (/dev/sdd). To prepare each drive do the following
# Create a filesystem on each drive and mount them.
# Note it is recommended to use the full drive and not partitions.
mkfs.xfs /dev/sdX
mount /dev/sdX /some/mount/point
# The testbed was configured as per below, where /dev/sda was the OS drive
[root@q01 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 238.5G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 512M 0 part /boot
├─sda3 8:3 0 15.6G 0 part
├─sda4 8:4 0 1K 0 part
└─sda5 8:5 0 222.4G 0 part /
sdb 8:16 0 372.6G 0 disk /mnt/quobyte/metadata0
sdc 8:32 0 894.3G 0 disk /mnt/quobyte/data0
sdd 8:48 0 931.5G 0 disk /mnt/quobyte/data1
# The same procedure was performed on each storage server
Define Registry Servers
Edit /etc/quobyte/host.cfg on each server to state which servers are running the registry service. In our setup all 4 are, so it was updated to read
registry=q01:7861,q02:7861,q03:7861,q04:7861
If name resolution isn't configured on the servers, the hostnames can be replaced with IP addresses.
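For example, with hypothetical addresses, the same line would read
registry=192.168.10.1:7861,192.168.10.2:7861,192.168.10.3:7861,192.168.10.4:7861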
Create registry devices
To create the first registry device do the following on one server only.
qbootstrap /mnt/quobyte/metadata0
Then start services on this server.
systemctl start quobyte-registry
systemctl start quobyte-webconsole
systemctl start quobyte-api
To confirm that the registry service is running and the device is available, run the following command.
[root@q01 ~]# qmgmt device list
Id Host Mode Disk Used Disk Avail Services LED Mode
1 q01 ONLINE 34 MB 400 GB REGISTRY OFF
Note it may take a minute for the device to initially register.
To create the other registry devices, do the following on each of the remaining servers
qmkdev -t REGISTRY /mnt/quobyte/metadata0
systemctl start quobyte-registry
Once this is completed on each server, you can list and check the availability of each registry from the first server.
[root@q01 ~]# qmgmt device list
Id Host Mode Disk Used Disk Avail Services LED Mode
1 q01 ONLINE 34 MB 400 GB REGISTRY OFF
2 q02 ONLINE 34 MB 400 GB REGISTRY OFF
3 q03 ONLINE 34 MB 400 GB REGISTRY OFF
4 q04 ONLINE 34 MB 400 GB REGISTRY OFF
Add Metadata Devices
From the first server, run the following command to add the metadata role to each registry device
qmgmt device update add-type <id> METADATA
# id for each registry is listed in the output of 'qmgmt device list'
SSH to each host with a metadata device and start the metadata service by running
systemctl start quobyte-metadata
Confirm that the metadata devices are running
[root@q01 ~]# qmgmt device list
Id Host Mode Disk Used Disk Avail Services LED Mode
1 q01 ONLINE 34 MB 400 GB METADATA REGISTRY OFF
2 q02 ONLINE 34 MB 400 GB METADATA REGISTRY OFF
3 q03 ONLINE 34 MB 400 GB METADATA REGISTRY OFF
4 q04 ONLINE 34 MB 400 GB METADATA REGISTRY OFF
Add Data Devices
To add data devices perform the following on each server.
# Define data devices
qmkdev -t DATA /mnt/quobyte/data0
qmkdev -t DATA /mnt/quobyte/data1
# Start Quobyte Data service
systemctl start quobyte-data
Once completed on each server, check that all devices are registered and available.
[root@q01 ~]# qmgmt device list
Id Host Mode Disk Used Disk Avail Services LED Mode
1 q01 ONLINE 34 MB 400 GB METADATA REGISTRY OFF
5 q01 ONLINE 22 GB 960 GB DATA OFF
6 q01 ONLINE 34 GB 1000 GB DATA OFF
2 q02 ONLINE 34 MB 400 GB METADATA REGISTRY OFF
7 q02 ONLINE 36 MB 960 GB DATA OFF
8 q02 ONLINE 46 GB 1000 GB DATA OFF
3 q03 ONLINE 34 MB 400 GB METADATA REGISTRY OFF
9 q03 ONLINE 36 MB 960 GB DATA OFF
10 q03 ONLINE 46 GB 1000 GB DATA OFF
4 q04 ONLINE 34 MB 400 GB METADATA REGISTRY OFF
11 q04 ONLINE 36 MB 400 GB DATA OFF
12 q04 ONLINE 46 GB 1000 GB DATA OFF
Volume Management
By default Quobyte creates one volume configuration called BASE, which can be used as a template for other configurations to inherit from.
Viewing Volume Configurations
Configurations can be viewed through the API or from the web console.
- API
[root@q01 ~]# qmgmt volume config export BASE
configuration_name: "BASE"
volume_metadata_configuration {
  placement_settings {
    required_device_tags {
    }
    forbidden_device_tags {
    }
    prefer_client_local_device: false
    optimize_for_mapreduce: false
  }
  replication_factor: 1
}
default_config {
  file_layout {
    stripe_width: 1
    replication_factor: 1
    block_size_bytes: 524288
    object_size_bytes: 8388608
    segment_size_bytes: 10737418240
    crc_method: CRC_32_ISCSI
  }
  placement {
    required_device_tags {
    }
    forbidden_device_tags {
    }
    prefer_client_local_device: false
    optimize_for_mapreduce: false
  }
  io_policy {
    cache_size_in_objects: 10
    enable_async_writebacks: true
    enable_client_checksum_verification: true
    enable_client_checksum_computation: true
    sync_writes: AS_REQUESTED
    direct_io: AS_REQUESTED
    OBSOLETE_implicit_locking: false
    lost_lock_behavior: IO_ERROR
    OBSOLETE_keep_page_cache: false
    implicit_locking_mode: NO_LOCKING
    enable_direct_writebacks: false
    notify_dataservice_on_close: false
    keep_page_cache_mode: USE_HEURISTIC
    rpc_retry_mode: RETRY_FOREVER
    lock_scope: GLOBAL
  }
}
snapshot_configuration {
  snapshot_interval_s: 0
  snapshot_lifetime_s: 0
}
metadata_cache_configuration {
  cache_ttl_ms: 10000
  negative_cache_ttl_ms: 10000
  enable_write_back_cache: false
}
- Web console
Log in to the web console and navigate to "Volume Configuration". Select BASE to view the configuration.
Editing Volume Configuration
- API
qmgmt volume config edit BASE
This will open the configuration in your default editor (or in the editor given by the EDITOR environment variable, if it is set).
- Web console
Navigate to 'Volume Configurations' and tick the box beside BASE. Then select 'edit' from the drop down menu.
Creating Volume Configurations
- API
To create a new configuration, use the same command you would use to edit one, but give a configuration name that doesn't exist yet. For example, to create a new configuration called 3x_replication, run the following
qmgmt volume config edit 3x_replication
This will open an empty file in a text editor.
To avoid specifying every setting manually, it is advisable to inherit from an existing configuration and use it as a template. For example, to use the BASE configuration as a template for 3x_replication, add the following
base_configuration: "BASE"
Now individual parameters can be set, and any setting that isn't defined will inherit the value used in the BASE configuration. The options below were set in the 3x_replication configuration
[root@q01 ~]# qmgmt volume config export 3x_replication
configuration_name: "3x_replication"
base_configuration: "BASE"
volume_metadata_configuration {
  replication_factor: 3
}
default_config {
  placement {
    required_device_tags {
      tags: "hdd"
    }
    forbidden_device_tags {
    }
    prefer_client_local_device: false
    optimize_for_mapreduce: false
  }
}
This will create three replicas of the data and metadata, and place data only on devices tagged with "hdd". The use of tags allows for finer control over which data is placed on which devices. In this example all data is placed on HDDs and not on SSD storage.
- Web console
The web console only allows new sub-configurations to be created, i.e. configurations that inherit from another. To create a new sub-configuration, navigate to 'Volume Configurations' and tick the box next to BASE, then select 'Add new sub-configuration' from the drop-down menu.
Creating Volumes
Volumes are created either from the CLI or through the web console.
- CLI
The generic command used to create volumes is
qmgmt volume create <volume name> <user> <group> <volume configuration>
In the testbed, 3 volumes were created using different volume configurations. They were created by running the following commands
qmgmt volume create home_vol root root 3x_replication
qmgmt volume create scratch_vol root root ssd_performance
qmgmt volume create archive_vol root root 8+3_erasure
Mounting Volumes
Volumes can be mounted on any server on which the quobyte-client package is installed. The CLI tool mount.quobyte is used to mount Quobyte volumes. The command takes a list of registry servers and the volume to mount, as well as the directory to mount the volume on. So to mount the home_vol volume above on /home
mount.quobyte q01:7861,q02:7861,q03:7861,q04:7861/home_vol /home
This can be repeated to mount any other volumes
mount.quobyte q01:7861,q02:7861,q03:7861,q04:7861/scratch_vol /scratch
mount.quobyte q01:7861,q02:7861,q03:7861,q04:7861/archive_vol /archive
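A quick way to confirm the mounts, using only standard Linux tools, is for example
# Check the mounted volumes and their capacity
df -h /home /scratch /archive
# Or list all Quobyte mounts
mount | grep quobyte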