Installation Scope of Works

From Define Wiki
Revision as of 12:08, 7 February 2017 by Damian

Contents
1 Document Information
2 Document Purpose
3 Customer Details
4 Site Details
5 Installation
6 System Settings
7 Rack requirements
8 Rack PDU / Node Layout (Same for all cabinets)
9 Installation Coordination
10 Customer Responsibilities
11 Hardware Installation and Set-up
12 Test Hardware
13 Software Installation and Set-up
14 Customer Acceptance


1 Document Information

Document Version Number: 1.1
Project Name: Supply, installation and configuration of 3 Node Mini HPC Cluster (High Performance Computing)
Boston Ltd Project Code: BIN3443005

Contact Details
Contact: David Power, Head of HPC
Address: Boston Ltd, Unit 5 Curo Park, Frogmore, St. Albans AL2 2DD
E-mail: David.power@boston.co.uk
Tel: 01727 876 100

Distribution List
To: Mike Dunford, Mike O'Connor
Date Sent: 10.01.17
Contact Information: mike.dunford@sentralsystems.com, mike.oconnor@sentralsystems.com

Document Revision History
Version 1.0 (10.01.17): Scope of Works







2 Document Purpose

The purpose of this document is to define the test criteria for the project deliverables, compiled into a Scope of Works (SoW) and agreed with the customer. This document also defines the exact configuration of the system that is due to be delivered.

3 Customer Details

Organisation Name: TAQA Bratani Ltd
Contact Name: David Day
Email: David.Day@taqaglobal.com

Address: Brightsolid, Gateway House, Luna Place, Technology Park
Tel No: 01224 737565
Fax No: n/a

Business Development Manager: N/A
Quotation Number: Supply, installation and configuration of 3 Node Mini HPC Cluster (High Performance Computing)
HPC Project Lead: David Power
Site Survey Date:
Project Code: BIN3443005

4 Site Details

Boston Ltd installation hours are from 9:00am to 5:30pm (Monday to Friday). Installations required outside of the stated times may incur an additional charge. Note: it is sometimes necessary for Boston Ltd installation staff to work outside the standard working hours to complete work on time.

Installation Project Owner: Konstantinos Mouzakitis
Contact Details (Tel & Email): Konstantinos.Mouzakitis@boston.co.uk, 01727 876 100
Installation Site Address: Brightsolid, Gateway House, Luna Place, Technology Park

5 Installation

Hardware Installation Required: Yes
Software Install Required: Yes
Type of Installation: Pre-Configure at BOSTON LTD (reduce time onsite): Yes; On-Site Installation Only: Yes
Training Requirements: Yes
If 'Yes', please specify level of training required: Training will be available for both novice and intermediate administrators and users of the cluster.

Notes: Training Agenda. Please note, the training will take place over 2 parts, delivered on site or via webex. During this time training will focus on the two key sets of users, Administrators and Users.

Part 1 will focus on Cluster Administration and should be attended by Hardware Admins, Datacenter Admins and HPC Admins. It will cover the following topics:
- Hardware Introduction
- Bringing up the cluster from cold
- Cluster health checks
- Hardware monitoring
- Cluster Management Concepts
- Modules Environment (for multi-user HPC systems)
- Cluster Management GUI
- Cluster Management Shell
- Configuring the Cluster
- Node Images and Application Integration
- User Management
- Workload Management

Part 2 will focus more on end users, but can also be useful for Administrators to understand using the cluster. Both sets of users are welcome. It will cover:
- Introduction to the cluster and system overview
- Environment Modules
- Using MPI
- Workload Management
- Building / compiling codes on the cluster
- Verifying application performance

Please note that anything not covered to RFD's satisfaction can be revisited after the initial training, with webinars to drill down into any aspects. BOSTON LTD will continue to update and train periodically throughout the duration of the contract, and RFD will be invited to attend partner training sessions such as:
- Supermicro Hardware Familiarisation Training (1 session annually and quarterly webinars)
- Bright Cluster Management Training (frequent webinars for users and administrators)
- Intel Cluster Tools (compilers / maths libraries; annual webinars)

6 System Settings


System Configuration

Head Node configuration:

Head Node’s Image setup based on CentOS 7.2

BIOS Settings:
- Boot local disk first
- Hyper-threading [On]
- Turbo [On]

The 12 HDDs on the head node will be configured in a RAID50 array, consisting of 2 stripes with 6 drives each. This will give a total of approximately 38TB of usable space. The Operating System will be installed on this array, using approximately 400GB of space. The remaining usable space will be divided into the following 2 directories:
• /home: 34.1TB. This will be the root of the users' home directories.
• /cm/shared: 3.5TB. This will be a shared directory, used by default by Bright Cluster Manager to install applications and libraries on.

Compute Nodes configuration:

Default Compute Image setup based on CentOS 7.2

BIOS Settings:
- Boot PXE (LAN1) first
- Boot local disk second
- Hyper-threading [On]
- Turbo [On]

PDUs: Provided by RFD

Power connections required for the solution:
- Headnode (with 2x PSU), x1: 2
- Compute Servers (each with 2x PSU), x1: 2
- Network switch 1GbE (each with 1x PSU), x1: 1
- Network switch Mellanox (each with 2x PSU), x1: 2

Fast Network:
- Mellanox FDR IB switch with flat cluster fast network

Ethernet Stack:
- 1GbE flat network with all nodes and management ports connected

IPMI Network Settings:
- Integrated as part of the cluster management network
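The head-node RAID50 sizing quoted earlier in this section (12 HDDs in 2 stripes of 6 drives, roughly 38TB usable, split between the OS, /home and /cm/shared) can be sanity-checked with a short calculation. All figures below come from this document; the sketch only confirms they are self-consistent.

```python
# Sanity check of the head-node RAID50 layout described in this section.
TOTAL_DRIVES = 12                          # HDDs in the head node
SPANS = 2                                  # RAID50 = RAID0 stripe across RAID5 spans
DRIVES_PER_SPAN = TOTAL_DRIVES // SPANS    # 6 drives per RAID5 span

# Each RAID5 span gives up one drive's worth of capacity to parity.
data_drives = SPANS * (DRIVES_PER_SPAN - 1)

# Space allocation from the document, in TB.
os_tb, home_tb, cm_shared_tb = 0.4, 34.1, 3.5
total_tb = os_tb + home_tb + cm_shared_tb

print(f"data drives after parity: {data_drives}")   # 10
print(f"allocated space: {total_tb:.1f}TB")         # 38.0TB, matching the ~38TB usable figure
print(f"implied usable capacity per data drive: {total_tb / data_drives:.2f}TB")
```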

7 Rack requirements

Rack Size: 42U
Is Siting required: No
Notes (i.e. First Floor, No Lift Access): Racking to be provided by RFD and it should be in place by the installation date.
PDUs Required: No (Notes: Provided by RFD)
Switch Type: Mellanox IB (Notes: Mellanox FDR IB architecture, 1U)
Console Required: No (Notes: Console access provided by IPMI modules)
KVM Switch Required: Yes (Notes: KVM switch included)
UPS Required: No (Notes: Provided by RFD if needed)
Blank/Empty Rack Slots required: No (Notes: None ordered; RFD to provide if desired)
Estimated Maximum Power Draw from the Cluster:

Power Sockets on PDU (socket required to accommodate PDU power socket):
- 32A Commando: 0
- 16A Commando (*note* smaller than the 32A connection): 0
- IEC-C19/C20: 0
- IEC-C13/C14: 1
- Standard 13A Plug: 0

Cluster Power Socket Location (for PDU orientation):
- Above Rack, Ceiling sockets: 0
- Below Rack, Under floor tiles: 0
- Other (Please Specify): Existing PDU in the racks. Please specify type.







Network Architecture

Access to the cluster will be possible via the headnode only. The headnode will be configured with an external network address, with access permitted via SSH only. RFD must ensure that the address/mask/gateway are provided, and that any configuration required to allow our headnode to access the network is applied, before the installation date. The connection to our head node will be made via a reverse SSH tunnel. External access is also requested for this system to ensure that updates and additional packages are installable. The MAC address of our headnode interface can be provided if required.

The internal cluster networks will include:
• IPMI system management (1GbE)
• Cluster Management Network (1GbE)
• HPC Mellanox IB Fabric (56Gb)

The network diagram of the above description can be seen here:
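The reverse SSH tunnel mentioned above could be established along the following lines. This is a minimal sketch, not the agreed procedure: the support hostname and remote port are hypothetical placeholders, and only the standard `ssh -R` remote port forwarding flag is assumed.

```python
import subprocess

# Hypothetical endpoint details; the real values are not stated in this document.
SUPPORT_HOST = "support.example.com"   # placeholder Boston support endpoint
REMOTE_PORT = 2222                     # port opened on the support host

# ssh -R forwards a port on the remote side back to the headnode's sshd, so
# support staff can reach the cluster over an outbound-only connection.
cmd = [
    "ssh", "-N",                           # -N: tunnel only, no remote command
    "-R", f"{REMOTE_PORT}:localhost:22",   # remote port -> headnode port 22
    SUPPORT_HOST,
]
print(" ".join(cmd))
# subprocess.run(cmd)  # would be executed from the headnode itself
```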


8 Rack PDU / Node Layout (Same for all cabinets)
• XX A load per PDU

PDU-1  Node    PDU-1  Node
1              11
2              12
3              13
4              14
5              15
6              16
7              17
8              18
9              19
10             20

Please note, the above table will be populated before handover.

Cable Requirements:
                  1m   2m   3m
Mellanox Cables    0    0    4
1GB Management     0    0    4
1GB IPMI           0    0    0

9 Installation Coordination
• A Pre-Site Configuration Survey has been completed for this installation.
• The installation will be scheduled for January 23rd through the 27th.
• Initial training classes will be scheduled no later than 30 days following installation. Training will be conducted on site during the initial installation. The initial agenda has been detailed earlier in this document.
• BOSTON LTD will work closely with the client during the initial month to bring the system into production and ensure the local staff are well equipped to manage and maintain the cluster for the duration of the contract. Follow-on training will also be arranged at least twice a year (which may be delivered via webinar or through invitation to partner locations).

10 Customer Responsibilities
• Space will be provided for temporary placement of chassis boxes during installation.
• Provide sufficient power, power sockets (see section above for PDU socket details) and cooling.
• The customer must provide LAN connectivity for remote management of the HPC cluster as required (a single connection from one port per headnode or cluster access point).
• The customer must confirm (and complete if required) the HPC configuration (all question sections highlighted in yellow: BOSTON LTD CONFIRMED).
• The customer must confirm the total power requirement and that the system can be housed in the designated area.
• The customer must ensure that the required power outlets have been installed.
• The network cabling infrastructure and active hardware will be in place prior to installation.

11 Hardware Installation and Set-up

Rack / HPC Node Installation: This service provides on-site installation of BOSTON LTD rackmount server cabinets and accessories. Rack installation includes on-site quality inspection and assembly of the various BOSTON LTD rack hardware. Physical installation of rack-mounted devices includes fitting BOSTON LTD rack-mounted servers, Power Distribution Units (PDUs), UPS, switch boxes, monitor, mouse and keyboard into the rack. This service includes the power-on and testing of components placed into the rack.
• Unpack systems and components and inspect condition of equipment.
• Position rack in accordance with customer instructions (if applicable).
• Attach levelling screws and stabiliser bracing (if applicable).
• Attach server side rails to rack and rack-mounting kits to server and storage components (if applicable).
• Install all applicable BOSTON LTD products into the rack.
• Connect all units to appropriate power connections.
• Cable management, i.e. cable routing and labelling. Cables will be labelled at both ends.
• Attach earth bonding kit to the rack cabinet.
• Remove all packaging to a customer-designated area within the facility.

12 Test Hardware

1. Power-on and testing of components
• System 'POST' check and BIOS set-up
• Test all hardware components
• Demonstrate to customer

2. Installation Verification & System Handover
• Installation review and customer acceptance
• Installation review and customer sign-off


Network Configuration (for headnode / management node)

Address Configuration (this will be the address used to access the cluster remotely)

Network / VLAN IP Address: 10.223.199.140
Subnet Mask: 255.255.255.192
Gateway IP Address: 10.223.199.129
Proxy Server (if applicable): N/A
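The address details above can be checked for consistency with Python's standard ipaddress module; a small sketch using only the values quoted in this section:

```python
import ipaddress

# External headnode address details from the table above.
iface = ipaddress.ip_interface("10.223.199.140/255.255.255.192")
gateway = ipaddress.ip_address("10.223.199.129")

network = iface.network
print(network)                    # the /26 subnet containing the headnode
print(gateway in network)         # confirms the gateway sits on the same subnet
print(network.num_addresses - 2)  # usable host addresses in the subnet
```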

Cluster Naming
Server Name: FILL
Domain Extension: FILL

DNS Configuration
Primary DNS: 10.223.192.39
Secondary DNS: 10.223.192.70
Tertiary DNS: N/A
NTP Server: FILL

User Configuration
Root Password: FILL

• Suggested/Default naming and IP convention*

Internal Private Management Cluster Network: 10.141.0.0/16

Server / Node Name    IP Address           Server Height (1U, 2U etc)
Head                  10.141.255.254/16    2U
Compute01             10.141.0.1           2U
Compute02             10.141.0.2           2U
Compute03             10.141.0.3           2U

Internal Private IPMI Cluster Network: 10.148.0.0/16

Server / Node Name    IP Address           Server Height (1U, 2U etc)
Head                  10.148.255.254/16    2U
Compute01             10.148.0.1           2U
Compute02             10.148.0.2           2U
Compute03             10.148.0.3           2U

Internal Private Fast/IB Cluster Network: 10.149.0.0/16

Server / Node Name    IP Address           Server Height (1U, 2U etc)
Head                  10.149.255.254/16    2U
Compute01             10.149.0.1           2U
Compute02             10.149.0.2           2U
Compute03             10.149.0.3           2U
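The three tables above follow a single convention: the head node sits at x.x.255.254 of each /16, and the compute nodes count up from x.x.0.1. A short sketch that reproduces the convention (the dictionary keys are descriptive labels, not configuration values):

```python
import ipaddress

# Internal network bases from the tables above.
NETWORKS = {
    "management": "10.141.0.0/16",
    "ipmi": "10.148.0.0/16",
    "fast_ib": "10.149.0.0/16",
}

def node_addresses(cidr, n_compute=3):
    """Head node at the top of the range (x.x.255.254); compute nodes from x.x.0.1."""
    net = ipaddress.ip_network(cidr)
    plan = {"Head": str(net.network_address + net.num_addresses - 2)}
    for i in range(1, n_compute + 1):
        plan[f"Compute{i:02d}"] = str(net.network_address + i)
    return plan

for name, cidr in NETWORKS.items():
    print(name, node_addresses(cidr))
```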

13 Software Installation and Set-up

• Cluster software installation, and acceptance testing for optimal performance and stability of the appropriate cluster software and hardware.


Bright Software Configuration

Components are marked 1 (included) or 0 (not included):

- Bright inc. Base, Monitoring, HPC, Java, Kernel, OS CentOS 7: 1
- Slurm Scheduler: 0
- Torque/Default Scheduler: 1
- LSF Roll: 0
- Intel Cluster Ready: 0
- Intel Cluster Developer: 0
- MOAB: 0
- Mellanox Drivers: 1
- CUDA GPU: 0
- PGI: 0


14 Customer Acceptance

BOSTON LTD cannot take ownership of certain tasks of the installation owing to the nature of those tasks. These tasks remain the responsibility of our Client.

The Client undertakes to ensure the following:

(Consulting engineer to indicate where applicable)
• Access is available to all required locations at agreed times.
• All cabling and network infrastructure is complete prior to installation, including provision of patch and fly leads where appropriate.
• Cabinets are large enough to accommodate all new equipment, i.e. rack mount servers, hubs, switches etc.
• The 'Client' is responsible for all power requirements including: meeting regulatory bodies, ensuring correct quantities of power outlets and any other "power" related issues that may arise. (Mandatory)


“I accept that the above tasks, as indicated, are the responsibility of the customer and undertake to ensure they are completed prior to the scheduled installation date(s). I accept that additional installation time caused by non-compliance of the above may be chargeable to the Customer. Further, I fully understand and agree to the contents of this Scope of Work document.”


Signed

Position

Organisation

Date


WARNING: YOU ARE STRONGLY ADVISED NOT TO IGNORE THIS


During the installation period (i.e. any period prior to sign-off by your authorised representative) BOSTON LTD will be testing and re-installing software onto your PCs and/or server. We recommend that you make several backups of your data prior to our installation/upgrade. You are therefore strongly advised not to transfer or store any of your data onto these BOSTON LTD systems until installation is complete. BOSTON LTD will not accept any responsibility for any loss of your data, however caused, during the installation period.


I ______________________________ (Authorised Representative) for _____________________________ (Customer Organisation) have read and fully understand the statement above. Furthermore, I accept that BOSTON LTD cannot be held responsible for any data loss occurring during the installation period.

Special Note on Network Upgrades and Migration:

Extra care needs to be taken when data is transferred during an upgrade or migration. You will need to prepare the data for transfer, as BOSTON LTD cannot do this for you. This data will then be transferred if such transfer is included in the Scope of Work agreed between both parties.

• However, due to software copying limitations, it is impossible to guarantee the copy of the data will be identical to the original. You should therefore dedicate ICT time and resources to check that the data copy has worked correctly, and restore any missing files from your backup. The BOSTON LTD Engineer will not be able to do this for you.