Platform Cluster : Index
Platform HPC
- Topics below include information based on versions 2.0, 2.0.1 and 3.0
PCM Administration
Users
Compute, Login & Unmanaged Nodes
- Install Login Node in PCM
- Change Master Public IP Address
- Add Hosts
- Delete Hosts
- Re-provision Node After Motherboard Replacement
- Associate Node Group with Multiple OS Update kits
- Add Unmanaged Hosts
- Force PXEBoot
- Cross-provision CentOS on compute nodes (from RHEL headnode)
Tools & Software
Web Interface
- Enabling HTTPS access for the PCM GUI
- Online resizing of the users' HOME and /(ROOT) directories
- Assign Nodes to Rack in PCM Web GUI
Licences
Packages & Repos
Networks and Interfaces
Jobs and Queues
Other Useful Information
- Using Cron
- Configuring SNMP Vendor MIBs
- snmpd: messages in /var/log/messages
- Dual Boot Windows 2008 R2
- Login Node IPTABLES
- Platform Kusu DB
- Post Installation Custom Scripts
- Diskless CUDA Installation
- Permissions in CFM
- Setup HPC HA (Platform HPC 3.0)
- Remote Visualisation
- Unmount Home Directory
- Green Data Centre Daemons - gdd
- Backup
- Delete Postgres Archive
- Stop OFED From Compiling at Boot Up
Install Intel Runtime and Intel Cluster Ready Without Platform Kits
Working with Kits
LSF Topics
- LSF Commands
- Job Information
- Take LSF Node Offline
- LSF Logs
- Setup Email with Postfix
- Setup an Interactive Queue
- Specifying Resources for Nodes
- Resource Allocation Limits
- Limit Jobs for Users
- Runtime Resource Usage Limits
- MPI Job Placement
- BSUB Job Submission
- Using esub scripts
- Scheduling Speed
- Monthly Report for User Usage
- PCM 2.0.1 Cfmsync -f restarts LSF - Disable
- Setup Application Profiles
- Patching LSF
- RUNLIMIT Bug - Patching LSF
- Using egroup in LSF
- Job Group Limits
- Absolute Priority Scheduling
- Calculating Fairshare - bhpart
- LSF Multicluster
- LSF Matlab ELIM
- Preemptable Low Priority Queue
- Enable Processor Binding for Jobs
Platform MPI
Platform MISC Debugging
- Debug Command
- PCM MySQL Table Repair
- Cannot connect to LSF. Please wait...
- bhosts reports nodes as down
- PMC Data source initialize failed...
- PMC/PAC Failed to connect to database server
- Check LSF Variables Set
- Check hopen hclose history
- Tftpboot files zero size
- Dynamically enabling higher debugging levels
- Debug SSUSP Suspended Jobs
- addhost -u not running badmin reconfig
Overheat Detection
Platform Patches
Rocks vs Platform Commands
NEEDS UPDATING
- VNC Access on headnode
- Install CUDA 4.x on PCM 2.0.1/2.1
- Register License
- Create Repo for Applications
- Platform Automated Shutdown Scripts
- BLCR Integration
- Build New Kit