Difference between revisions of "Platform Cluster : Index"
Line 74:
 *[[Stop OFED From Compiling at Boot Up]]
+== Install Intel Runtime and Intel Cluster Ready Without Platform Kits ==
 *[[Intel Runtime and ICR]]
+== Working with Kits ==
 *[[Add Kits]]
 *[[Build New Kit]]
Line 84:
 *[[Remove Kits]]
+== LSF Topics ==
 *[[BLCR Integration]]
 *[[LSF Commands]]
Line 114:
 *[[Enable Processor Binding for Jobs]]
+== Platform MPI ==
 *[[Platform MPI General]]
 *[[MPICH2 Under PCM]]
+== Platform MISC Debugging ==
 *[[Debug Command]]
 *[[PCM Mysql Table Repair]]
Line 132:
 *[[addhost -u not running badmin reconfig]]
+== Overheat Detection ==
 * [[Platform Automated Shutdown Scripts]]
+== Platform Patches ==
 * [[Adaptec 5805 raid disk not detected]]
+== Rock vs Platform Commands ==
 *[[Rocks vs Platform]]
+== LSF Config Wishlist ==
 *[[Lancaster LSF Wishlist]]
 *[[PCM HPC EE 201 Issues]]
Revision as of 14:29, 5 December 2012
Platform HPC
- The topics below cover Platform HPC versions 2.0, 2.0.1 and 3.0
PCM Administration
Users
Compute, Login & Unmanaged Nodes
- Install Login Node in PCM
- Change Master Public IP Address
- Add Hosts
- Delete Hosts
- Enable X11 on Compute Nodes
- Re-provision Node After Motherboard Replacement
- Associate Node Group with Multiple OS Update kits
- Add Unmanaged Hosts
Tools & Software
Web Interface
- Enabling https access for the PCM GUI
- Online resizing of the users' HOME and /(ROOT) directories
- Assign Nodes to Rack in PCM Web GUI
Licences
Packages & Repos
- Add Packages
- Scripting the Addition of Packages
- Change a repository name
- Create Repo for Applications
- PCM Repo Updates
Networks and Interfaces
Jobs and Queues
Other Useful Information
- Force PXEBoot
- Using Cron
- Configuring SNMP Vendor MIBs
- snmpd: messages in /var/log/messages
- Dual Boot Windows 2008 R2
- Login Node IPTABLES
- Platform Kusu DB
- Post Installation Custom Scripts
- Diskless CUDA Installation
- Permissions in CFM
- Setup HPC HA (Platform HPC 3.0)
- Remote Visualisation
- Unmount Home Directory
- Green Data Centre Daemons - gdd
- Backup
- Delete Postgres Archive
- Stop OFED From Compiling at Boot Up
Install Intel Runtime and Intel Cluster Ready Without Platform Kits
Working with Kits
LSF Topics
- BLCR Integration
- LSF Commands
- Job Information
- Take LSF Node Offline
- LSF Logs
- Setup Email with Postfix
- Setup an Interactive Queue
- Specifying Resources for Nodes
- Resource Allocation Limits
- Limit Jobs for Users
- Runtime Resource Usage Limits
- MPI Job Placement
- BSUB Job Submission
- Using esub scripts
- Scheduling Speed
- Monthly Report for User Usage
- PCM 2.0.1 Cfmsync -f restarts LSF - Disable
- Setup Application Profiles
- Patching LSF
- RUNLIMIT Bug - Patching LSF
- Using egroup in LSF
- Job Group Limits
- Absolute Priority Scheduling
- Calculating Fairshare - bhpart
- LSF Multicluster
- LSF Matlab ELIM
- Preemptable Low Priority Queue
- Enable Processor Binding for Jobs
Platform MPI
Platform MISC Debugging
- Debug Command
- PCM Mysql Table Repair
- Cannot connect to LSF. Please wait...
- bhosts reports nodes as down
- PMC Data source initialize failed...
- PMC/PAC Failed to connect to database server
- Check LSF Variables Set
- Check hopen hclose history
- Tftpboot files zero size
- Dynamically enabling higher debugging levels
- Debug SSUSP Suspended Jobs
- addhost -u not running badmin reconfig