Difference between revisions of "Platform Cluster : Index"

From Define Wiki
 
(25 intermediate revisions by the same user not shown)
Line 4: Line 4:
 
===== PCM Administration =====
 
  
== Users ==
 
  
*[[Add Users]]
 
  
== Compute, Login & Unmanaged Nodes ==
 
  
*[[Install Login Node in PCM]]
+
*[[Green Data Centre Daemons - gdd]]
*[[Change Master Public IP Address]]
+
*[[Diskless CUDA Installation]]
*[[Add Hosts]]
 
*[[Delete Hosts]]
 
*[[Enable X11 on Compute Nodes]]
 
*[[X11 Forwarding for compute nodes]]
 
*[[Re-provision Node After Motherboard Replacement]]
 
*[[Associate Node Group with Multiple OS Update kits]]
 
*[[Add Unmanaged Hosts]]
 
*[[Force PXEBoot]]
 
*[[Cross provision Centos on compute nodes (form RHEL headnode)|Cross provision CentOS on compute nodes (from RHEL headnode)]]
 
 
 
== Tools & Software ==
 
*[[VNC through PMC VncViewer.jar]]
 
*[[Using PDSH]]
 
 
 
== Web Interface ==
 
*[[Enabling https access for the PCM GUI]]
 
*[[Online resizing of the users' HOME and /(ROOT) directories]]
 
*[[Assign Nodes to Rack in PCM Web GUI]]
 
 
 
== Licences ==
 
 
 
*[[License Expire Alert - PCM 3.0]]
 
*[[Setup redundant license servers is PCM 2.1|Setup redundant license servers in PCM 2.1]]
 
 
 
== Packages & Repos ==
 
*[[Add Packages]]
 
*[[Scripting the Addition of Packages]]
 
*[[Change a repository name]]
 
*[[Create Repo for Applications]]
 
*[[PCM Repo Updates]]
 
 
 
== Networks and Interfaces ==
 
*[[Change DNS on headnode]]
 
*[[Add IPMI Network]]
 
*[[Add IPoIB Network]]
 
 
 
== Jobs and Queues ==
 
 
 
*[[Max Job ID]]
 
*[[Re-open Closed Queue]]
 
 
 
 
 
== Other Useful Information ==
 
 
 
*[[Using Cron]]
 
 
*[[Configuring SNMP Vendor MIBs]]
 
 
*[[snmpd: messages in /var/log/messages]]
 
 
*[[Dual Boot Windows 2008 R2]]
 
*[[Login Node IPTABLES]]
 
*[[Platform Kusu DB]]
 
*[[Post Installation Custom Scripts]]
 
*[[Diskless CUDA Installation]]
 
*[[Permissions in CFM]]
 
*[[Setup HPC HA (Platform HPC 3.0)]]
 
*[[Remote Visualisation]]
 
*[[Unmount Home Directory]]
 
*[[Green Data Centre Daemons - gdd]]
 
*[[Backup]]
 
*[[Delete Postgres Archive]]
 
*[[Stop OFED From Compiling at Boot Up]]
 
  
 
== Install Intel Runtime and Intel Cluster Ready Without Platform Kits ==
 
Line 78: Line 18:
 
== Working with Kits ==
 
 
*[[Add Kits]]
 
*[[Build New Kit]]
 
 
*[[build.kit Example File]]
 
 
*[[Intel Cluster Ready Kit]]
 
 
*[[Remove Kits]]
 
 
== LSF Topics ==
 
*[[BLCR Integration]]
 
*[[LSF Commands]]
 
*[[Job Information]]
 
*[[Take LSF Node Offline]]
 
*[[LSF Logs]]
 
*[[Setup Email with Postfix]]
 
*[[Setup an Interactive Queue]]
 
*[[Specifying Resources for Nodes]]
 
*[[Resource Allocation Limits]]
 
*[[Limit Jobs for Users]]
 
*[[Runtime Resource Usage Limits]]
 
*[[MPI Job Placement]]
 
*[[BSUB Job Submission]]
 
*[[Using esub scripts]]
 
*[[Scheduling Speed]]
 
*[[Monthly Report for User Usage]]
 
*[[PCM 2.0.1 Cfmsync -f restarts LSF - Disable]]
 
*[[Setup Application Profiles]]
 
*[[Patching LSF]]
 
*[[RUNLIMIT Bug - Patching LSF]]
 
*[[Using egroup in LSF]]
 
*[[Job Group Limits]]
 
*[[Absolute Priority Scheduling]]
 
*[[Calculating Fairshare - bhpart]]
 
*[[LSF Multicluster]]
 
*[[LSF Matlab ELIM]]
 
*[[Preemptable Low Priority Queue]]
 
*[[Enable Processor Binding for Jobs]]
 
 
== Platform MPI ==
 
*[[Platform MPI General]]
 
*[[MPICH2 Under PCM]]
 
 
== Platform MISC Debugging ==
 
*[[Debug Command]]
 
*[[PCM Mysql Table Repair]]
 
*[[Cannot connect to LSF. Please wait...]]
 
*[[bhosts reports nodes as down]]
 
*[[PMC Data source initialize failed...]]
 
*[[PMC/PAC Failed to connect to database server]]
 
*[[Check LSF Variables Set]]
 
*[[Check hopen hclose history]]
 
*[[Tftpboot files zero size]]
 
*[[Dynamically enabling higher debugging levels]]
 
*[[Debug SSUSP Suspended Jobs]]
 
*[[addhost -u not running badmin reconfig]]
 
  
 
== Overheat Detection ==
 
* [[Platform Automated Shutdown Scripts]]
 
  
 
== Platform Patches ==
 
 
* [[Adaptec 5805 raid disk not detected]]
 
  
== Rocks vs Platform Commands ==
+
==NEEDS UPDATING==
*[[Rocks vs Platform]]
 
  
 +
*[[VNC Access on headnode]]
 +
*[[Install CUDA 4.x on PCM 2.0.1/2.1]]
 +
*[[Register License]]
 +
*[[Create Repo for Applications]]
 +
* [[Platform Automated Shutdown Scripts]]
 +
*[[BLCR Integration]]
 +
*[[Build New Kit]]
  
  
 +
*[[Enable X11 on Compute Nodes]]
 +
*[[X11 Forwarding for compute nodes]]
  
 +
*[[MPICH2 Under PCM]]
  
==NEEDS UPDATING==
 
  
*[[VNC Access on headnode]]
+
*[[MPI Job Placement]]
*[[Install CUDA 4.x on PCM 2.0.1/2.1]]
+
*[[Using esub scripts]]
*[[Register License]]
+
*[[Scheduling Speed]]
 +
*[[Patching LSF]]

Latest revision as of 09:43, 1 May 2013