Difference between revisions of "Platform Cluster : Index"

 
(39 intermediate revisions by 3 users not shown)
Line 3: Line 3:
  
 
===== PCM Administration =====
 
===== PCM Administration =====
*[[Install Login Node in PCM]]
+
 
*[[Register License]]
+
 
*[[Add Hosts]]
+
 
*[[Add Users]]
+
 
*[[Add Packages]]
+
*[[Green Data Centre Daemons - gdd]]
*[[Scripting the Addition of Packages]]
+
*[[Diskless CUDA Installation]]
*[[Delete Hosts]]
 
*[[Change Master Public IP Address]]
 
*[[Change DNS on headnode]]
 
*[[Add Unmanaged Hosts]]
 
*[[Using PDSH]]
 
*[[Force PXEBoot]]
 
*[[Re-provision Node After Motherboard Replacement]]
 
*[[Using Cron]]
 
*[[Add IPMI Network]]
 
*[[Add IPoIB Network]]
 
*[[VNC Access on headnode]]
 
*[[VNC through PMC VncViewer.jar]]
 
 
*[[Configuring SNMP Vendor MIBs]]
 
*[[Configuring SNMP Vendor MIBs]]
*[[X11 Forwarding for compute nodes]]
 
*[[PCM Repo Updates]]
 
 
*[[snmpd: messages in /var/log/messages]]
 
*[[snmpd: messages in /var/log/messages]]
 
*[[Dual Boot Windows 2008 R2]]
 
*[[Dual Boot Windows 2008 R2]]
*[[Login Node IPTABLES]]
 
*[[Platform Kusu DB]]
 
*[[Post Installation Custom Scripts]]
 
*[[Diskless CUDA Installation]]
 
*[[Online resizing of the users' HOME and /(ROOT) directories]]
 
*[[Enabling https access for the PCM GUI]]
 
*[[Enable X11 on Compute Nodes]]
 
*[[Re-open Closed Queue]]
 
*[[Setup redundant license servers is PCM 2.1]]
 
*[[Change a repository name]]
 
*[[Permissions in CFM]]
 
*[[Max Job ID]]
 
*[[Setup HPC HA (Platform HPC 3.0)]]
 
*[[Create Repo for Applications]]
 
*[[Remote Visualisation]]
 
*[[Unmount Home Directory]]
 
*[[License Expire Alert - PCM 3.0]]
 
*[[Assign Nodes to Rack in PCM Web GUI]]
 
*[[Green Data Centre Daemons - gdd]]
 
*[[Backup]]
 
*[[Delete Postgres Archive]]
 
*[[Stop OFED From Compiling at Boot Up]]
 
*[[Install CUDA 4.x on PCM 2.0.1/2.1]]
 
*[[Associate Node Group with Multiple OS Update kits]]
 
  
===== Install Intel Runtime and Intel Cluster Ready Without Platform Kits =====
+
== Install Intel Runtime and Intel Cluster Ready Without Platform Kits ==
 
*[[Intel Runtime and ICR]]
 
*[[Intel Runtime and ICR]]
  
===== Working with Kits =====
+
== Working with Kits ==
 
*[[Add Kits]]
 
*[[Add Kits]]
*[[Build New Kit]]
 
 
*[[build.kit Example File]]
 
*[[build.kit Example File]]
 
*[[Intel Cluster Ready Kit]]
 
*[[Intel Cluster Ready Kit]]
 
*[[Remove Kits]]
 
*[[Remove Kits]]
  
===== LSF Topics =====
+
== Overheat Detection ==
*[[BLCR Integration]]
 
*[[LSF Commands]]
 
*[[Job Information]]
 
*[[Take LSF Node Offline]]
 
*[[LSF Logs]]
 
*[[Setup Email with Postfix]]
 
*[[Setup an Interactive Queue]]
 
*[[Specifying Resources for Nodes]]
 
*[[Resource Allocation Limits]]
 
*[[Limit Jobs for Users]]
 
*[[Runtime Resource Usage Limits]]
 
*[[MPI Job Placement]]
 
*[[BSUB Job Submission]]
 
*[[Using esub scripts]]
 
*[[Scheduling Speed]]
 
*[[Monthly Report for User Usage]]
 
*[[PCM 2.0.1 Cfmsync -f restarts LSF - Disable]]
 
*[[Setup Application Profiles]]
 
*[[Patching LSF]]
 
*[[RUNLIMIT Bug - Patching LSF]]
 
*[[Using egroup in LSF]]
 
*[[Job Group Limits]]
 
*[[Absolute Priority Scheduling]]
 
*[[Calculating Fairshare - bhpart]]
 
*[[LSF Multicluster]]
 
*[[LSF Matlab ELIM]]
 
*[[Preemptable Low Priority Queue]]
 
*[[Enable Processor Binding for Jobs]]
 
  
===== Platform MPI =====
+
== Platform Patches ==
*[[Platform MPI General]]
+
* [[Adaptec 5805 raid disk not detected]]
*[[MPICH2 Under PCM]]
 
  
===== Platform MISC Debugging =====
+
==NEEDS UPDATING==
*[[Debug Command]]
 
*[[PCM Mysql Table Repair]]
 
*[[Cannot connect to LSF. Please wait...]]
 
*[[bhosts reports nodes as down]]
 
*[[PMC Data source initialize failed...]]
 
*[[PMC/PAC Failed to connect to database server]]
 
*[[Check LSF Variables Set]]
 
*[[Check hopen hclose history]]
 
*[[Tftpboot files zero size]]
 
*[[Dynamically enabling higher debugging levels]]
 
*[[Debug SSUSP Suspended Jobs]]
 
*[[addhost -u not running badmin reconfig]]
 
  
===== Overheat Detection =====
+
*[[VNC Access on headnode]]
 +
*[[Install CUDA 4.x on PCM 2.0.1/2.1]]
 +
*[[Register License]]
 +
*[[Create Repo for Applications]]
 
* [[Platform Automated Shutdown Scripts]]
 
* [[Platform Automated Shutdown Scripts]]
 +
*[[BLCR Integration]]
 +
*[[Build New Kit]]
  
===== Platform Patches =====
 
* [[Adaptec 5805 raid disk not detected]]
 
  
===== Rocks vs Platform Commands =====
+
*[[Enable X11 on Compute Nodes]]
*[[Rocks vs Platform]]
+
*[[X11 Forwarding for compute nodes]]
  
===== LSF Config Wishlist =====
+
*[[MPICH2 Under PCM]]
*[[Lancaster LSF Wishlist]]
 
*[[PCM HPC EE 201 Issues]]
 
  
===== NFS exports on PCM head =====
 
  
Edit the file: /opt/kusu/lib/plugins/addhost/04-nfs.py and append:
<syntaxhighlight>
os.system("echo \"/home 172.28.0.0/255.255.0.0(rw,async,no_root_squash)\" >> /etc/exports")
os.system("/etc/init.d/nfs restart > /dev/null 2>&1")
</syntaxhighlight>
+
*[[MPI Job Placement]]
+
*[[Using esub scripts]]
+
*[[Scheduling Speed]]
+
*[[Patching LSF]]
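The 04-nfs.py snippet appends the /home export with a plain echo, so the same line may be written again each time the code runs. A minimal sketch of a guard against that follows, assuming the plugin can call ordinary Python helpers. The helper name ensure_line and the scratch-file demo are hypothetical and not part of Kusu; only the export entry itself comes from the snippet.

```python
import os
import tempfile


def ensure_line(path, line):
    """Append `line` to `path` unless an identical line is already present.

    Returns True if the file was modified, False otherwise.
    (Hypothetical helper, not part of Kusu.)
    """
    wanted = line.strip()
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    # Compare against existing lines, ignoring surrounding whitespace.
    if wanted in (entry.strip() for entry in content.splitlines()):
        return False
    with open(path, "a") as handle:
        # Avoid gluing the new entry onto an unterminated last line.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(wanted + "\n")
    return True


# Export entry taken from the 04-nfs.py snippet.
EXPORT_LINE = "/home 172.28.0.0/255.255.0.0(rw,async,no_root_squash)"

if __name__ == "__main__":
    # Demonstrate on a scratch file rather than the real /etc/exports.
    demo_path = os.path.join(tempfile.mkdtemp(), "exports")
    print(ensure_line(demo_path, EXPORT_LINE))  # first call appends
    print(ensure_line(demo_path, EXPORT_LINE))  # second call changes nothing
```

In the plugin, one option would be to call ensure_line("/etc/exports", EXPORT_LINE) and run the NFS restart only when it returns True, so an unchanged exports file does not trigger a restart.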
 

Latest revision as of 09:43, 1 May 2013