Rocks: Rocks Solid
Rocks Solid Installation and Configuration
RPMs approved and tested:
foundation-pysqlite-2.4.1-1.el5.x86_64.rpm
foundation-SQLAlchemy-0.4.8.el5-1.noarch.rpm
foundation-python-elementtree-1.2.6-5.el5.x86_64.rpm
foundation-SQLAlchemy-0.4.8.el5-1.src.rpm
rocks-solid-0.5.0b1-1.el5.noarch.rpm
Install the RPMs:
rpm -ivh foundation* rocks-solid*
* note: we ignored the warning about the support group
To install on all the compute nodes, put the RPMs in
/export/rocks/install/contrib/5.3/arch/RPMS
Create a new XML configuration file that will extend the current compute.xml configuration file:
# cd /export/rocks/install/site-profiles/5.1/nodes
# cp skeleton.xml extend-compute.xml
Inside extend-compute.xml, add the package names:
<!-- Rocks-Solid -->
<package>foundation-pysqlite</package>
<package>foundation-python-elementtree</package>
<package>foundation-SQLAlchemy</package>
<package>rocks-solid</package>
Save extend-compute.xml and run:
# cd /export/rocks/install
# rocks create distro
Naming of the IPMI modules:
compute-ipmi-0-0
compute-ipmi-0-1
...
Edit /etc/rocks-solid.conf and change the following:
[ipmi]
host_pattern=s/compute/compute-ipmi/
user=ADMIN
passwd=ADMIN
[temp]
threshold=55
ipmi_attr=.*Sys.* (System Temperature)
[main]
scheduler=torque
poweron_driver=ipmi (although wol seemed to work!)
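The host_pattern entry is a sed-style substitution that rocks-solid applies to each compute node name to derive the name of its IPMI interface. The node name below is only an illustration:

```shell
# host_pattern=s/compute/compute-ipmi/ rewrites node names like this:
echo compute-0-3 | sed 's/compute/compute-ipmi/'
# -> compute-ipmi-0-3
```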
Add /etc/rocks-solid.conf to 411 (/var/411/Files.mk) and then run 'rocks sync users'
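The Files.mk addition might look like the sketch below; the variable name is an assumption based on the stock file, so check the existing entries in /var/411/Files.mk and follow the same pattern:

```make
# In /var/411/Files.mk, append the config file to the list of files
# that 411 distributes to the nodes (variable name assumed):
FILES += /etc/rocks-solid.conf
```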
The following binaries are installed
/opt/rocks/bin/check-ignore-host
/opt/rocks/bin/cluster-clean-ps
/opt/rocks/bin/cluster-freehost
/opt/rocks/bin/cluster-ipmi
/opt/rocks/bin/cluster-power
/opt/rocks/bin/cluster-poweron-sched-nodes
/opt/rocks/bin/cluster-powersave
/opt/rocks/bin/cluster-reset-freeze-node
/opt/rocks/bin/cluster-status-acct
/opt/rocks/bin/node-cleanipcs
/opt/rocks/bin/node-envcheck
/opt/rocks/bin/node-term-sge-zombie
/opt/rocks/bin/node-term-user-ps
/opt/rocks/bin/queue-limit-user-cpu
Turn nodes off and on depending on the load and queue:
cluster-powersave
cluster-powersave -d -v (verbose and dry run)
Check the power status of all the nodes:
cluster-power status
Run an IPMI query on all the nodes:
cluster-ipmi <ipmi-command>
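If the scheduled power management works for your site, cluster-powersave can be run periodically from cron on the frontend. The interval below is an arbitrary example, not a recommendation from the rocks-solid documentation:

```crontab
# Example /etc/crontab entry: run cluster-powersave every 30 minutes as root
*/30 * * * * root /opt/rocks/bin/cluster-powersave
```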