Rocks: Rocks Solid

From Define Wiki

Latest revision as of 09:46, 13 May 2013

Rocks Solid Installation and Configuration

RPMs approved and tested:

foundation-pysqlite-2.4.1-1.el5.x86_64.rpm           
foundation-SQLAlchemy-0.4.8.el5-1.noarch.rpm
foundation-python-elementtree-1.2.6-5.el5.x86_64.rpm
foundation-SQLAlchemy-0.4.8.el5-1.src.rpm
rocks-solid-0.5.0b1-1.el5.noarch.rpm


Install all the RPMs:

rpm -ivh foundation* rocks-solid*

Note: we ignored the warning about the support group.
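To confirm that the install step worked, the package database can be queried for each package. The following is a minimal sketch; the function name check_rocks_solid_pkgs is a hypothetical helper, and the package names are taken from the RPM list above.

```shell
# Hypothetical helper: report whether each Rocks Solid package is installed.
# Uses "rpm -q", which exits non-zero when a package is not installed.
check_rocks_solid_pkgs() {
    missing=0
    for pkg in foundation-pysqlite foundation-SQLAlchemy \
               foundation-python-elementtree rocks-solid; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "installed: $pkg"
        else
            echo "MISSING:   $pkg"
            missing=1
        fi
    done
    # Return 0 only if every package was found
    return "$missing"
}
```

Run check_rocks_solid_pkgs on the frontend after the rpm command; a non-zero exit status means at least one package is missing.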

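Do the same for all the compute nodes: put the RPMs in the contrib directory for your Rocks version (replace arch with the node architecture, for example x86_64):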

/export/rocks/install/contrib/5.3/arch/RPMS
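Copying the packages into the contrib directory could be scripted roughly as follows. This is a sketch only: stage_rpms is a hypothetical helper, both directories are passed in as arguments, and skipping the .src.rpm is an assumption (the source package should not be needed on the compute nodes).

```shell
# Hypothetical helper: copy the approved RPMs into a contrib directory.
# $1 = directory holding the downloaded RPMs
# $2 = contrib RPMS directory, e.g. /export/rocks/install/contrib/5.3/x86_64/RPMS
stage_rpms() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in "$src"/foundation-*.rpm "$src"/rocks-solid-*.rpm; do
        # Assumption: source RPMs are not needed on the compute nodes
        case $f in
            *.src.rpm) continue ;;
        esac
        # Skip patterns that matched nothing
        if [ -e "$f" ]; then
            cp "$f" "$dest"/
        fi
    done
}
```

For example: stage_rpms /root/rocks-solid-rpms /export/rocks/install/contrib/5.3/x86_64/RPMS, where the first path is wherever the RPMs were downloaded.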

Create a new XML configuration file that will extend the current compute.xml configuration file (the version directory, 5.1 here, should match your Rocks release):

# cd /export/rocks/install/site-profiles/5.1/nodes
# cp skeleton.xml extend-compute.xml

Inside extend-compute.xml, add the package names to the package section:

<!-- Rocks-Solid -->
<package>foundation-pysqlite</package>
<package>foundation-python-elementtree</package>
<package>foundation-SQLAlchemy</package>
<package>rocks-solid</package>
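Each package entry uses the bare RPM name, without the version, release, or architecture suffix seen in the file names. As a small illustration, the entries can be generated from a list of names; package_lines is a hypothetical helper, not part of Rocks.

```shell
# Hypothetical helper: print one <package> element per package name.
package_lines() {
    for name in "$@"; do
        printf '<package>%s</package>\n' "$name"
    done
}

# Emits the four entries shown above, ready to paste into extend-compute.xml
package_lines foundation-pysqlite foundation-python-elementtree \
              foundation-SQLAlchemy rocks-solid
```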

Save extend-compute.xml and rebuild the distribution:

 
# cd /export/rocks/install
# rocks create distro

The IPMI modules are named as follows:

compute-ipmi-0-0
compute-ipmi-0-1
...

Edit /etc/rocks-solid.conf and change the following settings:

[ipmi]
host_pattern=s/compute/compute-ipmi/
user=ADMIN
passwd=ADMIN

[temp]
threshold=55
# matches the System Temperature sensor
ipmi_attr=.*Sys.*

[main]
scheduler=torque
# wol also seemed to work
poweron_driver=ipmi
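The host_pattern value is written as a sed-style substitution. Assuming Rocks Solid applies it the way sed would (an assumption, not confirmed here), the mapping from a compute node name to its IPMI host name can be previewed like this; ipmi_host is a hypothetical helper.

```shell
# Hypothetical helper: apply the host_pattern expression to a node name.
ipmi_host() {
    printf '%s\n' "$1" | sed 's/compute/compute-ipmi/'
}

ipmi_host compute-0-0   # prints compute-ipmi-0-0
ipmi_host compute-0-1   # prints compute-ipmi-0-1
```

This matches the IPMI naming listed earlier, so each node's IPMI interface should resolve under the rewritten name.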


Add /etc/rocks-solid.conf to the 411 file list (/var/411/Files.mk), then run 'rocks sync users'.
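The 411 registration could be scripted as below. This sketch assumes that Files.mk accepts entries of the form "FILES += path", which should be checked against the comments in your own Files.mk; add_to_411 is a hypothetical helper.

```shell
# Hypothetical helper: append a "FILES += <path>" line to Files.mk
# unless the same entry is already present.
# $1 = file to distribute
# $2 = Files.mk path (defaults to /var/411/Files.mk)
add_to_411() {
    file=$1
    files_mk=${2:-/var/411/Files.mk}
    if ! grep -qF "FILES += $file" "$files_mk" 2>/dev/null; then
        printf 'FILES += %s\n' "$file" >> "$files_mk"
    fi
}
```

On the frontend this would be called as add_to_411 /etc/rocks-solid.conf, followed by rocks sync users.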

The following binaries are installed:

/opt/rocks/bin/check-ignore-host
/opt/rocks/bin/cluster-clean-ps
/opt/rocks/bin/cluster-freehost
/opt/rocks/bin/cluster-ipmi
/opt/rocks/bin/cluster-power
/opt/rocks/bin/cluster-poweron-sched-nodes
/opt/rocks/bin/cluster-powersave
/opt/rocks/bin/cluster-reset-freeze-node
/opt/rocks/bin/cluster-status-acct
/opt/rocks/bin/node-cleanipcs
/opt/rocks/bin/node-envcheck
/opt/rocks/bin/node-term-sge-zombie
/opt/rocks/bin/node-term-user-ps
/opt/rocks/bin/queue-limit-user-cpu

Turn nodes off and on depending on the load and queue:

cluster-powersave
cluster-powersave -d -v    # verbose and dry run

Check the power status of all the nodes:

cluster-power status

Run an IPMI query on all the nodes (replace ipmicommand with the command to run):

cluster-ipmi ipmicommand