Platform Cluster : Certification

Revision as of 15:20, 29 October 2012 by Michael (talk | contribs)

Installing Cluster Checker

Two kits need to be installed on PCM: Intel Cluster Runtime and Intel Cluster Checker. Both can be installed with the kusu-kitops -am command.

Intel® Cluster Runtimes 3.3
Intel® Cluster Checker 1.8

The ISO must be mounted first. Pass its mount point to kusu-kitops:

kusu-kitops -am /path_to_mountpoint
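
Mounting a kit ISO and adding it could be wrapped up roughly as in the sketch below. It assumes a root shell on the PCM head node, and that the kit can be unmounted once kusu-kitops has finished. The function name, ISO path and mount point are placeholders, not names taken from this page.

```shell
# Sketch: mount a kit ISO read-only through a loop device, add it with
# kusu-kitops, then unmount it again. Run as root.
# add_kit_from_iso is a hypothetical helper name; both arguments are placeholders.
add_kit_from_iso() {
    iso="$1"    # path to the kit ISO
    mnt="$2"    # directory to use as the mount point
    mkdir -p "$mnt" || return 1
    mount -o loop,ro "$iso" "$mnt" || return 1
    kusu-kitops -am "$mnt"
    status=$?
    # Assumption: the mount is no longer needed after the kit has been added.
    umount "$mnt"
    return $status
}

# Usage (as root), once per kit:
#   add_kit_from_iso /path/to/kit.iso /mnt/kit
```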

XML Config File

The cluster checker is run using the cluster-check command. This requires access to the config file.

A good starting point is to use the autoconfig tool. This will produce a basic config file which can be adjusted to your system.

cluster-check --autoconfigure

From here adjustments can be made. The XML file must contain:

<user>username</user>

The value must be the name of a non-privileged user on the system. Use adduser to create one if needed.
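
A quick sanity check of the user entry before running cluster-check might look like the sketch below. It is not part of Cluster Checker: the function names are made up for illustration, and it reads the file with sed rather than an XML parser, so it assumes the opening and closing user tags are on the same line, as in the example config on this page.

```shell
# Print the value of the <user> element from a Cluster Checker XML config.
# Assumes <user> and </user> sit on one line; a malformed closing tag
# yields no output.
clck_config_user() {
    sed -n 's:.*<user>\([^<]*\)</user>.*:\1:p' "$1" | head -n 1
}

# Succeed (and print the user name) only if the configured user exists
# on this machine and is not UID 0.
clck_check_user() {
    user=$(clck_config_user "$1")
    [ -n "$user" ] || { echo "no <user> element found in $1" >&2; return 1; }
    uid=$(id -u "$user" 2>/dev/null) || { echo "user '$user' does not exist" >&2; return 1; }
    [ "$uid" -ne 0 ] || { echo "user '$user' is privileged (UID 0)" >&2; return 1; }
    echo "$user"
}

# Usage (config.xml is a placeholder file name):
#   clck_check_user config.xml
```

Because the check only accepts a properly closed element, it also catches a mistyped closing tag.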

An example config file is shown below. Each test has its own set of tags that can be set; see the full Cluster Checker documentation for the detailed list.


<cluster>
  <global_configuration>
    <cc-path>/opt/intel/cce/12.0.191</cc-path>
    <fc-path>/opt/intel/fce/12.0.191</fc-path>
    <mkl-path>/opt/intel/cmkl/10.3.4.191</mkl-path>
    <mpi-path>/opt/intel/impi/4.0.3.008</mpi-path>
  </global_configuration>
  <nodefile>/opt/intel/clck/1.8/etc/nodelist.20121012.141037.auto</nodefile>
  <test>
    <hdparm>
      <cache-read>3000</cache-read>
      <device-read>30</device-read>
    </hdparm>
    <intel_mpi_rt_internode>
      <device options="-genv I_MPI_DEBUG 5">sock</device>
    </intel_mpi_rt_internode>
    <dat_conf>
      <ibstat-path>/etc/rdma</ibstat-path>
    </dat_conf>
    <file_tree>
      <exclude>/boot/initramfs-2.6.32-220.el6.x86_64.img</exclude>
      <exclude>/dev/.udev/*</exclude>
      <exclude>/etc/mail/*</exclude>
      <exclude>/etc/udev/rules.d/70-persistent-net.rules</exclude>
      <exclude>/etc/yum.repos.d/redhat.repo.disable</exclude>
      <exclude>/opt/kusu/etc/cfm/etc/fstab.OS</exclude>
      <exclude>/opt/kusu/etc/lsf.md5</exclude>
      <exclude>/usr/lib64/graphviz/config6</exclude>
      <exclude>/usr/share/icons/hicolor/icon-theme.cache</exclude>
    </file_tree>
    <environment>
      <exclude>NII_BOOTIP</exclude>
      <exclude>NII_HOSTNAME</exclude>
      <exclude>NII_NICDEF0</exclude>
      <exclude>NII_NID</exclude>
    </environment>
    <packages>
      <head>icrhead-20121016.171434.list</head>
      <node>compute000-20121016.171434.list</node>
    </packages>
    <imb_pingpong_intel_mpi>
      <fabric>
        <bandwidth>75</bandwidth>
        <device>sock</device>
        <latency>65</latency>
      </fabric>
    </imb_pingpong_intel_mpi>
    <hpcc>
      <fabric>
        <device options="-genv I_MPI_DEBUG 5">sock</device>
      </fabric>
      <thread-number>ALL</thread-number>
    </hpcc>
    <imb_collective_intel_mpi>
      <benchmark>barrier</benchmark>
      <benchmark>bcast</benchmark>
      <fabric>
        <device options="-genv I_MPI_DEBUG 5">sock</device>
      </fabric>
    </imb_collective_intel_mpi>
    <imb_message_integrity_intel_mpi>
      <fabric>
        <device>sock</device>
      </fabric>
    </imb_message_integrity_intel_mpi>
    <memory_bandwidth_stream>
      <bandwidth>3000</bandwidth>
      <group name="DIMM_SPEED_1333 AND X9DRT">
        <bandwidth>8531.2</bandwidth>
      </group>
    </memory_bandwidth_stream>
    <mflops_intel_mkl>
      <group name="2_PROCESSOR">
        <mflops>112640</mflops>
      </group>
      <mflops>17000</mflops>
    </mflops_intel_mkl>
  </test>
  <user>michael</user>
</cluster>


First run

The first run is used to get the list of packages installed on the head and compute nodes. These will be used by the packages test later.

Run as root:

cluster-check <xmlfile> --packages

Add the resulting package lists to the config file under the packages test:

    <packages>
      <head>icrhead-20121016.171434.list</head>
      <node>compute000-20121016.171434.list</node>
    </packages>

Second run

Run as a standard (non-root) user.

Several modules are only relevant when the cluster has an InfiniBand interconnect. Use the --exclude flag to skip any that are not needed:

dat_conf openib intel_mpi_testsuite ipoib subnet_manager


cluster-check --certification 1.2 --exclude dat_conf
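
When several modules have to be skipped, the flags can be generated rather than typed out. The helper below is a small sketch with a made-up name; it assumes cluster-check accepts --exclude more than once, which should be confirmed against the Cluster Checker documentation for the installed version.

```shell
# Print one "--exclude <module>" pair for every module name given.
# Assumption: cluster-check accepts a repeated --exclude flag.
clck_exclude_args() {
    args=""
    for module in "$@"; do
        args="$args --exclude $module"
    done
    # Strip the leading space before printing.
    echo "${args# }"
}

# Usage on a cluster without InfiniBand:
#   cluster-check --certification 1.2 $(clck_exclude_args dat_conf openib intel_mpi_testsuite ipoib subnet_manager)
```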

Output

The output files are saved in:

/var/log/intel/clck/

Send the .out and .config files to Intel.
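
To gather those files for sending, something like the following sketch could be used. The function name and archive name are placeholders, and it relies on GNU tar (for the -T option), which is an assumption about the head node.

```shell
# Bundle every .out and .config file under a Cluster Checker log directory
# into one gzip-compressed tarball.
# Defaults: the log directory named on this page and a placeholder archive name.
clck_bundle_results() {
    logdir="${1:-/var/log/intel/clck}"
    archive="${2:-clck-results.tar.gz}"
    # find lists the matching files; GNU tar reads that list from stdin (-T -)
    # and writes the archive to stdout, which is redirected to $archive.
    ( cd "$logdir" && find . -type f \( -name '*.out' -o -name '*.config' \) -print | tar -czf - -T - ) > "$archive"
}

# Usage:
#   clck_bundle_results /var/log/intel/clck clck-results.tar.gz
```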