Platform Cluster : Certification

Installing Cluster Checker

Two kits need to be installed on PCM: Intel Cluster Runtime and Intel Cluster Checker. These can be installed using the kusu-kitops -am command.

Intel® Cluster Runtimes 3.3
Intel® Cluster Checker 1.8

The ISO must be mounted first.

kusu-kitops -am /path_to_mountpoint
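
A minimal sketch of the full sequence, assuming both kit ISOs have been copied to /tmp (the ISO file names and mount points below are placeholders):

# Mount each kit ISO and register it with PCM; file names and mount points are examples only.
mkdir -p /mnt/intel-runtime /mnt/intel-clck
mount -o loop /tmp/intel-cluster-runtime-3.3.iso /mnt/intel-runtime
mount -o loop /tmp/intel-cluster-checker-1.8.iso /mnt/intel-clck
kusu-kitops -am /mnt/intel-runtime
kusu-kitops -am /mnt/intel-clck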

XML Config File

The cluster checker is run using the cluster-check command. This requires access to the config file.

A good starting point is to use the autoconfig tool. This will produce a basic config file which can be adjusted to your system.

cluster-check --autoconfigure

From here adjustments can be made. The XML file must contain:

<user>username</user>

The user must be the name of a non-privileged user on the system. Use adduser to create one if needed.
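
For example, a dedicated account can be created like this (the user name clckuser is just a placeholder):

# Create a non-privileged account for running cluster-check; the name is an example only.
adduser clckuser
passwd clckuser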

An example config file is shown below. Each benchmark has its own set of tags that can be set; see the full documentation for a detailed list.


<cluster>
  <global_configuration>
    <cc-path>/opt/intel/cce/12.0.191</cc-path>
    <fc-path>/opt/intel/fce/12.0.191</fc-path>
    <mkl-path>/opt/intel/cmkl/10.3.4.191</mkl-path>
    <mpi-path>/opt/intel/impi/4.0.3.008</mpi-path>
  </global_configuration>
  <nodefile>/opt/intel/clck/1.8/etc/nodelist.20121012.141037.auto</nodefile>
  <test>
    <hdparm>
      <cache-read>3000</cache-read>
      <device-read>30</device-read>
    </hdparm>
    <intel_mpi_rt_internode>
      <device options="-genv I_MPI_DEBUG 5">sock</device>
    </intel_mpi_rt_internode>
    <dat_conf>
      <ibstat-path>/etc/rdma</ibstat-path>
    </dat_conf>
    <file_tree>
      <exclude>/boot/initramfs-2.6.32-220.el6.x86_64.img</exclude>
      <exclude>/dev/.udev/*</exclude>
      <exclude>/etc/mail/*</exclude>
      <exclude>/etc/udev/rules.d/70-persistent-net.rules</exclude>
      <exclude>/etc/yum.repos.d/redhat.repo.disable</exclude>
      <exclude>/opt/kusu/etc/cfm/etc/fstab.OS</exclude>
      <exclude>/opt/kusu/etc/lsf.md5</exclude>
      <exclude>/usr/lib64/graphviz/config6</exclude>
      <exclude>/usr/share/icons/hicolor/icon-theme.cache</exclude>
    </file_tree>
    <environment>
      <exclude>NII_BOOTIP</exclude>
      <exclude>NII_HOSTNAME</exclude>
      <exclude>NII_NICDEF0</exclude>
      <exclude>NII_NID</exclude>
    </environment>
    <packages>
      <head>icrhead-20121016.171434.list</head>
      <node>compute000-20121016.171434.list</node>
    </packages>
    <imb_pingpong_intel_mpi>
      <fabric>
        <bandwidth>75</bandwidth>
        <device>sock</device>
        <latency>65</latency>
      </fabric>
    </imb_pingpong_intel_mpi>
    <hpcc>
      <fabric>
        <device options="-genv I_MPI_DEBUG 5">sock</device>
      </fabric>
      <thread-number>ALL</thread-number>
    </hpcc>
    <imb_collective_intel_mpi>
      <benchmark>barrier</benchmark>
      <benchmark>bcast</benchmark>
      <fabric>
        <device options="-genv I_MPI_DEBUG 5">sock</device>
      </fabric>
    </imb_collective_intel_mpi>
    <imb_message_integrity_intel_mpi>
      <fabric>
        <device>sock</device>
      </fabric>
    </imb_message_integrity_intel_mpi>
    <memory_bandwidth_stream>
      <bandwidth>3000</bandwidth>
      <group name="DIMM_SPEED_1333 AND X9DRT">
        <bandwidth>8531.2</bandwidth>
      </group>
    </memory_bandwidth_stream>
    <mflops_intel_mkl>
      <group name="2_PROCESSOR">
        <mflops>112640</mflops>
      </group>
      <mflops>17000</mflops>
    </mflops_intel_mkl>
  </test>
  <user>michael</user>
</cluster>


First run

The first run is used to get the list of packages installed on the head and compute nodes. These will be used by the packages test later.

Run as root:

cluster-check <xmlfile> --packages
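
For example, if the configuration file generated by --autoconfigure was saved as mycluster.xml (the file name is only an example):

cluster-check mycluster.xml --packages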

Add these to the config file under the packages test:

    <packages>
      <head>icrhead-20121016.171434.list</head>
      <node>compute000-20121016.171434.list</node>
    </packages>

Second run

Run as a standard user.

There are several modules that are only relevant if there is an InfiniBand connection.

Use the --exclude flag to remove them if they are not needed:

dat_conf openib intel_mpi_testsuite ipoib subnet_manager


cluster-check --certification 1.2 --exclude dat_conf

Output

The output files are saved in:

/var/log/intel/clck/

Send the .out and .config files to Intel.
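
One way to collect them for submission, assuming the default output directory and whatever file names the run produced:

cd /var/log/intel/clck/
tar czf clck-certification-results.tar.gz *.out *.config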


Other options

There are several options that can be added to the cluster-check command:

--include_only
--exclude
--verbose

These flags allow you to control which tests are run and how much detail the output provides. They are extremely useful for debugging failed tests.
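
For example, to re-run only the hdparm test with more detailed output (the config file name is a placeholder; check cluster-check --help for the exact flag spelling on your version):

cluster-check mycluster.xml --include_only hdparm --verbose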

dat.conf test

The dat.conf test is only required for systems with interfaces other than Ethernet.

If the dat.conf file is found and valid, the test will pass even if there are no interfaces for it to use. Later MPI tests will then use dat.conf and fail because the interface does not work.

Skip the test if it is not needed.