Platform Cluster : Certification

Latest revision as of 12:28, 15 August 2013

Prerequisites

  • Intel Compilers
  • Intel MPI

Installing Cluster Checker

Two kits need to be installed on PCM: Intel Cluster Runtime and Intel Cluster Checker. These can be installed using the kusu-kitops -am command.

  • Intel® Cluster Runtimes 3.3
  • Intel® Cluster Checker 1.8

The kit ISO must be mounted first.

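mount -o loop kit.iso /path_to_mountpoint   # mount the kit ISO
kusu-kitops -am /path_to_mountpoint         # add the kit from the mount point
kusu-repoman -a -r Repo -k kit              # add the kit to the repository
kusu-repoman -u                             # update the repository
kusu-ngedit                                 # add the packages to the node group and sync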

XML Config File

The cluster checker is run using the cluster-check command. This requires access to the config file.

A good starting point is to use the autoconfig tool. This will produce a basic config file which can be adjusted to your system.

cluster-check --autoconfigure

From here adjustments can be made. The XML file must contain:

<user>username</user>

The user must be the name of a non-privileged user on the system. Use adduser to create one if needed.
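A quick sanity check on the config file can catch a missing or privileged user before a long run. The following is a minimal sketch, assuming the layout of the example config on this page, where <user> is a direct child of <cluster>. The function name read_cluster_user is hypothetical and is not part of Cluster Checker.

```python
import xml.etree.ElementTree as ET


def read_cluster_user(config_xml: str) -> str:
    """Return the user named in the <user> element of a config document.

    Hypothetical helper: assumes <user> sits directly under the root
    <cluster> element, as in the example config on this page.
    """
    # fromstring raises ParseError if the document is not well-formed XML.
    root = ET.fromstring(config_xml)
    user = root.findtext("user")
    if user is None or not user.strip():
        raise ValueError("config has no <user> element")
    user = user.strip()
    # Only the obvious case is checked here; other privileged accounts
    # would need a lookup against the system's user database.
    if user == "root":
        raise ValueError("<user> must name a non-privileged account")
    return user


if __name__ == "__main__":
    print(read_cluster_user("<cluster><test/><user>michael</user></cluster>"))
```

The check only rejects the literal name root; whether an account is otherwise privileged depends on the system.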

An example config file is shown below. Each benchmark has its own set of tags that can be set. See the full documentation for a detailed list.


<cluster>
  <global_configuration>
    <cc-path>/opt/intel/cce/12.0.191</cc-path>
    <fc-path>/opt/intel/fce/12.0.191</fc-path>
    <mkl-path>/opt/intel/cmkl/10.3.4.191</mkl-path>
    <mpi-path>/opt/intel/impi/4.0.3.008</mpi-path>
  </global_configuration>
  <nodefile>/opt/intel/clck/1.8/etc/nodelist.20121012.141037.auto</nodefile>
  <test>
    <hdparm>
      <cache-read>3000</cache-read>
      <device-read>30</device-read>
    </hdparm>
    <intel_mpi_rt_internode>
      <device options="-genv I_MPI_DEBUG 5">sock</device>
    </intel_mpi_rt_internode>
    <dat_conf>
      <ibstat-path>/etc/rdma</ibstat-path>
    </dat_conf>
    <file_tree>
      <exclude>/boot/initramfs-2.6.32-220.el6.x86_64.img</exclude>
      <exclude>/dev/.udev/*</exclude>
      <exclude>/etc/mail/*</exclude>
      <exclude>/etc/udev/rules.d/70-persistent-net.rules</exclude>
      <exclude>/etc/yum.repos.d/redhat.repo.disable</exclude>
      <exclude>/opt/kusu/etc/cfm/etc/fstab.OS</exclude>
      <exclude>/opt/kusu/etc/lsf.md5</exclude>
      <exclude>/usr/lib64/graphviz/config6</exclude>
      <exclude>/usr/share/icons/hicolor/icon-theme.cache</exclude>
    </file_tree>
    <environment>
      <exclude>NII_BOOTIP</exclude>
      <exclude>NII_HOSTNAME</exclude>
      <exclude>NII_NICDEF0</exclude>
      <exclude>NII_NID</exclude>
    </environment>
    <packages>
      <head>icrhead-20121016.171434.list</head>
      <node>compute000-20121016.171434.list</node>
    </packages>
    <imb_pingpong_intel_mpi>
      <fabric>
        <bandwidth>75</bandwidth>
        <device>sock</device>
        <latency>65</latency>
      </fabric>
    </imb_pingpong_intel_mpi>
    <hpcc>
      <fabric>
        <device options="-genv I_MPI_DEBUG 5">sock</device>
      </fabric>
      <thread-number>ALL</thread-number>
    </hpcc>
    <imb_collective_intel_mpi>
      <benchmark>barrier</benchmark>
      <benchmark>bcast</benchmark>
      <fabric>
        <device options="-genv I_MPI_DEBUG 5">sock</device>
      </fabric>
    </imb_collective_intel_mpi>
    <imb_message_integrity_intel_mpi>
      <fabric>
        <device>sock</device>
      </fabric>
    </imb_message_integrity_intel_mpi>
    <memory_bandwidth_stream>
      <bandwidth>3000</bandwidth>
      <group name="DIMM_SPEED_1333 AND X9DRT">
        <bandwidth>8531.2</bandwidth>
      </group>
    </memory_bandwidth_stream>
    <mflops_intel_mkl>
      <group name="2_PROCESSOR">
        <mflops>112640</mflops>
      </group>
      <mflops>17000</mflops>
    </mflops_intel_mkl>
  </test>
  <user>michael</user>
</cluster>


First run

The first run is used to get the list of packages installed on the head and compute nodes. These will be used by the packages test later.

Run as root:

cluster-check <xmlfile> --packages

Add the generated package list files to the config file under the packages test:

    <packages>
      <head>icrhead-20121016.171434.list</head>
      <node>compute000-20121016.171434.list</node>
    </packages>
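Editing the config by hand is fine, but the step can also be scripted. The sketch below is one possible approach, assuming the element layout of the example config (<cluster>/<test>/<packages> with <head> and <node> children). The helper name set_package_lists is hypothetical.

```python
import xml.etree.ElementTree as ET


def set_package_lists(config_xml: str, head_list: str, node_list: str) -> str:
    """Return config_xml with <packages> pointing at the given list files.

    Hypothetical helper: creates <test>, <packages>, <head> and <node>
    if they are missing, otherwise overwrites the existing values.
    """
    root = ET.fromstring(config_xml)
    test = root.find("test")
    if test is None:
        test = ET.SubElement(root, "test")
    packages = test.find("packages")
    if packages is None:
        packages = ET.SubElement(test, "packages")
    for tag, value in (("head", head_list), ("node", node_list)):
        elem = packages.find(tag)
        if elem is None:
            elem = ET.SubElement(packages, tag)
        elem.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    updated = set_package_lists(
        "<cluster><test/><user>michael</user></cluster>",
        "icrhead-20121016.171434.list",
        "compute000-20121016.171434.list",
    )
    print(updated)
```

The list file names in the usage example are the ones from the example config; substitute the names produced by your own first run.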

Second run

Run as standard user.

There are several modules that can optionally be included if there is an InfiniBand connection. Use the --exclude flag to remove them if they are not needed:

  • dat_conf
  • openib
  • intel_mpi_testsuite
  • ipoib
  • subnet_manager


cluster-check --certification 1.2 --exclude dat_conf

Output

The output files are saved in:

/var/log/intel/clck/

Send the .out and .config files to Intel.
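To find the files to send, the output directory can be listed with a short script. This is a sketch only, assuming that the result files carry literal .out and .config suffixes; the function name files_to_send is hypothetical.

```python
from pathlib import Path


def files_to_send(log_dir: str = "/var/log/intel/clck") -> list[Path]:
    """List the .out and .config files in log_dir, newest first.

    Hypothetical helper: assumes the result files end in .out and .config.
    """
    candidates = [
        p for p in Path(log_dir).iterdir()
        if p.is_file() and p.suffix in {".out", ".config"}
    ]
    # Sort by modification time so the most recent run comes first.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    log_dir = Path("/var/log/intel/clck")
    if log_dir.is_dir():
        for path in files_to_send(str(log_dir)):
            print(path)
```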


Other options

There are several options that can be added to the cluster-check command:

--include_only
--exclude
--verbose

These flags allow you to control which tests are run and how much detail the output provides. They are extremely useful for debugging failed tests.

dat.conf test

The dat.conf test is only required for systems with interfaces other than Ethernet.

If the dat.conf file is found and valid, the test will pass even if there are no interfaces for it to use. Later MPI tests will then use dat.conf and fail because the interface does not work.

Skip the test if it is not needed.


file tree test

This test checks whether the file systems are identical on all nodes.

It will fail on some files because they contain IP addresses and hostnames, which differ from node to node.

These files must be excluded using the settings in the config.xml file, in a similar way to the ones shown below:

    <file_tree>
      <exclude>/boot/initramfs-2.6.32-220.el6.x86_64.img</exclude>
      <exclude>/dev/.udev/*</exclude>
      <exclude>/etc/mail/*</exclude>
      <exclude>/etc/udev/rules.d/70-persistent-net.rules</exclude>
      <exclude>/etc/yum.repos.d/redhat.repo.disable</exclude>
      <exclude>/opt/kusu/etc/cfm/etc/fstab.OS</exclude>
      <exclude>/opt/kusu/etc/lsf.md5</exclude>
      <exclude>/usr/lib64/graphviz/config6</exclude>
      <exclude>/usr/share/icons/hicolor/icon-theme.cache</exclude>
    </file_tree>
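When the list of failing files is long, the exclude block can be generated rather than typed. The following is a minimal sketch that builds a <file_tree> fragment in the same shape as the one above; the function name file_tree_fragment is hypothetical, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def file_tree_fragment(paths) -> str:
    """Build a <file_tree> block with one <exclude> per unique path.

    Hypothetical helper: the output mirrors the fragment shown on this
    page and is meant to be pasted under <test> in the config file.
    """
    file_tree = ET.Element("file_tree")
    # Deduplicate and sort so repeated runs give a stable result.
    for path in sorted(set(paths)):
        ET.SubElement(file_tree, "exclude").text = path
    ET.indent(file_tree, space="  ")
    # tostring escapes characters such as & and < in the paths.
    return ET.tostring(file_tree, encoding="unicode")


if __name__ == "__main__":
    print(file_tree_fragment([
        "/etc/udev/rules.d/70-persistent-net.rules",
        "/etc/mail/*",
        "/etc/mail/*",
    ]))
```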