Difference between revisions of "Platform Cluster : Certification"
Latest revision as of 12:28, 15 August 2013
Prerequisites
- Intel Compilers
- Intel MPI
Installing Cluster Checker
Two kits need to be installed on PCM: Intel Cluster Runtime and Intel Cluster Checker. These can be installed using the kusu-kitops -am command.
Intel® Cluster Runtimes 3.3
Intel® Cluster Checker 1.8
The ISO must be mounted first.
mount -o loop <kit.iso> /mountpoint
kusu-kitops -am /mountpoint
kusu-repoman -a -r Repo -k kit
kusu-repoman -u
kusu-ngedit    # add the packages to the node group and sync
XML Config File
The cluster checker is run using the cluster-check command. This requires access to the config file.
A good starting point is to use the autoconfig tool. This will produce a basic config file which can be adjusted to your system.
cluster-check --autoconfigure
From here adjustments can be made. The XML file must contain:
<user>username</user>
The user must be the name of a non-privileged user on the system. Use adduser to create one if needed.
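Before putting a name in the user tag, it can help to confirm that the account exists and is not root. The following is a minimal sketch using Python's standard pwd module; the helper name is_unprivileged_user is hypothetical, and treating "UID other than 0" as "non-privileged" is a simplifying assumption, not something Cluster Checker itself defines.

```python
import pwd


def is_unprivileged_user(name):
    """Return True if `name` is a local account whose UID is not 0.

    Assumption: "non-privileged" is approximated as "not UID 0".
    """
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        return False  # no such account; create it with adduser first
    return entry.pw_uid != 0


if __name__ == "__main__":
    # "michael" is the user name from the example config on this page.
    print(is_unprivileged_user("michael"))
```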
An example config file is shown below. Each benchmark has its own set of tags that can be set. See the full documentation for a detailed list.
<cluster>
  <global_configuration>
    <cc-path>/opt/intel/cce/12.0.191</cc-path>
    <fc-path>/opt/intel/fce/12.0.191</fc-path>
    <mkl-path>/opt/intel/cmkl/10.3.4.191</mkl-path>
    <mpi-path>/opt/intel/impi/4.0.3.008</mpi-path>
  </global_configuration>
  <nodefile>/opt/intel/clck/1.8/etc/nodelist.20121012.141037.auto</nodefile>
  <test>
    <hdparm>
      <cache-read>3000</cache-read>
      <device-read>30</device-read>
    </hdparm>
    <intel_mpi_rt_internode>
      <device options="-genv I_MPI_DEBUG 5">sock</device>
    </intel_mpi_rt_internode>
    <dat_conf>
      <ibstat-path>/etc/rdma</ibstat-path>
    </dat_conf>
    <file_tree>
      <exclude>/boot/initramfs-2.6.32-220.el6.x86_64.img</exclude>
      <exclude>/dev/.udev/*</exclude>
      <exclude>/etc/mail/*</exclude>
      <exclude>/etc/udev/rules.d/70-persistent-net.rules</exclude>
      <exclude>/etc/yum.repos.d/redhat.repo.disable</exclude>
      <exclude>/opt/kusu/etc/cfm/etc/fstab.OS</exclude>
      <exclude>/opt/kusu/etc/lsf.md5</exclude>
      <exclude>/usr/lib64/graphviz/config6</exclude>
      <exclude>/usr/share/icons/hicolor/icon-theme.cache</exclude>
    </file_tree>
    <environment>
      <exclude>NII_BOOTIP</exclude>
      <exclude>NII_HOSTNAME</exclude>
      <exclude>NII_NICDEF0</exclude>
      <exclude>NII_NID</exclude>
    </environment>
    <packages>
      <head>icrhead-20121016.171434.list</head>
      <node>compute000-20121016.171434.list</node>
    </packages>
    <imb_pingpong_intel_mpi>
      <fabric>
        <bandwidth>75</bandwidth>
        <device>sock</device>
        <latency>65</latency>
      </fabric>
    </imb_pingpong_intel_mpi>
    <hpcc>
      <fabric>
        <device options="-genv I_MPI_DEBUG 5">sock</device>
      </fabric>
      <thread-number>ALL</thread-number>
    </hpcc>
    <imb_collective_intel_mpi>
      <benchmark>barrier</benchmark>
      <benchmark>bcast</benchmark>
      <fabric>
        <device options="-genv I_MPI_DEBUG 5">sock</device>
      </fabric>
    </imb_collective_intel_mpi>
    <imb_message_integrity_intel_mpi>
      <fabric>
        <device>sock</device>
      </fabric>
    </imb_message_integrity_intel_mpi>
    <memory_bandwidth_stream>
      <bandwidth>3000</bandwidth>
      <group name="DIMM_SPEED_1333 AND X9DRT">
        <bandwidth>8531.2</bandwidth>
      </group>
    </memory_bandwidth_stream>
    <mflops_intel_mkl>
      <group name="2_PROCESSOR">
        <mflops>112640</mflops>
      </group>
      <mflops>17000</mflops>
    </mflops_intel_mkl>
  </test>
  <user>michael</user>
</cluster>
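After hand-editing a config file like the one above, a quick well-formedness check can catch mistakes such as an unclosed user tag before cluster-check is run. The sketch below uses Python's standard xml.etree.ElementTree; the function name summarize_config and the shortened SAMPLE text are hypothetical, and the check only covers what this page states (the file must be valid XML and contain a user element), not the full Cluster Checker schema.

```python
import xml.etree.ElementTree as ET


def summarize_config(xml_text):
    """Parse a config and return (user, [names of configured tests]).

    Raises xml.etree.ElementTree.ParseError on malformed XML and
    ValueError if the required <user> element is missing or empty.
    """
    root = ET.fromstring(xml_text)
    user = root.findtext("user")
    if user is None or not user.strip():
        raise ValueError("config has no <user> element")
    test = root.find("test")
    tests = [child.tag for child in test] if test is not None else []
    return user.strip(), tests


# Shortened, hypothetical sample in the same shape as the example config.
SAMPLE = """<cluster>
  <test>
    <hdparm><cache-read>3000</cache-read></hdparm>
    <dat_conf><ibstat-path>/etc/rdma</ibstat-path></dat_conf>
  </test>
  <user>michael</user>
</cluster>"""

if __name__ == "__main__":
    print(summarize_config(SAMPLE))
```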
First run
The first run is used to get the list of packages installed on the head and compute nodes. These will be used by the packages test later.
Run as root.
cluster-check <xmlfile> --packages
Add these to the config file under the packages test:
<packages>
  <head>icrhead-20121016.171434.list</head>
  <node>compute000-20121016.171434.list</node>
</packages>
Second run
Run as a standard user.
Several modules are only relevant if there is an InfiniBand connection. Use the --exclude flag to remove them if they are not needed:
dat_conf
openib
intel_mpi_testsuite
ipoib
subnet_manager
cluster-check --certification 1.2 --exclude dat_conf
Output
The output files are saved in:
/var/log/intel/clck/
Send the .out and .config files to Intel.
Other options
There are several options that can be added to the cluster-check command:
--include_only
--exclude
--verbose
These flags allow you to control which tests are run and how much detail the output provides. They are extremely useful for debugging failed tests.
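When the same flags are reused across debugging runs, it can be convenient to assemble the command line in a script. The following is a minimal sketch; the helper build_cluster_check_args is hypothetical. It uses only flags that appear on this page, but two things are assumptions: that the config file comes first (as in the --packages example) and that a flag is repeated once per module when several modules are listed. Check the Cluster Checker documentation for the exact syntax.

```python
def build_cluster_check_args(config, exclude=(), include_only=(),
                             verbose=False, certification=None):
    """Build an argument list for cluster-check.

    Assumptions: config path first; one --exclude / --include_only
    flag per module name.
    """
    args = ["cluster-check", config]
    if certification:
        args += ["--certification", certification]
    for module in include_only:
        args += ["--include_only", module]
    for module in exclude:
        args += ["--exclude", module]
    if verbose:
        args.append("--verbose")
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(build_cluster_check_args(
        "config.xml", exclude=["dat_conf"], certification="1.2")))
```

The returned list can be passed to subprocess.run once the command has been checked by hand.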
dat.conf test
The dat.conf test is only required for systems with interfaces other than Ethernet.
If the dat.conf file is found and valid, the test will pass even if there are no interfaces for it to use. Later MPI tests will then use dat.conf and fail because the interface does not work.
Skip the test if it is not needed.
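One way to decide whether the test is needed is to look for RDMA devices on the node. The sketch below assumes a Linux system where such devices are listed under /sys/class/infiniband; the function names are hypothetical, and a device being listed does not prove that the interface actually works, so treat the result as a hint only.

```python
import os


def has_rdma_devices(sysfs_dir="/sys/class/infiniband"):
    """Return True if the sysfs directory exists and lists any device."""
    try:
        return len(os.listdir(sysfs_dir)) > 0
    except OSError:
        return False  # directory missing or unreadable: assume no devices


def modules_to_exclude(sysfs_dir="/sys/class/infiniband"):
    """Suggest excluding dat_conf when no RDMA device is present."""
    return [] if has_rdma_devices(sysfs_dir) else ["dat_conf"]


if __name__ == "__main__":
    print(modules_to_exclude())
```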
file tree test
This test will check if the file systems are identical on all nodes.
It will fail on some files because they contain IP addresses and hostnames.
Exclude these files in the config.xml file, using entries similar to those shown below:
<file_tree>
  <exclude>/boot/initramfs-2.6.32-220.el6.x86_64.img</exclude>
  <exclude>/dev/.udev/*</exclude>
  <exclude>/etc/mail/*</exclude>
  <exclude>/etc/udev/rules.d/70-persistent-net.rules</exclude>
  <exclude>/etc/yum.repos.d/redhat.repo.disable</exclude>
  <exclude>/opt/kusu/etc/cfm/etc/fstab.OS</exclude>
  <exclude>/opt/kusu/etc/lsf.md5</exclude>
  <exclude>/usr/lib64/graphviz/config6</exclude>
  <exclude>/usr/share/icons/hicolor/icon-theme.cache</exclude>
</file_tree>
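If the list of files to exclude grows over several runs, the entries can be added with a small script rather than by hand. This is a minimal sketch using Python's standard xml.etree.ElementTree; the function name add_file_tree_excludes is hypothetical, and it assumes the file_tree element sits under test inside the root element, as in the example config on this page.

```python
import xml.etree.ElementTree as ET


def add_file_tree_excludes(xml_text, paths):
    """Return the config text with <exclude> entries added under
    <test>/<file_tree>, skipping paths that are already listed."""
    root = ET.fromstring(xml_text)
    test = root.find("test")
    if test is None:
        test = ET.SubElement(root, "test")
    file_tree = test.find("file_tree")
    if file_tree is None:
        file_tree = ET.SubElement(test, "file_tree")
    existing = {e.text for e in file_tree.findall("exclude")}
    for path in paths:
        if path not in existing:
            ET.SubElement(file_tree, "exclude").text = path
            existing.add(path)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    base = ("<cluster><test><file_tree>"
            "<exclude>/etc/mail/*</exclude>"
            "</file_tree></test><user>michael</user></cluster>")
    print(add_file_tree_excludes(base, ["/etc/mail/*", "/opt/kusu/etc/lsf.md5"]))
```<user>michael</user></cluster>"
_out = add_file_tree_excludes(_base, ["/etc/mail/*", "/opt/kusu/etc/lsf.md5", "/opt/kusu/etc/lsf.md5"])
_items = [e.text for e in ET.fromstring(_out).findall("test/file_tree/exclude")]
assert _items == ["/etc/mail/*", "/opt/kusu/etc/lsf.md5"]
_out2 = add_file_tree_excludes("<cluster><user>u</user></cluster>", ["/a"])
assert [e.text for e in ET.fromstring(_out2).findall("test/file_tree/exclude")] == ["/a"]
</test>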