GPU: Hardware Locality

From Define Wiki

Latest revision as of 15:19, 19 April 2013

Installation

Prerequisites

  • Cairo
  • Cairo-devel
  • nvml
  • pciaccess / libpci / libpci-devel
  • pciutils / pciutils-devel (not in the CentOS repos)

Install

Build with the usual configure, make, make install sequence. Each step accepts its usual command-line options.

./configure --prefix=$HOME --enable-libpci --enable-plugins=nvml
make
make install


Usage

The main program is lstopo. Several options can be passed to change the input and output formats.

  • ./lstopo output.png - produces an image like the one shown under Sample Output.
  • ./lstopo-no-graphics - produces text output


General Help

Supported output file formats: console, txt, fig, pdf, ps, png, svg, xml, synthetic

Formatting options:
  -l --logical          Display hwloc logical object indexes
                        (default for console output)
  -p --physical         Display physical object indexes
                        (default for graphical output)
Output options:
  --output-format <format>
  --of <format>         Force the output to use the given format
Textual output options:
  --only <type>         Only show objects of the given type in the textual output
  -v --verbose          Include additional details
  -s --silent           Reduce the amount of details to show
  -c --cpuset           Show the cpuset of each object
  -C --cpuset-only      Only show the cpuset of each object
  --taskset             Show taskset-specific cpuset strings
Object filtering options:
  --ignore <type>       Ignore objects of the given type
  --no-caches           Do not show caches
  --no-useless-caches   Do not show caches which do not have a hierarchical
                        impact
  --no-icaches          Do not show instruction caches
  --merge               Do not show levels that do not have a hierarchical
                        impact
  --restrict <cpuset>   Restrict the topology to processors listed in <cpuset>
  --restrict binding    Restrict the topology to the current process binding
  --no-io               Do not show any I/O device or bridge
  --no-bridges          Do not show any I/O bridge except hostbridges
  --whole-io            Show all I/O devices and bridges
Input options:
  --input <XML file>
  -i <XML file>         Read topology from XML file <path>
  --input <directory>
  -i <directory>        Read topology from chroot containing the /proc and /sys
                        of another system
  --input "n:2 2"
  -i "n:2 2"            Simulate a fake hierarchy, here with 2 NUMA nodes of 2
                        processors
  --input-format <format>
  --if <format>         Enforce input format among xml, fsroot, synthetic
  --thissystem          Assume that the input topology provides the topology
                        for the system on which we are running
  --pid <pid>           Detect topology as seen by process <pid>
  --whole-system        Do not consider administration limitations
Graphical output options:
  --fontsize 10         Set size of text font
  --gridsize 10         Set size of margin between elements
  --horiz[=<type,...>]  Horizontal graphical layout instead of nearly 4/3 ratio
  --vert[=<type,...>]   Vertical graphical layout instead of nearly 4/3 ratio
  --no-legend           Remove the text legend at the bottom
Miscellaneous options:
  --ps --top            Display processes within the hierarchy
  --version             Report version and exit
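
The options listed above can be combined on one command line. The following is a minimal sketch rather than a recommended workflow: it assumes lstopo-no-graphics is on the PATH (with the --prefix=$HOME install above it would normally land in $HOME/bin), and run_or_show is a hypothetical helper introduced only for this example. All flags are taken from the help text above; other hwloc releases may name them differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of hwloc): run the command if its binary
# is on PATH, otherwise only print it so the examples can still be read.
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        printf 'would run: %s\n' "$*"
    fi
}

# Text view without caches or I/O devices.
run_or_show lstopo-no-graphics --no-caches --no-io

# Simulated machine with 2 NUMA nodes of 2 processors each,
# useful for trying options without the real hardware.
run_or_show lstopo-no-graphics -i "n:2 2"

# Only list the processing units, with their cpusets.
run_or_show lstopo-no-graphics --only PU -c
```

When the binary is missing, each call prints the command instead of running it, so the snippet is safe to paste on a machine without hwloc.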

Sample Output

Text

Machine (64GB)
  NUMANode L#0 (P#0 32GB)
    Socket L#0 + L3 L#0 (20MB)
      L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
      L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
      L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
      L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
      L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
      L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
    HostBridge L#0
      PCIBridge
        PCI 15b3:1003
          Net L#0 "ib0"
          OpenFabrics L#1 "mlx4_0"
      PCIBridge
        PCI 10de:1028
          GPU L#2 "nvml0"
      PCIBridge
        PCI 10de:1028
          GPU L#3 "nvml1"
      PCIBridge
        PCI 8086:1d6b
      PCIBridge
        PCI 102b:0532
      PCI 8086:1d02
        Block L#4 "sda"
  NUMANode L#1 (P#1 32GB)
    Socket L#1 + L3 L#1 (20MB)
      L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
      L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
      L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
      L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
      L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
      L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
      L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
      L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
    HostBridge L#6
      PCIBridge
        PCI 8086:1521
          Net L#5 "eth0"
        PCI 8086:1521
          Net L#6 "eth1"
      PCIBridge
        PCI 10de:1028
          GPU L#7 "nvml2"
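
The text output above shows which NUMA node each GPU hangs off, and that mapping can be pulled out with a small filter. This is a sketch under stated assumptions: gpu_numa_map is a hypothetical helper name, and the parsing relies on the exact layout shown above (GPU lines nested under a NUMANode line), which may differ in other hwloc versions.

```shell
#!/bin/sh
# Hypothetical helper: read lstopo text output on stdin and print
# "<GPU name> <NUMA node physical index>" for every GPU line.
gpu_numa_map() {
    awk '
        # Remember the most recent NUMA node header: NUMANode L#0 (P#0 32GB)
        $1 == "NUMANode" {
            node = $3
            sub(/^\(P#/, "", node)
        }
        # GPU lines look like: GPU L#2 "nvml0"
        $1 == "GPU" {
            name = $3
            gsub(/"/, "", name)
            print name, node
        }
    '
}

# Trimmed copy of the sample output above, used as input here.
sample='Machine (64GB)
  NUMANode L#0 (P#0 32GB)
    HostBridge L#0
      PCIBridge
        PCI 10de:1028
          GPU L#2 "nvml0"
      PCIBridge
        PCI 10de:1028
          GPU L#3 "nvml1"
  NUMANode L#1 (P#1 32GB)
    HostBridge L#6
      PCIBridge
        PCI 10de:1028
          GPU L#7 "nvml2"'

printf '%s\n' "$sample" | gpu_numa_map
```

On a live system the same function could be fed directly, for example with ./lstopo-no-graphics | gpu_numa_map, provided the output follows the layout assumed here.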

Image

Hwloc.png