GPU: Nvidia-healthmon
Revision as of 13:51, 10 May 2013
Installation
https://developer.nvidia.com/tesla-deployment-kit
- Download the kit and untar it.
- Move into the nvidia-healthmon folder and run nvidia-healthmon from there.
Update the config.ini file to match your system.
Example file:
[global]
devices.tesla.count = 3
drivers.blacklist = nouveau
[Tesla K20m]
pci.gen = 2
pci.width = 16
temperature.warn = 9
Basic Usage
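Before the first run, it can help to confirm that an edited config file parses and sets the values you expect. For example, you can check that devices.tesla.count matches the GPUs reported by ./nvidia-healthmon -L. The Python sketch below is a minimal check that assumes the file uses plain INI syntax, as in the example above. The helper names load_healthmon_config and expected_tesla_count are hypothetical, and nvidia-healthmon's own parser may accept syntax that Python's configparser does not.

```python
import configparser

# The example config from this page, embedded so the sketch runs on its own.
EXAMPLE_CONFIG = """\
[global]
devices.tesla.count = 3
drivers.blacklist = nouveau

[Tesla K20m]
pci.gen = 2
pci.width = 16
"""


def load_healthmon_config(text: str) -> configparser.ConfigParser:
    """Parse healthmon-style config text, assuming plain INI syntax (hypothetical helper)."""
    # interpolation=None: treat '%' literally instead of as a substitution marker.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def expected_tesla_count(cfg: configparser.ConfigParser) -> int:
    """Return devices.tesla.count from the [global] section as an int (hypothetical helper)."""
    return cfg.getint("global", "devices.tesla.count")


if __name__ == "__main__":
    cfg = load_healthmon_config(EXAMPLE_CONFIG)
    print("Expected Tesla GPUs:", expected_tesla_count(cfg))
    # Every section other than [global] is keyed by a GPU model name.
    for section in cfg.sections():
        if section != "global":
            print(section, dict(cfg[section]))
```

To check a real file instead of the embedded example, pass the contents of your config.ini, for instance `pathlib.Path("config.ini").read_text()`, to load_healthmon_config.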
./nvidia-healthmon
./nvidia-healthmon -c config.file
./nvidia-healthmon --extended
Options:
[-h | --help]: Print usage
[-H | --verbose-help]: Print detailed usage
[-v | --verbose]: Enable verbose output
[-V | --version]: Prints the version number
[-q | --quick]: Execute a subset of tests
[-e | --extended]: Execute the complete test suite
[-i | --id]: Target a specific GPU
[-L | --list-devices]: List all the GPUs attached
[-c | --config]: Path to the configuration file
[-l | --log-file]: Path to the output log file
Example extended verbose output
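The run below combines -e, -v and -i 0. When healthmon is driven from a script, for example across many compute nodes, the argument list can be assembled from the options listed above. This is a minimal Python sketch, not part of the toolkit. build_healthmon_cmd and run_healthmon are hypothetical helper names, and the sketch uses only the short flags listed on this page. It also assumes that each value-taking flag expects its argument as the next word, as -i 0 and -c config.file do. That assumption is unverified for -l.

```python
import shlex
import subprocess
from typing import List, Optional


def build_healthmon_cmd(
    binary: str = "./nvidia-healthmon",
    extended: bool = False,
    quick: bool = False,
    verbose: bool = False,
    gpu_id: Optional[int] = None,
    config: Optional[str] = None,
    log_file: Optional[str] = None,
) -> List[str]:
    """Build an argv list from the short options listed on this page (hypothetical helper)."""
    if extended and quick:
        # -e runs the complete suite and -q only a subset, so asking for both is ambiguous.
        raise ValueError("choose either extended or quick, not both")
    cmd = [binary]
    if extended:
        cmd.append("-e")
    if quick:
        cmd.append("-q")
    if verbose:
        cmd.append("-v")
    if gpu_id is not None:
        cmd += ["-i", str(gpu_id)]  # target one GPU, as in "-i 0"
    if config is not None:
        cmd += ["-c", config]  # path to the configuration file
    if log_file is not None:
        cmd += ["-l", log_file]  # assumption: the path is passed as the next word
    return cmd


def run_healthmon(cmd: List[str]) -> str:
    """Run the command and return stdout and stderr as one string (hypothetical helper).

    The exit-code convention is not documented on this page, so it is not checked here.
    """
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    cmd = build_healthmon_cmd(extended=True, verbose=True, gpu_id=0)
    print(shlex.join(cmd))  # ./nvidia-healthmon -e -v -i 0
```

Returning an argv list rather than a single string lets subprocess.run skip the shell, so config or log paths containing spaces need no extra quoting.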
[root@compute022 nvidia-healthmon]# ./nvidia-healthmon -e -v -i 0
Loading Config: SUCCESS
Global Tests
Black-Listed Drivers: SUCCESS
Load NVML: SUCCESS
Load CUDA: SUCCESS
NVML Sanity: SUCCESS
Tesla Devices Count: SKIPPED
Global Test Results: 4 success, 0 errors, 0 warnings, 1 did not run
-----------------------------------------------------------
GPU 0000:02:00.0 #0 : Tesla K20m (Serial: 0325212005895)
NVML Sanity: SUCCESS
InfoROM: SUCCESS
GEMINI InfoROM
This GPU does not share a board with another GPU chip.
Result: SKIPPED
ECC: SUCCESS
CUDA Sanity
GPU: Tesla K20m
Compute Capability: 3.5
Amount of Memory: 5032706048 bytes
ECC: Enabled
Number of SMs: 13
Core Clock: 705 MHz
Watchdog Timeout: Disabled
Compute Mode: Default
Result: SUCCESS
PCIe Maximum Link Generation: SKIPPED
PCIe Maximum Link Width: SKIPPED
PCI Bandwidth: SKIPPED
Memory
Allocated 4901464656 bytes (97.3%)
Result: SUCCESS
Device Results: 5 success, 0 errors, 0 warnings, 4 did not run
System Results: 9 success, 0 errors, 0 warnings, 5 did not run
One or more tests didn't run.
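For monitoring, the summary lines that close each section are the easiest part of this output to consume. The Python sketch below is minimal and assumes the summary format shown above does not change. It also accepts the singular forms "error" and "warning" in case the tool uses them. parse_summaries and is_healthy are hypothetical names. Treating a run as healthy when the System line reports no errors and no warnings is also an assumption, because skipped tests are counted separately and this run still ends with "One or more tests didn't run."

```python
import re
from typing import Dict, List, Tuple

# A trimmed copy of the summary lines from the run above.
SAMPLE_OUTPUT = """\
Global Test Results: 4 success, 0 errors, 0 warnings, 1 did not run
-----------------------------------------------------------
GPU 0000:02:00.0 #0 : Tesla K20m (Serial: 0325212005895)
Device Results: 5 success, 0 errors, 0 warnings, 4 did not run
System Results: 9 success, 0 errors, 0 warnings, 5 did not run
One or more tests didn't run.
"""

# Matches lines such as "Device Results: 5 success, 0 errors, 0 warnings, 4 did not run".
SUMMARY_RE = re.compile(
    r"^(?P<scope>[\w ]+?) Results: (?P<success>\d+) success, "
    r"(?P<errors>\d+) errors?, (?P<warnings>\d+) warnings?, "
    r"(?P<skipped>\d+) did not run$"
)

Counts = Dict[str, int]


def parse_summaries(output: str) -> List[Tuple[str, Counts]]:
    """Return (scope, counts) for every summary line, in output order.

    A list is used because a run over several GPUs presumably prints one Device line per GPU.
    """
    summaries = []
    for line in output.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if m:
            counts = {k: int(m.group(k)) for k in ("success", "errors", "warnings", "skipped")}
            summaries.append((m.group("scope"), counts))
    return summaries


def is_healthy(output: str) -> bool:
    """True if the System line reports no errors and no warnings.

    Skipped tests are not treated as failures here (an assumption).
    """
    for scope, counts in parse_summaries(output):
        if scope == "System":
            return counts["errors"] == 0 and counts["warnings"] == 0
    return False  # no System summary found, e.g. the run was cut short


if __name__ == "__main__":
    for scope, counts in parse_summaries(SAMPLE_OUTPUT):
        print(scope, counts)
    print("healthy:", is_healthy(SAMPLE_OUTPUT))
```

If skipped tests matter on your systems, for example because the PCIe checks are skipped when pci.gen and pci.width are missing from the config, you can also inspect counts["skipped"] on the System line.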