GPU: Nvidia-smi
Latest revision as of 12:07, 28 January 2015
List the GPUs
nvidia-smi -L
GPU 0: Tesla K20m (S/N: 0325212006276)
GPU 1: Tesla K20m (S/N: 0324712040020)
Other options for list
-i, --id= Target a specific GPU.
-f, --filename= Log to a specified file, rather than to stdout.
-l, --loop= Probe until Ctrl+C at specified second interval.
Query
nvidia-smi -q
GPU 0000:84:00.0
    Product Name : Tesla K20m
    Display Mode : Disabled
    Persistence Mode : Disabled
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0324712040020
    GPU UUID : GPU-f5ea157d-f71d-277e-afaf-a598426a8dc1
    VBIOS Version : 80.10.11.00.0B
    Inforom Version
        Image Version : 2081.0208.01.07
        OEM Object : 1.1
        ECC Object : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current : Compute
        Pending : Compute
    PCI
        Bus : 0x84
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x102810DE
        Bus Id : 0000:84:00.0
        Sub System Id : 0x101510DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 2
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : N/A
    Performance State : P0
    Clocks Throttle Reasons
        Idle : Not Active
        User Defined Clocks : Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        Unknown : Not Active
    Memory Usage
        Total : 4799 MB
        Used : 11 MB
        Free : 4788 MB
    Compute Mode : Default
    Utilization
        Gpu : 75 %
        Memory : 4 %
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
    Temperature
        Gpu : 34 C
    Power Readings
        Power Management : Supported
        Power Draw : 43.73 W
        Power Limit : 225.00 W
        Default Power Limit : 225.00 W
        Min Power Limit : 150.00 W
        Max Power Limit : 225.00 W
    Clocks
        Graphics : 705 MHz
        SM : 705 MHz
        Memory : 2600 MHz
    Applications Clocks
        Graphics : 705 MHz
        Memory : 2600 MHz
    Max Clocks
        Graphics : 758 MHz
        SM : 758 MHz
        Memory : 2600 MHz
    Compute Processes : None
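The same key (for example Graphics or Current) appears under several sections of the -q output, so a script that needs one value has to track which section it is in. The following is a minimal sketch. It assumes the "Key : Value" layout shown above, with section headers on lines of their own. smi_field is a hypothetical helper name and is not part of nvidia-smi. It only remembers the innermost header, so when a header such as Single Bit repeats, the first match wins.

```shell
# smi_field SECTION KEY
# Hypothetical helper (not part of nvidia-smi): reads `nvidia-smi -q` text on
# stdin and prints the value of KEY inside the most recent header SECTION.
# Assumes "Key : Value" lines and header lines without " : ". Only the
# innermost header is tracked, and the first matching line wins.
smi_field() {
  awk -v sec="$1" -v key="$2" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # drop indentation
      sub(/[ \t]+$/, "", line)          # drop trailing blanks
    }
    line == "" { next }                 # ignore blank lines
    index(line, " : ") == 0 {           # no separator: this is a header
      cur = line
      next
    }
    cur == sec {
      pos = index(line, " : ")
      k = substr(line, 1, pos - 1)
      sub(/[ \t]+$/, "", k)             # strip padding before the colon
      if (k == key) {
        v = substr(line, pos + 3)
        sub(/^[ \t]+/, "", v)
        print v
        exit
      }
    }
  '
}
```

Against the output above, `smi_field 'Power Readings' 'Power Draw'` would print `43.73 W`. `smi_field 'Max Clocks' 'Graphics'` would print `758 MHz` rather than the current `705 MHz`. In practice the input would come from a command such as `nvidia-smi -q -i 0 | smi_field 'Temperature' 'Gpu'`.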
Other options for query
A number of other flags can be used with a query:
-u, --unit Show unit, rather than GPU, attributes.
-i, --id= Target a specific GPU or Unit.
-f, --filename= Log to a specified file, rather than to stdout.
-x, --xml-format Produce XML output.
--dtd When showing xml output, embed DTD.
-d, --display= Display only selected information: MEMORY,
        UTILIZATION, ECC, TEMPERATURE, POWER, CLOCK,
        COMPUTE, PIDS, PERFORMANCE, SUPPORTED_CLOCKS.
        Flags can be combined with comma e.g. ECC,POWER.
        Doesn't work with -u or -x flags.
-l, --loop= Probe until Ctrl+C at specified second interval.
Examples of GPU Query and monitoring
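Combined with -d, the query output is easy to check from a script, for example one run from cron. The following is a minimal sketch. It assumes the driver 319.37 layout used in the temperature example that follows: a "GPU <bus id>" line, then a "Gpu : NN C" line. Newer drivers may label the temperature differently. hot_gpus is a hypothetical helper name.

```shell
# hot_gpus LIMIT
# Hypothetical helper: reads `nvidia-smi -q -d TEMPERATURE` output on stdin
# and prints "<bus id> <temperature>" for every GPU hotter than LIMIT (deg C).
# Assumes the "Gpu : NN C" line format printed by driver 319.37.
hot_gpus() {
  awk -v limit="$1" '
    /^GPU [0-9A-Fa-f]+:/ { bus = $2; next }   # e.g. "GPU 0000:83:00.0"
    $1 == "Gpu" && $2 == ":" && $4 == "C" {    # temperature line only
      if ($3 + 0 > limit + 0) print bus, $3
    }
  '
}
```

For example, `nvidia-smi -q -d TEMPERATURE | hot_gpus 80` prints nothing while every GPU is at or below 80 C. This makes it easy to alert only on non-empty output.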
nvidia-smi -q -g 0 -d temperature -l
==============NVSMI LOG==============
Timestamp : Mon Jan 27 12:05:04 2014
Driver Version : 319.37
Attached GPUs : 1
GPU 0000:83:00.0
    Temperature
        Gpu : 30 C
Clock
nvidia-smi -q -g 0 -d clock -l
==============NVSMI LOG==============
Timestamp : Mon Jan 27 12:06:27 2014
Driver Version : 319.37
Attached GPUs : 1
GPU 0000:83:00.0
    Clocks
        Graphics : 324 MHz
        SM : 324 MHz
        Memory : 324 MHz
    Applications Clocks
        Graphics : 745 MHz
        Memory : 3004 MHz
    Default Applications Clocks
        Graphics : 745 MHz
        Memory : 3004 MHz
    Max Clocks
        Graphics : 875 MHz
        SM : 875 MHz
        Memory : 3004 MHz
Changing the ECC setting
See the current and pending (future) setting:
[root@gpu3 ~]# nvidia-smi -q -d ecc | grep 'Ecc Mode' -A 2
    Ecc Mode
        Current : Enabled
        Pending : Enabled
--
    Ecc Mode
        Current : Enabled
        Pending : Enabled
--
    Ecc Mode
        Current : Enabled
        Pending : Enabled
Change the setting
Turn On:
nvidia-smi --ecc-config=1
Turn Off:
nvidia-smi --ecc-config=0
The new setting only takes effect after a reboot, but it is shown immediately in the Pending line.
The -i flag can be used to change the setting on a specific GPU only; otherwise the change applies to all GPUs in the system.
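A changed ECC setting only shows up in the Pending line until the reboot, so a script can list the GPUs that are still waiting for one. The following is a minimal sketch. It assumes the Ecc Mode / Current / Pending layout shown above. It also relies on the "GPU <bus id>" header that nvidia-smi -q prints for each GPU, which the grep in the example above hides. ecc_reboot_pending is a hypothetical helper name.

```shell
# ecc_reboot_pending
# Hypothetical helper: reads `nvidia-smi -q -d ECC` output on stdin and prints
# "<bus id> <current> -> <pending>" for every GPU whose pending ECC mode
# differs from its current one, i.e. GPUs that still need a reboot.
ecc_reboot_pending() {
  awk '
    /^GPU [0-9A-Fa-f]+:/ { bus = $2; in_mode = 0; next }  # new GPU block
    $1 == "Ecc" && $2 == "Mode" { in_mode = 1; next }     # "Ecc Mode" header
    in_mode && $1 == "Current" { cur = $3; next }
    in_mode && $1 == "Pending" {
      if ($3 != cur) print bus, cur, "->", $3
      in_mode = 0
    }
  '
}
```

For example, `nvidia-smi -q -d ECC | ecc_reboot_pending` prints nothing once every GPU's pending mode matches its current mode.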
Enabling GPU Boost (dynamic overclock)
The clock speeds used in this example are for the Tesla K40.
Query the GPU for supported clock speeds:
nvidia-smi -q -g 0 -d SUPPORTED_CLOCKS
Persistence mode has to be enabled before the changes are applied, or the clock speeds will not be retained after a driver unload/reboot:
nvidia-smi -pm 1
Set the memory and graphics (core) clock speeds, in that order, in MHz:
nvidia-smi -ac 3004,875
Note on boost: unlike GeForce cards, Tesla GPUs cannot be clocked above the frequencies permitted by nvidia-smi (no further overclocking). The reason, from NVIDIA:
The clocks on Tesla GPUs cannot be adjusted beyond what is permitted via nvidia-smi.
Some background on why Tesla clocks are set lower than similar GeForce cards which may help in discussion with your customers. The physical characteristics of silicon change over time, and change more rapidly at higher temperatures. GeForce cards are expected to be used a few hours a day with a short operational life as the gamer upgrades to a new product to play the most recent games. Tesla GPUs may be in operation 24/7 over several years. We need to guarantee that the product is still within spec for its whole operational lifetime taking into account the physical changes to the silicon. Also we work with OEMs to certify the products at particular power and thermal settings.
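The -ac pair used above for the K40 (3004,875) matches the Memory and Graphics values in the Max Clocks block of the clock example earlier on this page. The following is a minimal sketch that derives the pair from that block. It assumes the input comes from a single GPU (select it with -i). It also assumes the Max Clocks pair is a supported application-clock pair, which should be checked against -d SUPPORTED_CLOCKS before applying it. max_ac_pair is a hypothetical helper name.

```shell
# max_ac_pair
# Hypothetical helper: reads `nvidia-smi -q -d CLOCK` output for ONE GPU on
# stdin and prints "memory,graphics" from its "Max Clocks" block, which is
# the argument order expected by `nvidia-smi -ac`.
max_ac_pair() {
  awk '
    { line = $0; sub(/^[ \t]+/, "", line); sub(/[ \t]+$/, "", line) }
    line == "Max Clocks" { in_max = 1; next }
    in_max && index(line, " : ") == 0 { in_max = 0 }   # next header ends block
    in_max && $1 == "Graphics" { gr = $3 }
    in_max && $1 == "Memory" { mem = $3 }
    END { if (mem != "" && gr != "") print mem "," gr }
  '
}
```

Here is a possible use, run as root: `nvidia-smi -pm 1`, then `nvidia-smi -i 0 -ac "$(nvidia-smi -q -i 0 -d CLOCK | max_ac_pair)"`. Deriving the pair from the query avoids hard-coding K40 values on other boards.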
OTHER
DEVICE MODIFICATION OPTIONS:
[any one of]
-pm, --persistence-mode= Set persistence mode: 0/DISABLED, 1/ENABLED
-e, --ecc-config= Toggle ECC support: 0/DISABLED, 1/ENABLED
-p, --reset-ecc-errors= Reset ECC error counts: 0/VOLATILE, 1/AGGREGATE
-c, --compute-mode= Set MODE for compute applications:
        0/DEFAULT, 1/EXCLUSIVE_THREAD,
        2/PROHIBITED, 3/EXCLUSIVE_PROCESS
--gom= Set GPU Operation Mode:
        0/ALL_ON, 1/COMPUTE, 2/LOW_DP
-r --gpu-reset Trigger secondary bus reset of the GPU.
        Can be used to reset GPU HW state in situations
        that would otherwise require a machine reboot.
        Typically useful if a double bit ECC error has
        occurred.
        --id= switch is mandatory for this switch
-ac --application-clocks= Specifies <memory,graphics> clocks as a
        pair (e.g. 2000,800) that defines GPU's
        speed in MHz while running applications on a GPU.
-rac --reset-application-clocks
        Resets the application clocks to the default value.
-pl --power-limit= Specifies maximum power management limit in watts.
[plus optional]
-i, --id= Target a specific GPU.
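The Power Readings section of the query reports Min Power Limit and Max Power Limit (150 W and 225 W for the K20m above). These bound the values that make sense for -pl. The following is a minimal sketch that checks a requested limit against that range before calling -pl. It assumes the input comes from a single GPU (select it with -i). check_power_limit is a hypothetical helper name.

```shell
# check_power_limit WATTS
# Hypothetical helper: reads `nvidia-smi -q -d POWER` output for ONE GPU on
# stdin and checks WATTS against its Min/Max Power Limit lines.
# Prints "ok" (exit 0), an out-of-range message (exit 1),
# or "unknown" if the limits are missing (exit 2).
check_power_limit() {
  awk -v want="$1" '
    { line = $0; sub(/^[ \t]+/, "", line) }
    index(line, "Min Power Limit") == 1 { min = $5 }   # "... : 150.00 W"
    index(line, "Max Power Limit") == 1 { max = $5 }
    END {
      if (min == "" || max == "") { print "unknown"; exit 2 }
      if (want + 0 < min + 0 || want + 0 > max + 0) {
        printf "out of range: %s W not in [%s, %s] W\n", want, min, max
        exit 1
      }
      print "ok"
    }
  '
}
```

For example: `nvidia-smi -q -i 0 -d POWER | check_power_limit 200 && nvidia-smi -i 0 -pl 200`.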
UNIT MODIFICATION OPTIONS:
-t, --toggle-led= Set Unit LED state: 0/GREEN, 1/AMBER
[plus optional]
-i, --id= Target a specific Unit.
SHOW DTD OPTIONS:
--dtd Print device DTD and exit.
[plus optional]
-f, --filename= Log to a specified file, rather than to stdout.
-u, --unit Show unit, rather than device, DTD.
Disable ECC
# -i GPU target ID
# -e ECC Mode (0=off, 1=on)
# reboot required after changes
[root@compute021 ~]# nvidia-smi -i 0 -e 0
Disabled ECC support for GPU 0000:02:00.0.
All done.
Reboot required.
[root@compute021 ~]# nvidia-smi -i 1 -e 0
ECC support is already Disabled for GPU 0000:03:00.0.
All done.
[root@compute021 ~]# nvidia-smi -i 2 -e 0
Disabled ECC support for GPU 0000:84:00.0.
All done.
Reboot required.
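The session above runs nvidia-smi -i N -e 0 by hand, once per GPU. The following is a minimal sketch that does the same for every GPU reported by nvidia-smi -L. It assumes the "GPU N: ..." line format shown in the List the GPUs section. gpu_indices and disable_ecc_all are hypothetical helper names. The script must run as root, and a reboot is still required afterwards.

```shell
# gpu_indices
# Hypothetical helper: turns `nvidia-smi -L` lines such as
# "GPU 0: Tesla K20m (S/N: 0325212006276)" into bare indices ("0").
gpu_indices() {
  sed -n 's/^GPU \([0-9][0-9]*\):.*/\1/p'
}

# disable_ecc_all
# Hypothetical helper: disables ECC on every GPU listed by `nvidia-smi -L`.
# Needs root; the change only takes effect after a reboot.
disable_ecc_all() {
  nvidia-smi -L | gpu_indices | while read -r idx; do
    nvidia-smi -i "$idx" -e 0
  done
}
```

As the ECC section notes, running the command without -i already applies it to all GPUs. The loop is mainly useful when you want per-GPU output, or when you want to skip some indices.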