GPU: Nvidia-smi

From Define Wiki

Revision as of 12:13, 27 January 2014

List the GPUs

nvidia-smi -L
GPU 0: Tesla K20m (S/N: 0325212006276)
GPU 1: Tesla K20m (S/N: 0324712040020)

Other options for listing GPUs

    -i,   --id=                 Target a specific GPU.
    -f,   --filename=           Log to a specified file, rather than to stdout.
    -l,   --loop=               Probe until Ctrl+C at specified second interval.
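Since -L prints one line per GPU, its output is easy to reuse in scripts. The following is a minimal sketch only: list_gpus is a hypothetical helper name, and it assumes the "GPU <n>: <name> (S/N: <serial>)" format shown above (other driver versions may print a UUID instead of a serial number, in which case the helper prints nothing).

```shell
# list_gpus (hypothetical helper): read `nvidia-smi -L` output on stdin and
# print "<index><TAB><serial>" for every line in the
# "GPU <n>: <name> (S/N: <serial>)" format; other lines are ignored.
list_gpus() {
    awk '/^GPU [0-9]+: / {
        idx = $2
        sub(/:$/, "", idx)                      # "0:" -> "0"
        serial = $0
        # keep only the text between "(S/N: " and the closing ")"
        if (sub(/.*[(]S\/N: /, "", serial) && sub(/[)].*$/, "", serial))
            printf "%s\t%s\n", idx, serial
    }'
}

# Usage:
#   nvidia-smi -L | list_gpus            # one line per GPU
#   nvidia-smi -L | list_gpus | wc -l    # number of GPUs
```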

Query

nvidia-smi -q
GPU 0000:84:00.0
    Product Name                : Tesla K20m
    Display Mode                : Disabled
    Persistence Mode            : Disabled
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : 0324712040020
    GPU UUID                    : GPU-f5ea157d-f71d-277e-afaf-a598426a8dc1
    VBIOS Version               : 80.10.11.00.0B
    Inforom Version
        Image Version           : 2081.0208.01.07
        OEM Object              : 1.1
        ECC Object              : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current                 : Compute
        Pending                 : Compute
    PCI
        Bus                     : 0x84
        Device                  : 0x00
        Domain                  : 0x0000
        Device Id               : 0x102810DE
        Bus Id                  : 0000:84:00.0
        Sub System Id           : 0x101510DE
        GPU Link Info
            PCIe Generation
                Max             : 2
                Current         : 2
            Link Width
                Max             : 16x
                Current         : 16x
    Fan Speed                   : N/A
    Performance State           : P0
    Clocks Throttle Reasons
        Idle                    : Not Active
        User Defined Clocks     : Active
        SW Power Cap            : Not Active
        HW Slowdown             : Not Active
        Unknown                 : Not Active
    Memory Usage
        Total                   : 4799 MB
        Used                    : 11 MB
        Free                    : 4788 MB
    Compute Mode                : Default
    Utilization
        Gpu                     : 75 %
        Memory                  : 4 %
    Ecc Mode
        Current                 : Enabled
        Pending                 : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
            Double Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
        Aggregate
            Single Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
            Double Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
    Temperature
        Gpu                     : 34 C
    Power Readings
        Power Management        : Supported
        Power Draw              : 43.73 W
        Power Limit             : 225.00 W
        Default Power Limit     : 225.00 W
        Min Power Limit         : 150.00 W
        Max Power Limit         : 225.00 W
    Clocks
        Graphics                : 705 MHz
        SM                      : 705 MHz
        Memory                  : 2600 MHz
    Applications Clocks
        Graphics                : 705 MHz
        Memory                  : 2600 MHz
    Max Clocks
        Graphics                : 758 MHz
        SM                      : 758 MHz
        Memory                  : 2600 MHz
    Compute Processes           : None
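Each line of the query output has the form "Label : value", so single values can be pulled out with awk. A minimal sketch, assuming that layout; gpu_field is a hypothetical helper, and because labels such as Graphics or Current appear in several sections, it is best combined with -i and -d to narrow the output first.

```shell
# gpu_field (hypothetical helper): print the value of the first line in
# `nvidia-smi -q` output (read on stdin) whose label equals $1 exactly.
# Only the first ":" on a line is treated as the separator, so values such
# as bus IDs or timestamps that contain colons are kept whole.
gpu_field() {
    awk -v want="$1" '
        index($0, ":") {
            label = substr($0, 1, index($0, ":") - 1)
            value = substr($0, index($0, ":") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim the label
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim the value
            if (label == want) { print value; exit }
        }'
}

# Usage:
#   nvidia-smi -q -i 0 | gpu_field "Power Draw"      # e.g. "43.73 W"
#   nvidia-smi -q -i 0 | gpu_field "Serial Number"
```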


Other options for query

A number of other flags can be used with a query:

    -u,   --unit                Show unit, rather than GPU, attributes.
    -i,   --id=                 Target a specific GPU or Unit.
    -f,   --filename=           Log to a specified file, rather than to stdout.
    -x,   --xml-format          Produce XML output.
          --dtd                 When showing xml output, embed DTD.
    -d,   --display=            Display only selected information: MEMORY,
                                    UTILIZATION, ECC, TEMPERATURE, POWER, CLOCK,
                                    COMPUTE, PIDS, PERFORMANCE, SUPPORTED_CLOCKS.
                                Flags can be combined with comma e.g. ECC,POWER.
                                Doesn't work with -u or -x flags.
    -l,   --loop=               Probe until Ctrl+C at specified second interval.
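When building wrapper scripts it can help to check the -d list before calling nvidia-smi. This sketch assumes the section names in the help text above are the complete set (newer drivers accept additional ones) and treats them case-insensitively, as the lowercase examples on this page suggest; check_display_list is a hypothetical name.

```shell
# check_display_list (hypothetical helper): return 0 if every comma-separated
# name in $1 is one of the -d sections from the help text above, else 1.
check_display_list() {
    local valid list name
    valid=" MEMORY UTILIZATION ECC TEMPERATURE POWER CLOCK COMPUTE PIDS PERFORMANCE SUPPORTED_CLOCKS "
    list=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    # reject empty input, empty names and anything other than letters, "_" and ","
    case "$list" in
        ''|,*|*,|*,,*|*[!A-Z_,]*) return 1 ;;
    esac
    for name in $(printf '%s' "$list" | tr ',' ' '); do
        case "$valid" in
            *" $name "*) ;;          # known section name
            *) return 1 ;;
        esac
    done
    return 0
}

# Usage:
#   check_display_list "ECC,POWER" && nvidia-smi -q -d ECC,POWER
```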

Examples of GPU query and monitoring

Temperature

nvidia-smi -q -g 0 -d temperature -l
==============NVSMI LOG==============

Timestamp                           : Mon Jan 27 12:05:04 2014
Driver Version                      : 319.37

Attached GPUs                       : 1
GPU 0000:83:00.0
    Temperature
        Gpu                         : 30 C
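For simple alerting, the temperature reading can be compared against a threshold. A minimal sketch, assuming the "Gpu : <n> C" line format of this driver version (newer drivers label the line differently); temp_over and the 80 C threshold in the usage comment are hypothetical, not NVIDIA-recommended limits. Run it without -l, since awk only reports once its input ends.

```shell
# temp_over (hypothetical helper): read `nvidia-smi -q -d temperature` output
# on stdin, print every "Gpu : <n> C" reading strictly above $1 degrees C,
# and exit 0 if at least one was found, else exit 1.
temp_over() {
    awk -v limit="$1" '
        $1 == "Gpu" && $2 == ":" && $4 == "C" {
            if ($3 + 0 > limit + 0) { print $3; hot = 1 }
        }
        END { if (hot) exit 0; exit 1 }'
}

# Usage (80 is an arbitrary example threshold):
#   nvidia-smi -q -d temperature | temp_over 80 && echo "GPU running hot"
```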

Clock

nvidia-smi -q -g 0 -d clock -l
==============NVSMI LOG==============

Timestamp                           : Mon Jan 27 12:06:27 2014
Driver Version                      : 319.37

Attached GPUs                       : 1
GPU 0000:83:00.0
    Clocks
        Graphics                    : 324 MHz
        SM                          : 324 MHz
        Memory                      : 324 MHz
    Applications Clocks
        Graphics                    : 745 MHz
        Memory                      : 3004 MHz
    Default Applications Clocks
        Graphics                    : 745 MHz
        Memory                      : 3004 MHz
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 3004 MHz
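The section headings (Clocks, Applications Clocks, Max Clocks) make it possible to compare, for example, the configured application clock with the maximum clock. A sketch under the assumption that section headings are the lines without a colon, as in the output above; clock_of is a hypothetical helper.

```shell
# clock_of (hypothetical helper): from `nvidia-smi -q -d clock` output on
# stdin, print the MHz number for label $2 inside section $1,
# e.g. clock_of "Max Clocks" Graphics.
clock_of() {
    awk -v sect="$1" -v want="$2" '
        {
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)
        }
        index(line, ":") == 0 { cur = line; next }   # a section heading
        cur == sect {
            label = substr(line, 1, index(line, ":") - 1)
            gsub(/[ \t]+$/, "", label)
            if (label == want) {
                value = substr(line, index(line, ":") + 1)
                sub(/^[ \t]+/, "", value)
                sub(/ MHz$/, "", value)
                print value
                exit
            }
        }'
}

# Usage:
#   out=$(nvidia-smi -q -i 0 -d clock)
#   app=$(printf '%s\n' "$out" | clock_of "Applications Clocks" Graphics)
#   max=$(printf '%s\n' "$out" | clock_of "Max Clocks" Graphics)
#   echo "application clock ${app} MHz, maximum ${max} MHz"
```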

Changing the ECC setting

View the current and pending settings (one block is shown per GPU):

[root@gpu3 ~]# nvidia-smi -q -d ecc | grep 'Ecc Mode' -A 2
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
--
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
--
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
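A GPU is waiting for a reboot whenever its Pending value differs from its Current value. The hypothetical ecc_reboot_pending helper below is a minimal sketch that checks this across all GPUs in the output, assuming the Ecc Mode block layout shown above.

```shell
# ecc_reboot_pending (hypothetical helper): read `nvidia-smi -q -d ecc` output
# on stdin and exit 0 if any "Ecc Mode" block has a Pending value different
# from its Current value (a change waiting for a reboot), else exit 1.
ecc_reboot_pending() {
    awk '
        /^[ \t]*Ecc Mode[ \t]*$/ { inblock = 1; next }
        inblock && $1 == "Current" { cur = $NF; next }
        inblock && $1 == "Pending" {
            if ($NF != cur) pending = 1
            inblock = 0
        }
        END { if (pending) exit 0; exit 1 }'
}

# Usage:
#   nvidia-smi -q -d ecc | ecc_reboot_pending && echo "reboot needed for ECC change"
```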

Change the setting

Turn On:
nvidia-smi --ecc-config=1

Turn Off:
nvidia-smi --ecc-config=0

The new setting only takes effect after a reboot, but it is shown immediately on the Pending line.

The -i flag can be used to change the setting on a specific GPU only; otherwise the change applies to all GPUs in the system.
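The two rules above (the optional -i flag, and changes taking effect only after a reboot) can be wrapped in a small function. This is a sketch only; set_ecc and the DRY_RUN variable are hypothetical, and DRY_RUN=1 prints the command instead of running it, which is useful for checking a script before touching real GPUs.

```shell
# set_ecc (hypothetical wrapper): set the ECC mode to $1 (0 = off, 1 = on) on
# the GPU given as $2, or on all GPUs when $2 is omitted. The new mode only
# takes effect after a reboot. With DRY_RUN=1 the command is only printed.
set_ecc() {
    case "$1" in
        0|1) ;;
        *) echo "set_ecc: mode must be 0 or 1" >&2; return 2 ;;
    esac
    if [ -n "$2" ]; then
        set -- nvidia-smi -i "$2" --ecc-config="$1"
    else
        set -- nvidia-smi --ecc-config="$1"
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage:
#   DRY_RUN=1 set_ecc 0 1    # prints: nvidia-smi -i 1 --ecc-config=0
#   set_ecc 1                # turn ECC on for all GPUs
```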

Enabling GPU Boost (dynamic overclock)

Clock speeds used in this example were applied to the K40.

Clock speed changes are not kept across a reboot or driver unload, so they need to be reapplied afterwards.

Query the GPU for supported clock speeds:
nvidia-smi -q -g 0 -d SUPPORTED_CLOCKS

Persistence mode has to be enabled before the changes are applied:
nvidia-smi -pm 1

Set the application clocks as a <memory,graphics> pair in MHz (here 3004 MHz memory and 875 MHz graphics):
nvidia-smi -ac 3004,875
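The steps above can be combined into one function. A minimal sketch; set_app_clocks, _nvsmi_run and DRY_RUN are hypothetical names, the clock pair should be one reported by the SUPPORTED_CLOCKS query, and the real commands normally need root privileges.

```shell
# _nvsmi_run (hypothetical helper): print the command when DRY_RUN=1,
# otherwise execute it.
_nvsmi_run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# set_app_clocks (hypothetical wrapper): enable persistence mode on GPU $1,
# then set its application clocks to $2 MHz memory and $3 MHz graphics,
# in the <memory,graphics> order that -ac expects.
set_app_clocks() {
    local gpu="$1" mem="$2" gr="$3" v
    for v in "$gpu" "$mem" "$gr"; do
        case "$v" in
            ''|*[!0-9]*)
                echo "set_app_clocks: expected <gpu> <memory MHz> <graphics MHz>" >&2
                return 2 ;;
        esac
    done
    _nvsmi_run nvidia-smi -i "$gpu" -pm 1 || return
    _nvsmi_run nvidia-smi -i "$gpu" -ac "$mem,$gr"
}

# Usage (K40 values from this example):
#   set_app_clocks 0 3004 875
```

To go back to the default clocks later, the -rac option listed below resets the application clocks.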

OTHER

  
  DEVICE MODIFICATION OPTIONS:

    [any one of]

    -pm,  --persistence-mode=   Set persistence mode: 0/DISABLED, 1/ENABLED
    -e,   --ecc-config=         Toggle ECC support: 0/DISABLED, 1/ENABLED
    -p,   --reset-ecc-errors=   Reset ECC error counts: 0/VOLATILE, 1/AGGREGATE
    -c,   --compute-mode=       Set MODE for compute applications:
                                0/DEFAULT, 1/EXCLUSIVE_THREAD,
                                2/PROHIBITED, 3/EXCLUSIVE_PROCESS
          --gom=                Set GPU Operation Mode:
                                    0/ALL_ON, 1/COMPUTE, 2/LOW_DP
    -r    --gpu-reset           Trigger secondary bus reset of the GPU.
                                Can be used to reset GPU HW state in situations
                                that would otherwise require a machine reboot.
                                Typically useful if a double bit ECC error has
                                occurred.
                                --id= switch is mandatory for this switch
    -ac   --application-clocks= Specifies <memory,graphics> clocks as a
                                    pair (e.g. 2000,800) that defines GPU's
                                    speed in MHz while running applications on a GPU.
    -rac  --reset-application-clocks
                                Resets the application clocks to the default value.
    -pl   --power-limit=        Specifies maximum power management limit in watts.

   [plus optional]

    -i,   --id=                 Target a specific GPU.

  UNIT MODIFICATION OPTIONS:

    -t,   --toggle-led=         Set Unit LED state: 0/GREEN, 1/AMBER

   [plus optional]

    -i,   --id=                 Target a specific Unit.

  SHOW DTD OPTIONS:

          --dtd                 Print device DTD and exit.

     [plus optional]

    -f,   --filename=           Log to a specified file, rather than to stdout.
    -u,   --unit                Show unit, rather than device, DTD.
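Before changing -pl, the requested value can be checked against the Min Power Limit and Max Power Limit lines of the query output (150 W and 225 W for the K20m shown earlier). A minimal sketch, assuming the "<n> W" format shown there and a single GPU in the input; power_limit_ok is a hypothetical helper.

```shell
# power_limit_ok (hypothetical helper): read `nvidia-smi -q -d POWER` output
# for one GPU on stdin and exit 0 if $1 watts lies within
# [Min Power Limit, Max Power Limit], else exit 1.
power_limit_ok() {
    awk -F':' -v want="$1" '
        {
            label = $1
            gsub(/^[ \t]+|[ \t]+$/, "", label)
            v = $2
            sub(/ W.*$/, "", v)               # " 150.00 W" -> " 150.00"
        }
        label == "Min Power Limit" { min = v + 0; have_min = 1 }
        label == "Max Power Limit" { max = v + 0; have_max = 1 }
        END {
            if (have_min && have_max && want + 0 >= min && want + 0 <= max) exit 0
            exit 1
        }'
}

# Usage (GPU 0 and 200 W are example values):
#   nvidia-smi -q -i 0 -d POWER | power_limit_ok 200 && nvidia-smi -i 0 -pl 200
```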

Disable ECC

# -i GPU target ID
# -e ECC Mode (0=off, 1=on)
# reboot required after changes

[root@compute021 ~]# nvidia-smi -i 0 -e 0 
Disabled ECC support for GPU 0000:02:00.0.
All done.
Reboot required.
[root@compute021 ~]# nvidia-smi -i 1 -e 0 
ECC support is already Disabled for GPU 0000:03:00.0.
All done.
[root@compute021 ~]# nvidia-smi -i 2 -e 0 
Disabled ECC support for GPU 0000:84:00.0.
All done.
Reboot required.
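As the log above shows, a reboot is only needed for GPUs whose ECC mode actually changed. This sketch loops over GPU indices and reports whether any of them asked for a reboot; disable_ecc_all and the NVSMI override are hypothetical, and matching on the "Reboot required." text assumes the message wording of this driver version.

```shell
# disable_ecc_all (hypothetical wrapper): run `nvidia-smi -i <id> -e 0` for
# each GPU index given as an argument, echo its output, and print a notice
# if any GPU reported "Reboot required.". NVSMI can name a substitute
# command (e.g. for testing); it defaults to nvidia-smi.
disable_ecc_all() {
    local id out reboot=0
    for id in "$@"; do
        out=$("${NVSMI:-nvidia-smi}" -i "$id" -e 0) || return
        printf '%s\n' "$out"
        case "$out" in
            *"Reboot required."*) reboot=1 ;;
        esac
    done
    if [ "$reboot" = 1 ]; then
        echo "At least one GPU needs a reboot before the new ECC mode takes effect."
    fi
}

# Usage:
#   disable_ecc_all 0 1 2
```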