Bright:Add GPUs to OpenStack

From Define Wiki

Revision as of 15:08, 24 March 2016

These steps were performed on the CentOS 7 node with the 2x K80 GPUs (provisioned as a nova hypervisor under Bright / openstack-image - default).

Ensure Intel IOMMU is enabled

The kernel must be booted with intel_iommu=on. Add it to the kernel parameters of the software image, keeping the parameters that are already set:

[shadow-head]% softwareimage 
[shadow-head->softwareimage]% use openstack-image 
[shadow-head->softwareimage[openstack-image]]% list
Name (key)           Path                                     Kernel version        
-------------------- ---------------------------------------- ----------------------
default-image        /cm/images/default-image                 3.10.0-229.el7.x86_64 
openstack-image      /cm/images/openstack-image               3.10.0-229.el7.x86_64 
test-image           /cm/images/test-image                    3.10.0-229.el7.x86_64 
[shadow-head->softwareimage[openstack-image]]% show
Parameter                        Value                                           
-------------------------------- ------------------------------------------------
Boot FSPart                      98784247814                                      
Creation time                    Tue, 29 Sep 2015 10:29:56 GMT                    
Enable SOL                       yes                                              
FSPart                           98784247814                                      
Kernel modules                   <38 in submode>                                  
Kernel parameters                rdblacklist=nouveau                              
Kernel version                   3.10.0-229.el7.x86_64                            
Locked                           no                                               
Name                             openstack-image                                  
Notes                            <0 bytes>                                        
Path                             /cm/images/openstack-image                       
Revision                                                                          
SOL Flow Control                 no                                               
SOL Port                         ttyS1                                            
SOL Speed                        115200                                           
[shadow-head->softwareimage[openstack-image]]% set kernelparameters "rdblacklist=nouveau intel_iommu=on"
[shadow-head->softwareimage*[openstack-image*]]% show
Parameter                        Value                                           
-------------------------------- ------------------------------------------------
Boot FSPart                      98784247814                                      
Creation time                    Tue, 29 Sep 2015 10:29:56 GMT                    
Enable SOL                       yes                                              
FSPart                           98784247814                                      
Kernel modules                   <38 in submode>                                  
Kernel parameters                rdblacklist=nouveau intel_iommu=on               
Kernel version                   3.10.0-229.el7.x86_64                            
Locked                           no                                               
Name                             openstack-image                                  
Notes                            <0 bytes>                                        
Path                             /cm/images/openstack-image                       
Revision                                                                          
SOL Flow Control                 no                                               
SOL Port                         ttyS1                                            
SOL Speed                        115200                                           
[shadow-head->softwareimage*[openstack-image*]]% commit
=============================== openstack-image ================================
Field                    Message                                                       
------------------------ --------------------------------------------------------------
module                   Warning: Module xhci-hcd does not exist for specified kernel. 
[shadow-head->softwareimage[openstack-image]]%
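
The set command above is given the full parameter string, with the existing rdblacklist=nouveau entry repeated alongside the new one. A minimal Python sketch of building such a string without duplicating an entry follows; the helper name add_kernel_param is hypothetical and is not part of Bright.

```python
def add_kernel_param(current: str, param: str) -> str:
    """Return `current` with `param` appended, replacing any entry with the same key."""
    key = param.split("=", 1)[0]
    tokens = current.split()
    # Drop an existing entry for the same key so the parameter is not repeated.
    kept = [t for t in tokens if t.split("=", 1)[0] != key]
    return " ".join(kept + [param])


# Existing value taken from the "show" output above.
print(add_kernel_param("rdblacklist=nouveau", "intel_iommu=on"))
```

The resulting string is what would be passed, quoted, to set kernelparameters.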

Once the node has booted with the updated image, check the dmesg output to confirm that the IOMMU is enabled:

[root@gpu ~]# dmesg | grep -iE "dmar|iommu" | grep -i enabled 
[    0.000000] Intel-IOMMU: enabled
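
The same check can be scripted, for example when several GPU nodes need verifying. This is a sketch only: the function name iommu_enabled is an assumption, and it applies the same two case-insensitive filters as the grep pipeline above to text that has already been captured from dmesg.

```python
import re


def iommu_enabled(dmesg_text: str) -> bool:
    """True if any DMAR/IOMMU line in the dmesg text also says 'enabled'."""
    for line in dmesg_text.splitlines():
        mentions_iommu = re.search(r"dmar|iommu", line, re.IGNORECASE)
        says_enabled = re.search(r"enabled", line, re.IGNORECASE)
        if mentions_iommu and says_enabled:
            return True
    return False


# Sample line copied from the dmesg output above.
sample = "[    0.000000] Intel-IOMMU: enabled"
print(iommu_enabled(sample))
```

On a live node the text could be captured with subprocess.run(["dmesg"], capture_output=True, text=True).stdout and passed to the function.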

Get the PCI IDs of the GPUs

In this instance we are using K80s

# check for the nvidia GPUs
[root@gpu ~]# lspci | grep -i nvidia
04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
05:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
# get their IDs with the -nn flag (shows names and numeric IDs together)
[root@gpu ~]# lspci -nn | grep -i nvidia 
04:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
05:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
  • So the vendor id is: 10de
  • And the product id is: 102d
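
Reading the IDs off by eye works for two cards; with more devices it may be easier to extract them. The sketch below pulls the [vendor:product] pair out of lspci -nn output. The function name nvidia_pci_ids and the regular expression are assumptions made for illustration.

```python
import re

# The ID pair is two groups of four hex digits separated by a colon, in brackets.
# The class code (e.g. [0302]) has no colon, so it does not match.
ID_PATTERN = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def nvidia_pci_ids(lspci_nn_output: str) -> set:
    """Return the set of (vendor_id, product_id) pairs found on NVIDIA lines."""
    ids = set()
    for line in lspci_nn_output.splitlines():
        if "nvidia" not in line.lower():
            continue
        match = ID_PATTERN.search(line)
        if match:
            ids.add((match.group(1).lower(), match.group(2).lower()))
    return ids


# Sample lines copied from the lspci -nn output above.
sample = """\
04:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
05:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
"""
print(nvidia_pci_ids(sample))
```

Both K80 devices report the same pair, so the set contains a single entry.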

Update the /etc/nova/nova.conf file

Apply the changes on the head node; they are synced to the other nodes, so all nodes run the same version of nova.conf.

# PCI alias and whitelist
[root@shadow-head nova]# grep ^pci_ /etc/nova/nova.conf 
pci_alias = { 'name': 'K80_Tesla', 'vendor_id': '10de', 'product_id': '102d' }
pci_passthrough_whitelist = [{ "vendor_id": "10de", "product_id": "102d" }]

# Scheduler params
[root@shadow-head nova]# grep ^scheduler /etc/nova/nova.conf 
scheduler_max_attempts=5
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,PciPassthroughFilter
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter
scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
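
The PCI values are JSON-style strings, which are easy to mistype by hand. A hedged sketch of generating them from the vendor and product IDs, plus a check that PciPassthroughFilter is in the default filter list, could look like this. The function names are hypothetical; json.dumps always emits double-quoted strings, which is what a strict JSON parser expects.

```python
import json


def pci_passthrough_lines(alias_name: str, vendor_id: str, product_id: str) -> list:
    """Build the pci_alias and pci_passthrough_whitelist lines for nova.conf."""
    alias = {"name": alias_name, "vendor_id": vendor_id, "product_id": product_id}
    whitelist = [{"vendor_id": vendor_id, "product_id": product_id}]
    return [
        "pci_alias = " + json.dumps(alias),
        "pci_passthrough_whitelist = " + json.dumps(whitelist),
    ]


def has_pci_filter(scheduler_default_filters: str) -> bool:
    """True if PciPassthroughFilter is one of the comma-separated filter names."""
    names = [name.strip() for name in scheduler_default_filters.split(",")]
    return "PciPassthroughFilter" in names


# Alias name and IDs taken from the configuration above.
for line in pci_passthrough_lines("K80_Tesla", "10de", "102d"):
    print(line)
print(has_pci_filter("RetryFilter,ComputeFilter,PciPassthroughFilter"))
```

The generated lines would still need to be placed in /etc/nova/nova.conf on the head node by whatever means is already used there.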