Bright:Add GPUs to OpenStack

From Define Wiki
Revision as of 15:08, 24 March 2016 by David (talk | contribs)

These steps were performed on the CentOS 7 node with 2x K80 GPUs (provisioned as a nova hypervisor under Bright, openstack-image, default).

Ensure Intel IOMMU is enabled

The kernel needs intel_iommu=on passed as an argument at boot. Set it in the kernel parameters of the software image using cmsh:

[shadow-head]% softwareimage 
[shadow-head->softwareimage]% use openstack-image 
[shadow-head->softwareimage[openstack-image]]% list
Name (key)           Path                                     Kernel version        
-------------------- ---------------------------------------- ----------------------
default-image        /cm/images/default-image                 3.10.0-229.el7.x86_64 
openstack-image      /cm/images/openstack-image               3.10.0-229.el7.x86_64 
test-image           /cm/images/test-image                    3.10.0-229.el7.x86_64 
[shadow-head->softwareimage[openstack-image]]% show
Parameter                        Value                                           
-------------------------------- ------------------------------------------------
Boot FSPart                      98784247814                                      
Creation time                    Tue, 29 Sep 2015 10:29:56 GMT                    
Enable SOL                       yes                                              
FSPart                           98784247814                                      
Kernel modules                   <38 in submode>                                  
Kernel parameters                rdblacklist=nouveau                              
Kernel version                   3.10.0-229.el7.x86_64                            
Locked                           no                                               
Name                             openstack-image                                  
Notes                            <0 bytes>                                        
Path                             /cm/images/openstack-image                       
Revision                                                                          
SOL Flow Control                 no                                               
SOL Port                         ttyS1                                            
SOL Speed                        115200                                           
[shadow-head->softwareimage[openstack-image]]% set kernelparameters "rdblacklist=nouveau intel_iommu=on"
[shadow-head->softwareimage*[openstack-image*]]% show
Parameter                        Value                                           
-------------------------------- ------------------------------------------------
Boot FSPart                      98784247814                                      
Creation time                    Tue, 29 Sep 2015 10:29:56 GMT                    
Enable SOL                       yes                                              
FSPart                           98784247814                                      
Kernel modules                   <38 in submode>                                  
Kernel parameters                rdblacklist=nouveau intel_iommu=on               
Kernel version                   3.10.0-229.el7.x86_64                            
Locked                           no                                               
Name                             openstack-image                                  
Notes                            <0 bytes>                                        
Path                             /cm/images/openstack-image                       
Revision                                                                          
SOL Flow Control                 no                                               
SOL Port                         ttyS1                                            
SOL Speed                        115200                                           
[shadow-head->softwareimage*[openstack-image*]]% commit
=============================== openstack-image ================================
Field                    Message                                                       
------------------------ --------------------------------------------------------------
module                   Warning: Module xhci-hcd does not exist for specified kernel. 
[shadow-head->softwareimage[openstack-image]]%
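
The transcript above restates the existing rdblacklist=nouveau value when setting kernelparameters, which suggests that set replaces the whole string rather than appending to it. As a minimal sketch under that assumption, a small helper (append_param is a hypothetical name, not part of cmsh) can build the new value without dropping or duplicating existing parameters:

```shell
#!/bin/sh
# append_param CURRENT NEW
# Prints CURRENT with NEW appended, unless NEW is already present
# as a whole word. Hypothetical helper, not a cmsh feature.
append_param() {
    current=$1
    new=$2
    case " $current " in
        *" $new "*) printf '%s\n' "$current" ;;
        *) printf '%s\n' "${current:+$current }$new" ;;
    esac
}

# Example: build the string to paste into "set kernelparameters"
# append_param "rdblacklist=nouveau" "intel_iommu=on"
```

The printed string is what would then be quoted in the set kernelparameters command.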

Once the node has booted, check the dmesg output to confirm that the IOMMU is enabled:

[root@gpu ~]# dmesg | grep -iE "dmar|iommu" | grep -i enabled 
[    0.000000] Intel-IOMMU: enabled
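
If the dmesg line is missing, it can help to first confirm that the argument actually reached the running kernel. The following is a rough sketch that looks for a whole-word match in /proc/cmdline; the function name has_kernel_arg is an illustrative choice, and the optional second argument exists only so the check can be tried against a sample string.

```shell
#!/bin/sh
# has_kernel_arg ARG [CMDLINE]
# Returns 0 if ARG appears as a whole word in CMDLINE, 1 otherwise.
# CMDLINE defaults to the contents of /proc/cmdline on the running node.
has_kernel_arg() {
    arg=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $arg "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, run on the GPU node:
# has_kernel_arg intel_iommu=on && echo "intel_iommu=on is on the kernel command line"
```

A match here only shows that the argument was passed; the dmesg check above is still what confirms that the IOMMU was enabled.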

Get the PCI IDs of the GPUs

In this instance we are using K80s

# check for the nvidia GPUs
[root@gpu ~]# lspci | grep -i nvidia
04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
05:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
# get their IDs with the -nn flag (shows both names and numeric IDs)
[root@gpu ~]# lspci -nn | grep -i nvidia 
04:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
05:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
  • So the vendor id is: 10de
  • And the product id is: 102d
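
Reading the IDs off by eye works for two cards; on a node with more devices it may be easier to extract them. The sketch below assumes the lspci -nn output format shown above, where the vendor:product pair is the only bracketed field of the form xxxx:xxxx. The function name pci_ids is hypothetical.

```shell
#!/bin/sh
# pci_ids: read `lspci -nn` output on stdin and print the unique
# vendor:product pairs of NVIDIA devices, one per line.
pci_ids() {
    grep -i nvidia \
        | sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' \
        | sort -u
}

# Example, run on the GPU node:
# lspci -nn | pci_ids
```

For the two K80 lines above this yields a single vendor:product pair, since both devices share the same IDs.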

Update the /etc/nova/nova.conf file

Apply these changes on the head node; they will sync to the other nodes (all nodes run the same nova.conf version).

# PCI alias and whitelist
[root@shadow-head nova]# grep ^pci_ /etc/nova/nova.conf 
pci_alias = { "name": "K80_Tesla", "vendor_id": "10de", "product_id": "102d" }
pci_passthrough_whitelist = [{ "vendor_id": "10de", "product_id": "102d" }]
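
Both values are parsed by nova as JSON, so the strings need double quotes. As a small sketch, the two lines can be generated from the alias name and the IDs found earlier, which avoids quoting mistakes when adapting this to a different card. The function name nova_pci_lines is an illustrative assumption; the output is meant to be reviewed and pasted into nova.conf by hand.

```shell
#!/bin/sh
# nova_pci_lines NAME VENDOR_ID PRODUCT_ID
# Prints pci_alias and pci_passthrough_whitelist lines with
# JSON-style double quoting, ready to review before adding to nova.conf.
nova_pci_lines() {
    name=$1
    vendor=$2
    product=$3
    printf 'pci_alias = { "name": "%s", "vendor_id": "%s", "product_id": "%s" }\n' \
        "$name" "$vendor" "$product"
    printf 'pci_passthrough_whitelist = [{ "vendor_id": "%s", "product_id": "%s" }]\n' \
        "$vendor" "$product"
}

# Example for the K80s on this node:
# nova_pci_lines K80_Tesla 10de 102d
```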

# Scheduler params
[root@shadow-head nova]# grep ^scheduler /etc/nova/nova.conf 
scheduler_max_attempts=5
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,PciPassthroughFilter
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter
scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
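
PciPassthroughFilter has to appear in scheduler_default_filters itself; listing it only under scheduler_available_filters is not enough for the scheduler to apply it. The following sketch checks a nova.conf for that, assuming the option=value layout shown above. The function name has_pci_filter is hypothetical.

```shell
#!/bin/sh
# has_pci_filter [CONF]
# Returns 0 if PciPassthroughFilter is one of the entries in
# scheduler_default_filters in CONF (default: /etc/nova/nova.conf).
has_pci_filter() {
    conf=${1:-/etc/nova/nova.conf}
    # Split the option line on '=', ',' and spaces, then look for an exact entry.
    grep -E '^scheduler_default_filters *=' "$conf" \
        | tr ',= ' '\n\n\n' \
        | grep -qx 'PciPassthroughFilter'
}

# Example, run on the head node:
# has_pci_filter && echo "PciPassthroughFilter is in the default filter list"
```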