Learn how to configure GPU passthrough in OpenStack for high-performance computing.


GPU passthrough in OpenStack gives a virtual machine (VM) direct access to a graphics processing unit (GPU) on its physical host, so the VM can use the GPU’s full capabilities for workloads such as high-performance graphics or machine learning. OpenStack is an open-source cloud computing platform for deploying and managing VMs, which makes it possible to offer GPU passthrough in a multi-tenant environment. However, configuring GPU passthrough in OpenStack can be complex and requires specific hardware and software setup.

To pass a GPU through to virtual machines, you first need to enable the IOMMU in the BIOS/UEFI settings: Intel VT-d on Intel platforms, or AMD-Vi (often labeled IOMMU) on AMD platforms.
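On most systems you can check whether the firmware actually exposes an IOMMU by looking for its ACPI table: DMAR for Intel VT-d, IVRS for AMD-Vi. The helper below is a hypothetical sketch, not part of any tool; its optional argument (an alternative tables directory) exists only to make it easy to test.

```bash
# Hypothetical helper (sketch): report which IOMMU ACPI table the
# firmware exposes. DMAR = Intel VT-d, IVRS = AMD-Vi.
# $1 optionally overrides the tables directory (useful for testing).
firmware_iommu_table() {
    local tables_dir="${1:-/sys/firmware/acpi/tables}"
    if [ -e "$tables_dir/DMAR" ]; then
        echo "DMAR"   # Intel VT-d table present
    elif [ -e "$tables_dir/IVRS" ]; then
        echo "IVRS"   # AMD-Vi table present
    else
        echo "none"   # IOMMU likely disabled in firmware or unsupported
        return 1
    fi
}
```

Run `firmware_iommu_table` on the host; a result of `none` usually means the option is still disabled in the BIOS.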

The next step in preparing the GPU for passthrough is to make sure it is claimed by the right driver. Run the lspci command to get the PCI bus ID, vendor ID, and product ID of each function of the card:

sudo lspci -nn | grep NVIDIA
17:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
17:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
17:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
17:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)

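In this example, the GPU's VGA function has PCI bus ID 17:00.0, vendor ID 10de, and product ID 1e07. The card also exposes audio, USB, and USB Type-C controller functions at 17:00.1 through 17:00.3, each with its own product ID.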
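If you prefer not to copy the IDs by hand, a small pipeline can collect every vendor:product pair on the card's slot. This is a sketch: gpu_ids is a hypothetical helper, and the slot 17:00 is this example's address, so substitute your own.

```bash
# Hypothetical helper (sketch): read `lspci -nn` output on stdin and
# print the vendor:product IDs of every function on slot $1, joined
# with commas (the format vfio-pci's ids= option expects).
gpu_ids() {
    local slot="$1"
    # Keep lines for "<slot>.<func>", extract the [vvvv:pppp] pair
    # (class codes like [0300] have no colon and are skipped),
    # strip the brackets, and join the results with commas.
    grep "^${slot}\." \
        | grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' \
        | tr -d '[]' \
        | paste -sd, -
}
```

For the lspci output above, `sudo lspci -nn | gpu_ids 17:00` should print `10de:1e07,10de:10f7,10de:1ad6,10de:1ad7`.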

At this point, each function of the card is still bound to its default host driver:

sudo lspci -s 17:00.0 -k
17:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. TU102 [GeForce RTX 2080 Ti Rev. A]
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb, nouveau

sudo lspci -s 17:00.1 -k
17:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. TU102 High Definition Audio Controller
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel

sudo lspci -s 17:00.2 -k
17:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. TU102 USB 3.1 Host Controller
        Kernel driver in use: xhci_hcd

sudo lspci -s 17:00.3 -k
17:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. TU102 USB Type-C UCSI Controller
        Kernel driver in use: nvidia-gpu
        Kernel modules: i2c_nvidia_gpu
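To see all four functions at a glance, the lspci -k output can be condensed to one line per function. drivers_in_use is a hypothetical helper sketched with awk; it reports "none" for a function with no bound driver.

```bash
# Hypothetical helper (sketch): read `lspci -k` output on stdin and
# print "<bus-id> <driver>" for each device, or "none" if no driver
# is bound.
drivers_in_use() {
    awk '
        # Device lines start with the bus ID; indented lines are details.
        /^[0-9a-f]/ { if (dev != "") print dev, drv; dev = $1; drv = "none" }
        /Kernel driver in use:/ { drv = $NF }
        END { if (dev != "") print dev, drv }
    '
}
```

Usage: `sudo lspci -k -s 17:00 | drivers_in_use` (here `-s 17:00` selects every function of bus 17, device 00). Running it again after the reboot should show vfio-pci on every line.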

Collect the vendor and product IDs of every function of the GPU and add them to the following configuration files. Start with the initramfs module list:

$ sudo nano /etc/initramfs-tools/modules

and append the following lines (the file takes one module per line, optionally followed by that module's options):

vfio
vfio_iommu_type1
vfio_virqfd
vfio_pci ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7

Next, create a modprobe options file that binds those IDs to vfio-pci:

sudo nano /etc/modprobe.d/vfio.conf

with the following contents:

options vfio-pci ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7

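Next, tell KVM to ignore guest accesses to unhandled model-specific registers (MSRs) instead of injecting a fault into the guest: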

sudo nano /etc/modprobe.d/kvm.conf

with the following contents:

options kvm ignore_msrs=1

Next, blacklist the host display drivers (nouveau and nvidiafb) so they do not claim the card first:

sudo nano /etc/modprobe.d/blacklist-nvidia.conf

with the following contents:

blacklist nouveau
blacklist nvidiafb
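The three modprobe.d files can also be written non-interactively. This is a minimal sketch with a hypothetical helper name; it takes the target directory as an argument so you can try it on a scratch directory first.

```bash
# Hypothetical helper (sketch): write the three modprobe.d files shown
# above. $1 = target directory (/etc/modprobe.d on a real host, as
# root); $2 = comma-separated vendor:product ID list.
# Existing files with the same names are overwritten.
write_vfio_modprobe_conf() {
    local dir="$1" ids="$2"
    mkdir -p "$dir"
    printf 'options vfio-pci ids=%s\n' "$ids" > "$dir/vfio.conf"
    printf 'options kvm ignore_msrs=1\n' > "$dir/kvm.conf"
    printf 'blacklist nouveau\nblacklist nvidiafb\n' > "$dir/blacklist-nvidia.conf"
}
```

From a root shell: `write_vfio_modprobe_conf /etc/modprobe.d 10de:1e07,10de:10f7,10de:1ad6,10de:1ad7`.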

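Finally, add the IOMMU and VFIO options to the kernel command line:

sudo nano /etc/default/grub

and set GRUB_CMDLINE_LINUX_DEFAULT as follows (intel_iommu=on applies to Intel hosts):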

GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on vfio-pci.ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7 vfio_iommu_type1.allow_unsafe_interrupts=1 modprobe.blacklist=nvidiafb,nouveau"
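Editing /etc/default/grub can also be scripted with sed. The sketch below (set_grub_cmdline is a hypothetical helper) replaces the whole GRUB_CMDLINE_LINUX_DEFAULT line, keeps a .bak copy, and leaves the file unchanged if that line is missing.

```bash
# Hypothetical helper (sketch): replace the GRUB_CMDLINE_LINUX_DEFAULT
# line in file $1 with the command line $2. A backup is written to $1.bak.
set_grub_cmdline() {
    local file="$1" cmdline="$2"
    local escaped
    # Escape characters that are special in the sed replacement: \ & |
    escaped=$(printf '%s' "$cmdline" | sed 's/[\\&|]/\\&/g')
    sed -i.bak "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"${escaped}\"|" "$file"
}
```

Run it as root against /etc/default/grub, passing the full command line shown above in quotes. Because it replaces rather than appends, include any existing options you want to keep.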

Regenerate the initramfs (so it picks up the new module list) and the GRUB configuration, then reboot the server:

$ sudo update-initramfs -u
$ sudo update-grub
$ sudo reboot
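After the reboot, it is worth confirming that the running kernel actually received the new parameters before checking drivers. A sketch with a hypothetical helper; the CMDLINE_FILE override is an assumption added only for testing.

```bash
# Hypothetical helper (sketch): check that each argument appears as a
# whole word on the kernel command line. Reads /proc/cmdline unless
# CMDLINE_FILE is set. Prints missing parameters; returns 1 if any.
check_cmdline() {
    local file="${CMDLINE_FILE:-/proc/cmdline}" missing=0 p
    for p in "$@"; do
        # Split the command line into one word per line, match exactly.
        if ! tr ' ' '\n' < "$file" | grep -qxF -- "$p"; then
            echo "missing: $p"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_cmdline intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1` prints nothing and returns 0 when both parameters are present.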

Once the server is back up, verify that each function is now bound to the vfio-pci driver:

sudo lspci -s 17:00.0 -k
17:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. TU102 [GeForce RTX 2080 Ti Rev. A]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau

sudo lspci -s 17:00.1 -k
17:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. TU102 High Definition Audio Controller
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel

sudo lspci -s 17:00.2 -k
17:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. TU102 USB 3.1 Host Controller
        Kernel driver in use: vfio-pci

sudo lspci -s 17:00.3 -k
17:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. TU102 USB Type-C UCSI Controller
        Kernel driver in use: vfio-pci
        Kernel modules: i2c_nvidia_gpu
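All four functions are handed to vfio-pci because VFIO assigns devices per IOMMU group: a device can generally be passed through only when every device in its group is bound to vfio-pci (or has no host driver). This hypothetical helper lists the members of a device's group from sysfs; note that sysfs addresses include the PCI domain (0000:).

```bash
# Hypothetical helper (sketch): list the devices that share an IOMMU
# group with PCI address $1 (e.g. 0000:17:00.0). $2 optionally
# overrides the sysfs devices directory (useful for testing).
iommu_group_members() {
    local addr="$1" root="${2:-/sys/bus/pci/devices}"
    local group_dir="$root/$addr/iommu_group/devices"
    # No iommu_group link usually means the IOMMU is not enabled.
    [ -d "$group_dir" ] || { echo "no IOMMU group for $addr" >&2; return 1; }
    ls "$group_dir"
}
```

`iommu_group_members 0000:17:00.0` typically lists only the card's own functions; any unrelated device that shows up would also have to be handled before passthrough can work.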

Nova config

First, configure the PCI passthrough whitelist on the compute node.

sudo nano /etc/nova/nova.conf

Add the following at the end of the file (if a [pci] section already exists, add the option to it instead):

[pci]
passthrough_whitelist: { "vendor_id": "10de", "product_id": "1e07" }

Restart the Nova compute service (the unit is openstack-nova-compute on RDO-based hosts and nova-compute on Ubuntu):

$ sudo systemctl restart openstack-nova-compute
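To avoid typos when copying IDs into nova.conf, the whitelist entry can be generated from the lspci values. nova_pci_whitelist is a hypothetical helper; it only handles vendor_id/product_id matching and, as a deliberate simplification, accepts only the 4-digit lowercase hex format lspci prints.

```bash
# Hypothetical helper (sketch): print a [pci] section with a
# passthrough_whitelist entry for vendor $1 and product $2.
nova_pci_whitelist() {
    local vendor="$1" product="$2"
    # Reject anything that is not lowercase hex, as lspci prints it.
    case "$vendor$product" in
        *[!0123456789abcdef]*) echo "IDs must be lowercase hex" >&2; return 1 ;;
    esac
    [ ${#vendor} -eq 4 ] && [ ${#product} -eq 4 ] \
        || { echo "IDs must be 4 hex digits" >&2; return 1; }
    printf '[pci]\npassthrough_whitelist: { "vendor_id": "%s", "product_id": "%s" }\n' \
        "$vendor" "$product"
}
```

For example, `nova_pci_whitelist 10de 1e07 | sudo tee -a /etc/nova/nova.conf`, but only if the file has no [pci] section yet.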

Next, configure nova.conf on the API service node.

sudo nano /etc/nova/nova.conf

Add a PCI alias that gives the device a name flavors can refer to:

[pci]
alias: { "vendor_id":"10de", "product_id":"1e07", "device_type":"type-PCI", "name":"geforce-rtx" }

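Lastly, make sure the Nova scheduler has the PciPassthroughFilter enabled. The snippet below shows only the relevant filter; in practice, append PciPassthroughFilter to your existing enabled_filters list rather than replacing it, because enabled_filters lists every filter the scheduler runs and a one-item list would disable all the others.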

sudo nano /etc/nova/nova.conf

The relevant settings look like this:

[filter_scheduler]
enabled_filters = PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters

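Restart the Nova API and scheduler services so they load the new alias and filter settings (unit names vary by distribution; these are the RDO names):

$ sudo systemctl restart openstack-nova-api openstack-nova-scheduler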
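A quick check that PciPassthroughFilter is actually listed in enabled_filters can save a failed boot later. The sketch below uses a minimal awk INI reader; ini_get and has_pci_filter are hypothetical helpers, and the reader only understands `key = value` lines, not continuation lines.

```bash
# Hypothetical helpers (sketch): ini_get prints the value of key $3 in
# section [$2] of INI-style file $1; has_pci_filter checks that
# PciPassthroughFilter appears in [filter_scheduler] enabled_filters.
ini_get() {
    local file="$1" section="$2" key="$3"
    awk -v section="[$section]" -v key="$key" '
        /^[ \t]*[#;]/ { next }                    # skip comment lines
        /^[ \t]*\[/ {                             # section header
            gsub(/[ \t]/, "")
            in_sec = ($0 == section)
            next
        }
        in_sec && index($0, "=") {
            k = substr($0, 1, index($0, "=") - 1)
            gsub(/[ \t]/, "", k)
            if (k == key) {
                sub(/^[^=]*=[ \t]*/, "")          # drop the "key =" prefix
                print
                exit
            }
        }
    ' "$file"
}

has_pci_filter() {
    ini_get "${1:-/etc/nova/nova.conf}" filter_scheduler enabled_filters \
        | tr ',' '\n' | tr -d ' \t' | grep -qx 'PciPassthroughFilter'
}
```

Usage: `has_pci_filter /etc/nova/nova.conf && echo "PciPassthroughFilter enabled"`.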

Create a flavor

Create a flavor with the following command:

openstack flavor create \
--vcpus 2 \
--ram 4096 \
--disk 25 \
--property "pci_passthrough:alias"="geforce-rtx:1" \
gpu_flavor

The pci_passthrough:alias property references the geforce-rtx alias configured earlier, and the number 1 tells Nova to assign a single GPU.

Configure image

The image used for the instance should hide the hypervisor ID, since NVIDIA drivers do not work in instances that expose a KVM hypervisor signature. First, find the ID of the image:

openstack image list

Then set the img_hide_hypervisor_id property on that image (replace the ID with one from your own image list):

openstack image set bb60850b-fd03-4afc-b0cd-e6eedb2fdf92 --property img_hide_hypervisor_id=true

Create an instance

openstack server create \
--flavor gpu_flavor \
--image bb60850b-fd03-4afc-b0cd-e6eedb2fdf92 \
--network LAN \
--key-name key \
gpu-server

Log in to the instance and verify the GPU is recognized:

root@test:~$ lspci | grep NVIDIA

Check that the hypervisor ID signature is hidden; cpuid should report a blank hypervisor_id:

root@test:~$ sudo apt update
root@test:~$ sudo apt install cpuid
root@test:~$ cpuid|grep hypervisor_id
   hypervisor_id = "            "
   hypervisor_id = "            "
   hypervisor_id = "            "
   hypervisor_id = "            "
   hypervisor_id = "            "
   hypervisor_id = "            "
   hypervisor_id = "            "
   hypervisor_id = "            "
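The cpuid check can be turned into a pass/fail test, which is handy in provisioning scripts. This is a sketch with a hypothetical helper that succeeds only if every hypervisor_id line is blank; without the image property you would typically see KVM's signature, KVMKVMKVM, instead.

```bash
# Hypothetical helper (sketch): read `cpuid` output on stdin; succeed
# only if at least one hypervisor_id line exists and all are blank.
hypervisor_id_hidden() {
    awk -F'"' '
        # $2 is the text between the double quotes.
        /hypervisor_id/ { seen = 1; if ($2 !~ /^[ ]*$/) bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}
```

Usage inside the instance: `cpuid | hypervisor_id_hidden && echo "hypervisor ID hidden"`.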

Now install the NVIDIA drivers:

root@test:~$ sudo apt install ubuntu-drivers-common

root@test:~$ ubuntu-drivers devices
WARNING:root:_pkg_get_support nvidia-driver-390: package has invalid Support Legacyheader, cannot determine support level
== /sys/devices/pci0000:00/0000:00:05.0 ==
modalias : pci:v000010DEd000017C2sv000010DEsd00001132bc03sc00i00
vendor   : NVIDIA Corporation
model    : TU102 [GeForce Rtx 2080 ti]
driver   : nvidia-driver-455 - third-party non-free
driver   : nvidia-driver-470 - third-party non-free recommended
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-450 - third-party non-free
driver   : nvidia-driver-460 - third-party non-free
driver   : nvidia-driver-465 - third-party non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-460-server - distro non-free
driver   : nvidia-driver-390 - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
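Rather than reading the list by eye, you can extract the package that ubuntu-drivers marks as recommended. A sketch; recommended_driver is a hypothetical helper that relies on the `driver : <package> - ... recommended` line format shown above.

```bash
# Hypothetical helper (sketch): read `ubuntu-drivers devices` output on
# stdin and print the first package whose line ends in "recommended".
recommended_driver() {
    # Fields: $1 = "driver", $2 = ":", $3 = package name.
    awk '$1 == "driver" && $NF == "recommended" { print $3; exit }'
}
```

For the output above, `sudo apt install "$(ubuntu-drivers devices 2>/dev/null | recommended_driver)"` selects nvidia-driver-470.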

root@test:~$ sudo apt install nvidia-driver-470

When the installation finishes, reboot the instance:

root@test:~$ sudo reboot

After the instance reboots, run nvidia-smi to verify that the driver is installed and working correctly:

root@test:~$ nvidia-smi

The driver works and we are ready to access the GPU.
