GPU Passthrough: Three Days of Pain, One Day of Glory

🎯 The Goal

I wanted to run local LLMs with GPU acceleration in a Proxmox VM. Simple goal, right? Just pass through my RTX 4090 to a Linux VM and let it rip.

Three days later, I was still stuck. The GPU wouldn't initialize, the VM wouldn't boot, and I was questioning my life choices.

Day 1: "This should be easy, I'll have it working in an hour."
Day 3: "Maybe I should just use bare metal..."
Day 4: "OH MY GOD IT WORKS!"

❌ What Didn't Work (Days 1-3)

Day 1: The Basics

I started with the standard Proxmox GPU passthrough guide. Enable IOMMU, blacklist drivers, configure VFIO. Standard stuff.

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2684,10de:22ba
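
A step that is easy to miss here: edits to /etc/default/grub only take effect after running update-grub and rebooting. The following is a minimal sketch for confirming that the flags actually reached the running kernel. It assumes an Intel CPU, as in the GRUB line above, and has_iommu_flags is a hypothetical helper name, not a Proxmox tool.

```shell
# Hypothetical helper: succeeds if the given kernel command line contains
# every IOMMU flag from the GRUB line above (Intel flags assumed).
has_iommu_flags() {
    cmdline=" $1 "
    for flag in intel_iommu=on iommu=pt; do
        case "$cmdline" in
            *" $flag "*) ;;                        # flag present, keep going
            *) echo "missing: $flag"; return 1 ;;  # report the first gap
        esac
    done
    echo "ok"
}

# Usage on the host, after update-grub and a reboot:
#   has_iommu_flags "$(cat /proc/cmdline)"
```

If this reports a missing flag, the GRUB change never made it into the boot entry, and nothing further down the chain will work.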

Result: VM booted, but GPU wasn't detected. Error: "No compatible devices found."

Day 2: Driver Issues

I thought it was a driver problem. Tried different approaches:

Installing NVIDIA drivers in the VM
Different driver versions
Manually loading VFIO modules

Result: Still nothing. The GPU was passed through (visible in lspci), but drivers couldn't initialize it.
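
For the "manually loading VFIO modules" step, a quick check like the one below can show which modules are still missing. This is a sketch, not a record of what was run at the time: the module list (vfio, vfio_iommu_type1, vfio_pci) is an assumption based on common passthrough guides, and missing_vfio_modules is a hypothetical helper name.

```shell
# Reads lsmod-style output on stdin and prints each VFIO module that is
# NOT loaded. The module list is an assumption, adjust it for your kernel.
missing_vfio_modules() {
    loaded=$(awk '{print $1}')   # first column of lsmod is the module name
    for mod in vfio vfio_iommu_type1 vfio_pci; do
        printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
    done
}

# Usage on the host:
#   lsmod | missing_vfio_modules
```

Empty output means all three modules are loaded.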

Day 3: Configuration Hell

I tried every configuration option I could find:

Different IOMMU settings
PCIe slot configurations
ROM file loading
Different VM settings

Result: VM wouldn't boot at all. Progress, I guess?

✅ What Finally Worked (Day 4)

The Missing Piece

After three days of frustration, I found a forum post mentioning one specific setting I'd missed:

# The magic setting: All Functions
# In Proxmox VM config, I needed to check "All Functions"
# This tells Proxmox to pass through ALL PCIe functions, not just the main GPU
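
On the command line, "All Functions" corresponds (as far as the Proxmox docs describe it) to giving the PCI address without the trailing function number. A minimal sketch under that assumption: both helper names are hypothetical, VM ID 100 matches the config path used later in this post, and the function only prints the qm command rather than running it.

```shell
# Hypothetical helper: drop the function suffix from a PCI address,
# e.g. 0000:01:00.0 becomes 0000:01:00.
pci_device_without_function() {
    printf '%s\n' "${1%.*}"
}

# Prints (does not execute) a qm command that passes the whole device.
print_all_functions_cmd() {
    vmid=$1
    addr=$(pci_device_without_function "$2")
    echo "qm set $vmid -hostpci0 $addr,pcie=1"
}

# Usage:
#   print_all_functions_cmd 100 0000:01:00.0
```

Printing the command first makes it easy to sanity-check the address before touching the VM config.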

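But wait, there's more. The real issue was the IOMMU groups. An IOMMU group is the smallest set of devices the host can isolate from each other, so a group goes to a VM as a whole or not at all. My GPU was in an IOMMU group with other devices, and I needed to pass through the entire group.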

The Complete Working Configuration

Here's what finally worked:

# 1. Check IOMMU groups
find /sys/kernel/iommu_groups/ -type l | sort -V

# 2. Find your GPU's group
lspci -nn | grep -i nvidia
# Output: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102 [GeForce RTX 4090] [10de:2684]

# 3. Find which IOMMU group the GPU is in
basename "$(readlink -f /sys/bus/pci/devices/0000:01:00.0/iommu_group)"
# This showed me group 15

# 4. Check ALL devices in group 15
ls -la /sys/kernel/iommu_groups/15/devices/
# Found: GPU + Audio controller + USB controller

# 5. Pass through ALL devices in the group
# In Proxmox: Hardware → Add → PCI Device
# Select GPU, then ALSO add the audio controller
# Check "All Functions" for both
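
Steps 1 to 4 above can be rolled into one small function that prints every device together with its group number. This is a minimal sketch: list_iommu_groups is a hypothetical name, and the optional root argument is there only so the function can be pointed at a test directory instead of the real sysfs tree.

```shell
# Prints "group <n>: <pci-address>" for every device under an IOMMU tree,
# sorted by group number. Defaults to the real sysfs location.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue        # skip if the glob matched nothing
        group=${dev%/devices/*}          # strip "/devices/<address>"
        echo "group ${group##*/}: ${dev##*/}"
    done | sort -k2,2n
}

# Usage on the host:
#   list_iommu_groups | grep 'group 15:'
```

Anything that shows up next to the GPU in that output has to be accounted for before the VM will start cleanly.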

The Proxmox VM Config

# /etc/pve/qemu-server/100.conf
hostpci0: 01:00.0,pcie=1,rombar=0
# Audio controller (function 1 of the same card)
hostpci1: 01:00.1,pcie=1,rombar=0
machine: q35

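The key: passing through the audio controller too. The GPU (01:00.0) and its audio controller (01:00.1) are two functions of the same physical PCIe device and sit in the same IOMMU group, so both have to be handed to the VM together.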
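
Before starting the VM, it helps to confirm that both functions are actually claimed by vfio-pci on the host. A minimal sketch, assuming the usual lspci -nnk output format with a "Kernel driver in use:" line under each device; drivers_in_use is a hypothetical helper name.

```shell
# Reads `lspci -nnk` output on stdin and prints "<address> <driver>" for
# each device that reports a kernel driver in use.
drivers_in_use() {
    awk '
        /^[0-9a-f]/             { addr = $1 }        # device header line
        /Kernel driver in use:/ { print addr, $NF }  # indented detail line
    '
}

# Usage on the host:
#   lspci -nnk -s 01:00 | drivers_in_use
```

Both 01:00.0 and 01:00.1 should show vfio-pci here. If either one still lists a host driver, that function is not yet bound to VFIO.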

📊 The Results

After four days, here's what I got:

Metric               Bare Metal      VM Passthrough
GPU Utilization      98-100%         95-98%
LLM Inference Speed  45 tokens/sec   43 tokens/sec
Overhead             0%              ~2-3%

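Verdict: The overhead is small (2-3 points of GPU utilization, and 43 vs. 45 tokens/sec works out to roughly 4% on inference) and totally worth it for the flexibility of VMs.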

💡 Lessons Learned

IOMMU groups matter: You can't pass through just the GPU if it's grouped with other devices
Check "All Functions": This is critical for multi-function PCIe devices
Audio controller matters: Even if you don't need audio, pass it through
ROM files help: Some GPUs need ROM files for proper initialization
Patience pays off: Sometimes the solution is one obscure setting away

🎯 Final Configuration Checklist

If you're trying GPU passthrough, here's what you need:

✅ IOMMU enabled in BIOS and kernel
✅ VFIO modules loaded
✅ GPU and related devices bound to vfio-pci
✅ All devices in IOMMU group passed through
✅ "All Functions" checked in Proxmox
✅ Correct VM machine type (q35)
✅ ROM file if needed

💡 Key Takeaways

GPU passthrough is complex but doable
IOMMU groups are the key to understanding passthrough
Don't give up - the solution is usually one setting away
The performance hit is minimal (2-3%)
It's worth it for the flexibility

Three days of pain, one day of glory. That's GPU passthrough in a nutshell. But now that it works, I wouldn't go back to bare metal. The flexibility of VMs is too valuable.
