GPU Passthrough: Three Days of Pain, One Day of Glory
🎯 The Goal
I wanted to run local LLMs with GPU acceleration in a Proxmox VM. Simple goal, right? Just pass through my RTX 4090 to a Linux VM and let it rip.
Three days later, I was still stuck. The GPU wouldn't initialize, the VM wouldn't boot, and I was questioning my life choices.
Day 1: "This should be easy, I'll have it working in an hour."
Day 3: "Maybe I should just use bare metal..."
Day 4: "OH MY GOD IT WORKS!"
❌ What Didn't Work (Days 1-3)
Day 1: The Basics
I started with the standard Proxmox GPU passthrough guide. Enable IOMMU, blacklist drivers, configure VFIO. Standard stuff.
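# /etc/default/grub (apply with: update-grub, then reboot)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# /etc/modprobe.d/vfio.conf (apply with: update-initramfs -u -k all, then reboot)
# 10de:2684 is the GPU, 10de:22ba is its audio function
options vfio-pci ids=10de:2684,10de:22ba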
Result: VM booted, but GPU wasn't detected. Error: "No compatible devices found."
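A quick sanity check that would have narrowed things down on Day 1: confirm the running kernel actually picked up the IOMMU flag. This is a minimal sketch, not part of any Proxmox tooling. The function name `has_iommu_flag` is made up for illustration, and it only looks for the `intel_iommu=on` flag used in the config above.

```shell
# has_iommu_flag [CMDLINE]: succeed if the kernel command line contains
# intel_iommu=on. Without an argument it reads the running kernel's
# command line from /proc/cmdline. (Hypothetical helper name.)
has_iommu_flag() {
    # Pad with spaces so the flag only matches as a whole word.
    case " ${1:-$(cat /proc/cmdline)} " in
        *" intel_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On the host, `has_iommu_flag && echo "IOMMU flag present"` is enough. If it prints nothing, the GRUB change has not reached the running kernel yet.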
Day 2: Driver Issues
I thought it was a driver problem. Tried different approaches:
- Installing NVIDIA drivers in the VM
- Different driver versions
- Manually loading VFIO modules
Result: Still nothing. The GPU was passed through (visible in lspci), but drivers couldn't initialize it.
Day 3: Configuration Hell
I tried every configuration option I could find:
- Different IOMMU settings
- PCIe slot configurations
- ROM file loading
- Different VM settings
Result: VM wouldn't boot at all. Progress, I guess?
✅ What Finally Worked (Day 4)
The Missing Piece
After three days of frustration, I found a forum post mentioning one specific setting I'd missed:
# The magic setting: All Functions
# In Proxmox VM config, I needed to check "All Functions"
# This tells Proxmox to pass through ALL PCIe functions of the device (01:00.0, 01:00.1, ...), not just the main GPU function
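But wait, there's more. The real issue was the IOMMU groups. An IOMMU group is the smallest set of devices the hardware can isolate from the rest of the system, so it has to leave the host as a unit. My GPU shared its group with other devices, and I needed to pass through the entire group.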
The Complete Working Configuration
Here's what finally worked:
# 1. Check IOMMU groups
find /sys/kernel/iommu_groups/ -type l | sort -V
# 2. Find your GPU's group
lspci -nn | grep -i nvidia
# Output: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102 [GeForce RTX 4090] [10de:2684]
# 3. Check what's in the same IOMMU group
# sysfs names devices with the PCI domain prefix, e.g. 0000:01:00.0
basename "$(readlink /sys/bus/pci/devices/0000:01:00.0/iommu_group)"
# This showed me group 15
# 4. Check ALL devices in group 15
ls -la /sys/kernel/iommu_groups/15/devices/
# Found: GPU + Audio controller + USB controller
# 5. Pass through ALL devices in the group
# In Proxmox: Hardware → Add → PCI Device
# Select GPU, then ALSO add the audio controller (and anything else the group lists)
# Check "All Functions" for both
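Steps 1 to 4 can be rolled into two small helpers. This is a sketch under assumptions: the function names `list_iommu_groups` and `group_members` are invented here, and the optional `SYSFS_ROOT` argument exists only so the code can be pointed at a test directory instead of the real `/sys`.

```shell
# list_iommu_groups [SYSFS_ROOT]: print "group<TAB>pci-address" for every
# device, sorted numerically by group. SYSFS_ROOT defaults to /sys.
list_iommu_groups() (
    root="${1:-/sys}"
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        [ -e "$dev" ] || continue      # glob matched nothing
        group="${dev%/devices/*}"      # strip "/devices/<address>"
        printf '%s\t%s\n' "${group##*/}" "${dev##*/}"
    done | sort -n
)

# group_members GROUP [SYSFS_ROOT]: print only the addresses in one group.
group_members() (
    list_iommu_groups "${2:-/sys}" |
        awk -F '\t' -v g="$1" '$1 == g { print $2 }'
)
```

On my host, `group_members 15` would list everything that shares a group with the GPU. Feeding each address to `lspci -nns` turns the bare addresses into device names.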
The Proxmox VM Config
# /etc/pve/qemu-server/100.conf
# hostpci0 is the GPU, hostpci1 is its audio controller
hostpci0: 01:00.0,pcie=1,rombar=0
hostpci1: 01:00.1,pcie=1,rombar=0
machine: q35
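The key: Passing through the audio controller too. The GPU (01:00.0) and its audio controller (01:00.1) are two functions of the same PCIe device and sit in the same IOMMU group, so VFIO needs both. The "All Functions" checkbox gets you there with a single entry that omits the function number (`hostpci0: 01:00,...`) instead of one line per function.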
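To confirm that both functions really are in VFIO's hands, you can read the driver symlink that sysfs exposes for each PCI device. A minimal sketch, assuming the usual `/sys/bus/pci/devices/<address>/driver` layout. The helper name `driver_of` and the optional `SYSFS_ROOT` argument are made up for illustration.

```shell
# driver_of ADDRESS [SYSFS_ROOT]: print the kernel driver bound to a PCI
# device, or "none" if no driver is bound. SYSFS_ROOT defaults to /sys.
driver_of() (
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        target=$(readlink "$link")     # e.g. ../../../../bus/pci/drivers/vfio-pci
        printf '%s\n' "${target##*/}"  # keep only the driver name
    else
        echo none
    fi
)
```

With the VM running, `driver_of 0000:01:00.0` and `driver_of 0000:01:00.1` should both report `vfio-pci`. If either shows a host driver such as `nvidia`, `nouveau`, or `snd_hda_intel`, the host still owns that function.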
📊 The Results
After four days, here's what I got:
| Metric | Bare Metal | VM Passthrough |
|---|---|---|
| GPU Utilization | 98-100% | 95-98% |
| LLM Inference Speed | 45 tokens/sec | 43 tokens/sec |
| Overhead | 0% | ~2-4% |
Verdict: A few percent of overhead (about 2-3% by GPU utilization, about 4% by tokens/sec) is totally worth it for the flexibility of VMs.
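The overhead can also be worked out from the throughput row alone: it is the drop in tokens/sec divided by the bare-metal figure. A small sketch of that arithmetic, with `overhead_pct` as an illustrative helper name:

```shell
# overhead_pct BASELINE MEASURED: percentage slowdown relative to BASELINE,
# rounded to one decimal place.
overhead_pct() {
    awk -v base="$1" -v got="$2" \
        'BEGIN { printf "%.1f\n", (base - got) / base * 100 }'
}

overhead_pct 45 43   # prints 4.4
```

By throughput alone that is (45 - 43) / 45, or roughly 4.4%.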
💡 Lessons Learned
- IOMMU groups matter: You can't pass through just the GPU if it's grouped with other devices
- Check "All Functions": This is critical for multi-function PCIe devices
- Audio controller matters: Even if you don't need audio, pass it through
- ROM files help: Some GPUs need ROM files for proper initialization
- Patience pays off: Sometimes the solution is one obscure setting away
🎯 Final Configuration Checklist
If you're trying GPU passthrough, here's what you need:
- ✅ IOMMU enabled in BIOS and kernel
- ✅ VFIO modules loaded
- ✅ GPU and related devices bound to vfio-pci
- ✅ All devices in IOMMU group passed through
- ✅ "All Functions" checked in Proxmox
- ✅ Correct VM machine type (q35)
- ✅ ROM file if needed
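The "VFIO modules loaded" item is easy to script. The sketch below checks `/proc/modules` for a module name. The helper name `module_loaded` is invented here, and the optional file argument is only there so the check can run against sample data. It assumes VFIO is built as loadable modules; a driver compiled into the kernel would not show up in this file.

```shell
# module_loaded NAME [MODULES_FILE]: succeed if NAME appears as a loaded
# module. MODULES_FILE defaults to /proc/modules.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}
```

On the host, something like `for m in vfio vfio_pci vfio_iommu_type1; do module_loaded "$m" || echo "missing: $m"; done` would flag anything that has not been loaded.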
💡 Key Takeaways
- GPU passthrough is complex but doable
- IOMMU groups are the key to understanding passthrough
- Don't give up - the solution is usually one setting away
- The performance hit is minimal (a few percent)
- It's worth it for the flexibility
Three days of pain, one day of glory. That's GPU passthrough in a nutshell. But now that it works, I wouldn't go back to bare metal. The flexibility of VMs is too valuable.