I upgraded to “Ubuntu 24.04.2 LTS” (kernel 6.8.0-60-generic).
Migration:
The NVIDIA cards are still visible:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.51.03 Driver Version: 575.51.03 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Quadro M5000 Off | 00000000:00:10.0 Off | Off |
| 39% 44C P8 14W / 150W | 5MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Quadro M4000 Off | 00000000:00:11.0 Off | N/A |
| 49% 48C P8 14W / 120W | 5MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
I re-ran a benchmark:
My configuration:
I ran a new test with llm_benchmark to compare against my last working configuration:
I replaced the NVIDIA card with two NVIDIA cards of 8 GB each; both are visible to the VM launched by Proxmox:
# nvidia-smi --list-gpus
GPU 0: Quadro M5000 (UUID: GPU-)
GPU 1: Quadro M4000 (UUID: GPU-)
# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.15 Driver Version: 570.86.15 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Quadro M5000 Off | 00000000:00:10.0 Off | Off |
| 38% 37C P8 13W / 150W | 5MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Quadro M4000 Off | 00000000:00:11.0 Off | N/A |
| 46% 39C P8 13W / 120W | 5MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
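To check from a script that the VM really sees both cards, the `nvidia-smi --list-gpus` output can be parsed. This is only a minimal sketch: the function names `parse_gpu_list` and `visible_gpus` are hypothetical, and the sample text is copied from the listing above, where the UUIDs are truncated.

```python
import re
import shutil
import subprocess

# Matches lines such as "GPU 0: Quadro M5000 (UUID: GPU-...)".
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: ([^)]*)\)\s*$")


def parse_gpu_list(text):
    """Turn `nvidia-smi --list-gpus` output into a list of dicts."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            index, name, uuid = match.groups()
            gpus.append({"index": int(index), "name": name, "uuid": uuid})
    return gpus


def visible_gpus():
    """Run nvidia-smi if it is installed; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--list-gpus"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_list(result.stdout)


# Sample copied from the post (UUIDs are truncated there).
sample = """GPU 0: Quadro M5000 (UUID: GPU-)
GPU 1: Quadro M4000 (UUID: GPU-)"""

for gpu in parse_gpu_list(sample):
    print(gpu["index"], gpu["name"])
```

Inside the VM, `len(visible_gpus()) == 2` would be a quick sanity check after each reboot or driver upgrade.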
The test results are as follows:
I found an LLM benchmarking tool: llm_benchmark (installed via pip).
I am in last place on https://llm.aidatatools.com/results-linux.php, with “llama3.1:8b”: “1.12”.


llm_benchmark run
-------Linux----------
{'id': '0', 'name': 'Quadro 4000', 'driver': '390.157', 'gpu_memory_total': '1985.0 MB',
'gpu_memory_free': '1984.0 MB', 'gpu_memory_used': '1.0 MB', 'gpu_load': '0.0%',
'gpu_temperature': '60.0°C'}
Only one GPU card
Total memory size : 61.36 GB
cpu_info: Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz
gpu_info: Quadro 4000
os_version: Ubuntu 22.04.5 LTS
ollama_version: 0.5.7
----------
LLM models file path:/usr/local/lib/python3.10/dist-packages/llm_benchmark/data/benchmark_models_16gb_ram.yml
Checking and pulling the following LLM models
phi4:14b
qwen2:7b
gemma2:9b
mistral:7b
llama3.1:8b
llava:7b
llava:13b
----------
....
----------------------------------------
Sending the following data to a remote server
-------Linux----------
{'id': '0', 'name': 'Quadro 4000', 'driver': '390.157', 'gpu_memory_total': '1985.0 MB',
'gpu_memory_free': '1984.0 MB', 'gpu_memory_used': '1.0 MB', 'gpu_load': '0.0%',
'gpu_temperature': '61.0°C'}
Only one GPU card
-------Linux----------
{'id': '0', 'name': 'Quadro 4000', 'driver': '390.157', 'gpu_memory_total': '1985.0 MB',
'gpu_memory_free': '1984.0 MB', 'gpu_memory_used': '1.0 MB', 'gpu_load': '0.0%',
'gpu_temperature': '61.0°C'}
Only one GPU card
{
"mistral:7b": "1.40",
"llama3.1:8b": "1.12",
"phi4:14b": "0.76",
"qwen2:7b": "1.31",
"gemma2:9b": "1.03",
"llava:7b": "1.84",
"llava:13b": "0.73",
"uuid": "",
"ollama_version": "0.5.7"
}
----------
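To compare runs, the result block can be ranked with a few lines of Python. This sketch assumes the values are tokens per second, as on the results page, and that every key containing a colon is a model tag; `model_rates` and `summarize` are hypothetical helper names.

```python
import json

# Result block copied from the run above.
RAW_RESULT = """{
"mistral:7b": "1.40",
"llama3.1:8b": "1.12",
"phi4:14b": "0.76",
"qwen2:7b": "1.31",
"gemma2:9b": "1.03",
"llava:7b": "1.84",
"llava:13b": "0.73",
"uuid": "",
"ollama_version": "0.5.7"
}"""


def model_rates(raw):
    """Keep only entries whose key looks like a model tag (name:size)."""
    data = json.loads(raw)
    return {name: float(value) for name, value in data.items() if ":" in name}


def summarize(rates):
    """Return the models sorted from fastest to slowest, plus the mean rate."""
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    average = sum(rates.values()) / len(rates)
    return ranked, average


ranked, average = summarize(model_rates(RAW_RESULT))
for name, rate in ranked:
    print(f"{name:12s} {rate:5.2f}")
print(f"{'average':12s} {average:5.2f}")
```

Saving one such block per configuration makes it easy to see whether a hardware or driver change actually moved the numbers.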
A quick test installing Ollama as an LXC container via a script:
bash -c "$(wget -qLO - https://github.com/tteck/Proxmox/raw/main/ct/ollama.sh)"

We'll see how it goes… at the moment my NVIDIA card (or the BIOS) does not support Proxmox passthrough.
root@balkany:~# dmesg | grep -e DMAR -e IOMMU | grep "enable"
[ 0.333769] DMAR: IOMMU enabled
root@balkany:~# dmesg | grep 'remapping'
[ 0.821036] DMAR-IR: Enabled IRQ remapping in xapic mode
[ 0.821038] x2apic: IRQ remapping doesn't support X2APIC mode
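"IOMMU enabled" in dmesg is necessary but usually not sufficient: passthrough also tends to require that the GPU and its audio function sit in an IOMMU group without unrelated devices. The sketch below lists the groups on the Proxmox host by walking the standard sysfs tree `/sys/kernel/iommu_groups`; the helper names `iommu_groups` and `group_of` are hypothetical.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled, or sysfs is not available at this path.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices_dir.iterdir()
            )
    return groups


def group_of(groups, pci_address):
    """Return the group number holding a full PCI address, or None."""
    for number, devices in groups.items():
        if pci_address in devices:
            return number
    return None


groups = iommu_groups()
for number in sorted(groups):
    print(number, " ".join(groups[number]))
```

Addresses in sysfs include the PCI domain, so a device that `lspci` shows as `0a:00.0` would be looked up as `group_of(groups, "0000:0a:00.0")`. An empty result on the host suggests the IOMMU is not actually active.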
# lspci -nn | grep 'NVIDIA'
0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF100GL [Quadro 4000] [10de:06dd] (rev a3)
0a:00.1 Audio device [0403]: NVIDIA Corporation GF100 High Definition Audio Controller [10de:0be5] (rev a1)
# cat /etc/default/grub | grep "GRUB_CMDLINE_LINUX_DEFAULT"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt video=vesafb:off video=efifb:off initcall_blacklist=sysfb_init"
# efibootmgr -v
EFI variables are not supported on this system.
# cat /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
# cat /etc/modprobe.d/pve-blacklist.conf | grep nvidia
blacklist nvidiafb
blacklist nvidia
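Passthrough guides commonly also bind the card's vendor:device IDs to `vfio-pci` through a modprobe options line. The post does not show that step, so treat the following as an illustrative sketch only: it extracts the IDs from the `lspci -nn` output above and builds the line, which would typically go into a file such as `/etc/modprobe.d/vfio.conf` (the file name is an assumption). `pci_ids` and `vfio_options_line` are hypothetical helper names.

```python
import re

# Vendor:device pairs look like [10de:06dd]; class codes such as [0300]
# have no colon and are therefore ignored.
PCI_ID = re.compile(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]", re.IGNORECASE)


def pci_ids(lspci_output, keyword="NVIDIA"):
    """Collect unique vendor:device IDs from `lspci -nn` lines matching keyword."""
    ids = []
    for line in lspci_output.splitlines():
        if keyword.lower() not in line.lower():
            continue
        for pci_id in PCI_ID.findall(line):
            if pci_id.lower() not in ids:
                ids.append(pci_id.lower())
    return ids


def vfio_options_line(ids):
    """Build the modprobe options line that hands the IDs to vfio-pci."""
    return "options vfio-pci ids=" + ",".join(ids)


# Sample copied from the lspci output in the post.
LSPCI_SAMPLE = """0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF100GL [Quadro 4000] [10de:06dd] (rev a3)
0a:00.1 Audio device [0403]: NVIDIA Corporation GF100 High Definition Audio Controller [10de:0be5] (rev a1)"""

print(vfio_options_line(pci_ids(LSPCI_SAMPLE)))
# prints: options vfio-pci ids=10de:06dd,10de:0be5
```

Both functions of the card (VGA and HDMI audio) are included, since they are normally passed through together.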
So I added this:
My OS version is “Ubuntu 22.04.5 LTS”.
My NVIDIA card and driver versions:
# uname -a
Linux 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
# nvidia-smi -L
GPU 0: Quadro 4000 (UUID: GPU-13797e5d-a72f-4c72-609f-686fa4a8c956)
# nvidia-smi
Mon Jan 20 16:41:52 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.157 Driver Version: 390.157 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro 4000 Off | 00000000:00:10.0 Off | N/A |
| 36% 62C P12 N/A / N/A | 1MiB / 1985MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.157 Wed Oct 12 09:19:07 UTC 2022
GCC version: gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
# ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:10.0 ==
modalias : pci:v000010DEd000006DDsv000010DEsd00000780bc03sc00i00
vendor : NVIDIA Corporation
model : GF100GL [Quadro 4000]
driver : nvidia-driver-390 - distro non-free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
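The listing above pairs driver 390.157 with a CUDA 12.6 toolkit. As far as I know, CUDA 12.x on Linux needs a driver of at least 525.60.13, so this combination probably cannot run CUDA workloads, which may explain why the benchmark behaves like a CPU-only run. The sketch below checks the pairing; the minimum-driver constant is an assumption to verify against NVIDIA's current compatibility table, and the function names are hypothetical.

```python
import re

# Assumption: minimum Linux driver for CUDA 12.x, taken from NVIDIA's
# compatibility table as I recall it; verify before relying on it.
MIN_DRIVER_FOR_CUDA_12 = (525, 60, 13)


def driver_version(proc_text):
    """Extract the driver version from /proc/driver/nvidia/version text."""
    match = re.search(
        r"Kernel Module(?: for \S+)?\s+(\d+(?:\.\d+)+)", proc_text
    )
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def toolkit_release(nvcc_text):
    """Extract (major, minor) from the `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports_toolkit(driver, toolkit):
    """Only handles CUDA 12.x; returns None when it cannot decide."""
    if driver is None or toolkit is None or toolkit[0] != 12:
        return None
    return driver >= MIN_DRIVER_FOR_CUDA_12


# Lines copied from the post.
PROC_LINE = "NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.157 Wed Oct 12 09:19:07 UTC 2022"
NVCC_LINE = "Cuda compilation tools, release 12.6, V12.6.85"

driver = driver_version(PROC_LINE)
toolkit = toolkit_release(NVCC_LINE)
print(driver, toolkit, driver_supports_toolkit(driver, toolkit))
```

With the values from the post the check returns `False` for 390.157, while the 575.51.03 driver seen after the migration passes it.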
Ollama log: