ollama/ml/backend/ggml
Santosh Bhavani 8fafc8af77
ml/backend/ggml: NVML fallback for unified memory GPUs (#12619)
* Simplify NVML fallback for unified memory GPUs

Remove the device-specific checks and the environment variable
dependency from the NVML_ERROR_NOT_SUPPORTED fallback. When NVML does
not support memory queries, unconditionally read /proc/meminfo instead
of checking device names or the OLLAMA_UNIFIED_MEMORY environment variable.

This improves memory reporting because MemAvailable accounts for
reclaimable memory, which avoids the underreporting issue described in
NVIDIA support article a_id/5728.

Tested on an NVIDIA GB10 unified memory iGPU: memory reporting was
consistent and accurate across multiple model load/unload cycles.

* Add NVML fallback patch for unified memory GPUs
2025-10-15 11:40:06 -07:00
ggml ml/backend/ggml: NVML fallback for unified memory GPUs (#12619) 2025-10-15 11:40:06 -07:00
ggml.go Vulkan based on #9650 (#11835) 2025-10-14 10:59:58 -07:00
quantization.go chore: fix some inconsistent function name in comment 2025-08-13 09:50:27 -07:00
threads.go ollama debug tensor 2025-03-11 14:49:19 -07:00
threads_debug.go ollama debug tensor 2025-03-11 14:49:19 -07:00