ollama/llama/patches
Jesse Gross 9d97e6a9f1 ggml: Avoid allocating CUDA primary context on unused GPUs
The recent memory management changes made all GPUs visible to the
runner, regardless of whether they are ultimately used. This caused
CUDA to allocate a primary context (~300 MB of VRAM) on each GPU,
for each model. This is unnecessary, so we can both avoid touching
GPUs that we exclude in the early stage of allocation and free the
memory for any that we touch but don't use.

The issue will continue to exist for the old engine, since it touches
all devices during initialization.
2025-08-27 16:24:18 -07:00
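Illustrative only: the patch itself works through the ggml backend device layer (see 0024-ggml-Enable-resetting-backend-devices.patch below), but the underlying mechanism can be sketched with plain CUDA driver API calls. The function name here is hypothetical; it shows how a primary context created on a probed-but-unused device could be torn down to reclaim its VRAM.

```c
#include <cuda.h>
#include <stdio.h>

/* Hypothetical helper: reset the CUDA primary context on a device that was
 * touched during probing but excluded from the final allocation, returning
 * its ~300 MB VRAM reservation. Assumes cuInit() has already run, as it
 * would inside the CUDA backend. */
void reset_unused_cuda_device(int ordinal) {
    CUdevice dev;
    unsigned int flags;
    int active;

    if (cuDeviceGet(&dev, ordinal) != CUDA_SUCCESS) {
        return;
    }

    /* Only act if a primary context was actually created on this device. */
    if (cuDevicePrimaryCtxGetState(dev, &flags, &active) != CUDA_SUCCESS || !active) {
        return;
    }

    /* Destroy the primary context and free the VRAM it holds. */
    if (cuDevicePrimaryCtxReset(dev) == CUDA_SUCCESS) {
        printf("released primary context on CUDA device %d\n", ordinal);
    }
}
```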
.gitignore update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0001-ggml-backend-malloc-and-free-using-the-same-compiler.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0002-pretokenizer.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0003-clip-unicode.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0004-solar-pro.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0005-fix-deepseek-deseret-regex.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0006-maintain-ordering-for-rules-for-grammar.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0007-sort-devices-by-score.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0008-add-phony-target-ggml-cpu-for-all-cpu-variants.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0009-remove-amx.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0010-fix-string-arr-kv-loading.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0011-ollama-debug-tensor.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0012-add-ollama-vocab-for-grammar-support.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0013-add-argsort-and-cuda-copy-for-i32.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0014-graph-memory-reporting-on-failure.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0015-ggml-Export-GPU-UUIDs.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0016-add-C-API-for-mtmd_input_text.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0017-no-power-throttling-win32-with-gnuc.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0018-BF16-macos-version-guard.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0019-Enable-CUDA-Graphs-for-gemma3n.patch disable output_all (#11959) 2025-08-18 17:45:40 -07:00
0020-Disable-ggml-blas-on-macos-v13-and-older.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0021-fix-mtmd-audio.cpp-build-on-windows.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0022-ggml-No-alloc-mode.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0023-decode-disable-output_all.patch disable output_all (#11959) 2025-08-18 17:45:40 -07:00
0024-ggml-Enable-resetting-backend-devices.patch ggml: Avoid allocating CUDA primary context on unused GPUs 2025-08-27 16:24:18 -07:00