ollama/llama/patches
Jesse Gross 3d0b1734c0 ggml: Preallocate CUDA pool memory
The GGML CUDA backend allocates additional memory for intermediate
results during calculation. This memory isn't currently allocated
during worst-case graph reservation and is therefore not included in
scheduling. Since these buffers can grow with context length, this
could lead to a crash.

This extends the memory allocation system down a layer from the GGML
graph to the CUDA layer, preallocating the worst-case memory there
as well.

Fixes #11753
2025-09-30 15:04:43 -07:00
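The commit above describes doing a worst-case reservation pass and then preallocating the CUDA scratch pool at that size, so the memory is accounted for by the scheduler rather than growing mid-inference. Below is a minimal C++/CUDA sketch of that idea only; the `ScratchPool` type, its `reserving` flag, and `preallocate()` are hypothetical stand-ins for illustration and are not the actual ggml-cuda pool implementation.

```cpp
// Sketch: track peak scratch usage during a dry-run ("reserve") pass,
// then preallocate that worst case before real graph evaluation.
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>
#include <cstdio>

struct ScratchPool {                   // hypothetical pool, not ggml-cuda's
    void * buf       = nullptr;        // device buffer backing the pool
    size_t capacity  = 0;              // bytes currently allocated
    size_t used      = 0;              // bytes handed out for the current graph
    size_t peak      = 0;              // worst-case usage seen so far
    bool   reserving = false;          // true while doing the dry-run pass

    // Hand out scratch space; in reserve mode only the accounting is updated.
    void * alloc(size_t size) {
        used += size;
        peak  = std::max(peak, used);
        if (reserving) {
            return nullptr;            // no real allocation during the dry run
        }
        return capacity >= used ? (char *) buf + (used - size) : nullptr;
    }

    void reset() { used = 0; }         // called between graph evaluations

    // Allocate the worst-case size observed during reservation up front.
    bool preallocate() {
        if (buf != nullptr || peak == 0) {
            return buf != nullptr;
        }
        if (cudaMalloc(&buf, peak) != cudaSuccess) {
            return false;
        }
        capacity = peak;
        return true;
    }

    ~ScratchPool() { if (buf) cudaFree(buf); }
};

int main() {
    ScratchPool pool;

    // 1) Reserve pass over the worst-case graph: record peak usage only.
    pool.reserving = true;
    pool.alloc(32 << 20);              // e.g. an attention intermediate
    pool.alloc(64 << 20);              // e.g. an MLP intermediate
    pool.reset();

    // 2) Preallocate the peak before real decoding, so the memory is part of
    //    the scheduling decision instead of appearing later and crashing.
    pool.reserving = false;
    if (!pool.preallocate()) {
        std::fprintf(stderr, "preallocation failed\n");
        return 1;
    }
    std::printf("preallocated %zu MiB of pool memory\n", pool.capacity >> 20);
    return 0;
}
```

In the real backend the reserve pass is driven by the scheduler's worst-case graph and the pools are per-device; the sketch only illustrates the accounting-then-preallocate split under those assumptions.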
.gitignore update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0001-ggml-backend-malloc-and-free-using-the-same-compiler.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0002-pretokenizer.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0003-clip-unicode.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0004-solar-pro.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0005-fix-deepseek-deseret-regex.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0006-maintain-ordering-for-rules-for-grammar.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0007-sort-devices-by-score.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0008-add-phony-target-ggml-cpu-for-all-cpu-variants.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0009-remove-amx.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0010-fix-string-arr-kv-loading.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0011-ollama-debug-tensor.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0012-add-ollama-vocab-for-grammar-support.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0013-add-argsort-and-cuda-copy-for-i32.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0014-graph-memory-reporting-on-failure.patch ggml: Remove allocation status reporting 2025-09-30 15:04:43 -07:00
0015-ggml-Export-GPU-UUIDs.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0016-add-C-API-for-mtmd_input_text.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0017-no-power-throttling-win32-with-gnuc.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0018-BF16-macos-version-guard.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0019-Enable-CUDA-Graphs-for-gemma3n.patch disable output_all (#11959) 2025-08-18 17:45:40 -07:00
0020-Disable-ggml-blas-on-macos-v13-and-older.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0021-fix-mtmd-audio.cpp-build-on-windows.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0022-ggml-No-alloc-mode.patch ggml: Preallocate CUDA pool memory 2025-09-30 15:04:43 -07:00
0023-decode-disable-output_all.patch disable output_all (#11959) 2025-08-18 17:45:40 -07:00
0024-ggml-Enable-resetting-backend-devices.patch ggml: Avoid allocating CUDA primary context on unused GPUs 2025-08-27 16:24:18 -07:00
0025-harden-uncaught-exception-registration.patch harden uncaught exception registration (#12120) 2025-09-02 09:43:55 -07:00
0026-ggml-Backport-scale-kernel-fixes.patch ggml: Backport scale kernel fixes 2025-09-30 15:04:43 -07:00