ollama/llama/patches
Jesse Gross 3d0b1734c0 ggml: Preallocate CUDA pool memory
The GGML CUDA backend allocates additional memory for intermediate
results during calculation. This memory isn't currently allocated
during worst-case graph reservation and is therefore not included in
scheduling. Since these buffers can grow with context length, this
could lead to a crash.

This extends the memory allocation system down a layer from the GGML
graph to the CUDA layer, preallocating the worst-case memory there
as well.

Fixes #11753
2025-09-30 15:04:43 -07:00
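The commit above describes doing a worst-case reservation pass and then preallocating the CUDA scratch pool at that size, so the memory is accounted for by the scheduler rather than growing mid-inference. Below is a minimal C++/CUDA sketch of that idea only; the `ScratchPool` type, its `reserving` flag, and `preallocate()` are hypothetical stand-ins for illustration and are not the actual ggml-cuda pool implementation.

```cpp
// Sketch: track peak scratch usage during a dry-run ("reserve") pass,
// then preallocate that worst case before real graph evaluation.
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>
#include <cstdio>

struct ScratchPool {                   // hypothetical pool, not ggml-cuda's
    void * buf       = nullptr;        // device buffer backing the pool
    size_t capacity  = 0;              // bytes currently allocated
    size_t used      = 0;              // bytes handed out for the current graph
    size_t peak      = 0;              // worst-case usage seen so far
    bool   reserving = false;          // true while doing the dry-run pass

    // Hand out scratch space; in reserve mode only the accounting is updated.
    void * alloc(size_t size) {
        used += size;
        peak  = std::max(peak, used);
        if (reserving) {
            return nullptr;            // no real allocation during the dry run
        }
        return capacity >= used ? (char *) buf + (used - size) : nullptr;
    }

    void reset() { used = 0; }         // called between graph evaluations

    // Allocate the worst-case size observed during reservation up front.
    bool preallocate() {
        if (buf != nullptr || peak == 0) {
            return buf != nullptr;
        }
        if (cudaMalloc(&buf, peak) != cudaSuccess) {
            return false;
        }
        capacity = peak;
        return true;
    }

    ~ScratchPool() { if (buf) cudaFree(buf); }
};

int main() {
    ScratchPool pool;

    // 1) Reserve pass over the worst-case graph: record peak usage only.
    pool.reserving = true;
    pool.alloc(32 << 20);              // e.g. an attention intermediate
    pool.alloc(64 << 20);              // e.g. an MLP intermediate
    pool.reset();

    // 2) Preallocate the peak before real decoding, so the memory is part of
    //    the scheduling decision instead of appearing later and crashing.
    pool.reserving = false;
    if (!pool.preallocate()) {
        std::fprintf(stderr, "preallocation failed\n");
        return 1;
    }
    std::printf("preallocated %zu MiB of pool memory\n", pool.capacity >> 20);
    return 0;
}
```

In the real backend the reserve pass is driven by the scheduler's worst-case graph and the pools are per-device; the sketch only illustrates the accounting-then-preallocate split under those assumptions.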
.gitignore update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0001-ggml-backend-malloc-and-free-using-the-same-compiler.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0002-pretokenizer.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0003-clip-unicode.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0004-solar-pro.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0005-fix-deepseek-deseret-regex.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0006-maintain-ordering-for-rules-for-grammar.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0007-sort-devices-by-score.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0008-add-phony-target-ggml-cpu-for-all-cpu-variants.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0009-remove-amx.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0010-fix-string-arr-kv-loading.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0011-ollama-debug-tensor.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0012-add-ollama-vocab-for-grammar-support.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0013-add-argsort-and-cuda-copy-for-i32.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0014-graph-memory-reporting-on-failure.patch ggml: Remove allocation status reporting 2025-09-30 15:04:43 -07:00
0015-ggml-Export-GPU-UUIDs.patch update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0016-add-C-API-for-mtmd_input_text.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0017-no-power-throttling-win32-with-gnuc.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0018-BF16-macos-version-guard.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0019-Enable-CUDA-Graphs-for-gemma3n.patch disable output_all (#11959) 2025-08-18 17:45:40 -07:00
0020-Disable-ggml-blas-on-macos-v13-and-older.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0021-fix-mtmd-audio.cpp-build-on-windows.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0022-ggml-No-alloc-mode.patch ggml: Preallocate CUDA pool memory 2025-09-30 15:04:43 -07:00
0023-decode-disable-output_all.patch disable output_all (#11959) 2025-08-18 17:45:40 -07:00
0024-ggml-Enable-resetting-backend-devices.patch ggml: Avoid allocating CUDA primary context on unused GPUs 2025-08-27 16:24:18 -07:00
0025-harden-uncaught-exception-registration.patch harden uncaught exception registration (#12120) 2025-09-02 09:43:55 -07:00
0026-ggml-Backport-scale-kernel-fixes.patch ggml: Backport scale kernel fixes 2025-09-30 15:04:43 -07:00