ollama/llama/patches
Oliver Simons ea85e27bbd
Increase performance for Gemma3n models on NVGPUs by enabling CUDA Graph execution (#11525)
* Enable CUDA Graphs for gemma3n.

Similar to
https://github.com/ggml-org/llama.cpp/pull/14741,
though ollama has a slightly different model graph
than llama.cpp, which requires different workaround
checks.

* Remove residual check by reshaping differently in gemma3n model

This should make the heuristics more robust.
2025-07-29 12:37:06 -07:00
0001-ggml-backend-malloc-and-free-using-the-same-compiler.patch llama: update to commit de4c07f93 (#10655) 2025-05-12 12:17:26 -07:00
0002-pretokenizer.patch llama: update to commit de4c07f93 (#10655) 2025-05-12 12:17:26 -07:00
0003-embeddings.patch llama: update to commit de4c07f93 (#10655) 2025-05-12 12:17:26 -07:00
0004-clip-unicode.patch llama: update to commit de4c07f93 (#10655) 2025-05-12 12:17:26 -07:00
0005-solar-pro.patch add new gemma model (#11204) 2025-06-25 21:47:09 -07:00
0006-fix-deepseek-deseret-regex.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0007-maintain-ordering-for-rules-for-grammar.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0008-ensure-KV-cache-is-fully-defragmented.patch add new gemma model (#11204) 2025-06-25 21:47:09 -07:00
0009-sort-devices-by-score.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0010-add-phony-target-ggml-cpu-for-all-cpu-variants.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0011-remove-amx.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0012-fix-string-arr-kv-loading.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0013-ollama-debug-tensor.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0014-add-ollama-vocab-for-grammar-support.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0015-add-argsort-and-cuda-copy-for-i32.patch add new gemma model (#11204) 2025-06-25 21:47:09 -07:00
0016-graph-memory-reporting-on-failure.patch ggml: Report graph memory for failed allocations 2025-05-22 14:38:09 -07:00
0017-ggml-Export-GPU-UUIDs.patch ggml: Report ordinal IDs for AMD GPUs on Windows 2025-07-09 10:35:31 -07:00
0018-temporary-prevent-rocm-cuda-mixed-loading.patch Re-remove cuda v11 (#10694) 2025-06-23 14:07:00 -07:00
0019-metal-add-mean-kernel-14267.patch Increase performance for Gemma3n models on NVGPUs by enabling CUDA Graph execution (#11525) 2025-07-29 12:37:06 -07:00
0020-CUDA-add-mean-operation-14313.patch Increase performance for Gemma3n models on NVGPUs by enabling CUDA Graph execution (#11525) 2025-07-29 12:37:06 -07:00
0021-Enable-CUDA-Graphs-for-gemma3n.patch Increase performance for Gemma3n models on NVGPUs by enabling CUDA Graph execution (#11525) 2025-07-29 12:37:06 -07:00