ollama/llama/patches
Jesse Gross 6db8a3771c ggml: Report graph memory for failed allocations
GGML has a function to report the allocated size of a backend buffer.
However, this returns 0 if we tried to allocate a buffer and it failed.
For memory management purposes, it's important to know how much we were
trying to allocate. This extends the API to report attempted sizes for
all buffers and whether each allocation succeeded.
2025-05-22 14:38:09 -07:00
0001-ggml-backend-malloc-and-free-using-the-same-compiler.patch llama: update to commit de4c07f93 (#10655) 2025-05-12 12:17:26 -07:00
0002-pretokenizer.patch llama: update to commit de4c07f93 (#10655) 2025-05-12 12:17:26 -07:00
0003-embeddings.patch llama: update to commit de4c07f93 (#10655) 2025-05-12 12:17:26 -07:00
0004-clip-unicode.patch llama: update to commit de4c07f93 (#10655) 2025-05-12 12:17:26 -07:00
0005-solar-pro.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0006-fix-deepseek-deseret-regex.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0007-maintain-ordering-for-rules-for-grammar.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0008-ensure-KV-cache-is-fully-defragmented.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0009-sort-devices-by-score.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0010-add-phony-target-ggml-cpu-for-all-cpu-variants.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0011-remove-amx.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0012-fix-string-arr-kv-loading.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0013-ollama-debug-tensor.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0014-add-ollama-vocab-for-grammar-support.patch chore: update mllama to use ollama engine (#10637) 2025-05-13 17:36:02 -07:00
0015-add-argsort-and-cuda-copy-for-i32.patch model: add Qwen2.5-VL support (#10385) 2025-05-13 20:58:02 -07:00
0016-graph-memory-reporting-on-failure.patch ggml: Report graph memory for failed allocations 2025-05-22 14:38:09 -07:00