ollama/ml/backend/ggml
Jesse Gross ef549d513c ggml: Increase maximum graph size
The initial implementation of qwen3-vl:235b exceeded the maximum graph
size, which is based on the number of tensors. Although this was later
fixed through the use of the mrope operation, we remain close to the
limit in some cases. This change updates the limit to track current
llama.cpp usage of GGML.
2025-11-03 16:05:37 -08:00
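The commit describes a graph-size limit that scales with a model's tensor count. As a minimal sketch of that idea, the hypothetical helper below computes an upper bound on compute-graph nodes from the number of tensors, with a fixed floor; the function name and the constants are illustrative assumptions, not the actual values or code used by ollama or llama.cpp.

```go
package main

import "fmt"

// maxGraphNodes returns a hypothetical upper bound on compute-graph nodes.
// It scales with the number of model tensors and never drops below a floor,
// so large models (like qwen3-vl:235b) get proportionally more headroom.
// The constants are illustrative only.
func maxGraphNodes(numTensors int) int {
	const floor = 8192  // assumed minimum graph size regardless of model
	const perTensor = 5 // assumed node headroom per tensor
	if n := numTensors * perTensor; n > floor {
		return n
	}
	return floor
}

func main() {
	fmt.Println(maxGraphNodes(1000))  // small model: floor applies
	fmt.Println(maxGraphNodes(10000)) // large model: scales with tensors
}
```

Under a scheme like this, an operation such as mrope that folds several tensor ops into one reduces the node count and keeps a model under the limit without raising the constant.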
ggml ggml: Avoid cudaMemsetAsync during memory fitting 2025-10-31 15:23:28 -07:00
ggml.go ggml: Increase maximum graph size 2025-11-03 16:05:37 -08:00
ggml_test.go tests: add tests and docs for commonly used ops (#12844) 2025-10-30 10:32:45 -07:00
quantization.go chore: fix some inconsistent function name in comment 2025-08-13 09:50:27 -07:00
threads.go ollama debug tensor 2025-03-11 14:49:19 -07:00
threads_debug.go ollama debug tensor 2025-03-11 14:49:19 -07:00