ollama/ml/backend/ggml
Jesse Gross ef549d513c ggml: Increase maximum graph size
The initial implementation of qwen3-vl:235b exceeded the maximum graph
size, which is based on the number of tensors. Although this was later
fixed through the use of the mrope operation, we remain close to the
limit in some cases. This change updates the limit to track current
llama.cpp usage of GGML.
2025-11-03 16:05:37 -08:00
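The commit describes a graph-size limit that scales with a model's tensor count. As a minimal sketch of that idea, the hypothetical helper below computes an upper bound on compute-graph nodes from the number of tensors, with a fixed floor; the function name and the constants are illustrative assumptions, not the actual values or code used by ollama or llama.cpp.

```go
package main

import "fmt"

// maxGraphNodes returns a hypothetical upper bound on compute-graph nodes.
// It scales with the number of model tensors and never drops below a floor,
// so large models (like qwen3-vl:235b) get proportionally more headroom.
// The constants are illustrative only.
func maxGraphNodes(numTensors int) int {
	const floor = 8192  // assumed minimum graph size regardless of model
	const perTensor = 5 // assumed node headroom per tensor
	if n := numTensors * perTensor; n > floor {
		return n
	}
	return floor
}

func main() {
	fmt.Println(maxGraphNodes(1000))  // small model: floor applies
	fmt.Println(maxGraphNodes(10000)) // large model: scales with tensors
}
```

Under a scheme like this, an operation such as mrope that folds several tensor ops into one reduces the node count and keeps a model under the limit without raising the constant.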
ggml ggml: Avoid cudaMemsetAsync during memory fitting 2025-10-31 15:23:28 -07:00
ggml.go ggml: Increase maximum graph size 2025-11-03 16:05:37 -08:00
ggml_test.go tests: add tests and docs for commonly used ops (#12844) 2025-10-30 10:32:45 -07:00
quantization.go chore: fix some inconsistent function name in comment 2025-08-13 09:50:27 -07:00
threads.go ollama debug tensor 2025-03-11 14:49:19 -07:00
threads_debug.go ollama debug tensor 2025-03-11 14:49:19 -07:00