ollama

Files

Jesse Gross ef549d513c ggml: Increase maximum graph size

The initial implementation of qwen3-vl:235b exceeded the maximum graph
size based on the number of tensors. Although this was later fixed
through the use of the mrope operation, we are close to the limit in
some cases. This updates to track the current llama.cpp usage of GGML.

2025-11-03 16:05:37 -08:00

backend

ggml: Increase maximum graph size

2025-11-03 16:05:37 -08:00

interleaved mrope (#12807)

2025-10-30 11:29:00 -07:00

backend.go

ggml: Enable op_offload to improve partial offload performance

2025-10-30 13:53:10 -07:00

device.go

cpu: always ensure LibOllamaPath included (#12890)

2025-10-31 14:37:29 -07:00

path.go

cpu: always ensure LibOllamaPath included (#12890)

2025-10-31 14:37:29 -07:00