ollama

Files

Jesse Gross 4100ed7bdd ml: Add support for quantized KV cache

Similar to the llama engine, quantizing the KV cache requires
flash attention to be enabled through the Ollama server.

2025-03-07 18:43:39 -08:00


backend.go

next ollama runner (#7913)

2025-02-13 16:31:21 -08:00