ollama/runner/ollamarunner
Jesse Gross 4100ed7bdd ml: Add support for quantized KV cache
Similar to the llama engine, quantizing the KV cache requires
flash attention to be enabled through the Ollama server.
2025-03-07 18:43:39 -08:00
cache.go ml: Add support for quantized KV cache 2025-03-07 18:43:39 -08:00
cache_test.go ollamarunner: Improve multimodal input handling 2025-03-06 16:54:16 -08:00
runner.go ollamarunner: Quiet debug logging and panic on unimplemented features 2025-03-07 18:38:02 -08:00
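The commit message above notes that, as in the llama engine, a quantized KV cache only takes effect when flash attention is enabled on the Ollama server. A minimal sketch of enabling both via server environment variables, assuming the `OLLAMA_FLASH_ATTENTION` and `OLLAMA_KV_CACHE_TYPE` settings documented for Ollama around this release:

```shell
# Flash attention must be on before the KV cache can be quantized
# (assumption: standard Ollama server environment variables).
export OLLAMA_FLASH_ATTENTION=1

# Quantize the KV cache; q8_0 and q4_0 trade memory for precision,
# f16 is the unquantized default.
export OLLAMA_KV_CACHE_TYPE=q8_0

ollama serve
```

Without `OLLAMA_FLASH_ATTENTION=1`, the cache type setting is expected to be ignored and the cache kept at f16.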