ollama

Jesse Gross	4100ed7bdd	ml: Add support for quantized KV cache	2025-03-07 18:43:39 -08:00
Similar to the llama engine, quantizing the KV cache requires flash attention to be enabled through the Ollama server.
ggml	model: load non-repeated tensors into multiple backends	2025-03-07 14:08:21 -08:00
ggml.go	ml: Add support for quantized KV cache	2025-03-07 18:43:39 -08:00