ollama/fs/ggml
Daniel Hiltgen bd6c1d6b49
flash attn: add auto mode for llama engine (#13052)
* flash attn: add auto mode for llama engine

If the user does not specify fa in the environment, use auto-mode.

* review comments

* ensure kv cache quantized types have FA explicitly enabled

additional review comments
2025-12-12 13:27:19 -08:00
..
ggml.go flash attn: add auto mode for llama engine (#13052) 2025-12-12 13:27:19 -08:00
ggml_test.go ggml: fix crash for array head counts 2025-04-27 11:38:06 -07:00
gguf.go fs/ggml: write int32 and int64 values to gguf files (#13335) 2025-12-07 21:49:14 -08:00
gguf_test.go fs/ggml: write int32 and int64 values to gguf files (#13335) 2025-12-07 21:49:14 -08:00
type.go fs/ggml: fix function name in comment (#12630) 2025-10-15 21:53:38 -07:00