ollama/fs/ggml
Jesse Gross 29ddfc2cab ggml: Disable flash attention for gemma2
Our new engine implementation of gemma2 doesn't support flash
attention, which means it also doesn't support KV cache
quantization. It is currently still possible to enable both,
which results in a crash.
2025-09-10 16:40:45 -07:00
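The guard described above can be sketched as a per-architecture capability check; this is a minimal illustration, not the actual ollama code, and the function name `flashAttentionSupported` and the architecture strings are assumptions for the example.

```go
package main

import "fmt"

// flashAttentionSupported is a hypothetical sketch of the kind of
// per-architecture guard the commit describes: the new engine's
// gemma2 implementation lacks flash attention, so flash attention
// (and, by extension, KV cache quantization, which depends on it)
// must be reported as unsupported rather than allowed to crash.
func flashAttentionSupported(arch string) bool {
	switch arch {
	case "gemma2":
		return false
	default:
		return true
	}
}

// kvCacheQuantizationSupported follows from the same dependency:
// without flash attention, KV cache quantization cannot be used.
func kvCacheQuantizationSupported(arch string) bool {
	return flashAttentionSupported(arch)
}

func main() {
	for _, arch := range []string{"gemma2", "llama"} {
		fmt.Printf("%s: flash=%v kvquant=%v\n",
			arch,
			flashAttentionSupported(arch),
			kvCacheQuantizationSupported(arch))
	}
}
```

Gating both features on one check mirrors the commit's reasoning: disabling flash attention for gemma2 implicitly disables KV cache quantization as well.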
ggml.go       ggml: Disable flash attention for gemma2                                 2025-09-10 16:40:45 -07:00
ggml_test.go  ggml: fix crash for array head counts                                    2025-04-27 11:38:06 -07:00
gguf.go       convert: fix tensor sorting (#12015)                                     2025-08-26 13:57:46 -07:00
gguf_test.go  convert: fix tensor sorting (#12015)                                     2025-08-26 13:57:46 -07:00
type.go       convert(gptoss): mxfp4 to ggml layout to avoid jit conversion (#12018)   2025-08-26 16:41:02 -07:00