If flash attention is enabled without KV cache quantization, we currently always get this warning: `level=WARN source=server.go:226 msg="kv cache type not supported by model" type=""`
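The warning fires because an empty cache type string (flash attention on, but no quantized KV cache requested) falls through to the "unsupported type" path. A minimal sketch of the intended behavior, using a hypothetical `resolveKVCacheType` helper (not Ollama's actual API): treat the empty string as "use the default f16 cache" and only warn for a genuinely unsupported type.

```go
package main

import "fmt"

// resolveKVCacheType is a hypothetical helper illustrating the fix:
// an empty requested type means "no quantization requested", so we
// silently fall back to the default f16 cache instead of warning.
func resolveKVCacheType(requested string, supported map[string]bool) string {
	if requested == "" {
		// No KV cache quantization requested: default to f16, no warning.
		return "f16"
	}
	if supported[requested] {
		return requested
	}
	// Only a genuinely unsupported type should produce the warning.
	fmt.Printf("warn: kv cache type not supported by model type=%q\n", requested)
	return "f16"
}

func main() {
	supported := map[string]bool{"f16": true, "q8_0": true, "q4_0": true}
	fmt.Println(resolveKVCacheType("", supported))     // empty -> default, no warning
	fmt.Println(resolveKVCacheType("q8_0", supported)) // supported quantized type
	fmt.Println(resolveKVCacheType("q5_1", supported)) // unsupported -> warn + default
}
```

With this shape, enabling flash attention alone never triggers the `type=""` warning.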