ollama/fs
Jesse Gross 8253ad4d2b ggml: Prevent kv cache quantization on gpt-oss
KV cache quantization depends on the flash attention kernel.
We currently cannot use flash attention with gpt-oss because the
model requires additional operations.

The model definition does not call flash attention, so inference works
regardless of the setting, but the KV cache still picks up the
quantization type. This change resolves the flash attention setting earlier
in the loading flow so that all downstream settings, including the cache
type, are derived correctly (a minimal sketch of this ordering follows the
commit details below).

Fixes: #11671
2025-08-05 13:04:03 -07:00
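The fix hinges on ordering: decide whether flash attention can actually be used before choosing the KV cache type, so a quantized cache is never selected for a model that cannot run the flash attention kernel. Below is a minimal Go sketch of that ordering under assumed, hypothetical names (LoadOptions, KVCacheType, resolveCacheSettings are illustrative only, not Ollama's actual API):

```go
package main

import "fmt"

// LoadOptions is a hypothetical stand-in for the runner's load-time settings;
// the field names are illustrative, not Ollama's actual API.
type LoadOptions struct {
	FlashAttention bool
	KVCacheType    string // e.g. "f16", "q8_0", "q4_0"
}

// resolveCacheSettings finalizes attention-related options. A quantized KV
// cache requires the flash attention kernel, so flash attention is decided
// first and the cache type is derived from the result.
func resolveCacheSettings(opts LoadOptions, modelSupportsFlashAttn bool) LoadOptions {
	// Step 1: settle flash attention early in the loading flow.
	if !modelSupportsFlashAttn {
		opts.FlashAttention = false
	}
	// Step 2: downstream settings now see the final value; a quantized cache
	// type falls back to the unquantized default when flash attention is off.
	if !opts.FlashAttention && opts.KVCacheType != "f16" {
		opts.KVCacheType = "f16"
	}
	return opts
}

func main() {
	// A gpt-oss-style model that cannot use flash attention keeps an f16
	// cache even if a quantized cache type was requested.
	opts := LoadOptions{FlashAttention: true, KVCacheType: "q8_0"}
	fmt.Printf("%+v\n", resolveCacheSettings(opts, false))
}
```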
ggml ggml: Prevent kv cache quantization on gpt-oss 2025-08-05 13:04:03 -07:00
gguf Reapply "feat: incremental gguf parser (#10822)" (#11114) (#11119) 2025-06-20 11:11:40 -07:00
util/bufioutil next ollama runner (#7913) 2025-02-13 16:31:21 -08:00
config.go add new gemma model (#11204) 2025-06-25 21:47:09 -07:00