ollama/ml
Jesse Gross 7837a5bc7e ggml: Always set cache padding to 256
We currently use a cache padding of 32 without flash attention and
256 with flash attention, a split based on the historical alignment
requirements of those kernels. Those restrictions have since been
loosened, but larger padding still has performance benefits, such as
better CUDA graph reuse.

Since the requirement is no longer kernel-specific, set the padding
uniformly to 256, matching llama.cpp (a minimal sketch of the
rounding follows the commit message).
2025-12-04 15:19:06 -08:00
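
For context, the padding described above amounts to rounding a requested cache length up to the next multiple of the padding value. Below is a minimal Go sketch of that round-up under the new uniform constant; the names kvCachePadding and padCacheSize are hypothetical, chosen for illustration, and are not claimed to be the actual ollama identifiers.

```go
package main

import "fmt"

// kvCachePadding is the alignment applied to KV cache lengths. Per the
// commit, a single value of 256 is used regardless of whether flash
// attention is enabled, matching llama.cpp. (Hypothetical name.)
const kvCachePadding = 256

// padCacheSize rounds n up to the next multiple of kvCachePadding,
// e.g. 1000 -> 1024 and 256 -> 256. (Hypothetical name.)
func padCacheSize(n int) int {
	return (n + kvCachePadding - 1) / kvCachePadding * kvCachePadding
}

func main() {
	for _, n := range []int{1, 256, 1000, 4097} {
		fmt.Printf("%d -> %d\n", n, padCacheSize(n))
	}
}
```

Because 256 is a power of two, the same round-up could also be written with a bitmask, (n + kvCachePadding - 1) &^ (kvCachePadding - 1); the division form above is kept for clarity.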
backend ggml: Always set cache padding to 256 2025-12-04 15:19:06 -08:00
nn Add deepseek v3.1 (#13063) 2025-11-17 18:03:21 -08:00
backend.go kvcache: Use SetRows to store cache data 2025-11-18 20:42:28 -08:00
device.go CUDA: filter devices on secondary discovery (#13317) 2025-12-03 12:58:16 -08:00
path.go cpu: always ensure LibOllamaPath included (#12890) 2025-10-31 14:37:29 -07:00