ollama


Jesse Gross f66216e399 ggml: Support heterogeneous KV cache layer sizes in memory estimation

Gemma3 uses sliding windows for its context on 5 out of 6 layers, significantly reducing memory usage but leading to uneven usage across layers, which makes allocation to the correct GPU difficult. We currently estimate very conservatively by assuming all layers are consistent at the max size.

Llama3.2-vision is also inconsistent between self-attention and cross-attention layers; at the moment, we calculate the correct total size and then average it across layers. In some cases, this may lead to crashes if a large layer is placed on a GPU sized by the average.

This allows memory estimation to calculate per-layer KV cache size and take this into account when placing layers onto GPUs. We already do this for weights that vary per-tensor, so this is a logical extension.

Fixes #9730
Fixes #9890		2025-03-26 13:16:03 -07:00
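The per-layer estimation described in this commit can be illustrated with a small, self-contained Go sketch. It is not ollama's implementation: the types and functions (`layerKV`, `kvCacheBytes`, `placeLayers`) are hypothetical, only the KV cache is counted (weights are ignored), a sliding-window layer is assumed to cache at most `slidingWindow` tokens, and the model shape (12 layers, every 6th layer global, a 1024-token window, an 8192-token context, 8 KV heads of dimension 128, f16 entries) is an illustrative assumption loosely modeled on Gemma3's 5-out-of-6 sliding-window pattern.

```go
package main

import "fmt"

// layerKV describes one layer's attention type.
// Hypothetical type for illustration; not ollama's API.
type layerKV struct {
	slidingWindow int // 0 means full (global) attention over the whole context
}

// kvCacheBytes returns the KV cache size of each layer in bytes.
// Assumption: keys and values each hold numKVHeads*headDim elements per cached token,
// and a sliding-window layer caches at most slidingWindow tokens.
func kvCacheBytes(layers []layerKV, numCtx, numKVHeads, headDim, bytesPerElem int) []uint64 {
	sizes := make([]uint64, len(layers))
	for i, l := range layers {
		tokens := numCtx
		if l.slidingWindow > 0 && l.slidingWindow < numCtx {
			tokens = l.slidingWindow
		}
		// Factor 2: one tensor for keys, one for values.
		sizes[i] = 2 * uint64(tokens) * uint64(numKVHeads) * uint64(headDim) * uint64(bytesPerElem)
	}
	return sizes
}

// placeLayers assigns layers to GPUs sequentially using each layer's own size,
// moving to the next GPU when the current one cannot fit the layer.
// Returns the GPU index per layer, or -1 when a layer falls back to CPU.
func placeLayers(layerSizes []uint64, gpuFree []uint64) []int {
	placement := make([]int, len(layerSizes))
	gpu := 0
	used := uint64(0)
	for i, size := range layerSizes {
		for gpu < len(gpuFree) && used+size > gpuFree[gpu] {
			gpu++ // this GPU is full for this layer; try the next one
			used = 0
		}
		if gpu == len(gpuFree) {
			placement[i] = -1 // no GPU left: CPU fallback
			continue
		}
		placement[i] = gpu
		used += size
	}
	return placement
}

func main() {
	const numLayers = 12
	const mib = 1 << 20

	layers := make([]layerKV, numLayers)
	for i := range layers {
		// Assumed pattern: every 6th layer is global, the other 5 use a 1024-token window.
		if (i+1)%6 != 0 {
			layers[i].slidingWindow = 1024
		}
	}
	sizes := kvCacheBytes(layers, 8192, 8, 128, 2)

	var total, largest uint64
	for _, s := range sizes {
		total += s
		if s > largest {
			largest = s
		}
	}
	fmt.Printf("per-layer total: %d MiB\n", total/mib)
	fmt.Printf("max-size estimate: %d MiB\n", largest*numLayers/mib)
	fmt.Printf("average per layer: %.2f MiB\n", float64(total)/numLayers/mib)

	// Two hypothetical GPUs with 48 MiB and 64 MiB free for KV cache.
	gpuFree := []uint64{48 * mib, 64 * mib}
	fmt.Println("placement:", placeLayers(sizes, gpuFree))
}
```

With these assumed numbers, the per-layer sizes sum to 104 MiB versus 384 MiB under the max-size assumption. An average of about 8.67 MiB per layer would suggest all 12 layers fit in the 112 MiB of combined free memory, but placing by actual size puts layers 0-4 on the first GPU and layers 5-10 on the second, and layer 11 no longer fits, so it falls back to CPU (`-1`). Sizing by the average would instead overcommit the second GPU, which is the kind of crash the commit describes.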
internal	server/internal/client/ollama: fix file descriptor management in Pull (#9931)	2025-03-21 16:16:38 -07:00
testdata/tools	all: fix typos in documentation, code, and comments (#7021)	2024-12-10 12:58:06 -08:00
auth.go	fix nil deref in auth.go	2024-07-26 14:14:48 -07:00
create.go	server: validate local path on safetensor create (#9379)	2025-02-28 16:10:43 -08:00
create_test.go	server: validate local path on safetensor create (#9379)	2025-02-28 16:10:43 -08:00
download.go	server: increase timeout in stall detection from 5s to 30s (#8831)	2025-02-05 10:00:26 -08:00
fixblobs.go	server: replace blob prefix separator from ':' to '-' (#3146)	2024-03-14 20:18:06 -07:00
fixblobs_test.go	server: replace blob prefix separator from ':' to '-' (#3146)	2024-03-14 20:18:06 -07:00
images.go	next ollama runner (#7913)	2025-02-13 16:31:21 -08:00
layer.go	One corrupt manifest should not wedge model operations (#7515)	2024-11-05 14:21:45 -08:00
manifest.go	One corrupt manifest should not wedge model operations (#7515)	2024-11-05 14:21:45 -08:00
manifest_test.go	One corrupt manifest should not wedge model operations (#7515)	2024-11-05 14:21:45 -08:00
model.go	templates: add autotemplate for gemma3 (#9880)	2025-03-20 00:15:30 -07:00
model_test.go	Update the /api/create endpoint to use JSON (#7935)	2024-12-31 18:02:30 -08:00
modelpath.go	server: more support for mixed-case model names (#8017)	2024-12-11 15:29:59 -08:00
modelpath_test.go	server: more support for mixed-case model names (#8017)	2024-12-11 15:29:59 -08:00
prompt.go	gemma3: Allow multiple images in a single input	2025-03-14 15:38:54 -07:00
prompt_test.go	prompt: Don't trim whitespace from prompts	2024-12-09 11:02:55 -08:00
routes.go	add verbose mode to the show command (#9640)	2025-03-13 14:24:27 -07:00
routes_create_test.go	next ollama runner (#7913)	2025-02-13 16:31:21 -08:00
routes_delete_test.go	Update the /api/create endpoint to use JSON (#7935)	2024-12-31 18:02:30 -08:00
routes_generate_test.go	next ollama runner (#7913)	2025-02-13 16:31:21 -08:00
routes_list_test.go	Update the /api/create endpoint to use JSON (#7935)	2024-12-31 18:02:30 -08:00
routes_test.go	server/internal/client/ollama: hold DiskCache on Registry (#9463)	2025-03-02 20:55:44 -08:00
sched.go	ggml: Support heterogeneous KV cache layer sizes in memory estimation	2025-03-26 13:16:03 -07:00
sched_test.go	next ollama runner (#7913)	2025-02-13 16:31:21 -08:00
sparse_common.go	Don't hard fail on sparse setup error	2024-08-09 12:16:19 -07:00
sparse_windows.go	Don't hard fail on sparse setup error	2024-08-09 12:16:19 -07:00
upload.go	server: always print upload/download part info (#8832)	2025-02-04 19:30:49 -08:00