From 528bd3077a6b66d745d6cc2d4db9edf732fddbdf Mon Sep 17 00:00:00 2001
From: Devon Rifkin
Date: Tue, 29 Apr 2025 02:04:14 -0700
Subject: [PATCH] lower default num parallel to 2

this is in part to "pay" for #10452, which doubled the default context
length. The combination isn't fully neutral though, because even though
the old 4x2k limit and the new 2x4k limit are memory equivalent, the 1x
fallback is larger with 4k
---
 server/sched.go | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/server/sched.go b/server/sched.go
index f3978796c..883540cea 100644
--- a/server/sched.go
+++ b/server/sched.go
@@ -58,7 +58,7 @@ var defaultModelsPerGPU = 3
 // Default automatic value for parallel setting
 // Model will still need to fit in VRAM. If this setting won't fit
 // we'll back off down to 1 to try to get it to fit
-var defaultParallel = 4
+var defaultParallel = 2
 
 var ErrMaxQueue = errors.New("server busy, please try again. maximum pending requests exceeded")
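
Reviewer note: the arithmetic behind the commit message can be sketched as below. This is only an illustration, not code from server/sched.go. It assumes "2k" and "4k" mean 2048 and 4096 tokens, and that reserved context memory scales with parallel slots times context length per slot. The helper totalContextTokens is hypothetical.

```go
package main

import "fmt"

// totalContextTokens is a hypothetical helper (not part of the patch).
// It returns how many context tokens are reserved when numParallel
// slots each get ctxPerSlot tokens, assuming memory scales with the product.
func totalContextTokens(numParallel, ctxPerSlot int) int {
	return numParallel * ctxPerSlot
}

func main() {
	// Assumed values for the "2k" and "4k" context lengths in the message.
	const oldCtx, newCtx = 2048, 4096

	// Old default (4 parallel, 2k) versus new default (2 parallel, 4k).
	fmt.Println("old default :", totalContextTokens(4, oldCtx))
	fmt.Println("new default :", totalContextTokens(2, newCtx))

	// When the default does not fit, the scheduler backs off to 1 parallel.
	// The fallback now reserves twice as many tokens as before.
	fmt.Println("old fallback:", totalContextTokens(1, oldCtx))
	fmt.Println("new fallback:", totalContextTokens(1, newCtx))
}
```

Under these assumptions both defaults reserve 8192 tokens, while the 1x fallback grows from 2048 to 4096, which is why the change is not fully neutral for models that only fit at parallel 1.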