ollama/runner/ollamarunner
Jesse Gross e119783e66 llm: Clamp batch size to context size
The context must always be able to hold the current batch, so
if the user requests a small context, we should also shrink
the batch to match. This also fixes the TestLongInputContext
test on the new engine. (The old engine already has this behavior.)
2025-09-08 20:40:11 -07:00
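A minimal sketch in Go of the clamping this commit describes. The function and parameter names (clampBatchSize, batchSize, numCtx) are illustrative assumptions, not the actual identifiers in the ollama runner.

```go
package main

import "fmt"

// clampBatchSize ensures the batch size never exceeds the context size,
// since the context must always be able to hold the current batch.
// Names here are hypothetical, not the runner's real fields.
func clampBatchSize(batchSize, numCtx int) int {
	if batchSize > numCtx {
		return numCtx
	}
	return batchSize
}

func main() {
	// A user requesting a small context (128 tokens) with a larger
	// default batch size (512) gets the batch shrunk to match.
	fmt.Println(clampBatchSize(512, 128))  // 128
	// With a large context, the batch size is left unchanged.
	fmt.Println(clampBatchSize(512, 4096)) // 512
}
```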
cache.go        llm: Clamp batch size to context size                             2025-09-08 20:40:11 -07:00
cache_test.go   embedding gemma model (#12181)                                    2025-09-04 09:09:07 -07:00
multimodal.go   ml: Panic rather than return error on tensor allocation failure   2025-05-22 14:38:09 -07:00
runner.go       runner: move harmony to runner (#12052)                           2025-09-08 15:07:59 -07:00