ollama/runner/ollamarunner
Jesse Gross e119783e66 llm: Clamp batch size to context size
The context must always be able to hold the current batch, so
if the user requests a small context, we should also shrink
the batch to match. This also fixes the TestLongInputContext
test on the new engine. (The old engine already has this behavior.)
2025-09-08 20:40:11 -07:00
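A minimal sketch in Go of the clamping this commit describes. The function and parameter names (clampBatchSize, batchSize, numCtx) are illustrative assumptions, not the actual identifiers in the ollama runner.

```go
package main

import "fmt"

// clampBatchSize ensures the batch size never exceeds the context size,
// since the context must always be able to hold the current batch.
// Names here are hypothetical, not the runner's real fields.
func clampBatchSize(batchSize, numCtx int) int {
	if batchSize > numCtx {
		return numCtx
	}
	return batchSize
}

func main() {
	// A user requesting a small context (128 tokens) with a larger
	// default batch size (512) gets the batch shrunk to match.
	fmt.Println(clampBatchSize(512, 128))  // 128
	// With a large context, the batch size is left unchanged.
	fmt.Println(clampBatchSize(512, 4096)) // 512
}
```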
cache.go        llm: Clamp batch size to context size                             2025-09-08 20:40:11 -07:00
cache_test.go   embedding gemma model (#12181)                                    2025-09-04 09:09:07 -07:00
multimodal.go   ml: Panic rather than return error on tensor allocation failure   2025-05-22 14:38:09 -07:00
runner.go       runner: move harmony to runner (#12052)                           2025-09-08 15:07:59 -07:00