ollama/runner/ollamarunner
Parth Sareen 0682dae027
sample: improve ollama engine sampler performance (#9374)
This change brings in various interface cleanups and greatly improves the performance of the sampler.

Tested with llama3.2 on local machine.
Improves performance from ~70 tokens/s to 135 tokens/s with topK(40) enabled.
Without topK, performance is ~110 tokens/s.
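
For readers unfamiliar with why topK affects throughput: top-K sampling keeps only the K highest-scoring tokens before the rest of the sampling pipeline runs, so later steps touch K entries instead of the whole vocabulary. The sketch below is one common way to do that selection, using a bounded min-heap. It is an illustration only and is not taken from this commit; the `token` type, the `minHeap` type, and the `topK` function are hypothetical names made up for the example.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// token pairs a vocabulary id with its logit.
// Hypothetical type for this sketch, not the ollama sampler's type.
type token struct {
	id    int
	logit float32
}

// minHeap keeps the token with the smallest logit at index 0.
type minHeap []token

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].logit < h[j].logit }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(token)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	t := old[n-1]
	*h = old[:n-1]
	return t
}

// topK returns the k highest-logit tokens, sorted by descending logit.
// It does a single pass over logits and never holds more than k entries.
func topK(logits []float32, k int) []token {
	if k <= 0 {
		return nil
	}
	if k > len(logits) {
		k = len(logits)
	}
	h := make(minHeap, 0, k)
	for id, l := range logits {
		if len(h) < k {
			heap.Push(&h, token{id: id, logit: l})
		} else if l > h[0].logit {
			// Replace the current minimum and restore heap order.
			h[0] = token{id: id, logit: l}
			heap.Fix(&h, 0)
		}
	}
	sort.Slice(h, func(i, j int) bool { return h[i].logit > h[j].logit })
	return h
}

func main() {
	logits := []float32{0.1, 2.5, -1.0, 3.2, 0.7}
	for _, t := range topK(logits, 2) {
		fmt.Println(t.id, t.logit)
	}
}
```

With a vocabulary of N entries, this selection costs O(N log K) rather than the O(N log N) of a full sort. Whether the sampler in this commit uses the same approach is not stated in the commit message.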
2025-03-07 12:37:48 -08:00
cache.go ollamarunner: Improve multimodal input handling 2025-03-06 16:54:16 -08:00
cache_test.go ollamarunner: Improve multimodal input handling 2025-03-06 16:54:16 -08:00
runner.go sample: improve ollama engine sampler performance (#9374) 2025-03-07 12:37:48 -08:00