ollama

Parth Sareen 0682dae027 sample: improve ollama engine sampler performance (#9374) This change brings in various interface cleanups and greatly improves the performance of the sampler. Tested with llama3.2 on a local machine: performance improves from ~70 tokens/s to 135 tokens/s with topK(40) enabled. Without topK, performance is ~110 tokens/s.		2025-03-07 12:37:48 -08:00
..
cache.go	ollamarunner: Improve multimodal input handling	2025-03-06 16:54:16 -08:00
cache_test.go	ollamarunner: Improve multimodal input handling	2025-03-06 16:54:16 -08:00
runner.go	sample: improve ollama engine sampler performance (#9374)	2025-03-07 12:37:48 -08:00