ollama

nicole pardal 5d347f6d6f server: Consolidate embedding truncation in runner (#12730) Currently, checking the length of prompts for embeddings to ensure they fit in the context window (and possible truncation) occurs in two places - the Ollama server and runner. This can lead to inconsistencies in both the checks and reported number of tokens processed. Since we have to do this processing in the runner, this consolidates all of the logic there.		2025-10-27 11:59:12 -07:00
cache.go	refactor: use the built-in max/min to simplify the code (#12280)	2025-09-16 17:14:21 -07:00
cache_test.go	Runner for Ollama engine	2025-02-13 17:09:26 -08:00
image.go	Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552)	2025-10-13 15:26:18 -07:00
image_test.go	Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552)	2025-10-13 15:26:18 -07:00
runner.go	server: Consolidate embedding truncation in runner (#12730)	2025-10-27 11:59:12 -07:00