Right now we deserialize tool call definitions' arguments into Go
maps. Go maps purposefully have no predictable iteration order,
whereas we want to maintain the order the user originally provided.
Unstable rendering of arguments breaks the KV cache; this change
fixes that.
There's no way to build this in a fully backwards-compatible way when
executing existing templates exactly as they are. We get around this by
rewriting templates dynamically just before they're rendered. This is
fragile, but perhaps the least bad option?
* tests: add single-threaded history test
Also tidies up some existing tests to handle more model output variation.
* test: add support for testing specific architectures
* perf: build graph for next batch in parallel to keep GPU busy
This refactors the main run loop of the ollama runner to perform the main
GPU-intensive tasks (Compute+Floats) in a goroutine so we can prepare the next
batch in parallel, reducing the amount of time the GPU stalls waiting for the
next batch of work.
* tests: tune integration tests for ollama engine
This tunes the integration tests to focus more on models supported
by the new engine.