ollama/runner
Jesse Gross 38fae71425
ollamarunner: Use correct constant to remove cache entries
The correct constant for removing all entries through the end of the sequence in the Ollama engine is math.MaxInt32; the old engine uses -1.

The impact of this is currently minimal, because the wrong value would only be used in situations that the implemented models do not support, or with rarely used options.
2025-12-29 06:37:54 -06:00
| Name | Last commit | Date |
|---|---|---|
| common | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| llamarunner | api: remove unused or unsupported api options (#10574) | 2025-12-29 06:37:52 -06:00 |
| ollamarunner | ollamarunner: Use correct constant to remove cache entries | 2025-12-29 06:37:54 -06:00 |
| README.md | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| runner.go | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |

README.md

runner

Note: this is a work in progress.

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding