ollama/runner

Latest commit 597f6cd3a9 by Jesse Gross:
ollamarunner: Fix memory leak when processing images
The context (and therefore the associated input tensors) was not being properly closed when images were being processed. We were trying to close them, but in reality we were closing over an empty list, preventing anything from actually being freed.

Fixes #10434
Committed 2025-12-29 06:37:49 -06:00
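The failure described above is a common Go pitfall: a cleanup closure captures a snapshot of a slice taken while it is still empty, so items appended afterwards are never released. The sketch below is not the ollamarunner code; the context and tensor types are hypothetical stand-ins that only reproduce the pattern and the obvious fix of closing through the owner, which sees the live slice.

```go
package main

import "fmt"

// The types below are hypothetical stand-ins, not the actual ollamarunner
// code: they only illustrate the "closing over an empty list" pattern the
// commit message describes.

// tensor stands in for a backend tensor that must be explicitly freed.
type tensor struct{ name string }

func (t *tensor) Close() { fmt.Println("freed", t.name) }

// context owns the tensors created while processing an input (e.g. an image).
type context struct{ tensors []*tensor }

func (c *context) newTensor(name string) *tensor {
	t := &tensor{name: name}
	c.tensors = append(c.tensors, t)
	return t
}

// Close frees every tensor the context currently owns.
func (c *context) Close() {
	for _, t := range c.tensors {
		t.Close()
	}
}

func main() {
	c := &context{}

	// Buggy pattern: the cleanup closure captures a snapshot of the slice
	// while it is still empty, so tensors created later are never freed.
	snapshot := c.tensors // len(snapshot) == 0 here
	cleanup := func() {
		for _, t := range snapshot {
			t.Close()
		}
	}

	c.newTensor("image-embedding")
	cleanup() // prints nothing: it ranges over the empty snapshot

	// Fix: close through the owner, which sees the live slice.
	c.Close() // prints "freed image-embedding"
}
```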
Directory contents:

common        Runner for Ollama engine                                 2025-02-13 17:09:26 -08:00
llamarunner   llm: set done reason at server level (#9830)             2025-04-03 10:19:24 -07:00
ollamarunner  ollamarunner: Fix memory leak when processing images     2025-12-29 06:37:49 -06:00
README.md     Runner for Ollama engine                                 2025-02-13 17:09:26 -08:00
runner.go     Runner for Ollama engine                                 2025-02-13 17:09:26 -08:00

README.md

runner

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion
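For callers who would rather not shell out to curl, the same request can be made from Go. This is a minimal sketch, assuming the server is listening on localhost:8080 as in the curl example above; only the /completion path and the "prompt" field come from this README, and since the response format is not documented here the program simply prints the raw body.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Build the same JSON body as the curl example: {"prompt": "hi"}.
	body, err := json.Marshal(map[string]string{"prompt": "hi"})
	if err != nil {
		log.Fatal(err)
	}

	// POST to the completion endpoint; the address assumes the one shown
	// in the curl examples in this README.
	resp, err := http.Post("http://localhost:8080/completion", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// The response schema isn't documented in this README, so just print
	// the raw body instead of decoding into assumed fields.
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```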

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding
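The embedding endpoint works the same way from code: point the Go sketch in the Completion section at /embedding instead of /completion, using the body shown in the curl example above. Its response format is likewise undocumented here, so printing or inspecting the raw body is the safest starting point.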