ollama/runner

Latest commit 597f6cd3a9 by Jesse Gross:
ollamarunner: Fix memory leak when processing images
The context (and therefore the associated input tensors) was not being properly closed when images were being processed. We were trying to close them, but in reality we were closing over an empty list, preventing anything from actually being freed.

Fixes #10434
Committed 2025-12-29 06:37:49 -06:00
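The failure described above is a common Go pitfall: a cleanup closure captures a snapshot of a slice taken while it is still empty, so items appended afterwards are never released. The sketch below is not the ollamarunner code; the context and tensor types are hypothetical stand-ins that only reproduce the pattern and the obvious fix of closing through the owner, which sees the live slice.

```go
package main

import "fmt"

// The types below are hypothetical stand-ins, not the actual ollamarunner
// code: they only illustrate the "closing over an empty list" pattern the
// commit message describes.

// tensor stands in for a backend tensor that must be explicitly freed.
type tensor struct{ name string }

func (t *tensor) Close() { fmt.Println("freed", t.name) }

// context owns the tensors created while processing an input (e.g. an image).
type context struct{ tensors []*tensor }

func (c *context) newTensor(name string) *tensor {
	t := &tensor{name: name}
	c.tensors = append(c.tensors, t)
	return t
}

// Close frees every tensor the context currently owns.
func (c *context) Close() {
	for _, t := range c.tensors {
		t.Close()
	}
}

func main() {
	c := &context{}

	// Buggy pattern: the cleanup closure captures a snapshot of the slice
	// while it is still empty, so tensors created later are never freed.
	snapshot := c.tensors // len(snapshot) == 0 here
	cleanup := func() {
		for _, t := range snapshot {
			t.Close()
		}
	}

	c.newTensor("image-embedding")
	cleanup() // prints nothing: it ranges over the empty snapshot

	// Fix: close through the owner, which sees the live slice.
	c.Close() // prints "freed image-embedding"
}
```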
Directory contents:

common        Runner for Ollama engine                                 2025-02-13 17:09:26 -08:00
llamarunner   llm: set done reason at server level (#9830)             2025-04-03 10:19:24 -07:00
ollamarunner  ollamarunner: Fix memory leak when processing images     2025-12-29 06:37:49 -06:00
README.md     Runner for Ollama engine                                 2025-02-13 17:09:26 -08:00
runner.go     Runner for Ollama engine                                 2025-02-13 17:09:26 -08:00

README.md

runner

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion
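For callers who would rather not shell out to curl, the same request can be made from Go. This is a minimal sketch, assuming the server is listening on localhost:8080 as in the curl example above; only the /completion path and the "prompt" field come from this README, and since the response format is not documented here the program simply prints the raw body.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Build the same JSON body as the curl example: {"prompt": "hi"}.
	body, err := json.Marshal(map[string]string{"prompt": "hi"})
	if err != nil {
		log.Fatal(err)
	}

	// POST to the completion endpoint; the address assumes the one shown
	// in the curl examples in this README.
	resp, err := http.Post("http://localhost:8080/completion", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// The response schema isn't documented in this README, so just print
	// the raw body instead of decoding into assumed fields.
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```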

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding
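The embedding endpoint works the same way from code: point the Go sketch in the Completion section at /embedding instead of /completion, using the body shown in the curl example above. Its response format is likewise undocumented here, so printing or inspecting the raw body is the safest starting point.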