Jesse Gross b2a465296d 2025-03-30 19:21:54 -07:00
runner: Release semaphore and improve error messages on failures

If we have an error after creating a new sequence but before finding a slot for it, we return without releasing the semaphore. This reduces our parallel sequences and eventually leads to deadlock. In practice this should never happen because once we have acquired the semaphore, we should always be able to find a slot. However, the code is clearly not correct.

common	Runner for Ollama engine	2025-02-13 17:09:26 -08:00
llamarunner	runner: Release semaphore and improve error messages on failures	2025-03-30 19:21:54 -07:00
ollamarunner	runner: Release semaphore and improve error messages on failures	2025-03-30 19:21:54 -07:00
README.md	Runner for Ollama engine	2025-02-13 17:09:26 -08:00
runner.go	Runner for Ollama engine	2025-02-13 17:09:26 -08:00

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding