Bruce MacDonald d5eae8248d runner: enable returning more info from runner processing Currently we return only the text predicted from the LLM. This was nice in that it was simple, but there may be other info we want to know from the processing. This change adds the ability to return more information from the runner than just the text predicted.		2025-06-13 16:26:57 -07:00
common	runner: enable returning more info from runner processing	2025-06-13 16:26:57 -07:00
llamarunner	runner: enable returning more info from runner processing	2025-06-13 16:26:57 -07:00
ollamarunner	runner: enable returning more info from runner processing	2025-06-13 16:26:57 -07:00
README.md	Runner for Ollama engine	2025-02-13 17:09:26 -08:00
runner.go	Runner for Ollama engine	2025-02-13 17:09:26 -08:00

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding