Files

Bruce MacDonald d5eae8248d runner: enable returning more info from runner processing

Currently we return only the text predicted from the LLM. This was nice in
that it was simple, but there may be other info we want to know from the
processing. This change adds the ability to return more information from the
runner than just the text predicted.

2025-06-13 16:26:57 -07:00

common

runner: enable returning more info from runner processing

2025-06-13 16:26:57 -07:00

llamarunner

runner: enable returning more info from runner processing

2025-06-13 16:26:57 -07:00

ollamarunner

runner: enable returning more info from runner processing

2025-06-13 16:26:57 -07:00

README.md

Runner for Ollama engine

2025-02-13 17:09:26 -08:00

runner.go

Runner for Ollama engine

2025-02-13 17:09:26 -08:00

README.md

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding
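
Calling from Go

The same two requests can be made from Go. This is a minimal sketch, not part of the runner itself: `buildBody` and `postPrompt` are hypothetical helper names, and it assumes the runner is listening on `localhost:8080` as in the curl examples above. The response format is not described here, so the sketch prints the raw response body rather than decoding it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// promptRequest mirrors the JSON body used in the curl examples.
type promptRequest struct {
	Prompt string `json:"prompt"`
}

// buildBody encodes the prompt as {"prompt": "..."}.
// (Hypothetical helper, named for this sketch only.)
func buildBody(prompt string) ([]byte, error) {
	return json.Marshal(promptRequest{Prompt: prompt})
}

// postPrompt POSTs the prompt to baseURL+path and returns the raw response body.
// The response schema is an unknown here, so no decoding is attempted.
func postPrompt(client *http.Client, baseURL, path, prompt string) ([]byte, error) {
	body, err := buildBody(prompt)
	if err != nil {
		return nil, err
	}
	resp, err := client.Post(baseURL+path, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("%s: unexpected status %s: %s", path, resp.Status, data)
	}
	return data, nil
}

func main() {
	// Assumed address, taken from the curl examples.
	const baseURL = "http://localhost:8080"

	calls := []struct{ path, prompt string }{
		{"/completion", "hi"},
		{"/embedding", "turn me into an embedding"},
	}
	for _, c := range calls {
		out, err := postPrompt(http.DefaultClient, baseURL, c.path, c.prompt)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s response:\n%s\n", c.path, out)
	}
}
```

Start the runner first, then run the sketch with `go run`; if the server is not reachable, it prints the error and exits with a non-zero status.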