

# runner

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

```
./runner -model <model binary>
```

## Completion

```
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion
```

## Embeddings

```
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding
```