Files

Jesse Gross b70fc4d51e model: Don't unconditionally add special tokens

We sometimes tokenize partial strings. For example, with
multimodal inputs, we split the input string around the images
and then tokenize each piece. In these cases, we should only add
the special tokens on the first piece.

2025-03-06 16:54:16 -08:00
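
The behavior described in the commit message above can be sketched in Go as follows. This is a minimal illustration, not the repository's code: `tokenize`, `tokenizePieces`, the `bosID` value, and the `[img]` placeholder tag are all hypothetical names chosen for the example, and the stand-in tokenizer just produces fake IDs.

```go
package main

import (
	"fmt"
	"strings"
)

// bosID is a hypothetical beginning-of-sequence token ID (assumption;
// real models define their own special tokens).
const bosID = 1

// tokenize is a stand-in tokenizer: it emits one fake ID per
// whitespace-separated word, and prepends bosID only when addSpecial is true.
func tokenize(s string, addSpecial bool) []int {
	var ids []int
	if addSpecial {
		ids = append(ids, bosID)
	}
	for _, w := range strings.Fields(s) {
		ids = append(ids, 100+len(w))
	}
	return ids
}

// tokenizePieces splits the prompt around imageTag and tokenizes each piece,
// requesting special tokens for the first piece only.
func tokenizePieces(prompt, imageTag string) [][]int {
	parts := strings.Split(prompt, imageTag)
	out := make([][]int, 0, len(parts))
	for i, p := range parts {
		out = append(out, tokenize(p, i == 0))
	}
	return out
}

func main() {
	pieces := tokenizePieces("describe [img] in one sentence", "[img]")
	for i, p := range pieces {
		fmt.Println(i, p)
	}
}
```

Passing `i == 0` as the flag keeps the decision at the call site, so the tokenizer itself does not need to know whether it is seeing a whole prompt or a fragment.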

| Name | Last commit | Date |
| --- | --- | --- |
| common | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| llamarunner | ml/backend/ggml: consolidate system info logging | 2025-03-04 15:14:31 -08:00 |
| ollamarunner | model: Don't unconditionally add special tokens | 2025-03-06 16:54:16 -08:00 |
| README.md | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| runner.go | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |

README.md

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding