* perf: build graph for next batch in parallel to keep GPU busy

  This refactors the main run loop of the ollama runner to perform the main GPU-intensive tasks (Compute+Floats) in a goroutine so we can prepare the next batch in parallel, reducing the amount of time the GPU stalls waiting for the next batch of work.

* tests: tune integration tests for ollama engine

  This tunes the integration tests to focus more on models supported by the new engine.

| Name | | |
|---|---|---|
| imageproc | ||
| input | ||
| models | ||
| testdata | ||
| bytepairencoding.go | ||
| bytepairencoding_test.go | ||
| model.go | ||
| model_test.go | ||
| sentencepiece.go | ||
| sentencepiece_test.go | ||
| textprocessor.go | ||
| vocabulary.go | ||
| vocabulary_test.go | ||
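
The overlap described in the commit message (running the GPU-intensive step in a goroutine while the next batch is prepared) can be illustrated with a minimal sketch. This is not the runner's actual code: `batch`, `graph`, `buildGraph`, `compute`, and `runPipelined` are hypothetical names chosen for illustration, and `compute` merely stands in for the Compute+Floats step.

```go
package main

import "fmt"

// batch and graph are hypothetical stand-ins for the runner's real types.
type batch struct{ id int }

type graph struct{ batchID int }

// buildGraph simulates the CPU-side preparation of a batch.
func buildGraph(b batch) graph { return graph{batchID: b.id} }

// compute simulates the GPU-intensive step (Compute+Floats in the commit message).
func compute(g graph) string { return fmt.Sprintf("result-%d", g.batchID) }

// runPipelined overlaps compute for batch i with graph building for batch i+1.
func runPipelined(batches []batch) []string {
	results := make([]string, 0, len(batches))
	if len(batches) == 0 {
		return results
	}
	// Prepare the first graph up front; later graphs are built during compute.
	next := buildGraph(batches[0])
	for i := range batches {
		current := next
		done := make(chan string, 1)
		// Run the simulated GPU work in its own goroutine.
		go func() { done <- compute(current) }()
		// While compute runs, prepare the graph for the following batch.
		if i+1 < len(batches) {
			next = buildGraph(batches[i+1])
		}
		// Wait for the in-flight compute before starting the next one.
		results = append(results, <-done)
	}
	return results
}

func main() {
	for _, r := range runPipelined([]batch{{1}, {2}, {3}}) {
		fmt.Println(r)
	}
}
```

In this sketch only one compute step is in flight at a time, so results stay in batch order; the gain comes solely from building the next graph while the current one executes.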