This refactors the main run loop of the ollama runner to run the main GPU-intensive tasks (`Compute` + `Floats`) in a goroutine. While they run, the next batch can be prepared in parallel, which reduces the time the GPU stalls waiting for the next batch of work.
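The overlap can be illustrated with a minimal Go sketch. This is not the runner's actual code. `batch`, `result`, `prepareBatch`, `computeBatch` and `runPipelined` are hypothetical stand-ins: `computeBatch` plays the role of the GPU-bound `Compute` + `Floats` step, and `prepareBatch` plays the role of the CPU-side batch assembly. While batch `i` is computed in a goroutine, the loop prepares batch `i+1`. The sketch assumes each batch can be prepared independently. In a real generation loop, the next batch may depend on tokens sampled from the previous batch's outputs, and the sketch does not model that.

```go
package main

// batch is a hypothetical stand-in for a prepared batch of work.
type batch struct {
	id     int
	inputs []float32
}

// result pairs a batch id with its computed outputs.
type result struct {
	id      int
	outputs []float32
}

// computeBatch is a hypothetical stand-in for the GPU-bound work
// (in the runner: Compute followed by reading results back with Floats).
func computeBatch(b batch) []float32 {
	out := make([]float32, len(b.inputs))
	for i, v := range b.inputs {
		out[i] = v * 2
	}
	return out
}

// prepareBatch is a hypothetical stand-in for CPU-side batch assembly.
func prepareBatch(id int) batch {
	return batch{id: id, inputs: []float32{float32(id), float32(id + 1)}}
}

// runPipelined computes n batches, overlapping the preparation of
// batch i+1 with the computation of batch i.
func runPipelined(n int) []result {
	results := make([]result, 0, n)
	if n <= 0 {
		return results
	}
	next := prepareBatch(0)
	for i := 0; i < n; i++ {
		cur := next
		done := make(chan result, 1)

		// Launch the "GPU" work for the current batch in a goroutine.
		go func(b batch) {
			done <- result{id: b.id, outputs: computeBatch(b)}
		}(cur)

		// Meanwhile, prepare the next batch on the main loop.
		if i+1 < n {
			next = prepareBatch(i + 1)
		}

		// Wait for the current batch to finish before launching the next one.
		results = append(results, <-done)
	}
	return results
}
```

For example, `runPipelined(3)` returns three results in batch order. Only one compute is in flight at a time, so the gain comes from hiding batch-preparation time behind GPU work, not from running batches concurrently.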