If we create a memory layout that should fit based on the reported free VRAM but allocation still fails, we start applying a backoff. This reduces free VRAM by an exponentially growing percentage (1%, 2%, 4%...). However, the points chosen tend to be too dense at the beginning and too sparse at the end. Therefore, this switches to an incremental backoff (10%, 20%, 30%...).

| Name | | |
|---|---|---|
| llm_darwin.go | ||
| llm_linux.go | ||
| llm_windows.go | ||
| memory.go | ||
| memory_test.go | ||
| server.go | ||
| server_test.go | ||
| status.go | ||
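
To make the difference between the two backoff schedules in the commit message concrete, here is a minimal sketch in Go. It is not the code from `server.go` or `memory.go`; the function names, the 1-based attempt counter, the cap at 100%, and the 8 GiB figure are all assumptions chosen for illustration. It only shows how the sampled percentages differ: the exponential schedule spends its first several retries below 10%, while the incremental schedule steps evenly.

```go
package main

import "fmt"

// exponentialBackoffPercent is a hypothetical helper modelling the old
// schedule: 1%, 2%, 4%, ... for attempts 1, 2, 3, ...
// Capping at 100 is an assumption, not taken from the source.
func exponentialBackoffPercent(attempt int) int {
	if attempt < 1 {
		return 0
	}
	pct := 1
	for i := 1; i < attempt; i++ {
		pct *= 2
		if pct >= 100 {
			return 100
		}
	}
	return pct
}

// incrementalBackoffPercent is a hypothetical helper modelling the new
// schedule: 10%, 20%, 30%, ... for attempts 1, 2, 3, ...
// Capping at 100 is likewise an assumption.
func incrementalBackoffPercent(attempt int) int {
	if attempt < 1 {
		return 0
	}
	pct := attempt * 10
	if pct > 100 {
		return 100
	}
	return pct
}

// usableVRAM returns the free VRAM left after withholding backoffPercent
// percent of it. backoffPercent is expected to be in the range 0..100.
func usableVRAM(free uint64, backoffPercent int) uint64 {
	return free - free*uint64(backoffPercent)/100
}

func main() {
	const free uint64 = 8 << 30 // assumed 8 GiB of reported free VRAM

	for attempt := 1; attempt <= 8; attempt++ {
		exp := exponentialBackoffPercent(attempt)
		inc := incrementalBackoffPercent(attempt)
		fmt.Printf("attempt %d: exponential %3d%% (%d bytes usable), incremental %3d%% (%d bytes usable)\n",
			attempt, exp, usableVRAM(free, exp), inc, usableVRAM(free, inc))
	}
}
```

Under these assumptions, the exponential schedule needs four retries before it withholds even 8% of free VRAM and then jumps from 32% to 64% in a single step, which matches the "too dense at the beginning and too sparse at the end" observation.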