ollama/ml
Jesse Gross f50d691254 ggml: Fix memory leak on input tensors
For every forward pass through the model, we need to allocate input
tensors: tokens, images, positions, outputs and masks. These get
allocated in system memory.

However, when we close the context that the tensors were allocated
through, the metadata gets freed but the actual backend memory does
not. This results in a significant memory leak.

This makes it so that all the memory allocated through a context
gets freed when it is closed.

Fixes #10040
2025-04-11 11:13:22 -07:00
..
backend ggml: Fix memory leak on input tensors 2025-04-11 11:13:22 -07:00
nn attention: Remove unnecessary contiguous operations 2025-03-01 20:53:23 -08:00
backend.go ollamarunner: Preallocate worst case graph at startup 2025-04-08 10:01:28 -07:00