ollama

Jesse Gross f50d691254 ggml: Fix memory leak on input tensors	2025-04-11 11:13:22 -07:00

For every forward pass through the model, we need to allocate input tensors: tokens, images, positions, outputs and masks. These get allocated in system memory.

However, when we close the context that the tensors were allocated through, the metadata gets freed but the actual backend memory does not. This results in a significant memory leak.

This makes it so that all the memory allocated through a context gets freed when it is closed.

Fixes #10040

backend	ggml: Fix memory leak on input tensors	2025-04-11 11:13:22 -07:00
nn	attention: Remove unnecessary contiguous operations	2025-03-01 20:53:23 -08:00
backend.go	ollamarunner: Preallocate worst case graph at startup	2025-04-08 10:01:28 -07:00
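
The leak described in the latest commit message can be illustrated with a minimal Go sketch. It is not the code from this repository: `Backend`, `buffer`, `Context`, `Input`, and `forwardPass` are hypothetical names invented for illustration, and the byte counter stands in for real backend memory. It only shows the general idea of a context recording every buffer allocated through it, so that closing the context releases the backend memory along with the metadata.

```go
package main

import "fmt"

// Backend is a hypothetical stand-in for a memory allocator.
// It counts outstanding bytes so that a leak would be observable.
type Backend struct {
	allocated int
}

// buffer represents one block of backend memory (hypothetical).
type buffer struct {
	size  int
	freed bool
}

// alloc reserves size bytes and returns a handle to them.
func (b *Backend) alloc(size int) *buffer {
	b.allocated += size
	return &buffer{size: size}
}

// free releases a buffer; freeing the same buffer twice is a no-op.
func (b *Backend) free(buf *buffer) {
	if buf.freed {
		return
	}
	buf.freed = true
	b.allocated -= buf.size
}

// Context holds tensor metadata and records every backend buffer
// allocated through it, so Close can release both.
type Context struct {
	backend *Backend
	tensors []string  // metadata only (tensor names in this sketch)
	buffers []*buffer // backend memory owned by this context
}

// NewContext creates an empty context bound to a backend.
func NewContext(b *Backend) *Context {
	return &Context{backend: b}
}

// Input allocates an input tensor and remembers its backing buffer.
func (c *Context) Input(name string, size int) {
	c.tensors = append(c.tensors, name)
	c.buffers = append(c.buffers, c.backend.alloc(size))
}

// Close frees the backend memory as well as the metadata.
// Dropping only c.tensors here would reproduce the kind of leak
// the commit message describes.
func (c *Context) Close() {
	for _, buf := range c.buffers {
		c.backend.free(buf)
	}
	c.buffers = nil
	c.tensors = nil
}

// forwardPass allocates the five kinds of input tensors named in the
// commit message, then closes the context when it returns.
func forwardPass(b *Backend) {
	ctx := NewContext(b)
	defer ctx.Close()
	for _, name := range []string{"tokens", "images", "positions", "outputs", "masks"} {
		ctx.Input(name, 1024) // 1024 bytes is an arbitrary size
	}
}

func main() {
	b := &Backend{}
	for i := 0; i < 3; i++ {
		forwardPass(b)
	}
	fmt.Println("bytes still allocated:", b.allocated) // prints "bytes still allocated: 0"
}
```

Tying buffer ownership to the context keeps cleanup in one place: callers only need to close the context, rather than track each input tensor's memory separately.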