ollama/ml
Michael Yang bab6f34dc0 ml/backend/ggml: update model loading for hybrid/multi backends
use a strategy similar to llama.cpp's for deciding where tensors should be
allocated. this will be improved later to take usable memory into account
before assigning tensors
2025-03-07 14:08:21 -08:00
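For context, the placement heuristic the commit describes works roughly like the Go sketch below: offload the first N layers across the available GPU backends and fall back to CPU for the rest, without yet checking usable memory (which the message notes as future work). All names here (Backend, assignLayers, gpuLayers) are hypothetical illustrations, not Ollama's actual API; the real logic lives in ml/backend/ggml.

```go
package main

import "fmt"

// Backend is a hypothetical handle for a compute device; the names and
// fields are illustrative only, not Ollama's real types.
type Backend struct {
	Name string
	GPU  bool
}

// assignLayers sketches llama.cpp-style placement: spread the first
// gpuLayers layers across GPU backends and place the remainder on CPU.
// A memory-aware version would consult each backend's free memory
// before assigning, which the commit message defers to later work.
func assignLayers(numLayers, gpuLayers int, backends []Backend) []Backend {
	var gpus []Backend
	cpu := Backend{Name: "cpu"}
	for _, b := range backends {
		if b.GPU {
			gpus = append(gpus, b)
		} else {
			cpu = b
		}
	}
	placement := make([]Backend, numLayers)
	for i := range placement {
		if i < gpuLayers && len(gpus) > 0 {
			// Distribute offloaded layers evenly across the GPUs.
			placement[i] = gpus[i*len(gpus)/gpuLayers]
		} else {
			placement[i] = cpu
		}
	}
	return placement
}

func main() {
	backends := []Backend{
		{Name: "cuda0", GPU: true},
		{Name: "cuda1", GPU: true},
		{Name: "cpu"},
	}
	for i, b := range assignLayers(6, 4, backends) {
		fmt.Printf("layer %d -> %s\n", i, b.Name)
	}
}
```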
..
backend ml/backend/ggml: update model loading for hybrid/multi backends 2025-03-07 14:08:21 -08:00
nn attention: Remove unnecessary contiguous operations 2025-03-01 20:53:23 -08:00
backend.go ml/backend/ggml: consolidate system info logging 2025-03-04 15:14:31 -08:00