Commit Graph

84 Commits

Author SHA1 Message Date
Daniel Hiltgen
936c6d6be1 win: fix CPU query buffer handling
Try in a short loop until we get the size right.
2025-09-25 10:50:00 -07:00
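The "short loop until we get the size right" pattern from the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: `queryFunc`, `errBufferTooSmall`, `queryWithRetry`, and `fakeQuery` are hypothetical names, and the fake query only simulates a system call whose required buffer size changes between calls.

```go
package main

import (
	"errors"
	"fmt"
)

// errBufferTooSmall is a hypothetical sentinel standing in for a platform
// error such as "insufficient buffer".
var errBufferTooSmall = errors.New("buffer too small")

// queryFunc is a hypothetical stand-in for a size-reporting system call.
// On success it returns the number of bytes written into buf. On
// errBufferTooSmall it returns the size it currently requires.
type queryFunc func(buf []byte) (n int, err error)

// queryWithRetry calls query in a short loop, resizing the buffer to the
// size reported by the latest call, and gives up after maxAttempts.
func queryWithRetry(query queryFunc, initialSize, maxAttempts int) ([]byte, error) {
	buf := make([]byte, initialSize)
	for attempt := 0; attempt < maxAttempts; attempt++ {
		n, err := query(buf)
		if err == nil {
			return buf[:n], nil
		}
		if !errors.Is(err, errBufferTooSmall) {
			return nil, err
		}
		// The required size can change between calls, so trust the latest value.
		buf = make([]byte, n)
	}
	return nil, fmt.Errorf("buffer size did not settle after %d attempts", maxAttempts)
}

// fakeQuery simulates a call whose required size grows once between calls.
func fakeQuery() queryFunc {
	calls := 0
	return func(buf []byte) (int, error) {
		calls++
		need := 16
		if calls > 1 {
			need = 24
		}
		if len(buf) < need {
			return need, errBufferTooSmall
		}
		return need, nil
	}
}

func main() {
	data, err := queryWithRetry(fakeQuery(), 8, 4)
	fmt.Println(len(data), err)
}
```

Bounding the loop keeps a misbehaving query from spinning forever, while a few attempts are enough to absorb a size that changes between the sizing call and the real call.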
Daniel Hiltgen
5f9f312bdb fix - give bootstrapping more time on slow systems 2025-09-24 16:25:56 -07:00
Daniel Hiltgen
2689357890 fix index bug 2025-09-24 12:22:46 -07:00
Daniel Hiltgen
c86af47ac0 WIP - wire up Vulkan with the new engine based discovery
Not a complete implementation - free VRAM is better, but not accurate on
Windows
2025-09-24 10:49:39 -07:00
Daniel Hiltgen
3a8ee62bd5 Merge remote-tracking branch 'inforithmics/vulkanV3' into engine_based_discovery_with_vulkan 2025-09-21 14:04:22 -07:00
Daniel Hiltgen
3566fe0e7b timing info for runner 2025-09-21 13:53:24 -07:00
Daniel Hiltgen
f761292516 Use runners for GPU discovery
This revamps how we discover GPUs in the system by leveraging the Ollama
runner.  This should eliminate inconsistency between our GPU discovery and the
runner's capabilities at runtime, particularly for cases where we try to filter
out unsupported GPUs.  Now the runner does that implicitly based on the actual
device list.  In some cases free VRAM reporting can be unreliable, which can
lead to scheduling mistakes, so this also includes a patch to leverage more
reliable VRAM reporting libraries if available.

Automatic workarounds have been removed as only one GPU leveraged this, which
is now documented. This GPU will soon fall off the support matrix with the next
ROCm bump.

Additional cleanup of the scheduler and discovery packages can be done in the
future once we have switched on the new memory management code, and removed
support for the llama runner.
2025-09-21 13:53:24 -07:00
Inforithmics
1cb70716bf revert debug code 2025-09-20 15:26:24 +02:00
Inforithmics
d26d920fb2 Filter out already supported gpus 2025-09-20 15:18:39 +02:00
Nakasaka, Masato
d0b5247084 Fixed Vulkan header
More aligned with official header definition now
2025-09-18 08:40:52 +09:00
Inforithmics
15eef5cc87 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-17 23:06:02 +02:00
Daniel Hiltgen
9c5bf342bc fix: multi-cuda version skew (#12318)
Ensure that in a version skewed multi-cuda setup we use the lowest version for all GPUs
2025-09-17 13:05:09 -07:00
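The rule described in the commit above (one shared, lowest version for every GPU in a skewed setup) might look roughly like this. It is a sketch under assumptions: the `gpu` struct, its `MaxCUDAMajor` field, and `lowestSharedVersion` are hypothetical and do not come from the Ollama source.

```go
package main

import "fmt"

// gpu is a hypothetical record for one discovered device. MaxCUDAMajor is
// assumed to be the newest CUDA library major version that device can use.
type gpu struct {
	ID           string
	MaxCUDAMajor int
}

// lowestSharedVersion returns the lowest per-GPU version so that every
// device in a mixed setup can load the same library variant. The boolean
// is false when the list is empty.
func lowestSharedVersion(gpus []gpu) (int, bool) {
	if len(gpus) == 0 {
		return 0, false
	}
	lowest := gpus[0].MaxCUDAMajor
	for _, g := range gpus[1:] {
		if g.MaxCUDAMajor < lowest {
			lowest = g.MaxCUDAMajor
		}
	}
	return lowest, true
}

func main() {
	gpus := []gpu{
		{ID: "GPU-0", MaxCUDAMajor: 13},
		{ID: "GPU-1", MaxCUDAMajor: 12},
	}
	v, ok := lowestSharedVersion(gpus)
	fmt.Println(v, ok)
}
```

Taking the minimum trades some capability on the newer devices for a single library version that all devices can load.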
Nakasaka, Masato
ac9d59cf69 Fixed wrong structure ID 2025-09-17 16:59:23 +09:00
Nakasaka, Masato
45430ded4b Fixed missing members in Vulkan header
also added zero clear for some structs
2025-09-17 16:04:43 +09:00
Nakasaka, Masato
6cf4e0a7c8 added missing NL 2025-09-17 15:21:24 +09:00
Nakasaka, Masato
73441c9780 Removed unneeded function call
Somehow removing this call fixed the crashing when Vulkan header was removed
2025-09-17 15:11:13 +09:00
Nakasaka, Masato
882278a258 Merge remote-tracking branch 'vk-upstream/vulkanV3' into remove-vulkan-header 2025-09-17 09:24:06 +09:00
Inforithmics
176d30744e fixing lint error 2025-09-16 22:48:24 +02:00
Inforithmics
0d4f3341c3 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-16 22:15:31 +02:00
Beshoy Girgis
a1cff89b30 fix: fix CUDA detection for older GPUs (#12300)
Prioritize GPU compute capability over driver version to ensure
Pascal GPUs (CC 6.1) use compatible CUDA v12 libraries instead of v13.
2025-09-16 07:47:06 -07:00
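One way to picture the ordering described in the commit above is to check compute capability first and consult the driver only afterwards. The sketch below is illustrative only: `computeCapability`, `minCCForV13`, `chooseVariant`, and the 7.5 threshold are assumptions made for the example, not values taken from the commit.

```go
package main

import "fmt"

// computeCapability is a hypothetical major.minor pair, e.g. 6.1 for Pascal.
type computeCapability struct {
	Major, Minor int
}

// less reports whether c is an older capability than o.
func (c computeCapability) less(o computeCapability) bool {
	if c.Major != o.Major {
		return c.Major < o.Major
	}
	return c.Minor < o.Minor
}

// minCCForV13 is an assumed threshold used only for this illustration.
var minCCForV13 = computeCapability{Major: 7, Minor: 5}

// chooseVariant picks a library variant. Compute capability is checked
// first, so an older GPU gets v12 even when the driver could run v13.
// driverCUDAMajor is assumed to be the newest CUDA major the driver supports.
func chooseVariant(cc computeCapability, driverCUDAMajor int) string {
	if cc.less(minCCForV13) {
		return "v12"
	}
	if driverCUDAMajor >= 13 {
		return "v13"
	}
	return "v12"
}

func main() {
	pascal := computeCapability{Major: 6, Minor: 1}
	fmt.Println(chooseVariant(pascal, 13))
}
```

Checking the driver first would hand a Pascal card the v13 variant whenever a new driver is installed, which is the failure the commit message describes.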
Nakasaka, Masato
7a6b09ebae Removed unused code
Fix linter error in CI
2025-09-16 17:18:49 +09:00
Masato Nakasaka
ede4081253 Fix compile error in Mac
Metal is preferred so we're disabling Vulkan for now
2025-09-16 17:00:17 +09:00
Nakasaka, Masato
da466f4f86 Copied minimal definition from vulkan header 2025-09-16 15:05:54 +09:00
Inforithmics
69ed26c93b Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-11 18:30:21 +02:00
Daniel Hiltgen
17a023f34b Add v12 + v13 cuda support (#12000)
* Add support for upcoming NVIDIA Jetsons

The latest Jetsons with JetPack 7 are moving to an SBSA compatible model and
will not require building a JetPack specific variant.

* cuda: bring back dual versions

This adds back dual CUDA versions for our releases,
with v11 and v13 to cover a broad set of GPUs and
driver versions.

* win: break up native builds in build_windows.ps1

* v11 build working on windows and linux

* switch to cuda v12.8 not JIT

* Set CUDA compression to size

* enhance manual install linux docs
2025-09-10 12:05:18 -07:00
Masato Nakasaka
ec7628f853 added Vulkan API to get correct Device UUID
current UUID from pipelineCacheUUID does not match CUDA
2025-09-09 17:11:50 +09:00
Inforithmics
d97c2ab8b9 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-06 20:16:05 +02:00
Xiaodong Ye
603d3ab0ca vulkan: get GPU ID (ollama v0.11.5)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-09-06 20:11:06 +02:00
Inforithmics
8300a55e1d Fix Unit Test (Add Vulkan Library) 2025-08-30 20:26:53 +02:00
Daniel Hiltgen
ead4a9a1d0 Always filter devices (#12108)
* Always filter devices

Avoid crashing on unsupported AMD iGPUs

* Remove cuda device filtering

This interferes with mixed setups
2025-08-29 12:17:31 -07:00
Masato Nakasaka
af5f5bdf60 Removed libcap related code
libcap is not directly related to Vulkan and should be added in its own PR. It adds library dependencies to the build and requires users to run setcap or run ollama as root, which is not ideal for ease of use.
2025-08-27 11:51:53 +09:00
Jesse Gross
d5a0d8d904 llm: New memory management
This changes the memory allocation strategy from upfront estimation to
tracking actual allocations done by the engine and reacting to that. The
goal is to avoid issues caused by both under-estimation (crashing) and
over-estimation (low performance due to under-utilized GPUs).

It is currently opt-in and can be enabled for models running on the
Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other
cases is unchanged and will continue to use the existing estimates.
2025-08-14 15:24:01 -07:00
Inforithmics
56050ad8ea Fix logging 2025-08-14 22:42:30 +02:00
Inforithmics
d71c83f2ba Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-14 22:11:08 +02:00
Daniel Hiltgen
837379a94c discovery: fix cudart driver version (#11614)
We prefer the nvcuda library, which reports driver versions. When we
dropped cuda v11, we added a safety check for too-old drivers.  What
we missed was the cudart fallback discovery logic didn't have driver
version wired up.  This fixes cudart discovery to expose the driver
version as well so we no longer reject all GPUs if nvcuda didn't work.
2025-08-13 15:43:33 -07:00
Inforithmics
49c4d154ae Enable Vulkan Flash attention in FlashAttentionSupported 2025-08-12 21:55:19 +02:00
Inforithmics
e6da524ab7 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-12 21:51:39 +02:00
Jesse Gross
8f4ec9ab28 discover: CPU supports flash attention
We already run flash attention on CPUs in cases where we have
partial offloading but were disabling it if running on pure CPU,
which is unnecessary.
2025-08-11 15:00:34 -07:00
Inforithmics
0c27f472e7 Remove commented out code 2025-08-11 18:52:43 +02:00
Inforithmics
d1f74e17d4 Update gpu.go 2025-08-10 21:28:59 +02:00
Inforithmics
f6dd7070de vk_check_flash_attention 0 means supported 2025-08-10 21:22:26 +02:00
Inforithmics
ee24b967f1 fixed flash attention logic enabling 2025-08-10 19:57:14 +02:00
Inforithmics
a1393414ce revert remove parenthesis 2025-08-10 17:54:13 +02:00
Inforithmics
5270c4c5f7 enable flash attention on vulkan 2025-08-10 16:53:13 +02:00
Thomas Stocker
29b1ed0077 Revert whitespace changes in gpu.go 2025-08-09 22:30:13 +02:00
Thomas Stocker
57270767ac Remove flashattention setting gpu.go 2025-08-09 22:26:54 +02:00
Thomas Stocker
42463fbb7f Revert changes in amd_linux.go 2025-08-09 22:24:33 +02:00
Thomas Stocker
89ac91099d Revert changes in amd_linux.go 2025-08-09 22:23:00 +02:00
Inforithmics
f8ed1541ed Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-09 21:59:30 +02:00
Sajal Kulshreshtha
ff89ba90bc fixing broken AMD driver link (#11579) 2025-07-30 12:02:54 -07:00