Commit Graph

82 Commits

Author SHA1 Message Date
Inforithmics a7ddd0e2ae gofumpt fix 2025-09-26 22:15:58 +02:00
Inforithmics 82f0c7e6a5 ask for supported first 2025-09-25 08:47:04 +02:00
Inforithmics 05bdfedb56 Handle GGML_VK_VISIBLE_DEVICES 2025-09-25 08:23:13 +02:00
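For context, GGML_VK_VISIBLE_DEVICES restricts which Vulkan devices discovery exposes, in the spirit of CUDA_VISIBLE_DEVICES. Below is a minimal Go sketch of that kind of filter, assuming the variable holds a comma-separated list of device indices; the package, helper names, and types are hypothetical, not Ollama's actual code.

```go
package vkfilter

import (
	"os"
	"strconv"
	"strings"
)

// visibleDeviceIndices parses GGML_VK_VISIBLE_DEVICES, assumed here to be a
// comma-separated list of Vulkan device indices (e.g. "0,2"). An unset or
// empty variable means "no restriction". Hypothetical helper for illustration.
func visibleDeviceIndices() (map[int]bool, bool) {
	val := strings.TrimSpace(os.Getenv("GGML_VK_VISIBLE_DEVICES"))
	if val == "" {
		return nil, false
	}
	visible := make(map[int]bool)
	for _, field := range strings.Split(val, ",") {
		if idx, err := strconv.Atoi(strings.TrimSpace(field)); err == nil {
			visible[idx] = true
		}
	}
	return visible, true
}

// filterByIndex keeps only the devices whose position appears in the set.
func filterByIndex[T any](devices []T, visible map[int]bool) []T {
	kept := make([]T, 0, len(devices))
	for i, d := range devices {
		if visible[i] {
			kept = append(kept, d)
		}
	}
	return kept
}
```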
Inforithmics a7e2d21f59 vk_check_flash_attention is not needed (coopmat2, coopmat and scalar implementations exist) 2025-09-25 06:33:15 +02:00
Inforithmics 3a45922c01 Test if Vulkan device is supported 2025-09-25 03:22:01 +02:00
Inforithmics 1cb70716bf revert debug code 2025-09-20 15:26:24 +02:00
Inforithmics d26d920fb2 Filter out already supported gpus 2025-09-20 15:18:39 +02:00
Nakasaka, Masato d0b5247084 Fixed Vulkan header 2025-09-18 08:40:52 +09:00
    More aligned with official header definition now
Inforithmics 15eef5cc87 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-17 23:06:02 +02:00
Daniel Hiltgen 9c5bf342bc fix: multi-cuda version skew (#12318) 2025-09-17 13:05:09 -07:00
    Ensure that in a version-skewed multi-CUDA setup we use the lowest version for all GPUs
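A minimal sketch of the selection that fix describes, picking the lowest CUDA version present across GPUs so a skewed setup loads one consistent set of libraries. The type and function names are assumptions, not Ollama's structures.

```go
package cudaskew

// cudaVersion is an assumed (major, minor) pair for illustration.
type cudaVersion struct {
	Major, Minor int
}

// lowestCommonVersion returns the smallest version across all GPUs so that a
// version-skewed multi-GPU setup uses the same (lowest) version everywhere.
func lowestCommonVersion(perGPU []cudaVersion) (cudaVersion, bool) {
	if len(perGPU) == 0 {
		return cudaVersion{}, false
	}
	low := perGPU[0]
	for _, v := range perGPU[1:] {
		if v.Major < low.Major || (v.Major == low.Major && v.Minor < low.Minor) {
			low = v
		}
	}
	return low, true
}
```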
Nakasaka, Masato ac9d59cf69 Fixed wrong structure ID 2025-09-17 16:59:23 +09:00
Nakasaka, Masato 45430ded4b Fixed missing members in Vulkan header 2025-09-17 16:04:43 +09:00
    Also added zero-clearing for some structs
Nakasaka, Masato 6cf4e0a7c8 added missing NL 2025-09-17 15:21:24 +09:00
Nakasaka, Masato 73441c9780 Removed unneeded function call 2025-09-17 15:11:13 +09:00
    Somehow removing this call fixed the crash when the Vulkan header was removed
Nakasaka, Masato 882278a258 Merge remote-tracking branch 'vk-upstream/vulkanV3' into remove-vulkan-header 2025-09-17 09:24:06 +09:00
Inforithmics 176d30744e fixing lint error 2025-09-16 22:48:24 +02:00
Inforithmics 0d4f3341c3 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-16 22:15:31 +02:00
Beshoy Girgis a1cff89b30 fix: fix CUDA detection for older GPUs (#12300) 2025-09-16 07:47:06 -07:00
    Prioritize GPU compute capability over driver version to ensure Pascal GPUs (CC 6.1) use compatible CUDA v12 libraries instead of v13.
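The idea in #12300, roughly sketched below: consult the GPU's compute capability before the driver version, so a Pascal-class GPU (CC 6.1) lands on the v12 libraries even when the driver could otherwise run v13. The cutoff constant and field names are assumptions for illustration only.

```go
package cudadetect

// gpuInfo is a simplified stand-in for discovery data; real fields differ.
type gpuInfo struct {
	ComputeMajor, ComputeMinor int
	DriverSupportsV13          bool // whether the installed driver is new enough for v13
}

// minComputeMajorForV13 is an assumed cutoff for this sketch, not the exact
// value used by Ollama or the CUDA toolkit.
const minComputeMajorForV13 = 7

// cudaVariant picks the bundled CUDA library variant, giving compute
// capability priority over driver version.
func cudaVariant(g gpuInfo) string {
	if g.ComputeMajor < minComputeMajorForV13 {
		return "v12" // older architectures (e.g. Pascal, CC 6.1) stay on v12
	}
	if g.DriverSupportsV13 {
		return "v13"
	}
	return "v12"
}
```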
Nakasaka, Masato 7a6b09ebae Removed unused code 2025-09-16 17:18:49 +09:00
    Fix linter error in CI
Masato Nakasaka ede4081253 Fix compile error on Mac 2025-09-16 17:00:17 +09:00
    Metal is preferred, so we're disabling Vulkan for now
Nakasaka, Masato da466f4f86 Copied minimal definition from vulkan header 2025-09-16 15:05:54 +09:00
Inforithmics 69ed26c93b Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-11 18:30:21 +02:00
Daniel Hiltgen 17a023f34b Add v12 + v13 cuda support (#12000) 2025-09-10 12:05:18 -07:00
    * Add support for upcoming NVIDIA Jetsons
      The latest Jetsons with JetPack 7 are moving to an SBSA compatible model and will not require building a JetPack specific variant.
    * cuda: bring back dual versions
      This adds back dual CUDA versions for our releases, with v11 and v13 to cover a broad set of GPUs and driver versions.
    * win: break up native builds in build_windows.ps1
    * v11 build working on windows and linux
    * switch to cuda v12.8 not JIT
    * Set CUDA compression to size
    * enhance manual install linux docs
Masato Nakasaka ec7628f853 added Vulkan API to get correct Device UUID 2025-09-09 17:11:50 +09:00
    current UUID from pipelineCacheUUID does not match CUDA
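For context: the Vulkan device-ID properties expose a 16-byte device UUID that can be compared against the UUID CUDA reports for the same GPU, whereas pipelineCacheUUID cannot. A small Go sketch of formatting such a UUID for comparison follows; it is illustrative only and not Ollama's actual matching code.

```go
package vkuuid

import "fmt"

// formatDeviceUUID renders a 16-byte deviceUUID (as reported by Vulkan's
// device-ID properties) in the canonical 8-4-4-4-12 hex form commonly used
// when comparing GPU UUIDs across APIs. Sketch only.
func formatDeviceUUID(uuid [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		uuid[0:4], uuid[4:6], uuid[6:8], uuid[8:10], uuid[10:16])
}
```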
Inforithmics d97c2ab8b9 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-06 20:16:05 +02:00
Xiaodong Ye 603d3ab0ca vulkan: get GPU ID (ollama v0.11.5) 2025-09-06 20:11:06 +02:00
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Inforithmics 8300a55e1d Fix Unit Test (Add Vulkan Library) 2025-08-30 20:26:53 +02:00
Daniel Hiltgen ead4a9a1d0 Always filter devices (#12108) 2025-08-29 12:17:31 -07:00
    * Always filter devices
      Avoid crashing on unsupported AMD iGPUs
    * Remove cuda device filtering
      This interferes with mixed setups
Masato Nakasaka af5f5bdf60 Removed libcap related code 2025-08-27 11:51:53 +09:00
    libcap is not directly related to Vulkan and should be added in its own PR. It adds additional library dependencies for building and also requires users to run setcap or run ollama as root, which is not ideal for easy use.
Jesse Gross d5a0d8d904 llm: New memory management 2025-08-14 15:24:01 -07:00
    This changes the memory allocation strategy from upfront estimation to tracking actual allocations done by the engine and reacting to that. The goal is to avoid issues caused by both under-estimation (crashing) and over-estimation (low performance due to under-utilized GPUs).

    It is currently opt-in and can be enabled for models running on the Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other cases is unchanged and will continue to use the existing estimates.
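Since the new behavior is opt-in via an environment variable, here is a trivial sketch of the gate; it is simplified (only the exact value "1" is honored) and the package and function names are not Ollama's.

```go
package memfit

import "os"

// newEstimatesEnabled reports whether the opt-in allocation-tracking path
// described in the commit above was requested via OLLAMA_NEW_ESTIMATES=1.
func newEstimatesEnabled() bool {
	return os.Getenv("OLLAMA_NEW_ESTIMATES") == "1"
}
```

Per the commit message, enabling it is then just a matter of setting OLLAMA_NEW_ESTIMATES=1 before starting the server; everything else keeps the existing estimates.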
Inforithmics 56050ad8ea Fix logging 2025-08-14 22:42:30 +02:00
Inforithmics d71c83f2ba Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-14 22:11:08 +02:00
Daniel Hiltgen 837379a94c discovery: fix cudart driver version (#11614) 2025-08-13 15:43:33 -07:00
    We prefer the nvcuda library, which reports driver versions. When we dropped cuda v11, we added a safety check for too-old drivers. What we missed was that the cudart fallback discovery logic didn't have the driver version wired up. This fixes cudart discovery to expose the driver version as well, so we no longer reject all GPUs if nvcuda didn't work.
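A rough sketch of the flow that commit describes: prefer nvcuda, fall back to cudart, and make sure the fallback path also reports a driver version so the too-old-driver check downstream has something to compare against. All names here are hypothetical, not Ollama's discovery code.

```go
package cudadiscovery

import "errors"

// deviceList is a stand-in for the discovery result; the important part for
// this sketch is that it carries the driver version.
type deviceList struct {
	DriverMajor, DriverMinor int
	Count                    int
}

var errDriverUnknown = errors.New("discovery did not report a driver version")

// discover prefers nvcuda (which reports the driver version) and falls back
// to cudart. The fallback must also populate DriverMajor/DriverMinor;
// otherwise the downstream too-old-driver check would reject all GPUs.
func discover(viaNvcuda, viaCudart func() (deviceList, error)) (deviceList, error) {
	if devs, err := viaNvcuda(); err == nil {
		return devs, nil
	}
	devs, err := viaCudart()
	if err != nil {
		return deviceList{}, err
	}
	if devs.DriverMajor == 0 && devs.DriverMinor == 0 {
		return deviceList{}, errDriverUnknown
	}
	return devs, nil
}
```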
Inforithmics 49c4d154ae Enable Vulkan Flash attention in FlashAttentionSupported 2025-08-12 21:55:19 +02:00
Inforithmics e6da524ab7 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-12 21:51:39 +02:00
Jesse Gross 8f4ec9ab28 discover: CPU supports flash attention 2025-08-11 15:00:34 -07:00
    We already run flash attention on CPUs in cases where we have partial offloading but were disabling it if running on pure CPU, which is unnecessary.
Inforithmics 0c27f472e7 Remove commented out code 2025-08-11 18:52:43 +02:00
Inforithmics d1f74e17d4 Update gpu.go 2025-08-10 21:28:59 +02:00
Inforithmics f6dd7070de vk_check_flash_attention 0 means supported 2025-08-10 21:22:26 +02:00
Inforithmics ee24b967f1 fixed flash attention enabling logic 2025-08-10 19:57:14 +02:00
Inforithmics a1393414ce revert remove parenthesis 2025-08-10 17:54:13 +02:00
Inforithmics 5270c4c5f7 enable flash attention on vulkan 2025-08-10 16:53:13 +02:00
Thomas Stocker 29b1ed0077 Revert whitespace changes in gpu.go 2025-08-09 22:30:13 +02:00
Thomas Stocker 57270767ac Remove flashattention setting gpu.go 2025-08-09 22:26:54 +02:00
Thomas Stocker 42463fbb7f Revert changes in amd_linux.go 2025-08-09 22:24:33 +02:00
Thomas Stocker 89ac91099d Revert changes in amd_linux.go 2025-08-09 22:23:00 +02:00
Inforithmics f8ed1541ed Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-09 21:59:30 +02:00
Sajal Kulshreshtha ff89ba90bc fixing broken AMD driver link (#11579) 2025-07-30 12:02:54 -07:00
Daniel Hiltgen 1c6669e64c Re-remove cuda v11 (#10694) 2025-06-23 14:07:00 -07:00
    * Re-remove cuda v11
      Revert the revert - drop v11 support, requiring drivers newer than Feb 23
      This reverts commit c6bcdc4223.
    * Simplify layout
      With only one version of the GPU libraries, we can simplify things down somewhat. (Jetsons still require special handling)
    * distinct sbsa variant for linux arm64
      This avoids accidentally trying to load the sbsa cuda libraries on a jetson system, which results in crashes.
    * temporarily prevent rocm+cuda mixed loading
Daniel Hiltgen c6bcdc4223 Revert "remove cuda v11 (#10569)" (#10692) 2025-05-13 13:12:54 -07:00
    Bring back v11 until we can better warn users that their driver is too old.
    This reverts commit fa393554b9.