Commit Graph

82 Commits

Author SHA1 Message Date
Inforithmics a7ddd0e2ae gofumpt fix 2025-09-26 22:15:58 +02:00
Inforithmics 82f0c7e6a5 ask for supported first 2025-09-25 08:47:04 +02:00
Inforithmics 05bdfedb56 Handle GGML_VK_VISIBLE_DEVICES 2025-09-25 08:23:13 +02:00
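For context, GGML_VK_VISIBLE_DEVICES restricts which Vulkan devices discovery exposes, in the spirit of CUDA_VISIBLE_DEVICES. Below is a minimal Go sketch of that kind of filter, assuming the variable holds a comma-separated list of device indices; the package, helper names, and types are hypothetical, not Ollama's actual code.

```go
package vkfilter

import (
	"os"
	"strconv"
	"strings"
)

// visibleDeviceIndices parses GGML_VK_VISIBLE_DEVICES, assumed here to be a
// comma-separated list of Vulkan device indices (e.g. "0,2"). An unset or
// empty variable means "no restriction". Hypothetical helper for illustration.
func visibleDeviceIndices() (map[int]bool, bool) {
	val := strings.TrimSpace(os.Getenv("GGML_VK_VISIBLE_DEVICES"))
	if val == "" {
		return nil, false
	}
	visible := make(map[int]bool)
	for _, field := range strings.Split(val, ",") {
		if idx, err := strconv.Atoi(strings.TrimSpace(field)); err == nil {
			visible[idx] = true
		}
	}
	return visible, true
}

// filterByIndex keeps only the devices whose position appears in the set.
func filterByIndex[T any](devices []T, visible map[int]bool) []T {
	kept := make([]T, 0, len(devices))
	for i, d := range devices {
		if visible[i] {
			kept = append(kept, d)
		}
	}
	return kept
}
```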
Inforithmics a7e2d21f59 vk_check_flash_attention is not needed (coopmat2, coopmat and scalar implementations exist) 2025-09-25 06:33:15 +02:00
Inforithmics 3a45922c01 Test if Vulkan device is supported 2025-09-25 03:22:01 +02:00
Inforithmics 1cb70716bf revert debug code 2025-09-20 15:26:24 +02:00
Inforithmics d26d920fb2 Filter out already supported gpus 2025-09-20 15:18:39 +02:00
Nakasaka, Masato d0b5247084 Fixed Vulkan header 2025-09-18 08:40:52 +09:00
    More aligned with official header definition now
Inforithmics 15eef5cc87 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-17 23:06:02 +02:00
Daniel Hiltgen 9c5bf342bc fix: multi-cuda version skew (#12318) 2025-09-17 13:05:09 -07:00
    Ensure that in a version-skewed multi-CUDA setup we use the lowest version for all GPUs
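A minimal sketch of the selection that fix describes, picking the lowest CUDA version present across GPUs so a skewed setup loads one consistent set of libraries. The type and function names are assumptions, not Ollama's structures.

```go
package cudaskew

// cudaVersion is an assumed (major, minor) pair for illustration.
type cudaVersion struct {
	Major, Minor int
}

// lowestCommonVersion returns the smallest version across all GPUs so that a
// version-skewed multi-GPU setup uses the same (lowest) version everywhere.
func lowestCommonVersion(perGPU []cudaVersion) (cudaVersion, bool) {
	if len(perGPU) == 0 {
		return cudaVersion{}, false
	}
	low := perGPU[0]
	for _, v := range perGPU[1:] {
		if v.Major < low.Major || (v.Major == low.Major && v.Minor < low.Minor) {
			low = v
		}
	}
	return low, true
}
```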
Nakasaka, Masato ac9d59cf69 Fixed wrong structure ID 2025-09-17 16:59:23 +09:00
Nakasaka, Masato 45430ded4b Fixed missing members in Vulkan header 2025-09-17 16:04:43 +09:00
    Also added zero-clearing for some structs
Nakasaka, Masato 6cf4e0a7c8 added missing NL 2025-09-17 15:21:24 +09:00
Nakasaka, Masato 73441c9780 Removed unneeded function call 2025-09-17 15:11:13 +09:00
    Somehow removing this call fixed the crash when the Vulkan header was removed
Nakasaka, Masato 882278a258 Merge remote-tracking branch 'vk-upstream/vulkanV3' into remove-vulkan-header 2025-09-17 09:24:06 +09:00
Inforithmics 176d30744e fixing lint error 2025-09-16 22:48:24 +02:00
Inforithmics 0d4f3341c3 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-16 22:15:31 +02:00
Beshoy Girgis a1cff89b30 fix: fix CUDA detection for older GPUs (#12300) 2025-09-16 07:47:06 -07:00
    Prioritize GPU compute capability over driver version to ensure Pascal GPUs (CC 6.1) use compatible CUDA v12 libraries instead of v13.
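The idea in #12300, roughly sketched below: consult the GPU's compute capability before the driver version, so a Pascal-class GPU (CC 6.1) lands on the v12 libraries even when the driver could otherwise run v13. The cutoff constant and field names are assumptions for illustration only.

```go
package cudadetect

// gpuInfo is a simplified stand-in for discovery data; real fields differ.
type gpuInfo struct {
	ComputeMajor, ComputeMinor int
	DriverSupportsV13          bool // whether the installed driver is new enough for v13
}

// minComputeMajorForV13 is an assumed cutoff for this sketch, not the exact
// value used by Ollama or the CUDA toolkit.
const minComputeMajorForV13 = 7

// cudaVariant picks the bundled CUDA library variant, giving compute
// capability priority over driver version.
func cudaVariant(g gpuInfo) string {
	if g.ComputeMajor < minComputeMajorForV13 {
		return "v12" // older architectures (e.g. Pascal, CC 6.1) stay on v12
	}
	if g.DriverSupportsV13 {
		return "v13"
	}
	return "v12"
}
```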
Nakasaka, Masato 7a6b09ebae Removed unused code 2025-09-16 17:18:49 +09:00
    Fix linter error in CI
Masato Nakasaka ede4081253 Fix compile error on Mac 2025-09-16 17:00:17 +09:00
    Metal is preferred, so we're disabling Vulkan for now
Nakasaka, Masato da466f4f86 Copied minimal definition from vulkan header 2025-09-16 15:05:54 +09:00
Inforithmics 69ed26c93b Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-11 18:30:21 +02:00
Daniel Hiltgen 17a023f34b Add v12 + v13 cuda support (#12000) 2025-09-10 12:05:18 -07:00
    * Add support for upcoming NVIDIA Jetsons
      The latest Jetsons with JetPack 7 are moving to an SBSA compatible model and will not require building a JetPack specific variant.
    * cuda: bring back dual versions
      This adds back dual CUDA versions for our releases, with v11 and v13 to cover a broad set of GPUs and driver versions.
    * win: break up native builds in build_windows.ps1
    * v11 build working on windows and linux
    * switch to cuda v12.8 not JIT
    * Set CUDA compression to size
    * enhance manual install linux docs
Masato Nakasaka ec7628f853 added Vulkan API to get correct Device UUID 2025-09-09 17:11:50 +09:00
    current UUID from pipelineCacheUUID does not match CUDA
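For context: the Vulkan device-ID properties expose a 16-byte device UUID that can be compared against the UUID CUDA reports for the same GPU, whereas pipelineCacheUUID cannot. A small Go sketch of formatting such a UUID for comparison follows; it is illustrative only and not Ollama's actual matching code.

```go
package vkuuid

import "fmt"

// formatDeviceUUID renders a 16-byte deviceUUID (as reported by Vulkan's
// device-ID properties) in the canonical 8-4-4-4-12 hex form commonly used
// when comparing GPU UUIDs across APIs. Sketch only.
func formatDeviceUUID(uuid [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		uuid[0:4], uuid[4:6], uuid[6:8], uuid[8:10], uuid[10:16])
}
```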
Inforithmics d97c2ab8b9 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-06 20:16:05 +02:00
Xiaodong Ye 603d3ab0ca vulkan: get GPU ID (ollama v0.11.5) 2025-09-06 20:11:06 +02:00
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Inforithmics 8300a55e1d Fix Unit Test (Add Vulkan Library) 2025-08-30 20:26:53 +02:00
Daniel Hiltgen ead4a9a1d0 Always filter devices (#12108) 2025-08-29 12:17:31 -07:00
    * Always filter devices
      Avoid crashing on unsupported AMD iGPUs
    * Remove cuda device filtering
      This interferes with mixed setups
Masato Nakasaka af5f5bdf60 Removed libcap related code 2025-08-27 11:51:53 +09:00
    libcap is not directly related to Vulkan and should be added in its own PR. It adds additional library dependencies for building and also requires users to run setcap or run ollama as root, which is not ideal for easy use.
Jesse Gross d5a0d8d904 llm: New memory management 2025-08-14 15:24:01 -07:00
    This changes the memory allocation strategy from upfront estimation to tracking actual allocations done by the engine and reacting to that. The goal is to avoid issues caused by both under-estimation (crashing) and over-estimation (low performance due to under-utilized GPUs).

    It is currently opt-in and can be enabled for models running on the Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other cases is unchanged and will continue to use the existing estimates.
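Since the new behavior is opt-in via an environment variable, here is a trivial sketch of the gate; it is simplified (only the exact value "1" is honored) and the package and function names are not Ollama's.

```go
package memfit

import "os"

// newEstimatesEnabled reports whether the opt-in allocation-tracking path
// described in the commit above was requested via OLLAMA_NEW_ESTIMATES=1.
func newEstimatesEnabled() bool {
	return os.Getenv("OLLAMA_NEW_ESTIMATES") == "1"
}
```

Per the commit message, enabling it is then just a matter of setting OLLAMA_NEW_ESTIMATES=1 before starting the server; everything else keeps the existing estimates.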
Inforithmics 56050ad8ea Fix logging 2025-08-14 22:42:30 +02:00
Inforithmics d71c83f2ba Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-14 22:11:08 +02:00
Daniel Hiltgen 837379a94c discovery: fix cudart driver version (#11614) 2025-08-13 15:43:33 -07:00
    We prefer the nvcuda library, which reports driver versions. When we dropped cuda v11, we added a safety check for too-old drivers. What we missed was that the cudart fallback discovery logic didn't have the driver version wired up. This fixes cudart discovery to expose the driver version as well, so we no longer reject all GPUs if nvcuda didn't work.
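A rough sketch of the flow that commit describes: prefer nvcuda, fall back to cudart, and make sure the fallback path also reports a driver version so the too-old-driver check downstream has something to compare against. All names here are hypothetical, not Ollama's discovery code.

```go
package cudadiscovery

import "errors"

// deviceList is a stand-in for the discovery result; the important part for
// this sketch is that it carries the driver version.
type deviceList struct {
	DriverMajor, DriverMinor int
	Count                    int
}

var errDriverUnknown = errors.New("discovery did not report a driver version")

// discover prefers nvcuda (which reports the driver version) and falls back
// to cudart. The fallback must also populate DriverMajor/DriverMinor;
// otherwise the downstream too-old-driver check would reject all GPUs.
func discover(viaNvcuda, viaCudart func() (deviceList, error)) (deviceList, error) {
	if devs, err := viaNvcuda(); err == nil {
		return devs, nil
	}
	devs, err := viaCudart()
	if err != nil {
		return deviceList{}, err
	}
	if devs.DriverMajor == 0 && devs.DriverMinor == 0 {
		return deviceList{}, errDriverUnknown
	}
	return devs, nil
}
```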
Inforithmics 49c4d154ae Enable Vulkan Flash attention in FlashAttentionSupported 2025-08-12 21:55:19 +02:00
Inforithmics e6da524ab7 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-12 21:51:39 +02:00
Jesse Gross 8f4ec9ab28 discover: CPU supports flash attention 2025-08-11 15:00:34 -07:00
    We already run flash attention on CPUs in cases where we have partial offloading but were disabling it if running on pure CPU, which is unnecessary.
Inforithmics 0c27f472e7 Remove commented out code 2025-08-11 18:52:43 +02:00
Inforithmics d1f74e17d4 Update gpu.go 2025-08-10 21:28:59 +02:00
Inforithmics f6dd7070de vk_check_flash_attention 0 means supported 2025-08-10 21:22:26 +02:00
Inforithmics ee24b967f1 fixed flash attention enabling logic 2025-08-10 19:57:14 +02:00
Inforithmics a1393414ce revert remove parenthesis 2025-08-10 17:54:13 +02:00
Inforithmics 5270c4c5f7 enable flash attention on vulkan 2025-08-10 16:53:13 +02:00
Thomas Stocker 29b1ed0077 Revert whitespace changes in gpu.go 2025-08-09 22:30:13 +02:00
Thomas Stocker 57270767ac Remove flashattention setting gpu.go 2025-08-09 22:26:54 +02:00
Thomas Stocker 42463fbb7f Revert changes in amd_linux.go 2025-08-09 22:24:33 +02:00
Thomas Stocker 89ac91099d Revert changes in amd_linux.go 2025-08-09 22:23:00 +02:00
Inforithmics f8ed1541ed Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-09 21:59:30 +02:00
Sajal Kulshreshtha ff89ba90bc fixing broken AMD driver link (#11579) 2025-07-30 12:02:54 -07:00
Daniel Hiltgen 1c6669e64c Re-remove cuda v11 (#10694) 2025-06-23 14:07:00 -07:00
    * Re-remove cuda v11
      Revert the revert - drop v11 support, requiring drivers newer than Feb 23
      This reverts commit c6bcdc4223.
    * Simplify layout
      With only one version of the GPU libraries, we can simplify things down somewhat. (Jetsons still require special handling)
    * distinct sbsa variant for linux arm64
      This avoids accidentally trying to load the sbsa cuda libraries on a jetson system, which results in crashes.
    * temporarily prevent rocm+cuda mixed loading
Daniel Hiltgen c6bcdc4223 Revert "remove cuda v11 (#10569)" (#10692) 2025-05-13 13:12:54 -07:00
    Bring back v11 until we can better warn users that their driver is too old.
    This reverts commit fa393554b9.