ollama

Commit Graph

Author	SHA1	Message	Date
Nakasaka, Masato	d0b5247084	Fixed Vulkan header More aligned with official header definition now	2025-09-18 08:40:52 +09:00
Inforithmics	15eef5cc87	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-09-17 23:06:02 +02:00
Daniel Hiltgen	9c5bf342bc	fix: multi-cuda version skew (#12318) Ensure that in a version skewed multi-cuda setup we use the lowest version for all GPUs	2025-09-17 13:05:09 -07:00
Nakasaka, Masato	ac9d59cf69	Fixed wrong structure ID	2025-09-17 16:59:23 +09:00
Nakasaka, Masato	45430ded4b	Fixed missing members in Vulkan header also added zero clear for some structs	2025-09-17 16:04:43 +09:00
Nakasaka, Masato	6cf4e0a7c8	added missing NL	2025-09-17 15:21:24 +09:00
Nakasaka, Masato	73441c9780	Removed unneeded function call Somehow removing this call fixed the crashing when Vulkan header was removed	2025-09-17 15:11:13 +09:00
Nakasaka, Masato	882278a258	Merge remote-tracking branch 'vk-upstream/vulkanV3' into remove-vulkan-header	2025-09-17 09:24:06 +09:00
Inforithmics	176d30744e	fixing lint error	2025-09-16 22:48:24 +02:00
Inforithmics	0d4f3341c3	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-09-16 22:15:31 +02:00
Beshoy Girgis	a1cff89b30	fix: fix CUDA detection for older GPUs (#12300) Prioritize GPU compute capability over driver version to ensure Pascal GPUs (CC 6.1) use compatible CUDA v12 libraries instead of v13.	2025-09-16 07:47:06 -07:00
Nakasaka, Masato	7a6b09ebae	Removed unused code Fix linter error in CI	2025-09-16 17:18:49 +09:00
Masato Nakasaka	ede4081253	Fix compile error in Mac Metal is preferred so we're disabling Vulkan for now	2025-09-16 17:00:17 +09:00
Nakasaka, Masato	da466f4f86	Copied minimal definition from vulkan header	2025-09-16 15:05:54 +09:00
Inforithmics	69ed26c93b	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-09-11 18:30:21 +02:00
Daniel Hiltgen	17a023f34b	Add v12 + v13 cuda support (#12000) * Add support for upcoming NVIDIA Jetsons The latest Jetsons with JetPack 7 are moving to an SBSA compatible model and will not require building a JetPack specific variant. * cuda: bring back dual versions This adds back dual CUDA versions for our releases, with v11 and v13 to cover a broad set of GPUs and driver versions. * win: break up native builds in build_windows.ps1 * v11 build working on windows and linux * switch to cuda v12.8 not JIT * Set CUDA compression to size * enhance manual install linux docs	2025-09-10 12:05:18 -07:00
Masato Nakasaka	ec7628f853	added Vulkan API to get correct Device UUID current UUID from pipelineCacheUUID does not match CUDA	2025-09-09 17:11:50 +09:00
Inforithmics	d97c2ab8b9	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-09-06 20:16:05 +02:00
Xiaodong Ye	603d3ab0ca	vulkan: get GPU ID (ollama v0.11.5) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-09-06 20:11:06 +02:00
Inforithmics	8300a55e1d	Fix Unit Test (Add Vulkan Library)	2025-08-30 20:26:53 +02:00
Daniel Hiltgen	ead4a9a1d0	Always filter devices (#12108) * Always filter devices Avoid crashing on unsupported AMD iGPUs * Remove cuda device filtering This interferes with mixed setups	2025-08-29 12:17:31 -07:00
Masato Nakasaka	af5f5bdf60	Removed libcap related code libcap is not directly related to Vulkan and should be added by its own PR. It adds additional library dependencies for building and also requires users to run setcap or run ollama as root, which is not ideal for easy use	2025-08-27 11:51:53 +09:00
Jesse Gross	d5a0d8d904	llm: New memory management This changes the memory allocation strategy from upfront estimation to tracking actual allocations done by the engine and reacting to that. The goal is avoid issues caused by both under-estimation (crashing) and over-estimation (low performance due to under-utilized GPUs). It is currently opt-in and can be enabled for models running on the Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other cases is unchanged and will continue to use the existing estimates.	2025-08-14 15:24:01 -07:00
Inforithmics	56050ad8ea	Fix logging	2025-08-14 22:42:30 +02:00
Inforithmics	d71c83f2ba	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-08-14 22:11:08 +02:00
Daniel Hiltgen	837379a94c	discovery: fix cudart driver version (#11614) We prefer the nvcuda library, which reports driver versions. When we dropped cuda v11, we added a safety check for too-old drivers. What we missed was the cudart fallback discovery logic didn't have driver version wired up. This fixes cudart discovery to expose the driver version as well so we no longer reject all GPUs if nvcuda didn't work.	2025-08-13 15:43:33 -07:00
Inforithmics	49c4d154ae	Enable Vulkan Flash attention in FlashAttentionSupported	2025-08-12 21:55:19 +02:00
Inforithmics	e6da524ab7	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-08-12 21:51:39 +02:00
Jesse Gross	8f4ec9ab28	discover: CPU supports flash attention We already run flash attention on CPUs in cases where we have partial offloading but were disabling it if running on pure CPU, which is unnecessary.	2025-08-11 15:00:34 -07:00
Inforithmics	0c27f472e7	Remove commented out code	2025-08-11 18:52:43 +02:00
Inforithmics	d1f74e17d4	Update gpu.go	2025-08-10 21:28:59 +02:00
Inforithmics	f6dd7070de	vk_check_flash_attention 0 means supported	2025-08-10 21:22:26 +02:00
Inforithmics	ee24b967f1	fixed flash attention logic enabling	2025-08-10 19:57:14 +02:00
Inforithmics	a1393414ce	revert remove parenthesis	2025-08-10 17:54:13 +02:00
Inforithmics	5270c4c5f7	enable falsh attention on vulkan	2025-08-10 16:53:13 +02:00
Thomas Stocker	29b1ed0077	Revert whitespace changes in gpu.go	2025-08-09 22:30:13 +02:00
Thomas Stocker	57270767ac	Remove flashattention setting gpu.go	2025-08-09 22:26:54 +02:00
Thomas Stocker	42463fbb7f	Revert changes in amd_linux.go	2025-08-09 22:24:33 +02:00
Thomas Stocker	89ac91099d	Revert changes in amd_linux.go	2025-08-09 22:23:00 +02:00
Inforithmics	f8ed1541ed	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-08-09 21:59:30 +02:00
Sajal Kulshreshtha	ff89ba90bc	fixing broken AMD driver link (#11579)	2025-07-30 12:02:54 -07:00
Daniel Hiltgen	1c6669e64c	Re-remove cuda v11 (#10694) * Re-remove cuda v11 Revert the revert - drop v11 support requiring drivers newer than Feb 23 This reverts commit `c6bcdc4223`. * Simplify layout With only one version of the GPU libraries, we can simplify things down somewhat. (Jetsons still require special handling) * distinct sbsa variant for linux arm64 This avoids accidentally trying to load the sbsa cuda libraries on a jetson system which results in crashes. * temporary prevent rocm+cuda mixed loading	2025-06-23 14:07:00 -07:00
Daniel Hiltgen	c6bcdc4223	Revert "remove cuda v11 (#10569)" (#10692) Bring back v11 until we can better warn users that their driver is too old. This reverts commit `fa393554b9`.	2025-05-13 13:12:54 -07:00
Michael Yang	f95a1f2bef	feat: add trace log level (#10650) reduce prompt log to trace level	2025-05-12 11:43:00 -07:00
Daniel Hiltgen	fa393554b9	remove cuda v11 (#10569) This reduces the size of our Windows installer payloads by ~256M by dropping support for nvidia drivers older than Feb 2023. Hardware support is unchanged. Linux default bundle sizes are reduced by ~600M to 1G.	2025-05-06 17:33:19 -07:00
Michael Yang	95e744beeb	discover: fix compiler warnings (#10572)	2025-05-06 10:49:22 -07:00
Jeffrey Morgan	913905028b	all: fix cgo compiler warnings on windows (#10563)	2025-05-05 08:02:39 -07:00
Bruce MacDonald	9876c9faa4	chore(all): replace instances of interface with any (#10067) Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.	2025-04-02 09:44:27 -07:00
湛露先生	4059a297a6	discover: /proc/cpuinfo file open and close. (#9950) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-03-31 17:07:42 -07:00
Vadim Grinco	45dbd14645	Merged latest ollama 0.6.2 and nasrally's Flash Attention patches (#5) * readme: add Ellama to list of community integrations (#9800) * readme: add screenpipe to community integrations (#9786) * Add support for ROCm gfx1151 (#9773) * conditionally enable parallel pipelines * sample: make mutations in transforms explicit (#9743) * updated minP to use early exit making use of sorted tokens * ml/backend/ggml: allocate memory with malloc when loading model (#9822) * runner: remove cache prompt flag from ollama runner (#9826) We do not need to bypass the prompt caching in the ollama runner yet, as only embedding models needed to bypass the prompt caching. When embedding models are implemented they can skip initializing this cache completely. * ollamarunner: Check for minBatch of context space when shifting Models can specify that a group of inputs need to be handled a single batch. However, context shifting didn't respect this and could trigger a break anyways. In this case, we should instead trigger a context shift earlier so that it occurs before the grouped batch. Note that there still some corner cases: - A long prompt that exceeds the context window can get truncated in the middle of an image. With the current models, this will result in the model not recognizing the image at all, which is pretty much the expected result with truncation. - The context window is set less than the minimum batch size. The only solution to this is to refuse to load the model with these settings. However, this can never occur with current models and default settings. Since users are unlikely to run into these scenarios, fixing them is left as a follow up. * Applied latest patches from McBane87 See this for details: https://github.com/whyvl/ollama-vulkan/issues/7#issuecomment-2708820861 Signed-off-by: Vadim Grinco <vadim@grinco.eu> * Add ability to enable flash attention on vulkan (#4) * discover: add flash attention handling for vulkan * envconfig: fix typo in config.go As part of the process some code was refactored and I added a new field FlashAttention to GpuInfo since the previous solution didn't allow for a granular check via vulkan extensions. As a side effect, this now allows for granular per-device FA support checking in other places --------- Signed-off-by: Vadim Grinco <vadim@grinco.eu> Co-authored-by: zeo <108888572+zeozeozeo@users.noreply.github.com> Co-authored-by: Louis Beaumont <louis.beaumont@gmail.com> Co-authored-by: Daniel Hiltgen <dhiltgen@users.noreply.github.com> Co-authored-by: Michael Yang <mxyng@pm.me> Co-authored-by: Parth Sareen <parth.sareen@ollama.com> Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> Co-authored-by: Jesse Gross <jesse@ollama.com> Co-authored-by: Nikita <50599445+nasrally@users.noreply.github.com>	2025-03-23 12:27:37 +01:00

75 Commits