Commit Graph

58 Commits

Author SHA1 Message Date
Inforithmics d97c2ab8b9 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-09-06 20:16:05 +02:00
Xiaodong Ye 603d3ab0ca vulkan: get GPU ID (ollama v0.11.5)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-09-06 20:11:06 +02:00
Inforithmics 8300a55e1d Fix Unit Test (Add Vulkan Library) 2025-08-30 20:26:53 +02:00
Daniel Hiltgen ead4a9a1d0
Always filter devices (#12108)
* Always filter devices

Avoid crashing on unsupported AMD iGPUs

* Remove cuda device filtering

This interferes with mixed setups
2025-08-29 12:17:31 -07:00
Masato Nakasaka af5f5bdf60 Removed libcap-related code
libcap is not directly related to Vulkan and should be added in its own PR. It adds additional library dependencies for building and also requires users to run setcap or run ollama as root, which is not ideal for ease of use.
2025-08-27 11:51:53 +09:00
Jesse Gross d5a0d8d904 llm: New memory management
This changes the memory allocation strategy from upfront estimation to
tracking actual allocations done by the engine and reacting to that. The
goal is to avoid issues caused by both under-estimation (crashing) and
over-estimation (low performance due to under-utilized GPUs).

It is currently opt-in and can be enabled for models running on the
Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other
cases is unchanged and will continue to use the existing estimates.
2025-08-14 15:24:01 -07:00
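The variable name OLLAMA_NEW_ESTIMATES comes from the commit message above; the following is a minimal Go sketch of how such an opt-in flag is typically read. The helper name and the accepted values are assumptions for illustration, not Ollama's actual code.

```go
package main

import (
	"fmt"
	"os"
)

// newEstimatesEnabled reports whether the opt-in path is requested.
// Only the variable name OLLAMA_NEW_ESTIMATES comes from the commit
// message; this helper is a hypothetical illustration.
func newEstimatesEnabled() bool {
	v := os.Getenv("OLLAMA_NEW_ESTIMATES")
	return v == "1" || v == "true"
}

func main() {
	if newEstimatesEnabled() {
		fmt.Println("tracking actual engine allocations")
	} else {
		fmt.Println("using existing upfront memory estimates")
	}
}
```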
Inforithmics 56050ad8ea Fix logging 2025-08-14 22:42:30 +02:00
Inforithmics d71c83f2ba Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-14 22:11:08 +02:00
Daniel Hiltgen 837379a94c
discovery: fix cudart driver version (#11614)
We prefer the nvcuda library, which reports driver versions. When we
dropped cuda v11, we added a safety check for too-old drivers.  What
we missed was that the cudart fallback discovery logic didn't have the
driver version wired up. This fixes cudart discovery to expose the driver
version as well so we no longer reject all GPUs if nvcuda didn't work.
2025-08-13 15:43:33 -07:00
Inforithmics 49c4d154ae Enable Vulkan Flash attention in FlashAttentionSupported 2025-08-12 21:55:19 +02:00
Inforithmics e6da524ab7 Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-12 21:51:39 +02:00
Jesse Gross 8f4ec9ab28 discover: CPU supports flash attention
We already run flash attention on CPUs in cases where we have
partial offloading but were disabling it if running on pure CPU,
which is unnecessary.
2025-08-11 15:00:34 -07:00
Inforithmics 0c27f472e7 Remove commented out code 2025-08-11 18:52:43 +02:00
Inforithmics d1f74e17d4 Update gpu.go 2025-08-10 21:28:59 +02:00
Inforithmics f6dd7070de vk_check_flash_attention: 0 means supported 2025-08-10 21:22:26 +02:00
Inforithmics ee24b967f1 fixed flash attention enabling logic 2025-08-10 19:57:14 +02:00
Inforithmics a1393414ce revert parenthesis removal 2025-08-10 17:54:13 +02:00
Inforithmics 5270c4c5f7 enable flash attention on vulkan 2025-08-10 16:53:13 +02:00
Thomas Stocker 29b1ed0077
Revert whitespace changes in gpu.go 2025-08-09 22:30:13 +02:00
Thomas Stocker 57270767ac
Remove flashattention setting in gpu.go 2025-08-09 22:26:54 +02:00
Thomas Stocker 42463fbb7f
Revert changes in amd_linux.go 2025-08-09 22:24:33 +02:00
Thomas Stocker 89ac91099d
Revert changes in amd_linux.go 2025-08-09 22:23:00 +02:00
Inforithmics f8ed1541ed Merge remote-tracking branch 'upstream/main' into vulkanV3 2025-08-09 21:59:30 +02:00
Sajal Kulshreshtha ff89ba90bc
fixing broken AMD driver link (#11579) 2025-07-30 12:02:54 -07:00
Daniel Hiltgen 1c6669e64c
Re-remove cuda v11 (#10694)
* Re-remove cuda v11

Revert the revert - drop v11 support, requiring drivers newer than Feb 2023

This reverts commit c6bcdc4223.

* Simplify layout

With only one version of the GPU libraries, we can simplify things down somewhat.  (Jetsons still require special handling)

* distinct sbsa variant for linux arm64

This avoids accidentally trying to load the sbsa cuda libraries on
a jetson system which results in crashes.

* temporary prevent rocm+cuda mixed loading
2025-06-23 14:07:00 -07:00
Daniel Hiltgen c6bcdc4223
Revert "remove cuda v11 (#10569)" (#10692)
Bring back v11 until we can better warn users that their driver
is too old.

This reverts commit fa393554b9.
2025-05-13 13:12:54 -07:00
Michael Yang f95a1f2bef
feat: add trace log level (#10650)
reduce prompt log to trace level
2025-05-12 11:43:00 -07:00
Daniel Hiltgen fa393554b9
remove cuda v11 (#10569)
This reduces the size of our Windows installer payloads by ~256M by dropping
support for nvidia drivers older than Feb 2023.  Hardware support is unchanged.

Linux default bundle sizes are reduced by ~600M to 1G.
2025-05-06 17:33:19 -07:00
Michael Yang 95e744beeb
discover: fix compiler warnings (#10572) 2025-05-06 10:49:22 -07:00
Jeffrey Morgan 913905028b
all: fix cgo compiler warnings on windows (#10563) 2025-05-05 08:02:39 -07:00
Bruce MacDonald 9876c9faa4
chore(all): replace instances of interface with any (#10067)
Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
2025-04-02 09:44:27 -07:00
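As a quick illustration of why this is a pure rename, here is a small self-contained Go example (hypothetical function names) showing that interface{} and any are interchangeable:

```go
package main

import "fmt"

// any is a predeclared alias for interface{} since Go 1.18, so the two
// signatures below are identical to the compiler; the change is purely
// about readability.
func describeOld(v interface{}) string { return fmt.Sprintf("%T", v) }

func describeNew(v any) string { return fmt.Sprintf("%T", v) }

func main() {
	fmt.Println(describeOld(42), describeNew("hello")) // prints: int string
}
```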
湛露先生 4059a297a6
discover: /proc/cpuinfo file open and close. (#9950)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-31 17:07:42 -07:00
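The commit title points at opening and properly closing /proc/cpuinfo. A minimal sketch of that open/defer-close pattern, using a hypothetical helper rather than Ollama's real discover code:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// readCPUModel is a hypothetical helper illustrating the pattern: open
// /proc/cpuinfo, defer Close so the file descriptor is released on every
// return path, and scan for a field of interest.
func readCPUModel() (string, error) {
	f, err := os.Open("/proc/cpuinfo")
	if err != nil {
		return "", err
	}
	defer f.Close() // closed on every return path below

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "model name") {
			if parts := strings.SplitN(line, ":", 2); len(parts) == 2 {
				return strings.TrimSpace(parts[1]), nil
			}
		}
	}
	return "", scanner.Err()
}

func main() {
	model, err := readCPUModel()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(model)
}
```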
Vadim Grinco 45dbd14645
Merged latest ollama 0.6.2 and nasrally's Flash Attention patches (#5)
* readme: add Ellama to list of community integrations (#9800)

* readme: add screenpipe to community integrations (#9786)

* Add support for ROCm gfx1151 (#9773)

* conditionally enable parallel pipelines

* sample: make mutations in transforms explicit (#9743)

* updated minP to use early exit making use of sorted tokens

* ml/backend/ggml: allocate memory with malloc when loading model (#9822)

* runner: remove cache prompt flag from ollama runner (#9826)

We do not need to bypass the prompt caching in the ollama runner yet, as
only embedding models need to bypass the prompt caching. When embedding
models are implemented, they can skip initializing this cache completely.

* ollamarunner: Check for minBatch of context space when shifting

Models can specify that a group of inputs needs to be handled as a single
batch. However, context shifting didn't respect this and could trigger
a break anyway. In this case, we should instead trigger a context
shift earlier so that it occurs before the grouped batch.

Note that there are still some corner cases:
 - A long prompt that exceeds the context window can get truncated
   in the middle of an image. With the current models, this will
   result in the model not recognizing the image at all, which is
   pretty much the expected result with truncation.
 - The context window is set less than the minimum batch size. The
   only solution to this is to refuse to load the model with these
   settings. However, this can never occur with current models and
   default settings.

Since users are unlikely to run into these scenarios, fixing them is
left as a follow up.

* Applied latest patches from McBane87

See this for details: https://github.com/whyvl/ollama-vulkan/issues/7#issuecomment-2708820861

Signed-off-by: Vadim Grinco <vadim@grinco.eu>

* Add ability to enable flash attention on vulkan (#4)

* discover: add flash attention handling for vulkan
* envconfig: fix typo in config.go

As part of the process, some code was refactored and I added a new field,
FlashAttention, to GpuInfo, since the previous solution didn't allow for a
granular check via Vulkan extensions. As a side effect, this now allows
granular per-device FA support checking in other places (see the sketch
after this commit entry).

---------

Signed-off-by: Vadim Grinco <vadim@grinco.eu>
Co-authored-by: zeo <108888572+zeozeozeo@users.noreply.github.com>
Co-authored-by: Louis Beaumont <louis.beaumont@gmail.com>
Co-authored-by: Daniel Hiltgen <dhiltgen@users.noreply.github.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Parth Sareen <parth.sareen@ollama.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Nikita <50599445+nasrally@users.noreply.github.com>
2025-03-23 12:27:37 +01:00
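To make the per-device point above concrete, here is a minimal, self-contained Go sketch of per-device flash-attention checking. Only the names GpuInfo and FlashAttention are taken from the commit message; the struct layout and the helper are assumed illustrations, not the fork's actual code.

```go
package main

import "fmt"

// GpuInfo is a stripped-down stand-in used only for illustration; the
// real discover type has many more fields.
type GpuInfo struct {
	ID             string
	Library        string // e.g. "vulkan", "cuda", "rocm"
	FlashAttention bool   // filled in per device during discovery
}

// flashAttentionUsable shows what a per-device flag enables: the check
// becomes "every selected device supports FA" rather than one global switch.
func flashAttentionUsable(gpus []GpuInfo) bool {
	if len(gpus) == 0 {
		return false
	}
	for _, g := range gpus {
		if !g.FlashAttention {
			return false
		}
	}
	return true
}

func main() {
	gpus := []GpuInfo{
		{ID: "0", Library: "vulkan", FlashAttention: true},
		{ID: "1", Library: "vulkan", FlashAttention: false},
	}
	fmt.Println("flash attention usable:", flashAttentionUsable(gpus))
}
```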
Vadim Grinco 98f699773a Applied 00-fix-vulkan-building.patch
Work done by McBane87 here: https://github.com/whyvl/ollama-vulkan/issues/7#issuecomment-2660836871

Signed-off-by: Vadim Grinco <vadim@grinco.eu>
2025-03-10 12:34:37 +01:00
Vadim Grinco e648126fe9 Merge branch 'ollama_vanilla_stable' into ollama_vulkan_stable 2025-03-10 12:29:52 +01:00
Pavol Rusnak a499390648
build: support Compute Capability 5.0, 5.2 and 5.3 for CUDA 12.x (#8567)
CUDA 12.x still supports Compute Capability 5.0, 5.2 and 5.3,
so let's build for these architectures as well
2025-02-25 09:54:19 -08:00
Jeffrey Morgan 5296f487a8
llm: attempt to evaluate symlinks, but do not fail (#9089)
provides a better approach to #9088 that will attempt to
evaluate symlinks (important for macOS where 'ollama' is
often a symlink), but use the result of os.Executable()
as a fallback in scenarios where filepath.EvalSymlinks
fails due to permission errors or other issues
2025-02-13 22:37:59 -08:00
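A minimal sketch of the fallback behavior this commit describes, assuming a hypothetical exePath helper (not Ollama's actual code): resolve symlinks when possible, otherwise keep the raw os.Executable() result.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// exePath sketches the fallback described above: resolve symlinks when
// possible, but keep the unresolved os.Executable() result if
// filepath.EvalSymlinks fails (for example, permission errors on a
// parent directory).
func exePath() (string, error) {
	exe, err := os.Executable()
	if err != nil {
		return "", err
	}
	if resolved, err := filepath.EvalSymlinks(exe); err == nil {
		return resolved, nil
	}
	return exe, nil // fall back rather than failing outright
}

func main() {
	p, err := exePath()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p)
}
```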
Jeffrey Morgan f05774b04c
llm: do not evaluate symlink for exe path lookup (#9088)
In some cases, the directories in the executable path read by
filepath.EvalSymlinks are not accessible, resulting in permission
errors that cause failures when running models. It also
doesn't work well with long paths on Windows, again resulting in
errors. This change removes filepath.EvalSymlinks when accessing
os.Executable() altogether.
2025-02-13 22:13:00 -08:00
pufferffish 582d41e002 Merge github.com:ollama/ollama into vulkan 2025-02-03 14:44:30 +00:00
Michael Yang 548a9f56a6 Revert "cgo: use O3"
This reverts commit bea1f1fac6.
2025-01-31 10:25:39 -08:00
Michael Yang bea1f1fac6 cgo: use O3 2025-01-30 12:21:50 -08:00
Jeffrey Morgan 5d75d837ef
discover: fix default LibOllamaPath value (#8702) 2025-01-30 12:21:38 -08:00
Michael Yang dcfb7a105c
next build (#8539)
* add build to .dockerignore

* test: only build one arch

* add build to .gitignore

* fix ccache path

* filter amdgpu targets

* only filter if autodetecting

* Don't clobber gpu list for default runner

This ensures the GPU specific environment variables are set properly

* explicitly set CXX compiler for HIP

* Update build_windows.ps1

This isn't complete, but is close.  Dependencies are missing, and it only builds the "default" preset.

* build: add ollama subdir

* add .git to .dockerignore

* docs: update development.md

* update build_darwin.sh

* remove unused scripts

* llm: add cwd and build/lib/ollama to library paths

* default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS

* add additional cmake output vars for msvc

* interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12

* remove unnecessary filepath.Dir, cleanup

* add hardware-specific directory to path

* use absolute server path

* build: linux arm

* cmake install targets

* remove unused files

* ml: visit each library path once

* build: skip cpu variants on arm

* build: install cpu targets

* build: fix workflow

* shorter names

* fix rocblas install

* docs: clean up development.md

* consistent build dir removal in development.md

* silence -Wimplicit-function-declaration build warnings in ggml-cpu

* update readme

* update development readme

* llm: update library lookup logic now that there is one runner (#8587)

* tweak development.md

* update docs

* add windows cuda/rocm tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2025-01-29 15:03:38 -08:00
tomaThomas 0d277d32db
Fix variable name 2025-01-25 11:23:25 +01:00
yeongbba 2bf59a512b add aarch64 lines in vulkanGlobs and capLinuxGlobs 2025-01-19 12:51:10 +09:00
yeongbba 9ac01e88dd Merge remote-tracking branch 'upstream/vulkan' into vulkan 2025-01-19 12:49:38 +09:00
pufferffish 9ad63a747b fix conflict 2025-01-12 01:00:41 +00:00
Bruce MacDonald 2d33c4e97d discover: remove leading new-line for linter 2025-01-03 12:03:58 -08:00
湛露先生 46f74e0cb5
Return err when NewHipLib() detects an error. (#8012)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2024-12-10 16:32:29 -08:00
Stefan Weil abfdc4710f
all: fix typos in documentation, code, and comments (#7021) 2024-12-10 12:58:06 -08:00