ollama

Commit Graph

Author	SHA1	Message	Date
Inforithmics	8300a55e1d	Fix Unit Test (Add Vulkan Library)	2025-08-30 20:26:53 +02:00
Thomas Stocker	879041d937	Merge pull request #1 from rillomas/removeLibcap Removed libcap related code	2025-08-30 20:06:15 +02:00
Masato Nakasaka	af5f5bdf60	Removed libcap related code libcap is not directly related to Vulkan and should be added by its own PR. It adds additional library dependencies for building and also requires users to run setcap or run ollama as root, which is not ideal for easy use	2025-08-27 11:51:53 +09:00
Inforithmics	834a66689e	Update Vulkan backend to e54d41befcc1575f4c898c5ff4ef43970cead75f	2025-08-15 00:18:18 +02:00
Inforithmics	199458944f	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-08-15 00:06:53 +02:00
Michael Yang	1a19df1f3a	update vendored llama.cpp and ggml (#11823) * TEMPORARY: Update the llama.cpp upstream to my fork's Granite Four branch This will be redone once my branch is merged upstream in llama.cpp * feat: Update all patches There are a number that are no longer needed at all: - 0003-embeddings: Embeddings entirely overhauled on master - 0008-ensure-KV-cache-is-fully-defragmented: KV caching entirely overhauled on master - 0019-metal-add-mean-kernel-14267: Merged upstream - 0020-CUDA-add-mean-operation-14313: Merged upstream * feat: Sync llama.cpp and ggml * fix: Update rsync-filter for all moved/new/removed files * fix: Add files missing from sync * fix: Update ggml rsync-filter for new ggml-cpu/arch subdirs * fix: Add ggml files missing from sync * fix: Narrow llama.cpp rsync-filter to not include mtmd main tool cpp files * fix: Remove mtmd main cpp files * fix: Add missing include in sampling_ext.cpp * fix: Update llama.go to use mtmd instead of clip/llava * fix: Add patch for mtmd_input_text * chore: Ignore .patched in the patch directory fix: Fix support for arch-specific ggml-cpu source files with new arrangement In https://github.com/ggml-org/llama.cpp/pull/13892, all arch-specific implementations were split out into a nested tree structure under ggml-cpu/arch. This conflicts with standard CGO layout where all arch-specific source files are expected to live in the same directory as the parent go module and use suffixes based on GOOS and GOARCH. As such, there were really two options for getting this to work: 1. Add a patch on top of the GGML sync to rearrange the files to match the GO layout convention 2. Use CGO directives to conditionally include the nested source files in the compilation units This commit does (2) in order to minimize the set of changes needed on top of the upstream file layout. To get this to work, there are two key things needed: 1. In cpu.go, #cgo directives are added to explicitly set __${GOARCH}__ in the preprocessor directives 2. In arch-impls.c|cpp, use an #ifdef | #elif defined | #endif chain to explicitly include the .c|.cpp files for the given architecture from the nested directory * fix: Use mtmd_helper to correctly load the bitmap for the image * fix: Apply patch for mtmd_text_input * fix: Add missing stb to llama.cpp rsync-filter * fix: Add sync'ed stb vendored header * fix: Use c++17 and include vendor for go wrapper modules * fix: Update patch 0015 for upstream implementation of uuid * feat: Bump to the latest tip of the branch * fix: Update patches for bump * feat: Bump back to the central repo and point at the latest master This includes granite 4 and a number of other model architectures! * fix: Revert changes to ggml export GPU UUID patch * fix: Add patch for GGML_VERSION and GGML_COMMIT constants * feat: Sync all patched code * build: Include cmake/common.cmake in ggml sync * build: Add top-level include for GNUInstallDirs in CMakeLists.txt This is used to populate CMAKE_INSTALL_BINDIR * fix: Add a patch to avoid power throttling API on non-msvc windows builds * fix: Sync patch changes for ggml-cpu.c * feat: Bump llama.cpp to 4a4f42 This picks up support for Kimi K2 and PLaMO-2 * feat: Sync llama.cpp * fix: Handle multi-chunk image encodings from mtmd * fix: Re-number patches after merge with `main` * feat: Bump to 41e78c in the makefile * fix: Fix Solar and argsort/copy patches after bump * fix: Remove Gemma3n CUDA Graphs patch It was implemented upstream: https://github.com/ggml-org/llama.cpp/pull/14741 * feat: Sync llama.cpp / ggml after latest bump * build: Remove unnecessary CFLAGS definitions in cpu.go * fix: Remove unnecessary additions in the rsync-filter * fix: Remove unused vendored code for chat template parsing * Revert "fix: Remove Gemma3n CUDA Graphs patch" This reverts commit `d724caced3`. * fix: Update 0020 CUDA Graphs for gemma3n to keep both llama.cpp and ollama fixes https://github.com/ollama/ollama/pull/11195#issuecomment-3137312394 * fix: Sync ggml-cuda.cu after keeping both style cuda graph fixes for gemma3n * unwind mxfp4 patch Prepare to bump ggml with their impl for mxfp4 * bump * fix windows build error * Convert tensors at load time Repack the mxfp4 tensors as ggml's kernels expect them to be. * convert mlp bf16 to f32 * buffer the conversion better * reshape earlier * openai swiglu * add ids * split qkv, gate_up * fix nested alt tags * fast attention * remove debug messages * fix lint * remove redundant test * remap values only if source/target are different * add back i32->i32 copy * refactor cpu quants * clean up vendor * update patch instructions * clean up patches * remove webgpu * update mem * also handle gpt-oss * revert convert changes --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Daniel Hiltgen <daniel@ollama.com>	2025-08-14 14:42:58 -07:00
Inforithmics	56050ad8ea	Fix logging	2025-08-14 22:42:30 +02:00
Inforithmics	d71c83f2ba	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-08-14 22:11:08 +02:00
Daniel Hiltgen	7ccfd97a93	doc: clarify both rocm and main bundle necessary (#11900) Some users expect the rocm bundles to be self-sufficient, but are designed to be additive.	2025-08-14 12:54:55 -07:00
Daniel Hiltgen	c385ca8672	test: add valid responses (#11902) some of the new models need a few more valid responses to pass	2025-08-14 11:07:13 -07:00
Daniel Hiltgen	837379a94c	discovery: fix cudart driver version (#11614) We prefer the nvcuda library, which reports driver versions. When we dropped cuda v11, we added a safety check for too-old drivers. What we missed was the cudart fallback discovery logic didn't have driver version wired up. This fixes cudart discovery to expose the driver version as well so we no longer reject all GPUs if nvcuda didn't work.	2025-08-13 15:43:33 -07:00
Daniel Hiltgen	a24f90604f	int: adjust a few models for integration tests (#11872)	2025-08-13 15:42:36 -07:00
Daniel Hiltgen	dc5a645434	cuda: leverage JIT for smaller footprint (#11635) Prior to this change our official binaries contained both JIT PTX code and the cubin binary code for our chosen compute capabilities. This change switches to only compile the PTX code and rely on JIT at runtime for generating the cubin specific to the user's GPU. The cubins are cached on the user's system, so they should only see a small lag on the very first model load for a given Ollama release. This also adds the first generation of Blackwell GPUs so they aren't reliant on the Hopper PTX. This change reduces the ggml-cuda.dll from 1.2G to 460M	2025-08-13 15:42:16 -07:00
Inforithmics	6543213e6f	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-08-13 23:50:00 +02:00
youzichuan	bb71654ebe	chore: fix some inconsistent function name in comment Signed-off-by: youzichuan <youzichuan6@outlook.com>	2025-08-13 09:50:27 -07:00
Inforithmics	eaf42a646c	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-08-13 08:27:22 +02:00
Jesse Gross	a343ae53a4	ggml: Use ordinal IDs for AMD GPUs on Linux when UUID is unavailable Some AMD GPUs do not provide UUIDs and report only "XX". In these cases, we should use the ordinal ID as an alternate identifier. This is the same as we always need to do on Windows for AMD. In addition, this prints out the ID for each GPU when enumerating them for easier debugging in the future.	2025-08-12 16:56:14 -07:00
Inforithmics	49c4d154ae	Enable Vulkan Flash attention in FlashAttentionSupported	2025-08-12 21:55:19 +02:00
Inforithmics	e6da524ab7	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-08-12 21:51:39 +02:00
Inforithmics	2244f304d7	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-08-12 21:43:10 +02:00
Michael Yang	d0cf6c8281	fix(openai): handle reasoning_effort (#11868)	2025-08-12 11:02:01 -07:00
Jesse Gross	8f4ec9ab28	discover: CPU supports flash attention We already run flash attention on CPUs in cases where we have partial offloading but were disabling it if running on pure CPU, which is unnecessary.	2025-08-11 15:00:34 -07:00
Devon Rifkin	dbfd7bd027	Merge pull request #11861 from ollama/drifkin/fix-parsing-error server: fix error when parsing bad harmony tool calls	2025-08-11 14:59:57 -07:00
Devon Rifkin	ee04dbba51	server: fix error when parsing bad harmony tool calls Thanks @moll for reporting! Fixes: #11781	2025-08-11 14:09:13 -07:00
Daniel Andersen	ea7657b54a	sched: Add support for grouping GPUs (#10678) This patch modifies Ollama to allow grouping GPUs to memory-fit to the requested model, instead of the former algorithm of using one GPU distributing over all available GPUs. Benefits: - Lower amount of (PCIe-)bus communication between GPUs - especially when they are not very high speed - Allowing unallocated GPUs to get into power-saving mode. - Significantly reduce VRAM allocation when using more than 2 GPUs in a system - Due to the reduced memory allocation, you can run more models simultaneously.	2025-08-11 13:59:38 -07:00
Inforithmics	0c27f472e7	Remove commented out code	2025-08-11 18:52:43 +02:00
Inforithmics	e3627b2832	Add vulkan to Windows Build script	2025-08-11 18:39:10 +02:00
Inforithmics	d1f74e17d4	Update gpu.go	2025-08-10 21:28:59 +02:00
Inforithmics	f6dd7070de	vk_check_flash_attention 0 means supported	2025-08-10 21:22:26 +02:00
Inforithmics	ee24b967f1	fixed flash attention logic enabling	2025-08-10 19:57:14 +02:00
Inforithmics	a1393414ce	revert remove parenthesis	2025-08-10 17:54:13 +02:00
Inforithmics	5270c4c5f7	enable flash attention on vulkan	2025-08-10 16:53:13 +02:00
Inforithmics	60a015e8c3	Revert changes in ggml.go	2025-08-10 16:09:44 +02:00
Inforithmics	1edbfd0559	Revert changes in ggml.go	2025-08-10 16:07:24 +02:00
Inforithmics	fd4480a848	Fixed duplicate sync in ggml.go	2025-08-10 16:05:09 +02:00
Inforithmics	2e7452be71	Update Vulkan Code to de4c07f93783a1a96456a44dc16b9db538ee1618	2025-08-10 16:01:07 +02:00
Michael Vorburger	2c776f0780	CONTRIBUTING: Explicitly note docs:... as a good example (#11755)	2025-08-09 18:12:30 -07:00
Thomas Stocker	bc5c3fb213	Revert vulkan copy changes in Dockerfile	2025-08-09 22:45:52 +02:00
Thomas Stocker	fa13b8de45	Revert some unintended changes in Dockerfile	2025-08-09 22:43:12 +02:00
Thomas Stocker	d03fc13d36	Revert changes in Makefile.sync	2025-08-09 22:38:37 +02:00
Thomas Stocker	a6d0d6c6ff	Revert changes in runner.go	2025-08-09 22:35:20 +02:00
Thomas Stocker	0ddb64db1f	Revert changes in transforms_test.go	2025-08-09 22:33:42 +02:00
Thomas Stocker	29b1ed0077	Revert whitespace changes in gpu.go	2025-08-09 22:30:13 +02:00
Thomas Stocker	57270767ac	Remove flashattention setting gpu.go	2025-08-09 22:26:54 +02:00
Thomas Stocker	42463fbb7f	Revert changes in amd_linux.go	2025-08-09 22:24:33 +02:00
Thomas Stocker	89ac91099d	Revert changes in amd_linux.go	2025-08-09 22:23:00 +02:00
Thomas Stocker	47bff3e532	Revert	2025-08-09 22:15:54 +02:00
Thomas Stocker	643b1c505e	Revert Readme changes	2025-08-09 22:14:54 +02:00
Inforithmics	f8ed1541ed	Merge remote-tracking branch 'upstream/main' into vulkanV3	2025-08-09 21:59:30 +02:00
Jesse Gross	79f6376f5b	ggml: No-alloc mode Callers can set a backend buffer type to be no-alloc, meaning that it does not allocate memory for tensors or operations. This can be used for calculating memory requirements. Tensors and graphs must be recreated with no-alloc set to false before loading data. Defaults to false for newly created backend buffer types.	2025-08-08 14:57:13 -07:00

4550 Commits

All Branches