ollama

Commit Graph

Author	SHA1	Message	Date
Inforithmics	f2842defcb	Merge remote-tracking branch 'upstream/main' into VulkanV3Update	2025-10-06 11:39:24 +02:00
Inforithmics	2acedf1756	update patch	2025-10-06 10:01:28 +02:00
Inforithmics	e9828e6b11	Return pci Properties	2025-10-06 09:53:51 +02:00
Inforithmics	fd648506c1	return integrated in vulkan backend	2025-10-05 21:13:21 +02:00
Inforithmics	37206cdf32	remvoe debug code	2025-10-05 20:56:21 +02:00
Inforithmics	d02a08aa7c	return Library Name	2025-10-05 20:55:28 +02:00
Inforithmics	66d1033610	fixed patch number	2025-10-05 20:41:05 +02:00
Inforithmics	3f38cdb590	Revert "rturn Vulkan for vulkan library" This reverts commit `690461a12f`.	2025-10-05 20:38:07 +02:00
Inforithmics	690461a12f	rturn Vulkan for vulkan library	2025-10-05 20:29:38 +02:00
Inforithmics	218e57974f	print out unknown library	2025-10-05 17:04:12 +02:00
Inforithmics	cafdb5c0d6	improve case	2025-10-05 16:46:55 +02:00
Inforithmics	d5a2462c8e	handle igpu as gpu	2025-10-05 16:20:10 +02:00
Inforithmics	908b31814d	fixed vulkan casing	2025-10-05 11:01:26 +02:00
Inforithmics	6bef63b0f9	fix format	2025-10-04 21:45:06 +02:00
Inforithmics	f8551bc631	merge fixes	2025-10-04 21:28:15 +02:00
Daniel Hiltgen	292767afb4	CI: fix win arm build (#12502) Resolve subtle erroraction stickiness difference between x86 and arm builder setup	2025-10-04 11:46:45 -07:00
Inforithmics	8ad169403b	update build windows script	2025-10-04 19:25:34 +02:00
Inforithmics	4803e57c9b	Merge remote-tracking branch 'upstream/main' into VulkanV3Update	2025-10-04 19:14:12 +02:00
Inforithmics	93d7126ce5	sync llama.cpp vulkan code	2025-10-04 19:02:57 +02:00
Inforithmics	163f62fcb6	fix vulkan gpu id patch	2025-10-04 18:56:38 +02:00
Daniel Hiltgen	ae5e0f0889	CI: replace clang compiler for windows (#12495)	2025-10-04 09:18:42 -07:00
Inforithmics	96e562f982	fixed build	2025-10-04 16:35:04 +02:00
Inforithmics	9ac9f3a952	fixed formatting	2025-10-04 16:32:39 +02:00
Inforithmics	b2aba4ea83	fixed build	2025-10-04 16:26:03 +02:00
Inforithmics	06528d66aa	fixing build	2025-10-04 16:22:55 +02:00
Inforithmics	75f65bcdbf	merge fixes	2025-10-04 16:11:34 +02:00
Inforithmics	1e46db8748	fixed build	2025-10-04 15:44:23 +02:00
Inforithmics	c4d8c75e54	merge fixes	2025-10-04 15:27:52 +02:00
Inforithmics	294b179688	merge fixes	2025-10-04 15:20:33 +02:00
Inforithmics	f567cc59d4	fix build	2025-10-04 15:08:18 +02:00
Inforithmics	e6c28916e1	Merge branch 'vulkanV3' into VulkanV3Update	2025-10-04 14:59:30 +02:00
Inforithmics	ac6ba7d44b	Merge remote-tracking branch 'upstream/main' into VulkanV3Update	2025-10-04 14:53:59 +02:00
Jesse Gross	19e6796eac	llm: Support KV cache quantization with gpt-oss With the new version of GGML in #12245, KV cache quantization no longer causes a fallback to CPU.	2025-10-03 16:31:58 -07:00
Grace	33801c1597	Fixed Deepseek2 adding nil tensor error	2025-10-03 14:20:06 -07:00
Daniel Hiltgen	e4340667e3	Workaround broken NVIDIA iGPU free VRAM data (#12490) The CUDA APIs for reporting free VRAM are useless on NVIDIA iGPU systems as they only return the kernels actual free memory and ignore buff/cache allocations which on a typical system will quickly fill up most of the free system memory. As a result, we incorrectly think there's very little available for GPU allocations which is wrong.	2025-10-03 12:17:21 -07:00
Patrick Devine	2fa1e92a99	test: add template error test (#12489)	2025-10-03 12:05:34 -07:00
Daniel Hiltgen	07e36761c3	ci: place rocm windows in correct runner dir (#12487)	2025-10-03 07:28:40 -07:00
Daniel Hiltgen	c29fb007c0	CI: temporarily disable clang install (#12486) This will likely yield builds that have problems with unicode characters but at least we can start testing the release while we try to find an alternate clang compiler for windows, or mingw ships a fixed version.	2025-10-02 20:31:18 -07:00
Daniel Hiltgen	730ed6e9e1	ci: fix windows build (#12485)	2025-10-02 19:16:01 -07:00
Daniel Hiltgen	dc06601677	ci: fix windows build (#12484)	2025-10-02 18:59:26 -07:00
Patrick Devine	1ed2881ef0	templates: fix crash in improperly defined templates (#12483)	2025-10-02 17:25:55 -07:00
Jesse Gross	0bda72892c	llm: Enable flash attention by default for qwen3 and qwen3moe	2025-10-02 17:04:10 -07:00
Daniel Hiltgen	55ca827267	AMD: block running on unsupported gfx900/gfx906 (#12481)	2025-10-02 16:53:05 -07:00
Daniel Hiltgen	c68f367ef6	Update GGML to b6646 (#12245) Notable EOLs with this change: - MacOS v12 and v13 are no longer supported (v14+ required) - AMD gfx900 and gfx906 are no longer supported	2025-10-02 14:47:10 -07:00
Jesse Gross	fdb109469f	llm: Allow overriding flash attention setting As we automatically enable flash attention for more models, there are likely some cases where we get it wrong. This allows setting OLLAMA_FLASH_ATTENTION=0 to disable it, even for models that usually have flash attention.	2025-10-02 12:07:20 -07:00
Daniel Hiltgen	05a43e078a	fix panic on bootstrapDevices (#12475) Wrong index variable was used.	2025-10-01 17:39:29 -07:00
Daniel Hiltgen	bc8909fb38	Use runners for GPU discovery (#12090) This revamps how we discover GPUs in the system by leveraging the Ollama runner. This should eliminate inconsistency between our GPU discovery and the runners capabilities at runtime, particularly for cases where we try to filter out unsupported GPUs. Now the runner does that implicitly based on the actual device list. In some cases free VRAM reporting can be unreliable which can lead to scheduling mistakes, so this also includes a patch to leverage more reliable VRAM reporting libraries if available. Automatic workarounds have been removed as only one GPU leveraged this, which is now documented. This GPU will soon fall off the support matrix with the next ROCm bump. Additional cleanup of the scheduler and discovery packages can be done in the future once we have switched on the new memory management code, and removed support for the llama runner.	2025-10-01 15:12:32 -07:00
Devon Rifkin	6b50f2b9cd	Merge pull request #12461 from ollama/drifkin/qwen3-coder-tweaks qwen3-coder: fix tool definition type rendering	2025-09-30 19:47:44 -07:00
Michael Yang	35ac4eb12c	fix keep alive this reference to keep alive was missed in #12041 so chat has a different behaviour than generate	2025-09-30 17:22:28 -07:00
Jesse Gross	3d0b1734c0	ggml: Preallocate CUDA pool memory The GGML CUDA backend allocates additional memory for intermediate results during calculation. This memory isn't currently allocated during worst case graph reservation and therefore not included in scheduling. This means that as these buffers potentially grow with context length, we could crash. This extends the memory allocation system down layer from the GGML graph to the CUDA layer, preallocating the worst case memory there as well. Fixes #11753	2025-09-30 15:04:43 -07:00

4810 Commits
