Commit Graph

4810 Commits

Author SHA1 Message Date
Inforithmics f2842defcb Merge remote-tracking branch 'upstream/main' into VulkanV3Update 2025-10-06 11:39:24 +02:00
Inforithmics 2acedf1756 update patch 2025-10-06 10:01:28 +02:00
Inforithmics e9828e6b11 Return pci Properties 2025-10-06 09:53:51 +02:00
Inforithmics fd648506c1 return integrated in vulkan backend 2025-10-05 21:13:21 +02:00
Inforithmics 37206cdf32 remove debug code 2025-10-05 20:56:21 +02:00
Inforithmics d02a08aa7c return Library Name 2025-10-05 20:55:28 +02:00
Inforithmics 66d1033610 fixed patch number 2025-10-05 20:41:05 +02:00
Inforithmics 3f38cdb590 Revert "return Vulkan for vulkan library"
This reverts commit 690461a12f.
2025-10-05 20:38:07 +02:00
Inforithmics 690461a12f return Vulkan for vulkan library 2025-10-05 20:29:38 +02:00
Inforithmics 218e57974f print out unknown library 2025-10-05 17:04:12 +02:00
Inforithmics cafdb5c0d6 improve case 2025-10-05 16:46:55 +02:00
Inforithmics d5a2462c8e handle igpu as gpu 2025-10-05 16:20:10 +02:00
Inforithmics 908b31814d fixed vulkan casing 2025-10-05 11:01:26 +02:00
Inforithmics 6bef63b0f9 fix format 2025-10-04 21:45:06 +02:00
Inforithmics f8551bc631 merge fixes 2025-10-04 21:28:15 +02:00
Daniel Hiltgen 292767afb4
CI: fix win arm build (#12502)
Resolve subtle erroraction stickiness difference between x86 and arm builder setup
2025-10-04 11:46:45 -07:00
Inforithmics 8ad169403b update build windows script 2025-10-04 19:25:34 +02:00
Inforithmics 4803e57c9b Merge remote-tracking branch 'upstream/main' into VulkanV3Update 2025-10-04 19:14:12 +02:00
Inforithmics 93d7126ce5 sync llama.cpp vulkan code 2025-10-04 19:02:57 +02:00
Inforithmics 163f62fcb6 fix vulkan gpu id patch 2025-10-04 18:56:38 +02:00
Daniel Hiltgen ae5e0f0889
CI: replace clang compiler for windows (#12495) 2025-10-04 09:18:42 -07:00
Inforithmics 96e562f982 fixed build 2025-10-04 16:35:04 +02:00
Inforithmics 9ac9f3a952 fixed formatting 2025-10-04 16:32:39 +02:00
Inforithmics b2aba4ea83 fixed build 2025-10-04 16:26:03 +02:00
Inforithmics 06528d66aa fixing build 2025-10-04 16:22:55 +02:00
Inforithmics 75f65bcdbf merge fixes 2025-10-04 16:11:34 +02:00
Inforithmics 1e46db8748 fixed build 2025-10-04 15:44:23 +02:00
Inforithmics c4d8c75e54 merge fixes 2025-10-04 15:27:52 +02:00
Inforithmics 294b179688 merge fixes 2025-10-04 15:20:33 +02:00
Inforithmics f567cc59d4 fix build 2025-10-04 15:08:18 +02:00
Inforithmics e6c28916e1 Merge branch 'vulkanV3' into VulkanV3Update 2025-10-04 14:59:30 +02:00
Inforithmics ac6ba7d44b Merge remote-tracking branch 'upstream/main' into VulkanV3Update 2025-10-04 14:53:59 +02:00
Jesse Gross 19e6796eac llm: Support KV cache quantization with gpt-oss
With the new version of GGML in #12245, KV cache quantization
no longer causes a fallback to CPU.
2025-10-03 16:31:58 -07:00
Grace 33801c1597
Fixed Deepseek2 adding nil tensor error 2025-10-03 14:20:06 -07:00
Daniel Hiltgen e4340667e3
Workaround broken NVIDIA iGPU free VRAM data (#12490)
The CUDA APIs for reporting free VRAM are useless on NVIDIA iGPU
systems as they only return the kernel's actual free memory and ignore
buff/cache allocations, which on a typical system will quickly fill up
most of the free system memory. As a result, we incorrectly think
there's very little available for GPU allocations.
2025-10-03 12:17:21 -07:00
Patrick Devine 2fa1e92a99
test: add template error test (#12489) 2025-10-03 12:05:34 -07:00
Daniel Hiltgen 07e36761c3
ci: place rocm windows in correct runner dir (#12487) 2025-10-03 07:28:40 -07:00
Daniel Hiltgen c29fb007c0
CI: temporarily disable clang install (#12486)
This will likely yield builds that have problems with Unicode characters,
but at least we can start testing the release while we try to find an
alternate clang compiler for Windows, or until mingw ships a fixed version.
2025-10-02 20:31:18 -07:00
Daniel Hiltgen 730ed6e9e1
ci: fix windows build (#12485) 2025-10-02 19:16:01 -07:00
Daniel Hiltgen dc06601677
ci: fix windows build (#12484) 2025-10-02 18:59:26 -07:00
Patrick Devine 1ed2881ef0
templates: fix crash in improperly defined templates (#12483) 2025-10-02 17:25:55 -07:00
Jesse Gross 0bda72892c llm: Enable flash attention by default for qwen3 and qwen3moe 2025-10-02 17:04:10 -07:00
Daniel Hiltgen 55ca827267
AMD: block running on unsupported gfx900/gfx906 (#12481) 2025-10-02 16:53:05 -07:00
Daniel Hiltgen c68f367ef6
Update GGML to b6646 (#12245)
Notable EOLs with this change:
- macOS v12 and v13 are no longer supported (v14+ required)
- AMD gfx900 and gfx906 are no longer supported
2025-10-02 14:47:10 -07:00
Jesse Gross fdb109469f llm: Allow overriding flash attention setting
As we automatically enable flash attention for more models, there
are likely some cases where we get it wrong. This allows setting
OLLAMA_FLASH_ATTENTION=0 to disable it, even for models that usually
have flash attention.
2025-10-02 12:07:20 -07:00
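
For reference, a minimal sketch of how an explicit OLLAMA_FLASH_ATTENTION=0 override could take precedence over a per-model default. This is not the actual Ollama source; the helper name, accepted values, and defaults are assumptions for illustration only.

package main

import (
	"fmt"
	"os"
)

// flashAttentionEnabled keeps the automatic per-model default unless the
// user has set OLLAMA_FLASH_ATTENTION explicitly, in which case that wins.
// (Hypothetical helper; accepted values are an assumption.)
func flashAttentionEnabled(modelDefault bool) bool {
	switch os.Getenv("OLLAMA_FLASH_ATTENTION") {
	case "0", "false":
		return false
	case "1", "true":
		return true
	default:
		return modelDefault // unset: keep the model's automatic default
	}
}

func main() {
	// A model that defaults to flash attention on would report false here
	// when the process is started with OLLAMA_FLASH_ATTENTION=0.
	fmt.Println(flashAttentionEnabled(true))
}
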
Daniel Hiltgen 05a43e078a
fix panic on bootstrapDevices (#12475)
Wrong index variable was used.
2025-10-01 17:39:29 -07:00
Daniel Hiltgen bc8909fb38
Use runners for GPU discovery (#12090)
This revamps how we discover GPUs in the system by leveraging the Ollama
runner.  This should eliminate inconsistency between our GPU discovery and the
runner's capabilities at runtime, particularly for cases where we try to filter
out unsupported GPUs.  Now the runner does that implicitly based on the actual
device list.  In some cases free VRAM reporting can be unreliable, which can
lead to scheduling mistakes, so this also includes a patch to leverage more
reliable VRAM reporting libraries if available.

Automatic workarounds have been removed as only one GPU leveraged this, which
is now documented. This GPU will soon fall off the support matrix with the next
ROCm bump.

Additional cleanup of the scheduler and discovery packages can be done in the
future once we have switched on the new memory management code, and removed
support for the llama runner.
2025-10-01 15:12:32 -07:00
Devon Rifkin 6b50f2b9cd
Merge pull request #12461 from ollama/drifkin/qwen3-coder-tweaks
qwen3-coder: fix tool definition type rendering
2025-09-30 19:47:44 -07:00
Michael Yang 35ac4eb12c fix keep alive
this reference to keep alive was missed in #12041, so chat has a
different behaviour than generate
2025-09-30 17:22:28 -07:00
Jesse Gross 3d0b1734c0 ggml: Preallocate CUDA pool memory
The GGML CUDA backend allocates additional memory for intermediate
results during calculation. This memory isn't currently allocated
during worst-case graph reservation and is therefore not included in
scheduling. This means that as these buffers potentially grow
with context length, we could crash.

This extends the memory allocation system down a layer, from the GGML
graph to the CUDA layer, preallocating the worst case memory there
as well.

Fixes #11753
2025-09-30 15:04:43 -07:00