Commit Graph

4834 Commits

Author SHA1 Message Date
Daniel Hiltgen 6286d9a3a5
Enable Vulkan with a temporary opt-in setting (#12931)
* docs: vulkan information

* Revert "CI: Set up temporary opt-out Vulkan support (#12614)"

This reverts commit 8b6e5baee7.

* vulkan: temporary opt-in for Vulkan support

Revert this once we're ready to enable by default.

* win: add vulkan CI build
2025-11-12 08:40:38 -08:00
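
As a rough illustration of such a temporary opt-in gate (the OLLAMA_VULKAN variable name is an assumption for illustration, not confirmed by this log):

    // Hypothetical opt-in gate; the env var name is illustrative.
    package discover

    import (
        "os"
        "strings"
    )

    // vulkanEnabled reports whether the user opted in to the Vulkan backend.
    func vulkanEnabled() bool {
        switch strings.ToLower(os.Getenv("OLLAMA_VULKAN")) {
        case "1", "true", "yes":
            return true
        }
        return false
    }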
Daniel Hiltgen 3a9e8e9fd4
vulkan: temporary carry of vulkan fixes (#12971)
This should be reverted once we update ggml past b6897
2025-11-12 08:31:40 -08:00
Jeffrey Morgan cb1cb06478
docs: rename api-reference.md back to api.md since redirect stopped working (#13056) 2025-11-11 15:53:06 -08:00
Jeffrey Morgan 2d5e066c8c
docs: fix openapi.yaml warnings, rename api.md to api-reference.md (#12904) 2025-11-11 15:39:35 -08:00
Bruce MacDonald 15968714bd
docs/openapi: document that delete and copy responses are empty (#13055)
Some endpoints return an empty response with a 200 OK. These should be documented in the OpenAPI doc. Note that the previously documented deletion response was not correct.
2025-11-11 15:07:21 -08:00
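
A minimal, runnable sketch of that behavior (route and handler names are illustrative): the handler writes a 200 status and no body, which is exactly what the spec should record.

    package main

    import "net/http"

    // deleteHandler succeeds with 200 OK and an empty body, the behavior
    // the OpenAPI doc now records for delete and copy. Illustrative only.
    func deleteHandler(w http.ResponseWriter, r *http.Request) {
        // ... perform the deletion ...
        w.WriteHeader(http.StatusOK) // 200 with no response body
    }

    func main() {
        http.HandleFunc("/api/delete", deleteHandler) // path illustrative
        http.ListenAndServe(":8080", nil)
    }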
Jesse Gross 8bf38552de llm: Prefer dedicated GPUs over iGPUs when allocating memory
We currently assign model layers to GPUs according to free VRAM,
which assumes that GPU performance is roughly equal. This does not
work well for mixed dGPU and iGPU systems because iGPUs typically
use system memory, which is large but slow.
This instead assigns layers to dGPUs first and then iGPUs.

In the future, this could be generalized to a more fine-grained
notion of GPU performance, but the dGPU vs. iGPU difference is the
most extreme case.
2025-11-11 13:11:08 -08:00
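
A minimal sketch of that policy; the GPU struct and field names below are assumptions for illustration, not Ollama's actual types:

    // Sort dedicated GPUs ahead of integrated ones, breaking ties by
    // free VRAM, so layers are assigned to dGPUs first.
    package sched

    import "sort"

    type GPU struct {
        Name       string
        Integrated bool   // true for an iGPU
        FreeVRAM   uint64 // bytes
    }

    func orderForOffload(gpus []GPU) []GPU {
        out := append([]GPU(nil), gpus...)
        sort.SliceStable(out, func(i, j int) bool {
            if out[i].Integrated != out[j].Integrated {
                return !out[i].Integrated // dGPU before iGPU
            }
            return out[i].FreeVRAM > out[j].FreeVRAM
        })
        return out
    }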
Jesse Gross b13fbad0fe llm: Separate llamaServer and ollamaServer code paths
Originally, llamaServer represented old memory estimates, which
could be used with either the old or new engine. ollamaServer was
used only for the new estimates and new engine. Since these
implementations did not map directly to an engine, there was
engine-specific code in common code paths.

Now that new estimates are always used for the new engine, there is
a direct mapping between server type and engine. This separates out
most of the engine-specific code into the correct implementation
to make things easier to understand.
2025-11-11 13:11:08 -08:00
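
The separation described here maps naturally onto one implementation per engine behind a shared interface; a hedged sketch (names are illustrative, not the actual Ollama types):

    // With a direct server-to-engine mapping, each engine's logic sits
    // behind its own implementation instead of branching in common code.
    package llm

    type server interface {
        Load(model string) error
        Close() error
    }

    type llamaServer struct{}  // old engine, old estimates
    type ollamaServer struct{} // new engine, new estimates

    func (s *llamaServer) Load(model string) error  { return nil } // engine-specific
    func (s *llamaServer) Close() error             { return nil }
    func (s *ollamaServer) Load(model string) error { return nil } // engine-specific
    func (s *ollamaServer) Close() error            { return nil }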
Jesse Gross f560bd077f llm: Use Ollama engine memory layouts for both old and new engines
Currently for both the old and new engines, there is code to
calculate how much memory is required for a model and lay out
the layers onto GPUs. This reuses the new engine's lay out code
for the old engine as well, bringing them closer together. The
old engine continues to use its current method of estimating
required memory.

This reduces maintenance effort and improves consistency, as new
features only need to be implemented in one place. The newer code
is also more accurate, especially with multiple GPUs.
2025-11-11 13:11:08 -08:00
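
In other words, estimation stays per-engine while layout converges on one implementation; a hypothetical sketch of that split, continuing the illustrative sched package above (all names assumed):

    // Each engine produces its own estimate, but a single routine lays
    // layers onto GPUs, dGPUs first.
    type estimate struct {
        perLayer uint64 // bytes per layer, from the engine's own estimator
    }

    func layoutLayers(e estimate, gpus []GPU, n int) map[string]int {
        placed := map[string]int{}
        for _, g := range orderForOffload(gpus) {
            if n == 0 || e.perLayer == 0 {
                break
            }
            fit := int(g.FreeVRAM / e.perLayer)
            if fit > n {
                fit = n
            }
            placed[g.Name] = fit
            n -= fit
        }
        return placed // any remainder stays on the CPU
    }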
Jesse Gross 4372d0bfef llamarunner: Respect device ordering for offloaded layers
We used to control the way that llama.cpp saw devices using
CUDA_VISIBLE_DEVICES or similar. This would ensure that the layers
offloaded to a device were actually the ones intended. This is
particularly important because we might reorder devices based on
free memory or performance.

When we started explicitly scheduling layers, this logic went
away but the llamarunner didn't have any way to set the correct
order of devices. This meant that the correct number of layers
would be assigned to a device but not necessarily the layers
that were expected. This change sets up the devices correctly
based on the offload information.
2025-11-11 13:11:08 -08:00
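
A sketch of the kind of explicit ordering this restores, continuing the illustrative sched package above (needs "strings" imported; device-ID mapping assumed):

    // Express the scheduler's preferred order as an explicit device list
    // so the offloaded layers land on the devices intended.
    func visibleDeviceOrder(gpus []GPU, id map[string]string) string {
        order := make([]string, 0, len(gpus))
        for _, g := range orderForOffload(gpus) {
            order = append(order, id[g.Name])
        }
        return strings.Join(order, ",") // e.g. "1,0" for CUDA_VISIBLE_DEVICES
    }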
Eva H 31361c4d3c
app/ui: do not send thinking to prevent errors with cloud provider 2025-11-11 16:09:24 -05:00
Baptiste Jamin 59241c5bee
server: add logprobs and top_logprobs support to Ollama's API (#12899)
Adds logprobs support to Ollama's API, including Ollama's
OpenAI-compatible API. When the new 'logprobs' boolean parameter is set,
Ollama returns the log probabilities for each generated token.
'top_logprobs', an integer value of up to 20, can also be specified;
when set, the API also returns that many of the most likely tokens at
each token position.

Co-authored-by: Baptiste Jamin <baptiste@crisp.chat>
2025-11-11 08:49:50 -08:00
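
A minimal runnable request against the native API with the new fields; the model name is illustrative, while the logprobs and top_logprobs parameters come from the commit message above:

    package main

    import (
        "bytes"
        "encoding/json"
        "io"
        "net/http"
        "os"
    )

    func main() {
        body, _ := json.Marshal(map[string]any{
            "model":        "llama3.2", // illustrative model name
            "prompt":       "Hello",
            "stream":       false,
            "logprobs":     true, // log probability per generated token
            "top_logprobs": 5,    // up to 20 most likely tokens per position
        })
        resp, err := http.Post("http://localhost:11434/api/generate",
            "application/json", bytes.NewReader(body))
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        io.Copy(os.Stdout, resp.Body)
    }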
Eva Ho 2a9b61f099 address comment 2025-11-11 08:58:55 -05:00
Sheikh 6df4208836
docs: fix metal gpu section header (#13045) 2025-11-10 21:51:22 -08:00
Eva Ho 9d615cdaa0 fix test 2025-11-10 20:13:50 -05:00
Eva Ho 6a818b8a09 clean up 2025-11-10 19:08:42 -05:00
Eva Ho 2aaf29acb5 app/ui: do not send thinking to prevent errors with cloud provider 2025-11-10 19:05:00 -05:00
Eva H a42f826acb
app/ui: using streamdown AI elements for markdown rendering 2025-11-10 12:05:59 -05:00
Bruce MacDonald e10a3533a5
app/docs: remove out of date storybook instructions (#13006) 2025-11-08 13:28:18 -08:00
Patrick Devine 91ec3ddbeb
bugfix: don't include both consolidated.safetensors and model-*.safetensors (#13010) 2025-11-07 22:41:57 -08:00
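
The fix amounts to treating the two safetensors layouts as mutually exclusive; a hedged sketch (file patterns from the commit title; which set wins is an assumption, since the title only says not to include both):

    // When both layouts are present, keep the sharded model-*.safetensors
    // files and drop consolidated.safetensors.
    package convert

    import (
        "path/filepath"
        "strings"
    )

    func filterSafetensors(files []string) []string {
        sharded := false
        for _, f := range files {
            base := filepath.Base(f)
            if strings.HasPrefix(base, "model-") && strings.HasSuffix(base, ".safetensors") {
                sharded = true
                break
            }
        }
        if !sharded {
            return files
        }
        out := make([]string, 0, len(files))
        for _, f := range files {
            if filepath.Base(f) != "consolidated.safetensors" {
                out = append(out, f)
            }
        }
        return out
    }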
Parth Sareen 755ac3b069
docs: update n8n URL for Ollama (#12994) 2025-11-07 20:07:26 -08:00
Daniel Hiltgen 60b8973559
doc: re-add login autostart faq and GPU updates (#12975)
* doc: re-add login autostart faq

This appears to have been accidentally dropped during the doc migration.

* docs: GPU updates lost on the doc update

* review comments: improve windows login disable instructions
2025-11-07 11:21:44 -08:00
Tomoya Fujita d2ef679d42
docs: fix 404 link to modelfile documentation (#12996) 2025-11-07 10:06:46 -08:00
Thomas Stocker d4e0da0890
Remove unnecessary macOS 13 and lower patches (#12656)
* Remove unnecessary macOS 13 patch

* Remove unnecessary macOS version guard patch

* rename patches

* remove again macos13 patch

* rename files
2025-11-06 15:52:56 -08:00
Jeffrey Morgan 565b802a6b
openai: fix tool call ID mapping (#12988) 2025-11-06 15:26:25 -08:00
Saifeddine ALOUI 6c79e6c09a
readme: add security tools section and Ollama fortress to community integrations (#12981) 2025-11-06 15:21:13 -08:00
breatn 780762f9d2
server: fix duplicate 'is' typo in comment (#12985) 2025-11-06 14:44:44 -08:00
Jeffrey Morgan 30fcc71983
api: add omitempty to required tool function parameter type (#12989) 2025-11-06 14:08:55 -08:00
Eva Ho 3501a4bdf9 address comment 2025-11-06 16:49:22 -05:00
Eva H 73a0cafc1e
Merge pull request #12973 from macarronesc/main
feat: add support for WebP images in Ollama's app
2025-11-06 16:31:46 -05:00
Eva Ho e309c80474 address comments 2025-11-06 13:49:59 -05:00
Daniel Hiltgen 544b6739dd
ggml update to b6840 (#12791) 2025-11-06 10:19:22 -08:00
Daniel Alejandro Coll Tejeda a4a53692f8 refactor: remove GIF support from image validation tests and logging 2025-11-06 09:09:51 +00:00
7394112478 c4ba257c64
readme: remove 404 link (#11351) 2025-11-05 23:36:59 -08:00
mags0ft 342e58ce4f
readme: add hle-eval-ollama to list of terminal community integrations (#11371) 2025-11-05 23:04:30 -08:00
Saifeddine ALOUI 47b2585cfd
readme: add lollms and lollms WebUI to community integrations (#11981) 2025-11-05 22:48:43 -08:00
Vincent Koc 4111db013f
app: fix macOS file picker to support Uniform Type Identifiers (#12965) 2025-11-05 21:37:17 -08:00
Eva Ho 536c987c39 address comment 2025-11-05 20:19:34 -05:00
Eva Ho a534d4e9e1 fixing thinking not scrolling issue 2025-11-05 16:06:55 -05:00
Eva Ho 74586aa9df address comments 2025-11-05 16:06:55 -05:00
Eva Ho 8c74f5ddfd ui: using streamdown AI elements for markdown rendering 2025-11-05 16:06:55 -05:00
Daniel Hiltgen 80d34260ea
ci: re-enable signing (#12974) 2025-11-05 12:33:01 -08:00
Daniel Alejandro Coll Tejeda bddfa2100f feat: add support for WebP images in Ollama's app 2025-11-05 21:23:20 +01:00
nicole pardal 1ca608bcd1
embeddings: added embedding command for cli (#12795)
Co-authored-by: A-Akhil <akhilrahul70@gmail.com>

This PR introduces a new ollama embed command that allows users to generate embeddings directly from the command line.

* Added ollama embed MODEL [TEXT...] command for generating text embeddings

* Supports both direct text arguments and stdin piping for scripted workflows

* Outputs embeddings as JSON arrays (one per line)
2025-11-05 11:58:03 -08:00
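
Typical invocations following that shape (the model name is illustrative):

    ollama embed nomic-embed-text "The quick brown fox"
    cat sentences.txt | ollama embed nomic-embed-text

Each output line is one embedding as a JSON array of floats.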
Daniel Hiltgen 6aa7283076
mac: fix stale VRAM data (#12972)
The scheduler updates free VRAM based on currently loaded models. This
was mutating the persisted list of GPUs, and when coupled with the
non-refreshing logic for Metal, that led to stale low VRAM reporting
after unload. The fix is to make sure GPU discovery always returns a
copy so the scheduler's GPU list is in fact ephemeral and doesn't leak
any temporary adjustments back into the persistent list.
2025-11-05 11:55:17 -08:00
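
The fix pattern described (hand out a copy so callers can't mutate the cached list) looks roughly like this, continuing the illustrative sched package above:

    // Return a fresh copy so the scheduler's free-VRAM adjustments never
    // mutate the persisted discovery results.
    var discovered []GPU // cached discovery results (illustrative)

    func getGPUs() []GPU {
        out := make([]GPU, len(discovered))
        copy(out, discovered) // callers may adjust FreeVRAM freely
        return out
    }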
Patrick Devine f89fc1cadd
bugfix: show connection string for interactive cli usage (#12930) 2025-11-05 11:55:04 -08:00
Daniel Hiltgen 97e05d2a6b
win: revert CPU discovery logic to 0.12.3 (#12969)
The behavior change in 0.12.4 is most likely the root cause of the
hangs some users are seeing. This reverts to the 0.12.3 code, with
some added trace logging.
2025-11-05 10:32:38 -08:00
Youdon 8bbc7395db
readme: Add handy-ollama to community integrations (#8601) 2025-11-05 09:56:14 -08:00
Daniel Hiltgen 408c2f99d0
log: trace logging for scheduler (#12961) 2025-11-05 08:12:15 -08:00
Grace 809b9c68fa
Add Tool Call ID (#12956)
* routes/types: add tool call id

---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
2025-11-04 16:43:33 -08:00
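
For reference, a tool-call ID associates a call with the later "tool" role message carrying its result; a hedged Go sketch of the common OpenAI-compatible shape (field names follow that convention, not verified against this PR):

    // Illustrative OpenAI-compatible tool call structure.
    type ToolCall struct {
        ID       string `json:"id"`
        Type     string `json:"type"` // "function"
        Function struct {
            Name      string `json:"name"`
            Arguments string `json:"arguments"` // JSON-encoded arguments
        } `json:"function"`
    }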
Daniel Hiltgen ba8c035846
log: instrument CPU discovery timing (#12960) 2025-11-04 16:23:37 -08:00