Commit Graph

467 Commits

Patrick Devine 45eabc3083
docs: update the faq (#11760) 2025-12-29 06:39:50 -06:00
Gao feng 3e2a98ad55
Update downloading to pulling in api.md (#11170)
update api.md to make it consistent with the code.
https://github.com/ollama/ollama/blob/main/server/download.go#L447
2025-12-29 06:39:49 -06:00
Parth Sareen 179bbf2640
docs: update turbo model name (#11707) 2025-12-29 06:39:49 -06:00
Jeffrey Morgan 063d3e8163
docs: add docs for Ollama Turbo (#11687) 2025-12-29 06:39:48 -06:00
Yoshi 9bd69d0110
docs: fix typos and remove trailing whitespaces (#11554) 2025-12-29 06:39:46 -06:00
ycomiti f5319ac72b
Update linux.md (#11462) 2025-12-29 06:39:45 -06:00
frob a1a350b608
docs: add the no-Modelfile function of `ollama create` (#9077) 2025-12-29 06:39:44 -06:00
Marcelo Fornet 8c885fe5eb
docs: fix typo in macos.md (#11425) 2025-12-29 06:39:43 -06:00
先知 43cacd9309
docs: update modelfile.md to reflect current default num_ctx (#11189)
As of commit 44b466eeb2, the default context length has been increased to 4096.
2025-12-29 06:39:43 -06:00
Daniel Hiltgen 50e4df359b
doc: add MacOS docs (#11334)
also removes stale model dir instructions for windows
2025-12-29 06:39:42 -06:00
Daniel Hiltgen 4fcc030739
Reduce default parallelism to 1 (#11330)
The current scheduler algorithm of picking parallelism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm.  This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined.  Removal of the dynamic
logic will come in a follow up.
2025-12-29 06:39:41 -06:00
Parth Sareen 25f6571f34
add `tool_name` to api.md (#11326) 2025-12-29 06:39:41 -06:00
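
As context for the `tool_name` addition: a minimal sketch of a `/api/chat` request that returns a tool result to the model, assuming the field names documented in api.md (`role`, `content`, `tool_name`); the model name and tool output below are placeholders, not from this commit.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Sketch: hand a tool result back to the model. The "tool_name"
	// field on the role:"tool" message is the addition this commit
	// documents in api.md.
	body, _ := json.Marshal(map[string]any{
		"model": "llama3.1",
		"messages": []map[string]any{
			{"role": "user", "content": "What is the weather in Toronto?"},
			{"role": "tool", "tool_name": "get_weather", "content": "11 degrees and cloudy"},
		},
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out map[string]any
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println(out["message"])
}
```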
Parth Sareen 1efadee48c
template: add tool result compatibility (#11294) 2025-12-29 06:39:41 -06:00
Daniel Hiltgen ba750172ca
doc: add NVIDIA blackwell to supported list (#11307) 2025-12-29 06:39:40 -06:00
Daniel Hiltgen 29ec3ddf9a
Re-remove cuda v11 (#10694)
* Re-remove cuda v11

Revert the revert - drop v11 support, requiring drivers newer than Feb 2023

This reverts commit c6bcdc4223.

* Simplify layout

With only one version of the GPU libraries, we can simplify things somewhat. (Jetsons still require special handling.)

* distinct sbsa variant for linux arm64

This avoids accidentally trying to load the sbsa cuda libraries on
a Jetson system, which results in crashes.

* temporary prevent rocm+cuda mixed loading
2025-12-29 06:38:18 -06:00
Jeffrey Morgan 6d36b8dcfb
benchmark: remove unused benchmark test (#11120)
Removes a test under benchmark/ that is unused
2025-12-29 06:38:17 -06:00
Krzysztof Jeziorny 874e02626f
docs: update link to AMD drivers in linux.md (#10973) 2025-12-29 06:38:12 -06:00
Jeffrey Morgan 3b70283d35
Revert "server: add model capabilities to the list endpoint (#10174)" (#11004)
This reverts commit 0943001193.
2025-12-29 06:38:12 -06:00
Hunter Wittenborn 8b158c2049
docs: fix typo in development.md (#10998) 2025-12-29 06:38:11 -06:00
JasonHonKL 47bebce5f8
server: add model capabilities to the list endpoint (#10174) 2025-12-29 06:38:11 -06:00
Devon Rifkin 026aba9f11
add thinking support to the api and cli (#10584)
- Both `/api/generate` and `/api/chat` now accept a `"think"`
  option that allows specifying whether thinking mode should be on or
  not
- Templates get passed this new option so, e.g., qwen3's template can
  put `/think` or `/no_think` in the system prompt depending on the
  value of the setting
- Models' thinking support is inferred by inspecting model templates.
  The prefix and suffix the parser uses to identify thinking support is
  also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking
  existing API consumers. If the `"think"` option is not specified, the
  behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming/non-streaming mode
  in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think`
  or `--think=false` to control thinking, or during an interactive
  session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes
  it easy to use thinking in scripting scenarios like
  `ollama run qwen3 --think --hidethinking "my question here"` where you
  just want to see the answer but still want the benefits of thinking
  models
2025-12-29 06:38:09 -06:00
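
A minimal sketch of the opt-in `think` option described above, sent to `/api/generate` on a local server; the `thinking` response field name is an assumption inferred from the parsing behavior this commit describes.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Opt in to thinking; omitting "think" keeps the pre-existing behavior.
	body, _ := json.Marshal(map[string]any{
		"model":  "qwen3",
		"prompt": "Why is the sky blue?",
		"think":  true,
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Thinking string `json:"thinking"` // parsed thinking block (assumed field name)
		Response string `json:"response"` // final answer, with thinking stripped out
	}
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println("thinking:", out.Thinking)
	fmt.Println("answer:", out.Response)
}
```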
frob 56765df3ee
docs: remove unsupported quantizations (#10842) 2025-12-29 06:38:07 -06:00
Daniel Hiltgen d344573e5b
Revert "remove cuda v11 (#10569)" (#10692)
Bring back v11 until we can better warn users that their driver
is too old.

This reverts commit fa393554b9.
2025-12-29 06:37:58 -06:00
Daniel Hiltgen 0132148534
Follow up to #10363 (#10647)
The quantization PR didn't block all unsupported file types,
which this PR fixes.  It also updates the API docs to reflect
the now reduced set of supported types.
2025-12-29 06:37:57 -06:00
Jeffrey Morgan 9ec2150629
api: remove unused sampling parameters (#10581) 2025-12-29 06:37:54 -06:00
Daniel Hiltgen 3e99eae7e5
remove cuda v11 (#10569)
This reduces the size of our Windows installer payloads by ~256M by dropping
support for nvidia drivers older than Feb 2023.  Hardware support is unchanged.

Linux default bundle sizes are reduced by ~600M to 1G.
2025-12-29 06:37:53 -06:00
Jeffrey Morgan 13c66584a5
api: remove unused or unsupported api options (#10574)
Some options listed in api/types.go are not supported in
newer models, or have been deprecated in the past. This is
the first of a series of PRs to clean up the API options
2025-12-29 06:37:52 -06:00
Devon Rifkin b963dd868b
config: update default context length to 4096 2025-12-29 06:37:46 -06:00
Devon Rifkin 5a7c6c363e
Revert "increase default context length to 4096 (#10364)"
This reverts commit 424f648632.
2025-12-29 06:37:46 -06:00
Devon Rifkin 770df0887f
increase default context length to 4096 (#10364)
* increase default context length to 4096

We lower the default numParallel from 4 to 2 and use these "savings" to
double the default context length from 2048 to 4096.

We're memory neutral in cases when we previously would've used
numParallel == 4, but we add the following mitigation to handle some
cases where we would have previously fallen back to 1x2048 due to low
VRAM: we decide between 2048 and 4096 using a runtime check, choosing
2048 if we're on a one GPU system with total VRAM of <= 4 GB. We
purposefully don't check the available VRAM because we don't want the
context window size to change unexpectedly based on the available VRAM.

We plan on making the default even larger, but this is a relatively
low-risk change we can make to quickly double it.

* fix tests

add an explicit context length so they don't get truncated. The code
that converts -1 from being a signal for doing a runtime check isn't
running as part of these tests.

* tweak small gpu message

* clarify context length default

also make it actually show up in `ollama serve --help`
2025-12-29 06:37:41 -06:00
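
The low-VRAM mitigation above amounts to a small runtime check. A sketch of that decision logic as stated in the commit message (function and package names are illustrative, not the actual implementation):

```go
package scheduler

// defaultContextLength picks the default num_ctx at runtime, per the
// mitigation described above: stay at 2048 on a single-GPU system with
// <= 4 GiB of *total* VRAM, otherwise use 4096. Total rather than
// available VRAM is checked deliberately, so the default doesn't
// change unexpectedly from run to run.
func defaultContextLength(gpuCount int, totalVRAMBytes uint64) int {
	const fourGiB = uint64(4) << 30
	if gpuCount == 1 && totalVRAMBytes <= fourGiB {
		return 2048
	}
	return 4096
}
```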
Devon Rifkin 2a8495a8ea
docs: change more template blocks to have syntax highlighting
In #8215 syntax highlighting was added to most of the blocks, but there were a couple that were still being rendered as plaintext
2025-12-29 06:37:39 -06:00
Devon Rifkin 378d3210dc
docs: update some response code blocks to json5
This is to prevent rendering bright red comments indicating invalid JSON when the comments are just supposed to be explanatory
2025-04-14 17:09:06 -07:00
frob ccc8c6777b
cleanup: remove OLLAMA_TMPDIR and references to temporary executables (#10182)
* cleanup: remove OLLAMA_TMPDIR
* cleanup: ollama doesn't use temporary executables anymore

---------

Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-08 15:01:39 -07:00
Bruce MacDonald e172f095ba
api: return model capabilities from the show endpoint (#10066)
With support for multimodal models becoming more varied and common, it is important for clients to be able to easily see what capabilities a model has. Returning these from the show endpoint will allow clients to discover what a model can do.
2025-04-01 15:21:46 -07:00
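
A hedged sketch of consuming this, assuming the show endpoint accepts `{"model": ...}` and returns a top-level `capabilities` array (field names and example values are assumptions based on this commit's description):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]string{"model": "llava"})
	resp, err := http.Post("http://localhost:11434/api/show", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		// e.g. "completion", "vision" (assumed values)
		Capabilities []string `json:"capabilities"`
	}
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println(out.Capabilities)
}
```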
Parth Sareen b816ff86c9
docs: make context length faq readable (#10006) 2025-03-26 17:34:18 -07:00
copeland3300 5e0b904e88
docs: add flags to example linux log output command (#9852) 2025-03-25 09:52:23 -07:00
Bruce MacDonald fb6252d786
benchmark: performance of running ollama server (#8643) 2025-03-21 13:08:20 -07:00
Parth Sareen d14ce75b95
docs: update final response for /api/chat stream (#9919) 2025-03-21 12:35:47 -07:00
Bradley Erickson 74b44fdf8f
docs: Add OLLAMA_ORIGINS for browser extension support (#9643) 2025-03-13 16:35:20 -07:00
Michael Yang fe776293f7
Merge pull request #9569 from dwt/patch-1
Better WantedBy declaration
2025-03-10 14:09:37 -07:00
frob d8a5d96b98
docs: Add OLLAMA_CONTEXT_LENGTH to FAQ. (#9545) 2025-03-10 11:02:54 -07:00
Martin Häcker 25248f4bd5
Better WantedBy declaration
The problem with default.target is that it always points to the target that is currently started. So if you boot into single-user mode or rescue mode, Ollama still tries to start.

I noticed this because Ollama tried (and failed) to start all the time during a system update, where it definitely is not wanted.
2025-03-07 10:26:31 +01:00
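
The fix implied here is an `[Install]` change in the ollama.service unit; a sketch under that assumption (the actual PR may differ):

```ini
# ollama.service — [Install] section.
# WantedBy=default.target follows whatever target the system boots into,
# so the service also starts in rescue/single-user mode. Tying it to
# multi-user.target avoids that:
[Install]
WantedBy=multi-user.target
```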
Daniel Hiltgen cae5d4d4ea
Win: doc new rocm zip file (#9367)
To stay under the 2G github artifact limit, we're splitting ROCm
out like we do on linux.
2025-03-05 14:11:21 -08:00
Blake Mizerany 55ab9f371a
server/.../backoff,syncs: don't break builds without synctest (#9484)
Previously, developers without the synctest experiment enabled would see
build failures when running tests in some server/internal/internal
packages using the synctest package. This change makes the transition to
use of the package less painful but guards the use of the synctest
package with build tags.

synctest is enabled in CI. If a new change will break a synctest
package, it will break in CI, even if it does not break locally.

The developer docs have been updated to help with any confusion about
why package tests pass locally but fail in CI.
2025-03-03 16:45:40 -08:00
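
The guard described here is the standard Go pattern of an experiment build tag; a minimal sketch (package and test are illustrative, not the repo's actual tests):

```go
//go:build goexperiment.synctest

package backoff

import (
	"testing"
	"testing/synctest"
	"time"
)

// This file only compiles when GOEXPERIMENT=synctest is enabled, so
// builds without the experiment skip it instead of failing.
func TestBackoffFakeClock(t *testing.T) {
	synctest.Run(func() {
		start := time.Now()
		time.Sleep(10 * time.Second) // fake clock inside the bubble: returns instantly
		if time.Since(start) < 10*time.Second {
			t.Errorf("expected at least 10s of (fake) elapsed time")
		}
	})
}
```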
Daniel Hiltgen 688925aca9
Windows ARM build (#9120)
* Windows ARM build

Skip cmake, and note it's unused in the developer docs.

* Win: only check for ninja when we need it

On Windows ARM, the CIM lookup fails, but we don't need ninja anyway.
2025-02-27 09:02:25 -08:00
Chuanhui Liu 888855675e
docs: rocm install link (#9346) 2025-02-25 13:15:47 -08:00
frob 4df98f3eb5
Move cgroups fix out of AMD section. (#9072)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-02-25 08:52:50 -08:00
Jeffrey Morgan 7cfd4aee4d
docs: add additional ROCm docs for building (#9066) 2025-02-22 11:22:59 -08:00
James-William-Kincaid-III 0667baddc6
docs: fix incorrect shortcut key in windows.md (#9098) 2025-02-15 15:38:24 -05:00
frob 3a4449e2f1
docs: add H200 as supported device. (#9076)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-02-13 10:44:23 -08:00