Commit Graph

467 Commits

Patrick Devine 45eabc3083
docs: update the faq (#11760) 2025-12-29 06:39:50 -06:00
Gao feng 3e2a98ad55
Update downloading to pulling in api.md (#11170)
update api.md to make it consistent with the code.
https://github.com/ollama/ollama/blob/main/server/download.go#L447
2025-12-29 06:39:49 -06:00
Parth Sareen 179bbf2640
docs: update turbo model name (#11707) 2025-12-29 06:39:49 -06:00
Jeffrey Morgan 063d3e8163
docs: add docs for Ollama Turbo (#11687) 2025-12-29 06:39:48 -06:00
Yoshi 9bd69d0110
docs: fix typos and remove trailing whitespaces (#11554) 2025-12-29 06:39:46 -06:00
ycomiti f5319ac72b
Update linux.md (#11462) 2025-12-29 06:39:45 -06:00
frob a1a350b608
docs: add the no-Modelfile function of `ollama create` (#9077) 2025-12-29 06:39:44 -06:00
Marcelo Fornet 8c885fe5eb
docs: fix typo in macos.md (#11425) 2025-12-29 06:39:43 -06:00
先知 43cacd9309
docs: update modelfile.md to reflect current default num_ctx (#11189)
As of commit 44b466eeb2, the default context length has been increased to 4096.
2025-12-29 06:39:43 -06:00
Daniel Hiltgen 50e4df359b
doc: add MacOS docs (#11334)
also removes stale model dir instructions for windows
2025-12-29 06:39:42 -06:00
Daniel Hiltgen 4fcc030739
Reduce default parallelism to 1 (#11330)
The current scheduler algorithm of picking parallelism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm.  This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined.  Removal of the dynamic
logic will come in a follow up.
2025-12-29 06:39:41 -06:00
Parth Sareen 25f6571f34
add `tool_name` to api.md (#11326) 2025-12-29 06:39:41 -06:00
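
As context for the `tool_name` addition: a minimal sketch of a `/api/chat` request that returns a tool result to the model, assuming the field names documented in api.md (`role`, `content`, `tool_name`); the model name and tool output below are placeholders, not from this commit.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Sketch: hand a tool result back to the model. The "tool_name"
	// field on the role:"tool" message is the addition this commit
	// documents in api.md.
	body, _ := json.Marshal(map[string]any{
		"model": "llama3.1",
		"messages": []map[string]any{
			{"role": "user", "content": "What is the weather in Toronto?"},
			{"role": "tool", "tool_name": "get_weather", "content": "11 degrees and cloudy"},
		},
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out map[string]any
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println(out["message"])
}
```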
Parth Sareen 1efadee48c
template: add tool result compatibility (#11294) 2025-12-29 06:39:41 -06:00
Daniel Hiltgen ba750172ca
doc: add NVIDIA blackwell to supported list (#11307) 2025-12-29 06:39:40 -06:00
Daniel Hiltgen 29ec3ddf9a
Re-remove cuda v11 (#10694)
* Re-remove cuda v11

Revert the revert - drop v11 support, requiring drivers newer than Feb 2023

This reverts commit c6bcdc4223.

* Simplify layout

With only one version of the GPU libraries, we can simplify things somewhat. (Jetsons still require special handling.)

* distinct sbsa variant for linux arm64

This avoids accidentally trying to load the sbsa cuda libraries on
a Jetson system, which results in crashes.

* temporary prevent rocm+cuda mixed loading
2025-12-29 06:38:18 -06:00
Jeffrey Morgan 6d36b8dcfb
benchmark: remove unused benchmark test (#11120)
Removes a test under benchmark/ that is unused
2025-12-29 06:38:17 -06:00
Krzysztof Jeziorny 874e02626f
docs: update link to AMD drivers in linux.md (#10973) 2025-12-29 06:38:12 -06:00
Jeffrey Morgan 3b70283d35
Revert "server: add model capabilities to the list endpoint (#10174)" (#11004)
This reverts commit 0943001193.
2025-12-29 06:38:12 -06:00
Hunter Wittenborn 8b158c2049
docs: fix typo in development.md (#10998) 2025-12-29 06:38:11 -06:00
JasonHonKL 47bebce5f8
server: add model capabilities to the list endpoint (#10174) 2025-12-29 06:38:11 -06:00
Devon Rifkin 026aba9f11
add thinking support to the api and cli (#10584)
- Both `/api/generate` and `/api/chat` now accept a `"think"`
  option that allows specifying whether thinking mode should be on or
  not
- Templates get passed this new option so, e.g., qwen3's template can
  put `/think` or `/no_think` in the system prompt depending on the
  value of the setting
- Models' thinking support is inferred by inspecting model templates.
  The prefix and suffix the parser uses to identify thinking support is
  also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking
  existing API consumers. If the `"think"` option is not specified, the
  behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming/non-streaming mode
  in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think`
  or `--think=false` to control thinking, or during an interactive
  session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes
  it easy to use thinking in scripting scenarios like
  `ollama run qwen3 --think --hidethinking "my question here"` where you
  just want to see the answer but still want the benefits of thinking
  models
2025-12-29 06:38:09 -06:00
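
A minimal sketch of the opt-in `think` option described above, sent to `/api/generate` on a local server; the `thinking` response field name is an assumption inferred from the parsing behavior this commit describes.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Opt in to thinking; omitting "think" keeps the pre-existing behavior.
	body, _ := json.Marshal(map[string]any{
		"model":  "qwen3",
		"prompt": "Why is the sky blue?",
		"think":  true,
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Thinking string `json:"thinking"` // parsed thinking block (assumed field name)
		Response string `json:"response"` // final answer, with thinking stripped out
	}
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println("thinking:", out.Thinking)
	fmt.Println("answer:", out.Response)
}
```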
frob 56765df3ee
docs: remove unsupported quantizations (#10842) 2025-12-29 06:38:07 -06:00
Daniel Hiltgen d344573e5b
Revert "remove cuda v11 (#10569)" (#10692)
Bring back v11 until we can better warn users that their driver
is too old.

This reverts commit fa393554b9.
2025-12-29 06:37:58 -06:00
Daniel Hiltgen 0132148534
Follow up to #10363 (#10647)
The quantization PR didn't block all unsupported file types,
which this PR fixes.  It also updates the API docs to reflect
the now reduced set of supported types.
2025-12-29 06:37:57 -06:00
Jeffrey Morgan 9ec2150629
api: remove unused sampling parameters (#10581) 2025-12-29 06:37:54 -06:00
Daniel Hiltgen 3e99eae7e5
remove cuda v11 (#10569)
This reduces the size of our Windows installer payloads by ~256M by dropping
support for nvidia drivers older than Feb 2023.  Hardware support is unchanged.

Linux default bundle sizes are reduced by ~600M to 1G.
2025-12-29 06:37:53 -06:00
Jeffrey Morgan 13c66584a5
api: remove unused or unsupported api options (#10574)
Some options listed in api/types.go are not supported in
newer models, or have been deprecated in the past. This is
the first of a series of PRs to clean up the API options
2025-12-29 06:37:52 -06:00
Devon Rifkin b963dd868b
config: update default context length to 4096 2025-12-29 06:37:46 -06:00
Devon Rifkin 5a7c6c363e
Revert "increase default context length to 4096 (#10364)"
This reverts commit 424f648632.
2025-12-29 06:37:46 -06:00
Devon Rifkin 770df0887f
increase default context length to 4096 (#10364)
* increase default context length to 4096

We lower the default numParallel from 4 to 2 and use these "savings" to
double the default context length from 2048 to 4096.

We're memory neutral in cases when we previously would've used
numParallel == 4, but we add the following mitigation to handle some
cases where we would have previously fallen back to 1x2048 due to low
VRAM: we decide between 2048 and 4096 using a runtime check, choosing
2048 if we're on a one GPU system with total VRAM of <= 4 GB. We
purposefully don't check the available VRAM because we don't want the
context window size to change unexpectedly based on the available VRAM.

We plan on making the default even larger, but this is a relatively
low-risk change we can make to quickly double it.

* fix tests

add an explicit context length so they don't get truncated. The code
that converts -1 from being a signal for doing a runtime check isn't
running as part of these tests.

* tweak small gpu message

* clarify context length default

also make it actually show up in `ollama serve --help`
2025-12-29 06:37:41 -06:00
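
The low-VRAM mitigation above amounts to a small runtime check. A sketch of that decision logic as stated in the commit message (function and package names are illustrative, not the actual implementation):

```go
package scheduler

// defaultContextLength picks the default num_ctx at runtime, per the
// mitigation described above: stay at 2048 on a single-GPU system with
// <= 4 GiB of *total* VRAM, otherwise use 4096. Total rather than
// available VRAM is checked deliberately, so the default doesn't
// change unexpectedly from run to run.
func defaultContextLength(gpuCount int, totalVRAMBytes uint64) int {
	const fourGiB = uint64(4) << 30
	if gpuCount == 1 && totalVRAMBytes <= fourGiB {
		return 2048
	}
	return 4096
}
```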
Devon Rifkin 2a8495a8ea
docs: change more template blocks to have syntax highlighting
In #8215 syntax highlighting was added to most of the blocks, but there were a couple that were still being rendered as plaintext
2025-12-29 06:37:39 -06:00
Devon Rifkin 378d3210dc
docs: update some response code blocks to json5
This is to prevent rendering bright red comments indicating invalid JSON when the comments are just supposed to be explanatory
2025-04-14 17:09:06 -07:00
frob ccc8c6777b
cleanup: remove OLLAMA_TMPDIR and references to temporary executables (#10182)
* cleanup: remove OLLAMA_TMPDIR
* cleanup: ollama doesn't use temporary executables anymore

---------

Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-08 15:01:39 -07:00
Bruce MacDonald e172f095ba
api: return model capabilities from the show endpoint (#10066)
With support for multimodal models becoming more varied and common, it is important for clients to be able to easily see what capabilities a model has. Returning these from the show endpoint will allow clients to discover what a model can do.
2025-04-01 15:21:46 -07:00
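
A hedged sketch of consuming this, assuming the show endpoint accepts `{"model": ...}` and returns a top-level `capabilities` array (field names and example values are assumptions based on this commit's description):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]string{"model": "llava"})
	resp, err := http.Post("http://localhost:11434/api/show", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		// e.g. "completion", "vision" (assumed values)
		Capabilities []string `json:"capabilities"`
	}
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println(out.Capabilities)
}
```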
Parth Sareen b816ff86c9
docs: make context length faq readable (#10006) 2025-03-26 17:34:18 -07:00
copeland3300 5e0b904e88
docs: add flags to example linux log output command (#9852) 2025-03-25 09:52:23 -07:00
Bruce MacDonald fb6252d786
benchmark: performance of running ollama server (#8643) 2025-03-21 13:08:20 -07:00
Parth Sareen d14ce75b95
docs: update final response for /api/chat stream (#9919) 2025-03-21 12:35:47 -07:00
Bradley Erickson 74b44fdf8f
docs: Add OLLAMA_ORIGINS for browser extension support (#9643) 2025-03-13 16:35:20 -07:00
Michael Yang fe776293f7
Merge pull request #9569 from dwt/patch-1
Better WantedBy declaration
2025-03-10 14:09:37 -07:00
frob d8a5d96b98
docs: Add OLLAMA_CONTEXT_LENGTH to FAQ. (#9545) 2025-03-10 11:02:54 -07:00
Martin Häcker 25248f4bd5
Better WantedBy declaration
The problem with default.target is that it always points to the target that is currently started. So if you boot into single-user mode or rescue mode, Ollama still tries to start.

I noticed this because Ollama tried (and failed) to start all the time during a system update, where it definitely is not wanted.
2025-03-07 10:26:31 +01:00
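
The fix implied here is an `[Install]` change in the ollama.service unit; a sketch under that assumption (the actual PR may differ):

```ini
# ollama.service — [Install] section.
# WantedBy=default.target follows whatever target the system boots into,
# so the service also starts in rescue/single-user mode. Tying it to
# multi-user.target avoids that:
[Install]
WantedBy=multi-user.target
```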
Daniel Hiltgen cae5d4d4ea
Win: doc new rocm zip file (#9367)
To stay under the 2G github artifact limit, we're splitting ROCm
out like we do on linux.
2025-03-05 14:11:21 -08:00
Blake Mizerany 55ab9f371a
server/.../backoff,syncs: don't break builds without synctest (#9484)
Previously, developers without the synctest experiment enabled would see
build failures when running tests in some server/internal/internal
packages using the synctest package. This change makes the transition to
use of the package less painful but guards the use of the synctest
package with build tags.

synctest is enabled in CI. If a new change will break a synctest
package, it will break in CI, even if it does not break locally.

The developer docs have been updated to help with any confusion about
why package tests pass locally but fail in CI.
2025-03-03 16:45:40 -08:00
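
The guard described here is the standard Go pattern of an experiment build tag; a minimal sketch (package and test are illustrative, not the repo's actual tests):

```go
//go:build goexperiment.synctest

package backoff

import (
	"testing"
	"testing/synctest"
	"time"
)

// This file only compiles when GOEXPERIMENT=synctest is enabled, so
// builds without the experiment skip it instead of failing.
func TestBackoffFakeClock(t *testing.T) {
	synctest.Run(func() {
		start := time.Now()
		time.Sleep(10 * time.Second) // fake clock inside the bubble: returns instantly
		if time.Since(start) < 10*time.Second {
			t.Errorf("expected at least 10s of (fake) elapsed time")
		}
	})
}
```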
Daniel Hiltgen 688925aca9
Windows ARM build (#9120)
* Windows ARM build

Skip cmake, and note it's unused in the developer docs.

* Win: only check for ninja when we need it

On Windows ARM, the CIM lookup fails, but we don't need ninja anyway.
2025-02-27 09:02:25 -08:00
Chuanhui Liu 888855675e
docs: rocm install link (#9346) 2025-02-25 13:15:47 -08:00
frob 4df98f3eb5
Move cgroups fix out of AMD section. (#9072)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-02-25 08:52:50 -08:00
Jeffrey Morgan 7cfd4aee4d
docs: add additional ROCm docs for building (#9066) 2025-02-22 11:22:59 -08:00
James-William-Kincaid-III 0667baddc6
docs: fix incorrect shortcut key in windows.md (#9098) 2025-02-15 15:38:24 -05:00
frob 3a4449e2f1
docs: add H200 as supported device. (#9076)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-02-13 10:44:23 -08:00