Commit Graph

4408 Commits

Author SHA1 Message Date
Jesse Gross cdceaff4e1
kvcache: Group shift operations into batches
Currently, when we need to shift the cache, it is done as a single
RoPE operation over the entire size of the cache (per layer). In
some cases, this can create a compute graph that is larger than
the forward pass, since the forward pass works in batches.
Since we don't account for shifting in our memory estimates, it's
possible for this to cause a crash if we run out of memory.

By limiting each RoPE call to batch-size chunks, we ensure that the
shift will never exceed the size of the forward pass, since the
forward pass also contains a RoPE of the same size. This does not
have a significant impact on performance, since RoPE is a math
operation whose cost is mostly proportional to the size of its
inputs (see the sketch below).

In theory, defrag could have the same issue, since it also creates a
compute graph outside of the forward pass; however, since it consists
only of copies, it does not require any working space.
2025-12-29 06:39:46 -06:00
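A minimal Go sketch of the chunking described above (hypothetical names, not the actual kvcache implementation), assuming a callback that builds one RoPE operation per chunk:

  package main

  import "fmt"

  // shiftChunked applies a cache shift in chunks of at most batchSize
  // positions, so the compute graph built for the shift never exceeds
  // the one built for a forward pass of the same batch size.
  func shiftChunked(cacheLen, batchSize int, applyRoPE func(offset, n int)) {
  	for offset := 0; offset < cacheLen; offset += batchSize {
  		n := batchSize
  		if cacheLen-offset < n {
  			n = cacheLen - offset
  		}
  		applyRoPE(offset, n) // one RoPE call per chunk, not one for the whole cache
  	}
  }

  func main() {
  	// A 10-entry cache with batch size 4 shifts in chunks of 4, 4, and 2.
  	shiftChunked(10, 4, func(offset, n int) {
  		fmt.Printf("RoPE over cache[%d:%d]\n", offset, offset+n)
  	})
  }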
Ruyut 9574ed9bb7
CONTRIBUTING: fix typo in commit message example (#11528) 2025-12-29 06:39:46 -06:00
Patrick Devine 0ab1b140af
cli: catch upstream errors gracefully (#11512) 2025-12-29 06:39:46 -06:00
Jeffrey Morgan d9a78742ad
tools: loosen tool argument parsing (#11509) 2025-12-29 06:39:45 -06:00
minxinyi a35d1c358f
server: use slices.Equal to simplify code (#11502) 2025-12-29 06:39:45 -06:00
Michael Yang 26cd61e41f
s#x/exp/maps#maps# (#11506) 2025-12-29 06:39:45 -06:00
Patrick Devine 95f5d9d6da
Fix GetModelInfo (#11496)
---------

Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-12-29 06:39:45 -06:00
ycomiti f5319ac72b
Update linux.md (#11462) 2025-12-29 06:39:45 -06:00
Stefan Wärting 59b034f040
readme: add GMAI - Gradle Managed AI to community integrations (#11461) 2025-12-29 06:39:44 -06:00
Jeffrey Morgan 30ec10cb05
tools: fix parsing issue when a tool name is a substring of another (#11456)
Co-authored-by: frob <rick+github@frob.com.au>
2025-12-29 06:39:44 -06:00
zmldndx ffa61a51fc
readme: update argo description to support deep research (#11455) 2025-12-29 06:39:44 -06:00
Daniel Hiltgen 5274cd2ead
ci: switch mac builder to arm64 (#11379)
The macos-13 runner is x86, while macos-13-xlarge is arm64
2025-12-29 06:39:44 -06:00
frob a1a350b608
docs: add the no-Modelfile function of `ollama create` (#9077) 2025-12-29 06:39:44 -06:00
frob b2a00a0d2a
openai: allow openai endpoint to accept webp images (#11412)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-12-29 06:39:44 -06:00
Haiyue Wang 2e57f92b0c
readme: update the llama.cpp github link (#11427) 2025-12-29 06:39:43 -06:00
Michael Yang 7221b90fe1
compile bf16 support into ggml-metal (#11430) 2025-12-29 06:39:43 -06:00
Parth Sareen 1c48526e2e
cmd: add default assistant role to message construction (#11431) 2025-12-29 06:39:43 -06:00
Bruce MacDonald 9e9238103d
api: fix unreachable status err (#11423)
StatusError was unreachable: the client always checked for error messages in the response body first, and the server always includes an error message alongside HTTP error status codes (see the sketch below).
2025-12-29 06:39:43 -06:00
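A minimal Go sketch of why the status-only path was dead code (hypothetical names, assuming the body-first error check described above):

  package main

  import (
  	"encoding/json"
  	"fmt"
  )

  type errorResponse struct {
  	Error string `json:"error"`
  }

  // checkError mirrors the ordering described above: the body is decoded
  // first, and since the server always includes an error message with an
  // error status code, the status-only branch can never be reached.
  func checkError(statusCode int, body []byte) error {
  	var er errorResponse
  	if json.Unmarshal(body, &er) == nil && er.Error != "" {
  		return fmt.Errorf("%s", er.Error) // always taken on errors
  	}
  	if statusCode >= 400 {
  		return fmt.Errorf("status %d", statusCode) // unreachable in practice
  	}
  	return nil
  }

  func main() {
  	fmt.Println(checkError(404, []byte(`{"error":"model not found"}`)))
  }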
Marcelo Fornet 8c885fe5eb
docs: fix typo in macos.md (#11425) 2025-12-29 06:39:43 -06:00
先知 43cacd9309
docs: update modelfile.md to reflect current default num_ctx (#11189)
As of commit 44b466eeb2, the default context length has been increased to 4096.
2025-12-29 06:39:43 -06:00
Jesse Gross b47aa7e75a
ggml: Use assigned layers when reporting loading stats
Reporting params.NumGPULayers can be misleading because it is the
requested number of layers, not the actual number that is loaded.
While they are often the same, they can differ, for example if the
GPU backend is missing (see the sketch below).
2025-12-29 06:39:42 -06:00
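A small Go sketch of the reporting change (hypothetical names; the real code reports the backend's assigned layer count):

  package main

  import "log"

  // reportLoad logs the number of layers actually assigned to the GPU
  // rather than the number requested; the two can differ, for example
  // when the GPU backend is missing.
  func reportLoad(requested, assigned, total int) {
  	if assigned != requested {
  		log.Printf("offloaded %d/%d layers to GPU (requested %d)", assigned, total, requested)
  		return
  	}
  	log.Printf("offloaded %d/%d layers to GPU", assigned, total)
  }

  func main() {
  	reportLoad(33, 0, 33) // GPU backend missing: report 0 loaded, not the 33 requested
  }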
Jesse Gross 015e39a8be
ggml: Disable unused pipeline parallelism
We're not currently using it, even in cases where we could. Disabling
it improves generation performance by 10-30% with multiple GPUs.
2025-12-29 06:39:42 -06:00
Daniel Hiltgen 39cec5338a
Only load supported models on new engine (#11362)
* Only load supported models on new engine

Verify the model is supported before trying to load

* int: testcase for all library models
2025-12-29 06:39:42 -06:00
Jesse Gross 387cb031b3
ggml: Report ordinal IDs for AMD GPUs on Windows
We don't get valid UUIDs for AMD GPUs on Windows, so the best option
is to use the ordinal IDs. This brings us in line with what we currently
do on the Ollama server - the only exception is AMD GPUs on Linux, which
fall back to using ordinal IDs. The GGML implementation has no such
fallback, but the situation doesn't appear to occur for any of the GPUs
that we support (see the sketch below).

It's also possible that there are collisions between ordinal IDs for
different libraries; however, the only places where we use them are
AMD on Windows and Metal on Mac, which can never occur on the same
system.
2025-12-29 06:39:42 -06:00
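A hedged Go sketch of the ID selection described above (hypothetical names, not the actual server code):

  package main

  import (
  	"fmt"
  	"strconv"
  )

  // gpuID prefers a valid UUID and falls back to the ordinal index,
  // matching the server behavior described above. The GGML path has
  // no such fallback.
  func gpuID(uuid string, ordinal int) string {
  	if uuid != "" {
  		return uuid
  	}
  	return strconv.Itoa(ordinal)
  }

  func main() {
  	fmt.Println(gpuID("", 0))         // AMD on Windows: no valid UUID, report "0"
  	fmt.Println(gpuID("GPU-1234", 1)) // a valid UUID wins
  }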
Daniel Hiltgen 50e4df359b
doc: add MacOS docs (#11334)
also removes stale model directory instructions for Windows
2025-12-29 06:39:42 -06:00
Daniel Hiltgen 4fcc030739
Reduce default parallelism to 1 (#11330)
The current scheduler algorithm of picking the parallelism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm. This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined (see the sketch below).
Removal of the dynamic logic will come in a follow-up.
2025-12-29 06:39:41 -06:00
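A tiny Go sketch of the new default (hypothetical names; the point is that parallelism is explicit, not derived from VRAM):

  package main

  import "fmt"

  // pickParallelism returns the explicitly requested value if set,
  // otherwise 1; the VRAM-based dynamic choice is gone.
  func pickParallelism(explicit int) int {
  	if explicit > 0 {
  		return explicit
  	}
  	return 1
  }

  func main() {
  	fmt.Println(pickParallelism(0)) // unset: defaults to 1
  	fmt.Println(pickParallelism(4)) // explicit: honored as-is
  }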
Daniel Hiltgen 1c94c9919b
API/CLI context enhancements (#11331)
* API: expose context size of loaded models

* CLI: add context UX

This adds a column to the ps output to show the model's context size.
2025-12-29 06:39:41 -06:00
Parth Sareen 25f6571f34
add `tool_name` to api.md (#11326) 2025-12-29 06:39:41 -06:00
Parth Sareen 1efadee48c
template: add tool result compatibility (#11294) 2025-12-29 06:39:41 -06:00
Daniel Hiltgen fc4cb04cb9
ci: modularization (#11324)
switch a few constants to variables
2025-12-29 06:39:41 -06:00
Jesse Gross 5f139b96ab
Revert "ggml: Temporarily disable reporting UUIDs"
The root cause was an unclean upgrade - this code is fine.

This reverts commit 45f216a9c7.
2025-12-29 06:39:41 -06:00
Jeffrey Morgan ca3520de87
readme: update Ollama icon size 2025-12-29 06:39:40 -06:00
Daniel Hiltgen 55a4a37c3a
int: add performance integration tests (#11173)
usage example:
  go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
  cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
  cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
2025-12-29 06:39:40 -06:00
Daniel Hiltgen ba750172ca
doc: add NVIDIA blackwell to supported list (#11307) 2025-12-29 06:39:40 -06:00
Vincent RAMPAL 35bf6c0a41
Update base image to Ubuntu 24.04 LTS (#9681) 2025-12-29 06:39:40 -06:00
Daniel Hiltgen b23d28b549
doc: Update link for mac install (#11288)
Favor the dmg now.
2025-12-29 06:39:40 -06:00
Daniel Hiltgen e897624123
mimic logs for layers on new engine (#11278)
This adds some extra logs to make the new engine a bit more consistent
with the llama engine.
2025-12-29 06:39:39 -06:00
XuKecheng a3e4bb7f58
readme: add NativeMind to community integrations (#11242) 2025-12-29 06:39:39 -06:00
Jeffrey Morgan 9cf8ef9371
tools: fix parsing tool calls with empty arguments, missing required fields (#11233) 2025-12-29 06:39:39 -06:00
Attogram Project 96be53fe6c
readme: add ollama-bash-toolshed to community integrations (#11224) 2025-12-29 06:39:39 -06:00
Michael Yang 1cdab47113
chore: cleanup comments + unused vars (#11225) 2025-12-29 06:39:39 -06:00
Jesse Gross 872d190c8f
ggml: Temporarily disable reporting UUIDs
This is causing segfaults, so disable it. Currently UUIDs are only
used for debugging purposes, although they are planned to be used in
additional ways in the future.

Bug #11211
2025-12-29 06:39:39 -06:00
Michael Yang 8f2099306f
skip quantizing per_layer_token_embd (#11207)
This tensor isn't compatible with CUDA when quantized to q4_K, so skip it (see the sketch below).
2025-12-29 06:39:38 -06:00
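A one-function Go sketch of the skip rule (hypothetical name; the real check lives in the quantization path):

  package main

  import (
  	"fmt"
  	"strings"
  )

  // shouldQuantize reports whether a tensor may be quantized to q4_K;
  // per_layer_token_embd is kept unquantized for CUDA compatibility.
  func shouldQuantize(name string) bool {
  	return !strings.Contains(name, "per_layer_token_embd")
  }

  func main() {
  	fmt.Println(shouldQuantize("blk.0.attn_q.weight"))          // true
  	fmt.Println(shouldQuantize("per_layer_token_embd.weight")) // false
  }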
Daniel Hiltgen 59112600d1
ci: multi-stage release process (#11001) 2025-12-29 06:39:38 -06:00
Jeffrey Morgan 10119ec2ee
fs/ggml: add multiplier in graph estimates (#11208) 2025-12-29 06:39:38 -06:00
Jeffrey Morgan 84998ae4ba
fs/ggml: add missing architecture to OllamaEngineRequired() (#11206) 2025-12-29 06:39:38 -06:00
Michael Yang 801564fa8b
add new gemma model (#11204)
* update patches

* cherry pick metal mean kernel

* cherry pick cuda mean kernel

* gemma3n
2025-12-29 06:39:38 -06:00
Daniel Hiltgen d6253f09c2
ci: arm sbsa fixes (#11194) 2025-12-29 06:39:37 -06:00
Daniel Hiltgen 9cf1db79b4
ci: include dependencies 2025-12-29 06:39:37 -06:00
Daniel Hiltgen 46654149c9
ci: pick up arm sbsa cuda libs (#11192) 2025-12-29 06:39:37 -06:00