ollama

Author	SHA1	Message	Date
Jesse Gross	387cb031b3	ggml: Report ordinal IDs for AMD GPUs on Windows We don't get valid UUIDs for AMD GPUs on Windows, so the best option is to use the ordinal IDs. This brings us in line with what we currently do on the Ollama server - the only exception is AMD GPUs on Linux, which falls back to using ordinal IDs. The GGML implementation has no fallback but it doesn't appear to occur for any of the GPUs that we support. It's also possible that there are collisions between ordinal IDs for different libraries - however the only places where we use them are AMD on Windows and Metal on Mac, which can never occur on the same system.	2025-12-29 06:39:42 -06:00
Daniel Hiltgen	50e4df359b	doc: add MacOS docs (#11334) also removes stale model dir instructions for windows	2025-12-29 06:39:42 -06:00
Daniel Hiltgen	4fcc030739	Reduce default parallelism to 1 (#11330) The current scheduler algorithm of picking the parallelism based on available VRAM complicates the upcoming dynamic layer memory allocation algorithm. This changes the default to 1, with the intent going forward that parallelism is explicit and will no longer be dynamically determined. Removal of the dynamic logic will come in a follow-up.	2025-12-29 06:39:41 -06:00
Daniel Hiltgen	1c94c9919b	API/CLI context enhancements (#11331) * API: expose context size of loaded models * CLI: add context UX This adds a column in the ps output to show the model's context size.	2025-12-29 06:39:41 -06:00
Parth Sareen	25f6571f34	add `tool_name` to api.md (#11326)	2025-12-29 06:39:41 -06:00
Parth Sareen	1efadee48c	template: add tool result compatibility (#11294)	2025-12-29 06:39:41 -06:00
Daniel Hiltgen	fc4cb04cb9	ci: modularization (#11324) switch a few constants to variables	2025-12-29 06:39:41 -06:00
Jesse Gross	5f139b96ab	Revert "ggml: Temporarily disable reporting UUIDs" The root cause was an unclean upgrade - this code is fine. This reverts commit `45f216a9c7`.	2025-12-29 06:39:41 -06:00
Jeffrey Morgan	ca3520de87	readme: update Ollama icon size	2025-12-29 06:39:40 -06:00
Daniel Hiltgen	55a4a37c3a	int: add performance integration tests (#11173) Usage example: `go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log`; `cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv`; `cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv`	2025-12-29 06:39:40 -06:00
Daniel Hiltgen	ba750172ca	doc: add NVIDIA blackwell to supported list (#11307)	2025-12-29 06:39:40 -06:00
Vincent RAMPAL	35bf6c0a41	Update base image to Ubuntu 24.04 LTS (#9681)	2025-12-29 06:39:40 -06:00
Daniel Hiltgen	b23d28b549	doc: Update link for mac install (#11288) Favor the dmg now.	2025-12-29 06:39:40 -06:00
Daniel Hiltgen	e897624123	mimic logs for layers on new engine (#11278) This adds some extra logs to make the new engine a bit more consistent with the llama engine.	2025-12-29 06:39:39 -06:00
XuKecheng	a3e4bb7f58	readme: add NativeMind to community integrations (#11242)	2025-12-29 06:39:39 -06:00
Jeffrey Morgan	9cf8ef9371	tools: fix parsing tool calls with empty arguments, missing required fields (#11233)	2025-12-29 06:39:39 -06:00
Attogram Project	96be53fe6c	readme: add ollama-bash-toolshed to community integrations (#11224)	2025-12-29 06:39:39 -06:00
Michael Yang	1cdab47113	chore: cleanup comments + unused vars (#11225)	2025-12-29 06:39:39 -06:00
Jesse Gross	872d190c8f	ggml: Temporarily disable reporting UUIDs This is causing segfaults, so disable it. Currently UUIDs are only used for debugging purposes, although they are planned to be used in additional ways in the future. Bug #11211	2025-12-29 06:39:39 -06:00
Michael Yang	8f2099306f	skip quantizing per_layer_token_embd (#11207) this tensor isn't compatible with cuda when quantized to q4_K so skip it	2025-12-29 06:39:38 -06:00
Daniel Hiltgen	59112600d1	ci: multi-stage release process (#11001)	2025-12-29 06:39:38 -06:00
Jeffrey Morgan	10119ec2ee	fs/ggml: add multiplier in graph estimates (#11208)	2025-12-29 06:39:38 -06:00
Jeffrey Morgan	84998ae4ba	fs/ggml: add missing architecture to OllamaEngineRequired() (#11206)	2025-12-29 06:39:38 -06:00
Michael Yang	801564fa8b	add new gemma model (#11204) * update patches * cherry pick metal mean kernel * cherry pick cuda mean kernel * gemma3n	2025-12-29 06:39:38 -06:00
Daniel Hiltgen	d6253f09c2	ci: arm sbsa fixes (#11194)	2025-12-29 06:39:37 -06:00
Daniel Hiltgen	9cf1db79b4	ci: include dependencies	2025-12-29 06:39:37 -06:00
Daniel Hiltgen	46654149c9	ci: pick up arm sbsa cuda libs (#11192)	2025-12-29 06:39:37 -06:00
Daniel Hiltgen	138c973d8f	ci: recombine linux amd64 binaries (#11188) Glue the rocm and archive builds back together.	2025-12-29 06:39:37 -06:00
Devon Rifkin	dd8d037c16	load arrays with up to 1024 elements when estimating. This mirrors the old behavior before #10382	2025-12-29 06:39:37 -06:00
Devon Rifkin	558c1920fa	ggml: fix crash for array head counts. If it's an array, it uses the max value in the array. If array values for head counts become more popular, we can consider a more invasive change like #10225 to calculate more accurate estimates. Fixes: #9984	2025-12-29 06:39:34 -06:00
Daniel Hiltgen	b9b179fe00	ci: rocm parallel builds on windows (#11187) The preset CMAKE_HIP_FLAGS isn't getting used on Windows. This passes the parallel flag in through the C/CXX flags, along with suppression for some log spew warnings to quiet down the build.	2025-12-29 06:38:19 -06:00
Daniel Hiltgen	38f92e7332	CI: switch windows to vs 2022 (#11184) * CI: switch windows to vs 2022 * ci: fix regex match	2025-12-29 06:38:18 -06:00
Daniel Hiltgen	c012d1805b	avoid context overflow (#11175) For smaller context models, make sure we do not exceed the training size.	2025-12-29 06:38:18 -06:00
Daniel Hiltgen	29ec3ddf9a	Re-remove cuda v11 (#10694) * Re-remove cuda v11 Revert the revert - drop v11 support requiring drivers newer than Feb 23 This reverts commit `c6bcdc4223`. * Simplify layout With only one version of the GPU libraries, we can simplify things down somewhat. (Jetsons still require special handling) * distinct sbsa variant for linux arm64 This avoids accidentally trying to load the sbsa cuda libraries on a jetson system which results in crashes. * temporarily prevent rocm+cuda mixed loading	2025-12-29 06:38:18 -06:00
AJ	d8b03acc1a	readme: add ai-hub to community integrations (#11169)	2025-12-29 06:38:18 -06:00
Daniel Hiltgen	95571375dd	build speedups (#11142) Enable parallel building of the GPU architectures.	2025-12-29 06:38:18 -06:00
Michael Yang	69ee842b6e	convert: utility for merging tensors (#11069)	2025-12-29 06:38:17 -06:00
Michael Yang	4585d231ee	Reapply "feat: incremental gguf parser (#10822)" (#11114) (#11119) * Reapply "feat: incremental gguf parser (#10822)" (#11114) This reverts commit `a6e64fbdf2`. * fix older ggufs	2025-12-29 06:38:17 -06:00
Jesse Gross	290d4c2c6c	ggml: Check return status for computation. We don't check the return status after computing the graph, which can silently lead to bad outputs if we try to keep going and future computation succeeds. This appears to happen in certain cases on Apple M2 devices. Fixes #11070	2025-12-29 06:38:17 -06:00
Daniel Hiltgen	29b668e649	int: add coverage for older models (#11137) Verified these fail on 0.9.1 and pass on HEAD.	2025-12-29 06:38:17 -06:00
Jeffrey Morgan	6d36b8dcfb	benchmark: remove unused benchmark test (#11120) Removes a test under benchmark/ that is unused	2025-12-29 06:38:17 -06:00
Jeffrey Morgan	5e3fb4744b	Revert "Revert "ggml: Export GPU UUIDs" (#11115)" (#11117) Reverts PR #11115. The original change was mistakenly reverted instead of #10822	2025-12-29 06:38:16 -06:00
Jeffrey Morgan	c5237d9462	Revert "ggml: Export GPU UUIDs" (#11115) This reverts commit `aaa7818000`.	2025-12-29 06:38:16 -06:00
Jeffrey Morgan	4f1588bc37	Revert "feat: incremental gguf parser (#10822)" (#11114) This reverts commit `6b04cad7e8`.	2025-12-29 06:38:16 -06:00
曹家巧	8c3501c161	cache: fix comment function name in cache.go (#11110)	2025-12-29 06:38:16 -06:00
Jeffrey Morgan	829e77105a	tools: return empty arguments object instead of null (#11113)	2025-12-29 06:38:16 -06:00
Jeffrey Morgan	1dc12706c5	tools: fix parsing tool calls without any parameters (#11101) Fixes issue where tool calls that don't expect any parameters were not being parsed. This also fixes two additional issues: one where 2+ tool calls would not be correctly parsed, and cases where tool calls with invalid parameters would still get parsed	2025-12-29 06:38:15 -06:00
Jeffrey Morgan	2c371ff357	model: treat 'user defined' tokens as special tokens (#11077)	2025-12-29 06:38:15 -06:00
Michael Yang	142efb91b1	gguf: fix write order (#11068) * ggml: test write gguf order * ggml: fix write tensor order	2025-12-29 06:38:15 -06:00
NGC13009	7e0b662c6c	readme: add ollama-launcher to community integrations (#11080)	2025-12-29 06:38:15 -06:00

4435 Commits