Commit Graph

4395 Commits

Author SHA1 Message Date
frob
b2a00a0d2a openai: allow openai endpoint to accept webp images (#11412)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-12-29 06:39:44 -06:00
Haiyue Wang
2e57f92b0c readme: update the llama.cpp github link (#11427) 2025-12-29 06:39:43 -06:00
Michael Yang
7221b90fe1 compile bf16 support into ggml-metal (#11430) 2025-12-29 06:39:43 -06:00
Parth Sareen
1c48526e2e cmd: add default assistant role to message construction (#11431) 2025-12-29 06:39:43 -06:00
Bruce MacDonald
9e9238103d api: fix unreachable status err (#11423)
StatusError was unreachable: the client always checked for an error message in the response body first, and the server always includes an error message alongside HTTP error status codes.
2025-12-29 06:39:43 -06:00
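
A minimal Go sketch of the precedence described above (illustrative names, not the actual client code): when the server always includes an error message in the response body, a client that decodes the body first never reaches its StatusError fallback.

  package client

  import (
      "encoding/json"
      "errors"
      "fmt"
      "io"
      "net/http"
  )

  // checkError mimics the described behavior: the body error message is
  // checked first, so the status-code fallback below is effectively dead
  // code whenever the server populates the body on errors.
  func checkError(resp *http.Response) error {
      body, _ := io.ReadAll(resp.Body)
      var apiErr struct {
          Error string `json:"error"`
      }
      if err := json.Unmarshal(body, &apiErr); err == nil && apiErr.Error != "" {
          return errors.New(apiErr.Error) // always taken on server errors
      }
      if resp.StatusCode >= http.StatusBadRequest {
          return fmt.Errorf("unexpected status %d", resp.StatusCode) // unreachable in practice
      }
      return nil
  }
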
Marcelo Fornet
8c885fe5eb docs: fix typo in macos.md (#11425) 2025-12-29 06:39:43 -06:00
先知
43cacd9309 docs: update modelfile.md to reflect current default num_ctx (#11189)
As of commit 44b466eeb2, the default context length has been increased to 4096.
2025-12-29 06:39:43 -06:00
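
For reference, the 4096 default can still be overridden per request through the documented options.num_ctx field; a minimal Go sketch (model name illustrative):

  package main

  import (
      "bytes"
      "encoding/json"
      "net/http"
  )

  func main() {
      // request a larger context than the new 4096 default
      payload, _ := json.Marshal(map[string]any{
          "model":   "llama3", // illustrative model name
          "prompt":  "hello",
          "options": map[string]any{"num_ctx": 8192},
      })
      http.Post("http://localhost:11434/api/generate", "application/json",
          bytes.NewReader(payload))
  }
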
Jesse Gross
b47aa7e75a ggml: Use assigned layers when reporting loading stats
Reporting params.NumGPULayers can be misleading because it is the
requested number of layers, not the actual number that is loaded.
While they are often the same, there are cases where they can differ,
such as when the GPU backend is missing.
2025-12-29 06:39:42 -06:00
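
A sketch of the reporting change, with illustrative types: count the layers actually assigned to a GPU backend instead of echoing the requested params.NumGPULayers.

  package report

  // Layer is an illustrative stand-in for a model layer and the backend
  // it was assigned to ("gpu" or "cpu").
  type Layer struct{ Backend string }

  // loadedGPULayers returns the number of layers actually on the GPU,
  // which can be lower than the requested count, e.g. when the GPU
  // backend failed to load.
  func loadedGPULayers(layers []Layer) int {
      n := 0
      for _, l := range layers {
          if l.Backend == "gpu" {
              n++
          }
      }
      return n
  }
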
Jesse Gross
015e39a8be ggml: Disable unused pipeline parallelism
We're not currently using it, even in cases where we could. Disabling
it improves generation performance by 10-30% with multiple GPUs.
2025-12-29 06:39:42 -06:00
Daniel Hiltgen
39cec5338a Only load supported models on new engine (#11362)
* Only load supported models on new engine

Verify the model is supported before trying to load it

* int: testcase for all library models
2025-12-29 06:39:42 -06:00
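
A sketch of the pre-load check, assuming a hypothetical set of supported architectures (the real list lives in the engine itself):

  package engine

  import "fmt"

  // supportedArchitectures is illustrative only.
  var supportedArchitectures = map[string]bool{
      "llama":  true,
      "gemma3": true,
  }

  // verifySupported rejects a model before any load is attempted.
  func verifySupported(arch string) error {
      if !supportedArchitectures[arch] {
          return fmt.Errorf("architecture %q is not supported by the new engine", arch)
      }
      return nil
  }
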
Jesse Gross
387cb031b3 ggml: Report ordinal IDs for AMD GPUs on Windows
We don't get valid UUIDs for AMD GPUs on Windows, so the best option
is to use the ordinal IDs. This brings us in line with what we currently
do on the Ollama server - the only exception is AMD GPUs on Linux, which
fall back to using ordinal IDs. The GGML implementation has no such
fallback, but a missing UUID doesn't appear to occur for any of the GPUs
that we support.

It's also possible that there are collisions between ordinal IDs for
different libraries - however the only places where we use them are
AMD on Windows and Metal on Mac, which can never occur on the same
system.
2025-12-29 06:39:42 -06:00
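
The selection rule reduces to "prefer a valid UUID, otherwise fall back to the ordinal"; a minimal sketch with illustrative names:

  package gpu

  import "strconv"

  // deviceID prefers a valid UUID and falls back to the ordinal index,
  // as described for AMD on Windows and Metal on macOS.
  func deviceID(uuid string, ordinal int) string {
      if uuid != "" {
          return uuid
      }
      return strconv.Itoa(ordinal)
  }
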
Daniel Hiltgen
50e4df359b doc: add MacOS docs (#11334)
Also removes stale model directory instructions for Windows.
2025-12-29 06:39:42 -06:00
Daniel Hiltgen
4fcc030739 Reduce default parallelism to 1 (#11330)
The current scheduler algorithm of picking the parallelism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm. This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined. Removal of the dynamic
logic will come in a follow-up.
2025-12-29 06:39:41 -06:00
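
A sketch of the new behavior (OLLAMA_NUM_PARALLEL is the documented override; the surrounding names are illustrative):

  package sched

  import (
      "os"
      "strconv"
  )

  // numParallel returns 1 unless the user explicitly asks for more;
  // no VRAM-based dynamic choice is made anymore.
  func numParallel() int {
      if v := os.Getenv("OLLAMA_NUM_PARALLEL"); v != "" {
          if n, err := strconv.Atoi(v); err == nil && n > 0 {
              return n
          }
      }
      return 1 // new default
  }
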
Daniel Hiltgen
1c94c9919b API/CLI context enhancements (#11331)
* API: expose context size of loaded models

* CLI: add context UX

This adds a column to the ps output to show the model's context size.
2025-12-29 06:39:41 -06:00
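
A sketch of consuming the new field, assuming the context size is exposed on /api/ps under a context_length key (the field name is an assumption here; check the API docs):

  package main

  import (
      "encoding/json"
      "fmt"
      "net/http"
  )

  func main() {
      resp, err := http.Get("http://localhost:11434/api/ps")
      if err != nil {
          panic(err)
      }
      defer resp.Body.Close()
      var ps struct {
          Models []map[string]any `json:"models"`
      }
      if err := json.NewDecoder(resp.Body).Decode(&ps); err != nil {
          panic(err)
      }
      for _, m := range ps.Models {
          // "context_length" is assumed, not confirmed by this log
          fmt.Println(m["name"], m["context_length"])
      }
  }
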
Parth Sareen
25f6571f34 add tool_name to api.md (#11326) 2025-12-29 06:39:41 -06:00
Parth Sareen
1efadee48c template: add tool result compatibility (#11294) 2025-12-29 06:39:41 -06:00
Daniel Hiltgen
fc4cb04cb9 ci: modularization (#11324)
switch a few constants to variables
2025-12-29 06:39:41 -06:00
Jesse Gross
5f139b96ab Revert "ggml: Temporarily disable reporting UUIDs"
The root cause was an unclean upgrade - this code is fine.

This reverts commit 45f216a9c7.
2025-12-29 06:39:41 -06:00
Jeffrey Morgan
ca3520de87 readme: update Ollama icon size 2025-12-29 06:39:40 -06:00
Daniel Hiltgen
55a4a37c3a int: add performance integration tests (#11173)
usage example:
  go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
  cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
  cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
2025-12-29 06:39:40 -06:00
Daniel Hiltgen
ba750172ca doc: add NVIDIA blackwell to supported list (#11307) 2025-12-29 06:39:40 -06:00
Vincent RAMPAL
35bf6c0a41 Update base image to Ubuntu 24.04 LTS (#9681) 2025-12-29 06:39:40 -06:00
Daniel Hiltgen
b23d28b549 doc: Update link for mac install (#11288)
Favor the dmg now.
2025-12-29 06:39:40 -06:00
Daniel Hiltgen
e897624123 mimic logs for layers on new engine (#11278)
This adds some extra logs to make the new engine a bit more consistent
with the llama engine.
2025-12-29 06:39:39 -06:00
XuKecheng
a3e4bb7f58 readme: add NativeMind to community integrations (#11242) 2025-12-29 06:39:39 -06:00
Jeffrey Morgan
9cf8ef9371 tools: fix parsing tool calls with empty arguments, missing required fields (#11233) 2025-12-29 06:39:39 -06:00
Attogram Project
96be53fe6c readme: add ollama-bash-toolshed to community integrations (#11224) 2025-12-29 06:39:39 -06:00
Michael Yang
1cdab47113 chore: cleanup comments + unused vars (#11225) 2025-12-29 06:39:39 -06:00
Jesse Gross
872d190c8f ggml: Temporarily disable reporting UUIDs
This is causing segfaults, so disable it. Currently UUIDs are only
used for debugging purposes, although they are planned to be used in
additional ways in the future.

Bug #11211
2025-12-29 06:39:39 -06:00
Michael Yang
8f2099306f skip quantizing per_layer_token_embd (#11207)
This tensor isn't compatible with CUDA when quantized to q4_K, so skip it.
2025-12-29 06:39:38 -06:00
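
A sketch of the skip, with illustrative types: tensors whose name matches per_layer_token_embd keep their original type instead of being quantized to q4_K.

  package quant

  import "strings"

  // targetType returns the quantization target for a tensor, leaving
  // per_layer_token_embd untouched since q4_K breaks it on CUDA.
  func targetType(name, requested, original string) string {
      if strings.Contains(name, "per_layer_token_embd") {
          return original // skip quantization for this tensor
      }
      return requested
  }
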
Daniel Hiltgen
59112600d1 ci: multi-stage release process (#11001) 2025-12-29 06:39:38 -06:00
Jeffrey Morgan
10119ec2ee fs/ggml: add multiplier in graph estimates (#11208) 2025-12-29 06:39:38 -06:00
Jeffrey Morgan
84998ae4ba fs/ggml: add missing architecture to OllamaEngineRequired() (#11206) 2025-12-29 06:39:38 -06:00
Michael Yang
801564fa8b add new gemma model (#11204)
* update patches

* cherry pick metal mean kernel

* cherry pick cuda mean kernel

* gemma3n
2025-12-29 06:39:38 -06:00
Daniel Hiltgen
d6253f09c2 ci: arm sbsa fixes (#11194) 2025-12-29 06:39:37 -06:00
Daniel Hiltgen
9cf1db79b4 ci: include dependencies 2025-12-29 06:39:37 -06:00
Daniel Hiltgen
46654149c9 ci: pick up arm sbsa cuda libs (#11192) 2025-12-29 06:39:37 -06:00
Daniel Hiltgen
138c973d8f ci: recombine linux amd64 binaries (#11188)
Glue the rocm and archive builds back together.
2025-12-29 06:39:37 -06:00
Devon Rifkin
dd8d037c16 load arrays with up to 1024 elements when estimating
This mirrors the old behavior before #10382
2025-12-29 06:39:37 -06:00
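
The cap amounts to truncating array reads during estimation; a sketch with illustrative names (the 1024 constant is from the message):

  package estimate

  // capForEstimate limits how much of a GGUF array is materialized when
  // it is only needed for a memory estimate, mirroring the pre-#10382 cap.
  func capForEstimate(arrayLen int) int {
      const maxElems = 1024
      if arrayLen > maxElems {
          return maxElems
      }
      return arrayLen
  }
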
Devon Rifkin
558c1920fa ggml: fix crash for array head counts
If it's an array, it uses the max value in the array

If array values for head counts become more popular, we can consider a
more invasive change like #10225 to calculate more accurate estimates.

Fixes: #9984
2025-12-29 06:39:34 -06:00
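
A sketch of the described handling, with illustrative types: scalar head counts pass through, arrays contribute their maximum.

  package estimate

  // headCount accepts either a scalar head count or a per-layer array
  // and returns the value used for the estimate (the max, per the fix).
  func headCount(v any) uint64 {
      switch h := v.(type) {
      case uint64:
          return h
      case []uint64:
          var best uint64
          for _, x := range h {
              if x > best {
                  best = x
              }
          }
          return best
      }
      return 0 // unknown encoding; caller decides how to handle it
  }
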
Daniel Hiltgen
b9b179fe00 ci: rocm parallel builds on windows (#11187)
The preset CMAKE_HIP_FLAGS isn't getting used on Windows.
This passes the parallel flag in through the C/CXX flags, along
with suppression of some log-spew warnings to quiet down the build.
2025-12-29 06:38:19 -06:00
Daniel Hiltgen
38f92e7332 CI: switch windows to vs 2022 (#11184)
* CI: switch windows to vs 2022

* ci: fix regex match
2025-12-29 06:38:18 -06:00
Daniel Hiltgen
c012d1805b avoid context overflow (#11175)
For smaller context models, make sure we do not exceed the training size.
2025-12-29 06:38:18 -06:00
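
The guard amounts to clamping the requested context to the training context; a sketch with illustrative names:

  package server

  // clampContext keeps the effective context within the model's
  // training context, avoiding overflow on smaller-context models.
  func clampContext(requested, train int) int {
      if train > 0 && requested > train {
          return train
      }
      return requested
  }
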
Daniel Hiltgen
29ec3ddf9a Re-remove cuda v11 (#10694)
* Re-remove cuda v11

Revert the revert - drop v11 support, requiring drivers newer than Feb '23

This reverts commit c6bcdc4223.

* Simplify layout

With only one version of the GPU libraries, we can simplify things down somewhat. (Jetsons still require special handling.)

* distinct sbsa variant for linux arm64

This avoids accidentally trying to load the sbsa CUDA libraries on
a Jetson system, which results in crashes.

* temporarily prevent rocm+cuda mixed loading
2025-12-29 06:38:18 -06:00
AJ
d8b03acc1a readme: add ai-hub to community integrations (#11169) 2025-12-29 06:38:18 -06:00
Daniel Hiltgen
95571375dd build speedups (#11142)
Enable parallel building of the GPU architectures.
2025-12-29 06:38:18 -06:00
Michael Yang
69ee842b6e convert: utility for merging tensors (#11069) 2025-12-29 06:38:17 -06:00
Michael Yang
4585d231ee Reapply "feat: incremental gguf parser (#10822)" (#11114) (#11119)
* Reapply "feat: incremental gguf parser (#10822)" (#11114)

This reverts commit a6e64fbdf2.

* fix older ggufs
2025-12-29 06:38:17 -06:00
Jesse Gross
290d4c2c6c ggml: Check return status for computation.
We don't check the return status after computing the graph, which
can silently lead to bad outputs if we try to keep going and future
computation succeeds. This appears to happen in certain cases on
Apple M2 devices.

Fixes #11070
2025-12-29 06:38:17 -06:00
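
A sketch of the missing check, assuming cgo bindings to ggml (ggml_backend_graph_compute returns a ggml_status); this is a fragment, not a complete file:

  // inside a cgo file with the ggml headers available
  status := C.ggml_backend_graph_compute(backend, graph)
  if status != C.GGML_STATUS_SUCCESS {
      panic(fmt.Errorf("graph computation failed (status %d)", status))
  }
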
Daniel Hiltgen
29b668e649 int: add coverage for older models (#11137)
Verified these fail on 0.9.1 and pass on HEAD.
2025-12-29 06:38:17 -06:00