* origin/main:
readme: add Mayan EDMS to community integrations (#11543)
kvcache: Group shift operations into batches
CONTRIBUTING: fix typo in commit message example (#11528)
Currently, when we need to do a shift on the cache, it is done as one
RoPE operation over the entire cache (per layer). In
some cases, this can create a compute graph that is larger than
the forward pass since the forward pass is working in batches.
Since we don't consider shifting in our memory estimates, it's
possible for this to cause a crash if we run out of memory.
By limiting the size of the RoPE calls to batch-size chunks, we
ensure that the shift will never exceed the size of the forward
pass, since the forward pass will also contain a RoPE of the same
size. This does not have a significant impact on performance since
RoPE is a math operation that is mostly proportional to the size
of its inputs.
In theory, defrag could have the same issue since it also creates a
compute graph outside of the forward pass. However, since it only
performs copies, it does not require any working space.
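
A minimal Go sketch of the batching idea, assuming a hypothetical
applyRoPE callback rather than the real kvcache API:

    package main

    import "fmt"

    // shiftInChunks splits one logical cache shift into pieces no larger
    // than the batch size, so no single compute graph is bigger than what
    // the forward pass already builds.
    func shiftInChunks(cacheLen, batchSize, offset int, applyRoPE func(start, count, offset int)) {
        for start := 0; start < cacheLen; start += batchSize {
            count := batchSize
            if start+count > cacheLen {
                count = cacheLen - start
            }
            // Each call covers at most batchSize positions, matching the
            // upper bound of the forward pass's own RoPE operation.
            applyRoPE(start, count, offset)
        }
    }

    func main() {
        shiftInChunks(10, 4, -2, func(start, count, offset int) {
            fmt.Printf("RoPE shift: positions [%d,%d) by %d\n", start, start+count, offset)
        })
    }
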
* origin/main:
readme: add GMAI - Gradle Managed to community integrations (#11461)
tools: fix parsing issue when a tool name is a substring of another (#11456)
readme: update argo description to support deep research (#11455)
ci: switch mac builder to arm64 (#11379)
docs: add the no-Modelfile function of `ollama create` (#9077)
openai: allow openai endpoint to accept webp images (#11412)
readme: update the llama.cpp github link (#11427)
compile bf16 support into ggml-metal (#11430)
cmd: add default assistant role to message construction (#11431)
api: fix unreachable status err (#11423)
docs: fix typo in macos.md (#11425)
StatusError was unreachable: the client always checked for error messages in the response body first, and the server always includes error messages with HTTP error status codes.
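
A generic Go sketch of that precedence (a message in the response body
wins over the status text); the apiError type and checkResponse helper
are illustrative, not the actual client code:

    package main

    import (
        "encoding/json"
        "fmt"
        "io"
        "net/http"
        "strings"
    )

    // apiError models a server error body of the form {"error": "..."}.
    type apiError struct {
        Err string `json:"error"`
    }

    // checkResponse prefers the message in the body; only when the body has
    // no message does it fall back to the HTTP status. Since the server
    // always sends a message with error statuses, the fallback is never hit.
    func checkResponse(resp *http.Response) error {
        if resp.StatusCode < http.StatusBadRequest {
            return nil
        }
        body, _ := io.ReadAll(resp.Body)
        var ae apiError
        if err := json.Unmarshal(body, &ae); err == nil && ae.Err != "" {
            return fmt.Errorf("%s", ae.Err)
        }
        return fmt.Errorf("status %d: %s", resp.StatusCode, http.StatusText(resp.StatusCode))
    }

    func main() {
        resp := &http.Response{
            StatusCode: http.StatusNotFound,
            Body:       io.NopCloser(strings.NewReader(`{"error":"model not found"}`)),
        }
        fmt.Println(checkResponse(resp)) // prints "model not found", not "404 Not Found"
    }
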
* origin/main:
docs: update modelfile.md to reflect current default num_ctx (#11189)
ggml: Use assigned layers when reporting loading stats
ggml: Disable unused pipeline parallelism
Only load supported models on new engine (#11362)
Reporting params.NumGPULayers can be misleading because it is the
requested number of layers, not the actual number that is loaded.
While they are often the same, there are cases where they might not match,
such as if the GPU backend is missing.
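
A small Go sketch of counting what was actually placed rather than echoing
the request; layerAssignment and its Backend field are assumptions made
for illustration:

    package main

    import "fmt"

    // layerAssignment records where a model layer ended up: a GPU backend
    // (e.g. "CUDA0", "Metal") or "CPU".
    type layerAssignment struct {
        Backend string
    }

    // countGPULayers reports how many layers were actually assigned to a
    // GPU, rather than the requested params.NumGPULayers.
    func countGPULayers(assignments []layerAssignment) int {
        n := 0
        for _, a := range assignments {
            if a.Backend != "CPU" {
                n++
            }
        }
        return n
    }

    func main() {
        requested := 32
        assignments := make([]layerAssignment, 32)
        for i := range assignments {
            assignments[i] = layerAssignment{Backend: "CPU"} // e.g. GPU backend missing
        }
        fmt.Printf("requested=%d loaded=%d\n", requested, countGPULayers(assignments))
    }
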
We don't get valid UUIDs for AMD GPUs on Windows, so the best option
is to use the ordinal IDs. This brings us in line with what we currently
do on the Ollama server - the only exception is AMD GPUs on Linux, where
we fall back to using ordinal IDs. The GGML implementation has no such
fallback, but the missing-UUID case doesn't appear to occur for any of
the GPUs that we support.
It's also possible that there are collisions between ordinal IDs for
different libraries - however, the only places where we use them are
AMD on Windows and Metal on Mac, which can never occur on the same
system.
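
A Go sketch of that selection, assuming a hypothetical gpuInfo struct; the
real discovery code and field names differ:

    package main

    import "fmt"

    // gpuInfo is a simplified view of a discovered device: a UUID that may
    // be empty/invalid (as for AMD GPUs on Windows) and its ordinal index.
    type gpuInfo struct {
        UUID    string
        Ordinal int
    }

    // deviceID prefers the UUID and falls back to the ordinal index when no
    // valid UUID is available, mirroring the behavior described above.
    func deviceID(g gpuInfo) string {
        if g.UUID != "" {
            return g.UUID
        }
        return fmt.Sprintf("%d", g.Ordinal)
    }

    func main() {
        fmt.Println(deviceID(gpuInfo{UUID: "GPU-a1b2c3", Ordinal: 0})) // UUID available
        fmt.Println(deviceID(gpuInfo{UUID: "", Ordinal: 1}))           // AMD on Windows: ordinal ID
    }
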
The current scheduler algorithm of picking the parallelism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm. This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined. Removal of the dynamic
logic will come in a follow-up.
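
A Go sketch of the new default, assuming a hypothetical convention that a
requested value of 0 means "unset"; the real option plumbing is omitted:

    package main

    import "fmt"

    // numParallel returns the parallelism to use. Previously the scheduler
    // derived a value dynamically from free VRAM; with this change the
    // default is 1, and only an explicit user setting raises it.
    func numParallel(requested int) int {
        if requested > 0 {
            return requested // explicit setting wins
        }
        return 1 // default: no dynamic scaling from available VRAM
    }

    func main() {
        fmt.Println(numParallel(0)) // 1
        fmt.Println(numParallel(4)) // 4
    }
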