Commit Graph

4214 Commits

Author SHA1 Message Date
frob
cf79e19403 image: add vision capability for projector-based models (#10509)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-12-29 06:37:49 -06:00
Jesse Gross
2276f7f089 kvcache: Log batch size if we can't find a slot
In some cases, we can't find a cache slot when using sliding window
attention. It would be helpful in this (and other) cases to know what
the batch size is.

Bug #10127
2025-12-29 06:37:49 -06:00
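A minimal sketch of the diagnostic described in the entry above; the findSlot shape and slot type here are hypothetical, not the real kvcache API:

```go
package kvcache

import "fmt"

type slot struct{ free int }

// findSlot is a hypothetical stand-in for the real slot search. The point of
// the change: when no slot fits, the error carries the batch size, which is
// the key variable in sliding-window-attention failures.
func findSlot(slots []slot, batchSize int) (int, error) {
	for i, s := range slots {
		if s.free >= batchSize {
			return i, nil
		}
	}
	return -1, fmt.Errorf("no kv cache slot found (batch size %d)", batchSize)
}
```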
Jesse Gross
597f6cd3a9 ollamarunner: Fix memory leak when processing images
The context (and therefore associated input tensors) was not being
properly closed when images were being processed. We were trying to
close them, but in reality we were closing over an empty list, preventing
anything from actually being freed.

Fixes #10434
2025-12-29 06:37:49 -06:00
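A sketch of the closure bug described above, with hypothetical names: in Go, passing a slice as an argument to a deferred func snapshots it at defer time, whereas closing over the variable sees its final value.

```go
package main

import "fmt"

type tensor struct{ name string }

func (t tensor) Close() { fmt.Println("freed:", t.name) }

func processImage() {
	var tensors []tensor

	// Buggy shape: ts is evaluated now, while tensors is still empty,
	// so nothing is ever freed.
	defer func(ts []tensor) {
		for _, t := range ts {
			t.Close()
		}
	}(tensors)

	// Fixed shape: close over the variable itself, so the deferred func
	// sees every tensor appended before processImage returns.
	defer func() {
		for _, t := range tensors {
			t.Close()
		}
	}()

	tensors = append(tensors, tensor{"image-embedding"})
}

func main() { processImage() }
```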
AliAhmedNada
dda786304e readme: add Jirapt project to community integrations (#10522) 2025-12-29 06:37:48 -06:00
aritra saha
33bcef045a readme: change granite3.2 to granite3.3 (#10525)
Update the model list in the README.
2025-12-29 06:37:48 -06:00
Michael Yang
79646ad87d fix: write gguf padding (#10510)
* add gguf_test

* fix padding

padding was being added to offset but not to the running count
2025-12-29 06:37:48 -06:00
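A hedged sketch of the padding bug's shape (writeAligned is hypothetical, not the actual gguf writer): alignment padding has to advance both the offset and the running count of bytes written, or later tensor offsets drift.

```go
package gguf

import "io"

// writeAligned pads w to the given alignment, writes data, and returns the
// new offset plus bytes written. Bug shape being fixed: pad was added to the
// offset but not to the running count, so the two bookkeeping values diverged.
func writeAligned(w io.Writer, data []byte, offset, align int64) (newOffset, written int64, err error) {
	pad := (align - offset%align) % align
	if _, err = w.Write(make([]byte, pad)); err != nil {
		return offset, 0, err
	}
	n, err := w.Write(data)
	// Count pad in both values so they stay in agreement.
	return offset + pad + int64(n), pad + int64(n), err
}
```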
Devon Rifkin
55803ceb35 strip out thinking tags in message history for qwen3 & r1 (#10490)
* strip out thinking tags in message history for qwen3 & r1

This is in advance of "proper" support where we'll make reasoning
configurable and we'll parse out thinking/reasoning tags and provide
them to the caller. These models expect there to be no thinking tags in
the message history, so this should improve quality.

* parse model names instead of hacky prefix check
2025-12-29 06:37:48 -06:00
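A minimal sketch of the stripping step, assuming `<think>...</think>` delimiters (the exact tags and parsing in the real change may differ):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// (?s) lets . span newlines, so multi-line thinking blocks are removed too.
var thinkRE = regexp.MustCompile(`(?s)<think>.*?</think>`)

// stripThinking removes thinking spans from a prior assistant message before
// it is fed back as history to models like qwen3 and r1.
func stripThinking(content string) string {
	return strings.TrimSpace(thinkRE.ReplaceAllString(content, ""))
}

func main() {
	fmt.Println(stripThinking("<think>let me reason...</think>The answer is 42."))
	// Output: The answer is 42.
}
```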
Daniel Hiltgen
fee7c406aa Fix "Stopping..." scheduler hang (#10487)
* Adjust initial scheduler refCount

Ensure we only set the refCount on success

* sched: fix lock order inversion deadlock

Under certain race conditions, there was a scenario where the scheduler would
get into a deadlock while trying to update free space information while a model
was trying to unload.
2025-12-29 06:37:48 -06:00
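A schematic of the lock-order inversion described above, with hypothetical mutex names; the standard fix is a single documented acquisition order used by every code path:

```go
package sched

import "sync"

// Hypothetical locks: the deadlock arises when one goroutine takes schedMu
// then wants runnerMu (updating free space) while another holds runnerMu
// (unloading a model) and wants schedMu. Fixing the order breaks the cycle.
type scheduler struct {
	schedMu  sync.Mutex // always acquired first
	runnerMu sync.Mutex // always acquired second
}

func (s *scheduler) updateFreeSpace() {
	s.schedMu.Lock()
	defer s.schedMu.Unlock()
	s.runnerMu.Lock()
	defer s.runnerMu.Unlock()
	// ... recompute free VRAM for scheduling decisions ...
}

func (s *scheduler) unloadModel() {
	// Same order here; locking runnerMu first would reintroduce the deadlock.
	s.schedMu.Lock()
	defer s.schedMu.Unlock()
	s.runnerMu.Lock()
	defer s.runnerMu.Unlock()
	// ... release the model's memory ...
}
```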
Daniel Hiltgen
098fe2f7f7 Narrow set of paths we load GGML from (#10485)
Users may have other incompatible GGML installs on their systems.
This will prevent us from trying to load them from the path.
2025-12-29 06:37:47 -06:00
Shahin R
5234d73611 readme: add link to lumina, a lightweight React frontend client (#10378) 2025-12-29 06:37:47 -06:00
batuhankadioglu
6e74d8d222 all: update several golang.org/x packages (#10436) 2025-12-29 06:37:47 -06:00
Daniel Hiltgen
4d8621629c integration: fix embedding tests error handling (#10478)
The cleanup routine from InitServerConnection should run in the test case's defer to properly detect failures and report the server logs.
2025-12-29 06:37:47 -06:00
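A sketch of the test-lifecycle fix, with a hypothetical startServer helper standing in for InitServerConnection:

```go
package integration

import (
	"context"
	"testing"
)

// startServer is a hypothetical stand-in for the real setup; it returns the
// cleanup that collects server logs.
func startServer(t *testing.T, ctx context.Context) func() {
	t.Log("starting server")
	return func() { t.Log("stopping server; dumping logs if the test failed") }
}

func TestEmbedding(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	cleanup := startServer(t, ctx)
	// The fix: defer cleanup inside the test case itself, so it runs (and
	// reports logs) even when an assertion below fails the test.
	defer cleanup()

	// ... call the embedding endpoint and assert on the results ...
}
```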
Jesse Gross
13d497db4c ollamarunner: Temporarily disable worst case graph preallocation
When we later have a large batch running purely on a CPU, this
results in the error:
GGML_ASSERT(talloc->buffer_id >= 0)

Disabling this means that we will incrementally reallocate memory
as the graph grows.

Fixes #10410
2025-12-29 06:37:46 -06:00
crStiv
02a3285b60 readme: fix typos (#10399) 2025-12-29 06:37:46 -06:00
Devon Rifkin
528bd3077a lower default num parallel to 2
This is in part to "pay" for #10452, which doubled the default context length. The combination isn't fully neutral, though: even though the old 4x2k limit and the new 2x4k limit are memory-equivalent, the 1x fallback is larger with 4k.
2025-12-29 06:37:46 -06:00
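The arithmetic behind "memory equivalent but not fully neutral", as a quick check (KV cache cost taken as proportional to numParallel × context length):

```go
package main

import "fmt"

func main() {
	fmt.Println("old default:", 4*2048)  // 8192 tokens of KV cache
	fmt.Println("new default:", 2*4096)  // 8192: equivalent at the default
	fmt.Println("old fallback:", 1*2048) // 2048
	fmt.Println("new fallback:", 1*4096) // 4096: the non-neutral case
}
```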
Devon Rifkin
b963dd868b config: update default context length to 4096 2025-12-29 06:37:46 -06:00
Devon Rifkin
5a7c6c363e Revert "increase default context length to 4096 (#10364)"
This reverts commit 424f648632.
2025-12-29 06:37:46 -06:00
Michael Yang
b236fcc9bf model: fix build (#10416) 2025-12-29 06:37:45 -06:00
Michael Yang
049aa30191 memory 2025-12-29 06:37:45 -06:00
Michael Yang
644d6c5256 fixes for maverick 2025-12-29 06:37:45 -06:00
Michael Yang
d2d5c5e6d5 chunked attention 2025-12-29 06:37:45 -06:00
Michael Yang
b7f628b9e8 connect vision to text 2025-12-29 06:37:45 -06:00
Michael Yang
b875952e67 image processing
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-12-29 06:37:44 -06:00
Michael Yang
0f5c45e19d llama4 2025-12-29 06:37:44 -06:00
Michael Yang
371560df26 fix test 2025-12-29 06:37:44 -06:00
Michael Yang
a0d77f1dbe explicitly decode maxarraysize 1024 2025-12-29 06:37:44 -06:00
Michael Yang
8a86190fd4 fix parameter count 2025-12-29 06:37:44 -06:00
Michael Yang
49f807737a default slice values 2025-12-29 06:37:44 -06:00
Michael Yang
51e64c8f69 update comment 2025-12-29 06:37:43 -06:00
Michael Yang
84a6567dee fix token type 2025-12-29 06:37:43 -06:00
Michael Yang
5a8e641272 zero means zero
Using a default of 1024 when zero is requested is confusing, since most
callers seem to assume 0 means do not read any data.
2025-12-29 06:37:43 -06:00
Michael Yang
f0c5b48f7b convert: use -1 for read all 2025-12-29 06:37:43 -06:00
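A sketch of the read-size convention these two entries converge on (readArray is hypothetical): 0 reads nothing, -1 reads everything, and the old implicit 1024 default disappears.

```go
package ggml

// readArray illustrates the convention: n == 0 means zero items (not a 1024
// default), n < 0 (e.g. -1) means read all, and n > 0 reads exactly n.
func readArray[T any](src []T, n int) []T {
	switch {
	case n == 0:
		return nil
	case n < 0 || n > len(src):
		return src
	default:
		return src[:n]
	}
}
```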
Michael Yang
96618f6344 generic ggml.array 2025-12-29 06:37:42 -06:00
Michael Yang
5e0d7e9332 fix superfluous call to WriteHeader
The first call to http.ResponseWriter.Write implicitly calls WriteHeader
with http.StatusOK if it hasn't already been called. Once WriteHeader
has been called, subsequent calls have no effect. Write is called when
JSON-encoding progressUpdateJSON{}, so any call to
http.ResponseWriter.WriteHeader after the first encode is useless and
produces a warning:

http: superfluous response.WriteHeader call from github.com/ollama/ollama/server/internal/registry.(*statusCodeRecorder).WriteHeader (server.go:77)
2025-12-29 06:37:42 -06:00
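A sketch of the pattern behind that warning; this is a simplified guard, not the actual statusCodeRecorder:

```go
package registry

import "net/http"

type statusRecorder struct {
	http.ResponseWriter
	wroteHeader bool
}

// WriteHeader becomes a no-op after the first header write, explicit or
// implicit, which is exactly what prevents the superfluous-call warning.
func (r *statusRecorder) WriteHeader(code int) {
	if r.wroteHeader {
		return
	}
	r.wroteHeader = true
	r.ResponseWriter.WriteHeader(code)
}

// Write implies WriteHeader(http.StatusOK) on first use, mirroring the
// net/http behavior the commit message describes.
func (r *statusRecorder) Write(b []byte) (int, error) {
	r.wroteHeader = true
	return r.ResponseWriter.Write(b)
}
```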
Michael Yang
584c3176d2 convert: change to colmajor 2025-12-29 06:37:42 -06:00
Michael Yang
4f01385151 ci: silence deprecated gpu targets warning 2025-12-29 06:37:42 -06:00
Jeffrey Morgan
85d3f71c02 llama: update to commit 2016f07b (#10352) 2025-12-29 06:37:42 -06:00
Parth Sareen
83e848fcb8 server: improve spacing for JSON grammar (#10131) 2025-12-29 06:37:41 -06:00
Parth Sareen
7cf4c146bc llama: remove model loading for grammar (#10096) 2025-12-29 06:37:41 -06:00
Adrien Duermael
3e201e18c2 api: fix ImageData struct comment to expect raw image bytes (#10386) 2025-12-29 06:37:41 -06:00
Devon Rifkin
770df0887f increase default context length to 4096 (#10364)
* increase default context length to 4096

We lower the default numParallel from 4 to 2 and use these "savings" to
double the default context length from 2048 to 4096.

We're memory neutral in cases when we previously would've used
numParallel == 4, but we add the following mitigation to handle some
cases where we would have previously fallen back to 1x2048 due to low
VRAM: we decide between 2048 and 4096 using a runtime check, choosing
2048 if we're on a one GPU system with total VRAM of <= 4 GB. We
purposefully don't check the available VRAM because we don't want the
context window size to change unexpectedly based on the available VRAM.

We plan on making the default even larger, but this is a relatively
low-risk change we can make to quickly double it.

* fix tests

Add an explicit context length so the tests don't get truncated. The code
that converts -1 (the signal for doing a runtime check) into a concrete
default isn't running as part of these tests.

* tweak small gpu message

* clarify context length default

also make it actually show up in `ollama serve --help`
2025-12-29 06:37:41 -06:00
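A compact sketch of the runtime check described above (function and parameter names are hypothetical):

```go
package envconfig

// defaultContextLength picks 2048 only on a single-GPU system with at most
// 4 GiB of total VRAM; everyone else gets 4096. Total rather than available
// VRAM is checked so the default can't change run to run with load.
func defaultContextLength(gpuCount int, totalVRAMBytes uint64) int {
	const fourGiB = 4 << 30
	if gpuCount == 1 && totalVRAMBytes <= fourGiB {
		return 2048
	}
	return 4096
}
```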
Richard Shiue
d24108eb86 readme: add AppFlowy to community integrations (#10335) 2025-12-29 06:37:41 -06:00
greengrass821
39a26ec939 cmd: add support for escaping ~ in filepath (#10339)
Co-authored-by: tooth paste <tooth_paste91@Poorneshwars-MacBook-Pro.local>
2025-12-29 06:37:40 -06:00
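A guess at the escaping behavior, sketched with hypothetical semantics: a backslash before ~ keeps it literal instead of expanding to the home directory.

```go
package cmd

import (
	"os"
	"path/filepath"
	"strings"
)

// expandPath is a hypothetical sketch: `\~` escapes the tilde, a bare ~ or
// ~/ prefix expands to the user's home directory, anything else passes through.
func expandPath(p string) (string, error) {
	if strings.HasPrefix(p, `\~`) {
		return p[1:], nil // drop the backslash, keep the literal ~
	}
	if p == "~" || strings.HasPrefix(p, "~/") {
		home, err := os.UserHomeDir()
		if err != nil {
			return "", err
		}
		return filepath.Join(home, strings.TrimPrefix(p, "~")), nil
	}
	return p, nil
}
```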
Michael Yang
1785f37236 create tempdir in models directory
The models directory should have plenty of storage, and creating the
temp dir there also ensures there's no cross-device copy.
2025-12-29 06:37:40 -06:00
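A sketch of why the temp location matters: os.Rename is atomic and cheap on one filesystem but fails across devices (EXDEV), which would force a copy. Names here are illustrative.

```go
package server

import (
	"io"
	"os"
	"path/filepath"
)

// writeModelBlob stages the download in a temp file inside the models
// directory, so the final rename never crosses a filesystem boundary.
func writeModelBlob(modelsDir, name string, src io.Reader) error {
	tmp, err := os.CreateTemp(modelsDir, "blob-*.partial")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless no-op after a successful rename

	if _, err := io.Copy(tmp, src); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(modelsDir, name))
}
```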
Blake Mizerany
1003e89348 server/internal/registry: make pull send errors with Error field (#10326)
Previously, the pull handler would send an error message in the Status
field, which prevented the client from using the message as a signal to
stop. In the case of the "run" command, it would follow the pull with a
"show", which would print a nearly identical "not found" message for
unresolved models.

Fixes #10307
2025-12-29 06:37:40 -06:00
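The shape of the change, sketched from the progressUpdateJSON type the log mentions elsewhere; the field tags here are assumptions:

```go
package registry

import "encoding/json"

// With a dedicated Error field, clients can stop on Error instead of
// string-matching "not found" text folded into Status.
type progressUpdateJSON struct {
	Status string `json:"status,omitempty"`
	Error  string `json:"error,omitempty"`
}

func errorUpdate(err error) ([]byte, error) {
	return json.Marshal(progressUpdateJSON{Error: err.Error()})
}
```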
Michael Yang
c916dd67bf arange 2025-12-29 06:37:40 -06:00
Blake Mizerany
0114f7008a server/internal/client/ollama: handle some network errors gracefully (#10317) 2025-12-29 06:37:40 -06:00
Jeffrey Morgan
88ea0ff9e8 ml/backend/ggml: use default CUDA compression mode (#10314) 2025-12-29 06:37:39 -06:00
Jeffrey Morgan
8c08f74532 ml: add missing cmake property and remove additional CMakeLists.txt (#10310) 2025-12-29 06:37:39 -06:00
Devon Rifkin
2a8495a8ea docs: change more template blocks to have syntax highlighting
In #8215, syntax highlighting was added to most of the blocks, but a couple were still being rendered as plaintext.
2025-12-29 06:37:39 -06:00