Commit Graph

4325 Commits

Author SHA1 Message Date
Ashok Gelal
c833610871 Hide empty terminal window (#8668)
This hides the LlamaServer blank window when chatting outside of the terminal (for example, with an app like Msty). It has no other side effects when invoking it the regular way.
2025-12-29 06:37:51 -06:00
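A minimal sketch of the usual way a console window is suppressed when spawning a subprocess from Go on Windows; the flag and struct fields are from the standard library, but the helper and its name are illustrative, not necessarily the change in #8668.

```go
//go:build windows

package winutil

import (
	"os/exec"
	"syscall"
)

// startHidden launches a child process without a visible console window.
func startHidden(path string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(path, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		HideWindow:    true,       // hide any window the child would show
		CreationFlags: 0x08000000, // CREATE_NO_WINDOW: no console at all
	}
	return cmd, cmd.Start()
}
```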
Jeffrey Morgan
7e9f243a0d server: fix panic when runner.Options is nil (#10566) 2025-12-29 06:37:51 -06:00
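A nil-pointer panic like this is typically fixed with a guard before the dereference; a sketch with hypothetical types (the real server structs differ):

```go
package example

// Hypothetical shapes, for illustration only.
type Options struct{ NumCtx int }
type Runner struct{ Options *Options }

// numCtx falls back to a default instead of panicking when the
// runner's Options was never populated.
func numCtx(r *Runner, fallback int) int {
	if r == nil || r.Options == nil {
		return fallback
	}
	return r.Options.NumCtx
}
```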
Jeffrey Morgan
9a44e41802 all: fix cgo compiler warnings on windows (#10563) 2025-12-29 06:37:51 -06:00
湛露先生
0bffcc8cc4 file close check and close. (#10554)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-12-29 06:37:51 -06:00
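Checking the error from Close matters most for writable files, where buffered data may not reach disk until close; a common Go pattern (an illustration of the idea, not the exact diff):

```go
package example

import "os"

func writeFile(path string, data []byte) (err error) {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer func() {
		// Close can surface a real write failure; don't silently drop it.
		if cerr := f.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()
	_, err = f.Write(data)
	return err
}
```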
Daniel Hiltgen
d0904ea7f1 win: ensure ollama paths come first (#10549)
For all search-path env vars, make sure our dirs come first
to avoid potentially finding other incompatible libraries
on the user's system.

Also fixes a minor build script glitch for windows rocm
2025-12-29 06:37:50 -06:00
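The pattern described is prepending our directories to each search-path variable rather than appending; a sketch using PATH (the helper and its name are illustrative):

```go
package example

import (
	"os"
	"path/filepath"
)

// prependToPath puts dir ahead of everything else already on PATH so
// bundled libraries are found before incompatible system copies.
func prependToPath(dir string) error {
	dir = filepath.Clean(dir)
	return os.Setenv("PATH", dir+string(os.PathListSeparator)+os.Getenv("PATH"))
}
```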
Daniel Hiltgen
cf9f00182d sched: logging improvements (#10550)
This enhances our logging in the scheduler. The initial "waiting for server" log
no longer claims an initial error state (now "not responding", which better reflects
the actual state). Runners now have slog wiring to report more details about the
runner, including the PID.
2025-12-29 06:37:50 -06:00
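A sketch of the kind of slog wiring the commit describes: attach runner attributes once so every subsequent log line carries them (the attribute names here are assumptions, not the actual fields):

```go
package example

import (
	"log/slog"
	"os/exec"
)

// runnerLogger must be built after cmd.Start(), once the PID exists.
func runnerLogger(model string, cmd *exec.Cmd) *slog.Logger {
	return slog.Default().With("model", model, "pid", cmd.Process.Pid)
}

// Usage: runnerLogger(m, cmd).Info("waiting for llama runner to start responding")
```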
aritra saha
541a8575f0 readme: add llama 4 models (#10530) 2025-12-29 06:37:50 -06:00
Jesse Gross
86eea6770e ggml: Fix race that resulted in "context canceled" when loading
Successfully completing processing with an errgroup cancels the
associated context. However, we also have a goroutine that is checking
for cancelation of the context. As a result, there is a race where
the goroutine can pick up the cancelation and report an error,
replacing the successful result.

To avoid that, this replaces the goroutine with a cancelation check
when we are reading files. This also has the advantage of stopping
all reads relatively quickly on error and ensuring that there are
no outstanding I/O operations when we return in this case.

The downside is that if a file read blocks forever (for example, over
the network) then cancelation of the context effectively won't be
honored. However, this is also true for the other, smaller files we read,
and the tensors are read in small chunks (128K), so it's consistent
and better on balance overall.
2025-12-29 06:37:50 -06:00
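For background, errgroup.WithContext cancels its context when Wait returns, even after success, which is what set up the race. A sketch of the replacement pattern, under stated assumptions (chunk size from the message above; the reader setup is hypothetical): poll the context between reads instead of watching it from a separate goroutine.

```go
package example

import (
	"context"
	"io"

	"golang.org/x/sync/errgroup"
)

func loadTensors(ctx context.Context, readers []io.Reader) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, r := range readers {
		r := r // capture for the goroutine (pre-Go 1.22 safety)
		g.Go(func() error {
			buf := make([]byte, 128*1024) // tensors are read in small chunks
			for {
				// Checking between chunks replaces the watcher goroutine,
				// which could otherwise observe the cancelation errgroup
				// performs after a successful Wait and report it as an error.
				if err := ctx.Err(); err != nil {
					return err
				}
				if _, err := r.Read(buf); err != nil {
					if err == io.EOF {
						return nil
					}
					return err
				}
			}
		})
	}
	return g.Wait()
}
```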
Jesse Gross
cec8a9dee0 ollamarunner: Re-enable worst case graph preallocation.
Worst case graph preallocation was disabled by a27462b
"ollamarunner: Temporarily disable worst case graph preallocation"
since it caused crashes with large batches when not using the GPU.

This backports upstream llama.cpp commit f057808
"ggml: Don't assert fail when tensor data changes (#13222)", which
fixes the underlying bug and allows reverting the previous workaround.
2025-12-29 06:37:50 -06:00
Harsh Nevse
cc21d627df readme: update link to langchain in community integrations (#10465) 2025-12-29 06:37:49 -06:00
Jeffrey Morgan
723fec1b25 llama: update to commit e1e8e099 (#10513) 2025-12-29 06:37:49 -06:00
frob
cf79e19403 image: add vision capability for projector-based models (#10509)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-12-29 06:37:49 -06:00
Jesse Gross
2276f7f089 kvcache: Log batch size if we can't find a slot
In some cases, we can't find a cache slot when using sliding window
attention. It would be helpful in this case (and others) to know what
the batch size is.

Bug #10127
2025-12-29 06:37:49 -06:00
Jesse Gross
597f6cd3a9 ollamarunner: Fix memory leak when processing images
The context (and therefore associated input tensors) was not being
properly closed when images were being processed. We were trying to
close them, but in reality we were closing over an empty list, which
prevented anything from actually being freed.

Fixes #10434
2025-12-29 06:37:49 -06:00
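The bug described is a classic Go pitfall: a deferred call's arguments are evaluated at defer time, so deferring cleanup over a slice that is still empty frees nothing. A sketch with hypothetical names:

```go
package example

import "fmt"

type tensor struct{ id int }

func freeAll(ts []tensor) { fmt.Println("freeing", len(ts), "tensors") }

func process() {
	var inputs []tensor
	// BUG: `defer freeAll(inputs)` would snapshot the empty slice here.
	// Fix: a closure reads inputs as it is when the defer actually runs.
	defer func() { freeAll(inputs) }()

	inputs = append(inputs, tensor{1}, tensor{2}) // e.g. image tensors
}
```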
AliAhmedNada
dda786304e readme: add Jirapt project to community integrations (#10522) 2025-12-29 06:37:48 -06:00
aritra saha
33bcef045a readme: change granite3.2 to granite3.3 (#10525)
Update the model list in the readme.
2025-12-29 06:37:48 -06:00
Michael Yang
79646ad87d fix: write gguf padding (#10510)
* add gguf_test

* fix padding

Padding was being added to the offset but not to the running count.
2025-12-29 06:37:48 -06:00
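The invariant behind the fix: when tensor data is aligned, the padding that advances the offset must also be added to the running byte count, or the two drift apart. A sketch of the arithmetic (function names hypothetical; GGUF commonly aligns tensor data to 32 bytes):

```go
package example

// padTo rounds n up to the next multiple of align.
func padTo(n, align int64) int64 {
	return (n + align - 1) / align * align
}

// place returns the aligned start of the next tensor plus the updated
// running count; both must advance by padding+size together.
func place(offset, count, size, align int64) (next, total int64) {
	start := padTo(offset, align)
	pad := start - offset
	return start + size, count + pad + size
}
```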
Devon Rifkin
55803ceb35 strip out thinking tags in message history for qwen3 & r1 (#10490)
* strip out thinking tags in message history for qwen3 & r1

This is in advance of "proper" support, where we'll make reasoning
configurable, parse out thinking/reasoning tags, and provide
them to the caller. These models expect there to be no thinking tags in
the message history, so this should improve quality.

* parse model names instead of hacky prefix check
2025-12-29 06:37:48 -06:00
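A sketch of the stripping step, assuming the <think>...</think> tag convention these models use; note the real change also gates this on parsed model names rather than a prefix check:

```go
package example

import "regexp"

// thinkRE matches a whole <think>...</think> block, newlines included.
var thinkRE = regexp.MustCompile(`(?s)<think>.*?</think>\s*`)

// stripThinking removes reasoning blocks from prior assistant turns
// before the history is rendered back into the prompt.
func stripThinking(content string) string {
	return thinkRE.ReplaceAllString(content, "")
}
```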
Daniel Hiltgen
fee7c406aa Fix "Stopping..." scheduler hang (#10487)
* Adjust initial scheduler refCount

Ensure we only set the refCount on success

* sched: fix lock order inversion deadlock

Under certain race conditions, the scheduler could get into a deadlock
while trying to update free space information as a model
was trying to unload.
2025-12-29 06:37:48 -06:00
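A lock order inversion needs two locks taken in opposite orders on two paths; the standard fix is a single global acquisition order. A sketch with hypothetical locks standing in for the scheduler's state:

```go
package example

import "sync"

var (
	schedMu  sync.Mutex // protects scheduler bookkeeping
	runnerMu sync.Mutex // protects one loaded runner
)

// Deadlock recipe: the free-space updater holds schedMu and wants
// runnerMu while the unloader holds runnerMu and wants schedMu.
// Breaking the cycle: every path acquires schedMu before runnerMu.
func updateFreeSpace() {
	schedMu.Lock()
	defer schedMu.Unlock()
	runnerMu.Lock()
	defer runnerMu.Unlock()
	// ... read the runner's memory usage ...
}
```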
Daniel Hiltgen
098fe2f7f7 Narrow set of paths we load GGML from (#10485)
Users may have other incompatible GGML installs on their systems.
This will prevent us from trying to load them from the path.
2025-12-29 06:37:47 -06:00
Shahin R
5234d73611 readme: add link to lumina, a lightweight React frontend client (#10378) 2025-12-29 06:37:47 -06:00
batuhankadioglu
6e74d8d222 all: update several golang.org/x packages (#10436) 2025-12-29 06:37:47 -06:00
Daniel Hiltgen
4d8621629c integration: fix embedding tests error handling (#10478)
The cleanup routine from InitServerconnection should run in the defer of the test case to properly detect failures and report the server logs.
2025-12-29 06:37:47 -06:00
Jesse Gross
13d497db4c ollamarunner: Temporarily disable worst case graph preallocation
When we later have a large batch running purely on the CPU, this
results in the error:
GGML_ASSERT(talloc->buffer_id >= 0)

Disabling this means that we will incrementally reallocate memory
as the graph grows.

Fixes #10410
2025-12-29 06:37:46 -06:00
crStiv
02a3285b60 readme: fix typos (#10399) 2025-12-29 06:37:46 -06:00
Devon Rifkin
528bd3077a lower default num parallel to 2
this is in part to "pay" for #10452, which doubled the default context length. The combination isn't fully neutral though, because even though the old 4x2k limit and the new 2x4k limit are memory equivalent, the 1x fallback is larger with 4k
2025-12-29 06:37:46 -06:00
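For concreteness, using the defaults these commits reference: the old defaults allowed 4 slots × 2k tokens = 8k tokens of context, and the new ones 2 × 4k = 8k, so the steady-state memory budgets match; but when the server falls back to a single slot, that slot is now 4k tokens rather than 2k, which is the non-neutral part.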
Devon Rifkin
b963dd868b config: update default context length to 4096 2025-12-29 06:37:46 -06:00
Devon Rifkin
5a7c6c363e Revert "increase default context length to 4096 (#10364)"
This reverts commit 424f648632.
2025-12-29 06:37:46 -06:00
Michael Yang
b236fcc9bf model: fix build (#10416) 2025-12-29 06:37:45 -06:00
Michael Yang
049aa30191 memory 2025-12-29 06:37:45 -06:00
Michael Yang
644d6c5256 fixes for maverick 2025-12-29 06:37:45 -06:00
Michael Yang
d2d5c5e6d5 chunked attention 2025-12-29 06:37:45 -06:00
Michael Yang
b7f628b9e8 connect vision to text 2025-12-29 06:37:45 -06:00
Michael Yang
b875952e67 image processing
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-12-29 06:37:44 -06:00
Michael Yang
0f5c45e19d llama4 2025-12-29 06:37:44 -06:00
Michael Yang
371560df26 fix test 2025-12-29 06:37:44 -06:00
Michael Yang
a0d77f1dbe explicitly decode maxarraysize 1024 2025-12-29 06:37:44 -06:00
Michael Yang
8a86190fd4 fix parameter count 2025-12-29 06:37:44 -06:00
Michael Yang
49f807737a default slice values 2025-12-29 06:37:44 -06:00
Michael Yang
51e64c8f69 update comment 2025-12-29 06:37:43 -06:00
Michael Yang
84a6567dee fix token type 2025-12-29 06:37:43 -06:00
Michael Yang
5a8e641272 zero means zero
Using a default of 1024 when asking for zero is confusing, since most
calls seem to assume 0 means do not read any data.
2025-12-29 06:37:43 -06:00
Michael Yang
f0c5b48f7b convert: use -1 for read all 2025-12-29 06:37:43 -06:00
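A sketch of the count convention these two commits converge on: 0 reads nothing and -1 reads everything (the decoder shown is illustrative, not the actual convert code):

```go
package example

import "io"

const readAll = -1

func readN(r io.Reader, n int) ([]byte, error) {
	switch {
	case n == readAll:
		return io.ReadAll(r) // -1: read to EOF
	case n == 0:
		return nil, nil // zero means zero: read nothing
	default:
		buf := make([]byte, n)
		_, err := io.ReadFull(r, buf)
		return buf, err
	}
}
```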
Michael Yang
96618f6344 generic ggml.array 2025-12-29 06:37:42 -06:00
Michael Yang
5e0d7e9332 fix superfluous call to WriteHeader
The first call to http.ResponseWriter.Write implicitly calls WriteHeader
with http.StatusOK if it hasn't already been called. Once WriteHeader
has been called, subsequent calls have no effect. Write is called when
JSON-encoding progressUpdateJSON{}, so calls to
http.ResponseWriter.WriteHeader after the first encode are useless and
produce a warning:

http: superfluous response.WriteHeader call from github.com/ollama/ollama/server/internal/registry.(*statusCodeRecorder).WriteHeader (server.go:77)
2025-12-29 06:37:42 -06:00
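A minimal reproduction of the behavior the message describes, using net/http as documented (the struct field is an assumption):

```go
package example

import (
	"encoding/json"
	"net/http"
)

type progressUpdateJSON struct {
	Status string `json:"status"`
}

func handler(w http.ResponseWriter, r *http.Request) {
	// Encode writes to w; the first Write implicitly calls
	// WriteHeader(http.StatusOK) if it hasn't been called yet.
	_ = json.NewEncoder(w).Encode(progressUpdateJSON{Status: "pulling"})

	// Any WriteHeader from here on is a no-op and logs:
	//   http: superfluous response.WriteHeader call ...
	// A non-200 status must be set before the first Write.
}
```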
Michael Yang
584c3176d2 convert: change to colmajor 2025-12-29 06:37:42 -06:00
Michael Yang
4f01385151 ci: silence deprecated gpu targets warning 2025-12-29 06:37:42 -06:00
Jeffrey Morgan
85d3f71c02 llama: update to commit 2016f07b (#10352) 2025-12-29 06:37:42 -06:00
Parth Sareen
83e848fcb8 server: improve spacing for JSON grammar (#10131) 2025-12-29 06:37:41 -06:00
Parth Sareen
7cf4c146bc llama: remove model loading for grammar (#10096) 2025-12-29 06:37:41 -06:00