Commit Graph

4416 Commits

Author SHA1 Message Date
Daniel Hiltgen
a017e78f35 fix crash in old clients with quantization progress (#10710)
Older clients assumed the digest was at least 19 characters long, so increase the size
of the dummy digest to avoid array out-of-bounds crashes.
2025-12-29 06:38:00 -06:00
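A minimal sketch of the idea behind this fix (names are illustrative, not the actual Ollama code): if older clients index into the first 19 characters of the digest string, any placeholder digest reported during quantization progress must be padded to at least that length.

```go
package main

import (
	"fmt"
	"strings"
)

// minDigestLen is the shortest digest older clients are known to index into.
const minDigestLen = 19

// dummyDigest returns a placeholder digest that is always long enough for
// clients that slice the first minDigestLen characters.
func dummyDigest(label string) string {
	d := "sha256:" + label
	if len(d) < minDigestLen {
		d += strings.Repeat("0", minDigestLen-len(d))
	}
	return d
}

func main() {
	d := dummyDigest("quant")
	// An old client doing d[:19] no longer panics with "slice bounds out of range".
	fmt.Println(d, d[:minDigestLen])
}
```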
Bruce MacDonald
558b0f5fe9 model: add Qwen2.5-VL support (#10385) 2025-12-29 06:37:59 -06:00
Michael Yang
4d12503049 chore: update mllama to use ollama engine (#10637) 2025-12-29 06:37:59 -06:00
tej
783739ee9f Fixed VRAM over-allocation due to small initial layer sizes.
Co-authored-by: Tej Kiran <kiran.tej@amd.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Tej Kiran <itej89@gmailcom>
2025-12-29 06:37:59 -06:00
Parth Sareen
d0ed25bde8 llama: fix memory leak for grammar (#10696) 2025-12-29 06:37:59 -06:00
Jeffrey Morgan
24118aa1db llama: fix defrag patch to defragment when no slots are available (#10695) 2025-12-29 06:37:59 -06:00
Daniel Hiltgen
d344573e5b Revert "remove cuda v11 (#10569)" (#10692)
Bring back v11 until we can better warn users that their driver
is too old.

This reverts commit fa393554b9.
2025-12-29 06:37:58 -06:00
Jeffrey Morgan
3f2b7658af llama: fix crash on snowflake embedding model (#10690) 2025-12-29 06:37:58 -06:00
Jeffrey Morgan
595b683ffb server: add webp image input support (#10653) 2025-12-29 06:37:58 -06:00
Michael Yang
b9c7aed5ce fix vocabulary (#10679) 2025-12-29 06:37:58 -06:00
Bruce MacDonald
f1c017735b models: remove unused qwen2vl processing (#10677) 2025-12-29 06:37:58 -06:00
Daniel Hiltgen
0132148534 Follow up to #10363 (#10647)
The quantization PR didn't block all unsupported file types,
which this PR fixes.  It also updates the API docs to reflect
the now reduced set of supported types.
2025-12-29 06:37:57 -06:00
Jeffrey Morgan
9163ed39d1 llama: update to commit de4c07f93 (#10655) 2025-12-29 06:37:57 -06:00
Bruce MacDonald
5b54f682ed convert: quantize from safetensors needs kv (#10675)
When creating a quantized model from safetensors we
need the array KV values to be loaded. Changing this
value to -1 loads the KV values on the returned
layer so they can be used and saved during quantization.
2025-12-29 06:37:57 -06:00
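A hedged illustration of the mechanism described above (the function and parameter names are hypothetical, not the actual convert API): a KV decoder that takes a maximum array length can be asked to load full arrays by passing -1, so the quantization pass sees the complete KV metadata to carry into the output file.

```go
package main

import "fmt"

// decodeKV simulates a decoder that truncates array-valued KV entries to
// maxArraySize elements; a negative value means "load everything".
// (Hypothetical helper for illustration only.)
func decodeKV(kv map[string][]int32, maxArraySize int) map[string][]int32 {
	out := make(map[string][]int32, len(kv))
	for k, v := range kv {
		if maxArraySize >= 0 && len(v) > maxArraySize {
			v = v[:maxArraySize]
		}
		out[k] = v
	}
	return out
}

func main() {
	kv := map[string][]int32{"tokenizer.ggml.token_type": {1, 1, 2, 3, 4}}

	truncated := decodeKV(kv, 0) // metadata-only pass: arrays dropped
	full := decodeKV(kv, -1)     // quantization pass: arrays preserved for the output layer

	fmt.Println(len(truncated["tokenizer.ggml.token_type"]), len(full["tokenizer.ggml.token_type"]))
}
```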
Michael Yang
7085a3f89b feat: add trace log level (#10650)
reduce prompt log to trace level
2025-12-29 06:37:57 -06:00
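Go's log/slog has no built-in trace level, but custom levels below Debug are a standard pattern. A minimal sketch of the idea (the level value and messages are illustrative):

```go
package main

import (
	"context"
	"log/slog"
	"os"
)

// LevelTrace sits below slog.LevelDebug (-4); more negative means more verbose.
const LevelTrace = slog.Level(-8)

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
		Level: LevelTrace, // enable everything down to trace
	}))

	// Prompt contents go to trace so they stay out of default info/debug output.
	logger.Log(context.Background(), LevelTrace, "prompt", "text", "why is the sky blue?")
	logger.Debug("still visible at debug")
}
```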
HardCodeDev
d69f623dd6 readme: add UnityCodeLama to community integrations (#10665) 2025-12-29 06:37:57 -06:00
HardCodeDev
87ad1fe2d2 readme: add OllamaPlusPlus C++ library to community integrations (#10664) 2025-12-29 06:37:56 -06:00
frob
1791b68cc2 llama: allocate grammar buffer based on schema length (#10649) 2025-12-29 06:37:56 -06:00
frob
6faf548d3a envconfig: Remove no longer supported max vram var (#10623)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-12-29 06:37:56 -06:00
Michael Yang
d9cf336ade feat: add threshold to dump options (#10639)
ml.Dump will preserve default values if not specified
2025-12-29 06:37:56 -06:00
AliAhmedNada
69446104a8 readme: add ojira to community integrations (#10648) 2025-12-29 06:37:55 -06:00
Bruce MacDonald
9733b4decc cmd: strip single quotes from image page (#10636) 2025-12-29 06:37:55 -06:00
Michael Yang
7f513a6d46 fix: stream accumulator exits early (#10593)
the stream accumulator exits as soon as it sees `api.ProgressResponse(status="success")`, which isn't strictly correct
since some requests may have multiple successes, e.g. `/api/create` when the source model needs to be pulled.
2025-12-29 06:37:55 -06:00
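A hedged sketch of the shape of the fix (types simplified, not the real client code): accumulate until the stream is exhausted instead of returning on the first "success" status, since a single request can report several.

```go
package main

import "fmt"

type progress struct {
	Status string
}

// accumulate drains the channel until it is closed, rather than returning on
// the first "success", so multi-phase requests (e.g. create-with-pull) are
// fully consumed. Simplified illustration, not the actual accumulator.
func accumulate(ch <-chan progress) []progress {
	var all []progress
	for p := range ch {
		all = append(all, p)
	}
	return all
}

func main() {
	ch := make(chan progress, 3)
	ch <- progress{Status: "success"} // pull of the source model finished
	ch <- progress{Status: "creating"}
	ch <- progress{Status: "success"} // create finished
	close(ch)

	fmt.Println(len(accumulate(ch)), "responses accumulated")
}
```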
Michael Yang
dbde1b67fb lint: enable usetesting, disable tenv (#10594) 2025-12-29 06:37:55 -06:00
Michael Yang
444ee714a7 chore: remove unused ZipReader type (#10621) 2025-12-29 06:37:55 -06:00
Jeffrey Morgan
9ec2150629 api: remove unused sampling parameters (#10581) 2025-12-29 06:37:54 -06:00
Jesse Gross
38fae71425 ollamarunner: Use correct constant to remove cache entries
The correct constant to remove all entries to the end of the sequence
for the Ollama engine is math.MaxInt32. -1 is used by the old engine.

The impact of this is currently minimal because it would only occur
in situations that are not supported by the implemented models or
rarely used options.
2025-12-29 06:37:54 -06:00
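A small illustration of the distinction (a simplified stand-in, not the runner's actual cache API): the Ollama engine treats math.MaxInt32 as the "through the end of the sequence" sentinel, while -1 is the old engine's convention.

```go
package main

import (
	"fmt"
	"math"
)

// removeRange drops cache entries for seq in [begin, end); math.MaxInt32 as
// end means "through the end of the sequence".
func removeRange(cache map[int][]int, seq int, begin, end int32) {
	entries := cache[seq]
	if end == math.MaxInt32 || int(end) > len(entries) {
		end = int32(len(entries))
	}
	cache[seq] = append(entries[:begin:begin], entries[end:]...)
}

func main() {
	cache := map[int][]int{0: {10, 11, 12, 13, 14}}
	removeRange(cache, 0, 2, math.MaxInt32) // remove everything from position 2 onward
	fmt.Println(cache[0])                   // [10 11]
}
```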
Daniel Hiltgen
aadcbde40f CI: trigger downstream release process (#10508) 2025-12-29 06:37:54 -06:00
Daniel Hiltgen
414f323a9d sched: fix race leading to orphaned runners (#10599)
If a model is loading, and the request context is canceled during the load
by a client closing the connection, and another request is inbound for the
same model with a different configuration (context size, etc.) thus requiring
a reload, two unload events can be in flight.  The first shuts down the
original model load, but the second one caused the loss of the new
reloading runner reference, thus triggering the leak.

The primary fix is detecting the duplicate unload and ignoring the second
instance.  The load routine is also hardened to ensure we detect
clobbering an already present runner and unload it with a warning.
2025-12-29 06:37:54 -06:00
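A hedged sketch of the two guards described above (the scheduler type and fields are illustrative, not the real implementation): ignore a second unload for a runner that is already gone, and warn before a load would clobber an existing runner reference.

```go
package main

import (
	"fmt"
	"log/slog"
	"sync"
)

type runner struct{ model string }

// scheduler is a stripped-down stand-in for the real scheduler state.
type scheduler struct {
	mu     sync.Mutex
	loaded map[string]*runner
}

// unload ignores a second unload for a model that is already gone,
// the duplicate-event case described in the commit above.
func (s *scheduler) unload(model string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.loaded[model]; !ok {
		slog.Debug("ignoring duplicate unload", "model", model)
		return
	}
	delete(s.loaded, model)
}

// load warns about and replaces any runner it would otherwise clobber, so a
// reloading runner reference is never silently leaked.
func (s *scheduler) load(model string, r *runner) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if old, ok := s.loaded[model]; ok {
		slog.Warn("clobbering existing runner", "model", old.model)
	}
	s.loaded[model] = r
}

func main() {
	s := &scheduler{loaded: map[string]*runner{}}
	s.load("llama3", &runner{model: "llama3"})
	s.unload("llama3")
	s.unload("llama3") // second in-flight unload: detected and ignored
	fmt.Println(len(s.loaded), "runners loaded")
}
```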
Jeffrey Morgan
846e53006e api: remove unused RetrieveModelResponse type (#10603) 2025-12-29 06:37:54 -06:00
Daniel Hiltgen
671bf5dcf8 fix data race in WriteGGUF (#10598)
err in the goroutine should not be shared with the outer scope
2025-12-29 06:37:53 -06:00
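The race described above is a common Go pitfall: a goroutine assigning to an err variable that the outer scope also reads. A minimal sketch of the safe pattern using errgroup (illustrative, not the actual WriteGGUF code):

```go
package main

import (
	"fmt"

	"golang.org/x/sync/errgroup"
)

// writeShard stands in for writing one tensor/shard to the output file.
func writeShard(i int) error {
	if i == 3 {
		return fmt.Errorf("shard %d failed", i)
	}
	return nil
}

func writeAll() error {
	var g errgroup.Group
	for i := 0; i < 8; i++ {
		i := i
		// Each goroutine returns its own error; nothing writes to a shared
		// err variable in the outer scope, so there is no data race.
		g.Go(func() error { return writeShard(i) })
	}
	return g.Wait() // first non-nil error, if any
}

func main() {
	fmt.Println(writeAll())
}
```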
Daniel Hiltgen
3e99eae7e5 remove cuda v11 (#10569)
This reduces the size of our Windows installer payloads by ~256M by dropping
support for nvidia drivers older than Feb 2023.  Hardware support is unchanged.

Linux default bundle sizes are reduced by ~600M to 1G.
2025-12-29 06:37:53 -06:00
Aharon Bensadoun
08592afc11 readme: add Flufy to community integrations (#9719) 2025-12-29 06:37:53 -06:00
Devon Rifkin
4f231cd13e server: send 405 instead of 404 for unallowed methods (#10275)
Fixes: #5483
2025-12-29 06:37:53 -06:00
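Ollama's HTTP layer uses Gin, where distinguishing "path exists but method is wrong" (405) from "no such path" (404) is typically a matter of enabling method-not-allowed handling. A hedged sketch (the handler body and route are illustrative):

```go
package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

func main() {
	r := gin.Default()

	// Answer 405 for known routes hit with an unsupported method,
	// instead of falling through to the generic 404 handler.
	r.HandleMethodNotAllowed = true
	r.NoMethod(func(c *gin.Context) {
		c.JSON(http.StatusMethodNotAllowed, gin.H{"error": "method not allowed"})
	})

	r.POST("/api/generate", func(c *gin.Context) {
		c.JSON(http.StatusOK, gin.H{"status": "ok"})
	})

	// GET /api/generate now returns 405 instead of 404.
	r.Run(":11434")
}
```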
Michael Yang
2bc6ee16e0 server: remove internal cmd (#10595) 2025-12-29 06:37:53 -06:00
Daniel Hiltgen
39ca55a1ba Move quantization to new backend (#10363)
* Move quantization logic to GGML via new backend

This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.
2025-12-29 06:37:52 -06:00
Michael Yang
2f1eb0fcce discover: fix compiler warnings (#10572) 2025-12-29 06:37:52 -06:00
Jeffrey Morgan
13c66584a5 api: remove unused or unsupported api options (#10574)
Some options listed in api/types.go are not supported in
newer models, or have been deprecated in the past. This is
the first of a series of PRs to clean up the API options.
2025-12-29 06:37:52 -06:00
Michael Yang
71167fb878 create blobs in parallel (#10135)
* default max term height
* error on out of tree files
2025-12-29 06:37:52 -06:00
Jesse Gross
48b6465aff ggml: Reduce log level of "key not found"
Most of the time this is not an error.
2025-12-29 06:37:52 -06:00
Daniel Hiltgen
efcc69e96f win: lint fix (#10571) 2025-12-29 06:37:51 -06:00
Ashok Gelal
c833610871 Hide empty terminal window (#8668)
This hides the blank LlamaServer window when chatting outside of the terminal (for example, with an app like Msty). It has no other side effects when invoked the regular way.
2025-12-29 06:37:51 -06:00
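On Windows, hiding a child process's console window is normally done through SysProcAttr; a minimal Windows-only sketch of that approach (assumed for illustration, not the exact change in this commit):

```go
//go:build windows

package winutil

import (
	"os/exec"
	"syscall"
)

// createNoWindow is the Windows CREATE_NO_WINDOW flag: the child process
// gets no console window at all.
const createNoWindow = 0x08000000

// startHidden launches the given command without a visible terminal window.
func startHidden(path string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(path, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		HideWindow:    true,
		CreationFlags: createNoWindow,
	}
	return cmd, cmd.Start()
}
```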
Jeffrey Morgan
7e9f243a0d server: fix panic when runner.Options is nil (#10566) 2025-12-29 06:37:51 -06:00
Jeffrey Morgan
9a44e41802 all: fix cgo compiler warnings on windows (#10563) 2025-12-29 06:37:51 -06:00
湛露先生
0bffcc8cc4 file close check and close. (#10554)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-12-29 06:37:51 -06:00
Daniel Hiltgen
d0904ea7f1 win: ensure ollama paths come first (#10549)
For all search path env vars, make sure our dirs come first
to avoid picking up other, incompatible libraries
on the user's system.

Also fixes a minor build script glitch for Windows ROCm.
2025-12-29 06:37:50 -06:00
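A hedged sketch of the general approach (the env var handling and directory layout are illustrative): prepend Ollama's own library directory so it is searched before anything else already on the path.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// prependToPath puts dir at the front of the named search-path variable
// (e.g. PATH on Windows) so bundled libraries are found before any
// incompatible copies elsewhere on the system.
func prependToPath(envVar, dir string) {
	existing := os.Getenv(envVar)
	if existing == "" {
		os.Setenv(envVar, dir)
		return
	}
	os.Setenv(envVar, dir+string(os.PathListSeparator)+existing)
}

func main() {
	exe, _ := os.Executable()
	prependToPath("PATH", filepath.Join(filepath.Dir(exe), "lib", "ollama"))
	fmt.Println(os.Getenv("PATH"))
}
```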
Daniel Hiltgen
cf9f00182d sched: logging improvements (#10550)
This enhances our logging in the scheduler.  The initial "waiting for server" log
no longer claims an initial error state (now "not responding" which better reflects
the actual state).  Runners now have slog wiring to report more details about the
runner, including PID.
2025-12-29 06:37:50 -06:00
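The slog wiring described above typically amounts to attaching runner attributes once so every later log line carries them; a small sketch (attribute names and the launched command are stand-ins):

```go
package main

import (
	"log/slog"
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("sleep", "30") // stand-in for launching a runner subprocess
	if err := cmd.Start(); err != nil {
		slog.Error("runner failed to start", "error", err)
		os.Exit(1)
	}

	// Derive a logger that carries runner details, including PID, on every line.
	runnerLog := slog.With("runner", "llama3", "pid", cmd.Process.Pid)
	runnerLog.Info("waiting for server to become responsive")
	runnerLog.Info("server not responding yet") // replaces the misleading initial "error" wording
}
```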
aritra saha
541a8575f0 readme: add llama 4 models (#10530) 2025-12-29 06:37:50 -06:00
Jesse Gross
86eea6770e ggml: Fix race that resulted in "context canceled" when loading
Successfully completing processing with an errgroup cancels the
associated context. However, we also have a goroutine that is checking
for cancelation of the context. As a result, there is a race where
the goroutine can pick up the cancelation and report an error,
replacing the successful result.

To avoid that, this replaces the goroutine with a cancelation check
when we are reading files. This also has the advantage of stopping
all reads relatively quickly on error and also ensuring that there are
no outstanding I/O operations when we return in this case.

The downside is that if a file read blocks forever (for example, over
the network) then cancelation of the context effectively won't be
honored. However, this is also true for other smaller files we read
and the tensors are read in small chunks (128K), so it's consistent
and better on balance overall.
2025-12-29 06:37:50 -06:00
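A hedged sketch of the shape of the fix (a simplified reader, not the actual loader): instead of a separate goroutine watching ctx.Done(), the cancelation check happens inline between chunked reads, so a successful completion cannot be overwritten by a late cancelation report.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
)

// readChunked copies r in fixed-size chunks, checking for cancelation between
// chunks rather than in a separate goroutine. If the context is canceled only
// because everything already finished, no spurious error is reported.
func readChunked(ctx context.Context, r io.Reader, chunk int) (int, error) {
	buf := make([]byte, chunk)
	total := 0
	for {
		if err := ctx.Err(); err != nil {
			return total, err // canceled mid-read: stop promptly
		}
		n, err := r.Read(buf)
		total += n
		if err == io.EOF {
			return total, nil
		}
		if err != nil {
			return total, err
		}
	}
}

func main() {
	data := bytes.Repeat([]byte{0xAB}, 1<<20)
	n, err := readChunked(context.Background(), bytes.NewReader(data), 128<<10) // 128K chunks
	fmt.Println(n, err)
}
```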
Jesse Gross
cec8a9dee0 ollamarunner: Re-enable worst case graph preallocation.
Worst case graph preallocation was disabled by a27462b
"ollamarunner: Temporarily disable worst case graph preallocation"
since it caused crashes with large batches when not using the GPU.

This backports upstream llama.cpp commit f057808
"ggml: Don't assert fail when tensor data changes (#13222)", which
fixes the underlying bug and allows reverting the previous workaround.
2025-12-29 06:37:50 -06:00