We prefer the nvcuda library, which reports driver versions. When we
dropped CUDA v11, we added a safety check for too-old drivers. What
we missed was that the cudart fallback discovery logic didn't have the
driver version wired up. This fixes cudart discovery to expose the
driver version as well, so we no longer reject all GPUs when nvcuda
doesn't work.
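To make the failure mode concrete, here is a minimal Go sketch of the kind of guard involved; the type and field names are invented for illustration, not Ollama's actual discovery types.

```go
package main

import "fmt"

// gpuInfo stands in for a discovered GPU (field names invented for this
// sketch). Whichever discovery path found the device -- nvcuda or the
// cudart fallback -- must populate the driver version.
type gpuInfo struct {
	ID          string
	DriverMajor int
	DriverMinor int
}

// driverSupported mirrors the too-old-driver safety check added when CUDA
// v11 support was dropped.
func driverSupported(g gpuInfo, minMajor int) bool {
	// If discovery never wired up the driver version, it reads as zero and
	// every GPU found via that path looks "too old" -- the bug described above.
	return g.DriverMajor >= minMajor
}

func main() {
	viaCudart := gpuInfo{ID: "GPU-0"}           // driver version left unset
	fmt.Println(driverSupported(viaCudart, 12)) // false: rejected despite a fine driver
}
```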
Prior to this change, our official binaries contained both JIT-able PTX code
and the cubin binary code for our chosen compute capabilities. This change
switches to compiling only the PTX code and relying on JIT at runtime to
generate the cubin specific to the user's GPU. The cubins are cached
on the user's system, so users should only see a small lag on the very
first model load for a given Ollama release. This also adds the first
generation of Blackwell GPUs so they aren't reliant on the Hopper PTX.
This change reduces ggml-cuda.dll from 1.2 GB to 460 MB.
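For reference, the difference lives in nvcc's `-gencode` flags. This small Go driver shows the PTX-only form; the file names and the single architecture are placeholders for this sketch, not Ollama's real build setup.

```go
package main

import "os/exec"

func main() {
	// code=sm_90      -> embed a cubin prebuilt for compute capability 9.0
	// code=compute_90 -> embed PTX, JIT-compiled (and cached) on the user's machine
	//
	// The old build shipped both forms; the new build keeps only the PTX form.
	cmd := exec.Command("nvcc",
		"-gencode", "arch=compute_90,code=compute_90", // PTX only
		"-c", "ggml-cuda.cu", "-o", "ggml-cuda.o",
	)
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```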
Some AMD GPUs do not provide UUIDs and report only "XX". In these
cases, we should use the ordinal ID as an alternate identifier,
as we already have to do on Windows for AMD.
In addition, this prints out the ID for each GPU during enumeration
for easier debugging in the future.
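A minimal sketch of the fallback, with invented names; the real enumeration lives in Ollama's AMD discovery code.

```go
package main

import (
	"fmt"
	"strings"
)

// gpuID returns the UUID when the driver provides a real one, otherwise the
// ordinal position in the enumeration.
func gpuID(uuid string, ordinal int) string {
	if strings.Trim(uuid, "X") == "" { // some GPUs report only "XX"
		return fmt.Sprintf("%d", ordinal)
	}
	return uuid
}

func main() {
	for i, uuid := range []string{"GPU-deadbeef", "XX"} {
		// Print the chosen ID while enumerating, for easier debugging later.
		fmt.Printf("amdgpu %d: id=%s\n", i, gpuID(uuid, i))
	}
}
```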
This patch modifies Ollama to group GPUs and memory-fit the requested model to that group, instead of the former algorithm of either using one GPU or distributing over all available GPUs (see the sketch after the list below).
Benefits:
- Less (PCIe) bus communication between GPUs, especially when the links are not very fast
- Unallocated GPUs can drop into power-saving mode
- Significantly reduced VRAM allocation when using more than two GPUs in a system
- Due to the reduced memory allocation, more models can run simultaneously
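A greedy Go sketch of the grouping idea, with invented types; this is illustrative, not Ollama's actual placement algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

type gpu struct {
	id       string
	freeVRAM uint64 // bytes
}

// pickGroup returns the smallest set of GPUs whose combined free VRAM fits
// the model, preferring fewer devices so the rest can stay idle.
func pickGroup(gpus []gpu, need uint64) []gpu {
	sort.Slice(gpus, func(i, j int) bool { return gpus[i].freeVRAM > gpus[j].freeVRAM })
	var group []gpu
	var total uint64
	for _, g := range gpus {
		group = append(group, g)
		total += g.freeVRAM
		if total >= need {
			return group
		}
	}
	return nil // does not fit even across all GPUs
}

func main() {
	gpus := []gpu{{"0", 24 << 30}, {"1", 24 << 30}, {"2", 8 << 30}, {"3", 8 << 30}}
	for _, g := range pickGroup(gpus, 30<<30) {
		fmt.Println("using GPU", g.id)
	}
	// The old behavior would spread the model over all four GPUs; here it
	// lands on GPUs 0 and 1, and 2/3 stay unallocated.
}
```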
Callers can set a backend buffer type to be no-alloc, meaning that
it does not allocate memory for tensors or operations. This can
be used for calculating memory requirements. Tensors and graphs
must be recreated with no-alloc set to false before loading data.
No-alloc defaults to false for newly created backend buffer types.
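The two-pass pattern this enables looks roughly like the following Go sketch; the types are invented stand-ins, not the real buffer-type API.

```go
package main

import "fmt"

// bufferType is a stand-in for a backend buffer type with a no-alloc mode:
// when noAlloc is true, allocations only accumulate the size each tensor
// would need.
type bufferType struct {
	noAlloc bool
	used    int
	backing []byte
}

func (b *bufferType) allocTensor(size int) []byte {
	offset := b.used
	b.used += size
	if b.noAlloc {
		return nil // measurement pass: no real memory behind the tensor
	}
	return b.backing[offset : offset+size]
}

func main() {
	// Pass 1: create the buffer type with no-alloc set to true and build the
	// tensors/graph once just to measure.
	measure := &bufferType{noAlloc: true}
	measure.allocTensor(4096)
	measure.allocTensor(16384)
	fmt.Println("required bytes:", measure.used)

	// Pass 2: recreate everything with no-alloc false (the default) and real
	// backing memory, then load data into the returned slices.
	actual := &bufferType{backing: make([]byte, measure.used)}
	weights := actual.allocTensor(4096)
	copy(weights, make([]byte, 4096)) // stand-in for loading tensor data
}
```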
For many backend data structures, GGML defines a typedef of a pointer
type and returns it from functions. In most cases, CGo understands
that the typedef and the underlying pointer type are interchangeable,
but some parts of Go (such as generics) treat them as two different
types. We should prefer the form that GGML uses.
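A pure-Go illustration of the mismatch (no cgo needed): a defined pointer type, like the ones cgo generates for GGML's typedefs, is assignable to its underlying pointer type, but generics treat the two as distinct.

```go
package main

import "fmt"

type node struct{ v int }

// nodeT mirrors a cgo-generated typedef of a pointer type: a defined type
// whose underlying type is *node.
type nodeT *node

func pick[T comparable](a, b T) T {
	var zero T
	if a != zero {
		return a
	}
	return b
}

func main() {
	var a nodeT = &node{1}
	var b *node = &node{2}

	var c nodeT = b // fine: assignability bridges the two forms
	_ = c

	// pick(a, b) // compile error: mismatched types nodeT and *node

	// Sticking to the typedef form, as GGML uses, avoids the problem.
	n := pick(a, nodeT(b))
	fmt.Println((*n).v)
}
```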
gpt-oss works best with a context length of at least 8k. However,
for GPUs with a limited amount of VRAM, the increased context comes
with a significant performance hit. In these cases, we switch to the
Ollama default of 4k.
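A sketch of the heuristic in Go; the 16 GiB threshold below is a placeholder for illustration, not the real cutoff.

```go
package main

import "fmt"

const (
	defaultNumCtx = 4096 // Ollama default
	gptOSSNumCtx  = 8192 // gpt-oss works best with at least this
)

// contextLength prefers 8k for gpt-oss but falls back to the default on
// VRAM-constrained GPUs, where the larger context hurts performance.
func contextLength(model string, freeVRAM uint64) int {
	if model == "gpt-oss" && freeVRAM >= 16<<30 { // placeholder threshold
		return gptOSSNumCtx
	}
	return defaultNumCtx
}

func main() {
	fmt.Println(contextLength("gpt-oss", 8<<30))  // 4096
	fmt.Println(contextLength("gpt-oss", 24<<30)) // 8192
}
```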
We were not passing along thinking when content was nil (as opposed
to an empty string).
Also added a test for content not being passed, which was the real cause
of <https://github.com/ollama/ollama/issues/11704>, since with the way
`Content` is typed, not passing it and passing an empty string are distinct.
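The distinction the fix relies on can be shown with a small sketch; the field shapes below are illustrative, not Ollama's exact API types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message shows how a pointer-typed field lets JSON distinguish
// "content": "" from content being absent entirely.
type message struct {
	Content  *string `json:"content,omitempty"`
	Thinking string  `json:"thinking,omitempty"`
}

func main() {
	empty := ""
	withEmpty := message{Content: &empty, Thinking: "step 1..."}
	withNil := message{Content: nil, Thinking: "step 1..."}

	a, _ := json.Marshal(withEmpty)
	b, _ := json.Marshal(withNil)
	fmt.Println(string(a)) // {"content":"","thinking":"step 1..."}
	fmt.Println(string(b)) // {"thinking":"step 1..."} -- content not passed
	// The bug: thinking was only forwarded when content was present, so the
	// nil case (distinct from "") silently dropped it.
}
```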