ollama

Author	SHA1	Message	Date
Bruce MacDonald	ff1f74534b	block attention	2025-05-12 13:49:42 -07:00
Bruce MacDonald	104f802df1	remove todos	2025-05-12 13:49:42 -07:00
Bruce MacDonald	eed0ac2948	clean up vision model forward pass	2025-05-12 13:49:42 -07:00
Bruce MacDonald	fcfad744ff	fix patch merger	2025-05-12 13:49:42 -07:00
Michael Yang	fb3c16f2a2	window index	2025-05-12 13:49:42 -07:00
Michael Yang	ee869f35e4	fix image processing. Python's built-in `round()` rounds to the nearest even number if the value is exactly halfway between two integers: https://docs.python.org/3/library/functions.html#round	2025-05-12 13:49:42 -07:00
Michael Yang	ff5d1a3dc0	duplicate input embeddings	2025-05-12 13:49:42 -07:00
Michael Yang	88b231f903	use maxgridsize	2025-05-12 13:49:42 -07:00
Michael Yang	7e920c8d75	fix: patch merger and convert. convert: split patch embedding, split qkv. Remove duplicate PatchMerger	2025-05-12 13:49:42 -07:00
Bruce MacDonald	dd8c619fba	fixes after rebase	2025-05-12 13:49:42 -07:00
Bruce MacDonald	2af76d0e7a	default to 32 for vision block count	2025-05-12 13:49:42 -07:00
Bruce MacDonald	8d901825f0	reshape cos and sin	2025-05-12 13:49:41 -07:00
Bruce MacDonald	04936b719f	Update model_vision.go	2025-05-12 13:49:41 -07:00
Bruce MacDonald	0f0136d419	simplify by doing operations in Go rather than with tensors. Co-Authored-By: Michael Yang <2372640+mxyng@users.noreply.github.com>	2025-05-12 13:49:41 -07:00
Bruce MacDonald	80498f76de	fix build	2025-05-12 13:49:41 -07:00
Bruce MacDonald	f8b48aa784	Delete model_external_test.go	2025-05-12 13:49:41 -07:00
Bruce MacDonald	5ff0d538b0	wip: implementing rope	2025-05-12 13:49:41 -07:00
Bruce MacDonald	eedc969c35	grid refactor	2025-05-12 13:49:41 -07:00
Bruce MacDonald	963531215e	update convert	2025-05-12 13:49:41 -07:00
Bruce MacDonald	3fe090f447	get patch embedding vals from config	2025-05-12 13:49:41 -07:00
Bruce MacDonald	1704072746	patch embeddings	2025-05-12 13:49:41 -07:00
Bruce MacDonald	c1f9bcb4dd	restructure image processing; Update model.go; Update model.go; Update model.go; no projector; no projector; vision model scaffold; ...; ...; wip; ...; rebase; fix patch merger; tidy; ...; Update model_vision.go; server: do not attempt to parse offset file as gguf. This logic was causing issues for me when importing a gguf that had some padding at the end of the file. The valid gguf would be read, but then it would try to read the offset as a different gguf file. This does not seem right; Update process_image_test.go; apply norm; prompt processing; prompt processing; fix post tokenize; fix gguf padding + populate the split patch embeddings; ...; ...; another shot at patch embeddings; ...; patch embedding; Update model_vision.go; split pixels	2025-05-12 13:49:41 -07:00
Bruce MacDonald	198b1e6db9	text model forward pass	2025-05-12 13:49:41 -07:00
Bruce MacDonald	51ad65f831	ml: structured rope config to allow specifying context len. This commit refactors the Rotary Position Embedding (RoPE) implementation across the codebase to use a structured configuration approach instead of individual parameters. Key changes: add new RoPEConfig struct with fields for dimension, type, base frequency, and scaling; add RopeType enum to formalize different RoPE implementation variants; add YarnConfig struct and related configuration for YaRN (Yet Another RoPE extensioN) context extension; update RoPE method signature across all tensor interfaces and implementations; refactor all model implementations (llama, gemma2, gemma3, mllama) to use the new configuration structure. This change improves code organization, makes the RoPE configuration more explicit, and provides better support for different RoPE variants and context extension methods.	2025-05-12 13:49:41 -07:00
Michael Yang	f95a1f2bef	feat: add trace log level (#10650). Reduce prompt log to trace level	2025-05-12 11:43:00 -07:00
Michael Yang	5cfc1c39f3	model: fix build (#10416)	2025-04-25 19:24:48 -07:00
Michael Yang	7ba9fa9c7d	fixes for maverick	2025-04-25 16:59:20 -07:00
Michael Yang	8bf11b84c1	chunked attention	2025-04-25 16:59:20 -07:00
Michael Yang	470af8ab89	connect vision to text	2025-04-25 16:59:20 -07:00
Michael Yang	178761aef3	image processing. Co-authored-by: Patrick Devine <patrick@infrahq.com>	2025-04-25 16:59:20 -07:00
Michael Yang	f0c66e6dea	llama4	2025-04-25 16:59:20 -07:00
Michael Yang	d26c18e25c	fix token type	2025-04-25 16:59:01 -07:00
Parth Sareen	a53d744b01	llama: remove model loading for grammar (#10096)	2025-04-24 11:51:19 -07:00
Michael Yang	40b8fdbdca	arange	2025-04-18 11:45:44 -07:00
Jesse Gross	dbb149e6f7	ollamarunner: Preallocate worst case graph at startup. Currently, the KV cache and graph are lazily allocated as needed. The cache is fully allocated on first use of the corresponding layer whereas the graph grows with the size of the context. This can be an issue if another application allocates more VRAM after we do our calculations - Ollama will crash in the middle of inference. If we instead allocate the maximum needed memory at startup of the runner, we will either succeed or fail at that point rather than at some surprising time in the future. Currently, this only generates a worst case batch for text, which means that vision models may get a partial allocation and continue to lazily allocate the rest.	2025-04-08 10:01:28 -07:00
Bruce MacDonald	6bd0a983cd	model: support for mistral-small in the ollama runner. Mistral is a popular research lab making open source models. This updates the forward pass of llama architecture models to support both llama models and mistral models by accounting for additional metadata present in mistral models, and finding the correct dimensions for the output projection.	2025-04-03 16:57:36 -07:00
Michael Yang	3b96a93672	fs: move ml.Config to fs package	2025-04-03 13:12:24 -07:00
Jeffrey Morgan	b51e0f397c	model: fix issues with spm tokenizer for Gemma 3 (#10081)	2025-04-02 13:22:56 -07:00
Michael Yang	74bd09652d	ml/backend/ggml: load tensors in 32KiB chunks	2025-03-21 14:43:52 -07:00
Jesse Gross	0fbfcf3c9c	model: Pass input tensor instead of raw data to models. Rather than directly giving the input data to models, we can pass a tensor instead. In the short term, this saves some duplicated code. Longer term, we will want to overlap setting up the next batch with processing of the current one. In this case, we will only have the shape of the tensor but it will not be loaded with data at the time of graph generation. By passing only a tensor to models now, we set up this possibility and prevent them from relying on data that they won't have in the future. Although the same could be done for Positions and Outputs, in some cases we either need the raw input data or don't use them at all. Therefore, for now we leave them as they are and allow models to convert them to tensors as needed.	2025-03-20 13:28:13 -07:00
Jesse Gross	0c220935bd	input: Rename Options to Batch. Options is no longer very descriptive of this struct.	2025-03-20 13:28:13 -07:00
Jesse Gross	b078dd157c	gemma2: Remove second call to Rows. Looks like a merge conflict that broke the model.	2025-03-19 17:28:49 -07:00
Jeffrey Morgan	da0e345200	ml: use input context for extracting outputs (#9875)	2025-03-18 18:08:19 -07:00
Jesse Gross	282bfaaa95	ollamarunner: Use a separate context per multimodal input. Currently there is a single context per sequence, shared by all multimodal inputs. Since we build a vision encoder graph per image, with a large number of inputs we can eventually hit the maximum number of graph nodes per context. This changes to use a separate context for each image, ensuring that available resource limits are consistent.	2025-03-14 15:38:54 -07:00
Jesse Gross	9679f40146	ml: Allow models to constrain inputs to a single batch. Models may require that a set of inputs all be processed as part of the same batch. For example, if an image has multiple patches with fully connected attention between them, we should not split the batch in the middle of an image. Fixes #9697	2025-03-14 15:38:54 -07:00
Michael Yang	3e102b7dad	Update model/model.go. Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2025-03-13 13:11:52 -07:00
Michael Yang	5e2e0b46b1	fix: error if image requested without vision model	2025-03-13 10:52:09 -07:00
Bruce MacDonald	a70820daa0	models/gemma3: remove final logit softcap (#9692). Softcap isn't in the whitepaper/implementation for the language model so we should remove it. There is no discernible difference in output with it removed.	2025-03-12 10:17:57 -07:00
jmorganca	83f0ec8269	all: address linter errors	2025-03-11 14:49:20 -07:00
jmorganca	fb4664fcec	model: add more spm tokenizer tests	2025-03-11 14:49:20 -07:00

95 Commits
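Commit ee869f35e4 notes that Python's built-in `round()` rounds halves to the nearest even integer. The sketch below is an illustration only, not the code from that commit. Go's standard library exposes both behaviors: `math.Round` rounds ties away from zero, and `math.RoundToEven` matches Python's half-to-even rule. The helper names `pyRound` and `goRound` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// pyRound mimics Python 3's built-in round(x) called without ndigits:
// values exactly halfway between two integers go to the even neighbor.
// (Hypothetical helper for illustration.)
func pyRound(x float64) int {
	return int(math.RoundToEven(x))
}

// goRound uses math.Round, which rounds ties away from zero.
// (Hypothetical helper for illustration.)
func goRound(x float64) int {
	return int(math.Round(x))
}

func main() {
	// Exact ties (0.5, 1.5, 2.5, 3.5, -2.5) plus two non-ties for contrast.
	for _, x := range []float64{0.5, 1.5, 2.5, 3.5, -2.5, 2.4, 2.6} {
		fmt.Printf("x=%5.1f  math.Round=%3d  Python-style=%3d\n", x, goRound(x), pyRound(x))
	}
}
```

The two helpers agree everywhere except on exact ties such as 0.5, 2.5 and -2.5. This is why a Go port of Python preprocessing that uses `math.Round` could disagree with the Python reference by one unit when a computed size lands exactly on a half.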