Commit Graph

24 Commits

Author SHA1 Message Date
Jesse Gross 282bfaaa95 ollamarunner: Use a separate context per multimodal input
Currently there is a single context per sequence, shared by
all multimodal inputs. Since we build a vision encoder graph per
image, with a large number of inputs we can eventually hit the
maximum number of graph nodes per context.

This change uses a separate context for each image, ensuring
that available resource limits are consistent.
2025-03-14 15:38:54 -07:00
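The resource argument in this commit message can be illustrated with a small simulation. This is only a sketch under stated assumptions: `Context`, `AddNodes`, `encodeShared`, `encodePerImage`, and the node counts are hypothetical names and numbers chosen for illustration, not the runner's actual types or limits.

```go
package main

import (
	"errors"
	"fmt"
)

var errTooManyNodes = errors.New("graph node limit exceeded")

// Context is a hypothetical stand-in for a compute context that can
// hold only a fixed number of graph nodes.
type Context struct {
	maxNodes int
	used     int
}

// AddNodes reserves n graph nodes, failing once the budget is exhausted.
func (c *Context) AddNodes(n int) error {
	if c.used+n > c.maxNodes {
		return errTooManyNodes
	}
	c.used += n
	return nil
}

// encodeShared builds every image's encoder graph in one shared context,
// so node usage accumulates across images.
func encodeShared(images, nodesPerImage, maxNodes int) error {
	ctx := &Context{maxNodes: maxNodes}
	for i := 0; i < images; i++ {
		if err := ctx.AddNodes(nodesPerImage); err != nil {
			return fmt.Errorf("image %d: %w", i, err)
		}
	}
	return nil
}

// encodePerImage gives each image a fresh context, so every image sees
// the same budget regardless of how many images came before it.
func encodePerImage(images, nodesPerImage, maxNodes int) error {
	for i := 0; i < images; i++ {
		ctx := &Context{maxNodes: maxNodes}
		if err := ctx.AddNodes(nodesPerImage); err != nil {
			return fmt.Errorf("image %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	// Illustrative numbers only: 10 images, 300 nodes each, 2048-node budget.
	fmt.Println("shared:", encodeShared(10, 300, 2048))
	fmt.Println("per image:", encodePerImage(10, 300, 2048))
}
```

With these made-up numbers the shared context runs out of nodes at the seventh image, while the per-image variant succeeds for any number of images as long as a single image fits.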
Jesse Gross 9679f40146 ml: Allow models to constrain inputs to a single batch
Models may require that a set of inputs all be processed as part
of the same batch. For example, if an image has multiple patches
with fully connected attention between them, we should not split
the batch in the middle of an image.

Fixes #9697
2025-03-14 15:38:54 -07:00
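One way to picture the batching constraint described above is a splitter that never cuts through a group of inputs that must stay together. The sketch below is an assumption-laden illustration, not the code from this commit: the `Input` type, its `SameBatch` field, and `splitBatches` are hypothetical names.

```go
package main

import "fmt"

// Input is a hypothetical token or image-patch slot. SameBatch is an
// assumed field: how many of the following inputs must share a batch
// with this one.
type Input struct {
	ID        int
	SameBatch int
}

// splitBatches packs inputs into batches of at most batchSize without
// breaking a SameBatch group. It returns an error if a group cannot fit
// in a single batch.
func splitBatches(inputs []Input, batchSize int) ([][]Input, error) {
	var batches [][]Input
	var cur []Input
	for i := 0; i < len(inputs); {
		n := 1 + inputs[i].SameBatch
		if i+n > len(inputs) {
			n = len(inputs) - i
		}
		if n > batchSize {
			return nil, fmt.Errorf("group of %d inputs exceeds batch size %d", n, batchSize)
		}
		// Start a new batch instead of splitting the group.
		if len(cur)+n > batchSize {
			batches = append(batches, cur)
			cur = nil
		}
		cur = append(cur, inputs[i:i+n]...)
		i += n
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches, nil
}

func main() {
	// Two text tokens, an image made of three patches (IDs 2 to 4),
	// then one more text token.
	inputs := []Input{{ID: 0}, {ID: 1}, {ID: 2, SameBatch: 2}, {ID: 3}, {ID: 4}, {ID: 5}}
	batches, err := splitBatches(inputs, 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, b := range batches {
		fmt.Printf("batch %d: %d inputs\n", i, len(b))
	}
}
```

With a batch size of 4, the first batch is closed early at two inputs so that the three image patches land together in the second batch.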
Michael Yang 5e2e0b46b1 fix: error if image requested without vision model 2025-03-13 10:52:09 -07:00
Bruce MacDonald a70820daa0 models/gemma3: remove final logit softcap (#9692)
Softcap isn't in the whitepaper/implementation for the language model so we should remove it. There is no discernible difference in output with it removed.
2025-03-12 10:17:57 -07:00
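For context on what was removed, logit soft-capping is commonly written as cap * tanh(x / cap). The sketch below assumes that usual tanh formulation; the `softcap` helper and the cap value are illustrative and are not taken from this repository or from any model config.

```go
package main

import (
	"fmt"
	"math"
)

// softcap squashes x smoothly into the range [-limit, limit] while
// leaving values much smaller than limit almost unchanged.
func softcap(x, limit float64) float64 {
	return limit * math.Tanh(x/limit)
}

func main() {
	const limit = 30.0 // illustrative value, not read from a model config
	for _, x := range []float64{0, 1, 100} {
		fmt.Printf("%6.1f -> %.3f\n", x, softcap(x, limit))
	}
}
```

Because the function is monotonic, capping the final logits does not change which token has the highest score.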
jmorganca 83f0ec8269 all: address linter errors 2025-03-11 14:49:20 -07:00
Michael Yang 63a394068c use 2d pooling 2025-03-11 14:49:20 -07:00
jmorganca 11bfa62796 add trailing \n\n after <end_of_image> to match reference implementation 2025-03-11 14:49:20 -07:00
jmorganca f63e62e546 reduce kernel size, add TODO for loading from config 2025-03-11 14:49:20 -07:00
jmorganca 65b0f329d1 Revert "Allow models to force a new batch"
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
2025-03-11 14:49:20 -07:00
Jesse Gross 06007c0a18 Allow models to force a new batch
This is useful for a few things:
 - Work around bugs, such as having 2 images in one batch
 - Keep the image in a single batch for fully connected attention
 - Improve performance by not evaluating embeddings multiple times
2025-03-11 14:49:20 -07:00
Jesse Gross a8e83a7654 Disable causal attention based on batch index
Currently we are using positions, which are relative to a
sequence and may not be unique.
2025-03-11 14:49:20 -07:00
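The distinction between positions and batch indices can be sketched as follows. This is a hypothetical illustration rather than the repository's mask code: `Entry`, `buildMask`, and the `except` set are assumed names. Two sequences in one batch reuse positions 0 and 1, so only the batch index identifies a row uniquely.

```go
package main

import (
	"fmt"
	"math"
)

// Entry is a hypothetical batch element: the sequence it belongs to and
// its position within that sequence.
type Entry struct {
	Seq int
	Pos int
}

// buildMask returns an additive attention mask for the batch: 0 where
// row i may attend to column j, -Inf otherwise. Rows whose batch index
// is in except skip the causal check; they are still limited to their
// own sequence.
func buildMask(batch []Entry, except map[int]bool) [][]float32 {
	negInf := float32(math.Inf(-1))
	mask := make([][]float32, len(batch))
	for i, q := range batch {
		mask[i] = make([]float32, len(batch))
		for j, k := range batch {
			sameSeq := q.Seq == k.Seq
			causalOK := k.Pos <= q.Pos || except[i]
			if !sameSeq || !causalOK {
				mask[i][j] = negInf
			}
		}
	}
	return mask
}

func main() {
	// Batch indices 0 and 1 are image patches in sequence 0; indices 2
	// and 3 are text tokens in sequence 1. Both sequences use positions
	// 0 and 1, so a position-keyed set could not tell them apart.
	batch := []Entry{{Seq: 0, Pos: 0}, {Seq: 0, Pos: 1}, {Seq: 1, Pos: 0}, {Seq: 1, Pos: 1}}
	except := map[int]bool{0: true, 1: true}
	for _, row := range buildMask(batch, except) {
		fmt.Println(row)
	}
}
```

Here row 0 may attend to column 1 even though it comes later, while row 2, which has the same position in a different sequence, remains causal.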
Jesse Gross 2c40c4d35e Fix follow up images and images split across batches 2025-03-11 14:49:19 -07:00
Michael Yang e95278932b use non-causal mask only for image positions 2025-03-11 14:49:19 -07:00
Michael Yang 9d2a20a763 use non-causal mask for inputs with images 2025-03-11 14:49:19 -07:00
Michael Yang 6b32a2d549 compat with upstream gguf 2025-03-11 14:49:19 -07:00
Michael Yang f888912870 fix vision encoder 2025-03-11 14:49:19 -07:00
Patrick Devine 9b54267e69 fix configs 2025-03-11 14:49:19 -07:00
Michael Yang 46bb0169c4 update model 2025-03-11 14:49:19 -07:00
Michael Yang 8934324b72 use fast attention 2025-03-11 14:49:18 -07:00
Patrick Devine c62861f4fa fix conversion 2025-03-11 14:49:18 -07:00
Michael Yang 0df1800436 set non-causal attention 2025-03-11 14:49:18 -07:00
Jesse Gross 4346c2409d fix drift from main 2025-03-11 14:49:18 -07:00
Michael Yang 4b037a97dc add gemma vision encoder 2025-03-11 14:49:17 -07:00
Patrick Devine 5f74d1fd47 gemma2 impl 2025-03-11 14:35:08 -07:00