image processing
Update model.go
no projector
vision model scaffold
...
...
wip
...
rebase
fix patch merger
tidy
...
Update model_vision.go
server: do not attempt to parse offset file as gguf
This logic caused problems when importing a gguf that had some padding at the end of the file. The valid gguf would be read, but the server would then try to read the remaining offset as a separate gguf file, which is incorrect.
Update process_image_test.go
apply norm
prompt processing
fix post tokenize
fix gguf padding + populate the split patch embeddings
...
...
another shot at patch embeddings
...
patch embedding
Update model_vision.go
split pixels
This commit refactors the Rotary Position Embedding (RoPE) implementation across the codebase to use a structured configuration approach instead of individual parameters.
Key changes:
- Add new RoPEConfig struct with fields for dimension, type, base frequency, and scaling
- Add RopeType enum to formalize different RoPE implementation variants
- Add YarnConfig struct and related configuration for YaRN (Yet Another RoPE extensioN) context extension
- Update RoPE method signature across all tensor interfaces and implementations
- Refactor all model implementations (llama, gemma2, gemma3, mllama) to use the new configuration structure
This change improves code organization, makes the RoPE configuration more explicit, and provides better support for different RoPE variants and context extension methods.
When creating a quantized model from safetensors, the
array KV values need to be loaded. Changing this
value to -1 loads the KV values on the returned
layer so they can be used and saved during quantization.
The stream accumulator exits as soon as it sees `api.ProgressResponse(status="success")`, which isn't strictly correct
since some requests may have multiple successes, e.g. `/api/create` when the source model needs to be pulled.
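One possible shape for an accumulator that tolerates several successes is sketched below. The ProgressResponse type here is a local stand-in and accumulate is a hypothetical helper, not the real accumulator; the only point shown is draining the stream until it closes instead of returning at the first "success".

```go
package main

import "fmt"

// ProgressResponse is a local stand-in for a streamed progress message.
type ProgressResponse struct {
	Status string
}

// accumulate reads until the stream is closed and reports how many
// "success" statuses were seen plus the final status.
func accumulate(stream <-chan ProgressResponse) (successes int, last string) {
	for r := range stream {
		if r.Status == "success" {
			successes++
		}
		last = r.Status
	}
	return successes, last
}

func main() {
	stream := make(chan ProgressResponse, 4)
	// A create request that first pulls its source model may report
	// success once for the pull and once for the create.
	for _, s := range []string{"pulling", "success", "creating", "success"} {
		stream <- ProgressResponse{Status: s}
	}
	close(stream)
	fmt.Println(accumulate(stream))
}
```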
The correct constant to remove all entries to the end of the sequence
for the Ollama engine is math.MaxInt32. -1 is used by the old engine.
The impact of this is currently minimal because it would only occur
in situations that are not supported by the implemented models or
rarely used options.