Commit Graph

4283 Commits

Author SHA1 Message Date
Bruce MacDonald 104f802df1 remove todos 2025-05-12 13:49:42 -07:00
Bruce MacDonald eed0ac2948 clean up vision model forward pass 2025-05-12 13:49:42 -07:00
Bruce MacDonald fcfad744ff fix patch merger 2025-05-12 13:49:42 -07:00
Michael Yang fb3c16f2a2 window index 2025-05-12 13:49:42 -07:00
Michael Yang ee869f35e4 fix image processing
python built-in `round()` rounds to the nearest even number if the value
is in the middle

https://docs.python.org/3/library/functions.html#round
2025-05-12 13:49:42 -07:00
Michael Yang ff5d1a3dc0 duplicate input embeddings 2025-05-12 13:49:42 -07:00
Michael Yang 88b231f903 use maxgridsize 2025-05-12 13:49:42 -07:00
Michael Yang 7e920c8d75 fix: patch merger and convert
convert:
- split patch embedding
- split qkv

remove duplicate PatchMerger
2025-05-12 13:49:42 -07:00
Bruce MacDonald dd8c619fba fixes after rebase 2025-05-12 13:49:42 -07:00
Bruce MacDonald 2af76d0e7a default to 32 for vision block count 2025-05-12 13:49:42 -07:00
Bruce MacDonald 8d901825f0 reshape cos and sin 2025-05-12 13:49:41 -07:00
Bruce MacDonald 04936b719f Update model_vision.go 2025-05-12 13:49:41 -07:00
Bruce MacDonald 0f0136d419 simplify by doing operations in Go rather than with tensors
Co-Authored-By: Michael Yang <2372640+mxyng@users.noreply.github.com>
2025-05-12 13:49:41 -07:00
Bruce MacDonald 80498f76de fix build 2025-05-12 13:49:41 -07:00
Bruce MacDonald f8b48aa784 Delete model_external_test.go 2025-05-12 13:49:41 -07:00
Bruce MacDonald 5ff0d538b0 wip: implementing rope 2025-05-12 13:49:41 -07:00
Bruce MacDonald eedc969c35 grid refactor 2025-05-12 13:49:41 -07:00
Bruce MacDonald 963531215e update convert 2025-05-12 13:49:41 -07:00
Bruce MacDonald 3fe090f447 get patch embedding vals from config 2025-05-12 13:49:41 -07:00
Bruce MacDonald 1704072746 patch embeddings 2025-05-12 13:49:41 -07:00
Bruce MacDonald c1f9bcb4dd restructure
image processing

Update model.go

Update model.go

Update model.go

no projector

no projector

vision model scaffold

...

...

wip

...

rebase

fix patch merger

tidy

...

Update model_vision.go

server: do not attempt to parse offset file as gguf

This logic was causing issues for me when importing a gguf that had some padding at the end of the file. The valid gguf would be read, but then it would try to read the offset as a different gguf file. This does not seem right.

Update process_image_test.go

apply norm

prompt processing

prompt processing

fix post tokenize

fix gguf padding + populate the split patch embeddings

...

...

another shot at patch embeddings

...

patch embedding

Update model_vision.go

split pixels
2025-05-12 13:49:41 -07:00
Bruce MacDonald 198b1e6db9 text model forward pass 2025-05-12 13:49:41 -07:00
Bruce MacDonald 51ad65f831 ml: structured rope config to allow specifying context len
This commit refactors the Rotary Position Embedding (RoPE) implementation across the codebase to use a structured configuration approach instead of individual parameters.

Key changes:
- Add new RoPEConfig struct with fields for dimension, type, base frequency, and scaling
- Add RopeType enum to formalize different RoPE implementation variants
- Add YarnConfig struct and related configuration for YaRN (Yet Another RoPE extensioN) context extension
- Update RoPE method signature across all tensor interfaces and implementations
- Refactor all model implementations (llama, gemma2, gemma3, mllama) to use the new configuration structure

This change improves code organization, makes the RoPE configuration more explicit, and provides better support for different RoPE variants and context extension methods.
2025-05-12 13:49:41 -07:00
Jeffrey Morgan 0cefd46f23
llama: update to commit de4c07f93 (#10655) 2025-05-12 12:17:26 -07:00
Bruce MacDonald ad035ad595
convert: quantize from safetensors needs kv (#10675)
When creating a quantized model from safetensors we
need the array KV values to be loaded.Changing this
value to -1 loads the KV values on the returned
layer to be used and saved during quantization.
2025-05-12 12:04:20 -07:00
Michael Yang f95a1f2bef
feat: add trace log level (#10650)
reduce prompt log to trace level
2025-05-12 11:43:00 -07:00
HardCodeDev 82a9e9462a
readme: add UnityCodeLama to community integrations (#10665) 2025-05-11 13:44:51 -07:00
HardCodeDev 76724e2f29
readme: add OllamaPlusPlus C++ library to community integrations (#10664) 2025-05-11 13:40:41 -07:00
frob ecf14a220f
llama: allocate grammar buffer based on schema length (#10649) 2025-05-10 11:57:30 -07:00
frob 69ce44b33c
envconfig: Remove no longer supported max vram var (#10623)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-05-10 11:31:04 -07:00
Michael Yang 5969674cf1
feat: add threshold to dump options (#10639)
ml.Dump will preserve default values if not specified
2025-05-10 11:27:15 -07:00
AliAhmedNada 867d75b21e
readme: add ojira to community integrations (#10648) 2025-05-10 10:36:40 -07:00
Bruce MacDonald 3fa78598a1
cmd: strip single quotes from image page (#10636) 2025-05-09 18:05:43 -07:00
Michael Yang 0d6e35d3c6
fix: stream accumulator exits early (#10593)
the stream accumulator exits as soon as it sees `api.ProgressResponse(status="success")` which isn't strictly correctly
since some requests may have multiple successes, e.g. `/api/create` when the source model needs to be pulled.
2025-05-08 13:17:30 -07:00
Michael Yang 6e9a7a2568
lint: enable usetesting, disable tenv (#10594) 2025-05-08 11:42:14 -07:00
Michael Yang b585a58121
chore: remove unused ZipReader type (#10621) 2025-05-08 11:17:41 -07:00
Jeffrey Morgan fa9973cd7f
api: remove unused sampling parameters (#10581) 2025-05-08 08:31:08 -07:00
Jesse Gross 3d9498a425 ollamarunner: Use correct constant to remove cache entries
The correct constant to remove all entries to the end of the sequence
for the Ollama engine is math.MaxInt32. -1 is used by the old engine.

The impact of this is currently minimal because it would only occur
in situations that are not supported by the implemented models or
rarely used options.
2025-05-07 17:26:15 -07:00
Daniel Hiltgen 3098c8b29b
CI: trigger downstream release process (#10508) 2025-05-07 10:35:12 -07:00
Daniel Hiltgen 5e380c3b42
sched: fix race leading to orphaned runners (#10599)
If a model is loading, and the request context is canceled during the load
by a client closing the connection, and another request is inbound for the
same model with a different configuration (context size, etc.) thus requiring
a reload, two unload events can be in flight.  The first shuts down the
original model load, but the second one caused the loss of the new
reloading runner reference, thus triggering the leak.

The primary fix is detecting the duplicate unload and ignoring the second
instance.  The load routine is also hardened to ensure we detect
clobbering an already present runner and unload it with a warning.
2025-05-07 09:38:17 -07:00
Jeffrey Morgan 392de84031
api: remove unused RetrieveModelResponse type (#10603) 2025-05-06 23:08:03 -07:00
Daniel Hiltgen af31ccefc0
fix data race in WriteGGUF (#10598)
err in the go routine should not be shared with the outer scope
2025-05-06 17:36:38 -07:00
Daniel Hiltgen fa393554b9
remove cuda v11 (#10569)
This reduces the size of our Windows installer payloads by ~256M by dropping
support for nvidia drivers older than Feb 2023.  Hardware support is unchanged.

Linux default bundle sizes are reduced by ~600M to 1G.
2025-05-06 17:33:19 -07:00
Aharon Bensadoun 307e3b3e1d
readme: add Flufy to community integrations (#9719) 2025-05-06 14:47:35 -07:00
Devon Rifkin 4090aca97b
server: send 405 instead of 404 for unallowed methods (#10275)
Fixes: #5483
2025-05-06 14:45:37 -07:00
Michael Yang 92ce438de0
server: remove internal cmd (#10595) 2025-05-06 13:05:01 -07:00
Daniel Hiltgen 424810450f
Move quantization to new backend (#10363)
* Move quantization logic to GGML via new backend

This moves the model aware logic to Go code and calls GGMLs quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.
2025-05-06 11:20:48 -07:00
Michael Yang 95e744beeb
discover: fix compiler warnings (#10572) 2025-05-06 10:49:22 -07:00
Jeffrey Morgan 3b2d2c8326
api: remove unused or unsupported api options (#10574)
Some options listed in api/types.go are not supported in
newer models, or have been deprecated in the past. This is
the first of a series of PRs to clean up the API options
2025-05-05 14:54:40 -07:00
Michael Yang d931ee8f22
create blobs in parallel (#10135)
* default max term height
* error on out of tree files
2025-05-05 11:59:26 -07:00