Commit Graph

97 Commits

Author SHA1 Message Date
Bruce MacDonald 5d3da85a16 remove out of date comments 2025-05-12 13:51:04 -07:00
Bruce MacDonald 684f0d9291 set default values for vision model in config 2025-05-12 13:51:04 -07:00
Bruce MacDonald 661bf04696 add picture prefix 2025-05-12 13:49:44 -07:00
Bruce MacDonald 2521a55ae6 fixes after rebase 2025-05-12 13:49:44 -07:00
Bruce MacDonald 32948ec952 increase rope base 2025-05-12 13:49:43 -07:00
Bruce MacDonald 16b13e0cfc Revert "ropeTheta should be 1e5"
This reverts commit cc1638b26763eae7daddd44e3975a885671ef9d3.

This reverts commit b32385591307e2d33a8f43ce1626b529d2dac83e.
2025-05-12 13:49:43 -07:00
Bruce MacDonald 75441c56f3 add comment explaining rope theta 2025-05-12 13:49:43 -07:00
Bruce MacDonald 45f96e898d ropeTheta should be 1e5 2025-05-12 13:49:43 -07:00
Michael Yang f1257a7de4 update vision rope theta default 2025-05-12 13:49:43 -07:00
Bruce MacDonald ca981c8a49 full attn block indexes should be []int32 2025-05-12 13:49:43 -07:00
Bruce MacDonald 359e1d5b19 full attention layers 2025-05-12 13:49:42 -07:00
Bruce MacDonald eed0ac2948 clean up vision model forward pass 2025-05-12 13:49:42 -07:00
Michael Yang fb3c16f2a2 window index 2025-05-12 13:49:42 -07:00
Michael Yang 7e920c8d75 fix: patch merger and convert
convert:
- split patch embedding
- split qkv

remove duplicate PatchMerger
2025-05-12 13:49:42 -07:00
Bruce MacDonald 5ff0d538b0 wip: implementing rope 2025-05-12 13:49:41 -07:00
Bruce MacDonald 963531215e update convert 2025-05-12 13:49:41 -07:00
Bruce MacDonald c1f9bcb4dd restructure
Squashed work-in-progress commits:

- image processing
- Update model.go
- no projector
- vision model scaffold
- rebase
- fix patch merger
- tidy
- Update model_vision.go
- server: do not attempt to parse offset file as gguf

  This logic was causing issues when importing a gguf with padding at the end of the file: the valid gguf would be read, and then the parser would try to read the trailing offset as a second gguf file, which is not correct.

- Update process_image_test.go
- apply norm
- prompt processing
- fix post tokenize
- fix gguf padding + populate the split patch embeddings
- another shot at patch embeddings
- patch embedding
- split pixels
2025-05-12 13:49:41 -07:00
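
The gguf note in the squashed commit above describes a concrete failure mode: once the valid gguf is decoded, leftover padding at the end of the file was being handed back to the parser as if it were a second gguf. A minimal sketch of the intended check, assuming a hypothetical `trailingIsPadding` helper rather than Ollama's actual reader:

```go
package gguf

import "bytes"

// trailingIsPadding is a hypothetical helper: given the raw file contents and
// the offset where the decoded gguf ended, it reports whether the remainder is
// only zero bytes (padding) and therefore safe to ignore rather than parse as
// a second gguf file.
func trailingIsPadding(data []byte, end int64) bool {
	if end >= int64(len(data)) {
		return true // nothing left over
	}
	return len(bytes.TrimRight(data[end:], "\x00")) == 0
}
```
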
Michael Yang b585a58121 chore: remove unused ZipReader type (#10621) 2025-05-08 11:17:41 -07:00
Daniel Hiltgen 424810450f Move quantization to new backend (#10363)
* Move quantization logic to GGML via new backend

This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.
2025-05-06 11:20:48 -07:00
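
A hedged sketch of the split described above, assuming hypothetical names: the model-aware choice of quantization type lives in Go, while the byte-level work is delegated to GGML (represented here by a placeholder rather than the real cgo binding):

```go
package quantize

import "strings"

type tensor struct {
	Name string
	F32  []float32
}

// targetType picks a quantization type per tensor. Keeping embeddings and the
// output layer at higher precision is shown only to illustrate what
// "model aware logic in Go" could look like, not what Ollama actually does.
func targetType(t tensor, defaultType string) string {
	if strings.Contains(t.Name, "token_embd") || strings.Contains(t.Name, "output") {
		return "Q6_K"
	}
	return defaultType // e.g. "Q4_K_M"
}

// ggmlQuantize stands in for the cgo call into GGML's quantization code.
func ggmlQuantize(data []float32, typ string) []byte {
	_ = typ
	return make([]byte, 0, len(data)) // placeholder
}

// quantizeModel quantizes every tensor using the type chosen in Go.
func quantizeModel(tensors []tensor, defaultType string) map[string][]byte {
	out := make(map[string][]byte, len(tensors))
	for _, t := range tensors {
		out[t.Name] = ggmlQuantize(t.F32, targetType(t, defaultType))
	}
	return out
}
```
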
湛露先生 7e5c8eee5c check and handle errors when closing files (#10554)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-05-04 15:37:59 -07:00
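
For context on the pattern such a change enforces: in Go, the error returned by `Close` should be checked rather than discarded, since for writable files it can report a failed flush. A generic sketch, not the specific call sites in the PR:

```go
package main

import (
	"fmt"
	"os"
)

func writeConfig(path string, data []byte) (err error) {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	// Capture the Close error instead of ignoring it; on writable files it can
	// surface a write that never made it to disk.
	defer func() {
		if cerr := f.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()

	_, err = f.Write(data)
	return err
}

func main() {
	if err := writeConfig("config.json", []byte(`{"ok":true}`)); err != nil {
		fmt.Println("write failed:", err)
	}
}
```
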
Michael Yang 7ba9fa9c7d fixes for maverick 2025-04-25 16:59:20 -07:00
Michael Yang 8bf11b84c1 chunked attention 2025-04-25 16:59:20 -07:00
Michael Yang f0c66e6dea llama4 2025-04-25 16:59:20 -07:00
Michael Yang dc1e81f027 convert: use -1 for read all 2025-04-25 16:59:01 -07:00
Michael Yang 4892872c18 convert: change to colmajor 2025-04-25 15:27:39 -07:00
Michael Yang 2fec73eef6 fix write gguf padding 2025-04-16 10:24:35 -07:00
Bruce MacDonald 6bd0a983cd model: support for mistral-small in the ollama runner
Mistral is a popular research lab making open-source models. This updates the forward pass of llama-architecture models to support both llama and mistral models by accounting for additional metadata present in mistral models and finding the correct dimensions for the output projection.
2025-04-03 16:57:36 -07:00
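
The commit message above notes that mistral checkpoints carry extra metadata and need the output projection sized accordingly; a minimal sketch of one way that could look, with hypothetical field names rather than the actual runner code:

```go
package llama

// options holds hypothetical per-model configuration read from metadata.
type options struct {
	hiddenSize int
	numHeads   int
	headDim    int // optional; 0 when the metadata does not set it
}

// effectiveHeadDim returns the per-head dimension used by attention and, by
// extension, the input width of the output projection (numHeads * headDim).
// When the metadata supplies an explicit head dimension it is used; otherwise
// fall back to the usual hiddenSize / numHeads, so both llama- and
// mistral-style checkpoints resolve to the right shapes.
func (o options) effectiveHeadDim() int {
	if o.headDim > 0 {
		return o.headDim
	}
	return o.hiddenSize / o.numHeads
}
```
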
Bruce MacDonald 9876c9faa4 chore(all): replace instances of interface with any (#10067)
Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
2025-04-02 09:44:27 -07:00
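
A quick illustration of why this is a mechanical change: `any` is declared in the standard library as `type any = interface{}`, so the two spellings are interchangeable:

```go
package main

import "fmt"

// Both signatures accept a value of any type; `any` is simply the Go 1.18+
// alias for the empty interface, so switching spellings changes nothing at runtime.
func describeOld(v interface{}) string { return fmt.Sprintf("%T", v) }
func describeNew(v any) string         { return fmt.Sprintf("%T", v) }

func main() {
	fmt.Println(describeOld(42), describeNew("hello")) // prints: int string
}
```
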
Bruce MacDonald 61a8825216 convert: return name of unsupported architecture (#9862)
When a model's architecture cannot be converted, return the name of the unsupported architecture in the error message.
2025-03-18 10:38:28 -07:00
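
The shape of the change, sketched with hypothetical names rather than the actual convert package code:

```go
package convert

import "fmt"

// modelForArch is a hypothetical illustration of the described behavior: when
// an architecture has no converter, the returned error names the unsupported arch.
func modelForArch(arch string) (any, error) {
	switch arch {
	case "LlamaForCausalLM":
		return newLlamaModel(), nil
	default:
		return nil, fmt.Errorf("unsupported architecture %q", arch)
	}
}

func newLlamaModel() any { return struct{}{} } // placeholder for a real converter
```
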
Patrick Devine 80c7ce381b fix: change default context size for gemma3 (#9744) 2025-03-13 13:59:19 -07:00
jmorganca 83f0ec8269 all: address linter errors 2025-03-11 14:49:20 -07:00
Michael Yang 63a394068c use 2d pooling 2025-03-11 14:49:20 -07:00
Patrick Devine 2e54d72fc3 fix gemma3 1b conversion 2025-03-11 14:49:19 -07:00
Michael Yang 6b32a2d549 compat with upstream gguf 2025-03-11 14:49:19 -07:00
Michael Yang d368c039f0 skip repacking vision tensors 2025-03-11 14:49:19 -07:00
Patrick Devine 9b54267e69 fix configs 2025-03-11 14:49:19 -07:00
Michael Yang 46bb0169c4 update model 2025-03-11 14:49:19 -07:00
Patrick Devine c62861f4fa fix conversion 2025-03-11 14:49:18 -07:00
Michael Yang 0df1800436 set non-causal attention 2025-03-11 14:49:18 -07:00
Patrick Devine 631fecc6d9 temporary work around for converting spm 2025-03-11 14:49:18 -07:00
Michael Yang 4b037a97dc add gemma vision encoder 2025-03-11 14:49:17 -07:00
Patrick Devine 5f74d1fd47 gemma2 impl 2025-03-11 14:35:08 -07:00
Michael Yang 58245413f4 next ollama runner (#7913)
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces three high-level interfaces and a number of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
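
A rough sketch of how the three interfaces described above fit together; the method sets here are simplified illustrations, not the actual definitions in `model/model.go` and `ml/backend.go`:

```go
package ml

// Tensor is a simplified stand-in for the tensor interface: a value plus the
// operations a model composes in its forward pass.
type Tensor interface {
	Shape() []int
	Mulmat(other Tensor) Tensor
	Add(other Tensor) Tensor
}

// Backend is a simplified stand-in for the backend interface: it owns the
// weights loaded into hardware (GPU, CPU, etc.) and hands them to models by name.
type Backend interface {
	Get(name string) Tensor
}

// Model is a simplified stand-in for model.Model: an architecture implements
// its forward propagation against whichever Backend loaded the weights.
type Model interface {
	Forward(b Backend, inputs []int32) (Tensor, error)
}
```
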
Josh 93a8daf285 convert: import support for command-r models from safetensors (#6063)
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-01-15 16:31:22 -08:00
Bruce MacDonald f6f3713001 convert: qwen2 from safetensors (#8408)
Add native support for converting Qwen2 family models (including Qwen2.5) from safetensors to gguf format so they can be run.
2025-01-14 10:34:37 -08:00
Stefan Weil abfdc4710f all: fix typos in documentation, code, and comments (#7021) 2024-12-10 12:58:06 -08:00
Michael Yang 4456012956 fix unmarshaling merges 2024-12-04 09:21:56 -08:00
Patrick Devine c7cb0f0602 image processing for llama3.2 (#6963)
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Patrick Devine 84b84ce2db catch when model vocab size is set correctly (#6714) 2024-09-09 17:18:54 -07:00
Patrick Devine 608e87bf87 Fix gemma2 2b conversion (#6645) 2024-09-05 17:02:28 -07:00