Vadim Grinco
d0afc677db
Merge branch 'vulkan' into ollama_vanilla_stable
2025-03-12 13:33:05 +01:00
Michael Yang
aee28501b5
Merge pull request #9661 from ollama/gemma
...
engine: add gemma support
2025-03-11 15:07:50 -07:00
jmorganca
83f0ec8269
all: address linter errors
2025-03-11 14:49:20 -07:00
jmorganca
c6b6938b3a
kvcache: fix tests by adding AvgPool2D stub
2025-03-11 14:49:20 -07:00
jmorganca
fb4664fcec
model: add more spm tokenizer tests
2025-03-11 14:49:20 -07:00
jmorganca
20e3593863
model: validate left and right pairs before merging them
2025-03-11 14:49:20 -07:00
Michael Yang
63a394068c
use 2d pooling
2025-03-11 14:49:20 -07:00
Daniel Hiltgen
ab39e08eb9
llm: auto detect models that require Ollama Engine (#1)
2025-03-11 14:49:20 -07:00
jmorganca
11bfa62796
add trailing \n\n after <end_of_image> to match reference implementation
2025-03-11 14:49:20 -07:00
jmorganca
f63e62e546
reduce kernel size, add TODO for loading from config
2025-03-11 14:49:20 -07:00
jmorganca
65b0f329d1
Revert "Allow models to force a new batch"
...
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
2025-03-11 14:49:20 -07:00
Jesse Gross
06007c0a18
Allow models to force a new batch
...
This is useful for a few things:
- Work around bugs, such as having 2 images in one batch
- Keep the image in a single batch for fully connected attention
- Improve performance by not evaluating embeddings multiple times
2025-03-11 14:49:20 -07:00
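As a rough illustration of the batching idea in the commit above: a scheduler that normally packs inputs until a size limit can also split whenever an input refuses to share a batch. This is a minimal sketch only; `Input`, `SameBatch`, and `splitBatches` are illustrative names, not ollama's actual API.
```go
package main

import "fmt"

// Input is an illustrative stand-in for a scheduler entry; SameBatch is a
// hypothetical flag meaning "may share a batch with what came before".
type Input struct {
	Token     int
	SameBatch bool
}

// splitBatches packs inputs into batches of at most maxSize entries,
// starting a fresh batch whenever an input forces one.
func splitBatches(inputs []Input, maxSize int) [][]Input {
	var batches [][]Input
	var cur []Input
	for _, in := range inputs {
		if len(cur) > 0 && (len(cur) >= maxSize || !in.SameBatch) {
			batches = append(batches, cur)
			cur = nil
		}
		cur = append(cur, in)
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	ins := []Input{
		{Token: 1, SameBatch: true},
		{Token: 2, SameBatch: false}, // e.g. an image that must start its own batch
		{Token: 3, SameBatch: true},
	}
	fmt.Println(splitBatches(ins, 8)) // [[{1 true}] [{2 false} {3 true}]]
}
```
Forcing a boundary this way keeps an image's embeddings inside one batch (so fully connected attention sees all of them together) and avoids re-evaluating them across batches, which matches the motivations listed in the commit.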
Jesse Gross
a8e83a7654
Disable causal attention based on batch index
...
Currently we are using positions, which are relative to a
sequence and may not be unique.
2025-03-11 14:49:20 -07:00
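The caveat in this commit message can be seen with a toy batch: two sequences each start at position 0, so sequence-relative positions collide, while indices into the batch are always unique. Illustrative Go only, not the engine's code.
```go
package main

import "fmt"

// entry pairs a sequence ID with a sequence-relative position.
type entry struct{ seq, pos int }

func main() {
	batch := []entry{{seq: 0, pos: 0}, {seq: 0, pos: 1}, {seq: 1, pos: 0}}
	byPos := map[int][]int{}
	for i, e := range batch {
		byPos[e.pos] = append(byPos[e.pos], i)
	}
	fmt.Println(byPos) // map[0:[0 2] 1:[1]]: position 0 repeats; batch indices never do
}
```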
Jesse Gross
475005504e
Restrict Gemma to a single image per request
2025-03-11 14:49:20 -07:00
Jesse Gross
2c40c4d35e
Fix follow up images and images split across batches
2025-03-11 14:49:19 -07:00
Michael Yang
e95278932b
use non-causal mask only for image positions
2025-03-11 14:49:19 -07:00
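The mask pattern these commits describe (causal attention for text, full bidirectional attention among the positions an image occupies) can be sketched as a small boolean matrix. A hypothetical illustration; `buildMask` and its index arguments are invented for clarity and are not the engine's mask code.
```go
package main

import "fmt"

// buildMask returns m[i][j] == true when query i may attend to key j:
// causal everywhere, plus full attention within the image span
// [imgStart, imgEnd).
func buildMask(n, imgStart, imgEnd int) [][]bool {
	m := make([][]bool, n)
	for i := range m {
		m[i] = make([]bool, n)
		for j := range m[i] {
			bothInImage := i >= imgStart && i < imgEnd && j >= imgStart && j < imgEnd
			m[i][j] = j <= i || bothInImage
		}
	}
	return m
}

func main() {
	for _, row := range buildMask(5, 1, 4) {
		fmt.Println(row) // rows 1-3 can also see later image tokens
	}
}
```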
Michael Yang
9d2a20a763
use non-causal mask for inputs with images
2025-03-11 14:49:19 -07:00
Patrick Devine
2e54d72fc3
fix gemma3 1b conversion
2025-03-11 14:49:19 -07:00
Michael Yang
6b32a2d549
compat with upstream gguf
2025-03-11 14:49:19 -07:00
Michael Yang
c5cbe4fc2a
fallback to cpu
2025-03-11 14:49:19 -07:00
Michael Yang
f888912870
fix vision encoder
2025-03-11 14:49:19 -07:00
Michael Yang
9e4642e9b3
ollama debug tensor
2025-03-11 14:49:19 -07:00
Michael Yang
6b0486c216
duplicate token_embd to output
2025-03-11 14:49:19 -07:00
Michael Yang
d368c039f0
skip repacking vision tensors
2025-03-11 14:49:19 -07:00
Patrick Devine
9b54267e69
fix configs
2025-03-11 14:49:19 -07:00
Michael Yang
46bb0169c4
update model
2025-03-11 14:49:19 -07:00
Michael Yang
8934324b72
use fast attention
2025-03-11 14:49:18 -07:00
Jesse Gross
0e886595bf
Fix tests and drift from main
2025-03-11 14:49:18 -07:00
Patrick Devine
c62861f4fa
fix conversion
2025-03-11 14:49:18 -07:00
Michael Yang
0df1800436
set non-causal attention
2025-03-11 14:49:18 -07:00
Patrick Devine
631fecc6d9
temporary workaround for converting spm
2025-03-11 14:49:18 -07:00
Jesse Gross
4346c2409d
fix drift from main
2025-03-11 14:49:18 -07:00
Michael Yang
4b037a97dc
add gemma vision encoder
2025-03-11 14:49:17 -07:00
Patrick Devine
5f74d1fd47
gemma2 impl
2025-03-11 14:35:08 -07:00
Daniel Hiltgen
4dcf80167a
Build release for Windows with local script (#9636)
2025-03-11 08:34:20 -07:00
Vadim Grinco
9cb4ad02e2
This is no longer needed
...
Signed-off-by: Vadim Grinco <vadim@grinco.eu>
2025-03-11 14:34:17 +01:00
Vadim Grinco
6b1f84e171
Merging the latest stable (#2)
...
* Applied 00-fix-vulkan-building.patch
* Implemented the Vulkan backend based on the work done by whyvl, Dts0, McBane87, and others
Tested on an AMD Ryzen 7 8845HS w/ Radeon 780M Graphics with ROCm disabled:
```
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-03-11T13:00:40.793Z level=INFO source=gpu.go:199 msg="vulkan: load libvulkan and libcap ok"
time=2025-03-11T13:00:40.877Z level=INFO source=gpu.go:421 msg="error looking up vulkan GPU memory" error="device is a CPU"
time=2025-03-11T13:00:40.878Z level=WARN source=amd_linux.go:443 msg="amdgpu detected, but no compatible rocm library found. Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install "
time=2025-03-11T13:00:40.878Z level=WARN source=amd_linux.go:348 msg="unable to verify rocm library: no suitable rocm found, falling back to CPU"
time=2025-03-11T13:00:40.879Z level=INFO source=types.go:137 msg="inference compute" id=0 library=vulkan variant="" compute=1.3 driver=1.3 name="AMD Radeon Graphics (RADV GFX1103_R1)" total="15.6 GiB" available="15.6 GiB"
```
```
# ollama run phi4:14b
>>> /set verbose
Set 'verbose' mode.
>>> how's it going?
Hello! I'm here to help you with any questions or tasks you have. How can I assist you today? 😊
total duration: 3.341959745s
load duration: 18.165612ms
prompt eval count: 15 token(s)
prompt eval duration: 475ms
prompt eval rate: 31.58 tokens/s
eval count: 26 token(s)
eval duration: 2.846s
eval rate: 9.14 tokens/s
>>>
```
2025-03-11 14:09:47 +01:00
Michael Yang
26a26998fb
Merge pull request #9590 from ollama/mxyng/dump-pad
...
fix: pad tensor item if ge zero
2025-03-10 16:34:55 -07:00
Michael Yang
9926eae015
fix: pad tensor item if ge zero
...
this produces nicer output since both positive and negative values
produce the same width
2025-03-10 16:18:12 -07:00
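The effect described above is what Go's `fmt` space flag provides: it reserves a leading blank on non-negative numbers where a minus sign would go. Whether or not the commit uses this exact flag, the idea is the same; a minimal generic demonstration, not the repository's dump code:
```go
package main

import "fmt"

func main() {
	// The ' ' flag pads non-negative numbers with a space where the sign
	// would be, so positive and negative values render at the same width.
	for _, v := range []float64{1.2345, -1.2345} {
		fmt.Printf("[% .4f]\n", v) // "[ 1.2345]" then "[-1.2345]"
	}
}
```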
Vincent Koc
8585b7b151
docs: add opik to observability integrations (#9626)
2025-03-10 16:15:10 -07:00
Parth Sareen
7e34f4fbfa
sample: add numerical stability to temperature/softmax transform (#9631)
2025-03-10 14:43:53 -07:00
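The standard formulation of the stability transform named in this commit is max subtraction before exponentiation: after normalization, exp((l - max)/T) gives the same distribution as exp(l/T) but the exponent never exceeds zero, so it cannot overflow. A generic sketch of that technique, not necessarily ollama's sampler code:
```go
package main

import (
	"fmt"
	"math"
)

// softmax applies temperature scaling with the max-subtraction trick so
// math.Exp never overflows; the constant factor cancels in normalization.
func softmax(logits []float64, temp float64) []float64 {
	maxL := math.Inf(-1)
	for _, l := range logits {
		if l > maxL {
			maxL = l
		}
	}
	out := make([]float64, len(logits))
	var sum float64
	for i, l := range logits {
		out[i] = math.Exp((l - maxL) / temp)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

func main() {
	fmt.Println(softmax([]float64{1000, 999, 998}, 1)) // finite and sums to 1
}
```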
Michael Yang
fe776293f7
Merge pull request #9569 from dwt/patch-1
...
Better WantedBy declaration
2025-03-10 14:09:37 -07:00
frob
d8a5d96b98
docs: Add OLLAMA_CONTEXT_LENGTH to FAQ. (#9545)
2025-03-10 11:02:54 -07:00
Xiaowei Zhu
757668c42f
docs: add SwiftChat (#9540)
2025-03-10 11:01:09 -07:00
Sam
96ec8afd09
docs(tool): add mcp-llm (#9537)
2025-03-10 09:52:02 -07:00
Jeffrey Morgan
e093db92c4
sample: temporarily use grammars for constrained generation in new engine (#9586)
2025-03-10 16:17:39 +01:00
Vadim Grinco
31606b2feb
Merged in the right direction
...
Signed-off-by: Vadim Grinco <vadim@grinco.eu>
2025-03-10 12:51:49 +01:00
Vadim Grinco
b14dd68fee
Fixed the "detached head" issues
...
Signed-off-by: Vadim Grinco <vadim@grinco.eu>
2025-03-10 12:51:49 +01:00
Vadim Grinco
cff62cc6c2
Merge branch 'ollama_vulkan_stable' into grinco-vulkan
2025-03-10 12:39:00 +01:00
Vadim Grinco
98f699773a
Applied 00-fix-vulkan-building.patch
...
Work done by McBane87 here: https://github.com/whyvl/ollama-vulkan/issues/7#issuecomment-2660836871
Signed-off-by: Vadim Grinco <vadim@grinco.eu>
2025-03-10 12:34:37 +01:00