Michael Yang
89637ae43b
gemma2: enable flash attention
2025-12-16 09:45:05 -08:00
Michael Yang
de82b1f9a3
cleanup attention interface
the updated interface supports variadic attention options, which
removes the need for individual `AttentionWith...` functions. it means
more models can use the attention interface, e.g. models with
custom masks, logit softcapping, etc.
additionally, this interface should be less error-prone since there are
now reasonable defaults for all optional parameters
2025-12-16 09:45:04 -08:00
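A minimal sketch of the variadic-options pattern described above, assuming Go functional options; the names (Attention, WithMask, WithSoftcap) are illustrative, not ollama's actual interface:

```go
package main

import (
	"fmt"
	"math"
)

type attnOpts struct {
	mask    []float32 // optional additive attention mask
	softcap float32   // logit softcapping threshold; 0 disables it
	scale   float64   // score scale; defaults to 1/sqrt(headDim)
}

// AttentionOption mutates the defaults, so every optional parameter
// has a reasonable zero-configuration value.
type AttentionOption func(*attnOpts)

func WithMask(m []float32) AttentionOption  { return func(o *attnOpts) { o.mask = m } }
func WithSoftcap(c float32) AttentionOption { return func(o *attnOpts) { o.softcap = c } }

// Attention is one entry point shared by models with custom masks,
// softcapping, etc., replacing per-feature AttentionWith... functions.
func Attention(q, k, v []float32, headDim int, opts ...AttentionOption) {
	o := attnOpts{scale: 1 / math.Sqrt(float64(headDim))}
	for _, opt := range opts {
		opt(&o)
	}
	fmt.Printf("scale=%.3f softcap=%v masked=%v\n", o.scale, o.softcap, o.mask != nil)
	// ... scaled dot-product attention over q, k, v would go here ...
}

func main() {
	q, k, v := make([]float32, 64), make([]float32, 64), make([]float32, 64)
	Attention(q, k, v, 64)                    // defaults only
	Attention(q, k, v, 64, WithSoftcap(50.0)) // e.g. gemma2-style softcapping
}
```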
Parth Sareen
89eb795293
parsers/renderers: use think from user for nemotron ( #13492 )
2025-12-15 18:55:17 -08:00
Parth Sareen
7e3ea813c1
llama/parsers/renderers: nemotron 3 nano ( #13489 )
---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2025-12-15 18:00:08 -08:00
Grace
7b95087b9d
Adding tool definitions to DeepseekV3 renderer ( #13491 )
2025-12-15 17:57:06 -08:00
Michael Yang
971d62595a
fix: qwen2.5 vl rope ( #13486 )
* qwen25vl: bump max pixels
* qwen25vl: mrope
  fix qwen2.5vl window
* qwen25vl: vision rope
2025-12-15 17:30:33 -08:00
Parth Sareen
ffbe8e076d
model: add olmo3 and olmo3.1 ( #13415 )
2025-12-15 15:20:04 -08:00
Grace
2c639431b1
DeepseekV3 family renderer ( #13180 )
2025-12-15 14:50:52 -08:00
Parth Sareen
e3731fb160
renderers: add olmo3.1 and olmo3 fixes ( #13447 )
2025-12-15 11:26:43 -08:00
Jeffrey Morgan
4ff8a691bc
model: default gemma 3 rope scale to 1.0, apply corrections based on layer counts ( #13453 )
2025-12-12 17:51:56 -08:00
Jeffrey Morgan
1b308e1d2a
model: fix global layer rope scale values for gemma 3 ( #13452 )
2025-12-12 16:29:01 -08:00
Jeffrey Morgan
3af5d3b738
model: force rope factor 1.0 for Gemma 3 ( #13445 )
2025-12-12 13:27:08 -08:00
Jeffrey Morgan
2dfb74410d
model: fix rotary embeddings for ministral 3 ( #13432 )
2025-12-11 16:02:05 -08:00
Jeffrey Morgan
a838421ea3
model: conversion and hyperparameter fixes for ministral and devstral ( #13424 )
2025-12-11 13:04:00 -08:00
nicole pardal
76f88caf43
nomic-embed-text:v2: model implementation ( #13162 )
2025-12-09 14:24:51 -08:00
Parth Sareen
2bccf8c624
renderers/parsers: olmo3 instruct ( #13383 )
2025-12-09 11:12:27 -08:00
Parth Sareen
0c5e5f6630
parsers/renderers: olmo3 think ( #13290 )
2025-12-09 10:41:47 -08:00
Jeffrey Morgan
d2f334c1f7
model: add rnj-1 inference support ( #13354 )
2025-12-08 16:49:17 -08:00
Michael Yang
603ceefaa6
refactor rope
change to a flatter directory structure and group the options with the
function
update models to call rope in one place
2025-12-08 14:42:22 -08:00
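A sketch of the "group the options with the function" direction: one rope package owns both the options type and the single Apply function every model calls. Names, fields, and the linear position scaling are assumptions for illustration, not ollama's actual layout:

```go
package rope

import "math"

type Options struct {
	Dim   int     // number of leading dimensions to rotate
	Base  float32 // frequency base, e.g. 10000
	Scale float32 // position scale for context extension; 1.0 means none
}

// Apply rotates one row of activations in place for a given position, so
// models call rope in one place instead of duplicating the frequency math.
func Apply(x []float32, pos int32, o Options) {
	p := float64(pos) / float64(o.Scale)
	for i := 0; i+1 < o.Dim; i += 2 {
		theta := p / math.Pow(float64(o.Base), float64(i)/float64(o.Dim))
		sin, cos := math.Sincos(theta)
		x0, x1 := float64(x[i]), float64(x[i+1])
		x[i], x[i+1] = float32(x0*cos-x1*sin), float32(x0*sin+x1*cos)
	}
}
```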
Patrick Devine
d3e0a0dee4
model: ministral w/ llama4 scaling ( #13292 )
This change:
* fixes rope scaling in the mistral converter
* updates ministral to include llama4 scaling (see the sketch below)
* includes a new ministral parser for parsing reasoning and tool calling
---------
Co-authored-by: jmorganca <jmorganca@gmail.com>
2025-12-01 23:20:14 -08:00
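A sketch of frequency-dependent rope scaling, under the assumption that the "llama4 scaling" above follows the llama3-style scheme (scale low-frequency dimensions by the full factor, keep high-frequency dimensions, interpolate in between); parameter names mirror common config keys, not ollama's exact ones:

```go
package rope

import "math"

type ScalingConfig struct {
	Factor         float64 // overall context-extension factor, e.g. 8
	LowFreqFactor  float64 // e.g. 1
	HighFreqFactor float64 // e.g. 4
	OriginalCtxLen float64 // pretraining context length, e.g. 8192
}

// scaleFreqs rewrites per-dimension inverse frequencies in place.
func scaleFreqs(invFreqs []float64, c ScalingConfig) {
	lowWavelen := c.OriginalCtxLen / c.LowFreqFactor
	highWavelen := c.OriginalCtxLen / c.HighFreqFactor
	for i, f := range invFreqs {
		wavelen := 2 * math.Pi / f
		switch {
		case wavelen < highWavelen:
			// high frequency (short wavelength): leave unscaled
		case wavelen > lowWavelen:
			// low frequency: scale down by the full factor
			invFreqs[i] = f / c.Factor
		default:
			// middle band: interpolate smoothly between the two regimes
			smooth := (c.OriginalCtxLen/wavelen - c.LowFreqFactor) /
				(c.HighFreqFactor - c.LowFreqFactor)
			invFreqs[i] = (1-smooth)*f/c.Factor + smooth*f
		}
	}
}
```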
Grace
d70e935526
Parser for Cogito v2 ( #13145 )
2025-11-19 17:21:07 -08:00
Michael Yang
5c1063df7f
deepseek2: upgrade to run v3+ models ( #13166 )
the check for mla omitted v3 and r1, which should not be reported as
unsupported. instead, check the tokenizer for compatibility
2025-11-19 17:05:39 -08:00
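A sketch of the reasoning above: gate support on what the tokenizer requires rather than on an MLA check that wrongly excluded v3 and r1. The Tokenizer type and the list of supported kinds are hypothetical:

```go
package main

import "fmt"

type Tokenizer struct{ Kind string }

// checkSupported decides compatibility from the tokenizer itself instead of
// from a per-architecture feature flag.
func checkSupported(t Tokenizer) error {
	switch t.Kind {
	case "gpt2", "llama": // tokenizer families this engine can load
		return nil
	default:
		return fmt.Errorf("tokenizer kind %q is not supported", t.Kind)
	}
}

func main() {
	fmt.Println(checkSupported(Tokenizer{Kind: "gpt2"}))  // <nil>
	fmt.Println(checkSupported(Tokenizer{Kind: "weird"})) // error
}
```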
Patrick Devine
604e43b28d
models: enable deepseek2 (deepseek v3.1 w/ MLA) on the new engine ( #13151 )
2025-11-18 22:03:50 -08:00
Grace
91935631ac
Renderer for Cogito v2 ( #13139 )
2025-11-18 19:06:34 -08:00
nicole pardal
8de30b568a
nomic-embed-text model implementation ( #13071 )
2025-11-18 18:28:10 -08:00
Michael Yang
92981ae3f2
deepseekocr
2025-11-18 16:11:37 -08:00
Michael Yang
440a3823a6
fix(tokenizer): add special tokens to empty inputs ( #13091 )
2025-11-18 11:16:56 -08:00
Grace
584e2d646f
Add deepseek v3.1 ( #13063 )
* Add mla for flash attention
* Revert to using chunks
2025-11-17 18:03:21 -08:00
Michael Yang
333203d871
chore: update models to use slice/chunk/chunksections ( #12934 )
* use slice/chunks
* bert
* llama4
* gemma3n
* gptoss
* mistral3
* qwen3vl
* qwen25vl
* deepseek2
* remove unused ops
2025-11-13 15:20:12 -08:00
Daniel Hiltgen
544b6739dd
ggml update to b6840 ( #12791 )
2025-11-06 10:19:22 -08:00
Michael Yang
ce3eb0a315
chore(gptoss): cleanup dead code ( #12932 )
2025-11-03 11:27:15 -08:00
Michael Yang
f67a6df110
interleaved mrope ( #12807 )
* ml(ggml): mrope
* interleave mrope (see the sketch below)
2025-10-30 11:29:00 -07:00
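A sketch contrasting the two mrope layouts, assuming positions come in multiple streams (e.g. temporal/height/width) and sections[i] counts the dimension pairs assigned to stream i; the semantics here are an assumption for illustration:

```go
package main

import "fmt"

// chunked assigns contiguous blocks of dimension pairs per section:
// [t t t t h h h w w w]
func chunked(sections []int) []int {
	var out []int
	for s, n := range sections {
		for i := 0; i < n; i++ {
			out = append(out, s)
		}
	}
	return out
}

// interleaved round-robins dimension pairs across sections until each
// section's budget is spent: [t h w t h w ...]
func interleaved(sections []int) []int {
	remaining := append([]int(nil), sections...)
	var out []int
	for more := true; more; {
		more = false
		for s := range remaining {
			if remaining[s] > 0 {
				out = append(out, s)
				remaining[s]--
				more = true
			}
		}
	}
	return out
}

func main() {
	fmt.Println(chunked([]int{4, 3, 3}))     // [0 0 0 0 1 1 1 2 2 2]
	fmt.Println(interleaved([]int{4, 3, 3})) // [0 1 2 0 1 2 0 1 2 0]
}
```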
Michael Yang
d432ade714
fix: qwen2.5vl, qwen3vl composite image ( #12841 )
this change fixes images with an alpha channel by overlaying the image
onto a white background
2025-10-30 10:33:19 -07:00
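A minimal sketch of the fix described above, using Go's standard image/draw to flatten transparent pixels onto white instead of the default black:

```go
package imgutil

import (
	"image"
	"image/color"
	"image/draw"
)

// flattenOnWhite composites img over an opaque white background so that
// fully or partially transparent pixels end up white, not black.
func flattenOnWhite(img image.Image) *image.RGBA {
	b := img.Bounds()
	out := image.NewRGBA(b)
	// fill with opaque white first...
	draw.Draw(out, b, image.NewUniform(color.White), image.Point{}, draw.Src)
	// ...then draw the source over it, respecting its alpha channel
	draw.Draw(out, b, img, b.Min, draw.Over)
	return out
}
```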
Grace
0a2d92081b
Removing whitespace between Thinking and Content in Qwen3VL ( #12838 )
Trims extra whitespace at the end/beginning of content
2025-10-29 15:14:28 -07:00
Michael Yang
7d25b9e194
feat(model): add qwen3vl ( #12665 )
2025-10-28 17:39:47 -07:00
Michael Yang
1188f408dd
s/From*Slice/From*s/ ( #12255 )
2025-10-28 12:08:49 -07:00
Michael Yang
ec9eb28f4c
gemma3: make embedding non-causal ( #12297 )
2025-10-27 19:54:08 -07:00
Jeffrey Morgan
94f110b35a
model/parsers: remove warning for missing <think> tag for qwen3-vl ( #12713 )
2025-10-20 16:03:43 -07:00
Daniel Hiltgen
bc1a818fdc
contiguous input per layer ( #12686 )
Co-authored-by: Michael Yang <git@mxy.ng>
2025-10-17 18:39:18 -07:00
Jeffrey Morgan
65fb3ff49d
renderers: add global flag for setting [img] tags ( #12669 )
Adds a temporary global flag that causes renderers to always render
images as [img]. In a follow-up change, we will consider making this
the default, after which this flag could be removed
2025-10-16 16:37:32 -07:00
Grace
e2a0b24435
Grace/qwen3 thinking ( #12647 )
* change initial status to take prefill into consideration
* add separate string builders for content and thinking (see the sketch below)
* thinking tests
* remove whitespace from the string before the closing think tag
2025-10-16 15:29:41 -07:00
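A sketch of the separate content/thinking builders from the bullets above; the parser shape is an assumption, and for brevity it ignores the case where the closing tag is split across streamed chunks:

```go
package main

import (
	"fmt"
	"strings"
)

type thinkParser struct {
	thinking strings.Builder // everything before </think>
	content  strings.Builder // everything after </think>
	done     bool            // true once the closing tag has been seen
}

func (p *thinkParser) add(chunk string) {
	if p.done {
		p.content.WriteString(chunk)
		return
	}
	if before, after, ok := strings.Cut(chunk, "</think>"); ok {
		// remove whitespace from the string before the closing think tag
		p.thinking.WriteString(strings.TrimRight(before, " \t\n"))
		p.content.WriteString(strings.TrimLeft(after, " \t\n"))
		p.done = true
		return
	}
	p.thinking.WriteString(chunk)
}

func main() {
	p := &thinkParser{}
	p.add("let me reason... \n</think>\n the answer is 4")
	fmt.Printf("thinking=%q content=%q\n", p.thinking.String(), p.content.String())
}
```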
Devon Rifkin
08fbb60bb2
qwen3-coder: support anyOf when parsing tool calls
2025-10-14 15:33:05 -07:00
Devon Rifkin
ddaca643d0
add registries for parsers/renderers
2025-10-14 01:13:54 -07:00
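A generic sketch of the registry pattern this subject names; the signatures are illustrative assumptions, not ollama's actual API:

```go
package main

import "fmt"

type Renderer func(messages []string) (string, error)

var renderers = map[string]Renderer{}

// Register associates a model family name with its renderer.
func Register(name string, r Renderer) { renderers[name] = r }

// Lookup returns the renderer for a name, or an error for unknown names.
func Lookup(name string) (Renderer, error) {
	if r, ok := renderers[name]; ok {
		return r, nil
	}
	return nil, fmt.Errorf("no renderer registered for %q", name)
}

func main() {
	Register("qwen3-coder", func(msgs []string) (string, error) {
		return fmt.Sprint(msgs), nil
	})
	r, _ := Lookup("qwen3-coder")
	fmt.Println(r([]string{"hello"}))
}
```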
Grace
05982a95cb
Qwen3VL Cloud Parser and Renderer ( #12526 )
* working (other than tool calls being in the incorrect order) for tool calls and tools
* tests work, other than image tags (tests do not go through the server) and tools (not in the correct order, but contents are the same)
* testing for qwen3vl parser - tool parser is working
* made changes to the JSON tool parser: it wraps the ToolCallFunction with a ToolCall object (see the sketch below)
* working parser for thinking models - assumes a state of thinking, emits unambiguous content while thinking, does not emit tool calls while thinking
* changed the parser to start with collecting content
* thinking prefill
* add hasThinkingSupport parameter to parser
* qwen3-vl -> qwen3-vl-instruct for renderer/parser
* add hasThinkingSupport=false to QwenVLParser
---------
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
2025-10-13 16:52:33 -07:00
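A sketch of the ToolCall-wrapping bullet above; the struct shapes are modeled on typical tool-call APIs and are assumptions, not necessarily ollama's exact types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ToolCallFunction struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

type ToolCall struct {
	Function ToolCallFunction `json:"function"`
}

// parseToolCall decodes the model's raw JSON function payload and wraps it
// in a ToolCall so downstream code sees one consistent shape.
func parseToolCall(raw []byte) (ToolCall, error) {
	var fn ToolCallFunction
	if err := json.Unmarshal(raw, &fn); err != nil {
		return ToolCall{}, err
	}
	return ToolCall{Function: fn}, nil
}

func main() {
	tc, err := parseToolCall([]byte(`{"name":"get_weather","arguments":{"city":"Paris"}}`))
	fmt.Println(tc, err)
}
```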
Michael Yang
6c833d5f8d
fix(qwen3): deepseek distill
deepseek's qwen3 distill uses a different rope scheme, so support both
2025-10-13 13:30:30 -07:00
yajianggroup
df411c4b02
refactor: using testing.B.Loop
Signed-off-by: yajianggroup <yajianggroup@outlook.com>
2025-10-10 13:25:29 -07:00
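For reference, the testing.B.Loop idiom this refactor adopts (Go 1.24): it replaces the classic b.N counter loop and keeps per-benchmark setup outside the measured region. A self-contained example in a *_test.go file:

```go
package demo

import (
	"strings"
	"testing"
)

func BenchmarkJoin(b *testing.B) {
	parts := []string{"a", "b", "c"} // setup runs once, outside the timed loop

	// before: for i := 0; i < b.N; i++ { strings.Join(parts, ",") }
	for b.Loop() {
		strings.Join(parts, ",")
	}
}
```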
shengxinjing
47298fce39
refactor: use builtin max and min
2025-10-09 16:17:52 -07:00
shengxinjing
4a48937ef1
refactor: use builtin max and min
2025-10-09 16:17:52 -07:00
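The two commits above replace hand-rolled helpers with Go 1.21's built-in generic max and min, which accept any ordered type:

```go
package main

import "fmt"

func main() {
	// before: func minInt(a, b int) int { if a < b { return a }; return b }
	fmt.Println(max(3, 7), min(3, 7))         // 7 3
	fmt.Println(max(2.5, 1.5), min("a", "b")) // 2.5 a
}
```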
Grace
33801c1597
Fixed error caused by adding a nil tensor in Deepseek2
2025-10-03 14:20:06 -07:00
Devon Rifkin
83021fcf0f
qwen3-coder: fix tool definition type rendering
2025-09-30 15:03:15 -07:00