Jeffrey Morgan
ca3520de87
readme: update Ollama icon size
2025-12-29 06:39:40 -06:00
Daniel Hiltgen
55a4a37c3a
int: add performance integration tests (#11173)
...
usage example:
# run the perf integration tests and capture the output
go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
# pull the CSV header from the log, then append the data rows
cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
2025-12-29 06:39:40 -06:00
Daniel Hiltgen
ba750172ca
doc: add NVIDIA Blackwell to supported list (#11307)
2025-12-29 06:39:40 -06:00
Vincent RAMPAL
35bf6c0a41
Update base image to Ubuntu 24.04 LTS (#9681)
2025-12-29 06:39:40 -06:00
Daniel Hiltgen
b23d28b549
doc: Update link for mac install (#11288)
...
Favor the dmg now.
2025-12-29 06:39:40 -06:00
Daniel Hiltgen
e897624123
mimic logs for layers on new engine (#11278)
...
This adds some extra logs to make the new engine a bit more consistent
with the llama engine.
2025-12-29 06:39:39 -06:00
XuKecheng
a3e4bb7f58
readme: add NativeMind to community integrations (#11242)
2025-12-29 06:39:39 -06:00
Jeffrey Morgan
9cf8ef9371
tools: fix parsing tool calls with empty arguments, missing required fields (#11233)
2025-12-29 06:39:39 -06:00
Attogram Project
96be53fe6c
readme: add ollama-bash-toolshed to community integrations (#11224)
2025-12-29 06:39:39 -06:00
Michael Yang
1cdab47113
chore: cleanup comments + unused vars (#11225)
2025-12-29 06:39:39 -06:00
Jesse Gross
872d190c8f
ggml: Temporarily disable reporting UUIDs
...
This is causing segfaults, so disable it. Currently UUIDs are only
used for debugging purposes, although they are planned to be used in
additional ways in the future.
Bug #11211
2025-12-29 06:39:39 -06:00
Michael Yang
8f2099306f
skip quantizing per_layer_token_embd (#11207)
...
This tensor isn't compatible with CUDA when quantized to q4_K, so skip it.
2025-12-29 06:39:38 -06:00
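A minimal Go sketch of the name-based skip this describes; shouldQuantize and the skip list are hypothetical illustrations, not the actual Ollama quantizer API:

package main

import (
	"fmt"
	"strings"
)

// shouldQuantize is a hypothetical filter: tensors whose names match the
// skip list keep their original precision instead of being quantized.
func shouldQuantize(name string) bool {
	// per_layer_token_embd stays unquantized because a q4_K copy of this
	// tensor is not compatible with the CUDA backend.
	skip := []string{"per_layer_token_embd"}
	for _, s := range skip {
		if strings.Contains(name, s) {
			return false
		}
	}
	return true
}

func main() {
	for _, t := range []string{"blk.0.attn_q.weight", "per_layer_token_embd.weight"} {
		fmt.Printf("%s -> quantize=%v\n", t, shouldQuantize(t))
	}
}
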
Daniel Hiltgen
59112600d1
ci: multi-stage release process (#11001)
2025-12-29 06:39:38 -06:00
Jeffrey Morgan
10119ec2ee
fs/ggml: add multiplier in graph estimates (#11208)
2025-12-29 06:39:38 -06:00
Jeffrey Morgan
84998ae4ba
fs/ggml: add missing architecture to OllamaEngineRequired() (#11206)
2025-12-29 06:39:38 -06:00
Michael Yang
801564fa8b
add new gemma model (#11204)
...
* update patches
* cherry pick metal mean kernel
* cherry pick cuda mean kernel
* gemma3n
2025-12-29 06:39:38 -06:00
Daniel Hiltgen
d6253f09c2
ci: arm sbsa fixes (#11194)
2025-12-29 06:39:37 -06:00
Daniel Hiltgen
9cf1db79b4
ci: include dependencies
2025-12-29 06:39:37 -06:00
Daniel Hiltgen
46654149c9
ci: pick up arm sbsa cuda libs (#11192)
2025-12-29 06:39:37 -06:00
Daniel Hiltgen
138c973d8f
ci: recombine linux amd64 binaries (#11188)
...
Glue the rocm and archive builds back together.
2025-12-29 06:39:37 -06:00
Devon Rifkin
dd8d037c16
load arrays with up to 1024 elements when estimating
...
This mirrors the old behavior before #10382
2025-12-29 06:39:37 -06:00
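A minimal sketch of the described cap, assuming an illustrative helper (truncateForEstimate is not the actual parser API):

package main

import "fmt"

// When estimating, read at most 1024 elements of a GGUF array value,
// mirroring the behavior before #10382.
const maxArrayLenForEstimate = 1024

func truncateForEstimate(arr []uint32) []uint32 {
	if len(arr) > maxArrayLenForEstimate {
		return arr[:maxArrayLenForEstimate]
	}
	return arr
}

func main() {
	arr := make([]uint32, 5000)
	fmt.Println(len(truncateForEstimate(arr))) // 1024
}
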
Devon Rifkin
558c1920fa
ggml: fix crash for array head counts
...
If it's an array, it uses the max value in the array.
If array values for head counts become more popular, we can consider a
more invasive change like #10225 to calculate more accurate estimates.
Fixes: #9984
2025-12-29 06:39:34 -06:00
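A minimal sketch of the described fallback; headCount is an illustrative reduction, not the actual estimator code:

package main

import "fmt"

// Head counts in GGUF metadata may be a scalar or a per-layer array; for
// the array case the estimate uses the maximum value.
func headCount(v any) uint64 {
	switch v := v.(type) {
	case uint64:
		return v
	case []uint64:
		var max uint64
		for _, n := range v {
			if n > max {
				max = n
			}
		}
		return max
	}
	return 0
}

func main() {
	fmt.Println(headCount(uint64(32)))          // scalar: 32
	fmt.Println(headCount([]uint64{8, 32, 16})) // array: max is 32
}
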
Daniel Hiltgen
b9b179fe00
ci: rocm parallel builds on windows (#11187)
...
The preset CMAKE_HIP_FLAGS isn't getting used on Windows, so this passes
the parallel flag through the C/CXX flags instead, along with suppression
of some log spew warnings to quiet down the build.
2025-12-29 06:38:19 -06:00
Daniel Hiltgen
38f92e7332
CI: switch windows to vs 2022 (#11184)
...
* CI: switch windows to vs 2022
* ci: fix regex match
2025-12-29 06:38:18 -06:00
Daniel Hiltgen
c012d1805b
avoid context overflow (#11175)
...
For smaller context models, make sure we do not exceed the training size.
2025-12-29 06:38:18 -06:00
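A minimal sketch of the described clamp, assuming hypothetical names (clampContext is not the actual Ollama function):

package main

import "fmt"

// clampContext never allows a context window larger than the model's
// training context size.
func clampContext(requested, trainCtx int) int {
	if trainCtx > 0 && requested > trainCtx {
		return trainCtx
	}
	return requested
}

func main() {
	fmt.Println(clampContext(8192, 4096)) // 4096: clamped to training size
	fmt.Println(clampContext(2048, 4096)) // 2048: unchanged
}
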
Daniel Hiltgen
29ec3ddf9a
Re-remove cuda v11 (#10694)
...
* Re-remove cuda v11
Revert the revert: drop v11 support, requiring drivers newer than Feb 23
This reverts commit c6bcdc4223.
* Simplify layout
With only one version of the GPU libraries, we can simplify things down somewhat. (Jetsons still require special handling)
* distinct sbsa variant for linux arm64
This avoids accidentally trying to load the sbsa CUDA libraries on
a Jetson system, which results in crashes.
* temporarily prevent rocm+cuda mixed loading
2025-12-29 06:38:18 -06:00
AJ
d8b03acc1a
readme: add ai-hub to community integrations (#11169)
2025-12-29 06:38:18 -06:00
Daniel Hiltgen
95571375dd
build speedups (#11142)
...
Enable parallel building of the GPU architectures.
2025-12-29 06:38:18 -06:00
Michael Yang
69ee842b6e
convert: utility for merging tensors (#11069)
2025-12-29 06:38:17 -06:00
Michael Yang
4585d231ee
Reapply "feat: incremental gguf parser ( #10822 )" ( #11114 ) ( #11119 )
...
* Reapply "feat: incremental gguf parser (#10822)" (#11114)
This reverts commit a6e64fbdf2.
* fix older ggufs
2025-12-29 06:38:17 -06:00
Jesse Gross
290d4c2c6c
ggml: Check return status for computation.
...
We don't check the return status after computing the graph, which
can silently lead to bad outputs if we try to keep going and future
computation succeeds. This appears to happen in certain cases on
Apple M2 devices.
Fixes #11070
2025-12-29 06:38:17 -06:00
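A minimal sketch of the pattern described, with computeGraph as a hypothetical stand-in for the ggml backend call whose status was previously ignored:

package main

import "fmt"

// computeGraph pretends the backend reported a failure; 0 means success.
func computeGraph() int { return 1 }

func compute() error {
	if status := computeGraph(); status != 0 {
		// Failing loudly here prevents silently bad outputs when a
		// later computation happens to succeed.
		return fmt.Errorf("graph compute failed with status %d", status)
	}
	return nil
}

func main() {
	if err := compute(); err != nil {
		fmt.Println(err)
	}
}
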
Daniel Hiltgen
29b668e649
int: add coverage for older models (#11137)
...
Verified these fail on 0.9.1 and pass on HEAD.
2025-12-29 06:38:17 -06:00
Jeffrey Morgan
6d36b8dcfb
benchmark: remove unused benchmark test (#11120)
...
Removes an unused test under benchmark/.
2025-12-29 06:38:17 -06:00
Jeffrey Morgan
5e3fb4744b
Revert "Revert "ggml: Export GPU UUIDs" ( #11115 )" ( #11117 )
...
Reverts PR #11115. The original change was mistakenly reverted instead of #10822.
2025-12-29 06:38:16 -06:00
Jeffrey Morgan
c5237d9462
Revert "ggml: Export GPU UUIDs" ( #11115 )
...
This reverts commit aaa7818000.
2025-12-29 06:38:16 -06:00
Jeffrey Morgan
4f1588bc37
Revert "feat: incremental gguf parser ( #10822 )" ( #11114 )
...
This reverts commit 6b04cad7e8.
2025-12-29 06:38:16 -06:00
曹家巧
8c3501c161
cache: fix comment function name in cache.go (#11110)
2025-12-29 06:38:16 -06:00
Jeffrey Morgan
829e77105a
tools: return empty arguments object instead of null (#11113)
2025-12-29 06:38:16 -06:00
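The distinction behind this fix is visible in Go's encoding/json, where a nil map marshals to null and an empty map marshals to {}; the toolCall struct below is a simplified stand-in for Ollama's tool call type:

package main

import (
	"encoding/json"
	"fmt"
)

type toolCall struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

func main() {
	var nilArgs toolCall
	nilArgs.Name = "get_time"
	b1, _ := json.Marshal(nilArgs) // {"name":"get_time","arguments":null}

	emptyArgs := toolCall{Name: "get_time", Arguments: map[string]any{}}
	b2, _ := json.Marshal(emptyArgs) // {"name":"get_time","arguments":{}}

	fmt.Println(string(b1))
	fmt.Println(string(b2))
}
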
Jeffrey Morgan
1dc12706c5
tools: fix parsing tool calls without any parameters (#11101)
...
Fixes an issue where tool calls that don't expect any parameters were
not being parsed. This also fixes two additional issues: one where
two or more tool calls would not be correctly parsed, and another where
tool calls with invalid parameters would still get parsed.
2025-12-29 06:38:15 -06:00
Jeffrey Morgan
2c371ff357
model: treat 'user defined' tokens as special tokens (#11077)
2025-12-29 06:38:15 -06:00
Michael Yang
142efb91b1
gguf: fix write order (#11068)
...
* ggml: test write gguf order
* ggml: fix write tensor order
2025-12-29 06:38:15 -06:00
NGC13009
7e0b662c6c
readme: add ollama-launcher to community integrations (#11080)
2025-12-29 06:38:15 -06:00
Phil
4c7cf115fe
readme: add GPTranslate to community integrations (#11071)
2025-12-29 06:38:15 -06:00
Jeffrey Morgan
2d86651985
tools: loosen tool parsing to allow for more formats (#11030)
2025-12-29 06:38:14 -06:00
Michael Yang
2c6f1dc9c8
feat: incremental gguf parser (#10822)
...
* incremental gguf parser
* gguf: update test to not rely on gguf on disk
* re-use existing create gguf
* read capabilities from gguf kv
* kv exists
* update tests
* s/doneFunc/successFunc/g
* new buffered reader
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-12-29 06:38:14 -06:00
Michael Yang
db3a312edf
feat: uneven splits (#11048)
...
The current splitDim function only operates on tensors that are split evenly, which isn't always the case (e.g. a QKV tensor). This change allows the function to be used for arbitrary splits.
2025-12-29 06:38:14 -06:00
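An illustrative, slice-based analogue of the change; splitUneven is hypothetical, not the actual splitDim:

package main

import "fmt"

// splitUneven splits along one dimension using arbitrary sizes rather
// than assuming equal parts (e.g. a fused QKV tensor with different
// Q/K/V widths). It assumes the sizes sum to len(data).
func splitUneven(data []float32, sizes []int) [][]float32 {
	parts := make([][]float32, 0, len(sizes))
	off := 0
	for _, n := range sizes {
		parts = append(parts, data[off:off+n])
		off += n
	}
	return parts
}

func main() {
	qkv := make([]float32, 10) // pretend fused tensor: q=6, k=2, v=2
	for i, p := range splitUneven(qkv, []int{6, 2, 2}) {
		fmt.Printf("part %d: len=%d\n", i, len(p))
	}
}
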
Michael Yang
0d5c118679
skip tokenizer.model if possible (#11050)
...
if tokenizer.json is already copied, skip tokenizer.model
2025-12-29 06:38:14 -06:00
Michael Yang
eb2c2d61e5
use nn.Linear in place of ml.Tensor (#11049)
...
While nn.Linear.Forward isn't applicable for a sparse MLP, it's still
a nice container for the tensors.
2025-12-29 06:38:13 -06:00
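A minimal sketch of the design choice described, with simplified stand-in types (not the real nn package API):

package main

import "fmt"

// Tensor and Linear are simplified stand-ins for ollama's ml and nn types.
type Tensor struct{ Name string }

type Linear struct {
	Weight *Tensor
	Bias   *Tensor // may be nil
}

func main() {
	// Even without calling a Forward method, the struct keeps the weight
	// (and optional bias) grouped, which is the point of the change; the
	// tensor name here is purely illustrative.
	gate := Linear{Weight: &Tensor{Name: "ffn_gate_exps.weight"}}
	fmt.Println(gate.Weight.Name)
}
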
Attogram Project
4fff1738a4
readme: add ollama-multirun to community integrations (#11038)
2025-12-29 06:38:13 -06:00
Jeffrey Morgan
26a1129d71
readme: update quickstart link text to Gemma 3
2025-12-29 06:38:13 -06:00