Commit Graph

4393 Commits

Author SHA1 Message Date
Gabe Goodhart dbd8ee2654 fix: Fix support for arch-specific ggml-cpu source files with new arrangement
In https://github.com/ggml-org/llama.cpp/pull/13892, all arch-specific
implementations were split out into a nested tree structure under
ggml-cpu/arch. This conflicts with standard CGO layout where all
arch-specific source files are expected to live in the same directory as
the parent go module and use suffixes based on GOOS and GOARCH. As such,
there were really two options for getting this to work:

1. Add a patch on top of the GGML sync to rearrange the files to match the
GO layout convention
2. Use CGO directives to conditionally include the nested source files in
the compilation units

This commit does (2) in order to minimize the set of changes needed on top
of the upstream file layout. To get this to work, there are two key things
needed:

1. In cpu.go, #cgo directives are added to explicitly set __${GOARCH}__ in
the preprocessor directives
2. In arch-impls.c|cpp, use an #ifdef | #elif defined | #endif chain to
explicitly include the .c|.cpp files for the given architecture from the
nested directory

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:08:56 -06:00
Gabe Goodhart 7334a0ea07 chore: Ignore *.patched in the patch directory
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:08:42 -06:00
Gabe Goodhart 1664d52be6 fix: Add patch for mtmd_input_text
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:08:29 -06:00
Gabe Goodhart 3d70237fd1 fix: Update llama.go to use mtmd instead of clip/llava
It's _very_ possible that this is broken!

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:06:47 -06:00
Gabe Goodhart fa54a3cf3a fix: Add missing include in sampling_ext.cpp
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:06:40 -06:00
Gabe Goodhart d0fd9e5aa2 fix: Remove mtmd main cpp files
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:06:31 -06:00
Gabe Goodhart 1cd9352cc3 fix: Narrow llama.cpp rsync-filter to not include mtmd main tool cpp files
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:06:18 -06:00
Gabe Goodhart 85aba511ec fix: Add ggml files missing from sync
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:06:05 -06:00
Gabe Goodhart 62af160d82 fix: Update ggml rsync-filter for new ggml-cpu/arch subdirs
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:05:39 -06:00
Gabe Goodhart 414a097372 fix: Add files missing from sync
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:05:25 -06:00
Gabe Goodhart 424e05c20e fix: Update rsync-filter for all moved/new/removed files
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:04:51 -06:00
Gabe Goodhart 2613f5da2d feat: Sync llama.cpp and ggml
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:01:24 -06:00
Gabe Goodhart 73d089bb90 feat: Update all patches
There are a number that are no longer needed at all:

- 0003-embeddings: Embeddings entirely overhauled on master
- 0008-ensure-KV-cache-is-fully-defragmented: KV caching entirely
    overhauled on master
- 0019-metal-add-mean-kernel-14267: Merged upstream
- 0020-CUDA-add-mean-operation-14313: Merged upstream

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 16:57:05 -06:00
Gabe Goodhart a30ae1fa20 TEMPORARY: Update the llama.cpp upstream to my fork's Granite Four branch
This will be redone once my branch is merged upstream in llama.cpp

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 16:24:42 -06:00
Michael Yang 4129af9205
chore: cleanup comments + unused vars (#11225) 2025-06-27 11:45:33 -07:00
Jesse Gross 45f216a9c7 ggml: Temporarily disable reporting UUIDs
This is causing segfaults, so disable it. Currently UUIDs are only
used for debugging purposes, although they planned to be used in
additional ways in the future.

Bug #11211
2025-06-27 11:27:22 -07:00
Michael Yang d0b32def60
skip quantizing per_layer_token_embd (#11207)
this tensor isn't compatible with cuda when quantized to q4_K so skip it
2025-06-26 21:49:35 -07:00
Daniel Hiltgen 11ffc36157
ci: multi-stage release process (#11001) 2025-06-26 10:32:48 -07:00
Jeffrey Morgan ba04902670
fs/ggml: add multiplier in graph estimates (#11208) 2025-06-26 00:19:44 -07:00
Jeffrey Morgan 3944602f51
fs/ggml: add missing architecture to OllamaEngineRequired() (#11206) 2025-06-26 00:11:23 -07:00
Michael Yang 73b642e6f3
add new gemma model (#11204)
* update patches

* cherry pick metal mean kernel

* cherry pick cuda mean kernel

* gemma3n
2025-06-25 21:47:09 -07:00
Daniel Hiltgen ad118d8b13
ci: arm sbsa fixes (#11194) 2025-06-24 21:00:15 -07:00
Daniel Hiltgen f08534137b ci: include dependencies 2025-06-24 20:27:43 -07:00
Daniel Hiltgen 4b4a90f233
ci: pick up arm sbsa cuda libs (#11192) 2025-06-24 18:59:22 -07:00
Daniel Hiltgen 03274a6b2f
ci: recombine linux amd64 binaries (#11188)
Glue the rocm and archive builds back together.
2025-06-24 18:45:01 -07:00
Devon Rifkin cc6463ebca
Merge pull request #10238 from ollama/drifkin/array-head-count-simple
ggml: fix crash for array head counts
2025-06-24 17:50:02 -07:00
Daniel Hiltgen 405d2f628f
ci: rocm parallel builds on windows (#11187)
The preset CMAKE_HIP_FLAGS isn't getting used on Windows.
This passes the parallel flag in through the C/CXX flags, along
with suppression for some log spew warnings to quiet down the build.
2025-06-24 15:27:09 -07:00
Devon Rifkin a3f7dd3e98 Merge branch 'main' into drifkin/array-head-count-simple 2025-06-24 14:20:05 -07:00
Daniel Hiltgen c85c0ebf89
CI: switch windows to vs 2022 (#11184)
* CI: switch windows to vs 2022

* ci: fix regex match
2025-06-24 13:26:55 -07:00
Daniel Hiltgen 10a8e04a8d
avoid context overflow (#11175)
For smaller context models, make sure we do not exceed the training size.
2025-06-23 15:52:50 -07:00
Daniel Hiltgen 1c6669e64c
Re-remove cuda v11 (#10694)
* Re-remove cuda v11

Revert the revert - drop v11 support requiring drivers newer than Feb 23

This reverts commit c6bcdc4223.

* Simplify layout

With only one version of the GPU libraries, we can simplify things down somewhat.  (Jetsons still require special handling)

* distinct sbsa variant for linux arm64

This avoids accidentally trying to load the sbsa cuda libraries on
a jetson system which results in crashes.

* temporary prevent rocm+cuda mixed loading
2025-06-23 14:07:00 -07:00
Devon Rifkin b2b270ad5d Merge branch 'main' into drifkin/array-head-count-simple 2025-06-23 10:37:31 -07:00
AJ 2bb69b40c7
readme: add ai-hub to community integrations (#11169) 2025-06-23 09:21:12 -07:00
Daniel Hiltgen 65bff664cb
build speedups (#11142)
Enable parallel building of the GPU architectures.
2025-06-20 12:32:51 -07:00
Michael Yang c088ac0e79
convert: utility for merging tensors (#11069) 2025-06-20 11:12:01 -07:00
Michael Yang 0a066cfd91
Reapply "feat: incremental gguf parser (#10822)" (#11114) (#11119)
* Reapply "feat: incremental gguf parser (#10822)" (#11114)

This reverts commit a6e64fbdf2.

* fix older ggufs
2025-06-20 11:11:40 -07:00
Jesse Gross 87b7af6cee ggml: Check return status for computation.
We don't check the return status after computing the graph, which
can silently lead to bad outputs if we try to keep going and future
computation succeeds. This appears to happens in certain cases on
Apple M2 devices.

Fixes #11070
2025-06-19 17:12:49 -07:00
Daniel Hiltgen f2527b08fb
int: add coverage for older models (#11137)
Verified these fail on 0.9.1 and pass on HEAD.
2025-06-19 12:10:19 -07:00
Jeffrey Morgan 8bcb3125c1
benchmark: remove unused benchmark test (#11120)
Removes a test under benchmark/ that is unused
2025-06-18 12:58:50 -07:00
Jeffrey Morgan 6baf1e31e2
Revert "Revert "ggml: Export GPU UUIDs" (#11115)" (#11117)
Reverts PR #11115. The original change was mistakingly reverted instead of #10822
2025-06-18 07:30:49 -07:00
Jeffrey Morgan ed567ef43b
Revert "ggml: Export GPU UUIDs" (#11115)
This reverts commit aaa7818000.
2025-06-18 05:45:00 -07:00
Jeffrey Morgan a6e64fbdf2
Revert "feat: incremental gguf parser (#10822)" (#11114)
This reverts commit 6b04cad7e8.
2025-06-18 05:42:44 -07:00
曹家巧 60cfa2a203
cache: fix comment function name in cache.go (#11110) 2025-06-18 05:21:45 -07:00
Jeffrey Morgan 55bbf3b4a1
tools: return empty arguments object instead of null (#11113) 2025-06-18 05:20:43 -07:00
Jeffrey Morgan 6bda1d2479
tools: fix parsing tool calls without any parameters (#11101)
Fixes issue where tool calls that don't expect any parameters were
not being parsed. This also fixes two additional issues: one where
2+ tool calls would not be correctly parsed, and cases where tool calls
with invalid parameters would still get parsed
2025-06-17 10:51:43 -07:00
Jeffrey Morgan 9e125d884c
model: treat 'user defined' tokens as special tokens (#11077) 2025-06-16 16:03:16 -07:00
Michael Yang a6fbfc880c
gguf: fix write order (#11068)
* ggml: test write gguf order
* ggml: fix write tensor order
2025-06-16 10:42:32 -07:00
NGC13009 502028968d
readme: add ollama-launcher to community integrations (#11080) 2025-06-15 21:27:49 -07:00
Phil 5a8eb0e151
readme: add GPTranslate to community integrations (#11071) 2025-06-14 08:54:03 -07:00
Jeffrey Morgan 9f8a18ec05
tools: loosen tool parsing to allow for more formats (#11030) 2025-06-12 14:18:54 -07:00