Commit Graph

4428 Commits

Gabe Goodhart 4f462a9f67 feat: Bump llama.cpp to 4a4f42
This picks up support for Kimi K2 and PLaMO-2

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-15 14:49:15 -06:00
Gabe Goodhart 91e4b10d40 fix: Sync patch changes for ggml-cpu.c
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 16:01:15 -06:00
Gabe Goodhart 0beea04b52 fix: Add a patch to avoid power throttling API on non-msvc windows builds
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 16:00:49 -06:00
Gabe Goodhart e8a303a701 build: Add top-level include for GNUInstallDirs in CMakeLists.txt
This is used to populate CMAKE_INSTALL_BINDIR

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 13:44:10 -06:00
Gabe Goodhart 81d821ba9b build: Include cmake/common.cmake in ggml sync
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 13:25:01 -06:00
Gabe Goodhart bf1b261611 feat: Sync all patched code
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 11:44:18 -06:00
Gabe Goodhart 3020c462da fix: Add patch for GGML_VERSION and GGML_COMMIT constants
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 11:43:14 -06:00
Gabe Goodhart d7f98e0673 fix: Revert changes to ggml export GPU UUID patch
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 11:42:26 -06:00
Gabe Goodhart 111434ab39 feat: Bump back to the central repo and point at the latest master
This includes Granite 4 and a number of other model architectures!

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 10:43:22 -06:00
Gabe Goodhart 06a5592dc5 fix: Update patches for bump
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-10 16:01:30 -06:00
Gabe Goodhart 0a7ddc4e17 feat: Bump to the latest tip of the branch
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-10 16:01:14 -06:00
Gabe Goodhart 152260e9c7 fix: Update patch 0015 for upstream implementation of uuid
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-10 14:33:12 -06:00
Gabe Goodhart e61826c180 Merge remote-tracking branch 'origin/main' into GraniteFour
* origin/main:
ggml: Report ordinal IDs for AMD GPUs on Windows
doc: add macOS docs (#11334)
Reduce default parallelism to 1 (#11330)
API/CLI context enhancements (#11331)
add `tool_name` to api.md (#11326)
template: add tool result compatibility (#11294)
ci: modularization (#11324)
Revert "ggml: Temporarily disable reporting UUIDs"
readme: update Ollama icon size
int: add performance integration tests (#11173)
doc: add NVIDIA blackwell to supported list (#11307)
Update base image to Ubuntu 24.04 LTS (#9681)
doc: Update link for mac install (#11288)
mimic logs for layers on new engine (#11278)
readme: add NativeMind to community integrations (#11242)
tools: fix parsing tool calls with empty arguments, missing required fields (#11233)
readme: add ollama-bash-toolshed to community integrations (#11224)
2025-07-10 14:01:24 -06:00
Jesse Gross 35fda7b4af ggml: Report ordinal IDs for AMD GPUs on Windows
We don't get valid UUIDs for AMD GPUs on Windows, so the best option
is to use the ordinal IDs. This brings us in line with what we currently
do on the Ollama server - the only exception is AMD GPUs on Linux, which
fall back to using ordinal IDs. The GGML implementation has no fallback,
but that case doesn't appear to occur for any of the GPUs that we support.

It's also possible that there are collisions between ordinal IDs for
different libraries - however the only places where we use them are
AMD on Windows and Metal on Mac, which can never occur on the same
system.
2025-07-09 10:35:31 -07:00
Daniel Hiltgen 66fb8575ce
doc: add macOS docs (#11334)
also removes stale model dir instructions for Windows
2025-07-08 15:38:04 -07:00
Daniel Hiltgen 20c3266e94
Reduce default parallelism to 1 (#11330)
The current scheduler algorithm of picking the parallelism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm.  This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined.  Removal of the dynamic
logic will come in a follow up.
2025-07-08 12:08:37 -07:00
Daniel Hiltgen 34088dbcfb
API/CLI context enhancements (#11331)
* API: expose context size of loaded models

* CLI: add context UX

This adds a column in the ps output to show the model's context size.
2025-07-08 11:59:06 -07:00
Parth Sareen 43107b15b9
add `tool_name` to api.md (#11326) 2025-07-07 16:53:13 -07:00
Parth Sareen 1f91cb0c8c
template: add tool result compatibility (#11294) 2025-07-07 15:53:42 -07:00
Daniel Hiltgen 12d8ad0d38
ci: modularization (#11324)
switch a few constants to variables
2025-07-07 14:07:43 -07:00
Jesse Gross 592d21e7db Revert "ggml: Temporarily disable reporting UUIDs"
The root cause was an unclean upgrade - this code is fine.

This reverts commit 45f216a9c7.
2025-07-07 11:31:02 -07:00
Jeffrey Morgan 5a08b01f5b
readme: update Ollama icon size 2025-07-05 17:20:42 -07:00
Daniel Hiltgen 4f473e224c
int: add performance integration tests (#11173)
usage example:
  go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
  cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
  cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
2025-07-05 16:07:09 -07:00
Daniel Hiltgen 9d60bb44cf
doc: add NVIDIA blackwell to supported list (#11307) 2025-07-05 16:06:30 -07:00
Vincent RAMPAL f371260e75
Update base image to Ubuntu 24.04 LTS (#9681) 2025-07-05 16:02:33 -07:00
Daniel Hiltgen c9e6d7719e
doc: Update link for mac install (#11288)
Favor the dmg now.
2025-07-03 09:48:45 -07:00
Daniel Hiltgen 2c4ce40334
mimic logs for layers on new engine (#11278)
This adds some extra logs to make the new engine a bit more consistent
with the llama engine.
2025-07-02 16:38:36 -07:00
XuKecheng 5d8c173529
readme: add NativeMind to community integrations (#11242) 2025-07-01 09:46:15 -07:00
Jeffrey Morgan 44b17d2bfa
tools: fix parsing tool calls with empty arguments, missing required fields (#11233) 2025-06-30 08:59:03 -07:00
Attogram Project 3b8b692218
readme: add ollama-bash-toolshed to community integrations (#11224) 2025-06-29 14:59:54 -07:00
Gabe Goodhart 34ff84df43 fix: Use c++17 and include vendor for go wrapper modules
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:23:27 -06:00
Gabe Goodhart d395132510 fix: Add sync'ed stb vendored header
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:17:23 -06:00
Gabe Goodhart 16c116c2b7 fix: Add missing stb to llama.cpp rsync-filter
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:16:58 -06:00
Gabe Goodhart 58300273f4 fix: Apply patch for mtmd_text_input
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:09:48 -06:00
Gabe Goodhart f358dd5a1c fix: Use mtmd_helper to correctly load the bitmap for the image
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:09:05 -06:00
Gabe Goodhart dbd8ee2654 fix: Fix support for arch-specific ggml-cpu source files with new arrangement
In https://github.com/ggml-org/llama.cpp/pull/13892, all arch-specific
implementations were split out into a nested tree structure under
ggml-cpu/arch. This conflicts with the standard cgo layout, where all
arch-specific source files are expected to live in the same directory as
the parent Go module and use suffixes based on GOOS and GOARCH. As such,
there were really two options for getting this to work:

1. Add a patch on top of the GGML sync to rearrange the files to match the
Go layout convention
2. Use CGO directives to conditionally include the nested source files in
the compilation units

This commit does (2) in order to minimize the set of changes needed on top
of the upstream file layout. To get this to work, there are two key things
needed:

1. In cpu.go, #cgo directives are added to explicitly set __${GOARCH}__ in
the preprocessor directives
2. In arch-impls.c|cpp, use an #ifdef | #elif defined | #endif chain to
explicitly include the .c|.cpp files for the given architecture from the
nested directory

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:08:56 -06:00
Gabe Goodhart 7334a0ea07 chore: Ignore *.patched in the patch directory
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:08:42 -06:00
Gabe Goodhart 1664d52be6 fix: Add patch for mtmd_input_text
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:08:29 -06:00
Gabe Goodhart 3d70237fd1 fix: Update llama.go to use mtmd instead of clip/llava
It's _very_ possible that this is broken!

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:06:47 -06:00
Gabe Goodhart fa54a3cf3a fix: Add missing include in sampling_ext.cpp
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:06:40 -06:00
Gabe Goodhart d0fd9e5aa2 fix: Remove mtmd main cpp files
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:06:31 -06:00
Gabe Goodhart 1cd9352cc3 fix: Narrow llama.cpp rsync-filter to not include mtmd main tool cpp files
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:06:18 -06:00
Gabe Goodhart 85aba511ec fix: Add ggml files missing from sync
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:06:05 -06:00
Gabe Goodhart 62af160d82 fix: Update ggml rsync-filter for new ggml-cpu/arch subdirs
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:05:39 -06:00
Gabe Goodhart 414a097372 fix: Add files missing from sync
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:05:25 -06:00
Gabe Goodhart 424e05c20e fix: Update rsync-filter for all moved/new/removed files
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:04:51 -06:00
Gabe Goodhart 2613f5da2d feat: Sync llama.cpp and ggml
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:01:24 -06:00
Gabe Goodhart 73d089bb90 feat: Update all patches
There are a number that are no longer needed at all:

- 0003-embeddings: Embeddings entirely overhauled on master
- 0008-ensure-KV-cache-is-fully-defragmented: KV caching entirely
    overhauled on master
- 0019-metal-add-mean-kernel-14267: Merged upstream
- 0020-CUDA-add-mean-operation-14313: Merged upstream

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 16:57:05 -06:00
Gabe Goodhart a30ae1fa20 TEMPORARY: Update the llama.cpp upstream to my fork's Granite Four branch
This will be redone once my branch is merged upstream in llama.cpp

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 16:24:42 -06:00
Michael Yang 4129af9205
chore: cleanup comments + unused vars (#11225) 2025-06-27 11:45:33 -07:00