Gabe Goodhart
895d5563df
Merge remote-tracking branch 'origin/main' into GraniteFour
...
* origin/main:
readme: add GMAI - Gradle Managed to community integrations (#11461)
tools: fix parsing issue when a tool name is a substring of another (#11456)
readme: update argo description to support deep research (#11455)
ci: switch mac builder to arm64 (#11379)
docs: add the no-Modelfile function of `ollama create` (#9077)
openai: allow openai endpoint to accept webp images (#11412)
readme: update the llama.cpp github link (#11427)
compile bf16 support into ggml-metal (#11430)
cmd: add default assistant role to message construction (#11431)
api: fix unreachable status err (#11423)
docs: fix typo in macos.md (#11425)
2025-07-21 15:04:52 -06:00
Stefan Wärting
82da19c634
readme: add GMAI - Gradle Managed to community integrations (#11461)
2025-07-20 14:55:47 -07:00
Jeffrey Morgan
bdd9d22dfd
tools: fix parsing issue when a tool name is a substring of another (#11456)
...
Co-authored-by: frob <rick+github@frob.com.au>
2025-07-20 14:55:14 -07:00
zmldndx
5fc38d042f
readme: update argo description to support deep research (#11455)
2025-07-19 13:29:38 -07:00
Daniel Hiltgen
191d94289d
ci: switch mac builder to arm64 (#11379)
...
The macos-13 runner is x86, while macos-13-xlarge is arm64
2025-07-17 07:33:44 -07:00
frob
802ad16ce4
docs: add the no-Modelfile function of `ollama create` (#9077)
2025-07-16 22:16:10 -07:00
frob
5e67f4f90e
openai: allow openai endpoint to accept webp images (#11412)
...
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-07-16 21:31:49 -07:00
Haiyue Wang
e840ccb523
readme: update the llama.cpp github link (#11427)
2025-07-16 21:20:28 -07:00
Michael Yang
b4fe3adc0a
compile bf16 support into ggml-metal (#11430)
2025-07-16 17:32:57 -07:00
Parth Sareen
d73f8aa8c3
cmd: add default assistant role to message construction (#11431)
2025-07-16 11:18:16 -07:00
Bruce MacDonald
92c2e8a56c
api: fix unreachable status err (#11423)
...
StatusError was unreachable: the client always checked for error messages in the response body first, and the server always includes error messages with HTTP error status codes.
2025-07-16 11:03:28 -07:00
Marcelo Fornet
2e3fd86d48
docs: fix typo in macos.md (#11425)
2025-07-16 10:50:46 -07:00
Gabe Goodhart
e6a22f20d1
Merge remote-tracking branch 'origin/main' into GraniteFour
...
* origin/main:
docs: update modelfile.md to reflect current default num_ctx (#11189)
ggml: Use assigned layers when reporting loading stats
ggml: Disable unused pipeline parallelism
Only load supported models on new engine (#11362)
2025-07-15 14:50:19 -06:00
Gabe Goodhart
5305e2ad14
feat: Sync llama.cpp
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-15 14:50:01 -06:00
Gabe Goodhart
4f462a9f67
feat: Bump llama.cpp to 4a4f42
...
This picks up support for Kimi K2 and PLaMO-2
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-15 14:49:15 -06:00
先知
4261a3b0b2
docs: update modelfile.md to reflect current default num_ctx (#11189)
...
As of commit 44b466eeb2, the default context length has been increased to 4096.
2025-07-11 15:15:00 -07:00
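For context on the change above, the 4096-token default can still be overridden per model in a Modelfile via the `num_ctx` parameter. A minimal sketch (the base model name and value are illustrative, not from the commit):

```
FROM llama3
PARAMETER num_ctx 8192
```

Building a model from this Modelfile with `ollama create` yields a variant that loads with the larger context window instead of the default.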
Gabe Goodhart
91e4b10d40
fix: Sync patch changes for ggml-cpu.c
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 16:01:15 -06:00
Gabe Goodhart
0beea04b52
fix: Add a patch to avoid power throttling API on non-msvc windows builds
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 16:00:49 -06:00
Jesse Gross
acef9b4c1b
ggml: Use assigned layers when reporting loading stats
...
Reporting params.NumGPULayers can be misleading because it is the
requested number of layers, not the actual number that is loaded.
While they are often the same, there are cases where they might mismatch,
such as if the GPU backend is missing.
2025-07-11 14:21:50 -07:00
Jesse Gross
9a43994c45
ggml: Disable unused pipeline parallelism
...
We're not currently using it, even in cases where we could. Disabling
it improves generation performance by 10-30% with multiple GPUs.
2025-07-11 13:30:05 -07:00
Gabe Goodhart
e8a303a701
build: Add top-level include for GNUInstallDirs in CMakeLists.txt
...
This is used to populate CMAKE_INSTALL_BINDIR
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 13:44:10 -06:00
Gabe Goodhart
81d821ba9b
build: Include cmake/common.cmake in ggml sync
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 13:25:01 -06:00
Daniel Hiltgen
f8a6e88819
Only load supported models on new engine (#11362)
...
* Only load supported models on new engine
Verify the model is supported before trying to load
* int: testcase for all library models
2025-07-11 12:21:54 -07:00
Gabe Goodhart
bf1b261611
feat: Sync all patched code
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 11:44:18 -06:00
Gabe Goodhart
3020c462da
fix: Add patch for GGML_VERSION and GGML_COMMIT constants
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 11:43:14 -06:00
Gabe Goodhart
d7f98e0673
fix: Revert changes to ggml export GPU UUID patch
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 11:42:26 -06:00
Gabe Goodhart
111434ab39
feat: Bump back to the central repo and point at the latest master
...
This includes granite 4 and a number of other model architectures!
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 10:43:22 -06:00
Gabe Goodhart
06a5592dc5
fix: Update patches for bump
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-10 16:01:30 -06:00
Gabe Goodhart
0a7ddc4e17
feat: Bump to the latest tip of the branch
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-10 16:01:14 -06:00
Gabe Goodhart
152260e9c7
fix: Update patch 0015 for upstream implementation of uuid
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-10 14:33:12 -06:00
Gabe Goodhart
e61826c180
Merge remote-tracking branch 'origin/main' into GraniteFour
...
* origin/main:
ggml: Report ordinal IDs for AMD GPUs on Windows
doc: add MacOS docs (#11334)
Reduce default parallelism to 1 (#11330)
API/CLI context enhancements (#11331)
add `tool_name` to api.md (#11326)
template: add tool result compatibility (#11294)
ci: modularization (#11324)
Revert "ggml: Temporarily disable reporting UUIDs"
readme: update Ollama icon size
int: add performance integration tests (#11173)
doc: add NVIDIA blackwell to supported list (#11307)
Update base image to Ubuntu 24.04 LTS (#9681)
doc: Update link for mac install (#11288)
mimic logs for layers on new engine (#11278)
readme: add NativeMind to community integrations (#11242)
tools: fix parsing tool calls with empty arguments, missing required fields (#11233)
readme: add ollama-bash-toolshed to community integrations (#11224)
2025-07-10 14:01:24 -06:00
Jesse Gross
35fda7b4af
ggml: Report ordinal IDs for AMD GPUs on Windows
...
We don't get valid UUIDs for AMD GPUs on Windows, so the best option
is to use the ordinal IDs. This brings us in line with what we currently
do on the Ollama server - the only exception is AMD GPUs on Linux, which
fall back to using ordinal IDs. The GGML implementation has no fallback,
but the case doesn't appear to occur for any of the GPUs that we support.
It's also possible that there are collisions between ordinal IDs for
different libraries - however, the only places where we use them are
AMD on Windows and Metal on Mac, which can never occur on the same
system.
2025-07-09 10:35:31 -07:00
Daniel Hiltgen
66fb8575ce
doc: add MacOS docs (#11334)
...
also removes stale model dir instructions for windows
2025-07-08 15:38:04 -07:00
Daniel Hiltgen
20c3266e94
Reduce default parallelism to 1 (#11330)
...
The current scheduler algorithm of picking the parallelism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm. This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined. Removal of the dynamic
logic will come in a follow-up.
2025-07-08 12:08:37 -07:00
Daniel Hiltgen
34088dbcfb
API/CLI context enhancements (#11331)
...
* API: expose context size of loaded models
* CLI: add context UX
This adds a column in the ps output to show the model's context size.
2025-07-08 11:59:06 -07:00
Parth Sareen
43107b15b9
add `tool_name` to api.md (#11326)
2025-07-07 16:53:13 -07:00
Parth Sareen
1f91cb0c8c
template: add tool result compatibility (#11294)
2025-07-07 15:53:42 -07:00
Daniel Hiltgen
12d8ad0d38
ci: modularization (#11324)
...
switch a few constants to variables
2025-07-07 14:07:43 -07:00
Jesse Gross
592d21e7db
Revert "ggml: Temporarily disable reporting UUIDs"
...
The root cause was an unclean upgrade - this code is fine.
This reverts commit 45f216a9c7.
2025-07-07 11:31:02 -07:00
Jeffrey Morgan
5a08b01f5b
readme: update Ollama icon size
2025-07-05 17:20:42 -07:00
Daniel Hiltgen
4f473e224c
int: add performance integration tests (#11173)
...
usage example:
go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
2025-07-05 16:07:09 -07:00
Daniel Hiltgen
9d60bb44cf
doc: add NVIDIA blackwell to supported list (#11307)
2025-07-05 16:06:30 -07:00
Vincent RAMPAL
f371260e75
Update base image to Ubuntu 24.04 LTS (#9681)
2025-07-05 16:02:33 -07:00
Daniel Hiltgen
c9e6d7719e
doc: Update link for mac install (#11288)
...
Favor the dmg installer now.
2025-07-03 09:48:45 -07:00
Daniel Hiltgen
2c4ce40334
mimic logs for layers on new engine (#11278)
...
This adds some extra logs to make the new engine a bit more consistent
with the llama engine.
2025-07-02 16:38:36 -07:00
XuKecheng
5d8c173529
readme: add NativeMind to community integrations (#11242)
2025-07-01 09:46:15 -07:00
Jeffrey Morgan
44b17d2bfa
tools: fix parsing tool calls with empty arguments, missing required fields (#11233)
2025-06-30 08:59:03 -07:00
Attogram Project
3b8b692218
readme: add ollama-bash-toolshed to community integrations (#11224)
2025-06-29 14:59:54 -07:00
Gabe Goodhart
34ff84df43
fix: Use c++17 and include vendor for go wrapper modules
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:23:27 -06:00
Gabe Goodhart
d395132510
fix: Add sync'ed stb vendored header
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-27 17:17:23 -06:00