Update tests and implementation to use the new ordered map-based
ToolCallFunctionArguments type which replaces the previous map[string]any.
- Add mapToArgs helper to convert map[string]any to ToolCallFunctionArguments
- Add testArgs and testProps helpers in tests
- Use cmpopts.IgnoreUnexported for cmp.Diff comparisons
Use w.ResponseWriter.Status() instead of parsing StatusCode from JSON
payload. routes.go typically sends errors as gin.H{"error": "..."}
without a StatusCode field, causing all errors to be mapped to
"api_error" instead of the appropriate type (not_found_error,
invalid_request_error, etc.).
Added tests to verify error handling for common routes.go patterns.
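
A minimal sketch of the status-based mapping described above. The helper
name errorTypeForStatus and the exact set of statuses handled are
assumptions for illustration; the middleware itself may cover more cases.

```go
package main

import (
	"fmt"
	"net/http"
)

// errorTypeForStatus is a hypothetical helper: it derives an Anthropic-style
// error type from the HTTP status written to the response, instead of
// looking for a StatusCode field inside the JSON payload.
func errorTypeForStatus(status int) string {
	switch status {
	case http.StatusBadRequest:
		return "invalid_request_error"
	case http.StatusNotFound:
		return "not_found_error"
	default:
		// Anything not mapped explicitly falls back to the generic type.
		return "api_error"
	}
}

func main() {
	for _, s := range []int{http.StatusBadRequest, http.StatusNotFound, http.StatusInternalServerError} {
		fmt.Printf("%d -> %s\n", s, errorTypeForStatus(s))
	}
}
```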
Use *string instead of string for Text and Thinking fields in ContentBlock
so that omitempty works correctly:
- nil pointer: field omitted from JSON (for blocks that don't use it)
- ptr(""): field present as "" (for SDK streaming accumulation)
- ptr("content"): field present with content
This keeps the JSON output clean (text blocks don't have thinking field,
thinking blocks don't have text field) while still satisfying SDK
requirements for field presence during streaming.
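
The three cases above can be illustrated with encoding/json. This is a
simplified sketch: the ContentBlock here carries only three fields, and
the encode helper is hypothetical and exists only for the demonstration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ContentBlock is a reduced stand-in for the real struct; only the fields
// relevant to the omitempty behavior are shown.
type ContentBlock struct {
	Type     string  `json:"type"`
	Text     *string `json:"text,omitempty"`
	Thinking *string `json:"thinking,omitempty"`
}

// ptr returns a pointer to s, so that "" can be distinguished from nil.
func ptr(s string) *string { return &s }

// encode is a hypothetical helper that marshals a block to a string.
func encode(b ContentBlock) string {
	out, err := json.Marshal(b)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	// nil pointers are omitted; a pointer to "" is kept as an empty string.
	fmt.Println(encode(ContentBlock{Type: "text", Text: ptr("")}))
	// prints {"type":"text","text":""}
	fmt.Println(encode(ContentBlock{Type: "thinking", Thinking: ptr("hmm")}))
	// prints {"type":"thinking","thinking":"hmm"}
}
```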
Fix edge case where messages containing only a thinking block (no text,
images, or tool calls) would be dropped. Add thinking != "" to the
condition that creates messages from content blocks.
Add tests documenting that Text and Thinking fields must be present
in JSON output even when empty. The Anthropic SDK requires these fields
in content_block_start events to accumulate streaming deltas properly.
Tests verify:
- ContentBlock JSON includes empty text/thinking fields
- StreamConverter emits content_block_start with required fields
Remove omitempty from Text and Thinking fields in ContentBlock struct.
The Anthropic SDK requires these fields to be present (even if empty)
in content_block_start events to properly accumulate streaming deltas.
- Add proper error handling for JSON marshal in StreamConverter to
prevent corrupted streams when tool arguments cannot be serialized
- Add tests for unmarshalable arguments and mixed validity scenarios
- Fix documentation typo and update recommended models to qwen3-coder
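
A rough sketch of the marshal error handling mentioned in the first item.
The function name marshalToolArgs and the map[string]any argument type
are assumptions; the point is only that a failed json.Marshal is
returned to the caller rather than written into the stream.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalToolArgs is a hypothetical helper. It returns an error instead of
// emitting partial output when the arguments cannot be serialized.
func marshalToolArgs(args map[string]any) (string, error) {
	b, err := json.Marshal(args)
	if err != nil {
		return "", fmt.Errorf("marshal tool arguments: %w", err)
	}
	return string(b), nil
}

func main() {
	// Channels cannot be encoded as JSON, so this call returns an error.
	if _, err := marshalToolArgs(map[string]any{"bad": make(chan int)}); err != nil {
		fmt.Println("skipping tool call:", err)
	}

	out, _ := marshalToolArgs(map[string]any{"city": "Paris"})
	fmt.Println(out)
}
```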
Add middleware to support the Anthropic Messages API format at /v1/messages.
This enables tools like Claude Code to work with Ollama models through the
Anthropic API interface.
Features:
- Request/response transformation between Anthropic and internal formats
- Streaming support with SSE events (message_start, content_block_delta, etc.)
- Tool calling support (tool_use and tool_result content blocks)
- Thinking/extended thinking block support
- Image content block support (base64)
- System prompt handling
- Multi-turn conversation support
- Proper stop_reason mapping (end_turn, max_tokens, tool_use)
- Error responses in Anthropic format
New files:
- anthropic/anthropic.go: Types and transformation functions
- middleware/anthropic.go: Request/response middleware
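
As an illustration of the stop_reason mapping listed among the features,
here is a small sketch. The function name mapStopReason and the
internal done-reason value "length" are assumptions, not taken from the
actual implementation.

```go
package main

import "fmt"

// mapStopReason is a hypothetical helper that converts an internal
// completion reason into one of the Anthropic stop_reason values named
// in the feature list: end_turn, max_tokens, tool_use.
func mapStopReason(doneReason string, hasToolCalls bool) string {
	if hasToolCalls {
		return "tool_use"
	}
	switch doneReason {
	case "length": // assumed internal value for "token limit reached"
		return "max_tokens"
	default:
		return "end_turn"
	}
}

func main() {
	fmt.Println(mapStopReason("stop", false))
	fmt.Println(mapStopReason("length", false))
	fmt.Println(mapStopReason("stop", true))
}
```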
* preserve tool definition and call JSON ordering
This is another iteration of
<https://github.com/ollama/ollama/pull/12518>, but this time we've
simplified things by relaxing the competing requirements of being
compatible AND order-preserving with templates (vs. renderers). We
maintain backwards compatibility at the cost of not guaranteeing order
for templates. We plan on moving more and more models to renderers,
which have been updated to use these new data types, and additionally
we could add an opt-in way of templates getting an order-preserved list
(e.g., via sibling template vars).
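
To show why an ordered type is needed at all: encoding/json sorts the
keys of a plain Go map, so the original order is lost. The orderedMap
type below is a simplified, hypothetical sketch and is not the type
introduced by this change.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// orderedMap is an illustrative stand-in that remembers insertion order.
type orderedMap struct {
	keys []string
	vals map[string]any
}

// Set stores v under k, recording k's position the first time it is seen.
func (m *orderedMap) Set(k string, v any) {
	if m.vals == nil {
		m.vals = make(map[string]any)
	}
	if _, ok := m.vals[k]; !ok {
		m.keys = append(m.keys, k)
	}
	m.vals[k] = v
}

// MarshalJSON writes the entries in insertion order rather than sorted order.
func (m *orderedMap) MarshalJSON() ([]byte, error) {
	var buf bytes.Buffer
	buf.WriteByte('{')
	for i, k := range m.keys {
		if i > 0 {
			buf.WriteByte(',')
		}
		kb, err := json.Marshal(k)
		if err != nil {
			return nil, err
		}
		vb, err := json.Marshal(m.vals[k])
		if err != nil {
			return nil, err
		}
		buf.Write(kb)
		buf.WriteByte(':')
		buf.Write(vb)
	}
	buf.WriteByte('}')
	return buf.Bytes(), nil
}

func main() {
	m := &orderedMap{}
	m.Set("zip", 1)
	m.Set("apple", 2)
	ordered, _ := json.Marshal(m)
	plain, _ := json.Marshal(map[string]any{"zip": 1, "apple": 2})
	fmt.Println(string(ordered)) // prints {"zip":1,"apple":2}
	fmt.Println(string(plain))   // prints {"apple":2,"zip":1}
}
```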
* orderedmap_test: remove testify
The normalize function now checks for NaN and Inf values in the
embedding vector before processing. This prevents JSON encoding
failures when models produce invalid floating-point values.
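
A minimal sketch of such a check, assuming an L2 normalization over a
[]float32 vector. The signature and error message are illustrative and
may not match the actual normalize function.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// normalize rejects vectors containing NaN or Inf, then scales the vector
// to unit length. Signature and error text are assumptions.
func normalize(vec []float32) ([]float32, error) {
	var sum float64
	for _, v := range vec {
		f := float64(v)
		if math.IsNaN(f) || math.IsInf(f, 0) {
			return nil, errors.New("embedding contains NaN or Inf values")
		}
		sum += f * f
	}

	out := make([]float32, len(vec))
	norm := math.Sqrt(sum)
	if norm == 0 {
		// An all-zero vector has no direction; return zeros unchanged.
		return out, nil
	}
	for i, v := range vec {
		out[i] = float32(float64(v) / norm)
	}
	return out, nil
}

func main() {
	fmt.Println(normalize([]float32{3, 4}))
	fmt.Println(normalize([]float32{1, float32(math.Inf(1))}))
}
```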
Fixes #13572
Signed-off-by: majiayu000 <1835304752@qq.com>
The tool calling example used "get_temperature" for tool_calls but
defined the tool as "get_weather". Also removed trailing commas that
made the JSON invalid.
Fixes #13031
On the llama engine, when we compute the memory layout, we reserve
a buffer to leave some headroom for incorrect estimates. This buffer
is subtracted from GPU free memory, and on GPUs with limited memory
the subtraction may underflow.
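
The underflow arises because unsigned subtraction wraps around in Go.
The sketch below shows one common guard; the function name usableMemory
and the clamp-to-zero behavior are assumptions, not a description of the
actual fix.

```go
package main

import "fmt"

// usableMemory is a hypothetical helper that subtracts a reserved buffer
// from free GPU memory, clamping at zero instead of wrapping around.
func usableMemory(free, reserve uint64) uint64 {
	if reserve >= free {
		return 0
	}
	return free - reserve
}

func main() {
	free, reserve := uint64(256<<20), uint64(512<<20)

	// Unchecked unsigned subtraction wraps to a very large value.
	fmt.Println(free - reserve)

	// The guarded version reports no usable memory instead.
	fmt.Println(usableMemory(free, reserve))
}
```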
Fixes #13494
* Revert "add support for NVIDIA Nemotron 3 Nano"
This reverts commit e7d2ae9d69.
* GGML update to 380b4c984
Remove MaskBatchPadding as GGML_KQ_MASK_PAD is no longer present (no
padding required)
* update to c45f89d55
* ec98e2002
solar pro needed more adjusting - needs verification
* review comments
Refactored the ConfigV2 and RootFS types from server/images.go to a new
types/model/config.go file under the model package. Updated all
references to use model.ConfigV2 and model.RootFS. This allows other
projects to use these types without compiling the C code in the llama
package.
The ggml/src/CMakeLists.txt uses GGML_VERSION_MAJOR for the shared
library SOVERSION property, but these variables were not defined when
building from ollama's CMakeLists.txt.
This caused libggml-base.so to be named with a literal "SOVERSION"
suffix (libggml-base.so.SOVERSION) instead of the actual version
number (libggml-base.so.0).
The fix adds the required GGML_VERSION_* variables before including
the ggml subdirectory.
Fixes #13436
* flash attn: add auto mode for llama engine
If the user does not specify fa in the environment, use auto-mode.
* review comments
* ensure kv cache quantized types have FA explicitly enabled
additional review comments
This changes the default behavior to use the Ollama engine for supported
models, while retaining the ability to disable the Ollama engine and
fall back to the Llama engine. Models in the OllamaEngineRequired list
will always run on the Ollama engine.