MCP (Model Context Protocol) support:
- Add MCPRef type for agent MCP server references
- Parse MCP command in Agentfiles (MCP name command [args...]; sketched after this list)
- Load and manage MCP servers with mcpManager
- Implement agentic loop for multi-turn tool execution
- Add /mcp REPL commands (add, remove, disable, enable)
- Add 'ollama mcp' CLI commands for global config management
- Support both model-bundled and global (~/.ollama/mcp.json) MCPs
- Display MCPs in 'ollama show' output
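A minimal Go sketch of the directive parsing. The MCPRef field names and parseMCPLine are assumptions for illustration; only the directive shape (MCP name command [args...]) comes from the change above:

```go
package main

import (
	"fmt"
	"strings"
)

// MCPRef sketches an agent's MCP server reference; field names are
// illustrative assumptions, not Ollama's actual type.
type MCPRef struct {
	Name    string   // server name, e.g. "filesystem"
	Command string   // executable that serves MCP over stdio
	Args    []string // arguments passed to the command
}

// parseMCPLine parses an Agentfile line of the form: MCP name command [args...]
func parseMCPLine(line string) (MCPRef, error) {
	fields := strings.Fields(line)
	if len(fields) < 3 || !strings.EqualFold(fields[0], "MCP") {
		return MCPRef{}, fmt.Errorf("expected: MCP name command [args...], got %q", line)
	}
	return MCPRef{Name: fields[1], Command: fields[2], Args: fields[3:]}, nil
}

func main() {
	ref, err := parseMCPLine(`MCP filesystem npx -y @modelcontextprotocol/server-filesystem /tmp`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", ref)
}
```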
ENTRYPOINT support:
- Add ENTRYPOINT command to Agentfiles for custom runtimes
- Allow agents without FROM when ENTRYPOINT is specified
- Execute entrypoint as subprocess with stdin/stdout connected
- Support a $PROMPT placeholder to control where the prompt is inserted (see the sketch after this list)
- Hide Model section in 'ollama show' for entrypoint-only agents
- Pass user prompt as argument to entrypoint command
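A minimal Go sketch of the subprocess execution; runEntrypoint and the exact fallback rule are illustrative assumptions, while the placeholder, the appended-prompt behavior, and the connected stdin/stdout come from the list above:

```go
package sketch

import (
	"os"
	"os/exec"
	"strings"
)

// runEntrypoint sketches the entrypoint execution. If any argument
// contains $PROMPT, the placeholder is replaced with the user prompt;
// otherwise the prompt is appended as the final argument.
func runEntrypoint(command string, args []string, prompt string) error {
	replaced := false
	for i, a := range args {
		if strings.Contains(a, "$PROMPT") {
			args[i] = strings.ReplaceAll(a, "$PROMPT", prompt)
			replaced = true
		}
	}
	if !replaced {
		args = append(args, prompt)
	}

	cmd := exec.Command(command, args...)
	cmd.Stdin = os.Stdin // keep stdin/stdout connected so the runtime is interactive
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}
```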
- Add check for registry references without digest in loadSkillsFromRefs
- Fix IsLocalSkillPath so registry refs are no longer treated as local paths (sketched below)
- Inject OLLAMA_WORKING_DIR env var so skill scripts can access the
directory where 'ollama run' was called from
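A sketch of both fixes with assumed bodies; only the names IsLocalSkillPath and OLLAMA_WORKING_DIR come from the change, and in the real code the env var carries the directory 'ollama run' was invoked from:

```go
package sketch

import (
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// IsLocalSkillPath: only filesystem-looking paths are local, so a
// registry ref such as "skill/calc:1.0.0" is not mistaken for a
// relative path. This body is a sketch of the fixed behavior.
func IsLocalSkillPath(s string) bool {
	return strings.HasPrefix(s, "./") ||
		strings.HasPrefix(s, "../") ||
		filepath.IsAbs(s)
}

// withWorkingDir injects the working directory for a skill script;
// cmd is the exec.Cmd about to run the script.
func withWorkingDir(cmd *exec.Cmd) error {
	cwd, err := os.Getwd()
	if err != nil {
		return err
	}
	cmd.Env = append(os.Environ(), "OLLAMA_WORKING_DIR="+cwd)
	return nil
}
```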
Add skill management commands and interactive REPL support:
CLI commands (cmd/skill_cmd.go):
ollama skill push NAME PATH - Push skill to registry
ollama skill pull NAME - Pull skill from registry
ollama skill list - List installed skills
ollama skill show NAME - Show skill details
ollama skill rm NAME - Remove a skill
Skill loading (cmd/skills.go):
- Load skills from model manifests
- Parse SKILL.md frontmatter for metadata (see the sketch after this list)
- Inject skill instructions into system prompt
- Provide run_skill_script tool for script execution
Interactive mode (cmd/interactive.go):
/skills - Show available skills
/skill add PATH - Add skill from local path
/skill remove NAME - Remove skill from session
/skill list - List session skills
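A sketch of the frontmatter parsing, assuming a "---"-delimited block of simple key: value pairs; the real parser may use a YAML library and richer fields:

```go
package sketch

import "strings"

// parseFrontmatter reads SKILL.md metadata from a leading frontmatter
// block and returns the remaining body (the skill instructions).
func parseFrontmatter(md string) (meta map[string]string, body string) {
	meta = map[string]string{}
	rest, found := strings.CutPrefix(md, "---\n")
	if !found {
		return meta, md // no frontmatter; whole file is instructions
	}
	front, body, found := strings.Cut(rest, "\n---\n")
	if !found {
		return meta, md // unterminated frontmatter; treat as plain body
	}
	for _, line := range strings.Split(front, "\n") {
		if k, v, ok := strings.Cut(line, ":"); ok {
			meta[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return meta, body
}
```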
Add SKILL command to the Modelfile/Agentfile parser.
Supports both local paths and registry references:
SKILL ./path/to/skill # Local skill bundled with agent
SKILL skill/calc:1.0.0 # Registry skill reference
SKILL alice/skill/calc:1.0 # User skill from registry
Add skill-related types to the API and configuration:
- api/types.go: Skill reference types for API requests/responses (a sketch follows this list)
- types/model/config.go: Skill configuration in model config
- envconfig/config.go: Environment configuration for skills
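A hypothetical shape for the API-side reference type; every field here is an assumption for illustration, not the actual definition in api/types.go:

```go
package sketch

// SkillRef sketches a skill reference in API requests/responses.
type SkillRef struct {
	Name   string `json:"name"`             // e.g. "alice/skill/calc:1.0"
	Path   string `json:"path,omitempty"`   // set for local, bundled skills
	Digest string `json:"digest,omitempty"` // content digest for registry skills
}
```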
Add support for skill layers in model manifests:
- server/skill.go: New file with skill extraction and packaging
- GetSkillsPath: Returns path to extracted skills cache
- ExtractSkillBlob: Extracts skill tar.gz to cache
- CreateSkillLayer: Creates skill blob from directory
- ParseSkillName/GetSkillManifestPath: Skill name handling
- server/images.go: Extract skill layers on pull
- server/create.go: Create skill layers from SKILL directives
- server/routes.go: Skill-related route handling
Skills are stored as gzipped tar archives with MediaType
"application/vnd.ollama.image.skill".
Updates ModelPath struct and parsing to support the Kind field,
enabling skills and agents to use the 5-part naming structure.
- ParseModelPath detects valid kinds (skill, agent)
- GetNamespaceRepository includes kind in path
- GetManifestPath returns correct 5-part filepath
- GetFullTagname/GetShortTagname include kind when present
Extends the model name structure from 4-part to 5-part:
host/namespace/kind/model:tag
The Kind field is optional and supports:
- "skill" for skill packages
- "agent" for agent packages (future)
- empty for regular models
Parser detects valid kinds to distinguish between old format
(host/namespace/model) and new format (host/namespace/kind/model).
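A sketch of that disambiguation rule; splitKind is an illustrative helper, not the actual ParseModelPath:

```go
package sketch

import "strings"

var validKinds = map[string]bool{"skill": true, "agent": true}

// splitKind: a name parses as the 5-part form only when the third
// segment is a known kind, so "host/ns/mymodel" keeps its old 4-part
// meaning even though "host/ns/skill/mymodel" has one more segment.
func splitKind(name string) (kind, rest string) {
	parts := strings.Split(name, "/")
	if len(parts) == 4 && validKinds[parts[2]] { // host/namespace/kind/model
		return parts[2], strings.Join([]string{parts[0], parts[1], parts[3]}, "/")
	}
	return "", name
}
```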
On the llama engine, when we compute the memory layout, we reserve
a buffer to allow some headroom for incorrect estimates. This buffer
is subtracted from GPU free memory, and on GPUs with limited memory
the subtraction may underflow.
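One way to guard against this, as a sketch (the actual fix may differ):

```go
package sketch

// availableAfterReserve: free and reserve are unsigned byte counts, so a
// plain "free - reserve" wraps to a huge value when the reserve exceeds
// the GPU's free memory. Clamp to zero instead.
func availableAfterReserve(free, reserve uint64) uint64 {
	if reserve > free {
		return 0
	}
	return free - reserve
}
```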
Fixes #13494
* Revert "add support for NVIDIA Nemotron 3 Nano"
This reverts commit e7d2ae9d69.
* GGML update to 380b4c984
Remove MaskBatchPadding as GGML_KQ_MASK_PAD is no longer present (no
padding required)
* update to c45f89d55
* ec98e2002
Solar Pro needed more adjusting; needs verification
* review comments
Moved the ConfigV2 and RootFS types from server/images.go to a new types/model/config.go file under the model package, and updated all references to use model.ConfigV2 and model.RootFS. This allows the types to be used in other projects without pulling in the C code compiled by the llama package.
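Minimal usage after the move (module path assumed from the repository layout):

```go
package sketch

import "github.com/ollama/ollama/types/model"

// Callers now reference the moved types from the model package.
var (
	cfg model.ConfigV2 // previously defined in server/images.go
	fs  model.RootFS
)
```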
The ggml/src/CMakeLists.txt uses GGML_VERSION_MAJOR for the shared
library SOVERSION property, but these variables were not defined when
building from ollama's CMakeLists.txt.
This caused libggml-base.so to be named with a literal "SOVERSION"
suffix (libggml-base.so.SOVERSION) instead of the actual version
number (libggml-base.so.0).
The fix adds the required GGML_VERSION_* variables before including
the ggml subdirectory.
Fixes #13436
* flash attn: add auto mode for llama engine
If the user does not specify flash attention (fa) in the environment, use auto mode (sketched below).
* review comments
* ensure kv cache quantized types have FA explicitly enabled
additional review comments
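A sketch of the resulting policy, with illustrative names and an assumed tri-state mode:

```go
package sketch

import "errors"

type faMode int

const (
	faAuto faMode = iota // engine decides per model/backend
	faOn
	faOff
)

// resolveFlashAttn: an unset environment value means auto mode, and a
// quantized KV cache type is only valid when flash attention is
// explicitly enabled.
func resolveFlashAttn(envVal string, kvCacheQuantized bool) (faMode, error) {
	switch envVal {
	case "1", "true":
		return faOn, nil
	case "0", "false":
		return faOff, nil
	case "":
		if kvCacheQuantized {
			return faOff, errors.New("quantized KV cache requires flash attention to be explicitly enabled")
		}
		return faAuto, nil
	default:
		return faAuto, nil
	}
}
```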
This changes the default behavior to use the Ollama engine for supported
models, while retaining the ability to disable the Ollama engine and
fall back to the Llama engine. Models in the OllamaEngineRequired list
will always run on the Ollama engine.
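A sketch of the selection policy with illustrative names:

```go
package sketch

// Models that must always run on the Ollama engine (contents elided).
var ollamaEngineRequired = map[string]bool{}

// useOllamaEngine: required models always get the Ollama engine;
// otherwise supported models default to it unless the user explicitly
// disables it and falls back to the llama engine.
func useOllamaEngine(model string, supported, userDisabled bool) bool {
	if ollamaEngineRequired[model] {
		return true
	}
	return supported && !userDisabled
}
```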
* docs: add docs for v1/responses and rework openai compat section
I reworked the examples to be separated by topic and to be fully
runnable (i.e., they now log output instead of just suggesting how a
call might be made).
We now use `<CodeGroup>`s so that each example has a language dropdown on the
docs site, which makes the examples a lot more digestible (since you only see
roughly a third of the code you used to).
I also added a new tool to extract code examples into files so that it's
easier to actually run them and check that they work.
## Example
```shell
go run docs/tools/extract-examples/main.go docs/api/openai-compatibility.mdx
```
Output:
```
Extracting code examples to: /var/folders/vq/wfm2g6k917d3ldzpjdxc8ph00000gn/T/mdx-examples-3271754368
- 01_basic.py
- 01_basic.js
- 01_basic.sh
- 02_responses.py
- 02_responses.js
- 02_responses.sh
- 03_vision.py
- 03_vision.js
- 03_vision.sh
Extracted 9 file(s) to /var/folders/vq/wfm2g6k917d3ldzpjdxc8ph00000gn/T/mdx-examples-3271754368
To run examples:
cd /var/folders/vq/wfm2g6k917d3ldzpjdxc8ph00000gn/T/mdx-examples-3271754368
npm install # for JS examples
then run individual files with `node file.js`, `python file.py`, `bash file.sh`
```
In the future we should consider actually running the examples in CI and
having some sort of acceptance test so we can automatically detect when
our examples break. So this is just a start in that direction.
Co-authored-by: Parth Sareen <parth.sareen@ollama.com>
This PR detects embedding models and sets batch_size = context_size so the full input fits in a single batch.
Previously, if batch size was smaller than the input, tokens could be split across batches and cause a SIGTRAP crash.
This change ensures all tokens stay in one batch and prevents crashes.
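A minimal sketch of the sizing rule with illustrative names:

```go
package sketch

// batchSizeFor: embedding models get batchSize == contextSize so the full
// input always fits in a single batch and tokens are never split across
// batches (the crash trigger described above).
func batchSizeFor(isEmbedding bool, contextSize, requested int) int {
	if isEmbedding {
		return contextSize
	}
	return requested
}
```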
Fixes: #12938 #13054
Co-authored-by: Jesse Gross <jesse@ollama.com>