Commit Graph

388 Commits

Author SHA1 Message Date
Blake Mizerany 3519dd1c6e
server/internal/client/ollama: hold DiskCache on Registry (#9463)
Previously, using a Registry required a DiskCache to be passed in for
use in various methods. This was a bit cumbersome, as the DiskCache is
required for most operations, and the DefaultCache is used in most of
those cases. This change makes the DiskCache an optional field on the
Registry struct.

This also changes DefaultCache to initialize on first use. This is to
not burden clients with the cost of creating a new cache per use, or
having to hold onto a cache for the lifetime of the Registry.

Also, slip in some minor docs updates for Trace.
2025-03-02 20:55:44 -08:00
Blake Mizerany 2412adf42b
server/internal: replace model delete API with new registry handler. (#9347)
This commit introduces a new API implementation for handling
interactions with the registry and the local model cache. The new API is
located in server/internal/registry. The package name is "registry" and
should be considered temporary; it is hidden and not bleeding outside of
the server package. As the commits roll in, we'll start consuming more
of the API and then let reverse osmosis take effect, at which point it
will surface closer to the root level packages as much as needed.
2025-02-27 12:04:53 -08:00
Blake Mizerany 68bac1e0a6
server: group routes by category and purpose (#9270)
The route assembly in Handler lacked clear organization making it
difficult scan for routes and their relationships to each other. This
commit aims to fix that by reordering the assembly of routes to group
them by category and purpose.

Also, be more specific about what "config" refers to (it is about CORS
if you were wondering... I was.)
2025-02-21 21:02:26 -08:00
Lucas Hahn 351a85d9ea
openai: add 'timeout' to allowable x-stainless headers (#9237) 2025-02-19 21:56:18 -08:00
Jesse Gross ed443a0393 Runner for Ollama engine
This provides integration with the new Ollama engine
(5824541 next ollama runner (#7913)) and the rest of the Ollama
infrastructure such as the runner and Ollama server.

In addition, it also builds out the KV cache infrastructure to
support requirements of how Ollama runs models such as:
 - Parallel processing
 - Memory management for defragmentation and shifting
 - Multi-modal modals

Both old and new engines continue to be supported. By default, only
the old engine is used. To enable the new engine:

Start the server with the OLLAMA_NEW_ENGINE environment variable set:
OLLAMA_NEW_ENGINE=1 ./ollama serve

Start a model that is supported by the Ollama engine. This one is Llama 3.1 8b Q4_K_M:
./ollama run jessegross/llama3.1
2025-02-13 17:09:26 -08:00
Jesse Gross 6945617af5 models: Move model into their own directory
This allows there to be a file that is a list of models that is
not mixed into the runner code.
2025-02-13 17:09:26 -08:00
Michael Yang 58245413f4
next ollama runner (#7913)
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
Michael Yang dcfb7a105c
next build (#8539)
* add build to .dockerignore

* test: only build one arch

* add build to .gitignore

* fix ccache path

* filter amdgpu targets

* only filter if autodetecting

* Don't clobber gpu list for default runner

This ensures the GPU specific environment variables are set properly

* explicitly set CXX compiler for HIP

* Update build_windows.ps1

This isn't complete, but is close.  Dependencies are missing, and it only builds the "default" preset.

* build: add ollama subdir

* add .git to .dockerignore

* docs: update development.md

* update build_darwin.sh

* remove unused scripts

* llm: add cwd and build/lib/ollama to library paths

* default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS

* add additional cmake output vars for msvc

* interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12

* remove unncessary filepath.Dir, cleanup

* add hardware-specific directory to path

* use absolute server path

* build: linux arm

* cmake install targets

* remove unused files

* ml: visit each library path once

* build: skip cpu variants on arm

* build: install cpu targets

* build: fix workflow

* shorter names

* fix rocblas install

* docs: clean up development.md

* consistent build dir removal in development.md

* silence -Wimplicit-function-declaration build warnings in ggml-cpu

* update readme

* update development readme

* llm: update library lookup logic now that there is one runner (#8587)

* tweak development.md

* update docs

* add windows cuda/rocm tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2025-01-29 15:03:38 -08:00
isamu arimoto 6ae2adc1af
openai: accept additional headers to fix CORS errors (#8343) 2025-01-08 11:28:11 -08:00
Patrick Devine 86a622cbdc
Update the /api/create endpoint to use JSON (#7935)
Replaces `POST /api/create` to use JSON instead of a Modelfile.

This is a breaking change.
2024-12-31 18:02:30 -08:00
湛露先生 928de9050e
server: reuse InvalidModelNameErrMsg type (#8163) 2024-12-23 10:38:34 -05:00
Patrick Devine 8c9fb8eb73
imageproc mllama refactor (#7537)
Refactor mllama image processing code, and add pixtral and qwen2vl
2024-12-14 19:50:15 -08:00
Blake Mizerany b1fd7fef86
server: more support for mixed-case model names (#8017)
Fixes #7944
2024-12-11 15:29:59 -08:00
frob 757eeacc1b
server: lowercase hostname for Host header check (#5851) 2024-12-10 13:43:22 -08:00
Daniel Hiltgen 4879a234c4
build: Make target improvements (#7499)
* llama: wire up builtin runner

This adds a new entrypoint into the ollama CLI to run the cgo built runner.
On Mac arm64, this will have GPU support, but on all other platforms it will
be the lowest common denominator CPU build.  After we fully transition
to the new Go runners more tech-debt can be removed and we can stop building
the "default" runner via make and rely on the builtin always.

* build: Make target improvements

Add a few new targets and help for building locally.
This also adjusts the runner lookup to favor local builds, then
runners relative to the executable, and finally payloads.

* Support customized CPU flags for runners

This implements a simplified custom CPU flags pattern for the runners.
When built without overrides, the runner name contains the vector flag
we check for (AVX) to ensure we don't try to run on unsupported systems
and crash.  If the user builds a customized set, we omit the naming
scheme and don't check for compatibility.  This avoids checking
requirements at runtime, so that logic has been removed as well.  This
can be used to build GPU runners with no vector flags, or CPU/GPU
runners with additional flags (e.g. AVX512) enabled.

* Use relative paths

If the user checks out the repo in a path that contains spaces, make gets
really confused so use relative paths for everything in-repo to avoid breakage.

* Remove payloads from main binary

* install: clean up prior libraries

This removes support for v0.3.6 and older versions (before the tar bundle)
and ensures we clean up prior libraries before extracting the bundle(s).
Without this change, runners and dependent libraries could leak when we
update and lead to subtle runtime errors.
2024-12-10 09:47:19 -08:00
Parth Sareen c6c526275d
api: add generate endpoint for structured outputs (#7939) 2024-12-04 17:37:12 -08:00
Parth Sareen 630e7dc6ff
api: structured outputs - chat endpoint (#7900)
Adds structured outputs to chat endpoint
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com>
2024-12-04 16:31:19 -08:00
Jeffrey Morgan d543b282a7
server: add warning message for deprecated context field (#7878) 2024-11-30 14:05:50 -08:00
Parth Sareen 5f8051180e
Enable index tracking for tools - openai api support (#7888) 2024-11-29 20:00:09 -08:00
Parth Sareen ce7455a8e1
api: enable tool streaming (#7836) 2024-11-27 13:40:57 -08:00
oza6ut0ne 31cb1ca9e5
openai: accept X-Stainless-Retry-Count header (#6910) 2024-11-23 12:39:05 -08:00
Daniel Hiltgen f602ab4de4
expose underlying error on embedding failure (#7743)
Avoid a round-trip asking users for logs to see what went wrong.
2024-11-19 16:26:05 -08:00
Blake Mizerany 4b8a2e341a
server: allow mixed-case model names on push, pull, cp, and create (#7676)
This change allows for mixed-case model names to be pushed, pulled,
copied, and created, which was previously disallowed because the Ollama
registry was backed by a Docker registry that enforced a naming
convention that disallowed mixed-case names, which is no longer the
case.

This does not break existing, intended, behaviors.

Also, make TestCase test a story of creating, updating, pulling, and
copying a model with case variations, ensuring the model's manifest is
updated correctly, and not duplicated across different files with
different case variations.
2024-11-19 15:05:57 -08:00
Daniel Hiltgen a4c70fe157
One corrupt manifest should not wedge model operations (#7515)
One potential failure mode is an empty file which bubbles up as an EOF error,
leading to all pulls and listing operations failing.  Instead, continue and
warn about the corrupt manifest.  This also allows re-pulling the corrupt
manifest to repair the system.
2024-11-05 14:21:45 -08:00
Daniel Hiltgen 4ebfa2cb91
Quiet down debug log of image payload (#7454)
Avoid excessive log spew and make consistent with chat logging
2024-11-04 13:05:16 -08:00
Jesse Gross c826e57475 runner.go: Better abstract vision model integration
-Update mllama to take the cross attention state as embeddings in
a batch, more similar to how Llava handles it. This improves
integration with the input cache.
-Pass locations in a prompt for embeddings using tags similar to Llava.
-Abstract interface to vision models so the main runner accesses Clip
and Mllama similarly

Co-authored-by: Michael Yang <mxyng@pm.me>
2024-10-30 14:53:43 -07:00
Patrick Devine 084929c293
add mllama image processing to the generate handler (#7384) 2024-10-28 13:51:19 -07:00
Patrick Devine c7cb0f0602
image processing for llama3.2 (#6963)
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Daniel Hiltgen 05cd82ef94
Rename gpu package discover (#7143)
Cleaning up go package naming
2024-10-16 17:45:00 -07:00
Alex Mavrogiannis f40bb398f6
Stop model before deletion if loaded (fixed #6957) (#7050) 2024-10-01 15:45:43 -07:00
Daniel Hiltgen cd5c8f6471
Optimize container images for startup (#6547)
* Optimize container images for startup

This change adjusts how to handle runner payloads to support
container builds where we keep them extracted in the filesystem.
This makes it easier to optimize the cpu/cuda vs cpu/rocm images for
size, and should result in faster startup times for container images.

* Refactor payload logic and add buildx support for faster builds

* Move payloads around

* Review comments

* Converge to buildx based helper scripts

* Use docker buildx action for release
2024-09-12 12:10:30 -07:00
Patrick Devine abed273de3
add "stop" command (#6739) 2024-09-11 16:36:21 -07:00
Jeffrey Morgan 47fa0839b9
server: clean up route names for consistency (#6524) 2024-08-26 19:36:11 -07:00
royjhan 8b00a415ab
Load Embedding Model on Empty Input (#6325)
* load on empty input

* no load on invalid input
2024-08-13 10:19:56 -07:00
Jeffrey Morgan 15c2d8fe14
server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)
For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler as it builds on existing parallel requests, similar to existing text completion functionality.
2024-08-11 11:57:10 -07:00
Jesse Gross 1829fb61bd manifest: Fix crash on startup when trying to clean up unused files (#5840)
Currently if the config field is missing in the manifest file (or
corrupted), Ollama will crash when it tries to read it. This can
happen at startup or when pulling new models.

This data is mostly just used for showing model information so we
can be tolerant of it not being present - it is not required to
run the models. Besides avoiding crashing, this also gives us the
ability to restructure the config in the future by pulling it
into the main manifest file.
2024-08-07 10:30:44 -07:00
Michael Yang b732beba6a lint 2024-08-01 17:06:06 -07:00
Vyacheslav Moskalev 8a9f946ca7 Refactor and format code. 2024-08-02 03:50:05 +07:00
Vyacheslav Moskalev 3b5210548e Refactor code. Remove extra variable. 2024-08-01 19:56:15 +07:00
Vyacheslav Moskalev b0c216584c Better types and naming closer to style. 2024-08-01 19:43:44 +07:00
Vyacheslav Moskalev 49a5483139 Change the order of context and prompt. 2024-08-01 19:25:56 +07:00
Vyacheslav Moskalev 6bc5c13758 Fix extra context concatenation in generate handler (#5980). 2024-08-01 15:45:58 +07:00
Michael Yang c4c84b7a0d
Merge pull request #5196 from ollama/mxyng/messages-2
include modelfile messages
2024-07-31 10:18:17 -07:00
Michael Yang 5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
fix: environ lookup
2024-07-31 10:18:05 -07:00
royjhan 1b44d873e7
Add Metrics to `api\embed` response (#5709)
* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* update tests

* test name

* list metrics
2024-07-30 13:12:21 -07:00
Michael Yang 15af558423 include modelfile messages 2024-07-26 11:40:11 -07:00
Josh db0968f30c
fix dupe err message (#5857) 2024-07-22 15:48:15 -07:00
Michael Yang 55cd3ddcca bool 2024-07-22 11:27:21 -07:00
Michael Yang d1a5227cad origins 2024-07-22 11:25:30 -07:00
Michael Yang 35b89b2eab rfc: dynamic environ lookup 2024-07-22 11:25:30 -07:00
Jeffrey Morgan b3e5491e41
server: collect nested tool call objects when parsing (#5824) 2024-07-22 12:38:03 -04:00
Josh e8b954c646
server: validate template (#5734)
add template validation to modelfile
2024-07-19 15:24:29 -07:00
Jeffrey Morgan 70b1010fa5
server: check for empty tools array too (#5779) 2024-07-18 11:44:57 -07:00
Jeffrey Morgan 319fb1ce03
server: only parse tool calls if tools are provided (#5771)
* server: only parse tool calls if tools are provided

* still set `resp.Message.Content`
2024-07-18 08:50:23 -07:00
Michael Yang c279f96371 remove ToolCall from GenerateResponse 2024-07-16 15:22:49 -07:00
Michael Yang d290e87513 add suffix support to generate endpoint
this change is triggered by the presence of "suffix", particularly
useful for code completion tasks
2024-07-16 14:31:35 -07:00
royjhan 987dbab0b0
OpenAI: /v1/embeddings compatibility (#5285)
* OpenAI v1 models

* Empty List Testing

* Add back envconfig

* v1/models docs

* Remove Docs

* OpenAI batch embed compatibility

* merge conflicts

* integrate with api/embed

* ep

* merge conflicts

* request tests

* rm resp test

* merge conflict

* merge conflict

* test fixes

* test fn renaming

* input validation for empty string

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-16 13:36:08 -07:00
Jeffrey Morgan 4cb5d7decc
server: omit model system prompt if empty (#5717) 2024-07-16 11:09:00 -07:00
Michael Yang d02bbebb11 tools 2024-07-15 15:26:16 -07:00
royjhan b9f5e16c80
Introduce `/api/embed` endpoint supporting batch embedding (#5127)
* Initial Batch Embedding

* Revert "Initial Batch Embedding"

This reverts commit c22d54895a.

* Initial Draft

* mock up notes

* api/embed draft

* add server function

* check normalization

* clean up

* normalization

* playing around with truncate stuff

* Truncation

* Truncation

* move normalization to go

* Integration Test Template

* Truncation Integration Tests

* Clean up

* use float32

* move normalize

* move normalize test

* refactoring

* integration float32

* input handling and handler testing

* Refactoring of legacy and new

* clear comments

* merge conflicts

* touches

* embedding type 64

* merge conflicts

* fix hanging on single string

* refactoring

* test values

* set context length

* clean up

* testing clean up

* testing clean up

* remove function closure

* Revert "remove function closure"

This reverts commit 55d48c6ed1.

* remove function closure

* remove redundant error check

* clean up

* more clean up

* clean up
2024-07-15 12:14:24 -07:00
Patrick Devine 057d31861e
remove template (#5655) 2024-07-13 20:56:24 -07:00
jmorganca f7ee012300 server: prepend system message in chat handler 2024-07-13 15:08:00 -07:00
Jeffrey Morgan 1ed0aa8fea
server: fix `context`, `load_duration` and `total_duration` fields (#5676)
* server: fix `contet`, `load_duration` and `total_duration` fields

* Update server/routes.go
2024-07-13 09:25:31 -07:00
Michael Yang ac7a842e55 fix model reloading
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Michael Yang 2c3fe1fd97 comments 2024-07-05 13:17:24 -07:00
Michael Yang 269ed6e6a2 update message processing 2024-07-05 13:16:58 -07:00
Daniel Hiltgen 955f2a4e03 Only set default keep_alive on initial model load
This change fixes the handling of keep_alive so that if client
request omits the setting, we only set this on initial load.  Once
the model is loaded, if new requests leave this unset, we'll keep
whatever keep_alive was there.
2024-07-03 15:29:56 -07:00
Michael Yang 65a5040e09 fix generate template 2024-07-02 16:42:17 -07:00
royjhan d626b99b54
OpenAI: v1/completions compatibility (#5209)
* OpenAI v1 models

* Refactor Writers

* Add Test

Co-Authored-By: Attila Kerekes

* Credit Co-Author

Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>

* Empty List Testing

* Use Namespace for Ownedby

* Update Test

* Add back envconfig

* v1/models docs

* Use ModelName Parser

* Test Names

* Remove Docs

* Clean Up

* Test name

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Add Middleware for Chat and List

* Completions Endpoint

* Testing Cleanup

* Test with Fatal

* Add functionality to chat test

* Rename function

* float types

* type cleanup

* cleaning

* more cleaning

* Extra test cases

* merge conflicts

* merge conflicts

* merge conflicts

* merge conflicts

* cleaning

* cleaning

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 16:01:45 -07:00
Michael Yang dddb58a38b
Merge pull request #5051 from ollama/mxyng/capabilities
add model capabilities
2024-07-02 14:26:07 -07:00
royjhan 996bb1b85e
OpenAI: /v1/models and /v1/models/{model} compatibility (#5007)
* OpenAI v1 models

* Refactor Writers

* Add Test

Co-Authored-By: Attila Kerekes

* Credit Co-Author

Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>

* Empty List Testing

* Use Namespace for Ownedby

* Update Test

* Add back envconfig

* v1/models docs

* Use ModelName Parser

* Test Names

* Remove Docs

* Clean Up

* Test name

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Add Middleware for Chat and List

* Testing Cleanup

* Test with Fatal

* Add functionality to chat test

* OpenAI: /v1/models/{model} compatibility (#5028)

* Retrieve Model

* OpenAI Delete Model

* Retrieve Middleware

* Remove Delete from Branch

* Update Test

* Middleware Test File

* Function name

* Cleanup

* Test Update

* Test Update

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 11:50:56 -07:00
Michael Yang a30915bde1 add capabilities 2024-07-01 10:47:43 -07:00
Michael Yang 58e3fff311 rename templates to template 2024-07-01 10:40:54 -07:00
Daniel Hiltgen 3518aaef33
Merge pull request #4218 from dhiltgen/auto_parallel
Enable concurrency by default
2024-07-01 08:32:29 -07:00
Blake Mizerany cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
Previously, some costly things were causing the loading of GGUF files
and their metadata and tensor information to be VERY slow:

  * Too many allocations when decoding strings
  * Hitting disk for each read of each key and value, resulting in a
    not-okay amount of syscalls/disk I/O.

The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro
m3.

This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
Daniel Hiltgen 642cee1342 Sort the ps output
Provide consistent ordering for the ps command - longest duration listed first
2024-06-21 15:59:41 -07:00
royjhan fedf71635e
Extend api/show and ollama show to return more model info (#4881)
* API Show Extended

* Initial Draft of Information

Co-Authored-By: Patrick Devine <pdevine@sonic.net>

* Clean Up

* Descriptive arg error messages and other fixes

* Second Draft of Show with Projectors Included

* Remove Chat Template

* Touches

* Prevent wrapping from files

* Verbose functionality

* Docs

* Address Feedback

* Lint

* Resolve Conflicts

* Function Name

* Tests for api/show model info

* Show Test File

* Add Projector Test

* Clean routes

* Projector Check

* Move Show Test

* Touches

* Doc update

---------

Co-authored-by: Patrick Devine <pdevine@sonic.net>
2024-06-19 14:19:02 -07:00
royjhan 89c79bec8c
Add ModifiedAt Field to /api/show (#5033)
* Add Mod Time to Show

* Error Handling
2024-06-15 20:53:56 -07:00
royjhan 1a29e9a879
API app/browser access (#4879)
* API app/browser access

* Add tauri (resolves #2291, #4791, #3799, #4388)
2024-06-06 15:19:03 -07:00
royjhan 4bf1da4944
Separate ListResponse and ModelResponse for api/tags vs api/ps (#4842)
* Remove false time fields

* Struct Separation for List and Process

* Remove Marshaler
2024-06-06 10:11:45 -07:00
Michael Yang d61ef8b954 update create handler to use model.Name 2024-06-04 13:28:25 -07:00
Michael Yang e40145a39d lint 2024-06-04 11:13:30 -07:00
Michael Yang 04f3c12bb7 replace x/exp/slices with slices 2024-06-04 11:13:30 -07:00
Michael Yang 96bc232b43
Merge pull request #4413 from ollama/mxyng/name-check
check if name exists before create/pull/copy
2024-05-29 12:06:58 -07:00
Michael Yang bca7b12284
Merge pull request #3718 from ollama/mxyng/modelname-3
update delete handler to use model.Name
2024-05-29 12:02:07 -07:00
Patrick Devine 4cc3be3035
Move envconfig and consolidate env vars (#4608) 2024-05-24 14:57:15 -07:00
Michael Yang f36f1d6be9 tidy intermediate blobs 2024-05-20 15:15:06 -07:00
Michael Yang 3520c0e4d5 cache and reuse intermediate blobs
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine ccdf0b2a44
Move the parser back + handle utf16 files (#4533) 2024-05-20 11:26:45 -07:00
Daniel Hiltgen 02b31c9dc8 Don't return error on signal exit 2024-05-16 16:25:38 -07:00
Patrick Devine d1692fd3e0
fix the cpu estimatedTotal memory + get the expiry time for loading models (#4461) 2024-05-15 15:43:16 -07:00
Patrick Devine f2cf97d6f1
fix typo in modelfile generation (#4439) 2024-05-14 15:34:29 -07:00
Michael Yang 85a57006d1 check if name exists before create/pull/copy 2024-05-14 14:58:58 -07:00
Michael Yang c2714fcbfd routes: use Manifests for ListHandler 2024-05-14 14:08:24 -07:00
Michael Yang a2fc933fed update delete handler to use model.Name 2024-05-14 14:08:24 -07:00
Ryo Machida 798b107f19
Fixed the API endpoint /api/tags when the model list is empty. (#4424)
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.

* Update server/routes.go

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-14 11:18:10 -07:00
Patrick Devine 7ca71a6b0f
don't abort when an invalid model name is used in /save (#4416) 2024-05-13 18:48:28 -07:00
Patrick Devine 6845988807
Ollama `ps` command for showing currently loaded models (#4327) 2024-05-13 17:17:36 -07:00
Jeffrey Morgan 6602e793c0
Use `--quantize` flag and `quantize` api parameter (#4321)
* rename `--quantization` to `--quantize`

* backwards

* Update api/types.go

Co-authored-by: Michael Yang <mxyng@pm.me>

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2024-05-10 13:06:13 -07:00
Michael Yang e03637176d fix(routes): skip bad manifests 2024-05-10 08:46:11 -07:00