Commit Graph

350 Commits

Author SHA1 Message Date
Vyacheslav Moskalev 3b5210548e Refactor code. Remove extra variable. 2024-08-01 19:56:15 +07:00
Vyacheslav Moskalev b0c216584c Better types and naming closer to style. 2024-08-01 19:43:44 +07:00
Vyacheslav Moskalev 49a5483139 Change the order of context and prompt. 2024-08-01 19:25:56 +07:00
Vyacheslav Moskalev 6bc5c13758 Fix extra context concatenation in generate handler (#5980). 2024-08-01 15:45:58 +07:00
Michael Yang c4c84b7a0d
Merge pull request #5196 from ollama/mxyng/messages-2
include modelfile messages
2024-07-31 10:18:17 -07:00
Michael Yang 5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
fix: environ lookup
2024-07-31 10:18:05 -07:00
royjhan 1b44d873e7
Add Metrics to `api\embed` response (#5709)
* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* update tests

* test name

* list metrics
2024-07-30 13:12:21 -07:00
Michael Yang 15af558423 include modelfile messages 2024-07-26 11:40:11 -07:00
Josh db0968f30c
fix dupe err message (#5857) 2024-07-22 15:48:15 -07:00
Michael Yang 55cd3ddcca bool 2024-07-22 11:27:21 -07:00
Michael Yang d1a5227cad origins 2024-07-22 11:25:30 -07:00
Michael Yang 35b89b2eab rfc: dynamic environ lookup 2024-07-22 11:25:30 -07:00
Jeffrey Morgan b3e5491e41
server: collect nested tool call objects when parsing (#5824) 2024-07-22 12:38:03 -04:00
Josh e8b954c646
server: validate template (#5734)
add template validation to modelfile
2024-07-19 15:24:29 -07:00
Jeffrey Morgan 70b1010fa5
server: check for empty tools array too (#5779) 2024-07-18 11:44:57 -07:00
Jeffrey Morgan 319fb1ce03
server: only parse tool calls if tools are provided (#5771)
* server: only parse tool calls if tools are provided

* still set `resp.Message.Content`
2024-07-18 08:50:23 -07:00
Michael Yang c279f96371 remove ToolCall from GenerateResponse 2024-07-16 15:22:49 -07:00
Michael Yang d290e87513 add suffix support to generate endpoint
this change is triggered by the presence of "suffix", particularly
useful for code completion tasks
2024-07-16 14:31:35 -07:00
royjhan 987dbab0b0
OpenAI: /v1/embeddings compatibility (#5285)
* OpenAI v1 models

* Empty List Testing

* Add back envconfig

* v1/models docs

* Remove Docs

* OpenAI batch embed compatibility

* merge conflicts

* integrate with api/embed

* ep

* merge conflicts

* request tests

* rm resp test

* merge conflict

* merge conflict

* test fixes

* test fn renaming

* input validation for empty string

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-16 13:36:08 -07:00
Jeffrey Morgan 4cb5d7decc
server: omit model system prompt if empty (#5717) 2024-07-16 11:09:00 -07:00
Michael Yang d02bbebb11 tools 2024-07-15 15:26:16 -07:00
royjhan b9f5e16c80
Introduce `/api/embed` endpoint supporting batch embedding (#5127)
* Initial Batch Embedding

* Revert "Initial Batch Embedding"

This reverts commit c22d54895a.

* Initial Draft

* mock up notes

* api/embed draft

* add server function

* check normalization

* clean up

* normalization

* playing around with truncate stuff

* Truncation

* Truncation

* move normalization to go

* Integration Test Template

* Truncation Integration Tests

* Clean up

* use float32

* move normalize

* move normalize test

* refactoring

* integration float32

* input handling and handler testing

* Refactoring of legacy and new

* clear comments

* merge conflicts

* touches

* embedding type 64

* merge conflicts

* fix hanging on single string

* refactoring

* test values

* set context length

* clean up

* testing clean up

* testing clean up

* remove function closure

* Revert "remove function closure"

This reverts commit 55d48c6ed1.

* remove function closure

* remove redundant error check

* clean up

* more clean up

* clean up
2024-07-15 12:14:24 -07:00
Patrick Devine 057d31861e
remove template (#5655) 2024-07-13 20:56:24 -07:00
jmorganca f7ee012300 server: prepend system message in chat handler 2024-07-13 15:08:00 -07:00
Jeffrey Morgan 1ed0aa8fea
server: fix `context`, `load_duration` and `total_duration` fields (#5676)
* server: fix `contet`, `load_duration` and `total_duration` fields

* Update server/routes.go
2024-07-13 09:25:31 -07:00
Michael Yang ac7a842e55 fix model reloading
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Michael Yang 2c3fe1fd97 comments 2024-07-05 13:17:24 -07:00
Michael Yang 269ed6e6a2 update message processing 2024-07-05 13:16:58 -07:00
Daniel Hiltgen 955f2a4e03 Only set default keep_alive on initial model load
This change fixes the handling of keep_alive so that if client
request omits the setting, we only set this on initial load.  Once
the model is loaded, if new requests leave this unset, we'll keep
whatever keep_alive was there.
2024-07-03 15:29:56 -07:00
Michael Yang 65a5040e09 fix generate template 2024-07-02 16:42:17 -07:00
royjhan d626b99b54
OpenAI: v1/completions compatibility (#5209)
* OpenAI v1 models

* Refactor Writers

* Add Test

Co-Authored-By: Attila Kerekes

* Credit Co-Author

Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>

* Empty List Testing

* Use Namespace for Ownedby

* Update Test

* Add back envconfig

* v1/models docs

* Use ModelName Parser

* Test Names

* Remove Docs

* Clean Up

* Test name

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Add Middleware for Chat and List

* Completions Endpoint

* Testing Cleanup

* Test with Fatal

* Add functionality to chat test

* Rename function

* float types

* type cleanup

* cleaning

* more cleaning

* Extra test cases

* merge conflicts

* merge conflicts

* merge conflicts

* merge conflicts

* cleaning

* cleaning

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 16:01:45 -07:00
Michael Yang dddb58a38b
Merge pull request #5051 from ollama/mxyng/capabilities
add model capabilities
2024-07-02 14:26:07 -07:00
royjhan 996bb1b85e
OpenAI: /v1/models and /v1/models/{model} compatibility (#5007)
* OpenAI v1 models

* Refactor Writers

* Add Test

Co-Authored-By: Attila Kerekes

* Credit Co-Author

Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>

* Empty List Testing

* Use Namespace for Ownedby

* Update Test

* Add back envconfig

* v1/models docs

* Use ModelName Parser

* Test Names

* Remove Docs

* Clean Up

* Test name

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Add Middleware for Chat and List

* Testing Cleanup

* Test with Fatal

* Add functionality to chat test

* OpenAI: /v1/models/{model} compatibility (#5028)

* Retrieve Model

* OpenAI Delete Model

* Retrieve Middleware

* Remove Delete from Branch

* Update Test

* Middleware Test File

* Function name

* Cleanup

* Test Update

* Test Update

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 11:50:56 -07:00
Michael Yang a30915bde1 add capabilities 2024-07-01 10:47:43 -07:00
Michael Yang 58e3fff311 rename templates to template 2024-07-01 10:40:54 -07:00
Daniel Hiltgen 3518aaef33
Merge pull request #4218 from dhiltgen/auto_parallel
Enable concurrency by default
2024-07-01 08:32:29 -07:00
Blake Mizerany cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
Previously, some costly things were causing the loading of GGUF files
and their metadata and tensor information to be VERY slow:

  * Too many allocations when decoding strings
  * Hitting disk for each read of each key and value, resulting in a
    not-okay amount of syscalls/disk I/O.

The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro
m3.

This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
Daniel Hiltgen 642cee1342 Sort the ps output
Provide consistent ordering for the ps command - longest duration listed first
2024-06-21 15:59:41 -07:00
royjhan fedf71635e
Extend api/show and ollama show to return more model info (#4881)
* API Show Extended

* Initial Draft of Information

Co-Authored-By: Patrick Devine <pdevine@sonic.net>

* Clean Up

* Descriptive arg error messages and other fixes

* Second Draft of Show with Projectors Included

* Remove Chat Template

* Touches

* Prevent wrapping from files

* Verbose functionality

* Docs

* Address Feedback

* Lint

* Resolve Conflicts

* Function Name

* Tests for api/show model info

* Show Test File

* Add Projector Test

* Clean routes

* Projector Check

* Move Show Test

* Touches

* Doc update

---------

Co-authored-by: Patrick Devine <pdevine@sonic.net>
2024-06-19 14:19:02 -07:00
royjhan 89c79bec8c
Add ModifiedAt Field to /api/show (#5033)
* Add Mod Time to Show

* Error Handling
2024-06-15 20:53:56 -07:00
royjhan 1a29e9a879
API app/browser access (#4879)
* API app/browser access

* Add tauri (resolves #2291, #4791, #3799, #4388)
2024-06-06 15:19:03 -07:00
royjhan 4bf1da4944
Separate ListResponse and ModelResponse for api/tags vs api/ps (#4842)
* Remove false time fields

* Struct Separation for List and Process

* Remove Marshaler
2024-06-06 10:11:45 -07:00
Michael Yang d61ef8b954 update create handler to use model.Name 2024-06-04 13:28:25 -07:00
Michael Yang e40145a39d lint 2024-06-04 11:13:30 -07:00
Michael Yang 04f3c12bb7 replace x/exp/slices with slices 2024-06-04 11:13:30 -07:00
Michael Yang 96bc232b43
Merge pull request #4413 from ollama/mxyng/name-check
check if name exists before create/pull/copy
2024-05-29 12:06:58 -07:00
Michael Yang bca7b12284
Merge pull request #3718 from ollama/mxyng/modelname-3
update delete handler to use model.Name
2024-05-29 12:02:07 -07:00
Patrick Devine 4cc3be3035
Move envconfig and consolidate env vars (#4608) 2024-05-24 14:57:15 -07:00
Michael Yang f36f1d6be9 tidy intermediate blobs 2024-05-20 15:15:06 -07:00
Michael Yang 3520c0e4d5 cache and reuse intermediate blobs
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine ccdf0b2a44
Move the parser back + handle utf16 files (#4533) 2024-05-20 11:26:45 -07:00
Daniel Hiltgen 02b31c9dc8 Don't return error on signal exit 2024-05-16 16:25:38 -07:00
Patrick Devine d1692fd3e0
fix the cpu estimatedTotal memory + get the expiry time for loading models (#4461) 2024-05-15 15:43:16 -07:00
Patrick Devine f2cf97d6f1
fix typo in modelfile generation (#4439) 2024-05-14 15:34:29 -07:00
Michael Yang 85a57006d1 check if name exists before create/pull/copy 2024-05-14 14:58:58 -07:00
Michael Yang c2714fcbfd routes: use Manifests for ListHandler 2024-05-14 14:08:24 -07:00
Michael Yang a2fc933fed update delete handler to use model.Name 2024-05-14 14:08:24 -07:00
Ryo Machida 798b107f19
Fixed the API endpoint /api/tags when the model list is empty. (#4424)
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.

* Update server/routes.go

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-14 11:18:10 -07:00
Patrick Devine 7ca71a6b0f
don't abort when an invalid model name is used in /save (#4416) 2024-05-13 18:48:28 -07:00
Patrick Devine 6845988807
Ollama `ps` command for showing currently loaded models (#4327) 2024-05-13 17:17:36 -07:00
Jeffrey Morgan 6602e793c0
Use `--quantize` flag and `quantize` api parameter (#4321)
* rename `--quantization` to `--quantize`

* backwards

* Update api/types.go

Co-authored-by: Michael Yang <mxyng@pm.me>

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2024-05-10 13:06:13 -07:00
Michael Yang e03637176d fix(routes): skip bad manifests 2024-05-10 08:46:11 -07:00
Daniel Hiltgen 3ae2f441e0 Fix race in shutdown logic
Ensure the runners are terminated
2024-05-09 15:54:02 -07:00
Daniel Hiltgen 8727a9c140 Record more GPU information
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Bruce MacDonald cfa84b8470
add done_reason to the api (#4235) 2024-05-09 13:30:14 -07:00
Michael Yang a7ee84fc31 routes: skip invalid filepaths 2024-05-09 11:23:22 -07:00
Jeffrey Morgan d5eec16d23
use model defaults for `num_gqa`, `rope_frequency_base ` and `rope_frequency_scale` (#1983) 2024-05-09 09:06:13 -07:00
Bruce MacDonald cef45feaa4
Add preflight OPTIONS handling and update CORS config (#4086)
* Add preflight OPTIONS handling and update CORS config

- Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.

- Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.

* allow auth, content-type, and user-agent headers

* Update routes.go
2024-05-08 13:14:00 -07:00
Bruce MacDonald 8cbd3e7510
skip hidden files in list models handler (#4247) 2024-05-07 19:01:45 -07:00
Bruce MacDonald dc9b1111e0 fix invalid destination error message 2024-05-07 17:35:52 -07:00
Michael Yang ffbd3d173f
Merge pull request #3715 from ollama/mxyng/modelname-2
update list handler to use model.Name
2024-05-07 15:21:39 -07:00
Michael Yang 1e0a669f75
Merge pull request #3682 from ollama/mxyng/quantize-all-the-things
quantize any fp16/fp32 model
2024-05-07 15:20:49 -07:00
Michael Yang 548a7df014 update list handler to use model.Name 2024-05-07 09:38:45 -07:00
Jeffrey Morgan 39d9d22ca3
close server on receiving signal (#4213) 2024-05-06 16:01:37 -07:00
Michael Yang 9685c34509 quantize any fp16/fp32 model
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00
Daniel Hiltgen f56aa20014 Centralize server config handling
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
Daniel Hiltgen 20f6c06569 Make maximum pending request configurable
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Michael Yang b7a87a22b6
Merge pull request #4059 from ollama/mxyng/parser-2
rename parser to model/file
2024-05-03 13:01:22 -07:00
Michael Yang e9ae607ece
Merge pull request #3892 from ollama/mxyng/parser
refactor modelfile parser
2024-05-02 17:04:47 -07:00
Michael Yang 45b6a12e45 server: target invalid 2024-05-01 12:40:45 -07:00
Michael Yang 119589fcb3 rename parser to model/file 2024-05-01 09:53:50 -07:00
Michael Yang 9cf0f2e973 use parser.Format instead of templating modelfile 2024-05-01 09:52:54 -07:00
Jeffrey Morgan bb31def011
return code `499` when user cancels request while a model is loading (#3955) 2024-04-26 17:38:29 -04:00
Michael Yang 592dae31c8 update copy to use model.Name 2024-04-24 15:54:54 -07:00
Daniel Hiltgen 34b9db5afc Request and model concurrency
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
Jeffrey Morgan a0b8a32eb4
Terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading (#3653)
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading

* use `unload` in signal handler
2024-04-15 12:09:32 -04:00
Michael Yang 9502e5661f cgo quantize 2024-04-08 15:31:08 -07:00
Michael Yang e1c9a2a00f no blob create if already exists 2024-04-08 15:09:48 -07:00
Daniel Hiltgen 6589eb8a8c Revert options as a ref in the server 2024-04-02 16:44:10 -07:00
Daniel Hiltgen 58d95cc9bd Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems.  This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Michael Yang 91b3e4d282 update memory calcualtions
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang af8a8a6b59 fix: trim quotes on OLLAMA_ORIGINS 2024-03-27 15:24:29 -07:00
Patrick Devine 1b272d5bcd
change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347) 2024-03-26 13:04:17 -07:00
Blake Mizerany 703684a82a
server: replace blob prefix separator from ':' to '-' (#3146)
This fixes issues with blob file names that contain ':' characters to be rejected by file systems that do not support them.
2024-03-14 20:18:06 -07:00
Patrick Devine 47cfe58af5
Default Keep Alive environment variable (#3094)
---------

Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>
2024-03-13 13:29:40 -07:00
Daniel Hiltgen 4a5c9b8035 Finish unwinding idempotent payload logic
The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent.  This finishes switching to always extract
the payloads, and now that idempotentcy is gone, the
version directory is no longer useful.
2024-03-09 08:34:39 -08:00
Jeffrey Morgan 5b3fad9636 separate out `isLocalIP` 2024-03-09 00:22:08 -08:00
Jeffrey Morgan bfec2c6e10 simplify host checks 2024-03-08 23:29:53 -08:00
Jeffrey Morgan 5c143af726 add additional allowed hosts 2024-03-08 23:23:59 -08:00
Jeffrey Morgan fc8c044584
add allowed host middleware and remove `workDir` middleware (#3018) 2024-03-08 22:23:47 -08:00