Commit Graph

765 Commits

Author SHA1 Message Date
Michael Yang 4ecc70d3b4
Merge pull request #6386 from zwwhdls/fix-new-layer
fix: chmod new layer to 0o644 when creating it
2024-08-21 10:58:45 -07:00
Daniel Hiltgen 88e7705079
Merge pull request #6402 from rick-github/numParallel
Override numParallel in pickBestPartialFitByLibrary() only if unset.
2024-08-19 11:07:22 -07:00
Jeffrey Morgan 9fddef3731
server: limit upload parts to 16 (#6411) 2024-08-19 09:20:52 -07:00
Richard Lyons 885cf45087 Fix white space. 2024-08-18 03:07:16 +02:00
Richard Lyons 9352eeb752 Reset NumCtx. 2024-08-18 02:55:01 +02:00
Richard Lyons 0ad0e738cd Override numParallel only if unset. 2024-08-18 01:43:26 +02:00
zwwhdls bdc4308afb fix: chmod new layer to 0o644 when creating it
Signed-off-by: zwwhdls <zww@hdls.me>
2024-08-16 11:43:19 +08:00
Michael Yang 3a75e74e34 only skip invalid json manifests 2024-08-15 10:29:14 -07:00
Michael Yang 237dccba1e skip invalid manifest files 2024-08-14 16:55:45 -07:00
Michael Yang b3f75fc812 fix noprune 2024-08-14 15:48:51 -07:00
Blake Mizerany 8e1050f366
server: reduce max connections used in download (#6347)
The previous value of 64 was WAY too high and unnecessary. It reached
diminishing returns and blew past it. This is a more reasonable number
for _most_ normal cases. For users on cloud servers with excellent
network quality, this will keep screaming for them, without hitting our
CDN limits. For users with relatively poor network quality, this will
keep them from saturating their network and causing other issues.
2024-08-13 16:47:35 -07:00
Michael Yang 2697d7f5aa lint
- fixes printf: non-constant format string in call to fmt.Printf
- fixes SA1032: arguments have the wrong order
- disables testifylint
2024-08-13 14:36:33 -07:00
royjhan 8b00a415ab
Load Embedding Model on Empty Input (#6325)
* load on empty input

* no load on invalid input
2024-08-13 10:19:56 -07:00
Josh 980dd15f81
cmd: speed up gguf creates (#6324) 2024-08-12 11:46:09 -07:00
Josh 1dc3ef3aa9
Revert "server: speed up single gguf creates (#5898)" (#6323)
This reverts commit 8aac22438e.
2024-08-12 09:57:51 -07:00
Josh 8aac22438e
server: speed up single gguf creates (#5898) 2024-08-12 09:28:55 -07:00
Jeffrey Morgan 15c2d8fe14
server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)
For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler as it builds on existing parallel requests, similar to existing text completion functionality.
2024-08-11 11:57:10 -07:00
Jesse Gross 9b53e39d8e
Merge pull request #6258 from coolljt0725/fix_typo
server/download.go: Fix a typo in log
2024-08-09 17:19:48 -07:00
Daniel Hiltgen 2fa1db4345 Don't hard fail on sparse setup error
It seems this can fail in some casees, but proceed
with the download anyway.
2024-08-09 12:16:19 -07:00
Jitang Lei 7b61eba471 server/download.go: Fix a typo in log
Signed-off-by: Jitang Lei <leijitang@outlook.com>
2024-08-08 20:28:01 +08:00
Jesse Gross 7edaf6e7e8 manifest: Store layers inside manifests consistently as values.
Commit 1829fb61 ("manifest: Fix crash on startup when trying to clean up
unused files (#5840)") changed the config layer stored in manifests
from a pointer to a value. This was done in order to avoid potential
nil pointer dereferences after it is deserialized from JSON in the
event that the field is missing.

This changes the Layers slice to also be stored by value. This enables
consistency in handling across the two objects.
2024-08-07 17:03:06 -07:00
Jesse Gross 97ec8cfd4e image: Clarify argument to WriteManifest is config
When creating a model the config layer is appended to the list of
layers and then the last layer is used as the config when writing the
manifest. This change directly uses the config layer to write the
manifest. There is no behavior change but it is less error prone.
2024-08-07 16:58:42 -07:00
Jesse Gross 1829fb61bd manifest: Fix crash on startup when trying to clean up unused files (#5840)
Currently if the config field is missing in the manifest file (or
corrupted), Ollama will crash when it tries to read it. This can
happen at startup or when pulling new models.

This data is mostly just used for showing model information so we
can be tolerant of it not being present - it is not required to
run the models. Besides avoiding crashing, this also gives us the
ability to restructure the config in the future by pulling it
into the main manifest file.
2024-08-07 10:30:44 -07:00
Jesse Gross 685a53534b manifest: Don't prune layers if we can't open a manifest file
If there is an error when opening a manifest file (corrupted, permission denied, etc.)
then the referenced layers will not be included in the list of active
layers. This causes them to be deleted when pruning happens at startup
or a model is pulled.

In such a situation, we should prefer to preserve data in the hopes that
it can be recovered rather than being agressive about deletion.
2024-08-06 23:11:19 -07:00
Daniel Hiltgen fc85f50a2b Ensure sparse files on windows during download
The file.Truncate call on windows will write the whole file
unless you set the sparse flag, leading to heavy I/O at the
beginning of download.  This should improve our
I/O behavior on windows and put less stress on the users disk.
2024-08-06 10:58:08 -07:00
Michael Yang a091fadfda use testing tempdirs 2024-08-02 16:04:06 -07:00
Michael Yang b732beba6a lint 2024-08-01 17:06:06 -07:00
Michael Yang ff7c9060ec
Merge pull request #6115 from slouffka/fix-context
Fix context in /api/generate grows too much (#5980).
2024-08-01 15:13:59 -07:00
Michael Yang 0ff42e84b0
Merge pull request #4756 from ollama/mxyng/convert2
refactor convert
2024-08-01 14:16:30 -07:00
Vyacheslav Moskalev 8a9f946ca7 Refactor and format code. 2024-08-02 03:50:05 +07:00
Vyacheslav Moskalev 3b5210548e Refactor code. Remove extra variable. 2024-08-01 19:56:15 +07:00
Vyacheslav Moskalev b0c216584c Better types and naming closer to style. 2024-08-01 19:43:44 +07:00
Vyacheslav Moskalev 49a5483139 Change the order of context and prompt. 2024-08-01 19:25:56 +07:00
Vyacheslav Moskalev 6bc5c13758 Fix extra context concatenation in generate handler (#5980). 2024-08-01 15:45:58 +07:00
Michael Yang d87b4a488e fix modelfile message quotes 2024-07-31 16:52:09 -07:00
Blake Mizerany dc77bbcfa4
server: fix json marshalling of downloadBlobPart (#6108) 2024-07-31 16:01:24 -07:00
Michael Yang eafc607abb convert: only extract large files 2024-07-31 15:58:55 -07:00
Michael Yang df993fa37b comments 2024-07-31 15:58:55 -07:00
Michael Yang 5e9db9fb0b refactor convert 2024-07-31 15:58:33 -07:00
Michael Yang c4c84b7a0d
Merge pull request #5196 from ollama/mxyng/messages-2
include modelfile messages
2024-07-31 10:18:17 -07:00
Michael Yang 5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
fix: environ lookup
2024-07-31 10:18:05 -07:00
royjhan 1b44d873e7
Add Metrics to `api\embed` response (#5709)
* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* update tests

* test name

* list metrics
2024-07-30 13:12:21 -07:00
Daniel Hiltgen 345420998e Prevent partial loading on mixed GPU brands
In mult-brand GPU setups, if we couldn't fully load the model we
would fall through the scheduler and mistakenly try to load across
a mix of brands.  This makes sure we find the set of GPU(s) that
best fit for the partial load.
2024-07-30 11:00:55 -07:00
Michael Yang 079b2c3b03
Merge pull request #5999 from ollama/mxyng/fix-push
fix nil deref in auth.go
2024-07-26 14:28:34 -07:00
Blake Mizerany 750c1c55f7
server: fix race conditions during download (#5994)
This fixes various data races scattered throughout the download/pull
client where the client was accessing the download state concurrently.

This commit is mostly a hot-fix and will be replaced by a new client one
day soon.

Also, remove the unnecessary opts argument from downloadChunk.
2024-07-26 14:24:24 -07:00
Michael Yang a622c47bd3 fix nil deref in auth.go 2024-07-26 14:14:48 -07:00
Michael Yang ec4c35fe99
Merge pull request #5512 from ollama/mxyng/detect-stop
autodetect stop parameters from template
2024-07-26 13:48:23 -07:00
Michael Yang 15af558423 include modelfile messages 2024-07-26 11:40:11 -07:00
Blake Mizerany c8af3c2d96
server: reuse original download URL for images (#5962)
This changes the registry client to reuse the original download URL
it gets on the first redirect response for all subsequent requests,
preventing thundering herd issues when hot new LLMs are released.
2024-07-25 15:58:30 -07:00
Josh db0968f30c
fix dupe err message (#5857) 2024-07-22 15:48:15 -07:00
Michael Yang 85d9d73a72 comments 2024-07-22 11:49:03 -07:00
Michael Yang 1954ec5917 uint64 2024-07-22 11:49:02 -07:00
Michael Yang 0f1910129f int 2024-07-22 11:30:07 -07:00
Michael Yang 8570c1c0ef keepalive 2024-07-22 11:27:22 -07:00
Michael Yang 55cd3ddcca bool 2024-07-22 11:27:21 -07:00
Michael Yang 66fe77f084 models 2024-07-22 11:26:12 -07:00
Michael Yang d1a5227cad origins 2024-07-22 11:25:30 -07:00
Michael Yang 35b89b2eab rfc: dynamic environ lookup 2024-07-22 11:25:30 -07:00
Jeffrey Morgan b3e5491e41
server: collect nested tool call objects when parsing (#5824) 2024-07-22 12:38:03 -04:00
Jeffrey Morgan 80ee9b5e47
Remove out of space test temporarily (#5825) 2024-07-21 00:22:11 -04:00
Daniel Hiltgen 06e5d74e34
Merge pull request #5506 from dhiltgen/sched_tests
Refine scheduler unit tests for reliability
2024-07-20 15:48:39 -07:00
Jeffrey Morgan 69a2d4ccff
Fix generate test flakyness (#5804) 2024-07-19 19:11:25 -07:00
Josh e8b954c646
server: validate template (#5734)
add template validation to modelfile
2024-07-19 15:24:29 -07:00
Michael Yang 43606d6d6a fix parsing tool calls 2024-07-18 12:08:11 -07:00
Jeffrey Morgan 70b1010fa5
server: check for empty tools array too (#5779) 2024-07-18 11:44:57 -07:00
Jeffrey Morgan 319fb1ce03
server: only parse tool calls if tools are provided (#5771)
* server: only parse tool calls if tools are provided

* still set `resp.Message.Content`
2024-07-18 08:50:23 -07:00
Michael Yang b255445557
marshal json automatically for some template values (#5758) 2024-07-17 15:35:11 -07:00
Michael Yang 5fd6988126 parse tool call as individual objects 2024-07-17 11:19:04 -07:00
Michael Yang c279f96371 remove ToolCall from GenerateResponse 2024-07-16 15:22:49 -07:00
Michael Yang 499e87c9ba
Merge pull request #5730 from ollama/mxyng/cleanup
remove unneeded tool calls
2024-07-16 14:42:13 -07:00
Michael Yang d290e87513 add suffix support to generate endpoint
this change is triggered by the presence of "suffix", particularly
useful for code completion tasks
2024-07-16 14:31:35 -07:00
Michael Yang 5a83f79afd remove unneeded tool calls 2024-07-16 13:48:45 -07:00
royjhan 987dbab0b0
OpenAI: /v1/embeddings compatibility (#5285)
* OpenAI v1 models

* Empty List Testing

* Add back envconfig

* v1/models docs

* Remove Docs

* OpenAI batch embed compatibility

* merge conflicts

* integrate with api/embed

* ep

* merge conflicts

* request tests

* rm resp test

* merge conflict

* merge conflict

* test fixes

* test fn renaming

* input validation for empty string

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-16 13:36:08 -07:00
Michael Yang a8388beb94
Merge pull request #5726 from ollama/mxyng/tools-templates
fix unmarshal type errors
2024-07-16 12:12:10 -07:00
Michael Yang 5afbb60fc4 fix unmarshal type errors 2024-07-16 11:39:34 -07:00
Jeffrey Morgan 4cb5d7decc
server: omit model system prompt if empty (#5717) 2024-07-16 11:09:00 -07:00
Michael Yang 4a565cbf94 add chat and generate tests with mock runner 2024-07-16 09:39:31 -07:00
Michael Yang 64039df6d7
Merge pull request #5284 from ollama/mxyng/tools
tools
2024-07-15 18:03:37 -07:00
Jeffrey Morgan 7ac6d462ec
server: return empty slice on empty `/api/embed` request (#5713)
* server: return empty slice on empty `/api/embed` request

* fix tests
2024-07-15 17:39:44 -07:00
Michael Yang ef5136a745 tools test 2024-07-15 17:18:21 -07:00
Michael Yang d02bbebb11 tools 2024-07-15 15:26:16 -07:00
royjhan b9f5e16c80
Introduce `/api/embed` endpoint supporting batch embedding (#5127)
* Initial Batch Embedding

* Revert "Initial Batch Embedding"

This reverts commit c22d54895a.

* Initial Draft

* mock up notes

* api/embed draft

* add server function

* check normalization

* clean up

* normalization

* playing around with truncate stuff

* Truncation

* Truncation

* move normalization to go

* Integration Test Template

* Truncation Integration Tests

* Clean up

* use float32

* move normalize

* move normalize test

* refactoring

* integration float32

* input handling and handler testing

* Refactoring of legacy and new

* clear comments

* merge conflicts

* touches

* embedding type 64

* merge conflicts

* fix hanging on single string

* refactoring

* test values

* set context length

* clean up

* testing clean up

* testing clean up

* remove function closure

* Revert "remove function closure"

This reverts commit 55d48c6ed1.

* remove function closure

* remove redundant error check

* clean up

* more clean up

* clean up
2024-07-15 12:14:24 -07:00
Patrick Devine 057d31861e
remove template (#5655) 2024-07-13 20:56:24 -07:00
jmorganca f7ee012300 server: prepend system message in chat handler 2024-07-13 15:08:00 -07:00
Jeffrey Morgan 1ed0aa8fea
server: fix `context`, `load_duration` and `total_duration` fields (#5676)
* server: fix `contet`, `load_duration` and `total_duration` fields

* Update server/routes.go
2024-07-13 09:25:31 -07:00
Michael Yang 22c5451fc2
fix system prompt (#5662)
* fix system prompt

* execute template when hitting previous roles

* fix tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-12 21:04:44 -07:00
Michael Yang ebc529cbb3 autodetect stop parameters from template 2024-07-12 16:01:23 -07:00
Michael Yang 57ec6901eb revert embedded templates to use prompt/response
This reverts commit 19753c18c0.

for compat. messages will be added at a later date
2024-07-11 14:49:35 -07:00
Jeffrey Morgan 791650ddef
sched: only error when over-allocating system memory (#5626) 2024-07-11 00:53:12 -07:00
Michael Yang 41be28096a add system prompt to first legacy template 2024-07-10 17:03:08 -07:00
Daniel Hiltgen f4408219e9 Refine scheduler unit tests for reliability
This breaks up some of the test scenarios to create a
more reliable set of tests, as well as adding a little more
coverage.
2024-07-09 16:00:08 -07:00
Michael Yang 6bbbc50f10
Merge pull request #5440 from ollama/mxyng/messages-templates
update named templates
2024-07-09 09:36:32 -07:00
Michael Yang 9bbddc37a7
Merge pull request #5126 from ollama/mxyng/messages
update message processing
2024-07-09 09:20:44 -07:00
Jeffrey Morgan e4ff73297d
server: fix model reloads when setting `OLLAMA_NUM_PARALLEL` (#5560)
* server: fix unneeded model reloads when setting `OLLAMA_NUM_PARALLEL`

* remove whitespace change

* undo some changes
2024-07-08 22:32:15 -07:00
Jeffrey Morgan 0ee87615c7
sched: don't error if paging to disk on Windows and macOS (#5523) 2024-07-06 22:01:52 -04:00
Michael Yang fb6cbc02fb update named templates 2024-07-05 16:29:32 -07:00
Michael Yang ac7a842e55 fix model reloading
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Michael Yang 2c3fe1fd97 comments 2024-07-05 13:17:24 -07:00
Michael Yang 269ed6e6a2 update message processing 2024-07-05 13:16:58 -07:00
Daniel Hiltgen af28b94533
Merge pull request #5469 from dhiltgen/prevent_system_oom
Prevent loading models larger than total memory
2024-07-05 08:22:20 -07:00