Commit Graph

2343 Commits

Author SHA1 Message Date
Blake Mizerany 9f2d8d2117 ... 2024-04-04 00:11:31 -07:00
Blake Mizerany d42c3f6be1 x/build/blob: add fuzz test for ParseRef 2024-04-03 23:50:43 -07:00
Blake Mizerany 4ea3e9efa6 x/build/blob: lock in zero allocs for ParseRef 2024-04-03 23:03:36 -07:00
Blake Mizerany 2e1ea6ecaa x/build/blob: move most commit value checks to emit func 2024-04-03 22:55:53 -07:00
Blake Mizerany 6d2da77ce2 x/build/blob: add Parts for streaming ref parts
Also, make ParseRef use the new Parts method to parse the ref parts.
2024-04-03 22:27:55 -07:00
Blake Mizerany def4d902bf ... wip still broke 2024-04-03 22:15:58 -07:00
Blake Mizerany 76a202c04e ... 2024-04-03 20:52:27 -07:00
Blake Mizerany f7cfe946dc x/registry: fixing tests wip 2024-04-03 16:37:27 -07:00
Blake Mizerany 005b6373e2 x/registry: fix startMinio 2024-04-03 16:19:50 -07:00
Blake Mizerany d54e0fb3b2 ... 2024-04-03 16:14:22 -07:00
Blake Mizerany bdd05e0ae0 x/registry: skip ref test 2024-04-03 15:59:23 -07:00
Blake Mizerany 1a346640db x/registry: work on getting basic test passing 2024-04-03 15:58:04 -07:00
Blake Mizerany f5883070f8 x/registry: upload smoke test passing 2024-04-03 14:30:58 -07:00
Blake Mizerany adc23d5f96 Add 'x/' from commit 'a10a11b9d371f36b7c3510da32a1d70b74e27bd1'
git-subtree-dir: x
git-subtree-mainline: 7d05a6ee8f
git-subtree-split: a10a11b9d3
2024-04-03 10:40:23 -07:00
Blake Mizerany a10a11b9d3 registry: initial work on multipart pushes 2024-04-03 10:39:30 -07:00
Blake Mizerany 7d05a6ee8f
cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)
This also moves the checkServerHeartbeat call out of the "RunE" Cobra
stuff (that's the only word I have for that) to on-site where it's after
the check for OLLAMA_MODELS, which allows the helpful error message to
be printed before the server heartbeat check. This also arguably makes
the code more readable without the magic/superfluous "pre" function
caller.
2024-04-02 22:11:13 -07:00
Daniel Hiltgen 464d817824
Merge pull request #3464 from dhiltgen/subprocess
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Pier Francesco Contino 531324a9be
feat: add OLLAMA_DEBUG in ollama server help message (#3461)
Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>
2024-04-02 18:20:03 -07:00
Daniel Hiltgen 6589eb8a8c Revert options as a ref in the server 2024-04-02 16:44:10 -07:00
Michael Yang a039e383cd
Merge pull request #3465 from ollama/mxyng/fix-metal
fix metal gpu
2024-04-02 16:29:58 -07:00
Michael Yang 80163ebcb5 fix metal gpu 2024-04-02 16:06:45 -07:00
Daniel Hiltgen a57818d93e
Merge pull request #3343 from dhiltgen/bump_more2
Bump llama.cpp to b2581
2024-04-02 15:08:26 -07:00
Blake Mizerany 94befe366a ... 2024-04-02 14:28:06 -07:00
Blake Mizerany c95f97689b utils/upload: init 2024-04-02 14:15:21 -07:00
Blake Mizerany 618eb5b909 registry: multipart push 2024-04-02 13:40:23 -07:00
Daniel Hiltgen 841adda157 Fix windows lint CI flakiness 2024-04-02 12:22:16 -07:00
Daniel Hiltgen 0035e31af8 Bump to b2581 2024-04-02 11:53:07 -07:00
Blake Mizerany eb75418be9 build/blob: test ParseRef round-trip 2024-04-02 11:45:01 -07:00
Blake Mizerany 9959da05de build/blob: break out test refs for other tests/fuzzing 2024-04-02 11:38:10 -07:00
Daniel Hiltgen c863c6a96d
Merge pull request #3218 from dhiltgen/subprocess
Switch back to subprocessing for llama.cpp
2024-04-02 10:49:44 -07:00
Blake Mizerany aff7970628 build: remove superfluous parseCompleteRef 2024-04-01 23:41:42 -07:00
Blake Mizerany 628f1feb36 build: back to taking manifests as []byte
It's nicer to have the manifests be an opaque []byte, rather than a
struct. This way users of the build package don't need to know about the
internal structure of the manifests. The registry can interpret the
manifests as it sees fit, while letting build keep its own Go type of
manifest which is easier to work with in the build package.
2024-04-01 23:18:58 -07:00
Blake Mizerany ce3125afd5 registry: add New and take a minio client as argument 2024-04-01 22:53:49 -07:00
Blake Mizerany f488652ba7 build: make Build accept only refs without builds 2024-04-01 22:12:43 -07:00
Blake Mizerany 2318ed2919 build: remove unused manifest() 2024-04-01 21:59:38 -07:00
Blake Mizerany b1b8be33d9 build: cleanup error names and other things 2024-04-01 21:57:34 -07:00
Blake Mizerany 876f7eab81 build: move Manifest from internal/blobstore to build
It was getting confusing to have the arbitrary handling of manifests in
the blobstore. It also prevented us from using model.Ref in the
blobstore because of cyclic dependencies.

This is much easier to grok now.
2024-04-01 21:43:30 -07:00
Blake Mizerany 7cfc8a0838 build/blob: fix awkward Ref type 2024-04-01 21:25:18 -07:00
Daniel Hiltgen 1f11b52511 Refined min memory from testing 2024-04-01 16:48:33 -07:00
Daniel Hiltgen 526d4eb204 Release gpu discovery library after use
Leaving the cudart library loaded kept ~30m of memory
pinned in the GPU in the main process.  This change ensures
we don't hold GPU resources when idle.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen 0a74cb31d5 Safeguard for noexec
We may have users that run into problems with our current
payload model, so this gives us an escape valve.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen 10ed1b6292 Detect too-old cuda driver
"cudart init failure: 35" isn't particularly helpful in the logs.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen 4fec5816d6 Integration test improvements
Cleaner shutdown logic, a bit of response hardening
2024-04-01 16:48:18 -07:00
Daniel Hiltgen 0a0e9f3e0f Apply 01-cache.diff 2024-04-01 16:48:18 -07:00
Daniel Hiltgen 58d95cc9bd Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shut down when idle, and
gracefully restart if it has problems.  This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Patrick Devine 3b6a9154dd
Simplify model conversion (#3422) 2024-04-01 16:14:53 -07:00
Michael Yang d6dd2ff839
Merge pull request #3241 from ollama/mxyng/mem
update memory estimations for gpu offloading
2024-04-01 13:59:14 -07:00
Michael Yang e57a6ba89f
Merge pull request #2926 from ollama/mxyng/decode-ggml-v2
refactor model parsing
2024-04-01 13:58:13 -07:00
Michael Yang 12ec2346ef
Merge pull request #3442 from ollama/mxyng/generate-output
fix generate output
2024-04-01 13:56:09 -07:00
Michael Yang 1ec0df1069 fix generate output 2024-04-01 13:47:34 -07:00