Daniel Hiltgen
e890be4814
Revert "More parallelism on windows generate"
This reverts commit 0577af98f4.
2024-06-17 13:32:46 -07:00
Daniel Hiltgen
b2799f111b
Move libraries out of users path
We update the PATH on Windows so the CLI can be found, but this has
an unintended side effect: other apps that may use our bundled DLLs
get terminated when we upgrade.
2024-06-17 13:12:18 -07:00
Jeffrey Morgan
152fc202f5
llm: update llama.cpp commit to `7c26775` (#4896)
* llm: update llama.cpp submodule to `7c26775`
* disable `LLAMA_BLAS` for now
* `-DLLAMA_OPENMP=off`
2024-06-17 15:56:16 -04:00
Lei Jitang
4ad0d4d6d3
Fix a build warning (#5096)
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-17 14:47:48 -04:00
Jeffrey Morgan
163cd3e77c
gpu: add env var for detecting Intel oneapi gpus (#5076)
* gpu: add env var for detecting intel oneapi gpus
* fix build error
2024-06-16 20:09:05 -04:00
Daniel Hiltgen
4c2c8f93dd
Merge pull request #5080 from dhiltgen/debug_intel_crash
Add some more debugging logs for intel discovery
2024-06-16 14:42:41 -07:00
Daniel Hiltgen
fd1e6e0590
Add some more debugging logs for intel discovery
Also removes an unused overall count variable
2024-06-16 07:42:52 -07:00
royjhan
89c79bec8c
Add ModifiedAt Field to /api/show (#5033)
* Add Mod Time to Show
* Error Handling
2024-06-15 20:53:56 -07:00
Jeffrey Morgan
c7b77004e3
docs: add missing powershell package to windows development instructions (#5075)
* docs: add missing instruction for powershell build
The powershell script for building Ollama on Windows now requires the `ThreadJob` module. Add this to the instructions and dependency list.
* Update development.md
2024-06-15 23:08:09 -04:00
pufferffish
b6554e9b8c
fix vulkan handle releasing
2024-06-15 21:11:07 +01:00
Daniel Hiltgen
07d143f412
Merge pull request #5058 from coolljt0725/fix_build_warning
gpu: Fix build warning
2024-06-15 11:52:36 -07:00
Daniel Hiltgen
a12283e2ff
Implement custom github release action
This implements the release logic we want via the gh CLI,
to support updating releases with rc tags in place and to retain
release notes and other community reactions.
2024-06-15 11:36:56 -07:00
Daniel Hiltgen
4b0050cf0e
Merge pull request #5037 from dhiltgen/faster_win_build
More parallelism on windows generate
2024-06-15 08:03:05 -07:00
Daniel Hiltgen
0577af98f4
More parallelism on windows generate
Make the build faster
2024-06-15 07:44:55 -07:00
Daniel Hiltgen
17ce203a26
Merge pull request #4875 from dhiltgen/rocm_gfx900_workaround
Rocm gfx900 workaround
2024-06-15 07:38:58 -07:00
Daniel Hiltgen
d76555ffb5
Merge pull request #4874 from dhiltgen/rocm_v6_bump
Rocm v6 bump
2024-06-15 07:38:32 -07:00
Daniel Hiltgen
2786dff5d3
Merge pull request #4264 from dhiltgen/show_gpu_visible_settings
Centralize GPU configuration vars
2024-06-15 07:33:52 -07:00
DSLstandard
b958cd2848
remove cap_get_bound check
2024-06-15 20:19:19 +08:00
KOISHI KOMEIJI FROM TOUHOU 11
e3f9ca4009
fix check_perfmon len
2024-06-15 20:13:15 +08:00
pufferffish
38466f1821
fix build
2024-06-15 12:06:43 +01:00
pufferffish
18f3f960b0
update gpu.go
2024-06-15 12:05:01 +01:00
pufferffish
e77ea68e11
Merge branch 'refs/heads/main' into vulkan
# Conflicts:
# gpu/gpu.go
2024-06-15 12:01:36 +01:00
pufferffish
11c55fab81
fix total memory monitor
2024-06-15 10:58:12 +01:00
pufferffish
257364cb3c
fix free memory monitor
2024-06-15 10:52:34 +01:00
pufferffish
e4e8a5d25a
fix compilation
2024-06-15 09:44:10 +01:00
pufferffish
724fac470f
fix segfault
2024-06-15 08:05:48 +01:00
pufferffish
24c8840037
it builds
2024-06-15 07:49:28 +01:00
Lei Jitang
225f0d1219
gpu: Fix build warning
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-15 14:26:23 +08:00
pufferffish
93c4d69daa
add support in gen_linux.sh
2024-06-15 05:42:59 +01:00
pufferffish
9c6b049567
add support in gpu.go
2024-06-15 05:27:14 +01:00
Daniel Hiltgen
532db58311
Merge pull request #4972 from jayson-cloude/main
fix: "Skip searching for network devices"
2024-06-14 17:04:40 -07:00
Daniel Hiltgen
6be309e1bd
Centralize GPU configuration vars
This should aid in troubleshooting by capturing and reporting the GPU
settings at startup in the logs along with all the other server settings.
2024-06-14 15:59:10 -07:00
Daniel Hiltgen
da3bf23354
Workaround gfx900 SDMA bugs
Implement support for GPU env var workarounds, and leverage
this for the Radeon RX Vega 56, which needs
HSA_ENABLE_SDMA=0 set to work properly.
2024-06-14 15:38:13 -07:00
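A minimal sketch of per-GPU environment workarounds. Only the gfx900 to `HSA_ENABLE_SDMA=0` pairing comes from the commit message; the table shape, the `applyWorkarounds` helper, and the choice to never override a value the user already set are assumptions of this sketch.

```go
package main

import "strings"

// gpuEnvWorkarounds maps a gfx target to the env vars it needs.
var gpuEnvWorkarounds = map[string][][2]string{
	"gfx900": {{"HSA_ENABLE_SDMA", "0"}},
}

// applyWorkarounds returns a copy of env ("KEY=VALUE" form) with the
// workaround variables for the given targets appended, skipping any
// variable that is already present so explicit user settings win.
func applyWorkarounds(env []string, targets []string) []string {
	out := append([]string(nil), env...)
	set := map[string]bool{}
	for _, kv := range env {
		if k, _, ok := strings.Cut(kv, "="); ok {
			set[k] = true
		}
	}
	for _, t := range targets {
		for _, w := range gpuEnvWorkarounds[t] {
			if !set[w[0]] {
				out = append(out, w[0]+"="+w[1])
				set[w[0]] = true
			}
		}
	}
	return out
}
```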
Daniel Hiltgen
26ab67732b
Bump ROCm linux to 6.1.1
2024-06-14 15:37:54 -07:00
Daniel Hiltgen
45cacbaf05
Merge pull request #4517 from dhiltgen/gpu_incremental
Enhanced GPU discovery and multi-gpu support with concurrency
2024-06-14 15:35:00 -07:00
Daniel Hiltgen
17df6520c8
Remove mmap related output calc logic
2024-06-14 14:55:50 -07:00
Daniel Hiltgen
6f351bf586
review comments and coverage
2024-06-14 14:55:50 -07:00
Daniel Hiltgen
ff4f0cbd1d
Prevent multiple concurrent loads on the same gpus
While models are loading, the VRAM metrics are dynamic, so try
to load on a GPU that doesn't have a model actively loading, or wait,
to avoid races that lead to OOMs.
2024-06-14 14:51:40 -07:00
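The "pick an idle GPU or wait" rule can be sketched with a mutex and condition variable. `loadTracker` is a hypothetical type, not the scheduler's real structure; it only shows the coordination, not memory accounting.

```go
package main

import "sync"

// loadTracker records which GPUs have a model load in progress, since
// their VRAM readings are unreliable until that load finishes.
type loadTracker struct {
	mu      sync.Mutex
	cond    *sync.Cond
	loading map[string]bool
}

func newLoadTracker() *loadTracker {
	t := &loadTracker{loading: map[string]bool{}}
	t.cond = sync.NewCond(&t.mu)
	return t
}

// acquire marks and returns the first candidate with no load in
// progress. If all candidates are busy it blocks until one is released.
// An empty candidate list returns "" instead of waiting forever.
func (t *loadTracker) acquire(candidates []string) string {
	if len(candidates) == 0 {
		return ""
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	for {
		for _, id := range candidates {
			if !t.loading[id] {
				t.loading[id] = true
				return id
			}
		}
		t.cond.Wait() // releases mu while waiting, reacquires on wake
	}
}

// release marks a GPU's load as finished and wakes all waiters.
func (t *loadTracker) release(id string) {
	t.mu.Lock()
	delete(t.loading, id)
	t.mu.Unlock()
	t.cond.Broadcast()
}
```

Broadcast rather than Signal is used because waiters may have different candidate lists, so the one woken by Signal might not be able to use the freed GPU.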
Daniel Hiltgen
fc37c192ae
Refine CPU load behavior with system memory visibility
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
434dfe30c5
Reintroduce nvidia nvml library for windows
This library will give us the most reliable free VRAM reporting on Windows
to enable concurrent model scheduling.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
4e2b7e181d
Refactor intel gpu discovery
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
48702dd149
Harden unload for empty runners
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
68dfc6236a
refined test timing
adjust timing on some tests so they don't time out on small/slow GPUs
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
5e8ff556cb
Support forced spreading for multi GPU
Our default behavior today is to try to fit into a single GPU if possible.
Some users would prefer the old behavior of always spreading across
multiple GPUs even if the model can fit into one. This exposes that
tunable behavior.
2024-06-14 14:51:40 -07:00
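The selection policy described above can be sketched as follows. `gpuInfo` and `pickGPUs` are hypothetical, the fit check uses only the model's total size, and "first GPU that fits" is a simplification; a real scheduler would estimate per-layer memory and may prefer a different GPU.

```go
package main

// gpuInfo is a hypothetical view of one GPU's free memory in bytes.
type gpuInfo struct {
	ID   string
	Free uint64
}

// pickGPUs returns a single GPU when one can hold the whole model,
// unless forceSpread is set; otherwise it spreads across all GPUs.
func pickGPUs(gpus []gpuInfo, modelBytes uint64, forceSpread bool) []gpuInfo {
	if !forceSpread {
		for _, g := range gpus {
			if g.Free >= modelBytes {
				return []gpuInfo{g}
			}
		}
	}
	return gpus
}
```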
Daniel Hiltgen
6fd04ca922
Improve multi-gpu handling at the limit
Still not complete. Our prediction needs some refinement to understand each
discrete GPU's available space, so we can see how many layers fit in each one:
since we can't split one layer across multiple GPUs, we can't treat the combined
free space as one logical block.
2024-06-14 14:51:40 -07:00
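A worked example of why pooled free space overstates capacity: two GPUs with 3 GiB free each and 2 GiB layers pool to 6 GiB, which suggests 3 layers, but each GPU can hold only one whole layer, so only 2 fit. The sketch below assumes uniform layer size and greedy assignment in GPU order; `layersPerGPU` is a hypothetical helper.

```go
package main

// layersPerGPU greedily assigns whole layers to each GPU. A layer
// cannot be split, so each GPU holds floor(free/layerBytes) layers and
// the remainder on that GPU is unusable. Stops once nLayers are placed.
// Uniform layer size is a simplifying assumption of this sketch.
func layersPerGPU(free []uint64, layerBytes uint64, nLayers int) []int {
	out := make([]int, len(free))
	if layerBytes == 0 {
		return out // avoid division by zero; nothing sensible to place
	}
	remaining := nLayers
	for i, f := range free {
		n := int(f / layerBytes)
		if n > remaining {
			n = remaining
		}
		out[i] = n
		remaining -= n
	}
	return out
}
```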
Daniel Hiltgen
206797bda4
Fix concurrency integration test to work locally
This worked remotely but wound up trying to spawn multiple servers
locally, which doesn't work.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
43ed358f9a
Refine GPU discovery to bootstrap once
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
2024-06-14 14:51:40 -07:00
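Splitting one-time discovery from repeated free-memory refresh maps naturally onto `sync.Once`. In this sketch the `bootstrap` and `freeVRAM` functions are injected placeholders, not real vendor library calls, and `discovery`/`gpuDevice` are hypothetical types.

```go
package main

import "sync"

// gpuDevice is a hypothetical record produced by one-time discovery.
type gpuDevice struct {
	ID        string
	TotalVRAM uint64
	FreeVRAM  uint64
}

// discovery runs the expensive enumeration once, then only refreshes
// free VRAM on each call. Probe functions are injected placeholders.
type discovery struct {
	once      sync.Once
	mu        sync.Mutex
	devices   []gpuDevice
	bootstrap func() []gpuDevice
	freeVRAM  func(id string) uint64
}

// GetGPUs bootstraps on first use, refreshes free memory every call,
// and returns a copy so callers cannot mutate the cached slice.
func (d *discovery) GetGPUs() []gpuDevice {
	d.once.Do(func() { d.devices = d.bootstrap() })
	d.mu.Lock()
	defer d.mu.Unlock()
	for i := range d.devices {
		d.devices[i].FreeVRAM = d.freeVRAM(d.devices[i].ID)
	}
	return append([]gpuDevice(nil), d.devices...)
}
```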
Daniel Hiltgen
b32ebb4f29
Use DRM driver for VRAM info for amd
The amdgpu driver's free VRAM reporting omits some other apps, so leverage the
upstream DRM driver, which keeps better tabs on things.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
fb9cdfa723
Fix server.cpp for the new cuda build macros
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
efac488675
Revert "Limit GPU lib search for now (#4777)"
This reverts commit 476fb8e892.
2024-06-14 14:51:40 -07:00