Roy Han
00a4cb26ca
use float32
2024-07-02 10:30:29 -07:00
Roy Han
512e0a7bde
Clean up
2024-07-01 16:29:54 -07:00
Roy Han
1a0c8b363c
Truncation Integration Tests
2024-07-01 16:26:30 -07:00
Roy Han
e068e7f698
Integration Test Template
2024-07-01 15:24:26 -07:00
Roy Han
aee25acb5b
move normalization to go
2024-07-01 14:10:58 -07:00
Roy Han
9c32b6b9ed
Truncation
2024-07-01 11:59:44 -07:00
Roy Han
1daac52651
Truncation
2024-07-01 11:55:16 -07:00
Roy Han
80c1a3f812
playing around with truncate stuff
2024-06-28 18:17:09 -07:00
Roy Han
c111d8bb51
normalization
2024-06-28 17:19:04 -07:00
Roy Han
5213c12354
clean up
2024-06-28 15:26:58 -07:00
Roy Han
b9c74df37b
check normalization
2024-06-28 15:10:58 -07:00
Roy Han
49e341147d
add server function
2024-06-28 15:03:53 -07:00
Roy Han
c406fa7a4c
api/embed draft
2024-06-28 14:54:21 -07:00
Roy Han
22458c573a
mock up notes
2024-06-28 14:21:45 -07:00
Roy Han
ff191d7cba
Initial Draft
2024-06-25 13:29:47 -07:00
Roy Han
0f87628b6d
Revert "Initial Batch Embedding"
This reverts commit c22d54895a.
2024-06-24 15:26:05 -07:00
Roy Han
c22d54895a
Initial Batch Embedding
2024-06-18 17:34:36 -07:00
Daniel Hiltgen
26d0bf9236
Merge pull request #5117 from dhiltgen/fix_prediction
Handle models with divergent layer sizes
2024-06-18 11:36:51 -07:00
Daniel Hiltgen
359b15a597
Handle models with divergent layer sizes
The recent refactoring of the memory prediction assumed all layers
are the same size, but for some models (like deepseek-coder-v2) this
is not the case, so our predictions were significantly off.
2024-06-18 11:05:34 -07:00
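The bug described in the commit above reduces to one arithmetic change: instead of multiplying a single layer's size by the layer count, sum each layer's actual size. A minimal sketch in Go, with hypothetical names (not Ollama's actual code):

```go
package main

import "fmt"

// estimateUniform is the old (buggy) approach: assume every layer is
// the size of the first one. For models with divergent layer sizes
// this estimate can be significantly off.
func estimateUniform(layerSizes []uint64) uint64 {
	if len(layerSizes) == 0 {
		return 0
	}
	return layerSizes[0] * uint64(len(layerSizes))
}

// estimateActual sums each layer's real size in bytes.
func estimateActual(layerSizes []uint64) uint64 {
	var total uint64
	for _, s := range layerSizes {
		total += s
	}
	return total
}

func main() {
	// A model whose layers are not all the same size, like
	// deepseek-coder-v2 mentioned in the commit message.
	layers := []uint64{100, 100, 400}
	fmt.Println(estimateUniform(layers)) // 300: under-predicts
	fmt.Println(estimateActual(layers))  // 600: correct total
}
```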
Daniel Hiltgen
b55958a587
Merge pull request #5106 from dhiltgen/clean_logs
Tighten up memory prediction logging
2024-06-18 09:24:38 -07:00
Daniel Hiltgen
7784ca33ce
Tighten up memory prediction logging
Prior to this change, we logged the memory prediction multiple times
as the scheduler iterated to find a suitable configuration, which could be
confusing since only the last log before the server starts is actually valid.
This now logs once, just before starting the server on the final configuration.
It also reports which library is in use instead of always saying "offloading to gpu"
when running on CPU.
2024-06-18 09:15:35 -07:00
Daniel Hiltgen
c9c8c98bf6
Merge pull request #5105 from dhiltgen/cuda_mmap
Adjust mmap logic for cuda windows for faster model load
2024-06-17 17:07:30 -07:00
Daniel Hiltgen
171796791f
Adjust mmap logic for cuda windows for faster model load
On Windows, recent llama.cpp changes make mmap slower in most
cases, so default to off. This also implements a tri-state for
use_mmap so we can detect the difference between a user provided
value of true/false, or unspecified.
2024-06-17 16:54:30 -07:00
Jeffrey Morgan
176d0f7075
Update import.md
2024-06-17 19:44:14 -04:00
Daniel Hiltgen
8ed51cac37
Merge pull request #5103 from dhiltgen/faster_win_build
Revert powershell jobs, but keep nvcc and cmake parallelism
2024-06-17 14:23:18 -07:00
Daniel Hiltgen
c9e6f0542d
Merge pull request #5069 from dhiltgen/ci_release
Implement custom github release action
2024-06-17 13:59:37 -07:00
Daniel Hiltgen
b0930626c5
Add back lower level parallel flags
nvcc supports parallelism (threads) and cmake + make can use -j,
while msbuild requires /p:CL_MPcount=8
2024-06-17 13:44:46 -07:00
Daniel Hiltgen
e890be4814
Revert "More parallelism on windows generate"
This reverts commit 0577af98f4.
2024-06-17 13:32:46 -07:00
Jeffrey Morgan
152fc202f5
llm: update llama.cpp commit to `7c26775` (#4896)
* llm: update llama.cpp submodule to `7c26775`
* disable `LLAMA_BLAS` for now
* `-DLLAMA_OPENMP=off`
2024-06-17 15:56:16 -04:00
Lei Jitang
4ad0d4d6d3
Fix a build warning (#5096)
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-17 14:47:48 -04:00
Jeffrey Morgan
163cd3e77c
gpu: add env var for detecting Intel oneapi gpus (#5076)
* gpu: add env var for detecting intel oneapi gpus
* fix build error
2024-06-16 20:09:05 -04:00
Daniel Hiltgen
4c2c8f93dd
Merge pull request #5080 from dhiltgen/debug_intel_crash
Add some more debugging logs for intel discovery
2024-06-16 14:42:41 -07:00
Daniel Hiltgen
fd1e6e0590
Add some more debugging logs for intel discovery
Also removes an unused overall count variable
2024-06-16 07:42:52 -07:00
royjhan
89c79bec8c
Add ModifiedAt Field to /api/show (#5033)
* Add Mod Time to Show
* Error Handling
2024-06-15 20:53:56 -07:00
Jeffrey Morgan
c7b77004e3
docs: add missing powershell package to windows development instructions (#5075)
* docs: add missing instruction for powershell build
The powershell script for building Ollama on Windows now requires the `ThreadJob` module. Add this to the instructions and dependency list.
* Update development.md
2024-06-15 23:08:09 -04:00
Daniel Hiltgen
07d143f412
Merge pull request #5058 from coolljt0725/fix_build_warning
gpu: Fix build warning
2024-06-15 11:52:36 -07:00
Daniel Hiltgen
a12283e2ff
Implement custom github release action
This implements the release logic we want via the gh CLI
to support updating releases with rc tags in place while
retaining release notes and other community reactions.
2024-06-15 11:36:56 -07:00
Daniel Hiltgen
4b0050cf0e
Merge pull request #5037 from dhiltgen/faster_win_build
More parallelism on windows generate
2024-06-15 08:03:05 -07:00
Daniel Hiltgen
0577af98f4
More parallelism on windows generate
Make the build faster
2024-06-15 07:44:55 -07:00
Daniel Hiltgen
17ce203a26
Merge pull request #4875 from dhiltgen/rocm_gfx900_workaround
Rocm gfx900 workaround
2024-06-15 07:38:58 -07:00
Daniel Hiltgen
d76555ffb5
Merge pull request #4874 from dhiltgen/rocm_v6_bump
Rocm v6 bump
2024-06-15 07:38:32 -07:00
Daniel Hiltgen
2786dff5d3
Merge pull request #4264 from dhiltgen/show_gpu_visible_settings
Centralize GPU configuration vars
2024-06-15 07:33:52 -07:00
Lei Jitang
225f0d1219
gpu: Fix build warning
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-15 14:26:23 +08:00
Daniel Hiltgen
532db58311
Merge pull request #4972 from jayson-cloude/main
fix: "Skip searching for network devices"
2024-06-14 17:04:40 -07:00
Daniel Hiltgen
6be309e1bd
Centralize GPU configuration vars
This should aid in troubleshooting by capturing and reporting the GPU
settings at startup in the logs along with all the other server settings.
2024-06-14 15:59:10 -07:00
Daniel Hiltgen
da3bf23354
Workaround gfx900 SDMA bugs
Implement support for GPU env var workarounds, and leverage
this for the Vega RX 56, which needs
HSA_ENABLE_SDMA=0 set to work properly
2024-06-14 15:38:13 -07:00
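The workaround the commit automates can also be applied by hand: export the variable before launching the server. A config fragment, with the `ollama serve` invocation shown only as an illustration:

```shell
# Disable ROCm SDMA transfers, needed on the Vega RX 56 (gfx900)
export HSA_ENABLE_SDMA=0
ollama serve
```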
Daniel Hiltgen
26ab67732b
Bump ROCm linux to 6.1.1
2024-06-14 15:37:54 -07:00
Daniel Hiltgen
45cacbaf05
Merge pull request #4517 from dhiltgen/gpu_incremental
Enhanced GPU discovery and multi-gpu support with concurrency
2024-06-14 15:35:00 -07:00
Daniel Hiltgen
17df6520c8
Remove mmap related output calc logic
2024-06-14 14:55:50 -07:00
Daniel Hiltgen
6f351bf586
review comments and coverage
2024-06-14 14:55:50 -07:00