| ext_server | Clean up | 2024-07-01 16:29:54 -07:00 |
| generate | Add back lower level parallel flags | 2024-06-17 13:44:46 -07:00 |
| ggla.go | simplify safetensors reading | 2024-05-21 11:28:22 -07:00 |
| ggml.go | Improve multi-gpu handling at the limit | 2024-06-14 14:51:40 -07:00 |
| llm.go | revert tokenize ffi (#4761) | 2024-05-31 18:54:21 -07:00 |
| llm_linux.go | Switch back to subprocessing for llama.cpp | 2024-04-01 16:48:18 -07:00 |
| memory.go | Handle models with divergent layer sizes | 2024-06-18 11:05:34 -07:00 |
| memory_test.go | review comments and coverage | 2024-06-14 14:55:50 -07:00 |
| payload.go | review comments and coverage | 2024-06-14 14:55:50 -07:00 |
| server.go | use float32 | 2024-07-02 10:30:29 -07:00 |
| status.go | Switch back to subprocessing for llama.cpp | 2024-04-01 16:48:18 -07:00 |