ollama/llm
Blake Mizerany acbffa59e9 llm: suppress large allocations for GGUF arrays
This introduces a little array type for holding GGUF arrays that
prevents the array from growing too large. It preserves the total size
of the array, but limits the number of elements that are actually
allocated.

GGUF arrays that are extremely large, such as token lists, are generally
uninteresting to users and are not worth the memory overhead or the
time spent allocating and freeing them. They are necessary for
inference, but not for inspection.

The size of these arrays is, however, important in Ollama, so it is
preserved in a separate field on the array type.
2024-06-23 14:26:56 -07:00
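
The exact type lives in gguf.go; the sketch below only illustrates the idea. The names (array, newArray, append) and the cap of 1024 elements are assumptions for illustration, not necessarily the identifiers or limit used in the real code.

```go
package main

import "fmt"

// maxArraySize caps how many elements are actually kept in memory for a
// decoded GGUF array; anything beyond it is read but not stored.
// (Illustrative value, not necessarily the one used in Ollama.)
const maxArraySize = 1024

// array records the declared length of a GGUF array while only allocating
// storage for up to maxArraySize elements.
type array struct {
	// size is the total element count declared in the GGUF file, preserved
	// even when values holds fewer entries.
	size   int
	values []any
}

// newArray pre-sizes the backing slice so huge metadata arrays (e.g. token
// lists) never allocate more than maxArraySize slots.
func newArray(size int) *array {
	a := &array{size: size}
	n := size
	if n > maxArraySize {
		n = maxArraySize
	}
	a.values = make([]any, 0, n)
	return a
}

// append stores v only while capacity remains; the declared size is
// unaffected, so callers can still report how long the array was.
func (a *array) append(v any) {
	if len(a.values) < cap(a.values) {
		a.values = append(a.values, v)
	}
}

func main() {
	a := newArray(1_000_000) // e.g. a tokenizer vocabulary
	for i := 0; i < 1_000_000; i++ {
		a.append(i)
	}
	fmt.Println("declared:", a.size, "allocated:", len(a.values))
}
```

With this shape, the declared size stays available for inspection (for example, reporting vocabulary size) while the backing slice never grows past the cap.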
ext_server remove confusing log message 2024-06-19 11:14:11 -07:00
generate Merge pull request #5072 from dhiltgen/windows_path 2024-06-19 09:13:39 -07:00
llama.cpp@7c26775adb llm: update llama.cpp commit to `7c26775` (#4896) 2024-06-17 15:56:16 -04:00
patches llm: update llama.cpp commit to `7c26775` (#4896) 2024-06-17 15:56:16 -04:00
filetype.go Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL (#4322) 2024-05-23 13:21:49 -07:00
ggla.go simplify safetensors reading 2024-05-21 11:28:22 -07:00
ggml.go llm: suppress large allocations for GGUF arrays 2024-06-23 14:26:56 -07:00
gguf.go llm: suppress large allocations for GGUF arrays 2024-06-23 14:26:56 -07:00
llm.go revert tokenize ffi (#4761) 2024-05-31 18:54:21 -07:00
llm_darwin_amd64.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_darwin_arm64.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_linux.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_windows.go Move nested payloads to installer and zip file on windows 2024-04-23 16:14:47 -07:00
memory.go handle asymmetric embedding KVs 2024-06-20 09:57:27 -07:00
memory_test.go review comments and coverage 2024-06-14 14:55:50 -07:00
payload.go Move libraries out of users path 2024-06-17 13:12:18 -07:00
server.go Refine mmap default logic on linux 2024-06-20 11:07:04 -07:00
status.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00