ollama/kvcache
Bruce MacDonald 057cc54b66 benchmark: compare backend graph computation times
Track execution time of individual tensor operations (views, copies, reshapes, etc.)
during LLM forward passes using CGo bindings to the native graph runtime. This
helps identify performance bottlenecks in the computation graph and optimize memory
operations that can significantly impact inference latency.
2025-02-19 15:22:53 -08:00
cache.go Runner for Ollama engine 2025-02-13 17:09:26 -08:00
causal.go Runner for Ollama engine 2025-02-13 17:09:26 -08:00
causal_test.go benchmark: compare backend graph computation times 2025-02-19 15:22:53 -08:00
encoder.go Runner for Ollama engine 2025-02-13 17:09:26 -08:00
wrapper.go Runner for Ollama engine 2025-02-13 17:09:26 -08:00