ollama/ml/backend
Shalini Salomi Bodapati 7689aded24 ggml-cpu: Enable Matrix Math Accelerator for Power10
Adding -mcpu=power10 improves matrix multiplication performance when
running Ollama on PowerPC-based hardware. The -mcpu=power10 flag needs to
be added in llamafile.go so that the PowerPC-optimized code path for
llamafile_sgemm (using Matrix Multiply Assist) is enabled and available
in the Ollama binary.
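
A minimal sketch of how the flag could be wired up, assuming it is gated
by a build constraint and applied through a cgo directive (the package
name, file layout, and exact set of flags are illustrative assumptions,
not the literal contents of llamafile.go):

    //go:build ppc64le.power10

    // Build-tag-gated compiler flags (illustrative sketch): selecting the
    // ppc64le.power10 tag compiles the C/C++ sources, including
    // llamafile_sgemm, with Power10 MMA support.
    package ggml

    // #cgo CFLAGS: -mcpu=power10
    // #cgo CXXFLAGS: -mcpu=power10
    import "C"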

This change adds the -mcpu=power10 flag when building with the
ppc64le.power10 build tag, which enables MMA optimizations in the Ollama
binary.

The -mcpu=power9 flag is added when building with the ppc64le.power9
build tag, which enables VSX optimizations in the Ollama binary.
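
By analogy, the Power9 variant could be a sibling file gated by the other
tag; again, the package name and flags are assumptions for illustration:

    //go:build ppc64le.power9

    // Counterpart sketch for Power9: -mcpu=power9 enables the VSX code
    // paths rather than MMA.
    package ggml

    // #cgo CFLAGS: -mcpu=power9
    // #cgo CXXFLAGS: -mcpu=power9
    import "C"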

When building on a Power10 machine, use:
go build --tags ppc64le.power10 .

When building on a Power9 machine, use:
go build --tags ppc64le.power9 .

Performance Impact:

Improved performance on Power10 chips for Q4_0, Q8_0, FP32, and BF16 models.
Inference time with ollama run llama3:8b (Q4_0 model): ~30% less time
for a 50-word summarization of a prompt with 512 tokens.
with MMA enabled : 6.05 sec
without MMA (Base): 8.45 sec
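
For reference, going from 8.45 sec to 6.05 sec is a (8.45 - 6.05) / 8.45 ≈ 28%
reduction in inference time, consistent with the ~30% figure above.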

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
2025-12-11 07:37:33 -06:00
ggml ggml-cpu: Enable Matrix Math Accelerator for Power10 2025-12-11 07:37:33 -06:00
backend.go next ollama runner (#7913) 2025-02-13 16:31:21 -08:00