# Experimental ROCm iGPU Support
This branch adds a ROCm backend path geared toward AMD APUs that expose only a small VRAM aperture but share a large UMA pool with the CPU. The steps below outline how to reproduce the build and how to run Ollama with the staged ROCm runtime.
> **Warning**
> Upstream ROCm does not officially support these APUs yet. Expect driver updates, kernel parameters, or environment variables such as `HSA_OVERRIDE_GFX_VERSION` to change between releases.
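To see which gfx ISA your APU actually reports (useful when deciding what value, if any, `HSA_OVERRIDE_GFX_VERSION` should spoof), you can query the runtime. This is a small sketch assuming `rocminfo` is installed; it falls back to a notice when it is not:

```shell
# Print the first gfx ISA the ROCm runtime reports (needs rocminfo on PATH).
report_gfx() {
  if command -v rocminfo >/dev/null 2>&1; then
    rocminfo | grep -m1 -o 'gfx[0-9a-f]*'
  else
    echo "rocminfo not installed"
  fi
}
report_gfx
```

On a Phoenix APU this typically prints `gfx1103`, which is why the override to `gfx1100` (the nearest officially supported target) is needed.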
## 1. Stage the ROCm runtime
We avoid touching the system installation by unpacking the required RPMs into `build/rocm-stage`.
```bash
mkdir -p build/rocm-stage build/rpm-tmp
cd build/rpm-tmp
dnf download \
  hipblas hipblas-devel hipblas-common-devel \
  rocblas rocblas-devel \
  rocsolver rocsolver-devel \
  rocm-hip-devel rocm-device-libs rocm-comgr rocm-comgr-devel
cd ../rocm-stage
for rpm in ../rpm-tmp/*.rpm; do
  echo "extracting ${rpm}"
  rpm2cpio "${rpm}" | bsdtar -xf -
done
```
Important staged paths after extraction:

| Purpose                     | Location                                 |
| --------------------------- | ---------------------------------------- |
| HIP/rocBLAS libraries       | `build/rocm-stage/lib64`                 |
| Tensile kernels (rocBLAS)   | `build/rocm-stage/lib64/rocblas/library` |
| Headers (`hip`, `rocblas`)  | `build/rocm-stage/include`               |
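A quick sanity check that extraction produced the expected layout (paths taken from the table above; run from the repository root):

```shell
# Report which of the expected staged directories are present.
check_staged() {
  for p in \
    build/rocm-stage/lib64 \
    build/rocm-stage/lib64/rocblas/library \
    build/rocm-stage/include
  do
    if [ -d "$p" ]; then echo "ok: $p"; else echo "missing: $p"; fi
  done
}
check_staged
```

Any `missing:` line means the corresponding RPM was not downloaded or failed to extract; re-run the `rpm2cpio` loop for that package before building.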
## 2. Build the ROCm backend
Configure CMake with the preset that targets ROCm 6.x and point it at the staged HIP compiler:
```bash
cmake --preset "ROCm 6" -B build/rocm \
  -DGGML_VULKAN=OFF \
  -DCMAKE_INSTALL_PREFIX=/usr/local \
  -DCMAKE_HIP_COMPILER=/usr/bin/hipcc \
  -DCMAKE_PREFIX_PATH="$PWD/build/rocm-stage"
cmake --build build/rocm --target ggml-hip -j"$(nproc)"
```
Artifacts land in `build/lib/ollama/rocm` (mirrored to `dist/lib/ollama/rocm` when packaging). These include `libggml-hip.so`, the CPU fallback variants, the Vulkan backend library (when built), and `librocsolver.so`.
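One way to confirm the freshly built backend resolves its HIP runtime from the staged tree rather than a system install is to inspect its dynamic dependencies. A sketch, assuming the artifact path above and that `libggml-hip.so` links `libamdhip64` dynamically:

```shell
# Show where the loader resolves libamdhip64 for the new backend.
check_hip_link() {
  lib=build/lib/ollama/rocm/libggml-hip.so
  if [ -f "$lib" ]; then
    LD_LIBRARY_PATH="$PWD/build/rocm-stage/lib64" ldd "$lib" | grep -i amdhip
  else
    echo "libggml-hip.so not built yet"
  fi
}
check_hip_link
```

The resolved path should point into `build/rocm-stage/lib64`; a system path (or `not found`) means the runtime environment in the next section will need adjusting.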
## 3. Run Ollama on ROCm
The runner needs to see both the GGML plugins and the staged ROCm runtime. The following environment block works for an AMD Radeon 760M with a UMA carve-out:
```bash
export BASE=$HOME/ollama-gpu
export OLLAMA_LIBRARY_PATH=$BASE/build/lib/ollama/rocm:$BASE/build/lib/ollama
export LD_LIBRARY_PATH=$OLLAMA_LIBRARY_PATH:$BASE/build/rocm-stage/lib64:${LD_LIBRARY_PATH:-}
export ROCBLAS_TENSILE_LIBPATH=$BASE/build/rocm-stage/lib64/rocblas/library
export ROCBLAS_TENSILE_PATH=$ROCBLAS_TENSILE_LIBPATH
export HSA_OVERRIDE_GFX_VERSION=11.0.0 # spoof gfx1100 for Phoenix
export GGML_HIP_FORCE_GTT=1 # force GTT allocations for UMA memory
export OLLAMA_GPU_DRIVER=rocm
export OLLAMA_GPU=100 # opt into GPU-only scheduling
export OLLAMA_LLM_LIBRARY=rocm # skip CUDA/Vulkan discovery noise
export OLLAMA_VULKAN=0 # optional: suppress Vulkan backend
$BASE/build/ollama serve
```
On launch you should see log lines similar to:
```
library=ROCm compute=gfx1100 name=ROCm0 description="AMD Radeon 760M Graphics"
ggml_hip_get_device_memory using GTT memory for 0000:0e:00.0 (total=16352354304 free=15034097664)
```
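Once the server is up, a quick smoke test exercises the full ROCm path end to end. This assumes Ollama's default port (11434); the model tag is only an example, substitute one you have pulled:

```shell
# Ask the running server for a short, non-streaming completion.
smoke_test() {
  curl -s --max-time 10 http://localhost:11434/api/generate \
    -d '{"model": "llama3.2", "prompt": "Say hello", "stream": false}' \
    || echo "server not reachable on :11434"
}
smoke_test
```

A JSON response with a `response` field confirms the runner survived model load and inference on the iGPU.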
If the runner crashes before enumerating devices:
- Double-check that `ROCBLAS_TENSILE_LIBPATH` points to the staged `rocblas/library`.
- Ensure no other `LD_LIBRARY_PATH` entries override `libamdhip64.so`.
- Try unsetting `HSA_OVERRIDE_GFX_VERSION` to confirm whether the kernel patch is still needed on your system.
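The first of these checks can be scripted: rocBLAS expects its Tensile kernel files under the directory named by `ROCBLAS_TENSILE_LIBPATH`. A small sketch, defaulting to the staged location from section 1:

```shell
# Count the Tensile kernel files rocBLAS will look for at startup.
check_tensile() {
  dir=${ROCBLAS_TENSILE_LIBPATH:-build/rocm-stage/lib64/rocblas/library}
  if [ -d "$dir" ]; then
    echo "found $(ls "$dir" | wc -l) files in $dir"
  else
    echo "missing: $dir"
  fi
}
check_tensile
```

An empty or missing directory here reliably produces a crash during rocBLAS initialization, before any device is enumerated.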
> Example discovery + run log: [`docs/logs/rocm-760m-run.log`](logs/rocm-760m-run.log). The matching `curl` response is saved as [`docs/logs/rocm-760m-run-response.json`](logs/rocm-760m-run-response.json).
## 4. Sharing this build
- Keep the staged RPMs alongside the branch so others can reproduce the exact runtime.
- Include `/tmp/ollama_rocm_run.log` or similar discovery logs in issues/PRs to help maintainers understand the UMA setup.
- Mention any kernel parameters (e.g., large UMA buffer in firmware) when opening upstream tickets.