TL;DR
ollama-cuda on CachyOS (at the time of writing) ships with support only for compute capability >= 7.5.
If you have older GPUs with an earlier compute capability, you HAVE TO do one of 2 things:
- Build ollama-cuda from source, with support for the older compute targets
- Install ollama-vulkan
I chose ollama-vulkan and it's working perfectly!
Story time
Today I had the inspiration to finally shut down my Windows 11 VM and migrate it to Linux.
The requirements were to be able to run:
- Ollama - for local model magic
- Docker - for remote development
- Steam - for playing games
- Jellyfin - for home videos archive
Now, I am going to focus only on Ollama, as it was an interesting experience that got me wondering why things are not more obvious.
The machine has 2 GPUs in it:
- GTX 1070 - 8GB
- GTX 970 - 4GB
I wanted to run Ollama on them, to play around with local models and see how I could integrate them into some projects. For example - anonymizing data locally, doing a non-destructive OpenClaw implementation that is fit for my use-cases, local OCR for documents that I scan with my phone, automating some things in the everyday management of a home.
So naturally I installed ollama and ollama-cuda as I have 2 Nvidia GPUs in there, expecting that it would work fine, as it was working on Windows just a few minutes prior.
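On CachyOS (an Arch derivative) that is a single pacman transaction plus enabling the bundled systemd service:

```bash
# Install Ollama plus the CUDA backend from the repos
sudo pacman -S ollama ollama-cuda

# The Arch packaging ships a systemd unit for the server
sudo systemctl enable --now ollama
```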
I started Ollama, loaded qwen3:8b for a quick experiment and… it was running on the CPU 🤔
The GPUs were perfectly capable of holding it in VRAM, but for some reason Ollama was choosing to run it on the CPU.
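If you want to reproduce the check: `ollama ps` shows whether a loaded model is resident on the GPU or the CPU, and on Arch-based systems the server logs go through journald:

```bash
# Load a model, then check where it ended up
ollama run qwen3:8b

# In another terminal: the PROCESSOR column read "100% CPU" here
# instead of the expected "100% GPU"
ollama ps

# Follow the server logs for the gory details
journalctl -u ollama -f
```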
After trying to figure it out from the logs, I asked Claude to help out a bit, and it came up with a pretty daunting finding:
The logs tell the story clearly. Both your GPUs are detected but then filtered out because they fail CUDA initialization:
"filtering device which didn't fully initialize" id=GPU-eb132a13-b9f4-40e3-d85d-60f211dfd21e ... library=CUDA
"filtering device which didn't fully initialize" id=GPU-550263c8-37e2-2a25-c0b6-10cc7511ed20 ... library=CUDA
The reason is the CUDA library bundled with Ollama 0.17.4 was compiled for these architectures only:
CUDA.0.ARCHS=750,800,860,870,880,890,900,1000,1030,1100,1200,1210
The minimum supported compute capability is 7.5 (Turing / RTX 20xx series). Your GPUs are older than that:
┌──────────┬────────────────────┬──────────────┐
│ GPU      │ Compute Capability │ Architecture │
├──────────┼────────────────────┼──────────────┤
│ GTX 1070 │ 6.1                │ Pascal       │
├──────────┼────────────────────┼──────────────┤
│ GTX 970  │ 5.2                │ Maxwell      │
└──────────┴────────────────────┴──────────────┘
Neither is in the supported list, so CUDA initialization fails for both, and Ollama falls back to CPU-only inference.
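As an aside: you can confirm the compute capability of your own cards straight from the driver; the `compute_cap` query field needs a reasonably recent nvidia-smi:

```bash
# Print each GPU's name and compute capability
nvidia-smi --query-gpu=name,compute_cap --format=csv
# name, compute_cap
# NVIDIA GeForce GTX 1070, 6.1
# NVIDIA GeForce GTX 970, 5.2
```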
Your options:
1. Downgrade Ollama to an older version that still included Pascal/Maxwell support (pre-0.6.x era had broader arch coverage, but check release notes for specifics).
2. Build Ollama from source with CUDA_ARCHITECTURES="52;61;75;80;86" to include your GPU architectures.
3. Upgrade GPUs to at least an RTX 2060 or newer (compute 7.5+).

Now, this was kind of a letdown for me, as I want to use the GPUs as much as possible before moving on to a new one.
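For the record, the rebuild route (option 2) would have looked roughly like this. This is only a sketch, assuming Ollama's cmake + Go build; the exact variable names and steps vary between versions, so check the project's development docs first:

```bash
# Sketch only: flags and steps depend on the Ollama version you check out
git clone https://github.com/ollama/ollama.git
cd ollama

# Ask CMake to also generate kernels for Maxwell (52) and Pascal (61)
cmake -B build -DCMAKE_CUDA_ARCHITECTURES="52;61;75;80;86"
cmake --build build

# Build the Go binary that wraps the native runners
go build .
```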
I didn't want to actually rebuild Ollama from source, as it should support these cards out of the box, and I was very confused why it would not, even though the Windows install supported those architectures.
These cards were also shown on Ollama's website as supported!
To cut a long story of Googling short:
Turns out that ollama-cuda ships with only the compute targets listed above… and the suggestion was to install ollama-vulkan.
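On CachyOS that boiled down to swapping the backend package and restarting the service; something like:

```bash
# Swap the CUDA backend for the Vulkan one
sudo pacman -R ollama-cuda        # drop the backend that can't init these GPUs
sudo pacman -S ollama-vulkan
sudo systemctl restart ollama

# Confirm the model now sits on the GPUs
ollama run qwen3:8b
ollama ps
```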
After doing that, everything started working very smoothly!