TL;DR

At the time of writing, ollama-cuda on CachyOS ships with support only for compute capability >= 7.5.

If you have older GPUs with a lower compute capability, you HAVE TO do one of two things:

  • Build ollama-cuda from source, with support for the older compute targets
  • Install ollama-vulkan

I chose ollama-vulkan and it’s working perfectly!
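For anyone who just wants the fix, here is a minimal sketch of the swap. Package and service names are assumed from the Arch/CachyOS repos; adjust to your setup:

```shell
# Replace the CUDA backend with the Vulkan one (package names assumed).
sudo pacman -R ollama-cuda
sudo pacman -S ollama-vulkan
# Restart the service so the new backend is picked up.
sudo systemctl restart ollama
```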

Story time

Today I had the inspiration to finally shut down my Windows 11 VM and migrate it to Linux.

The requirements were to be able to run:

  • Ollama - for local model magic
  • Docker - for remote development
  • Steam - for playing games
  • Jellyfin - for home videos archive

Here I am going to focus only on Ollama, as it turned out to be an interesting experience and got me wondering why things are not more obvious.

The machine has 2 GPUs in it:

  • GTX 1070 - 8GB
  • GTX 970 - 4GB

I wanted to run Ollama on them to play around with local models and see how I could integrate them into some projects. For example: anonymizing data locally, building a non-destructive OpenClaw implementation that fits my use cases, local OCR for documents I scan with my phone, and automating some everyday home-management tasks.

So naturally I installed ollama and ollama-cuda, since there are two Nvidia GPUs in there, expecting it to work fine, as it had been working on Windows just a few minutes prior.

I started Ollama, loaded qwen3:8b for a quick experiment and… it was running on the CPU 🤔

The GPUs were perfectly capable of holding it in VRAM, but for some reason Ollama was choosing to run it on the CPU.

After trying to figure it out from the logs, I asked Claude to help out a bit, and it came back with a pretty daunting finding:

The logs tell the story clearly. Both your GPUs are detected but then filtered out because they fail CUDA initialization:

  "filtering device which didn't fully initialize" id=GPU-eb132a13-b9f4-40e3-d85d-60f211dfd21e ... library=CUDA
  "filtering device which didn't fully initialize" id=GPU-550263c8-37e2-2a25-c0b6-10cc7511ed20 ... library=CUDA

  The reason is that the CUDA library bundled with Ollama 0.17.4 was compiled for these architectures only:

  CUDA.0.ARCHS=750,800,860,870,880,890,900,1000,1030,1100,1200,1210

  The minimum supported compute capability is 7.5 (Turing / RTX 20xx series). Your GPUs are older than that:

  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚   GPU    β”‚ Compute Capability β”‚ Architecture β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ GTX 1070 β”‚ 6.1                β”‚ Pascal       β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ GTX 970  β”‚ 5.2                β”‚ Maxwell      β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

  Neither is in the supported list, so CUDA initialization fails for both, and Ollama falls back to CPU-only inference.

  Your options:

  1. Downgrade Ollama to an older version that still included Pascal/Maxwell support (pre-0.6.x era had broader arch coverage, but check release notes for specifics).
  2. Build Ollama from source with CUDA_ARCHITECTURES="52;61;75;80;86" to include your GPU architectures.
  3. Upgrade GPUs to at least an RTX 2060 or newer (compute 7.5+).
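The filtering described above boils down to a simple numeric comparison. Here is a hypothetical helper that mirrors it (not Ollama's actual code): convert a compute capability like "6.1" into the three-digit form seen in the ARCHS list (610) and compare it against the 7.5 (750) minimum:

```shell
# Hypothetical sketch of the check, not Ollama's actual code.
cap_supported() {
    local cap="$1"
    # "6.1" -> 6 * 100 + 1 * 10 = 610, matching the ARCHS notation above.
    local num=$(( ${cap%%.*} * 100 + ${cap##*.} * 10 ))
    if [ "$num" -ge 750 ]; then
        echo "supported"
    else
        echo "unsupported"
    fi
}

cap_supported 6.1   # GTX 1070 (Pascal)  -> unsupported
cap_supported 5.2   # GTX 970  (Maxwell) -> unsupported
cap_supported 7.5   # RTX 20xx (Turing)  -> supported
```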

Now, this was kind of a letdown for me, as I want to get as much use out of these GPUs as possible before upgrading.

I didn’t want to actually rebuild Ollama from source, as it should support things out of the box, and I was very confused why it did not, even though the Windows install supported those architectures.

These cards were also shown on Ollama’s website as supported!

To cut a long story of Googling short:

It turns out that ollama-cuda ships with only the compute targets listed above… and the suggestion was to install ollama-vulkan instead.

After doing that, everything started working very smoothly!
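A quick way to confirm the model actually landed on the GPUs rather than the CPU (output shape from memory; exact columns may vary by Ollama version):

```shell
# Load a model, then inspect where it is running.
ollama run qwen3:8b "hello" >/dev/null
ollama ps   # the PROCESSOR column should now read "100% GPU" instead of "100% CPU"
```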