Buy for VRAM first, because it decides what you can run. A 24 GB card (a used RTX 3090 or a 4090) is the sweet spot; 16 GB is a strong mid-range choice, and 8-12 GB is a fine entry point for smaller models.
"What GPU should I buy for local LLMs?" has a clear guiding principle that surprises newcomers: buy for VRAM first, and speed second. For running models, VRAM (the memory on the card) decides what you can run at all, while raw GPU speed only affects how fast it runs once it fits. A cheaper card with more VRAM will run bigger models than a pricier card with less — so the memory number should lead your decision.
Here's why. To run a model quickly, its weights plus the KV cache (the per-token attention state, which grows with context length) must fit in VRAM so the whole thing runs on the GPU. If it doesn't fit, you either offload part of the model to much slower system RAM, where generation crawls, or you can't run it at all. So the first question about any model is "does it fit," which is a VRAM question. This is why two cards with the same VRAM (like the RTX 3090 and 4090, both 24 GB) run the same set of models: the newer one just generates faster among the models that already fit.
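The fit check can be sketched in a few lines of Python. Treat this as a rough estimate under stated assumptions, not a measurement: the layer count, KV-head count, and head dimension are illustrative values for an 8B-class model with grouped-query attention, the 1 GB runtime overhead is a guess, and the function names are made up for this example.

```python
GIB = 1024 ** 3  # the GB/GiB distinction is ignored in the rough check below


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV cache size: keys and values (the factor of 2) are stored
    for every layer, every KV head, and every token in the context."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def fits_in_vram(weights_gb, kv_gb, vram_gb, overhead_gb=1.0):
    """True if weights + KV cache + an assumed runtime overhead fit on the card."""
    return weights_gb + kv_gb + overhead_gb <= vram_gb


# Illustrative 8B-class configuration (assumed): 32 layers, 8 KV heads,
# head dimension 128, 16-bit cache values, 8192-token context.
kv_gb = kv_cache_bytes(32, 8, 128, 8192) / GIB
print(f"KV cache: {kv_gb:.2f} GiB")  # KV cache: 1.00 GiB

# About 4 GB of weights for an 8B model at 4-bit (0.5 GB per billion parameters).
print(fits_in_vram(weights_gb=4.0, kv_gb=kv_gb, vram_gb=8.0))  # True
print(fits_in_vram(weights_gb=4.0, kv_gb=kv_gb, vram_gb=4.0))  # False
```

The KV cache term scales linearly with context length, so the same model can fit at a short context and fail at a long one.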
The tiers below map VRAM to what you can run, using a rule of thumb of about 0.5 GB per billion parameters at 4-bit quantization, plus headroom for the KV cache and runtime overhead:
8 GB (RTX 4060, RTX 3060 8 GB, 8 GB laptop cards) is a solid entry point: small models and 7-8B models at 4-bit, covering a lot of genuinely useful chat and coding. Good if budget is tight and you're happy with smaller models.
12 GB (RTX 3060 12 GB, 4070) buys comfort at the 7-8B sizes (higher quants, longer contexts) and reaches low-teens-billion models. A modest step up from 8 GB.
16 GB (RTX 4060 Ti 16 GB, 4070 Ti Super) is a strong mid-range choice — mid-teens-billion models at 4-bit, and 7-13B models at high quality settings. A well-balanced pick.
24 GB (RTX 3090, 4090, 7900 XTX) is the sweet spot for serious local AI — 32B-class models become runnable on a single card, and everything smaller runs at near-maximum quality. This is the tier most enthusiasts target.
48 GB and up (workstation cards like the RTX A6000, or two 24 GB cards together) is what you need for 70B-class models. Serious money, for people who specifically want the biggest models locally.
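The tier mapping above can be approximated with a short Python sketch. The 0.5 GB per billion parameters figure comes from the text; the 30% headroom fraction is an assumption chosen for illustration (the article only says "plus headroom"), and the function names are hypothetical.

```python
# VRAM tiers listed above, in GB.
TIERS_GB = [8, 12, 16, 24, 48]

GB_PER_BILLION_PARAMS_4BIT = 0.5  # rule of thumb from the text
HEADROOM_FRACTION = 0.3           # assumption: room for KV cache and runtime buffers


def estimate_vram_gb(params_billion, headroom_fraction=HEADROOM_FRACTION):
    """Estimated VRAM for a 4-bit model: weights plus a fractional headroom."""
    weights_gb = params_billion * GB_PER_BILLION_PARAMS_4BIT
    return weights_gb * (1 + headroom_fraction)


def smallest_tier_gb(params_billion):
    """Smallest listed tier that holds the estimate, or None if none does."""
    need = estimate_vram_gb(params_billion)
    for tier in TIERS_GB:
        if need <= tier:
            return tier
    return None


for size in (8, 14, 32, 70):
    print(f"{size}B: ~{estimate_vram_gb(size):.1f} GB -> {smallest_tier_gb(size)} GB tier")
```

With these assumptions an 8B model lands in the 8 GB tier, 14B in 12 GB, 32B in 24 GB, and 70B in 48 GB, which matches the tiers as described. A longer context or a higher-precision quant raises the real requirement, so raise the headroom fraction if you plan to use either.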
The standout value recommendation is the 24 GB tier, and specifically a used RTX 3090 — it has the same 24 GB as the far pricier 4090, and since VRAM capacity governs model fit, it's one of the best dollars-per-gigabyte choices for LLM work. A 4090 is the faster new option at the same capacity if speed and warranty matter to you. Below that, 16 GB is the sensible mid-range, and 8-12 GB is a fine, cheaper start for smaller models.
A few honest considerations beyond VRAM. NVIDIA cards have the smoothest software support for AI (CUDA), so they're the default recommendation; AMD cards work but the ecosystem is a bit less turnkey. Check power draw and physical size against your power supply and case — high-end cards are large and hungry. And if you own an Apple Silicon Mac, you may not need to buy a GPU at all: its unified memory acts as a VRAM budget and it runs models well, so a high-RAM Mac is an alternative to a dedicated card. If you don't want to buy hardware, renting a cloud GPU or using a pay-per-token API with your own key are the no-purchase routes.
Don't over-buy for models you won't run: if you'll mostly use 7-14B models, a 16 GB card is plenty and cheaper than chasing 24 GB you won't fill. And don't under-buy VRAM to get a faster chip — you'll regret the models you can't load. Match the card's VRAM to the largest model you realistically want to run.
Spanvero makes the match objective: it computes what fits each card size, so you can see exactly which models a given GPU can run before you buy. Compare cards on the per-GPU pages under /gpu/, browse models by VRAM tier at /models/8gb-vram/, /models/16gb-vram/, and /models/24gb-vram/, and read the value case for a used 3090 at /learn/is-a-used-3090-good-for-local-llms/. Use /calculator/ to check a specific model against a card you're considering.
© 2026 Cynosure LLC.