What LLMs can I run on 24GB of VRAM?

24 GB is the local sweet spot — it comfortably fits 32B-class models at 4-bit with room for context, and runs smaller models at very high quants or long contexts; only 70B-and-up models are out of reach.

A 24 GB graphics card — the RTX 3090, 4090, or 7900 XTX tier — is widely considered the sweet spot for serious local AI, and for good reason: it's where genuinely capable mid-size models become runnable without a multi-card setup. Understanding exactly what 24 GB unlocks helps you get the full value from it.

Using the standard rule (about 0.5 GB per billion parameters at 4-bit for the weights, plus headroom for the KV cache, runtime, and OS), a 32B model needs about 16 GB for its weights, leaving roughly 8 GB of a 24 GB card for context and overhead. That's the headline: 32B-class checkpoints, which offer noticeably stronger general reasoning than the 7-14B tier, become practical on a single card. Below that, 24 GB runs 7-14B models with enormous headroom, so you can use very high quants (Q6, Q8) for near-lossless quality, or very long contexts, without straining. The result is both a higher ceiling and the freedom to run smaller models at their best.
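The rule of thumb above can be sketched in a few lines of Python. This is a rough estimate only: the function names and the 4 GB headroom default are assumptions made for illustration, and real quantized files usually come out somewhat larger than the nominal bits-per-weight figure suggests.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters x bits, divided by 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float = 24.0, headroom_gb: float = 4.0) -> bool:
    """True if the weights plus an assumed headroom fit in vram_gb.

    headroom_gb is a placeholder for KV cache, runtime, and OS overhead;
    the real figure depends on context length and software stack.
    """
    return weight_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


# A few size/quant pairs from the tiers discussed in the text.
for size, bits in [(7, 8), (14, 8), (32, 4), (70, 4)]:
    print(f"{size}B @ {bits}-bit: {weight_gb(size, bits):.1f} GB weights, "
          f"fits 24 GB: {fits(size, bits)}")
```

At 4 bits the formula reduces to the 0.5 GB-per-billion rule, so a 32B model comes to 16 GB of weights, while a 14B model at 8-bit comes to 14 GB.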

The practical picture: 32B-class models at 4-bit are the new capability you gain; 14-20B models fit with room for high quants and long contexts; and 7-13B models run at essentially maximum quality with a large context. You can also stretch toward larger models with aggressive quantization or a GPU/CPU split, though that's where you start bumping the ceiling. For a single-card local setup, 24 GB covers an excellent range — most of what individuals want, at quality settings that leave nothing on the table.
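How much of the headroom a long context consumes depends on the KV cache. A commonly used estimate multiplies two tensors (keys and values) by layer count, KV head count, head dimension, context length, and bytes per value. The sketch below applies it to a hypothetical model shape; the layer and head numbers are illustrative assumptions, not the configuration of any specific checkpoint, so check a model's own config before relying on the result.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB for a single sequence.

    The leading 2 accounts for storing both keys and values in every layer.
    bytes_per_value=2 assumes a 16-bit (fp16/bf16) cache.
    """
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / 1024**3


# Hypothetical model shape, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128

for ctx in (4096, 16384, 32768):
    size = kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, ctx)
    print(f"{ctx:>6} tokens: {size:.2f} GiB")
```

Under these assumed numbers the cache grows linearly with context, which is why the same card that holds a large model at a modest context can hold a smaller model at a much longer one.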

The honest limit: 24 GB does not fit a 70B model at 4-bit. The 0.5 GB-per-billion rule puts the weights alone at about 35 GB, and common 4-bit formats, which use slightly more than 4 bits per weight, land at roughly 40 GB before headroom. If running 70B-class models locally is your specific goal, you'd need two 24 GB cards, a 48 GB workstation card, a big-memory Mac, or a rented GPU. For everything up to and including 32B-class models, though, a single 24 GB card is plenty, and for many tasks a strong 32B model at 4-bit is good enough that the jump to 70B isn't worth the extra hardware.
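As a quick check of the hardware options named above, the following sketch compares each one's total VRAM against the roughly 40 GB weight figure. The 4 GB headroom value is an assumption for illustration, and the sketch treats two cards as a simple sum, which ignores any per-card overhead from splitting a model.

```python
WEIGHTS_70B_4BIT_GB = 40.0   # rough figure from the text, not a measured file size
HEADROOM_GB = 4.0            # assumed allowance for KV cache, runtime, and OS

# Total VRAM per option; two cards are treated as a simple sum here.
options = {
    "one 24 GB card": 24,
    "two 24 GB cards": 2 * 24,
    "one 48 GB workstation card": 48,
}

needed_gb = WEIGHTS_70B_4BIT_GB + HEADROOM_GB
results = {name: vram >= needed_gb for name, vram in options.items()}

for name, ok in results.items():
    verdict = "fits" if ok else "does not fit"
    print(f"{name}: {verdict} (about {needed_gb:.0f} GB needed)")
```

With these assumptions the single 24 GB card falls well short, while both 48 GB configurations clear the bar with a few gigabytes to spare.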

On card value: measured in dollars per gigabyte of VRAM, a used RTX 3090 (the same 24 GB as the pricier 4090) is a favorite budget path to this tier, because VRAM capacity, not raw speed, determines which models fit. The 4090 generates faster, but both cards fit the same models. The 24 GB tier is therefore reachable without top-end spending if you buy for VRAM.

On running cost: a 24 GB card runs 32B-class and smaller models for effectively $0 in compute beyond electricity, fully private and offline. For heavy personal use of capable mid-size models, that's about as good as the economics get once you own the card. Only when you need 70B-and-up models do the alternatives (more VRAM, a rented GPU, or a pay-per-token API with your own key) come into play.

Spanvero makes "what fits 24 GB" an objective, computed answer at each model's default quant and a realistic context. See the full ranked list of models that fit at /models/24gb-vram/ and the picks at /best/best-llm-for-24gb-vram/; step down a tier at /models/16gb-vram/ or up to two-card territory at /models/48gb-vram/; and use /calculator/ to check a specific model, quant, and context against 24 GB and its honest local cost. For the card-value angle, see the guide at /learn/is-a-used-3090-good-for-local-llms/, and for the 70B jump, /learn/vram-for-70b/.

© 2026 Cynosure LLC.
