What can NVIDIA RTX 5090 32GB run? 298 models fit & the real cost
298 of the 377 models in the Spanvero catalog fit NVIDIA RTX 5090 32GB's 32 GB VRAM (at a sensible quant, 16k context). For each model you have three options: run it locally ($0 compute + electricity), rent an equivalent GPU ($0 markup, as of 2026-07-02), or pay per token via your own API key (as of 2026-06-29).
Three honest ways to run each model on NVIDIA RTX 5090 32GB
Run it locally: $0 in compute fees, but you still pay for electricity (~575 W under load on this card), so local inference costs real money and is never a true "$0".
Rent an equivalent GPU: at the vendor's own rate with $0 markup (as of 2026-07-02). You rent on your own account and pay the vendor directly; we never resell compute.
Skip the box: run the same model through your own API key, paying per million tokens (prices as of 2026-06-29).
What fits NVIDIA RTX 5090 32GB (32 GB VRAM)
298 of the 377 notable models in the Spanvero catalog fit NVIDIA RTX 5090 32GB at a sensible quant (context capped at 16k for the estimate). Most capable first:
Phi 3.5 MoE instruct (microsoft, 41.9B) — needs ~31 GB at Q4_K_M: run it locally for $0 compute + ~$0.7966/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.44/1M via your own API key (size estimate).
Karnak 40B v1.0 (Applied-Innovation-Center, 40.7B) — needs ~29 GB at Q4_K_M: run it locally for $0 compute + ~$0.7783/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.43/1M via your own API key (size estimate).
Seed OSS 36B Instruct (ByteDance-Seed, 36.2B) — needs ~27 GB at Q4_K_M: run it locally for $0 compute + ~$0.7086/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.39/1M via your own API key (size estimate).
Hermes 4.3 36B (NousResearch, 36.2B) — needs ~27 GB at Q4_K_M: run it locally for $0 compute + ~$0.7086/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.39/1M via your own API key (size estimate).
Yi-1.5-34B-Chat (01.AI, 34.4B) — needs ~25 GB at Q4_K_M: run it locally for $0 compute + ~$0.6803/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.38/1M via your own API key (size estimate).
Laguna XS.2 (poolside, 33.4B) — needs ~24 GB at Q4_K_M: run it locally for $0 compute + ~$0.6644/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.15/1M via your own API key.
Qwen3-32B (Alibaba, 32.8B) — needs ~25 GB at Q4_K_M: run it locally for $0 compute + ~$0.6549/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.18/1M via your own API key.
Qwen2.5 32B Instruct (Qwen, 32.8B) — needs ~27 GB at Q4_K_M: run it locally for $0 compute + ~$0.6549/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.36/1M via your own API key (size estimate).
cogito v1 preview qwen 32B (deepcogito, 32.8B) — needs ~27 GB at Q4_K_M: run it locally for $0 compute + ~$0.6549/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.36/1M via your own API key (size estimate).
QwQ 32B (Qwen, 32.8B) — needs ~27 GB at Q4_K_M: run it locally for $0 compute + ~$0.6549/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.36/1M via your own API key (size estimate).
DeepSeek-R1-Distill-Qwen-32B (DeepSeek, 32.5B) — needs ~27 GB at Q4_K_M: run it locally for $0 compute + ~$0.6501/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.36/1M via your own API key (size estimate).
granite 4.0 h small (ibm-granite, 32.2B) — needs ~25 GB at Q4_K_M: run it locally for $0 compute + ~$0.6453/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.36/1M via your own API key (size estimate).
sarvam 30b (sarvamai, 32.2B) — needs ~22 GB at Q4_K_M: run it locally for $0 compute + ~$0.6453/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.36/1M via your own API key (size estimate).
llm jp 4 32b a3b thinking (llm-jp, 32.1B) — needs ~23 GB at Q4_K_M: run it locally for $0 compute + ~$0.6437/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.36/1M via your own API key (size estimate).
Qwen2.5-Coder 32B Instruct (Alibaba, 32B) — needs ~25 GB at Q4_K_M: run it locally for $0 compute + ~$0.6421/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.83/1M via your own API key.
NVIDIA Nemotron 3 Nano 30B A3B BF16 (nvidia, 31.6B) — needs ~22 GB at Q4_K_M: run it locally for $0 compute + ~$0.6356/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.35/1M via your own API key (size estimate).
Nemotron Cascade 2 30B A3B (nvidia, 31.6B) — needs ~22 GB at Q4_K_M: run it locally for $0 compute + ~$0.6356/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.35/1M via your own API key (size estimate).
GLM 4.7 Flash (zai-org, 31.2B) — needs ~28 GB at Q4_K_M: run it locally for $0 compute + ~$0.6292/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.23/1M via your own API key.
GLM 4.7 Flash (unsloth, 31.2B) — needs ~28 GB at Q4_K_M: run it locally for $0 compute + ~$0.6292/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.35/1M via your own API key (size estimate).
Qwen3 30B A3B (Qwen, 30.5B) — needs ~22 GB at Q4_K_M: run it locally for $0 compute + ~$0.6179/1M in electricity, rent NVIDIA RTX A6000 48GB from $0.49/hr ($0 markup), or ~$0.31/1M via your own API key.
Street price: $3,680.00 as of 2026-07-02 (GPU Poet US price tracker, Jun 2026, lowest-average listing; launch MSRP $1,999; GDDR7 supply crunch). Amortized over 3 years, that is ~$3.3607/day whether or not you're generating.
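The daily figure above can be reproduced with a few lines of Python. This is a minimal sketch that assumes straight-line amortization over 365-day years with no resale value; the function name `amortized_daily_cost` is ours, not part of any Spanvero tooling.

```python
def amortized_daily_cost(price_usd: float, years: float, days_per_year: int = 365) -> float:
    """Spread a purchase price evenly over an assumed service life, in USD per day."""
    return price_usd / (years * days_per_year)


# Street price and 3-year horizon taken from the text above.
daily = amortized_daily_cost(3680.00, 3)
print(f"${daily:.4f}/day")  # $3.3607/day
```

A shorter horizon or a lower resale assumption raises the daily cost; changing `years` shows how sensitive the figure is.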
Electricity: ~575 W under sustained inference at $0.1883/kWh (EIA Electric Power Monthly Table 5.6.A, U.S. residential average, Apr 2026; as of 2026-07-02). The per-1M-token electricity figures above already reflect this rate at each model's generation speed.
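The page does not list each model's generation speed, so the sketch below works in both directions under one assumption: that the electricity figure is simply power draw times price per kWh, divided by throughput. Both function names are hypothetical, and the implied throughput is a back-calculation, not a published benchmark.

```python
def electricity_cost_per_million(watts: float, usd_per_kwh: float, tokens_per_second: float) -> float:
    """Electricity cost in USD to generate one million tokens at a steady rate."""
    usd_per_hour = watts / 1000 * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


def implied_tokens_per_second(watts: float, usd_per_kwh: float, usd_per_million: float) -> float:
    """Invert the formula: the throughput that would produce a listed per-1M figure."""
    usd_per_hour = watts / 1000 * usd_per_kwh
    return usd_per_hour / usd_per_million * 1_000_000 / 3600


# Phi 3.5 MoE instruct is listed at ~$0.7966/1M in electricity.
tps = implied_tokens_per_second(575, 0.1883, 0.7966)
print(f"implied speed: {tps:.1f} tokens/s")  # implied speed: 37.8 tokens/s

# Round trip: feeding that speed back in recovers the listed figure.
print(f"${electricity_cost_per_million(575, 0.1883, tps):.4f}/1M")
```

Substituting your own measured tokens per second and local electricity rate gives a figure specific to your setup.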
Straight talk: for the small models a 32 GB VRAM box runs, hosted APIs are often cheaper per token. Run locally for privacy, offline use, and unlimited runs, not to save money on tokens.
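To see this on the listing itself, the sketch below compares the electricity-only local cost with the API price for five rows copied from the list above. It assumes the two per-1M figures are directly comparable and ignores hardware amortization, which would make local look more expensive still.

```python
# (model, local electricity USD per 1M tokens, API USD per 1M tokens), copied from the list above.
rows = [
    ("Phi 3.5 MoE instruct", 0.7966, 0.44),
    ("Laguna XS.2", 0.6644, 0.15),
    ("Qwen3-32B", 0.6549, 0.18),
    ("Qwen2.5-Coder 32B Instruct", 0.6421, 0.83),
    ("Qwen3 30B A3B", 0.6179, 0.31),
]

for name, local, api in rows:
    cheaper = "API" if api < local else "local"
    print(f"{name}: local ${local:.4f} vs API ${api:.2f} per 1M -> {cheaper}")

# Count the rows where the hosted API undercuts local electricity.
api_wins = sum(api < local for _, local, api in rows)
print(f"API cheaper in {api_wins} of {len(rows)} sampled rows")
```

In this sample only Qwen2.5-Coder 32B Instruct is cheaper to run locally on electricity alone, which matches the "often, not always" framing above.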
Too big for NVIDIA RTX 5090 32GB — rent or use an API instead
These need more than the 32 GB VRAM on this card. Closest first — you can still run them on a rented GPU ($0 markup) or via your own API key:
Mixtral 8x7B Instruct (Mistral AI, 46.7B) — needs ~34 GB; rent NVIDIA RTX A6000 48GB from $0.49/hr, or ~$0.24/1M via your own API key (last-known).
Kimi Linear 48B A3B Instruct (moonshotai, 49.1B) — needs ~36 GB; rent NVIDIA RTX A6000 48GB from $0.49/hr, or ~$0.49/1M via your own API key (size estimate).
Nemotron 3 Nano 30B A3B (unsloth, 31.6B) — needs ~37 GB; rent NVIDIA RTX A6000 48GB from $0.49/hr, or ~$0.35/1M via your own API key (size estimate).
NVIDIA Nemotron 3 Nano 30B A3B Base BF16 (nvidia, 31.6B) — needs ~37 GB; rent NVIDIA RTX A6000 48GB from $0.49/hr, or ~$0.35/1M via your own API key (size estimate).
gemma 4 31B it NVFP4 turbo (LilaRest, 32.5B) — needs ~38 GB; rent NVIDIA RTX A6000 48GB from $0.49/hr, or ~$0.36/1M via your own API key (size estimate).
HyperCLOVAX SEED Think 32B (naver-hyperclovax, 33.3B) — needs ~39 GB; rent NVIDIA RTX A6000 48GB from $0.49/hr, or ~$0.37/1M via your own API key (size estimate).