The best open LLMs you can run on 12 GB of VRAM

These open LLMs fit in 12 GB of VRAM at their default quantization, the sweet spot for an RTX 3060 12 GB, RTX 4070, or RX 6700 XT. They are ranked by size, with the largest model that still fits listed first, and each entry carries the honest $0-local, rent-a-GPU, and your-own-API-key cost. We guarantee the fit; you judge which one you like best.
How this is ranked: The only filter is an objective fit check (this list fills the gap between the 8 GB and 16 GB tiers). 'Best' here means 'runs on a 12 GB card.' VRAM figures are engine-computed, and models are ordered by parameter count, largest first, not by a quality ranking we'd have to invent.
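The ranking rule above (keep what fits, then order by size) can be sketched in a few lines. This is only an illustration, not the site's actual engine: the `Model` class, the `rank_for_budget` function, and the 30B entry are hypothetical, and the VRAM numbers are the rounded "~" figures from the list rather than real engine output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    params_b: float  # parameter count in billions
    vram_gb: float   # estimated VRAM at the default quant (rounded figure)


def rank_for_budget(models, budget_gb=12.0):
    """Keep models whose VRAM estimate fits the budget, largest first."""
    fits = [m for m in models if m.vram_gb <= budget_gb]
    # Order by parameter count only; no quality score is involved.
    return sorted(fits, key=lambda m: m.params_b, reverse=True)


catalog = [
    Model("Llama 3.1 8B Instruct", 8.0, 8.0),
    Model("Param2 17B A2.4B Thinking", 17.2, 12.0),
    Model("Falcon3-10B Instruct", 10.0, 10.0),
    Model("hypothetical 30B model", 30.0, 20.0),  # made up: too big for 12 GB
]

ranked = rank_for_budget(catalog)
for position, model in enumerate(ranked, start=1):
    print(f"{position}. {model.name}, {model.params_b}B, ~{model.vram_gb} GB VRAM")
```

Changing `budget_gb` to 8 or 16 would produce the neighbouring tiers under the same assumptions.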
1. Param2 17B A2.4B Thinking — bharatgenai, 17.2B · ~12 GB VRAM · $0.24/1M API est. · non-commercial
2. LLaDA2.0 mini — inclusionAI, 16.3B · ~12 GB VRAM · $0.23/1M API est. · commercial OK
3. DeepSeek-Coder-V2-Lite Instruct — DeepSeek, 15.7B · ~11 GB VRAM · $0.23/1M API est. · commercial OK
4. Qwen1.5 MoE A2.7B — Qwen, 14.3B · ~12 GB VRAM · $0.21/1M API est. · non-commercial
5. HarmBench Llama 2 13b cls — cais, 13B · ~11 GB VRAM · $0.20/1M API est. · commercial OK
6. MN 12B Mag Mell R1 — inflatebot, 12.2B · ~12 GB VRAM · $0.20/1M API est. · non-commercial
7. Falcon3-10B Instruct — TII, 10B · ~10 GB VRAM · $0.18/1M API est. · commercial OK
8. Darwin 9B NEG — ansulev, 9.7B · ~12 GB VRAM · $0.18/1M API est. · commercial OK
9. SeeClick — cckevinn, 9.7B · ~12 GB VRAM · $0.18/1M API est. · non-commercial
10. gemma 2 9b — google, 9.2B · ~11 GB VRAM · $0.17/1M API est. · commercial OK
11. Gemma 2 9B Instruct — Google, 9B · ~9.0 GB VRAM · $0.06/1M API · commercial OK
12. NVIDIA Nemotron Nano 9B v2 — nvidia, 8.9B · ~10 GB VRAM · $0.17/1M API est. · non-commercial
13. NVIDIA Nemotron Nano 9B v2 Japanese — nvidia, 8.9B · ~10 GB VRAM · $0.17/1M API est. · non-commercial
14. internlm3 8b instruct — internlm, 8.8B · ~8.0 GB VRAM · $0.17/1M API est. · commercial OK
15. Nemotron Labs Diffusion 8B Base — nvidia, 8.5B · ~7.0 GB VRAM · $0.17/1M API est. · non-commercial
16. LFM2.5 8B A1B — LiquidAI, 8.5B · ~7.0 GB VRAM · $0.17/1M API est. · non-commercial
17. gemma 7b — google, 8.5B · ~11 GB VRAM · $0.17/1M API est. · commercial OK
18. Qwen3-8B — Alibaba, 8.2B · ~9.0 GB VRAM · $0.17/1M API est. · commercial OK
19. Qwen3 8B Base — Qwen, 8.2B · ~9.0 GB VRAM · $0.17/1M API est. · commercial OK
20. DeepSeek R1 0528 Qwen3 8B — deepseek-ai, 8.2B · ~9.0 GB VRAM · $0.17/1M API est. · commercial OK
21. Qwen3 14B NVFP4 — nvidia, 8.2B · ~9.0 GB VRAM · $0.17/1M API est. · commercial OK
22. granite 3.1 8b instruct — ibm-granite, 8.2B · ~9.0 GB VRAM · $0.17/1M API est. · commercial OK
23. Qwen3Guard Gen 8B — Qwen, 8.2B · ~9.0 GB VRAM · $0.17/1M API est. · commercial OK
24. granite 3.0 8b instruct — ibm-granite, 8.2B · ~7.0 GB VRAM · $0.17/1M API est. · commercial OK
25. granite 3.3 8b instruct — ibm-granite, 8.2B · ~9.0 GB VRAM · $0.17/1M API est. · commercial OK
26. Apertus 8B Instruct 2509 — swiss-ai, 8.1B · ~9.0 GB VRAM · $0.16/1M API est. · commercial OK
27. Llama 3.1 8B Instruct — Meta, 8B · ~8.0 GB VRAM · $0.03/1M API · commercial OK
28. Qwen2-VL 7B Instruct — Alibaba, 8B · ~7.0 GB VRAM · $0.16/1M API est. · commercial OK
29. Llama 3.1 8B Instruct (Abliterated) — mlabonne (community), 8B · ~8.0 GB VRAM · $0.16/1M API est. · commercial OK
30. Hermes 3 — Llama 3.1 8B — Nous Research, 8B · ~8.0 GB VRAM · $0.16/1M API est. · commercial OK
31. Dolphin 3.0 — Llama 3.1 8B — Cognitive Computations, 8B · ~8.0 GB VRAM · $0.16/1M API est. · commercial OK
32. Meta Llama 3 8B Instruct — meta-llama, 8B · ~10 GB VRAM · $0.16/1M API est. · commercial OK
33. Llama 3.1 8B — meta-llama, 8B · ~10 GB VRAM · $0.16/1M API est. · commercial OK
34. Meta Llama 3 8B — meta-llama, 8B · ~10 GB VRAM · $0.16/1M API est. · commercial OK
35. LLaDA 8B Instruct — GSAI-ML, 8B · ~8.0 GB VRAM · $0.16/1M API est. · commercial OK
36. DeepSeek R1 Distill Llama 8B — deepseek-ai, 8B · ~8.0 GB VRAM · $0.16/1M API est. · commercial OK
37. saiga llama3 8b — IlyaGusev, 8B · ~7.0 GB VRAM · $0.16/1M API est. · non-commercial
38. Meta Llama 3.1 8B Instruct — unsloth, 8B · ~8.0 GB VRAM · $0.16/1M API est. · commercial OK
39. gemma 4 E4B it OBLITERATED — OBLITERATUS, 8B · ~10 GB VRAM · $0.16/1M API est. · commercial OK
40. LLaDA 1.5 — GSAI-ML, 8B · ~8.0 GB VRAM · $0.16/1M API est. · commercial OK

Showing the top 40 of 211.
© 2026 Cynosure LLC.