The best open LLMs you can run on 16 GB of VRAM

Open LLMs that fit in 16 GB of VRAM at their default quantization: enough for an RTX 4060 Ti 16 GB, an RTX 4070 Ti Super, or a 16 GB Mac. Models are ranked from the largest that fits downward, and running them locally costs $0. We confirm the fit; quality is yours to judge.
How this is ranked: the filter is objective. Here "best" means "fits in 16 GB." VRAM figures are engine-computed, and models are ordered by parameter count, largest first. No subjective quality claim is made.
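The ranking rule above (keep what fits, then order by size) can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual engine: the `Model` class, the `rank_fitting` function, and the `too-big-70b` row are hypothetical names introduced here, and the VRAM figures are taken as given rather than computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    params_b: float  # parameter count, in billions
    vram_gb: float   # estimated VRAM at the default quant, in GB


def rank_fitting(models, budget_gb=16.0):
    """Keep models whose estimated VRAM fits the budget, largest first."""
    fitting = [m for m in models if m.vram_gb <= budget_gb]
    return sorted(fitting, key=lambda m: m.params_b, reverse=True)


# Three rows copied from the list on this page, plus one hypothetical model
# ("too-big-70b") that is NOT on the page and exists only to show the filter.
catalog = [
    Model("Phi-4", 14.0, 13.0),
    Model("too-big-70b", 70.0, 40.0),  # hypothetical, exceeds the budget
    Model("gpt-oss-20b", 21.0, 15.0),
    Model("Falcon3-10B Instruct", 10.0, 10.0),
]

ranked = rank_fitting(catalog)
for position, m in enumerate(ranked, start=1):
    print(f"{position}. {m.name}: {m.params_b}B, ~{m.vram_gb} GB VRAM")
```

The comparison is `<=` rather than `<` because the list includes entries at exactly ~16 GB. Changing `budget_gb` would give the same kind of ranking for a different card.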
1. gpt oss safeguard 20b — openai, 21.5B · ~16 GB VRAM · $0.27/1M API est. · commercial OK
2. gpt-oss-20b — OpenAI, 21B · ~15 GB VRAM · $0.09/1M API · commercial OK
3. gpt oss 20b BF16 — unsloth, 20.9B · ~15 GB VRAM · $0.27/1M API est. · commercial OK
4. NVIDIA Nemotron 3 Nano 30B A3B NVFP4 — nvidia, 18.2B · ~13 GB VRAM · $0.25/1M API est. · non-commercial
5. Qwen3 30B A3B NVFP4 — RedHatAI, 17.5B · ~13 GB VRAM · $0.24/1M API est. · commercial OK
6. Qwen3 32B NVFP4 — nvidia, 17.2B · ~15 GB VRAM · $0.24/1M API est. · commercial OK
7. Param2 17B A2.4B Thinking — bharatgenai, 17.2B · ~12 GB VRAM · $0.24/1M API est. · non-commercial
8. LLaDA2.0 mini — inclusionAI, 16.3B · ~12 GB VRAM · $0.23/1M API est. · commercial OK
9. DeepSeek-Coder-V2-Lite Instruct — DeepSeek, 15.7B · ~11 GB VRAM · $0.23/1M API est. · commercial OK
10. DeepSeek V2 Lite Chat — deepseek-ai, 15.7B · ~15 GB VRAM · $0.23/1M API est. · non-commercial
11. DeepSeek V2 Lite — deepseek-ai, 15.7B · ~15 GB VRAM · $0.23/1M API est. · non-commercial
12. Qwen2.5 Coder 14B Instruct — Qwen, 14.8B · ~14 GB VRAM · $0.22/1M API est. · commercial OK
13. Qwen2.5 14B Instruct — Qwen, 14.8B · ~14 GB VRAM · $0.22/1M API est. · commercial OK
14. Qwen3 14B — Qwen, 14.8B · ~13 GB VRAM · $0.22/1M API est. · commercial OK
15. Qwen3 14B Base — Qwen, 14.8B · ~13 GB VRAM · $0.22/1M API est. · commercial OK
16. Qwen3 14B Instruct — OpenPipe, 14.8B · ~13 GB VRAM · $0.22/1M API est. · commercial OK
17. Qwen1.5 MoE A2.7B — Qwen, 14.3B · ~12 GB VRAM · $0.21/1M API est. · non-commercial
18. Phi-4 — Microsoft, 14B · ~13 GB VRAM · $0.21/1M API est. · commercial OK
19. Llama 2 13b chat hf — meta-llama, 13B · ~16 GB VRAM · $0.20/1M API est. · commercial OK
20. HarmBench Llama 2 13b cls — cais, 13B · ~11 GB VRAM · $0.20/1M API est. · commercial OK
21. NVIDIA Nemotron Nano 12B v2 — nvidia, 12.3B · ~15 GB VRAM · $0.20/1M API est. · non-commercial
22. MN 12B Mag Mell R1 — inflatebot, 12.2B · ~12 GB VRAM · $0.20/1M API est. · non-commercial
23. Gemma 3 12B — Google, 12B · ~13 GB VRAM · $0.20/1M API est. · commercial OK
24. Gemma 4 12B OBLITERATED — OBLITERATUS, 12B · ~14 GB VRAM · $0.20/1M API est. · commercial OK
25. Apertus 70B Instruct 2509 quantized.w4a16 — RedHatAI, 11.3B · ~14 GB VRAM · $0.19/1M API est. · commercial OK
26. Falcon3-10B Instruct — TII, 10B · ~10 GB VRAM · $0.18/1M API est. · commercial OK
27. Darwin 9B NEG — ansulev, 9.7B · ~12 GB VRAM · $0.18/1M API est. · commercial OK
28. SeeClick — cckevinn, 9.7B · ~12 GB VRAM · $0.18/1M API est. · non-commercial
29. gemma 2 9b — google, 9.2B · ~11 GB VRAM · $0.17/1M API est. · commercial OK
30. Gemma 2 9B Instruct — Google, 9B · ~9.0 GB VRAM · $0.06/1M API · commercial OK
31. NVIDIA Nemotron Nano 9B v2 — nvidia, 8.9B · ~10 GB VRAM · $0.17/1M API est. · non-commercial
32. NVIDIA Nemotron Nano 9B v2 Japanese — nvidia, 8.9B · ~10 GB VRAM · $0.17/1M API est. · non-commercial
33. internlm3 8b instruct — internlm, 8.8B · ~8.0 GB VRAM · $0.17/1M API est. · commercial OK
34. Nemotron Labs Diffusion 8B Base — nvidia, 8.5B · ~7.0 GB VRAM · $0.17/1M API est. · non-commercial
35. LFM2.5 8B A1B — LiquidAI, 8.5B · ~7.0 GB VRAM · $0.17/1M API est. · non-commercial
36. gemma 7b — google, 8.5B · ~11 GB VRAM · $0.17/1M API est. · commercial OK
37. Qwen3-8B — Alibaba, 8.2B · ~9.0 GB VRAM · $0.17/1M API est. · commercial OK
38. Qwen3 8B Base — Qwen, 8.2B · ~9.0 GB VRAM · $0.17/1M API est. · commercial OK
39. DeepSeek R1 0528 Qwen3 8B — deepseek-ai, 8.2B · ~9.0 GB VRAM · $0.17/1M API est. · commercial OK
40. Qwen3 14B NVFP4 — nvidia, 8.2B · ~9.0 GB VRAM · $0.17/1M API est. · commercial OK

Showing the top 40 of 237.
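Every entry in the list follows the same pattern (name, publisher, parameter count, VRAM, API price, license), so it can be read into structured records and filtered, for example to keep only commercially usable models. The sketch below is an assumption-laden illustration: the `Entry` type, the `parse_entry` function, and the regular expression are hypothetical helpers inferred from the visible entry format, not a documented export format of the site.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    publisher: str
    params_b: float
    vram_gb: float
    api_usd_per_1m: float
    price_is_estimate: bool
    commercial_ok: bool


# Pattern inferred from the entries on this page (an assumption, not a spec).
ENTRY_RE = re.compile(
    r"^(?:\d+\.\s*)?"                          # optional rank prefix, e.g. "18. "
    r"(?P<name>.+?) — (?P<publisher>[^,]+), "  # model name and publisher
    r"(?P<params>[\d.]+)B · "                  # parameter count in billions
    r"~(?P<vram>[\d.]+) GB VRAM · "            # approximate VRAM in GB
    r"\$(?P<price>[\d.]+)/1M API(?P<est> est\.)? · "
    r"(?P<license>commercial OK|non-commercial)$"
)


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one list entry; return None if the line does not match."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return Entry(
        name=m["name"],
        publisher=m["publisher"],
        params_b=float(m["params"]),
        vram_gb=float(m["vram"]),
        api_usd_per_1m=float(m["price"]),
        price_is_estimate=m["est"] is not None,
        commercial_ok=m["license"] == "commercial OK",
    )


# Three entries copied verbatim from the list.
lines = [
    "2. gpt-oss-20b — OpenAI, 21B · ~15 GB VRAM · $0.09/1M API · commercial OK",
    "18. Phi-4 — Microsoft, 14B · ~13 GB VRAM · $0.21/1M API est. · commercial OK",
    "28. SeeClick — cckevinn, 9.7B · ~12 GB VRAM · $0.18/1M API est. · non-commercial",
]

entries = [e for e in map(parse_entry, lines) if e is not None]
commercial = [e.name for e in entries if e.commercial_ok]
print(commercial)
```

Returning `None` for lines that do not match keeps the parser tolerant of footers such as "Showing the top 40 of 237." without raising.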
© 2026 Cynosure LLC.