Home › Models › 16 GB of VRAM AI models that run on 16 GB of VRAM 237 open models fit in 16 GB of VRAM at their default quant — enough for a mid-range card like an RTX 4060 Ti 16 GB / 4070 Ti Super. Most capable first; run any of them for $0 on your own hardware.
gpt oss safeguard 20b — 21.5B, openai · ~16 GB VRAMgpt-oss-20b — 21B, OpenAI · ~15 GB VRAMgpt oss 20b BF16 — 20.9B, unsloth · ~15 GB VRAMNVIDIA Nemotron 3 Nano 30B A3B NVFP4 — 18.2B, nvidia · ~13 GB VRAMQwen3 30B A3B NVFP4 — 17.5B, RedHatAI · ~13 GB VRAMQwen3 32B NVFP4 — 17.2B, nvidia · ~15 GB VRAMParam2 17B A2.4B Thinking — 17.2B, bharatgenai · ~12 GB VRAMLLaDA2.0 mini — 16.3B, inclusionAI · ~12 GB VRAMDeepSeek-Coder-V2-Lite Instruct — 15.7B, DeepSeek · ~11 GB VRAMDeepSeek V2 Lite Chat — 15.7B, deepseek-ai · ~15 GB VRAMDeepSeek V2 Lite — 15.7B, deepseek-ai · ~15 GB VRAMQwen2.5 Coder 14B Instruct — 14.8B, Qwen · ~14 GB VRAMQwen2.5 14B Instruct — 14.8B, Qwen · ~14 GB VRAMQwen3 14B — 14.8B, Qwen · ~13 GB VRAMQwen3 14B Base — 14.8B, Qwen · ~13 GB VRAMQwen3 14B Instruct — 14.8B, OpenPipe · ~13 GB VRAMQwen1.5 MoE A2.7B — 14.3B, Qwen · ~12 GB VRAMPhi-4 — 14B, Microsoft · ~13 GB VRAMLlama 2 13b chat hf — 13B, meta-llama · ~16 GB VRAMHarmBench Llama 2 13b cls — 13B, cais · ~11 GB VRAMNVIDIA Nemotron Nano 12B v2 — 12.3B, nvidia · ~15 GB VRAMMN 12B Mag Mell R1 — 12.2B, inflatebot · ~12 GB VRAMGemma 3 12B — 12B, Google · ~13 GB VRAMGemma 4 12B OBLITERATED — 12B, OBLITERATUS · ~14 GB VRAMApertus 70B Instruct 2509 quantized.w4a16 — 11.3B, RedHatAI · ~14 GB VRAMFalcon3-10B Instruct — 10B, TII · ~10 GB VRAMDarwin 9B NEG — 9.7B, ansulev · ~12 GB VRAMSeeClick — 9.7B, cckevinn · ~12 GB VRAMgemma 2 9b — 9.2B, google · ~11 GB VRAMGemma 2 9B Instruct — 9B, Google · ~9.0 GB VRAMNVIDIA Nemotron Nano 9B v2 — 8.9B, nvidia · ~10 GB VRAMNVIDIA Nemotron Nano 9B v2 Japanese — 8.9B, nvidia · ~10 GB VRAMinternlm3 8b instruct — 8.8B, internlm · ~8.0 GB VRAMNemotron Labs Diffusion 8B Base — 8.5B, nvidia · ~7.0 GB VRAMLFM2.5 8B A1B — 8.5B, LiquidAI · ~7.0 GB VRAMgemma 7b — 8.5B, google · ~11 GB VRAMQwen3-8B — 8.2B, Alibaba · ~9.0 GB VRAMQwen3 8B Base — 8.2B, Qwen · ~9.0 GB VRAMDeepSeek R1 0528 Qwen3 8B — 8.2B, deepseek-ai · ~9.0 GB VRAMQwen3 14B NVFP4 — 8.2B, nvidia · ~9.0 GB VRAMgranite 3.1 8b instruct — 8.2B, ibm-granite · ~9.0 GB VRAMQwen3Guard Gen 8B — 8.2B, Qwen · ~9.0 GB VRAMgranite 3.0 8b instruct — 8.2B, ibm-granite · ~7.0 GB VRAMgranite 3.3 8b instruct — 8.2B, ibm-granite · ~9.0 GB VRAMApertus 8B Instruct 2509 — 8.1B, swiss-ai · ~9.0 GB VRAMLlama 3.1 8B Instruct — 8B, Meta · ~8.0 GB VRAMQwen2-VL 7B Instruct — 8B, Alibaba · ~7.0 GB VRAMLlama 3.1 8B Instruct (Abliterated) — 8B, mlabonne (community) · ~8.0 GB VRAMHermes 3 — Llama 3.1 8B — 8B, Nous Research · ~8.0 GB VRAMDolphin 3.0 — Llama 3.1 8B — 8B, Cognitive Computations · ~8.0 GB VRAMMeta Llama 3 8B Instruct — 8B, meta-llama · ~10 GB VRAMLlama 3.1 8B — 8B, meta-llama · ~10 GB VRAMMeta Llama 3 8B — 8B, meta-llama · ~10 GB VRAMLLaDA 8B Instruct — 8B, GSAI-ML · ~8.0 GB VRAMDeepSeek R1 Distill Llama 8B — 8B, deepseek-ai · ~8.0 GB VRAMsaiga llama3 8b — 8B, IlyaGusev · ~7.0 GB VRAMMeta Llama 3.1 8B Instruct — 8B, unsloth · ~8.0 GB VRAMgemma 4 E4B it OBLITERATED — 8B, OBLITERATUS · ~10 GB VRAMLLaDA 1.5 — 8B, GSAI-ML · ~8.0 GB VRAMMeta Llama 3 8B Instruct — 8B, NousResearch · ~7.0 GB VRAMMeta Llama 3.1 8B Instruct — 8B, NousResearch · ~8.0 GB VRAMLlama 3.1 8B Instruct — 8B, unsloth · ~8.0 GB VRAMLLaDA 8B Base — 8B, GSAI-ML · ~8.0 GB VRAMHumanish Roleplay Llama 3.1 8B — 8B, vicgalle · ~8.0 GB VRAMLlama Guard 3 8B — 8B, meta-llama · ~10 GB VRAMllava onevision qwen2 7b ov — 8B, lmms-lab · ~7.0 GB VRAMLlama 3.1 Nemotron Safety Guard 8B v3 — 8B, nvidia · ~8.0 GB VRAMNeuralDaredevil 8B abliterated — 8B, mlabonne · ~7.0 GB VRAML3 8B Stheno v3.2 — 8B, Sao10K · ~7.0 GB VRAMMiMo 7B Base — 7.8B, XiaomiMiMo · ~9.0 GB VRAMEXAONE 3.5 7.8B Instruct — 7.8B, LGAI-EXAONE · ~8.0 GB VRAMhf moshiko — 7.8B, kmhf · ~8.0 GB VRAMMiMo 7B RL — 7.8B, XiaomiMiMo · ~9.0 GB VRAMQwen1.5 7B — 7.7B, Qwen · ~15 GB VRAMinternlm2 5 7b chat — 7.7B, internlm · ~8.0 GB VRAMQwen 7B Chat — 7.7B, Qwen · ~15 GB VRAMinternlm2 chat 7b — 7.7B, internlm · ~8.0 GB VRAMQwen2 7B Instruct — 7.6B, Qwen · ~7.0 GB VRAMQwen2.5 7B — 7.6B, Qwen · ~7.0 GB VRAMDeepSeek R1 Distill Qwen 7B — 7.6B, deepseek-ai · ~7.0 GB VRAMDream v0 Instruct 7B — 7.6B, Dream-org · ~7.0 GB VRAMQwen2.5 Coder 7B — 7.6B, Qwen · ~7.0 GB VRAMPhi mini MoE instruct — 7.6B, microsoft · ~7.0 GB VRAMQwen2.5 Math 7B Instruct — 7.6B, Qwen · ~6.0 GB VRAMQwen2.5 7B Instruct — 7.6B, unsloth · ~7.0 GB VRAMOlmo 3 7B Instruct SFT — 7.3B, allenai · ~14 GB VRAMOlmo 3 1025 7B — 7.3B, allenai · ~14 GB VRAMfalcon mamba 7b — 7.3B, tiiuae · ~14 GB VRAMOLMo 2 1124 7B Instruct — 7.3B, allenai · ~8.0 GB VRAMMistral 7B Instruct v0.3 — 7.2B, Mistral AI · ~8.0 GB VRAMOther GPU budgets 8 GB of VRAM · 24 GB of VRAM · 48 GB of VRAM · All models
Open the free advisor → · Prices as of 2026-06-17. We're an honest advisor — $0 markup, your own accounts, we never resell compute. © 2026 Cynosure LLC.