Home › Models › 8 GB of VRAM AI models that run on 8 GB of VRAM 173 open models fit in 8 GB of VRAM at their default quant — enough for an entry GPU like an RTX 3060 / 4060 or 8 GB laptop card. Most capable first; run any of them for $0 on your own hardware.
internlm3 8b instruct — 8.8B, internlm · ~8.0 GB VRAMNemotron Labs Diffusion 8B Base — 8.5B, nvidia · ~7.0 GB VRAMLFM2.5 8B A1B — 8.5B, LiquidAI · ~7.0 GB VRAMgranite 3.0 8b instruct — 8.2B, ibm-granite · ~7.0 GB VRAMLlama 3.1 8B Instruct — 8B, Meta · ~8.0 GB VRAMQwen2-VL 7B Instruct — 8B, Alibaba · ~7.0 GB VRAMLlama 3.1 8B Instruct (Abliterated) — 8B, mlabonne (community) · ~8.0 GB VRAMHermes 3 — Llama 3.1 8B — 8B, Nous Research · ~8.0 GB VRAMDolphin 3.0 — Llama 3.1 8B — 8B, Cognitive Computations · ~8.0 GB VRAMLLaDA 8B Instruct — 8B, GSAI-ML · ~8.0 GB VRAMDeepSeek R1 Distill Llama 8B — 8B, deepseek-ai · ~8.0 GB VRAMsaiga llama3 8b — 8B, IlyaGusev · ~7.0 GB VRAMMeta Llama 3.1 8B Instruct — 8B, unsloth · ~8.0 GB VRAMLLaDA 1.5 — 8B, GSAI-ML · ~8.0 GB VRAMMeta Llama 3 8B Instruct — 8B, NousResearch · ~7.0 GB VRAMMeta Llama 3.1 8B Instruct — 8B, NousResearch · ~8.0 GB VRAMLlama 3.1 8B Instruct — 8B, unsloth · ~8.0 GB VRAMLLaDA 8B Base — 8B, GSAI-ML · ~8.0 GB VRAMHumanish Roleplay Llama 3.1 8B — 8B, vicgalle · ~8.0 GB VRAMllava onevision qwen2 7b ov — 8B, lmms-lab · ~7.0 GB VRAMLlama 3.1 Nemotron Safety Guard 8B v3 — 8B, nvidia · ~8.0 GB VRAMNeuralDaredevil 8B abliterated — 8B, mlabonne · ~7.0 GB VRAML3 8B Stheno v3.2 — 8B, Sao10K · ~7.0 GB VRAMEXAONE 3.5 7.8B Instruct — 7.8B, LGAI-EXAONE · ~8.0 GB VRAMhf moshiko — 7.8B, kmhf · ~8.0 GB VRAMinternlm2 5 7b chat — 7.7B, internlm · ~8.0 GB VRAMinternlm2 chat 7b — 7.7B, internlm · ~8.0 GB VRAMQwen2 7B Instruct — 7.6B, Qwen · ~7.0 GB VRAMQwen2.5 7B — 7.6B, Qwen · ~7.0 GB VRAMDeepSeek R1 Distill Qwen 7B — 7.6B, deepseek-ai · ~7.0 GB VRAMDream v0 Instruct 7B — 7.6B, Dream-org · ~7.0 GB VRAMQwen2.5 Coder 7B — 7.6B, Qwen · ~7.0 GB VRAMPhi mini MoE instruct — 7.6B, microsoft · ~7.0 GB VRAMQwen2.5 Math 7B Instruct — 7.6B, Qwen · ~6.0 GB VRAMQwen2.5 7B Instruct — 7.6B, unsloth · ~7.0 GB VRAMOLMo 2 1124 7B Instruct — 7.3B, allenai · ~8.0 GB VRAMMistral 7B Instruct v0.3 — 7.2B, Mistral AI · ~8.0 GB VRAMMistral 7B Instruct v0.2 — 7.2B, mistralai · ~8.0 GB VRAMMistral 7B v0.1 — 7.2B, mistralai · ~8.0 GB VRAMMistral 7B Instruct v0.1 — 7.2B, mistralai · ~8.0 GB VRAMzephyr 7b beta — 7.2B, HuggingFaceH4 · ~8.0 GB VRAMQwen2.5 7B Instruct — 7B, Alibaba · ~7.0 GB VRAMQwen2.5-Coder 7B Instruct — 7B, Alibaba · ~7.0 GB VRAMQwen2.5 7B Instruct (Abliterated) — 7B, huihui-ai (community) · ~7.0 GB VRAMdeepseek coder 7b instruct v1.5 — 6.9B, deepseek-ai · ~8.0 GB VRAMgranite 4.0 h tiny — 6.9B, ibm-granite · ~7.0 GB VRAMOLMoE 1B 7B 0125 Instruct — 6.9B, allenai · ~6.0 GB VRAMOLMoE 1B 7B 0924 — 6.9B, allenai · ~6.0 GB VRAMllama 7b — 6.7B, huggyllama · ~7.0 GB VRAMgranite 4.0 tiny preview — 6.7B, ibm-granite · ~7.0 GB VRAMLlama 2 7b hf — 6.7B, NousResearch · ~8.0 GB VRAMLlama 2 7b chat hf — 6.7B, NousResearch · ~8.0 GB VRAMGemma 4 E4B it NVFP4 — 6B, bg-digitalservices · ~8.0 GB VRAMQwen3.6 27B MTPLX Optimized Speed — 4.7B, Youssofal · ~7.0 GB VRAMJosiefied Qwen3 VL 4B Instruct abliterated beta v1 — 4.4B, Goekdeniz-Guelmez · ~6.0 GB VRAMQwen3 4B — 4B, Qwen · ~5.0 GB VRAMQwen3 4B Instruct 2507 — 4B, Qwen · ~5.0 GB VRAMRio 3.0 Open Mini — 4B, prefeitura-rio · ~5.0 GB VRAMNVIDIA Nemotron 3 Nano 4B BF16 — 4B, nvidia · ~6.0 GB VRAMQwen3 4B Base — 4B, Qwen · ~5.0 GB VRAMQwen3 4B Thinking 2507 — 4B, Qwen · ~5.0 GB VRAMNanbeige4.1 3B — 3.9B, Nanbeige · ~5.0 GB VRAMPhi-3.5-mini Instruct — 3.8B, Microsoft · ~5.0 GB VRAMPhi 4 mini instruct — 3.8B, microsoft · ~6.0 GB VRAMPhi tiny MoE instruct — 3.8B, microsoft · ~4.0 GB VRAMPhi 3 mini 4k instruct — 3.8B, microsoft · ~5.0 GB VRAMNemotron Labs Diffusion 3B Base — 3.8B, nvidia · ~4.0 GB VRAMPhi 4 mini reasoning — 3.8B, microsoft · ~6.0 GB VRAMNemotron Labs Diffusion 3B — 3.8B, nvidia · ~5.0 GB VRAMHyperCLOVAX SEED Vision Instruct 3B — 3.7B, naver-hyperclovax · ~5.0 GB VRAMPowerLM 3b — 3.5B, ibm-research · ~5.0 GB VRAMPowerMoE 3b — 3.4B, ibm-research · ~4.0 GB VRAMgranite 4.1 3b — 3.4B, ibm-granite · ~5.0 GB VRAMgranite 4.0 micro — 3.4B, ibm-granite · ~5.0 GB VRAMtiny aya base — 3.3B, CohereLabs · ~5.0 GB VRAMtiny aya global — 3.3B, CohereLabs · ~5.0 GB VRAMLlama 3.2 3B — 3.2B, meta-llama · ~5.0 GB VRAMLlama 3.2 3B Instruct — 3.2B, unsloth · ~5.0 GB VRAMLlama 3.2 3B Instruct pythonic — 3.2B, baseten · ~5.0 GB VRAMQwen2.5 3B Instruct — 3.1B, Qwen · ~4.0 GB VRAMQwen2.5 Coder 3B — 3.1B, Qwen · ~4.0 GB VRAMSmolLM3 3B — 3.1B, HuggingFaceTB · ~5.0 GB VRAMQwen2.5 3B — 3.1B, Qwen · ~4.0 GB VRAMSmolLM3 3B Base — 3.1B, HuggingFaceTB · ~5.0 GB VRAMQwen2.5 Coder 3B Instruct — 3.1B, Qwen · ~4.0 GB VRAMLlama 3.2 3B Instruct — 3B, Meta · ~5.0 GB VRAMstarcoder2 3b — 3B, bigcode · ~4.0 GB VRAMkimi k2.6 eagle3 mla — 3B, lightseekorg · ~4.0 GB VRAMphi 2 — 2.8B, microsoft · ~4.0 GB VRAMstablelm 3b 4e1t — 2.8B, stabilityai · ~5.0 GB VRAMOther GPU budgets 16 GB of VRAM · 24 GB of VRAM · 48 GB of VRAM · All models
Open the free advisor → · Prices as of 2026-06-17. We're an honest advisor — $0 markup, your own accounts, we never resell compute. © 2026 Cynosure LLC.