The cheapest open LLMs to run on your own hardware

Open LLMs ranked by how little VRAM they need to run locally. The smaller the footprint, the cheaper the GPU you need and the closer to truly $0 it gets. Sorted by computed VRAM-to-run (lowest first), with the honest local and rent-a-GPU cost for each.
How this is ranked: models are ordered by engine-computed VRAM, used as a proxy for self-hosting cost (the lowest VRAM means the cheapest hardware to buy or rent). All runs are $0-markup. This is not a quality ranking; the user judges which cheap-to-run model is good enough.
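The ranking rule described above (lowest VRAM first) can be sketched in a few lines. This is a minimal illustration, not the site's engine: the `ModelEntry` type, the `rank_by_vram` function, and the `commercial_only` filter are hypothetical names introduced here, and the four rows are copied from the list on this page. How the page orders models with equal VRAM is not stated; the sketch simply keeps input order for ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One row of the list (hypothetical structure, for illustration only)."""
    name: str
    params_millions: float
    vram_gb: float
    commercial_ok: bool


# A few rows copied from the list on this page, deliberately out of order.
ENTRIES = [
    ModelEntry("Qwen2.5 1.5B Instruct", 1500, 3.0, True),
    ModelEntry("Nomic Embed Text v1.5", 137, 2.0, True),
    ModelEntry("Llama 3.2 1B Instruct", 1200, 3.0, True),
    ModelEntry("bloom 560m", 600, 2.0, False),
]


def rank_by_vram(entries, commercial_only=False):
    """Sort by VRAM, lowest first. Python's sort is stable, so ties keep input order."""
    pool = [e for e in entries if e.commercial_ok or not commercial_only]
    return sorted(pool, key=lambda e: e.vram_gb)


for position, entry in enumerate(rank_by_vram(ENTRIES), start=1):
    print(f"{position}. {entry.name}: ~{entry.vram_gb:.1f} GB VRAM")
```

Passing `commercial_only=True` drops rows marked non-commercial before sorting, which may be useful because the ranking itself ignores license terms.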
1. Nomic Embed Text v1.5 · Nomic AI, 137M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
2. gemma 3 270m · google, 300M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
3. Qwen2.5 0.5B Instruct · Qwen, 500M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
4. Qwen2.5 0.5B · Qwen, 500M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
5. TinyLlama 1.1B Chat v1.0 · TinyLlama, 1.1B · ~2.0 GB VRAM · $0.11/1M API est. · commercial OK
6. gpt2 large · openai-community, 800M · ~2.0 GB VRAM · $0.11/1M API est. · commercial OK
7. h2ovl mississippi 800m · h2oai, 800M · ~2.0 GB VRAM · $0.11/1M API est. · commercial OK
8. Qwen2 0.5B · Qwen, 500M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
9. Qwen2 0.5B Instruct · Qwen, 500M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
10. bloom 560m · bigscience, 600M · ~2.0 GB VRAM · $0.10/1M API est. · non-commercial
11. gpt2 medium · openai-community, 400M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
12. SmolLM2 360M Instruct · HuggingFaceTB, 400M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
13. LLaMmlein 1B prerelease · LSX-UniWue, 1.1B · ~2.0 GB VRAM · $0.11/1M API est. · non-commercial
14. bloomz 560m · bigscience, 600M · ~2.0 GB VRAM · $0.10/1M API est. · non-commercial
15. MiniCPM5 1B · openbmb, 1.1B · ~2.0 GB VRAM · $0.11/1M API est. · commercial OK
16. gemma 3 270m it · google, 300M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
17. pythia 410m · EleutherAI, 500M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
18. functiongemma 270m it · google, 300M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
19. Falcon H1 0.5B Base · tiiuae, 500M · ~2.0 GB VRAM · $0.10/1M API est. · non-commercial
20. Qwen2.5 Coder 0.5B Instruct · Qwen, 500M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
21. pythia 1b · EleutherAI, 1.1B · ~2.0 GB VRAM · $0.11/1M API est. · commercial OK
22. qwen sft countdown defaultproj · asingh15, 500M · ~2.0 GB VRAM · $0.10/1M API est. · non-commercial
23. Qwen3.6 35B A3B DFlash · z-lab, 500M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
24. pythia 410m deduped · EleutherAI, 500M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
25. Qwen3 8B DFlash b16 · z-lab, 1B · ~2.0 GB VRAM · $0.11/1M API est. · commercial OK
26. SmolLM2 360M · HuggingFaceTB, 400M · ~2.0 GB VRAM · $0.10/1M API est. · commercial OK
27. LFM2.5 350M · LiquidAI, 400M · ~2.0 GB VRAM · $0.10/1M API est. · non-commercial
28. tinyllama oneshot w8w8 test static shape change · nm-testing, 1.1B · ~2.0 GB VRAM · $0.11/1M API est. · non-commercial
29. TinyLlama 1.1B intermediate step 1431k 3T · TinyLlama, 1.1B · ~2.0 GB VRAM · $0.11/1M API est. · commercial OK
30. LFM2.5 350M Base · LiquidAI, 400M · ~2.0 GB VRAM · $0.10/1M API est. · non-commercial
31. MiniCPM5 1B SFT · openbmb, 1.1B · ~2.0 GB VRAM · $0.11/1M API est. · commercial OK
32. Llama 3.2 1B Instruct · Meta, 1.2B · ~3.0 GB VRAM · $0.11/1M API est. · commercial OK
33. BGE-M3 · BAAI, 567M · ~3.0 GB VRAM · $0.10/1M API est. · commercial OK
34. Qwen3 0.6B · Qwen, 800M · ~3.0 GB VRAM · $0.11/1M API est. · commercial OK
35. Qwen2.5 1.5B Instruct · Qwen, 1.5B · ~3.0 GB VRAM · $0.11/1M API est. · commercial OK
36. Qwen2 1.5B Instruct · Qwen, 1.5B · ~3.0 GB VRAM · $0.11/1M API est. · commercial OK
37. Llama 3.2 1B · meta-llama, 1.2B · ~3.0 GB VRAM · $0.11/1M API est. · commercial OK
38. gemma 3 1b it · google, 1B · ~3.0 GB VRAM · $0.11/1M API est. · commercial OK
39. OpenELM 1 1B Instruct · apple, 1.1B · ~3.0 GB VRAM · $0.11/1M API est. · non-commercial
40. Qwen2.5 1.5B · Qwen, 1.5B · ~3.0 GB VRAM · $0.11/1M API est. · commercial OK

Showing the top 40 of 354.
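For a rough sanity check on VRAM figures like those in the list above, a common rule of thumb is that the weights alone occupy parameter count times bytes per parameter. The sketch below shows only that arithmetic. It is an assumption-laden lower bound, not the formula the site's engine uses (which this page does not state): real usage also includes the KV cache, activations, and runtime overhead, so an actual footprint is higher. The `weight_memory_gb` function and the `BYTES_PER_PARAM` table are hypothetical names for illustration.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, precision: str = "fp16") -> float:
    """Memory for the weights alone, in GB (10^9 bytes). Excludes KV cache and overhead."""
    return params * BYTES_PER_PARAM[precision] / 1e9


# A 1.5B-parameter model at two precisions.
print(weight_memory_gb(1.5e9))          # 3.0
print(weight_memory_gb(1.5e9, "int4"))  # 0.75
```

Quantizing to a lower precision shrinks the weight term proportionally, which is the main lever for fitting a model onto a cheaper GPU.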
© 2026 Cynosure LLC.