Mistral Small 3 (24B, 2501) (Mistral AI, 23.6B) runs on your own machine for $0 if you have about 20 GB of VRAM. Here's how to run it with Ollama, LM Studio, or llama.cpp — and what it would cost the other ways.
VRAM to run ~20 GB | Download ~14 GB | Quant Q4_K_M | Context 32.8K |
Auto-downloads a 4-bit quant and starts a chat — the simplest option.
ollama run mistral-small:24bFirst run downloads ~14 GB, then it's offline and free. Get Ollama at ollama.com.
Open LM Studio, search “Mistral Small 3 (24B, 2501)”, and download a quant that fits your VRAM (≈20 GB at Q4_K_M). Load it and chat — fully offline. It also serves a local OpenAI-compatible API you can point Spanvero at.
Grab a community GGUF build of Mistral Small 3 (24B, 2501) from Hugging Face (search “Mistral Small 3 (24B, 2501) GGUF” — bartowski and unsloth publish reliable ones), then run:
./llama-cli -m <Q4_K_M-file>.gguf -p "Hello" -ngl 99Or serve it with ./llama-server -m <file>.gguf for an OpenAI-compatible API on :8080.
License: commercial use OK.
Browse: Mistral Small 3 (24B, 2501) cost · models for your GPU · all models
Open the free Spanvero advisor → — it detects your hardware and confirms what fits.
Prices as of 2026-06-17. $0 markup, your own accounts, we never resell compute. © 2026 Cynosure LLC.