How do I pick a model for coding?

Start with a recognized coding-tuned model in a size your hardware can run, prefer a permissive license if it's for work, and test it on your own real code — coding quality is best judged on your stack, not on someone else's benchmark.

Picking a model for coding is a fit-and-test decision, not a matter of finding a single "best" one, and being honest about that saves you from chasing benchmark leaderboards that may not reflect your actual work. Here's a practical, honest way to choose.

First, narrow to coding-tuned models. Many open models are fine-tuned specifically on source code — families like Qwen-Coder, DeepSeek-Coder, Code Llama, and others — and these generally outperform general models of the same size on programming tasks. That's an objective starting filter: a model tagged or named as a coder is built for the job. General instruct models can code too, and the strongest large ones code well, but if your primary use is programming, starting from the coding-tuned set is the sensible default.

Second, filter by what your hardware can actually run. This is the constraint that matters most for local use, because a model you can't run is no use however good it is. Use the VRAM rule — about 0.5 GB per billion parameters at 4-bit — to find the largest coding model that fits your card with headroom for context. Coding especially benefits from a long context (so the model can see more of your codebase), and a long context grows the KV cache, so leave room for it. If you're on an 8-16 GB card, a 7-14B coder is your range; on 24 GB, 32B-class coders open up; larger coders mean more VRAM, a rented GPU, or an API.

Third, check the license if this is for work. If you're writing code for a product or a business, you want a model whose license clearly permits commercial use — the Apache-2.0 and MIT-licensed models are the cleanest. Some capable models are research-only or non-commercial, which is fine for personal projects but a problem for shipping. The license is a hard fact worth verifying before you invest time in a model.

Fourth — and this is the honest heart of it — test the candidates on your own real code. Coding quality is highly stack-specific: a model that tops a Python benchmark may be weaker at your particular language, framework, or codebase conventions. There is no substitute for giving two or three candidates the same real tasks from your actual work — a bug to fix, a function to write, a refactor to do — and seeing which one's output you'd actually use. This is why reputable guidance avoids ranking coding models by a single quality score: the right one depends on what you build. Spanvero deliberately does not quote coding benchmarks it didn't run; it surfaces the recognized coding models with honest run-costs and lets you judge the code.

A few practical considerations that affect the experience. Context length matters a lot for coding — more context lets the model reason over more of your files at once, so favor models with generous windows if you work in large codebases. Speed matters if you're using the model interactively in an editor; a smaller or more heavily quantized coder that generates faster can be more pleasant than a slightly smarter but sluggish one. And integration matters: because local runners like Ollama and LM Studio expose an OpenAI-compatible API, you can point many coding tools and editor extensions at a local model, getting assistance at zero per-token cost with your code never leaving your machine — a real privacy win for proprietary code.

On the run-it-where question: local is excellent for coding because of privacy (your code stays put), $0 marginal cost (unlimited use once you own the hardware), and offline access. If the best coder for your work is too big for your hardware, a rented GPU or a pay-per-token API with your own key are the alternatives, chosen by your usage volume.

Spanvero lists the recognized open coding models with transparent, $0-markup run costs and honest license facts — never a fabricated quality ranking. Browse them at /best/best-open-coding-llms/, filter to what fits your card at /models/16gb-vram/ or /models/24gb-vram/, check commercial-safe options at /best/best-commercial-use-open-llms/, and use /calculator/ to compare the cost of running a coder locally versus renting or an API. For the broader how-to-choose framework, see /learn/how-to-pick-which-model-to-run/.

Related

How do I choose which AI model to run? · Base vs instruct model · Context window · What quantization should I use? · What LLMs can I run on 24GB of VRAM? · How do I run AI privately and offline? · Ollama · VRAM

All explainers → · Browse models →

Open the free Spanvero advisor → · Honest, $0-markup. © 2026 Cynosure LLC.

The weekly price index

A short email of real AI price moves, straight from the daily log — no hype. We're collecting the list now; the first issue goes out when it opens. Unsubscribe with one click.