How do I choose which AI model to run?

Filter by objective facts first — what fits your hardware, the license for your use, and the task type — then test the top candidates on your own work, because quality is best judged on your task, not on someone else's benchmark.

With hundreds of open models available, "which one should I run?" feels overwhelming — but there's an honest, systematic way to narrow the field that avoids the trap of chasing benchmark leaderboards. The key insight is to filter by objective facts first (which are knowable and reliable), then judge quality by testing on your own work (which no benchmark can do for you). Here's the framework.

Filter one: what fits your hardware. This is the hard constraint: a model you can't run is no use, however good it is. Using the VRAM rule (about 0.5 GB per billion parameters at 4-bit, plus headroom), find the largest models your card or Mac can hold. An 8 GB card points you at models up to about 7B; 16 GB reaches the mid-teens-billion range; 24 GB opens up 32B-class models; 70B and up need a workstation, multiple cards, or a rented GPU. This filter alone cuts the field dramatically and is completely objective: it's a measurable fit, not an opinion.
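The VRAM rule can be sketched as a quick arithmetic check. This is a rough estimate, not a measurement: the 0.5 GB per billion parameters figure is the rule of thumb from the text, while the 2 GB headroom value and the function names are assumptions chosen for illustration.

```python
def estimated_vram_gb(params_billion, gb_per_billion=0.5, headroom_gb=2.0):
    """Rough VRAM needed for a 4-bit model: weights plus an assumed headroom."""
    return params_billion * gb_per_billion + headroom_gb


def fits(params_billion, vram_gb):
    """True if the rough estimate fits within the available VRAM."""
    return estimated_vram_gb(params_billion) <= vram_gb


if __name__ == "__main__":
    # Check a few common model sizes against common card sizes.
    for size in (7, 14, 32, 70):
        for card in (8, 16, 24):
            verdict = "fits" if fits(size, card) else "too big"
            needed = estimated_vram_gb(size)
            print(f"{size}B on {card} GB: ~{needed:.1f} GB needed, {verdict}")
```

Memory use grows with context length, so a fixed headroom is only a starting point. Treat a borderline result as a "no" rather than a "yes".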

Filter two: the license, if it's for work. If you'll use the model commercially, you need one whose license clearly permits that. Apache-2.0 and MIT are the cleanest, whereas some capable models are research-only or non-commercial. This is a hard fact worth checking before you invest time, and it's a legitimate way to narrow to a safe set.

Filter three: the task type. Match the model to the job. Coding tasks favor coding-tuned models (Qwen-Coder, DeepSeek-Coder, and the like); tasks involving images need vision-language models; long-document work needs a long context window; and different modalities entirely — image generation, speech — call for media models rather than text LLMs. Filtering to the models built for your task is objective (it's a category, not a quality claim) and gets you to a short, relevant list.
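The three filters can be chained into one pass over a catalog, as in the minimal sketch below. Everything in it is assumed for illustration: the `Model` record, the `shortlist` function, the 2 GB headroom, and the catalog entries, which are invented names rather than real models. Treating only Apache-2.0 and MIT as commercial-safe is a deliberate simplification; other licenses may permit commercial use under conditions, so read the actual license text.

```python
from dataclasses import dataclass

# Licenses the text calls the cleanest for commercial use.
PERMISSIVE_LICENSES = {"apache-2.0", "mit"}


@dataclass(frozen=True)
class Model:
    name: str
    params_billion: float
    license: str
    task: str  # e.g. "coding", "vision", "general"


def shortlist(catalog, vram_gb, task, commercial=True, headroom_gb=2.0):
    """Apply the three objective filters in order: hardware, license, task."""
    result = []
    for m in catalog:
        needed_gb = m.params_billion * 0.5 + headroom_gb  # 4-bit rule of thumb
        if needed_gb > vram_gb:
            continue  # filter one: does not fit the hardware
        if commercial and m.license.lower() not in PERMISSIVE_LICENSES:
            continue  # filter two: license not clearly commercial-safe
        if m.task != task:
            continue  # filter three: built for a different job
        result.append(m)
    return result


# Hypothetical catalog entries, invented for illustration only.
CATALOG = [
    Model("example-coder-7b", 7, "apache-2.0", "coding"),
    Model("example-coder-32b", 32, "mit", "coding"),
    Model("example-coder-14b-nc", 14, "non-commercial", "coding"),
    Model("example-vision-8b", 8, "apache-2.0", "vision"),
]

if __name__ == "__main__":
    for m in shortlist(CATALOG, vram_gb=16, task="coding"):
        print(m.name)
```

The output is a short list of candidates to try, not a ranking. None of the filters says anything about quality.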

Now the honest part: within that filtered short list, judge quality by testing on your own work. This is where reputable guidance parts ways with the leaderboard-chasing instinct. Benchmarks are run on generic tasks that may not resemble yours, and parameter count tells you cost and memory, not quality — a well-trained smaller model routinely beats a bigger one on real work. So take two or three candidates that fit, are licensed for your use, and target your task, and give them the same real tasks from your actual workflow. Whichever produces output you'd genuinely use is your answer. This is the one step no external ranking can do for you, and it's why Spanvero deliberately doesn't fabricate quality rankings — it gives you the objective filters and honest costs, and leaves the quality judgment where it belongs: with you, on your task.
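One way to keep that test fair is to give every candidate exactly the same prompts and read the outputs side by side. The sketch below assumes each candidate is wrapped in a plain function that takes a prompt and returns text. The `run_side_by_side` name, the stand-in generators, and the sample prompts are all hypothetical, and wiring a real model into the wrapper depends on whichever runtime you use.

```python
from typing import Callable, Dict, List


def run_side_by_side(
    candidates: Dict[str, Callable[[str], str]],
    prompts: List[str],
) -> Dict[str, Dict[str, str]]:
    """Give every candidate the same prompts; return outputs grouped by prompt."""
    results: Dict[str, Dict[str, str]] = {}
    for prompt in prompts:
        results[prompt] = {
            name: generate(prompt) for name, generate in candidates.items()
        }
    return results


if __name__ == "__main__":
    # Stand-in generators; replace each with a call into your own runtime.
    candidates = {
        "candidate-a": lambda p: f"[a] reply to: {p}",
        "candidate-b": lambda p: f"[b] reply to: {p}",
    }
    # Use real tasks from your own workflow here, not generic test questions.
    prompts = [
        "Summarize this week's support tickets.",
        "Write a SQL query for monthly revenue.",
    ]
    for prompt, outputs in run_side_by_side(candidates, prompts).items():
        print(f"PROMPT: {prompt}")
        for name, text in outputs.items():
            print(f"  {name}: {text}")
```

The sketch deliberately computes no score. The judgment stays with whoever reads the outputs and decides which ones they would actually use.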

A few useful signals can break ties among candidates. Popularity and recognition (like download counts) are a real, honest signal: widely used models tend to be well supported, well documented, and reliable, even though popularity isn't the same as "best for you." Newer models are often (not always) improvements on older ones of the same size. And practical factors matter: a slightly less capable model that generates faster or fits with more context headroom may serve you better day to day than a marginally smarter but slower or tighter one. Weigh these after the hard filters, as tiebreakers among models you've actually tried.

Don't skip straight to "what's the best model?" — it's the wrong question, because "best" depends on your hardware, your license needs, your task, and your own judgment of the output. Ask instead: what fits, what's licensed for my use, what's built for my task — and then which of those do I like best when I try them. That sequence turns an overwhelming choice into a short, testable one.

Spanvero is built around exactly this framework: objective filters plus honest costs, never a fabricated quality verdict. Filter by your hardware at /models/8gb-vram/, /models/16gb-vram/, or /models/24gb-vram/; by license at /best/best-commercial-use-open-llms/; by task at /best/best-open-coding-llms/ or /best/best-open-vision-llms/; compare finalists head-to-head at /compare/; and use /calculator/ to see the honest cost of running your top pick locally, on a rented GPU, or via your own API key. Browse the full catalog at /models/.


