The most recognized open vision and multimodal LLMs

Open vision-language models are LLMs that take images alongside text (Qwen-VL, Llama 4, Gemma 3 and more). Each entry below shows the cost to run the model locally or on a rented GPU, alongside its API price and license status. How well each one reads your images is yours to judge.

How this is ranked: models are first filtered objectively (the vision modality / multimodal tag is a catalog fact), then ordered by popularity (a real recognition signal). We never rank visual-understanding quality or cite benchmarks; we surface the recognized open VLMs with their run-costs and let the user judge.
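The filter-then-sort rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CatalogEntry` type, its `modalities` and `popularity` fields, and the `rank_vision_models` function are hypothetical names, not the site's actual schema or code, and the sample entries are placeholders rather than real catalog data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog row; the field names are assumptions."""
    name: str
    modalities: frozenset  # e.g. frozenset({"text", "vision"})
    popularity: int        # assumed recognition signal, e.g. a download count


def rank_vision_models(entries):
    """Keep only entries tagged with the vision modality, most popular first."""
    vision_only = [e for e in entries if "vision" in e.modalities]
    # Sort purely by popularity; no quality or benchmark score is involved.
    return sorted(vision_only, key=lambda e: e.popularity, reverse=True)


if __name__ == "__main__":
    # Placeholder entries with made-up popularity values.
    sample = [
        CatalogEntry("model-a", frozenset({"text"}), 900),
        CatalogEntry("model-b", frozenset({"text", "vision"}), 100),
        CatalogEntry("model-c", frozenset({"text", "vision"}), 500),
    ]
    for position, entry in enumerate(rank_vision_models(sample), start=1):
        print(position, entry.name)
```

Note that a text-only model is dropped by the filter however popular it is, which matches the rule that the modality tag gates the list and popularity only orders it.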

1. Gemma 3 27B — Google, 27B · ~30 GB VRAM · $0.27/1M API · commercial OK
2. Llama 4 Maverick (17B-128E) — Meta, 402B · ~274 GB VRAM · $0.38/1M API · commercial OK
3. Gemma 3 12B — Google, 12B · ~13 GB VRAM · $0.10/1M API · commercial OK
4. Llama 4 Scout (17B-16E) — Meta, 109B · ~77 GB VRAM · $0.20/1M API · commercial OK
5. Qwen2-VL 7B Instruct — Alibaba, 8B · ~7.0 GB VRAM · $0.16/1M API est. · commercial OK
6. Phi 3 vision 128k instruct — Microsoft, 4.1B · ~10 GB VRAM · $0.13/1M API est. · commercial OK
7. HyperCLOVAX SEED Vision Instruct 3B — naver-hyperclovax, 3.7B · ~5.0 GB VRAM · $0.13/1M API est. · non-commercial
8. Josiefied Qwen3 VL 4B Instruct abliterated beta v1 — Goekdeniz-Guelmez, 4.4B · ~6.0 GB VRAM · $0.14/1M API est. · non-commercial
9. llava onevision qwen2 7b ov — lmms-lab, 8B · ~7.0 GB VRAM · $0.16/1M API est. · commercial OK
