
The most recognized open speech-to-text models (Whisper and alternatives)

The open speech-to-text (ASR) models you can run yourself: Whisper, Distil-Whisper, NVIDIA Parakeet and Canary, Moonshine, and more. Models are ranked by recognition, and each entry lists its approximate VRAM requirement, license, and runner, so the cost of running it is transparent. Some run in real time on CPU. Accuracy on your own audio is yours to judge.

How this is ranked: we apply an objective task filter (speech to text), then order the results by notability/recognition. We don't quote word-error-rate benchmarks we didn't run; instead we list the recognized open ASR models with their approximate run costs. If you are looking for a Whisper alternative, the alternatives are presented here, and you judge accuracy for your own language and audio.

1. Whisper large-v3 — Speech → text, OpenAI · ~10 GB VRAM · commercial OK
2. Whisper large-v3-turbo — Speech → text, OpenAI · ~6.0 GB VRAM · commercial OK
3. Parakeet TDT 0.6B v2 — Speech → text, NVIDIA · ~4.0 GB VRAM · commercial OK
4. Distil-Whisper large-v3 — Speech → text, Hugging Face · ~5.0 GB VRAM · commercial OK
5. Canary 1B v2 — Speech → text, NVIDIA · ~6.0 GB VRAM · commercial OK
6. Moonshine (base) — Speech → text, Useful Sensors · ~2.0 GB VRAM · commercial OK
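
As a rough illustration of how the list above can be used, here is a minimal sketch that keeps the ranking order and returns only the models whose approximate VRAM figure fits a given budget. The names `AsrModel`, `MODELS`, and `models_that_fit` are hypothetical, introduced only for this example; the VRAM numbers are the approximate values from the list, not measurements.

```python
from typing import NamedTuple


class AsrModel(NamedTuple):
    rank: int         # position in the recognition-ordered list above
    name: str
    publisher: str
    vram_gb: float    # approximate VRAM figure copied from the list


# Data transcribed from the ranked list; values are approximate.
MODELS = [
    AsrModel(1, "Whisper large-v3", "OpenAI", 10.0),
    AsrModel(2, "Whisper large-v3-turbo", "OpenAI", 6.0),
    AsrModel(3, "Parakeet TDT 0.6B v2", "NVIDIA", 4.0),
    AsrModel(4, "Distil-Whisper large-v3", "Hugging Face", 5.0),
    AsrModel(5, "Canary 1B v2", "NVIDIA", 6.0),
    AsrModel(6, "Moonshine (base)", "Useful Sensors", 2.0),
]


def models_that_fit(vram_budget_gb: float) -> list[AsrModel]:
    """Return models whose approximate VRAM need is within the budget,
    preserving the original ranking order."""
    return [m for m in MODELS if m.vram_gb <= vram_budget_gb]


if __name__ == "__main__":
    # Example: a GPU with roughly 6 GB of usable VRAM.
    for m in models_that_fit(6.0):
        print(f"{m.rank}. {m.name} ({m.publisher}), ~{m.vram_gb:g} GB VRAM")
```

The filter only narrows the list by run cost. It says nothing about accuracy, which still has to be checked against your own language and audio.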

