The most recognized open text-to-speech and voice-cloning models
The open TTS and voice-cloning models you can run yourself: Kokoro, Chatterbox, XTTS-v2, F5-TTS and more. Ranked by recognition, each listed with its approximate VRAM requirement, license, and runner. Many run on a small GPU or even a CPU. Which voice you prefer is yours to judge.
How this is ranked: we apply an objective task filter to the audio catalog, then order the results by notability/recognition. We don't rank naturalness or quality; we present the recognized open voice models with their run costs and flag non-commercial licenses (XTTS-v2 and F5-TTS are non-commercial).
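For readers who want to see the ranking rule as code, here is a minimal sketch of "filter by task, then order by recognition". It is an illustration only, not the site's actual pipeline. The `VoiceModel` fields, the `generates_speech` flag, and the `recognition_rank` values are all assumptions: the ranks simply mirror the list positions on this page, and the third catalog entry is a made-up placeholder included only to show the filter dropping something.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceModel:
    name: str
    generates_speech: bool   # hypothetical flag standing in for the task filter
    vram_gb: float           # approximate VRAM to run, as listed on this page
    commercial_ok: bool      # False would be flagged as non-commercial
    recognition_rank: int    # hypothetical: 1 = most recognized


# The first two entries are copied from the list on this page.
# The third is a placeholder (not a real model) that the filter should remove.
CATALOG = [
    VoiceModel("Kokoro-82M", True, 1.0, True, 2),
    VoiceModel("Qwen2.5-Omni-7B", True, 24.0, True, 1),
    VoiceModel("placeholder-transcriber", False, 2.0, True, 3),
]


def rank(catalog):
    """Keep models that generate speech, ordered by recognition only."""
    kept = [m for m in catalog if m.generates_speech]
    # Quality and naturalness are deliberately not part of the sort key.
    return sorted(kept, key=lambda m: m.recognition_rank)


def format_entry(position, model):
    """Render one list line with its run cost and license flag."""
    license_note = "commercial OK" if model.commercial_ok else "non-commercial"
    return f"{position}. {model.name} · ~{model.vram_gb:g} GB VRAM · {license_note}"


RANKED = rank(CATALOG)

for position, model in enumerate(RANKED, start=1):
    print(format_entry(position, model))
```

Note that the sort key contains nothing about how a voice sounds. That matches the stated method: recognition decides the order, and the listener decides preference.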
1. Qwen2.5-Omni-7B — Omni understanding + speech generation, Alibaba Qwen · ~24 GB VRAM · commercial OK
2. Kokoro-82M — Text → speech, hexgrad · ~1.0 GB VRAM · commercial OK