How to run XTTS-v2 locally

Coqui · Voice cloning · 460M params · Coqui Public Model License (CPML, non-commercial)

A widely adopted multilingual voice-cloning model that reproduces a speaker's voice from about 6 seconds of reference audio and supports 17 languages.

What it costs to run — $0 markup

Note: generative-media models are billed per image / per second / per minute on hosted services — not per token. Running locally or on your own rented GPU is usually far cheaper and keeps your data on your machine.

Key facts

Does: Voice cloning
VRAM to run: ~6 GB (typically 4-6 GB; clones a voice from a ~6 s clip)
Download: ~1.9 GB
Parameters: 460M
License: Coqui Public Model License (CPML, non-commercial)
Run with: Coqui TTS
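
As a rough illustration of the "Run with Coqui TTS" entry, here is a minimal sketch of cloning a voice through the Coqui TTS Python API. It assumes the `TTS` package and PyTorch are already installed, and that the library downloads the weights on first use (it may ask you to accept the CPML terms at that point). The helper name `clone_voice`, the file names, and the language list are illustrative assumptions and not part of the original page; check the language codes against the model card for your installed version.

```python
from pathlib import Path

# Identifier for XTTS-v2 in the Coqui TTS model registry.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"

# Language codes as listed on the XTTS-v2 model card (assumption: verify
# against the version you install; the page only states "17 languages").
SUPPORTED_LANGUAGES = (
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
)


def clone_voice(text, speaker_wav, language="en", out_path="output.wav"):
    """Hypothetical helper: speak `text` in the voice found in `speaker_wav`.

    `speaker_wav` should point to a short reference clip (about 6 seconds).
    Returns the path of the written WAV file.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language!r}")
    if not Path(speaker_wav).is_file():
        raise FileNotFoundError(f"Reference clip not found: {speaker_wav}")

    # Imported lazily so the cheap checks above run without the heavy
    # dependencies being loaded.
    import torch
    from TTS.api import TTS

    # Use the GPU when one is available; CPU works but is much slower.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Loads the model, downloading the weights on first use.
    tts = TTS(MODEL_NAME).to(device)

    # Synthesize speech conditioned on the reference speaker clip.
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,
        language=language,
        file_path=out_path,
    )
    return out_path
```

A call might look like `clone_voice("Hello from my own machine.", "reference.wav", language="en")`, where `reference.wav` is a placeholder for your own clip. Loading the model once and reusing the `tts` object would be the better design for batch jobs, since loading accounts for most of the startup time.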

Get XTTS-v2 on Hugging Face →

We point you to the open weights and your own accounts: $0 markup, and we never resell compute. © 2026 Cynosure LLC.