
Open image, video & voice models

62 open generative-media models you can actually download and run — text-to-image, video, text-to-speech, speech-to-text, music, and unified multimodal. For each: what it does, its download size, the VRAM to run it locally, the license, and how to run it. We don't host them — we point you to the real weights.

Unlike chat models, media models are priced per image / per second / per minute, not per token — so we show the honest "$0 on your own hardware, or rent a GPU by the hour" path.
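To turn an hourly rental rate into a per-image (or per-clip, per-minute) figure, the arithmetic is just rate × time. The sketch below is illustrative only: the function name `cost_per_unit` is ours, and the 10-second generation time is a made-up placeholder, not a benchmark for any model listed here. Measure your own generation time and substitute it.

```python
def cost_per_unit(hourly_rate_usd: float, seconds_per_unit: float) -> float:
    """Rental cost of producing one unit (an image, a clip, a minute of audio).

    hourly_rate_usd  -- GPU rental price in USD per hour (e.g. 0.13)
    seconds_per_unit -- wall-clock seconds to generate one unit (measure this yourself)
    """
    if hourly_rate_usd < 0 or seconds_per_unit <= 0:
        raise ValueError("rate must be >= 0 and seconds_per_unit must be > 0")
    # An hour has 3600 seconds, so one unit costs its share of the hourly rate.
    return hourly_rate_usd * seconds_per_unit / 3600.0


if __name__ == "__main__":
    # Hypothetical numbers: a $0.13/hr GPU and an assumed 10 s per image.
    per_image = cost_per_unit(0.13, 10.0)
    print(f"${per_image:.5f} per image")  # prints "$0.00036 per image"
```

On your own hardware the rental rate is $0, so the same function returns 0; electricity and hardware wear are not modeled here.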

Image · 16

Open text-to-image and image-editing models you can download and run. See all →

FLUX.1 [dev] — Text → image, Black Forest Labs · ~12 GB VRAM · rent $0.13/hr · non-commercial
Stable Diffusion XL 1.0 (SDXL) — Text → image, Stability AI · ~8.0 GB VRAM · rent $0.06/hr · commercial OK
Stable Diffusion 1.5 — Text → image, Runway / Stability AI · ~4.0 GB VRAM · rent $0.06/hr · commercial OK
FLUX.1 Kontext [dev] — Image editing, Black Forest Labs · ~12 GB VRAM · rent $0.13/hr · non-commercial
FLUX.1 [schnell] — Text → image, Black Forest Labs · ~12 GB VRAM · rent $0.13/hr · commercial OK
Qwen-Image — Text → image, Alibaba (Qwen) · ~24 GB VRAM · rent $0.49/hr · commercial OK
Stable Diffusion 3.5 Large — Text → image, Stability AI · ~12 GB VRAM · rent $0.13/hr · commercial OK
Qwen-Image-Edit — Image editing, Alibaba (Qwen) · ~24 GB VRAM · rent $0.49/hr · commercial OK

Video · 14

Open text-to-video and image-to-video models — heavier, but runnable on a rented GPU. See all →

Wan2.2 T2V-A14B — Text → video, Alibaba (Wan-AI) · ~24 GB VRAM · rent $0.49/hr · commercial OK
Wan2.2 I2V-A14B — Image → video, Alibaba (Wan-AI) · ~24 GB VRAM · rent $0.49/hr · commercial OK
HunyuanVideo — Text → video, Tencent · ~45 GB VRAM · rent $1.39/hr · non-commercial
Wan2.2 TI2V-5B — Text + image → video, Alibaba (Wan-AI) · ~8.0 GB VRAM · rent $0.06/hr · commercial OK
LTX-Video (13B) — Text → video, Lightricks · ~13 GB VRAM · rent $0.13/hr · commercial OK
CogVideoX-5B — Text → video, Zhipu AI / THUDM · ~5.0 GB VRAM · rent $0.06/hr · non-commercial
Stable Video Diffusion (img2vid-XT) — Image → video, Stability AI · ~16 GB VRAM · rent $0.13/hr · commercial OK
Wan2.1 T2V-1.3B — Text → video, Alibaba (Wan-AI) · ~8.0 GB VRAM · rent $0.06/hr · commercial OK

Voice & Audio · 20

Open text-to-speech, voice cloning, speech-to-text and music models. See all →

Whisper large-v3 — Speech → text, OpenAI · ~10 GB VRAM · rent $0.06/hr · commercial OK
Whisper large-v3-turbo — Speech → text, OpenAI · ~6.0 GB VRAM · rent $0.06/hr · commercial OK
Kokoro-82M — Text → speech, hexgrad · ~1.0 GB VRAM · rent $0.06/hr · commercial OK
XTTS-v2 — Voice cloning, Coqui · ~6.0 GB VRAM · rent $0.06/hr · non-commercial
MusicGen Large — Text → music, Meta · ~16 GB VRAM · rent $0.13/hr · non-commercial
Chatterbox — Text → speech, Resemble AI · ~8.0 GB VRAM · rent $0.06/hr · commercial OK
Parakeet TDT 0.6B v2 — Speech → text, NVIDIA · ~4.0 GB VRAM · rent $0.06/hr · commercial OK
F5-TTS — Voice cloning, SWivid · ~6.0 GB VRAM · rent $0.06/hr · non-commercial

Multimodal / Omni · 12

Unified models that handle several modalities — image, audio, video and text — in one. See all →

Qwen2.5-Omni-7B — Omni understanding + speech generation, Alibaba (Qwen) · ~24 GB VRAM · rent $0.49/hr · commercial OK
Janus-Pro-7B — Unified understanding + image gen, DeepSeek · ~16 GB VRAM · rent $0.13/hr · commercial OK
OmniGen2 — Any-to-image (gen + edit + in-context), BAAI / VectorSpaceLab · ~17 GB VRAM · rent $0.13/hr · commercial OK
MiniCPM-o 2.6 — Omni understanding + speech generation, OpenBMB · ~9.0 GB VRAM · rent $0.06/hr · commercial OK
BAGEL-7B-MoT — Unified understanding + image gen + edit, ByteDance Seed · ~24 GB VRAM · rent $0.49/hr · commercial OK
Emu3.5 — Any-to-any world model (gen + edit), BAAI · ~48 GB VRAM · rent $1.39/hr · commercial OK
Emu3-Gen — Next-token any-to-any generation, BAAI · ~18 GB VRAM · rent $0.13/hr · commercial OK
OmniGen v1 — Unified image gen + edit, BAAI · ~12 GB VRAM · rent $0.13/hr · commercial OK

