Open image, video & voice models

62 open generative-media models you can actually download and run: text-to-image, video, text-to-speech, speech-to-text, music, and unified multimodal. For each one we list what it does, its download size, the VRAM needed to run it locally, its license, and how to run it. We don't host the models; we point you to the real weights.
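The VRAM figures below are approximate and vary widely between models. As a rough first check, the weights alone take about (parameter count × bytes per parameter). The following is a minimal sketch that assumes 1 GB = 10^9 bytes and counts weights only. It leaves out activations, text encoders, VAEs and framework overhead, so it is a lower bound and not a prediction of the listed numbers. The function name `weight_memory_gb` is hypothetical.

```python
# Approximate storage per parameter at common precisions (bytes).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "bf16") -> float:
    """Rough memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Hypothetical helper: excludes activations, text encoders, VAEs and
    framework overhead, so real VRAM use is higher unless parts of the
    model are quantized further or offloaded to CPU RAM.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in ("bf16", "fp8", "int4"):
        print(f"13B params @ {precision}: ~{weight_memory_gb(13e9, precision):.1f} GB")
```

For a 13B-parameter model this gives about 26 GB at bf16, 13 GB at fp8 and 6.5 GB at int4. The ~13 GB listed for LTX-Video (13B) therefore sits in the fp8 range. Where a listed figure is well below the bf16 estimate, running the model on that much VRAM generally relies on quantized weights or CPU offloading.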
Unlike chat models, media models are priced per image, per second of video or per minute of audio rather than per token. So we show the honest path: $0 on your own hardware, or rent a GPU by the hour.
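Turning an hourly rental rate into a per-output cost takes one line of arithmetic: cost per unit = (hourly rate ÷ 3600) × seconds of GPU time per unit. The sketch below also computes a simple rent-versus-buy break-even. The function names, the 9 s per image throughput and the $1,000 GPU price are hypothetical placeholders, not measurements.

```python
def cost_per_unit(rate_per_hour: float, seconds_per_unit: float) -> float:
    """Dollars per generated unit (one image, one second of video, one minute of audio)
    when a rented GPU needs `seconds_per_unit` wall-clock seconds to produce it."""
    return rate_per_hour / 3600.0 * seconds_per_unit


def break_even_hours(gpu_price: float, rate_per_hour: float) -> float:
    """Rental hours after which buying a GPU costing `gpu_price` would have cost the same.
    Ignores electricity, resale value and idle time."""
    return gpu_price / rate_per_hour


if __name__ == "__main__":
    # Hypothetical throughput: 9 s per image on the $0.26/hr tier listed on this page.
    print(f"${cost_per_unit(0.26, 9):.5f} per image")
    # Hypothetical purchase price: a $1,000 card versus renting at $0.26/hr.
    print(f"{break_even_hours(1000, 0.26):,.0f} rental hours to break even")
```

With these placeholder inputs, an image costs about $0.00065 and the card pays for itself after roughly 3,846 rental hours. Substitute your own measured seconds per output and the price you actually pay.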
Image · 16 models (8 shown)
Open text-to-image and image-editing models you can download and run.
FLUX.1 [dev]: Text → image, Black Forest Labs · ~12 GB VRAM · rent $0.26/hr · non-commercial
Stable Diffusion XL 1.0 (SDXL): Text → image, Stability AI · ~8 GB VRAM · rent $0.26/hr · commercial OK
Stable Diffusion 1.5: Text → image, Runway / Stability AI · ~4 GB VRAM · rent $0.26/hr · commercial OK
FLUX.1 Kontext [dev]: Image editing, Black Forest Labs · ~12 GB VRAM · rent $0.26/hr · non-commercial
FLUX.1 [schnell]: Text → image, Black Forest Labs · ~12 GB VRAM · rent $0.26/hr · commercial OK
Qwen-Image: Text → image, Alibaba (Qwen) · ~24 GB VRAM · rent $0.49/hr · commercial OK
Stable Diffusion 3.5 Large: Text → image, Stability AI · ~12 GB VRAM · rent $0.26/hr · commercial OK
Qwen-Image-Edit: Image editing, Alibaba (Qwen) · ~24 GB VRAM · rent $0.49/hr · commercial OK

Video · 14 models (8 shown)
Open text-to-video and image-to-video models. They are heavier, but runnable on a rented GPU.
Wan2.2 T2V-A14B: Text → video, Alibaba (Wan-AI) · ~24 GB VRAM · rent $0.49/hr · commercial OK
Wan2.2 I2V-A14B: Image → video, Alibaba (Wan-AI) · ~24 GB VRAM · rent $0.49/hr · commercial OK
HunyuanVideo: Text → video, Tencent · ~45 GB VRAM · rent $1.39/hr · non-commercial
Wan2.2 TI2V-5B: Text/image → video, Alibaba (Wan-AI) · ~8 GB VRAM · rent $0.26/hr · commercial OK
LTX-Video (13B): Text → video, Lightricks · ~13 GB VRAM · rent $0.26/hr · commercial OK
CogVideoX-5B: Text → video, Zhipu AI / THUDM · ~5 GB VRAM · rent $0.26/hr · non-commercial
Stable Video Diffusion (img2vid-XT): Image → video, Stability AI · ~16 GB VRAM · rent $0.26/hr · commercial OK
Wan2.1 T2V-1.3B: Text → video, Alibaba (Wan-AI) · ~8 GB VRAM · rent $0.26/hr · commercial OK

Voice & Audio · 20 models (8 shown)
Open text-to-speech, voice-cloning, speech-to-text and music models.
Whisper large-v3: Speech → text, OpenAI · ~10 GB VRAM · rent $0.26/hr · commercial OK
Whisper large-v3-turbo: Speech → text, OpenAI · ~6 GB VRAM · rent $0.26/hr · commercial OK
Kokoro-82M: Text → speech, hexgrad · ~1 GB VRAM · rent $0.26/hr · commercial OK
XTTS-v2: Voice cloning, Coqui · ~6 GB VRAM · rent $0.26/hr · non-commercial
MusicGen Large: Text → music, Meta · ~16 GB VRAM · rent $0.26/hr · non-commercial
Chatterbox: Text → speech, Resemble AI · ~8 GB VRAM · rent $0.26/hr · commercial OK
Parakeet TDT 0.6B v2: Speech → text, NVIDIA · ~4 GB VRAM · rent $0.26/hr · commercial OK
F5-TTS: Voice cloning, SWivid · ~6 GB VRAM · rent $0.26/hr · non-commercial

Multimodal / Omni · 12 models (8 shown)
Unified models that handle several modalities (image, audio, video and text) in one.
Qwen2.5-Omni-7B: Omni understanding + speech generation, Alibaba (Qwen) · ~24 GB VRAM · rent $0.49/hr · commercial OK
Janus-Pro-7B: Unified understanding + image generation, DeepSeek · ~16 GB VRAM · rent $0.26/hr · commercial OK
OmniGen2: Any-to-image (generation, editing, in-context), BAAI / VectorSpaceLab · ~17 GB VRAM · rent $0.26/hr · commercial OK
MiniCPM-o 2.6: Omni understanding + speech generation, OpenBMB · ~9 GB VRAM · rent $0.26/hr · commercial OK
BAGEL-7B-MoT: Unified understanding + image generation + editing, ByteDance Seed · ~24 GB VRAM · rent $0.49/hr · commercial OK
Emu3.5: Any-to-any world model (generation + editing), BAAI · ~48 GB VRAM · rent $1.39/hr · commercial OK
Emu3-Gen: Next-token any-to-any generation, BAAI · ~18 GB VRAM · rent $0.26/hr · commercial OK
OmniGen v1: Unified image generation + editing, BAAI · ~12 GB VRAM · rent $0.26/hr · commercial OK
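To shortlist models from the entries listed on this page by hardware and license, you can filter on the listed VRAM figure and the commercial-use flag. The following is a minimal sketch: the `Entry` type and `fits` function are hypothetical, and the data is copied from the listing. Because the VRAM values are approximate, the filter only gives a first cut.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    section: str         # "image", "video", "audio" or "omni"
    vram_gb: float       # approximate VRAM as listed on this page
    commercial_ok: bool  # True where the listing says "commercial OK"


CATALOG = [
    Entry("FLUX.1 [dev]", "image", 12, False),
    Entry("Stable Diffusion XL 1.0 (SDXL)", "image", 8, True),
    Entry("Stable Diffusion 1.5", "image", 4, True),
    Entry("FLUX.1 Kontext [dev]", "image", 12, False),
    Entry("FLUX.1 [schnell]", "image", 12, True),
    Entry("Qwen-Image", "image", 24, True),
    Entry("Stable Diffusion 3.5 Large", "image", 12, True),
    Entry("Qwen-Image-Edit", "image", 24, True),
    Entry("Wan2.2 T2V-A14B", "video", 24, True),
    Entry("Wan2.2 I2V-A14B", "video", 24, True),
    Entry("HunyuanVideo", "video", 45, False),
    Entry("Wan2.2 TI2V-5B", "video", 8, True),
    Entry("LTX-Video (13B)", "video", 13, True),
    Entry("CogVideoX-5B", "video", 5, False),
    Entry("Stable Video Diffusion (img2vid-XT)", "video", 16, True),
    Entry("Wan2.1 T2V-1.3B", "video", 8, True),
    Entry("Whisper large-v3", "audio", 10, True),
    Entry("Whisper large-v3-turbo", "audio", 6, True),
    Entry("Kokoro-82M", "audio", 1, True),
    Entry("XTTS-v2", "audio", 6, False),
    Entry("MusicGen Large", "audio", 16, False),
    Entry("Chatterbox", "audio", 8, True),
    Entry("Parakeet TDT 0.6B v2", "audio", 4, True),
    Entry("F5-TTS", "audio", 6, False),
    Entry("Qwen2.5-Omni-7B", "omni", 24, True),
    Entry("Janus-Pro-7B", "omni", 16, True),
    Entry("OmniGen2", "omni", 17, True),
    Entry("MiniCPM-o 2.6", "omni", 9, True),
    Entry("BAGEL-7B-MoT", "omni", 24, True),
    Entry("Emu3.5", "omni", 48, True),
    Entry("Emu3-Gen", "omni", 18, True),
    Entry("OmniGen v1", "omni", 12, True),
]


def fits(catalog, vram_budget_gb, need_commercial=False, section=None):
    """Entries whose listed VRAM fits the budget, optionally filtered by license and section."""
    return [
        e for e in catalog
        if e.vram_gb <= vram_budget_gb
        and (e.commercial_ok or not need_commercial)
        and (section is None or e.section == section)
    ]


if __name__ == "__main__":
    for e in fits(CATALOG, vram_budget_gb=8, need_commercial=True):
        print(f"{e.section:6} {e.name} (~{e.vram_gb:g} GB)")
```

With an 8 GB budget and commercial use required, the filter returns the following:

- SDXL
- Stable Diffusion 1.5
- Wan2.2 TI2V-5B
- Wan2.1 T2V-1.3B
- Whisper large-v3-turbo
- Kokoro-82M
- Chatterbox
- Parakeet TDT 0.6B v2

A model listed at exactly your card's VRAM is a tight fit, so leave some headroom.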
Spanvero advisor (free): we point you to the open weights and your own accounts, charge $0 markup, and never resell compute.
© 2026 Cynosure LLC.