How to run Ming-Lite-Omni locally

inclusionAI (Ant Group) · Omni understanding + image & speech gen · 19B params · MIT (commercial OK)

Ming-Lite-Omni is a Mixture-of-Experts (MoE) omni model with 19B total parameters, of which only 2.8B are activated. It takes image, text, audio and video as input and generates both images (including editing and style transfer) and speech, all under a permissive MIT license.

What it costs to run — $0 markup

Note: generative-media models are billed per image / per second / per minute on hosted services — not per token. Running locally or on your own rented GPU is usually far cheaper and keeps your data on your machine.

Key facts

Does: Visual understanding, Audio understanding, Video understanding, Text → image, Image editing, Text → speech, Speech recognition
VRAM to run: ~41 GB in BF16 (19B total, 2.8B active MoE). At full precision this needs an A6000/A100-class 48 GB card; quantization can fit a 24-32 GB card.
Download: ~41 GB
Parameters: 19B
License: MIT (commercial use OK)
Run with: Transformers (BailingMM)
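
The VRAM figure above follows mostly from parameter count times bytes per parameter. Below is a minimal sketch of that arithmetic, assuming 2 bytes per parameter for BF16, 1 for 8-bit and 0.5 for 4-bit quantization. The function name and the precision table are illustrative, not part of any library. It counts weights only, so it comes out a little under the ~41 GB quoted above; the difference presumably covers components and runtime overhead this estimate ignores, which is an assumption.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumed storage sizes; real quantized formats add a small overhead for scales.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

TOTAL_PARAMS_B = 19.0  # total parameters in billions, from the key facts


def weight_gb(params_billion: float, precision: str) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1e9


# All 19B parameters are counted, not just the 2.8B active ones, because
# every expert normally has to be resident in memory.
estimates = {p: weight_gb(TOTAL_PARAMS_B, p) for p in BYTES_PER_PARAM}

for precision, gb in estimates.items():
    print(f"{precision}: ~{gb:.1f} GB for weights")
```

Activations, caches and framework buffers come on top of these numbers, so leave headroom when matching an estimate to a card.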

Get Ming-Lite-Omni on Hugging Face →
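
A hedged sketch of fetching the weights with the `huggingface_hub` package follows. The repository id `inclusionAI/Ming-Lite-Omni` is an assumption based on the publisher and model name above, so confirm it on Hugging Face first. The helper names and the 10% disk headroom are illustrative choices.

```python
import shutil
from pathlib import Path

REPO_ID = "inclusionAI/Ming-Lite-Omni"  # assumed repo id; verify before use
DOWNLOAD_GB = 41                        # approximate download size listed above


def has_room(path: str, needed_gb: float, headroom: float = 1.1) -> bool:
    """True if the disk holding `path` has needed_gb plus headroom free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb * headroom


def download(target_dir: str = "./Ming-Lite-Omni") -> str:
    """Download the full repository snapshot and return its local path."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    if not has_room(target_dir, DOWNLOAD_GB):
        raise RuntimeError(f"Need roughly {DOWNLOAD_GB} GB free in {target_dir}")
    # Imported here so the disk check works without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    return snapshot_download(repo_id=REPO_ID, local_dir=target_dir)


# Call download() to start the transfer; it is not run automatically here.
```

Checking free space first avoids a failed transfer partway through a ~41 GB download.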

Open the free Spanvero advisor → · We point you to the open weights + your own accounts, $0 markup, never resell compute. © 2026 Cynosure LLC.