inclusionAI (Ant Group) · Omni understanding + image & speech gen · 19B params · MIT (commercial OK)
A mixture-of-experts (MoE) omni model with 19B total parameters, of which only 2.8B are activated. It ingests image, text, audio and video, and uniquely generates both images (including editing and style transfer) and speech, all under the permissive MIT license.
Note: generative-media models are billed per image / per second / per minute on hosted services — not per token. Running locally or on your own rented GPU is usually far cheaper and keeps your data on your machine.
| Does | Visual understanding, Audio understanding, Video understanding, Text → image, Image editing, Text → speech, Speech recognition |
| VRAM to run | ~41 GB in BF16 (19B total, 2.8B active MoE). Unquantized, it needs a card with 48 GB or more (A6000 / A100 80 GB class); quantization can fit a 24-32 GB card. |
| Download | ~41 GB |
| Parameters | 19B |
| License | MIT (commercial use OK) |
| Run with | Transformers (BailingMM) |
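The VRAM figure in the table can be sanity-checked with back-of-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below is an estimate, not a measurement. The helper name `weight_memory_gb` and the INT8 and 4-bit rows are illustrative assumptions, and it counts all 19B parameters because an MoE model normally keeps every expert in memory even though only 2.8B are active at a time.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in decimal GB.

    Hypothetical helper for illustration; ignores activations, caches
    and framework overhead, so real usage will be higher.
    """
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


TOTAL_PARAMS_B = 19.0  # total parameters, taken from the table above

# BF16 is the precision quoted in the table; INT8 and 4-bit are assumed
# quantization levels, shown only to illustrate the scaling.
for label, bits in [("BF16", 16), ("INT8", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(TOTAL_PARAMS_B, bits)
    print(f"{label:>5}: ~{gb:.1f} GB of weights")
```

The BF16 estimate comes to about 38 GB for weights alone, a little under the ~41 GB listed, which is why a 48 GB-class card is the practical minimum without quantization. The quantized rows are weights-only as well, so leave headroom for activations and runtime overhead when sizing a 24-32 GB card.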
Get Ming-Lite-Omni on Hugging Face →
Open the free Spanvero advisor → · We point you to the open weights + your own accounts, $0 markup, never resell compute. © 2026 Cynosure LLC.