OpenBMB · Omni understanding + speech generation · 8B params · MiniCPM Model License (code Apache-2.0) (commercial OK)
GPT-4o-style 8B omni model that takes images, video and audio and supports real-time bilingual speech-to-speech conversation with voice cloning and configurable voices. Free commercial use is permitted after completing OpenBMB's registration form.
Note: generative-media models are billed per image / per second / per minute on hosted services — not per token. Running locally or on your own rented GPU is usually far cheaper and keeps your data on your machine.
| Does | Visual understanding, Video understanding, Audio understanding, Speech recognition, Text → speech, Speech → speech, OCR |
| VRAM to run | ~9.0 GB with the official int4 build (roughly 7-9 GB, enough for real-time speech and vision on a single mid-range GPU); ~17-19 GB in BF16 |
| Download | ~17 GB |
| Parameters | 8B |
| License | MiniCPM Model License (code Apache-2.0) (commercial use OK) |
| Run with | Transformers (also llama.cpp / vLLM) |
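The VRAM figures in the table can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an illustration only: it counts weight storage alone, uses the rounded 8B parameter count from the table, and the helper name `weight_memory_gb` is hypothetical rather than part of any library.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in decimal GB.

    Assumption: every parameter is stored at the same precision, and
    activations, caches, and framework overhead are ignored.
    """
    bytes_per_param = bits_per_param / 8
    # (params_billion * 1e9 params) * bytes_per_param / (1e9 bytes per GB)
    return params_billion * bytes_per_param


bf16_gb = weight_memory_gb(8, 16)  # BF16 uses 16 bits per parameter
int4_gb = weight_memory_gb(8, 4)   # int4 uses 4 bits per parameter

print(f"BF16 weights: ~{bf16_gb:.0f} GB")  # prints "BF16 weights: ~16 GB"
print(f"int4 weights: ~{int4_gb:.0f} GB")  # prints "int4 weights: ~4 GB"
```

The weight-only estimates (about 16 GB and 4 GB) sit below the quoted 17-19 GB and 7-9 GB ranges. The gap is plausibly runtime overhead such as activations, caches, and any components kept at higher precision, so treat the table's figures as the practical requirement and this arithmetic as a lower bound.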
Get MiniCPM-o 2.6 on Hugging Face →
Open the free Spanvero advisor → · We point you to the open weights + your own accounts, $0 markup, never resell compute. © 2026 Cynosure LLC.