How to run MiniCPM-o 2.6 locally

OpenBMB · Omni understanding + speech generation · 8B params · MiniCPM Model License (code Apache-2.0) (commercial OK)

GPT-4o-style 8B omni model that takes images, video and audio and supports real-time bilingual speech-to-speech conversation with voice cloning and configurable voices. Free commercial use is permitted after completing OpenBMB's registration form.

What it costs to run — $0 markup

Note: generative-media models are billed per image / per second / per minute on hosted services — not per token. Running locally or on your own rented GPU is usually far cheaper and keeps your data on your machine.

Key facts

DoesVisual understanding, Video understanding, Audio understanding, Speech recognition, Text → speech, Speech → speech, Ocr
VRAM to run~9.0 GB (~17-19GB in BF16; the official int4 build runs real-time speech and vision in roughly 7-9GB, suitable for a single mid-range GPU.)
Download~17 GB
Parameters8B
LicenseMiniCPM Model License (code Apache-2.0) (commercial use OK)
Run withTransformers (also llama.cpp / vLLM)

Get MiniCPM-o 2.6 on Hugging Face →

More multimodal / omni models

Browse: all media models · chat / LLM models

Open the free Spanvero advisor → · We point you to the open weights + your own accounts, $0 markup, never resell compute. © 2026 Cynosure LLC.