Active vs total parameters

In an MoE model, total parameters are all the weights that must be loaded into memory, while active parameters are the smaller subset actually used for each token. The former drives memory; the latter drives speed and cost.

For a dense (non-MoE) model these are the same number. For a Mixture-of-Experts model they differ, and the distinction matters for capacity planning. Total parameters determine how much memory you need to hold the whole model; active parameters determine how much compute each token costs, which sets the speed and the per-token price.

Concretely, Mixtral 8x7B is "47B total / 13B active" and GPT-OSS-120B is "117B total / 5.1B active." Each runs roughly as fast as a dense 13B or 5B model, respectively, but you must still have memory for the full 47B or 117B.
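To make the split concrete, here is a rough back-of-the-envelope sketch in Python. It rests on two common rules of thumb, not on any Spanvero formula: weight memory is about total parameters times bytes per parameter, and forward-pass compute is about 2 FLOPs per active parameter per token. The function names and the 16-bit default are illustrative assumptions, and the memory figure covers weights only (no KV cache, activations, or runtime overhead).

```python
def weight_memory_gb(total_params_b: float, bits_per_param: int = 16) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision.
    Excludes KV cache, activations, and framework overhead.
    """
    return total_params_b * bits_per_param / 8


def gflops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass GFLOPs per token.

    Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter.
    Ignores attention cost over the context window.
    """
    return 2 * active_params_b


# (total billions, active billions), taken from the figures quoted above.
MODELS = {
    "Mixtral 8x7B": (47, 13),
    "GPT-OSS-120B": (117, 5.1),
}

for name, (total_b, active_b) in MODELS.items():
    print(
        f"{name}: ~{weight_memory_gb(total_b):.0f} GB of weights at 16-bit, "
        f"~{gflops_per_token(active_b):.1f} GFLOPs per token"
    )
```

Passing a lower `bits_per_param` (for example 4) shows how quantization shrinks the memory figure while leaving the per-token compute estimate unchanged.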

When you see a model advertised by its active size ("runs like a 5B!"), check the total size for your VRAM math. Both figures are objective specs Spanvero can report.

Related

Mixture of Experts (MoE) · Parameters (the "B" / billions) · VRAM · Inference


© 2026 Cynosure LLC.