Parameters (the "B" / billions)

A model's learned numerical weights; the "B" in a name like "7B" or "70B" means billions of them, and it is the single biggest driver of how big and capable a model is.

When a model is named "Llama-3-8B" or "Qwen-72B," the number is how many parameters it has, in billions. Parameters are the values the model adjusted during training — they encode everything it "knows." More parameters generally means more capability, but also more memory and slower, costlier inference.

Parameter count is the main thing that determines how much memory a model needs to run. A rough rule of thumb at full 16-bit precision is ~2 GB of memory per billion parameters, so a 7B model needs ~14 GB before quantization shrinks it. This is why parameter count and quantization together decide whether a model fits on your GPU.

Parameter count is an objective, factual spec — it is not a quality score. A well-trained 8B model can beat a sloppy 70B one on a given task, so use parameter count to estimate cost and hardware fit, then judge actual quality yourself.

Related

Quantization · VRAM · Mixture of Experts (MoE) · Inference

All explainers → · Browse models →

Open the free Spanvero advisor → · Honest, $0-markup. © 2026 Cynosure LLC.