
Parameters (the "B" / billions)

A model's parameters are its learned numerical weights. The "B" in a name like "7B" or "70B" means billions of them, and that count is the single biggest driver of how large, capable, and expensive to run a model is.

When you see a model named "Llama-3-8B," "Qwen-72B," or "gpt-oss-120b," the number is approximately how many parameters it has, in billions. Parameters are the values the network adjusted during training: the weights inside every layer that, taken together, encode everything the model "knows." Training is the one-time, enormously expensive process that sets those numbers; once training is done, the parameters are fixed and you reuse them every time you run the model.
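As a rough illustration of that naming convention, the sketch below pulls the "B" figure out of a model name. The helper name params_from_name and the regular expression are assumptions made for this example, not a standard; names that do not follow the plain "<number>B" pattern (for example an "8x7B" expert-style name) are deliberately left unparsed rather than guessed at.

```python
import re
from typing import Optional

# Matches a standalone "<number>B" or "<number>b" token, e.g. "8B" or "120b".
# The lookarounds skip version numbers like "2.5" and compound forms like "8x7B".
_SIZE_TOKEN = re.compile(r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)[bB](?![A-Za-z0-9])")


def params_from_name(model_name: str) -> Optional[float]:
    """Return the parameter count in billions, or None if no size token is found."""
    match = _SIZE_TOKEN.search(model_name)
    return float(match.group(1)) if match else None


for name in ("Llama-3-8B", "Qwen-72B", "gpt-oss-120b"):
    print(name, "->", params_from_name(name))
```

Treat the result as a hint only: the name is a label chosen by the publisher, so the spec sheet remains the source of truth for the actual count.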

More parameters generally means more capability: the model can store more patterns, more facts, and more nuance. But capacity is not free. Parameter count is the main thing that determines how much memory a model needs, how much it costs to run, and how fast it generates. This is why the "B" is the first number to look at when you're deciding what you can actually run.

The most useful rule of thumb ties parameters directly to memory. At full 16-bit precision, a model needs roughly 2 GB of memory per billion parameters — so an 8B model wants about 16 GB before any compression, and a 70B model wants around 140 GB. That is why raw parameter count so often decides whether a model fits on your GPU. The practical escape hatch is quantization, which stores those same weights at lower precision (commonly 4-bit) and shrinks the footprint to roughly a quarter — turning that 8B model into about 4-5 GB of weights. Parameter count and quantization together are what determine your VRAM bill.
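The rule of thumb can be written out as a short worked calculation. This is a sketch that counts the weights only; the symbols $N_B$ (parameters, in billions) and $b$ (bits per weight) are introduced here for illustration.

$$
\begin{aligned}
\text{weights (GB)} &\approx N_B \times \frac{b}{8} \\
\text{8B at 16-bit:}\quad & 8 \times \tfrac{16}{8} = 16\ \text{GB} \\
\text{70B at 16-bit:}\quad & 70 \times \tfrac{16}{8} = 140\ \text{GB} \\
\text{8B at 4-bit:}\quad & 8 \times \tfrac{4}{8} = 4\ \text{GB}
\end{aligned}
$$

The plain arithmetic gives 4 GB for the 4-bit case; the 4-5 GB range quoted above leaves room for the extra scaling data that quantized formats typically store alongside the weights.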

A critical honesty point: parameter count is an objective spec, not a quality score. It tells you how big a model is and roughly what it costs to run; it does not tell you how good it is at your task. Training data, architecture, and how well a model was fine-tuned matter enormously. A carefully trained 8B model can beat a poorly trained 70B one on real work, and small models have improved dramatically year over year. So use parameter count for what it's honestly good for (estimating hardware fit and cost) and judge actual quality by trying the model on your own task.

It helps to know roughly what the size tiers mean in practice. Models under about 4 billion parameters are the "small" tier: fast, cheap, and able to run almost anywhere, including phones and CPUs, and increasingly capable for focused tasks. The 7B-to-13B range is the workhorse tier that most people run at home: genuinely useful, and comfortable on a mid-range GPU once quantized. The 30B-to-70B range is where you get noticeably stronger general reasoning, but you need a high-end card (or two) or a rented GPU to run it. Above that, at 100B and up, you're almost always looking at renting a GPU or using a hosted API, unless the model is a Mixture-of-Experts design, which needs less compute per token than its total size suggests (though it still needs memory for all of its weights).
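The tiers above can be sketched as a small lookup. The function name size_tier and the labels "large" and "very large" are assumptions made for this example; only "small" and "workhorse" are terms used in the text. Sizes that fall in the gaps between the stated ranges (for instance 20B) are reported as "between tiers" rather than forced into a neighbor.

```python
def size_tier(params_billions: float) -> str:
    """Map a parameter count (in billions) to the rough tiers described above."""
    if params_billions <= 0:
        raise ValueError("parameter count must be positive")
    if params_billions < 4:
        return "small"        # runs almost anywhere, including phones and CPUs
    if 7 <= params_billions <= 13:
        return "workhorse"    # comfortable on a mid-range GPU once quantized
    if 30 <= params_billions <= 70:
        return "large"        # high-end card (or two) or a rented GPU
    if params_billions >= 100:
        return "very large"   # usually a rented GPU or a hosted API
    return "between tiers"    # the text does not assign these sizes to a tier


for size in (3, 8, 70, 120, 20):
    print(f"{size}B -> {size_tier(size)}")
```

The boundaries are soft in practice, so a result near an edge is a prompt to check the actual memory math rather than a verdict.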

A related subtlety is precision. The "~2 GB per billion parameters" figure assumes 16-bit weights, which is how most models are published. But you rarely run them that way locally. Once you apply quantization, the effective memory per parameter drops to about 1 GB per billion at 8-bit and about 0.5 GB per billion at 4-bit. So the same parameter count can translate into very different memory bills depending on how you run it, which is why parameters and quantization always have to be considered together. The number of tokens you push through (the workload) then shapes the running cost on top of that.
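A minimal sketch of that per-precision math, plus a rough "does it fit" check, is shown below. The function names and the 20% overhead default are assumptions made for this example: real headroom for the KV cache, activations, and runtime buffers varies with context length and software, so treat the check as a first pass.

```python
def weights_gb(params_billions: float, bits: int = 16) -> float:
    """Approximate memory for the weights alone, in GB (bits / 8 bytes per parameter)."""
    return params_billions * bits / 8


def fits_in_vram(params_billions: float, bits: int, vram_gb: float,
                 overhead: float = 0.2) -> bool:
    """Rough fit check. `overhead` is an assumed fraction added on top of the weights."""
    needed_gb = weights_gb(params_billions, bits) * (1 + overhead)
    return needed_gb <= vram_gb


# Same parameter count, three different memory bills.
for bits in (16, 8, 4):
    print(f"8B at {bits}-bit: ~{weights_gb(8, bits):.0f} GB of weights")

print("70B at 4-bit on a 24 GB card:", fits_in_vram(70, 4, 24))
print("8B at 4-bit on a 24 GB card:", fits_in_vram(8, 4, 24))
```

Keeping the overhead as an explicit parameter makes the guess visible, so it can be tightened once you know your real context length and runtime.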

One more wrinkle: with Mixture-of-Experts (MoE) models, the headline parameter count can be misleading. An MoE model advertises a large total parameter count but only uses a small "active" subset per token, so it runs fast like a small model while still needing memory for the whole thing. If a model's name or spec sheet mentions experts, check both the total and active figures before doing your VRAM math.
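To make the total-versus-active distinction concrete, here is a small sketch. The MoESpec class and the 100B-total / 10B-active figures are hypothetical, chosen only for round numbers, and the compute estimate assumes per-token work scales with active parameters, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoESpec:
    """Hypothetical spec-sheet entry: both figures are in billions of parameters."""
    total_b: float
    active_b: float

    def weights_gb(self, bits: int = 16) -> float:
        # Memory follows the TOTAL count: every expert has to be loaded.
        return self.total_b * bits / 8

    def compute_share(self) -> float:
        # Per-token compute relative to a dense model of the same total size,
        # assuming work scales with the ACTIVE count.
        return self.active_b / self.total_b


example = MoESpec(total_b=100, active_b=10)  # illustrative numbers, not a real model
print(f"weights at 4-bit: ~{example.weights_gb(bits=4):.0f} GB")
print(f"per-token compute vs. dense: {example.compute_share():.0%}")
```

In this illustration the model needs memory like a 100B model while doing roughly the per-token work of a 10B one, which is why both figures belong in the VRAM math.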

On Spanvero, parameter count is one of the core facts we surface for every model, because it drives the honest cost math. You can browse models grouped by size and see the real cost to run each at /models/, filter to what fits your hardware at pages like /models/24gb-vram/, or plug a specific size into the calculator at /calculator/ to see the memory and dollar cost for your own setup — all with zero markup.

Quantization · VRAM · Mixture of Experts (MoE) · Active vs total parameters · Inference · Tokens
