
Tokens

The chunks of text (roughly word-pieces) that a model reads and writes; pricing, speed, and context limits are all measured in tokens, not words.

Tokens are the fundamental unit that language models work in, and once you understand them, a lot of the numbers around AI — prices, speeds, context limits — suddenly make sense. A model does not read raw characters or whole words. Instead, text is broken into tokens: common chunks of characters that the model's tokenizer has learned. A token might be a whole short word ("the"), a word fragment ("tokeniz" + "ation"), a single character, a space, or a piece of punctuation. Frequent words tend to be one token; rarer or longer words get split into several.

The practical conversion everyone reaches for: in English, roughly 1 token ≈ 0.75 words, or about 4 characters per token. So a 1,000-word document is very roughly 1,300 tokens. These are approximations — the exact count depends on the language and the specific tokenizer a model uses. Non-English text, code, and text with lots of numbers or unusual symbols often tokenize less efficiently (more tokens per word), which is worth knowing if you work in other languages or with source code.
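As a quick illustration of that rule of thumb, here is a minimal Python sketch. The function names are made up for this example, and the two ratios are just the English-language approximations quoted above, not the behavior of any particular tokenizer.

```python
import math

# Rule-of-thumb ratios for English text (approximations, not tokenizer output).
WORDS_PER_TOKEN = 0.75
CHARS_PER_TOKEN = 4.0


def tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count, rounding up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def tokens_from_text(text: str) -> int:
    """Estimate tokens from the character length of a string, rounding up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


# A 1,000-word document lands at roughly 1,300 tokens.
print(tokens_from_words(1000))
```

For code, non-English text, or number-heavy text, expect the real count to be higher than this estimate; only the model's own tokenizer gives an exact figure.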

Tokens matter because they are the unit in which everything is counted and charged. When you use a hosted API, prices are quoted per million tokens, and almost always split into a cheaper rate for input tokens (your prompt) and a pricier rate for output tokens (the model's reply). The gap exists largely because output is generated one token at a time, while the whole prompt can be processed in parallel, so generating text costs more than reading it. Spanvero reports a blended input+output figure so you can compare models on a single number, while being clear that the two rates differ. Generation speed is reported in tokens per second, which tells you how fast a model produces its answer. And a model's context window, its working-memory limit, is measured in tokens, covering both what you send in and what it generates back.
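To make the idea of a blended figure concrete, here is a small Python sketch of a weighted average of the two rates. The weighting Spanvero actually uses is not stated here, so the 3:1 input-to-output mix (an `input_share` of 0.75) is purely an assumption for illustration, as are the function name and the example prices.

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_share: float = 0.75) -> float:
    """Weighted average price per million tokens.

    input_share is the assumed fraction of all tokens that are input
    tokens. 0.75 (a 3:1 mix) is an illustrative default, not a standard.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_per_m + (1.0 - input_share) * output_per_m


# Hypothetical model: $1.00 per million input tokens, $4.00 per million output.
print(blended_price(1.00, 4.00))
```

A single blended number is handy for ranking models, but a workload with a very different input-to-output mix should be priced with the two rates separately.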

Because both your prompt and the model's response consume tokens, and because they share the same context budget, thinking in tokens helps you estimate three things at once: how much a request will cost, whether your input will fit in the context window, and roughly how long the answer will take. A long system prompt plus a long document plus a long expected answer all add up against both your bill and the context limit.
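Those three estimates can be sketched in a few lines of Python. Everything below is illustrative: the names, prices, context size, and speed are hypothetical inputs, and the time estimate covers only generation, ignoring the time the model spends processing the prompt before the first token appears.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    cost_usd: float            # total API cost for the request
    fits_context: bool         # do input + output fit in the window?
    generation_seconds: float  # time to produce the output tokens


def estimate_request(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float,
                     context_window: int, tokens_per_second: float) -> Estimate:
    """Rough cost, fit, and duration for one request."""
    cost = (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000
    # Prompt and reply share the same context budget.
    fits = input_tokens + output_tokens <= context_window
    seconds = output_tokens / tokens_per_second
    return Estimate(cost, fits, seconds)


# Hypothetical request: 6,000 tokens in, 1,000 tokens out.
est = estimate_request(6000, 1000, input_price_per_m=1.00,
                       output_price_per_m=4.00, context_window=8000,
                       tokens_per_second=50)
print(est)
```

With these made-up numbers the request costs about one cent, fits in the window with 1,000 tokens to spare, and takes around 20 seconds to generate.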

A few useful intuitions. Summarizing a long document is input-heavy and output-light, so it is relatively cheap under API pricing. Generating long-form content from a short prompt is the opposite (light input, heavy output), so the more expensive output rate dominates. And on local hardware, tokens still matter because more tokens in context mean a larger KV cache and more VRAM used.
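For the local-hardware point, a back-of-the-envelope Python sketch shows how the KV cache grows with token count. It uses the common sizing for a standard transformer (one key and one value vector per layer, per KV head, per token). The model shape below is hypothetical, and real runtimes may differ because of cache quantization, padding, or other optimizations.

```python
def kv_cache_bytes(tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence.

    The leading 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes 16-bit (fp16/bf16) cache entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128.
size = kv_cache_bytes(tokens=8192, n_layers=32, n_kv_heads=8, head_dim=128)
print(size / 2**30, "GiB")
```

The cache scales linearly with tokens in context, so doubling the context roughly doubles this part of the VRAM footprint, on top of the memory the model weights already occupy.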

It's worth understanding, at least roughly, how tokenization happens, because it explains some surprising counts. Tokenizers are built by a process (commonly byte-pair encoding) that scans a huge amount of text and learns the most frequent character sequences, merging them into single tokens. Very common words become one token; rare words get broken into familiar sub-pieces. This is why "the" is one token but an unusual technical term or a long compound word might be three or four, and why a made-up string of characters can tokenize into many pieces. It's also why different model families can report slightly different token counts for the same text — they were trained with different tokenizers and different vocabularies.
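The merging process described above can be sketched as a toy byte-pair-encoding trainer in Python. This is a deliberately simplified illustration working on characters of a tiny made-up corpus; production tokenizers operate on bytes, handle whitespace and pre-tokenization, and learn tens of thousands of merges. All names here are hypothetical.

```python
from collections import Counter


def merge_pair(symbols: tuple, pair: tuple) -> tuple:
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(words: list[str], num_merges: int) -> list[tuple]:
    """Learn up to `num_merges` merges, most frequent adjacent pair first."""
    corpus = Counter(tuple(w) for w in words)  # word (as symbols) -> count
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[merge_pair(symbols, best)] += freq
        corpus = merged
    return merges


def tokenize(word: str, merges: list[tuple]) -> list[str]:
    """Apply the learned merges, in order, to a new word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Tiny corpus in which "the" is by far the most common word.
merges = learn_bpe_merges(["the"] * 6 + ["then"] * 2 + ["other"], num_merges=2)
print(tokenize("the", merges))   # frequent word collapses to a single token
print(tokenize("then", merges))  # rarer word reuses the learned piece
print(tokenize("qzx", merges))   # unseen string stays as single characters
```

Two merges are enough for "the" to become one token, while a string the trainer never saw falls back to one token per character, which is the same effect that makes made-up strings expensive with real tokenizers.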

A few concrete anchors help build intuition. A typical short chat message is a few dozen tokens. A page of prose is very roughly 500-800 tokens. A 10-page document is on the order of 5,000-8,000 tokens. Source code and heavily formatted or non-English text usually cost more tokens per visible character than plain English prose. Keeping these anchors in mind lets you sanity-check whether a task will fit in a model's context window and roughly what it will cost before you run it — which is the whole point of thinking in tokens rather than words.

Everything Spanvero prices is grounded in tokens, because that's how real usage is billed and measured. The calculator at /calculator/ lets you enter expected input and output token counts for your workload and see the honest, $0-markup cost across running locally, renting a GPU, or using your own API key — so you can see exactly what your token volume translates to in dollars before you commit.

Related: Context window · Inference · Local vs API vs renting a GPU · Embeddings · KV cache · Parameters (the "B" / billions)


