The maximum number of tokens (prompt plus generated output) a model can consider at once; anything beyond it is cut off or forgotten.
The context window is the model's working memory for a single request, measured in tokens. It must hold your system prompt, the conversation/document you provide, and the response being generated. Common sizes range from 4K up to 128K, 200K, or even 1M tokens on newer models.
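The statement that one window has to hold the system prompt, the conversation, and the response works out to simple token-budget arithmetic. Below is a minimal Python sketch that assumes token counts are already known (in practice they come from the model's own tokenizer). `Message` and `fit_to_window` are hypothetical names. Keeping the newest turns and dropping the oldest is only one common strategy; some apps summarize old turns or reject the request instead.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    tokens: int  # assumed precomputed with the model's tokenizer


def fit_to_window(system_tokens, history, max_output_tokens, context_window):
    """Keep the most recent messages that fit next to the system prompt
    and the space reserved for the response; older messages go first."""
    budget = context_window - system_tokens - max_output_tokens
    if budget < 0:
        raise ValueError("system prompt plus reserved output exceed the window")
    kept, used = [], 0
    for msg in reversed(history):  # walk from newest to oldest
        if used + msg.tokens > budget:
            break  # everything older than this message is dropped too
        kept.append(msg)
        used += msg.tokens
    kept.reverse()  # restore chronological order
    return kept


history = [Message("user", 3000), Message("assistant", 2500),
           Message("user", 2000), Message("assistant", 1500)]
kept = fit_to_window(500, history, 1024, 8192)
print([m.tokens for m in kept])
```

With an 8,192-token window, a 500-token system prompt, and 1,024 tokens reserved for the reply, 6,668 tokens remain for history. The three newest messages (6,000 tokens) fit, and the oldest 3,000-token message is dropped.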
If the input exceeds the window, something has to give: depending on the API or app, the request is either rejected or the oldest tokens are truncated, and the model simply can't see whatever was cut. That is why long chats "forget" earlier messages. Larger windows let you feed in big documents or long histories, but they cost more memory: the KV cache grows linearly with the number of tokens in context, so at long context lengths it can dominate your VRAM usage.
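To see why the KV cache can dominate VRAM, here is a back-of-the-envelope estimate in Python. It assumes a standard transformer that caches one key and one value vector per layer, per KV head, per token. The example configuration (32 layers, 32 KV heads, head dimension 128, 2-byte fp16 entries) is illustrative, not a specific catalog model. `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    # Factor of 2: each layer stores both a key and a value vector
    # for every KV head at every token position.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)


GIB = 1024 ** 3

# Illustrative config (assumed): 32 layers, 32 KV heads, head_dim 128, fp16 cache
for ctx in (4_096, 32_768, 131_072):
    size = kv_cache_bytes(32, 32, 128, ctx)
    print(f"{ctx:>7} tokens -> {size / GIB:.1f} GiB")
```

For this configuration each token costs 0.5 MiB, so a single sequence needs 2 GiB of cache at 4K tokens, 16 GiB at 32K, and 64 GiB at 128K. The per-token cost depends on the architecture: fewer KV heads, fewer layers, or a lower-precision cache shrink it proportionally.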
Context window is an objective catalog spec, so Spanvero lets you sort/filter by it (e.g. "models with at least 128K context") without making any quality claim.
Tokens · KV cache · VRAM · Inference
Open the free Spanvero advisor → · Honest, $0-markup. © 2026 Cynosure LLC.