A single-file model format used by llama.cpp, Ollama, and LM Studio that bundles the (usually quantized) weights plus all metadata, optimized for local CPU/GPU inference.
GGUF (commonly expanded as GPT-Generated Unified Format) is the format you download to run a model locally with llama.cpp-based tools. It packs the weights, tokenizer, and model metadata into one file, and it natively supports quantization (the Q4_K_M, Q5_K_M, and similar variants), so a GGUF download is typically already small enough to run on consumer hardware.
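To make the "one file" idea concrete, here is a minimal sketch that reads the fixed-size header at the start of a GGUF file. It assumes the little-endian layout used by GGUF version 2 and later: the 4-byte magic "GGUF", a 32-bit version, then 64-bit counts of tensors and metadata key/value pairs. The function names and the returned dictionary keys are illustrative, not part of any library.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (metadata count)


def parse_gguf_header(raw: bytes) -> dict:
    """Parse the first 24 bytes of a GGUF file.

    Assumes a little-endian file with version >= 2; version 1 files
    used 32-bit counts and are not handled by this sketch.
    """
    if len(raw) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes to read a GGUF header")
    if raw[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: magic bytes do not match")
    # "<" = little-endian, no padding; I = uint32, Q = uint64
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:HEADER_SIZE])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


def read_gguf_header(path: str) -> dict:
    """Read and parse the header of the GGUF file at `path`."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(HEADER_SIZE))
```

Calling `read_gguf_header("model.gguf")` on a downloaded file (the path is a placeholder) reports how many tensors and metadata entries it contains. The metadata entries that follow the header are where the tokenizer and model settings are stored, ahead of the tensor data.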
GGUF is built for inference, not training. llama.cpp can run a GGUF model on the CPU, on the GPU, or split across both by offloading some layers to the GPU, which is what makes it the backbone of consumer tools like Ollama and LM Studio. GGUF replaced the older GGML format.
A common workflow: a model is trained and published as safetensors on Hugging Face, then converted to GGUF (and quantized) for local use. If you're running a model on your own machine with llama.cpp, Ollama, or LM Studio, you almost always want the GGUF version.
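The convert-then-quantize workflow can be sketched as the two commands below, built in Python without running them. This assumes a recent llama.cpp checkout, where the conversion script is named `convert_hf_to_gguf.py` and the quantizer binary is `llama-quantize`; older releases used different names, so check your copy's `--help` output. The helper name, directory layout, and output filenames are hypothetical.

```python
from pathlib import Path


def gguf_conversion_commands(hf_model_dir, out_dir, quant="Q4_K_M"):
    """Build (but do not run) the two commands for the safetensors -> GGUF workflow.

    Assumes both tools are invoked from a llama.cpp checkout; the
    output filenames are placeholders chosen for this sketch.
    """
    out = Path(out_dir)
    f16_path = out / "model-f16.gguf"         # unquantized intermediate file
    quant_path = out / f"model-{quant}.gguf"  # final quantized file

    # Step 1: convert the Hugging Face checkpoint to a 16-bit GGUF file.
    convert = [
        "python", "convert_hf_to_gguf.py", str(hf_model_dir),
        "--outfile", str(f16_path),
        "--outtype", "f16",
    ]
    # Step 2: quantize the 16-bit GGUF file to the requested level.
    quantize = ["llama-quantize", str(f16_path), str(quant_path), quant]
    return [convert, quantize]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files. Keeping the 16-bit intermediate lets you produce several quant levels from one conversion.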
© 2026 Cynosure LLC.