Hugging Face's standard, safe-to-load weight format: it stores raw model tensors without executable code and is the usual distribution format for the full-precision (unquantized) weights of open models.
Safetensors is the default format for publishing model weights on the Hugging Face Hub. It was created to replace Python "pickle" checkpoints, which can execute arbitrary code when loaded. A safetensors file holds only a small JSON header (tensor names, dtypes, shapes, and byte offsets) followed by the raw tensor bytes, so loading one cannot run hidden code. The flat layout also supports fast, memory-mapped (zero-copy) loading.
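To make the "data only" point concrete, here is a minimal sketch that writes a one-tensor file by hand and reads it back using only the Python standard library. It assumes the published safetensors layout: an 8-byte little-endian header length, a JSON header, then raw tensor bytes. The helper names (`write_tiny_safetensors`, `read_header`, `read_f32_tensor`) and the tensor name `w` are made up for illustration. In practice you would use the `safetensors` library rather than parsing files yourself.

```python
import json
import mmap
import struct
import tempfile
from pathlib import Path


def write_tiny_safetensors(path, name, values):
    """Write a one-tensor float32 file by hand (illustration only)."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {
        name: {"dtype": "F32", "shape": [len(values)], "data_offsets": [0, len(data)]}
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
        f.write(header_bytes)                          # JSON description of tensors
        f.write(data)                                  # raw tensor bytes, nothing else


def read_header(path):
    """Return (header dict, file offset where the tensor bytes start)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header, 8 + header_len


def read_f32_tensor(path, name):
    """Memory-map the file and decode one F32 tensor from its byte range."""
    header, data_start = read_header(path)
    begin, end = header[name]["data_offsets"]
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Slicing copies only this tensor's bytes; other tensors are never read.
        raw = mm[data_start + begin : data_start + end]
    return list(struct.unpack(f"<{(end - begin) // 4}f", raw))


demo_path = Path(tempfile.mkdtemp()) / "demo.safetensors"
write_tiny_safetensors(demo_path, "w", [1.0, 2.0, 3.0])
header, _ = read_header(demo_path)
values = read_f32_tensor(demo_path, "w")
print(header)
print(values)
```

Nothing in the file is ever evaluated: the reader parses JSON and interprets bytes, which is why loading cannot trigger code execution the way unpickling can. This sketch copies the tensor's bytes out of the mapping for simplicity, whereas a real loader can hand back views into the mapped memory.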
Unlike GGUF, which packs weights, metadata, and tokenizer into a single file, a safetensors model is usually shipped as full-precision weights alongside separate config and tokenizer files, and large models are often split across several shards. It's the format that Hugging Face training libraries such as Transformers read and write by default.
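For sharded models, a small JSON index usually sits next to the shards and records which file holds each tensor. The sketch below assumes the layout Hugging Face Transformers writes to `model.safetensors.index.json` (a `weight_map` from tensor name to shard filename). The tensor names, sizes, and the `tensors_by_shard` helper are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical index contents, shaped like a model.safetensors.index.json file.
index_json = """
{
  "metadata": {"total_size": 24},
  "weight_map": {
    "embed.weight": "model-00001-of-00002.safetensors",
    "layer0.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}
"""


def tensors_by_shard(index):
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


shards = tensors_by_shard(json.loads(index_json))
for shard_file, names in sorted(shards.items()):
    print(shard_file, names)
```

A loader can use this mapping to open only the shards that contain the tensors it needs, rather than reading every file.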
For running models locally on modest hardware, you'll generally download (or convert to) a GGUF file instead, but safetensors is what fine-tuning, vLLM serving, and most GPU pipelines consume directly. Rule of thumb: safetensors for training and GPU serving, GGUF for local quantized inference.
GGUF · Fine-tuning · vLLM · Quantization
© 2026 Cynosure LLC.