Safetensors

Hugging Face's standard, safe-to-load weight format that stores raw model tensors without executable code, used as the canonical full-precision distribution format for open models.

Safetensors is the default format for publishing model weights on the Hugging Face Hub, and it's what you'll see listed on almost every open model's page. The name says what it's for: safely storing tensors — the multi-dimensional arrays of numbers that make up a model's weights. If GGUF is the format you download to run a model locally, safetensors is the format the model was originally published in and the format that training and GPU-serving pipelines consume directly.

The "safe" part is not marketing. Safetensors was created specifically to replace Python "pickle" checkpoints, the old default for saving model weights. Pickle files can contain arbitrary Python code that executes the moment you load the file, which means loading a malicious pickle checkpoint you downloaded could run harmful code on your machine. This was a genuine, exploited security hole in the ML world. Safetensors fixes it by design: the format stores only the raw tensor data plus a small metadata header, with no mechanism to embed or run code. Loading a safetensors file cannot execute anything, so you can pull weights from the Hub without worrying that opening them will compromise your system.
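The danger is easy to demonstrate with Python's own `pickle` module. This is a harmless sketch: the class `MaliciousCheckpoint` is hypothetical, and its `__reduce__` hook tells pickle to call the builtin `eval` when the file is loaded. A real attack would swap in something far worse, and safetensors simply has no equivalent hook.

```python
import pickle

class MaliciousCheckpoint:
    """Hypothetical object an attacker might save in place of real weights."""

    def __reduce__(self):
        # pickle records (callable, args); the *loader* later calls callable(*args).
        # Here the callable is the builtin eval, so an arbitrary expression runs
        # on load. A real attack could name something like os.system instead.
        return (eval, ("sum(range(10))",))

payload = pickle.dumps(MaliciousCheckpoint())

# Nothing in this call says "run code", yet eval executes during loading.
loaded = pickle.loads(payload)
print(loaded)  # 45: the result of the attacker's expression, not a checkpoint
```

The person loading the file never called `eval` themselves; the file chose what to run.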

Beyond safety, safetensors is fast. It supports zero-copy, memory-mapped loading, meaning the runtime can point directly at the data on disk instead of copying gigabytes into memory first. For very large models split across many shards, this makes loading noticeably quicker and lighter on RAM.
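Here is a minimal stdlib-only sketch of what memory-mapped, zero-copy access means. It is not how the safetensors library or any serving engine is implemented. The scratch file `shard.bin` is a hypothetical stand-in for a shard's data region, and the code assumes a little-endian machine (typical x86 and ARM hosts), since safetensors stores tensor bytes little-endian.

```python
import mmap
import os
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shard.bin")  # hypothetical stand-in for a shard

    # Write 1,000 little-endian float32 values (safetensors stores little-endian).
    with open(path, "wb") as f:
        f.write(struct.pack("<1000f", *(float(i) for i in range(1000))))

    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        raw = memoryview(mm)
        # Reinterpret the mapped bytes as native float32 (assumes a little-endian
        # host). No bytes are copied; the OS pages data in only when it is read.
        floats = raw.cast("f")
        window = floats[100:200]  # slicing a memoryview is zero-copy too
        first, last, count = window[0], window[-1], len(window)
        # Views must be released before the mapping can be closed.
        window.release()
        floats.release()
        raw.release()
        mm.close()

print(first, last, count)  # 100.0 199.0 100
```

Only the pages that hold the requested values have to be read from disk, which is why mapping a large shard can be much lighter on RAM than reading it all in first.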

Structurally, a safetensors release is different from a GGUF one. A safetensors model is usually shipped as unquantized weights, typically in a 16-bit format (BF16 or FP16), alongside separate files for the model config and the tokenizer; for large models the weights are split across several shard files plus an index file that records which shard holds each tensor. Unlike a typical downloaded GGUF, it is usually not pre-quantized: it's the raw material. That's exactly what you want for the jobs that need the full weights: fine-tuning frameworks read and write safetensors, and high-throughput serving engines like vLLM load safetensors directly onto the GPU. LoRA adapters are also commonly distributed as small safetensors files.
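As a sketch of how that index gets used: in Hugging Face repos it is commonly a JSON file named `model.safetensors.index.json`, whose `weight_map` maps each tensor name to the shard file holding it. The index below is trimmed and made up (shard names and `total_size` are illustrative only), and grouping tensors by shard is just one reasonable way a loader might plan its reads.

```python
import json
from collections import defaultdict

# A trimmed, made-up index: real ones list every tensor in the model, and
# shard 2's tensors are omitted here entirely.
index = json.loads("""
{
  "metadata": {"total_size": 14000000000},
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "lm_head.weight": "model-00003-of-00003.safetensors"
  }
}
""")

# Invert tensor -> shard into shard -> [tensors] so a loader can open each
# shard file once and read every tensor it holds.
tensors_by_shard = defaultdict(list)
for tensor_name, shard_file in index["weight_map"].items():
    tensors_by_shard[shard_file].append(tensor_name)

for shard_file, names in sorted(tensors_by_shard.items()):
    print(shard_file, names)
```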

Because safetensors holds full-precision weights, its download size tracks the roughly-2-GB-per-billion-parameters rule directly: at 16 bits, each parameter takes 2 bytes, so a 7B model in safetensors is around 14 GB across its shards and a 70B model is about 140 GB. That's the full-fat version, which is why people running on modest hardware convert to a quantized GGUF instead of loading safetensors directly. It also explains why safetensors is the canonical archival and starting format: publishers release the complete, uncompressed weights so the community can quantize, fine-tune, and convert from a faithful original, rather than from an already-lossy copy.
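The arithmetic behind the rule of thumb is just parameters times bytes per parameter. The helper name `weights_size_gb` is hypothetical, and the estimate covers the weights only (config and tokenizer files are tiny by comparison). The 4-bit figure is a rough lower bound for a quantized copy, since real quantized formats carry some extra overhead.

```python
def weights_size_gb(params_billion: float, bits_per_param: float = 16) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1e9 bytes)."""
    bytes_per_param = bits_per_param / 8  # 16-bit -> 2 bytes per parameter
    # (params_billion * 1e9 params) * bytes_per_param / (1e9 bytes per GB)
    return params_billion * bytes_per_param

for size in (7, 70):
    print(f"{size}B at 16-bit: ~{weights_size_gb(size):.0f} GB, "
          f"at 4-bit: ~{weights_size_gb(size, 4):.1f} GB")
```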

Safetensors is also deliberately simple and language-neutral. A file is just an 8-byte length prefix, a JSON header giving each tensor's name, data type, shape, and byte offsets, and then the raw tensor bytes. Nothing else. That simplicity is what makes it both safe (no code path to exploit) and fast (the runtime can map the bytes directly), and it's why the format was adopted so widely across the ecosystem in a short time. If you inspect a safetensors file, you're looking at plain numbers and their labels, not a program.
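To make that layout concrete, here is a stdlib-only sketch that writes and then reads a tiny one-tensor file. The helpers `pack_safetensors` and `read_header` and the tensor name `layer.weight` are hypothetical. The real library also validates the header, supports an optional `__metadata__` entry, and may pad the header with spaces, none of which this sketch does.

```python
import json
import struct

def pack_safetensors(tensors: dict) -> bytes:
    """Serialize {name: (dtype, shape, raw_bytes)} into the safetensors layout:
    an 8-byte little-endian header length, the JSON header, then tensor bytes."""
    header, blobs, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        # data_offsets are [begin, end) relative to the start of the data section.
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        blobs.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(blobs)

def read_header(blob: bytes):
    """Return the parsed JSON header and the offset where tensor data begins."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    return json.loads(blob[8:8 + header_len]), 8 + header_len

# One 2x2 float32 tensor, stored little-endian as the format requires.
blob = pack_safetensors(
    {"layer.weight": ("F32", [2, 2], struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))}
)

header, data_start = read_header(blob)
info = header["layer.weight"]
begin, end = info["data_offsets"]
values = struct.unpack("<4f", blob[data_start + begin:data_start + end])
print(info["dtype"], info["shape"], values)  # F32 [2, 2] (1.0, 2.0, 3.0, 4.0)
```

Reading the file is nothing more than parsing JSON and slicing a byte range, so there is nothing in it to execute.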

So when should you care which format you have? If your goal is to run a model on modest hardware for personal use, you'll generally want a GGUF instead — either download one directly or convert the safetensors. If your goal is to fine-tune a model, serve it at scale on GPUs with vLLM, or do anything in a training framework, safetensors is what those tools expect. The clean mental model: safetensors for training and GPU serving, GGUF for local quantized inference. Spanvero reports the honest cost for both paths — the free-on-your-machine local route (GGUF) and the rent-a-GPU serving route (safetensors + vLLM) — so you can compare them for any model at /calculator/ or side by side on a model's page under /models/.

What should you do with a safetensors model?

Choose the path based on what you are trying to do, not the file extension alone.

Run it locally: Start with models sized for a consumer GPU, then open the model’s exact run guide.
Check the memory first: Turn parameter count, quant and context into an honest VRAM requirement.
Skip conversion and rent: Compare current GPU memory and hourly rates when local hardware is the constraint.

Related: GGUF · Fine-tuning · vLLM · Quantization · LoRA · Inference
