The real cost to run GLM 5.2 W4AFP8

PhalaCloud · 391.9B parameters · 1M context · commercial OK

GLM 5.2 W4AFP8 — 391.9B params (PhalaCloud). Auto-indexed from the Hugging Face Hub (13,224 downloads). Parameter count is exact; download size and quantizations are estimates.

What it costs to run GLM 5.2 W4AFP8 — $0 markup

On your own machine — $0. Runs free locally if you have about 295 GB of VRAM at Q4_K_M (point Spanvero at LM Studio, Ollama, or llama.cpp).
On your own rented GPU — from $3.43/hr. 7× NVIDIA RTX A6000 48GB at the direct RunPod price (≈ $4.97 a session), $0 markup.
Via your own API key — $3.24/1M tokens (blended input + output). No GPU to manage (rough estimate for this size).

Key facts

Parameters	391.9B
Context window	1M tokens
Recommended quant	Q4_K_M
VRAM to run	~295 GB (at Q4_K_M, 16.4K context)
Download size	~235 GB
License	Commercial use OK

Open the free Spanvero advisor → for the live, interactive math for your exact workload and hardware.

Related models

DeepSeek R1 0528 NVFP4 v2 — 393.6B, nvidia
Llama 4 Maverick (17B-128E) — 402B, Meta
GLM 5.1 NVFP4 — 381.5B, nvidia
GLM 5.2 NVFP4 — 381B, nvidia
GLM 5.2 NVFP4 — 379.2B, Mapika

Browse: All models · Compare