The real cost to run llm jp 4 8b thinking

llm-jp · 8.6B parameters · 65.5K context · commercial OK

llm jp 4 8b thinking is an 8.6B-parameter model from llm-jp, auto-indexed from the Hugging Face Hub (83,620 downloads). The parameter count is exact; the download size and quantization figures are estimates.

What it costs to run llm jp 4 8b thinking — $0 markup

On your own machine: $0. It runs free locally if you have about 9 GB of VRAM at Q4_K_M with a 16.4K-token context (point Spanvero at LM Studio, Ollama, or llama.cpp).
On your own rented GPU: from $0.26/hr for an NVIDIA RTX 3090 (24 GB) at the direct Vast.ai price (≈ $0.10 a session), with $0 markup.
Via your own API key: about $0.17 per 1M tokens (blended input + output; a rough estimate for a model of this size), with no GPU to manage.
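To see when renting a GPU beats paying per token, the arithmetic below compares the two prices quoted above. It is a minimal sketch: the $0.26/hr and $0.17/1M figures come from this page, while the throughput passed to `gpu_cost` is an assumption you would need to measure on your own setup. The function names are illustrative only and are not part of any Spanvero API.

```python
# Minimal break-even sketch: rented GPU (priced per hour) vs. API (priced per token).
# Prices are the figures quoted on this page; throughput must be YOUR measured number.

GPU_PRICE_PER_HOUR = 0.26      # USD/hr, RTX 3090 on Vast.ai (from this page)
API_PRICE_PER_M_TOKENS = 0.17  # USD per 1M blended tokens (rough estimate from this page)


def api_cost(tokens: int, price_per_m: float = API_PRICE_PER_M_TOKENS) -> float:
    """USD to process `tokens` tokens through a per-token API."""
    return tokens / 1_000_000 * price_per_m


def gpu_cost(tokens: int, tokens_per_second: float,
             price_per_hour: float = GPU_PRICE_PER_HOUR) -> float:
    """USD to process `tokens` tokens on a rented GPU at an assumed sustained throughput."""
    hours = tokens / tokens_per_second / 3600
    return hours * price_per_hour


def break_even_tokens_per_second(price_per_hour: float = GPU_PRICE_PER_HOUR,
                                 price_per_m: float = API_PRICE_PER_M_TOKENS) -> float:
    """Throughput at which a fully busy rented GPU costs the same per token as the API."""
    tokens_per_hour = price_per_hour / price_per_m * 1_000_000
    return tokens_per_hour / 3600


if __name__ == "__main__":
    print(f"Break-even: {break_even_tokens_per_second():.0f} tokens/s")  # → Break-even: 425 tokens/s
    # Minutes of rental that a $0.10 session buys at $0.26/hr.
    print(f"$0.10 session: {0.10 / GPU_PRICE_PER_HOUR * 60:.0f} min")  # → $0.10 session: 23 min
    # 10M tokens at a HYPOTHETICAL 50 tokens/s single-stream rate vs. the API.
    print(f"GPU: ${gpu_cost(10_000_000, 50):.2f}  API: ${api_cost(10_000_000):.2f}")  # → GPU: $14.44  API: $1.70
```

At these prices a rented RTX 3090 only undercuts the API if it stays fully utilized at a sustained throughput of roughly 425 tokens per second or more; below that, the per-token API is cheaper. The $0.10 session figure corresponds to about 23 minutes of rental.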

Key facts

Parameters	8.6B
Context window	65.5K tokens
Recommended quant	Q4_K_M
VRAM to run	~9 GB (at Q4_K_M, 16.4K context)
Download size	~5 GB
License	Commercial use OK
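The ~9 GB VRAM figure is roughly the sum of the quantized weights, the KV cache for the chosen context length, and some runtime overhead. The sketch below shows that arithmetic under clearly hypothetical assumptions: about 4.85 bits per weight for Q4_K_M (an approximate average, not an exact figure), a 16.4K context taken as 16,384 tokens, a 1 GB overhead guess, and a made-up architecture of 32 layers, 8 KV heads, and 128-dimensional heads with an fp16 cache. That architecture is a placeholder, not llm-jp's published configuration; substitute the real values from the model's config on the Hugging Face Hub before relying on the result.

```python
# Rough VRAM estimate = quantized weights + KV cache + fixed runtime overhead.
# The architecture numbers in the example are HYPOTHETICAL placeholders,
# not llm-jp's published config; read the real ones from the model's config.


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in GB (1 GB = 10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def kv_cache_gb(context_tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """Keys + values cached for every layer and every token (fp16 by default)."""
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_tokens * bytes_per_token / 1e9


def vram_estimate_gb(params: float, bits_per_weight: float, context_tokens: int,
                     n_layers: int, n_kv_heads: int, head_dim: int,
                     overhead_gb: float = 1.0) -> float:
    """Sum of the three terms; overhead_gb is a guess for runtime buffers."""
    return (weights_gb(params, bits_per_weight)
            + kv_cache_gb(context_tokens, n_layers, n_kv_heads, head_dim)
            + overhead_gb)


if __name__ == "__main__":
    PARAMS = 8.6e9                           # exact count from this page
    Q4_K_M_BPW = 4.85                        # ASSUMED average bits/weight for Q4_K_M
    CONTEXT = 16_384                         # ASSUMED token count behind "16.4K context"
    LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128  # HYPOTHETICAL architecture

    print(f"weights: {weights_gb(PARAMS, Q4_K_M_BPW):.1f} GB")  # → weights: 5.2 GB
    print(f"KV cache: {kv_cache_gb(CONTEXT, LAYERS, KV_HEADS, HEAD_DIM):.1f} GB")  # → KV cache: 2.1 GB
    total = vram_estimate_gb(PARAMS, Q4_K_M_BPW, CONTEXT, LAYERS, KV_HEADS, HEAD_DIM)
    print(f"total: {total:.1f} GB")  # → total: 8.4 GB
```

With these placeholder values the estimate lands near 8.4 GB, in the same range as the ~9 GB figure above, and the weight term alone (about 5.2 GB) lines up with the ~5 GB download size. Doubling the context doubles only the KV-cache term, which is why the VRAM figure is quoted at a specific context length.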

Open the free Spanvero advisor for live, interactive math on your exact workload and hardware.

Related models

llm jp 4 32b a3b thinking — 32.1B, llm-jp
Nemotron Labs Diffusion 8B Base — 8.5B, nvidia
LFM2.5 8B A1B — 8.5B, LiquidAI
gemma 7b — 8.5B, google
internlm3 8b instruct — 8.8B, internlm
