How to run dolphin 2.9.1 yi 1.5 34b locally

dolphin 2.9.1 yi 1.5 34b (dphn, 34.4B) runs on your own machine for $0 if you have about 26 GB of VRAM. Here's how to run it with LM Studio or llama.cpp — and what it would cost the other ways.

VRAM to run

~26 GB

Download

~21 GB

Quant

Q4_K_M

Context

8.2K

Two ways to run dolphin 2.9.1 yi 1.5 34b locally

1. LM Studio — point-and-click

Open LM Studio, search “dolphin 2.9.1 yi 1.5 34b”, and download a quant that fits your VRAM (≈26 GB at Q4_K_M). Load it and chat — fully offline. It also serves a local OpenAI-compatible API you can point Spanvero at.

2. llama.cpp — maximum control

Grab a community GGUF build of dolphin 2.9.1 yi 1.5 34b from Hugging Face (search “dolphin 2.9.1 yi 1.5 34b GGUF” — bartowski and unsloth publish reliable ones), then run:

./llama-cli -m <Q4_K_M-file>.gguf -p "Hello" -ngl 99

Or serve it with ./llama-server -m <file>.gguf for an OpenAI-compatible API on :8080.

What it costs — $0 markup

On your machine: $0 — you already have the hardware (needs ~26 GB VRAM).
No GPU big enough? Use your own API key at about $0.38/1M tokens (size estimate), or rent a GPU by the hour. Full cost breakdown →

License: commercial use OK.

Browse: dolphin 2.9.1 yi 1.5 34b cost · models for your GPU · all models

Open the free Spanvero advisor → — it detects your hardware and confirms what fits.