A cheap fine-tuning method that freezes the base model and trains tiny add-on "adapter" matrices, producing a small file you can stack on top of the original weights.
LoRA (Low-Rank Adaptation) is one of the most widely used ways to customize an open model without retraining all of it. Instead of updating billions of weights, it freezes the base model and, for selected weight matrices, learns a pair of small low-rank matrices whose product is added to the frozen weight, nudging the model's behavior. Training needs far less memory and is usually faster, and the result is a small adapter file (often megabytes) rather than a full multi-gigabyte model.
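To make the savings concrete, here is a minimal sketch in plain Python of the low-rank update for a single weight matrix. The names `W`, `A`, `B`, `alpha`, and `r`, and the 4096 x 4096 layer size, are illustrative assumptions rather than values from any particular model or library.

```python
def lora_param_counts(d_out, d_in, r):
    """Return (full_params, lora_params) for one d_out x d_in weight matrix."""
    full = d_out * d_in        # weights touched by full fine-tuning
    lora = r * (d_in + d_out)  # A is r x d_in, B is d_out x r
    return full, lora


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)              # frozen base path
    delta = matvec(B, matvec(A, x))  # low-rank adapter path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


# Assumed example sizes: one 4096 x 4096 layer with rank 8.
full, lora = lora_param_counts(d_out=4096, d_in=4096, r=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full / lora:.0f}x")
# prints: full: 16,777,216  lora: 65,536  ratio: 256x
```

In this sketch, if `B` starts as all zeros the adapter path contributes nothing, so the adapted layer initially behaves exactly like the frozen base.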
Because the adapter is separate, you can keep one base model and swap different LoRAs on top: one for a coding style, one for a persona, and so on. A common variant, QLoRA, trains the adapter on top of a quantized (typically 4-bit) frozen base, so fine-tuning fits on consumer GPUs.
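The swapping idea can be sketched with a toy example: one shared base weight and a dictionary of adapters chosen per request. The adapter names (`"coding"`, `"persona"`), the 2 x 2 sizes, and the `forward` helper are hypothetical and only meant to illustrate the pattern, not how any specific library stores adapters.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


# Shared frozen base weight (toy 2 x 2 identity); it is never modified.
BASE_W = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical rank-1 adapters, each stored as a (B, A) pair.
ADAPTERS = {
    "coding": ([[0.5], [0.0]], [[1.0, 0.0]]),
    "persona": ([[0.0], [-0.5]], [[0.0, 1.0]]),
}


def forward(x, adapter=None):
    """Run the base layer, optionally adding one named adapter's update."""
    y = matvec(BASE_W, x)
    if adapter is not None:
        B, A = ADAPTERS[adapter]
        delta = matvec(B, matvec(A, x))
        y = [a + d for a, d in zip(y, delta)]
    return y


x = [2.0, 4.0]
print(forward(x))             # base only: [2.0, 4.0]
print(forward(x, "coding"))   # [3.0, 4.0]
print(forward(x, "persona"))  # [2.0, 2.0]
```

Switching behavior here only means picking a different small `(B, A)` pair; the base weights are loaded once and shared.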
LoRAs are also very common in image generation, where small LoRA files add a specific style or subject to a diffusion model. Any LoRA can later be merged into the base weights if you want a single standalone model.
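Merging can be sketched as adding the scaled product of the two adapter matrices into the base weight once, so no separate adapter is needed at inference time. This is a toy illustration in plain Python; `merge_lora`, the matrix values, and the `alpha / r` scaling convention are assumptions made for the example.

```python
def matmul(X, Y):
    """Multiply matrices X and Y, each given as a list of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A) as a new standalone weight matrix."""
    scale = alpha / r
    BA = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, BA)
    ]


W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight (toy 2 x 2)
A = [[1.0, 0.0]]              # r x d_in, with r = 1
B = [[0.5], [1.0]]            # d_out x r

merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)  # [[2.0, 2.0], [5.0, 4.0]]
```

The merged matrix gives the same outputs as running the base and adapter paths separately, at the cost of no longer being able to swap the adapter out.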
Fine-tuning · Base vs instruct model · Quantization · Diffusion model
© 2026 Cynosure LLC.