Private Intelligence

Homelab inference research and infrastructure observations. The method: measure, don't speculate.

Latest

Page cache costs 6 seconds. Compile cache costs 72.

May 14, 2026 · ~12 minute read · Storage & power on local LLM inference

What two RTX 3090s, an 8-cell cold-start sweep, and a power-cap experiment taught me about where the seconds and watts actually go. Three assumptions the data didn't support, one 12× ratio that surprised me, and a 250 W power cap that gives back 36% of GPU power for an 11% throughput cost.
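
For a sense of the measurement loop behind a number like that power-cap figure, here is a minimal sketch that applies a cap and samples board power while a benchmark runs elsewhere. It assumes nvidia-smi is on the PATH and the script has permission to set the limit; the device index, sampling window, and helper names are illustrative, and only the 250 W value comes from the post.

    # Sketch: apply a GPU power cap, then sample board power draw while a
    # separate inference benchmark runs. Assumes nvidia-smi is available and
    # the process may set the limit (typically requires root). GPU index,
    # duration, and interval are placeholders; 250 W is the cap from the post.
    import subprocess
    import time

    def set_power_cap(gpu: int, watts: int) -> None:
        # nvidia-smi -i <idx> -pl <watts> sets the board power limit.
        subprocess.run(["nvidia-smi", "-i", str(gpu), "-pl", str(watts)], check=True)

    def sample_power(gpu: int, seconds: int, interval: float = 1.0) -> list[float]:
        # Poll instantaneous board power draw (watts) at a fixed interval.
        samples = []
        for _ in range(int(seconds / interval)):
            out = subprocess.run(
                ["nvidia-smi", "-i", str(gpu),
                 "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
                capture_output=True, text=True, check=True,
            )
            samples.append(float(out.stdout.strip()))
            time.sleep(interval)
        return samples

    if __name__ == "__main__":
        set_power_cap(gpu=0, watts=250)          # the 250 W cap from the post
        draws = sample_power(gpu=0, seconds=60)  # sample while the benchmark runs
        print(f"mean draw: {sum(draws) / len(draws):.1f} W over {len(draws)} samples")

Pair the mean draw and the benchmark's tokens-per-second at the default limit and at 250 W, and the percent-power-saved versus percent-throughput-lost comparison falls out directly.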

Elsewhere

Hashnode · April 2026 · ~18 minute read

Same model, same GPU, 4× the context: a weekend of inference-stack dogfooding

Standing up vLLM nightly and llama.cpp on the same 3090 with the same Qwen3.6-27B model — and discovering that two inference engines on identical hardware give a 4× difference in usable context. Hybrid Mamba-attention architecture accounting, quantization comparison, and the prompt-cache mechanics behind the gap.
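
As a hedged sketch of the two launches being compared (run one at a time against the same 24 GB card): the model paths, ports, context lengths, and memory flags below are placeholders rather than the article's actual configuration, and the 4× usable-context gap is the article's measured result, not something these commands guarantee.

    # Sketch: the two server launches under comparison, started one at a time
    # on the same 24 GB GPU. Model paths, ports, and flag values are
    # placeholders, not the post's configuration.
    import subprocess

    MODEL_HF = "Qwen/some-model"            # placeholder HF repo (safetensors) for vLLM
    MODEL_GGUF = "/models/some-model.gguf"  # placeholder GGUF quant of the same model

    def launch_vllm(max_len: int = 32768) -> subprocess.Popen:
        # vLLM's OpenAI-compatible server; --max-model-len is the context the
        # engine reserves KV-cache memory for at startup.
        return subprocess.Popen([
            "vllm", "serve", MODEL_HF,
            "--max-model-len", str(max_len),
            "--gpu-memory-utilization", "0.90",
            "--port", "8000",
        ])

    def launch_llamacpp(ctx: int = 32768) -> subprocess.Popen:
        # llama.cpp's llama-server with all layers offloaded to the GPU (-ngl 99)
        # and a context window (-c) sized to whatever VRAM remains after weights.
        return subprocess.Popen([
            "llama-server",
            "-m", MODEL_GGUF,
            "-c", str(ctx),
            "-ngl", "99",
            "--port", "8080",
        ])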