llm-d in Action — EPP Prefix Cache Routing and What It Actually Means
The stack is deployed. Now let’s see what it actually does. EPP prefix cache routing, 81.1% KV cache hit rate, TTFT at 15ms p50, and what those numbers mean ...
I deployed llm-d on a Lambda Labs GH200. Nothing worked first try. Here is the honest account of what broke, why, and how to fix it — so you don’t spend your...
I treated an M4 Mac Mini as a production-like inference environment — wired up Prometheus, Grafana, a kind cluster with nginx, and ran real load tests. Here’...
An engineer’s annotated tour through what actually happens when you hit send: from bytes to tokens to embeddings to attention to the word your model finally...
Translating SQL to NoSQL: Architecture Deep-Dive
Part 2 of 2: Empirical evaluation of classification accuracy, routing performance, and cost attribution — with honest analysis of failure modes
Part 1 of 2: Design decisions, trade-offs, and lessons from building inference-sentinel