Field notes from the production trenches

Engineering, research, and process essays from the Overflow Labs team. Published when we've got something worth saying — usually monthly.

Featured

Engineering · Mar 12, 2026 · 8 min

Evals are the product

Most LLM systems fail in production not because the model is wrong, but because no one defined what 'right' looks like. Here's how we approach evaluation as a first-class deliverable.

Amit Singh


Research · 12 min

RAG isn't magic — it's information retrieval with extra steps

Why teams keep shipping disappointing RAG systems, and what 30 years of IR research can teach us about building ones that actually work.

Priya Menon · Feb 28, 2026

Strategy · 10 min

Buy vs. build vs. fine-tune in 2026

A decision framework for technical leaders evaluating AI infrastructure, with real cost models for the three most common architectures.

Marco Reyes · Feb 14, 2026

Engineering · 7 min

Agents without frameworks

Why the most reliable agentic systems we've shipped contain fewer than 200 lines of orchestration code, and what that says about the current framework landscape.

Sara Lindqvist · Jan 30, 2026

Research · 9 min

The quiet rise of small models

Frontier models get the headlines, but a 7B model fine-tuned on the right 50k examples is still the right answer for half the problems we see.

Daniel Cho · Jan 16, 2026

Process · 6 min

Why we ship on day 90

Every Overflow Labs engagement targets a production milestone in the first quarter. Here's the operating system that makes it possible.

Amit Singh · Jan 4, 2026