Field Note · June 27, 2026

What actually breaks retrieval on messy PDFs

A walk through the failure modes I kept hitting — and the chunking + eval changes that fixed them.

Category
RAG & Agents

Most retrieval failures I hit traced back to messy source data, not the model. Cleaning, chunking, and evaluating fixed more than swapping models ever did.
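To make "cleaning, chunking, and evaluating" concrete, here is a minimal Python sketch of what those three steps can look like. It is an illustration under stated assumptions, not the pipeline behind this note: the function names (`clean_pdf_text`, `chunk_paragraphs`, `recall_at_k`), the 500-character budget, and the specific cleanup rules are all hypothetical choices.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Undo some common PDF extraction damage (heuristic, not exhaustive)."""
    # Rejoin words hyphenated across a line break. This is a heuristic:
    # it will also merge genuine hyphens that happen to end a line.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop stray spaces on either side of a newline.
    text = re.sub(r" ?\n ?", "\n", text)
    # Cap consecutive blank lines so paragraphs are separated by exactly one.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole paragraphs into chunks of roughly max_chars.

    Paragraphs are never split; one longer than max_chars becomes
    its own oversized chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = " ".join(para.split())  # flatten line breaks inside a paragraph
        if not para:
            continue
        if current and len(current) + 1 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current} {para}".strip()
    if current:
        chunks.append(current)
    return chunks


def recall_at_k(retrieved: list[list[int]], relevant: list[int], k: int) -> float:
    """Fraction of queries whose gold chunk id appears in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for ids, gold in zip(retrieved, relevant) if gold in ids[:k])
    return hits / len(relevant)
```

The point of the last function is that a change to cleaning or chunking can be scored against a small set of query and gold-chunk pairs before anyone reaches for a different model.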

What actually breaks retrieval on messy PDFs · Alex