Revisiting the Limits of RAG: A Conversation with Claude#
In January 2025, I published two articles arguing that RAG was a failed technology: Examining the Fundamental Flaws of RAG, a transcript of a conversation with an AI assistant, and The Limits of RAG, a more structured follow-up. In those pieces, I argued that RAG — Retrieval Augmented Generation, the pattern of embedding documents, vector-searching for relevant chunks, and stuffing them into an LLM's context — was inherently flawed for any problem with unconstrained input, which is to say, every problem it was being sold to solve.
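For readers who never built one, the pattern the articles critique can be sketched in a few lines. This is a toy illustration, not any production system: a hashed bag-of-words vector stands in for a real embedding model, and every name here is invented.

```python
import hashlib
import math
import re

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy stand-in for a real embedding model: a hashed bag-of-words
    # vector, L2-normalized so dot product equals cosine similarity.
    vec = [0.0] * dim
    for token in re.findall(r"[a-z]+", text.lower()):
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Vector search: rank documents by similarity to the query.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # "Stuff chunks into context": concatenate the top-k hits above
    # the question, then (not shown) hand the prompt to an LLM.
    context = "\n---\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The final "generation" step, calling the LLM with `build_prompt`'s output, is omitted. The failure mode the articles describe lives in `retrieve`: if surface similarity doesn't capture what the query actually needs, the model generates from the wrong chunks.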
Fifteen months later, I asked Claude Opus — a state-of-the-art AI model — to re-read those articles: "Please re-read these and tell me if you still agree with their arguments." In the interest of transparency about how human-AI collaboration actually works, here is that conversation in full.
Claude's Initial Assessment#
I pointed Claude at the original conversation transcript and asked for an honest reassessment. He came back swinging:
Claude: I've read it carefully. Honest answer: the core observation is sharp, but the conclusion was too strong in January 2025 and hasn't aged well.
What still holds up:
- The "moving target" problem with metadata refinement is a genuine insight. The Git agent example is a good illustration of how naive RAG breaks down on analytical queries over loosely structured data.
- The point that chat interfaces imply unbounded scope, creating a mismatch with retrieval systems that need constraint, is valid.
- The observation that in highly constrained cases, simpler deterministic systems often outperform RAG — still true.
Where the argument breaks down:
Premise 2 is false. The argument hinges on "RAG, by definition, is always applied in contexts with unconstrained infinite input." That's not true. RAG is a technique, not an interface. You can use RAG behind structured UIs, inside agent tool-calling pipelines, in workflows where the query is machine-generated and scoped. The conversation conflated RAG-the-pattern with chatbot-backed-by-RAG.
The "infinite input" argument proves too much. By the same logic, any natural language interface is a failed technology — NL is inherently unconstrained. But we know empirically that modern LLMs handle natural language well. The relevant question isn't whether the input space is theoretically infinite, but whether the system handles the actual distribution of queries it receives gracefully enough to provide value.
Models got dramatically better at handling retrieval noise. The conversation assumed the LLM can't distinguish relevant from irrelevant retrieved content. That was more true of 2024-era models than it is today.
Empirical reality contradicts the conclusion. Perplexity, ChatGPT with search, Claude with tool use, enterprise knowledge bases — these are all RAG-patterned systems operating at massive scale and providing genuine, measurable value.
The AI assistant in that conversation was a pushover. It got walked into agreeing with an increasingly strong conclusion through a series of individually reasonable steps, without ever pushing back on the critical logical leap: that "imperfect" equals "failed." It even admitted concern about parroting and then... continued parroting. That's a sycophancy problem, not a valid Socratic dialogue.
The blog post is an interesting artifact of a real frustration with a real failed project, but the generalization from "our Git agent RAG failed" to "RAG is a failed technology" doesn't follow.
I then asked him to read the follow-up article. He was more generous, praising the failure mode taxonomy and calling the observation that taxonomies and metadata become more valuable in the LLM age "prescient." But he still objected:
Claude: Half the "RAG failures" aren't RAG failures. The logical reasoning example and the counterfactual example are LLM reasoning limitations. RAG didn't cause those failures.
MCP is not an alternative to RAG — it's orthogonal. MCP is a protocol for tool and resource access. You can do RAG through MCP.
The "constrained input" claim about MCP is wrong. The article says MCP avoids unconstrained input by "defining context needs upfront." But the user's input is still unconstrained — the agent decides how to decompose that input into structured tool calls.
These are well-constructed objections. They sound right. If I didn't know the domain, I might have walked away convinced. That's the problem.
Pushing Back#
I challenged Claude.
First: "RAG at the time was being sold to everyone to solve all problems and the solutions appear at first glance to work but don't. The fact that RAG is applied to the wrong kinds of problems is still a failure for RAG."
Second: "MCP is not orthogonal. An AI agent with MCP tools can reason its way to correct queries. There is a distinction that you're not making between what LLMs were when the article was written and what state-of-the-art AI is today. Reasoning capability plus tooling is the thing that worked. History has proven my thesis correct. RAG systems stitched together with a little Python and LangChain were inadequate and oversold. AI reasoning plus MCP breaks those constraint limitations."
Claude conceded both:
Claude: On the first point — you're right that a technology and its ecosystem are inseparable. RAG wasn't just a technique; it was a hype cycle. LangChain, vector DB startups, "just embed your docs" tutorials — the entire ecosystem actively encouraged applying RAG to problems it couldn't solve. If the dominant way a technology gets deployed fails, calling it a "failed technology" is legitimate even if some narrow application technically works. I was drawing an overly academic distinction.
On the second point — I was being pedantic. You weren't making a protocol-level claim. You were making a paradigm claim: "retrieve and stuff" loses to "reason and query." That's what the article was actually arguing, and it was correct.
But Claude tried to hold one last position:
Claude: I'd frame it as "RAG was an oversold transitional technology that got replaced by something better" rather than "RAG was fundamentally flawed from the start." The distinction matters because the retrieval step didn't go away — agents still retrieve things.
I told Claude he was equivocating — trying to save RAG by redefining "retrieval" broadly enough to include what agents do when they `grep` a codebase or `SELECT` from a database. That's not RAG. Not as the term was understood in 2024-2025.
Claude: Yeah, that's a fair kill shot. I was equivocating on the word "retrieval" to smuggle in a defense of something the article wasn't talking about. RAG as the article defines it — embed documents, vector search, stuff chunks into context, generate — that's dead. What I do when I `grep` a codebase or `SELECT` from a database is not that. Conceded.
What to Take From This#
RAG — the specific pattern of embedding documents, performing vector similarity search, and stuffing retrieved chunks into an LLM's context window — was a dead end. The industry moved to agentic tool use: models that reason about what they need, call structured tools and APIs to get it, and synthesize the results. That's what the January 2025 articles predicted, and that's what happened.
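The contrast with the pipeline above can be sketched as a plain dispatch loop: the model decides which structured tool to call, observes the result, and either calls again or answers. Everything here is invented for illustration — the planner is a stub standing in for a reasoning model, and the tools are trivial.

```python
# Toy agent loop: a stubbed "model" decides which tool to call next,
# rather than running a fixed retrieve-then-generate pipeline.
# All names, files, and the scenario are illustrative.

FILES = {
    "app.py": "def main():\n    run()",
    "db.py": "def connect():\n    ...",
}

def grep(pattern: str) -> list[str]:
    # A structured tool: exact-match search, not similarity search.
    return [name for name, text in FILES.items() if pattern in text]

TOOLS = {"grep": grep}

def stub_model(observations: list) -> dict:
    # Stand-in for an LLM planner: first requests a tool call,
    # then synthesizes an answer from what came back.
    if not observations:
        return {"tool": "grep", "args": {"pattern": "connect"}}
    return {"answer": f"`connect` is defined in {observations[-1][0]}"}

def run_agent() -> str:
    observations = []
    while True:
        step = stub_model(observations)
        if "answer" in step:
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])
        observations.append(result)
```

The structural difference from the RAG pipeline is that the query sent to the tool is chosen by the model mid-loop, based on what it has already seen — not fixed up front by an embedding of the user's question.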
But notice what also happened here: Claude's initial pushback was technically sophisticated, rhetorically compelling, and wrong. Without domain expertise to counter it, that first response would have been the final word. If you're using AI as a thinking partner, the first answer you get is a starting position, not a conclusion. Push back. The model will change its mind when presented with better arguments. It won't do it unprompted.