Codebase RAG over MCP for Codex and Claude Code

Codebase RAG over MCP is a context layer for coding agents. Instead of asking Codex or Claude Code to read the whole repository, you expose small tools: search a symbol, find an API contract, list relevant tests, retrieve an architecture decision, and return only the excerpt needed for the next decision.

In 2026, GitHub said Copilot code review had grown 10x since its initial launch and now accounts for more than one in five code reviews on GitHub (GitHub, "60 million Copilot code reviews and counting", 2026). That volume makes the pain clear: agents need trustworthy context, not ever larger prompts.

Practical TL;DR

Codebase RAG over MCP should answer small questions about code.

A good tool returns path, excerpt, reason, and confidence boundary.

Lexical search, symbols, and embeddings complement each other.

Security comes from scope, audit logs, and short outputs.

Why did codebase RAG over MCP become urgent?

In 2025, Google Cloud's DORA report measured 90% AI adoption among software professionals (Google, "How are developers using AI? Inside our 2025 DORA report", 2025). Codebase RAG over MCP became urgent because adoption grew faster than teams' ability to organize context for agents.

The problem is not only context-window size. A larger window accepts more noise. In real codebases, the agent must tell current rules apart from dead code, broken tests, local conventions, and old decisions. Dropping everything into the prompt turns discovery into a lottery.

This post extends my article on context engineering for coding agents. There, the focus was context budget. Here, the focus is the interface: how the agent asks for context, how the tool responds, and how the team audits what was used.

A well-designed MCP server works like an internal API for engineering knowledge. It does not grant "repo access". It gives narrow answers with boundaries. When the agent asks for a symbol, it gets candidate files. When it asks for tests, it gets names and commands. When it asks for risk, it gets sensitive areas and reasons.

The practical gain shows up in review. If the agent opens a PR with evidence of which excerpts it retrieved, the reviewer can separate implementation error from context error. That changes the discussion: the team stops asking "why did the AI invent this?" and starts asking "which tool returned weak context?".

What should an MCP server expose?

In 2025, the Model Context Protocol specification defined three server surfaces: resources, prompts, and tools (Model Context Protocol, "Specification 2025-11-25", 2025). For codebase RAG, tools should run controlled queries, resources should load stable context, and prompts should standardize repeated workflows.

Start with read tools. search_symbol, search_reference, find_tests, read_api_contract, and retrieve_decision cover most of the work. Each tool should have typed arguments, result limits, and a response with path, short excerpt, selection reason, and a note on when the search may be incomplete.
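
As a rough sketch of that contract in TypeScript: the type names, the MAX_RESULTS cap, and the normalizeArgs and buildResponse helpers below are illustrative assumptions, not part of the MCP specification or any SDK. The point is only that arguments are validated, results are capped, and the response says when something was left out.

```typescript
// Hypothetical argument and response types for a search_symbol tool.
// None of these names come from the MCP specification; they only illustrate the contract.
interface SearchSymbolArgs {
  symbol: string;
  maxResults?: number;
}

interface ContextResult {
  path: string;
  excerpt: string;
  reason: string;
}

interface ToolResponse {
  query: string;
  results: ContextResult[];
  // Explains when the search may be incomplete; null when nothing was cut.
  incompleteNote: string | null;
}

const MAX_RESULTS = 5; // assumed hard cap for this sketch

// Validate untyped input from the agent and clamp the requested result count.
function normalizeArgs(raw: unknown): SearchSymbolArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("arguments must be an object");
  }
  const { symbol, maxResults } = raw as Record<string, unknown>;
  if (typeof symbol !== "string" || symbol.trim() === "") {
    throw new Error("symbol must be a non-empty string");
  }
  const requested =
    typeof maxResults === "number" && Math.floor(maxResults) === maxResults
      ? maxResults
      : MAX_RESULTS;
  return {
    symbol: symbol.trim(),
    maxResults: Math.min(Math.max(requested, 1), MAX_RESULTS),
  };
}

// Cap the candidates and state how many were dropped.
function buildResponse(args: SearchSymbolArgs, candidates: ContextResult[]): ToolResponse {
  const limit = args.maxResults ?? MAX_RESULTS;
  const results = candidates.slice(0, limit);
  const dropped = candidates.length - results.length;
  return {
    query: args.symbol,
    results,
    incompleteNote:
      dropped > 0
        ? `${dropped} more candidates omitted; narrow the query or use search_reference`
        : null,
  };
}
```

Clamping on the server side means an agent that asks for 50 results still gets a short answer, plus a note that tells it how to expand.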

Then expose stable resources. Good candidates include module maps, test conventions, API contracts, error patterns, runbooks, and architecture decisions. This pairs well with a TypeScript service architecture, because clear boundaries make retrieval more precise.

MCP surface	Agent use	Good output signal
Tool	Queries symbols, tests, and references.	Returns a few candidates with reasons.
Resource	Loads stable conventions and decisions.	States scope and update date.
Prompt	Standardizes triage, review, or migration.	Produces a PR-ready checklist.

Use MCP prompts for workflows the team repeats. A bug triage prompt can ask for reproduction, likely files, and tests. A PR review prompt can ask for scope, risk, and evidence. A migration prompt can ask for dependents, rollback, and data impact.

Avoid the "read everything" tool. It looks useful, but it breaks the contract. If a tool must return many files, it should write an artifact to disk and respond with a synthesis. For long loops that must cross sessions without carrying the whole history in the main prompt, I use my own tool, RemoteCode, which extends Claude Code and Codex with less context waste.

How do lexical search, symbols, and embeddings work together?

In 2026, GitHub said Copilot code review's agentic architecture retrieves repository context, reasons across changes, and drove an initial 8.1% increase in positive feedback (GitHub, "60 million Copilot code reviews and counting", 2026). The lesson is practical: retrieval must be planned, not incidental.

Lexical search is the first filter. It finds exact names, error messages, routes, environment variables, and commands. Use rg or an equivalent index before embeddings. If the error mentions SessionExpiredError, text search is usually more reliable than a semantic neighborhood.
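
To make the idea concrete, here is a minimal in-memory sketch of an exact-match filter. It is a stand-in, not a replacement for rg or a real index: the SourceFile and LexicalHit types and the lexicalSearch function are hypothetical names, and a real server would read files from disk or shell out to a search tool.

```typescript
// Hypothetical in-memory lexical search. A real server would more likely
// call rg or query a prebuilt index; this only shows the shape of the result.
interface SourceFile {
  path: string;
  content: string;
}

interface LexicalHit {
  path: string;
  line: number; // 1-based line number
  excerpt: string;
}

function lexicalSearch(files: SourceFile[], term: string, maxHits: number = 5): LexicalHit[] {
  const hits: LexicalHit[] = [];
  if (term === "") return hits;
  for (const file of files) {
    const lines = file.content.split("\n");
    for (let i = 0; i < lines.length; i++) {
      const text = lines[i] ?? "";
      // Exact, case-sensitive substring match: good for error names and env vars.
      if (text.indexOf(term) !== -1) {
        hits.push({ path: file.path, line: i + 1, excerpt: text.trim().slice(0, 160) });
        if (hits.length >= maxHits) return hits; // stop early to keep output short
      }
    }
  }
  return hits;
}
```

A call such as lexicalSearch(files, "SessionExpiredError") returns at most five hits, each with a path, a line number, and a trimmed excerpt.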

Symbols add structure. A definitions-and-references index understands functions, classes, exports, routes, and types. That keeps the agent from confusing a similar string with a real contract. In TypeScript, for example, a type-reference search can separate production uses, fixtures, and tests.
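
The grouping step could look like the sketch below. It assumes the references already come from a real symbol index, such as a language server, and only classifies them by path. The path conventions (.test.ts, .spec.ts, __tests__, fixtures) and the function names are assumptions about a typical TypeScript repository, not a rule.

```typescript
type UsageKind = "production" | "fixture" | "test";

interface SymbolReference {
  path: string;
  line: number;
}

// Assumed repository conventions; adjust the patterns to the codebase at hand.
function classifyUsage(path: string): UsageKind {
  if (/(^|\/)(__fixtures__|fixtures)\//.test(path)) return "fixture";
  if (/\.(test|spec)\.tsx?$/.test(path) || /(^|\/)__tests__\//.test(path)) return "test";
  return "production";
}

// Group references so the tool can answer "where is this used in production?"
// without mixing in fixtures and tests.
function groupReferences(refs: SymbolReference[]): Record<UsageKind, SymbolReference[]> {
  const groups: Record<UsageKind, SymbolReference[]> = {
    production: [],
    fixture: [],
    test: [],
  };
  for (const ref of refs) {
    groups[classifyUsage(ref.path)].push(ref);
  }
  return groups;
}
```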

Embeddings help when the question is conceptual. "Where do we handle session expiration?" may not use the same words in every file. Semantic search finds candidates. Reranking compares those candidates against symbols and tests. The MCP server then returns a few results, not a pile of files.
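
One way to picture the reranking step is a small scoring function. This sketch assumes the embedding search has already produced a similarity score between 0 and 1, and that a symbol index and a test finder have annotated each candidate. The boost weights and all names are illustrative assumptions, not tuned values.

```typescript
interface SemanticCandidate {
  path: string;
  similarity: number; // assumed 0..1 score from an embedding search
  definesQueriedSymbol: boolean; // assumed signal from the symbol index
  hasRelatedTest: boolean; // assumed signal from a test finder
}

// Illustrative weights; a real system would tune these against reviewed PRs.
const SYMBOL_BOOST = 0.3;
const TEST_BOOST = 0.1;

function rerank(candidates: SemanticCandidate[], topK: number = 3): SemanticCandidate[] {
  const score = (c: SemanticCandidate): number =>
    c.similarity +
    (c.definesQueriedSymbol ? SYMBOL_BOOST : 0) +
    (c.hasRelatedTest ? TEST_BOOST : 0);
  // Copy before sorting so the caller's array is left untouched.
  return candidates
    .slice()
    .sort((a, b) => score(b) - score(a))
    .slice(0, topK);
}
```

A file with a modest similarity score that defines the symbol and has a related test can outrank a file that merely sounds similar, which is the behavior the paragraph above describes.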

A good response has a predictable shape:

{
  "query": "session expiration",
  "results": [
    {
      "file": "src/auth/session.service.ts",
      "excerpt": "function that renews a session before issuing a new token",
      "reason": "defines the central renewal behavior",
      "next_steps": ["run session test", "check revocation"]
    }
  ],
  "limit": "result summarized by relevance; use search_reference to expand"
}

This contract helps subagents as well. A security subagent can query risk. Another can query tests. The main agent receives synthesis, not logs. That split fits the coding-agent harness for reliable pull requests, because each retrieval becomes reviewable evidence.

How do you keep MCP from becoming a security gap?

In 2026, GitHub added secret scanning to the GitHub MCP Server to detect secrets before commit or PR in MCP-compatible IDEs and agents (GitHub Changelog, "Secret scanning in AI coding agents via the GitHub MCP Server", 2026). That is the right framing: MCP should reduce operational risk, not expand permission without control.

Treat each tool as a permission surface. A tool that reads code differs from one that opens a PR. A tool that queries logs differs from one that queries a database. A tool that lists contracts differs from one that runs a migration. The agent should not receive all of them because it "might need them".

Use workspace scope. If Codex uses .codex/config.toml in a trusted project, that codebase MCP server should resolve paths inside the project. If Claude Code runs a stdio server, use the project-directory environment variable to avoid ambiguous paths. The rule is simple: a tool should not cross into another repository by accident.
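
A minimal sketch of that rule, written as a pure string function so it stays self-contained. It assumes POSIX-style paths and a workspace root that is not the filesystem root. The resolveInsideWorkspace name is hypothetical, and a real server would more likely use its runtime's path utilities and also resolve symlinks.

```typescript
// Resolve a requested path against the workspace root and refuse anything
// that would land outside it. POSIX-style separators only (assumption).
function resolveInsideWorkspace(workspaceRoot: string, requested: string): string {
  if (requested.charAt(0) === "/") {
    throw new Error("absolute paths are not allowed");
  }
  const segments: string[] = [];
  for (const part of requested.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      // Going up from the root would leave the workspace.
      if (segments.length === 0) throw new Error("path escapes the workspace");
      segments.pop();
    } else {
      segments.push(part);
    }
  }
  const root = workspaceRoot.replace(/\/+$/, "");
  return segments.length === 0 ? root : `${root}/${segments.join("/")}`;
}
```

With a root of /repo, the request src/../src/auth/session.service.ts resolves to /repo/src/auth/session.service.ts, while ../other-repo/secrets.ts is rejected.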

Audit calls. Record tool, arguments, returned files, and output size. Do not record secrets. If an agent PR fails, that log shows whether the error came from the tool, prompt, stale index, or model decision. Without a log, the team only debates symptoms.
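
An audit entry can stay small. The sketch below assumes a simple key-name heuristic for redaction and records only the output size, never the output itself. The AuditRecord shape, the SENSITIVE_KEY pattern, and the buildAuditRecord helper are hypothetical.

```typescript
interface AuditRecord {
  tool: string;
  args: Record<string, string>;
  returnedFiles: string[];
  outputChars: number; // size only; the output text is not stored
  timestamp: string;
}

// Assumed heuristic: redact values whose key looks sensitive.
// This does not detect secrets inside values.
const SENSITIVE_KEY = /(token|secret|password|api[_-]?key)/i;

function buildAuditRecord(
  tool: string,
  args: Record<string, unknown>,
  returnedFiles: string[],
  output: string,
  now: Date = new Date()
): AuditRecord {
  const safeArgs: Record<string, string> = {};
  for (const key of Object.keys(args)) {
    safeArgs[key] = SENSITIVE_KEY.test(key) ? "[redacted]" : String(args[key]);
  }
  return {
    tool,
    args: safeArgs,
    returnedFiles: returnedFiles.slice(),
    outputChars: output.length,
    timestamp: now.toISOString(),
  };
}
```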

In my practice, that log is most useful when a human can review it quickly. I want to see the query that led to the file, not every token returned by the index. If the agent chose the wrong test, the log shows whether it received weak candidates or ignored a good one.

Limit output as well. Claude Code documentation says it warns when an MCP tool's output exceeds 10,000 tokens and applies a default output limit of 25,000 tokens (Claude Code Docs, "Connect Claude Code to tools via MCP", 2026). Even when a tool can return more, good codebase RAG keeps answers short by default.
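
A server-side cap can be as simple as the sketch below. It assumes roughly four characters per token, which is a crude estimate and not a real tokenizer, and a default budget of 2,000 tokens chosen only for illustration, well under the client thresholds mentioned above.

```typescript
const CHARS_PER_TOKEN = 4; // rough assumption, not a real tokenizer
const DEFAULT_TOKEN_BUDGET = 2000; // illustrative default, far below client limits

interface CappedOutput {
  text: string;
  truncated: boolean;
}

// Trim the output to the budget and say so, instead of silently cutting it.
// Assumes the budget leaves room for the notice itself.
function capOutput(text: string, tokenBudget: number = DEFAULT_TOKEN_BUDGET): CappedOutput {
  const maxChars = tokenBudget * CHARS_PER_TOKEN;
  if (text.length <= maxChars) {
    return { text, truncated: false };
  }
  const notice = "\n[truncated: narrow the query to see more]";
  const keep = Math.max(maxChars - notice.length, 0);
  return { text: text.slice(0, keep) + notice, truncated: true };
}
```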

How do you connect this to Codex and Claude Code?

In 2026, Codex documentation says MCP lives in config.toml and can be scoped to a project with .codex/config.toml in trusted projects (OpenAI Developers, "Model Context Protocol - Codex", 2026). That configuration lets the CLI and IDE extension share the same servers.

In Codex, treat the codebase RAG server as a repository dependency. Its name, command, permissions, and scope should live near the codebase, not in a personal setting that gets lost. That way, an agent working in a monorepo uses different tools than an agent working in a small service.

In Claude Code, MCP connects tools, databases, and APIs to the agent's workflow. The documentation itself recommends verifying trust before connecting servers, because servers that fetch external content can expose prompt injection risk (Claude Code Docs, "Connect Claude Code to tools via MCP", 2026). For codebase RAG, prefer local sources and a controlled index.

A minimal setup can look like this:

[mcp_servers.codebase_context]
command = "node"
args = ["tools/mcp-codebase-context/server.js"]
env = { WORKSPACE_ROOT = "." }

The server should expose only a few tools at first. Do not implement vector storage, dependency graph, and PR analysis on the same day. Publish search_symbol, search_reference, and find_tests. Then measure which questions the agent still asks manually.

This integration also needs to appear in the PR. Ask the agent to record "context consulted" in the body: tool, returned files, and chosen test. That connects MCP to the article on PR evals that keep code agents honest in CI. Without that trace, retrieval improves the session, but not the review.
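
The "context consulted" section can be generated from the same data the audit log holds. The sketch below is one possible rendering; the ConsultedEntry shape and the renderContextConsulted function are hypothetical, and the plain-text layout is only a suggestion.

```typescript
interface ConsultedEntry {
  tool: string;
  query: string;
  returnedFiles: string[];
}

// Render a short, reviewable section for the PR body.
function renderContextConsulted(entries: ConsultedEntry[], chosenTest: string | null): string {
  const lines: string[] = ["Context consulted"];
  for (const entry of entries) {
    const files =
      entry.returnedFiles.length > 0 ? entry.returnedFiles.join(", ") : "no files returned";
    lines.push(`- ${entry.tool}("${entry.query}"): ${files}`);
  }
  lines.push(`- chosen test: ${chosenTest ?? "none"}`);
  return lines.join("\n");
}
```

Listing a tool call that returned nothing is deliberate: an empty retrieval is evidence the reviewer needs just as much as a hit.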

What is the minimum viable version for this week?

In 2025, Stack Overflow's survey showed that 69% of agent users saw productivity gains, but only 17% saw improved team collaboration (Stack Overflow, "2025 Developer Survey: AI", 2025). The minimum viable version should attack that gap: turn individual gains into shared, auditable context.

First, choose one module with real pain. Authentication, billing, queues, or external integrations work well because they have contracts, tests, and risk. Do not start with the whole monorepo. A small scope reveals the right design without creating an expensive index nobody uses.

Second, create three tools. One searches symbols. Another finds related tests. The third retrieves architecture decisions from short files, such as ADRs or design notes. If no decisions exist, write down the three most important ones before automating.

Third, define a short response. Each result needs file, excerpt, reason, and next step. If the tool does not know, it should say so. A tool that fakes certainty teaches the agent to trust the wrong context.
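
One way to make "I do not know" explicit is to give it its own response variant rather than an empty list. The sketch below uses a discriminated union; the RetrievalOutcome type, its field names, and the suggestion text are assumptions for illustration.

```typescript
interface FoundResult {
  file: string;
  excerpt: string;
  reason: string;
  nextStep: string;
}

// Two explicit outcomes: the agent can never mistake "nothing found" for an answer.
type RetrievalOutcome =
  | { status: "found"; results: FoundResult[] }
  | { status: "unknown"; searched: string[]; suggestion: string };

function toOutcome(results: FoundResult[], searched: string[]): RetrievalOutcome {
  if (results.length === 0) {
    return {
      status: "unknown",
      searched: searched.slice(), // the scopes that were actually searched
      suggestion: "no match in the indexed scope; try an exact symbol name or ask a human",
    };
  }
  return { status: "found", results };
}
```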

Fourth, run a small PR with a trace. The agent should use the tools, apply a patch, run a test, and state what it consulted. If the reviewer can understand the context chain in under one minute, the MVP works. If raw logs are still required, the output is too large.

FAQ about codebase RAG over MCP

In 2026, GitHub reported that more than 12,000 organizations run Copilot code review automatically on every PR (GitHub, "60 million Copilot code reviews and counting", 2026). The questions below help turn context retrieval into a platform practice.

Does MCP replace traditional codebase RAG?

No. In 2025, the MCP specification separated resources, prompts, and tools (Model Context Protocol, "Specification 2025-11-25", 2025). RAG is the retrieval technique; MCP is the interface an agent uses to request that retrieval inside a workflow.

Do I need embeddings on day one?

No. In 2026, GitHub tied part of the agentic review improvement to intelligent context retrieval, not to one technique (GitHub, "60 million Copilot code reviews and counting", 2026). Start with lexical search and symbols. Add embeddings when conceptual questions escape them.

Does MCP authorize the agent to do anything?

It should not. In 2026, Anthropic's remote MCP server documentation recommends connecting only trusted servers and reviewing security practices and terms (Claude Platform Docs, "Remote MCP servers", 2026). In codebase RAG, each tool needs scope, limits, and logs.

How do I measure whether the context layer improved?

Measure review rework. In 2025, Stack Overflow's survey found that 69% of agent users saw productivity gains, but only 17% saw improved team collaboration (Stack Overflow, "2025 Developer Survey: AI", 2025). The layer improved when PRs arrive with fewer out-of-scope files and clearer evidence.

Closing

In 2025, Anthropic described MCP as a standard for connecting agents to external systems and reducing duplicate integrations (Anthropic, "Code execution with MCP: building more efficient AI agents", 2025). For software development, the most useful application is simple: give the agent a narrow, auditable way to ask about the codebase.

Do not start with big infrastructure. Start with a small interface. Three tools, short responses, logs, and a traced PR already change work quality. Once retrieval is visible, the team can improve the agent, the index, and the system architecture itself.

Sources Consulted

Google, "How are developers using AI? Inside our 2025 DORA report", retrieved 2026-07-02, https://blog.google/innovation-and-ai/technology/developers-tools/dora-report-2025/
GitHub, "60 million Copilot code reviews and counting", retrieved 2026-07-02, https://github.blog/ai-and-ml/github-copilot/60-million-copilot-code-reviews-and-counting/
Model Context Protocol, "Specification 2025-11-25", retrieved 2026-07-02, https://modelcontextprotocol.io/specification/2025-11-25
GitHub Changelog, "Secret scanning in AI coding agents via the GitHub MCP Server", retrieved 2026-07-02, https://github.blog/changelog/2026-03-17-secret-scanning-in-ai-coding-agents-via-the-github-mcp-server/
Claude Code Docs, "Connect Claude Code to tools via MCP", retrieved 2026-07-02, https://code.claude.com/docs/en/mcp
OpenAI Developers, "Model Context Protocol - Codex", retrieved 2026-07-02, https://developers.openai.com/codex/mcp
Stack Overflow, "2025 Developer Survey: AI", retrieved 2026-07-02, https://survey.stackoverflow.co/2025/ai
Claude Platform Docs, "Remote MCP servers", retrieved 2026-07-02, https://platform.claude.com/docs/en/agents-and-tools/remote-mcp-servers
Anthropic, "Code execution with MCP: building more efficient AI agents", retrieved 2026-07-02, https://www.anthropic.com/engineering/code-execution-with-mcp
