Two Philosophies of LLM Context: Recursive Decomposition vs. Selective Retrieval
How different approaches to the “context problem” reveal fundamental trade-offs in AI system design
The context window limitation of large language models has spawned an entire subfield of research. How do you get an LLM to reason about information that doesn’t fit in its working memory? A recent paper on Recursive Language Models (RLMs) proposes an elegant solution: let the model recursively examine and decompose long inputs. Meanwhile, in the trenches of building production AI coding agents, we’ve taken a completely different path with “knowledge packs”—curated, graph-structured knowledge delivered at exactly the right moment.
Both approaches work. Both have trade-offs. And the comparison reveals something interesting about where AI systems are headed.
The Problem, Stated Two Ways
The RLM paper frames the challenge as: How do we process prompts that exceed the model’s context window?
In building multi-agent coding systems, we frame it differently: How do we ensure the model has the right context, not just more context?
These sound similar but lead to fundamentally different solutions.
Recursive Language Models: Treating Context as Environment
The RLM approach is beautifully general. Instead of cramming everything into the prompt, the model treats the long input as an “external environment” that it can programmatically explore. The model breaks down the input into chunks, recursively calls itself on those chunks, and synthesizes results.
The results are impressive: models handle inputs 100x longer than their native context windows, with comparable or lower computational costs. It’s inference-time scaling—no fine-tuning, no architectural changes, just a clever algorithm.
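To make the shape of the idea concrete, here is a minimal sketch of a recursive decomposition loop. It is a simplified map-reduce-style rendering, not the paper's exact algorithm (which lets the model itself decide how to explore the input as an environment); `llm`, the chunk size, and the prompts are all illustrative placeholders.

```python
from typing import Callable

# Illustrative sketch only: fixed-size chunking plus recursive synthesis.
# `llm` is any prompt -> answer call you supply.

CHUNK_CHARS = 8_000  # keep each recursive call well inside the context window

def recursive_answer(llm: Callable[[str], str], question: str, text: str) -> str:
    # Base case: the text fits in one call, so answer directly.
    if len(text) <= CHUNK_CHARS:
        return llm(f"Context:\n{text}\n\nQuestion: {question}")

    # Recursive case: split the input, answer each chunk, then synthesize.
    chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]
    partials = [recursive_answer(llm, question, chunk) for chunk in chunks]
    combined = "\n\n".join(partials)
    return recursive_answer(
        llm,
        f"Combine these partial answers into one answer to: {question}",
        combined,
    )
```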
Here’s what makes this approach powerful:
- Generality: Works on any task where decomposition makes sense—document analysis, long-form QA, cross-document reasoning
- Self-contained: No external infrastructure required beyond the model itself
- Scalable: Costs scale with the useful information extracted, not the raw input size
But there’s an implicit assumption: the information needed is in that long input. The model’s job is to find it.
Knowledge Packs: Curated Context for Domain-Specific Agents
Our approach with Maestro’s knowledge pack system inverts this assumption. Instead of giving the model tools to explore a massive context, we ask: what if we could deliver exactly the context it needs, precisely when it needs it?
The system works like this:
- A knowledge graph captures architectural patterns, design decisions, and coding conventions in a structured, queryable format (stored as a DOT graph in .maestro/knowledge.dot)
- When an agent starts a task, the system extracts key terms from the task description
- Full-text search finds relevant nodes in the knowledge graph
- Graph traversal pulls in neighboring nodes (one hop) to provide relational context
- The resulting “knowledge pack” is injected into the agent’s planning prompt
No recursion. No decomposition. Just surgical retrieval of institutional knowledge.
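For concreteness, a fragment of such a graph might look like the following. The node names echo the example below, but the attributes and edge labels are illustrative assumptions, not Maestro's actual schema:

```dot
digraph knowledge {
  // Nodes carry a kind (rule or pattern) and a short statement of the convention.
  "rest-api"       [kind="rule",    text="All REST APIs follow OpenAPI 3.0"];
  "jwt-tokens"     [kind="pattern", text="Token format and validation"];
  "error-handling" [kind="pattern", text="Wrap errors with context"];

  // Edges record relationships used for one-hop traversal.
  "rest-api" -> "jwt-tokens"     [label="secured-by"];
  "rest-api" -> "error-handling" [label="uses"];
}
```

Retrieval over that graph then follows this flow: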
Story: "Add OAuth2 authentication to the API"
↓
Extract terms: ["OAuth2", "authentication", "API"]
↓
FTS5 search → Find matching nodes
↓
Graph traversal → Include neighbors
↓
Knowledge pack delivered:
- rest-api (rule): "All REST APIs follow OpenAPI 3.0"
- security-headers (pattern): "Required security headers"
- jwt-tokens (pattern): "Token format and validation"
- error-handling (pattern): "Wrap errors with context"
The agent doesn’t need to explore a 100,000-line codebase. It receives a curated 20-30 node subgraph containing the patterns and rules that actually matter for this specific task.
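A rough sketch of that retrieval step is below, assuming an SQLite FTS5 virtual table `nodes(id, kind, text)` and a plain `edges(src, dst)` table. These table and column names are assumptions for illustration; Maestro's real schema may differ, but the FTS-then-one-hop shape is the point.

```python
import re
import sqlite3

def build_knowledge_pack(db: sqlite3.Connection, story: str) -> list[tuple]:
    # 1. Extract candidate terms from the task description
    #    (a real implementation would also drop stopwords).
    terms = [t for t in re.findall(r"[A-Za-z0-9]+", story) if len(t) > 2]
    if not terms:
        return []

    # 2. Full-text search for directly matching nodes.
    rows = db.execute(
        "SELECT id, kind, text FROM nodes WHERE nodes MATCH ?",
        (" OR ".join(terms),),
    ).fetchall()
    hit_ids = {r[0] for r in rows}

    # 3. One-hop graph traversal: pull in immediate neighbours of each hit.
    neighbours = set()
    for node_id in hit_ids:
        for (other,) in db.execute(
            "SELECT dst FROM edges WHERE src = ? "
            "UNION SELECT src FROM edges WHERE dst = ?",
            (node_id, node_id),
        ):
            neighbours.add(other)

    # 4. The pack is the matched nodes plus their neighbours.
    all_ids = hit_ids | neighbours
    if not all_ids:
        return []
    placeholders = ",".join("?" * len(all_ids))
    return db.execute(
        f"SELECT id, kind, text FROM nodes WHERE id IN ({placeholders})",
        tuple(all_ids),
    ).fetchall()
```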
Different Problems, Different Solutions
RLMs solve the information discovery problem: the answer is somewhere in this haystack, help me find it.
Knowledge packs solve the institutional memory problem: this organization has learned things over time, help me apply that learning.
Consider what each approach handles well:
| Scenario | RLM Approach | Knowledge Pack Approach |
|---|---|---|
| Analyzing a single 500-page document | Excellent—recursive decomposition shines | Poor fit—no pre-existing knowledge to retrieve |
| Maintaining consistency across 50 code changes | Possible but expensive—would need to re-analyze patterns each time | Excellent—patterns captured once, retrieved on demand |
| Cross-document reasoning | Strong—designed for this | Requires connected graph structure |
| Encoding “how we do things here” | Not designed for this | Core strength |
| Zero-shot on new domains | Works immediately | Requires building the knowledge graph first |
The Build vs. Runtime Trade-off
RLMs front-load nothing. All the work happens at inference time. This is liberating—you can point it at any long input and it will figure things out. But it also means repeating work. Every time you analyze similar documents, you’re doing similar decomposition.
Knowledge packs front-load everything. Building the graph takes effort. Keeping it updated takes discipline. But once built, retrieval is nearly instantaneous (under 100ms in our benchmarks), and the same pattern can be applied to hundreds of tasks without re-analysis.
This mirrors a classic computer science trade-off: precomputation vs. on-demand computation. Neither is universally better.
Why We Chose Selective Retrieval
For AI coding agents, the selective retrieval approach has several advantages:
1. Agents operate in bounded domains
Unlike a general QA system that might be asked about anything, coding agents work within a specific codebase with specific conventions. The universe of relevant context is knowable and capturable.
2. Consistency matters more than discovery
The biggest challenge in multi-agent coding isn’t finding information—it’s ensuring agents don’t contradict each other’s decisions. A shared knowledge graph acts as a “single source of truth” that all agents draw from.
3. Humans need to review and evolve the knowledge
A DOT graph file in the repository is human-readable, Git-friendly, and reviewable in pull requests. When an agent discovers a new pattern, it can be added to the graph and reviewed by architects. An RLM's runtime discoveries, by contrast, are ephemeral.
4. The “big picture” is hard to decompose
Some knowledge doesn’t live in any single document. “We use the repository pattern for data access” isn’t stated anywhere—it’s implicit across dozens of files. Knowledge packs can capture these cross-cutting concerns explicitly.
MCP Tools: A Middle Ground?
We also expose knowledge access through MCP (Model Context Protocol) tools, allowing agents to query the knowledge graph during execution, not just at planning time. This gives agents some of the exploratory capability of RLMs while maintaining the curation benefits of knowledge packs.
The pattern looks like this:
- Planning phase: Receive curated knowledge pack (pushed context)
- Execution phase: Query for additional patterns if needed (pulled context)
- Review phase: Propose updates to the knowledge graph
This hybrid approach lets agents surface gaps in the institutional knowledge while still benefiting from what’s already been captured.
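As a rough illustration of the execution-phase "pull", a knowledge-query tool handler might look like the sketch below. The handler name and shape are hypothetical, and the MCP server plumbing that would register it is omitted; it reuses the assumed FTS5 `nodes` table from the earlier retrieval sketch.

```python
import sqlite3

# Hypothetical handler for a `query_knowledge` MCP tool: the lookup an agent
# could make mid-execution when the planning-time knowledge pack falls short.

def query_knowledge(db_path: str, topic: str, limit: int = 5) -> list[dict]:
    """Return up to `limit` knowledge nodes matching `topic`."""
    with sqlite3.connect(db_path) as db:
        rows = db.execute(
            "SELECT id, kind, text FROM nodes WHERE nodes MATCH ? LIMIT ?",
            (topic, limit),
        ).fetchall()
    return [{"id": i, "kind": k, "text": t} for i, k, t in rows]
```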
Where RLMs Could Complement Knowledge Packs
The approaches aren’t mutually exclusive. We see potential for RLM-style techniques in:
- Bootstrap analysis: When onboarding a new project, an RLM approach could analyze the existing codebase to seed the knowledge graph with discovered patterns
- Long document processing: Specifications, RFCs, and design documents often exceed context limits. RLM decomposition could extract relevant patterns for graph insertion
- Cross-repository reasoning: When patterns need to be consistent across multiple projects, RLMs could analyze relationships that span repositories
- Knowledge graph validation: Periodically verify that captured patterns still match the actual codebase through recursive code analysis
The Philosophical Divide
At a deeper level, these approaches embody different beliefs about how AI systems should work:
RLMs say: “Give the model the ability to explore, and it will find what it needs.”
Knowledge packs say: “Capture what matters upfront, and deliver it at the right moment.”
For general-purpose AI, the RLM philosophy is probably more scalable—you can’t pre-curate knowledge for every possible domain. But for production systems with well-defined boundaries and high consistency requirements, the knowledge pack approach provides reliability that emergent discovery cannot match.
Practical Implications
If you’re building AI systems, the choice depends on your constraints:
Consider RLM-style approaches when:
- Working with truly novel documents
- No time/resources to build domain knowledge bases
- Tasks require synthesizing information across many sources
- The “right answer” lives in the input, not in institutional memory
Consider knowledge pack approaches when:
- Operating in a bounded domain with recurring patterns
- Consistency across many tasks matters
- Humans need to review and approve the knowledge
- Speed of retrieval matters (a knowledge pack is a sub-100ms lookup rather than a multi-call recursive pass)
- You’re building persistent systems that evolve over time
Conclusion
The RLM paper represents important progress on a fundamental limitation of LLMs. Being able to process arbitrarily long inputs opens new possibilities for document analysis, research, and reasoning tasks.
But for building production AI agents—especially in software development—we’ve found that the question isn’t “how do we process more context?” It’s “how do we deliver the right context?” Knowledge packs, backed by a curated graph and fast retrieval, answer that question in a way that scales with organizational complexity rather than document length.
The future likely involves both approaches: RLM-style techniques for discovery and exploration, knowledge pack approaches for applying institutional wisdom. The interesting work is in figuring out where each belongs.
This article reflects our experience building Maestro, a multi-agent AI coding system. The knowledge pack implementation uses SQLite FTS5 for search, DOT graph format for human-readable storage, and fire-and-forget persistence for performance.