Two Philosophies of LLM Context: Recursive Decomposition vs. Selective Retrieval
How different approaches to the “context problem” reveal fundamental trade-offs in AI system design
The context window limitation of large language models has spawned an entire subfield of research. How do you get an LLM to reason about information that doesn’t fit in its working memory? A recent paper on Recursive Language Models (RLMs) proposes an elegant solution: let the model recursively examine and decompose long inputs. Meanwhile, in the trenches of building production AI coding agents, we’ve taken a completely different path with “knowledge packs”—curated, graph-structured knowledge delivered at exactly the right moment.
Both approaches work. Both have trade-offs. And the comparison reveals something interesting about where AI systems are headed.
The Problem, Stated Two Ways
The RLM paper frames the challenge as: How do we process prompts that exceed the model’s context window?
In building multi-agent coding systems, we frame it differently: How do we ensure the model has the right context, not just more context?
These sound similar but lead to fundamentally different solutions.
Recursive Language Models: Treating Context as Environment
The RLM approach is beautifully general. Instead of cramming everything into the prompt, the model treats the long input as an “external environment” that it can programmatically explore. The model breaks down the input into chunks, recursively calls itself on those chunks, and synthesizes results.
The results are impressive: models handle inputs 100x longer than their native context windows, with comparable or lower computational costs. It’s inference-time scaling—no fine-tuning, no architectural changes, just a clever algorithm.
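To make the shape of the idea concrete, here is a minimal sketch of a recursive decomposition loop. It is a simplified map-reduce-style rendering, not the paper's exact algorithm (which lets the model itself decide how to explore the input as an environment); `llm`, the chunk size, and the prompts are all illustrative placeholders.

```python
from typing import Callable

# Illustrative sketch only: fixed-size chunking plus recursive synthesis.
# `llm` is any prompt -> answer call you supply.

CHUNK_CHARS = 8_000  # keep each recursive call well inside the context window

def recursive_answer(llm: Callable[[str], str], question: str, text: str) -> str:
    # Base case: the text fits in one call, so answer directly.
    if len(text) <= CHUNK_CHARS:
        return llm(f"Context:\n{text}\n\nQuestion: {question}")

    # Recursive case: split the input, answer each chunk, then synthesize.
    chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]
    partials = [recursive_answer(llm, question, chunk) for chunk in chunks]
    combined = "\n\n".join(partials)
    return recursive_answer(
        llm,
        f"Combine these partial answers into one answer to: {question}",
        combined,
    )
```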
Here’s what makes this approach powerful:
- Generality: Works on any task where decomposition makes sense—document analysis, long-form QA, cross-document reasoning
- Self-contained: No external infrastructure required beyond the model itself
- Scalable: Costs scale with the useful information extracted, not the raw input size
But there’s an implicit assumption: the information needed is in that long input. The model’s job is to find it.
Knowledge Packs: Curated Context for Domain-Specific Agents
Our approach with Maestro’s knowledge pack system inverts this assumption. Instead of giving the model tools to explore a massive context, we ask: what if we could deliver exactly the context it needs, precisely when it needs it?
The system works like this:
- A knowledge graph captures architectural patterns, design decisions, and coding conventions in a structured, queryable format (stored as a DOT graph in .maestro/knowledge.dot)
- When an agent starts a task, the system extracts key terms from the task description
- Full-text search finds relevant nodes in the knowledge graph
- Graph traversal pulls in neighboring nodes (one hop) to provide relational context
- The resulting “knowledge pack” is injected into the agent’s planning prompt
No recursion. No decomposition. Just surgical retrieval of institutional knowledge.
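For concreteness, a fragment of such a graph might look like the following. The node names echo the example below, but the attributes and edge labels are illustrative assumptions, not Maestro's actual schema:

```dot
digraph knowledge {
  // Nodes carry a kind (rule or pattern) and a short statement of the convention.
  "rest-api"       [kind="rule",    text="All REST APIs follow OpenAPI 3.0"];
  "jwt-tokens"     [kind="pattern", text="Token format and validation"];
  "error-handling" [kind="pattern", text="Wrap errors with context"];

  // Edges record relationships used for one-hop traversal.
  "rest-api" -> "jwt-tokens"     [label="secured-by"];
  "rest-api" -> "error-handling" [label="uses"];
}
```

Retrieval over that graph then follows this flow: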
Story: "Add OAuth2 authentication to the API"
↓
Extract terms: ["OAuth2", "authentication", "API"]
↓
FTS5 search → Find matching nodes
↓
Graph traversal → Include neighbors
↓
Knowledge pack delivered:
- rest-api (rule): "All REST APIs follow OpenAPI 3.0"
- security-headers (pattern): "Required security headers"
- jwt-tokens (pattern): "Token format and validation"
- error-handling (pattern): "Wrap errors with context"
The agent doesn’t need to explore a 100,000-line codebase. It receives a curated 20-30 node subgraph containing the patterns and rules that actually matter for this specific task.
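A rough sketch of that retrieval step is below, assuming an SQLite FTS5 virtual table `nodes(id, kind, text)` and a plain `edges(src, dst)` table. These table and column names are assumptions for illustration; Maestro's real schema may differ, but the FTS-then-one-hop shape is the point.

```python
import re
import sqlite3

def build_knowledge_pack(db: sqlite3.Connection, story: str) -> list[tuple]:
    # 1. Extract candidate terms from the task description
    #    (a real implementation would also drop stopwords).
    terms = [t for t in re.findall(r"[A-Za-z0-9]+", story) if len(t) > 2]
    if not terms:
        return []

    # 2. Full-text search for directly matching nodes.
    rows = db.execute(
        "SELECT id, kind, text FROM nodes WHERE nodes MATCH ?",
        (" OR ".join(terms),),
    ).fetchall()
    hit_ids = {r[0] for r in rows}

    # 3. One-hop graph traversal: pull in immediate neighbours of each hit.
    neighbours = set()
    for node_id in hit_ids:
        for (other,) in db.execute(
            "SELECT dst FROM edges WHERE src = ? "
            "UNION SELECT src FROM edges WHERE dst = ?",
            (node_id, node_id),
        ):
            neighbours.add(other)

    # 4. The pack is the matched nodes plus their neighbours.
    all_ids = hit_ids | neighbours
    if not all_ids:
        return []
    placeholders = ",".join("?" * len(all_ids))
    return db.execute(
        f"SELECT id, kind, text FROM nodes WHERE id IN ({placeholders})",
        tuple(all_ids),
    ).fetchall()
```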
Different Problems, Different Solutions
RLMs solve the information discovery problem: the answer is somewhere in this haystack, help me find it.
Knowledge packs solve the institutional memory problem: this organization has learned things over time, help me apply that learning.
Consider what each approach handles well:
| Scenario | RLM Approach | Knowledge Pack Approach |
|---|---|---|
| Analyzing a single 500-page document | Excellent—recursive decomposition shines | Poor fit—no pre-existing knowledge to retrieve |
| Maintaining consistency across 50 code changes | Possible but expensive—would need to re-analyze patterns each time | Excellent—patterns captured once, retrieved on demand |
| Cross-document reasoning | Strong—designed for this | Requires connected graph structure |
| Encoding “how we do things here” | Not designed for this | Core strength |
| Zero-shot on new domains | Works immediately | Requires building the knowledge graph first |
The Build vs. Runtime Trade-off
RLMs front-load nothing. All the work happens at inference time. This is liberating—you can point it at any long input and it will figure things out. But it also means repeating work. Every time you analyze similar documents, you’re doing similar decomposition.
Knowledge packs front-load everything. Building the graph takes effort. Keeping it updated takes discipline. But once built, retrieval is nearly instantaneous (under 100ms in our benchmarks), and the same pattern can be applied to hundreds of tasks without re-analysis.
This mirrors a classic computer science trade-off: precomputation vs. on-demand computation. Neither is universally better.
Why We Chose Selective Retrieval
For AI coding agents, the selective retrieval approach has several advantages:
1. Agents operate in bounded domains
Unlike a general QA system that might be asked about anything, coding agents work within a specific codebase with specific conventions. The universe of relevant context is knowable and capturable.
2. Consistency matters more than discovery
The biggest challenge in multi-agent coding isn’t finding information—it’s ensuring agents don’t contradict each other’s decisions. A shared knowledge graph acts as a “single source of truth” that all agents draw from.
3. Humans need to review and evolve the knowledge
A DOT graph file in the repository is human-readable, Git-friendly, and reviewable in pull requests. When an agent discovers a new pattern, it can be added to the graph and reviewed by architects. An RLM's runtime discoveries, by contrast, are ephemeral.
4. The “big picture” is hard to decompose
Some knowledge doesn’t live in any single document. “We use the repository pattern for data access” isn’t stated anywhere—it’s implicit across dozens of files. Knowledge packs can capture these cross-cutting concerns explicitly.
MCP Tools: A Middle Ground?
We also expose knowledge access through MCP (Model Context Protocol) tools, allowing agents to query the knowledge graph during execution, not just at planning time. This gives agents some of the exploratory capability of RLMs while maintaining the curation benefits of knowledge packs.
The pattern looks like this:
- Planning phase: Receive curated knowledge pack (pushed context)
- Execution phase: Query for additional patterns if needed (pulled context)
- Review phase: Propose updates to the knowledge graph
This hybrid approach lets agents surface gaps in the institutional knowledge while still benefiting from what’s already been captured.
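As a rough illustration of the execution-phase "pull", a knowledge-query tool handler might look like the sketch below. The handler name and shape are hypothetical, and the MCP server plumbing that would register it is omitted; it reuses the assumed FTS5 `nodes` table from the earlier retrieval sketch.

```python
import sqlite3

# Hypothetical handler for a `query_knowledge` MCP tool: the lookup an agent
# could make mid-execution when the planning-time knowledge pack falls short.

def query_knowledge(db_path: str, topic: str, limit: int = 5) -> list[dict]:
    """Return up to `limit` knowledge nodes matching `topic`."""
    with sqlite3.connect(db_path) as db:
        rows = db.execute(
            "SELECT id, kind, text FROM nodes WHERE nodes MATCH ? LIMIT ?",
            (topic, limit),
        ).fetchall()
    return [{"id": i, "kind": k, "text": t} for i, k, t in rows]
```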
Where RLMs Could Complement Knowledge Packs
The approaches aren’t mutually exclusive. We see potential for RLM-style techniques in:
- Bootstrap analysis: When onboarding a new project, an RLM approach could analyze the existing codebase to seed the knowledge graph with discovered patterns
- Long document processing: Specifications, RFCs, and design documents often exceed context limits. RLM decomposition could extract relevant patterns for graph insertion
- Cross-repository reasoning: When patterns need to be consistent across multiple projects, RLMs could analyze relationships that span repositories
- Knowledge graph validation: Periodically verify that captured patterns still match the actual codebase through recursive code analysis
The Philosophical Divide
At a deeper level, these approaches embody different beliefs about how AI systems should work:
RLMs say: “Give the model the ability to explore, and it will find what it needs.”
Knowledge packs say: “Capture what matters upfront, and deliver it at the right moment.”
For general-purpose AI, the RLM philosophy is probably more scalable—you can’t pre-curate knowledge for every possible domain. But for production systems with well-defined boundaries and high consistency requirements, the knowledge pack approach provides reliability that emergent discovery cannot match.
Practical Implications
If you’re building AI systems, the choice depends on your constraints:
Consider RLM-style approaches when:
- Working with truly novel documents
- No time/resources to build domain knowledge bases
- Tasks require synthesizing information across many sources
- The “right answer” lives in the input, not in institutional memory
Consider knowledge pack approaches when:
- Operating in a bounded domain with recurring patterns
- Consistency across many tasks matters
- Humans need to review and approve the knowledge
- Speed of retrieval matters (a knowledge pack is a sub-100ms lookup rather than a multi-call recursive pass)
- You’re building persistent systems that evolve over time
Conclusion
The RLM paper represents important progress on a fundamental limitation of LLMs. Being able to process arbitrarily long inputs opens new possibilities for document analysis, research, and reasoning tasks.
But for building production AI agents—especially in software development—we’ve found that the question isn’t “how do we process more context?” It’s “how do we deliver the right context?” Knowledge packs, backed by a curated graph and fast retrieval, answer that question in a way that scales with organizational complexity rather than document length.
The future likely involves both approaches: RLM-style techniques for discovery and exploration, knowledge pack approaches for applying institutional wisdom. The interesting work is in figuring out where each belongs.
This article reflects our experience building Maestro, a multi-agent AI coding system. The knowledge pack implementation uses SQLite FTS5 for search, DOT graph format for human-readable storage, and fire-and-forget persistence for performance.