
"Not Me" Podcast Episode #10: The End of Context Windows

MIT Just Made Your AI Brain Infinitely Bigger, and Most People Haven't Noticed Yet

Hey, it’s Vlad.

Everyone’s obsessed with building bigger AI brains.

More parameters. Longer context windows. Better reasoning.

But here’s what MIT just figured out: you don’t need a bigger brain.

You need a smarter one.

Researchers Alex Zhang, Tim Kraska, and Omar Khattab from MIT CSAIL dropped a paper that’s getting way less attention than it deserves. It’s called Recursive Language Models, or RLMs.

And it flips everything we know about AI limitations on its head.


The Problem Nobody Solved

Let’s start with the ugly truth.

Every AI model you use has a memory problem.

GPT-5? It chokes after 272,000 tokens. Claude? Same ballpark. Even with these “massive” context windows, the models get dumber the more you feed them.

It’s called context rot.

Think about it. You paste a 50-page document and ask a simple question. The model starts hallucinating. Missing obvious facts. Getting confused.

Why?

Because cramming everything into the context window is like forcing someone to read a 10,000-page encyclopedia cover-to-cover before answering your question.

It’s absurd.

We’ve been treating AI like a student with a strict word limit on their exam. No wonder it struggles.


The MIT Breakthrough

Here’s where it gets interesting.

MIT asked a different question: What if the AI didn’t have to read everything at once?

What if it could treat the prompt as an external environment? A workspace. A filing cabinet it can explore strategically.

That’s RLM.

Instead of feeding GPT-5 your entire 10-million-token corpus directly, you store it as a Python variable. The model never sees it in the prompt. It writes code to peek at specific sections. Grep it for patterns. Chunk it up. And here’s the kicker: it can spawn sub-models to investigate specific parts.

It’s recursion. The model calls itself. Over and over. Each layer handles a smaller, more manageable piece.

Like hiring a research team instead of forcing one person to do everything.

Sound familiar? It’s the same philosophy behind sub-agents I wrote about recently. Stop making one AI do everything. Orchestrate.
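
If you want the shape of it in code, here’s a minimal sketch. Fair warning: this is my simplification, not the paper’s implementation. The real RLM hands the model a live Python REPL and lets it write its own exploration code; this version hard-codes one strategy (chunk and delegate) just to show the recursion. `llm_call` is a placeholder for whatever completion API you use.

```python
# Minimal sketch of the RLM recursion -- my simplification, not the paper's code.
# llm_call is a placeholder: swap in your provider's completion API.

def llm_call(prompt: str) -> str:
    raise NotImplementedError("plug in OpenAI, Anthropic, or a local model here")

def rlm_answer(question: str, corpus: str, depth: int = 0, max_depth: int = 2) -> str:
    # Base case: the slice is small enough to answer directly in-context.
    if depth >= max_depth or len(corpus) < 8_000:
        return llm_call(f"{question}\n\n---\n{corpus}")

    # The full corpus never enters the prompt. Here we just split it evenly;
    # the real RLM lets the model write code to decide what to look at.
    chunk_size = max(1, len(corpus) // 4)
    chunks = [corpus[i:i + chunk_size] for i in range(0, len(corpus), chunk_size)]

    # Each sub-call gets a fresh context window: no pollution, no rot.
    findings = [
        rlm_answer(f"Note anything relevant to: {question}", chunk, depth + 1, max_depth)
        for chunk in chunks
    ]

    # The root model synthesizes from the (much smaller) findings.
    return llm_call(f"{question}\n\nNotes from sub-models:\n" + "\n".join(findings))
```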


The Numbers Don’t Lie

Let’s talk results.

On the OOLONG benchmark, which is designed to torture AI with long-context tasks, here’s what happened:

  • Base GPT-5? Crashed and burned. Near zero performance.

  • GPT-5 with RLM? 58% F1 score. From essentially nothing to mostly right.

That’s not an improvement. That’s a resurrection.

On the BrowseComp-Plus benchmark, RLM handled over 10 million tokens. More than an order of magnitude beyond the context window. And it did it for roughly the same cost as running the base model. Sometimes cheaper.

91.33% accuracy on a task where the base model literally couldn’t fit the input.

This isn’t incremental progress. This is a paradigm shift.


Why Programmatic Decomposition Beats Everything

You might ask: Why not just summarize the context? Compress it?

They tried that. It’s called context compaction.

Here’s the problem. Every time you summarize, you lose information. It’s lossy compression. Irreversible.

Summarization agents on the same benchmarks? 70% at best. Often worse.

RLM doesn’t summarize. It delegates. Big difference.

The model actively decides what to look at. Uses regex filters. Keyword searches. Strategic sampling. It behaves less like a student cramming for an exam and more like a senior researcher with a team of assistants.

And because each sub-call runs with a fresh context window, there’s no pollution. No context rot. Each recursive agent stays sharp.
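
To make “delegates, not summarizes” concrete, here’s the flavor of tooling an RLM-style agent leans on. The helper names are mine, not the paper’s; the point is that every tool returns exact text, so nothing is lost before a sub-model sees it.

```python
import re

def peek(corpus: str, start: int, length: int = 2_000) -> str:
    """Return a verbatim window into the corpus -- nothing compressed away."""
    return corpus[start:start + length]

def grep(corpus: str, pattern: str, window: int = 200) -> list[str]:
    """Return exact snippets around every regex match."""
    return [
        corpus[max(0, m.start() - window):m.end() + window]
        for m in re.finditer(pattern, corpus)
    ]

# A sub-model gets pointed at exactly the matching regions, verbatim.
doc = "... imagine megabytes of contract text here ..."
hits = grep(doc, r"contract", window=100)
```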


What Most People Overlook

Here’s the thing that’s flying under the radar.

This approach is model-agnostic.

RLM works with GPT-5. With Qwen. With Claude. Open-source, closed-source, doesn’t matter.

It’s an inference strategy, not an architecture change. You don’t need to retrain anything.

And the cost structure is fascinating. Using GPT-5-mini for recursive calls while GPT-5 handles the final synthesis? Cheaper than running GPT-5 on truncated input. Better results. Lower price.

That’s the arbitrage nobody’s talking about.
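
Here’s the back-of-envelope version. The prices below are made-up placeholders, not anyone’s current list prices; they’re just there to show the shape of the math.

```python
# Illustrative cost math -- the prices are placeholders, not real list prices.
MINI_PER_MTOK = 0.25   # hypothetical $/1M input tokens, small model
BIG_PER_MTOK = 1.25    # hypothetical $/1M input tokens, flagship model

corpus_tokens = 10_000_000   # the 10M-token corpus, read by cheap sub-calls
synthesis_tokens = 50_000    # distilled findings passed to the flagship model

rlm_cost = (corpus_tokens / 1e6) * MINI_PER_MTOK \
         + (synthesis_tokens / 1e6) * BIG_PER_MTOK
direct_cost = (corpus_tokens / 1e6) * BIG_PER_MTOK  # if it even fit

print(f"RLM-style: ${rlm_cost:.2f}  vs  flagship-only: ${direct_cost:.2f}")
# RLM-style: $2.56  vs  flagship-only: $12.50
```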


The Bitter Lesson, Again

Alex Zhang called this a “bitter-lesson-pilled approach.”

If you don’t know Rich Sutton’s Bitter Lesson, it’s simple: general methods that leverage computation beat specialized hand-engineered solutions. Every time.

RLM fits perfectly.

Instead of designing clever compression schemes or specialized architectures, you give the model tools and let it figure out the strategy.

The model learns to peek first. Scan for relevant sections. Delegate the hard parts. Build up answers iteratively.

No human had to specify these behaviors. They emerge naturally when you give the model the right environment.
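
To picture what “emerges naturally” looks like, here’s an illustrative REPL turn in that peek-scan-delegate-build style. I’ve reconstructed it from the behaviors the paper describes, not copied it from a transcript, and the stubs at the top just make the sketch self-contained (in the real setup the harness pre-loads `context` and `recurse`).

```python
import re

# Stubs so the sketch runs standalone; the real harness provides these.
context = "## Intro\n...\n## Pricing\nPlan A costs $10/mo.\n## FAQ\n..."
def recurse(question: str, chunk: str) -> str:
    return f"[sub-model answer to {question!r} over {len(chunk)} chars]"

# 1. Peek first: get a feel for the structure before committing to a strategy.
print(context[:200], len(context))

# 2. Scan: cheap programmatic filtering to find candidate regions.
starts = [m.start() for m in re.finditer(r"^## ", context, re.MULTILINE)]

# 3. Delegate: hand each promising slice to a fresh sub-model.
findings = [recurse("Does this section mention pricing?", context[s:s + 2_000])
            for s in starts]

# 4. Build up: synthesize an answer from the distilled findings only.
print(recurse("Synthesize one answer from these notes:\n" + "\n".join(findings), ""))
```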

That’s the meta-lesson here.

Stop constraining AI. Start enabling it.


Practical Implications

So what does this mean for you?

If you’re building with AI, pay attention.

Long-horizon agents, the ones that need to process weeks or months of data, suddenly become viable. Legal document analysis? Entire codebase understanding? Research synthesis across hundreds of papers?

All unlocked.

Prime Intellect is already building RLMEnv, a training environment for this paradigm. They’re betting this is the next major breakthrough after reasoning scaling.

My prediction? Within 12 months, every serious AI infrastructure stack will support RLM-style inference.

The teams building this capability today will be the ones dominating tomorrow.


The Overlooked Angle

What most coverage misses is the philosophical shift.

We’ve been treating context windows as hard limits. Physical constraints. Like asking “how do we fit more data in this box?”

MIT asked: “What if we don’t put the data in the box at all?”

That reframing is everything.

It’s not about bigger models. It’s about smarter orchestration.

Sound familiar?

It’s the same pattern we’re seeing with sub-agents. With MCP. With agentic workflows.

The future isn’t monolithic AI. It’s distributed intelligence. Each piece specialized. Each piece coordinated.

RLM is just another proof point.


Takeaway

The context window problem everyone complained about? Solved.

Not through brute force. Through elegance.

The AI doesn’t need to see everything at once. It needs the right tools to explore strategically.

That’s RLM.

MIT just gave us the blueprint. Now it’s on us to build with it.

The code is open source. The paper is public. The opportunity is sitting there.

Question is: are you going to use it?


Worth Reading While the Episode Downloads

  • Sub Agents – The frontier of tech just shifted again, and most people haven’t noticed yet.

  • Ideation – Forget validation. Build what no search result can show you.

  • AI Generalist – Playbook on How to Make $300K+ While Everyone Else Fights for Scraps


Post-Credit Scene

A few things worth your time this week:

📄 Read: “Why I Believe Recursive Language Models Are the Future of Long-Context Reasoning” – A developer’s breakdown of the RLM paper that goes beyond summary. Published this month.

🎧 Listen: Lenny’s Podcast: “We replaced our sales team with 20 AI agents” with Jason Lemkin. 1.2 humans managing 20 AI agents doing the work of 10 SDRs and AEs. This is happening now. (January 2026)

🔬 Deep Dive: Prime Intellect’s “Recursive Language Models: the paradigm of 2026” – They’re betting their entire research agenda on this. Worth understanding why.

🛠 Tool: The RLM GitHub repo is live. Supports OpenAI, Anthropic, local models. If you’re technical, start experimenting today.

🎙 Podcast: Practical AI: “2025 Was The Year Of Agents, What’s Coming In 2026?” – Chris and Daniel break down what actually mattered and what’s next. Grounded predictions, not hype.


Thank you for listening and reading. See you in the next edition.

Vlad
