AI for business · 1 April 2026 · 7 min read

Anthropic Just Published a Blueprint for Building Production AI Agents

Claude Code is not a chatbot wrapper. It is a 55-directory, 331-module agent operating system. Here is what is inside and what it means for anyone building AI tools.

The source code for Claude Code, Anthropic's AI coding agent, recently surfaced in public. Most of the internet treated it as drama. A leak. Something to screenshot and tweet about. But if you actually read the code, what you find is something far more valuable than gossip. It is the most complete, production-grade reference architecture for building an AI agent that has ever been made public.

This is not a teardown. It is a blueprint. And every principle inside it is yours to use right now.

What Claude Code Actually Is

Claude Code is not a chat interface bolted onto an API. It is a 55-directory, 331-module autonomous agent operating system. It reads files, writes code, runs terminal commands, manages its own memory, handles errors, recovers from failures, and coordinates sub-agents, all without human intervention between steps.

If you are building AI tools for your business, or evaluating agencies who claim to, understanding what is inside this system will tell you more about the state of the art than any marketing page ever could.

The Core Loop: Async Generators

The entire agent runs as a single async generator. Every event, whether that is model output, a tool call, or an error, streams the moment it happens. The interface renders character by character, not after thirty seconds of silence. You can abort it, pause it, or nest it inside a sub-agent.

This matters because most AI tools you encounter are request-response. You send a message, you wait, you get an answer. That model breaks down the moment you need an agent to do real work across multiple steps. Async generators solve this by treating the entire conversation as a continuous stream that can be interrupted, redirected, or composed at any point.
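
The pattern is easy to see in a few lines of Python. This is a minimal sketch, not Claude Code's actual API; names like `agent_loop` and `Event` are invented for illustration. The point is that the whole loop is one async generator, so the caller consumes events as they arrive and can stop iterating, and therefore abort the agent, at any moment.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Event:
    kind: str   # e.g. "text", "tool_call", "error"
    data: str

async def agent_loop(prompt: str):
    """Yield events one at a time instead of returning one final answer."""
    yield Event("text", f"Thinking about: {prompt}")
    yield Event("tool_call", "read_file('README.md')")
    yield Event("text", "Done.")

async def main():
    kinds = []
    async for event in agent_loop("summarise the repo"):
        kinds.append(event.kind)   # a real UI would render each event live
    return kinds

print(asyncio.run(main()))  # → ['text', 'tool_call', 'text']
```

Because the loop is a generator, nesting a sub-agent is just `async for event in agent_loop(...)` inside another generator that re-yields the events.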

Streaming Tool Execution

Here is a detail that separates a production system from a prototype. Claude Code does not wait for the model to finish generating its full response before acting. The moment a tool call's input parameters arrive mid-stream, execution begins.

That saves two to five seconds of hidden latency on every single turn. Over a session with dozens of tool calls, that compounds into minutes of time recovered. Read-only tools like file search and content grep run in parallel. Write tools like terminal commands and file edits run serially to prevent race conditions. You get the speed of parallelism and the safety of serial execution simultaneously.
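
That scheduling policy can be sketched like this. The tool names and the `execute` function are hypothetical stand-ins, but the shape is the technique: consecutive read-only calls are gathered concurrently, and anything that writes first flushes the pending batch and then runs alone.

```python
import asyncio

READ_ONLY = {"grep", "glob"}   # hypothetical read-only tool names

async def run_tool(name: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for real I/O
    return name

async def execute(calls: list[str]) -> list[str]:
    """Overlap consecutive read-only calls; run everything else serially."""
    results, batch = [], []
    for name in calls:
        if name in READ_ONLY:
            batch.append(run_tool(name))      # safe to run concurrently
            continue
        if batch:                             # flush pending reads first
            results += await asyncio.gather(*batch)
            batch = []
        results.append(await run_tool(name))  # writes run one at a time
    if batch:
        results += await asyncio.gather(*batch)
    return results

print(asyncio.run(execute(["grep", "glob", "bash", "grep"])))
```

The two leading reads overlap; the `bash` write waits for them and holds the fourth call until it finishes, which is exactly the parallel-where-safe, serial-where-not behaviour described above.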

Context Management: Four Tiers, Not One

This is where most AI agent builders get it catastrophically wrong. The standard approach is to truncate old messages when the context window fills up and hope for the best. Claude Code runs four compaction strategies in order of computational cost.

First, micro compaction caches tool results that have not changed. This runs every single turn and costs almost nothing. Second, snip trims old messages while protecting the most recent context. Third, auto compact summarises prior conversation when snipping is not enough. Fourth and finally, context collapse performs staged compression for the very longest sessions.

The cheapest strategy always runs first. The most expensive only fires when nothing else works. This is not just efficient engineering. It is what allows an AI agent to maintain coherent context across sessions that last hours or even days rather than forgetting everything after twenty messages.
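
The cheapest-first cascade can be sketched as follows. These strategies are toy stand-ins for the real ones, and message count stands in for token count; the structure, not the bodies, is the point: try each tier in cost order and stop the moment the context fits.

```python
def micro_compact(msgs):
    """Cheapest tier: drop duplicated (unchanged) tool results."""
    return [m for i, m in enumerate(msgs) if m not in msgs[:i]]

def snip(msgs, keep=4):
    """Middle tier: trim old messages, protect the most recent context."""
    return msgs[-keep:]

def summarise(msgs):
    """Expensive tier: stand-in for a model-written summary of history."""
    return ["<summary of earlier conversation>"] + msgs[-2:]

STRATEGIES = [micro_compact, snip, summarise]   # ordered by cost

def fit_context(msgs, budget):
    for strategy in STRATEGIES:
        if len(msgs) <= budget:
            break                 # cheapest sufficient tier wins
        msgs = strategy(msgs)
    return msgs

print(fit_context(["a", "b", "b", "c", "d", "e", "f"], budget=3))
```

With a budget of three, deduplication alone is not enough, trimming is still not enough, so the summary tier fires, which mirrors the escalation described above.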

Prompt Engineering at Production Scale

The system prompt is more than 577 lines long. But the length is not the point. The structure is. Everything before a specific boundary in the prompt is cached globally across all users. Everything after that boundary is cached per session or recomputed per turn. The result is that roughly 80% of every API call hits the prompt cache before any new tokens are processed.

On top of that, each tool generates its own description dynamically based on the live environment. The model never receives generic instructions. It only ever gets context-specific ones. This is a masterclass in prompt economics. Most teams treat their system prompt as a static block of text and wonder why their API costs are high.
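
The layering can be sketched as below. The field names and the `build_system_prompt` function are illustrative, not Anthropic's actual API; what matters is the ordering: stable text first so it caches across everyone, volatile text last so only the cheap tail is recomputed.

```python
def build_system_prompt(stable_rules, tool_descriptions, session_context):
    """Assemble prompt blocks from most cacheable to least cacheable."""
    return [
        # Before the boundary: identical for every user, cached globally.
        {"text": stable_rules, "cache": "global"},
        # After the boundary: tool descriptions built from the live
        # environment, cached per session.
        {"text": "\n".join(tool_descriptions), "cache": "session"},
        # Volatile state, recomputed every turn, never cached.
        {"text": session_context, "cache": None},
    ]

prompt = build_system_prompt(
    "You are a coding agent...",
    ["Bash: runs shell commands in /repo", "Read: reads files under /repo"],
    "cwd=/repo branch=main",
)
print([block["cache"] for block in prompt])  # → ['global', 'session', None]
```

Put the volatile block first instead and every cached prefix is invalidated on every turn, which is precisely the static-block mistake the paragraph above describes.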

Permission System: A Rule Engine, Not a Toggle

Before any tool executes, it passes through a seven-stage pipeline. Input validation, deny rules, allow rules, tool-specific checks, hooks, a machine-learning classifier, and finally a user prompt. Rules use glob patterns so you can write things like "allow all git commands" or "block all file deletions outside this directory".

Enterprise administrators can enforce blocks at the organisation level. Project maintainers can set rules at the repository level. Individual users can write shell scripts for edge cases the engine cannot handle natively. This is not a permissions toggle. It is a progressive trust system that scales from a solo developer to a large enterprise.
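
A toy version of just the deny/allow stages looks like this, using Python's standard `fnmatch` for glob matching. The rule strings are invented examples, and the real pipeline has five more stages around this core, but the precedence is the heart of it: deny beats allow, and anything unmatched falls through to ask the user.

```python
from fnmatch import fnmatch

DENY = ["rm *", "git push --force*"]     # hypothetical org-level blocks
ALLOW = ["git *", "ls *", "cat *"]       # hypothetical project-level allows

def check(command: str) -> str:
    """Deny rules beat allow rules; unmatched commands ask the user."""
    if any(fnmatch(command, pattern) for pattern in DENY):
        return "deny"
    if any(fnmatch(command, pattern) for pattern in ALLOW):
        return "allow"
    return "ask"   # fall through to the user-prompt stage

print(check("git status"))        # → allow
print(check("rm -rf /"))          # → deny
print(check("curl example.com"))  # → ask
```

Because the layers are just ordered rule lists, an enterprise can prepend deny rules, a project can append allow rules, and the engine itself never changes.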

Error Recovery Is the Loop

The retry system alone is 823 lines of hardened code. When rate limited, it checks the Retry-After header before doing anything. Under twenty seconds? Stay in fast mode. Over twenty? Enter a thirty minute cooldown. Three consecutive server errors? Trigger a model fallback automatically. Context overflow? Recalculate the token budget and retry inline without crashing.

For unattended CI/CD sessions, it retries indefinitely with a five-minute maximum backoff. A thirty-second heartbeat prevents the process from being killed during long idle periods. Error recovery is not a wrapper around the main loop. It is the main loop.

Extensibility Without Touching Source Code

Skills are markdown files that inject prompts and restrict tool access. Hooks are shell scripts that fire on events like before or after a tool runs. MCP servers provide external tools over a standardised protocol with six different transport options. Plugins bundle all three into a single installable package.

None of these require changing a single line of source code. You extend the system by dropping files into the right directory. That is the entire extension model. It is the same principle that made VS Code dominant. Make the core solid and let the ecosystem build everything else.
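
Directory-based discovery is simple to sketch. The `skills/` layout and the `discover_skills` helper here are hypothetical, but they show the whole extension model in miniature: the core scans a known location at startup, and dropping in a file is the entire installation step.

```python
import tempfile
from pathlib import Path

def discover_skills(root: Path) -> dict[str, str]:
    """Each markdown file under skills/ becomes a named prompt injection."""
    return {p.stem: p.read_text() for p in sorted(root.glob("skills/*.md"))}

with tempfile.TemporaryDirectory() as tmp:
    skills_dir = Path(tmp) / "skills"
    skills_dir.mkdir()
    (skills_dir / "review.md").write_text("Act as a strict code reviewer.")
    print(list(discover_skills(Path(tmp))))  # → ['review']
```

Hooks and plugins follow the same principle with different file types; nothing registers itself in source code, so the core stays frozen while the ecosystem grows around it.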

What This Means for Your Business

If you are evaluating AI tools, building internal automation, or hiring someone to build AI systems for you, this is what production-grade looks like. Not a chatbot on a web page. Not a "powered by AI" badge on a landing page. A system that handles context intelligently, recovers from failures gracefully, executes tools in parallel where safe to do so, and extends without fragile customisation.

The gap between demo-quality AI and production-quality AI is enormous. Most of what is being sold to small and medium businesses right now is demo quality dressed up with good marketing. The architecture inside Claude Code is the benchmark for what real AI tooling looks like, and it is now public knowledge.

At Bright Loop Media, we build AI systems grounded in these same principles. Not because we copied Claude Code, but because production engineering demands the same patterns regardless of who writes the code. If you want AI that actually works in your business rather than a chatbot that impresses for five minutes and then falls over, that is what we build.

Bottom line

Claude Code is not a chatbot wrapper. It is a 55-directory, 331-module agent operating system: a streaming core loop, four tiers of context management, cached and dynamically assembled prompts, a real permission engine, and error recovery built into the loop itself. That architecture is now public, and it is the benchmark for anyone building AI tools.

Written by

Chris Ilabaca

Bright Loop Media. Wirral, working UK wide.

Free 45 min call

Free Bright Loop diagnostic

The 12-point Website Diagnostic Checklist.

The same audit we run on every new client’s site before we quote a rebuild. Twelve checks, ten minutes to read, prioritised against the issues that actually cost you money. Drop your email and we’ll send the link straight to your inbox.

One email with the link. No list-renting, no follow-up sequence unless you ask for one.

Ready to ship yours?

Book a free 45 minute call. We will tell you what to build, what to drop, and what it will cost. Honest answers, no pitch.

Not ready for a call? Quick fit check (3 questions, 30 seconds)