#AI #LLM #ClaudeCode #AgenticCoding #DeveloperWorkflow

Agentic Coding in Practice: What It Actually Means When AI Writes 90% of Its Own Code

webhani

The Signal Worth Paying Attention To

Anthropic recently disclosed that roughly 90% of Claude Code's codebase is generated by Claude Code itself. Setting aside the recursive novelty of that fact, it marks something concrete: agentic coding has moved from an interesting experiment to a production workflow used at scale by the people building the model.

That shift deserves a clear-eyed look — not hype, not dismissal.

What Makes It "Agentic"

The difference between AI-assisted and agentic coding isn't cosmetic. Earlier tools, such as the first Copilot iterations, completed the line you were already writing. The model was an input method.

Agentic coding inverts this. You hand the model a task description, and it reads the codebase, forms a plan, writes the implementation, runs tests, reads the failure output, and iterates until things pass (or until it decides to ask you something). The control loop runs mostly inside the model, with you as the reviewer — not the typist.

```bash
# Example: delegating a non-trivial task to Claude Code
claude "Add JWT refresh token rotation to the auth middleware.
- Access token TTL: 15 minutes
- Refresh token TTL: 7 days
- ORM: Prisma with PostgreSQL
- Integrate with the existing test suite, don't break existing tests"
```

The model will read the existing middleware, understand the test conventions in use, write the implementation, and run the tests. Your job becomes reviewing the output.

Why It's Not a Push-Button Experience

Anthropic's own engineers describe using LLMs for programming as "difficult and unintuitive" — getting great results requires learning new patterns. A few that matter in practice:

Context quality drives output quality. Vague prompts produce generic code. Prompts that include the framework version, existing architectural patterns, and explicit constraints produce code that fits.

```markdown
## Context to always include
- Framework: Next.js 16 App Router (not Pages Router)
- Auth: Clerk middleware in proxy.ts
- DB: Neon serverless Postgres via Prisma
- Target runtime: Edge (no Node.js APIs)

## Task
Implement a middleware check that redirects unauthenticated users
to /sign-in and users without an active subscription to /pricing.
Integrate into the existing proxy.ts — don't create a new file.
```

Critical review is still the job. AI-generated code carries the same risks as any unreviewed code: security gaps, performance problems, divergence from established patterns. The review responsibility doesn't go away — it just changes form.
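To make that concrete, here is a hypothetical sketch (TypeScript on Node; the function names are invented for illustration, not taken from any real codebase) of the kind of flaw that passes a casual skim but should not pass review:

```typescript
import { timingSafeEqual } from "node:crypto";

// A first draft that looks correct: plain string comparison.
// It can leak timing information, since comparison cost varies
// with how much of the secret matches.
function verifyTokenNaive(provided: string, stored: string): boolean {
  return provided === stored;
}

// What a reviewer should push for: constant-time comparison
// over equal-length buffers.
function verifyToken(provided: string, stored: string): boolean {
  const a = Buffer.from(provided);
  const b = Buffer.from(stored);
  if (a.length !== b.length) return false; // timingSafeEqual requires equal lengths
  return timingSafeEqual(a, b);
}
```

Both versions type-check, both pass a happy-path test, and only one belongs in an auth middleware. That gap is what human review is for.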

Short loops beat long sessions. Rather than describing a complete feature in one prompt, breaking the task into small, verifiable steps produces better results. Implement → review → correct → repeat. The model's accuracy degrades over long autonomous sessions; frequent checkpoints keep it on track.
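In practice the loop can look as mundane as this sketch; the prompts and checkpoint commands are illustrative, not prescriptive, and assume a Prisma project with an npm test script:

```shell
# Step 1: schema only, nothing else
claude "Add a refreshTokens table to the Prisma schema. Stop there."
npx prisma validate            # checkpoint: schema is well-formed before moving on

# Step 2: one function against that table
claude "Implement rotateRefreshToken() against the refreshTokens table. No route changes yet."
npm test -- auth               # checkpoint: run only the focused tests

# Step 3: wire it in, now that the pieces are individually verified
claude "Integrate rotateRefreshToken() into the auth middleware and update affected tests."
npm test                       # checkpoint: full suite
```

Each checkpoint is a place where a human can catch drift while the diff is still small enough to actually read.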

Model Landscape as of March 2026

| Model | SWE-bench Verified | Notes |
| --- | --- | --- |
| Claude Opus 4.6 | ~80% | Optimized for agentic tasks, autonomous operation |
| GPT-5.3-Codex | ~77% | Leads Terminal-Bench 2.0 |
| Llama 4 Scout | In progress | 10M token context, open-source |

Open-source is worth watching. Meta's Llama 4 Scout offers a 10 million token context window, and for code generation, document processing, and classification tasks, it's genuinely competitive with closed models. Teams that need to keep inference on-premises have a real option now.

What Shifts for Engineers

Agentic coding doesn't make engineers unnecessary. It changes what the work looks like:

System design becomes more important, not less. When the model handles local implementation, getting the overall architecture right is more consequential. A well-designed system gives the model clear boundaries. A poorly designed one produces coherent-looking code that doesn't fit together.
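One way "clear boundaries" cashes out in code: a narrow, explicitly typed interface the model implements against. A hypothetical TypeScript sketch, where the `TokenStore` name and its methods are invented for illustration:

```typescript
// The boundary you hand the model: a small contract, not a codebase tour.
// Everything behind this interface is replaceable without touching callers.
interface TokenStore {
  save(userId: string, tokenHash: string, expiresAt: Date): Promise<void>;
  revoke(tokenHash: string): Promise<void>;
  isActive(tokenHash: string): Promise<boolean>;
}

// A throwaway in-memory implementation, useful for tests and local runs.
class InMemoryTokenStore implements TokenStore {
  private tokens = new Map<string, { userId: string; expiresAt: Date }>();

  async save(userId: string, tokenHash: string, expiresAt: Date): Promise<void> {
    this.tokens.set(tokenHash, { userId, expiresAt });
  }

  async revoke(tokenHash: string): Promise<void> {
    this.tokens.delete(tokenHash);
  }

  async isActive(tokenHash: string): Promise<boolean> {
    const entry = this.tokens.get(tokenHash);
    return entry !== undefined && entry.expiresAt.getTime() > Date.now();
  }
}
```

With a contract this small, the model's output either satisfies it or doesn't; there is far less room for coherent-looking code that doesn't fit together.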

Reading code becomes as important as writing it. If the model writes the first draft, engineers spend more time in review. Fast, accurate code reading — spotting the subtle issue in 200 lines of AI-generated TypeScript — is a skill worth developing.
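A hypothetical example of what "the subtle issue" often looks like in TypeScript: a fire-and-forget promise that type-checks cleanly and passes a superficial test. The function names here are invented:

```typescript
// BUG: forEach ignores the promises returned by revoke(). The function
// resolves before any revocation finishes, and a failing revoke() becomes
// an unhandled rejection instead of a surfaced error.
async function revokeAllNaive(
  tokens: string[],
  revoke: (t: string) => Promise<void>
): Promise<void> {
  tokens.forEach((t) => revoke(t));
}

// Correct: collect the promises and await them all, so both completion
// and failures propagate to the caller.
async function revokeAll(
  tokens: string[],
  revoke: (t: string) => Promise<void>
): Promise<void> {
  await Promise.all(tokens.map((t) => revoke(t)));
}
```

The diff between the two is two identifiers. Spotting it quickly inside a large generated change is exactly the reading skill in question.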

Prompt structure is a craft. Decomposing a problem into well-scoped tasks, providing the right context, and knowing when to push back on the model's output requires a specific kind of thinking that gets better with practice.

Tests written first keep the model honest. TDD pairs naturally with agentic coding: write the test, hand the failing test to the model as the target, review the implementation. The model has a concrete definition of done to work toward.

Takeaway

The 90% figure from Anthropic is a milestone, not a ceiling. The trajectory points toward models handling a larger share of routine implementation work. The engineers who adapt earliest — building the workflows, calibrating the review habits, and sharpening the skills that AI can't replace — will have a clear advantage in the next 12 to 24 months.