Building one codebase that holds up with multiple AIs

The biggest difference isn't which AI dev tool you use.

It's how tightly you structure your process, and how you get multiple AIs to catch each other's blind spots.

It's been quiet on my end, mostly because my full focus has been, and still is, on developing the next application: the most complex one so far. But more on that later.

What I do want to share here is how I build. Having worked with various AI dev tools for a while, I keep coming back to one thing: the biggest difference isn't the tool itself, but the discipline around it. That has become the real win for me, and I'll return to it below.

// DOCS_FIRST

First the experience, then the method

Through earlier proofs of concept, Snuffl.app, an internal sales pipeline application, AI Dev Tools and a few smaller projects such as the Kijklijst app, I've gained a fair amount of experience with different AI dev tools and tech stacks. What struck me from the start: organisation and process need to be tightly arranged to keep an overview and avoid flying off in all directions. That was already true before the generative AI era, but it was arranged differently then.

My approach follows BMAD (Breakthrough Method for Agile AI-Driven Development), with one core principle: documentation before code.

Every change starts with a requirements document and, when there's an architectural choice, an ADR (Architectural Decision Record) that captures what we build and why. Only then comes the code, in that same order: first the docs commit, then the implementation. That way, not only the result is preserved but also the reasoning behind it. Agile, but with a paper trail that holds up.
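For anyone who wants to enforce that docs-then-code order mechanically, here is a minimal sketch in Python. It assumes you can list a feature branch's commits oldest-first with the file paths each one changed, and that documentation lives under docs/ or in .md files. The names `Commit`, `is_doc_file` and `check_docs_first` are hypothetical and purely illustrative, not taken from the platform itself.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Assumption: documentation lives under docs/ or in Markdown files.
# Adjust DOC_PREFIXES to your own repository layout.
DOC_PREFIXES = ("docs/",)


@dataclass
class Commit:
    sha: str
    files: list[str]  # paths changed in this commit, relative to the repo root


def is_doc_file(path: str) -> bool:
    """A file counts as documentation if it sits under a doc folder or is Markdown."""
    return path.startswith(DOC_PREFIXES) or PurePosixPath(path).suffix == ".md"


def check_docs_first(commits: list[Commit]) -> list[str]:
    """Return violations of 'first the docs commit, then the implementation'.

    `commits` is a feature branch in chronological order (oldest first).
    """
    violations = []
    docs_seen = False
    for commit in commits:
        doc_files = [f for f in commit.files if is_doc_file(f)]
        code_files = [f for f in commit.files if not is_doc_file(f)]
        if code_files and not docs_seen:
            violations.append(f"{commit.sha}: code changed before any docs commit")
        if doc_files and not code_files:
            # Only a pure documentation commit opens the gate for implementation.
            docs_seen = True
    return violations


if __name__ == "__main__":
    branch = [
        Commit("a1", ["docs/requirements/export.md", "docs/ADR/0007-export.md"]),
        Commit("b2", ["src/export.py", "tests/test_export.py"]),
    ]
    print(check_docs_first(branch))
```

A commit that mixes docs and code does not count as the docs commit here, which keeps the two steps visibly separate in the history.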

On top of that, automated tests are written for every change to safeguard quality. I work with a combination of Claude (Cowork & Code) and Codex. Alongside the documents for each of the platform's developments, there are more general documents on approach, lessons, architecture and security, which the LLMs use to get up to speed.

// THE_METHOD

My near-standard approach to a new development

1
I share my vision for a component to be developed with Claude Cowork so it can think along, and where needed I gather input to feed it.
2
Claude Cowork reads up, starts researching, and uses sub-agents to explore the topic from different angles.
3
Claude then comes back with extensive feedback that surprisingly often contains genuinely good ideas I hadn't thought of myself, but that fit perfectly.
4
Once the vision is thought through and ready for development, I share it with Claude Code, which takes the lead. Claude Code digs in first and often comes back with good input on the code and on lessons from earlier developments, which are also kept in a maintained document.
5
After some back-and-forth and agreement, I give the go-ahead to write the requirements document, and an architecture document too if needed.
6
Once the documentation is ready, I hand it to Codex for an extensive review, using a prompt written by Claude Code. That often surfaces specific issues that Claude Code (and Fable 5 too, by the way) had overlooked.
7
The review goes back to Claude Code, which checks the points against the code, conventions, database and security. Often they're valid. Very often. This is where I feel an enormous gain over working with just one LLM.
8
After the changes, Codex does another review to confirm everything was processed correctly, sometimes for several rounds. In some cases I flip it: Codex in the lead, Claude Code as reviewer. The picture stays the same: Claude Code still pulls out things Codex missed.

Building may only begin once I've approved the documents, according to the strict rules in place. It looks like a lot of preparatory work, and relatively speaking it is: I spend longer on the documentation and on jointly thinking through behaviour and implementation than the LLMs need to write the code.
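Steps 6 to 8 plus the approval gate form a small loop. Below is a minimal sketch of that control flow in Python, with the models replaced by plain callables: `review` and `revise` are hypothetical stand-ins for a conversation with the reviewer and the lead, and `max_rounds=3` is an arbitrary cap, not a rule from my process. Flipping roles simply means passing the other model's functions.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins: in practice these are conversations with the models
# (e.g. Claude Code as lead, Codex as reviewer), not function calls.
Review = Callable[[str], list[str]]       # returns open findings on a document
Revise = Callable[[str, list[str]], str]  # returns the document with findings addressed


@dataclass
class ReviewResult:
    document: str
    rounds: int
    confirmed: bool  # True when the reviewer had no remaining findings
    history: list[list[str]] = field(default_factory=list)


def cross_review(document: str, review: Review, revise: Revise, max_rounds: int = 3) -> ReviewResult:
    """Reviewer critiques, lead revises, repeat until the reviewer confirms or rounds run out."""
    history = []
    for round_no in range(1, max_rounds + 1):
        findings = review(document)
        history.append(findings)
        if not findings:
            return ReviewResult(document, round_no, True, history)
        # The lead checks each finding against code, conventions, database and
        # security before changing anything; here that judgment lives in `revise`.
        document = revise(document, findings)
    return ReviewResult(document, max_rounds, False, history)


def may_start_building(result: ReviewResult, human_approved: bool) -> bool:
    """Building starts only after a clean review AND explicit human approval."""
    return result.confirmed and human_approved
```

Human approval is deliberately a separate input: a clean review is necessary but not sufficient, in line with treating AI reviewers as advisors rather than authorities.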

// THE_REAL_WIN

AIs that catch each other's blind spots

This is the insight I most want to pass on. Whether I put Claude Code or Codex in the lead, the other consistently pulls out things the first one missed. Not occasionally, but structurally. In my experience, having two strong models review each other produces sharper code than a single model, however good that one is.

Now and then I also have Gemini do a full code review. It isn't involved in building; it only looks over the shoulder on request. What stands out in those full codebase reviews is that all three (Codex, Claude Code, Gemini) reach the same overall verdict, but Gemini comes back with the fewest points to fix. Codex and Claude Code are much sharper there, Claude most of all.

// PRINCIPLES

What it yields

A codebase that's documented from top to bottom, with these principles (from the lessons document) as the basis for all development:

Security and scalability first

Data and compute frugality as a priority when working with data

Reuse over rewrite

UX deserves special attention

Follow existing conventions, solve structurally, no quick fixes

Update documentation with every change; AI reviewers are advisors, not authorities

And this is how the verdict from the full review of this platform read:

Core verdict: this is a technically mature, unusually well-engineered platform.
Across all four technical angles (engineering, architecture, security and front-end), the same picture emerges: the foundations that genuinely matter for a multi-tenant platform that writes live changes to external systems on behalf of users (transactional integrity, idempotency, tenant isolation, audit trail and auth) are demonstrably implemented in code, not just claimed in documentation. The engineering discipline (docs-first BMAD workflow, blocked paths, double review, smoke evidence) sits well above the usual level for a product of this size.

There are always issues that creep in despite the care. And that's precisely the advantage of this way of working: because everything is documented and structured, those issues are easy to resolve in a targeted way.

// REVIEW_PROMPT

The prompt I use for a full review

For anyone who wants to try it themselves, this is the prompt I use to request a full codebase review (translated from the Dutch original):

Review all .md files in the project folder. Then do an extensive analysis of the existing codebase. Make no edits; I want a report.

Key principles: security, scalability, architecture and overall setup, legal / compliance readiness, optimal UX.

First review the important docs: CLAUDE.md, AGENTS.md, SECURITY.md, PROGRESS.md, tasks/lessons.md, docs/architecture/, docs/ADR/, docs/requirements/, control/manifest.yaml, and the docs linked within them.

Then review all code and logic.

Review it from different angles with specialised agents: Software Engineer, Software Architect, Senior Security Officer, Senior Compliance Officer, UX Expert, Front-end developer.

Turn it into a report for a Product Owner, with appendices containing the details (complete, may be technical). And deliver a summary with conclusion.
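Before pasting the prompt, it can help to check that the documents it names actually exist, so the reviewer isn't sent looking for files that aren't there. A small sketch in Python, assuming the repository layout listed in the prompt; `missing_review_docs` and `REVIEW_DOCS` are hypothetical names for illustration.

```python
from pathlib import Path

# The documents the review prompt asks the model to read first.
# Entries ending in "/" must exist as directories; the rest as files.
REVIEW_DOCS = [
    "CLAUDE.md", "AGENTS.md", "SECURITY.md", "PROGRESS.md", "tasks/lessons.md",
    "docs/architecture/", "docs/ADR/", "docs/requirements/", "control/manifest.yaml",
]


def missing_review_docs(root: str) -> list[str]:
    """Return the listed review documents that are absent under `root`."""
    base = Path(root)
    missing = []
    for entry in REVIEW_DOCS:
        path = base / entry
        exists = path.is_dir() if entry.endswith("/") else path.is_file()
        if not exists:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    print(missing_review_docs("."))
```

An empty list means the prompt can be used as-is; otherwise, either add the missing documents or drop them from the prompt.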

// CLOSING

The application isn't live yet

But I hope to share and show more before too long. Stay tuned. In the meantime, want to exchange ideas about a docs-first AI development process? We're happy to think along with you.
