There is a debate running through the AI developer community right now about multi-agent systems. One side builds "virtual software companies": AI systems with a CEO, a CTO, a PM, and a QA engineer, running what looks like an agile sprint. The other side calls this theatre: fake boundaries, wasted tokens, and a fundamentally misguided attempt to paste human org charts onto a technology that doesn't need them.
Both sides are partially right. And the part that matters most for anyone evaluating or buying AI systems is being missed entirely.
The role is not the architecture. The reasoning framework is.
When you see an AI system described as having a “Research Agent” and a “Sales Agent,” the interesting question is not what they’re called. It’s what they’re designed to think about, and what they’re not allowed to do.
Anthropic’s Claude Code uses a multi-subagent architecture, with one of those subagents — Explore — deliberately built read-only. It cannot create files, modify code, or delete anything. Not because it was given the “Explorer” job title, but because the engineers recognised that the exploration phase and the execution phase require fundamentally different constraints. Mixing them creates downstream errors. So they separated them — not by name, but by permission boundary and reasoning scope.
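As a rough sketch of what a permission boundary looks like in code (an invented illustration, not Anthropic's implementation; the `Agent` class, the `dispatch` stub, and the tool names are all hypothetical):

```python
def dispatch(tool: str, **kwargs):
    """Stub dispatcher; a real system would route to actual tool handlers."""
    return f"ran {tool} with {kwargs}"

READ_ONLY_TOOLS = {"read_file", "list_directory", "search_code"}

class Agent:
    def __init__(self, name: str, allowed_tools: set[str]):
        self.name = name
        self.allowed_tools = allowed_tools

    def call_tool(self, tool: str, **kwargs):
        # The boundary is enforced before dispatch: whatever the model
        # asks for, this component cannot reach a write-capable tool.
        if tool not in self.allowed_tools:
            raise PermissionError(f"{self.name} may not call {tool!r}")
        return dispatch(tool, **kwargs)

explorer = Agent("explore", allowed_tools=READ_ONLY_TOOLS)
print(explorer.call_tool("read_file", path="src/main.py"))  # allowed
# explorer.call_tool("delete_file", path="src/main.py")     # raises PermissionError
```

The point is where the check lives: in the dispatch path, not in the prompt. A model can be talked out of an instruction; it cannot be talked out of a tool it was never given.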
Google’s production multi-agent systems for code migration separate planners, orchestrators, coders, and AI-based judges into distinct components, each with defined inputs, outputs, and authority. xAI’s Grok 4.20 ships with four parallel agents by default — a captain that adjudicates, a research agent, a logic agent, and an adversarial reviewer whose explicit job is to disagree — scaling to sixteen for premium deep-research mode.
None of this looks like a virtual org chart. It looks like purpose-built reasoning tools with defined boundaries — which is an entirely different thing.
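A minimal sketch of that pattern, assuming nothing about Google's or xAI's internals: scoped components run in parallel, and one adjudicating component owns the final answer. Every name below is invented for illustration, and `run_agent` stands in for a real model call.

```python
# Illustrative captain-plus-parallel-agents pattern; invented names,
# not any vendor's actual implementation.
from concurrent.futures import ThreadPoolExecutor

def run_agent(role_prompt: str, task: str) -> str:
    """Stand-in for an LLM call with role_prompt as the system message."""
    return f"[{role_prompt.split('.')[0]}] draft for: {task}"

ROLES = {
    "research": "Gather evidence relevant to the task. Do not draw conclusions.",
    "logic": "Reason step by step from the evidence given. Do not gather new facts.",
    "adversary": "Find weaknesses in the emerging answer. Do not validate it.",
}

def solve(task: str) -> str:
    # Each scoped agent sees only its own frame, in parallel.
    with ThreadPoolExecutor() as pool:
        futures = {role: pool.submit(run_agent, prompt, task)
                   for role, prompt in ROLES.items()}
        drafts = {role: f.result() for role, f in futures.items()}
    # The captain adjudicates; it alone has authority over the output.
    captain_prompt = ("Weigh the drafts, give the adversary's objections "
                      "real standing, and produce the final answer.")
    return run_agent(captain_prompt, str(drafts))

print(solve("Plan the database migration"))
```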
So what does this mean if you’re not building AI systems yourself?
It means the marketing language around AI agents is often describing the communication layer, not the engineering layer. When a vendor shows you a demo with named AI personas handing off tasks to each other, the right question is: what is each one actually constrained to think about, and what is it prevented from doing?
If the answer is “it’s just a label on the same underlying model,” you’re looking at a demo. If the answer involves different permission scopes, different reasoning prompts, and defined handoff logic — that’s an actual system.
The distinction matters because AI systems that are genuinely well-designed do a specific thing well: they narrow the reasoning scope so the output is more reliable. A generalist AI asked to do everything produces averaged thinking. An AI with a defined reasoning frame — “your job is to find the weaknesses in this plan, not validate it” — produces something categorically more useful.
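Concretely, narrowing the reasoning scope can be as simple as the system prompt a component is pinned to. A minimal sketch, where `complete` is a hypothetical stand-in for whatever chat-completion API you use:

```python
def complete(system: str, user: str) -> str:
    """Hypothetical stand-in for a chat-completion API call."""
    return f"(model output under frame: {system.split('.')[0]!r})"

GENERALIST = "You are a helpful assistant."
CRITIC = ("Your job is to find the weaknesses in this plan, not validate it. "
          "List the three most likely failure points and what evidence would "
          "confirm each. Do not propose fixes.")

plan = "Migrate all customer data to the new platform over one weekend."
# Same model, two frames; only the system prompt differs. In a real
# system, the first call tends toward an agreeable summary, the second
# toward a focused critique.
print(complete(GENERALIST, plan))
print(complete(CRITIC, plan))
```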
The marketing shortcut, and why it works
“Virtual team of AI agents” is easier to explain than “orchestrator with specialised tool access and permission constraints.” One sounds like a team. The other sounds like plumbing. So the role language persists, even when what’s underneath is much more interesting.
This is not inherently dishonest. Used well, role framing gives a human operator a mental model for how to work with a system. Used badly, it’s a demo that falls apart under real workload because the reasoning was never actually designed — just named.
The tell is whether the “roles” were designed around task structure and constraint, or around making a slide deck legible.
What to actually look for
When evaluating any AI system — whether you’re buying one or building one — three questions cut through the positioning:
- What is each component prevented from doing, and why?
- How does information flow between components — and what gets preserved versus lost at each handoff?
- Is the reasoning scope genuinely different between components, or is it the same model with different labels?
The third question is the hardest to answer from a demo. It’s also the most important.
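The second question, about handoffs, is the easiest to make concrete. If a handoff is a typed payload rather than a free-form transcript, then what survives each boundary is an inspectable design decision. The schema below is invented for illustration:

```python
# Illustrative handoff schema; field names are invented. Anything not
# captured in these fields is deliberately dropped at the boundary.
from dataclasses import dataclass

@dataclass(frozen=True)
class Handoff:
    task: str                # what the next component must do
    findings: list[str]      # evidence the upstream component preserved
    constraints: list[str]   # limits the next component inherits

explore_to_plan = Handoff(
    task="Propose a migration plan for the payments module",
    findings=["module has 3 external callers",
              "tests cover 60% of code paths"],
    constraints=["exploration is complete; the planner may not re-explore"],
)
print(explore_to_plan.task)
```

A vendor who can show you something like this, rather than agents "chatting" at each other, is describing an engineering layer.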
This is the lens we apply when we build at V8. Axia isn't a generalist AI assistant with a friendly persona on top; it's a managed operating layer where the reasoning components are designed around what each one is constrained to do, with human approval on every commercial decision. That architecture choice is invisible from a homepage. It shows up in whether the system holds up under real workload.
Alan Law is founder of V8 Global and architect of Axia. Leadership Insight posts examine the structural decisions behind AI-native commercial systems.