March 16, 2026

The Competitive Moat Is Still Human

Mark Esposito, Chief Economist at micro1

AI will be as human as we make it. Not as reassurance. As responsibility. And the more I sit with that idea, the more I think it captures something the mainstream discourse about AI keeps getting wrong.

We tend to ask whether AI will replace humans. That is the wrong question. The more consequential question is whether we are building AI systems that actually reflect the full texture of human judgment, not just its most legible outputs. The answer, for now, is that we are only beginning to try.

From Reactive Feedback to Human Infrastructure

The first generation of human involvement in AI training was essentially reactive. Systems would produce outputs, humans would rate them, and that preference signal would flow back into training. It was better than nothing. It was also a fairly thin representation of how humans actually think, decide, and work.
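To make concrete how thin that signal is, here is a minimal, hypothetical sketch of what preference labeling reduces a human judgment to. The names and data shapes are illustrative, not a description of any particular lab's pipeline.

```python
# Illustrative sketch of first-generation preference feedback
# (hypothetical names; not any specific lab's pipeline).
# A rich professional judgment is collapsed into a single binary comparison.

from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    output_a: str
    output_b: str
    preferred: str  # "a" or "b" -- the entire human contribution


def collect_preference(prompt: str, output_a: str, output_b: str, rater_choice: str) -> PreferencePair:
    """Record which of two model outputs a human rater preferred."""
    assert rater_choice in ("a", "b")
    return PreferencePair(prompt, output_a, output_b, rater_choice)


# Everything the rater knew about *why* one draft was better -- the reasoning,
# the domain context, the intermediate steps -- is discarded. Only one bit survives.
pair = collect_preference(
    prompt="Summarize this contract clause.",
    output_a="Draft summary A...",
    output_b="Draft summary B...",
    rater_choice="a",
)
```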

What is emerging now is something more ambitious. The shift is from preference labeling toward replicating the full workflow of an end-to-end professional task. A law firm delivering a redlined contract does not do so through a series of simple judgments. It involves associates, outside counsel, financial experts, and multiple layers of domain knowledge converging into a single deliverable. That is the kind of complexity that makes AI systems genuinely useful, and it is also the kind of complexity that a thumbs-up or thumbs-down cannot capture.

This is what the phrase "human as infrastructure" is pointing toward. Human judgment is not a finishing layer applied after the technical work is done. It is the foundational material the system is built on. That reframe matters, because it changes what we invest in, what we measure, and what we consider a failure.

The Quality Control Problem Nobody Talks About Enough

Injecting human expertise at scale is harder than it sounds, and most of the difficulty is invisible to people who have not tried to do it seriously.

Even in early machine learning, crowd workers mislabeled entire subspecies of cats in a major benchmark dataset because they were drawing on contextual knowledge the model would never have access to. An annotator would look at a pixelated image and infer a label that seemed obviously right, because humans are extraordinarily good at inferring from sparse signals. The model had no basis for that inference.

This problem has not gone away. It has become more sophisticated. When we ask domain experts to evaluate multi-step agentic workflows, we are asking them to assess something they may instinctively understand but struggle to articulate in ways that generalize. Two experienced tax accountants given the same agentic workflow may produce legitimately different intermediate outputs, not because one is wrong, but because the problem space allows multiple valid paths.

The implication is that quality control for expert human data cannot be reduced to agreement rates. It requires identifying the intermediate checkpoints where verification is possible, the moments in a complex workflow where we can confirm a step is correct before the system moves on. Verifiable intermediate steps, not end-to-end agreement, are the practical unit of quality in complex agentic training.
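As a rough illustration of what checkpoint-level verification might look like, the sketch below assumes a multi-step workflow in which each step carries its own check. The step names and checks are hypothetical, not a description of how any production system does this.

```python
# Hypothetical sketch: quality control via verifiable intermediate checkpoints
# rather than end-to-end agreement between experts.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    output: dict
    verify: Callable[[dict], bool]  # a check that can confirm this step before moving on


def checkpoint_review(steps: list[Step]) -> list[str]:
    """Return the names of steps that fail their verification check."""
    return [step.name for step in steps if not step.verify(step.output)]


# Two experts may take different valid paths, but some facts are checkable:
# totals reconcile, required schedules are attached, cited figures match sources.
workflow = [
    Step("extract_income_figures",
         {"w2_total": 84000, "line_items": [84000]},
         verify=lambda o: sum(o["line_items"]) == o["w2_total"]),
    Step("apply_deductions",
         {"deductions": 13850, "schedule_attached": True},
         verify=lambda o: o["schedule_attached"]),
]

print(checkpoint_review(workflow))  # [] means every verifiable checkpoint passed
```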

Context Is Not Optional

There is a related challenge that tends to get underestimated in enterprise AI deployments: the assumption that domain expertise transfers cleanly across contexts.

It does not. Consider a recruitment AI system designed not to fill one or two positions on a standard timeline, but to identify 150 qualified candidates across highly diverse domains within days. An external evaluator with deep recruitment expertise will default to the norms they know, and those norms are simply wrong for that use case. Their expertise is real. Their context is mismatched.

This is a generalizable problem. Enterprise AI systems fail not because they lack intelligence, but because the human expertise feeding them is evaluated against the wrong frame. Getting this right requires instruction guidelines that are genuinely self-contained: guidelines that explain the business context in enough depth that even a genuine expert has to reorient to the use case before they start evaluating. That is painstaking work. It is also indispensable.
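One way to picture what "genuinely self-contained" could mean in practice is to treat the guidelines as a structured brief the evaluator must absorb before rating anything. The sketch below is illustrative, with hypothetical fields rather than a real template.

```python
# Hypothetical structure for a self-contained evaluation brief
# (illustrative fields, not a template any particular team uses).
# The aim is that a genuine expert reorients to this use case's norms
# instead of defaulting to the norms of their own professional context.

from dataclasses import dataclass


@dataclass
class EvaluationBrief:
    business_context: str    # why this system exists and who it serves
    success_criteria: list   # what a good output looks like *here*
    explicit_non_goals: list # familiar norms that do NOT apply to this use case
    worked_example: str      # one fully annotated example of a correct evaluation


brief = EvaluationBrief(
    business_context=("Identify 150 qualified candidates across diverse domains "
                      "within days, not one or two hires on a standard timeline."),
    success_criteria=["breadth of coverage across domains", "speed of shortlisting"],
    explicit_non_goals=["multi-week vetting cycles", "single-role depth screening"],
    worked_example="One annotated shortlist showing what a passing evaluation looks like.",
)
```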

The Competitive Moat Is Still Human

Perhaps the most counterintuitive insight here is also the one I find most defensible: the more capable AI systems become, the more strategically important human judgment is, not less. The argument is not sentimental. It is structural. Human values and social expectations change. Laws change. What constitutes acceptable output in a given domain today will not be identical to what is acceptable in two or three years. If you want your AI system to remain aligned with humans, you need to keep distilling updated human judgment into it. That need does not diminish as models improve. It persists precisely because the goalposts are always moving.

This plays out as a competitive reality too. If two companies are building in the same space, and one commits to removing human judgment from the loop entirely while the other finds ways to incorporate domain expertise at key points, the latter will almost certainly win. Not because of sentiment, but because the system will be more reliable, more contextually accurate, and better calibrated to what users actually need.

What This Asks of Us

If AI will be as human as we make it, then making it well is a serious professional responsibility. It means treating AI training not as a temporary data job but as one of the more consequential forms of knowledge work of our era. It means building evaluation systems capable of handling genuine complexity, not just legible agreement. And it means being honest that the quality of what goes in determines, more than anything else, the quality of what comes out.

The technology is capable of capturing extraordinarily complex human behavior. Whether we give it the right material to learn from is entirely up to us.

This piece draws on a conversation hosted by the micro1 Virtual Series, featuring Ali Ansari (Founder and CEO, micro1) and Andrew Maas (VP of AI, micro1 & Professor at Stanford).

Mark Esposito

Dr. Mark Esposito is a public policy scholar and social scientist affiliated with Harvard’s Berkman Klein Center for Internet and Society and the Center for International Development at Harvard Kennedy School. He leads policy clinics on the governance of technology worldwide. He is a Professor at Hult International Business School and Adjunct Professor at Georgetown University. He has co-founded several AI ventures, including Nexus FrontierTech, the AI Native Foundation, and The Chart ThinkTank, and serves as Chief Economist at micro1, a Silicon Valley–based AI lab. He is a member of the World Economic Forum’s Global AI Alliance, a Senior Advisor at Strategy& (PwC), a professorial fellow of the Mohammed Bin Rashid School of Government, and the co-author of 14 books.