February 19, 2026

AI is becoming more human than ever

Ali Ansari, CEO & Founder, micro1

Aarush Gupta, VP Human Data, micro1

This is not about AI.

It is about the people who shape and operate it.

AI can — and will — be as human as you can imagine.

As AI progresses, it becomes more human — not just in what it can do, but in how it learns. The real shift isn’t just bigger models; it’s increasing the fidelity with which human judgment is captured and operationalized over longer horizons.

From the 1950s–1990s, AI was rule-based (GOFAI): hand-written logic with no learning. In the 2000s, statistical machine learning emerged — models learned correlations from labeled data, but not true judgment. Around 2016–2019, systems like AlphaGo showed self-play reinforcement learning, optimizing closed objectives with minimal human input in constrained environments.

The major inflection came in 2018–2022 with internet-scale pretraining. Large language models absorbed vast amounts of human text, internalizing language and fragments of collective reasoning. Then in 2022–2024, reinforcement learning from human feedback (RLHF) made explicit human judgment central — tuning models not just for accuracy, but for helpfulness and alignment.

Now, frontier systems increasingly rely on structured expert data: doctors, lawyers, finance professionals, and other specialists whose domain-specific judgment is distilled directly into model weights. The trajectory is clear: from rules, to patterns, to self-optimization, to absorbing human language, to encoding structured human expertise. AI is becoming less a rule engine and more a compressed aggregate of human cognition.

Paradoxically, even as expert human judgment is distilled into models, we’re still at the very beginning of AI being as human as it will ever be. 

Being human-first in this era means three things:

  1. Building models that operate at human (and eventually superhuman) levels across domains while remaining aligned with human preferences.
  2. Treating the work of training AI — the structured judgment humans provide — as one of the most important jobs in the economy.
  3. Designing interfaces that allow humans to supervise and steer AI effectively across every function.

For us to fully utilize the massive compute spend, which is fundamentally a bet on future inference, we must unlock many new model capabilities. Those capabilities will be unlocked through the continued distillation of expert, structured judgment across all job functions.

The reason we are still at the very beginning is that most datasets today are still question-and-answer based. Many of these questions are now highly complex, requiring spreadsheets, memos, or other artifacts as responses. But fundamentally, experts are still creating rubrics that define the “golden answer” to complex questions — they are not replicating truly complex tasks.

Where we go from here is increasing the horizon of work, by turning questions into tasks. A task can be defined as a sequence of decisions the model must make on its own in order to act step by step.

Even if you assume a high per-step accuracy rate — say 90% across an average domain — and a task requires answering 20 sequential questions correctly, the total success rate compounds: 0.9^20 ≈ 12%. That means the task succeeds only 12% of the time. Even at 99% per-step accuracy, 0.99^20 ≈ 82%. While 82% sounds high, for any critical system, where lives, financial outcomes, or legal decisions depend on the result, that would still be a disaster.
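The compounding above can be checked directly. A minimal sketch, assuming each step succeeds independently with the same per-step accuracy (the simplification the article's arithmetic implies):

```python
def task_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that every sequential step succeeds, assuming
    independent steps with identical per-step accuracy."""
    return per_step_accuracy ** steps

# The article's two examples: 20 sequential steps.
for acc in (0.90, 0.99):
    rate = task_success_rate(acc, steps=20)
    print(f"per-step {acc:.0%} over 20 steps -> task succeeds {rate:.0%}")
```

Running this reproduces the figures in the text: roughly 12% for 90% per-step accuracy and roughly 82% for 99%, which is why per-step gains translate so poorly to long-horizon reliability.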

So the next step in building human-first AI models is replicating real human tasks at much longer horizons. Not question-and-answer prompts — but full projects.

Projects replicate the real world.

In a legal case, for example, a partner may lead the case with three associates working underneath, two outside counsel firms providing opinions, and a finance team validating the numbers before submission. A single project may require 500 hours of work (not 20 hours for a short task). Within those 500 hours, there may be 10–15 distinct judgments that together produce a truly economically valuable artifact.

It is rare for meaningful artifacts to be created by a single individual. If the partner did the associate work, the finance review, and the outside counsel analysis alone, the outcome would be significantly worse.

What is being modeled in these environments is not just expertise, but coordinated judgment. The hardest human capability is not knowing the answer; it is delegation, review, conflict resolution, operating under uncertainty, and making cross-domain trade-offs. This is the intelligence within real-world projects (and organizations). It must be the next real training target.

The same structure appears in every domain. 

Take another example: a project manager finalizing a sprint setup. The model’s “task” may appear to be generating a sprint plan from a backlog. But in reality, a project manager collaborates with engineers on timelines, designers on scope and direction, and QA on end-to-end testing strategy. If one person did all of that alone, the sprint artifact would be quite bad.

The world where expert human judgment is distilled into models as long-horizon, multi-actor projects is not imminent. The point of painting this picture is to show the long-horizon dimension of scale.

Over time, the world itself becomes the training environment. And the world is deeply human.

If models train on environments that truly represent how humans coordinate, decide, and produce value together, they will become as human-aligned as we can realistically make them. We are already near the ceiling on expertise. We already have PhDs, professors, and industry experts actively contributing. There is not much higher to climb on pure domain expertise.

The dimension left to scale is horizon and real-world fidelity. And thankfully, that dimension is effectively infinite. Because the horizon only ends when you have modeled the real world itself.

In addition to the horizon of tasks, there is another dimension of scale: constantly changing human judgment.

The objective keeps moving because the standards by which humans evaluate outputs change as systems are deployed. Reinforcement learning environments are closed loops over reasoning: models generate trajectories, humans score them, and those scores become the reward for future behavior. As long as judgment is stable, optimization works. But once real outputs introduce edge cases, what counted as acceptable last month begins to look incomplete or risky. If the scoring does not evolve with that shift, the system continues to optimize for an outdated definition of success. Metrics improve while reality has drifted.

Over time, these small shifts compound. The environment quietly accumulates contradictions. Months later the model is called unpredictable. It is optimizing a moving target that was never tracked.

This is most visible in high-stakes domains, where much of the judgment is tacit. Two experts may approve the same output for different underlying reasons, yet the training signal collapses that disagreement into a single label. Correct answers reached through shallow reasoning receive the same reward as those produced through careful reasoning, so the model learns surface correctness with very little safety margin and performance degrades on rare or adversarial cases. At a large scale, reviewers also begin to evaluate outputs relative to the model’s past behavior, and optimization starts reinforcing its own history rather than human intent.

Progress at that point depends on treating judgment itself as a first-class dataset. The environments that continue to compound preserve disagreement instead of flattening it, version their standards so the model learns which definition applies when, track shifts in risk tolerance as they occur, and reward the strength of the reasoning rather than just the final answer. This makes the system slower and harder to benchmark, but it keeps it aligned with current human intent instead of yesterday’s rubric.
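One way to picture “judgment as a first-class dataset” is a record that versions the rubric, keeps every reviewer’s verdict and rationale instead of collapsing them into one label, and tracks risk tolerance at review time. The field names below are illustrative assumptions for this essay, not an existing schema:

```python
from dataclasses import dataclass, field

@dataclass
class Judgment:
    reviewer: str
    verdict: str           # e.g. "approve" or "reject"
    rationale: str         # why, so shallow and careful approvals stay distinguishable
    risk_tolerance: float  # the reviewer's tolerance at the time of review

@dataclass
class ScoredTrajectory:
    trajectory_id: str
    rubric_version: str    # which definition of "good" applied to this score
    judgments: list[Judgment] = field(default_factory=list)

    def reward(self) -> float:
        """Mean approval rate across reviewers. Disagreement is preserved
        in `judgments` rather than flattened into a single label."""
        if not self.judgments:
            return 0.0
        approvals = sum(j.verdict == "approve" for j in self.judgments)
        return approvals / len(self.judgments)
```

A trajectory approved by one expert and rejected by another yields a reward of 0.5, but, unlike a single collapsed label, the record still shows who disagreed, why, and under which rubric version.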

Because the objective moves with human judgment, the supply of expert input is not a temporary requirement. It is a permanent part of the constantly changing learning loop.

This makes expert human data not a transitional phase in AI development, but the core mechanism through which intelligence is shaped. If AI systems are meant to increase human abundance, then the work of training that experts carry out must be treated with the seriousness it deserves. The quality, safety, and alignment of AI models are directly tied to the well-being, clarity, and motivation of the experts contributing their judgment.

A physician who distills diagnostic reasoning into a model is not labeling data; they are encoding years of intuition that will assist thousands of other doctors and, in turn, tens of thousands of patients. A lawyer structuring judgment across a complex transaction enables future practitioners to operate with greater precision. An artist aligning generative systems with human standards of beauty shapes millions of downstream interactions. This is cognitive infrastructure.

Putting humans first in this sector means treating experts in proportion to the leverage of their impact. Compensation should reflect the multiplier effect of their expertise. Experts should have clarity on how their work is used and how it improves models. Onboarding must be rigorous and respectful, recognizing that structured judgment is high-skill work, not commoditized labor. Systems should incorporate transparent feedback loops, flexible workflows that preserve autonomy, and ethical guardrails that protect professional standards.

If AI becomes the cognitive layer of the global economy, then the humans shaping it are constructing its ethical and intellectual foundation. As models grow more capable, being human-first does not mean slowing progress — it means elevating the people whose judgment makes progress possible and designing systems where their contribution is respected, amplified, and aligned with long-term human flourishing.

As we build millions of environments where engaged, motivated humans contribute structured judgment that is distilled into models, we will layer AI agents on top of those models. Agents that operate in the real world and perform economically useful tasks. But they cannot (and should not) operate without human supervision.

A human-first approach here means designing agent interfaces where humans are clearly in charge. Humans remain the decision-makers who supervise, calibrate, and ultimately approve outcomes. This does not mean micromanaging every step. It means intentionally designing systems where control is explicit and authority is preserved.

Take recruitment as an example. A strong AI recruiter agent would programmatically post roles to the appropriate platforms, conduct large-scale outreach, interview candidates, generate structured evaluation reports, and advance candidates to the offer stage. But the human recruiter remains central. The human sets up the interview environment, calibrates the first set of candidate profiles, reviews the final reports, and ultimately approves — and sends — the offer to their fellow humans.

The end state of any useful agent should be an interface where the human manages the agent as its operator and decision authority. There should not be a scenario where an agent independently runs an end-to-end hiring process, sourcing through offer, without meaningful human oversight.
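The operator-and-decision-authority pattern above can be sketched as an explicit approval gate: the agent proposes, but nothing irreversible executes without human sign-off. The function and the `send_offer` action below are hypothetical illustrations, not a real API:

```python
from typing import Any, Callable

def run_with_approval(propose: Callable[[], dict],
                      human_approves: Callable[[dict], bool],
                      execute: Callable[[dict], Any]) -> bool:
    """Agent proposes an action; execution happens only after an explicit
    human approval. The execute step is structurally unreachable without it."""
    proposal = propose()
    if human_approves(proposal):
        execute(proposal)
        return True   # approved and executed
    return False      # blocked at the human gate
```

In the recruiting example, `propose` would assemble the offer, `human_approves` would stand in for the recruiter's review interface, and `execute` would send it — control stays explicit because the sending path only exists inside the approval branch.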

This is how humans remain in control, and why economically significant decisions ultimately require human judgment. Importantly, this does not limit AI’s impact. A single recruiter, augmented by an agent, could interview 1,000 candidates per day, source 10,000 if needed, and reach offer-stage decisions with 10x the throughput in a fraction of the time. The role of the human evolves. It becomes more strategic, higher leverage, and more enjoyable — while maintaining control and accountability over the AI system. 

As the horizon expands, the world becomes a training environment.
And the world is human.

If we treat judgment as real infrastructure — and build systems where humans set the standards, update the objectives, and remain the final authority — AI will scale without losing alignment.

The future of AI will not be artificial.

It will be human judgment, amplified.
