January 7, 2026
human data will be a $1 trillion/year market

Ali Ansari
CEO & Founder, micro1
This is not a short-term prediction. It is a structural claim about where the economy converges.
To believe this, you need to accept two assumptions:
- Digital and physical intelligence can eventually automate the tedious parts of the economy
- Self-learning intelligence without human data is impossible at the frontier
automation is the most useful & liberating thing humanity can do
If AI systems can automate functions, then automating all functions is the highest-leverage task for humanity.
Automation compresses time. It allows:
- Aspirations to be fulfilled faster, by orders of magnitude
- Humans to focus on the enjoyable, judgment-heavy parts of work while robots and agents handle the rest
As humans gain time, they create more. Net-new work is initially creative and high-value. Over time it becomes legible, repeatable, and ready for automation. Once automated, it continues delivering value while freeing humans to focus on new creative work. This loop is permanent.
Automation does not eliminate human work. It pushes humans toward higher-value, more creative work.
At a societal level, automation reshapes the economics of the world. As AI systems take on more production and coordination, the cost of producing goods and services collapses while availability explodes.
At the same time, distribution becomes increasingly optimal. Digitally and physically intelligent systems coordinate supply and demand with less friction, less waste, and less delay, making access faster, cheaper, and more reliable every year.
AI models learn from humans forever
Every artificially intelligent system learns from humans in some form:
- Demonstrations
- Supervised fine-tuning
- Preference learning
- Complex rubrics and evaluations
- Continual corrections
Even self-play and synthetic data depend on human grounding — humans define objectives, rewards, and what “good” looks like.
As a result:
- Every function in the economy contains useful learning signal
- Every decision, exception, failure, and tradeoff creates data
But raw activity is not enough. That data must be:
- Recorded
- Structured
- Evaluated
- Packaged into usable pipelines
And importantly, functions must continue running while they are being automated. Automation is iterative, not instantaneous.
this creates a universal obligation and opportunity
To iteratively automate functions, every company, government agency, or institution running real operations must consume and produce structured data related to those functions. In most cases, it will not be optimal for them to create or structure that data themselves, due to scale inefficiencies, high fixed costs, and the operational difficulty of producing high-quality, reusable structured data in-house.
We already see this dynamic today. For example, many lawyers produce more leverage per hour working on standardized, structured legal data through platforms like micro1 than they do performing unstructured work inside individual law firms. At micro1, over 1,000 lawyers work in structured data creation and earn on average ~20% more than in traditional firm roles. Law firms themselves are unlikely to become large-scale producers of structured training data, but they will increasingly be consumers of that data, either directly or by having it embedded in the tools they use.
This creates a powerful incentive structure.
Labs that are automating functions will pay for this data because, over the long term, the value gained from incremental automation far exceeds the cost of acquiring the data.
As a result:
- Entities are incentivized to produce high-quality human data not just to automate themselves, but because that data has external market value
- Every hour of work can simultaneously:
  - Run the organization
  - Train AI models
  - Generate additional revenue for the organization
Human labor becomes not just labor to produce goods & services, but a revenue-generating asset on its own.
the ultimate convergence: 5%+ of human time is spent on human data
It’s reasonable to think that most functions in the economy will spend some amount of time trying to automate themselves. Not fully, and not all at once, but continuously pushing work out of the human loop as it becomes repeatable and scalable.
Today, even knowledge workers spend the majority of their time on communication and coordination rather than on what we would consider actual productive work. As automation advances, tedious parts of knowledge work are progressively removed, and automation increasingly absorbs coordination, scheduling, routing, and routine communication. The result is a larger share of human time being spent on judgment-heavy knowledge work.
Even under conservative assumptions, it is reasonable to expect that in a more automated economy roughly 75% of work time is still spent on communication and coordination, while about 25% is spent doing actual work.
Not all of that work needs to be structured. But a meaningful fraction does. Work that produces decisions, judgments, demonstrations, evaluations, and exceptions becomes far more valuable when captured in a structured, reusable form, both to complete the task and to enable future automation. If only one fifth of that actual work is performed in structured environments, that implies roughly 5% of total human labor time is spent generating structured human data.
With global GDP at roughly $100T, and labor representing about 50% of that, total labor spend is around $50T annually. Five percent of that corresponds to roughly $2.5T per year of human time directed at enabling automation, creating demonstrations, feedback, evaluations, and learning signals for AI systems.
Certainly not all of this will become explicit spend in the human data market. Much of it will remain implicit, fragmented, or unpriced. But even with aggressive discounting, you still arrive at something on the order of $1T per year.
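The arithmetic behind this estimate can be laid out as a short back-of-envelope sketch. The inputs are the round figures stated above; the function and variable names are illustrative, and the "priced fraction" is not a figure from the text but simply what a $1T market would imply relative to the $2.5T of human time.

```python
# Back-of-envelope sketch of the estimate above. All inputs are the round
# figures quoted in the text; names are illustrative, not an official model.

GLOBAL_GDP_USD = 100e12      # "global GDP at roughly $100T"
LABOR_SHARE_OF_GDP = 0.50    # "labor representing about 50% of that"
ACTUAL_WORK_SHARE = 0.25     # "about 25% is spent doing actual work"
STRUCTURED_SHARE = 0.20      # "one fifth of that actual work" is structured
TARGET_MARKET_USD = 1e12     # "on the order of $1T per year"


def human_data_time_value(gdp, labor_share, actual_work_share, structured_share):
    """Return the annual labor spend attributable to structured human data."""
    labor_spend = gdp * labor_share                          # about $50T
    data_time_share = actual_work_share * structured_share   # about 5%
    return labor_spend * data_time_share


implicit_spend_usd = human_data_time_value(
    GLOBAL_GDP_USD, LABOR_SHARE_OF_GDP, ACTUAL_WORK_SHARE, STRUCTURED_SHARE
)

# Assumption for illustration: the share of that value that would need to be
# explicitly priced for the market to reach the $1T target.
implied_priced_fraction = TARGET_MARKET_USD / implicit_spend_usd

print(f"Human time spent on structured data: ${implicit_spend_usd / 1e12:.1f}T per year")
print(f"Fraction that must be explicit spend for $1T: {implied_priced_fraction:.0%}")
```

Changing any single input shows how sensitive the conclusion is: halving the structured share, for example, halves the implicit figure and doubles the fraction that would have to be priced.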
automation reshapes labor, it doesn’t shrink it
This results in automation scaling. As automation scales, some of what was spent on human labor is redirected toward:
- Energy
- Compute
- AI labor
However, total human labor spend continues to increase.
Why?
- Automation creates time.
- Time enables creativity.
- Creativity produces net-new functions within the economy.
Those functions are initially done by humans. Over time, they follow the same automation cycle.
Human labor gets more expensive because:
- Human time is finite at any moment
- Creativity and judgment are scarce
- Net-new ideas command premium value
As automation expands, humans concentrate more of their time on higher-leverage work. While total human hours do grow over time, that growth cannot be rapidly accelerated in response to demand. The fastest and dominant way the labor market expands is by increasing the value created per human hour.
As this continues:
- Total human labor spend rises
- A larger share of human time is spent generating learning signals and enabling automation
we should never call it annotation again
The importance of this work in shaping AI means calling it “data labeling” or “annotation” is completely inaccurate. These phrases describe mechanical tasks, when the real value comes from human judgment, expertise, and decision-making expressed in structured form.
A more accurate description is expert human data creation or structured human judgment.
This is how human expertise compounds in an automated economy. It explains why human data scales with automation rather than disappearing, and why it becomes a first-class economic input over time.
human brilliance is needed more than ever
This does not require extreme assumptions. It only requires that automation continues to work, and that intelligence continues to learn from humans. If that is true, then human data is not a phase or a temporary bottleneck. It is a structural input to the economy.
Human judgment is captured, structured, and refined.
That judgment becomes the training substrate of intelligence.
That intelligence, in turn, produces more automation.
As functions are automated, human time is freed. That time is spent creating new functions to automate, and the beautiful cycle continues.
