The biggest mistake companies are making with AI is confusing activity with impact. Tokens consumed, prompts submitted, and hours spent using AI may look impressive on dashboards, but real enterprise value is measured by better decisions, faster execution, and measurable business outcomes.
Meta did not roll out AI quietly. It went all in.
Across all 85,000 employees, the company monitored how every team, across every function, was engaging with AI tools. What that data revealed quietly forced a reckoning that every enterprise deploying AI right now should pay attention to.
The dashboard was called “Claudeonomics.” It tracked AI token consumption across the entire workforce, gamified usage with titles like “Token Legend” and “Cache Wizard,” and within 30 days recorded over 60 trillion tokens consumed. On the surface, it looked like a landmark in enterprise AI adoption. Underneath, the data told a more inconvenient story.
Employees had worked out that the metric being tracked was tokens, not outcomes. Some left AI agents running idle, consuming tokens around the clock while producing nothing, simply to climb the internal rankings. The leaderboard was full. The work it was supposed to reflect was not. Meta shut Claudeonomics down after internal details were reported publicly, and the question it left behind went well beyond a data leak: are we measuring the right things?
What the Tracking Data Actually Showed
When you monitor AI usage across every team at the scale Meta attempted, one truth surfaces quickly: there is a significant gap between how much AI is being used and how much value that use is generating.
Token consumption as a proxy for AI productivity fails for a reason that is structurally unavoidable. It measures input, not output. A team using AI to solve a genuine business problem and a browser tab left open running an idle agent are indistinguishable in a token log. Critics who reviewed the Claudeonomics model pointed out exactly this flaw. Token consumption does not measure productivity. It measures activity. And in any system where activity is the metric, activity is what you get, whether or not it corresponds to anything useful.
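To make the point concrete, here is a minimal sketch of a token leaderboard, assuming a simplified usage record. The `UsageRecord` type, the `tasks_completed` field, and all of the figures are hypothetical illustrations, not details of Meta's actual dashboard.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    user: str
    tokens: int           # what a token log captures
    tasks_completed: int  # hypothetical outcome field a token log does not capture


def token_leaderboard(records):
    """Rank users by tokens consumed, highest first (activity only)."""
    ranked = sorted(records, key=lambda r: r.tokens, reverse=True)
    return [r.user for r in ranked]


# Illustrative numbers only: an idle agent burning tokens versus a team doing real work.
records = [
    UsageRecord("support_team", tokens=1_200_000, tasks_completed=340),
    UsageRecord("idle_agent", tokens=5_000_000, tasks_completed=0),
]

leaderboard = token_leaderboard(records)
```

In this sketch the idle agent tops the ranking, because the ranking only ever reads the `tokens` field. Nothing about the work produced enters the calculation.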
AI adoption, defined as usage volume, is the easiest problem to solve in any enterprise AI rollout. Meta’s data confirmed it is also the least meaningful one.
This Problem Is Not Unique to Meta
The instinct to measure AI adoption by usage volume is the industry default, because usage data is easy to collect, easy to aggregate, and easy to present as evidence of progress.
According to a 2024 report by the McKinsey Global Institute, which surveyed over 1,000 companies across sectors, organisations that measured AI adoption through usage volume alone were significantly less likely to report measurable productivity gains. Companies that tied AI success to business outcomes such as reduction in processing time, error rates, or revenue per employee were three times more likely to report sustained productivity improvement.
Microsoft’s 2024 Work Trend Index, drawing on data from 31,000 knowledge workers across 31 countries, found that 75 percent were already using AI tools at work, but only 39 percent of their organisations had clear frameworks for evaluating whether AI was improving actual output quality. Adoption was measurable everywhere. Real impact was not.
IBM’s 2023 Global AI Adoption Index found that 61 percent of IT professionals surveyed said their organisations lacked sufficient tools to evaluate AI’s business impact. The barrier to scaling AI was not cost or capability. It was the inability to tell what was actually working.
The Principle Behind the Failure
This failure has a name that predates AI by five decades. In 1975, British economist Charles Goodhart, advising the Bank of England, observed that any statistical regularity used for control purposes tends to collapse once pressure is placed upon it. Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure.
In the 1990s, UK hospitals given waiting-time targets held patients in ambulances outside emergency departments to avoid starting the official clock, according to a 2004 study in the British Medical Journal. In 2016, the Consumer Financial Protection Bureau penalised Wells Fargo after employees opened an estimated two million unauthorised accounts to meet sales quotas; the bank's expanded review in 2017 raised the estimate to approximately 3.5 million. In both cases, the metric was hit. The underlying goal was not.
At Meta, the moment Claudeonomics made tokens visible and attached status to them, tokens became the goal. The work they were meant to represent became secondary.
What Enterprise AI Measurement Should Look Like
The lesson from Meta’s initiative is not that tracking AI across teams is a mistake. The tracking was the right instinct. What it needed was a different unit of measurement.
Activity metrics count what the tool does: tokens generated, prompts submitted, sessions opened. They are easy to collect and easy to game. Outcome metrics count what the work produces: time saved on a specific task, error rates reduced in a defined process, customer resolution time shortened, manual rework eliminated.
A customer service team using AI to bring average call handling time from eight minutes to five has a concrete, auditable outcome. A software team using AI-assisted code review to cut review cycles by 30 percent has a measurable outcome tied to engineering velocity. A legal team using AI to halve contract review time has an outcome tied to operational cost. None of these appear in a token leaderboard. All of them are what the organisation actually needs to know.
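The arithmetic behind an outcome metric like these is simple enough to sketch. The helper below is a hypothetical illustration, not a prescribed framework: it expresses a before-and-after measurement as a percentage reduction, using the call handling figures from the example above.

```python
def percent_reduction(before, after):
    """Return the percentage reduction from a baseline value to a new value."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100


# Average call handling time falling from eight minutes to five.
call_handling_gain = percent_reduction(8, 5)  # 37.5 (percent)
```

Unlike a token count, this figure depends on a measured baseline and a measured result for a defined task, which is what makes it auditable.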
Responsible enterprise AI measurement starts with one question per function: what does a successful AI-assisted outcome look like here? The metric follows that answer. It should never come first.
The Measurement Worth Getting Right
Meta’s Claudeonomics story is a case study in what happens when the speed of AI deployment outpaces the rigour of AI measurement. That gap exists in most large organisations right now, with or without a leaderboard to make it visible.
Every organisation deploying generative AI at scale is running some version of the same experiment. The question is whether the measurement framework captures what AI actually produces or only what it consumes. Sixty trillion tokens in 30 days is a striking number. What those tokens delivered in real business value is the only number that matters.





