Tokenmaxxing: Waste or Catalyst for AI Transformation?

Not long ago, CIOs argued over SAP licenses, Oracle contracts, or cloud vendor negotiations.

Today, it’s the token that becomes the flashpoint. Each request to a generative AI model—a summary, an analysis, or a line of code generated—consumes tokens. And that counter runs nonstop, for every employee, every automated agent, every digitized process.

The novelty is that this consumption is directly tied to human activity. Unlike traditional cloud, where you paid for compute capacity that was often underutilized, generative AI bills for cognition on demand.

Result: budgets grow more unpredictable and finance chiefs grow anxious.

Read also: The new MySQL governance model makes sceptics

It was in this context that a stance hit home at the Mistral AI summit, held in early June in Paris. Charles Holive, Chief AI Officer of BNP Paribas CIB, uttered a line that instantly circulated within AI circles: “Tokenmaxxing is a vanity metric.”

Translation: consuming ever more tokens to prove that AI is “working” would be as misleading as measuring a team’s productivity by the number of meetings held. This “tokenmaxxing,” the tendency to maximize consumption as a success metric, would in fact be a smoke screen masking the absence of tangible results.

PromptOps: optimization as the new discipline

His stance is unequivocal. The only indicators that truly matter are measurable productivity gains, process improvements, and the creation of new operational capabilities. The rest is noise.

In large enterprises, this view has given rise to a new approach: PromptOps. The principle? Apply to AI the same rationalization logic that transformed cloud management ten years ago.

In concrete terms, letting a context window run unchecked or sending entire conversation histories with every request is, according to industry practitioners, the equivalent of leaving servers on over the weekend for nothing.

The challenge of PromptOps is therefore to optimize the structure of prompts themselves, allocate costs by team, and track spend on tools like Claude Code or GitHub Copilot. The ultimate aim is to ensure that every euro spent on tokens yields a measurable impact.

The token as proof of transformation

But not everyone shares this view. For certain business units and product teams, a radically different takeaway prevails. For them, the real problem facing French companies isn’t overconsumption of AI but under-adoption.

The tools exist. The licenses are paid for. Yet the employees do not use them… or do not use them enough.

Read also: Did the US Supreme Court condemn the Data Privacy Framework?

In this context, heavy token consumption isn’t waste. It’s evidence that teams are finally taking ownership of the tools, exploring new use cases, and evolving their practices.

Field feedback from adoption programs supports this view. The most active users are typically those who derive the most value from AI. Repetition of uses drives organizational learning. And the early deployment phases, by nature less optimized, align more with exploration than with structural waste.

The debate also crosses beyond finance to the machine floor.

FLOPS, tokens, ROI: the metrics revolution

Technical leadership has understood this. The old hardware metric (the raw power of servers measured in FLOPS) no longer suffices to reason about the ROI of an AI application.

The challenge for a CTO isn’t the size of the GPU farm but the cost per million tokens generated. A conceptual revolution that reshuffles how vendors are compared and how system architectures are designed.

From the tension between optimization and adoption emerges a third, more mature approach: the Value per Token (VPT). Not maximizing, not capping; but understanding what each token genuinely yields.

The logic is straightforward:

Read also: From AWS to GCP, how Slack’s AI became multi-cloud

VPT = Business Value / Tokens Consumed

Yet its implementation is strategic. For a customer service function, value is measured by tickets resolved without human escalation. For a legal team, by contracts analyzed. For marketing, by personalized content generated. For developers, by features delivered or bugs fixed.

This framework, which some already call the AI Unit Economics, directly ties IA expenditures to operational results. And it finally ends the sterile war between those who want to cut budgets and those who want to accelerate adoption.

A Debate on the Maturity of AI in Enterprises

The controversy over tokenmaxxing is not a technical debate. It is the signal of a rift in the AI maturity cycle within organizations.

Advocates of governance speak from a phase of optimization. Advocates of adoption speak from a phase of experimentation. Both are right; but they are not at the same stage.

The real question, therefore, isn’t whether to consume more or fewer tokens. It’s to determine where the organization stands and what signal the token counter is meant to send.