Tokenmaxxing
The new vanity metric dressed up as productivity. And why the craftsmen will eat the flexers.
Hey.
A word has been spreading through Silicon Valley like gossip at a wedding.
Tokenmaxxing.
The premise is simple. Burn as many AI tokens as humanly possible. Not because the job requires it. Because the CEOs of the frontier labs, the buyers writing the checks, and the investors funding the whole thing have started saying, out loud, that token consumption is the new evidence of work.
In early April, the story went public. Meta was running an internal leaderboard called Claudeonomics. It ranked employees by token usage. The top contender reportedly burned through 281 billion tokens in 30 days. They handed out titles like “Token Legend.” The leaderboard came down shortly after it leaked, but the idea did not.
Jensen kicked it off at GTC:
“If that $500,000 engineer did not consume at least $250,000 worth of tokens, I’m going to be deeply alarmed.”
Reid Hoffman gave it a cautious nod at the Semafor summit. Token usage, he said, is a decent dashboard for engagement. Not output. Engagement.
Sequoia partners started tweeting variants of the same idea. Writer and Sendbird leaned in. Parasail, a cloud inference startup, just raised $32M on the thesis that token demand creates the next compute giant. The CEO said developers are telling him, “Give me tokens. Just give me tokens. I want them fast. I want them cheaply. I want them now.” His company generates 500 billion tokens a day.
And just like that, a generation of managers picked up a new ruler.
We have seen this movie before
Every technology wave invents a bad metric.
Industrial era, we counted hours at the desk.
Early software, we counted lines of code.
Cloud era, we counted tickets closed.
Social era, we counted posts per week.
AI era, we count tokens burned.
Every single time, the metric measured input, not output. Every single time, a generation of workers learned to game it while the actual value slipped sideways into someone else’s company.
Think about the lines-of-code era for a second. At Microsoft in the nineties, engineers were reviewed partly on KLOC, thousand lines of code shipped per month. So they shipped thousand-line functions that could have been fifty. Bill Gates himself eventually said measuring programming by lines of code was like measuring aircraft construction by weight. Heavier is not better. A heavier plane is a worse plane.
Twenty-five years later, we are doing it again. Swap lines of code for tokens consumed. Same mistake, different unit.
Appian’s CEO nailed it. He compared tokenmaxxing to the Soviet Union evaluating chandeliers by their weight. A heavy chandelier means nothing if the room stays dark.
Token count is lines of code in a hoodie.
Same guy, new outfit. The pattern is older than software. You can trace it back to medieval guilds measuring masons by stones laid, not cathedrals built. Different material. Same mistake. Every. Single. Time.
And there is a reason we keep making it. Goodhart’s Law. When a measure becomes a target, it ceases to be a good measure. The minute Meta built the Claudeonomics dashboard, it stopped measuring productivity and started measuring Claudeonomics performance. Which is a completely different thing.
The part that makes it feel true
Let me steelman the other side for a moment, because the idea is not entirely stupid.
If nobody on your team is touching AI, that is a real problem. Adoption matters. Experimentation matters. A company where the junior designer burns more tokens than the senior engineer is a company where someone is going to lose their job in 18 months, and it is not the designer.
Hoffman’s version of the argument is the honest one. Tokens measure that people are in the loop. They are trying things. They are failing cheap. They are not sitting in fear of the new tool. In his words, “you want a wide variety of people using it essentially, collectively, and simultaneously.”
Fine. That is a floor, not a ceiling.
There is also a real dynamic inside big companies that makes tokenmaxxing feel necessary. Change resistance. The senior VP who refuses to touch Claude because “that is what my team is for.” The middle manager who writes a three-paragraph email by hand every morning out of principle. Leadership sees these people, correctly identifies them as a drag, and reaches for the blunt instrument. The dashboard.
A dashboard that says you must burn tokens will, at minimum, flush out the people who refuse to engage. In that narrow sense, it works.
The problem is not tracking token usage. The problem is treating it as the scoreboard. Confusing the flushing-out metric with the winning metric.
Because the moment you put a number on a leaderboard, you change behavior. And the behavior you get is almost never the behavior you wanted.
The Craftsman’s Counter
Here is the framework I want you to hold on to. Two kinds of AI users now. They look identical from the outside. They are not.
The Flexer
Runs one giant prompt and pastes the output
Brags about context window size
Measures the day in tokens
Confuses motion for progress
Proud of the leaderboard rank
Rework ratio: unmeasured, probably 60% plus
Output quality: random
The Craftsman
Runs many small tasks across parallel sub-agents
Thinks in loops, not in prompts
Measures the day in finished things
Throws away 80% of what the model produces
Proud of the three decisions that shipped
Rework ratio: tracked and falling
Output quality: compounding
Both can hit the same token count.
Only one is building anything.
In Homo Laborans, I wrote that loud work performs for the room, hard work performs for the result. Tokenmaxxing is loud work with a new costume. It wants the room to see the number. It has nothing to say about the result.
The craftsman mindset is not new. It just has new tools. Homo Laborans was about the loop. Sub Agents was about how to structure the loop when the tools changed. Tokenmaxxing is what happens when you keep the new tools but forget the loop.
The part nobody talks about
Here is what is overlooked, and it is the thing that will separate the winners from the losers over the next 24 months.
Token usage is a capital expense, not a productivity metric.
You would not measure a manufacturing company by how much electricity it consumes. You would measure it by units shipped per kilowatt-hour. Efficiency, not draw. In fact, the companies with the highest electricity bills are often the worst-run manufacturers, because they are wasting energy on rework, defects, and idle machines.
AI is electricity for knowledge work. So the question stops being how much did you burn and starts being:
How many decisions per dollar did you ship?
That is a metric. That is a scoreboard. That is something a CFO can defend and a craftsman can compete on.
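If you want to see what that scoreboard looks like as an actual number, here is a minimal sketch in Python. The token price and the counts are made-up assumptions for illustration; the point is the division, not the decimals.

```python
# Minimal sketch of the "decisions per dollar" scoreboard.
# All numbers and names are illustrative assumptions, not a standard.

def decisions_per_dollar(decisions_shipped: int, tokens_used: int,
                         dollars_per_million_tokens: float) -> float:
    """Shipped decisions divided by what the tokens actually cost."""
    token_cost = (tokens_used / 1_000_000) * dollars_per_million_tokens
    return decisions_shipped / token_cost if token_cost else 0.0

# The flexer and the craftsman can burn the exact same tokens.
flexer = decisions_per_dollar(decisions_shipped=2, tokens_used=500_000_000,
                              dollars_per_million_tokens=3.0)
craftsman = decisions_per_dollar(decisions_shipped=48, tokens_used=500_000_000,
                                 dollars_per_million_tokens=3.0)
print(f"flexer: {flexer:.3f} decisions/$, craftsman: {craftsman:.3f} decisions/$")
```

Same bill, different scoreboard. That is the whole argument in six lines.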
Salesforce saw this and coined a term for it. Agentic Work Units. HubSpot’s CEO put it more cleanly on LinkedIn: “outcome maxxing >> token maxxing.” Appian’s CEO called tokenmaxxing silly to anyone who would listen. Andrew Ng has been warning about vanity metrics in AI for two years. They are all circling the same idea. Every serious operator is quietly rejecting the leaderboard.
Because every serious operator knows the 281 billion token engineer at Meta might have shipped three features. Or zero. Nobody in that reporting chain could tell you which. That is the indictment. Not the token count. The invisibility of the output.
The TechCrunch reveal
The quiet receipt showed up in a TechCrunch investigation last week. Firms tracking 10,000 plus engineers found something uncomfortable.
Yes, AI tools like Claude Code, Cursor, and Codex produce more code than ever.
And engineers are going back to revise that AI-generated code far more often than before.
Read that twice. More output. More rework. The token counter goes up. The ship date does not.
That is the entire tokenmaxxing era in one sentence.
There is a technical name for this pattern in manufacturing. It is called the first-pass yield problem. How much of what comes off the line is good enough to sell without being touched again. Tokenmaxxing is optimizing for volume while silently destroying yield. And nobody is measuring it because the dashboard only shows volume.
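The yield math is one line. A minimal sketch, with illustrative numbers only:

```python
# First-pass yield applied to AI output: the share of generated work that
# reaches production without a human rewriting it. Numbers are made up.

def first_pass_yield(shipped_untouched: int, total_produced: int) -> float:
    """Fraction of output good enough to ship without rework."""
    return shipped_untouched / total_produced if total_produced else 0.0

# Volume doubles, yield halves: the token counter goes up, the ship date does not.
before = first_pass_yield(shipped_untouched=40, total_produced=100)  # 0.40
after = first_pass_yield(shipped_untouched=40, total_produced=200)   # 0.20
print(f"yield before: {before:.0%}, yield after: {after:.0%}")
```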
And then, this morning, GPT-5.5
Here is the part that made me sit down and write this edition today.
While everyone is arguing about token dashboards, OpenAI just dropped GPT-5.5. Codename Spud. The first trained-from-scratch base model they have released since GPT-4.5. Natively omnimodal. 256K context window. Built to power their upcoming Super App, bundling ChatGPT, Codex, and a dedicated browser into a single desktop product.
Eight months ago, we were celebrating GPT-5 as “a major step towards placing intelligence at the center of every business.” Seven hundred million weekly users on ChatGPT. A unified reasoning system. The launch was framed as an era-defining release.
Now it is a footnote.
5.5 ships. Then 5.6. Then 6. Then whatever Anthropic lands next week. Then whatever Google answers with. Every two to three months, the floor moves. Every quarter, the “most capable model ever built” becomes the “last generation.” The keynote slide ages faster than the conference badge.
So let me ask the question I have been sitting with for a month.
When does this end. Or does it never end.
Model after model after model. Each one supposedly the one that changes everything. Each one obsolete before the onboarding email finishes sending.
This is the question that sits underneath tokenmaxxing and makes the whole thing twice as dangerous.
Because if the model capability is doubling every six months, then the number of tokens required to produce the same outcome is falling every six months. A task that took 10,000 tokens on GPT-5 takes 3,000 on 5.5 and might take 800 on 6. The team that built its entire productivity narrative on “we consumed 281 billion tokens this quarter” just watched their scoreboard become meaningless. Because next quarter, the same work consumes 90 billion. Did they become less productive? Or did the world just get more efficient?
The dashboard cannot tell you. The dashboard was never built to tell you that.
Here is my honest answer to the question. I do not think it ends. Not in any timeframe that matters to the decisions you are making this year. The model-after-model cadence is not a sprint toward a finish line. It is the new steady state. We are not climbing toward a final AI. We are living inside the climb.
Which means two things.
One. Any metric tied to a specific model’s token cost is going to be wrong in six months. Including the Claudeonomics leaderboard. Including your company’s next AI dashboard. Including every ROI deck a consultant is about to charge you for.
Two. The only things that do not become obsolete are decisions shipped, customers served, problems solved, and products that live in the world. These are model-agnostic. The loop does not care whether you ran it on 5.5, 6, or 7. It cares that you ran it.
Model after model after model. The models are the weather. The loop is the climate.
How this reshapes hiring in the next 24 months
Here is the second-order consequence almost nobody is pricing in.
The tokenmaxxing era will produce a talent signal crisis.
Think about what a hiring manager sees on a resume today. Education, companies, titles, maybe a portfolio. What are they going to see in 2027?
“Top 5% token consumer at Meta, 2026.”
“Led the AI transformation dashboard at Shopify.”
“Managed a $250K annual token budget.”
These are going to be resume items. I promise you. And hiring managers are going to read them and feel impressed, the same way a hiring manager in 2005 felt impressed by “five years at IBM” and the same way a hiring manager in 2015 felt impressed by “scaled the Rails monolith.” They will be proxy signals that may or may not map to actual capability.
The best engineers will start hiding their token usage, not advertising it. Watch for this. It will be the quiet signal that separates the real operators from the performers. When the craftsman sees the flexer bragging about billions of tokens, the craftsman will know exactly what they are looking at. A candidate with a leaderboard trophy and a thin portfolio of shipped work.
Small companies will figure this out first. They already are. The Series A founder running LinguaLive or Rick or a dozen other indie products does not care about your token count. They care about what you shipped last Saturday.
Big companies will figure it out in two to three years, after the first wave of tokenmaxxing promotions fails to translate into revenue and the Board starts asking hard questions about the AI transformation ROI. There will be a great reset. Some VPs will lose their jobs. Some dashboards will get deleted. The Craftsman will still be there, quietly shipping.
What I actually do
Full transparency. I run a lot of parallel Claude Code instances. I have written about my sub-agent arsenal. I probably burn more tokens in a week than most mid-size startups do in a quarter.
I am not proud of the number. I am proud of what came out of it.
LinguaLive, shipped. Rick, generating real revenue autonomously. Belkins Home, built on eight years of proprietary data that my competitors cannot shortcut. Folderly AI, running as an autopilot subsystem.
If I had tracked my tokens and nothing else, I would have been a folk hero inside Meta. Instead, I tracked finished things. I built a portfolio.
Tokens are the cost of thinking. Shipped products are the proof of it.
I will give you the exact mental model I use, because this is the useful part.
Every prompt I send is a lottery ticket. I have written before about the instances lottery. AI outputs are samples from a distribution. Some are brilliant. Most are average. Some are garbage. If you want a brilliant output, you buy more tickets, and you develop the judgment to recognize which one won. That is where the tokens go. Not into one giant prompt. Into forty variations of a smaller prompt, run in parallel, triaged ruthlessly.
This looks like tokenmaxxing from the outside. The bill looks the same. Internally, it is the exact opposite discipline. The flexer burns tokens, hoping for a miracle. The craftsman burns tokens buying optionality, then throws away everything that did not clear the bar.
The difference is the throwing away. The market sees the tokens. The market never sees the trash pile. The trash pile is where the quality comes from.
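Here is the shape of that discipline as a sketch. generate_draft and passes_the_bar are hypothetical stand-ins for whatever model call and review step you actually run; nothing here is a specific vendor API.

```python
# Minimal sketch of the lottery discipline: many small parallel runs,
# ruthless triage, keep only what clears the bar. generate_draft() and
# passes_the_bar() are placeholders, not any real product's API.
from concurrent.futures import ThreadPoolExecutor
import random

def generate_draft(prompt: str, seed: int) -> str:
    """Placeholder for one sampled model output. Quality is random on purpose."""
    rng = random.Random(seed)
    return f"{prompt} -> draft (quality={rng.random():.2f})"

def passes_the_bar(draft: str) -> bool:
    """Placeholder for human or automated triage. Most drafts should fail."""
    return float(draft.split("=")[-1].rstrip(")")) > 0.8

def run_lottery(prompt: str, tickets: int = 40) -> list[str]:
    # Buy many tickets in parallel, then throw away everything that did not win.
    with ThreadPoolExecutor(max_workers=8) as pool:
        drafts = list(pool.map(lambda s: generate_draft(prompt, s), range(tickets)))
    return [d for d in drafts if passes_the_bar(d)]

keepers = run_lottery("Write the onboarding email", tickets=40)
print(f"kept {len(keepers)} of 40")  # the trash pile is where the quality comes from
```

The token bill for forty tickets looks obscene on a dashboard. The three keepers are the only part the customer ever sees.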
And when GPT-5.5 lands and my token bill drops for the same output, I am not going to feel less productive. I am going to feel sharper. Because the loop is mine. The model is theirs.
Edge for small operators
Here is the part that matters to you, if you are running something.
The tokenmaxxing trend is the single biggest gift the market has given small operators in five years.
Because the giants are now measuring the wrong thing.
When Meta’s brightest engineer is graded on token volume, they will burn tokens. When Shopify’s manager has to prove AI cannot do the job before hiring, they will prompt harder, not ship harder. When the NVIDIA keynote says $250K worth of tokens is the floor, every enterprise dashboard will be rewritten to reward the floor.
Meanwhile, the solo founder in a bedroom in Kyiv, or Lagos, or Lisbon, is shipping.
They do not have a dashboard. They do not have a leaderboard. They have a product, a customer, and a very short feedback loop. They are running the same models. They are burning a fraction of the tokens. They are winning deals from incumbents ten thousand times their size.
The big companies are measuring engagement. The small ones are measuring outcomes. Guess which one wins.
This is how incumbent disruption has always worked, by the way. The giant optimizes for the last decade’s metric while the insurgent optimizes for the next one. Kodak measured film rolls while Instagram measured daily shares. Blockbuster measured store visits while Netflix measured next-watch predictions. The metric is the giveaway. It always is.
If you are a small operator right now, this is a generational window. Your competition is structurally distracted. They are building a leaderboard. You are building a product.
Playbook if you are a founder
Kill the token dashboard before someone inside your company builds it. If anyone on your team says “we should track AI usage per person,” ask them one question. “And then what will we do with that information?” If they cannot give you a decision that depends on that number, do not build the dashboard. Information that does not drive a decision is cost, not value.
Replace it with these five metrics. Post them in your Slack. Review them weekly.
Decisions shipped per week. A decision is anything a customer or a teammate can now act on. Not a draft. Not a brainstorm. A decision. Count them. Reward them.
Rework ratio. How much of what AI produced made it to production without a human rewriting it. Lower is better. Track it weekly. This is the single best leading indicator of team health in the AI era.
Time from idea to live. The gap between a Slack message and a thing in the world. Shrink it every sprint. This is what your customers actually feel.
Cost per shipped feature. Include tokens, compute, your hours, and your teammates’ hours. Divide by features actually in users’ hands. Not features in staging. Features in production.
Customer response time. If your AI-assisted team cannot answer customers faster than your pre-AI team, something is very wrong with how you are using the tools.
These are boring. They do not make a great leaderboard. They also do not lie. And they survive every model release, because they were never tied to the model in the first place.
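If you want them somewhere sturdier than a pinned Slack message, here is a minimal sketch of the weekly scoreboard. The field names are my own shorthand for the five metrics above; how you actually record them is up to you, this is an assumption, not a prescription.

```python
# A minimal weekly scoreboard for the five metrics above. Field names are
# illustrative shorthand; the recording format is an assumption.
from dataclasses import dataclass

@dataclass
class WeeklyScoreboard:
    decisions_shipped: int             # things a customer or teammate can now act on
    ai_outputs_total: int              # everything the model produced this week
    ai_outputs_reworked: int           # how much of that a human had to rewrite
    idea_to_live_days: float           # median gap from Slack message to production
    total_cost_dollars: float          # tokens + compute + everyone's hours
    features_in_production: int        # not staging, production
    customer_response_hours: float     # median time to answer a customer

    @property
    def rework_ratio(self) -> float:
        return (self.ai_outputs_reworked / self.ai_outputs_total
                if self.ai_outputs_total else 0.0)

    @property
    def cost_per_shipped_feature(self) -> float:
        return (self.total_cost_dollars / self.features_in_production
                if self.features_in_production else float("inf"))

week = WeeklyScoreboard(decisions_shipped=12, ai_outputs_total=200,
                        ai_outputs_reworked=70, idea_to_live_days=4.5,
                        total_cost_dollars=3200.0, features_in_production=3,
                        customer_response_hours=2.0)
print(f"rework ratio: {week.rework_ratio:.0%}, "
      f"cost per feature: ${week.cost_per_shipped_feature:,.0f}")
```

Notice there is no token field anywhere on it. Tokens live inside total_cost_dollars, where a cost belongs.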
Playbook if you work inside a big company
This edition is not only for founders. If you are employee number 38,000 at a mega-corp that just rolled out a token dashboard, here is how you navigate it.
First, hit the floor. Get your token number above the adoption threshold. Do not fight the dashboard directly. That fight is unwinnable, and it will get you labeled as a change-resister. Use the tools. Burn the floor-level tokens.
Second, above the floor, play a different game. Start tracking your own decisions-shipped metric privately. In a Notion doc. In a journal. In a weekly Loom. Every Friday, write down the three most concrete things you made happen that week. Not “three things I worked on.” Three things that exist in the world now and did not exist on Monday. Do this for six months. You will have a portfolio that is worth ten times the Meta leaderboard rank.
Third, when promotion season comes, bring both. “I am in the top quartile of AI adoption. Here are the 48 decisions I shipped in the last quarter.” The first number satisfies the dashboard. The second number is what actually moves the decision. One without the other loses. Both together win.
Fourth, if your company is so dashboard-captured that shipped outcomes do not matter and only the leaderboard does, update your resume. Quietly. You are working for a company that is going to underperform the market over the next five years, and you do not want to be on that cap table when the correction hits.
The ugly truth
I will say the quiet part out loud.
Tokenmaxxing is going to produce a generation of workers who are extremely good at using AI and extremely bad at shipping anything.
They will have beautiful prompts. They will have parallel agents. They will have context windows the size of novels. And they will have a portfolio of half-finished experiments that their managers rewarded with promotions because the dashboard said so.
You have met these people before. In the late 2000s they had the most impressive PowerPoint decks you ever saw and could not close a sale. In the 2010s they had GitHub profiles that looked like Christmas trees and could not ship a feature. In the 2020s they had Twitter threads with a hundred thousand likes and no product. The archetype does not change. Only the costume.
The companies that survive will be the ones who remember, as they always remember eventually, that the dashboard is not the product. The numbers on the screen are a representation of work. They are not the work itself. When you confuse the representation for the thing, you stop building the thing.
And when GPT-5.5 becomes 6 and 6 becomes 7 and 7 becomes whatever Anthropic ships next month, the token dashboards will keep flashing green while the actual output stays flat. Or drops. Because the metric never measured output in the first place.
The model got faster. The loop did not change.
That is the sentence I want you to keep. When the next model lands, repeat it. When your company rolls out its next AI transformation dashboard, repeat it. When a VP tells you to hit a token quota, repeat it.
The market does not pay for tokens consumed.
The market pays for problems solved.
There is no secret. Only hard work, repeatable hard work. Iteration after iteration.
Post-Credit Scene
By now you know my post-credit scene, and I hope you are addicted to it. Here is a new batch of info to consume.
Book
The Tyranny of Metrics by Jerry Z. Muller (Princeton University Press, 2018). The definitive takedown of measurement gone wrong, written eight years ago, somehow more relevant in 2026. If you read one thing this month to inoculate yourself against every AI dashboard your company is about to roll out, make it this. Read it with a pen.
Goodhart’s Law (essay and concept). Not a book, but everyone building or resisting a dashboard right now should know what Goodhart said. When a measure becomes a target, it ceases to be a good measure. The entire tokenmaxxing era is a live demonstration of this law.
Podcast
Latent Space with Ryan Lopopolo of OpenAI Frontier, April 2026. On harness engineering and multi-agent orchestration. He literally opens with “the age of the token billionaires.” The cleanest rebuttal to tokenmaxxing in audio form, from inside the lab doing the most of it.
Equity podcast, “Tokenmaxxing, OpenAI’s shopping spree, and the AI Anxiety Gap” on TechCrunch. Good counterpoint view, more skeptical than Latent Space, worth the 40 minutes.
Essay
“Tokenmaxxing is making developers less productive than they think” by Tim Fernholz in TechCrunch. Short, sharp, correct. The data section on rework ratio is the page to screenshot for your next board meeting.
“Tokenmaxxing: Big Tech’s Costly Productivity Trap” by Liat Ben-Zur. Strong incentive-design lens on the whole phenomenon. The line that stuck with me: “The moment the metric becomes the goal, the game shifts from building defensible products to performing defensible theater.”
Product
Claude Code Sub-Agents. Still the single biggest lever for shifting from flexer to craftsman. Orchestration over volume, every time. If you are still using Claude as one giant prompt, you are leaving an order of magnitude on the table.
Thanks for reading.
Vlad