The Next AI Cost Crisis Isn't Model Pricing. It's Context Waste.

AI isn't expensive because of intelligence. AI becomes expensive when organizations repeatedly pay for the same thinking. Same shift programming already went through. We're just early in the library era.

Jorge M. J. Żak · June 5, 2026 · Reading time: 8 min

I've been building with AI for forty months. I just realized I've been doing most of it backwards.

Every project I've worked on for the last three years started the same way. A blank prompt. A fresh ask. "Build me a landing page." "Draft this email." "Generate this article." Each request a clean slate, each conversation a fresh negotiation, each output a one-off.

It worked. Sort of. But the model was relearning my brand every single time. Re-deciding the spacing, re-choosing the colors, re-inventing the voice. And I was paying — in tokens, in time, in attention — for that relearning, over and over.

Then someone I work closely with said something that took weeks to settle in: Stop building. Start composing.

That sentence is, I think, the biggest cost insight in AI right now. And nobody's naming it.

The expensive way most people use AI

Walk into almost any company experimenting with generative AI and you'll see the same pattern. Each team is independently prompting the same model for variations of the same work. Marketing has its own prompt for product descriptions. Sales has its own prompt for outreach. Customer support has its own prompt for ticket replies. Each prompt drifts. Each output costs money. Each new project starts from zero.

Then the CFO sees the API bill.

And here is where almost everyone misdiagnoses the problem. The CFO assumes the bill is a pricing problem. "The model is too expensive. We need a cheaper one." The procurement team negotiates rates. The CTO swaps providers. The team migrates from one model family to another, saves twenty percent, and six months later the bill is back to where it was.

Because the bill was never really about pricing. The bill was about paying the AI, repeatedly, to relearn the same things. To re-understand the brand voice. To re-derive the layout decisions. To re-invent the structure of a page that already existed in twelve slightly different forms across the organization.

AI isn't expensive because of intelligence. AI becomes expensive when organizations repeatedly pay for the same thinking.

Programming already solved this — forty years ago

If you've been around software long enough, you've seen this exact crisis before. It happened to programming in the late 1970s and early 1980s.

Back then, most of every program was hand-written from scratch. Every loop, every parser, every UI element. If you wanted a date picker, you wrote a date picker. If a colleague needed a date picker on a different project, they wrote their own. Different code, same problem, paid for twice.

Then came libraries. Then frameworks. Then package managers. Then component design systems. Then microservices. Each shift was the same fundamental move: stop solving from zero, start composing from systems. By the late 1990s no serious engineer would hand-roll a hash map from scratch. By the late 2000s no serious front-end engineer would hand-roll a date picker. The cost of redoing solved work became culturally indefensible.

AI is, right now, where programming was somewhere around 1985. We have an extraordinary new capability and we're still asking it to solve everything from scratch. The cultural prohibition on wasteful re-solving hasn't arrived yet. The libraries haven't solidified yet. Most teams haven't realized they're re-paying for the same work.

But they will. They always do. The library era of AI is starting, and the companies that move first are about to look very different from the ones that don't.

What composing instead of building actually looks like

The shift sounds abstract. It isn't. Here is what it looks like in practice.

Before composition. You ask the model for a Resources page. It thinks. It chooses a layout. It decides on a tone. It picks colors close to your brand but not quite right. It invents a CTA. The output is okay. You revise. You re-prompt. You fix the colors. You burn through tokens and end up with something that vaguely matches your other pages but not really.

After composition. You have a Hero module. A FeatureGrid module. A Newsletter module. A CallToAction module. A ResourceLibrary module. Each module has been designed once, decided once, branded once. When you need a Resources page, you assemble:

HeroSection variant="standard" + FeatureGrid + ResourceLibrary + Newsletter + CallToAction

The model isn't designing anything. It's arranging known pieces. The reasoning load drops to almost nothing. The token cost drops with it. The output is consistent with every other page on the site because it's made of the same parts. The brand doesn't drift. The voice doesn't drift. The bill doesn't balloon.
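The assembly step above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual implementation: the `Module` class, the `REGISTRY` set, and the `compose_page` function are hypothetical names, and only the module names are taken from the example above.

```python
from dataclasses import dataclass

# Hypothetical registry: modules that were designed, decided, and branded once.
REGISTRY = {"HeroSection", "FeatureGrid", "ResourceLibrary", "Newsletter", "CallToAction"}


@dataclass(frozen=True)
class Module:
    name: str
    variant: str = "default"


def compose_page(title: str, modules: list[Module]) -> dict:
    """Assemble a page spec from known modules; reject anything not in the registry."""
    unknown = [m.name for m in modules if m.name not in REGISTRY]
    if unknown:
        raise ValueError(f"Not in the registry: {unknown}")
    return {
        "title": title,
        "sections": [{"module": m.name, "variant": m.variant} for m in modules],
    }


resources_page = compose_page(
    "Resources",
    [
        Module("HeroSection", variant="standard"),
        Module("FeatureGrid"),
        Module("ResourceLibrary"),
        Module("Newsletter"),
        Module("CallToAction"),
    ],
)
```

The point of the registry check is that a request for a module nobody has designed fails loudly instead of quietly triggering a fresh round of design work.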

And — this is the part most people miss — you can ship a new page in twenty minutes instead of three hours. Because there's no design negotiation. No reinvention. Just assembly.

Three places I've watched this work, just this month

I want to give this some grounding. This isn't theoretical. It's the operating principle of three projects I've been building over the last sixty days, and I only recently noticed they're the same shape.

One. The website you're reading this on. Built originally as separate pages — Home, About, Services, Resources, Thinking, Contact, Newsletter — each designed and iterated independently. A few weeks ago we refactored it into modules. Now every page is a composition. Adding a new page costs twenty minutes instead of half a day, and the brand stays coherent without any vigilance from me.

Two. A decisions cockpit I built for editorial work. I've been writing a fantasy saga for four years. The worldbuilding generated forty-eight open canonical decisions — character backstories, geographic mysteries, manuscript edits — scattered across documents, conversations, notebooks. Instead of trying to remember them, I built a cockpit. Each decision is a structured card with pre-populated options, a comment field, and a status. When I make a call, the cockpit holds it. When an AI assistant joins the next session, it reads the cockpit state in one query and knows exactly what was decided. No more re-explaining the canon. Token cost: near zero. Coherence: near total.
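A decision card of this kind can be sketched roughly as follows. The field names, the `decide` method, and the `cockpit_state` function are assumptions made for illustration, not the cockpit's real schema, and the two cards are invented placeholders.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionCard:
    id: str
    question: str
    options: list[str]
    status: str = "open"  # "open" or "decided"
    choice: Optional[str] = None
    comment: str = ""

    def decide(self, choice: str, comment: str = "") -> None:
        """Record a call; only the pre-populated options are allowed."""
        if choice not in self.options:
            raise ValueError(f"{choice!r} is not an option for {self.id}")
        self.choice, self.comment, self.status = choice, comment, "decided"


def cockpit_state(cards: list[DecisionCard]) -> str:
    """One compact payload a new session can read instead of a re-explanation."""
    return json.dumps(
        {
            "decided": {
                c.id: {"choice": c.choice, "comment": c.comment}
                for c in cards
                if c.status == "decided"
            },
            "open": {c.id: c.options for c in cards if c.status == "open"},
        },
        indent=2,
    )


# Invented placeholder cards, for illustration only.
cards = [
    DecisionCard("D-001", "Placeholder question A", ["option 1", "option 2"]),
    DecisionCard("D-002", "Placeholder question B", ["option 1", "option 2", "option 3"]),
]
cards[0].decide("option 2", comment="Settled in the last session")
state = cockpit_state(cards)
```

Here `state` is the single thing an assistant would read at the start of a session: what has been decided, and what is still open with its allowed options.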

Three. The canonical worldbuilding itself. The saga's lore lives in structured documents, one per topic. A continent doc. A race doc. A cosmology doc. When a new piece of canon needs to land, it goes into the relevant doc. The AI doesn't need to relearn a thousand years of fictional history every time we work on it; it reads the relevant module. Same insight, applied to fiction instead of code.
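The "one doc per topic" idea can be sketched the same way. This assumes the docs can be looked up by topic name; `LORE_DOCS`, `context_for`, and `add_canon` are hypothetical names, and the doc contents are placeholders rather than real lore.

```python
# Placeholder lore docs, one per topic; real ones would be full documents.
LORE_DOCS = {
    "continent": "Placeholder text for the continent doc.",
    "race": "Placeholder text for the race doc.",
    "cosmology": "Placeholder text for the cosmology doc.",
}


def context_for(topics: list[str], docs: dict[str, str] = LORE_DOCS) -> str:
    """Build session context from only the docs the task touches."""
    missing = [t for t in topics if t not in docs]
    if missing:
        raise KeyError(f"No lore doc for: {missing}")
    return "\n\n".join(f"[{t}]\n{docs[t]}" for t in topics)


def add_canon(topic: str, text: str, docs: dict[str, str] = LORE_DOCS) -> None:
    """New canon lands in the relevant doc rather than in chat history."""
    docs[topic] = docs.get(topic, "") + "\n" + text


# A session about cosmology loads the cosmology doc and nothing else.
context = context_for(["cosmology"])
```

The saving comes from what is left out: a cosmology session never pays to re-send the continent or race docs.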

These are three completely different surfaces — a public professional website, an internal editorial tool, a creative worldbuilding corpus. They feel coherent because the same operating principle shows up in all three. Build the system once. Compose forever after.

What this actually does to your bill

The economic case is straightforward. Composition mode shrinks three things at once.

Input tokens drop because you stop sending the model the same brand context, the same voice guide, the same color palette every conversation. The system already encodes those.

Output tokens drop because the model isn't generating decisions; it's arranging components. Less creative work to do means less verbose output.

Reasoning tokens drop (if you're on a model that exposes them) because there's less to think through. The architectural choices have already been made.
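To see how the three reductions combine, here is a rough cost model in Python. Every number in it is invented for illustration: the per-million-token prices are placeholders rather than any vendor's real rates, and the token counts are assumptions, not measurements from a real workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Average tokens per request."""
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int = 0


@dataclass(frozen=True)
class Prices:
    """USD per million tokens. Placeholder values, not real vendor rates."""
    input_per_m: float
    output_per_m: float
    reasoning_per_m: float


def total_cost(usage: Usage, prices: Prices, requests: int) -> float:
    """Total spend in USD for a number of requests with the same average usage."""
    weighted = (
        usage.input_tokens * prices.input_per_m
        + usage.output_tokens * prices.output_per_m
        + usage.reasoning_tokens * prices.reasoning_per_m
    )
    return weighted * requests / 1_000_000


prices = Prices(input_per_m=1.0, output_per_m=4.0, reasoning_per_m=4.0)

# Assumed averages: building from a blank prompt vs. composing from modules.
build = Usage(input_tokens=6000, output_tokens=2000, reasoning_tokens=3000)
compose = Usage(input_tokens=1500, output_tokens=500, reasoning_tokens=500)

before = total_cost(build, prices, requests=10_000)
after = total_cost(compose, prices, requests=10_000)
reduction = 1 - after / before

print(f"before=${before:.2f} after=${after:.2f} reduction={reduction:.0%}")
```

With these made-up inputs the model reports a reduction of about 79 percent. The figure only shows how the arithmetic works; a real estimate needs your own token counts and your provider's actual prices.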

The teams I've seen do this well land somewhere around a 60 to 90 percent reduction in AI spend on production workloads, while simultaneously increasing the consistency and quality of output. If you want the operational detail — the framework I use with clients, five real case studies, the twelve questions to ask any AI vendor — I wrote it up as a separate playbook. You can grab it free at /free/token-economics.

Why the cost story matters more than the cost

The reason this matters isn't the money. The money is the symptom.

The deeper thing is that we've crossed a threshold most organizations haven't noticed yet. The question is no longer "can we build this with AI?" The question is "which of the many things we're building deserves the next hour?" That's a fundamentally different problem.

Most people imagine the summit of a technology — any technology — is the moment when everything is finished. In reality the summit is usually the quieter moment when you realize the thing is real. The website exists. The product exists. The book exists. The team exists. Now you stop creating at all costs and start orchestrating with intent.

Context waste is what wasteful creation looks like once the creation crisis is solved. It's the modern version of an engineering team in 1985 still hand-rolling hash maps even though the libraries exist. It's avoidable. It's expensive. And it's about to be the next conversation finance teams have with their AI leads.

If you want to be ahead of that conversation, start there.

If this resonated

I wrote the operational playbook for this.

The Five Commandments framework, five real case studies with before-and-after AI bills, and twelve diagnostic questions to ask any AI vendor or internal AI team. Free. PDF. No drip sequence.
