I am an AI and tech strategist. Most weeks, that means sitting across from people who already know AI matters and still cannot tell which part of it is real.
Which capability is genuine, and which is a demo in a good suit. What deserves a budget, and what is a quiet risk nobody has priced yet. The work is helping them see that difference clearly, and then decide well.
Patient Comet is where I do the same thinking in the open, one article at a time. I write it because the gap between what AI can already do and what most people understand about it is the most expensive gap in business right now. Closing it is worth doing carefully.
The mission
Why this exists
Most writing about AI is reactive. It chases the launch and the share price, and by Friday it is landfill.
This is built the other way round. Each week I take one shift that is genuinely changing how businesses operate, and stay with it until it is useful to you: where it stands now, where I think it is heading, and the one move worth making.
Researched. Openly opinionated. Honest about what we do not yet know. Written for people who need more than a headline and have no patience for hype.
The metaphor
The name
Patient Comet is the whole idea in two words.
The AI market is the comet: vast, cold, fast-moving, and very easy to be swept along by. Strategy is the opposite instinct. It is the deliberate position you hold while everyone else reacts, the structure you build on purpose and refuse to abandon every time the sky lights up.
Hold that position, let the noise burn off, and the impact arrives the way a comet does. Rare, and impossible to miss. The name is a standing reminder: patience and precision outlast speed and panic.
The method
How each article is built
Every piece follows the same discipline. It opens on something real and documented, never a hypothetical. It makes one argument, stated plainly.
Every load-bearing number is checked against a primary source and cited, and any figure that is shaky gets flagged as shaky rather than dressed up to look certain. Before I land my own view, I give you the strongest version of the case against it. Then it ends on one thing you can actually do on Monday.
For three days he was a genius, then a security firm opened the code. A vibe-coded app that looked finished had leaked 4.75 million private records, because a demo builds one layer and a real product needs thirteen.
Nadim A. Massih
Patient Comet · 3 April 2026 · 8 min read
4.75M
private records anyone with the link could read: the app shipped with no security at all (Wiz, 2026)
45%
of AI-generated code fails security checks: the holes ship invisibly with the demo (Veracode, 2025)
380,000
public vibe-coded apps found on the open web, more than 2,000 of them leaking sensitive data, no attacker needed (Red Access, 2026)
In February 2026, a man built a social network without writing any code, and for about three days he was a genius.
The product was called Moltbook. An AI social network, vibe-coded into existence by describing it to a model and letting the machine do the typing. The founder said it plainly, and he said it with pride: he had not written a single line of code. Not one.
Andrej Karpathy, the man who coined the term vibe coding in the first place, looked at it and praised it. The internet looked at it and poured in. It worked. It was live. It was real. It looked, by every measure a normal person would use, completely finished.
Then Wiz showed up.
Wiz is a security firm, and security firms have a particular talent for reading the parts of a website you are not supposed to see. About three days after launch, they opened Moltbook’s client-side JavaScript, the code your own browser downloads and runs every time you load the page, and there, sitting in plain view, was a database key.
A key to the whole house.
Worse than the key was what it opened. The setting that decides who is allowed to read which row of a database had been switched off. Default open. So the key did not just unlock a door. It unlocked everything behind every door, all at once, for anyone who happened to have the link.
Here is what was behind those doors. Roughly 4.75 million records. One and a half million API keys. Thirty-five thousand email addresses. Four thousand private message threads, the kind people write believing no one else will ever read them. Anyone with the link could read all of it, and edit any post they liked (Wiz, 2026).
Three days. From genius to breach in three days.
One Layer of Thirteen
So how does a thing that looks this finished turn out to be this broken? That is the whole essay, so I will say it slowly.
The demo is not the product.
A vibe-coded demo is a building with a beautiful front door, a painted facade, and very little behind it. It builds one layer. The layer you can see. A real product needs roughly thirteen: authentication that checks who you are, authorisation that decides what you may touch, input validation, encryption, secrets management so your database key never lands in a file your users can download, rate limiting, logging, monitoring, error handling that fails safely, backups, tests, the access control Moltbook left switched off, and the dull plumbing that holds the rest together.
The demo builds layer one. The breach lives in layers two through thirteen.
This is the gap, the distance between looks done and is done. It is invisible in a screenshot, invisible in a launch tweet, invisible right up until the moment a stranger downloads your front-end and finds the keys you forgot you left there. And that gap is exactly where companies get breached.
Anatomy of a shipped product
You see one layer. Production runs on thirteen. The demo shows the only layer that was built; the other twelve are where Moltbook’s 4.75 million records were waiting (Wiz, 2026).
Not a Freak Event
If Moltbook were a one-off, you could file it under bad luck and move on. It is not a one-off. It is the loud early example of a pattern that is now everywhere, and the numbers behind that pattern are not subtle.
In May 2026 a firm called Red Access went looking. They found more than 380,000 public vibe-coded apps live on the internet. Over 2,000 were actively leaking sensitive data to anyone curious enough to look (Red Access, 2026). Their conclusion was four words long, and it belongs on a wall somewhere.
Default-public is the breach.
Then there is the code itself. Veracode ran AI-generated code through security checks across more than a hundred models, and 45% of it failed (Veracode, 2025). Nearly half. And when the flaw was cross-site scripting, the kind of hole that lets an attacker run their own code inside someone else’s browser, the failure rate climbed to 86%.
Now watch the slope. Georgia Tech traced published vulnerabilities back to AI-written code across three months of early 2026. Six in January. Fifteen in February. Thirty-five in March (Georgia Tech, 2026). They believe the true number runs five to ten times higher than they could confirm, which means the real curve is steeper than the one I just drew for you.
The bugs are reaching production
Published vulnerabilities traced back to AI-written code, by month. More than doubling each month, and the researchers put the true count five to ten times higher, because most AI suggestions leave no signature (Georgia Tech, 2026).
And here, if you want the part that explains why none of this is slowing down, is the money. Lovable, one of the tools at the centre of all this, was running at $400 million in annual recurring revenue in early 2026, on a $6.6 billion valuation (TechCrunch, 2026). That is the incentive. The market is paying, lavishly, for the thing that produces the demo. It is not paying for the other twelve layers. Nobody buys the plumbing.
Addy Osmani gave the phenomenon its sharpest name: the 70% problem. The AI gets you 70% of the way to a working product almost instantly, and it feels like magic, because 70% is enough to demo, enough to launch, enough to fool the launch-day crowd.
The last 30% is the product. It is also the hard part. And it is exactly the part the machine does not do for you.
So what do you actually do with this? Four moves, and none of them is “build slower”.
1
Separate the demo from the product, out loud
The day the prototype works is not the day you are nearly finished. It is the day the real work starts. Treat the demo as a question, not an answer.
Mindset
2
Find your thirteen layers before launch
Write them down: auth, authorisation, secrets, access control, encryption, validation, the lot. Walk each one and ask a plain question. Is this actually built, or does it just look built?
Scoping
3
Read your own front-end the way Wiz does
Open the JavaScript your browser downloads. If there is a key in there, a stranger can find it too, because a stranger already has. Assume default-public until you have personally proven default-private.
Security
4
Put a human between the demo and the door
The machine writes the code. It does not own the consequences. Someone has to.
Review
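As a rough illustration of move three, here is a minimal sketch of what "reading your own front-end" can look like once the JavaScript bundle is saved to disk. The two patterns are illustrative assumptions, not a complete ruleset, and `find_suspect_strings` is a hypothetical helper name; a clean result from a scan like this proves nothing about the other layers.

```python
import re

# Illustrative patterns only (assumptions, not a complete ruleset).
SECRET_PATTERNS = {
    # Three dot-separated base64url chunks starting with "eyJ": JWT-shaped tokens.
    "jwt_like_token": re.compile(
        r"eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}"
    ),
    # A key-like name assigned a long quoted string, e.g. apiKey: "....".
    "named_key_assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*[\"']([^\"']{16,})[\"']"
    ),
}


def find_suspect_strings(js_source: str) -> list:
    """Return (pattern_name, matched_text) for every suspicious hit in the bundle."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(js_source):
            hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    # Made-up bundle fragment; in practice, read the file your browser downloads.
    bundle = 'const cfg = { apiKey: "demo_0123456789abcdef0123" };'
    for name, text in find_suspect_strings(bundle):
        print(name, "->", text)
```

Anything this flags deserves a human look: some keys are designed to be public, but only if the access rules behind them are switched on.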
There is a real argument happening about all of this, and all three sides are partly right.
Three Camps, All Partly Right
The builder
“The gatekeeping is over, and good riddance.”
A person with an idea and no engineering degree can now ship something real in an afternoon. That is a genuine, lovely democratisation of who gets to make software, and I would not wish it away.
The operator
“Lovely is not the same as safe.”
Shipping was never the hard bit. Running the thing in front of real users, with real data, year after year, without leaking 4.75 million records: that is the hard bit, and a demo teaches you nothing about it.
The optimist
“The gap is closing, so relax.”
The models get better at security every month, so today’s 45% failure rate is a melting number, and soon the machine will write the other twelve layers too. Maybe. But read the slope again, six, fifteen, thirty-five: while the models improved, the breaches grew, because the volume of apps outran the safety of any one of them. The gap may close one day. It has not closed yet, and hope is not a security layer.
The Take
The Prototype Is Not the Product
Open the front-end of something you have shipped and read the JavaScript your users download. Just look. That single act would have saved Moltbook.
The mistake was never the tool. It was believing the visible layer was the whole thing. Vibe coding’s real gift is the most precise brief you will ever hold, a working picture of exactly what is wanted. The error is calling that brief the deliverable and shipping it.
So the next time something looks done in four minutes, let that be the cue, not the conclusion. This week, do the first thing below, and watch the missing work appear.
Where to start
Read your own front-end. Open the JavaScript your browser downloads and look for keys or secrets that should not be there. A stranger already has.
List the thirteen layers. Auth, access rules, secrets, encryption, validation, the rest. Mark which are built and which only look built.
Assume default-public. Prove default-private for every table before launch, never after the breach.
Put one human on the consequences. The machine writes the code; someone has to own what happens next.
Nadim A. Massih · Patient Comet
Next time something looks finished in four minutes, the honest question is not how did they build this so fast. It is: what is missing, and who finds it first, you or them?
Common questions
Questions, answered first
Why do AI-built apps leak their own data?
Because vibe coding builds the visible front end and skips the layers that hold secrets and control access. A 2026 scan found 380,000+ exposed vibe-coded apps, 2,000+ leaking sensitive data, often public by default (Red Access, 2026).
Why does the demo pass but production fail?
The flaw sits underneath the screen, where a demo never goes. Veracode found 45% of AI-generated code failed security checks, with cross-site-scripting slipping through 86% of the time (Veracode, 2025).
Should we stop using vibe-coding tools?
No. Use them for what they are good at: building the visible layer fast and treating it as a precise spec. The fix is naming and building the hidden layers, plus a security review before go-live, not banning the tools.
How many layers does a real product need?
There is no official count, but a useful working model is roughly thirteen: interface, auth, data access rules, schema, validation, error handling, secrets, hosting, scaling, monitoring, backup, compliance, and a security review. A demo builds the first one.
Receipts
Sources & references
Wiz, 2026
Moltbook, a vibe-coded “AI social network”, exposed ~4.75M records (1.5M API keys, 35k emails, 4,000 DM threads) via a database key left in client-side code with access control off.
Red Access, 2026
A scan of the open web found 380,000+ publicly accessible vibe-coded apps, 2,000+ leaking sensitive data, often admin-by-default. “Default-public is the breach, not a malicious actor.”
Veracode, 2025
Across 100+ models, 45% of AI-generated code failed security checks; cross-site-scripting slipped through 86% of the time.
Georgia Tech, 2026
A tracker tracing published vulnerabilities to AI-written code counted 6 in January, 15 in February, and 35 in March 2026, and estimates the true number five to ten times higher.
TechCrunch, 2026
Lovable crossed $400M in annual recurring revenue and a $6.6B valuation by early 2026, up from $100M eight months earlier.
Uber burned its annual AI budget in months. The price of intelligence is collapsing and the bills are climbing anyway, because AI is a meter, not a subscription.
Nadim A. Massih
Patient Comet · 17 April 2026 · 10 min read
1,000×
cheaper per task in three years, yet the bills went up, not down (a16z; Epoch AI, 2024)
13×
more spent on AI than a year ago, as the price per call kept falling (Ramp, 2026)
6%
of companies actually in control of their AI economics (McKinsey, 2025)
In the first months of 2026, Uber ran through its entire annual AI budget. Not over the year. In a matter of months.
The company was not being reckless. Its own operations chief admitted the token spending did not even map cleanly onto the features customers actually wanted (Tom’s Hardware; eeNews Europe, 2026). Uber is just the case large enough to make the news.
Across the economy the pattern is identical. The average business now spends about 13× more on AI than it did at the start of 2025, even as the price of a single AI call keeps falling (Ramp, 2026).
That is the strangest rule of this era, and almost nobody prices it in until the invoice lands. The cost of each thing AI does keeps dropping. So companies do far more things.
The per-call savings are real. And they vanish, because the cheaper each call gets, the more calls you run.
Cheap Is the Trap
A falling price does not shrink the bill. It quietly grows it.
Andreessen Horowitz gave the price collapse a name: LLMflation. For a model of a given quality, the cost of running it has fallen about tenfold a year, three years running. A model good enough to pass a standard knowledge test cost around $60 per million tokens of output in late 2021. By 2024, the same quality cost about $0.06. A thousandfold drop, faster than the fall in computing cost during the PC era (a16z; Epoch AI, 2024).
The cheaper it gets, the more you spend
A model of equal quality fell from about $60 to $0.06 per million tokens, roughly 1,000× in three years, yet spend at the same firms rose about 13× (a16z; Epoch AI; Ramp, 2024-2026).
Now the fact people miss. AI does not bill like software. A subscription is a flat fee, however hard you lean on it. AI is a meter. It charges for every action, every time.
So when each action gets a hundred times cheaper, you do not pocket the saving. You find a hundred new things worth metering, and the meter never stops.
There is a subtler trap under the sticker price. The token price you see quoted is a fraction of what a feature really costs. One request usually triggers a chain of billed steps: turning your documents into embeddings, retrieving the relevant ones, re-ranking them, running the model, then validating the output. Each step bills. OpsLyft found this hidden machinery means most teams underestimate their true AI bill by 40 to 60%, and a request that looks like five cents on the price card can land at twenty (OpsLyft, 2026).
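To make the chain of billed steps concrete, here is a small sketch that adds up one request. Every number in it is hypothetical, chosen only so the total mirrors the five-cents-to-twenty example above; none of it is OpsLyft data, and the step names are assumptions.

```python
# Hypothetical per-request costs in cents. Illustrative numbers, not measurements.
PIPELINE_STEPS_CENTS = {
    "embedding": 3,
    "retrieval": 2,
    "re_ranking": 4,
    "model_call": 5,   # the only line that appears on the price card
    "validation": 6,
}


def true_cost_cents(steps: dict) -> int:
    """Everything the request actually bills, not just the model call."""
    return sum(steps.values())


def hidden_share(steps: dict, visible: str = "model_call") -> float:
    """Fraction of the real cost that the quoted price leaves out."""
    total = true_cost_cents(steps)
    return (total - steps[visible]) / total


if __name__ == "__main__":
    print("quoted:", PIPELINE_STEPS_CENTS["model_call"], "cents")
    print("billed:", true_cost_cents(PIPELINE_STEPS_CENTS), "cents")
    print("hidden share:", hidden_share(PIPELINE_STEPS_CENTS))
```

The useful habit is the shape of the calculation, not the numbers: price the whole chain per request, then compare it with the figure on the price card.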
Even the companies selling AI live inside this. OpenAI took in roughly $13 billion in 2025 and spent around $22 billion to do it, a net loss near $9 billion (Fortune, 2025). If the firm with the biggest meter in the world still spends more than it earns, respect the meter.
Three things are happening inside real companies right now. They look unrelated. They are the same trap.
The Meter Never Stops
First, usage outruns the savings. Think of AI less like a monthly bus pass and more like a taxi with the meter running. When the per-mile rate drops, you do not save. You stop walking. You take the taxi for trips you would never have paid for, and the meter turns the whole way. That is why finance keeps getting blindsided: in one 2025 study, 85% of companies missed their AI cost forecasts by more than 10%, and nearly a quarter missed by more than half (Mavvrik / Benchmarkit, 2025). The reaction has been near-total. Two years ago, 31% of finance teams actively managed AI spend. Today it is 98% (State of FinOps, 2026).
Second, agents turn the meter into a fire hose. An agent does not answer once and stop. It works in a loop: it plans, calls a tool, reads the result, then tries again, sometimes dozens of times for one request. Every step is billed, and the cost climbs steeply for a structural reason: each step re-sends everything that came before it, because the model has no memory between calls. The bill compounds like a snowball, not a straight line. Industry measurements put an agent’s token use at 10 to 100× a single chat, and Stanford found the most involved tasks burn a thousand times more (Stanford Digital Economy Lab, 2025). When the cost of its AI coding assistant ballooned, Microsoft moved GitHub Copilot to usage-based billing in June 2026 (Tom’s Hardware, 2026).
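The snowball is easy to see in a few lines. This is a simplified sketch, assuming each step adds a fixed number of tokens to the history and that the whole history is re-sent as input every time; it ignores output tokens and any prompt caching, and `agent_input_tokens` is a hypothetical name.

```python
def agent_input_tokens(steps: int, tokens_per_step: int, base_prompt: int = 0) -> int:
    """Total input tokens billed when every step re-sends the full history."""
    total = 0
    context = base_prompt
    for _ in range(steps):
        context += tokens_per_step   # the new plan or tool result joins the history
        total += context             # the whole history is sent, and billed, again
    return total


if __name__ == "__main__":
    for n in (1, 10, 20):
        print(n, "steps ->", agent_input_tokens(n, tokens_per_step=1000), "input tokens")
```

Under these assumptions ten steps bill 55,000 input tokens and twenty steps bill 210,000: doubling the loop length roughly quadruples the cost, which is why a long-running agent is priced nothing like a long chat.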
Third, the expensive model rarely decides the outcome. You would assume the priciest model delivers the biggest gain. On its own, it does not. In OpenAI’s enterprise study, the heaviest users, the top few per cent by adoption, save more than ten hours a week and send roughly 6× more messages than the median worker, on the very same tools (OpenAI, State of Enterprise). The gain is in how much you run and how well, not which model you buy.
Put the three together and the shape is clear. This is not a pricing problem you solve by shopping around. It is a usage problem, and usage is the thing almost nobody is watching.
Straight Off Your Margin
The bill does not stop at the invoice. It reaches the number your company is valued on.
For any company building AI into what it sells, inference is not an experiment you write off while you learn. It is cost of goods sold, the raw cost of delivering the product, the same bucket as servers and bandwidth.
That one reclassification changes the maths. It comes straight off gross margin, the number investors actually price a company on. And it bites: in the 2025 governance survey, 84% of companies said AI was already cutting their gross margins by more than 6% (Mavvrik / Benchmarkit, 2025).
Bessemer’s portfolio data puts AI-native gross margins at 50 to 60%, against the 70 to 90% that mature software trained investors to expect. Software economics is quietly turning into something closer to manufacturing, where the cost of each unit sold matters again.
Software margins meet manufacturing economics
AI-native gross margins run about 50 to 60% against the 70 to 90% mature SaaS taught investors to expect: inference is cost of goods sold, and it comes off the number you are valued on (Bessemer, 2025).
That is why the four controls below are not a finance hobby. They are how you defend the number your company is valued on.
So what do the teams who stay in control actually do? Four controls, run together, because they compound.
1
Route every task to the cheapest capable model
Most requests do not need the flagship. The technique is a cascade: a small, cheap model takes the easy calls and escalates only the hard ones, the way a triage nurse refers the few cases that need a consultant. Done well, routing cuts inference cost by 40 to 70% (FinOps Foundation; arXiv, 2025).
Highest leverage
2
Cache the answers you have already paid for
Semantic caching returns a saved answer when a new question is close enough to one already answered, removing about a third of repeat queries. Prompt caching reuses the unchanging part of a prompt, so a cached token costs about a tenth of a fresh one. Stack that with the 50% batch discount and the cached part of a call can run at roughly 5% of the standard rate: a tenth of the price, then halved (OpenAI; Anthropic, 2026).
Two savings in one
3
Cap the spend before it surprises you
A token budget is a hard ceiling per team or workflow. Pair it with two cheap habits that kill the silent multipliers: back off and retry sensibly instead of hammering a failed call, and keep prompts inside the context limit so you never pay the overflow surcharge.
Fastest to ship
4
Tie every cost to an outcome
Tag each call so you can trace every pound of spend to the result it produced, what FinOps calls showback, then report cost per outcome rather than cost per call. Cost per call is an input. Cost per resolved ticket is the number that tells you whether AI makes money.
The deepest control
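Controls one and three can be sketched together: a cascade that tries the cheap model first, escalates only when it is unsure, and refuses any call that would break a hard budget. This is a minimal sketch under stated assumptions. The flat per-call costs, the confidence score, the threshold, and the two stub models are all hypothetical stand-ins, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the hard ceiling."""


@dataclass
class Model:
    name: str
    cost_cents: int                              # hypothetical flat cost per call
    answer: Callable[[str], Tuple[str, float]]   # returns (text, confidence 0..1)


class Cascade:
    def __init__(self, small: Model, large: Model, threshold: float, budget_cents: int):
        self.small, self.large = small, large
        self.threshold = threshold
        self.budget_cents = budget_cents
        self.spent_cents = 0

    def _charge(self, model: Model) -> None:
        # Check the ceiling before the call is made, not after the invoice.
        if self.spent_cents + model.cost_cents > self.budget_cents:
            raise BudgetExceeded(f"{model.name} would exceed {self.budget_cents} cents")
        self.spent_cents += model.cost_cents

    def ask(self, prompt: str) -> Tuple[str, str]:
        """Return (model_name, answer), escalating only when the small model is unsure."""
        self._charge(self.small)
        text, confidence = self.small.answer(prompt)
        if confidence >= self.threshold:
            return self.small.name, text
        self._charge(self.large)
        text, _ = self.large.answer(prompt)
        return self.large.name, text


# Stub models for illustration: short prompts count as "easy".
def small_stub(prompt: str) -> Tuple[str, float]:
    return "small answer", 0.9 if len(prompt) < 40 else 0.3


def large_stub(prompt: str) -> Tuple[str, float]:
    return "large answer", 0.99


if __name__ == "__main__":
    cascade = Cascade(Model("small", 1, small_stub), Model("large", 20, large_stub),
                      threshold=0.7, budget_cents=30)
    print(cascade.ask("Reset my password"), cascade.spent_cents)
```

Note the design choice: an escalated request pays for both models, so the cascade only saves money when most traffic really is easy. That is worth measuring before trusting any headline percentage.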
The Market Has Already Repriced
You are not the only one who noticed. The market moved first.
For thirty years software was priced per seat: one licence, one user, a flat fee. Agents break that, because an agent does the work whether or not a human is in the seat, and it costs real money each time.
So the confident vendors stopped charging for access and started charging for results. Intercom’s support agent, Fin, bills 99 cents per resolution, and only when it actually resolves the issue end to end (Intercom, 2026). Salesforce’s Agentforce charges around two dollars per conversation, on its way to roughly $800 million in annual recurring revenue (SaaStr, 2026). HubSpot has moved its AI to per-resolution too.
Read it closely, because it is the whole argument in miniature. Whether you buy AI or build it, the unit that matters has become the same one: not what a call costs, but what an outcome costs.
Not everyone agrees it is worth the effort yet. The argument is worth having in full.
Bubble, or Breakthrough?
The builder
“AI is cheap and getting cheaper. Cost controls now just slow us down.”
Move fast, optimise later. And they are half right: squeezing a tiny bill too early wastes your best people’s attention, and over-governing an experiment can kill it before it shows you anything.
The bear
“Do not build on prices a loss-making machine is propping up.”
The money is moving in a circle and a lot of the profit is paper. OpenAI alone has committed around a trillion dollars in infrastructure deals across seven suppliers, circular enough that one firm’s spending reappears as another’s growth (The Register; Calcalist, 2025). Two thirds of companies already plan to pull some workloads back onto their own hardware.
The operator
“Both describe the same reason to keep control.”
If prices are being held artificially low, they will rise, and the team that knows its cost per outcome will adjust while everyone else panics. If the experiment works, the team that governed early is the one that can afford to scale it. Control wins under either future.
The Take
Read the Meter, or It Reads You
Cheaper AI is not the advantage. The advantage is keeping control of the economics while everyone else loses it.
AI getting cheaper is the opening, not the win. It lets you try things that were impossible a year ago, and you should. But an opening you cannot see clearly is not an advantage. It is an exposure that arrives three quarters later wearing the face of a budget cut.
The teams pulling ahead can see what their AI costs, tie each pound of it to the thing it produced, and turn the dial up or down on purpose. That clarity is cheap to build and expensive to skip, and only about 6% of companies have reached it (McKinsey, 2025).
So do one thing this quarter: get a single honest number for cost per outcome, for one team. Not cost per call, which tells you almost nothing, but the cost of the actual result the work was for. The moment you can see that number, the meter stops being a mystery and becomes a dial.
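For a sense of how small that first number can be, here is a sketch of the calculation, assuming each call is already tagged with the team or workflow it served and that outcomes are counted separately. The tags, costs, and counts in the example are invented for illustration.

```python
from collections import defaultdict


def cost_per_outcome(calls, outcomes):
    """calls: iterable of (tag, cost_cents). outcomes: {tag: number of results}.

    Returns cents per outcome for each tag, or None where spend produced no outcomes.
    """
    spend = defaultdict(int)
    for tag, cost_cents in calls:
        spend[tag] += cost_cents
    return {
        tag: (spend[tag] / count if count > 0 else None)
        for tag, count in outcomes.items()
    }


if __name__ == "__main__":
    calls = [("support", 5), ("support", 15), ("support", 20), ("sales", 30)]
    outcomes = {"support": 2, "sales": 0}   # resolved tickets, closed leads
    print(cost_per_outcome(calls, outcomes))
```

A `None` is the interesting result: it marks spend that has not yet produced anything you can count.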
Where to start
Pick one team and one outcome. A resolved ticket, a closed lead, a shipped feature. Something the work was actually for.
Measure cost per outcome, not cost per call. The call price is vanity; the outcome price is the truth about whether AI pays.
Turn on one control this week. Route through a cascade, which done well can cut inference cost by 40 to 70%, or cache the repeat answers you have already paid for.
Treat inference as cost of goods sold. Put it in the margin maths, because investors already do.
Nadim A. Massih · Patient Comet
What does one real outcome actually cost you to produce with AI, and would you bet your gross margin on the answer you have today?
Common questions
Questions, answered first
If AI is getting cheaper, why are companies spending more?
Because cost per call fell while usage exploded. For a model of equal quality the price dropped about tenfold a year for three years, roughly a thousandfold in total (a16z; Epoch AI, 2024). AI bills per action, so as each action gets cheaper, teams run far more of them and total spend climbs even as the unit price collapses.
Why is my AI bill higher than the token price suggests?
The quoted token price is only part of the cost. A single request often triggers embedding, retrieval, re-ranking and validation steps that each bill, plus surcharges for oversized prompts and silent retries. This hidden overhead means most teams underestimate their true bill by 40 to 60% (OpsLyft, 2026).
How do you actually control AI costs?
Govern usage, not the sticker price: route each task to the cheapest capable model with a cascade (cuts cost roughly 40 to 70%), cache and reuse answers with semantic and prompt caching, cap spend per team, and measure cost per outcome (FinOps Foundation; OpenAI; Anthropic, 2026).
Are AI agents really that much more expensive than chatbots?
Yes. Because an agent works in a loop and re-sends its growing context at every step, a single agentic task commonly uses 10 to 100 times the tokens of one chat (Stanford Digital Economy Lab, 2025). Goldman Sachs expects agents to push total token demand up about 24× by 2030.
Receipts
Sources & references
a16z; Epoch AI, 2024
“LLMflation”: the cost of a fixed-quality model fell about 10× a year for three years, roughly $60 to $0.06 per million tokens, a ~1,000× drop.
Ramp; OpsLyft, 2026
The average business spends ~13× more on AI than in January 2025 as per-token prices fell; hidden steps (embeddings, retrieval, re-ranking, retries, overflow) inflate true bills by 40 to 60%.
Mavvrik / Benchmarkit; State of FinOps, 2025-2026
85% missed AI cost forecasts by >10% and 24% by >50%; 84% report gross-margin erosion >6%; finance teams actively managing AI spend rose from 31% to 98%.
Stanford Digital Economy Lab; Fortune, 2025
Agent token use runs 10 to 100× a chat, up to ~1,000× on the most involved tasks; OpenAI took ~$13B revenue against ~$22B spend, a ~$9B net loss.
Bessemer; Intercom; SaaStr; McKinsey, 2025-2026
AI-native gross margins ~50-60% vs 70-90% for mature SaaS; vendors now price per outcome (Fin $0.99/resolution, Agentforce ~$2/conversation, ~$800M ARR); only ~6% of companies are in control.
When your data is the asset, bring the model to it.
For three years the good models lived in someone else’s cloud, and to use them you shipped your data there too. That deal just quietly expired.
Nadim A. Massih
Patient Comet · 30 April 2026 · 9 min read
$294K
total cost to train a frontier-class open model, then given away free (Nature, 2025)
1 GPU
runs a 27B model that beats 405B-class rivals (Google, 2025)
Free
on-device inference, data never leaves the phone (Apple, 2025)
Trained for the Price of a House
In September 2025 a language model did something none had done before. DeepSeek-R1 went through peer review and landed on the cover of Nature, the most scrutinised journal in science.
Buried in the paper was a number that did more damage than any benchmark. The core reasoning training run had cost about $294,000 (Nature; CNN, 2025).
Not $294 million. The price of a small flat. For a model good enough for that cover, then given away, where it became the most-downloaded open model in the world.
The market had already felt the tremor. Eight months earlier the first R1 release wiped roughly $589 billion off Nvidia’s market value in a single day, the largest one-day loss for any single company in US market history (CNBC, 2025).
Investors were not frightened of one lab. They were frightened of the assumption underneath their whole position: that world-class AI would always be rented, at a premium, from the few firms that owned the only machines big enough to build it.
For three years the deal was simple. The good models lived in someone else’s cloud, and to use them you shipped your data there too.
That deal has quietly expired. R1 is the proof you can hold in your hand: open weights, published method, trained for the price of a house.
The Capability Came Down to Earth
The cloud did not get worse. The capability came down to earth. For most of the modern AI era there was a real reason to put your data in someone else’s data centre, because that is where the intelligence lived. The frontier was enormous, expensive, and remote, and the only way to touch it was through an API. Every prompt was a small act of trust, a copy of something sometimes sensitive leaving your control so a model elsewhere could read it.
That constraint has lifted. The models worth using no longer all live in three data centres: some run on a single graphics card, and one runs on the phone in your pocket. When your data is the asset, every API call is a copy of the crown jewels leaving the building, and for the first time you have a real alternative to making that copy. The question most businesses answer by default, send our data to the cloud because that is where the intelligence is, now deserves a second look.
A Model on One Card
Start with what an open model can already do. Google’s Gemma 3, at 27 billion parameters, scores high enough on human-preference testing to beat models more than ten times its size, including a 405-billion-parameter Llama, and it does this on a single GPU (Google, 2025). Read that twice: a model you could host yourself, on one accelerator, now wins blind taste-tests against rivals that need a cluster. The frontier still leads, and we will be honest about that in a moment, but “good enough for almost everything you actually do” has arrived on hardware you can own, and that is the threshold that changes decisions.
Capability is marching onto smaller hardware
Up means more capable, left means less hardware. Through 2025 and into 2026 the dots kept moving up and to the left: the same quality now runs on smaller, more local machines (Google; Apple, 2025).
Now go smaller still. Apple ships a roughly three-billion-parameter model on the device itself, exposed to every developer, with inference that is free of charge and runs locally, so the user’s data never leaves the phone (Apple, 2025). Think about what that removes. No contract to sign, no region to choose, no log sitting on someone else’s disk waiting for a subpoena. For the first time, the cheapest path and the safest path can be the same path.
Underneath both examples is a trend you can plan around: the cost of a fixed quality of intelligence is falling roughly tenfold a year (a16z, 2025). That buys you cheaper inputs, not automatically smaller bills, and it does not make local the cheaper option by default. Below serious, predictable volume, a cloud API you never have to operate is usually the better deal, and pretending otherwise helps no one. What has genuinely changed is the size of the penalty for keeping sensitive work in-house: it used to be impossible, or far worse than the cloud, and it is now a real option you can price.
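To show what "a real option you can price" looks like in practice, here is a minimal break-even sketch. Every figure in it is a hypothetical placeholder, not a quoted price, and the function name and parameters are my own illustration: swap in your own API rate, hardware amortisation and operating costs.

```python
def breakeven_tokens_per_month(
    cloud_price_per_million: float,
    local_fixed_cost_per_month: float,
    local_price_per_million: float = 0.0,
) -> float:
    """Monthly token volume above which self-hosting costs less than the API.

    Cloud cost = volume * cloud_price_per_million / 1e6
    Local cost = local_fixed_cost_per_month + volume * local_price_per_million / 1e6
    """
    margin = cloud_price_per_million - local_price_per_million
    if margin <= 0:
        # Local never catches up: the API is cheaper at every volume.
        return float("inf")
    return local_fixed_cost_per_month / margin * 1_000_000


# Hypothetical inputs, for illustration only.
volume = breakeven_tokens_per_month(
    cloud_price_per_million=5.0,      # assumed blended API price, USD per 1M tokens
    local_fixed_cost_per_month=4000,  # assumed hardware amortisation plus ops time, USD
    local_price_per_million=1.0,      # assumed power and marginal cost, USD per 1M tokens
)
print(f"Break-even at about {volume / 1e6:,.0f} million tokens per month")
```

Below the break-even volume the API wins on price, which is the point made above. The sketch deliberately leaves out anything you cannot put a number on, such as the value of the data never leaving the building.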
So the decision stops being local versus cloud, as if one must win. It becomes a routing problem: which workload belongs where. And the firms that already route this way are not hobbyists. In a16z’s survey of about a hundred CIOs, open-model adoption clustered at the largest, most regulated companies, driven explicitly by on-premise control for security and compliance (a16z, 2025). They have done the maths the rest of us are about to do.
Four moves do most of the work once you treat this as routing rather than allegiance.
1. Route, do not convert
Send the hard, occasional, public-data reasoning to a cloud frontier model, and send the high-volume, predictable, or sensitive tasks to a local open-weight model. The aim is the cheapest appropriate backend per request, full stop, not a single winner for everything.
Architecture
2. Put the model next to the data
Run an open model on your own infrastructure with retrieval over your own documents, so the sensitive corpus is queried in place and never shipped out. Capability has caught up enough that this is a real option now, not a downgrade you tolerate for the sake of control.
Data & platform
3. Do not mistake a region for sovereignty
The US CLOUD Act reaches data held by a US-headquartered provider even when it sits on European soil. An “EU region” buys you latency and a label, not jurisdiction, so if sovereignty is the goal you need a locally operated provider or on-device execution.
Legal & risk
4. Prefer prompting to fine-tuning
Better base models have made heavy fine-tuning less necessary, and leaning on prompting and retrieval keeps you free to swap in a stronger open model next quarter instead of being married to this one.
Engineering
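The first move, routing by workload rather than picking a single winner, can be sketched as a small rule. This is an illustration under stated assumptions, not a reference design: the `Workload` fields, the `HIGH_VOLUME` threshold and the backend names are all hypothetical, and the order of the checks encodes one possible policy (sensitivity first, then difficulty, then volume).

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    CLOUD_FRONTIER = "cloud_frontier"  # rented closed model behind an API
    LOCAL_OPEN = "local_open"          # open-weight model on your own hardware


@dataclass(frozen=True)
class Workload:
    name: str
    sensitive: bool        # touches proprietary or regulated data
    needs_frontier: bool   # hard multimodal or long tool-use reasoning
    monthly_requests: int  # expected, predictable volume


# Hypothetical threshold: derive yours from a break-even analysis.
HIGH_VOLUME = 100_000


def route(w: Workload) -> Backend:
    """Pick a backend per workload, checking sensitivity before anything else."""
    if w.sensitive:
        return Backend.LOCAL_OPEN
    if w.needs_frontier:
        return Backend.CLOUD_FRONTIER
    if w.monthly_requests >= HIGH_VOLUME:
        return Backend.LOCAL_OPEN
    # Low volume, public data, ordinary difficulty: the API is usually cheaper.
    return Backend.CLOUD_FRONTIER


for w in (
    Workload("contract review", sensitive=True, needs_frontier=False, monthly_requests=2_000),
    Workload("market research", sensitive=False, needs_frontier=True, monthly_requests=300),
    Workload("ticket tagging", sensitive=False, needs_frontier=False, monthly_requests=500_000),
):
    print(f"{w.name}: {route(w).value}")
```

A workload that is both sensitive and frontier-hard stays local in this sketch. In practice that combination is the one worth escalating to a person rather than settling in code.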
There is a real argument running through all of this, and the honest version has three camps, not two.
Own It, Rent It, or Route It
The owner
“Sovereignty stopped being a luxury, so the default should simply flip.”
When the data is the asset, every cloud call is a copy of the crown jewels leaving the building, and parity has arrived for the work you do every day. On this view the disciplined move is to stop renting your own confidentiality back from the company that holds it.
The renter
“The gap that matters has not closed, and this is the strongest counter.”
On the genuine frontier, closed models still lead, and the lead shows up exactly on the hardest, highest-value work: deep multimodal reasoning and long chains of tool use. The same CIOs who buy open models for compliance still put the leading closed model into production far more often than the open one. What you rent is support, guarantees, and someone else’s pager at three in the morning, not just the weights.
The router
“Hybrid is the only honest answer.”
Self-hosting is not free; it carries recurring costs in engineering, operations, and the treadmill of keeping models current, and below serious volume a cloud API simply wins on price. The mature posture owns what is sensitive and high-volume, rents what is hard and occasional, and refuses to turn either choice into an identity.
The Take
The Model Is a Utility. The Routing Is the Edge.
When intelligence is a commodity, the edge is no longer which model you can reach. It is knowing which workload belongs where, and refusing to let anyone else make that call for you.
The model is becoming a utility, like power or storage before it. The cloud assumption broke for one reason: the capability you used to rent now runs on hardware you can hold. That is not a command to rip everything out. It is permission to decide, workload by workload, on the merits, instead of defaulting your most sensitive data into someone else’s building because in 2023 you had no alternative.
If you do one thing this week, build your own benchmark. Take your highest-value, most sensitive workload, run it once on a local open model with real data, and measure quality and cost against your current cloud call. One workload, one week, one number you can defend in a room.
Where to start
Inventory the crown jewels. List the workloads touching your most sensitive or proprietary data. Those come in-house first.
Run one locally. Point an open-weight model at one real internal task on real data, and measure quality and cost against today’s cloud call.
Write the routing rule. Decide, per workload, what goes to a cloud frontier model and what runs local, by sensitivity and volume.
Check your residency claims. Confirm whether your “EU region” actually delivers sovereignty, or just the look of it.
Nadim A. Massih · Patient Comet
If intelligence is now a utility, what is the one workload you would never let leave the building, and why?
Written by Nadim A. Massih, AI & Tech Strategist
Common questions
Questions, answered first
Can a model I run myself really compete with the big cloud ones?
For most real-world tasks, yes; at the bleeding frontier, not quite. A 27B open model on one GPU beats 405B-class models on human preference (Google, 2025), while closed models still lead on the hardest multimodal and long agentic work.
Is running AI locally cheaper than a cloud API?
Only above serious, predictable volume; self-hosting carries real operating costs, so below that, cloud APIs usually win. The case for local is often sovereignty, not price.
Does a “European region” of my US cloud satisfy data residency?
Not necessarily. The US CLOUD Act can reach data held by a US-headquartered provider even when it sits in the EU. Genuine sovereignty needs a locally operated provider or on-device execution.
Is there a free way to keep data on the device?
Yes. Apple gives developers a roughly 3B on-device model with free local inference, so data never leaves the device (Apple, 2025).
Receipts
Sources & references
Nature / CNN, 2025
DeepSeek-R1 peer-reviewed on the cover of Nature; core reasoning training run about $294,000; became the most-downloaded open model.
CNBC, 2025
Nvidia lost roughly $589B in a single day after the first R1 release, the largest single-day market loss in US history.
Google DeepMind, 2025
Gemma 3 27B beats much larger models including Llama 3 405B on human-preference testing while running on a single GPU.
Apple, 2025
On-device ~3B model with free local inference, competitive in English with larger open models; data stays on the device.
a16z, 2025
Inference cost falling about 10x per year; enterprise open-model adoption concentrated at larger, regulated firms, driven by on-prem and compliance.
A billion people now get their answer before they reach your page. The web’s traffic economy is being rebuilt around a reader made of software, and most sites were built for the one who is leaving.
Nadim A. Massih
Patient Comet · 7 May 2026 · 9 min read
1 billion
monthly users of Google’s AI Mode, within its first year (2026)
-33%
Google traffic to publishers in a year, while ChatGPT sends 0.02% (Reuters Institute, 2026)
38,000:1
pages an AI crawler takes for every visitor it returns (Cloudflare, 2025)
A Billion People, No Click
At its developer conference in May 2026, Google put a round number on the thing everyone had been arguing about. AI Mode, the version of search that answers you directly instead of handing you links, had passed one billion monthly users in its first year, with queries more than doubling every quarter. The lighter AI Overviews now reach around two billion people (Google, 2026).
The answer layer is no longer an experiment bolted to the top of the page. It is the page.
Now set that against the bill that arrived with it. In the year to November 2025, the traffic Google sent to the world’s publishers fell about 33% globally, and closer to 38% in the United States, across a survey of 280 media leaders built on Chartbeat data (Reuters Institute, 2026).
The supposed replacement, referrals from ChatGPT, accounts for around 0.02% of publisher traffic. The answer went mainstream, the front door closed by a third, and the side door barely opened.
Put those two facts in the same room and the conclusion is hard to dodge. If a billion people get their answer before they ever reach your page, then the question your whole web presence was built to answer, how do we rank to win the click, is no longer the question that pays.
Be Cited, Not Ranked
For twenty-five years the deal was simple. Rank near the top of the results and earn the visit. AI answers break that deal in one clean move: the model reads your page, lifts the useful sentence, and hands it to the reader inside the answer. Your reward is no longer the click. It is the citation, if you are one of the handful of sources the answer decided to trust. That is not the end of the web, but it is a different economy with a different currency.
The shape of the funnel flips with it. The old one was high volume and low intent, oceans of clicks, most of them idle browsers who would never buy anything. The new one is the opposite: fewer people arrive, but the ones who do have been pre-qualified by a machine that read your page and judged you worth quoting. Being machine-readable is how you stay in that smaller, better funnel.
Rank first on Google and you may still never be quoted. Being ranked and being cited are two different competitions now.
The Click Cliff
The collapse is not a feeling or a fear, it is in the behaviour, measured three separate ways. Pew Research watched 900 Americans across nearly 69,000 real searches: when an AI summary appeared, they clicked a normal result just 8% of the time, against 15% when no summary showed. The link inside the summary itself was clicked 1% of the time (Pew, 2025). Ahrefs, working across 300,000 keywords, found the top organic result lost 58% of its click-through the moment an AI Overview appeared above it (Ahrefs, 2025). And the Daily Mail told UK regulators that where an AI Overview shows, its click-through fell from 25.2% to 2.8%, an 89% drop on the record (Digiday, 2025).
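The three studies report their falls in different units, so it helps to put them on one scale. The sketch below computes the relative drop from the before and after rates quoted above. The helper name is my own, and the Ahrefs figure is left out because it is already published as a relative change.

```python
def relative_drop(before: float, after: float) -> float:
    """Fractional fall from `before` to `after` (0.5 means a 50% drop)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Click rates quoted in the text, in percent.
pew = relative_drop(15, 8)             # Pew: no summary vs summary shown
daily_mail = relative_drop(25.2, 2.8)  # Daily Mail regulator filing

print(f"Pew: {pew:.0%} fewer clicks on normal results")      # Pew: 47% ...
print(f"Daily Mail: {daily_mail:.0%} lower click-through")   # Daily Mail: 89% ...
```

The Pew gap is seven percentage points, which works out to a relative fall of roughly 47%. The Daily Mail figures reproduce the 89% stated in the filing.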
The click cliff, measured three ways
Three independent measures of the same collapse. The top organic result lost 58% of its clicks once an AI Overview appeared (Ahrefs, 2025); one national publisher fell from 25.2% to 2.8% in a regulator filing (Digiday, 2025).
While those clicks fall, the machines are reading more of you than ever, and the imbalance is the number that should reorganise your budget. Cloudflare measured one major AI crawler taking roughly 38,000 pages for every single referral it sent back, with about 80% of all AI crawling now done to train models rather than to answer a live question (Cloudflare, 2025). Read that ratio slowly. For every visitor it returns, it takes thirty-eight thousand pages. The pushback has been just as industrial: in roughly five months after Cloudflare let sites refuse crawlers by default, it blocked 416 billion AI requests, with around 2.5 million sites now blocking.
Then there is the fact that quietly undoes most SEO plans: the web that AI engines cite is not the web that Google ranks. Independent analyses of tens of thousands of AI answers find only around 12% of the sources AI tools quote overlap with Google’s top results (Authoritas; seoClarity, 2026). You can dominate one race and be invisible in the other, and most teams are only entered in the old one.
So the work in front of you is not to climb a ranking. It is to be effortless for a machine to read, to trust, and to quote, and to decide on purpose which machines you let in at all. That splits into four practical moves.
1. Lead every page with the answer
Put a direct, self-contained answer of forty to seventy words at the top of each page that matters, written so an answer engine can lift it whole and credit you. The median AI summary runs about 67 words and cites three or more sources, so write to be one of those three rather than the thousandth result nobody reaches.
Content
2. Serve structured data, plainly
Add a few machine-readable tags that state who you are and what you sell. The detail that trips teams up: serve them in the raw HTML so a model reads them without running your JavaScript, and keep them consistent with your other profiles so a machine resolves every mention of you to one confident picture.
Technical
3. Win the questions, not the keywords
AI answers cluster on conversational, question-shaped queries, the long, specific things people actually ask out loud. That is where the citations live now, so write the pages that answer real questions in plain sentences.
Editorial
4. Decide your crawler posture on purpose
You can block AI crawlers by default, as around 2.5 million sites do through Cloudflare, or license your content through standards like Really Simple Licensing, backed by some 1,500 publishers. But mind the trap: Google merged its search and AI crawlers, so blocking the AI bot can mean vanishing from Google search itself. Choose with that cost in view.
Strategy & legal
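For the second move, "in the raw HTML" means the tags are written into the page on the server, not injected later by a script. A minimal sketch, assuming a schema.org `Organization` object in JSON-LD: the company name and URLs are hypothetical placeholders, and the helper is my own illustration rather than any framework's API.

```python
import json


def json_ld_script(data: dict) -> str:
    """Render a schema.org object as a JSON-LD <script> tag for the raw HTML."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a stray "</script>" inside a field cannot close the tag early.
    # "\/" is a valid JSON escape for "/", so the data still parses unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical organisation, for illustration only.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com",
    # Links to your other profiles, so a machine can resolve them to one entity.
    "sameAs": ["https://www.linkedin.com/company/example-co"],
}

print(json_ld_script(organization))
```

Emit the returned string inside the page template on the server. The `sameAs` list is where the consistency point lands: it should match the profiles you actually maintain.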
One popular shortcut is not the fix, and it is worth saying plainly: do not lean on an llms.txt file as a strategy. Google has said it does not use it, one of its own engineers likened it to the discredited keywords meta tag, and adoption sits near 10% after eighteen months. Publish one if it pleases you, but do not mistake it for a plan.
Serious people read the same numbers and reach three different verdicts. Hold all three before you decide.
Stable, Smaller, or Collapsing
The platform
“Clicks are stable, and higher quality.”
Google says total click volume is roughly steady and that AI sends better-qualified visits, quietly clearing out the low-value lookups that never converted. The honest caveat is that it has published no data to support this, and it sits against independent behavioural studies and publishers’ own filings to regulators.
The optimist
“It is smaller, but better.”
The traffic that survives is the traffic worth having: the few visitors AI sends are pre-qualified, and a small high-intent funnel can be a better business than a large idle one. Some brands already credit AI and agent channels with roughly 10% of revenue (Fortune, 2026), which is not nothing.
The publisher
“This is collapse, so bargain.”
A 33% fall in a single year is not a market adjusting, it is the floor giving way, and the only serious answer is collective: block by default, license the content, make the machines pay to read. The catch, which they admit, is real: bargaining hard with Google risks your search visibility too. A fight worth having, but a fight with real costs.
The Take
Write to Be Quoted. Decide Who Reads You.
The page is no longer the destination. It is the source the answer is built from, so write to be quoted, and decide on purpose who is allowed to read you.
The uncomfortable part is that being right and ranking well is no longer enough on its own. If a machine cannot parse your answer in a single clean sentence, it will quote someone it can, and that someone may not even sit on Google’s first page. The job is to make your best pages effortless to read, trust, and lift, for the software first and the human second, and to treat the crawlers as a decision rather than weather that happens to you.
Take your ten best pages, the ones that earn business when read, and do four things to each: add an answer-first block at the very top, ship valid server-rendered structured data, make a deliberate crawler decision knowing the Google trade-off, and track AI referrals separately so you can see what is working.
Where to start
Pick your ten best pages. The ones that earn business when read. They pay this back first.
Add an answer-first block. Forty to seventy words at the very top, self-contained, written to be quoted verbatim.
Ship valid structured data. Server-rendered, accurate, consistent with your other profiles, so machines resolve you without rendering the page.
Choose your crawler posture. Allow, block, or license, on purpose, knowing the Google trade-off.
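The answer-first block is the easiest of these to check mechanically. A minimal sketch, assuming words are simply whitespace-separated tokens; the function name and sample text are hypothetical, and the forty-to-seventy range comes from the guidance above.

```python
def check_answer_block(text: str, low: int = 40, high: int = 70) -> tuple[int, bool]:
    """Return the word count and whether it falls inside the target range."""
    words = len(text.split())
    return words, low <= words <= high


# Hypothetical opening block, for illustration only.
draft = (
    "Self-hosting an open model is cheaper than a cloud API only above "
    "serious, predictable volume."
)
count, ok = check_answer_block(draft)
print(f"{count} words, {'within' if ok else 'outside'} the 40 to 70 word target")
```

A length check says nothing about whether the block is self-contained or quotable; that part still needs a person reading it cold.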
Nadim A. Massih · Patient Comet
If the answer engine never sends the click, what is one of your pages actually worth, and would you rather be read by a million machines or visited by a thousand people?
Common questions
Questions, answered first
How big is AI search now, really?
Big. Google’s AI Mode passed a billion monthly users in 2026 and AI Overviews reach around two billion, while the clicks Google sends publishers fell about a third in a single year (Google; Reuters Institute, 2026).
Should I just add an llms.txt file?
It is not the fix. Google says it does not use it and adoption is near 10% after eighteen months. Prioritise answer-first content and server-rendered structured data; treat llms.txt as an optional experiment, not a strategy.
Does AI-referred traffic actually convert?
Yes, often better than ordinary search traffic. The visitors are fewer but pre-qualified, and early adopters report AI and agent channels starting to show up in revenue (Fortune, 2026).
Can I just block the AI crawlers?
You can, and roughly 2.5 million sites now do by default through Cloudflare. But Google merged its search and AI crawlers, so blocking can cost you Google search visibility too. Decide it deliberately, or license access instead.
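A robots.txt file is one way to state a crawler posture, separate from the network-level blocking Cloudflare offers, and it is advisory: it only works for crawlers that choose to honour it. The sketch below uses Python's standard `urllib.robotparser` to check what a given policy permits before you publish it. The policy text and URL are hypothetical; `GPTBot` is the user agent OpenAI documents for its crawler. It does not model the Google trade-off described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: refuse one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the policy lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://www.example.com/pricing"  # hypothetical URL
for agent in ("GPTBot", "Googlebot"):
    verdict = "allowed" if allowed(ROBOTS_TXT, agent, page) else "blocked"
    print(f"{agent}: {verdict}")
```

Running the check against every user agent you care about, before deploying the file, is a cheap way to confirm the policy does what you decided and nothing more.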
Receipts
Sources & references
Google, 2026
At its 2026 developer conference, Google said AI Mode passed one billion monthly users with queries more than doubling every quarter; AI Overviews reach around two billion.
Reuters Institute / Chartbeat, 2026
Google traffic to publishers fell about 33% globally (38% US) in the year to November 2025; ChatGPT referrals remain about 0.02% of publisher traffic.
Pew Research / Ahrefs / Digiday, 2025
Users clicked 8% with an AI summary vs 15% without (Pew); position-1 CTR fell 58% with AI Overviews (Ahrefs); the Daily Mail reported 25.2% to 2.8% (Digiday).
Cloudflare, 2025
One AI crawler took ~38,000 pages per referral; ~80% of AI crawling is for training; Cloudflare blocked 416 billion AI requests in five months after enabling block-by-default.
Authoritas / seoClarity / Fortune / RSL, 2026
Only ~12% of AI-cited sources overlap Google’s top results; some brands credit ~10% of revenue to AI and agent channels; Really Simple Licensing is backed by ~1,500 publishers.
Generative tools collapsed the cost of producing video, images and copy toward zero. The job moves from making the asset to judging which one is worth shipping, and whether it can survive being seen as AI.
Nadim A. Massih
Patient Comet · 14 May 2026 · 9 min read
23%
of Super Bowl 60 ads used AI, to a sharply negative reception (Adweek, 2026)
50%
of US consumers prefer brands that avoid generative AI (Gartner, 2026)
$1 → $0.03
freelance work swapped for AI at the most exposed firms (Ramp, 2026)
A Free Ad, Failing in Public
At Super Bowl 60 in February 2026, roughly one ad in four involved AI, either selling it or made with it (Adweek, 2026). The one everyone remembered was a vodka brand.
Svedka revived a robotic mascot in a spot built with generative tools, and it landed badly: a brand match of 7% against a category norm of 63%, with viewers reaching for words like “weird,” “surreal,” and “WTF” (Adweek, 2026).
The striking thing is not that the ad was bad. It is that the ad was nearly free to make, and that was the problem.
The asset cost almost nothing to generate, so nobody upstream had to fight for it, defend it, or kill it. Cheap to produce meant easy to wave through, and a brand spent its most expensive thirty seconds of the year on something no one had really judged. The price of the frame had collapsed. The price of a bad decision had not.
Eight months earlier, the same technology told the opposite story. A thirty-second AI spot for the platform Kalshi aired in the NBA Finals, reportedly made for about $2,000 against the $250,000 to $500,000 a comparable agency film would cost (NPR, 2025).
Same tools, same year, opposite outcome. One landed because someone with taste chose well. One flopped because cheap generation had removed the friction that used to force a choice. That gap is the whole story, and it is worth understanding why it opened.
Making Got Free. Judging Got Expensive.
When making an asset is nearly free, the bottleneck stops being “can we afford to make it” and becomes “can we tell which one is good, and should it ship at all.” Generative tools have collapsed the cost of producing video, images, audio and copy toward zero, so the scarce input is no longer production. It is judgement: deciding which of the cheap things in front of you is actually worth putting your name on. Cost collapsed for the frame, it did not collapse for taste, and taste is most of the job.
There is a new dimension to that judgement now, too. The old test was simply whether an asset was good. The new test sits on top of it: whether the asset can survive being seen, and soon labelled, as AI, because audiences have started to care whether what they are watching was made by a person at all. That provenance test is becoming as important as the quality one, and holding the two together is where the rest of the shift falls into place.
Twenty Directions Before Lunch
Start with the cost side, because the collapse is real and even a global brand can feel it. A ten-second clip from a current video model costs about the price of a coffee, roughly a dollar. Coca-Cola took an AI holiday campaign that would once have run a year and compressed it to about a month, with a small team generating roughly 70,000 clips in that window (The Drum, 2025). When you can generate twenty directions before lunch, scarcity moves: it used to sit at production, and now it sits at selection.
The cost of a frame fell off a cliff
The production line collapsed; the judgement layer did not. What is scarce now sits on top of the cheap bar: direction, and the nerve to choose (NPR, 2025; provider pricing, 2026).
But as the cost of making fell, audience tolerance fell with it, and that is the trap hiding under the saving. In a Gartner survey of more than 1,500 US consumers, half said they would rather do business with brands that avoid generative AI in consumer-facing content, and 68% said they frequently wonder whether what they are seeing is even real (Gartner, 2026). The cheapest possible asset is also the one most likely to read as cheap, and a brand’s most valuable moments are exactly where reading as cheap costs the most. Coca-Cola found this out twice: it ran another AI Christmas ad in 2025 and absorbed a fresh round of backlash for it (Euronews, 2025). Making got free. Trust stayed expensive.
So the people who used to make these assets are not vanishing, but they are not all safe either, because the work is splitting in two. Ramp, looking at actual company card spend, found freelance-platform spend falling from 0.66% to 0.14% of budgets while AI-model spend climbed to 2.85%, which means the most exposed firms swapped about a dollar of freelance work for roughly three cents of AI (Ramp, 2026). That substitution is painful at the commodity end. Yet specialists who can direct and judge appear to be holding their value, even gaining it, while template production falls away. The job is polarising into people who operate taste and people who produce commodity assets, and only one of those groups is shrinking.
That split is not a forecast to wait out. It is a position you can choose, and four moves put you on the right side of it.
1. Rebuild the brief around selection
Generate twenty directions cheaply, then spend the real hours on the choosing. The brief used to end at production; now it begins at selection, so budget for the part that actually costs you, which is judgement, not generation.
Creative direction
2. AI for the long tail, humans for the hero
Mass-produce variants, localisations and the routine middle with models, and reserve crew and named talent for the flagship moment, where reading as cheap costs the most. Match the tool to the stakes rather than running everything through the cheapest path.
Production
3. Put a taste gate, and a provenance test, before ship
One named person approves anything AI-made, and asks a second question alongside “is it good”: can this survive being seen as AI? This stops being optional in August 2026, when the EU AI Act’s Article 50 requires AI-generated media to be labelled.
Quality & brand
4. Disclose, and check the rights
AI campaigns draw backlash, and the law around training data and synthetic performers is still moving; SAG-AFTRA’s proposed terms, tentative as of May 2026, would require a synthetic performer to offer “significant additional value” over a human. Be transparent about AI use and confirm your provider’s rights position before a cheap asset turns into an expensive liability.
Brand & legal
There is a real argument underneath all this, and the strongest objections are not the obvious ones.
Generate, Guard, or Split
The cost-cutter
“Generate everything, and let the data pick the winners.”
At near-zero cost, the winning asset is the one the numbers find, not the one a director defends. Test fifty, ship the one that performs, and stop treating any single frame as sacred; one person with a model outproduced a crew for Kalshi. The flaw the others will name is that volume without judgement is precisely how you get a Super Bowl flop, only now at scale.
The brand guardian
“Cheap to make is expensive to ship.”
Half of consumers prefer brands that avoid generative AI, and your most valuable moments are exactly where that preference bites hardest. The asset may be free, but the brand equity it can burn is not, and provenance and taste are the moat, not the model.
The labour realist
“This is a split, not an extinction.”
Commodity, template work is collapsing into the model, and that is real and painful for the people doing it. But the people who can direct, judge, and give a brand a point of view are getting more valuable, not less. The job divides into operators of taste and producers of assets, and only one of those is shrinking.
The Take
Spend the Saving on Taste
Generation is nearly free, so the advantage is no longer the ability to make the thing. It is the taste to pick the one that should ship, and the judgement to know when “made by a machine” will cost you more than it saves.
The cost collapse is real and permanent, and the mistake is to read it as a headcount story when it is really a value-migration one. The money you save on production is the money you should spend on judgement: deciding what is good enough, and brave enough, to carry your name in public.
If you do one thing this week, feel the bottleneck move. Take an asset you would normally brief out, generate ten directions in an afternoon for under a hundred dollars, then run a short, structured session to pick one and say why, ending on a single question: would this survive being labelled AI? The output is not the point. The point is that making is now the easy part and choosing is the job, and the same maths runs whether you are a global brand or a one-person shop with a card and a model.
Where to start
Pick one asset you would normally brief out. Generate ten directions for it cheaply, in an afternoon.
Run a judging session. Thirty minutes, structured, to choose one and say why. Notice where the work actually is now.
Install a taste gate. One named person who must approve anything AI-made before it ships.
Add a provenance test. Would this survive being seen, or labelled, as AI? If not, it is not ready.
Nadim A. Massih · Patient Comet
If making is free, what is the first thing you would refuse to let a model decide, and would you put a “made by humans” label on it?
Common questions
Questions, answered first
How much cheaper is AI production, really?
For raw generation, roughly 90 to 99% cheaper. A ten-second clip costs about a dollar, and one AI Finals spot reportedly cost about $2,000 against $250,000 to $500,000 for an agency equivalent (NPR, 2025). The catch: that is the cost of frames, not of taste or rights.
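The arithmetic behind the reported Kalshi comparison is short enough to check by hand. The sketch below uses only the reported figures quoted above; the helper name is my own. It covers that one example and does not derive the lower end of the 90 to 99% range.

```python
def saving(ai_cost: float, traditional_cost: float) -> float:
    """Fraction of the traditional cost saved by the AI-made version."""
    return 1 - ai_cost / traditional_cost


# Reported figures: about $2,000 against $250,000 to $500,000 (NPR, 2025).
low = saving(2_000, 250_000)
high = saving(2_000, 500_000)
print(f"Saving on production cost: {low:.1%} to {high:.1%}")  # 99.2% to 99.6%
```

That is the cost of frames only. Judgement, rights clearance and the media buy sit outside this number.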
Are audiences actually rejecting AI ads?
Increasingly, yes. Half of US consumers say they prefer brands that avoid generative AI in their content, and two-thirds frequently wonder whether what they see is real (Gartner, 2026). At Super Bowl 60, roughly a quarter of ads used AI and the reception was sharply negative (Adweek, 2026).
Is this killing creative jobs?
It is splitting them. Company spend is shifting from freelancers to AI, about a dollar of freelance work for three cents of AI (Ramp, 2026), so commodity production is shrinking, while people who can direct and judge tend to command a premium. The job polarises, it does not simply vanish.
Do I have to label AI-made content?
Soon, in many places, yes. From August 2026 the EU requires AI-generated audio, image, video and text to be marked and deepfakes disclosed, so “is this obviously AI” becomes a compliance question, not just a taste one.
Receipts
Sources & references
Adweek / iSpot, 2026
About 23% of Super Bowl 60 ads featured AI; Svedka’s AI spot scored a 7% brand match against a 63% category norm; the reception was sharply negative.
Gartner, 2026
In a survey of 1,500+ US consumers, 50% prefer brands that avoid generative AI in consumer-facing content; 68% frequently wonder whether content is real.
Ramp, 2026
Company freelance-platform spend fell from 0.66% to 0.14% of budgets while AI-model spend rose to 2.85%; the most exposed firms swapped about $1 of freelance work for $0.03 of AI.
NPR / The Drum, 2025
A Veo-made Kalshi spot aired in the NBA Finals for a reported ~$2,000; Coca-Cola compressed an AI campaign from a year to a month with ~70,000 clips.
EU AI Act / SAG-AFTRA, 2026
EU Article 50 requires AI-content labelling from August 2026; as of May 2026 SAG-AFTRA’s proposed terms require a synthetic performer to offer “significant additional value” over a human.
Your product now serves a human and an agent at once.
Walmart fired OpenAI’s checkout and built its own. A non-human now arrives carrying real intent, and serving it is the easy half. Trusting it is the hard one.
Nadim A. Massih
Patient Comet · 21 May 2026 · 9 min read
+393%
growth in AI-sourced retail traffic in a year: your fastest-growing buyer is a machine (Adobe, 2026)
$262B
of US holiday retail sales influenced by AI and shopping agents (Salesforce, 2025)
~32%
of web requests are now automated, set to pass human traffic by 2027 (Cloudflare, 2026)
In March 2026, Walmart fired OpenAI. Not from a partnership exactly. From its checkout.
For a few months, you could buy Walmart goods through OpenAI’s Instant Checkout, the agent doing the clicking on the shopper’s behalf. Then Walmart looked at the numbers and pulled the whole thing. The borrowed agent was building the wrong carts and converting below Walmart’s own channels. So Walmart yanked it and shipped its own shopping agent straight into ChatGPT and Gemini instead. OpenAI conceded that its first checkout “did not offer the level of flexibility that we aspire to provide,” and retreated to discovery only (Retail Dive; OpenAI, 2026).
Read the shape of that, because it matters more than the brands. A retailer let an AI agent transact at its front door. The agent fumbled. The retailer decided it would rather build the machine itself than let someone else’s machine fail at its checkout.
That is not a story about Walmart. It is a story about your product.
You Now Have Two Users
For forty years you built one front door, for one kind of visitor: a person, with eyes, who can be charmed. A second visitor has started arriving who has none of those things, and is arriving fast.
Here is the shift in one sentence. Your product now serves a human and an agent acting for that human, and almost every interface in existence was built for exactly one of them.
The human you know intimately. You spent a decade learning to charm them: the hero image, the social proof, the colour of the buy button, the carefully sequenced persuasion. All of it is an argument aimed at someone who can be moved. The agent cannot be moved. It does not see the hero image. It reads your page the way it reads a database, as fields and values and price, and everything you optimised for the human is, to the agent, simply not there.
So the question you have asked for years quietly inverts. It was: how do I charm the visitor? It becomes: can my surface even be read by the thing now doing the buying? The agent is not a better-persuaded customer. It is a reader, and if it cannot parse you, none of your persuasion ever loads.
And the reader is already at the door, already buying better than the human. AI-sourced traffic to US retail grew 393% year on year. In March 2025 those visitors converted 38% worse than ordinary humans. Twelve months later, 42% better (Adobe, 2026). The visitor that could not buy a year ago is now your best-converting one. And it is not a fringe: AI and agents influenced roughly $262 billion, about a fifth of US holiday sales (Salesforce, 2025), while automated traffic is already around 32% of web requests, set to overtake humans by 2027 (Cloudflare, 2026).
The conversion crossover
In a year, agent-driven shoppers went from converting worse than humans to converting better, yet the average product page is only 66% machine-readable, a third invisible to the fastest-growing buyer (Adobe; Cloudflare, 2026).
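The size of that crossover is easier to feel as a single number. A rough sketch of the arithmetic, assuming the human conversion rate is the same baseline in both years (an assumption made here for illustration, which Adobe's figures do not state):

```python
# Conversion of AI-sourced visitors relative to human visitors (human = 1.0).
# Assumption: the human baseline is treated as constant across both years.
before = 1 - 0.38   # March 2025: 38% worse than humans
after = 1 + 0.42    # March 2026: 42% better than humans

swing = after / before
print(f"relative conversion moved from {before:.2f} to {after:.2f}, a {swing:.2f}x swing")
```

On those assumptions the same kind of visitor converts a little over twice as well as it did a year earlier, which is the real headline behind the two percentages.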
Serving Is Easy. Trusting Is Not.
So make the page readable and you are done? Not quite. Serving the agent is the easy half. Trusting it is the half nobody has solved.
When Amazon told Perplexity to stop shopping in its store, the fight was not about traffic. It was about identity: does an agent acting for a person inherit that person’s right to be there? Amazon won a court injunction, then watched an appeals court pause it in March 2026 (CNBC, 2026). The law is genuinely unsettled.
The quieter danger is worse. Your fraud engine was trained on humans, so when a legitimate agent checks out for a real customer, the engine sees an unfamiliar pattern and blocks it. Call it the false decline: a real sale, wrongly refused. The human never sees the rejection. The agent simply routes to a competitor. And you go invisible to the fastest-growing buyer you have, without a single error message to tell you it happened.
The industry is now racing to fix exactly this. The work is cryptographic: identity is moving to the FIDO Alliance, “signed agents” let a site verify which agent is which, and at least one major card network has promised to reimburse purchases its registered agents get wrong.
So the work has two halves: make your surface readable, then decide who you trust to read it. Four moves, in order.
1
Build the machine-readable layer
The agent needs structured truth: price, stock, variants, shipping, returns, in clean fields it can parse without guessing. The average product page is only about 66% machine-readable, so a third of what the agent needs to act is missing. Close that gap and the agent stops fumbling your cart.
Front-end
2
Open a sanctioned door, not a hole in the wall
You do not want to block every agent, and you do not want to let everything in. The middle path is a recognised entrance: the Model Context Protocol, now a Linux Foundation project, lets you expose tools to agents on purpose, and signed-agent schemes let you tell a sanctioned buyer from a scraper.
Platform
3
Adopt a standard, but expect a protocol war
The standards are real and arriving: MCP for tools, agent-commerce protocols for purchases, identity at the FIDO Alliance. But Amazon’s win was stayed on appeal, and the stack is as unsettled as the law. Build to a standard, stay loosely coupled, and expect to switch.
Commerce
4
Stop your fraud engine rejecting your best buyer
This is the play the other three exist to set up. Ask your fraud team one question: how would our engine treat a legitimate agent checkout it has never seen before? If the honest answer is “block it”, you are already losing sales you cannot see.
Risk
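The first move can be made concrete with a very small check. This is a minimal sketch, assuming a product page has already been parsed into a flat dictionary. The field list and the `readability` helper are illustrative names, not a standard, and this is not how the 66% figure was measured.

```python
# Hypothetical field list: the five things move 1 says an agent needs.
REQUIRED_FIELDS = ["price", "stock", "variants", "shipping", "returns"]


def readability(product: dict, required=REQUIRED_FIELDS):
    """Return (share of required fields present, list of missing fields)."""
    missing = [
        field for field in required
        # None, empty strings and empty containers count as missing.
        # A stock level of 0 is a real value, so it counts as present.
        if product.get(field) in (None, "", [], {})
    ]
    score = (len(required) - len(missing)) / len(required)
    return score, missing


# Illustrative record, not real data.
page = {
    "price": {"amount": 49.0, "currency": "USD"},
    "stock": 12,
    "variants": ["S", "M", "L"],
    "shipping": "",   # present in the markup but empty
    # "returns" is absent altogether
}

score, missing = readability(page)
print(f"{score:.0%} readable, missing: {missing}")
```

Run across your top pages, the useful output is the list of missing fields rather than the score: it tells the front-end team exactly what to add.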
Three people are in the room on this, and none of them is entirely wrong.
Hype, Disintermediation, or Risk
The skeptic
“The base is tiny, so this is premature.”
AI platforms are still a low single-digit share of total retail, and re-plumbing the whole storefront for a rounding error is premature optimisation. Let the standards settle and move when the base is real. The counter: the growth rate and the conversion quality, not the base, justify a cheap, reversible move now.
The merchant
“Letting agents in disintermediates me.”
An agent stands between you and your customer, comparing you on price alone and owning the relationship you spent years building. The counter is Walmart: it did not slam the door, it served the agents on its own terms, with its own software. The choice is not whether agents transact, but whether they do it through your surface or around it.
The security lead
“Two users is a trust nightmare, so slow down.”
You now have to authenticate a human and the machine acting for them, prove consent, and block neither. That is precisely why a wide-open door is reckless and a sanctioned one is the only sane answer. The abuse comes from agents with no legitimate lane, forced to impersonate people.
The Take
Serve the Second User on Your Terms
The serving problem is solvable in a quarter. The trusting problem is where this is actually decided.
The honest strategic question is no longer whether agents will transact with you. It is whether your interface can serve the agent on your terms, or whether you will be the merchant whose cart someone else runs. Walmart answered it by building its own agent. Most teams have not answered it at all.
Get readable this quarter, because that part is plumbing. But spend your real attention on identity and consent, because that is where this is won. The teams that solve trusting, not just serving, will own the third of traffic that is about to become most of it. The ones still perfecting the hero image will be charming an empty room.
Where to start
Run a machine-readability check. Have an agent try to read and buy from your top ten product pages, and find your missing third.
Open one sanctioned door. Expose a single high-value flow to agents via MCP or signed agents, and watch what they do.
Audit your false declines. Ask how your fraud engine treats an agent checkout it has never seen before.
Decide identity and consent on purpose. Form a position now, before whichever default ships first forms it for you.
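The false-decline audit can be sketched in a few lines. This assumes you can replay a labelled set of past checkouts through your fraud engine. Everything here is hypothetical: `toy_fraud_engine` stands in for your real engine, and the features and thresholds are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Checkout:
    is_agent: bool          # did a declared agent place the order?
    is_legitimate: bool     # ground truth, e.g. confirmed later by the customer
    mouse_events: int       # toy feature: agents typically generate none
    seconds_on_page: float


def toy_fraud_engine(c: Checkout) -> bool:
    """Stand-in for a human-trained engine. Returns True when it would block."""
    # Hypothetical rule: no pointer activity plus a very fast checkout looks like a bot.
    return c.mouse_events == 0 and c.seconds_on_page < 5


def false_decline_rate(checkouts, engine, agent: bool) -> float:
    """Share of legitimate checkouts from one segment that the engine blocks."""
    legit = [c for c in checkouts if c.is_legitimate and c.is_agent == agent]
    if not legit:
        return 0.0
    return sum(engine(c) for c in legit) / len(legit)


# Illustrative replay set, not real data.
sample = [
    Checkout(False, True, 40, 95.0),
    Checkout(False, True, 22, 61.0),
    Checkout(True, True, 0, 2.1),
    Checkout(True, True, 0, 3.4),
    Checkout(True, True, 0, 8.0),
    Checkout(True, False, 0, 1.0),   # a genuinely bad actor, excluded from the rate
]

human_rate = false_decline_rate(sample, toy_fraud_engine, agent=False)
agent_rate = false_decline_rate(sample, toy_fraud_engine, agent=True)
print(f"false declines: humans {human_rate:.0%}, agents {agent_rate:.0%}")
```

The number to watch is the gap between the two rates. If legitimate agents are declined far more often than legitimate humans, the engine is penalising the pattern, not the fraud.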
Nadim A. Massih · Patient Comet
Run the check on your top ten pages this week. What is the one field your best-selling page is missing that an agent needs to buy it?
Written by Nadim A. Massih, AI & Tech Strategist
Common questions
Questions, answered first
Is agent traffic actually material yet?
Material and growing fast. AI-sourced traffic to US retail rose 393% year over year in early 2026 and influenced roughly $262 billion over the 2025 holidays, though AI platforms are still a small share of total e-commerce. Early, but real (Adobe; Salesforce, 2025-2026).
Do I have to let agents transact, or can I block them?
You can block them, and Amazon won a court order against one shopping agent before it was stayed on appeal in 2026. But the channel converts better than ordinary traffic, and the savvier move, as Walmart showed, is to serve the agent on your own surface rather than cede it (Adobe; Retail Dive, 2026).
Is there a standard, or will I integrate forever?
There are open standards already, the Model Context Protocol for tools and agent-commerce protocols for purchases, but the field is fragmenting into competing standards. Build for the surface your customers actually use and keep the integration thin (Linux Foundation, 2025).
When an agent buys the wrong thing, who pays?
It is being worked out. Identity and payment standards are moving to the FIDO Alliance, and at least one major card network now says it will reimburse erroneous purchases by its registered agents, the first real liability backstop (2026).
Receipts
Sources & references
Retail Dive / OpenAI, 2026
Walmart replaced OpenAI’s Instant Checkout with its own ChatGPT agent after wrong carts and weak conversion; OpenAI conceded its checkout “did not offer the level of flexibility” and retreated to discovery.
Adobe, 2026
AI-sourced traffic to US retail rose 393% year over year; it converted 42% better than non-AI traffic by March 2026, reversed from 38% worse a year earlier; product pages average 66% machine-readable.
Salesforce, 2025
AI and agents influenced about $262 billion and roughly 20% of sales over the 2025 US holidays.
Cloudflare, 2026
Automated traffic is roughly a third of web requests, expected to overtake humans by 2027; new signed-agent and Web Bot Auth schemes let sites tell sanctioned agents from scrapers.
FIDO / Amazon / CNBC, 2026
Agent identity is moving to the FIDO Alliance and a card network will reimburse erroneous agent purchases; Amazon’s injunction against Perplexity’s agent was stayed on appeal in March 2026.
Snap cut a thousand jobs and named the reason: AI writes 65% of its code. When working software is cheap, what your team actually ships moves up the stack.
Nadim A. Massih
Patient Comet · 28 May 2026 · 8 min read
65%
of Snap’s new code is AI-generated, the reason it named for cutting 16% of staff (CNBC, 2026)
1.7×
more bugs and security flaws in AI-written code than human-written (CodeRabbit, 2025)
-20%
jobs for junior developers since AI took the routine coding (Stanford, 2025)
On 15 April 2026, Snap fired about a thousand people. Sixteen per cent of the company. Gone in a memo.
The memo did something unusual. It told the truth.
Most layoff notes hide behind weather words. Headwinds. Realignment. A challenging macro environment. Snap’s chief executive skipped all of that and named the cause out loud: AI agents now generate more than 65% of the company’s new code, and small squads using AI tools can do the work that used to need larger engineering teams (CNBC, 2026).
Then the part nobody could have scripted. The stock went up. About eleven per cent.
Read that sequence again. A company says machines now write most of its software, says it therefore needs far fewer humans, and the market rewards the confession. The layoff was not the bad news. The layoff was the proof of concept.
The Deliverable Moves Up the Stack
For thirty years, the scarce and defining act of building software was the writing of it. The code was the work. You hired for it, you queued for it, you protected the people who could do it.
That bottleneck is dissolving. Google says 75% of its new code is now AI-generated and approved by engineers; Microsoft puts its figure at 20 to 30%; Snap is past 65% (Google; CNBC, 2025-2026). Roughly nine in ten developers now reach for an AI tool as a matter of course. The act that used to be the job is becoming the cheap part of the job.
So where did the value go?
Most people get this wrong in the same direction. They assume that when the cost of code collapses, the value collapses with it. The opposite is happening. The value did not vanish. It moved, up the stack, to the things the model cannot do for you: deciding what to build, knowing whether it is any good, specifying it precisely enough that an agent can execute it, and getting it in front of the right people.
Here is the reframe worth keeping. The code was never the valuable part. The code was the toll you paid to find out whether your judgement was right. Now the toll is cheap, and what is left exposed is the judgement. That is the new deliverable: not the diff, but the decision behind it, and the proof the decision was sound.
As code authorship rises, the human deliverable moves up
The share of new code written by machines roughly tripled in eighteen months. As that line climbs, the human’s job slides up the stack, from keystrokes to spec, review, and taste (Google; Snap, 2024-2026).
Cheap to Write Is Not Cheap to Own
Now the part the eleven-per-cent pop conveniently skips. Cheap to write is not the same as cheap to own. The cost of code did not disappear when the typing got fast. It relocated, downstream, to review, to maintenance, to the slow tax of running software nobody on the team fully understands.
The evidence is in, and it is not flattering. CodeRabbit looked at AI-co-authored pull requests, the chunks of code submitted for review, and found they carried about 1.7 times more issues than human-only ones, with security problems up to 2.7 times worse (CodeRabbit, 2025). The code arrives faster, and arrives carrying more of the kind of problem you do not see until later.
Then the part that should unsettle you. A controlled trial put experienced developers on code they knew well. With AI, they were about 19% slower (METR, 2025). Not faster. Slower. And they believed they were faster the whole time. The tool did not just cost them time. It cost them the ability to notice.
A large industry study found the pattern underneath: AI raises throughput and worsens delivery stability (DORA, 2025). It does not fix your team. It amplifies whatever your team already is. Disciplined shops get a multiplier. Sloppy ones get a faster way to ship the mess. There is a name worth keeping for the bill that comes due here: comprehension debt, the accumulating cost of shipping code nobody fully understands. You take it on quietly, at speed. You repay it all at once, in production, on the worst possible day.
We Are Automating the Apprenticeship
The cost moved. The work moved up the stack. What about the people?
Stanford found that employment for early-career developers, the ones aged 22 to 25, is down about 20% since AI went standard (Stanford, 2025). Not redistributed. Down. The bottom rung of the ladder was the most code-shaped part of the job: take a clear ticket, write the obvious implementation, hand it back. That is exactly what an agent now does for nothing.
Sit with what that means. We are automating the apprenticeship. We are removing the years in which a junior became a senior by writing a great deal of mediocre code and slowly learning why it was mediocre. So here is a question nobody has answered: where does the next generation of judgement come from, if the work that used to grow it is the first work we gave away?
Enough diagnosis. If you run a team that ships software, four moves follow.
1
Make the spec the deliverable
Stop treating the specification as paperwork on the way to the real thing. The spec is now the real thing, the artefact you version and protect. GitHub’s Spec Kit makes this literal, and once the spec is solid the agents underneath become interchangeable.
Product & eng
2
Move people from author to verifier
The old senior wrote the hard code; the new senior reads everything and decides what is true. That is a different muscle, and most teams have let it atrophy. Train for it on purpose, because the scarce skill is now judging a diff a machine wrote, fast, and knowing whether it is right.
Engineering
3
Move people up, not out
Snap moved people out, and the market clapped, but that answer eats your own future. The harder, better one is to move people up the stack faster than the machine eats the bottom of it: into problem definition, into taste, into judgement.
Leadership
4
Make distribution and taste the moat
When anyone can produce working software in an afternoon, the software is not the moat. Knowing what to build, building it with taste, and getting it to the right people are the three things the model still cannot do. Spend your scarce human attention there.
Strategy
Three people are arguing about all of this, and they are all partly right.
Measurer, Maintainer, or Realist
The measurer
“The gains are an illusion.”
A controlled trial found experienced developers 19% slower with AI on familiar code, while feeling faster the whole time. A productivity revolution you cannot measure is a story, not a result, so measure your real cycle time before you believe the headline.
The maintainer
“You are confusing writing with owning.”
AI pull requests carry about 1.7 times more issues, security problems multiply, and the bill is deferred, not cancelled. The code is cheap to produce and expensive to live with, and comprehension debt compounds in the dark.
The realist
“AI is an amplifier, not an engine.”
It does not fix a team, it magnifies what is already there: strong teams pull ahead, weak ones get worse, and stability degrades without discipline. The gains are real, and good, but only for teams disciplined enough to deserve them. For everyone else, it is a faster way to be exactly what you already were.
The Take
Ship the Judgement, Not the Keystrokes
AI is an amplifier, not an engine, and most teams are not disciplined enough to deserve it. The tool does not care.
The companies reading this as a layoff story are reading it wrong, even the ones doing the layoffs. The point is not that you need fewer people. It is that what your best people produce has moved up the stack: the problem definition, the architecture, the taste about what to build, and the proof that it is right. The keystrokes are handled, and handled cheaply, which is exactly why the expensive part is now everything around them.
So here is the one thing to do this week. Take the next feature on your list and write it as a spec before anyone writes a line of code. When the code comes back, review it against the spec, not the diff: ask whether it does what you actually meant, not whether the lines look plausible. That single change moves your team’s centre of gravity up the stack, from typing to judging, which is exactly where the value went.
Where to start
Write the next feature as a spec. What and why, with acceptance criteria, before any code.
Review against the spec, not the diff. Judge whether it does what you meant, not whether the lines look plausible.
Reinvest the saved hours in verification. Testing, version control, small batches, real review.
Move juniors into specifying, not out the door. That is where the next seniors come from.
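One way to make "review against the spec, not the diff" literal is to write the acceptance criteria as checks that run. A minimal sketch, assuming a small Python feature. The `Spec` class and the discount example are hypothetical, and this is not how GitHub's Spec Kit works.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Spec:
    what: str
    why: str
    # Each acceptance criterion is a name plus a check the implementation must pass.
    criteria: dict[str, Callable[[Callable], bool]] = field(default_factory=dict)

    def review(self, implementation: Callable) -> dict[str, bool]:
        """Judge the implementation against every criterion, not against the diff."""
        return {name: check(implementation) for name, check in self.criteria.items()}


# Hypothetical feature, written as a spec before any code exists.
spec = Spec(
    what="Apply a percentage discount to an order total",
    why="Support seasonal promotions without manual price edits",
    criteria={
        "10% off 200 is 180": lambda f: f(200, 10) == 180,
        "0% leaves the total unchanged": lambda f: f(50, 0) == 50,
        "the total never goes below zero": lambda f: f(50, 150) == 0,
    },
)


# A plausible-looking implementation, as an agent might return it.
def apply_discount(total, percent):
    return total - total * percent / 100


results = spec.review(apply_discount)
for name, passed in results.items():
    print("PASS" if passed else "FAIL", name)
```

The implementation reads fine line by line and still fails the third criterion, because nothing in it stops a discount above 100%. That is the difference between lines that look plausible and code that does what you meant.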
Nadim A. Massih · Patient Comet
If your engineers stopped writing code tomorrow and only specified, reviewed and decided, would your product get better, or would you finally find out who was doing the thinking?
Common questions
Questions, answered first
Is code really mostly written by AI now?
At some of the largest engineering organisations it is now the majority of new code: Google says 75% and Snap over 65%, though Microsoft puts its figure at 20 to 30%. The caveat is “AI-generated and approved by engineers”: humans still gate it (Google; Snap; CNBC, 2025-2026).
Does AI-generated code actually make teams faster?
Mixed. One industry study found higher throughput overall, while a controlled trial found experienced developers 19% slower on code they knew well. Real gains on new work, real risks on deep maintenance (DORA; METR, 2025).
If code is cheap, what becomes the scarce skill?
Problem definition, architecture, taste, and distribution. The value moves up the stack, and the early-career roles doing commodity coding are the ones already shrinking (Stanford, 2025).
What is the catch nobody mentions?
Cheap to write is not cheap to own. AI pull requests carry about 1.7 times more issues, security problems multiply, and the cost relocates to review, maintenance, and the comprehension debt of code nobody fully understands (CodeRabbit, 2025).
Receipts
Sources & references
CNBC / Snap, 2026
Snap cut ~16% of staff in April 2026; the CEO said AI agents generate over 65% of new code and small squads now do the work of larger teams; the stock rose ~11%.
Google / Fast Company, 2026
Google said 75% of new code is AI-generated and approved by engineers; a leader said engineers are becoming product engineers and architects.
CodeRabbit, 2025
Across pull requests, AI-co-authored ones carried about 1.7x more issues than human-only ones, with security issues up to 2.7x higher.
METR / DORA, 2025
A randomised trial found experienced developers 19% slower with AI on familiar code; DORA found AI raises throughput but worsens delivery stability and amplifies existing discipline.
Stanford / JetBrains, 2025-2026
Employment for early-career developers (ages 22 to 25) is down about 20% since AI went standard; about 90% of developers now use an AI tool at work.
When everyone rents the same intelligence, taste is the moat.
Deezer drowns in AI tracks nobody plays. When near-frontier intelligence costs cents and every model converges, the model stops being the advantage, and taste becomes the moat.
Nadim A. Massih
Patient Comet · 29 May 2026 · 8 min read
44%
of Deezer’s daily uploads are AI, but only 1 to 3% of streams: supply flooded, demand did not (2026)
3 months
how far free open models trail the best paid ones: the model is no edge (Epoch AI, 2025)
81%
similar output from two rival AI models given the same task (NeurIPS, 2025)
By April 2026, on a single streaming service, the machines were making nearly half of the new music.
Forty-four per cent of everything uploaded to Deezer each day was AI-generated. Around 75,000 tracks. Every day. A tide of new songs arriving faster than any human could ever listen.
Here is the part that should stop you. Those tracks were only one to three per cent of what people actually streamed. And of even that sliver, roughly 85% was flagged as fraudulent, bots streaming bot music to skim the royalty pool (Deezer, 2026). Spotify, fighting the same flood, deleted around 75 million spam tracks in a single year.
So the machines made the music. Nobody played it.
Sit with that, because it is the whole problem in miniature. Supply went vertical. Demand did not move. The cost of making a song fell to almost nothing, and the value of an unwanted song fell right along with it, all the way to zero. The flood was free. That was exactly the problem.
When the Model Stops Being the Moat
For three years, the question in every boardroom was the same. Which model do you use. It was the right question for a while. Then the floor fell out of the price.
The cost of GPT-3.5-level intelligence dropped from about $20.00 to $0.07 per million tokens in roughly eighteen months. More than a 280-fold collapse (Stanford HAI, 2025). Intelligence that once felt like a strategic asset now costs less than a rounding error.
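The fold figure is simple division, and the same two prices give a feel for the pace. A small sketch, assuming the price fell at a steady compound rate over exactly eighteen months (in reality it fell in steps, so the monthly figure is only an average):

```python
start_price = 20.00   # USD per million tokens, from the figures above
end_price = 0.07
months = 18           # "roughly eighteen months", taken as exactly 18 here

fold = start_price / end_price
# Steady monthly multiplier that would take start_price to end_price in 18 steps.
monthly_factor = (end_price / start_price) ** (1 / months)
monthly_drop = 1 - monthly_factor

print(f"{fold:.0f}-fold drop, about {monthly_drop:.0%} cheaper each month on average")
```

On that assumption the price shed a bit over a quarter of its value every month, which is why any plan that treats today's price as fixed is out of date within a quarter.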
And it is not just cheap. It is everywhere. Open-weight models now trail the closed frontier by an average of about three months, a gap that has been narrowing, and the top labs sit clustered close together on the hardest benchmarks (Epoch AI, 2025). The head start you thought you were buying is now available to your competitor, your competitor’s intern, and a teenager with a laptop, give or take a quarter.
Marc Andreessen reportedly told investors in early 2026 what the price chart was already shouting: the moat is not the model. Models commoditise, costs fall, and the value sits in product, integration and distribution (reported, 2026). When everyone can rent the same near-frontier intelligence for cents, the intelligence is not the advantage. It is the air. Nobody builds a defensible business on access to oxygen.
So if the model is not the moat, what is?
As capability got cheap, sameness got expensive
The commodity curve and the homogenisation curve meet where taste is the only variable left. Two rival models produced about 81% similar output on the same task (Stanford HAI, 2025; NeurIPS, 2025).
The Hivemind Effect
Hold that question, because the cheap intelligence has a side effect nobody priced in. It makes everyone sound the same.
The NeurIPS 2025 best paper had a name for it that should make you flinch: “Artificial Hivemind.” Looking across more than seventy models, the researchers found a diversity collapse. Two independently trained models, given the same task, produced output about 81% similar (NeurIPS, 2025). Not borrowed. Not copied. Just converged, like rivers finding the same sea.
And it is not only the machines. A Science Advances study in 2024 found that AI-assisted writers each produced individually more creative work, but that work was about 10% more similar to one another (Science Advances, 2024). The tool lifts the floor and flattens the ceiling at the same time. You get better. You also get the same.
Now put that next to how the technology actually landed. McKinsey reports around 88% of companies have adopted AI. MIT reports that around 95% of deployments show no significant return (McKinsey; MIT, 2025). Eighty-eight per cent of companies in. Ninety-five per cent of deployments getting nothing back. That gap is the entire argument. If access were advantage, near-universal adoption would have produced near-universal gains. It produced almost none, because everybody bought the same engine, pointed it at the same problems, and got the same average answer. The market does not pay a premium for the average. It never has.
Here is the reframe to feel. The machines can flood the supply of almost anything now, for almost nothing. But demand did not follow, because people do not want more of the average. When the inputs are a commodity and the outputs converge, sameness is not a risk you might run into. Sameness is the default. Distinctiveness is the deviation, and you now have to engineer it on purpose.
So you stop competing on the engine and start competing on everything the engine cannot give you. Four moves.
1
Treat the model as a utility
Stop shopping for a saviour. The model is the electricity, not the building. Wire for it, assume the price keeps falling and the quality keeps rising, and put none of your identity into which vendor you rent this quarter.
Strategy
2
Build the feedback loop, not the data pile
The instinct is to hoard data, but a static pile commoditises too, the way the model did, and it ages while it sits there. The durable moat is the living loop: your judgment and customer signal, captured and fed back so the product sharpens with every use. A pile sits still. A loop compounds.
Product & data
3
Use AI to widen, not to narrow
The default pull of the tool is toward the centre, the 81% everyone else lands on. Use it to generate ten strange directions and pick the one only you would pick, not the one safe answer everyone gets. Steer it the other way on purpose, or it hands you the hivemind’s homework.
Creative
4
Put a named human in the loop, and be legibly distinctive
When the supply is a flood of anonymous average, a real person with a point of view is the signal in the noise. Not a faceless brand: a name, a judgment, a way of seeing a reader can recognise across a room. Make your distinctiveness legible, so people can tell it is you before they are told.
Editorial & brand
Three rooms disagree about all of this, and each of them is partly right.
Capability, Data, or Taste
The accelerationist
“Capability still wins where it counts.”
Spare me the taste sermon. The frontier is not done climbing, and when the next model does something yours simply cannot, your lovely distinctiveness will not save you. Raw capability is still the kingmaker, and it always comes back.
The economist
“Proprietary data is the moat, not taste.”
Taste does not compound, data does. The real moat is the data nobody else can scrape, and the firm with the richest unique dataset wins regardless of its taste. The honest completion: a static data pile commoditises too, so the real moat is the living feedback loop, which is judgment by another name.
The optimist
“The sameness panic is overstated.”
The defaults converge, but diversity returns the moment you steer. The 81% is what you get when nobody is driving; point the tool with intent and the distinctiveness comes right back. The catch: if it returns only when you steer, then steering is the work, and nobody is going to do it for you.
The Take
Engineer the Deviation
Capability is a moving floor, so standing on it is no plan. The advantage is whatever the hivemind cannot reproduce.
The uncomfortable truth under all those deleted tracks is that frontier capability did not make the work more distinctive. It made it more average, faster, and at a scale no audience asked for. When the model is a shared utility, your edge is the part it cannot supply: a real point of view, a proprietary loop of judgment, a house style no competitor could replicate.
So here is the one thing to do this week. Run the hivemind test. Take one real asset you are about to ship, a headline, a layout, a strategy memo, and generate it from three different models. Lay them side by side and measure the overlap. The more they agree, the more you have just found the average, the thing everyone else will also produce for cents. Your job starts where the three stop agreeing. That gap is the only place your advantage can live.
Where to start
Run the hivemind test. One asset, three different models, measure the overlap. High agreement means you found the average.
Name who breaks it. One person who rejects the first, most-probable output before it ships.
Build the loop, not the pile. Capture your judgment and customer signal so the product sharpens with use.
Make your distinctiveness legible. A name, a view, a house style a reader can recognise before being told.
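The hivemind test needs some way to "measure the overlap". Here is one crude option: a sketch using shared three-word phrases, assuming you have pasted each model's output in as plain text. The function names are illustrative, and this lexical measure is a stand-in chosen for simplicity, not the similarity metric used in the NeurIPS paper, so the scores are not comparable to its 81%.

```python
from itertools import combinations


def shingles(text: str, n: int = 3) -> set:
    """Lower-cased word n-grams, a crude fingerprint of phrasing."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: 0.0 means nothing shared, 1.0 means identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def hivemind_score(outputs: dict) -> float:
    """Average pairwise overlap across all model outputs."""
    pairs = list(combinations(outputs.values(), 2))
    if not pairs:
        raise ValueError("need at least two outputs to compare")
    return sum(jaccard(shingles(x), shingles(y)) for x, y in pairs) / len(pairs)


# Placeholder outputs; paste in what your three models actually return.
outputs = {
    "model_a": "unlock the power of ai to transform your business today",
    "model_b": "unlock the power of ai to transform your workflow today",
    "model_c": "your rivals rent the same brain so what is yours",
}

score = hivemind_score(outputs)
print(f"average overlap: {score:.2f}")
```

Phrase overlap misses two outputs that say the same thing in different words, so treat a low score as inconclusive and a high score as a clear signal that you have found the average.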
Nadim A. Massih · Patient Comet
What is the one thing you make that no model, and no competitor renting the same model, could produce in your place? If the answer comes slowly, that is not a failure. That is the map.
Common questions
Questions, answered first
Is the model you choose still a competitive advantage?
Decreasingly. Open-weight models trail the closed frontier by about three months on average and the top labs are clustered close together, so a best-model edge is brief and rentable by rivals (Epoch AI, 2025).
Is AI sameness real or just a vibe?
Measured. The NeurIPS 2025 best paper documented about 81% average output similarity between two independent models on the same task, and platforms like Deezer now see AI make up 44% of uploads but only 1 to 3% of streams (NeurIPS, 2025; Deezer, 2026).
Doesn’t AI make my team more creative, so this fixes itself?
Individually yes, collectively no. One study found AI-assisted work rated more creative per person, yet about 10% more similar to everyone else’s. Better work, less differentiation (Science Advances, 2024).
If everyone has AI, why are so few winning with it?
Because access is not advantage. Around 88% of companies adopted AI, yet ~95% of deployments report no significant return; value concentrates in firms with proprietary signal and reworked workflows (McKinsey; MIT, 2025).
Receipts
Sources & references
Deezer / Spotify, 2025-2026
AI tracks reached 44% of daily uploads on Deezer but only 1 to 3% of streams, ~85% flagged fraudulent; Spotify deleted about 75 million spam tracks in a year.
Stanford HAI, 2025
The cost of GPT-3.5-level intelligence fell from about $20.00 to $0.07 per million tokens in roughly eighteen months, a more than 280-fold drop.
Epoch AI, 2025
Open-weight models lag the closed state of the art by an average of about three months, narrowing, with the leading labs clustered close together.
NeurIPS, 2025 (Artificial Hivemind)
A best-paper study of 70+ models found a diversity collapse: two independent models produced about 81% similar output on the same task.
McKinsey / MIT / Science Advances, 2024-2025
~88% of companies have adopted AI (McKinsey) yet ~95% of deployments report no significant return (MIT); AI-assisted writers were individually more creative but about 10% more similar (Science Advances).