> RAM prices are crashing because new models won’t need as much
Reality begs to differ [0], and following the link for that text leads to an article [1] about Google's TurboQuant, which supposedly will lower RAM requirements. Now whether that means RAM prices come down (as speculated, not reported on, in the link) or the AI companies just do more things with their extra RAM is yet to be determined. The fact that this article links there with the text "RAM prices are crashing" throws the entire rest of the article into doubt for me.
RAM prices are most certainly not crashing (yet), and treating it as a foregone conclusion because _one_ lab found gains could be made, and hasn't even reported on the efficiency of their method, is just irresponsible. It's almost as bad as when LLMs link things to prove their point, you visit the link, and find it says nothing of the sort or even the opposite.
> Now whether that means RAM prices come down (as speculated, not reported on, in the link) or the AI companies just do more things with their extra RAM is yet to be determined.
Jevons paradox only applies if demand hasn't already been saturated.
The fact that public LLM usage is leveling off at a price of $0 and Jensen "we make the shovels in this gold rush" Huang is rather desperately claiming that you need to spend $250k/year in tokens to be taken seriously suggests that demand saturation may not be that far off.
Whether Jevons' paradox applies to software engineers I think is another open question. I'm constantly being told that it doesn't and that LLMs make half of us redundant now, but I'm skeptical - so much automation I see is broken or badly done.
LLMs haven't remotely begun to be integrated into the lives of the typical person. Not even close. The typical person is using LLMs not at all as it pertains to their daily life tasks. They're using them almost entirely for limited discussion matters (eg having a discussion with GPT about a medical issue, or a work related matter).
This is the first or second inning in the LLM rollout. It'll take 15-20 more years for full integration of AI agents into the life of the typical person.
The claw experiments for example can just barely be considered alpha stage. They're early AI garbage unfit for the average person to utilize safely. That new world hasn't gotten near the typical person yet.
The compute requirements to get to full integration of AI agents into the life of the average person - billions of them - are far beyond 10x where we're at now.
TurboQuant has a specific benefit: compressing the KV cache at a negligible cost to quality. That mainly means context lengths can go up for the same amount of memory; however, the KV cache only accounts for something like 20% of the overall memory footprint, and this will not dramatically decrease memory demands in the way some of the more sensationalist reporting has stated.
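For a rough sense of scale, here is a back-of-envelope KV cache estimate; the model dimensions below are illustrative assumptions (a hypothetical 70B-class model with grouped-query attention), not any particular lab's numbers:

    # Back-of-envelope KV cache sizing (all figures are illustrative assumptions).
    def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
        # 2x for keys and values; bf16/fp16 = 2 bytes per element by default
        return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

    # Hypothetical 70B-class model: 80 layers, 8 KV heads (GQA), head_dim 128
    fp16 = kv_cache_bytes(80, 8, 128, seq_len=128_000)
    quant = fp16 / 4  # e.g. 4-bit KV cache quantization vs 16-bit

    print(f"fp16 KV cache at 128k context: {fp16 / 1e9:.1f} GB")    # ~41.9 GB
    print(f"4-bit KV cache at 128k context: {quant / 1e9:.1f} GB")  # ~10.5 GB

The weights themselves are untouched by KV compression, which is why the overall memory savings are capped at the KV cache's share of the footprint.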
The open source tooling got quantization support 3 years ago! It was a lesser type of quantization, but more than enough to prove that the savings just go to bigger models.
I’m not disagreeing with you, but consumer RAM prices are lagging indicators. If commercial RAM prices are dropping then consumers will see those price drops last, especially given the fact that several consumer manufacturers turned to commercial only.
Is there a source that says commercial RAM prices are dropping? I was recently told (without a source, so I am not sure if it is true or not) that OpenAI never even bought any of the RAM they signed deals on last year, and that those deals were just letters of intent. So if prices are coming down I wouldn't be shocked but the economy is pretty well vibe coded these days so who even knows.
If they see them. Plenty of businesses are still charging pandemic prices for all kinds of goods and simply pocketing the difference.
Cars come to mind instantly. Prices exploded in 2020/21 due to legitimate shortages, most of which have been more or less resolved, but the prices for new (and used!) cars never came back down.
I do wonder how closely prices for consumer RAM kits follow the wholesale prices manufacturers see internally for DRAM chips. The pcpartpicker graphs you linked show consumer prices have leveled out and may even be starting to fall. Depending on how the economics shake out this could mean we've hit an inflection point.
My personal prediction is that once the VC bill comes due and prices for frontier models starts to climb, competition for efficiency will heat up. The main AI use-cases seem to be falling into buckets, and I doubt serving gigantic, do-it-all general models for every use-case under the sun is remotely cost-effective.
If common use-cases start to be more efficiently served by smaller, more efficient purpose-built models (or systems thereof), it'd make the big frontier models increasingly niche. Cursor's Composer 2 model is a great example of this.
In any case, I think it's pretty fair to speculate we may be seeing RAM prices start falling sooner rather than later.
Consumer vs wholesale DRAM is an absolutely fair distinction to make; I'm not sure how to track those prices. My main issue is the article saying "RAM prices are crashing" (which I can't find any evidence of) and linking to an article that doesn't even repeat that claim, it instead just speculates that maybe RAM will come down in price due to this new idea.
> In any case, I think it's pretty fair to speculate we may be seeing RAM prices start falling sooner rather than later.
I sure hope so. RAM, HDDs, and SSDs are all crazy-high right now and I was in the market for literally all 3 but have paused all my buying because I can't justify the costs as they stand today.
RAM prices haven't crashed yet and it'll take time because it has to propagate within the supply chain. Micron is -20% from the top already https://www.investing.com/equities/micron-tech
Stock price is the best forward indicator I can think of
Yeah, good point, although it's just one of all the catalysts I mentioned. In fact I had written most of the post already before I saw the news about RAM.
I would think that we are going to see RAM prices increase even more, given, among other things, helium supply disruptions and increased electricity prices.
I haven't looked closely into TurboQuant, but perhaps it will revolutionize things just as much as the 1-bit LLM did...
They've all avoided loading up their LLMs with ads to this point. That is going to change dramatically over the next 2-3 years. All of them will be loaded with ads, and Google will partake as expected given their ad network & capabilities in that realm. They'll match GPT's ad roll-out.
You get more Claude tokens from a Google subscription via Antigravity than from Anthropic. Especially if you use the 5 other "family" accounts you can share the subscription with...
Not crashing yet. The article is looking 1 to 5 years out.
Given Nvidia's CEO's agitation I would give credit to the prediction, and if it's correct the price will go back to what it was, or even lower if investments in capacity are made today.
Honestly you're both wrong. RAM prices spiked speculatively, and they're going down for the same reason. Market people always want to argue from fundamentals, when in practice *ALL* the high frequency components of the signal are down to a bunch of traders trying to guess where it's going in the short term.
At best those guesses are informed by ground truth ("AI needs a lot of RAM!" "Sam cornered the market!" "TurboQuant needs less RAM!"), but they remain guesses, and even then you can't tell the difference between that and random motion.
No, they signed a bunch of contracts for future deliveries. That's not a supply constraint. The factories making RAM continued operating and serving their existing deliveries, and in fact they still are.
Freshman economics would say that supply is fine and that prices shouldn't move. But they did anyway. And the reason is speculation.
I don't get it tbh. What market participants were speculating here? There aren't futures markets in RAM as far as I know, though I certainly don't know much. And the supply constraints appear to have been pretty real (though maybe not immediate) if e.g. Valve was begging publicly for RAM consignments. Were there pure-play speculators filling warehouses with DDR5?
Have we gotten any more word on the potential helium constraints that SK Hynix was making noise about after the strike on the helium plant in the Middle East that supplied 60% of S. Korea's helium? Because that could definitely put a kink in things, since SKH is one of the 3 remaining big DRAM producers.
I’ll believe they’re going down when it doesn’t cost $550 for the $105 RAM I purchased 1 year ago. Yes, consumer prices lag commercial prices, yada yada; I think any hot takes are pointless until we see lower prices or far more convincing evidence it’s coming. When 32GB of DDR5 RAM costs basically as much as a MacBook, it’s hard to hear “RAM is coming down for sure”.
Yeah, I also stopped reading at that point. If I want a bunch of random, made up facts to sell lukewarm opinions or steer the uneducated masses, I'll tune in on a Trump press conference. Why does this feel like someone is desperately trying to make reality mirror his flailing market bets?
There is also demand for RAM in other areas of data centers. As we are all pushed deeper into clouds, I can see the rise of RAM for data storage (RAM drives) continuing to eat into the supply. A module of DDR5 will be more useful in a Netflix rack streaming movies 24/7 than in a gaming PC where it may only be used an hour or two every day.
It’s incredible how polarizing the AI rush is. I keep the perspective that the technology is an absolute step change but I have no idea where the cards will fall. I take a lot of issue with this style of article. I get a sense that the authors are being overly defensive.
The cost to serve tokens is absolutely profitable today and that’s been true for at least a year. What’s unclear is how R&D and capex fit into the picture. I am not that pessimistic on this front either though. For the data center build-outs, demand for tokens is still exceeding supply. On the R&D front, well, most of us here on HN have benefited from decades of overinflated engineering salaries, often paid by companies that were not only unprofitable but usually without a plan for success. In this current rush, companies cannot keep supply up with demand; it’s a much easier math problem when you have something that people want (tokens) and you need to figure out profitability when including R&D.
And unlike the traditional "this will replace humans right away", I think what this introduces is a lot of incentive to spread those tokens into places where there was never any incentive to hire a software engineer previously. In turn, that will drive a lot of business activity in those areas that will potentially fail given the current quality of the output.
This feels like a boom before bust scenario, and I'm not even sure if it will bust.
Seriously, what value are tokens providing other than justifying layoffs? Concretely. Today. Not in the speculative scenario that cardiologists could be replaced with models.
We see this new trend of agentic coding, again a promise that software will be written that way going forward, despite the number of fiascos already experienced when trusting a model turned bad. The use case may provide value, but right now all it does is fulfill the push for token consumption all these AI leaders are advocating for.
Tulip futures skyrocketed; it was economic speculation on a useless asset, not supply and demand. Crypto is the analogy, not AI. Given that the major AI labs other than GDM are private, this is even more true.
Agentic coding absolutely blew up from demand, users are not being tricked into paying $200 a month, and they’re not complaining about hitting rate limits because it’s useless.
> users are not being tricked into paying $200 a month
I can't believe people actually believe that people and companies are tricked into paying for tokens. My $20 Codex subscription is so useful, I can easily see myself paying $200 for it.
It's ridiculous to call this tulips, in the sense of a speculative asset whose price depends on resale. A more similar recent example is the dotcom boom and bust based on building internet infrastructure, or the 2008 crash which was based on cyclical infrastructure overinvestment. These crashes were characterized by demand growth not keeping up with investment because the target markets were tapped out. Not clear when we'll get there with AI. The consumer market seems saturated on chatbots but we're not even close to saturated for b2b or self driving for example. And this discounts other new technological offerings which may unlock larger consumer markets (products where people are willing to pay $100 a month instead of 10 or 20)
All that said the dotcom boom is extremely analogous and that crash was quite bad.
dotcom was maybe $100B a year, focused on the US and mostly VCs. AI is perhaps $250B global VC (with more than half of ALL VC concentrated in one sector) and another $800B+ from non-VC. These numbers are basically a guess but structurally we are set up for something much, much worse.
But unlike the dot com boom, demand for tokens has not let up and there is increasing demand. I don’t know where it falls; certainly companies don’t get it right and they either over- or under-build. With the current demand rate changes it’s hard to understand why you would stop building today.
This matches what I'm seeing too. The refactoring and test generation use cases are where it actually saves real time. The tricky part is longer sessions where you lose track of context window and quota usage.
I know there is a large force on HN that wants to deny the value of tokens, and I know it’s anecdotal, but the writing is on the wall. If it’s not valuable to your workflow today it will be soon. I already have tests being written, automated hooks into bugs where an initial PR gets generated with a potential fix. It’s far from perfect but junior engineers are far less productive.
> Seriously, what value are tokens providing other than justifying layoffs
Like the OP said, it's incredible how polarizing this debate is. When I read comments like yours, I feel like a significant part of the global workforce in IT must be living on another planet? Or they never really used Claude Code, Codex, OpenCode, ... intensively before because of company policies?
I legitimately am at least 10x more productive than a year ago, and I can prove it in number of commits and finished monetizable features developed per day. Obviously my workflows still very much require an active, constantly context-switching human-in-the-loop, but to me there's absolutely no question both output volume & quality have skyrocketed.
I say this as someone who has used them to boilerplate/scaffold a bit of code by this point: Economic Value of LLMs is debatable, if only because they're being too broadly applied.
This is changing the narrative. Nobody really cares about tulips and some dumb throwaway comparison. Unless LLMs are worth an awful lot the math here does not make sense. That is both debatable and important.
Maybe we need to focus on a better definition of "bust" but we will surely see something along the lines of the hype-cycle graph in AI; what technology has not fallen into the trough before (best case) reaching a more steady-state of use and growth?
I can get Kimi K2.5 inference on openrouter for about $0.5/MTok input + $2.5/MTok output, from six providers that have no moat besides efficiently selling GPU time. We can assume they are doing so at a profit (they have no incentive to do this at a loss), giving us those numbers as the cost to serve a 1T-a32b model at scale.
Now we don't know the true size of any of the proprietary models, but my educated guess is that Sonnet is in about the same parameter range, just with better training and much better fine tuning and RLHF. Yet API pricing for Sonnet is $3/MTok input + $15/MTok output, exactly six times as expensive. Even Haiku is twice as expensive as Kimi K2.5.
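For a rough sense of that gap, a quick comparison at the per-MTok rates above; the monthly token volumes below are made-up illustrative numbers, not anyone's actual usage:

    # Monthly API cost at different per-MTok prices (hypothetical workload).
    prices = {                          # (input $/MTok, output $/MTok), per the comment above
        "Kimi K2.5 via OpenRouter": (0.5, 2.5),
        "Claude Sonnet API":        (3.0, 15.0),
    }

    input_mtok, output_mtok = 500, 50   # assumed workload: 500M in, 50M out per month

    for name, (p_in, p_out) in prices.items():
        cost = input_mtok * p_in + output_mtok * p_out
        print(f"{name}: ${cost:,.0f}/month")
    # Kimi K2.5 via OpenRouter: $375/month
    # Claude Sonnet API: $2,250/month (6x, matching the per-token ratio)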
I find it difficult to believe in a world where those API prices aren't profitable. For subscription pricing it's harder to tell. We hear about those that get insane value out of their subscription, but there has to be a large mass who never reaches their limits. With company-wide rollouts there might even be a lot of subscription users who consume virtually no tokens at all.
This is false. We may assume it's the most efficient way of generating revenue given their GPUs, but their overall profitability will just be a guess. They would still have incentives to run hardware at maximum, even when it's uncertain to eventually recoup costs.
> a world where those API prices aren't profitable
A lab with employees and models in training has other costs than the operating expenses of a GPU farm.
Yes. I would not consider Kimi a particularly good model relative to its size, and making a SotA model is a lot more expensive. But training costs are explicitly excluded when talking about the cost to serve tokens
Check the token prices for open weight LLMs at various independent inference providers.
That gives you a very good estimate of "how much you can serve the tokens of a model of size N for while making a profit".
Now, keep in mind: Kimi K2.5 is 1T MoE. Today's frontier LLMs are in the 1T to 5T range, also MoE. Make an estimate. Compare that estimate with the actual frontier lab prices.
Most/all private labs have said that inference is profitable. This was happening before the large push to scrap plans and largely charge folks the underlying API rates. Second, take a look at the pricing of open models. Now certainly it's not a direct 1-to-1 comparison, but we can use it as a baseline. Of course folks might not be telling the truth, but it's one of those situations where I see too many markers on the true side.
For supply look at outages and growth rates at companies like openrouter. The demand is growing every week.
Don’t confuse inference (API usage) with the consumer plan products. When people say inference is profitable they are referring to the cost to serve a token via the API. The consumer products are absolutely a question mark on profitability, and as we see with most of the business and enterprise plans, they are going away in favor of pure on-demand use (API cost) full time.
$3.99 at 8x instances, with a minimum 2-week commitment. Good luck getting 70% usage average during that time. Useful when you're running a training round and can properly gauge demand, not so great when you're offering an API.
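To illustrate how much the utilization assumption drives that math, a rough sketch; the node throughput figure is a loose assumption, not a measured benchmark:

    # Rough $/MTok served as a function of utilization (all numbers are assumptions).
    gpu_hourly = 3.99            # $/hr per GPU, per the rate quoted above
    gpus = 8                     # one 8x node
    node_tokens_per_sec = 4_000  # assumed aggregate throughput at full load

    def cost_per_mtok(utilization):
        hourly_cost = gpu_hourly * gpus
        mtok_per_hour = node_tokens_per_sec * 3600 * utilization / 1e6
        return hourly_cost / mtok_per_hour

    for u in (0.9, 0.7, 0.3):
        print(f"{u:.0%} utilization -> ${cost_per_mtok(u):.2f}/MTok")
    # 90% -> $2.46, 70% -> $3.17, 30% -> $7.39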
I wish this was higher up. I have been tracking the same since Thanksgiving ‘25 and the growth is unreal. Again I don’t know where the cards fall maybe the industry overspent on capex but it’s at least easier to see why they are spending based on demand. The risk of being left out is greater than overbuilding.
It's not exactly stuff of "wet dreams of the ownership class" to say that of the possible white collar careers, software engineering is pretty hard to beat in terms of salary vs work you need to put in.
Engineering salaries are significantly higher than nearly every other industry on average and on median. Much of this is driven by VC funding rather than sound, profitable, bootstrapped businesses with sustainable profit margins.
Engineering salaries have also been driven upwards significantly the past ~10 years (since the post-2008 crash recovery), while wage growth in the US is mostly stagnant. I don’t have a source handy for that, but there are plentiful studies.
Outside of the US this may be less true, but I took GP’s “most of us on HN” to mean people who work in US tech companies, which are primarily concentrated in high COL areas.
Thank you for saying it better than I could have. It’s probably an unnecessary jab but I know how well I benefited financially in an industry where not much was expected in terms of output, lavish perks and huge base salary and stock compensation. Absolutely some companies are extremely profitable per headcount but I look at the sea of failures and how well engineers have generally done. It sets the tone for this massive negativity I see around AI when so many of us have benefited from VC money that failed.
My main worry is - once this is all over, the market consolidates and using LLMs will become a requirement in job listings, what's the highest price per million tokens companies will be able to charge us?
Currently on a given day I'm chewing through approximately the equivalent of my lunch money, but where there's opportunity to extract wealth, someone will find a way to do it.
My (potentially naive) take is that open models will save us. The biggest markets for LLMs (e.g. coding) are narrow-enough to be served well by smaller models with proper RL. Cursor's Composer 2 (created from a Kimi K2.5 base) is a great example, and I expect it to be the first of many.
The wealth of great open models provide an excellent base for fine-tuning, distillation, and RL. I see a lot of untapped potential in the field of bespoke, purpose-built models that can be served far more cheaply than the frontier competition. I would not be surprised if we see frontier-adjacent experiences running comfortably on a Mac Mini by year end.
With frontier models seemingly hitting diminishing returns in quality, I struggle to see a world in which gigantic, expensive, general-purpose models don't become increasingly niche.
Jensen is already talking about $1000/mil tokens soon.
But there is no real higher limit. Imagine a LLM which could answer the question "what does my company need to do to beat the competition?". And then realize that the competition asks their LLM the same question. So now everybody is bidding the price up or using more tokens to get a better answer
> The cost to serve tokens is absolutely profitable today
How can you possibly say that? Everyone knows that's not the case, these companies are losing money every day selling tokens. Revenue is not the same thing as profit.
Don’t confuse what I say. Bottom line these companies are not profitable yet but it is profitable to serve a token via the API. They have increasing demand, not enough supply, models are getting better on quick timelines. For sure there may be some losers but it’s not hard to see that that token serving can be a profitable activity.
Don’t confuse plan changes with profitability of inference. When people talk about the cost to serve a token and it being profitable we are referring to API cost not the plans which absolutely subsidize some level of use. Hard to know what is breakeven on plan math.
That's not a necessarily profitability thing as much as a demand thing. The only way to improve the supply for those willing to pay more is to take it away from those paying less. Once supply catches up to demand things will change
There are private companies which rent/buy GPUs, run open-weight LLMs on them and sell the tokens. They absolutely make profit, and their clients think they get a good deal and are buying the tokens.
I think they’re losing money because they have to amortize the costs of training the models in the first place, which is where most of the resource sink is.
This is why they were freaking out about DeepSeek just taking the trained model weights and slapping an interface on it.
The problem with that comparison is restaurants largely don’t have much room to adjust price or optimize cost. The AI industry is too new with many unknowns right now so investors are willing to take risk. For the hyperscalers the bet is that being left out is going to be a greater loss than overbuilding.
That’s the wrong analogy. Model training is more like the setup costs of developing the menu and training staff. What’s driving the costs is important when talking about financial sustainability. If it’s mostly coming from optional R&D investments instead of the direct costs of producing the food then you can simply not exercise the option and be profitable. If it’s more coming as a variable cost that scales with each meal served that’s a very different situation.
Yeah it should be factored in, but it’s a different set of implications for long term sustainability. They don’t actually need to test and optimize a new menu every day or week. If they decide to just stick to the same one longer they can get way more return from each dollar spent on development. It’s just that right now the rate of improvement you get with training is really high and nobody can afford to fall behind their competition.
OpenRouter is an upper bound on compute cost for the open source models. So people assume that Opus and Sonnet really aren't sucking up 10x the resources because open source models aren't 10x worse. Idk if it's true or not, but Haiku is $5/MTok and it is much worse than the $2-3/MTok models imo.
Can you cite your source for the analyst at Cursor? I read the article and looked through the boatload of links but struggled to find what you are referring to. Ty
Ty for sharing, and agreed. I think some folks in the comments for this post are confusing inference profitability and plan profitability. Most plans, as far as we can tell, are probably teetering on the line of profitability, and that’s why we have seen some like Cursor really tighten how many tokens you get.
> On the R&D front, well, most of us here on HN have benefited from decades of overinflated engineering salaries, often paid by companies that were not only unprofitable but usually without a plan for success.
This is a classic HN mistaking the map for the territory. R&D and capex absolutely figure into de-facto profitability and sustainability for AI labs, despite their separate treatment in accounting.
> well, most of us here on HN have benefited from decades of overinflated engineering salaries, often paid by companies that were not only unprofitable but usually without a plan for success
This is a really concerning perspective: people were paid what they were worth. Software is or was one of the few remaining arenas wherein a person can find a middle or upper middle class lifestyle consistently.
I will also note: a startup raising an $8MM Series A and eventually fizzling out is not the same as the hundreds of billions invested in these AI companies without a path to profitability. It is utterly absurd to pretend these are the same thing: any company ingesting that much cash needs to justify its capacity to survive.
> Software is or was one of the few remaining arenas wherein a person can find a middle or upper middle class lifestyle consistently
Software salary inflation and expansion has made this the case. Tech’s accessibility to the educated has accelerated gentrification massively, driving up prices on rent and food. While the statement is correct, tech’s contribution to income inequality is part of the issue. If you’ve lived in Austin or Chicago (especially Austin) prior to ~2010 you’ll have seen this first hand.
Oh come on, there are no “classic HN mistakes” here. Inference is profitable but the bottom line is not yet. This is a very young industry and unlike those of the past, it’s much easier to picture a possibility of profitability. It’s absolutely different in that the marginal cost scales linearly, but solving for the R&D portion of a product where supply cannot keep up with demand is a lot easier than some SaaS where the underlying product is not being used.
The salary jab was probably a little harsh.
Your ending is a bit of a fizzle too. There are many capex intense businesses that do just fine.
> Software is or was one of the few remaining arenas wherein a person can find a middle or upper middle class lifestyle consistently.
I want to add something additional to this: it is one of the few fields that can afford middle or upper middle class lifestyle and is accessible.
I have no doubt that if I could redo my life with the necessary resources, I’d be more than capable of putting myself through med school and ending up with a secure career that paid more than I ever made in software.
But at this stage of life? I don’t have the time or money to spend a decade+ paying some institution tens of thousands of dollars to hopefully maybe have a real career.
Once software as a career dies, I suspect many will find themselves locked out of the middle class for generations.
It was kind of a flash in the pan moment where you could leave your retail floor manager job, crash course this thing called "javascript" in a 3 month class, and then get hired for a six figure remote job if you could choke out a mildly competent github repo.
> I suspect many will find themselves locked out of the middle class for generations.
On the other hand, once software as a high paying career dies there will be nothing to prop up the status quo (high cost of housing, for example) so the middle class will return to being much more accessible to modestly paid jobs.
> any company ingesting that much cash needs to justify its capacity to survive.
What, why? There are tons of low-margin capex-intensive businesses out there.
I think AI will end up being like hosting. All the models will converge to being pretty decent and the companies will have to compete on efficiency since they are selling a generic commodity.
You can already see Anthropic fears this scenario since they try so hard to make people use their first-party tools rather than plugging Claude in as a generic part of a third-party stack.
> This is a really concerning perspective: people were paid what they were worth.
Even interpreting what-they-were-worth in the usual sense, I’m not so sure about this. We have seen wage collusion reported by the usual US West Coast-based companies. And some news on here[1] has reported that some engineer with a salary of $100K[2] might be producing $1M of value. And even factoring in the usual “but benefits and overhead”, that comes out to a solid factor of profit per programmer/engineer.
Despite that the sense I get (only from this site since that is my only reference) is that the so-called overpaid engineers are incredibly content to just have this happen to them. As long as they are paid well compared to other workers, it’s fine. No matter the profit factor. In fact, the discourse is very much focused on how “privileged” they were if the tide ever changes. Instead of realizing how much value they provided, collectively.
Outlets for capturing more of the value they create are entrepreneurship (Hello HN). Never any collective organizing. And entrepreneurship is easily bought via acquisition.
Collective bargaining would have been relevant in case they ever get automated... by the very software they co-created.
One could imagine that this “privileged” collection of programmers could have served as a vanguard for the collective good of programming professionals as well as collective ownership of software goods, using their privilege to that end. The former never happened, and the latter is partly realized in people’s free time (see the OSS maintainer in Nebraska meme).[3]
[1] All from recollection since this is just news from the Frontier to me
[2] Of course the pay might be much higher now; this might have been a while ago
[3] when it isn’t simply exploited by corporations just using OSS without giving any back; a logical turn of events when no license or law forces them to contribute back
> This is a really concerning perspective: people were paid what they were worth.
The parent comment doesn't discount that, only pointing out that "what they were worth" was inflated due to a speculative environment. Wherein lies your concern?
I think calling it inflated is to play to a narrative that labor was overvalued broadly in tech.
Salaries across industries in the US have remained flat since the 1970s. Calling the one sector that can provide access to a middle class lifestyle inflated is to play into a narrative capital is eager to tell, even if OP didn't intend that.
> Salaries across industries in the US have remained flat since the 1970s
What do you mean? The real (meaning adjusted for inflation) hourly wage in the US has increased by around 20% since 1970.
What has changed since the 1970s is that wages are no longer coupled to productivity. Perhaps that is what you are thinking of? But that should be an obvious truism for anyone in tech. We create the very things that cause that to be the case!
That prices change from one point in time to another is a trivial fact.
“Inflated due to a speculative environment” is not an accurate way to frame labor prices that held for many years. At that point, the prices were simply high due to high demand relative to supply (compared to other types of labor).
> At that point, the prices were simply high due to high demand relative to supply
That goes without saying. The investigation here is into demand. Which was said to be overinflated due to speculation. As noted, many of the companies hiring the developers did not have viable businesses.
Yeah, if we just ignore R&D, fixed costs, depreciation, and the fact that there's a high likelihood investors were expecting a return, and if we trust their numbers, then yes, we may say inference turns a profit.
In accounting, almost anything you want can be true, at least for some time.
> nobody is sure if even their metered pricing is profitable
This is most likely wrong. Lab executives insist that serving tokens is profitable. It's the cost of training next-gen models that requires them to keep raising ever larger rounds. More importantly, many independent providers price tokens of open-weight models at a fraction of Anthropic's prices.
But are they actually profitable, or do they employ creative accounting where only parts of overhead expenses are counted against all of inference revenue, similar to what Uber did?
OpenAI's numbers show that they definitely are not profitable on inference, and even worse, revenue growth scaled linearly with inference cost from 2024 to 2025, which means they can't outgrow this problem. See https://www.wheresyoured.at/oai_docs/
If they shut down all training today they’d be absolutely printing money for the next couple quarters and then die with a bang once the other lab releases the next frontier to the public.
How? They're already burning $2 to make $1, and court documents show that Anthropic has already been lying about revenue (claimed to have made $19 billion when it's actually $5 billion to date [1]).
Not hard to believe they're lying about other things when they've been lying about the capability of their products since inception.
That is not what the article says, it says $19B ARR.
I don’t necessarily see a contradiction. $19B run rate, achieved very recently, is actually consistent with $5B lifetime earnings, because their growth curve is so sharp. Zitron is not good at math.
I didn't link to the Zitron site, but if you can't see how dishonest it is to say you have $19b ARR when the reality is you have only a total of $5b, IDK what to tell you. Says more about how you think and why you think it's okay for corporations to be misleading.
This is not lying, that is just what run rate revenue means! It makes sense to use as a metric when a company’s user base is growing as fast as Anthropic’s is.
It makes sense to be extremely misleading about actual accounting figures? In what world is it okay to say you have $19b in ARR when you have only ever generated $5b for the entire duration of your company's existence?
Did Enron start a business school I'm unaware of or something?
I'd be surprised if they're making money on inference just from that. There's no way someone paying $20 p/m and using it all day isn't costing way more than that in even just the electricity for tokens, let alone the capex.
I don't really get the last bit. It's hard to imagine what a new fangled "frontier model" could do that would blow anyone out of the water. Like what does this look like? Really good benchmarks? Who cares about that anymore?
> Lab executives insist that serving tokens is profitable.
Maybe marginally profitable, but right now they need to give out subsidies for people to use their products (Antigravity, Codex, Claude Code et al) in an actually useful manner that prevents churn and at the scale they need to justify usage growth forecasts, which they need to keep the wheel turning.
Probably if you look at the users who exclusively use the simple chat box interfaces (i.e. ChatGPT, Gemini in UI, Claude in UI) plans it is actually profitable, but I'd also say that's not where most of the usage comes from.
I'd love to actually look at both usage + profitability from each user segment to see if their PxQ growth expectations from non-enterprise usage make any sense.
> Many independent providers price tokens of open-weight models at a fraction of Anthropic's prices.
Are those open-weight models as good as Anthropic? Are they the same parameter class?
It's a loss leader but this is normal. Same has happened with Uber, Airbnb, Amazon, etc. Using VC money to buy marketshare and once you have it, you can milk it.
The question is more around the moats that these companies have and it seems to me while their models are amazing technology, they don't really have a moat. The open/chinese models still continuously catch up to the american ones.
And what possible moat? It isn't hard to foresee that in just a couple of years, models outpacing the latest frontier tech we have today will run on consumer hardware. With open source workflows anyone can pull in to run, providers won't see a penny.
Another scenario is that dense models get replaced entirely, in which case the likelihood of OpenAI and co pioneering the concept is pretty slim. They will be left with billions worth of infrastructure which cost them 10 times that 2 years earlier, faced with the reality touched on by the article: liquidate.
The point is that you can’t just serve tokens without also training the next models.
It’s an inseparable part of your costs, so naturally you can’t be profitable unless the price you are charging ALSO covers training.
Is that right? I think that you can serve tokens without training the next models. It would be bad strategy, but it would work. So it's an important question, are they covering their operating expenditure? If they are the business has legs (and it will be worth spending a lot to train the next models). If not, maybe not.
If a major model provider were to just halt progress on developing new and improved models, the open weight alternatives would catch up in a couple years.
They would have a period of great margin, followed by possibly zero margin as enterprises move to free options.
They would have to come up with a lot of great products around the inferior models to justify charging at that point.
Also, an out-of-date model which doesn't know about last year's world events, hit songs and new JS libraries is a depreciating asset even before you consider low-cost competitors catching up. So you'd presumably have to do some training just to keep the model up to date at the current quality level (unless you completely give up and just sweat the assets). And on the other side of that coin: over the next few years, do the latest, biggest models continue to generate user-perceived real-world improvements sufficient to keep users wanting the latest and greatest?
I don't think it will work, it's too easy to switch models. When Google comes out with a new model people will just switch. I think Google wins in the long run, they have the money to just wait until everyone else goes bankrupt and they also have the Apple contract and therefore the mobile market.
There are companies that already do nothing but serve tokens using models trained by others. Just running infrastructure and collecting a reasonable fee for their troubles. It's only a bad strategy if you want to claim to investors that you'll gain monopoly market share if only they could give you a few more billion dollars.
The impetus to continue training at the pace they are is driven by the competition. So if the money starts drying up, then they’ll naturally slow down because they’ll have to figure out how to do more with less.
I suspect that once the models hit a point of “good enough” for certain use cases companies will start putting R&D focus in other areas that may be less expensive. Like figuring out how to run more efficiently, UI/UX conventions that help users get what they’re trying to accomplish in fewer steps, various kinds of caching of requests, etc. So the cost to serve tokens over time should only come down, and will probably start coming down more rapidly as the returns to model training slow down.
That’ll probably be a while though, because each successive model tends to be a lot better than the last.
Not counting training models as part of your gross margin is just creative accounting. It's an inherent part of being able to provide the service for OpenAI, Anthropic etc.
Even so, their subscriptions are significantly cheaper than the token pricing via API. So at some point they will need to get rid of subscriptions or increase the subscription prices dramatically... And that's assuming their current token pricing is actually profitable. Which it probably isn't.
Lastly, I would not trust one word that comes out of an executive of an AI company (or any other large company, for that matter).
I wouldn't trust those claims from any private companies, even public ones play the most insane tricks in earnings calls to inflate numbers or heck, just make up new ones.
I'm not saying they're wrong, but I don't put much stock in their words.
This article tries to build upon a lot of half-truths or incorrect facts, like this:
> OpenAI is struggling to monetize. They turned to showing ads in ChatGPT,
The ads aren’t going into your paid plans (except maybe a highly discounted tier, depending on the market). The ads are a play to offer a free version. Having an ad-supported free tier isn’t new.
The discussion about being unprofitable also repeats the reductionist view that these companies are losing money and therefore the business model doesn’t work. This happens with every VC cycle where writers don’t understand that funded companies are supposed to lose money while they grow. That’s what the investment money is for.
We have very strong indicators that inference is not a money loser for these companies and is likely very profitable. They should be spending large amounts of money on R&D to get ahead and try new things while they’re serving up tokens.
The “but they’re losing money” argument never seems to be brought out against competitors that literally give away their models for free and for which we can calculate the cost of serving 400B-1T parameter open weight models.
> The ads aren’t going into your paid plans (except maybe a highly discounted tier, depending on the market). The ads are a play to offer a free version. Having an ad-supported free tier isn’t new.
Sounds like it is new for ChatGPT though. That's also how it started with TV and Youtube, first on the free tier then expanding to the paid ones.
YT has a Premium Lite paid tier (at least in the U.S.) that does show ads on music and in certain other areas of the app, such as shorts, searching, browsing, etc.
YouTube doesn't show ads on the paid plan. If you're talking about sponsored segments those would be impossible to moderate, and YouTube does offer easy skipping of those.
Do you have any evidence that inference revenue is growing faster than training costs? RLVR is significantly less compute-efficient than token-prediction pretraining - especially as labs are trying to train models to achieve agentic tasks which take tens of minutes per rollout.
> The ads aren’t going into your paid plans (except maybe a highly discounted tier, depending on the market). The ads are a play to offer a free version. Having an ad-supported free tier isn’t new.
This statement doesn't discount the original statement: that ads are going into GPT, which Sam called a last resort.
> The discussion about being unprofitable also repeats the reductionist view that these companies are losing money and therefore the business model doesn’t work. This happens with every VC cycle where writers don’t understand that funded companies are supposed to lose money while they grow. That’s what the investment money is for.
Usually propped-up companies don't last in the long term once the VC subsidy runs out. There's a difference between getting VC money in order to buy rocket parts, and getting VC money in order to charge $7 when you would really need to charge $10. The latter problem never goes away.
> This article tries to build upon a lot of half-truths or incorrect facts, like this:
Yeah, I was wondering why my bullshit detector was going off. This feels as if someone who cooks for Ramsay's kitchen is trying to predict the end of the market hike.
> The “but they’re losing money” argument never seems to be brought out against competitors that literally give away their models for free and for which we can calculate the cost of serving 400B-1T parameter open weight models.
To be fair people aren't exactly bullish on the prospects of deepseek or z.ai either, it's just they're below radar so they don't get mentioned.
> companies are supposed to lose money while they grow
At what point do we declare that a company has "grown" and now must make money? OpenAI is a multi-billion dollar company right now, surely that's a point at which they should be profitable, instead of propped up by further investment and borrowing.
> We have very strong indicators that inference is not a money loser for these companies
All of the economic analysis that I've read strongly states the opposite. Running a GPU is a net loss /even for the data centre operators/. For them to break even, they currently charge OpenAI/Anthropic/Etc more than OpenAI/Anthropic/Etc make per-token.
It's a winner-takes-all market and everyone wants to be the next Google and not the next Lycos or AskJeeves etc.
It'd be interesting to see what they spend all the money on though as we seem to be hitting diminishing returns and I'm not sure if the typical enterprise user really cares about small improvements on benchmarks.
It seems like it'd probably be better to spend all that on marketing, free trials, exclusivity/bundle deals etc. ChatGPT already has a strong advantage there as it has so much brand recognition. I've seen lay people refer to all LLMs as ChatGPT like my grandparents did with Nintendo and all video game consoles.
It’s absolutely not winner take all. LLMs have become a commodity and the cost of switching models is essentially nil.
Even if ChatGPT has brand recognition amongst lay people, your grandparents aren’t the ones shelling out $200/mo for a Claude code subscription and paying for extra Opus tokens on top of that. Anthropic’s revenue is now neck and neck with OpenAI, but if tomorrow they increased the price of Opus by 5x without increasing its capabilities, many would switch to Gemini, GPT 5.4, Cursor, or any cheap Chinese model. In fact I know many engineers that have multiple subscriptions active and switch when they hit the rate limits of one, precisely the tools are so interchangeable.
At some point it could even become cheaper to just buy 8x H100s and host Qwen/Deepseek/Kimi/etc yourself if you’re one of those companies paying $3k/mo per engineers in tokens.
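A back-of-envelope on that break-even; the hardware price, opex and amortization period are assumptions for illustration, not quotes:

    # When does self-hosting an open-weight model on an 8x H100 node beat per-seat token bills?
    node_price = 250_000          # assumed purchase price of an 8x H100 server, USD
    monthly_opex = 4_000          # assumed power, cooling, hosting per month, USD
    amortization_months = 36      # write the hardware off over 3 years

    token_spend_per_engineer = 3_000   # $/month, per the figure in the comment above

    monthly_node_cost = node_price / amortization_months + monthly_opex
    breakeven = monthly_node_cost / token_spend_per_engineer

    print(f"Node: ~${monthly_node_cost:,.0f}/month amortized")      # ~$10,944/month
    print(f"Break-even: ~{breakeven:.1f} engineers' token spend")   # ~3.6 engineers

Whether one node can actually serve that many engineers' peak load is the real question, of course.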
I have non-tech friends telling me they prefer other models like Gemini; this feels like the early days of search engines, when people were willing to switch to find better results.
> It's a winner-takes-all market and everyone wants to be the next Google
absolutely isn't! if billed per token, there is no reason to be married to a single model family provider at all. the models have very different strengths and weaknesses, you should be taking advantage of this at all times.
people used to say this about search engines and web browsers, as well
regardless, eventually Google became the universal default for both. When it comes to software, the average person doesn't shop around for the technologically optimal choice, they just use what everyone else is using.
Where to go next? I don't think anyone has gotten close to automating everyday PC usage, likely via screen capture and raw keyboard+mouse inputs. Imagine how much bigger would that market be than vibecoding.
tbh I don't think this use case is going to be as big as people seem to think
there are a lot of reasons, but in brief - I think AI desktop use is a product that the average person isn't going to get much value out of. to make an analogy - the creators of Segway thought people would buy them in large numbers, but it turned out most people don't mind walking manually (or at least, don't mind it enough to spend money on a scooter). I think makers of AI Desktop Use products are going to find out the same thing as it relates to everyday tasks like checking email and shopping.
I don't think it's winner-takes-all. Google is Google in 2026 because Lycos and AskJeeves were bad in comparison. The average user doesn't care whose LLM they're using because they're all close enough. It's hard to see past the bubble bursting, but I expect most people will use multiple of them depending on context (Copilot via the integration in windows, Gemini via Siri on their phone, etc), likely without paying.
> Magnificent 7 companies are increasing capex to their biggest ever to differentiate their tech from each other and the big AI labs, but the key realization is that they don’t have to spend it to win. It’s a defensive move for them, if they commit $50B, OpenAI and Anthropic need to go raise $100B each to stay competitive, which makes them reliant on investors’ money.
Stay competitive how? If the Magnificent 7 aren't spending the money, then how could it possibly hurt OpenAI/Anthropic to not raise equal amounts of money? Maybe you can pull together an explanation, but this author didn't even try to do so.
This piece seems poorly thought-out, but well designed to get shared.
Promote writers who will actually explain their claims carefully.
>> Building a datacenter is supposed to be a “safe” investment in normal times, so banks give private credit and mortgages to finance them.
Except the investment is more like a railway or utility. It generates like 3% return, which is definitely not good enough for the people providing the money, or (in the case of the profitable companies) anywhere near the double-digit returns they make on their technology products. I won't be surprised when we see consolidation of marginal players and abandonment of the losers, just like you can find rail lines to nowhere, and fiber that's never been used.
> Taking this into account, Google is extremely well positioned to weather the storm. When they announce capex expenditure, they don’t spend it overnight. They can simply deploy month by month until their competitors struggle to raise and get forced to capitulate. At that point they can just ramp down the spending and declare victory in a cornered market. They don’t need capex, they just need to make it very clear for everyone that nobody can outspend them.
Have you tried Gemini 3.1 lately? It is not even close to Opus 4.6 never mind Claude 5.
This post, like many pessimistic takes, seriously discounts innovation and the exponential takeoff of recursive self-improvement.
Exponential take-off is great until it stops. Genuinely, what are the signals showing any of the large models are performing exponential takeoff and recursive self-improvement?
Currently a lot of that appears to be marketing hype to drive up usage. Is it exponential, or are the labs spending exponentially more for smaller and smaller gains from LLMs?
The article says "...and RAM prices are crashing because new models won’t need as much," and I went and read the link. The link was a puff piece for a very specific compression mechanism that...no one is using?
I do hope that RAM prices come down but this was just wishful thinking.
> They lose a big customer for their cloud services. Even worse considering that now, using the AI they helped fund, everyone can compete with their sub-par products. GitHub is a good candidate for disruption, and that’d be just the start.
Look, I'm a Microsoft hater like the rest of us, but calling Microsoft's products sub-par discredits the author a good bit. I invite anyone who thinks this to try and compete with them. Go after something like Word, for example. Then prepare to be awed by what some of the most brilliant programming minds ever can produce after grinding for four decades.
Microsoft's AI, on the other hand, is underwhelming at the moment and might well go the way of Windows Phone. Plus enough people hate the copilot icons everywhere that Microsoft is hinting at dialing down a bit.
MS Office should last a while if they stop calling it "Copilot 365 Office" or whatever it was.
You can have an opinion about a tool as a user, without ever having the ability to create such a tool yourself; that's literally what every tech and auto reviewer does.
There are some frustrating parts, but subpar is an odd way to describe GitHub to me. I’m pretty happy with what they’re doing, and find the UX super helpful. I do agree Actions needs a debug mode but otherwise I get a ton of value out of the service for $20/month?
I'm sure Word is full of arcane backwards-compatible tricks that 20% of users use, but I find it hard to differentiate the Pareto 80% of the product from Google Docs or any other competitor (LibreOffice?). Adding rich text, tables, headings and colors is pretty much a solved problem for all of these programs. Adding images or handling more complex layouts sucks everywhere; it's not like Word has a great user experience and the others don't. All of them are bad.
IMHO, if any of the competitors were the de-facto standard for word processing, the vast majority of users wouldn't feel the difference. Power users would for sure, but I'm not sure there are many of them or that the features they use are truly essential. If Word didn't have a near monopoly in office settings due to aggressive marketing, OS presence and a proprietary file format that constantly changes and never renders well outside of Microsoft products, it could disappear without anyone (save Microsoft) losing much.
Yes. That 80% you find useful is served fine by Google Docs, but there’s a good reason the enterprise overwhelmingly goes for Word, and it lives deep in that 20% and a lot of the time has zero overlap with others.
A lot of this makes me imagine an aeroplane flown by a mad pilot, overloaded and running out of fuel. The passengers are all blaming the guy sitting in the back knitting a parachute and telling him that the chute will never work because the wool is the wrong colour.
The tragedy is when it's all over one of the surviving passengers will go "See! I knew we were going to crash because of that knitter"
History doesn't have to repeat. There's barely anything else going on in terms of innovation, and AI is a real step function technology. We might be overspending but there's no way we're getting another AI winter like last time (remember last time investment in 90s AI had to compete for resources with the internet boom).
> AI is here to stay. If used right, chances are it will make us all more productive. That, on the other hand, does not mean it will be a good investment.
The dotcom bubble burst and 26 years later we’re all hopelessly addicted to the internet and the top companies on the stock market are almost all what would have been called “dotcoms” then.
The railroad bubble burst in 1846 not because trains were a dead end - passenger numbers would increase more than 10x in the UK in the following 50 years.
From the beginning of this I’ve wondered the same question: how do these companies justify spending such massive amounts now (and 3 or 4 years ago) when software and hardware efficiencies will bring down the cost dramatically fairly soon?
They basically decided that scaling at any cost was the way to go. This only works as a strategy if efficiency can’t work, not if you simply haven’t tried. Otherwise, a few breakthroughs and order of magnitude improvements and people are running equivalent models on their desktops, then their laptops, then their phones.
Arguably the costs involved means that our existing hardware and software is simply non viable for what they were and are trying to do, and a few iterations later the money will simply have been wasted. If you consider funnelling everything to nvidia shareholders wasting it, which I do.
The decision is the right one. Scaling at any cost is the right way to go.
You cannot find the efficiency if you haven't been experimenting at scale, this is true personally as well.
If someone hasn't been burning a few billion tokens per month, everything coming out of their mouth about AI is largely theory. It could be right or wrong, but they don't have the practice to validate what they're talking about.
Not everyone scaling to that degree would have the right answer or outcome; many would be wrong and go bust. But no one who didn't scale will have the right answer.
They're not just betting on the current tech, they're building out infra like this because probably any future tech currently being researched will also require massive data centers.
Like how the GPT LLMs were kind of a side project at OpenAI until someone showed how powerful they could be if you threw a lot more parameters at them.
There could be some other architecture in the works that makes GPTs look old - the first to build and train that new AI will be the winner.
I think their current goal is to capture as much market as they can while they still have the best models, their only moat. Look at Anthropic, they are clearly trying to lock their users in their ecosystem by refusing to follow conventions (AGENT.md etc) and restricting their tools exclusively to their own services.
Because whoever wins the AI race (assuming they don't overshoot and trigger the hard takeoff scenario) becomes a living god. Everybody else becomes their slave, to be killed or exploited as they please. It's a risky gamble, but in the eyes of the participants the upside justifies it. If they don't go all in they're still exposed to all the downside risk but have no chance of winning.
I don't expect hardware prices to go down unless the third option (economic collapse) happens before somebody triggers the dystopia/extinction option.
This is an awful article. I don't know how it reached #1 on HN.
Bottom line is that H100 prices are near 3 year highs, A100s are still profitable to run, B200 prices are increasing, no one has enough compute. Google, OpenAI, Anthropic, Meta, AWS, Azure are all compute constrained. Every single one of them said so publicly. Neo clouds are telling customers they're all sold out now and you even have to book compute in advance if you're an AI company.
OpenAI is struggling to monetize. They turned to showing ads in ChatGPT, something Sam Altman once called a “last resort”, while Anthropic is crushing them with the more profitable corporate customers and software engineers.
AI bubble is bursting because OpenAI is trying to monetize free users on ChatGPT with ads but Anthropic is kicking butt in AI. What kind of logic is that? So it seems like AI can be monetized as Anthropic shows. Is AI going to burst because OpenAI can't monetize but Anthropic can?
I wouldn’t be surprised at all if in the next couple of quarters we see OpenAI looking for an exit. It will be interesting because the sizes are now so big that we will probably know all the details. The most likely buyer is Microsoft, they already own a lot of it, and because of that, they are the most interested in showing a win.
I'll take the opposite stance. I think OpenAI is going to be bigger than Microsoft in market cap within the next 3 years. I think Anthropic and OpenAI are going to run laps around current big tech except maybe Google. For example, in a few years, I think AI agents could completely replace Microsoft Office, Microsoft's cash cow.
Independent reports state that Claude's metered models are priced 5x higher than what subscribers pay.
Already dispelled. It isn't 5x more expensive than their subscribers pay. Inference has a gross margin of 50%+. It's been repeated over and over again by Anthropic CEO, OpenAI CEO, and just about anyone who's done deep analysis on token profitability. If you don't believe OpenAI and Anthropic CEOs, just look at inference providers on Openrouter. They don't have VCs backing them selling tokens at a loss. They should be making margins on every token in order to keep the lights on.
> I think OpenAI is going to be bigger than Microsoft in market cap within the next 3 years.
I have yet to see how a one-legged business model with just a single product (that is not crude oil), without a plan and without money, is going to become sustainable. Oh yeah, maybe they'll finally make money on those autonomous lethal weapons. That sounds the easiest.
Sure. I'll give you a basic plan without any insider knowledge on OpenAI.
First, OpenAI and Anthropic are the leaders in model capabilities. Google is a close 3rd but 3rd nonetheless.
Second, ChatGPT likely has about 1 billion active users right now. I think ads on ChatGPT will surpass even Google search ads in the future. There will be a class of users who will never pay for ChatGPT subscriptions and that's ok. Meta and Google are two of the most profitable companies in history who almost rely solely on free users for their cash cows. "Ask ChatGPT" is already "google it" for the masses.
Third, there is so much untapped revenue potential in the science and medicine fields that OpenAI can eventually own with Anthropic. Microsoft stands no chance here since they can't build competing models.
Fourth, I can easily see ChatGPT morphing into agents for consumers and people will pay for them.
Just some basic ideas based on public knowledge. I'm sure there are plenty more.
I'm not going to bet my house that OpenAI will become bigger than Microsoft in 3 years, but I'll put down a few hundred dollars on this bet.
I would be very sad to lose services like ChatGPT. It has significantly improved my workflow by digesting and analyzing huge documents, and helping me to synthesize and respond better. Maybe I am part of a minority.
If somehow recovering the capex is not counted, and if somehow the cost of developing future models is not counted, then yes, inference costs of current leading models allow a profit.
But those things are tied together.
Even xAI, that now has a reasonably competitive model, is struggling to achieve PMF. Meta is in shambles because their models have underperformed for years now.
The problem with these kinds of posts is that the "how" is almost useless. I can tell you how the bubble pops: the value of these AI companies crashes and takes out a lot of other stuff with it.
The interesting questions are: "What triggers it" and "what also goes tits up"?
The issue with high/international finance is that a good percentage of it (if not more) is fraudulent or semi fraudulent bollocks.
"Here is a startup that is worth x million because y" Both of those statements are bollocks. However its in the interest of most people to agree with that bollocks to get money. If enough money is given there is a chance that the startup will make money.
If we look a few year back, NFTs fulfil that niche quite nicely. It was obviously bollocks, but a very convenient way to launder money, or run a series of rugpull operations.
The problem we have to contend with now is that the sheer amount money that has been invested all disappearing at once would require 2007/8 levels of coordination to unfuck. The US government does not have the requisite number of admins to pull that off again, and no political will to ever have that expertise again. So if AI does go pop, and it takes a lot of money with it, I would put a guess on china doing the money lubrication and extracting a subtle but richly ironic level of control in exchange
Also, its no guarantee that AI will trigger the next bubble popping, my money is on Private Equity.
> The problem with these kinds of posts is that the "how" is almost useless. I can tell you how the bubble pops: the value of these AI companies crashes and takes out a lot of other stuff with it.
That's like saying "I know exactly how you're going to die, your heart will stop"
Okay, let's suppose all those companies would be profitable if training stopped today. What if token demand is shrinking? I think a big part of the current demand is artificially built by e.g. FOMO and marketing, without real value generated by it. There is no indication in economic data of any productivity boom resulting from AI usage. The next thing is energy costs - those will soon eat into profitability too. I don't see how this bubble can't burst.
> OpenAI is struggling to monetize. They turned to showing ads in ChatGPT, something Sam Altman once called a “last resort”, while Anthropic is crushing them with the more profitable corporate customers and software engineers. Their shopping feature flopped and they shut down Sora, both supposed to be revenue drivers.
I don't think Sora was ever thought of as a "revenue driver" considering how notoriously expensive and unpredictable video generation via inference is. OpenAI is just a repeat of Uber—minus the scandals—in a different decade. Uber got itself into tons of businesses related to transportation on the assumption that it would all be viable "one day." Same stuff that OpenAI is doing.
I would say, once the bubble bursts—which is likely, considering the geopolitical environment—OpenAI, Anthropic, and Alphabet are likely to be the winners, with a lot of small players at the tail end. Anthropic won over programmers and OpenAI on everyone else. For millions of people, AI = ChatGPT, so I would bet that OpenAI can still become profitable, once they cut down their expenses.
I don't see this bubble really popping as-in sinking the economy. Some circular investing and enough write offs will happen to avoid the largest recession indicators from informing the general population that there's actually a recession. You also have a government willing to do shady shit for their own benefit at the expense of responsible governing and ethics, and we have already seen the business leaders of the biggest tech companies cozy up to the administration.
My guess is that cloud companies will scoop up the data centers for pennies on the dollar and the GPUs get written off or fire-sold to enthusiasts still wanting to run local models. Then they can offer exceptionally low initial prices to new customers and get more people to be locked in. Or maybe we see a couple of new cloud companies start up but that would likely need lower interest rates.
DC infra will be scooped up by cloud guys, that's a given. As for GPUs.. well low-precision tflops have other uses besides inference. You can run Doom for example.
I think ultimately the AI bubble is bound to burst based solely on the fact that no AI company has turned a profit. A business model consisting of pure speculation on profitability, when profit has not come in for 4 years now, indicates that the tech industry is over-betting on AI. That, plus consumer backlash at the way AI is jacking up consumer prices on RAM and other components, means the bubble is bound to burst. To paraphrase Linus Torvalds, AI is a helpful tool, but I look forward to the day it's a regular part of life and the hype cycle ends.
I could see OpenAI hitting financial issues which triggers some media induced panic and for people to claim the AI bubble has popped.
However, the core utility of the best AI (read: Anthropic's ATM, by miles), will still exist and be leveraged by those who have learned to use it well.
I could also see the exponentially declining power requirements offsetting the exponential-but-slower rate of AI compute demand, which then renders a lot of unused capacity in these massive data centers.
I think of it like the old mainframes in the 70s which would take an entire city block to run, and now we have the equivalent of millions, if not billions of them in our pockets.
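To make the "declining power requirements vs. growing demand" point concrete, here is a toy sketch; both rates are made-up assumptions for illustration, not forecasts, and the conclusion flips if you swap them.

    # Toy model: compute-per-watt improves faster than token demand grows,
    # so a fixed power budget eventually yields surplus capacity.
    # Both rates are illustrative assumptions, not forecasts.
    efficiency_growth_per_year = 1.40   # compute per watt, +40%/yr (assumed)
    demand_growth_per_year = 1.25       # tokens demanded, +25%/yr (assumed)

    capacity = demand = 1.0             # normalized to today's levels
    for year in range(1, 11):
        capacity *= efficiency_growth_per_year   # same power budget, more compute
        demand *= demand_growth_per_year
        print(f"year {year:2d}: capacity/demand = {capacity / demand:.2f}")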
Anthropic isn’t the best by any reasonable measure. They’re the best in some areas and get pwned in others.
In general AI is very much like human intelligence in the regard that no two models are the same just like no two people are the same. IOW if you are a single model shop you might even not have any idea that you’re falling behind.
A lot of anthropic's recent improvements are coming from the task focus and improved orchestration around the models, not purely massive changes in the models themselves.
This bodes well for us being at a point that even if the bubble burst, we'd still have usable AI going forward.
The coming months are the reckoning in which the poor quality of the tooling and the safeguards around them become evident and hopefully eventually rectified.
By which I mean the competent organizations are the ones that will come up with cultural and technical solutions to manage the quantity and quality of the code better.
Others will suffer severe quality issues. Not because the "AI"s produce inherently inferior code but because the volume of the code is too high to manage review of, and to have good internal organizational knowledge of to manage the pages in the middle of the night when servers go down because of code nobody really understood.
I produce masses of independent project work all day long in my spare time using these tools and they blow me away. But in the context of professional work on teams of other coworkers the results are difficult to reason about and often impossible to competently review and it's not clear the results are superior.
IMHO companies that drink too deep from the well without caution could be burned badly.
Aside:
I hate to say it, but there is no sense in which Anthropic has the clearly better product than OpenAI at this point. I know Claude caught developers' hearts through the fall, but GPT5.4 is a more powerful, careful, and competent model for coding and Codex is a far less buggy and more performant TUI. For the last 3 months I've gone back and forth between the two, and I always run anything written by Claude Opus 4.6 by myself and my coworkers through Codex for review, and it is constantly finding severe correctness issues, to the point where I simply won't subscribe to Anthropic's product anymore.
On top of that, OpenAI provides far higher token limits. Even their $20 plan goes quite far.
If I was just building CRUD websites, Claude Code would probably be fine, and it does indeed show more "initiative" and "imagination", but I've seen it build way too many race conditions and correctness issues to trust it or the work my coworkers produce with it.
Consumers and retail investors will bear most of the brunt from this bubble. Even taxpayers, as the government will most likely bail out the "too big to fail" ai companies in the "race against China". All based on bullshit, hype, and greed.
Excellent reading to realize how the rich greedy investment monkeys with no plan other than "let's build a data center" will ultimately drag the market and the economy down. This time it may not explode as abruptly as in dotcom era, but will slowly sink as the stupid US data center boom proves unprofitable. Billions burned for nothing more than a run for the money.
Just checked and my API bill for this stuff is about $2.50 this month. Am I really in the minority here? I know there are a lot of kids into the openclaw and paying for subscriptions and stuff, but beyond that literally no one I know (who isn't a developer) is paying for it, and seemingly would never dream of paying for it. It would be like paying for Gmail to them, I think.
I just don't understand why it justifies so much spending!
Nope, nothing will either directly or indirectly affect me. Let it happen sooner rather than later, and unleash the mobs at the tech bros that set the world on course to make everybody's life more miserable. We'll still be here to get the scrapped RAM and GPUs to train and run local models, thank you very much.
The current best models are already very capable of disrupting the jobs of millions of people. I don't think a scenario where we just go back to pre-Claude Code exists, and I'm sure the same models can be tuned for much other white-collar work at similar capability.
Might be: there is too much busy work as it is, but we need people to work in order to make money in order to spend it in order to keep the circus from going under. It's the circle of life.
Let me remind you that you are not paying the full price for the service, and all the value of those companies is out of thin air. More or less the premise of the article. *When* you are asked the real price, we'll see if a company will prefer a human or a bot it can't pass blame to.
People keep saying this but nothing of the sort has happened.
People continue to work, and some proportion of those working use LLMs regularly.
Enough time has passed that subjective statements about the future don't pass muster. Look at the numbers - there have been no large-scale layoffs, after correcting for over-hiring. Has hiring slowed down? Sure. However, I'd wager most firms are finding it pretty difficult to think of projects to take on that will generate positive NPV. If that's the case, why would they hire? Moreover, the focus has returned to cash flows - not product-based growth metrics. Which again reinforces the point about project selection.
Efficiency generated growth does not continue on forever - it’s short lived.
.... so what? the technology exists, the models exist. Even when the bubble bursts things will not go to the state "before AI". Even if model development would stop today (not the worst thing to happen) it would still be the most impactful invention since the printing press
Another possibility not really addressed here --- local LLMs.
AI on hardware you own and control --- instead of a metered service provider. In other words, a repeat of the "personal computing" revolution but this time focused on AI.
Yeah, I don't think local LLM's will keep up with what the massive corporations put out. But they might get to a level of performance where it just doesn't matter for most users.
And people would prefer to run a model locally for 'free' (not counting the energy cost) rather than paying for an LLM subscription.
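"Free" still has a marginal electricity cost, which is easy to ballpark. A minimal sketch, with the wattage, daily usage, and electricity price all assumed for illustration:

    # Ballpark the electricity cost of running a local model on a home GPU.
    # All numbers are assumptions for illustration; substitute your own.
    gpu_power_watts = 350     # assumed draw under load for one consumer GPU
    hours_per_day = 2         # assumed active inference time
    price_per_kwh = 0.30      # assumed electricity price in dollars

    monthly_kwh = gpu_power_watts / 1000 * hours_per_day * 30
    monthly_cost = monthly_kwh * price_per_kwh
    print(f"~{monthly_kwh:.0f} kWh/month, about ${monthly_cost:.2f}")  # vs. a $20+ subscription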
Local LLMs don't sound profitable at all for those building them. If you really wanted a SOTA model, you would be paying eye watering amounts to own it unless you got an open sourced one.
Aren't you conflating the technical side of it with the economic one?
A bubble doesn't necessarily mean that the underlying tech/innovation isn't useful. It's a financial and economic phenomenon that is pretty well understood and researched:
- During the hype cycle, investors tend to overestimate the short to mid term effects and underestimate the long term effects.
- It's near impossible to pick the winners in advance, and research has shown that investors underestimate how many losers there will be.
- The financial system/market works very well when there are localized issues with debt. Those get seemingly automatically detected and repaired. But broad increases in credit not so much. Those spread into the whole system in non-obvious and complex ways and destabilize the whole system, which can lead to very large corrections.
Thanks for clearing this up, as I don't work in that area.
Personally I'd say that it's a problem that prices of consumer goods go up that far to satisfy this part of the market. We could use a more sensible way to advance the technology.
- AI is a genuinely transformative technology on par with the internet and on track to probably surpass the smartphone
- The inflated valuations, the circular flows of money (or "money"), and the financial shell game mean that the players of the game are all a few bad weeks away from catastrophe. This is, of course, nothing new for SV -- but the scale this time is new. Some believe it will soon collapse -- "bubble," thus.
The question is when will the frontier AI companies turn a profit on said transformative technology since other than NVIDIA and big tech, it is losing them tens of billions and who will survive a crash when it comes.
You know you are in a bubble when people with a clear financial incentive go on newsletters and podcasts and post extremely outlandish predictions to sell the public on something.
The amount of engineers becoming snake-oil salesmen and vibe-coders becoming cybersecurity experts overnight selling AI courses is a good indicator which I am looking at.
Reality begs to differ [0] and following the link for that text goes to an article [1] where they talk about Google's TurboQuant which supposedly will lower the RAM requirements. Now if that means RAM prices come down (as speculated, not reported on, in the link) or the AI companies just do more things with their extra ram is yet to be determined. The fact this article links there with text "RAM prices are crashing" throws the entire rest of the article into doubt for me.
RAM prices are most certainly not crashing (yet) and treating it as a forgone conclusion because _one_ lab found gains could be made and hasn't even reported on the efficiency of their method is just irresponsible. It's almost as bad as when LLMs link things to prove their point, you visit the link, and find it says nothing of the sort or even the opposite.
[0] https://pcpartpicker.com/trends/price/memory/
[1] https://tech.sportskeeda.com/gaming-news/how-google-s-new-tu...
I think it is determined:
https://en.wikipedia.org/wiki/Jevons_paradox
The fact that public LLM usage is leveling off at a price of $0 and Jensen "we make the shovels in this gold rush" Huang is rather desperately claiming that you need to spend $250k/year in tokens to be taken seriously suggests that demand saturation may not be that far off.
Whether Jevons' Paradox applies to software engineers I think is another open question. Im constantly being told that it doesnt and that LLMs make half of us redundant now, but Im skeptical - so much automation I see is broken or badly done.
This is the first or second inning in the LLM rollout. It'll take 15-20 more years for full integration of AI agents into the life of the typical person.
The claw experiments for example can just barely be considered alpha stage. They're early AI garbage unfit for the average person to utilize safely. That new world hasn't gotten near the typical person yet.
The compute requirements to get to full integration of AI agents into the life of the average person - billions of them - is far beyond 10x where we're at now.
The recent blog post from Google announcing TurboQuant does not change anything regarding RAM planning for the big labs.
TurboQuant itself is already a year old! So even smaller labs have probably seen and implemented it.
Cars come to mind instantly. Prices exploded in 2020/1, due to legitimate shortages, most of which have been plus or minus resolved, but the prices for new (and used!) cars never came back down.
My personal prediction is that once the VC bill comes due and prices for frontier models starts to climb, competition for efficiency will heat up. The main AI use-cases seem to be falling into buckets, and I doubt serving gigantic, do-it-all general models for every use-case under the sun is remotely cost-effective.
If common use-cases start to be more efficiently served by smaller, more efficient purpose-built models (or systems thereof), it'd make the big frontier models increasingly niche. Cursor's Composer 2 model is a great example of this.
In any case, I think it's pretty fair to speculate we may be seeing RAM prices start falling sooner rather than later.
> In any case, I think it's pretty fair to speculate we may be seeing RAM prices start falling sooner rather than later.
I sure hope so. RAM, HDDs, and SSDs are all crazy-high right now and I was in the market for literally all 3 but have paused all my buying because I can't justify the costs as they stand today.
Stock price is the best forward indicator I can think of
To be fair, they got it from us. This happened to me plenty of times long before modern LLMs.
I haven't looked closely into TurboQuant, but perhaps it will revolutionize just as much as the 1-bit llm did...
Jevons Paradox. When are we going to learn that efficiency gains in AI do not decrease hardware usage?
Given Nvidia's CEO's agitation, I would give credit to the prediction, and if it's correct the price will go back to what it was, or even lower if investments in capacity are made today.
A RAM price drop due to some magic efficiencies assumes everything else doesn't change, which I doubt anyone honestly thinks will be the case.
Honestly you're both wrong. RAM prices spiked speculatively, and they're going down for the same reason. Market people always want to argue in fundamentals, when in practice *ALL* the high frequency components of the signal are down to a bunch of traders trying to guess where it's going in the short term.
At best those guesses are informed by ground truth ("AI needs a lot of RAM!" "Sam cornered the market!" "TurboQuant needs less RAM!"), but they remain guesses, and even then you can't tell the difference between that and random motion.
Didn't OpenAI buy up 40% of the capacity all at once?
Freshman economics would say that supply is fine and that prices shouldn't move. But they did anyway. And the reason is speculation.
Have we gotten any more word on the potential helium constraints that SK Hynix was making noise about after the strike on the helium plant in the Middle East that supplied 60% of S. Korea's helium? Because that could definitely put a kink in things, since SKH is one of the 3 remaining big DRAM producers.
The cost to serve tokens is absolutely profitable today and that’s been true for at least a year. What’s unclear is how R&D and capex fit into the picture. I am not that pessimistic on this front either, though. For the data center build outs, demand for tokens is still exceeding supply. On the R&D front, well, most of us here on HN have benefited from decades of overinflated engineering salaries, often paid by companies that were not profitable, and not only unprofitable but usually without a plan for success. In this current rush, companies cannot keep up with demand; it’s a much easier math problem when you have something that people want (tokens) and you need to figure out profitability when including R&D.
And unlike the traditional "this will replace humans right away", I think what this introduces is a lot of incentive to spread those tokens into places where there was never any incentive to hire a software engineer previously. In turn, that will drive a lot of business activity in areas that will potentially fail given the current quality of the output.
This feels like a boom before bust scenario, and I'm not even sure if it will bust.
Seriously, what value are tokens providing other than justifying layoffs? Concretely. Today. Not in the speculative scenario where cardiologists could be replaced with models.
We see this new trend of agentic coding, again a promise that software will be written that way going forward, despite the number of fiascos already experienced when trusting a model turned bad. The use case may provide value, but right now all it does is fulfill the push for token consumption that all these AI leaders are advocating for.
Agentic coding absolutely blew up from demand, users are not being tricked into paying $200 a month, and they’re not complaining about hitting rate limits because it’s useless.
All that said the dotcom boom is extremely analogous and that crash was quite bad.
It's adding tests for me and doing medium complexity refactors that I'd otherwise have to spend hours on
And based on reality (code) rather than my feelz of what I vaguely remember the code to have been doing in some long past.
there is an even larger force on HN that financially _needs_ the value of tokens to be inflated (so much so that bots have overwhelmed the site)
Like the OP said, it's incredible how polarizing this debate is. When I read comments like yours, I feel like a significant part of the global workforce in IT must be living on another planet? Or they never really used Claude Code, Codex, OpenCode, ... intensively before because of company policies?
I legitimately am at least 10x more productive than a year ago, and I can prove it in number of commits and finished monetizable features developed per day. Obviously my workflows still very much require an active, constantly context-switching human-in-the-loop, but to me there's absolutely no question both output volume & quality have skyrocketed.
That claim is totally worthless without you providing concrete information on how you measured it.
The question is how big the fail is if you measure it in 3 month increments going back to late 2022.
As long as the successes outweigh the failures, it should be net positive.
> For the data center build outs, demand for tokens is still exceeding supply.
Can you provide any numbers for this?
Now we don't know the true size of any of the proprietary models, but my educated guess is that Sonnet is in about the same parameter range, just with better training and much better fine tuning and RLHF. Yet API pricing for Sonnet is $3/MTok input + $15/MTok output, exactly six times as expensive. Even Haiku is twice as expensive as Kimi K2.5.
I find it difficult to believe in a world where those API prices aren't profitable. For subscription pricing it's harder to tell. We hear about those that get insane value out of their subscription, but there has to be a large mass who never reaches their limits. With company-wide rollouts there might even be a lot of subscription users who consume virtually no tokens at all.
This is false. We may assume it's the most efficient way of generating revenue given their GPUs, but their overall profitability will just be a guess. They would still have incentives to run hardware at maximum, even when it's uncertain they will eventually recoup costs.
> a world where those API prices aren't profitable
A lab with employees and models in training has other costs than the operating expenses of a GPU farm.
Are you sure? Surely there is a lot of interesting data in those LLM interactions.
But that's moving the goalposts? The original claim was on inference itself, not the whole company.
> The cost to serve tokens is absolutely profitable today and that’s been true for at least a year.
That gives you a very good estimate of how cheaply you can serve the tokens of a model of size N while still making a profit.
Now, keep in mind: Kimi K2.5 is 1T MoE. Today's frontier LLMs are in the 1T to 5T range, also MoE. Make an estimate. Compare that estimate with the actual frontier lab prices.
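Here is a minimal version of that estimate. Every input is an assumption for illustration (rental price, throughput, utilization), so treat it as an order-of-magnitude check rather than a real cost model:

    # Order-of-magnitude estimate: cost to serve tokens from a rented GPU node,
    # compared against a frontier lab's list price. All inputs are assumptions.
    node_cost_per_hour = 20.0      # dollars/hr, assumed rental for an 8-GPU node
    tokens_per_second = 2_000      # assumed aggregate throughput for a large MoE model
    utilization = 0.5              # assumed fraction of the hour doing useful work

    tokens_per_hour = tokens_per_second * 3600 * utilization
    serving_cost_per_mtok = node_cost_per_hour / (tokens_per_hour / 1_000_000)

    list_price_per_mtok = 15.0     # dollars/MTok output, e.g. the Sonnet output price above
    print(f"Estimated serving cost: ${serving_cost_per_mtok:.2f}/MTok")
    print(f"List price:             ${list_price_per_mtok:.2f}/MTok")
    print(f"Implied markup:         {list_price_per_mtok / serving_cost_per_mtok:.1f}x")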
For supply, look at outages and growth rates at companies like OpenRouter. The demand is growing every week.
This is why switching to local open weight models saves a lot of money. (Even though it’s not apples to apples.)
300k tokens for that hour.
OpenAI charges $6.
Those are pessimistic assumptions.
[1] https://lambda.ai/instances
It’s insane
'Overinflated' relative to what? You make some good points but I don't accept this as a premise.
Median senior SWE salaries in SF: https://www.levels.fyi/t/software-engineer/levels/senior/loc...
Median income in metro areas: https://www.cnbc.com/2024/07/11/the-median-salary-for-the-25...
Engineering salaries are significantly higher than nearly every other industry on average and on median. Much of this is driven by VC funding rather than sound, profitable, bootstrapped businesses with sustainable profit margins.
Engineering salaries have also been driven upwards significantly the past ~10 years (since the post-2008 crash recovery), while wage growth in the US is mostly stagnant. I don’t have a source handy for that, but there are plentiful studies.
Outside of the US this may be less true, but I took GP’s “most of us on HN” to mean people who work in US tech companies, which are primarily concentrated in high-COL areas.
now compare the profit per employee at tech (software engineering) companies and those industries..
Currently on a given day I'm chewing through approximately the equivalent of my lunch money, but where there's opportunity to extract wealth, someone will find a way to do it.
The wealth of great open models provide an excellent base for fine-tuning, distillation, and RL. I see a lot of untapped potential in the field of bespoke, purpose-built models that can be served far more cheaply than the frontier competition. I would not be surprised if we see frontier-adjacent experiences running comfortably on a Mac Mini by year end.
With frontier models seemingly hitting diminishing returns in quality, I struggle to see a world in which gigantic, expensive, general-purpose models don't become increasingly niche.
But there is no real upper limit. Imagine an LLM that could answer the question "what does my company need to do to beat the competition?". And then realize that the competition asks their LLM the same question. So now everybody is bidding the price up or using more tokens to get a better answer.
How can you possibly say that? Everyone knows that's not the case, these companies are losing money every day selling tokens. Revenue is not the same thing as profit.
This is why they were freaking out about DeepSeek just taking the trained model weights and slapping an interface on it.
Of course they are profitable if you ignore their cost to bring a product to market.
Yeah it should be factored in, but it’s a different set of implications for long term sustainability. They don’t actually need to test and optimize a new menu every day or week. If they decide to just stick to the same one longer they can get way more return from each dollar spent on development. It’s just that right now the rate of improvement you get with training is really high and nobody can afford to fall behind their competition.
Can you explain why you know better than the analyst at Cursor cited in this article?
I feel like giving a Richard Nixon lecture now.
> well, most of us here on HN have benefited from decades of overinflated engineering salaries, often paid by companies that were not profitable, and not only unprofitable
This is a really concerning perspective: people were paid what they were worth. Software is or was one of the few remaining arenas wherein a person can find a middle or upper middle class lifestyle consistently.
I will also note: a startup raising an 8 MM series A and eventually fizzling out is not the same as the hundreds of billions invested in these AI companies without a path to profitability. It is utterly absurd to pretend these are the same thing: any company ingesting that much cash needs to justify its capacity to survive.
Software salary inflation and expansion has made this the case. Tech’s accessibility to the educated has accelerated gentrification massively, driving up prices for rent and food. While the statement is correct, tech’s contribution to income inequality is part of the issue. If you’ve lived in Austin or Chicago (especially Austin) prior to ~2010 you’ll have seen this first hand.
The salary jab was probably a little harsh.
Your ending is a bit of a fizzle too. There are many capex intense businesses that do just fine.
I want to add something additional to this: it is one of the few fields that can afford a middle or upper-middle-class lifestyle and is accessible.
I have no doubt that if I could redo my life with the necessary resources, I'd be more than capable of putting myself through med school and ending up with a secure career that paid more than I ever made in software.
But at this stage of life? I don’t have the time or money to spend a decade+ paying some institution tens of thousands of dollars to hopefully maybe have a real career.
Once software as a career dies, I suspect many will find themselves locked out of the middle class for generations.
On the other hand, once software as a high paying career dies there will be nothing to prop up the status quo (high cost of housing, for example) so the middle class will return to being much more accessible to modestly paid jobs.
What, why? There are tons of low-margin capex-intensive businesses out there.
I think AI will end up like being like hosting. All the models will converge to being pretty-decent and the companies will have to compete on efficiency since they are selling a generic commodity.
You can already see Anthropic fears this scenario since they try so hard to make people use their first-party tools rather than plugging Claude in as a generic part of a third-party stack.
LLM hosting is the next VPS.
Even interpreting what-they-were-worth in the usual sense, I’m not so sure about this. We have seen wage collusion reported at the usual US West Coast-based companies. And some news on here[1] has reported that some engineer with a salary of $100K[2] might be producing $1M of value. Even factoring in the usual “but benefits and overhead”, that comes out to a solid factor of profit per programmer/engineer.
Despite that the sense I get (only from this site since that is my only reference) is that the so-called overpaid engineers are incredibly content to just have this happen to them. As long as they are paid well compared to other workers, it’s fine. No matter the profit factor. In fact, the discourse is very much focused on how “privileged” they were if the tide ever changes. Instead of realizing how much value they provided, collectively.
The outlet for capturing more of the value they create is entrepreneurship (hello, HN). Never any collective organizing. And entrepreneurship is easily bought via acquisition.
Collective bargaining would have been relevant in case they ever get automated... by the very software they co-created.
One could imagine that this “privileged” collection of programmers could have served as a vanguard for the collective good of programming professionals as well as collective ownership of software goods, using their privilege to that end. The former never happened, and the latter is partly realized in people’s free time (see the OSS maintainer in Nebraska meme).[3]
[1] All from recollection since this is just news from the Frontier to me
[2] Of course the pay might be much higher now; this might have been a while ago
[3] when it isn’t simply exploited by corporations just using OSS without giving any back; a logical turn of events when no license or law forces them to contribute back
Well I’m sure they’ll be thrilled to know they can collect $100 a week more in unemployment benefits than their neighbor.
The parent comment doesn't discount that, only pointing out that "what they were worth" was inflated due to a speculative environment. Wherein lies your concern?
Salaries across industries in the US have remained flat since the 1970s. Calling the one sector that can provide access to a middle-class lifestyle inflated is to play into a narrative capital is eager to tell, even if OP didn't intend that.
What do you mean? The real (meaning adjusted for inflation) hourly wage in the US has increased by around 20% since 1970.
What has changed since the 1970s is that wages are no longer coupled to productivity. Perhaps that is what you are thinking of? But that should be an obvious truism for anyone in tech. We create the very things that cause that to be the case!
“Inflated due to a speculative environment” is not an accurate way to frame labor prices that held for many years. At that point, the prices were simply high due to high demand relative to supply (compared to other types of labor).
That goes without saying. The investigation here is into demand. Which was said to be overinflated due to speculation. As noted, many of the companies hiring the developers did not have viable businesses.
In accounting, almost anything you want can be true, at least for some time.
This is most likely wrong. Lab executives insist that serving tokens is profitable. It's the cost of training next-gen models that requires them to keep raising ever larger rounds. More importantly, many independent providers price tokens of open-weight models at a fraction of Anthropic's prices.
OpenAI's numbers show that they definitely are not profitable on inference, and even worse, revenue growth scaled linearly with inference cost from 2024 to 2025, which means they can't outgrow this problem. See https://www.wheresyoured.at/oai_docs/
Not hard to believe they're lying about other things when they've been lying about the capability of their products since inception.
[1] https://www.reuters.com/commentary/breakingviews/anthropic-g...
I don’t necessarily see a contradiction. $19B run rate, achieved very recently, is actually consistent with $5B lifetime earnings, because their growth curve is so sharp. Zitron is not good at math.
Did Enron start a business school I'm unaware of, or something?
I'd be surprised if they're making money on inference just from that. There's no way someone paying $20 p/m and using it all day isn't costing way more than that in electricity for tokens alone, let alone the capex.
Key points - if you compare it to openrouter costs for ~similar sized models it is ~90% gross margin.
And this claim came from Cursor - not Anthropic!
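For reference, the margin claim is just a ratio once you pick a cost proxy. A minimal sketch, where the "cost" is an assumed OpenRouter-style price for a similar-sized open-weight model; both numbers are illustrative assumptions, not actual figures:

    # Implied gross margin: lab list price vs. an assumed open-weight provider price
    # for a roughly similar-sized model, used as a crude proxy for serving cost.
    lab_price_per_mtok = 15.0            # dollars/MTok, assumed frontier output price
    open_provider_price_per_mtok = 1.50  # dollars/MTok, assumed comparable open model

    implied_margin = 1 - open_provider_price_per_mtok / lab_price_per_mtok
    print(f"Implied gross margin: {implied_margin:.0%}")  # ~90% under these assumptions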
Maybe marginally profitable, but right now they need to give out subsidies for people to use their products (Antigravity, Codex, Claude Code et al) in an actually useful manner that prevents churn and at the scale they need to justify usage growth forecasts, which they need to keep the wheel turning.
Probably if you look at the users who exclusively use the simple chat box interfaces (i.e. ChatGPT, Gemini in UI, Claude in UI) plans it is actually profitable, but I'd also say that's not where most of the usage comes from.
I'd love to actually look at both usage + profitability from each user segment to see if their PxQ growth expectations from non-enterprise usage make any sense.
> Many independent providers price tokens of open-weight models at a fraction of Anthropic's prices.
Are those open-weight models as good as Anthropic? Are they the same parameter class?
The question is more around the moats that these companies have, and it seems to me that while their models are amazing technology, they don't really have a moat. The open/Chinese models continuously catch up to the American ones.
Another scenario is that dense models get replaced entirely, in which case the likelihood of OpenAI and co pioneering the concept is pretty slim. They will be left with billions worth of infrastructure which cost them 10 times that 2 years earlier, faced with the reality touched on by the article: liquidate.
They would have a period of great margin, followed by possibly zero margin as enterprises move to free options.
They would have to come up with a lot of great products around the inferior models to justify charging at that point.
I suspect that once the models hit a point of “good enough” for certain use cases companies will start putting R&D focus in other areas that may be less expensive. Like figuring out how to run more efficiently, UI/UX conventions that help users get what they’re trying to accomplish in fewer steps, various kinds of caching of requests, etc. So the cost to serve tokens over time should only come down, and will probably start coming down more rapidly as the returns to model training slow down.
That’ll probably be a while though, because each successive model tends to be a lot better than the last.
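On the "caching of requests" point, the simplest version is an exact-match cache keyed on the full prompt; anything smarter (prefix or semantic caching) builds on the same idea. A minimal sketch, where call_model is a stand-in for whatever inference call is actually used:

    import hashlib

    # Exact-match response cache: identical prompts skip the model entirely.
    _cache: dict[str, str] = {}

    def call_model(prompt: str) -> str:
        return f"<model output for: {prompt}>"   # placeholder for a real inference call

    def cached_completion(prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in _cache:
            _cache[key] = call_model(prompt)     # only pay for the first occurrence
        return _cache[key]

    print(cached_completion("What is our refund policy?"))
    print(cached_completion("What is our refund policy?"))  # served from cache, no model call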
Even so, their subscriptions are significantly cheaper than the token pricing via API. So at some point they will need to get rid of subscriptions or increase the subscription prices dramatically... And that's assuming their current token pricing is actually profitable. Which it probably isn't.
Lastly, I would not trust one word that comes out of an executive of an AI company (or any other large company, for that matter).
I'm not saying they're wrong, but I don't put much stock in their words.
> OpenAI is struggling to monetize. They turned to showing ads in ChatGPT,
The ads aren’t going into your paid plans (except maybe a highly discounted tier, depending on the market). The ads are a play to offer a free version. Having an ad-supported free tier isn’t new.
The discussion about being unprofitable also repeats the reductionist view that these companies are losing money and therefore the business model doesn’t work. This happens with every VC cycle where writers don’t understand that funded companies are supposed to lose money while they grow. That’s what the investment money is for.
We have very strong indicators that inference is not a money loser for these companies and is likely very profitable. They should be spending large amounts of money on R&D to get ahead and try new things while they’re serving up tokens.
The “but they’re losing money” argument never seems to be brought out against competitors that literally give away their models for free and for which we can calculate the cost of serving 400B-1T parameter open weight models.
Sounds like it is new for ChatGPT though. That's also how it started with TV and Youtube, first on the free tier then expanding to the paid ones.
Why is OpenAI specifically losing money hand over fist then?
However, it seems to make a lot of sense. Anthropic literally added $6b ARR in February 2026 alone. I doubt training costs go up that fast.
This statement doesn't discount the original statement: that ads are going into GPT, which Sam called a last resort.
> The discussion about being unprofitable also repeats the reductionist view that these companies are losing money and therefore the business model doesn’t work. This happens with every VC cycle where writers don’t understand that funded companies are supposed to lose money while they grow. That’s what the investment money is for.
Usually propped-up companies don't last in the long term once the VC subsidy runs out. There's a difference between getting VC money in order to buy rocket parts, and getting VC money in order to charge $7 when you would really need to charge $10. The latter problem never goes away.
Yeah, I was wondering why my bullshit detector was going off. This feels as if someone who cooks for Ramsey's kitchen is trying to predict the end of the market hike.
To be fair people aren't exactly bullish on the prospects of deepseek or z.ai either, it's just they're below radar so they don't get mentioned.
At what point do we declare that a company has "grown" and now must make money? OpenAI is a multi-billion dollar company right now, surely that's a point at which they should be profitable, instead of propped up by further investment and borrowing.
> We have very strong indicators that inference is not a money loser for these companies
All of the economic analysis that I've read strongly states the opposite. Running a GPU is a net loss /even for the data centre operators/. For them to break even, they currently charge OpenAI/Anthropic/Etc more than OpenAI/Anthropic/Etc make per-token.
The strategy is always:
* Build something useful
* Give it away for free to get people excited
* Convince investors that this is going to rule the world
* Grow to dominate the world
* Enshittify
It'd be interesting to see what they spend all the money on though as we seem to be hitting diminishing returns and I'm not sure if the typical enterprise user really cares about small improvements on benchmarks.
It seems like it'd probably be better to spend all that on marketing, free trials, exclusivity/bundle deals etc. ChatGPT already has a strong advantage there as it has so much brand recognition. I've seen lay people refer to all LLMs as ChatGPT, like my grandparents did with Nintendo and all video game consoles.
Even if ChatGPT has brand recognition amongst lay people, your grandparents aren’t the ones shelling out $200/mo for a Claude code subscription and paying for extra Opus tokens on top of that. Anthropic’s revenue is now neck and neck with OpenAI, but if tomorrow they increased the price of Opus by 5x without increasing its capabilities, many would switch to Gemini, GPT 5.4, Cursor, or any cheap Chinese model. In fact I know many engineers that have multiple subscriptions active and switch when they hit the rate limits of one, precisely the tools are so interchangeable.
At some point it could even become cheaper to just buy 8x H100s and host Qwen/Deepseek/Kimi/etc yourself if you’re one of those companies paying $3k/mo per engineer in tokens.
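Whether that ever pencils out is a quick break-even calculation. A minimal sketch; the hardware price, ops cost, and headcount are assumptions for illustration, and only the $3k/mo figure comes from the comment above:

    # Break-even sketch: buying an 8x H100 node vs. paying per-engineer token bills.
    # Hardware price, ops cost, and headcount are assumptions for illustration.
    node_price = 250_000             # dollars, assumed price for an 8x H100 server
    monthly_ops = 3_000              # dollars/month, assumed power + hosting + upkeep
    token_bill_per_engineer = 3_000  # dollars/month, figure from the comment above
    engineers = 10                   # assumed headcount the node would have to serve

    monthly_savings = engineers * token_bill_per_engineer - monthly_ops
    breakeven_months = node_price / monthly_savings if monthly_savings > 0 else float("inf")
    print(f"Break-even after {breakeven_months:.1f} months, if one node can really serve {engineers} engineers")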
absolutely isn't! if billed per token, there is no reason to be married to a single model family provider at all. the models have very different strengths and weaknesses, you should be taking advantage of this at all times.
regardless, eventually Google became the universal default for both. When it comes to software, the average person doesn't shop around for the technologically optimal choice, they just use what everyone else is using.
there are a lot of reasons, but in brief - I think AI desktop use is a product that the average person isn't going to get much value out of. to make an analogy - the creators of Segway thought people would buy them in large numbers, but it turned out most people don't mind walking manually (or at least, don't mind it enough to spend money on a scooter). I think makers of AI Desktop Use products are going to find out the same thing as it relates to everyday tasks like checking email and shopping.
Stay competitive how? If the Magnificent 7 aren't spending the money, then how could it possibly hurt OpenAI/Anthropic to not raise equal amounts of money? Maybe you can pull together an explanation, but this author didn't even try to do so.
This piece seems poorly thought-out, but well designed to get shared.
Promote writers who will actually explain their claims carefully.
Except the investment is more like a railway or utility. It generates like 3% return, which is definitely not good enough for the people providing the money, or (in the case of the profitable companies) anywhere near the double-digit returns they make on their technology products. I won't be surprised when we see consolidation of marginal players and abandonment of the losers, just like you can find rail lines to nowhere, and fiber that's never been used.
Have you tried Gemini 3.1 lately? It is not even close to Opus 4.6 never mind Claude 5.
This post, like many pessimistic takes, seriously discounts innovation and the exponential takeoff of recursive self-improvement.
Currently a lot of that appears to be marketing hype to drive up usage. Is it exponential, or are the labs spending exponentially more for smaller and smaller gains from LLMs?
I do hope that RAM prices come down but this was just wishful thinking.
Look, I'm a Microsoft hater like the rest of us, but calling Microsoft's products sub-par discredits the author a good bit. I invite anyone who thinks this to try and compete with them. Go after something like Word, for example. Then prepare to be awed by what some of the most brilliant programming minds ever can produce after grinding for four decades.
MS Office should last a while if they stop calling it "Copilot 365 Office" or whatever it was.
Markdown has much less of that brilliance, and thankfully I also needed none of it.
The last time I authored a Word document was probably 2 years ago, for a government interaction.
If it does all go down in flames, even floor value is not going to be that valuable.
I can't predict the future but it's smelling a lot like a recession already under way that is bigger than the sub-prime crash.
This is high up there on the list of things people say right before, you know, it does happen.
> Anthropic is already in a push to reduce costs and increase revenue
Yeah, it's totally a bad sign when a company tries to... reduce costs and increase revenue.
Usually in a land grab like this you spend, spend, spend.
Uber was still paying to subsidize customers' rides until fairly recently to kill off the competition.
When AI companies look to cut cost: a sign of bubble bursting.
When RAM price goes up: a sign of bubble bursting.
When RAM price goes down: a sign of bubble bursting.
But those things are tied together.
Even xAI, that now has a reasonably competitive model, is struggling to achieve PMF. Meta is in shambles because their models have underperformed for years now.
The interesting questions are: "What triggers it" and "what also goes tits up"?
The issue with high/international finance is that a good percentage of it (if not more) is fraudulent or semi fraudulent bollocks.
"Here is a startup that is worth x million because y" Both of those statements are bollocks. However its in the interest of most people to agree with that bollocks to get money. If enough money is given there is a chance that the startup will make money.
If we look a few year back, NFTs fulfil that niche quite nicely. It was obviously bollocks, but a very convenient way to launder money, or run a series of rugpull operations.
The problem we have to contend with now is that the sheer amount money that has been invested all disappearing at once would require 2007/8 levels of coordination to unfuck. The US government does not have the requisite number of admins to pull that off again, and no political will to ever have that expertise again. So if AI does go pop, and it takes a lot of money with it, I would put a guess on china doing the money lubrication and extracting a subtle but richly ironic level of control in exchange
Also, its no guarantee that AI will trigger the next bubble popping, my money is on Private Equity.
That's like saying "I know exactly how you're going to die, your heart will stop"
I don't think Sora was ever thought of as a "revenue driver," considering how notoriously expensive and unpredictable video generation via inference is. OpenAI is just a repeat of Uber—minus the scandals—in a different decade. Uber got itself into tons of businesses related to transportation on the assumption that it would all be viable "one day." Same stuff that OpenAI is doing.
I would say that once the bubble bursts—which is likely, considering the geopolitical environment—OpenAI, Anthropic, and Alphabet are likely to be the winners, with a lot of small players at the tail end. Anthropic won over programmers, and OpenAI won over everyone else. For millions of people, AI = ChatGPT, so I would bet that OpenAI can still become profitable once it cuts down its expenses.
Given the tech bros involved, we just don't know about them yet. Also, was this comment generated using AI? Look at all the em dashes.
My guess is that cloud companies will scoop up the data centers for pennies on the dollar and the GPUs get written off or fire-sold to enthusiasts still wanting to run local models. Then they can offer exceptionally low initial prices to new customers and get more people to be locked in. Or maybe we see a couple of new cloud companies start up but that would likely need lower interest rates.
However, the core utility of the best AI (read: Anthropic's, at the moment, by miles) will still exist and be leveraged by those who have learned to use it well.
I could also see exponentially declining power requirements outpacing the exponential-but-slower growth in AI compute demand, which would leave a lot of capacity in these massive data centers unused.
I think of it like the old mainframes in the 70s which would take an entire city block to run, and now we have the equivalent of millions, if not billions of them in our pockets.
In general, AI is very much like human intelligence in that no two models are the same, just like no two people are the same. IOW, if you are a single-model shop, you might not even have any idea that you're falling behind.
I think this is a good comparison to current AI.
> billions of them in our pockets.
AI in your pocket (but first on the desktop) is a real possibility.
This bodes well for us being at a point that even if the bubble burst, we'd still have usable AI going forward.
By which I mean the competent organizations are the ones that will come up with cultural and technical solutions to manage the quantity and quality of the code better.
Others will suffer severe quality issues. Not because the "AI"s produce inherently inferior code, but because the volume of code is too high to review properly, and too high to maintain the internal organizational knowledge needed to handle the pages in the middle of the night when servers go down because of code nobody really understood.
I produce masses of independent project work all day long in my spare time using these tools, and they blow me away. But in the context of professional work on teams with other coworkers, the results are difficult to reason about, often impossible to competently review, and it's not clear the results are superior. IMHO companies that drink too deep from the well without caution could be burned badly.
Aside:
I hate to say it, but there is no sense in which Anthropic has the clearly better product than OpenAI at this point. I know Claude caught developers' hearts through the fall, but GPT5.4 is a more powerful, careful, and competent model for coding, and Codex is a far less buggy and more performant TUI. For the last 3 months I've gone back and forth between the two, and I always run anything my coworkers or I write with Claude Opus 4.6 through Codex for review; it constantly finds severe correctness issues, to the point where I simply won't subscribe to Anthropic's product anymore.
On top of that, OpenAI provides far higher token limits. Even their $20 plan goes quite far.
If I were just building CRUD websites, Claude Code would probably be fine, and it does indeed show more "initiative" and "imagination," but I've seen it build way too many race conditions and correctness issues to trust it or the work my coworkers produce with it.
About 2 months ago this place was unbearable - filled with doom and hype AI posts. I welcome the calming and eventual slow release of the bubble.
I just don't understand why it justifies so much spending!
Back to the mines. The Vulkan only writes itself when prompted with well-conditioned problem statements.
> checks list ...
nope, nothing will either directly or indirectly affect me. Let it happen sooner rather than later, and unleash the mobs at the tech bros who set the world on course to make everybody's life more miserable. We'll still be here to grab the scrapped RAM and GPUs to train and run local models, thank you very much.
Let me remind you that you are not paying the full price for the service, and all the value of those companies is out of thin air. More or less the premise of the article. *When* you are asked to pay the real price, we'll see whether the company prefers a human or a bot it can't pass blame to.
People continue to work; some proportion of those working use LLMs regularly.
Enough time has passed that subjective statements about the future don't pass muster. Look at the numbers: there have been no large-scale layoffs beyond correcting for over-hiring. Has hiring slowed down? Sure. However, I'd wager most firms are finding it pretty difficult to think of projects that will generate positive NPV. If that's the case, why would they hire? Moreover, the focus has returned to cash flows, not product-based growth metrics, which again reinforces the point about project selection.
Efficiency-generated growth does not continue forever; it's short-lived.
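For anyone skimming, NPV (net present value) is just future cash flows discounted back to today, minus the upfront cost; a project is worth taking when it's positive. A minimal sketch with hypothetical numbers:

    # Net present value: discount each year's cash flow back to today and sum.
    # The cash flows and discount rate below are hypothetical, for illustration only.
    def npv(rate: float, cash_flows: list[float]) -> float:
        """cash_flows[0] is the upfront cost (negative); later entries are yearly returns."""
        return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

    project = [-100_000, 40_000, 45_000, 50_000]  # hypothetical 3-year project
    print(f"NPV at 10%: {npv(0.10, project):,.0f}")  # ~11,119 -> positive, worth taking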
AI on hardware you own and control --- instead of a metered service provider. In other words, a repeat of the "personal computing" revolution but this time focused on AI.
TurboQuant could be a key step in this direction.
And people would prefer to run a model locally for 'free' (not counting the energy cost) rather than paying for an LLM subscription.
Ding, ding, ding --- we have a winner.
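Concretely, the "AI on hardware you own" scenario is already workable today with quantized open-weight models. Here's a minimal sketch using llama-cpp-python; the model file path is a placeholder, and the quantization level and context size are arbitrary choices, not recommendations:

    # Minimal local inference with a quantized model via llama-cpp-python.
    # The GGUF path below is a placeholder -- substitute any quantized open-weight model.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/some-open-model.Q4_K_M.gguf",  # hypothetical local file
        n_ctx=4096,       # context window; larger needs more RAM
        n_gpu_layers=-1,  # offload all layers to a GPU if one is available
    )

    out = llm("Explain why local inference avoids per-token billing:", max_tokens=128)
    print(out["choices"][0]["text"])

No subscription and no metering; the tradeoffs are the upfront hardware cost and smaller models than the frontier labs serve.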
https://techstartups.com/2026/03/26/nvidia-backed-ai-startup...
https://tiiny.ai/
At this rate, I’d almost prefer to talk on a private mailing list with vetted resumes.
A bubble doesn't necessarily mean that the underlying tech/innovation isn't useful. It's a financial and economic phenomenon that is pretty well understood and researched:
- During the hype cycle, investors tend to overestimate the short to mid term effects and underestimate the long term effects.
- It's near impossible to pick the winners in advance, and research has shown that investors underestimate how many losers there will be.
- The financial system/market works very well when there are localized issues with debt. Those get seemingly automatically detected and repaired. But broad increases in credit not so much. Those spread into the whole system in non-obvious and complex ways and destabilize the whole system, which can lead to very large corrections.
etc.
Personally, I'd say it's a problem that prices of consumer goods go up that far to satisfy this part of the market. We may need a more sensible way to advance the technology.
In my opinion, this is incomparable to what we are seeing with agentic AI, which is rapidly replacing writing code by hand.
I figure chances are AI is not going to stop here.
- AI is a genuinely transformative technology on par with the internet and on track to probably surpass the smartphone
- The inflated valuations, the circular flows of money (or "money"), and the financial cup-shell game mean that the players of the game are all a few bad weeks away from catastrophe. This is, of course, nothing new for SV -- but the scale this time is new. Some believe it will soon collapse -- "bubble," thus.
The question is when the frontier AI companies will turn a profit on said transformative technology (NVIDIA and big tech aside, it is losing them tens of billions), and who will survive a crash when it comes.
You know you are in a bubble when people with a clear financial incentive go on newsletters and podcasts and post extremely outlandish predictions to sell the public on something.
The number of engineers becoming snake-oil salesmen, and vibe-coders becoming overnight cybersecurity experts selling AI courses, is a good indicator, and one I am watching.
This is good. It's how you know they lacked the intellectual rigour required to be engineers in the first place and thus never were.
"No longer?" It never was.
Especially with AI boosters being allowed to degrade the comments section, shill their paid blogs, and violate the HN guidelines.