The Physical Limits of AI: Time for Some Reality

I think we all spent the last two years listening to visionaries promising infinite scaling and digital brains that would solve every corporate problem. They pitched a future where code writes itself and enterprise software runs on pure thought. Well, they probably forgot to check the hardware inventory.
A few days ago, MarketWatch published a telling piece: AI companies are rationing computational resources to contain out-of-control operational costs.
The underlying reason is physical and mathematical. There are not enough chips to go around, and memory has already become expensive under AI's insatiable demand. We are witnessing the collision between the tech hype cycle and the laws of physics.
The RAMpocalypse and the Hardware Situation
We are experiencing what industry analysts have started calling the RAMpocalypse. Semiconductor manufacturers have diverted massive chunks of their production capacity to keep up with AI demands.
They are prioritizing the High Bandwidth Memory (HBM) required to run massive AI data centers. Foundries have finite physical space, and retooling a fabrication plant takes years, not months.
The market result is ruthless for everyone else. DRAM prices have exploded by 171% over the last year, and the cost of standard DDR5 modules has quadrupled.
Not even the billion-dollar budgets of Google or Microsoft can bend the reality of the global supply chain.
They cannot print silicon out of thin air, and they cannot conjure the extra electricity needed to power and cool hyperscale data centers.
Compute Rationing
Behind the scenes, Large Language Model (LLM) providers are applying extreme rationing techniques to manage their server load. The infinite cloud compute we took for granted is showing its hard limits.
We are talking about silent downgrades and reductions in active parameters. A top-tier enterprise customer paying a premium might still receive the maximum computing power available.
Standard API calls, however, are routed to lighter, less capable configurations. This happens without warning to the end user or the developer relying on that API.
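To make the mechanism concrete, here is a minimal sketch of what tier-based routing could look like on the provider side. This is an illustration of the pattern, not any vendor's actual implementation: the tier names, model identifiers, and load threshold are all hypothetical.

```python
# Hypothetical sketch of provider-side compute rationing.
# Tier names, model identifiers, and the load threshold are illustrative only.

CONFIGS = {
    "enterprise": {"model": "full-400b", "active_params": "100%"},
    "standard":   {"model": "distilled-70b", "active_params": "40%"},
}

def route_request(tier: str, current_load: float) -> dict:
    """Pick a model configuration based on customer tier and server load."""
    config = CONFIGS["enterprise"] if tier == "enterprise" else CONFIGS["standard"]
    # Under heavy load, standard traffic is silently downgraded even further.
    if current_load > 0.9 and tier != "enterprise":
        config = {"model": "distilled-8b", "active_params": "10%"}
    return config

# The caller never learns which configuration actually served the request.
print(route_request("standard", current_load=0.95))
```

The point of the sketch is the last line: nothing in the API response tells the developer which configuration answered, which is exactly why the downgrade is invisible.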
Building a reliable corporate ecosystem on an infrastructure that throttles your resources whenever servers get overloaded is a recipe for disaster.
Like many other consultants, I’ve seen the consequences of unpredictable infrastructure firsthand. Running a global supply chain on a system that decides to “think slower” during peak hours is a nightmare.
The SaaSpocalypse Was Always a Fantasy
This brings me back to a dynamic I addressed a few weeks back when discussing the illusion of the SaaSpocalypse. The narrative from Silicon Valley was bold.
We were promised that autonomous AI agents would rewrite entire systems on the fly. Visionaries claimed the traditional SaaS model would be replaced by custom code generated in real-time by artificial intelligence.
It was a fascinating narrative, designed to stir panic in the market. Its fatal flaw was the assumption that computing power would remain infinite, cheap, and accessible. Building a stable, scalable enterprise business model on a foundation of rationed compute is a losing game. Promising real-time ERP generation falls apart when the underlying engine struggles to keep up with basic API requests during peak traffic.
The Acceleration Obsession vs. Energy Reality
The corporate universe is obsessed with the word “acceleration.” Every vendor promises faster workflows and autonomous agents. Meanwhile, we have completely dropped the term “energy saving” from our vocabulary.
There is a common misconception that newer AI models are becoming more efficient. The physical data tells a different story. Newer, larger models are demanding exponentially more power.
Training a baseline model like GPT-3 consumed roughly 1,287 megawatt-hours (MWh), equaling the annual power consumption of 120 American households. Today’s advanced reasoning models require far more compute.
A standard web search uses about 0.3 watt-hours. A standard AI query consumes nearly ten times that amount. If you use deep reasoning models that analyze steps before answering, a single query can demand up to 45 watt-hours.
In effect, you burn at least a full smartphone charge every time you ask the AI a complex question.
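The arithmetic is easy to verify. Here is a quick back-of-the-envelope calculation using the figures above, plus one assumption of my own: a typical smartphone battery holds roughly 15 watt-hours (that number is not from the original figures).

```python
# Back-of-the-envelope energy comparison using the figures cited above.
web_search_wh = 0.3        # standard web search
ai_query_wh = 3.0          # standard AI query, roughly 10x a web search
reasoning_query_wh = 45.0  # deep reasoning query, upper bound

phone_battery_wh = 15.0    # assumed typical smartphone battery (~15 Wh)

print(f"AI query vs web search: {ai_query_wh / web_search_wh:.0f}x")
print(f"Reasoning query vs web search: {reasoning_query_wh / web_search_wh:.0f}x")
print(f"Phone charges per reasoning query: {reasoning_query_wh / phone_battery_wh:.1f}")
# -> 10x, 150x, and roughly 3 full phone charges for a single query.
```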
We cannot pretend this massive footprint has no consequences. Data center power demand is growing at unprecedented rates, putting severe strain on local electrical grids. Tech giants are trying to restart decommissioned nuclear plants to keep the servers running.
We are trading sustainable growth for a brute-force approach to computation.
Deterministic Supply Chains in a Probabilistic World
All these physical constraints (chip shortages, compute rationing, and energy limits) eventually hit the factory floor. When you run a global supply chain, software latency is not just an inconvenience. It is a hard stop to your operations.
If a warehouse operator has to wait for a throttled LLM to validate a simple pallet movement, the shipment does not leave the dock. We are taking a system crippled by unpredictable compute rationing and trying to force it into environments that require absolute precision.
The fundamental clash here is between deterministic business needs and probabilistic technology. An ERP system managing millions of euros in inventory must be deterministic.
When you move a product from Bin A to Bin B, the database must reflect that exact transaction. It requires binary precision. Traditional software, despite its flaws, is built exactly for this.
Large Language Models, by their very nature, are probabilistic. They guess the next best word or action based on statistical weights.
Trying to force a probabilistic GenAI to manage a deterministic warehouse flow is an architectural mistake. Add the physical limits of compute rationing into the mix, and the resulting system is unpredictable in its logic and unreliable in its speed.
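The contrast is easy to demonstrate in a few lines. Below is a minimal sketch: a deterministic inventory move that either succeeds exactly or fails loudly, next to a probabilistic stand-in for an LLM that samples an answer from weighted outcomes. The bin names and probabilities are invented for illustration.

```python
import random

# Deterministic: the transaction either happens exactly as specified or fails.
inventory = {"BIN_A": 10, "BIN_B": 0}

def move_stock(src: str, dst: str, qty: int) -> None:
    if inventory[src] < qty:
        raise ValueError(f"Insufficient stock in {src}")  # hard, explicit failure
    inventory[src] -= qty
    inventory[dst] += qty

move_stock("BIN_A", "BIN_B", 5)
assert inventory == {"BIN_A": 5, "BIN_B": 5}  # always true, on every run

# Probabilistic: a toy stand-in for an LLM sampling from statistical weights.
def llm_validate_move() -> str:
    return random.choices(
        ["approved", "approved with wrong quantity", "hallucinated bin"],
        weights=[0.95, 0.04, 0.01],
    )[0]

# Run it a few times: most answers are right, but none are guaranteed.
print([llm_validate_move() for _ in range(5)])
```

A 95% success rate sounds impressive until you multiply it across millions of pallet movements per year: the deterministic function fails zero times by construction, the sampler fails by design.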
This is why the market is experiencing a massive reality check. We are returning to value traditional architectures, predictable costs, and specialized models.
Why Small Local Models (SLMs) Are the Pragmatic Answer
As I analyzed in my piece on Small Local Models (SLMs), true enterprise evolution does not rely on huge, centralized digital brains processing every click. It relies on compact, efficient models.
An SLM trained on your company data can run locally or on a small, dedicated cloud instance. This approach offers clear advantages over relying on public infrastructure.
- It requires a fraction of the hardware.
- It guarantees data privacy.
- It does not compete for resources with millions of other users asking a public chatbot to write poetry.
This setup allows a company to control the compute, the latency, and the costs. The model does one specific job, does it well, and operates with predictable hardware requirements. This is how AI implementations succeed in a corporate environment without falling victim to the Bla-Bla-Bla-Apocalypse.
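As a rough illustration of the pattern, here is what serving a compact model on local hardware can look like with the Hugging Face transformers library. The model identifier below is a placeholder, not a real checkpoint; substitute any small open-weight model fine-tuned on your own data.

```python
# Minimal local-inference sketch using Hugging Face transformers.
# "your-org/warehouse-slm-3b" is a placeholder identifier, not a real model.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/warehouse-slm-3b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)  # loads onto local hardware

prompt = "Classify this movement request: move 5 units from BIN_A to BIN_B."
inputs = tokenizer(prompt, return_tensors="pt")

# Local inference: no external API, no shared queue, no silent throttling.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Everything in that loop runs on hardware you own and have sized yourself, which is the whole argument: the latency is yours to measure and the bill is yours to predict.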
The Return to Composable Architecture
The push for composable ERPs and best-of-breed solutions remains the pragmatic path forward. The strategy involves building an ecosystem of specialized tools that talk to each other through standard APIs.
We should not try to replace these proven tools with a compute-hungry AI that might get throttled during end-of-month closing procedures. The focus shifts to robust integrations and operational excellence.
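In practice, a composable setup reduces to specialized services exchanging deterministic messages over plain HTTP APIs. The endpoint and payload below are hypothetical, just to show the shape of such an integration:

```python
# Hypothetical best-of-breed integration: a WMS notifies the ERP of a
# confirmed stock movement over a plain REST API. The endpoint and payload
# are illustrative, not any real product's interface.
import requests

ERP_ENDPOINT = "https://erp.example.com/api/v1/stock-movements"

payload = {
    "sku": "PALLET-4711",
    "from_bin": "BIN_A",
    "to_bin": "BIN_B",
    "quantity": 5,
}

response = requests.post(ERP_ENDPOINT, json=payload, timeout=5)
response.raise_for_status()  # fail loudly: no silent degradation, no guessing
print("Movement posted:", response.json())
```

Note the explicit timeout and the hard failure on error: boring, deterministic plumbing that behaves the same at 3 a.m. on closing day as it does in a demo.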
My bottom line is a simple one: traditional software was never dead.
It was sitting in the background, waiting for the infrastructure bubble to present a generous bill. The bill has arrived, and the physical limits of hardware are forcing the industry back to reality. My experience confirms it is time to get back to building architectures that work on the factory floor.
Leave the fairy tales to those who need to sell software subscriptions that don’t exist yet to companies that don’t need them.
Written by Andrea Guaccio
April 21, 2026