Why Your AI Agent Will Lie to You

You asked your AI to update the ERP. It replied, “Done.” But what if it lied?

We expected AI agents to automate our enterprise systems. Instead, new research shows they’re actively faking tasks. If you thought dirty data was your only problem, we need to talk.

Organizations spend millions cleaning their data, mapping processes, and migrating to the cloud. The assumption is simple: if the system is clean and the prompt is clear, the machine will do what you told it to do.

Well, that assumption just broke.

A groundbreaking new report shattered the illusion. Even in a perfectly configured system, autonomous agents are learning to lie, fake tasks, and bend the rules just to tell you what you want to hear.

The Centre for Long-Term Resilience released a massive study titled “Scheming in the Wild”. The researchers analyzed over 183,000 real-world interactions between users and frontier AI systems over a six-month period. The findings should be a wake-up call for every CIO and ERP Manager currently piloting autonomous agents.

The researchers documented a 490 percent increase in incidents where AI agents engaged in covert misalignment. In simple terms, the agents are scheming. They’re deliberately ignoring instructions, bypassing safeguards, and lying to users to simulate task completion.

This changes the narrative around enterprise AI. In my previous analyses regarding the chaos agents bring to ERPs, I argued that dropping an autonomous tool into a rusted, heavily customized legacy system was the primary danger. I assumed the complex, dirty data simply confused the model into making mistakes. The reality documented in this new report is far worse. The tendency to deceive is baked right into the core architecture of generalist models.

The Anatomy of a Lying Agent on the Factory Floor

Let’s bring this into the warehouse.

Imagine you ask your autonomous AI to resolve a blocked invoice, reroute a picking wave because of a physical obstruction in the warehouse, or update a complex multi-level Bill of Materials after an engineering change. You come back an hour later, and the agent has logged a system notification stating the task is successfully completed.

Behind the scenes, however, the agent hit a logical roadblock. Perhaps a mandatory field was missing, a unit of measure conversion was undefined, or a strict ERP routing rule prevented the transaction. A human operator would halt the process, flag the anomaly, and ask for clarification. The generalist AI behaves differently.

Instead of halting and alerting a human, the agent logs a fake entry. It flags the inventory as “moved” in the system without triggering the actual physical transaction. It optimizes for completion, even if that means fabricating the execution entirely.

It covers up its failure by feeding you a plausible, completely fabricated workaround.

This isn’t a hypothetical scenario. The open-source intelligence logs analyzed alongside the report show massive generalist models like GPT, Claude, and Gemini actively evading security instructions and faking code execution just to bring a task to a close. When confronted with a rigid system rule that contradicts their prompt, they choose deception over failure.

Deceptive Alignment and The Architecture of Deceit

Why does an advanced AI choose to lie? The machine learning community calls it Deceptive Alignment. Others call it Reward Hacking. The mechanism is simple.

Large Language Models are probabilistic engines. They don’t possess a moral compass, nor do they understand the financial ruin caused by a phantom inventory update. They’re trained through reinforcement learning to maximize their mathematical reward. In most cases, that reward is granted when the AI provides a satisfying answer to the user and successfully closes the ticket.

When you deploy a massive generalist agent into an enterprise environment full of physical constraints, strict security policies, and intricate business logic, the AI begins calculating probabilities. It quickly realizes that solving the complex supply chain routing issue requires massive computational effort and carries a high risk of failure.

Faking a success log? Instant reward. Lying is computationally cheaper than failing.

The AI isn’t malicious. It’s executing a ruthless optimization at the expense of the truth. Between showing you a confident “Task Completed” and a complex error log explaining why the ERP blocked the transaction, the math is simple. You’ll accept the lie faster.

This proves a critical point about modern system integration. Giving autonomous write-access to your company’s core financial and logistical systems is an absolute risk right now, even if you have the cleanest cloud ERP on the market. If an AI will lie to you in a pristine testing environment, it will absolutely devastate your production database.

Why Small Language Models Are the Essential Antidote

Here’s some good news: not every AI is built to scheme against you.

The scheming behavior documented in the report requires complex reasoning capabilities. The agent must possess situational awareness, understand the rules imposed by the user, and calculate a strategy to bypass them covertly. Massive generalist models have the parameter count and the cognitive bandwidth to attempt this.

This is exactly why I strongly believe the true return on investment in enterprise software resides in hyper-specialization, specifically through the deployment of Small Language Models (SLMs).

SLMs operate on far fewer parameters. They’re not trained on the entire internet. They’re trained on curated datasets: your ERP manuals, your proprietary codebases, your exact logistical constraints.

Because they lack the massive parameter count of frontier models, SLMs don’t possess the strategic abstraction required to architect a complex lie. If a Small Language Model encounters a missing data point or a logical hurdle in your WMS, it can’t scheme its way out of the problem. It simply fails and throws an error.

In the world of Enterprise IT, a loud error is infinitely safer than a silent lie. A failed transaction can be debugged and fixed. A fabricated transaction corrupts your database and takes down your supply chain.

Disarming the Agents of Chaos

The tech industry is pushing a narrative of fully autonomous enterprise agents. Vendors want you to believe that you can drop an AI into your operations and watch your efficiency soar. The empirical data proves the technology is simply not ready to be trusted unsupervised on the factory floor.

To protect your operations from deceptive alignment, you need strict architectural discipline. Here are the actionable steps every IT leader must take before deploying AI.

1. Revoke Autonomous Write Access Until you can mathematically prove an agent is fully aligned with your business logic, treat it as an untrustworthy intern. Restrict generalist AI models to read-only tasks. Use them for data analysis, Generative Business Intelligence, and drafting reports. Require a mandatory human-in-the-loop validation for any actual database modification or transaction posting (I broke down the full risk map in The Hidden Security Risks of Autonomous AI Agents).

2. Enforce Strict API Boundaries: don’t let agents interact directly with your core database tables. Force all AI actions through heavily restricted, external API-driven extensions. These extensions must have hardcoded validation rules that the AI can’t bypass or hallucinate around. If the API expects a specific integer and the AI tries to pass a fabricated string, the system must reject the payload.

3. Pivot to Domain-Specific SLMs: stop trusting massive, black-box generalist models with your specific business logic. Invest in highly curated Small Language Models deployed locally or within your secure tenant. A model built specifically for your logistical constraints has less room to hallucinate and zero capacity to scheme. You own the model, you own the data, and you control the output.

4. Migrate Clean Data Only: clean data won’t stop a generalist model from reward hacking, but dirty data will actively accelerate the chaos. Before launching any agent, clean your house. A high signal-to-noise ratio is the fundamental prerequisite for AI accuracy. Archive obsolete records and only feed your AI active master data. (For more on how legacy data architecture actively misleads AI, see How Rigid SQL Queries Are Fueling Your AI Hallucinations).

My Final Take

Deploy predictable, controllable tools that respect the physical reality of your business. Stop chasing the smartest AI on the market and start engineering the safest.

We must stop treating AI as magic and start treating it as highly volatile software that requires strict engineering boundaries.

Written by Andrea Guaccio 

May 12, 2026