Preux

Applied AI

Why most AI pilots never ship — and what the survivors do differently

Matthew Rogers, Founder & CEO, Preux · 22 May 2026 · 6 min read

By 2026 the numbers are common knowledge among buyers: the large majority of enterprise AI pilots never reach production, and Gartner expects over forty per cent of agentic AI projects to be cancelled by the end of 2027. The interesting part is why — because the cause is almost never the thing teams spend their time arguing about.

01

The graveyard is operational, not technical

When the failures are analysed, they cluster around operating-model problems, not model quality: unclear success criteria, insufficient access to the data and tools the agent actually needs, and evaluation drift — the system quietly getting worse with no one watching the right number. None of those are solved by a better model. They are solved by deciding, up front, what "working" means and how you will know.
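
Evaluation drift is easier to reason about with something concrete. What follows is a minimal sketch, not a description of any particular monitoring product: it assumes each production interaction can be given a numeric quality score, and the names `DriftMonitor`, `baseline`, `tolerance` and `window` are hypothetical, chosen for illustration.

```python
from collections import deque
from statistics import mean
from typing import Optional


class DriftMonitor:
    """Flag when a rolling quality score falls below an agreed baseline.

    Hypothetical helper: the scoring method itself is assumed to exist
    elsewhere and to return a float per interaction.
    """

    def __init__(self, baseline: float, tolerance: float, window: int) -> None:
        self.baseline = baseline    # the score agreed up front as "working"
        self.tolerance = tolerance  # how far below baseline is still acceptable
        self.scores = deque(maxlen=window)  # keeps only the latest `window` scores

    def record(self, score: float) -> None:
        self.scores.append(score)

    def rolling_mean(self) -> Optional[float]:
        # No verdict until the window is full, to avoid noisy early alerts.
        if len(self.scores) < self.scores.maxlen:
            return None
        return mean(self.scores)

    def has_drifted(self) -> bool:
        current = self.rolling_mean()
        return current is not None and current < self.baseline - self.tolerance
```

The design choice worth noticing is that `baseline` is an input, not something the monitor infers: someone has to decide what "working" means before the system can tell anyone it has stopped.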

88%: the share of enterprise AI proofs-of-concept that never reach wide-scale deployment, on IDC's research. Of every thirty-three pilots a company launches, four (roughly 12%) graduate to production.

This is good news, oddly. A model-quality ceiling would be out of your control. An operating-model failure is entirely within it.

The pilots that survive are not the ones with the cleverest demos; they are the ones built like production software from the first week.

02

What the survivors do differently

  • They define success criteria before building. A specific, measurable target — handling time, error rate, deflection — agreed with the business, not discovered afterwards.
  • They build the evals first. Golden datasets and scorers that run in CI, so a regression is caught by a gate, not by a customer.
  • They solve data and tool access early. Most agents fail because they cannot reach the system that holds the answer — and giving them raw credentials is not the fix. A gateway is.
  • They keep humans on the decisions that matter. The agent moves the work; ownership of the consequential call stays legible and human.
  • They stay model-agnostic. The model is a swappable part. Coupling a business process to one vendor's model is a risk, not a strategy.
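
Two of the habits above, building the evals first and staying model-agnostic, fit in one short sketch. It rests on stated assumptions: `Model`, `GoldenCase`, `exact_match`, `run_gate` and `StubModel` are hypothetical names, the golden dataset would normally live in a versioned file, and exact match is the simplest possible scorer, not a recommendation.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Model(Protocol):
    """Any vendor's client can sit behind this one method (assumed interface)."""

    def complete(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    # Simplest scorer: 1.0 on a case-insensitive, whitespace-trimmed match.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_gate(
    model: Model,
    cases: list[GoldenCase],
    scorer: Callable[[str, str], float],
    threshold: float,
) -> tuple[bool, float]:
    """Return (passed, average score) for a model over the golden dataset."""
    if not cases:
        raise ValueError("golden dataset is empty")
    scores = [scorer(model.complete(c.prompt), c.expected) for c in cases]
    average = sum(scores) / len(scores)
    return average >= threshold, average


class StubModel:
    """Stand-in for a real client; returns canned answers, calls no API."""

    def __init__(self, answers: dict[str, str]) -> None:
        self.answers = answers

    def complete(self, prompt: str) -> str:
        return self.answers.get(prompt, "")
```

A CI step could exit non-zero whenever `run_gate` returns `False`, which is what turns a regression into a failed build instead of a customer complaint. Because the gate depends only on `complete`, swapping the model means writing one adapter and re-running the same gate.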
[Image: a monitor showing a line chart, glowing in a dark room]
Evaluation drift is invisible in a demo: the number that matters is the one nobody put on a screen.
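
The gateway mentioned in the list above can be pictured in a few lines. This is a sketch under assumptions, not a reference design: `ToolGateway`, `make_order_lookup` and the order-lookup tool are hypothetical, and the handler returns a canned record where a real one would call the system of record.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolGateway:
    """Brokers agent tool calls so the agent never holds raw credentials."""

    _tools: dict[str, Callable[..., object]] = field(default_factory=dict)
    _grants: dict[str, set[str]] = field(default_factory=dict)
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def register(self, name: str, handler: Callable[..., object]) -> None:
        self._tools[name] = handler

    def grant(self, agent_id: str, tool_name: str) -> None:
        self._grants.setdefault(agent_id, set()).add(tool_name)

    def call(self, agent_id: str, tool_name: str, **kwargs: object) -> object:
        allowed = (
            tool_name in self._tools
            and tool_name in self._grants.get(agent_id, set())
        )
        # Every attempt is recorded, including the refused ones.
        self.audit_log.append((agent_id, tool_name, allowed))
        if not allowed:
            raise PermissionError(f"{agent_id} may not call {tool_name}")
        return self._tools[tool_name](**kwargs)


def make_order_lookup(api_key: str) -> Callable[..., object]:
    """Hypothetical tool: the credential stays inside this closure."""

    def lookup(order_id: str) -> dict:
        # A real handler would use api_key to query the order system here.
        return {"order_id": order_id, "status": "shipped"}

    return lookup
```

The agent only ever sees tool names and results. The credential, the allowlist and the audit trail all sit on the gateway's side of the boundary, which keeps access reviewable without slowing the agent down.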

When you get a demo and something works 90% of the time, that's just the first nine.

— Andrej Karpathy, on the gap between demo and product

03

Our stance

We only take AI work we believe can reach production. That means the unglamorous parts — success criteria, evaluations, governance, observability, a twelve-month view of what good looks like — are the start of the engagement, not an afterthought. If a problem cannot be framed that way, the honest answer is that it is not ready, and we will say so. In 2026, that is a more credible position than enthusiasm.
