Pilot momentum meets organisational drag
Early AI work often feels fast: small teams, contained scope, generous assumptions. The slowdown tends to appear at the point where AI becomes someone’s day job rather than someone’s experiment.
What changes when AI touches the operating core
Pilots typically sit outside the constraints that shape daily performance. A customer service summary tool, for example, may look impressive in a sandbox. Once placed inside a contact centre workflow, it must integrate with case systems, handle edge cases, respect retention rules, and be usable at 09:15 on a Monday when volumes spike. The pilot’s success metric also changes. A 20 per cent reduction in average handling time sounds attractive, but it only becomes value if it translates into fewer paid hours, higher capacity without burnout, or higher conversion with stable quality.
One established assumption is quietly breaking: that digital change is mainly a technology rollout. With AI, the unit of change is the decision, the judgement, and the hand-off between people and systems. That is organisational design work, not just software deployment.
Why now feels different
Two pressures converge. The cost of experimentation is falling, so portfolios are growing. At the same time, regulatory scrutiny and public sensitivity are rising, particularly where AI influences eligibility, pricing, safeguarding, or hiring. The result is a paradox: more prototypes than ever, and fewer production launches than hoped. The constraint has moved from model capability to institutional readiness.
Production means dependable, not impressive
The question is not whether an AI system can perform in ideal conditions, but whether it can be trusted as part of a business service with repeatable outcomes and clear accountability.
Service levels beat demos
A useful production definition is operational: an AI capability is in production when it has an owner, a budget line, service levels, incident management, and a plan for change. That shifts the conversation from accuracy to reliability. How often does it fail? How quickly is failure detected? What happens when upstream data changes? Who can switch it off without committee meetings?
This reframing helps avoid a common trap: treating the model as the product. In most cases the product is the workflow. The model is an ingredient, sometimes replaceable. Organising around the workflow keeps attention on cycle time, customer outcomes, staff experience, and risk.
“Good enough” is context, not compromise
Many uses do not require near-perfect outputs. Drafting internal summaries, routing requests, or highlighting anomalies can create value even with occasional errors, provided there is a safe recovery path. Other uses, such as adverse customer decisions, clinical triage, or financial reporting, demand tighter control and clearer human authority. Operationalising AI therefore becomes a segmentation exercise: where is automation safe, and where is oversight non-negotiable?
Portfolio economics under partial information
Moving to production forces an economic view of AI. The challenge is that benefits are often indirect, and costs appear in unfamiliar places: risk, rework, and change effort.
Unit economics makes AI governable
Consider a claims operation processing 200,000 cases a year. If AI reduces average processing time by 6 minutes per case, that is 20,000 hours of capacity. Whether that becomes financial value depends on operating intent: reducing overtime, avoiding hiring, improving cycle time to reduce customer churn, or reallocating staff to complex cases. Without an explicit conversion mechanism, “time saved” becomes a feel-good metric.
Costs also need a unit lens. It is not unusual for a production-grade AI feature to carry ongoing costs that sit outside the original pilot budget: monitoring, legal review, model refresh, workflow changes, training, and manual sampling. Some organisations find that total run costs land in the range of £0.10 to £2.00 per transaction depending on risk controls and volume. This can still be excellent value, but only if priced into the business case rather than discovered later.
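To make the unit-economics lens concrete, here is a minimal sketch in Python using the claims figures above. The loaded hourly cost, the chosen operating intent (avoided overtime), and the way the run-cost range is applied to every case are illustrative assumptions, not benchmarks.

```python
# Illustrative unit-economics sketch for the claims example above.
# All cost figures are assumptions for illustration, not benchmarks.

CASES_PER_YEAR = 200_000
MINUTES_SAVED_PER_CASE = 6
LOADED_HOURLY_COST = 32.0          # assumed fully loaded cost per staff hour (GBP)
RUN_COST_PER_CASE = (0.10, 2.00)   # assumed run-cost range per transaction (GBP)

# Capacity released by the time saving.
hours_released = CASES_PER_YEAR * MINUTES_SAVED_PER_CASE / 60   # 20,000 hours

# Capacity only becomes money through an explicit operating intent.
# Here the assumed intent is avoided overtime at the loaded hourly rate.
gross_benefit = hours_released * LOADED_HOURLY_COST

# Run costs scale with transaction volume, not with hours saved.
run_cost_low = CASES_PER_YEAR * RUN_COST_PER_CASE[0]
run_cost_high = CASES_PER_YEAR * RUN_COST_PER_CASE[1]

print(f"Capacity released: {hours_released:,.0f} hours/year")
print(f"Gross benefit (avoided overtime): £{gross_benefit:,.0f}")
print(f"Annual run cost: £{run_cost_low:,.0f} to £{run_cost_high:,.0f}")
print(f"Net benefit: £{gross_benefit - run_cost_high:,.0f} to £{gross_benefit - run_cost_low:,.0f}")
```

Changing the operating intent (avoided hiring, churn reduction, reallocation to complex cases) changes only the conversion line, which is the point: the mechanism has to be named before the number means anything.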
Which bets deserve industrialisation
Portfolios tend to contain a mix: some initiatives are productivity levers, others are customer experience plays, some are resilience investments. The trade-off is not simply value versus cost. It is also reversibility. A reversible use case, such as internal drafting, can be scaled with lighter governance and fast learning. An irreversible one, such as automated rejection of applicants, should earn its way into production through stronger evidence, clearer accountability, and more conservative rollout.
A helpful test is whether a use case has a measurable baseline and a controllable lever. If there is no stable baseline for cycle time, error rates, or revenue conversion, attribution becomes guesswork. If the operating model cannot act on the lever, benefits leak away.
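One way to make this test operational is to encode it as a simple portfolio gate. The sketch below is illustrative only; the fields and gating rules are assumptions rather than a prescribed framework.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    reversible: bool          # can the decision be undone cheaply and quickly?
    has_baseline: bool        # stable baseline for cycle time, errors, or conversion?
    lever_controllable: bool  # can the operating model act on the benefit lever?

def production_gate(uc: UseCase) -> str:
    """Illustrative gating logic: reversible cases scale with lighter governance;
    irreversible ones need stronger evidence and a conservative rollout; a missing
    baseline or lever sends the case back to discovery."""
    if not (uc.has_baseline and uc.lever_controllable):
        return "hold: attribution or benefit realisation not yet possible"
    if uc.reversible:
        return "scale with lightweight governance and fast learning loops"
    return "stage carefully: stronger evidence, named accountability, conservative rollout"

print(production_gate(UseCase("internal drafting", True, True, True)))
print(production_gate(UseCase("automated applicant rejection", False, True, True)))
```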
Governance that enables pace and trust
Governance is often blamed for slowing progress. In practice, weak governance can be slower, because every launch becomes a negotiation. The goal is not more control, but clearer control.
Centralise decisions, federate delivery
AI-native organisations often separate what must be consistent from what can vary. Consistency typically belongs in areas such as risk policy, model approval thresholds, monitoring standards, and vendor terms. Variation can sit in business teams who own workflows and outcomes. This avoids two failure modes: central teams that become bottlenecks, and local teams that create invisible risk.
Roles matter more than org charts. Many organisations benefit from explicit accountability for: an AI service owner who is on the hook for performance, a risk owner who can pause deployment, and a workflow owner who can change how people work. Without named owners, “shared responsibility” becomes “no responsibility”.
Cadence turns learning into capability
Operationalising AI requires a rhythm that treats models as living components. Monitoring, sampling, and review need scheduled time, not goodwill. At LSI, our AI-native learning platform improves through continuous feedback loops rather than term-based delivery; organisations often need a similar shift, where AI performance and human adoption are reviewed routinely, with authority to adjust workflows and controls.
Change management is part of governance. If incentives still reward old behaviour, adoption stays superficial. For example, if contact centre staff are measured mainly on speed, they may over-trust AI summaries to move faster, increasing quality risk. If they are measured mainly on compliance, they may avoid using the tool, erasing value. Metrics create behaviour, whether intended or not.
ROI metrics that survive scrutiny
AI value is real, but fragile. It can be overstated through optimistic baselines, or undermined by poor measurement. Robust metrics help sustain investment and protect credibility.
Leading signals and lagging outcomes
Lagging measures such as cost reduction, revenue lift, churn, or loss rates matter, yet they move slowly and are influenced by many factors. Leading indicators offer earlier guidance, provided they are tied to outcomes. Examples include adoption in the moments that matter (not logins), percentage of work items touched by AI, rework rates, exception volumes, and human override frequency.
Override frequency is especially informative, but ambiguous. High overrides can mean low trust or poor quality; low overrides can mean high trust or complacency. The useful metric is not the number itself, but the pattern by segment, and what happens to downstream outcomes.
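A minimal sketch of that segment-level view, assuming a hypothetical decision log with `segment`, `overridden`, and `downstream_error` columns; the schema and figures are illustrative, not a prescribed instrumentation.

```python
import pandas as pd

# Hypothetical decision log: one row per AI-assisted work item.
log = pd.DataFrame({
    "segment":          ["renewals", "renewals", "new_claims", "new_claims", "complex", "complex"],
    "overridden":       [0, 1, 0, 0, 1, 1],
    "downstream_error": [0, 0, 0, 1, 0, 0],
})

# Override frequency alone is ambiguous; pair it with a downstream outcome.
summary = log.groupby("segment").agg(
    override_rate=("overridden", "mean"),
    downstream_error_rate=("downstream_error", "mean"),
    volume=("overridden", "size"),
)
print(summary)

# Low overrides with rising downstream errors suggests complacency;
# high overrides with clean outcomes suggests low trust or poor model fit.
```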
Attribution without fantasy
In production, credible ROI often relies on controlled comparisons: staged rollouts, matched cohorts, or time-sliced trials. Where this is not feasible, transparency about assumptions becomes part of reputational risk management. A range is often more honest than a point estimate. For instance, a productivity initiative might be modelled as 10 to 25 per cent cycle time reduction, then tracked against actual capacity released and how it was reallocated.
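As a sketch of range-based attribution, the example below models the 10 to 25 per cent cycle time reduction against an assumed baseline and compares it with tracked actuals; every figure is hypothetical.

```python
# Range-based benefit model for the productivity example above.
# Baseline and reallocation figures are assumed for illustration.

baseline_cycle_hours = 120_000     # assumed annual hours spent on the target process
reduction_range = (0.10, 0.25)     # modelled 10 to 25 per cent cycle time reduction

modelled_low = baseline_cycle_hours * reduction_range[0]
modelled_high = baseline_cycle_hours * reduction_range[1]

# Tracked actuals: capacity released, and how much was actually reallocated.
actual_released = 21_500
actual_reallocated = 14_000        # only reallocated hours count as realised value

print(f"Modelled capacity: {modelled_low:,.0f} to {modelled_high:,.0f} hours")
print(f"Actual released:   {actual_released:,.0f} hours")
print(f"Realised (reallocated) share: {actual_reallocated / actual_released:.0%}")
```

Reporting the modelled range alongside the realised share keeps the claim honest: capacity that was released but never reallocated is not yet value.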
There is also a governance question hiding inside ROI: which benefits “count”? Some improvements reduce risk exposure rather than costs. If risk reduction is not valued in investment decisions, the organisation may drift towards high-return, high-risk automation that later becomes expensive in fines, remediation, or brand damage.
Human authority in the loop
The hardest production decisions are not technical. They are ethical, legal, and psychological: when is it acceptable to let an automated output stand, and when must a person remain accountable?
Designing for safe failure
AI systems will be wrong sometimes. Production design therefore needs explicit failure modes: graceful degradation, clear escalation paths, and audit trails. A practical distinction is between assistance and adjudication. Assistance supports a person who remains accountable. Adjudication replaces judgement. Many organisations discover that assistance delivers most of the value with less risk, especially in the first production waves.
Reputational risk often emerges from surprises rather than errors. A single opaque decision that cannot be explained, even if statistically sound, can damage trust. Explainability is not only a model property. It includes the story the organisation can tell about governance, testing, monitoring, and recourse.
A production readiness decision test
A useful test before scaling is whether the organisation can answer, in plain language, five questions: What outcome will change? What will staff do differently? What could go wrong and how would it be detected? Who has authority to pause or reverse? What evidence would change the decision?
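These five questions can also be kept as a lightweight pre-scale checklist. The sketch below is one possible encoding; the pass condition and wording are assumptions, not a formal standard.

```python
READINESS_QUESTIONS = [
    "What outcome will change?",
    "What will staff do differently?",
    "What could go wrong and how would it be detected?",
    "Who has authority to pause or reverse?",
    "What evidence would change the decision?",
]

def ready_to_scale(answers: dict) -> bool:
    """Ready only when every question has a plain-language answer on record
    (no blanks, no placeholders such as 'TBD')."""
    return all(
        answers.get(q, "").strip().lower() not in {"", "tbd"}
        for q in READINESS_QUESTIONS
    )

example = {q: "documented answer" for q in READINESS_QUESTIONS}
print(ready_to_scale(example))  # True only if all five are answered
```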
The insight is that operationalising AI is a commitment to managed uncertainty. Production is less a finish line than an agreement to learn in public, with safeguards. The uncomfortable question is this: which existing performance targets, incentives, or power structures would need to be renegotiated for AI value to become real, and is there willingness to pay that social cost?