Every few years, somebody walks into a plant carrying a deck about "AI for manufacturing" and explains, with a slightly evangelical look, that the PLC's days are numbered. The PLC is not paying attention. It is, as ever, scanning its inputs, executing its logic, and writing its outputs every ten milliseconds, with the indifferent reliability of a machine that has been doing the same thing since the 1970s.
I want to be useful in this essay, so let me state the conclusion up front. AI does not replace the control system. It sits above it. When you understand the layers, you understand exactly where machine intelligence belongs and exactly where it does not. Get the layering wrong and you do not get a smart factory. You get an unsafe one with a chatbot.
The control stack, briefly
To talk about where AI fits, we have to be honest about what is already there. The plant is not a blank canvas. It is a layered architecture that took fifty years to mature, and most of its layers have a real reason to exist.
The interesting layer is the one I have highlighted. The intelligence layer. It is new — by industrial standards, anyway — and it is the only place AI gets to live without causing trouble.
That word above matters. The intelligence layer sits above the data layer. It does not replace the controller, and it does not pretend to be one. It reads from the historian, the alarm system, and the document store. It writes recommendations into the supervisory layer and the human's screen. Nothing it produces lands in a safety function, and nothing it produces overrides a verified PID.
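One way to make that boundary concrete is to encode it as an allow-list that every read and write has to pass through. The sketch below is a minimal illustration, not a description of any particular product: the layer names in `READABLE_SOURCES` and `WRITABLE_TARGETS`, the `Recommendation` type, and the tag names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical layer names, chosen for illustration only.
READABLE_SOURCES = {"historian", "alarm_system", "document_store"}
WRITABLE_TARGETS = {"supervisory_recommendation", "operator_screen"}


class LayerBoundaryError(Exception):
    """Raised when the intelligence layer tries to cross its boundary."""


@dataclass(frozen=True)
class Recommendation:
    target: str   # where the recommendation is delivered
    tag: str      # plant tag the recommendation is about
    message: str  # human-readable advice


def check_read(source: str) -> str:
    # Only the data-layer sources named in the allow-list may be read.
    if source not in READABLE_SOURCES:
        raise LayerBoundaryError(f"read from '{source}' is not allowed")
    return source


def publish(rec: Recommendation) -> Recommendation:
    # Anything outside the allow-list (safety logic, PID outputs) is rejected.
    if rec.target not in WRITABLE_TARGETS:
        raise LayerBoundaryError(f"write to '{rec.target}' is not allowed")
    return rec
```

The point of the design is that the default is refusal: a new write target has to be added to the list deliberately, by someone who can defend it in a review.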
An AI system that needs to write into the safety layer to be useful is not an AI problem. It is a control problem dressed up in machine learning.
Three places AI earns its keep
Here is where the intelligence layer actually pays for itself. These are not theoretical. They are the patterns I keep installing.
One: predicting what the alarm cannot
An alarm tells you a threshold has been crossed. It does not tell you that you crossed it because the bearing has been drifting for three weeks, or that the heat exchanger is fouling in a way that will trip the line by Friday.
A modest anomaly detection model (an ensemble of Isolation Forest plus a small autoencoder is usually plenty), reading vibration, motor current, and process variables, will tell you about the bearing weeks before the threshold is crossed. It will catch the heat exchanger fouling because the differential pressure has been drifting one standard deviation above its usual operating envelope for that load. Neither of these things is visible at the alarm layer. Both are obvious to a model that has been allowed to look at the historian.
On a recent rotating-equipment programme, an ensemble model picked up fourteen incipient faults four to six hours before alarms tripped. F1 of 0.624 — modest by Kaggle standards, very useful by maintenance-team standards.
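The heat exchanger case can be illustrated without any machine learning library at all. The sketch below is deliberately simpler than the Isolation Forest and autoencoder ensemble described above: it is a plain per-load-band baseline with a sustained-drift check, and the function names, the load-band labels, and the `k` and `min_run` defaults are assumptions made for the example.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """Build a per-load-band baseline from a known-healthy period.

    history: list of (load_band, differential_pressure) pairs.
    Returns {load_band: (mean_dp, std_dp)}.
    """
    by_band = {}
    for band, dp in history:
        by_band.setdefault(band, []).append(dp)
    # stdev needs at least two points, so thin bands are skipped.
    return {b: (mean(v), stdev(v)) for b, v in by_band.items() if len(v) >= 2}


def sustained_drift(baseline, recent, k=1.0, min_run=5):
    """True if the last `min_run` readings all sit more than k standard
    deviations above the baseline mean for their own load band."""
    if len(recent) < min_run:
        return False
    for band, dp in recent[-min_run:]:
        if band not in baseline:
            return False  # no healthy reference for this load band
        mu, sigma = baseline[band]
        if dp <= mu + k * sigma:
            return False
    return True
```

Comparing each reading against the envelope for its own load band is what matters here: a differential pressure that is normal at high load can be a clear drift at low load, and a single plant-wide threshold cannot tell the two apart.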
Two: optimising what the operator cannot keep up with
A skilled batch operator can tune a recipe by intuition. A skilled batch operator cannot tune twelve recipes across four product variants and a moving raw material spec without something quietly going wrong on at least one of them.
This is where Bayesian optimisation, or contextual bandits, or any of the modern set-point-search methods, become quietly transformative. You let the model propose set-point changes inside engineering-approved bounds, you keep the operator in the loop with a clear yes-or-no on each suggestion, and you measure the result against the existing recipe. Nothing about ISA-88 breaks. The recipe engine still runs the batch. The optimiser merely suggests the set-points.
I have seen this earn a 5.8% yield improvement on a batch line where the previous decade of plant trials had hit a wall.
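The guard rails around the optimiser matter more than the optimiser itself, so that is the part worth sketching. The code below assumes some search method has already produced a raw suggestion; it only shows the clamp to engineering-approved bounds, the operator's yes-or-no, and the record of the decision. The `Bounds` and `Proposal` types and the tag name in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    low: float
    high: float


@dataclass(frozen=True)
class Proposal:
    tag: str
    current: float
    suggested: float


def make_proposal(tag, current, raw_suggestion, bounds):
    """Clamp whatever the optimiser suggests into the approved window."""
    if bounds.low > bounds.high:
        raise ValueError("bounds.low must not exceed bounds.high")
    clamped = min(max(raw_suggestion, bounds.low), bounds.high)
    return Proposal(tag, current, clamped)


def resolve(proposal, operator_accepts, audit_log):
    """Return the set-point handed to the recipe engine and log the decision."""
    applied = proposal.suggested if operator_accepts else proposal.current
    audit_log.append({
        "tag": proposal.tag,
        "suggested": proposal.suggested,
        "accepted": operator_accepts,
        "applied": applied,
    })
    return applied
```

For example, `make_proposal("TIC-101.SP", 75.0, 84.2, Bounds(70.0, 80.0))` yields a suggestion of 80.0, and if the operator declines, `resolve` hands back the existing 75.0. Either way the recipe engine receives an ordinary set-point and the log keeps both the suggestion and the answer.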
Three: reading what the dashboard cannot show
A plant produces a frankly unreasonable amount of text. P&IDs, FDS, alarm rationalisation tables, SOPs, deviation reports, shift handover notes, vendor manuals from 1994 that nobody has opened since 2002 but that everybody absolutely needs to read before touching that valve.
A retrieval-augmented assistant grounded in this corpus — a proper one, with citations and audit trails, not a chatbot — turns the plant's institutional memory into something queryable. When an engineer asks "what is the failure history of pump P-203" or "what does the SOP say about restarting the dryer after a high-temperature trip", they get an answer with a paragraph they can verify against the source document. This is not glamorous. It is enormously useful.
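The part that separates this from a chatbot is that every answer carries its source. The toy sketch below shows only that idea: passages are ranked by simple word overlap (a real system would more likely use embeddings and a language model on top), and each hit is returned with a document identifier and paragraph number. The `Passage` type and the document identifiers in the usage note are invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str     # source document identifier
    paragraph: int  # paragraph number inside that document
    text: str


def tokens(text):
    # Lower-cased alphanumeric words; "P-203" becomes {"p", "203"}.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, corpus, top_k=1):
    """Rank passages by how many query words they share; drop non-matches."""
    q = tokens(query)
    scored = [(len(q & tokens(p.text)), p) for p in corpus]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def cite(passage):
    """Format a passage with a citation the engineer can check."""
    return f"{passage.text} [{passage.doc_id}, para. {passage.paragraph}]"
```

With a corpus containing `Passage("MAINT-LOG-P203", 2, "Pump P-203 mechanical seal replaced after vibration alarm.")`, the query "what is the failure history of pump P-203" retrieves that passage and `cite` appends its source. A query that matches nothing returns an empty list, which is the honest answer when the corpus has nothing to say.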
Three places AI does not belong
I am going to be unfashionable here. These are the places I have watched well-meaning AI projects fail, and they fail for structural reasons, not engineering reasons.
One: inside the safety function
The safety instrumented system is certified. Its SIL is determined. Its proof test interval is calculated. Its logic solver is locked. There is no version of "let the neural network decide whether to trip the boiler" that survives a HAZOP review, and there should not be. If you have a problem that feels like it needs AI in the safety layer, you have a problem you have not properly defined yet.
Two: at the bottom of a tuning problem
If a PID loop is oscillating, you do not need a neural network. You need to tune the loop. There is a long tradition in industry of reaching for AI before reaching for the basics, and it is always a sign that the basics have been skipped. Get the loop tuned, get the alarm thresholds right, get the SCADA standards in place. If, after that, the loop is still struggling, then maybe you have an MPC problem. And maybe an MPC with a learned residual is the right answer to it. But not before.
Three: in place of an honest historian
If your data is bad, no model will save you. I have seen organisations spend large sums on machine learning while their historian was running on a five-minute interpolated sample rate, with half the tags unconfigured and the other half misnamed. The model will dutifully learn the noise. Fix the data layer first. The intelligence layer earns nothing it has not been honestly fed.
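A few cheap checks on the historian export will usually reveal this before any model is trained. The sketch below assumes timestamps in seconds and a list of values per tag. The function names and the 60-second `max_interval_s` default are illustrative assumptions, not a standard, and the interpolation test is a simple heuristic that looks for runs of identical steps.

```python
from statistics import median


def median_interval_s(timestamps):
    """Median gap in seconds between consecutive samples."""
    if len(timestamps) < 2:
        raise ValueError("need at least two samples")
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return median(gaps)


def looks_interpolated(values, tol=1e-9, min_run=4):
    """True if at least `min_run` consecutive steps are identical, which
    means the points lie on a straight line (flat-lined tags count too)."""
    steps = [b - a for a, b in zip(values, values[1:])]
    run = 1
    for prev, cur in zip(steps, steps[1:]):
        run = run + 1 if abs(cur - prev) <= tol else 1
        if run >= min_run:
            return True
    return False


def audit_tag(timestamps, values, max_interval_s=60.0):
    """Return a list of data-quality issues found for one tag."""
    issues = []
    if median_interval_s(timestamps) > max_interval_s:
        issues.append("sample interval too coarse")
    if looks_interpolated(values):
        issues.append("values look interpolated")
    return issues
```

Running something like this across every tag the model is meant to read gives a short list of things to fix in the data layer, which is a far cheaper finding than a model that has learned the interpolation.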
A short pre-flight checklist
If you are about to put AI into a plant, the questions I would ask, in order:
- Is this a control problem or an intelligence problem? If the answer needs to be acted on in milliseconds and is safety-relevant, it is control. Stop. If the answer is a suggestion, a forecast, or a flag for an engineer, you are in intelligence territory.
- Where does the model read from, and where does it write to? Read from the historian. Write to a human screen or a non-critical supervisory tag. Make this explicit before you start.
- What does the operator do with the output? If you cannot finish that sentence, the model is not useful yet.
- What is the cost of being wrong? An anomaly detector that cries wolf will be muted within a week. Calibrate against the operational cost of false positives, not just the F1 score on your test set.
- Can the operator argue with the model? If the explanation is "the algorithm said so", the model will lose the argument the first time it is wrong. Build in a way to ask why.
- What is the audit trail? Recommendations, set-point suggestions, alarms — all of them need a record that an auditor can follow. This is non-negotiable in regulated industries and shockingly easy to forget.
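The question about the cost of being wrong can be made concrete with a small calculation. The sketch below picks an alert threshold by minimising operational cost on labelled history instead of maximising F1. The function names are hypothetical, and `cost_fp` and `cost_fn` are placeholders for whatever a false call-out and a missed fault actually cost on a given plant.

```python
def operational_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total cost of alerting whenever score >= threshold.

    scores: anomaly scores, higher means more suspicious.
    labels: 1 if a real fault followed, 0 otherwise.
    """
    pairs = list(zip(scores, labels))
    false_pos = sum(1 for s, y in pairs if s >= threshold and not y)
    false_neg = sum(1 for s, y in pairs if s < threshold and y)
    return false_pos * cost_fp + false_neg * cost_fn


def best_threshold(scores, labels, candidates, cost_fp, cost_fn):
    """Candidate threshold with the lowest operational cost."""
    return min(
        candidates,
        key=lambda t: operational_cost(scores, labels, t, cost_fp, cost_fn),
    )
```

The same model and the same data give different answers depending on the costs: when a missed fault is far more expensive than a wasted inspection, the lowest threshold wins, and when nuisance alerts are what get the detector muted, a stricter threshold wins.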
Above the loop, never inside it
The phrase I keep coming back to with clients is the one in the title. Above the loop, never inside it. Industrial AI's job is to make the loop better — by predicting what it cannot see, by tuning what it cannot tune, by remembering what it cannot remember. Not to replace it.
The PLC is not going anywhere. That is not a defensive statement. It is a structural one. The PLC is good at what it does precisely because it is constrained, deterministic, and validated. The intelligence layer is good at what it does precisely because it is none of those things — it is exploratory, probabilistic, and improving every time it sees the plant.
The most interesting work in industrial automation right now is happening at the boundary between those two worlds. Done well, it is genuinely transformative. Done badly, it is unsafe. The difference is whether you understand the layering.
That is the engineering. The rest is the storytelling.