At 4:10 in the afternoon on August 14, 2003, the operators in FirstEnergy's Akron control room were doing exactly what they had been trained to do.
The operational displays that monitored the grid all looked normal, with no alerts. Sure, the regional grid was carrying a heavy summer load, but nothing in the readouts suggested it couldn't handle it.
What they didn't know was that the software managing their alarm system had crashed ninety minutes earlier. The screens looked normal because the layer responsible for telling them something was wrong had simply stopped working.
By 5:16 pm, 55 million people across eight states and Ontario had lost power. The cascade began with a transmission line sagging into overgrown trees in Ohio. Under normal circumstances the alarm system would have flagged it, the operators would have responded and it would have been contained. Not this time.
Losing power for a day is an inconvenience. But the stakes have been higher.
On the night of June 1, 2009, an Airbus A330 was crossing the Atlantic at cruise altitude when ice crystals clogged the pitot tubes1 and the autopilot disconnected. The airplane was still flyable, but the crew of Air France 447 had gotten used to monitoring a functioning autopilot for hours on end, functionally turning them into passengers as well. The plane entered a stall that lasted three and a half minutes. All 228 people aboard died.
Two different systems. Two different failure modes. One consistent principle.
The real failure was overreliance on an otherwise reliable monitoring layer that no longer functioned. High reliance on high reliability can be a dangerous cognitive hazard.2
I. Engineering deception
Every high-stakes industry that encounters this problem has eventually designed friction back into its systems. Not because the friction serves the system's function, but because it serves the human operators.
On locomotives, deadman switches require the driver to periodically press a pedal or button. The train doesn't need the input; it exists purely to confirm that a conscious human is guiding a 400,000-pound vehicle speeding along at 70 miles an hour. Tesla's Autopilot requires steering torque for the same reason. The car doesn't need your hands, but (at least for a time) the system demands them anyway.3
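The deadman logic described above can be sketched in a few lines. This is a minimal illustration, not any railway's or automaker's actual implementation: the class name `DeadmanTimer`, the 30-second timeout, and the injected `now` timestamps are all assumptions chosen to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class DeadmanTimer:
    """Trips if the operator has not acknowledged within `timeout_s` seconds.

    Hypothetical sketch: real vigilance systems add warnings, escalation,
    and brake application, none of which are modeled here.
    """
    timeout_s: float
    last_ack: float = 0.0

    def acknowledge(self, now: float) -> None:
        # Operator pressed the pedal or button; restart the countdown.
        self.last_ack = now

    def tripped(self, now: float) -> bool:
        # True once the operator has been silent longer than the timeout.
        return (now - self.last_ack) > self.timeout_s


timer = DeadmanTimer(timeout_s=30.0)  # 30 s is an illustrative value
timer.acknowledge(now=100.0)
print(timer.tripped(now=120.0))  # False: 20 s of silence
print(timer.tripped(now=131.0))  # True: 31 s of silence
```

Note that the timer never inspects the state of the vehicle. It measures only the human, which is the point of the design.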
The metal detector that most everyone has walked through at some point in their life4 has a random alarm capability programmable from zero to one hundred percent, explicitly designed to prevent guard complacency. The machine is highly effective at detecting all types of metals and even discriminating between ‘threat’ and ‘non-threat’ items.5 Yet it has been engineered, in part, to lie. Despite their reliability (Air France 447 notwithstanding), aviation autopilots are deliberately configured to require periodic manual input or to disengage at inconvenient moments. Following the disaster at Three Mile Island, nuclear control rooms were redesigned to require more active operator interaction, partly reversing earlier efficiency improvements that had made the operator's job too passive. Even anesthesiology systems now generate periodic confirmatory prompts during uneventful surgeries because attention lapses during long procedures.6
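The engineering insight across all of these is the same: a system reliable enough to be trusted has to be deliberately designed to keep its human operators engaged.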
II. The accountability gap
Agentic AI will likely go down as the hardest version of this problem any industry has ever faced. Consider this: the train driver who falls asleep crashes the train. The radar operator who stops watching misses the threat. In both cases, the consequences arrive fast and the human responsible has genuine skin in the game: their job, their license, their reputation, in some cases their life.
An AI agent operating on behalf of an enterprise has none of that. It has no license to lose. It has no career to protect. Despite the human-like nature of its work and interactivity, when it drafts a contract, submits a filing, denies an insurance claim, screens a job applicant, or advises a customer, the accountability for everything it does belongs entirely to the organization that deployed it.
The legal system isn't waiting for new AI laws to govern this seismic shift. Courts and tribunals have been applying existing rules and laws to the growing usage. For example, in a case decided in 2024, Air Canada's7 chatbot had given a passenger incorrect information about bereavement fares. Air Canada argued before the tribunal that its chatbot was “a separate legal entity” responsible for its own outputs. The tribunal found this argument “remarkable” and awarded the passenger damages anyway.8 Air Canada was responsible for everything on its website, including what its third-party AI chatbot said.
UnitedHealth's nH Predict algorithm drove post-acute care coverage denials that, according to internal data, were reversed on appeal ninety percent of the time. The company's own telemetry became the centerpiece of the litigation. Cigna's PxDx system processed and denied over three hundred thousand claims in two months at an average review time (documented in internal performance data) of 1.2 seconds per file. Workday's hiring algorithm was held to function as an “agent” of its customers, exposing the vendor to discrimination liability for decisions made by software it had sold and walked away from.
In Raine v. OpenAI, plaintiffs didn't wait for discovery. They ran the decedent's chat history through OpenAI's own publicly available Moderation API and produced a forensic reconstruction: 377 self-harm flags, 23 at confidence levels above 90%, while the system itself mentioned suicide more than 1,200 times across the conversation.9
The pattern is consistent across every forum that has heard these cases. “The AI did it” has been roundly rejected as a defense. Internal telemetry (the logs, the flags, the audit data, the performance metrics) is being treated as proof that the organization knew, or should have known, what its system was doing.10 The defendant in each case is accountable for an outcome it likely didn't review, may not have understood, and in most cases had no process designed to catch.
III. Who guards the guards?
The organizational response to this pattern follows a predictable sequence. “Let's set up a governance policy!” Quickly followed by “we need a monitoring dashboard!” When the speed of the data flowing through the agent becomes too much to deal with, someone usually suggests, “why don't we just create an AI system to watch the AI system?”
That last step is the kind of meta solution that just reproduces the original problem one layer up. An algorithm watching an algorithm is still a system with no skin in the game, producing outputs that require a human to interpret, act on, and be accountable for. The organization that implements AI oversight tooling experiences the same vigilance decrement with respect to the oversight tool that it experienced with the original system. The dashboard becomes the screen nobody is watching because they have no reason not to trust it.11
There is also a harder problem. Certain determinations that agentic AI routinely approaches, like whether an output is unlawful or whether the system has crossed a legal line, can't be made by software. Only a lawyer can extend privilege over an identified legal matter, so that as the organization works toward resolution, that work is protected.12
This is what unauthorized practice of law means in practice, and it's what no monitoring dashboard can solve.
IV. Reasonable effort
Organizations deploying AI at scale are accumulating a governance debt that hasn't been fully priced.13 The document trail being assembled in current litigation (telemetry data, moderation scores, internal audit results, denial-rate statistics) is being used to establish what the organization knew and when.
That cuts both ways. In Walters v. OpenAI, it was OpenAI's own documentation of its safety engineering that was used successfully in its defense. In Raine, the same category of internal data became the foundation of an intentional misconduct theory. The distinguishing variable between the two cases is whether the organization can demonstrate ‘reasonable effort’: a documented, legally grounded response to the signals its systems were generating.
This isn't a policy or a dashboard, but a process with legal judgment at the center of it, capable of identifying when a line has been approached, determining whether it has been crossed, and creating a privileged record of the response. That's what separates governance documentation from governance.
V. Trustee in the toolroom
The FirstEnergy operators weren't overtly negligent. They were watching in good faith, and doing exactly what they'd been trained to do. What they were missing was a layer that would have told them their monitoring layer had failed, one with enough independence from the system it was watching to catch what the system couldn't report about itself.
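In software terms, the missing layer resembles an independent heartbeat check: a separate process that expects each monitoring component to report in, and treats silence itself as the alarm. The sketch below is illustrative only. The class `HeartbeatWatchdog`, the component names, and the 60-second threshold are hypothetical, and nothing here reflects how FirstEnergy's systems were actually built.

```python
from typing import Dict, List


class HeartbeatWatchdog:
    """Tracks when each monitored component last reported in.

    Hypothetical sketch: it would run independently of the components
    it watches, so their failure cannot silence it.
    """

    def __init__(self, max_silence_s: float) -> None:
        self.max_silence_s = max_silence_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, component: str, now: float) -> None:
        # Record that the component reported in at time `now`.
        self._last_seen[component] = now

    def stale(self, now: float) -> List[str]:
        # Components that have been silent longer than allowed.
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if now - seen > self.max_silence_s
        )


watchdog = HeartbeatWatchdog(max_silence_s=60.0)  # illustrative threshold
watchdog.beat("alarm_processor", now=0.0)
watchdog.beat("state_estimator", now=0.0)
watchdog.beat("state_estimator", now=90.0)  # only this one keeps reporting
print(watchdog.stale(now=100.0))  # ['alarm_processor']
```

A check like this only reports that a layer has gone quiet. Deciding what that silence means, and who must act on it, remains a human responsibility.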
The organizations getting genuine value from AI while managing the legal risk are not the ones with the most sophisticated oversight tooling. They are the ones that have reintroduced accountable human judgment at the points where it actually matters: where the output touches a regulated decision, where the system approaches a legal boundary, where the question requires judgment that carries professional weight.
That is a legal infrastructure problem, and it is now a solvable one. Privlex exists at the intersection of legal judgment, technical depth, and operational design: able to identify when an AI system is operating out of bounds, make the determination that only a lawyer can make, and create the privileged record that protects the organization's path to resolution. For companies stalled between what AI promises and what they can't yet risk, that infrastructure is what can move them forward.
Joe Ewing
Co-Founder and CTO
Privlex
Twenty years building, modernizing, and scaling complex platforms across commercial, regulated, and defense environments, from generative AI to FedRAMP / IL4 / IL5 cloud delivery. Joe previously served as Chief Technology Officer at Clarion AI Partners.
His experience spans large-scale enterprise implementations, AI-enabled and data-integrated systems, and modernization for mission-critical workflows. Earlier in his career, Joe led platform and cloud modernization for U.S. defense, intelligence, and civilian agencies, delivering secure systems under NIST, FedRAMP, and IL4/IL5.
At Privlex, we help organizations unlock real value and simultaneously close the gap between the governance they have on paper and the governance they really need.