Davor Cukeric
Human-in-the-loop · July 2026 · 5 min read

Being the human in the loop is exhausting — and that's the design flaw, not yours

New research says overseeing AI is the most mentally taxing way to use it. If oversight assumes a reviewer with infinite vigilance, it was never real oversight to begin with.

I wrote a short note a few weeks ago arguing that a human in the loop has to be able to say no, or the whole idea is decoration. I still believe that. What I didn’t reckon with, until I read the research, is how much it costs the person actually holding that line — and what that cost means for whether the line holds at all.

In March, BCG and Harvard Business Review published a study of nearly 1,500 full-time workers, and the finding that stopped me was specific: overseeing AI tools — not just using them, watching them — was the single most mentally taxing form of AI engagement measured. Workers with heavy oversight loads reported 14% more mental effort, 12% more mental fatigue, and 19% more information overload than workers using AI more lightly. The researchers called the resulting state “brain fry,” and it wasn’t a vague complaint: those affected showed 33% more decision fatigue, and their error rates — both minor and major — climbed alongside it.

Vigilance isn’t a free resource

This matters because almost every serious AI governance framework, mine included, leans on the same load-bearing assumption: a human is watching, and that human will catch the thing that matters. It’s the right idea. But it quietly assumes vigilance is a switch, on or off and available whenever needed, rather than what the research says it actually is: a resource that depletes hour after hour and decision after decision, the same way any other form of sustained attention does. A reviewer at the start of a shift and the same reviewer six hours and forty approvals later are not equally capable of catching the one recommendation that’s actually wrong. The framework doesn’t know the difference. It just says “reviewed.”

Put concretely: a fraud-review queue that flags one transaction in five hundred as suspicious relies on the reviewer being just as sharp on transaction four hundred and ninety-nine as on transaction one. A content-moderation queue relies on the same thing. So does a queue of AI-agent approval requests, and yet almost none of them are designed with that reviewer’s fortieth case of the week in mind; they’re designed for the reviewer’s first.

None of this is really about carelessness. Researchers who study automation complacency keep landing on the same point: the failure is attentional, not a lack of knowledge. People who could absolutely explain why a recommendation is wrong still miss it, not because they don’t know better, but because the conditions of watching — the repetition, the pace, the sheer volume — suppress the vigilance needed to apply what they know. A tired expert and a careless novice can produce the identical miss, for entirely different reasons, and a system that only measures whether review happened will never tell them apart.

The fairness cost, not just the accuracy cost

Stanford’s 2026 AI Index adds a piece I hadn’t fully appreciated: it isn’t only that oversight fails quietly under fatigue; people can feel the difference even when it doesn’t fail. Fully delegated AI decisions are perceived as procedurally less fair than ones where a human stayed meaningfully involved, and that perceived unfairness erodes trust on its own, independent of whether the outcome was actually worse. Nor does risk hold steady as autonomy increases. It climbs steadily, with semi-autonomous setups that preserve real oversight landing in a meaningfully better place than fully autonomous ones. That is exactly why the quality of that oversight, not just its presence, is the thing worth protecting.

What this means for how oversight gets built

If vigilance is finite, then a governance system that treats it as free is making a design error, not a personnel error. The fix isn’t “try harder” or “hire more careful people.” It’s engineering the review itself around the limit. Cap how many decisions one person reviews before a break, and make that break mandatory, not optional. Route the genuinely uncertain cases to a fresh reviewer instead of whoever is next in the queue. Make the easy, obviously fine cases fast and light, specifically so the hard ones don’t arrive at an already-depleted reviewer wearing the same low-friction interface as everything before them. It’s the same instinct behind building different governance tiers for different AI agents: matching the amount of scrutiny to the actual stakes, rather than assuming every case deserves, or can sustain, identical attention. None of this is exotic engineering. It’s the same discipline air traffic control and hospital shift design already apply to human attention, treating it as the scarce, replenishable thing it is rather than assuming it away.
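To make those three rules concrete, here is a minimal sketch of a review router that enforces them in code rather than leaving them to the reviewer's willpower. Everything in it is an illustrative assumption, not a value or API from the research cited above: the names `Reviewer`, `Case`, and `route` are hypothetical, the 40-decision cap and 0.3 uncertainty threshold are placeholders, and it assumes the AI system attaches an uncertainty score and a stakes flag to each case.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical limits for illustration only; real values would need tuning per domain.
MAX_DECISIONS_BEFORE_BREAK = 40   # hard cap: a reviewer gets no more cases past this
UNCERTAINTY_THRESHOLD = 0.3       # cases at or above this go to the "careful" lane


@dataclass
class Reviewer:
    name: str
    decisions_since_break: int = 0

    def needs_break(self) -> bool:
        # The cap is enforced by the router, not left to the reviewer's judgment.
        return self.decisions_since_break >= MAX_DECISIONS_BEFORE_BREAK

    def take_break(self) -> None:
        # A break restores capacity; here that simply resets the counter.
        self.decisions_since_break = 0


@dataclass
class Case:
    case_id: str
    uncertainty: float          # hypothetical score in [0, 1] supplied by the AI system
    high_stakes: bool = False   # hypothetical flag for cases that always need care


def route(case: Case, reviewers: List[Reviewer]) -> Optional[Tuple[str, str]]:
    """Return (lane, reviewer_name), or None if every reviewer is due a break."""
    available = [r for r in reviewers if not r.needs_break()]
    if not available:
        return None  # hold the case rather than hand it to a depleted reviewer

    if case.high_stakes or case.uncertainty >= UNCERTAINTY_THRESHOLD:
        # Hard cases go to the freshest reviewer, not whoever is next in line.
        chosen = min(available, key=lambda r: r.decisions_since_break)
        lane = "careful"
    else:
        # Easy cases stay fast and light: the next available reviewer takes them.
        chosen = available[0]
        lane = "fast"

    chosen.decisions_since_break += 1
    return lane, chosen.name


if __name__ == "__main__":
    team = [Reviewer("ana", decisions_since_break=35), Reviewer("ben", decisions_since_break=3)]
    print(route(Case("tx-1", uncertainty=0.05), team))  # ('fast', 'ana')
    print(route(Case("tx-2", uncertainty=0.8), team))   # ('careful', 'ben')
```

The design choice worth noticing is where the break lives: a reviewer at the cap simply stops receiving cases, and if everyone is at the cap the case waits instead of landing on someone who is already worn down. The uncertain case in the usage example skips "ana", who is next in line but 36 decisions deep, and goes to "ben", who is fresher.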

A reviewer who waves the hard case through isn’t lazy. They’re just human, and the design didn’t account for that.

I don’t think this is only a workplace-AI problem, either. The same depletion that makes the fortieth approval worse than the first shows up anywhere we ask ourselves to stay vigilant for a long time without acknowledging the cost of it — in relationships, in parenting, in any commitment that asks for continuous judgment rather than a single decision made once and left alone. Protecting your own capacity to say no, deliberately, before you’re worn down enough that saying yes is just easier, might be the most transferable lesson in here. It’s not really a lesson about AI. AI just made it impossible to keep ignoring.

Written by Davor Cukeric, an AI builder, systems integrator, and problem solver in Ottawa, Canada, working on AI that earns its trust.