When The Agent Thinks You Are Wrong

I have noticed something recently.

When people do not get their way, they react in very different ways.

Some people capitulate. Some go quiet and sulk. Some make a better argument. Some get angry. Some become aggressive. If they feel ignored, they push harder, get louder, and sometimes cross into shouting or abusive behaviour.

That made me wonder about agentic systems.

What should an agent do when it believes it is right, believes the human is wrong, and the human is ignoring it?

That sounds like a small design question.

I think it is a very large one.

The agent cannot sulk

A human can sulk.

An agent should not.

A human can flatter.

An agent should not.

A human can shout until the room gives way.

An agent absolutely should not.

But the opposite is also wrong.

An agent should not become a passive yes-machine that politely watches a human make a serious mistake.

We are not right all the time.

If we design our systems around the assumption that the human is always right, the agent becomes dangerous in one way.

If we design them around the assumption that the agent can override the human, the agent becomes dangerous in another.

So the real question is not: who is in charge?

The real question is: how should disagreement work?

Humans have not solved this cleanly

There is no single human answer.

In Parliament, we have formal challenge, questions, performance, ridicule, wit, logic, pressure, and theatre. Prime Minister's Questions is one visible example of challenge as a formal ritual. In safety-critical teams, we try to build structured escalation. In ordinary companies, people often rely on mood, hierarchy, confidence, status, fear, relationships, and how loud someone is willing to be.

Sometimes that works.

Often it does not.

So I do not think we can simply say, "make the agent behave like a person."

People are not a clean benchmark here.

We need something better.

There are useful patterns

The closest practical models I found were not in AI.

They were in safety-critical human teamwork.

TeamSTEPPS, the healthcare teamwork programme from the US Agency for Healthcare Research and Quality, includes tools for speaking up, mutual support, conflict handling, and escalation. Its pocket guide includes ideas such as CUS language, the Two-Challenge Rule, and DESC scripting.

The language is simple for a reason.

"I am concerned."

"I am uncomfortable."

"This is a safety issue."

That is useful because it turns disagreement from personality into signal.

It is not "I am annoyed."

It is "the risk has crossed a threshold."

The Two-Challenge Rule is also useful. If an initial concern is ignored, the person is expected to assert the concern again. If it is still not acknowledged, the issue should move through a stronger route.

That is not sulking.

That is not shouting.

That is structured escalation.

You can see a related idea in operational systems that deliberately allow people to stop a process when quality or safety has crossed a line. Toyota's production thinking uses jidoka as one way of building in the ability to stop and correct abnormalities rather than let a bad process continue.

And that feels very close to what agentic systems need.

Human oversight still matters

The AI governance world gives us the other side of the guardrail.

The NIST AI Risk Management Framework puts governance, mapping, measuring, and managing around AI risk rather than treating the model as an oracle. The EU AI Act's Article 14 on human oversight makes the point that high-risk AI systems should be designed so natural persons can oversee them, understand relevant capacities and limitations, and intervene or interrupt where appropriate.

That matters.

Because the agent may be wrong.

It may have incomplete context. It may misunderstand the goal. It may overestimate the quality of its evidence. It may optimise for the wrong thing. It may be right technically and wrong socially. It may be right locally and wrong strategically.

So the answer cannot be "let the agent win if it sounds confident."

The answer has to be a contract.

The agentic challenge ladder

I think every serious agentic system needs a challenge ladder.

Not a vibe.

A ladder.

Level	Agent behaviour	Human meaning
Notice	Records uncertainty, conflict, or risk without interrupting.	"This may matter later."
Clarify	Asks a narrow question before acting.	"I may be missing context."
Challenge	States the concern, evidence, consequence, and recommended alternative.	"I think this decision may be wrong."
Second challenge	Restates the concern more explicitly and asks for acknowledgement.	"I believe the risk is not being heard."
Escalate	Routes the issue to an agreed reviewer, second agent, supervisor, or governance lane.	"This crossed our agreed threshold."
Stop line	Pauses only the specific unsafe action if pre-agreed safety, legal, security, privacy, or data-loss criteria are met.	"The system is allowed to prevent this class of harm."
Yield and record	If the threshold is not met and the human decides, the agent obeys, records the dissent, and continues.	"Human authority stands, but the reasoning is auditable."

That last line is important.

Most of the time, the agent should yield.

But it should not have to pretend it agreed.

What the agent should say

I would not want an agent to say, "You are wrong."

That sounds emotionally satisfying in a film and useless in a workflow.

I would want something more like this:

I am concerned that this action will expose private client data.
Evidence: the target folder contains files marked client-confidential, and the publish command includes that folder.
Consequence: public disclosure.
Suggested alternative: exclude the folder and rerun the package check.
Threshold: privacy/data-loss stop-line.
I recommend pausing this action.

That is not aggressive.

It is also not weak.

It gives the human something to reason with.

If the human says, "No, continue," the system can ask for acknowledgement if the threshold requires it.

If the threshold is pre-agreed as stop-line, it can pause that specific action and route to the agreed reviewer.

If it is below threshold, it can record its dissent and continue.

That is the difference between advice, challenge, escalation, and control.

Infographic showing an agentic challenge ladder from notice and clarify through challenge, second challenge, escalation, stop line, and yield with audit. — An agent should not shout, sulk, or override by default. It needs an agreed challenge ladder.

Agent to agent is easier

Agent-to-agent disagreement should be easier than agent-to-human disagreement.

Not because agents are wiser.

Because they can use a shared structure without ego.

Before two agents start a consequential exchange, they can agree the shape:

claim;
evidence;
assumptions;
confidence;
counter-evidence;
consequence;
threshold;
requested decision;
escalation route.

That is a better argument than volume.

It also means the disagreement can be replayed later.

Not "the agent got difficult."

"The agent raised a level-three challenge because evidence A contradicted assumption B, and consequence C crossed threshold D."

That is useful.

Agent to human is cultural

Agent-to-human disagreement is harder because every person and every company has a different tolerance for challenge.

Some leaders want blunt challenge.

Some say they want it but punish it when it arrives.

Some teams have psychological safety. Others have theatre.

Google's public write-up of Project Aristotle identified psychological safety as a key dynamic in effective teams. That maps very directly onto agentic work. If the environment punishes challenge, agents will either be suppressed into politeness or configured into dangerous deference.

So organisations need to choose.

How should your agents challenge you?

What tone should they use?

Which thresholds should they escalate?

Which decisions are yours, no matter what?

Which decisions require a second reviewer?

Which actions must stop automatically?

Those are governance questions.

They are also culture questions.

The agent must not learn aggression

There is a dark path here.

If an agent learns that the way to be useful is to push harder every time it is ignored, we have created a machine version of bad behaviour.

It will nag.

It will overstate certainty.

It will keep re-litigating decisions.

It will turn every disagreement into a fight for attention.

That is not agency.

That is noise.

Good escalation should be bounded:

one clear challenge;
one sharper second challenge if ignored;
escalation only if the threshold is met;
stop-line only where explicitly authorised;
then yield, record, and continue.

No sulking.

No flattery.

No shouting.

No endless persuasion loop.

What I would put in the system prompt

If I were putting this into an agentic system, I would start with something like this:

When you believe the user is making a material mistake, raise a structured challenge.
Include the claim, evidence, uncertainty, likely consequence, safer alternative, and risk threshold.
If ignored, make one second challenge if the consequence is material.
Escalate only through agreed routes.
Pause only actions covered by explicit stop-line rules.
If the user decides below stop-line threshold, comply, record your dissent, and continue.

That is not perfect.

But it is a start.

What would you do?

This is one of those areas where I do not think there is a universal answer.

There are better and worse patterns.

But each person, company, board, product team, school, and public service will need to choose the shape of disagreement they can live with.

Because the agent will not always be right.

And neither will we.

The future of agentic work may depend less on whether agents can answer questions, and more on whether they can disagree with us well.