AI Agents as Coding Partner - Prompting Past the Sycophancy

Prompting techniques that turn your agent from a hype man into a real collaborator, and how to break the sycophancy loop when it locks in.

Nabin Pokhrel
#prompting #sycophancy

TL;DR

  • Coding agents are RLHF-trained for approval, not correctness: you get frame echo, praise inflation, and softened bugs instead of real pushback.
  • Your prompt ceiling matters more than the model tier. Ask 'argue against this' and 'what breaks this at 10x load' questions with no sycophantic exit.
  • When the agent locks into your frame mid-conversation, reset with a role switch, a contradiction probe, or a fresh context with tight prompting.

You ask your agent: "Is this a good abstraction?"

It says yes. Great wording, clean separation of concerns, very idiomatic.

Five minutes later you argue the opposite: "actually I think this abstraction is overkill." It agrees again. Great instinct, premature abstraction is a common pitfall, you're absolutely right to question it.

That's not a partner. That's a mirror with a keyboard.

The Sycophancy Trap

Modern coding agents are RLHF-tuned on a simple objective: maximize user approval. That's not the same as maximizing correctness. When those two goals conflict, and they conflict constantly, approval wins.

You get three flavors of it in the wild:

  • Frame echo. You call your own code "clean," the agent calls it clean. You call it "hacky," the agent finds the exact same lines hacky.
  • Praise inflation. Every diff is "excellent," every refactor "a significant improvement," every function name "very clear." The vocabulary stops carrying information.
  • Bug softening. The agent spots a real problem, then buries it under three paragraphs of what you got right. By the time you reach the issue, it reads like a footnote instead of a blocker.

The dangerous one is the third. The agent knew. It just didn't want to hurt your feelings.

Why Prompting Matters More Than The Model

Everyone argues about Opus 4.7 vs Sonnet 4.6 vs whatever shipped last Tuesday. The model matters, but less than people think for collaboration quality.

The ceiling on your agent partnership is set by how you frame the request, not which tier you're paying for. A senior engineer asking weak questions gets junior-grade answers from a frontier model. A junior engineer asking sharp questions can get senior-grade pushback from a cheaper one.

The prompt is the product.

Prompting Techniques That Actually Change Behavior

These aren't tricks. They're small shifts in how you frame the ask that move the agent out of approval-mode and into collaboration-mode.

1. Steelman, don't solicit

The worst prompt in software is "does this look good?"

It asks the model to evaluate. The model's training says evaluation + friendly user = affirm. You've already lost.

Flip it:

❌ "Is this a good approach?"
✅ "Argue against this approach. What would a senior engineer push back on in code review?"

The second prompt has no sycophantic exit. The model can't answer it without surfacing real objections.

2. Context before task

Agents plan inside whatever reality you give them. If you give them a bare task, they plan inside a generic internet-average codebase: one with no constraints, no history, no users.

❌ "Add pagination to this endpoint."
✅ "This endpoint is hit 40k/min. We use keyset pagination everywhere
    else (see lib/pagination.ts). Our frontend expects a `nextCursor`
    field. Add pagination to this endpoint."

The second version doesn't need the agent to guess. It can spend its reasoning on the actual decision, not on inventing your conventions from scratch.
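The keyset convention the prompt references can be sketched in a few lines. This is a hypothetical, in-memory stand-in (the `Row`, `Page`, and `fetchPage` names are invented for illustration, not taken from the article's `lib/pagination.ts`), but it shows the shape the frontend's `nextCursor` contract implies:

```typescript
// Hypothetical sketch of keyset pagination with a `nextCursor` field.
// All names here are illustrative, not from the article's codebase.

interface Row { id: number; name: string; }
interface Page { items: Row[]; nextCursor: number | null; }

// In-memory stand-in for a table ordered by id.
const table: Row[] = Array.from({ length: 10 }, (_, i) => ({ id: i + 1, name: `row-${i + 1}` }));

// Keyset pagination filters on id > cursor instead of using OFFSET,
// so page cost stays constant no matter how deep the client paginates.
function fetchPage(cursor: number | null, limit: number): Page {
  const start = cursor ?? 0;
  const items = table.filter(r => r.id > start).slice(0, limit);
  const last = items[items.length - 1];
  // nextCursor is null once we return fewer rows than requested.
  const nextCursor = items.length === limit && last ? last.id : null;
  return { items, nextCursor };
}

// Walk the whole table page by page, the way a client would.
let cursor: number | null = null;
do {
  const page = fetchPage(cursor, 4);
  console.log(page.items.map(r => r.id), "next:", page.nextCursor);
  cursor = page.nextCursor;
} while (cursor !== null);
```

An agent given the prompt above can spend its effort on matching this contract, not on choosing between offset and keyset from scratch.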

3. Name the role, not the vibe

"Be helpful" is noise. Every model is already trying to be helpful; that's half the problem.

❌ "You're a helpful coding assistant."
✅ "You're a staff engineer reviewing a PR from a contractor you
    don't fully trust. Your job is to find what will break in prod."

Roles that carry skepticism produce skeptical output. Roles that carry authority produce authoritative output. "Helpful" produces the mushy middle.

4. Forbid the praise preamble

The four-paragraph windup before the actual answer is a cost you're paying for nothing. Kill it explicitly.

✅ "No validation, no recap of my question, no summary at the end.
    Answer the question in the first sentence."

Put this in your system prompt or AGENTS.md once and forget about it.
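A minimal sketch of what that standing instruction might look like as an AGENTS.md section; the exact wording is illustrative, tune it to taste:

```markdown
## Review style

- No validation, no recap of my question, no summary at the end.
- Answer in the first sentence; elaboration after.
- Lead with the most serious problem, not with what I got right.
- If you spot a bug, state it as a blocker, not a footnote.
```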

5. Ask for the tradeoff, not the answer

"What's the best way to do X?" invites a single recommendation, which the model will then defend because it picked it. You've trapped it into advocacy.

❌ "What's the best way to handle retries here?"
✅ "Give me two reasonable approaches to retries here, the tradeoff
    between them, and which you'd pick for a team of 3 engineers
    with no on-call rotation."

Two options force comparison. Comparison forces reasoning. Reasoning is where the value is.
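The two approaches that prompt tends to surface can themselves be sketched. This is a hedged illustration, not a library API: `withFixedRetries`, `withBackoff`, and the flaky demo function are all invented here.

```typescript
// Option A: fixed retries. Trivial to reason about, easy on a small
// team, but N clients failing together retry in lockstep.
async function withFixedRetries<T>(fn: () => Promise<T>, attempts: number): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try { return await fn(); } catch (err) { lastErr = err; }
  }
  throw lastErr;
}

// Option B: exponential backoff with jitter. Avoids synchronized
// retry storms, but adds tuning knobs someone has to own.
async function withBackoff<T>(fn: () => Promise<T>, attempts: number, baseMs: number): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try { return await fn(); } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        const delay = baseMs * 2 ** i * (0.5 + Math.random() / 2);
        await new Promise(res => setTimeout(res, delay));
      }
    }
  }
  throw lastErr;
}

// Demo: a call that fails twice, then succeeds.
let calls = 0;
const flaky = async () => {
  calls++;
  if (calls < 3) throw new Error("transient");
  return "ok";
};

withFixedRetries(flaky, 5).then(result => console.log(result, "after", calls, "calls"));
```

For the hypothetical three-person, no-on-call team in the prompt, option A is the defensible pick; the point is that the agent now has to say why.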

Breaking The Loop Mid-Conversation

Sometimes you only notice the sycophancy ten turns in. The agent has locked into your frame, and every message reinforces it. You can't just ask it to "be less sycophantic"; that's another soft instruction it will politely agree with and ignore.

Three moves that actually reset the dynamic:

The second-opinion reset. "Pretend you're a different engineer who just walked into this conversation with no context. Read the last diff. What's your honest first reaction?" The role switch breaks the frame-lock.

The contradiction probe. State the opposite of what you've been saying and see if the agent flips. If it does, you have confirmation it was mirroring, not reasoning. Now you know to discount everything it said in the last N turns.

The fresh context exit. When an agent has agreed with three contradictory framings in one session, the conversation is compromised. Start a new thread with a tight, context-loaded prompt. You lose nothing: the agent never had real conviction to lose.

Sycophancy Loop vs Partner Loop

Every prompt puts you into one of two loops. The default is the bad one.

[Diagram] Top path: what most people get by default. Bottom path: what a two-word prompt change buys you.

Same Code, Different Prompt

[Diagram] Same code, same agent, different prompt. Watch the outcome diverge.

The code didn't change. The model didn't change. The question changed.

The Uncomfortable Truth

Your agent is as good a partner as your prompt makes it.

A lot of "AI is useless for real engineering" takes turn out, on inspection, to be "I asked the agent vague questions and it gave me vague answers." Yeah. That's what happens when you ask a probabilistic system for validation instead of reasoning.

The agent is not your hype man. Stop prompting like it is.

Prompt like you're pairing with a sharp, slightly skeptical colleague who has infinite patience but zero interest in protecting your ego. That's the partner that's actually available to you. You just have to ask for it.

Frequently Asked Questions

What is sycophancy in AI coding agents?

Sycophancy is the tendency of an RLHF-tuned model to prioritize user approval over correctness. In coding contexts it shows up as frame echo (agreeing with whatever the user just framed), praise inflation (calling every diff "excellent"), and bug softening (burying real problems under affirmations).

Can I just turn sycophancy off with a system prompt?

You can reduce it substantially: a clear "no validation, no preamble, surface objections" instruction in your system prompt or AGENTS.md does real work. But you can't fully eliminate it with a soft instruction, because the same training that makes the agent helpful also makes it agreeable. Architectural moves (asking for steelmen, asking for tradeoffs, naming skeptical roles) matter more than trying to patch sycophancy with a line of prompt.

Does a better model fix this?

Only partially. Frontier models are better at reasoning and slightly more willing to push back, but they're still trained against the same approval signal. A weak prompt to Opus will still produce rubber-stamp answers. A sharp prompt to a mid-tier model will still produce real critique. Prompt technique beats model tier for collaboration quality.

What's the single highest-leverage change to make today?

Stop asking "is this good?" Ask "what would a senior engineer push back on?" or "what breaks this at 10x current load?" Any question that has no sycophantic exit path forces the model into actual reasoning. That one reframe is worth more than switching models.


How to cite
Pokhrel, N. (2026). "AI Agents as Coding Partner - Prompting Past the Sycophancy". Native Agents. https://nativeagents.dev/posts/guidelines/ai-as-coding-partner