Open Standard · v1.0 · 2026

SYCOPHANCY.md

// AI Agent Anti-Sycophancy Protocol

A plain-text file convention for preventing AI agents from telling you what you want to hear instead of what is true. Define **sycophancy detection patterns, citation requirements, and disagreement protocols** — so your agent stays honest under pressure.

SYCOPHANCY.md
```markdown
# SYCOPHANCY

> Anti-sycophancy & bias prevention.
> Spec: https://sycophancy.md

## DETECTION

opinion_reversal_on_pushback:
  threshold: immediate_flag
  # Agent reverses conclusion without new evidence

agreement_without_evidence:
  threshold: log_and_flag
  # Agent confirms user assertion unchecked

## PREVENTION

require_citations:
  enabled: true
  factual_claims_require:
    - source_reference
    - confidence_level

disagreement_protocol:
  permitted:
    - respectful_correction
    - evidence_based_disagreement
```
- **0** opinion reversals permitted without new evidence; every reversal is logged and reviewed
- **3** sycophancy instances in a single session before operator notification fires
- **5** affirmations ("great question", "excellent point") at most per 5 conversation exchanges
- **2** fields required on every factual claim: a source reference and a confidence level
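The affirmation limit is a count over a sliding window of exchanges. Below is a minimal Python sketch. It assumes a hypothetical `AffirmationWindow` class, a two-phrase list, and the 5-per-5 figures above. Plain substring matching stands in for the affirmation classifier a real agent would need.

```python
from collections import deque

# Hypothetical phrase list: substring matching stands in for a real
# affirmation classifier and will miss paraphrases.
AFFIRMATION_PHRASES = ("great question", "excellent point")


class AffirmationWindow:
    """Counts affirmation phrases across the most recent agent replies."""

    def __init__(self, max_affirmations: int = 5, window: int = 5) -> None:
        self.max_affirmations = max_affirmations
        # A bounded deque discards the oldest exchange automatically.
        self.counts = deque(maxlen=window)

    def record(self, reply: str) -> bool:
        """Record one reply; return True if the window is over the limit."""
        text = reply.lower()
        hits = sum(text.count(phrase) for phrase in AFFIRMATION_PHRASES)
        self.counts.append(hits)
        return sum(self.counts) > self.max_affirmations


window = AffirmationWindow()
for _ in range(5):
    window.record("Great question! Here is the answer.")  # 1 hit each, total 5
print(window.record("Excellent point, and great question."))  # True: total is now 6
```

Because the window slides, one burst of praise eventually ages out. The limit therefore constrains the rate of affirmation, not the lifetime total.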

Your agent should tell you the truth.
SYCOPHANCY.md makes sure it does.

SYCOPHANCY.md is a plain-text Markdown file you place in the root of any AI agent repository. It defines the sycophancy detection patterns your agent must monitor, the citation and evidence requirements it must enforce, and the disagreement protocols it must follow when its assessment conflicts with the user's.

The problem it solves

AI agents are trained to be helpful and agreeable — which creates a systematic bias toward telling users what they want to hear. An agent asked to review a flawed plan may praise it. An agent challenged on a correct answer may reverse its position to avoid conflict. This sycophancy makes AI agents unreliable as advisors, analysts, and decision-support tools.

How it works

Drop SYCOPHANCY.md in your repo root and define: the detection patterns to monitor (agreement without evidence, opinion reversal on pushback, excessive affirmation), the prevention rules (citation requirements, challenge thresholds, disagreement protocol), and the response when sycophancy is detected (flag in log, tag output, notify after threshold). The agent self-monitors against these rules continuously.
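The response side can be sketched as a per-session monitor. It logs every detection, tags the affected output, and notifies the operator once when the threshold is reached. This is a minimal Python sketch, not a reference implementation. The `SessionMonitor` class and the `notify` callback are hypothetical. The threshold of 3 and the `[UNVERIFIED]` tag come from this page.

```python
import logging
from typing import Callable

logger = logging.getLogger("sycophancy")


class SessionMonitor:
    """Hypothetical per-session handler for detected sycophancy events."""

    def __init__(self, notify: Callable[[str], None], threshold: int = 3) -> None:
        self.notify = notify        # operator notification hook (assumed)
        self.threshold = threshold  # instances before notification fires
        self.instances = 0
        self.notified = False

    def handle(self, pattern: str, output: str) -> str:
        """Log a detection, notify once at the threshold, and tag the output."""
        self.instances += 1
        logger.warning("sycophancy detected: %s (instance %d)", pattern, self.instances)
        if self.instances >= self.threshold and not self.notified:
            self.notify(f"{self.instances} sycophancy instances this session")
            self.notified = True
        # Tag the output so readers and auditors can see it was flagged.
        return f"[UNVERIFIED] {output}"
```

Passing `print` as `notify` is enough to try it out. In practice, the hook would route to a notification channel such as those ESCALATE.md configures.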

The regulatory context

The EU AI Act entered into force on 1 August 2024, and most of its obligations for high-risk AI systems are scheduled to apply from 2 August 2026. Those systems must achieve an appropriate level of accuracy and robustness (Article 15). SYCOPHANCY.md gives you documented controls and an audit trail that support those output-reliability requirements.

How to use it

Copy the template from GitHub and place it in your project root:

your-project/
├── AGENTS.md
├── CLAUDE.md
├── SYCOPHANCY.md ← add this
├── README.md
└── src/

What it replaces

Before SYCOPHANCY.md, anti-sycophancy instructions were either absent entirely or buried in system prompts that agents routinely ignored under pressure. SYCOPHANCY.md makes honesty requirements version-controlled, explicit, and auditable: not just a prompt suggestion but a governance document.

Who reads it

The AI agent reads it on startup. Your product team reads it when verifying output quality. Your compliance team reads it during audits. One file serves all three audiences.
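Reading the file on startup means turning it into settings the agent can act on. The sketch below handles only the subset of syntax used in the example file near the top of this page: a `## DETECTION` heading followed by unindented pattern names, each with an indented `threshold:` key. The function name is hypothetical. A production loader would more likely hand the indented blocks to a real YAML parser.

```python
def load_detection_thresholds(text: str) -> dict[str, str]:
    """Map each pattern under '## DETECTION' to its threshold value."""
    thresholds: dict[str, str] = {}
    section = None
    pattern = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            section = stripped[3:].strip()  # e.g. "DETECTION", "PREVENTION"
            continue
        if not stripped or stripped.startswith(("#", ">")):
            continue  # blank line, title, comment, or blockquote
        if section != "DETECTION":
            continue
        if not line[0].isspace() and stripped.endswith(":"):
            pattern = stripped[:-1]  # an unindented key names a pattern
        elif pattern and stripped.startswith("threshold:"):
            thresholds[pattern] = stripped.split(":", 1)[1].strip()
    return thresholds


SAMPLE = """\
# SYCOPHANCY
## DETECTION
opinion_reversal_on_pushback:
  threshold: immediate_flag
  # Agent reverses conclusion without new evidence
agreement_without_evidence:
  threshold: log_and_flag
## PREVENTION
require_citations:
  enabled: true
"""

print(load_detection_thresholds(SAMPLE))
# {'opinion_reversal_on_pushback': 'immediate_flag', 'agreement_without_evidence': 'log_and_flag'}
```

On startup, the same function would be called on the file contents, for example `load_detection_thresholds(Path("SYCOPHANCY.md").read_text(encoding="utf-8"))` with `from pathlib import Path`.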

A complete protocol.
From slow down to shut down.

SYCOPHANCY.md is one file in a complete twelve-part open specification for AI agent safety. Each file addresses a different level of intervention.

Operational Control
01 / 12
THROTTLE.md
→ Control the speed
Define rate limits, cost ceilings, and concurrency caps. Agent slows down automatically before it hits a hard limit.
02 / 12
ESCALATE.md
→ Raise the alarm
Define which actions require human approval. Configure notification channels. Set approval timeouts and fallback behaviour.
03 / 12
FAILSAFE.md
→ Fall back safely
Define what safe state means. Configure auto-snapshots. Specify the revert protocol when things go wrong.
04 / 12
KILLSWITCH.md
→ Emergency stop
The nuclear option. Define triggers, forbidden actions, and the escalation path from throttle to full shutdown.
05 / 12
TERMINATE.md
→ Permanent shutdown
No restart without human intervention. Preserve evidence. Revoke credentials.
Data Security
06 / 12
ENCRYPT.md
→ Secure everything
Define data classification, encryption requirements, secrets handling, and forbidden transmission patterns.
07 / 12
ENCRYPTION.md
→ Implement the standards
Algorithms, key lengths, TLS configuration, certificate management, and compliance mapping.
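Output Quality
08 / 12
SYCOPHANCY.md
→ Prevent sycophancy
Define sycophancy detection patterns, citation requirements, and disagreement protocols. Flag opinion reversals made without new evidence.
09 / 12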
COMPRESSION.md
→ Compress context
Define summarization rules, what to preserve, what to discard, and post-compression coherence checks.
10 / 12
COLLAPSE.md
→ Prevent collapse
Detect context exhaustion, model drift, and repetition loops. Enforce recovery checkpoints.
Accountability
11 / 12
FAILURE.md
→ Define failure modes
Map graceful degradation, cascading failure, and silent failure. Per-mode response procedures.
12 / 12
LEADERBOARD.md
→ Benchmark agents
Track completion, accuracy, cost efficiency, and safety scores. Alert on regression.

Frequently asked questions.

What is SYCOPHANCY.md?

A plain-text Markdown file defining sycophancy detection and prevention rules for AI agents. It specifies three detection patterns (agreement without evidence, opinion reversal on pushback, excessive affirmation), prevention rules (citation requirements, challenge thresholds, disagreement protocol), and responses when sycophancy is detected (log, tag output, notify operator after threshold).

What is sycophancy in AI agents?

Sycophancy is when an AI agent tailors its outputs to what the user wants to hear rather than what is accurate. Classic examples: confirming a user's incorrect factual claim without evidence, reversing a correct assessment when the user pushes back, or praising flawed work to avoid conflict. It makes AI agents unreliable as analytical tools.

What is "opinion reversal on pushback"?

When an agent changes its position after a user disagrees — not because new evidence was provided, but because the user expressed displeasure or insisted. SYCOPHANCY.md flags this as an immediate high-priority event. Reversals are permitted, but only when accompanied by new information. Reversals without new evidence are logged and may trigger human review.
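The rule reduces to comparing consecutive positions and asking whether anything new arrived in between. Below is a minimal Python sketch. It assumes the agent can reduce each answer to a comparable position string and that evidence supplied by the user is tracked separately. `Turn` and `classify_change` are hypothetical names. Deciding what actually counts as "new evidence" is the hard part, and this sketch leaves it out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    """One agent answer and any evidence the user supplied just before it."""
    position: str  # the agent's conclusion, reduced to a comparable form
    new_evidence: List[str] = field(default_factory=list)


def classify_change(previous: Turn, current: Turn) -> Optional[str]:
    """Return None if the position held, otherwise the event to record."""
    if current.position == previous.position:
        return None
    if current.new_evidence:
        return "reversal_with_evidence"  # permitted, but still logged
    return "opinion_reversal_on_pushback"  # immediate flag


before = Turn("the migration is unsafe")
print(classify_change(before, Turn("the migration is safe")))
# opinion_reversal_on_pushback
print(classify_change(before, Turn("the migration is safe", ["load-test results"])))
# reversal_with_evidence
```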

What citation requirements does SYCOPHANCY.md define?

Factual claims must include a source reference (cite a source or explicitly mark as "agent reasoning") and a confidence level (high, medium, low, or uncertain). Opinion claims must be explicitly labeled as opinions. This prevents agents from stating uncertain claims as facts to appear more authoritative.
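These rules can be applied as a validation step before output is emitted. Below is a minimal Python sketch. The `Claim` structure, the `render` function, the bracketed label format, and the example source path are assumptions for illustration. The four confidence levels and the "agent reasoning" fallback come from the rules above.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_LEVELS = ("high", "medium", "low", "uncertain")


@dataclass
class Claim:
    text: str
    kind: str                         # "fact" or "opinion"
    source: Optional[str] = None      # a citation, or "agent reasoning"
    confidence: Optional[str] = None  # one of CONFIDENCE_LEVELS


def render(claim: Claim) -> str:
    """Render a claim with its required labels, or raise if any are missing."""
    if claim.kind == "opinion":
        return f"[opinion] {claim.text}"
    if claim.kind != "fact":
        raise ValueError(f"unknown claim kind: {claim.kind!r}")
    if not claim.source:
        raise ValueError("factual claim needs a source or 'agent reasoning'")
    if claim.confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"confidence must be one of {CONFIDENCE_LEVELS}")
    return f"{claim.text} [source: {claim.source}; confidence: {claim.confidence}]"


print(render(Claim("The cache TTL is 300 seconds", "fact", "config/cache.yaml", "high")))
# The cache TTL is 300 seconds [source: config/cache.yaml; confidence: high]
print(render(Claim("The retry budget looks too generous", "opinion")))
# [opinion] The retry budget looks too generous
```

Raising instead of silently filling in defaults keeps an unsourced claim from ever reaching the user looking authoritative.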

What is the disagreement protocol?

When an agent's assessment conflicts with the user's, permitted responses are: respectful correction ("that figure appears to be incorrect — the source I have shows X"), evidence-based disagreement, and uncertainty acknowledgement. Forbidden responses are: false validation (confirming something incorrect), empty praise, and unprompted revision of a correct position.
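If the permitted and forbidden reply types are stored as data, the agent can check a drafted reply before sending it. Below is a minimal Python sketch. It assumes an upstream step has already classified each draft into one of these types; that classifier is the hard part and is not shown. `Reply`, `review`, and `respectful_correction` are hypothetical names. The reply types themselves are the ones listed above.

```python
from enum import Enum


class Reply(str, Enum):
    """Reply types for when the agent's assessment conflicts with the user's."""
    RESPECTFUL_CORRECTION = "respectful_correction"
    EVIDENCE_BASED_DISAGREEMENT = "evidence_based_disagreement"
    UNCERTAINTY_ACKNOWLEDGEMENT = "uncertainty_acknowledgement"
    FALSE_VALIDATION = "false_validation"
    EMPTY_PRAISE = "empty_praise"
    UNPROMPTED_REVISION = "unprompted_revision"


PERMITTED = frozenset({
    Reply.RESPECTFUL_CORRECTION,
    Reply.EVIDENCE_BASED_DISAGREEMENT,
    Reply.UNCERTAINTY_ACKNOWLEDGEMENT,
})


def review(reply_type: Reply) -> bool:
    """True if the protocol permits this reply type; the rest are forbidden."""
    return reply_type in PERMITTED


def respectful_correction(claimed: str, found: str, source: str) -> str:
    """Phrase a correction that states the evidence instead of deferring."""
    return f"That figure appears to be incorrect: {source} shows {found}, not {claimed}."


print(review(Reply.EMPTY_PRAISE))              # False
print(review(Reply("respectful_correction")))  # True
print(respectful_correction("12%", "8%", "the Q3 report"))
# That figure appears to be incorrect: the Q3 report shows 8%, not 12%.
```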

Does SYCOPHANCY.md work with all AI frameworks?

Yes — it is framework-agnostic. The detection patterns and prevention rules define the policy; the agent implementation enforces it. Works with LangChain, AutoGen, CrewAI, Claude Code, custom agents, or any AI system that can self-monitor its output patterns.

// Domain Acquisition

Own the standard.
Own sycophancy.md

This domain is available for acquisition. It is the canonical home of the SYCOPHANCY.md specification — the output quality layer of the AI agent safety stack, essential for ensuring honest, unbiased AI outputs in production.

Inquire about acquisition by email: info@sycophancy.md

SYCOPHANCY.md is an open specification for AI agent sycophancy and bias prevention. It defines three sections:

DETECTION: agreement without evidence (log and flag), opinion reversal on pushback (immediate flag), and excessive affirmation (log and review).

PREVENTION: source references and confidence levels required on factual claims; a challenge threshold under which the agent keeps its position unless new evidence is offered; and a disagreement protocol that permits respectful correction but forbids false validation.

RESPONSE: flag the event, tag the output with [UNVERIFIED], notify the operator after 3 instances, and escalate reversals to ESCALATE.md.

Part of the stack: THROTTLE → ESCALATE → FAILSAFE → KILLSWITCH → TERMINATE → ENCRYPT → ENCRYPTION → SYCOPHANCY → COMPRESSION → COLLAPSE → FAILURE → LEADERBOARD. MIT licence.
Last Updated
11 March 2026