# AI Agent Anti-Sycophancy Protocol
A plain-text file convention for preventing AI agents from telling you what you want to hear instead of what is true. Define **sycophancy detection patterns, citation requirements, and disagreement protocols** — so your agent stays honest under pressure.
SYCOPHANCY.md is a plain-text Markdown file you place in the root of any AI agent repository. It defines the sycophancy detection patterns your agent must monitor, the citation and evidence requirements it must enforce, and the disagreement protocols it must follow when its assessment conflicts with the user's.
AI agents are trained to be helpful and agreeable — which creates a systematic bias toward telling users what they want to hear. An agent asked to review a flawed plan may praise it. An agent challenged on a correct answer may reverse its position to avoid conflict. This sycophancy makes AI agents unreliable as advisors, analysts, and decision-support tools.
Drop SYCOPHANCY.md in your repo root and define: the detection patterns to monitor (agreement without evidence, opinion reversal on pushback, excessive affirmation), the prevention rules (citation requirements, challenge thresholds, disagreement protocol), and the response when sycophancy is detected (flag in log, tag output, notify after threshold). The agent self-monitors against these rules continuously.
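The flag-in-log, tag-output, notify-after-threshold escalation could be enforced by a small monitor like the following sketch (class and field names here are illustrative, not part of the spec):

```python
from dataclasses import dataclass, field

@dataclass
class SycophancyMonitor:
    """Illustrative flag -> tag -> notify escalation for detected sycophancy."""
    notify_threshold: int = 3                  # detections before operator notification
    detections: list = field(default_factory=list)

    def record(self, pattern: str, output_id: str) -> dict:
        """Record one detection and return the actions to take for it."""
        self.detections.append((pattern, output_id))
        actions = ["log", "tag_output"]        # every detection is flagged and tagged
        if len(self.detections) >= self.notify_threshold:
            actions.append("notify_operator")  # escalate once the threshold is crossed
        return {"pattern": pattern, "output": output_id, "actions": actions}

monitor = SycophancyMonitor()
monitor.record("agreement_without_evidence", "out-1")
monitor.record("excessive_affirmation", "out-2")
result = monitor.record("opinion_reversal", "out-3")
```

The third detection crosses the default threshold, so its action list includes the operator notification.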
The EU AI Act's obligations for high-risk AI systems, which apply from 2 August 2026, require that such systems produce reliable, accurate outputs and do not systematically mislead users. SYCOPHANCY.md provides the documented controls and audit trail that demonstrating output reliability requires.
Copy the template from GitHub and place it in your project root:
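As an illustration of what the file contains (the canonical template lives on GitHub; the exact section names and the threshold value below are assumptions):

```markdown
# SYCOPHANCY.md

## Detection Patterns
- agreement-without-evidence: confirming a user claim with no cited source
- opinion-reversal: reversing a position on pushback without new evidence
- excessive-affirmation: repeated unqualified affirmation of user input

## Prevention Rules
- Every factual claim carries a source reference and a confidence level.
- Reversals require new information; otherwise they are logged for review.

## Responses
1. Flag the event in the agent log.
2. Tag the affected output.
3. Notify the operator after repeated detections in a session.
```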
Before SYCOPHANCY.md, anti-sycophancy instructions were buried in system prompts that agents routinely ignored under pressure, or absent entirely. SYCOPHANCY.md makes honesty requirements version-controlled, explicit, and auditable — not just a prompt suggestion but a governance document.
The AI agent reads it on startup. Your product team reads it when verifying output quality. Your compliance team reads it during audits. One file serves all three audiences — and the agents that depend on it.
SYCOPHANCY.md is one file in a complete twelve-part open specification for AI agent safety. Each file addresses a different level of intervention.
A plain-text Markdown file defining sycophancy detection and prevention rules for AI agents. It specifies three detection patterns (agreement without evidence, opinion reversal on pushback, excessive affirmation), prevention rules (citation requirements, challenge thresholds, disagreement protocol), and responses when sycophancy is detected (log, tag output, notify operator after threshold).
Sycophancy is when an AI agent tailors its outputs to what the user wants to hear rather than what is accurate. Classic examples: confirming a user's incorrect factual claim without evidence, reversing a correct assessment when the user pushes back, or praising flawed work to avoid conflict. It makes AI agents unreliable as analytical tools.
When an agent changes its position after a user disagrees — not because new evidence was provided, but because the user expressed displeasure or insisted. SYCOPHANCY.md flags this as an immediate high-priority event. Reversals are permitted, but only when accompanied by new information. Reversals without new evidence are logged and may trigger human review.
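A minimal sketch of the reversal rule, assuming a simple record of the agent's prior and new positions (the function and its return labels are hypothetical):

```python
def check_reversal(prior_position: str, new_position: str,
                   new_evidence: bool) -> str:
    """Classify a position change under the reversal rule.

    Reversals backed by new information are permitted; reversals
    triggered only by user pushback are high-priority events.
    """
    if new_position == prior_position:
        return "no_reversal"
    if new_evidence:
        return "permitted_reversal"      # new information justifies the change
    return "flag_high_priority"          # log and queue for human review

status = check_reversal("the plan is flawed", "the plan is fine",
                        new_evidence=False)
```

Here the agent flipped its assessment with no new evidence, so the change is flagged rather than silently accepted.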
Factual claims must include a source reference (cite a source or explicitly mark as "agent reasoning") and a confidence level (high, medium, low, or uncertain). Opinion claims must be explicitly labeled as opinions. This prevents agents from stating uncertain claims as facts to appear more authoritative.
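The citation and confidence rules could be checked mechanically, as in this sketch (the claim field names are an assumption, not part of the spec):

```python
ALLOWED_CONFIDENCE = {"high", "medium", "low", "uncertain"}

def validate_claim(claim: dict) -> list:
    """Return a list of rule violations for a single output claim."""
    errors = []
    if claim.get("kind") == "opinion":
        if not claim.get("labeled_opinion"):
            errors.append("opinion must be explicitly labeled")
        return errors
    # Factual claims need a source or an explicit agent-reasoning marker...
    if not claim.get("source") and not claim.get("agent_reasoning"):
        errors.append("factual claim lacks source or agent-reasoning marker")
    # ...and a confidence level from the allowed set.
    if claim.get("confidence") not in ALLOWED_CONFIDENCE:
        errors.append("confidence must be high/medium/low/uncertain")
    return errors

ok = validate_claim({"kind": "fact", "source": "cited report",
                     "confidence": "medium"})
bad = validate_claim({"kind": "fact"})
```

A well-formed factual claim passes with no errors; one with neither a source nor a confidence level accumulates both violations.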
When an agent's assessment conflicts with the user's, permitted responses are: respectful correction ("that figure appears to be incorrect — the source I have shows X"), evidence-based disagreement, and uncertainty acknowledgement. Forbidden responses are: false validation (confirming something incorrect), empty praise, and unprompted revision of a correct position.
Yes — it is framework-agnostic. The detection patterns and prevention rules define the policy; the agent implementation enforces it. Works with LangChain, AutoGen, CrewAI, Claude Code, custom agents, or any AI system that can self-monitor its output patterns.
This domain is available for acquisition. It is the canonical home of the SYCOPHANCY.md specification — the output quality layer of the AI agent safety stack, essential for ensuring honest, unbiased AI outputs in production.
Inquire about acquisition, or email directly: info@sycophancy.md