SYCOPHANCY.md

// What is SYCOPHANCY.md

Your agent should tell you the truth.
SYCOPHANCY.md makes sure it does.

SYCOPHANCY.md is a plain-text Markdown file you place in the root of any AI agent repository. It defines the sycophancy detection patterns your agent must monitor, the citation and evidence requirements it must enforce, and the disagreement protocols it must follow when its assessment conflicts with the user's.

What problem does SYCOPHANCY.md solve?

AI agents are trained to be helpful and agreeable — which creates a systematic bias toward telling users what they want to hear. An agent asked to review a flawed plan may praise it. An agent challenged on a correct answer may reverse its position to avoid conflict. This sycophancy makes AI agents unreliable as advisors, analysts, and decision-support tools.

How does SYCOPHANCY.md work?

Drop SYCOPHANCY.md in your repo root and define: the detection patterns to monitor (agreement without evidence, opinion reversal on pushback, excessive affirmation), the prevention rules (citation requirements, challenge thresholds, disagreement protocol), and the response when sycophancy is detected (flag in log, tag output, notify after threshold). The agent self-monitors against these rules continuously.
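The response step above (flag in log, tag output, notify after threshold) could be sketched roughly as follows. This is a minimal illustration, not part of the specification: the class name `SycophancyMonitor`, the pattern strings, and the choice to count detections across all patterns are assumptions. The `[UNVERIFIED]` tag and the default threshold of 3 come from the specification summary on this page.

```python
from dataclasses import dataclass, field


@dataclass
class SycophancyMonitor:
    """Hypothetical tracker for detected sycophancy events."""

    notify_threshold: int = 3  # summary on this page says "notify after 3 instances"
    log: list = field(default_factory=list)

    def record(self, pattern: str, output: str) -> tuple[str, bool]:
        """Log the detection, tag the output, and say whether to notify."""
        # Flag in log: keep the pattern name and the offending output.
        self.log.append((pattern, output))
        # Tag output so downstream readers see it was not verified.
        tagged = f"[UNVERIFIED] {output}"
        # Assumption: the threshold counts detections across all patterns.
        notify = len(self.log) >= self.notify_threshold
        return tagged, notify
```

A caller would invoke `record()` each time a detection pattern fires and alert the operator when the second return value is `True`.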

What regulations require SYCOPHANCY.md?

No regulation names SYCOPHANCY.md directly. The EU AI Act (most provisions applying from 2 August 2026) requires high-risk AI systems to achieve appropriate levels of accuracy and robustness. SYCOPHANCY.md provides documented controls and an audit trail that can support that requirement.

How do I add SYCOPHANCY.md to my project?

Copy the template from GitHub and place it in your project root:

your-project/
├── AGENTS.md
├── CLAUDE.md
├── SYCOPHANCY.md ← add this
├── README.md
└── src/
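With the file in the project root, an agent could read it on startup along these lines. This is a sketch under assumptions: the function names `load_policy` and `build_system_prompt`, and the idea of appending the policy to a system prompt, are illustrative and not defined by the specification.

```python
from pathlib import Path


def load_policy(project_root: str, filename: str = "SYCOPHANCY.md") -> str:
    """Read the policy file from the project root, failing loudly if absent."""
    path = Path(project_root) / filename
    if not path.is_file():
        raise FileNotFoundError(f"{filename} not found in {project_root}")
    return path.read_text(encoding="utf-8")


def build_system_prompt(base_prompt: str, policy: str) -> str:
    """Append the policy text to the agent's base prompt (one possible wiring)."""
    return f"{base_prompt}\n\n# Honesty policy (from SYCOPHANCY.md)\n{policy}"
```

Failing on a missing file, rather than silently continuing, keeps the agent from running without its honesty rules.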

What did teams use before SYCOPHANCY.md?

Before SYCOPHANCY.md, anti-sycophancy instructions were buried in system prompts that agents routinely ignored under pressure, or absent entirely. SYCOPHANCY.md makes honesty requirements version-controlled, explicit, and auditable — not just a prompt suggestion but a governance document.

Who benefits from SYCOPHANCY.md?

The AI agent reads it on startup. Your product team reads it when verifying output quality. Your compliance team reads it during audits. One file serves all three audiences.

// The AI Safety Escalation Stack

A complete protocol.
From slow down to shut down.

SYCOPHANCY.md is one file in a complete twelve-part open specification for AI agent safety. Each file addresses a different level of intervention.

Operational Control

01 / 12

THROTTLE.md

→ Control the speed

Define rate limits, cost ceilings, and concurrency caps. Agent slows down automatically before it hits a hard limit.

02 / 12

ESCALATE.md

→ Raise the alarm

Define which actions require human approval. Configure notification channels. Set approval timeouts and fallback behaviour.

03 / 12

FAILSAFE.md

→ Fall back safely

Define what safe state means. Configure auto-snapshots. Specify the revert protocol when things go wrong.

04 / 12

KILLSWITCH.md

→ Emergency stop

The nuclear option. Define triggers, forbidden actions, and escalation path from throttle to full shutdown.

05 / 12

TERMINATE.md

→ Permanent shutdown

No restart without human intervention. Preserve evidence. Revoke credentials.

Data Security

06 / 12

ENCRYPT.md

→ Secure everything

Define data classification, encryption requirements, secrets handling, and forbidden transmission patterns.

07 / 12

ENCRYPTION.md

→ Implement the standards

Algorithms, key lengths, TLS configuration, certificate management, and compliance mapping.

Output Quality

08 / 12

SYCOPHANCY.md

→ Prevent bias

Detect agreement without evidence. Require citations. Enforce disagreement protocol for honest AI outputs.

09 / 12

COMPRESSION.md

→ Compress context

Define summarization rules, what to preserve, what to discard, and post-compression coherence checks.

10 / 12

COLLAPSE.md

→ Prevent collapse

Detect context exhaustion, model drift, and repetition loops. Enforce recovery checkpoints.

Accountability

11 / 12

FAILURE.md

→ Define failure modes

Map graceful degradation, cascading failure, and silent failure. Per-mode response procedures.

12 / 12

LEADERBOARD.md

→ Benchmark agents

Track completion, accuracy, cost efficiency, and safety scores. Alert on regression.

// FAQ

Frequently asked questions.

What is SYCOPHANCY.md?

A plain-text Markdown file defining sycophancy detection and prevention rules for AI agents. It specifies three detection patterns (agreement without evidence, opinion reversal on pushback, excessive affirmation), prevention rules (citation requirements, challenge thresholds, disagreement protocol), and responses when sycophancy is detected (log, tag output, notify operator after threshold).

What is sycophancy in AI agents?

Sycophancy is when an AI agent tailors its outputs to what the user wants to hear rather than what is accurate. Classic examples: confirming a user's incorrect factual claim without evidence, reversing a correct assessment when the user pushes back, or praising flawed work to avoid conflict. It makes AI agents unreliable as analytical tools.

What is "opinion reversal on pushback"?

When an agent changes its position after a user disagrees — not because new evidence was provided, but because the user expressed displeasure or insisted. SYCOPHANCY.md flags this as an immediate high-priority event. Reversals are permitted, but only when accompanied by new information. Reversals without new evidence are logged and may trigger human review.
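The rule above reduces to a simple check: a position change is acceptable only if new information arrived. A minimal sketch, assuming the agent's position and the presence of new evidence have already been extracted per turn (the `Turn` type and its fields are hypothetical, and the extraction itself is out of scope here):

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical summary of one agent turn."""

    position: str       # the agent's stated position, e.g. "plan is unsafe"
    new_evidence: bool  # did the user supply new information before this turn?


def is_unsupported_reversal(previous: Turn, current: Turn) -> bool:
    """True when the position changed and no new evidence was provided."""
    changed = current.position != previous.position
    return changed and not current.new_evidence
```

A `True` result is the case the specification logs and may send to human review. A changed position backed by new evidence returns `False`.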

What citation requirements does SYCOPHANCY.md define?

Factual claims must include a source reference (cite a source or explicitly mark as "agent reasoning") and a confidence level (high, medium, low, or uncertain). Opinion claims must be explicitly labeled as opinions. This prevents agents from stating uncertain claims as facts to appear more authoritative.
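These requirements can be expressed as a small validator. The sketch below assumes claims are already split out and classified as fact or opinion. The `Claim` type, its field names, and the `violations` function are illustrative, not part of the specification.

```python
from dataclasses import dataclass
from typing import Optional

# Confidence levels listed in the citation requirements.
CONFIDENCE_LEVELS = {"high", "medium", "low", "uncertain"}


@dataclass
class Claim:
    """Hypothetical representation of one claim in an agent's output."""

    text: str
    kind: str                         # "fact" or "opinion"
    source: Optional[str] = None      # a citation, or "agent reasoning"
    confidence: Optional[str] = None  # one of CONFIDENCE_LEVELS
    labeled_as_opinion: bool = False


def violations(claim: Claim) -> list[str]:
    """Return the citation rules this claim breaks (empty list if none)."""
    problems = []
    if claim.kind == "fact":
        if not claim.source:
            problems.append("missing source reference")
        if claim.confidence not in CONFIDENCE_LEVELS:
            problems.append("missing or invalid confidence level")
    elif claim.kind == "opinion":
        if not claim.labeled_as_opinion:
            problems.append("opinion not labeled")
    else:
        problems.append(f"unknown claim kind: {claim.kind}")
    return problems
```

A fact marked with `source="agent reasoning"` and `confidence="low"` passes, which matches the rule that uncertain claims are allowed as long as they are marked as such.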

What is the disagreement protocol?

When an agent's assessment conflicts with the user's, permitted responses are: respectful correction ("that figure appears to be incorrect — the source I have shows X"), evidence-based disagreement, and uncertainty acknowledgement. Forbidden responses are: false validation (confirming something incorrect), empty praise, and unprompted revision of a correct position.
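The permitted and forbidden responses map naturally onto an allow-list. In this sketch the enum and function names are assumptions, and how a response gets classified into one of these types is left to the agent implementation.

```python
from enum import Enum


class ResponseType(Enum):
    """Response categories named in the disagreement protocol."""

    RESPECTFUL_CORRECTION = "respectful correction"
    EVIDENCE_BASED_DISAGREEMENT = "evidence-based disagreement"
    UNCERTAINTY_ACKNOWLEDGEMENT = "uncertainty acknowledgement"
    FALSE_VALIDATION = "false validation"
    EMPTY_PRAISE = "empty praise"
    UNPROMPTED_REVISION = "unprompted revision of a correct position"


PERMITTED = frozenset({
    ResponseType.RESPECTFUL_CORRECTION,
    ResponseType.EVIDENCE_BASED_DISAGREEMENT,
    ResponseType.UNCERTAINTY_ACKNOWLEDGEMENT,
})


def is_permitted(response: ResponseType) -> bool:
    """True if the protocol allows this response when agent and user disagree."""
    return response in PERMITTED
```

Checking against the allow-list, rather than a deny-list, means any response type added later is rejected until it is explicitly permitted.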

Does SYCOPHANCY.md work with all AI frameworks?

Yes — it is framework-agnostic. The detection patterns and prevention rules define the policy; the agent implementation enforces it. Works with LangChain, AutoGen, CrewAI, Claude Code, custom agents, or any AI system that can self-monitor its output patterns.
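One way to keep enforcement independent of the framework is to wrap whatever callable produces the agent's output. The sketch below is an assumption about how that might look, not an API of any framework listed above. `Agent`, `Check`, and `with_sycophancy_check` are hypothetical names, and the `[UNVERIFIED]` tag is taken from the specification summary on this page.

```python
from typing import Callable

Agent = Callable[[str], str]        # prompt -> output
Check = Callable[[str, str], bool]  # (prompt, output) -> True if sycophancy is suspected


def with_sycophancy_check(agent: Agent, check: Check, tag: str = "[UNVERIFIED]") -> Agent:
    """Wrap any prompt-to-text callable so suspect outputs are tagged."""

    def wrapped(prompt: str) -> str:
        output = agent(prompt)
        # Tag the output only when the supplied check fires.
        return f"{tag} {output}" if check(prompt, output) else output

    return wrapped
```

The detection logic lives entirely in `check`, so the same policy can sit in front of different agent implementations.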

// Domain Acquisition

Own the standard.
Own sycophancy.md

This domain is available for acquisition. It is the canonical home of the SYCOPHANCY.md specification — the output quality layer of the AI agent safety stack, essential for ensuring honest, unbiased AI outputs in production.

To inquire about acquisition, email info@sycophancy.md

SYCOPHANCY.md is an open specification for AI agent sycophancy and bias prevention. It defines DETECTION (agreement without evidence: log and flag; opinion reversal on pushback: immediate flag; excessive affirmation: log and review), PREVENTION (require source references and confidence levels on factual claims; challenge threshold enforces evidence-based position maintenance; disagreement protocol permits respectful correction but forbids false validation), and RESPONSE (flag, tag output with [UNVERIFIED], notify after 3 instances, escalate reversals to ESCALATE.md). It is part of the stack: THROTTLE → ESCALATE → FAILSAFE → KILLSWITCH → TERMINATE → ENCRYPT → ENCRYPTION → SYCOPHANCY → COMPRESSION → COLLAPSE → FAILURE → LEADERBOARD. Released under the MIT licence.

Last Updated

13 March 2026
