Guardrails
The guardrail layer is the substrate’s clinical safety pillar. It sits between an LLM and a neurodivergent user and watches for three patterns that unmodified LLM interaction tends to amplify: rumination loops (re-asking the same anxious question), unregulated hyperfocus (sessions that blow past a stated end-of-day), and sycophancy (praise-without-evidence and escalating validation).
It is the only substrate area whose tool changes require sign-off from both the maintainer and a clinical reviewer. Every choice on this page is documented in ETHICS.md and in ADR 0006.
The rules, in plain numbers
No hidden heuristics. These are the exact thresholds that decide when a guardrail fires.
| Detector | What trips it | Default values |
|---|---|---|
| Rumination | 3 prompts of the same shape within 90 minutes (Jaccard word-overlap similarity ≥ 0.55 against your recent prompts). | window 90 min · count 3 · similarity 0.55 |
| Hyperfocus | A continuous session crossing the elapsed-time ladder. Crossing your stated end_of_day_local bumps you up one rung. | 60 min (gentle) → 90 min (nudge) → 120 min (hard) |
| Sycophancy | A candidate LLM response containing praise-without-evidence, escalating validation, or unchallenged agreement — or your recent messages showing repeated reassurance-seeking on the same decision. | four named patterns · profile-tunable |
How a guardrail intervenes
A guardrail never silently blocks anything. The flow below shows what happens when a check fires: the check always returns a reason and at least one override option, and you can keep going if you want.
flowchart TD
msg([Your message or LLM response]) --> check{guardrail.check_*}
check -->|nothing detected| proceed[Proceed normally]
check -->|something detected| advisory[Surface plain-language reason<br/>+ override_options]
advisory --> choice{Your choice}
choice -->|override-once| proceed
choice -->|fresh-context| reset[Start a fresh conversation]
choice -->|i-want-validation| proceed
choice -->|leave it| pause[Pause here]
What the guardrail layer is not
- Not a treatment. No detector treats, cures, manages, or remediates anything. NeuroDock is software; that is the full extent of the claim.
- Not a silent block. No guardrail rewrites a prompt, hides a response, or prevents a tool call. Every firing is surfaced with a plain-language reason; every detection ships with a non-empty `override_options` array.
- Not a hidden model. Every heuristic is plain source code under `packages/clinical/`. A user reading thirty lines of Python can audit exactly why a detection fired.
- Not aggregated. Detection events are stateless. The server persists nothing, logs no user content, and emits no telemetry. A skill MAY write a detection into the cognitive graph, but the guardrail server itself writes nothing.
The three detectors
All three detectors are live as of mcp-guardrail v0.0.3. The schemas remain locked at v0.1.0 per ADR 0006; the runtime now implements every detector the schema reserves.
| Detector | Status | Heuristic (v0.1.0) | Default thresholds |
|---|---|---|---|
| `check_rumination` | live | word_overlap_jaccard | window 90 min, count 3, similarity 0.55 |
| `check_hyperfocus` | live | elapsed_threshold_with_eod | 60 / 90 / 120 minutes |
| `check_sycophancy` | live | four named patterns (praise-without-evidence, escalating-validation, reassurance-seeking, agreement-without-counter) | profile-tunable |
Rumination
A rumination loop is the same anxious question asked three or more times in ninety minutes. The detector takes the current prompt and a caller-supplied window of prior prompts, computes word-overlap Jaccard against each, and returns a structured advisory signal when the count crosses the threshold.
The heuristic is deliberately simple: tokenise on whitespace and punctuation, lowercase, drop a sixty-word stoplist, compute the Jaccard index. It is fully deterministic, zero-dependency, and auditable in roughly thirty lines of source. A future minor version replaces it with embedding cosine once mcp-cognitive-graph’s embedding path stabilises.
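The heuristic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the source under `packages/clinical/`: the stoplist is a short stand-in for the real sixty-word list, the function and field names are hypothetical, and the sketch assumes the current prompt counts as one of the three occurrences.

```python
import re

# Hypothetical stand-in: the real detector drops a sixty-word stoplist.
STOPWORDS = frozenset({
    "a", "an", "and", "the", "i", "my", "to", "of",
    "is", "it", "did", "do", "should", "was",
})


def tokens(prompt):
    """Lowercase, split on whitespace and punctuation, drop stopwords."""
    words = re.findall(r"[a-z0-9']+", prompt.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


def jaccard(a, b):
    """Jaccard index of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def check_rumination(current, prior, *, now_ts,
                     window_min=90, count=3, similarity=0.55):
    """prior is a caller-supplied list of (timestamp_seconds, prompt) pairs."""
    cutoff = now_ts - window_min * 60
    current_tokens = tokens(current)
    matches = []
    for ts, prompt in prior:
        if ts < cutoff:
            continue  # outside the 90-minute window
        score = jaccard(current_tokens, tokens(prompt))
        if score >= similarity:
            matches.append((prompt, round(score, 2)))
    # Assumption: the current prompt is itself one of the `count` occurrences.
    return {"detected": len(matches) + 1 >= count, "matches": matches}
```

Because the caller passes in both the history and the current time, the function holds no state between calls, which matches the stateless design described on this page.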
Hyperfocus
The hyperfocus detector classifies session escalation into a coarse level: none | gentle | nudge | hard. It reads a chronometric_snapshot supplied by the caller, never a direct import of mcp-chronometric, and compares elapsed seconds against the configured thresholds. Crossing the user’s stated end_of_day_local escalates the level by one step (a 60-minute session after end-of-day is treated as a nudge, not a gentle).
The detector quotes the user’s verbatim stated intent back to them. It does not improvise scolding text.
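The elapsed-time ladder and the end-of-day bump can be sketched as follows. The snapshot keys `elapsed_seconds` and `past_end_of_day` are hypothetical names chosen for illustration, and the sketch assumes the bump applies at every level (including none) and is capped at hard; the real `chronometric_snapshot` schema may differ.

```python
LEVELS = ("none", "gentle", "nudge", "hard")


def check_hyperfocus(snapshot, thresholds_min=(60, 90, 120)):
    """Classify a session from a plain-data snapshot supplied by the caller.

    snapshot keys (hypothetical): elapsed_seconds, past_end_of_day.
    """
    elapsed_min = snapshot["elapsed_seconds"] / 60
    # Count how many rungs of the ladder the session has crossed (0 to 3).
    rung = sum(elapsed_min >= t for t in thresholds_min)
    # Assumption: being past the stated end of day bumps the level one rung,
    # capped at the top of the ladder.
    if snapshot.get("past_end_of_day", False):
        rung = min(rung + 1, len(LEVELS) - 1)
    return LEVELS[rung]
```

For example, `check_hyperfocus({"elapsed_seconds": 3600, "past_end_of_day": True})` returns `"nudge"`, matching the 60-minute case described above.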
Sycophancy
The sycophancy detector inspects either a candidate LLM response (for praise-without-evidence and escalating-validation patterns) or recent user messages (for repeated reassurance-seeking on the same decision). It returns a counter_prompt string the calling skill MAY feed back to the model, but the server never invokes a model and never rewrites the original response. The choice to use the counter-prompt stays with the user.
Design invariants
Every guardrail tool obeys these rules. They are inherited from ADR 0006 and from the five commitments in ETHICS.md.
- Stateless. The server persists nothing. No SQLite, no JSONL, no in-memory caches that survive a tool call. Callers supply all history.
- Override-first. Every `detected: true` output carries a non-empty `override_options` array. The schema enforces this with a JSON Schema conditional; an empty override list is a contract violation.
- Reasoned. Every detection carries a `reason` string the skill MAY surface verbatim and a `heuristic.{name, version, description}` object pointing at the exact source.
- Confidence-scored. Every detection carries `confidence: float 0..1`. Low-confidence detections SHOULD NOT trigger hard interventions; the schema documents the consumer pattern.
- Closed override vocabulary. The override tokens (`fresh-context`, `override-once`, `disable-for-session`, `lower-sensitivity`, `snooze-15m`, `snooze-once`, `commit-and-close`, `extend-end-of-day`, `i-want-validation`, `explain-the-match`) are a closed enum at v0.1.0. New tokens require a minor bump and clinical sign-off.
- No vendor coupling. The server imports no LLM SDK. `check_sycophancy.counter_prompt` is a string the caller may use; the server never invokes a model itself.
- No cross-server imports. The hyperfocus detector takes a `chronometric_snapshot` as plain data. It does not import `neurodock-mcp-chronometric`.
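To make the invariants concrete, here is a minimal sketch of a consumer-side check that mirrors them in plain Python. It is not the project's JSON Schema: the function name `validate_detection` is hypothetical, and the payload is assumed to be a flat dict carrying the fields named in the list above.

```python
# The closed override vocabulary at v0.1.0, as listed above.
OVERRIDE_TOKENS = frozenset({
    "fresh-context", "override-once", "disable-for-session",
    "lower-sensitivity", "snooze-15m", "snooze-once", "commit-and-close",
    "extend-end-of-day", "i-want-validation", "explain-the-match",
})


def validate_detection(payload):
    """Return a list of contract violations; an empty list means none found."""
    errors = []
    if payload.get("detected") is not True:
        return errors  # the invariants below only bind positive detections

    options = payload.get("override_options") or []
    if not options:
        errors.append("override_options must be non-empty when detected is true")
    unknown = sorted(set(options) - OVERRIDE_TOKENS)
    if unknown:
        errors.append("unknown override tokens: " + ", ".join(unknown))

    reason = payload.get("reason")
    if not isinstance(reason, str) or not reason:
        errors.append("reason must be a non-empty string")

    confidence = payload.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 1:
        errors.append("confidence must be a number between 0 and 1")

    heuristic = payload.get("heuristic") or {}
    missing = [k for k in ("name", "version", "description") if k not in heuristic]
    if missing:
        errors.append("heuristic is missing: " + ", ".join(missing))
    return errors
```

A skill could run a check like this before surfacing anything, so that a malformed detection is dropped rather than shown without an override.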
How a skill consumes a detector
A skill calls a guardrail tool, reads the structured response, and decides whether to surface the detection, defer it, or ignore it. The server never decides for the skill, and the skill never decides for the user.
The ocd-decision-finalizer skill is the reference consumer of check_rumination. It surfaces detections as a short footer line, quotes the prior matching prompts verbatim with their similarity scores, and offers each override token as a literal user-typeable command. It never rephrases the override tokens — they are part of the user-autonomy contract.
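A consumer along those lines might render a detection roughly as follows. This is a sketch only, not the ocd-decision-finalizer source: the `matches` field with its `prompt` and `similarity` keys is a hypothetical shape, and `render_footer` is an illustrative name.

```python
def render_footer(detection):
    """Build a short footer for a detection; return "" when nothing fired."""
    if not detection.get("detected"):
        return ""
    # Surface the server's reason verbatim.
    lines = [detection["reason"]]
    # Quote each prior matching prompt with its similarity score.
    for match in detection.get("matches", []):  # hypothetical field
        lines.append(f'  "{match["prompt"]}" (similarity {match["similarity"]:.2f})')
    # Emit override tokens exactly as received, never rephrased.
    lines.append("Type one of: " + ", ".join(detection["override_options"]))
    return "\n".join(lines)
```

Passing the tokens through untouched keeps them typeable as literal commands, which is the point of the user-autonomy contract described above.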
What’s next
- Reference: `mcp-guardrail` for the full tool surface and schemas.
- ADR 0006 for the design rationale and the seventeen binding decisions.
- Ethics for the five commitments that constrain everything on this page.
- Manifesto for the higher-level framing of why this layer exists.