NeuroDock

Guardrails

The guardrail layer is the substrate’s clinical safety pillar. It sits between an LLM and a neurodivergent user and watches for three patterns that unmodified LLM interaction tends to amplify: rumination loops (re-asking the same anxious question), unregulated hyperfocus (sessions that blow past a stated end-of-day), and sycophancy (praise-without-evidence and escalating validation).

It is the only substrate area whose tool changes require sign-off from both the maintainer and a clinical reviewer. Every choice on this page is documented in ETHICS.md and in ADR 0006.

No hidden heuristics. These are the exact thresholds that decide when a guardrail fires.

| Detector | What trips it | Default values |
| --- | --- | --- |
| Rumination | 3 prompts of the same shape within 90 minutes (Jaccard word-overlap similarity ≥ 0.55 against your recent prompts). | window 90 min · count 3 · similarity 0.55 |
| Hyperfocus | A continuous session crossing the elapsed-time ladder. Crossing your stated end_of_day_local bumps you up one rung. | 60 min (gentle) → 90 min (nudge) → 120 min (hard) |
| Sycophancy | A candidate LLM response containing praise-without-evidence, escalating validation, or unchallenged agreement; or your recent messages showing repeated reassurance-seeking on the same decision. | four named patterns · profile-tunable |

A guardrail never silently blocks anything. The flow below shows what happens when a check fires: it always returns a reason and a set of override options, and it lets you keep going if you want.

```mermaid
flowchart TD
  msg([Your message or LLM response]) --> check{guardrail.check_*}
  check -->|nothing detected| proceed[Proceed normally]
  check -->|something detected| advisory[Surface plain-language reason<br/>+ override_options]
  advisory --> choice{Your choice}
  choice -->|override-once| proceed
  choice -->|fresh-context| reset[Start a fresh conversation]
  choice -->|i-want-validation| proceed
  choice -->|leave it| pause[Pause here]
```

  • Not a treatment. No detector treats, cures, manages, or remediates anything. NeuroDock is software; that is the full extent of the claim.
  • Not a silent block. No guardrail rewrites a prompt, hides a response, or prevents a tool call. Every firing is surfaced with a plain-language reason; every detection ships with a non-empty override_options array.
  • Not a hidden model. Every heuristic is plain source code under packages/clinical/. A user reading thirty lines of Python can audit exactly why a detection fired.
  • Not aggregated. Detection events are stateless. The server persists nothing, logs no user content, and emits no telemetry. A skill MAY write a detection into the cognitive graph, but the guardrail server itself writes nothing.

All three detectors are live as of mcp-guardrail v0.0.3. The schemas remain locked at v0.1.0 per ADR 0006; the runtime now implements every detector the schema reserves.

| Detector | Status | Heuristic (v0.1.0) | Default thresholds |
| --- | --- | --- | --- |
| check_rumination | live | word_overlap_jaccard | window 90 min, count 3, similarity 0.55 |
| check_hyperfocus | live | elapsed_threshold_with_eod | 60 / 90 / 120 minutes |
| check_sycophancy | live | four named patterns (praise-without-evidence, escalating-validation, reassurance-seeking, agreement-without-counter) | profile-tunable |

A rumination loop is the same anxious question asked three or more times in ninety minutes. The detector takes the current prompt and a caller-supplied window of prior prompts, computes word-overlap Jaccard against each, and returns a structured advisory signal when the count crosses the threshold.

The heuristic is deliberately simple: tokenise on whitespace and punctuation, lowercase, drop a sixty-word stoplist, and compute the Jaccard index (the size of the intersection of the two word sets divided by the size of their union). It is fully deterministic, zero-dependency, and auditable in roughly thirty lines of source. A future minor version is planned to replace it with embedding cosine similarity once mcp-cognitive-graph’s embedding path stabilises.
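The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the source under packages/clinical/: the stoplist is abbreviated, the names `PriorPrompt` and `check_rumination_sketch` are hypothetical, and the sketch assumes the current prompt counts as one of the three occurrences.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Abbreviated, illustrative stoplist; the real one is described as sixty words.
STOPWORDS = frozenset({
    "a", "an", "the", "is", "it", "i", "to", "of", "and", "or", "if", "do",
    "did", "my", "me", "in", "on", "for", "that", "this", "was", "be",
    "should", "have",
})


def tokens(text: str) -> frozenset:
    """Lowercase, split on whitespace and punctuation, drop stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass(frozen=True)
class PriorPrompt:  # hypothetical shape for caller-supplied history
    text: str
    sent_at: datetime


def check_rumination_sketch(current, now, history,
                            window_minutes=90, count=3, similarity=0.55):
    cutoff = now - timedelta(minutes=window_minutes)
    cur = tokens(current)
    matches = []
    for prior in history:
        if prior.sent_at < cutoff:
            continue  # outside the look-back window
        score = jaccard(cur, tokens(prior.text))
        if score >= similarity:
            matches.append((prior.text, round(score, 2)))
    # Assumption: the current prompt itself is one of the `count` occurrences.
    return {"detected": len(matches) + 1 >= count, "matches": matches}
```

Because the caller passes in both `now` and the prompt history, the function holds no state of its own, which mirrors the stateless contract described on this page.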

The hyperfocus detector classifies session escalation into a coarse level: none | gentle | nudge | hard. It reads a chronometric_snapshot supplied by the caller (never a direct import of mcp-chronometric) and compares elapsed seconds against the configured thresholds. Crossing the user’s stated end_of_day_local escalates the level by one step: a 60-minute session after end-of-day is classified as nudge rather than gentle.

The detector quotes the user’s verbatim stated intent back to them. It does not improvise scolding text.
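A minimal sketch of the escalation ladder described above, using the 60 / 90 / 120 minute defaults. The `ChronometricSnapshot` fields are hypothetical, since the real snapshot schema is defined by mcp-guardrail and not reproduced on this page. The sketch also assumes the end-of-day bump applies from any level, including none, and is capped at hard.

```python
from dataclasses import dataclass

LEVELS = ("none", "gentle", "nudge", "hard")


@dataclass(frozen=True)
class ChronometricSnapshot:  # hypothetical field names, plain data only
    elapsed_seconds: int
    past_end_of_day: bool  # caller has already compared now vs end_of_day_local


def classify_hyperfocus(snapshot, thresholds_min=(60, 90, 120)) -> str:
    elapsed_min = snapshot.elapsed_seconds / 60
    # Each threshold crossed moves the session one rung up the ladder.
    rung = sum(1 for t in thresholds_min if elapsed_min >= t)
    if snapshot.past_end_of_day:
        # Assumption: bump one rung from any level, never beyond "hard".
        rung = min(rung + 1, len(LEVELS) - 1)
    return LEVELS[rung]
```

For example, `classify_hyperfocus(ChronometricSnapshot(3600, True))` yields `"nudge"`, matching the 60-minute-after-end-of-day case in the text.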

The sycophancy detector inspects either a candidate LLM response (for praise-without-evidence and escalating-validation patterns) or recent user messages (for repeated reassurance-seeking on the same decision). It returns a counter_prompt string the calling skill MAY feed back to the model — but the server never invokes a model and never rewrites the original response. The choice to use the counter-prompt stays with the user.

Every guardrail tool obeys these rules. They are inherited from ADR 0006 and from the five commitments in ETHICS.md.

  • Stateless. The server persists nothing. No SQLite, no JSONL, no in-memory caches that survive a tool call. Callers supply all history.
  • Override-first. Every detected: true output carries a non-empty override_options array. The schema enforces this with a JSON Schema conditional; an empty override list is a contract violation.
  • Reasoned. Every detection carries a reason string the skill MAY surface verbatim and a heuristic.{name, version, description} object pointing at the exact source.
  • Confidence-scored. Every detection carries confidence: float 0..1. Low-confidence detections SHOULD NOT trigger hard interventions; the schema documents the consumer pattern.
  • Closed override vocabulary. The override tokens (fresh-context, override-once, disable-for-session, lower-sensitivity, snooze-15m, snooze-once, commit-and-close, extend-end-of-day, i-want-validation, explain-the-match) are a closed enum at v0.1.0. New tokens require a minor bump and clinical sign-off.
  • No vendor coupling. The server imports no LLM SDK. check_sycophancy.counter_prompt is a string the caller may use; the server never invokes a model itself.
  • No cross-server imports. The hyperfocus detector takes a chronometric_snapshot as plain data. It does not import neurodock-mcp-chronometric.
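The override-first, reasoned, and confidence-scored rules above can be illustrated with a small validating data class. This is a sketch only: the authoritative contract is the JSON Schema locked by ADR 0006, and the class names and error messages here are hypothetical. The field names and the override tokens are taken from this page.

```python
from dataclasses import dataclass

# Closed override vocabulary at v0.1.0, as listed above.
OVERRIDE_TOKENS = frozenset({
    "fresh-context", "override-once", "disable-for-session",
    "lower-sensitivity", "snooze-15m", "snooze-once", "commit-and-close",
    "extend-end-of-day", "i-want-validation", "explain-the-match",
})


@dataclass(frozen=True)
class Heuristic:
    name: str
    version: str
    description: str


@dataclass(frozen=True)
class Detection:  # hypothetical Python mirror of the schema's output shape
    detected: bool
    confidence: float
    reason: str
    heuristic: Heuristic
    override_options: tuple = ()

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within 0..1")
        unknown = set(self.override_options) - OVERRIDE_TOKENS
        if unknown:
            raise ValueError(f"unknown override tokens: {sorted(unknown)}")
        # Override-first: a positive detection with no overrides is invalid.
        if self.detected and not self.override_options:
            raise ValueError("detected=True requires non-empty override_options")
```

Constructing `Detection(detected=True, ...)` with an empty `override_options` raises `ValueError`, which is the Python analogue of the schema conditional rejecting the payload.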

A skill calls a guardrail tool, reads the structured response, and decides whether to surface the detection, defer it, or ignore it. The server never decides for the skill, and the skill never decides for the user.

The ocd-decision-finalizer skill is the reference consumer of check_rumination. It surfaces detections as a short footer line, quotes the prior matching prompts verbatim with their similarity scores, and offers each override token as a literal user-typeable command. It never rephrases the override tokens — they are part of the user-autonomy contract.
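The consumer pattern described above might look like the following sketch. It is not the ocd-decision-finalizer source: the `matches` key, the footer wording, and the `min_confidence` cut-off are assumptions made for illustration. What it does take from this page is that the reason is surfaced verbatim, prior prompts are quoted with their similarity scores, and override tokens are printed exactly as the server returned them.

```python
from typing import Optional


def render_footer(result: dict, min_confidence: float = 0.5) -> Optional[str]:
    """Return a short footer for a detection, or None to stay silent."""
    if not result.get("detected"):
        return None
    # Assumption: the skill defers low-confidence detections instead of
    # surfacing them; the threshold value here is arbitrary.
    if result.get("confidence", 0.0) < min_confidence:
        return None
    lines = [f"guardrail: {result['reason']}"]
    for match in result.get("matches", []):
        # Quote the prior prompt verbatim alongside its similarity score.
        lines.append(f'  "{match["prompt"]}" (similarity {match["similarity"]:.2f})')
    # Tokens are emitted unchanged so the user can type them back literally.
    lines.append("options: " + " | ".join(result["override_options"]))
    return "\n".join(lines)
```

The function only formats text; whether to act on any option remains the user's decision.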

  • Reference: mcp-guardrail for the full tool surface and schemas.
  • ADR 0006 for the design rationale and the seventeen binding decisions.
  • Ethics for the five commitments that constrain everything on this page.
  • Manifesto for the higher-level framing of why this layer exists.