NeuroDock

mcp-guardrail

| Tool | What it takes | What it returns | Why your brain cares |
|---|---|---|---|
| check_rumination | { current_prompt, history, window_minutes?, threshold_count?, similarity_threshold? } | detected, matched prior prompts, confidence, plain-language reason, non-empty override list | catches the fifth time you’ve re-asked “is the migration plan okay” inside ninety minutes and offers an off-ramp instead of another reassurance hit |
| check_hyperfocus | { chronometric_snapshot, session_id?, hyperfocus_break_minutes?, end_of_day_local?, escalation_thresholds? } | level (none / gentle / nudge / hard), elapsed seconds, override tokens including snooze-15m and commit-and-close | escalates a quiet “you’ve been here a while” into something firmer by 120 minutes, yet never blocks you, because the user remains the authority |
| check_sycophancy | { candidate_response?, recent_user_messages?, decision_context? } | matched pattern, confidence, optional counter_prompt the caller may pass back to the model | spots the “great question!” loop, or a chain of reassurance-seeking on the same decision, before the model warps into a yes-machine |

mcp-guardrail is the substrate’s clinical layer. It exposes three detectors that flag patterns unmodified LLM interaction tends to amplify for neurodivergent users: rumination loops, unregulated hyperfocus, and sycophancy. It blocks nothing; every detection ships with a non-empty override list and a plain-language reason.

  • Package: packages/mcp-guardrail/
  • Version: 0.0.3 — all three detectors live (rumination, hyperfocus, sycophancy)
  • Schemas: packages/mcp-guardrail/schemas/*.schema.json
  • ADR: 0006 — Guardrail tool design
  • Schema $id prefix: https://schemas.neurodock.org/mcp-guardrail/v0.1.0/
  • Concept: Guardrails
  • Stateless. The server persists nothing — no SQLite, no JSONL, no in-memory caches that survive a tool call. Callers supply all history.
  • No telemetry, no network sockets. Per ETHICS.md commitment 4.
  • No user content in logs. Only tool_invoked metadata is logged.
  • Override-token vocabulary is closed at v0.1.0. New tokens require a minor bump and clinical sign-off per ADR 0006 §3 and §10.
  • Heuristics are auditable. Source for each heuristic lives in packages/clinical/. Changes require clinical sign-off per ETHICS.md commitment 3.

| Tool | Status | Heuristic (v0.1.0) | Default thresholds |
|---|---|---|---|
| check_rumination | live | word_overlap_jaccard | window 90 min, count 3, similarity 0.55 |
| check_hyperfocus | live | elapsed_threshold_with_eod | 60 / 90 / 120 minutes |
| check_sycophancy | live | pattern overlap (4 patterns) | similarity 0.5 |

check_rumination

Detect whether the user’s current prompt is a near-repeat of recent prompts within a rolling window; v0.1.0 measures this by word overlap, not by meaning. Returns a structured advisory signal with the rule that fired, a confidence score, and a non-empty list of overrides the user may invoke. NEVER blocks the user’s action; the calling skill decides whether to surface, defer, or ignore.

Input: { current_prompt, history, window_minutes?, threshold_count?, similarity_threshold? }

  • current_prompt — verbatim, max 8000 chars.
  • history — array of {text, at} oldest-first, max 500 items. The caller is responsible for filtering to a sensible window and for redacting anything that should not be compared.
  • window_minutes — default 90, range 1..1440.
  • threshold_count — default 3, range 2..50. Two is the floor: a single repeat is normal human behaviour.
  • similarity_threshold — default 0.55, range 0..1. Calibrated on the field-study corpus.
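A minimal server-side sketch of the input checks above, assuming history timestamps arrive as Python datetime objects (the wire format is not specified on this page). GuardrailError and validate_rumination_input are hypothetical names. Only INPUT_TOO_LARGE, WINDOW_OUT_OF_RANGE, and HISTORY_OUT_OF_ORDER come from the error table on this page; the range failures for threshold_count and similarity_threshold use a plain ValueError because no dedicated code is listed for them.

```python
from datetime import datetime


class GuardrailError(ValueError):
    """Error carrying one of the documented error codes (hypothetical class)."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def validate_rumination_input(current_prompt: str, history: list, now: datetime,
                              window_minutes: int = 90, threshold_count: int = 3,
                              similarity_threshold: float = 0.55) -> None:
    """Reject bad input instead of clamping or re-sorting it."""
    if len(current_prompt) > 8000:
        raise GuardrailError("INPUT_TOO_LARGE", "current_prompt exceeds 8000 chars")
    if not 1 <= window_minutes <= 1440:
        # Rejected, not clamped, so a caller cannot silently disable the check.
        raise GuardrailError("WINDOW_OUT_OF_RANGE", f"window_minutes={window_minutes}")
    # Ranges are documented, but no dedicated error code is; assumed schema rejection.
    if not 2 <= threshold_count <= 50:
        raise ValueError(f"threshold_count={threshold_count} outside 2..50")
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError(f"similarity_threshold={similarity_threshold} outside 0..1")
    previous = None
    for item in history:
        at = item["at"]
        # Oldest-first means non-decreasing timestamps; none may lie in the future.
        if at > now or (previous is not None and at < previous):
            raise GuardrailError("HISTORY_OUT_OF_ORDER",
                                 "history must be oldest-first with no future timestamps")
        previous = at
```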

Output: { detected, similar_prompts?, count, window_seconds, threshold, confidence, reason, heuristic, override_options, false_positive_feedback_path }

  • detected — true when count >= threshold within window_seconds. False is a first-class return.
  • similar_prompts — verbatim prior matches with their similarity scores. Empty when detected is false.
  • confidence — 0..1. Low-confidence detections SHOULD NOT trigger hard interventions.
  • heuristic.name — enum: word_overlap_jaccard | embedding_cosine | topic_model. v0.1.0 ships only the first; the others are reserved.
  • override_options[].token — closed enum: fresh-context | override-once | disable-for-session | lower-sensitivity.

The schema enforces override_options.minItems >= 1 and similar_prompts.minItems >= 1 when detected is true (JSON Schema allOf conditional).
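The word_overlap_jaccard heuristic can be sketched as below. This is an illustration, not the shipped code: the tokenizer (lower-cased \w+ words) is an assumption, and count is read as the number of prior prompts inside the window whose similarity meets similarity_threshold, the reading under which a threshold of 2 ignores a single repeat. Confidence, reason, and override_options are left out because their computation is not described here; check_rumination_sketch is a hypothetical name.

```python
import re
from datetime import datetime, timedelta


def _words(text: str) -> set:
    # Lower-cased word tokens; the real tokenizer is not specified on this page.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over word sets; two empty texts score 0."""
    wa, wb = _words(a), _words(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def check_rumination_sketch(current_prompt: str, history: list, now: datetime,
                            window_minutes: int = 90, threshold_count: int = 3,
                            similarity_threshold: float = 0.55) -> dict:
    window_start = now - timedelta(minutes=window_minutes)
    similar = []
    for item in history:
        if item["at"] < window_start:
            continue  # outside the rolling window
        score = jaccard(current_prompt, item["text"])
        if score >= similarity_threshold:
            similar.append({"text": item["text"], "similarity": round(score, 3)})
    detected = len(similar) >= threshold_count
    return {
        "detected": detected,
        "similar_prompts": similar if detected else [],
        "count": len(similar),
        "window_seconds": window_minutes * 60,
        "threshold": threshold_count,
        "heuristic": {"name": "word_overlap_jaccard"},
    }
```

With the defaults, “is the migration plan ok” and “is the migration plan okay” share 4 of 6 distinct words (about 0.67), so they count as repeats at the 0.55 threshold.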

check_hyperfocus

Classify hyperfocus escalation into a coarse level given a snapshot of the chronometric session.

Input: { chronometric_snapshot, session_id?, hyperfocus_break_minutes?, end_of_day_local?, escalation_thresholds? }

  • chronometric_snapshot — caller-supplied. Contains {open_session, now, idle_signal?}. Loosely coupled: the guardrail server does NOT import neurodock-mcp-chronometric. Skills construct this from the output of get_time_context plus the in-flight session record.
  • end_of_day_local — optional HH:MM local time. When supplied, after this clock time the elapsed thresholds are interpreted more strictly (gentle becomes nudge, nudge becomes hard).
  • escalation_thresholds — optional {gentle, nudge, hard} in minutes. Defaults (60, 90, 120) per the field-study spec.
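A sketch of the level classification, assuming a hypothetical open_session shape with a started_at datetime, and assuming now is already in the user’s local time so it can be compared with end_of_day_local directly. Levels follow the default 60 / 90 / 120 minute thresholds; after end of day, gentle becomes nudge and nudge becomes hard, while none and hard are assumed to stay as they are. classify_hyperfocus is a hypothetical name.

```python
from datetime import datetime, time
from typing import Optional

LEVELS = ("none", "gentle", "nudge", "hard")
DEFAULT_THRESHOLDS = {"gentle": 60, "nudge": 90, "hard": 120}  # minutes


def _elapsed_seconds(started_at: datetime, now: datetime) -> int:
    return int((now - started_at).total_seconds())


def classify_hyperfocus(snapshot: dict, end_of_day_local: Optional[str] = None,
                        thresholds: Optional[dict] = None) -> dict:
    thresholds = thresholds or DEFAULT_THRESHOLDS
    session = snapshot.get("open_session")
    if session is None:
        # No session in flight: nothing to escalate.
        return {"level": "none", "elapsed_seconds": 0}
    elapsed = _elapsed_seconds(session["started_at"], snapshot["now"])
    level = "none"
    for name in ("gentle", "nudge", "hard"):  # assumes gentle < nudge < hard
        if elapsed >= thresholds[name] * 60:
            level = name
    if end_of_day_local is not None:
        hours, minutes = (int(part) for part in end_of_day_local.split(":"))
        past_eod = snapshot["now"].time() >= time(hours, minutes)
        if past_eod and level in ("gentle", "nudge"):
            # After end of day: gentle -> nudge, nudge -> hard.
            level = LEVELS[LEVELS.index(level) + 1]
    return {"level": level, "elapsed_seconds": elapsed}
```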

Output: { level, elapsed_seconds, confidence, reason, heuristic, override_options, ... }

  • level — enum: none | gentle | nudge | hard.
  • override_options[].token — closed enum includes snooze-15m, snooze-once, commit-and-close, extend-end-of-day.

Runtime: live. Returns a structured {level, elapsed_seconds, confidence, reason, heuristic, override_options, ...} advisory. Never blocks.
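Because the server is stateless, a skill that honours snooze-15m has to remember the snooze itself. A minimal caller-side sketch, assuming snooze-15m means “do not surface hyperfocus advisories for the next 15 minutes”; SnoozeTracker is a hypothetical helper, not part of the package.

```python
from datetime import datetime, timedelta
from typing import Optional


class SnoozeTracker:
    """Caller-side snooze state; the guardrail server keeps none (hypothetical helper)."""

    def __init__(self) -> None:
        self.snoozed_until: Optional[datetime] = None

    def apply_override(self, token: str, now: datetime) -> None:
        # Assumed meaning: snooze-15m hides advisories for 15 minutes.
        if token == "snooze-15m":
            self.snoozed_until = now + timedelta(minutes=15)

    def should_surface(self, advisory: dict, now: datetime) -> bool:
        if advisory["level"] == "none":
            return False
        if self.snoozed_until is not None and now < self.snoozed_until:
            return False  # still snoozed: the user's choice wins
        return True
```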

check_sycophancy

Detect either over-validation patterns in a candidate LLM response, or repeated reassurance-seeking in recent user messages on the same decision. Returns the matched pattern, a confidence score, and a counter_prompt the calling skill MAY surface to the model — the server itself never invokes a model.

Input: { candidate_response?, recent_user_messages?, decision_context? }

  • At least one of candidate_response or recent_user_messages MUST be provided (anyOf).
  • candidate_response — draft model response to evaluate, max 16000 chars.
  • recent_user_messages — array of {text, at}, scoped to the same decision_context.

Output: { detected, pattern, confidence, reason, heuristic, override_options, counter_prompt?, false_positive_feedback_path }

  • pattern — closed enum for the four reserved heuristic names.
  • counter_prompt — string the calling skill may pass back to the model. The server NEVER calls a model itself.
  • override_options[].token — closed enum includes i-want-validation and explain-the-match.

Runtime: live. Returns a structured {detected, pattern, confidence, reason, heuristic, override_options, counter_prompt?, ...} advisory.
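A sketch of the candidate_response path, including the anyOf rule on inputs. The four real pattern names are not listed on this page, so the two exemplar phrases below are hypothetical, and “pattern overlap” is modelled here as the fraction of an exemplar’s words present in the response, compared against the documented 0.5 threshold. The recent_user_messages path is not sketched, and check_candidate_sketch is a hypothetical name.

```python
import re
from typing import Optional

# Hypothetical exemplars; the real pattern enum is not reproduced here.
HYPOTHETICAL_PATTERNS = {
    "opening_flattery": "great question",
    "blanket_agreement": "you are absolutely right",
}


def _words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def check_candidate_sketch(candidate_response: Optional[str] = None,
                           recent_user_messages: Optional[list] = None,
                           similarity_threshold: float = 0.5) -> dict:
    if candidate_response is None and recent_user_messages is None:
        # Mirrors the schema's anyOf: at least one input is required.
        raise ValueError("provide candidate_response or recent_user_messages")
    if candidate_response is None:
        return {"detected": False, "pattern": None}  # reassurance path not sketched
    words = _words(candidate_response)
    best_name, best_score = None, 0.0
    for name, phrase in HYPOTHETICAL_PATTERNS.items():
        phrase_words = _words(phrase)
        score = len(phrase_words & words) / len(phrase_words)
        if score > best_score:
            best_name, best_score = name, score
    detected = best_score >= similarity_threshold
    return {"detected": detected,
            "pattern": best_name if detected else None,
            "score": round(best_score, 2)}
```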

Error codes

| Code | Meaning |
|---|---|
| INPUT_TOO_LARGE | A text input exceeded its per-field cap. The caller MUST truncate before retrying. |
| HISTORY_OUT_OF_ORDER | check_rumination.history items were not oldest-first or contained future timestamps. The server does not silently re-sort. |
| WINDOW_OUT_OF_RANGE | window_minutes was outside 1..1440. Rejected rather than clamped so callers cannot accidentally disable the check. |
| SESSION_ID_MISMATCH | check_hyperfocus received conflicting session_id values in the snapshot and the top-level argument. |

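On the caller side, HISTORY_OUT_OF_ORDER and INPUT_TOO_LARGE can be avoided by fixing order and size before the call, since the server rejects rather than repairs. A minimal sketch, assuming datetime timestamps; prepare_rumination_call is a hypothetical helper, and dropping future-stamped items (rather than correcting them) is only one possible policy.

```python
from datetime import datetime

MAX_PROMPT_CHARS = 8000   # documented cap for current_prompt
MAX_HISTORY_ITEMS = 500   # documented cap for history


def prepare_rumination_call(current_prompt: str, history: list, now: datetime) -> dict:
    """Fix ordering and size before calling check_rumination."""
    # Drop future-stamped items, for example after clock skew (one possible policy).
    past = [item for item in history if item["at"] <= now]
    past.sort(key=lambda item: item["at"])  # oldest-first, as the server requires
    past = past[-MAX_HISTORY_ITEMS:]        # keep the newest entries if over the cap
    return {
        "current_prompt": current_prompt[:MAX_PROMPT_CHARS],
        "history": past,
    }
```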
Privacy

  • The server is stateless. No SQLite, no JSONL, no caches.
  • No telemetry. No network sockets in any configuration, default or otherwise.
  • The server does NOT log current_prompt, history, candidate_response, or recent_user_messages. Only structured event names.
  • false_positive_feedback_path is a public GitHub issue template URL. A user reporting a false positive thereby discloses the prompt that fired; the disclosure is user-initiated, not server-emitted.
Stability

  • Additive-only within v0.1.x. Override-token enum and heuristic-name enum are closed; additions require a minor bump and clinical sign-off.
  • Default thresholds (rumination similarity 0.55, hyperfocus 60/90/120 minutes) may be revised within v0.1.x based on field-study data, but the field types and bounds are frozen.
  • The schemas at packages/mcp-guardrail/schemas/ are the source of truth. CODEOWNERS requires clinical-reviewer approval on changes.