mcp-guardrail
Tools at a glance
Section titled “Tools at a glance”| Tool | What it takes | What it returns | Why your brain cares |
|---|---|---|---|
check_rumination | { current_prompt, history, window_minutes?, threshold_count?, similarity_threshold? } | detected, matched prior prompts, confidence, plain-language reason, non-empty override list | catches the fifth time you’ve reasked “is the migration plan okay” inside ninety minutes and offers an off-ramp instead of another reassurance hit |
check_hyperfocus | { chronometric_snapshot, session_id?, hyperfocus_break_minutes?, end_of_day_local?, escalation_thresholds? } | level (none / gentle / nudge / hard), elapsed seconds, override tokens including snooze-15m and commit-and-close | escalates a quiet “you’ve been here a while” into something firmer at 120 minutes — without ever blocking you, because the user remains the authority |
check_sycophancy | { candidate_response?, recent_user_messages?, decision_context? } | matched pattern, confidence, optional counter_prompt the caller may pass back to the model | spots the “great question!” loop and a chain of reassurance-seeking on the same decision before the model warps into a yes-machine |
mcp-guardrail is the substrate’s clinical layer. It exposes three detectors that flag patterns unmodified LLM interaction tends to amplify for neurodivergent users: rumination loops, unregulated hyperfocus, and sycophancy. It blocks nothing; every detection ships with a non-empty override list and a plain-language reason.
- Package:
packages/mcp-guardrail/ - Version:
0.0.3— all three detectors live (rumination, hyperfocus, sycophancy) - Schemas:
packages/mcp-guardrail/schemas/*.schema.json - ADR: 0006 — Guardrail tool design
- Schema
$idprefix:https://schemas.neurodock.org/mcp-guardrail/v0.1.0/ - Concept: Guardrails
Design invariants
Section titled “Design invariants”- Stateless. The server persists nothing — no SQLite, no JSONL, no in-memory caches that survive a tool call. Callers supply all history.
- No telemetry, no network sockets. Per
ETHICS.mdcommitment 4. - No user content in logs. Only
tool_invokedmetadata is logged. - Override-token vocabulary is closed at v0.1.0. New tokens require a minor bump and clinical sign-off per ADR 0006 §3 and §10.
- Heuristics are auditable. Source for each heuristic lives in
packages/clinical/. Changes require clinical sign-off perETHICS.mdcommitment 3.
Tools (status & heuristics)
Section titled “Tools (status & heuristics)”| Tool | Status | Heuristic (v0.1.0) | Default thresholds |
|---|---|---|---|
check_rumination | live | word_overlap_jaccard | window 90 min, count 3, similarity 0.55 |
check_hyperfocus | live | elapsed_threshold_with_eod | 60 / 90 / 120 minutes |
check_sycophancy | live | pattern overlap (4 patterns) | similarity 0.5 |
check_rumination
Section titled “check_rumination”Detect whether the user’s current prompt is a semantic repeat of recent prompts within a rolling window. Returns a structured advisory signal with the rule that fired, a confidence score, and a non-empty list of overrides the user may invoke. NEVER blocks the user’s action; the calling skill decides whether to surface, defer, or ignore.
Input — { current_prompt, history, window_minutes?, threshold_count?, similarity_threshold? }
current_prompt— verbatim, max 8000 chars.history— array of{text, at}oldest-first, max 500 items. The caller is responsible for filtering to a sensible window and for redacting anything that should not be compared.window_minutes— default 90, range 1..1440.threshold_count— default 3, range 2..50. Two is the floor: a single repeat is normal human behaviour.similarity_threshold— default 0.55, range 0..1. Calibrated on the field-study corpus.
Output — { detected, similar_prompts?, count, window_seconds, threshold, confidence, reason, heuristic, override_options, false_positive_feedback_path }
detected— true whencount >= thresholdwithinwindow_seconds. False is a first-class return.similar_prompts— verbatim prior matches with their similarity scores. Empty whendetectedis false.confidence— 0..1. Low-confidence detections SHOULD NOT trigger hard interventions.heuristic.name— enum:word_overlap_jaccard | embedding_cosine | topic_model. v0.1.0 ships only the first; the others are reserved.override_options[].token— closed enum:fresh-context | override-once | disable-for-session | lower-sensitivity.
The schema enforces override_options.minItems >= 1 and similar_prompts.minItems >= 1 when detected is true (JSON Schema allOf conditional).
check_hyperfocus
Section titled “check_hyperfocus”Classify hyperfocus escalation into a coarse level given a snapshot of the chronometric session.
Input — { chronometric_snapshot, session_id?, hyperfocus_break_minutes?, end_of_day_local?, escalation_thresholds? }
chronometric_snapshot— caller-supplied. Contains{open_session, now, idle_signal?}. Loose-coupled: the guardrail server does NOT importneurodock-mcp-chronometric. Skills construct this fromget_time_context+ the in-flight session record.end_of_day_local— optionalHH:MMlocal time. When supplied, after this clock time the elapsed thresholds are interpreted more strictly (gentle becomes nudge, nudge becomes hard).escalation_thresholds— optional{gentle, nudge, hard}in minutes. Defaults(60, 90, 120)per the field-study spec.
Output — { level, elapsed_seconds, confidence, reason, heuristic, override_options, ... }
level— enum:none | gentle | nudge | hard.override_options[].token— closed enum includessnooze-15m,snooze-once,commit-and-close,extend-end-of-day.
Runtime: live. Returns a structured {level, elapsed_seconds, confidence, reason, heuristic, override_options, ...} advisory. Never blocks.
check_sycophancy
Section titled “check_sycophancy”Detect either over-validation patterns in a candidate LLM response, or repeated reassurance-seeking in recent user messages on the same decision. Returns the matched pattern, a confidence score, and a counter_prompt the calling skill MAY surface to the model — the server itself never invokes a model.
Input — { candidate_response?, recent_user_messages?, decision_context? }
- At least one of
candidate_responseorrecent_user_messagesMUST be provided (anyOf). candidate_response— draft model response to evaluate, max 16000 chars.recent_user_messages— array of{text, at}, scoped to the samedecision_context.
Output — { detected, pattern, confidence, reason, heuristic, override_options, counter_prompt?, false_positive_feedback_path }
pattern— closed enum for the four reserved heuristic names.counter_prompt— string the calling skill may pass back to the model. The server NEVER calls a model itself.override_options[].token— closed enum includesi-want-validationandexplain-the-match.
Runtime: live. Returns a structured {detected, pattern, confidence, reason, heuristic, override_options, counter_prompt?, ...} advisory. The server NEVER calls a model itself — counter_prompt is a string the calling skill MAY pass back.
Error codes
Section titled “Error codes”| Code | Meaning |
|---|---|
INPUT_TOO_LARGE | A text input exceeded its per-field cap. Caller MUST truncate before retrying. |
HISTORY_OUT_OF_ORDER | check_rumination.history items were not oldest-first or contained future timestamps. The server does not silently re-sort. |
WINDOW_OUT_OF_RANGE | window_minutes was outside 1..1440. Rejected rather than clamped so callers cannot accidentally disable the check. |
SESSION_ID_MISMATCH | check_hyperfocus received conflicting session_id values in the snapshot vs the top-level argument. |
Privacy
Section titled “Privacy”- The server is stateless. No SQLite, no JSONL, no caches.
- No telemetry. No network sockets in default or non-default configurations.
- The server does NOT log
current_prompt,history,candidate_response, orrecent_user_messages. Only structured event names. false_positive_feedback_pathis a public GitHub issue template URL. A user reporting a false positive thereby discloses the prompt that fired; the disclosure is user-initiated, not server-emitted.
Versioning
Section titled “Versioning”- Additive-only within
v0.1.x. Override-token enum and heuristic-name enum are closed; additions require a minor bump and clinical sign-off. - Default thresholds (rumination similarity 0.55, hyperfocus 60/90/120 minutes) may be revised within
v0.1.xbased on field-study data, but the field types and bounds are frozen. - The schemas at
packages/mcp-guardrail/schemas/are the source of truth. CODEOWNERS requires clinical-reviewer approval on changes.
What’s next
Section titled “What’s next”- Guardrails concept for the higher-level framing.
- ADR 0006 for the design rationale.
- Ethics for the five commitments that bind this layer.