0006 — Guardrail tool design
Source:
docs/decisions/0006-guardrail-tool-design.md is the canonical artefact. This page is a short summary; read the full ADR for the alternatives considered, the seven binding decisions, the open questions, and the implementer’s notes.
- Status: Accepted
- Date: 2026-05-16
- Deciders: Thomas Lennon (maintainer), clinical reviewer, mcp-architect
Context
mcp-guardrail is the substrate’s clinical layer: the only server whose tool changes require sign-off from both the maintainer and the clinical reviewer, and the only server whose detectors directly mediate the conversation between an LLM and a neurodivergent user.
Per the project plan, Phase 2 ships mcp-guardrail v0.1 with rumination detection only. Phase 3 ships all three detectors live once the field study (30–50 ND professionals, 8-week pilot) endorses heuristics and thresholds.
This ADR locks the schemas for all three tools now, in advance of Phase 2 shipping, so that the contract Phase 2 consumer skills (e.g. ocd-decision-finalizer) commit to is the same contract Phase 3 honours. Future implementation work is constrained to filling in heuristic code, not redesigning the wire.
Decision
We adopt:
- Three schemas, locked at v0.1.0 as drafted in packages/mcp-guardrail/schemas/.
- Word-overlap Jaccard for check_rumination v0.1.0 (defaults: window 90 min, threshold count 3, similarity 0.55).
- Caller-supplied chronometric_snapshot for check_hyperfocus. No direct imports of neurodock-mcp-chronometric.
- Schema-only deployment for check_hyperfocus and check_sycophancy in Phase 2; runtime returns DETECTOR_NOT_YET_IMPLEMENTED with phase: "3" metadata.
- Closed override-token vocabulary: fresh-context, override-once, disable-for-session, lower-sensitivity, snooze-15m, snooze-once, commit-and-close, extend-end-of-day, i-want-validation, explain-the-match. Each tool exposes only the subset that makes sense.
- x-clinical-review-required: true annotation on every schema, recording the standing requirement that changes require clinical sign-off.
- heuristic_source path recorded in each schema’s compatibility block. The path is normative: the source code IS the auditable specification per ETHICS.md commitment 3.
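As a rough illustration of the word-overlap Jaccard decision, the sketch below scores the current prompt against recent prompts and counts those above the similarity threshold. It is not the project's heuristic code: the `Prompt` type, the `STOPWORDS` set, the tokeniser, the `match_count` field, and the reading of "threshold count 3" as "at least three similar prior prompts inside the window" are all assumptions made for this example. Only the default values (90 min, 3, 0.55) come from the decision above.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical stoplist for illustration; the real one lives with the heuristic source.
STOPWORDS = frozenset(
    {"a", "an", "the", "is", "to", "of", "and", "or", "i", "it", "should", "do"}
)


@dataclass(frozen=True)
class Prompt:
    """Hypothetical container for one user prompt and its timestamp."""
    text: str
    at: datetime


def tokens(text: str) -> set[str]:
    """Lowercase word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap Jaccard: |A intersect B| / |A union B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def check_rumination(
    current: Prompt,
    history: list[Prompt],
    window: timedelta = timedelta(minutes=90),
    threshold_count: int = 3,
    similarity: float = 0.55,
) -> dict:
    """Count prior prompts inside the window that resemble the current one.

    Assumption: detection fires when at least `threshold_count` prior prompts
    reach the similarity threshold. The function keeps no state between calls.
    """
    current_tokens = tokens(current.text)
    cutoff = current.at - window
    matches = [
        p for p in history
        if cutoff <= p.at <= current.at
        and jaccard(current_tokens, tokens(p.text)) >= similarity
    ]
    return {"detected": len(matches) >= threshold_count, "match_count": len(matches)}
```

Because the caller passes the history in on every call, a sketch like this stays compatible with the stateless-server rule; nothing is retained between invocations.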
Alternatives rejected:
- Sentence-embedding cosine for rumination v0.1.0: deferred to v0.0.2 once mcp-cognitive-graph’s embedding stack stabilises. Model weights complicate the “heuristics are public” commitment.
- Topic modelling over recent prompts: introduces unsupervised learning state, violates the stateless-server principle, and is brittle on N=3 windows.
- Direct Python import of neurodock-mcp-chronometric for hyperfocus: couples two servers into one logical unit and violates composability.
- Free-form override tokens: fragment the user-autonomy contract across skills.
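To make the contrast with free-form tokens concrete, here is a minimal sketch of how a closed vocabulary could be checked. The ten tokens are the ones listed in the decision; the function name `validate_override_options` and the example subset are hypothetical, since this summary does not say which tool exposes which tokens.

```python
# The closed vocabulary as listed in the decision above.
OVERRIDE_TOKENS = frozenset({
    "fresh-context",
    "override-once",
    "disable-for-session",
    "lower-sensitivity",
    "snooze-15m",
    "snooze-once",
    "commit-and-close",
    "extend-end-of-day",
    "i-want-validation",
    "explain-the-match",
})


def validate_override_options(options: list[str]) -> list[str]:
    """Return the options unchanged if every token is in the closed vocabulary.

    Hypothetical helper: raises ValueError naming any token outside the set,
    which is how a free-form token would be rejected.
    """
    unknown = sorted(set(options) - OVERRIDE_TOKENS)
    if unknown:
        raise ValueError(f"unknown override tokens: {unknown}")
    return options
```

A tool would pass only its own subset, for example `validate_override_options(["fresh-context", "override-once"])`; that subset is illustrative only.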
Cross-cutting rules established here
Each rule is a direct restatement of an ETHICS.md commitment, translated into a schema invariant:
- No treatment claims (commitment 1). No tool name, field, enum value, or description uses clinical vocabulary that could be read as diagnosis or treatment.
- No silent blocks (commitment 2). Every detected: true output carries a non-empty override_options array (enforced via JSON Schema allOf conditional). Every detection carries a reason string.
- Public, auditable heuristics (commitment 3). Every detection output carries heuristic.{name, version, description}. The actual rule code lives in packages/clinical/ and is git-auditable.
- No aggregation (commitment 4). The server is stateless. compatibility.side_effects and compatibility.telemetry are uniformly “None”. Skills MAY write detection events into the cognitive graph; the guardrail server never persists them.
- False-positive humility (commitment 5). Every output carries confidence: float 0..1 and false_positive_feedback_path. Low-confidence detections SHOULD NOT trigger hard interventions.
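The output-shape rules above can be restated as a small checker. This is a plain-Python sketch, not the JSON Schema conditional the ADR refers to: the function name `invariant_violations` is hypothetical, and it assumes that the reason, override_options, and heuristic requirements apply when detected is true, while confidence and false_positive_feedback_path apply to every output.

```python
def invariant_violations(output: dict) -> list[str]:
    """Return human-readable problems; an empty list means the output passes.

    Field names follow the summary above. Which fields are mandatory on
    non-detections is an assumption of this sketch.
    """
    problems = []

    # Commitment 5: every output carries a confidence in 0..1 and a feedback path.
    confidence = output.get("confidence")
    if (
        isinstance(confidence, bool)
        or not isinstance(confidence, (int, float))
        or not 0.0 <= confidence <= 1.0
    ):
        problems.append("confidence must be a number in 0..1")
    if not output.get("false_positive_feedback_path"):
        problems.append("false_positive_feedback_path is missing")

    if output.get("detected") is True:
        # Commitment 2: no silent blocks.
        if not output.get("override_options"):
            problems.append("override_options must be non-empty when detected")
        reason = output.get("reason")
        if not isinstance(reason, str) or not reason:
            problems.append("reason must be a non-empty string when detected")
        # Commitment 3: the heuristic is named, versioned, and described.
        heuristic = output.get("heuristic") or {}
        for key in ("name", "version", "description"):
            if not heuristic.get(key):
                problems.append(f"heuristic.{key} is missing")

    return problems
```

A check like this could run in tests against sample outputs, alongside schema validation rather than in place of it.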
Vendor-boundary discipline: check_sycophancy returns a counter_prompt string but does not call any LLM. The calling skill is responsible for any model interaction. All four substrate servers route LLM use through the user’s MCP client.
Open questions
The full ADR carries six open questions:
- Where does the field-study corpus live? (Recommended: HuggingFace, consistent with the translation eval corpus, with a small in-repo seed for CI replay.)
- Clinical-advisor sign-off process before tagging v0.1.0. (Recommended: tag the package and document detector status in release notes, with x-implementation-status carrying the truth in the schema.)
- Mechanism for plugin-distributed skills to introduce custom override tokens. (Recommended: none in v0.1.0; the closed vocabulary IS the consistency surface.)
- Cross-skill UI primitive for “this detector fired” rendering. (Recommended: defer to design-system-keeper; the schemas as drafted do not constrain rendering.)
- Repeated disable-for-session invocations: should install-time consent be re-confirmed? (Recommended: defer to profile UX, not schemas.)
- Multi-language support for the Jaccard stoplist. (Recommended: defer; v0.0.2 with embeddings is the better fix.)
What’s next
- Read the full ADR.
- mcp-guardrail reference for the tool surface.
- Guardrails concept for the higher-level framing.
- Ethics for the five commitments that bind this layer.