Contribute an eval example
Source: the eval corpus lives at `packages/evals/corpora/`. `packages/evals/README.md` is canonical for the harness; this page is the contributor on-ramp.
The eval corpora are the most under-supplied resource in the project. Every contributed example improves how well the substrate handles real corporate ambiguity. It is also the lowest-friction way to contribute — fifteen minutes from “I want to add this” to “PR opened”, no code, no Python toolchain.
The translation corpus is the v0.0.2 contribution surface. Guardrail and other corpora open later (see packages/evals/corpora/README.md).
What you are adding
One YAML file under packages/evals/corpora/translation/community/. The file describes one real Slack/email/Linear/PR-comment message you have received, anonymised, with your literal reading of what it actually meant. The harness replays your example against the four mcp-translation tools on every prompt change.
The schema is at packages/evals/schemas/example.schema.json. Validate locally with the harness; CI re-validates on the PR.
The fifteen-minute flow
- Pick a real message. Slack, email, Linear, GitHub, Notion: anywhere a corporate ambiguity bit you. Pick one that needed mental translation.
- Anonymise it. Replace names, company refs, project codenames, dates, and any other identifying detail with generic placeholders (`Alex`, `Project Atlas`, `the team`, `Q2`). The substrate’s `anonymise.py` will run a second pass in CI, but the contributor pass is required.
- Drop a new YAML file into `packages/evals/corpora/translation/community/`. Filename: any unique kebab-case slug (`alex-circle-back.example.yaml`).
- Fill in `expected` with your literal reading: the explicit ask (if any), whether there is ambiguity, and the action you would recommend.
- Open a PR. Title: `evals(translation): add <slug>`. Link to this page in the description.
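The naming conventions in the steps above can be sketched as a small helper. This is illustrative only: `slugify` and `contribution_names` are hypothetical names, not part of the harness, and the `id` prefix assumes a `translation/incoming` example as in the sample file on this page.

```python
import re

# Directory named in the flow above.
COMMUNITY_DIR = "packages/evals/corpora/translation/community/"


def slugify(title: str) -> str:
    """Lower-case the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def contribution_names(title: str) -> dict:
    """Derive the file path, example id, and PR title from a short title."""
    slug = slugify(title)
    if not slug:
        raise ValueError("title must contain at least one letter or digit")
    return {
        "path": f"{COMMUNITY_DIR}{slug}.example.yaml",
        # Assumption: the example belongs to the translation/incoming slice.
        "id": f"translation.incoming.community.{slug}",
        "pr_title": f"evals(translation): add {slug}",
    }


names = contribution_names("Alex circle back")
print(names["path"])
print(names["pr_title"])
```

Deriving all three strings from one slug keeps the filename, the `id`, and the PR title in step with each other.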
Minimum example file

    id: "translation.incoming.community.alex-circle-back"
    slice: "translation/incoming"
    created_at: "2026-05-20"
    consent:
      contributor: "anon-2026-05-20-001"
      consent_token: "sha256:<paste contributor-generated token>"
      anonymisation_pass: 1
    status: "contributed"
    license: "AGPL-3.0-or-later"
    input:
      text: "Hey - want to circle back on the Atlas rollout? No rush."
      channel: "slack"
    expected:
      explicit_ask: null
      ambiguity:
        detected: true
      recommended_next_action:
        action: "set_reminder"
    notes: |
      Soft-deferral pattern: "circle back" and "no rush" together signal
      the reply will not self-fulfil. The baseline should flag both phrases.

Existing files under `packages/evals/corpora/translation/incoming/` are good templates. Match their shape; the harness does partial matching, not exact equality, so you only need to assert the fields you are confident in.
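To make "partial matching, not exact equality" concrete, here is a minimal sketch of one way such a comparison could work. It is not the harness's actual implementation: `partial_match` is a hypothetical name, and the extra fields in `actual` (`phrases`, `when`, `confidence`) are invented for illustration.

```python
def partial_match(expected, actual):
    """Return True if every field asserted in `expected` is present in
    `actual` with the same value. Fields that appear only in `actual`
    are ignored."""
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            return False
        return all(
            key in actual and partial_match(value, actual[key])
            for key, value in expected.items()
        )
    # Scalars and lists are compared by plain equality in this sketch.
    return expected == actual


# The `expected` block from the example file above.
expected = {
    "explicit_ask": None,
    "ambiguity": {"detected": True},
    "recommended_next_action": {"action": "set_reminder"},
}

# Hypothetical tool output carrying more detail than the contributor asserted.
actual = {
    "explicit_ask": None,
    "ambiguity": {"detected": True, "phrases": ["circle back", "no rush"]},
    "recommended_next_action": {"action": "set_reminder", "when": "next week"},
    "confidence": 0.8,
}

print(partial_match(expected, actual))
```

Under this reading, leaving a field out of `expected` means "no opinion", while writing `explicit_ask: null` asserts that the tool should find no explicit ask.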
Anonymisation tips
- Replace names with single first names from a name generator (`Alex`, `Priya`, `Sam`). Avoid surnames entirely.
- Replace project codenames with generic project nouns (`the rollout`, `Project Atlas`, `the migration`).
- Replace dates with relative references (`next week`, `Q2`) or generic dates (`2026-Qn`). Do not preserve the original date if it could identify a release.
- Replace company-specific jargon with the closest generic equivalent. If the jargon is the ambiguity, paraphrase the surrounding context and keep the jargon in quotes.
- Remove URLs, ticket numbers, internal tool names. If the URL pattern matters (e.g. a Linear permalink shape), use a placeholder like `https://linear.app/<workspace>/<ticket>`.
- Do not paste verbatim if the message contains anything you would not want a stranger reading. The corpus is published in the open; anonymise as if anyone could read your example.
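A rough first pass over the mechanical parts of these tips (URLs, ticket numbers, literal dates, known names) could look like the sketch below. It is not the substrate's `anonymise.py`; the patterns, the placeholder strings, and the `scrub` function are all assumptions for illustration, and a manual re-read is still needed afterwards.

```python
import re

# Hypothetical patterns; adjust them to the identifiers in your own message.
URL_RE = re.compile(r"https?://\S+")
TICKET_RE = re.compile(r"\b[A-Z]{2,10}-\d+\b")      # e.g. ENG-1234
ISO_DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # e.g. 2026-03-01


def scrub(text: str, names: dict[str, str]) -> str:
    """Replace URLs, ticket ids, ISO dates, and listed names with placeholders."""
    # URLs go first so ticket ids embedded in a link disappear with it.
    text = URL_RE.sub("<url>", text)
    text = TICKET_RE.sub("<ticket>", text)
    text = ISO_DATE_RE.sub("<date>", text)
    for real, placeholder in names.items():
        text = re.sub(rf"\b{re.escape(real)}\b", placeholder, text)
    return text


message = "Jordan Smith: see https://example.com/x/ENG-1234 before 2026-03-01, ENG-99 too"
print(scrub(message, {"Jordan Smith": "Alex"}))
```

A pass like this only catches patterns you thought to list. Codenames, jargon, and contextual hints still need the stranger's re-read described above, and placeholders such as `<date>` are better rewritten by hand into relative references like `next week`.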
Field-by-field reference
- `id`: kebab-case, namespaced under your slice (`translation.incoming.community.<slug>`).
- `slice`: must match the directory: `translation/incoming`, `translation/tone`, `translation/outgoing`, `translation/meetings`.
- `created_at`: ISO date.
- `consent.contributor`: pseudonymous id starting with `anon-`. Never a real name.
- `consent.consent_token`: opaque sha256-prefixed string verifying opt-in. Generate locally; never reuse another contributor’s token.
- `consent.anonymisation_pass`: `1` for a contributor-anonymised example; CI runs pass 2.
- `status`: `contributed` for community submissions. The harness sets `published` later in the curation pipeline.
- `license`: must be `AGPL-3.0-or-later`. Same license as the rest of the repo.
- `input`: tool-specific input. For `translation/incoming`, the shape is `{text, channel?, thread_context?, target_language?}`. Match the schema at `packages/mcp-translation/schemas/translate_incoming.schema.json`.
- `expected`: your literal reading. Partial; only the fields you are confident about.
- `ratings`: optional. If you can speak for a neurotype (`adhd`, `asd`, `audhd`, `ocd`), add a `ratings[]` entry with `rater_neurotypes` and `agreement_with_expected`.
- `notes`: optional. Brief explanation of what pattern this example exercises. Future curators will thank you.
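Several of the rules above are mechanical enough to check on an already-parsed example before opening a PR. The sketch below is a hypothetical pre-check, not a substitute for validation against `example.schema.json`; `check_example` is an invented name, and applying the `<slice>.community.<slug>` id shape to slices other than `translation/incoming` is an assumption.

```python
import re
from datetime import date

ALLOWED_SLICES = {
    "translation/incoming",
    "translation/tone",
    "translation/outgoing",
    "translation/meetings",
}
KEBAB_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def check_example(example: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    slice_name = str(example.get("slice", ""))
    if slice_name not in ALLOWED_SLICES:
        problems.append(f"slice {slice_name!r} is not a known translation slice")
    # Assumption: ids follow <slice with dots>.community.<kebab-case slug>.
    prefix = slice_name.replace("/", ".") + ".community."
    example_id = str(example.get("id", ""))
    slug = example_id[len(prefix):] if example_id.startswith(prefix) else ""
    if not KEBAB_RE.fullmatch(slug):
        problems.append("id is not namespaced under the slice with a kebab-case slug")
    try:
        date.fromisoformat(str(example.get("created_at", "")))
    except ValueError:
        problems.append("created_at is not an ISO date")
    consent = example.get("consent") or {}
    if not str(consent.get("contributor", "")).startswith("anon-"):
        problems.append("consent.contributor must start with anon-")
    if not str(consent.get("consent_token", "")).startswith("sha256:"):
        problems.append("consent.consent_token must be sha256-prefixed")
    if consent.get("anonymisation_pass") != 1:
        problems.append("consent.anonymisation_pass must be 1")
    if example.get("status") != "contributed":
        problems.append("status must be contributed")
    if example.get("license") != "AGPL-3.0-or-later":
        problems.append("license must be AGPL-3.0-or-later")
    return problems


sample = {
    "id": "translation.incoming.community.alex-circle-back",
    "slice": "translation/incoming",
    "created_at": "2026-05-20",
    "consent": {
        "contributor": "anon-2026-05-20-001",
        "consent_token": "sha256:<paste contributor-generated token>",
        "anonymisation_pass": 1,
    },
    "status": "contributed",
    "license": "AGPL-3.0-or-later",
}
print(check_example(sample))
```

The function collects every problem instead of stopping at the first, so one run shows everything that needs fixing.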
Before opening the PR
- The message is anonymised. Re-read it as a stranger.
- The `id` is unique and namespaced.
- `consent.contributor` starts with `anon-`. No real names.
- The file validates against `packages/evals/schemas/example.schema.json`. Run `uv run python -m neurodock_evals.validate packages/evals/corpora/translation/community/<your-file>.yaml` if your environment is set up; otherwise CI catches it.
- You read `ETHICS.md` and confirmed the example does not embed personal data the substrate would refuse to log.
What’s next
- Overview for the broader contribution lanes.
- Governance for how the corpus is reviewed and merged.
- The seed examples at `packages/evals/corpora/translation/incoming/` are the best stylistic reference.