Five confirmed findings from the sovereign-audit pass, ordered by severity:
Z3-001 CRITICAL — Fastify now trustProxy:true so req.ip resolves to the
real visitor IP via X-Forwarded-For instead of always being the nginx /
docker-bridge peer. Every per-IP rate-limit in the codebase was silently
collapsed into one global counter; this restores them.
Z1-001 CRITICAL — runner container hardening flags (--read-only,
--cap-drop=ALL, --security-opt=no-new-privileges:true, --pids-limit=100,
--memory=512m, --cpus=0.5, tmpfs /tmp) were sitting commented-out as a
TODO despite /security promising them. Now applied unconditionally on
production/staging; opt-out flag RUNNER_DISABLE_HARDENING=1 for Win-dev.
Z2-001 + Z2-002 CRITICAL / MEDIUM — banned-pattern blacklist tightened
(Function(...) without `new`, process.binding, process.dlopen,
.constructor.constructor, _load, vm.runIn*Context, globalThis['..'],
"system prompt override"). scanForInjection now also walks tool.name and
every inputSchema property description, not only implementation +
description — closes the prompt-injection-into-AI-client surface that
downstream clients (Claude Desktop, Cursor) read verbatim. The duplicate
BANNED_PATTERNS in apps/api/src/routes/servers.ts deleted in favour of
the single shared scanForInjection export from @bmm/llm.
Z4-001 HIGH — /v1/auth/magic-link gained the two-axis daily rate-limit
the SMS endpoint already had: 10/IP/day + 5/email/day. Combined with the
trustProxy fix above these are now real per-visitor limits.
Z4-002 MEDIUM — magic-link callback URL no longer printed to stdout in
production. In dev it still prints (so devs can click the link); in
production we log only "issued, URL withheld" and a loud error if no
email sender is wired (Resend integration is the actual launch
blocker — left as a TODO).
Z6-001 MEDIUM — /v1/builds/:id/stream WebSocket now refuses cross-origin
upgrades. SameSite=Lax already mitigates in modern browsers; this is the
defense-in-depth against browser bugs and non-browser clients.
FALSE POSITIVES dismissed: slug path-traversal (schema regex
^[a-z][a-z0-9-]*$ in @bmm/types catches it); session-after-promote
(getSession re-fetches isAdmin from DB on every request).
DEFERRED (not blockers, tracked):
- Z1-002 generated-server HTTPS — needs nginx wildcard subdomain TLS
- Z1-003 docker image cleanup cron
- Z2-001 v2 — real sandbox runtime (multi-week refactor)
- Z3-002 rawBody-per-request memory — branch on webhook path only
- Z5-001 multi-user org RBAC for billing — gated on Team feature
- Email sender integration (Resend) — launch blocker
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The free tier was hemorrhaging Anthropic cost with no abuse cap (no rate
limit on /preview, Opus default in the build worker, 5-min cache TTL that
made cache-miss the common case). This switches free users to GLM, paid
users to Claude tiers, and tightens every leak found in the audit.
Backend:
- @bmm/llm: GLM provider via Zhipu's OpenAI-compatible endpoint, pickPreviewModel
+ pickBuildModel helpers, plan-aware ModelChoice
- preview-cache TTL 5min -> 24h (kills the cache-miss path)
- /v1/servers/preview: picks model from caller's plan, returns model name to UI
- /v1/servers POST: enforces SERVER_LIMITS per plan (402), rate-limits builds
- daily rate-limit on preview (5/40/150/1000) and build (3/20/100/500)
- /v1/auth/me returns plan so the wizard can show the right model name
- generator worker: GLM default, Anthropic Sonnet fallback if GLM errors
Frontend:
- Wizard fetches plan, shows "<model> is drafting the tool spec" pre-emptively,
upgrade hint for hobby users, friendly errors for 402 / 429
- Pricing page: AI-model line per tier (Open-tier / Haiku / Sonnet / Opus),
Team €149 -> €199, Enterprise €499 -> €999, daily-preview limit per tier
- Privacy + Security: explicit subprocessor disclosure for Anthropic (US) /
Zhipu (CN) and which tier uses which
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /v1/servers/preview route ran claude-opus-4-7 synchronously; full spec
generation routinely exceeded Cloudflare's ~100s proxy cap, so the browser
received a headerless 524 and reported it as a CORS failure.
- preview now uses claude-sonnet-4-6 with a 45s per-attempt timeout and one
retry — comfortably inside the proxy budget
- generateSpec maps an exhausted timeout to SpecTimeoutError; the route
returns a clean 504 (with CORS headers) instead of a stalled connection
- analyze step: live elapsed-seconds counter as freeze-proof, plus a
reduced-motion exception so the loading spinner keeps spinning (a status
indicator, which WCAG exempts from reduced-motion)
- textarea resize grip restyled to dark theme (light hatch on dark square)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Server-side authorization-code flow: /v1/auth/google redirects to the
consent screen with a CSRF state cookie; /v1/auth/google/callback
exchanges the code, validates the ID token (iss/aud/exp/email_verified),
and mints a 30-day session via upsertOAuthLogin. /v1/auth/providers lets
the login UI hide the button until GOOGLE_OAUTH_ID/SECRET are set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes structural weakness #4 from the audit (single global key, no rotation,
no KMS path). Customer secrets now use envelope encryption with a real
rotation story.
Model:
KEK — Key Encryption Key, 32 bytes from env (SECRETS_ENCRYPTION_KEY). Never
stored in the DB. Root of trust.
DEK — Data Encryption Key, 32 random bytes we generate, stored in the new
encryption_keys table *wrapped* (AES-256-GCM encrypted) with the KEK.
Secrets are encrypted with the DEK.
Schema:
- encryption_keys (version, wrappedDek, active, rotatedBy, createdAt, retiredAt)
- secrets.keyId — which DEK encrypted this row. NULL = legacy (KEK-direct,
pre-envelope); decryptSecret handles both and the first rotation migrates
legacy rows onto a DEK.
crypto.ts (full rewrite):
- ensureActiveKey() — boot-time, loads keys + creates v1 if none. Fail-closed:
index.ts process.exit(1) if it throws — the API will not serve if encryption
can't initialize.
- encryptSecret() — encrypts with the active DEK, returns { value, keyId }.
- decryptSecret(value, keyId) — DEK path or legacy KEK-direct path.
- rotateKeys() — mints a fresh DEK, re-encrypts EVERY secret under it inside a
single transaction (decrypt-old / encrypt-new per row), retires the old key,
activates the new one. A partial failure is recoverable because every row
carries its own keyId.
- encryptionStatus() — active version, key history, secret + legacy counts.
Admin:
- GET /v1/admin/encryption — status
- POST /v1/admin/encryption/rotate — triggers rotateKeys, audit-logged as
admin.encryption.rotate with { newVersion, reEncrypted }.
- /admin/encryption page — active-key/secret/legacy cards, Rotate button with
confirm, key-history table, plain-English how-it-works. Added to admin nav.
Verified end-to-end:
- boot → encryption_keys v1 active, '[crypto] envelope encryption ready'
- created a server with secret MY_API_KEY → stored ciphertext, keyId = v1
- POST rotate → { newVersion: 2, reEncrypted: 1 }; ciphertext changed, keyId
now v2, v1 retired, v2 active. The decrypt-then-reencrypt round-trip
succeeded (rotation throws otherwise) — the secret is provably recoverable.
- admin UI renders the status + history correctly.
Deferred, named honestly (not built this iteration):
- worker reads secrets from the DB instead of the BullMQ job-data plaintext
copy — would also remove plaintext secrets from Redis. Separate change with
its own risk surface on the iterate/fork flows.
- per-server secret-value rotation UI
- audit_log hash-chaining (tamper-evidence)
- rate limiting on auth endpoints
What this enables:
- A user builds an MCP server. If others would benefit, they click 'Publish as
template' on their server detail page. The spec + pre-rendered TypeScript
snapshot is preserved.
- Visitors browse /templates, filter by category, sort by trending/top/newest.
Each template card shows fork count + active deployment count as natural
manipulation-resistant popularity signal.
- /templates/[slug] shows the full plan: tool list with input schemas,
required-credential explanations (with 'how to get one' deep links), and a
collapsible code preview so users can audit before forking.
- Fork is one click → /servers/new?template=slug. The wizard skips Step 1 and
pre-fills Step 2 with the template's parsed spec. Forker only fills in their
own credentials. mcp_servers.template_id is recorded; template.fork_count is
bumped atomically. Each fork gets its own isolated container with its own
port, its own AES-256 secrets — the template author has zero visibility into
the fork's traffic or data.
- Admin /admin/templates moderation: verify quality templates (shows shield
badge in marketplace), hide low-effort ones, takedown anything malicious.
Takedowns cascade-pause every fork container — owners must re-deploy.
Why template+fork instead of shared-container:
- Shared containers would mean the publisher's quota + their secrets + their
logs are exposed to forkers. Bad ergonomics, bad security, bad ownership.
- Templates/forks decouple the spec (shared, vouched-for) from the runtime
(isolated per user). Network-effect moat without the trust collapse.
Why no 5-star voting in v1:
- Manipulation-anfällig, empty lists without adoption. We use fork count +
active deploys + verified badge. Trending algorithm:
score = (activeDeploys * 3 + forks) / sqrt(ageDays + 1)
Real signal, no brigading attack surface.
Backend:
- New schema: templates table (16 cols incl. tools_schema, generated_code,
required_secrets, allowedDomains, status enum, verified, fork_count).
- mcp_servers.template_id FK + idx for fork lookup.
- @bmm/types: SpecEdit unchanged, CreateServerInput accepts optional templateId.
- preview-cache.ts: new cachePrebuiltCode/loadPrebuiltCode for storing the
template's full rendered server.ts alongside the spec. Generator worker
detects this and skips the render step — uses the audited pre-built code
verbatim. Banned-pattern re-scan at publish time.
- routes/templates.ts: 5 public/auth routes + 2 admin routes. Banned-pattern
re-scan before publish. Slug auto-uniqued. forkCount atomic-increment via
SQL.
UI:
- /templates marketplace with trending/top/newest tabs, category filter, search.
Cards show forks + live count + author + verified badge.
- /templates/[slug] full detail with tools, credentials-with-hints, expandable
code preview, fork CTA, ownership + stats sidebar, 'forking is safe' explainer.
- /servers/new?template=slug — wizard auto-jumps to Step 2 with template spec
pre-filled, fork banner at top with link back to template.
- /servers/[id] new Publish tab with title, category, descriptions, per-secret
hint fields (description + howToGetUrl per UPPER_SNAKE_CASE key).
- /admin/templates moderation with verify/hide/takedown actions.
- Marketing nav now includes /templates.
Verified end-to-end:
- Published Echo Demo Template from marco@test.local's live server
- Marketplace lists it correctly with stats
- Detail page renders with all sections
- Fork CTA navigates to wizard with ?template= param
- Wizard skips Step 1, shows fork banner, pre-fills spec
- Build succeeds in ~10s (cached spec + prebuilt code path skips Claude AND
render), container live on :4109 with proper OAuth 401 → token → 200 flow
- DB: templates.fork_count=1, activeDeployments=1, mcp_servers.template_id
populated on the fork
- /admin/templates shows the new template with verify/hide/takedown controls
The wizard's confirm step is no longer read-only. Users can refine what Claude
parsed before committing to a build.
Backend:
- @bmm/types adds SpecEdit (tools[name,description,inputSchema] + requiredSecrets);
CreateServerInput accepts an optional specEdit alongside previewId.
- Servers create endpoint: when specEdit is provided, loads cached spec from Redis,
index-merges the edits in (keeping LLM-generated implementations untouched),
re-validates via GeneratorSpec, re-runs the banned-pattern scan, overwrites the
Redis cache so the worker reads the user's version. Refuses with
preview_expired/tool_count_mismatch/banned_pattern on safety failures.
- New overwriteSpec() helper in preview-cache.
Frontend:
- Step 2 renders each tool as an editable card: name input, description textarea,
JSON schema textarea with parse-on-keystroke validation (inline error if invalid).
- Required secrets list is editable: keys via uppercase-snake-case input, +Add /
remove buttons, secret values kept in sync when keys are renamed.
- Reset-to-AI-suggestion button appears when edits are dirty.
- Pre-submit validation: schema must parse, secret keys must match UPPER_SNAKE_CASE,
required secret values must be provided.
- Warning copy: 'Renaming parameters may require an Iterate after build — the
existing impl references the original names.'
Verified end-to-end via browser smoke test: edited description + renamed tool
landed correctly in mcp_servers.tools_schema and in the live container at :4107.
Implementation field preserved from the original cached spec.
- POST /v1/servers/preview runs Claude synchronously, validates output, caches spec
in Redis under preview:<id> with 5min TTL, returns previewId+spec+detectedSecrets.
- POST /v1/servers accepts optional previewId; worker reuses the cached spec if
the entry is still present, otherwise regenerates fresh. Skips the second
Claude round-trip (~30s saved on the demoable path).
- audit() helper writes auth.login, auth.logout, server.create, server.iterate,
server.delete to audit_log with ip, metadata, resourceId.
- GET /v1/me/org returns organization + members list for the settings page.
- GET /v1/audit?limit=&action=&resourceType= returns scoped audit entries.
- Bump @modelcontextprotocol/sdk from 1.0.4 to 1.29.0 in runner-template
(1.0.4 has no McpServer or StreamableHTTPServerTransport — file not found at runtime).
- Bump zod to 3.25.76 across workspace to satisfy modern SDK peer dep.
- Split OAUTH_ISSUER (canonical, host-reachable) from CONTROL_PLANE_URL (container-reachable for JWKS).
Runner verifies iss against OAUTH_ISSUER; fetches JWKS from CONTROL_PLANE_URL.
Both API and runner now agree on http://localhost:4000/oauth as the issuer in dev.
- Move postgres host port 5432 to 5440, redis 6379 to 6390 to avoid collisions with
native installs on the dev machine.
- Move web from 3000 to 3001 (3000 occupied by Gitea on dev machine).
- Drop pino-pretty transport from API to avoid runtime require of an unbundled dep.
- Cast build_logs.level (varchar) to BuildEvent's literal union in WS replay path.
- Remove unused reqBase helper in oauth.ts.