buildmymcpserver

Author	SHA1	Message	Date
Marco Sadjadi	092290bb38	fix(preview/stream): await onSpec/onError handlers. The llm package called the user-supplied onSpec/onError handlers without awaiting them. In the /preview/stream route onSpec is async (it does `await cacheSpec(...)` then writes the SSE `spec` event), so the api handler's `await streamSpecFromAnthropic(...)` returned BEFORE the terminal event had been written. The route's finally block then ran `reply.raw.end()`, the queued `send('spec', ...)` hit a closed stream and silently no-op'd, and the browser saw zero terminal events — frontend ran into the "Spec generation failed." fallback even though Anthropic had delivered a perfectly valid spec. Verified against prod log: req-8 ran 66s with 200 and produced no preview_spec_* log line, which is exactly the success-but-event-lost signature. Fix: - StreamHandlers.onSpec / onError typed as Promise<void> | void - Both call sites in streamSpecFromAnthropic now `await` them - /preview/stream sets `resolved = true` at the END of each handler (after the SSE write completes) so the post-stream "unresolved" fallback only fires on a genuine programming bug - Added preview_spec_ready info log on the happy path so future diagnosis doesn't have to infer success from the absence of error logs	2026-05-28 22:00:03 +02:00
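The ordering bug described in 092290bb38 can be reproduced in a few lines. This is a minimal sketch, not the @bmm/llm code: `runStream`, `simulateRoute`, the `awaitHandlers` switch, and the 5 ms delay standing in for `await cacheSpec(...)` are all hypothetical.

```typescript
// Handler contract as described in the commit: handlers may be sync or async.
type StreamHandlers = {
  onSpec: (spec: string) => Promise<void> | void;
};

// Hypothetical stand-in for streamSpecFromAnthropic. `awaitHandlers` switches
// between the old fire-and-forget call and the fixed, awaited call.
async function runStream(handlers: StreamHandlers, awaitHandlers: boolean): Promise<void> {
  const pending = handlers.onSpec('{"tools":[]}');
  if (awaitHandlers) {
    await pending; // fixed: wait until the handler has written its event
  }
}

// Simulates the route: an async onSpec, then a `finally` that closes the stream.
async function simulateRoute(awaitHandlers: boolean): Promise<string[]> {
  const log: string[] = [];
  let closed = false;
  const onSpec = async (_spec: string): Promise<void> => {
    // Stands in for the asynchronous cache write before the SSE event is sent.
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    log.push(closed ? "spec dropped (stream closed)" : "spec sent");
  };
  try {
    await runStream({ onSpec }, awaitHandlers);
  } finally {
    closed = true; // stands in for reply.raw.end()
    log.push("stream ended");
  }
  // Give an un-awaited handler time to finish so its effect shows up in the log.
  await new Promise<void>((resolve) => setTimeout(resolve, 30));
  return log;
}
```

With `awaitHandlers` set to `false` the log reads "stream ended" followed by "spec dropped (stream closed)"; with `true` it reads "spec sent" followed by "stream ended".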
Marco Sadjadi	ec819082a6	fix(llm): escape backticks in SYSTEM_PROMPT (broke typecheck)	2026-05-28 21:39:34 +02:00
Marco Sadjadi	147ba69968	fix(runner): alias params/input to args so tool implementations don't ReferenceError. Auth chain finally landed but tool calls crashed in the wetter server with "Error: params is not defined". The MCP SDK passes the validated tool args as a single parameter; our template names that parameter `args` but the model frequently writes `params.location` / `input.x` because that's how OpenAPI and JSON-RPC reference docs read. Two-sided fix: - render.ts wraps every implementation with `const params = args; const input = args;` inside the try block. Whichever alias the model picked, the variable resolves to the same validated object. - SYSTEM_PROMPT now states the variable name EXPLICITLY ("variable named EXACTLY `args`, e.g. args.location") so new generations stop drifting on that detail. Existing wetter runner needs a rebuild to pick up the alias shim.	2026-05-28 21:39:11 +02:00
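The alias shim from 147ba69968 might look roughly like the sketch below. It is an assumption-laden illustration: `wrapImplementation` and `compileTool` are hypothetical names, the `catch` body is invented, and `new Function` is used only to make the sketch executable (the real render.ts presumably emits source text into a template).

```typescript
// Hypothetical wrapper: the handler's single parameter is `args`;
// `params` and `input` are declared as aliases of the same object.
function wrapImplementation(body: string): string {
  return [
    "try {",
    "  const params = args; const input = args;",
    body,
    "} catch (err) {",
    "  return { isError: true, message: String(err) };", // invented error shape
    "}",
  ].join("\n");
}

// Compiles the wrapped source into a function of one parameter named `args`.
function compileTool(body: string): (args: Record<string, unknown>) => unknown {
  return new Function("args", wrapImplementation(body)) as (
    args: Record<string, unknown>,
  ) => unknown;
}

// Whichever name the generated body uses, it reads the same validated object.
const viaArgs = compileTool("return args.location;");
const viaParams = compileTool("return params.location;");
const viaInput = compileTool("return input.location;");
```

Calling any of the three with `{ location: "Bern" }` returns `"Bern"`; without the alias line, the `params` and `input` variants would fail with a ReferenceError as the commit describes.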
Marco Sadjadi	0c6d738a6b	feat(preview): SSE-streamed generation, no CF 100s edge cap. Architectural fix for "spec_too_large" / preview_timeout — the sync endpoint had to fit the whole model run into Cloudflare's ~100s edge window, which made the system fragile against any prompt that produced a verbose spec. The new streaming path pipes Anthropic's token deltas as Server-Sent Events; every chunk resets CF's idle timer and a 15s keepalive comment guarantees activity even during slow first-token windows. @bmm/llm: new streamSpecFromAnthropic() exposes the SDK's .stream() flow with the same typed-error contract as generateSpec — same SpecTruncatedError / SpecValidationError / SpecTimeoutError raised from the relevant moment. API: POST /v1/servers/preview/stream returns text/event-stream with events 'text' (deltas), 'spec' (final success payload, same shape as the sync endpoint), 'error' (typed). Anthropic-only — GLM/hobby falls back to the sync route via 409 streaming_unavailable. Frontend: apiSseStream() handles the POST + ReadableStream + SSE parser. The wizard's analyze() prefers the stream and only uses the sync endpoint on the explicit 409 fallback. nginx (api.buildmymcpserver.com): the /v1/builds/ location block (which already had proxy_buffering off + 600s read timeout for the WS build stream) now also matches /v1/servers/preview/stream so the SSE response isn't buffered.	2026-05-28 21:11:05 +02:00
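For readers unfamiliar with the wire format behind 0c6d738a6b, here is a small sketch of the kind of parsing an SSE client has to do. It is not the actual apiSseStream() code: `parseSse` and `SseEvent` are hypothetical, the payloads in the usage note are made up, and the sketch assumes `\n` line endings only.

```typescript
type SseEvent = { event: string; data: string };

// Parses every complete SSE frame in `buffer` and returns the parsed events
// plus the unconsumed tail (an incomplete frame that still needs more bytes).
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.split("\n\n"); // a blank line terminates a frame
  const rest = frames.pop() ?? ""; // the last piece is incomplete or empty
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // default event name in the SSE format
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith(":")) continue; // comment line, e.g. a keepalive
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        data.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    // Frames without data (such as a lone keepalive comment) dispatch nothing.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

Feeding it `": keepalive\n\nevent: text\ndata: hel\n\nevent: spec\ndata: {}\n\nevent: err"` yields a `text` event and a `spec` event, ignores the keepalive comment, and leaves `"event: err"` in `rest` until the next chunk arrives.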
Marco Sadjadi	b930a454e8	fix(llm): tighter system prompt + 12288 max_tokens for paid tiers. Sonnet 4.6 was still hitting max_tokens on ambitious prompts like "WorldWeather MCP for any location" because the implementation bodies ballooned with defensive scaffolding. Two changes: 1. SYSTEM_PROMPT now imposes hard limits the model can self-enforce: - at most 6 tools (combine related capabilities with a mode param) - implementation body <= 40 lines, no comments, no overengineering - descriptions <= 100 chars These keep a typical preview under ~7k output tokens. 2. team/enterprise maxTokens 8192 -> 12288. At ~130 tok/s that fits in ~94s, still under Cloudflare's 100s edge cap. Hobby (GLM) and pro (Haiku) keep their existing limits — they were not hitting the ceiling. SpecTruncatedError still fires + surfaces 422 spec_too_large when even 12288 isn't enough, so the user gets actionable feedback instead of an opaque zod error.	2026-05-28 21:01:50 +02:00
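The token-budget figures quoted in b930a454e8, d2b19a5439, and 5a8e736113 all come from the same division, which the sketch below makes explicit. The throughput numbers (80 and 130 tok/s) are the estimates given in those messages; `worstCaseSeconds` and `fitsUnderEdgeCap` are hypothetical helper names.

```typescript
const EDGE_CAP_SECONDS = 100; // approximate Cloudflare edge window cited in the log

// Worst-case generation time if the model emits every allowed token.
function worstCaseSeconds(maxTokens: number, tokensPerSecond: number): number {
  return maxTokens / tokensPerSecond;
}

function fitsUnderEdgeCap(maxTokens: number, tokensPerSecond: number): boolean {
  return worstCaseSeconds(maxTokens, tokensPerSecond) < EDGE_CAP_SECONDS;
}

console.log(worstCaseSeconds(8192, 80).toFixed(1)); // 102.4, over the cap (5a8e736113)
console.log(worstCaseSeconds(4096, 80).toFixed(1)); // 51.2 (5a8e736113)
console.log(worstCaseSeconds(8192, 130).toFixed(1)); // 63.0 (d2b19a5439)
console.log(worstCaseSeconds(12288, 130).toFixed(1)); // 94.5 (b930a454e8)
```

The 12288-token budget leaves only about five seconds of slack under the edge cap at the assumed throughput.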
Marco Sadjadi	d2b19a5439	fix(preview): max_tokens 4096→8192 + detect truncation explicitly. Root cause of repeat 422s: 4096 was too tight for ambitious prompts (Marco's research-assistant prompt produces ~12kB of JSON before the model gets cut off mid-string). The error then surfaced as an opaque "Unterminated string in JSON" zod failure instead of pointing the user at the real problem. Two fixes: - maxTokens back to 8192 (the original) for all Claude tiers, 4096 for GLM. Timeouts bumped to 95s — Sonnet 4.6 at ~130 tok/s does 8192 in ~63s, ~30s headroom for cold starts, still under Cloudflare's 100s edge cap. - Detect stop_reason === 'max_tokens' on the Anthropic response BEFORE parsing and throw the new SpecTruncatedError. /preview catches it and returns 422 spec_too_large with a clear "split the prompt" message instead of leaking the zod parse failure.	2026-05-28 19:34:40 +02:00
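The "check stop_reason before parsing" idea from d2b19a5439 can be sketched as follows. The `ModelResponse` shape is a simplified assumption (only the two fields used here), and `parseSpec` is a hypothetical name; only the `stop_reason === 'max_tokens'` check and the `SpecTruncatedError` name come from the commit.

```typescript
class SpecTruncatedError extends Error {
  constructor() {
    super("spec_too_large");
    this.name = "SpecTruncatedError";
    // Keeps `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, SpecTruncatedError.prototype);
  }
}

// Assumed minimal response shape; a real SDK response carries more fields.
type ModelResponse = { stop_reason: string | null; text: string };

function parseSpec(response: ModelResponse): unknown {
  // Check for truncation first: a cut-off response is almost never valid JSON,
  // and a parse error would hide the real cause.
  if (response.stop_reason === "max_tokens") {
    throw new SpecTruncatedError();
  }
  return JSON.parse(response.text);
}
```

A route handler can then map `SpecTruncatedError` to the 422 spec_too_large response and leave genuine parse failures on their own error path.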
Marco Sadjadi	979d1abfca	feat(preview): log spec validation failures with raw output. 422s from /preview hid the actual reason: zod_message tells which field was wrong and a 400-char preview of the model output reveals refusals or non-JSON returns. Both stay in the api log only — never surfaced to the client unchanged.	2026-05-28 19:19:57 +02:00
Marco Sadjadi	5a8e736113	fix(llm): preview timeout 60s→90s + maxTokens 8192→4096. Enterprise plan was hitting SpecTimeoutError exactly at 60s because the Sonnet 4.6 preview was budgeted for 8192 tokens at ~80 tok/s (≈102s worst case) inside a 60s window. The frontend then rolled back to step 1 with no spec. A real spec is small (<= ~10 tools, ~1.5–2.5k output tokens in practice) so 4096 is plenty and lets even Sonnet finish in ~51s worst case. The 90s timeout buys headroom for cold starts while staying under Cloudflare's 100s edge cap. Hobby/GLM bumped to 90s too — same headroom argument.	2026-05-28 18:51:51 +02:00
Marco Sadjadi	aa79a71357	security: sovereign-audit Pass-2 fixes — auth-lib, oauth, templates. Six confirmed findings closed (3 MEDIUM, 3 LOW). Tier-1 surfaces from Pass-1 re-verified non-regressed; this pass deepened the audit on the auth library, OAuth issuer, and template marketplace. Za-002 MEDIUM (scrypt cost) — bump SCRYPT_N from 2^14 → 2^17 (131072) matching current OWASP guidance for password hashing in 2026. Hash format embeds N (`scrypt$N$salt$hash`), so the existing admin password at the old cost still verifies — backward-compatible. Also added explicit maxmem ceilings since Node's default (~32MiB) is insufficient for the new N. Za-003 MEDIUM (single-use race) — consumeMagicLink was SELECT-then-UPDATE; two parallel redemptions could both win and mint two sessions from the same token. Now uses the same atomic `UPDATE … WHERE id = ? AND consumedAt IS NULL RETURNING id` pattern /oauth/token already had — loser of the race gets invalid_or_expired_token. Za-004 LOW (membership ordering) — `.orderBy(memberships.createdAt)` added so when org-invites eventually let a user belong to multiple orgs, the same one wins every login instead of insertion-order roulette. Latent-bug pre-empt. Zb-002 LOW (OAuth register spam) — /oauth/register now per-IP daily rate-limited at 20/day (well above any legitimate MCP-client bootstrap pattern). Prevents DB-row spam. Zc-001 MEDIUM (banned-pattern drift) — three separate copies of BANNED_PATTERNS had drifted apart. The publish-time scanner in templates.ts was MISSING the 7 new patterns added in Pass-1 (process.binding, dlopen, .constructor.constructor, vm.runIn, globalThis['..']). Single source of truth in @bmm/llm now exports SHARED_BANNED_PATTERNS; templates.ts composes PUBLISH_BANNED_PATTERNS = SHARED ∪ code-only-extras (dynamic import, fs.rm, setTimeout-with-string, process.kill, jailbreak markers).
Zc-002 LOW (N+1) — /v1/templates list was issuing one COUNT() per template (101 queries for a 100-row page). Now one grouped query with templateId GROUP BY, merged in JS. p95 doesn't degrade with marketplace growth. DEFERRED (documented, scoped for next sprint): Za-001 HIGH — Account takeover via cross-provider email lookup. Requires schema change (users.primaryProvider). Mitigation in /settings/account banner planned. Zb-001 MEDIUM — /oauth/token refresh_token grant: advertised in AS metadata but rejected with unsupported_grant_type. Either implement (~40 LOC) or strip from metadata. Zc-003 LOW — Admin takedown partial-failure consistency. Zd-001 IMPROVE — DEK cache invalidation across replicas (single-instance today). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 18:15:54 +02:00
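Two details of finding Za-002 in aa79a71357 lend themselves to a worked example: why the default memory ceiling stops being enough at N = 2^17, and how a hash that embeds N stays verifiable. The sketch assumes a block size of r = 8 (the commit does not state it) and uses the usual approximation of 128 · N · r bytes for scrypt's working memory; `scryptMemoryBytes` and `parseStoredHash` are hypothetical helpers, not the auth library's API.

```typescript
const MIB = 1024 * 1024;
const NODE_DEFAULT_MAXMEM = 32 * MIB; // the ~32MiB default mentioned in the commit
const R = 8; // ASSUMPTION: block size; not stated in the log

// Approximate scrypt working memory in bytes.
function scryptMemoryBytes(N: number, r: number): number {
  return 128 * N * r;
}

// Splits a stored hash of the form `scrypt$N$salt$hash` into its parts, so a
// verifier can re-derive with the N that was used when the hash was created.
function parseStoredHash(stored: string): { N: number; salt: string; hash: string } {
  const [scheme, n, salt, hash] = stored.split("$");
  if (scheme !== "scrypt" || !n || !salt || !hash) {
    throw new Error("unrecognised hash format");
  }
  return { N: Number(n), salt, hash };
}

const oldCost = scryptMemoryBytes(1 << 14, R) / MIB; // 16 MiB, under the default
const newCost = scryptMemoryBytes(1 << 17, R) / MIB; // 128 MiB, over the default
console.log(oldCost, newCost, newCost * MIB > NODE_DEFAULT_MAXMEM);
```

Under these assumptions the new cost needs roughly 128 MiB, four times the default ceiling, which is consistent with the commit adding explicit maxmem values. A hash stored as `scrypt$16384$...` still parses to N = 16384, so old passwords verify at their original cost.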
Marco Sadjadi	f8af3fc0fd	security: sovereign-audit Phase 2 fixes — trustProxy, Docker hardening, banned-pattern overhaul. Five confirmed findings from the sovereign-audit pass, ordered by severity: Z3-001 CRITICAL — Fastify now trustProxy:true so req.ip resolves to the real visitor IP via X-Forwarded-For instead of always being the nginx / docker-bridge peer. Every per-IP rate-limit in the codebase was silently collapsed into one global counter; this restores them. Z1-001 CRITICAL — runner container hardening flags (--read-only, --cap-drop=ALL, --security-opt=no-new-privileges:true, --pids-limit=100, --memory=512m, --cpus=0.5, tmpfs /tmp) were sitting commented-out as a TODO despite /security promising them. Now applied unconditionally on production/staging; opt-out flag RUNNER_DISABLE_HARDENING=1 for Win-dev. Z2-001 + Z2-002 CRITICAL / MEDIUM — banned-pattern blacklist tightened (Function(...) without `new`, process.binding, process.dlopen, .constructor.constructor, _load, vm.runInContext, globalThis['..'], "system prompt override"). scanForInjection now also walks tool.name and every inputSchema property description, not only implementation + description — closes the prompt-injection-into-AI-client surface that downstream clients (Claude Desktop, Cursor) read verbatim. The duplicate BANNED_PATTERNS in apps/api/src/routes/servers.ts deleted in favour of the single shared scanForInjection export from @bmm/llm. Z4-001 HIGH — /v1/auth/magic-link gained the two-axis daily rate-limit the SMS endpoint already had: 10/IP/day + 5/email/day. Combined with the trustProxy fix above these are now real per-visitor limits. Z4-002 MEDIUM — magic-link callback URL no longer printed to stdout in production.
In dev it still prints (so devs can click the link); in production we log only "issued, URL withheld" and a loud error if no email sender is wired (Resend integration is the actual launch blocker — left as a TODO). Z6-001 MEDIUM — /v1/builds/:id/stream WebSocket now refuses cross-origin upgrades. SameSite=Lax already mitigates in modern browsers; this is the defense-in-depth against browser bugs and non-browser clients. FALSE POSITIVES dismissed: slug path-traversal (schema regex ^[a-z][a-z0-9-]$ in @bmm/types catches it); session-after-promote (getSession re-fetches isAdmin from DB on every request). DEFERRED (not blockers, tracked): - Z1-002 generated-server HTTPS — needs nginx wildcard subdomain TLS - Z1-003 docker image cleanup cron - Z2-001 v2 — real sandbox runtime (multi-week refactor) - Z3-002 rawBody-per-request memory — branch on webhook path only - Z5-001 multi-user org RBAC for billing — gated on Team feature - Email sender integration (Resend) — launch blocker Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 18:02:59 +02:00
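The widened scan described under Z2-001 + Z2-002 in f8af3fc0fd (tool name and every input-schema property description, in addition to implementation and description) could be sketched like this. The pattern list is an illustrative subset built from names mentioned in the message, the `Tool` shape is assumed, and `scanToolFields` is a hypothetical function, not the real scanForInjection export.

```typescript
// Illustrative subset only; the real shared pattern list is not shown in the log.
const BANNED: RegExp[] = [
  /process\.binding/,
  /process\.dlopen/,
  /\.constructor\.constructor/,
  /vm\.runIn/,
  /system prompt override/i,
];

// ASSUMPTION: a simplified tool shape for the purpose of the sketch.
type Tool = {
  name: string;
  description: string;
  implementation: string;
  inputSchema: { properties: Record<string, { description?: string }> };
};

// Returns the paths of all fields whose text matches a banned pattern.
function scanToolFields(tool: Tool): string[] {
  const fields: Array<[string, string]> = [
    ["name", tool.name],
    ["description", tool.description],
    ["implementation", tool.implementation],
  ];
  // Property descriptions are shown verbatim to downstream AI clients,
  // so they are scanned as well.
  for (const [key, prop] of Object.entries(tool.inputSchema.properties)) {
    fields.push([`inputSchema.${key}.description`, prop.description ?? ""]);
  }
  return fields
    .filter(([, text]) => BANNED.some((pattern) => pattern.test(text)))
    .map(([path]) => path);
}
```

A tool whose implementation is clean but whose `city` property description contains "System Prompt Override" would be reported as `inputSchema.city.description`, which a scan limited to implementation and description would miss.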
Marco Sadjadi	bc174c1302	feat: tiered LLM (GLM free / Claude paid) + rate limits + quota enforcement. The free tier was hemorrhaging Anthropic cost with no abuse cap (no rate limit on /preview, Opus default in the build worker, 5-min cache TTL that made cache-miss the common case). This switches free users to GLM, paid users to Claude tiers, and tightens every leak found in the audit. Backend: - @bmm/llm: GLM provider via Zhipu's OpenAI-compatible endpoint, pickPreviewModel + pickBuildModel helpers, plan-aware ModelChoice - preview-cache TTL 5min -> 24h (kills the cache-miss path) - /v1/servers/preview: picks model from caller's plan, returns model name to UI - /v1/servers POST: enforces SERVER_LIMITS per plan (402), rate-limits builds - daily rate-limit on preview (5/40/150/1000) and build (3/20/100/500) - /v1/auth/me returns plan so the wizard can show the right model name - generator worker: GLM default, Anthropic Sonnet fallback if GLM errors Frontend: - Wizard fetches plan, shows "<model> is drafting the tool spec" pre-emptively, upgrade hint for hobby users, friendly errors for 402 / 429 - Pricing page: AI-model line per tier (Open-tier / Haiku / Sonnet / Opus), Team €149 -> €199, Enterprise €499 -> €999, daily-preview limit per tier - Privacy + Security: explicit subprocessor disclosure for Anthropic (US) / Zhipu (CN) and which tier uses which Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 23:50:00 +02:00
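The per-plan daily preview limits from bc174c1302 can be illustrated with an in-memory counter. This is only a sketch: the mapping of 5/40/150/1000 to hobby/pro/team/enterprise is assumed from the order in which tiers appear elsewhere in the log, `DailyLimiter` is hypothetical, and a production limiter would need shared, persistent storage.

```typescript
type Plan = "hobby" | "pro" | "team" | "enterprise";

// Daily preview limits from the commit message.
// ASSUMPTION: the numbers map to the plans in ascending tier order.
const PREVIEW_LIMITS: Record<Plan, number> = {
  hobby: 5,
  pro: 40,
  team: 150,
  enterprise: 1000,
};

class DailyLimiter {
  private counts = new Map<string, number>();

  // `day` is passed in (for example "2026-05-23") so the sketch stays
  // deterministic; counters for a new day start from zero automatically.
  allow(userId: string, plan: Plan, day: string): boolean {
    const key = `${userId}:${day}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= PREVIEW_LIMITS[plan]) {
      return false; // the route would presumably answer 429 here
    }
    this.counts.set(key, used + 1);
    return true;
  }
}
```

A hobby user gets five previews on a given day, is refused on the sixth, and is allowed again once the `day` key changes.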
Marco Sadjadi	e198d44e1e	fix(preview): stop spec generation timing out behind the edge proxy. The /v1/servers/preview route ran claude-opus-4-7 synchronously; full spec generation routinely exceeded Cloudflare's ~100s proxy cap, so the browser received a headerless 524 and reported it as a CORS failure. - preview now uses claude-sonnet-4-6 with a 45s per-attempt timeout and one retry — comfortably inside the proxy budget - generateSpec maps an exhausted timeout to SpecTimeoutError; the route returns a clean 504 (with CORS headers) instead of a stalled connection - analyze step: live elapsed-seconds counter as freeze-proof, plus a reduced-motion exception so the loading spinner keeps spinning (a status indicator, which WCAG exempts from reduced-motion) - textarea resize grip restyled to dark theme (light hatch on dark square) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 23:52:48 +02:00
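The "per-attempt timeout plus one retry" scheme in e198d44e1e (two attempts of 45s each, so at most 90s in total) might be structured as below. `withTimeout` and `generateWithRetry` are hypothetical helpers, not the generateSpec implementation, and the sketch does not cancel the timed-out request, which real code would want to do.

```typescript
class SpecTimeoutError extends Error {
  constructor() {
    super("spec generation timed out");
    this.name = "SpecTimeoutError";
    // Keeps `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, SpecTimeoutError.prototype);
  }
}

// Rejects with SpecTimeoutError if `work` has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new SpecTimeoutError()), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Runs `attempt` up to `attempts` times, each under its own timeout.
// Only timeouts are retried; any other error propagates immediately.
async function generateWithRetry<T>(
  attempt: () => Promise<T>,
  ms: number,
  attempts: number,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await withTimeout(attempt(), ms);
    } catch (err) {
      if (!(err instanceof SpecTimeoutError)) throw err;
      lastError = err;
    }
  }
  throw lastError; // every attempt timed out
}
```

With the values from the commit the call would be `generateWithRetry(callModel, 45_000, 2)`, where `callModel` is a placeholder for the actual model request; an exhausted budget surfaces as `SpecTimeoutError`, which the route can map to a 504.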
Marco Sadjadi	bb0d9c2cda	feat(llm): extract Claude SYSTEM_PROMPT + generateSpec into shared @bmm/llm package	2026-05-19 18:05:31 +02:00

13 Commits