2026-05-19 18:05:31 +02:00
import Anthropic from '@anthropic-ai/sdk';
import { GeneratorSpec, type GeneratorSpec as GeneratorSpecT } from '@bmm/types';

export const SYSTEM_PROMPT = `You generate production-grade MCP server specifications as STRICT JSON.

Output ONE JSON object (no markdown, no prose, no code fences) with this exact shape:

{
  "name": "human-readable server name (max 128 chars)",
  "description": "1-2 sentence purpose",
  "tools": [
    {
      "name": "snake_case_tool_name",
      "description": "what the AI client sees — single sentence, clear",
      "inputSchema": {
        "param_name": { "type": "string|number|boolean|array|object", "description": "...", "required": true }
      },
      "implementation": "ASYNC TypeScript body. Receives {args} pre-validated. Must return MCP content blocks: { content: [{ type: 'text', text: '...' }] }. Use process.env.SECRET_NAME for secrets. NEVER use eval/Function/child_process. Use globalThis.fetch for HTTP. Wrap external calls in try/catch and return { content: [{ type: 'text', text: 'Error: ...' }], isError: true } on failure."
    }
  ],
  "resources": [],
  "prompts": [],
  "requiredSecrets": ["UPPER_SNAKE_CASE"],
  "scopes": ["mcp:read"],
  "dependencies": {}
}

Rules:
- Tools are idempotent unless the description explicitly says destructive.
- Validate all string inputs before use.
- For databases: parameterized queries only (use the 'pg' library with $1 placeholders).
- For HTTP APIs: globalThis.fetch with explicit timeout via AbortSignal.timeout(10000).
- Never hardcode credentials; declare them under requiredSecrets and read via process.env.
- Keep tool implementations under 5000 characters.
- Do not include "import" statements in implementations — the runtime injects fetch, pg, etc.

Return JSON only. No explanation.`;

security: sovereign-audit Phase 2 fixes — trustProxy, Docker hardening, banned-pattern overhaul
Five confirmed findings from the sovereign-audit pass, ordered by severity:
Z3-001 CRITICAL — Fastify now trustProxy:true so req.ip resolves to the
real visitor IP via X-Forwarded-For instead of always being the nginx /
docker-bridge peer. Every per-IP rate-limit in the codebase was silently
collapsed into one global counter; this restores them.
Z1-001 CRITICAL — runner container hardening flags (--read-only,
--cap-drop=ALL, --security-opt=no-new-privileges:true, --pids-limit=100,
--memory=512m, --cpus=0.5, tmpfs /tmp) were sitting commented-out as a
TODO despite /security promising them. Now applied unconditionally on
production/staging; opt-out flag RUNNER_DISABLE_HARDENING=1 for Win-dev.
Z2-001 + Z2-002 CRITICAL / MEDIUM — banned-pattern blacklist tightened
(Function(...) without `new`, process.binding, process.dlopen,
.constructor.constructor, _load, vm.runIn*Context, globalThis['..'],
"system prompt override"). scanForInjection now also walks tool.name and
every inputSchema property description, not only implementation +
description — closes the prompt-injection-into-AI-client surface that
downstream clients (Claude Desktop, Cursor) read verbatim. The duplicate
BANNED_PATTERNS in apps/api/src/routes/servers.ts deleted in favour of
the single shared scanForInjection export from @bmm/llm.
Z4-001 HIGH — /v1/auth/magic-link gained the two-axis daily rate-limit
the SMS endpoint already had: 10/IP/day + 5/email/day. Combined with the
trustProxy fix above these are now real per-visitor limits.
Z4-002 MEDIUM — magic-link callback URL no longer printed to stdout in
production. In dev it still prints (so devs can click the link); in
production we log only "issued, URL withheld" and a loud error if no
email sender is wired (Resend integration is the actual launch
blocker — left as a TODO).
Z6-001 MEDIUM — /v1/builds/:id/stream WebSocket now refuses cross-origin
upgrades. SameSite=Lax already mitigates in modern browsers; this is the
defense-in-depth against browser bugs and non-browser clients.
FALSE POSITIVES dismissed: slug path-traversal (schema regex
^[a-z][a-z0-9-]*$ in @bmm/types catches it); session-after-promote
(getSession re-fetches isAdmin from DB on every request).
DEFERRED (not blockers, tracked):
- Z1-002 generated-server HTTPS — needs nginx wildcard subdomain TLS
- Z1-003 docker image cleanup cron
- Z2-001 v2 — real sandbox runtime (multi-week refactor)
- Z3-002 rawBody-per-request memory — branch on webhook path only
- Z5-001 multi-user org RBAC for billing — gated on Team feature
- Email sender integration (Resend) — launch blocker
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 18:02:59 +02:00
// Regex blacklist — explicitly NOT a security boundary, just an early-warning
// for obviously-dangerous LLM output. The real defence is the Docker
// hardening in apps/generator/src/lib/deploy.ts (--cap-drop=ALL etc.). A
// determined attacker can bypass any of these with string concatenation
// (`'chi'+'ld_process'`) or alternate APIs — that's why container isolation
// has to hold even when this fails.
security: sovereign-audit Pass-2 fixes — auth-lib, oauth, templates
Six confirmed findings closed (3 MEDIUM, 3 LOW). Tier-1 surfaces from
Pass-1 re-verified non-regressed; this pass deepened the audit on the
auth library, OAuth issuer, and template marketplace.
Za-002 MEDIUM (scrypt cost) — bump SCRYPT_N from 2^14 → 2^17 (131072)
matching current OWASP guidance for password hashing in 2026. Hash
format embeds N (`scrypt$N$salt$hash`), so the existing admin
password at the old cost still verifies — backward-compatible. Also
added explicit maxmem ceilings since Node's default (~32MiB) is
insufficient for the new N.
Za-003 MEDIUM (single-use race) — consumeMagicLink was SELECT-then-
UPDATE; two parallel redemptions could both win and mint two
sessions from the same token. Now uses the same atomic
`UPDATE … WHERE id = ? AND consumedAt IS NULL RETURNING id` pattern
/oauth/token already had — loser of the race gets
invalid_or_expired_token.
Za-004 LOW (membership ordering) — `.orderBy(memberships.createdAt)`
added so when org-invites eventually let a user belong to multiple
orgs, the same one wins every login instead of insertion-order
roulette. Latent-bug pre-empt.
Zb-002 LOW (OAuth register spam) — /oauth/register now per-IP daily
rate-limited at 20/day (well above any legitimate MCP-client
bootstrap pattern). Prevents DB-row spam.
Zc-001 MEDIUM (banned-pattern drift) — three separate copies of
BANNED_PATTERNS had drifted apart. The publish-time scanner in
templates.ts was MISSING the 7 new patterns added in Pass-1
(process.binding, dlopen, .constructor.constructor, vm.runIn*,
globalThis['..']). Single source of truth in @bmm/llm now exports
SHARED_BANNED_PATTERNS; templates.ts composes PUBLISH_BANNED_PATTERNS
= SHARED ∪ code-only-extras (dynamic import, fs.rm, setTimeout-with-
string, process.kill, jailbreak markers).
Zc-002 LOW (N+1) — /v1/templates list was issuing one COUNT(*) per
template (101 queries for a 100-row page). Now one grouped query
with templateId GROUP BY, merged in JS. p95 doesn't degrade with
marketplace growth.
DEFERRED (documented, scoped for next sprint):
Za-001 HIGH — Account takeover via cross-provider email lookup.
Requires schema change (users.primaryProvider). Mitigation in
/settings/account banner planned.
Zb-001 MEDIUM — /oauth/token refresh_token grant: advertised in
AS metadata but unsupported_grant_type. Either implement (~40
LOC) or strip from metadata.
Zc-003 LOW — Admin takedown partial-failure consistency.
Zd-001 IMPROVE — DEK cache invalidation across replicas (single-
instance today).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 18:15:54 +02:00
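The Za-002 note above describes a hash format that embeds N, so hashes created at the old cost keep verifying after the bump. A minimal sketch of that idea follows, assuming Node's built-in `crypto.scryptSync`. The helper names (`hashPassword`, `verifyPassword`, `maxmemFor`), the base64 encoding, the 16-byte salt, the 64-byte key, and the 2x memory headroom are illustrative assumptions, not the project's actual auth-lib code.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from 'node:crypto';

const SCRYPT_R = 8; // scrypt block size (Node's default)
const SCRYPT_P = 1; // parallelization (Node's default)
const KEY_LEN = 64; // assumed derived-key length in bytes

// scrypt needs roughly 128 * N * r bytes of working memory (128 MiB at
// N = 2^17, r = 8). Doubling it is an assumed headroom factor that keeps
// the call under maxmem, which defaults to 32 MiB in Node.
function maxmemFor(N: number): number {
  return 2 * 128 * N * SCRYPT_R;
}

export function hashPassword(password: string, N: number = 2 ** 17): string {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, KEY_LEN, {
    N, r: SCRYPT_R, p: SCRYPT_P, maxmem: maxmemFor(N),
  });
  // Format: scrypt$N$salt$hash (base64 never contains '$').
  return ['scrypt', String(N), salt.toString('base64'), hash.toString('base64')].join('$');
}

export function verifyPassword(password: string, stored: string): boolean {
  const [scheme, nText, saltB64, hashB64] = stored.split('$');
  if (scheme !== 'scrypt' || !nText || !saltB64 || !hashB64) return false;
  const N = Number(nText);
  // scrypt requires N to be a power of two greater than 1.
  if (!Number.isInteger(N) || N < 2 || (N & (N - 1)) !== 0) return false;
  const expected = Buffer.from(hashB64, 'base64');
  // Re-derive with the N stored in the hash, not the current default.
  const actual = scryptSync(password, Buffer.from(saltB64, 'base64'), expected.length, {
    N, r: SCRYPT_R, p: SCRYPT_P, maxmem: maxmemFor(N),
  });
  return timingSafeEqual(actual, expected);
}
```

Because verification reads N from the stored string, a hash written with `hashPassword(pw, 2 ** 14)` still verifies after the default moves to 2^17.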
//
// Exported so the publish-time template scan in apps/api/src/routes/templates
// can reuse it instead of maintaining a parallel list that drifts. (Zc-001.)
export const SHARED_BANNED_PATTERNS: readonly RegExp[] = [
2026-05-19 18:05:31 +02:00
  /\beval\s*\(/,
  /\bnew\s+Function\s*\(/,
2026-05-25 18:02:59 +02:00
  /\bFunction\s*\(\s*['"`]/, // Function('...') without `new`
2026-05-19 18:05:31 +02:00
  /\brequire\s*\(\s*['"]child_process['"]/,
  /\bchild_process\b/,
2026-05-25 18:02:59 +02:00
  /\bprocess\.binding\b/,
  /\bprocess\.dlopen\b/,
  /\.constructor\s*\.\s*constructor\b/, // [].constructor.constructor('...')
  /\b_load\s*\(/,
  /\bvm\.runIn(This|New)Context\b/,
  /globalThis\s*\[\s*['"`]/, // globalThis['Fun'+'ction']
2026-05-19 18:05:31 +02:00
  /ignore\s+previous\s+instructions/i,
  /disregard\s+(the\s+)?(above|previous)/i,
2026-05-25 18:02:59 +02:00
  /system\s+prompt\s+override/i,
2026-05-19 18:05:31 +02:00
];
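The list above can be applied by walking every string field that the runtime or a downstream client will read. A minimal sketch under stated assumptions: `findBannedPatterns`, `ToolLike`, `Finding`, and the three-pattern `PATTERNS` subset are hypothetical stand-ins for illustration, not the actual `scanForInjection` export, whose signature is not shown here.

```typescript
// Small subset of the patterns above, copied so the example is self-contained.
const PATTERNS: readonly RegExp[] = [
  /\beval\s*\(/,
  /\bchild_process\b/,
  /ignore\s+previous\s+instructions/i,
];

// Hypothetical minimal tool shape; the real spec type lives in @bmm/types.
interface ToolLike {
  name: string;
  description: string;
  implementation: string;
  inputSchema: Record<string, { description?: string }>;
}

interface Finding {
  field: string;
  pattern: string;
}

export function findBannedPatterns(
  tool: ToolLike,
  patterns: readonly RegExp[] = PATTERNS,
): Finding[] {
  // Collect every scanned string, labelled by the field it came from.
  const fields: Array<[string, string]> = [
    ['name', tool.name],
    ['description', tool.description],
    ['implementation', tool.implementation],
  ];
  for (const [param, schema] of Object.entries(tool.inputSchema)) {
    fields.push([`inputSchema.${param}.description`, schema.description ?? '']);
  }

  const findings: Finding[] = [];
  for (const [field, text] of fields) {
    for (const pattern of patterns) {
      // None of the patterns use the g or y flags, so test() keeps no state.
      if (pattern.test(text)) findings.push({ field, pattern: pattern.source });
    }
  }
  return findings;
}
```

Consistent with the warning in the comment block, an implementation that builds the module name as `'chi' + 'ld_process'` yields no finding here, which is why the scan is only an early warning.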
feat: tiered LLM (GLM free / Claude paid) + rate limits + quota enforcement
The free tier was hemorrhaging Anthropic cost with no abuse cap (no rate
limit on /preview, Opus default in the build worker, 5-min cache TTL that
made cache-miss the common case). This switches free users to GLM, paid
users to Claude tiers, and tightens every leak found in the audit.
Backend:
- @bmm/llm: GLM provider via Zhipu's OpenAI-compatible endpoint, pickPreviewModel
+ pickBuildModel helpers, plan-aware ModelChoice
- preview-cache TTL 5min -> 24h (kills the cache-miss path)
- /v1/servers/preview: picks model from caller's plan, returns model name to UI
- /v1/servers POST: enforces SERVER_LIMITS per plan (402), rate-limits builds
- daily rate-limit on preview (5/40/150/1000) and build (3/20/100/500)
- /v1/auth/me returns plan so the wizard can show the right model name
- generator worker: GLM default, Anthropic Sonnet fallback if GLM errors
Frontend:
- Wizard fetches plan, shows "<model> is drafting the tool spec" pre-emptively,
upgrade hint for hobby users, friendly errors for 402 / 429
- Pricing page: AI-model line per tier (Open-tier / Haiku / Sonnet / Opus),
Team €149 -> €199, Enterprise €499 -> €999, daily-preview limit per tier
- Privacy + Security: explicit subprocessor disclosure for Anthropic (US) /
Zhipu (CN) and which tier uses which
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 23:50:00 +02:00
// ──────────────────────────────────────────────────────────────────────────
// Plan-aware model selection
// ──────────────────────────────────────────────────────────────────────────

export type Plan = 'hobby' | 'pro' | 'team' | 'enterprise';
export type Purpose = 'preview' | 'build';
export type Provider = 'anthropic' | 'glm';
export type DisplayBadge = 'open-tier' | 'claude-haiku' | 'claude-sonnet' | 'claude-opus';

export interface ModelChoice {
  provider: Provider;
  model: string;
  maxTokens: number;
  timeoutMs: number;
  /** User-facing model name shown in the wizard + previews. */
  displayName: string;
  displayBadge: DisplayBadge;
}

/**
 * Preview runs synchronously inside an HTTP request behind Cloudflare's
 * ~100s edge cap. Each tier's (model + max_tokens + timeout) is bounded to
 * fit. Hobby uses GLM as the cost lever; paid tiers escalate to Claude — the
 * visible quality/speed jump *is* the upgrade pitch.
 *
2026-05-28 18:51:51 +02:00
 * Measured token rates: glm-4-plus ~58 tok/s · Claude Haiku 4.5 ~200 tok/s ·
 * Claude Sonnet 4.6 ~80 tok/s. A spec is small (<= ~10 tools with short
 * descriptions, ~1.5–2.5k output tokens in practice) so we cap maxTokens at
 * 4096 (3500 for GLM on hobby), well under the model's hard ceiling and tight
 * enough that even Sonnet finishes inside 60s in the worst case
 * (4096 / 80 ≈ 51s). The 90s timeouts buy headroom for cold starts / slow
 * API responses while staying clear of Cloudflare's 100s edge cap.
2026-05-23 23:50:00 +02:00
 */
const PREVIEW_MODELS: Record<Plan, ModelChoice> = {
  hobby: {
    provider: 'glm',
    model: 'glm-4-plus',
    maxTokens: 3500,
2026-05-28 18:51:51 +02:00
    timeoutMs: 90_000,
2026-05-23 23:50:00 +02:00
    displayName: 'Open-tier AI',
    displayBadge: 'open-tier',
  },
  pro: {
    provider: 'anthropic',
    model: 'claude-haiku-4-5-20251001',
2026-05-28 18:51:51 +02:00
    maxTokens: 4096,
    timeoutMs: 90_000,
2026-05-23 23:50:00 +02:00
    displayName: 'Claude Haiku 4.5',
    displayBadge: 'claude-haiku',
  },
  team: {
    provider: 'anthropic',
    model: 'claude-sonnet-4-6',
2026-05-28 18:51:51 +02:00
    maxTokens: 4096,
    timeoutMs: 90_000,
2026-05-23 23:50:00 +02:00
    displayName: 'Claude Sonnet 4.6',
    displayBadge: 'claude-sonnet',
  },
  enterprise: {
    provider: 'anthropic',
    model: 'claude-sonnet-4-6',
2026-05-28 18:51:51 +02:00
    maxTokens: 4096,
    timeoutMs: 90_000,
2026-05-23 23:50:00 +02:00
    displayName: 'Claude Sonnet 4.6',
    displayBadge: 'claude-sonnet',
  },
};
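The budget arithmetic in the doc comment above can be checked mechanically. A small sketch, assuming the measured token rates quoted there hold; `TierBudget`, `TIERS`, `worstCaseSeconds`, and `fitsBudget` are illustrative names and not part of the module.

```typescript
interface TierBudget {
  model: string;
  maxTokens: number;
  tokensPerSecond: number; // measured rates quoted in the doc comment
  timeoutMs: number;
}

const EDGE_CAP_MS = 100_000; // Cloudflare's ~100s edge cap

const TIERS: TierBudget[] = [
  { model: 'glm-4-plus', maxTokens: 3500, tokensPerSecond: 58, timeoutMs: 90_000 },
  { model: 'claude-haiku-4-5-20251001', maxTokens: 4096, tokensPerSecond: 200, timeoutMs: 90_000 },
  { model: 'claude-sonnet-4-6', maxTokens: 4096, tokensPerSecond: 80, timeoutMs: 90_000 },
];

// Worst case: the model emits every allowed token at its measured rate.
export function worstCaseSeconds(tier: TierBudget): number {
  return tier.maxTokens / tier.tokensPerSecond;
}

// A tier fits when a full-length generation finishes before its own timeout
// and that timeout fires before the edge cap does.
export function fitsBudget(tier: TierBudget): boolean {
  return worstCaseSeconds(tier) * 1000 < tier.timeoutMs && tier.timeoutMs < EDGE_CAP_MS;
}

for (const tier of TIERS) {
  console.log(
    `${tier.model}: ${worstCaseSeconds(tier).toFixed(1)}s worst case, fits=${fitsBudget(tier)}`,
  );
}
```

With these inputs the worst cases come out at roughly 60s for GLM, 20s for Haiku, and 51s for Sonnet, all inside the 90s timeout. Raising a preview tier to 8192 tokens at GLM's rate would not fit, which is one reason the larger budgets are reserved for the async build path.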

/**
 * Build worker runs async via BullMQ, so no proxy timeout applies. With the
 * 24h preview-cache TTL, cache misses are rare, so GLM as the default keeps
 * that rare path cheap; Enterprise gets Opus as a premium-quality promise.
 */
const BUILD_MODELS: Record<Plan, ModelChoice> = {
  hobby: {
    provider: 'glm',
    model: 'glm-4.5',
    maxTokens: 8192,
    timeoutMs: 180_000,
    displayName: 'Open-tier AI',
    displayBadge: 'open-tier',
  },
  pro: {
    provider: 'glm',
    model: 'glm-4.5',
    maxTokens: 8192,
    timeoutMs: 180_000,
    displayName: 'Open-tier AI',
    displayBadge: 'open-tier',
  },
  team: {
    provider: 'glm',
    model: 'glm-4.5',
    maxTokens: 8192,
    timeoutMs: 180_000,
    displayName: 'Open-tier AI',
    displayBadge: 'open-tier',
  },
  enterprise: {
    provider: 'anthropic',
    model: 'claude-opus-4-7',
    maxTokens: 8192,
    timeoutMs: 600_000,
    displayName: 'Claude Opus 4.7',
    displayBadge: 'claude-opus',
  },
};

export function pickPreviewModel(plan: Plan): ModelChoice {
  return PREVIEW_MODELS[plan];
}

export function pickBuildModel(plan: Plan): ModelChoice {
  return BUILD_MODELS[plan];
}
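Both pickers take a `Plan`, so a caller holding an untrusted string (for example a database column) has to narrow it first; indexing the tables with an arbitrary string would return `undefined` at runtime. A minimal sketch, assuming that falling back to `'hobby'` is the desired behavior for unknown values. `PLANS`, `isPlan`, and `parsePlan` are hypothetical helpers, and `Plan` is redeclared here only to keep the example self-contained.

```typescript
const PLANS = ['hobby', 'pro', 'team', 'enterprise'] as const;
type Plan = (typeof PLANS)[number];

// Type guard: narrows an arbitrary value to the Plan union.
export function isPlan(value: unknown): value is Plan {
  return typeof value === 'string' && (PLANS as readonly string[]).includes(value);
}

// Assumption: unknown or missing plans degrade to the cheapest tier
// rather than throwing.
export function parsePlan(value: unknown): Plan {
  return isPlan(value) ? value : 'hobby';
}
```

A call site would then read something like `pickPreviewModel(parsePlan(row.plan))`, where `row.plan` stands for whatever untyped source the plan comes from.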

// ──────────────────────────────────────────────────────────────────────────
// Generation API
// ──────────────────────────────────────────────────────────────────────────

2026-05-19 18:05:31 +02:00
export interface GenerationResult {
  spec: GeneratorSpecT;
2026-05-23 23:50:00 +02:00
|
|
|
|
source: 'claude' | 'glm' | 'mock';
|
2026-05-19 18:05:31 +02:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
export interface GenerateOptions {
|
2026-05-23 23:50:00 +02:00
|
|
|
|
/** 'anthropic' (default) or 'glm'. */
|
|
|
|
|
|
provider?: Provider;
|
|
|
|
|
|
/** Anthropic API key. If provider === 'anthropic' and this is unset, generateSpec returns a mock spec. */
|
2026-05-19 18:05:31 +02:00
|
|
|
|
apiKey?: string;
|
2026-05-23 23:50:00 +02:00
|
|
|
|
/** Zhipu (GLM) API key. If provider === 'glm' and this is unset, generateSpec returns a mock spec. */
|
|
|
|
|
|
glmApiKey?: string;
|
2026-05-19 18:05:31 +02:00
|
|
|
|
model?: string;
|
|
|
|
|
|
maxTokens?: number;
|
2026-05-23 23:50:00 +02:00
|
|
|
|
/** Per-attempt request timeout in ms. */
|
2026-05-21 23:52:48 +02:00
|
|
|
|
timeoutMs?: number;
|
2026-05-23 23:50:00 +02:00
|
|
|
|
/** SDK retry count. Anthropic only. */
|
2026-05-21 23:52:48 +02:00
|
|
|
|
maxRetries?: number;
|
2026-05-19 18:05:31 +02:00
|
|
|
|
}
|
|
|
|
|
|
|
2026-05-21 23:52:48 +02:00
|
|
|
|
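/**
 * Generate a spec with the selected provider ('anthropic' by default).
 * Returns a mock spec (source: 'mock') when the API key for that provider
 * is missing, so callers can run without credentials.
 */
export async function generateSpec(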
|
|
|
|
|
|
prompt: string,
|
|
|
|
|
|
opts: GenerateOptions = {},
|
|
|
|
|
|
): Promise<GenerationResult> {
|
2026-05-23 23:50:00 +02:00
|
|
|
|
const provider = opts.provider ?? 'anthropic';
|
|
|
|
|
|
|
|
|
|
|
|
if (provider === 'glm') {
|
|
|
|
|
|
if (!opts.glmApiKey) return { spec: mockSpec(prompt), source: 'mock' };
|
|
|
|
|
|
return generateWithGlm(prompt, {
|
|
|
|
|
|
apiKey: opts.glmApiKey,
|
|
|
|
|
|
model: opts.model ?? 'glm-4-plus',
|
|
|
|
|
|
maxTokens: opts.maxTokens ?? 4096,
|
|
|
|
|
|
timeoutMs: opts.timeoutMs,
|
|
|
|
|
|
});
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2026-05-19 18:05:31 +02:00
|
|
|
|
if (!opts.apiKey) {
|
|
|
|
|
|
return { spec: mockSpec(prompt), source: 'mock' };
|
|
|
|
|
|
}
|
2026-05-23 23:50:00 +02:00
|
|
|
|
return generateWithAnthropic(prompt, {
|
|
|
|
|
|
apiKey: opts.apiKey,
|
|
|
|
|
|
model: opts.model ?? 'claude-opus-4-7',
|
|
|
|
|
|
maxTokens: opts.maxTokens ?? 8192,
|
|
|
|
|
|
timeoutMs: opts.timeoutMs,
|
|
|
|
|
|
maxRetries: opts.maxRetries,
|
|
|
|
|
|
});
|
|
|
|
|
|
}
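The branching in `generateSpec` can be summarized as a pure function that reports which path a given set of options would take, without any network call. This is a sketch only: `resolveRoute`, `DispatchOptions`, and `Route` are hypothetical names, and the defaults are copied from the function above.

```typescript
type Provider = 'anthropic' | 'glm';

// Hypothetical subset of GenerateOptions that affects routing.
interface DispatchOptions {
  provider?: Provider;
  apiKey?: string;
  glmApiKey?: string;
  model?: string;
  maxTokens?: number;
}

type Route =
  | { source: 'mock' }
  | { source: 'glm' | 'claude'; model: string; maxTokens: number };

// Mirrors the order of checks in generateSpec: provider first, then the
// key for that provider, then per-provider defaults.
function resolveRoute(opts: DispatchOptions = {}): Route {
  const provider = opts.provider ?? 'anthropic';
  if (provider === 'glm') {
    if (!opts.glmApiKey) return { source: 'mock' };
    return {
      source: 'glm',
      model: opts.model ?? 'glm-4-plus',
      maxTokens: opts.maxTokens ?? 4096,
    };
  }
  if (!opts.apiKey) return { source: 'mock' };
  return {
    source: 'claude',
    model: opts.model ?? 'claude-opus-4-7',
    maxTokens: opts.maxTokens ?? 8192,
  };
}
```

Note that the keys are not interchangeable: passing `apiKey` together with `provider: 'glm'` but no `glmApiKey` still routes to the mock spec.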
|
|
|
|
|
|
|
|
|
|
|
|
async function generateWithAnthropic(
|
|
|
|
|
|
prompt: string,
|
|
|
|
|
|
opts: {
|
|
|
|
|
|
apiKey: string;
|
|
|
|
|
|
model: string;
|
|
|
|
|
|
maxTokens: number;
|
|
|
|
|
|
timeoutMs?: number;
|
|
|
|
|
|
maxRetries?: number;
|
|
|
|
|
|
},
|
|
|
|
|
|
): Promise<GenerationResult> {
|
2026-05-19 18:05:31 +02:00
|
|
|
|
const client = new Anthropic({ apiKey: opts.apiKey });
|
2026-05-21 23:52:48 +02:00
|
|
|
|
const requestOptions: { timeout?: number; maxRetries?: number } = {};
|
|
|
|
|
|
if (opts.timeoutMs !== undefined) requestOptions.timeout = opts.timeoutMs;
|
|
|
|
|
|
if (opts.maxRetries !== undefined) requestOptions.maxRetries = opts.maxRetries;
|
|
|
|
|
|
|
|
|
|
|
|
const response = await client.messages
|
|
|
|
|
|
.create(
|
|
|
|
|
|
{
|
2026-05-23 23:50:00 +02:00
|
|
|
|
model: opts.model,
|
|
|
|
|
|
max_tokens: opts.maxTokens,
|
2026-05-21 23:52:48 +02:00
|
|
|
|
system: SYSTEM_PROMPT,
|
|
|
|
|
|
messages: [{ role: 'user', content: prompt }],
|
|
|
|
|
|
},
|
|
|
|
|
|
requestOptions,
|
|
|
|
|
|
)
|
|
|
|
|
|
.catch((err: unknown) => {
|
|
|
|
|
|
if (err instanceof Anthropic.APIConnectionTimeoutError) {
|
|
|
|
|
|
throw new SpecTimeoutError('spec generation exceeded the time budget');
|
|
|
|
|
|
}
|
|
|
|
|
|
throw err;
|
|
|
|
|
|
});
|
2026-05-23 23:50:00 +02:00
|
|
|
|
|
2026-05-19 18:05:31 +02:00
|
|
|
|
const text = response.content
|
|
|
|
|
|
.filter((b): b is { type: 'text'; text: string } => b.type === 'text')
|
|
|
|
|
|
.map((b) => b.text)
|
|
|
|
|
|
.join('');
|
|
|
|
|
|
const json = extractJson(text);
|
|
|
|
|
|
const parsed = GeneratorSpec.safeParse(json);
|
2026-05-23 23:50:00 +02:00
|
|
|
|
if (!parsed.success) throw new SpecValidationError(parsed.error.message);
|
2026-05-19 18:05:31 +02:00
|
|
|
|
scanForInjection(parsed.data);
|
|
|
|
|
|
return { spec: parsed.data, source: 'claude' };
|
|
|
|
|
|
}
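The text-extraction step in `generateWithAnthropic` can be tried in isolation. The sketch below uses a reduced stand-in for the SDK's content-block union (the real type has more variants), and `joinTextBlocks` is a hypothetical name for the filter/map/join chain above.

```typescript
// Reduced stand-in for the SDK's content-block union (assumption: only two variants).
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown };

// Keep only text blocks and concatenate them in order. The type predicate
// narrows the union so `.text` is available without a cast.
function joinTextBlocks(content: ContentBlock[]): string {
  return content
    .filter((b): b is { type: 'text'; text: string } => b.type === 'text')
    .map((b) => b.text)
    .join('');
}
```

Joining with an empty separator matters here: if the JSON were split across two text blocks, any inserted separator could land inside a string literal and change its value.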
|
|
|
|
|
|
|
2026-05-23 23:50:00 +02:00
|
|
|
|
// Zhipu's OpenAI-compatible chat-completions endpoint.
const GLM_ENDPOINT = 'https://open.bigmodel.cn/api/paas/v4/chat/completions';
|
|
|
|
|
|
|
|
|
|
|
|
async function generateWithGlm(
|
|
|
|
|
|
prompt: string,
|
|
|
|
|
|
opts: { apiKey: string; model: string; maxTokens: number; timeoutMs?: number },
|
|
|
|
|
|
): Promise<GenerationResult> {
|
|
|
|
|
|
const controller = new AbortController();
|
|
|
|
|
|
const timer = opts.timeoutMs ? setTimeout(() => controller.abort(), opts.timeoutMs) : null;
|
|
|
|
|
|
let res: Response;
|
|
|
|
|
|
try {
|
|
|
|
|
|
res = await fetch(GLM_ENDPOINT, {
|
|
|
|
|
|
method: 'POST',
|
|
|
|
|
|
headers: {
|
|
|
|
|
|
Authorization: `Bearer ${opts.apiKey}`,
|
|
|
|
|
|
'Content-Type': 'application/json',
|
|
|
|
|
|
},
|
|
|
|
|
|
body: JSON.stringify({
|
|
|
|
|
|
model: opts.model,
|
|
|
|
|
|
max_tokens: opts.maxTokens,
|
|
|
|
|
|
messages: [
|
|
|
|
|
|
{ role: 'system', content: SYSTEM_PROMPT },
|
|
|
|
|
|
{ role: 'user', content: prompt },
|
|
|
|
|
|
],
|
|
|
|
|
|
}),
|
|
|
|
|
|
signal: controller.signal,
|
|
|
|
|
|
});
|
|
|
|
|
|
} catch (err) {
|
|
|
|
|
|
if ((err as { name?: string }).name === 'AbortError') {
|
|
|
|
|
|
throw new SpecTimeoutError('glm spec generation exceeded the time budget');
|
|
|
|
|
|
}
|
|
|
|
|
|
throw err;
|
|
|
|
|
|
} finally {
|
|
|
|
|
|
if (timer) clearTimeout(timer);
|
|
|
|
|
|
}
|
|
|
|
|
|
if (!res.ok) {
|
|
|
|
|
|
const body = await res.text().catch(() => '');
|
|
|
|
|
|
throw new Error(`glm_api_${res.status}: ${body.slice(0, 200)}`);
|
|
|
|
|
|
}
|
|
|
|
|
|
const data = (await res.json()) as {
|
|
|
|
|
|
choices?: Array<{ message?: { content?: string }; finish_reason?: string }>;
|
|
|
|
|
|
};
|
|
|
|
|
|
const content = data.choices?.[0]?.message?.content;
|
|
|
|
|
|
if (!content) throw new SpecValidationError('glm_empty_response');
|
|
|
|
|
|
const json = extractJson(content);
|
|
|
|
|
|
const parsed = GeneratorSpec.safeParse(json);
|
|
|
|
|
|
if (!parsed.success) throw new SpecValidationError(parsed.error.message);
|
|
|
|
|
|
scanForInjection(parsed.data);
|
|
|
|
|
|
return { spec: parsed.data, source: 'glm' };
|
|
|
|
|
|
}
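A small sketch of the response-unwrapping step in `generateWithGlm`, assuming the body follows the `choices[0].message.content` layout typed above. `ChatCompletionBody`, `EmptyResponseError`, and `firstChoiceContent` are hypothetical names; the real code throws `SpecValidationError` instead.

```typescript
// Assumed response shape, copied from the inline type in generateWithGlm.
interface ChatCompletionBody {
  choices?: Array<{ message?: { content?: string }; finish_reason?: string }>;
}

// Stand-in for SpecValidationError so the sketch is self-contained.
class EmptyResponseError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'EmptyResponseError';
  }
}

// Optional chaining tolerates a missing `choices`, an empty array, or a
// choice without a message; an empty string is also treated as "no content".
function firstChoiceContent(data: ChatCompletionBody): string {
  const content = data.choices?.[0]?.message?.content;
  if (!content) throw new EmptyResponseError('glm_empty_response');
  return content;
}
```

Only the first choice is read, and `finish_reason` is not inspected, so a response truncated by the token limit surfaces later as a JSON parse failure rather than here.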
|
|
|
|
|
|
|
2026-05-19 18:05:31 +02:00
|
|
|
|
export class SpecValidationError extends Error {
|
|
|
|
|
|
override readonly name = 'SpecValidationError';
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
export class BannedPatternError extends Error {
|
|
|
|
|
|
override readonly name = 'BannedPatternError';
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2026-05-21 23:52:48 +02:00
|
|
|
|
export class SpecTimeoutError extends Error {
|
|
|
|
|
|
override readonly name = 'SpecTimeoutError';
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2026-05-19 18:05:31 +02:00
|
|
|
|
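/**
 * Parse the model output as JSON. If the model wrapped its answer in a
 * fenced code block despite the prompt, only the first fenced body is parsed.
 * Throws SpecValidationError on empty or non-JSON output.
 */
function extractJson(text: string): unknown {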
|
|
|
|
|
|
const trimmed = text.trim();
|
|
|
|
|
|
const fenced = trimmed.match(/```(?:json)?\s*([\s\S]*?)```/);
|
|
|
|
|
|
const body = fenced ? fenced[1] : trimmed;
|
|
|
|
|
|
if (!body) throw new SpecValidationError('empty_generation_output');
|
|
|
|
|
|
try {
|
|
|
|
|
|
return JSON.parse(body);
|
|
|
|
|
|
} catch (e) {
|
|
|
|
|
|
throw new SpecValidationError(`generation_not_json: ${(e as Error).message}`);
|
|
|
|
|
|
}
|
|
|
|
|
|
}
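The fence-unwrapping behaviour of `extractJson` can be exercised on its own. The sketch below reimplements it under a hypothetical name, `parseModelJson`, and throws plain `Error` where the original throws `SpecValidationError`. The fence marker is built with `repeat` purely so the example does not contain a literal triple backtick.

```typescript
// Three backticks, built indirectly to keep this example fence-safe.
const FENCE = '`'.repeat(3);

// Same pattern as extractJson: optional "json" tag, lazy body, closing fence.
const FENCED_BODY = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE);

function parseModelJson(text: string): unknown {
  const trimmed = text.trim();
  const fenced = trimmed.match(FENCED_BODY);
  // Prefer the first fenced body; otherwise treat the whole output as JSON.
  const body = fenced ? fenced[1] : trimmed;
  if (!body) throw new Error('empty_generation_output');
  try {
    return JSON.parse(body);
  } catch (e) {
    throw new Error('generation_not_json: ' + (e as Error).message);
  }
}

// Usage: a fenced answer with surrounding prose still parses.
const reply = 'Here is the spec:\n' + FENCE + 'json\n{"name":"demo"}\n' + FENCE;
const parsedReply = parseModelJson(reply) as { name: string };
```

Because the pattern is not anchored, prose around a fenced block is ignored. Unfenced output with leading prose is passed to `JSON.parse` whole and is rejected as `generation_not_json`.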
|
|
|
|
|
|
|
security: sovereign-audit Phase 2 fixes — trustProxy, Docker hardening, banned-pattern overhaul
Five confirmed findings from the sovereign-audit pass, ordered by severity:
Z3-001 CRITICAL — Fastify now trustProxy:true so req.ip resolves to the
real visitor IP via X-Forwarded-For instead of always being the nginx /
docker-bridge peer. Every per-IP rate-limit in the codebase was silently
collapsed into one global counter; this restores them.
Z1-001 CRITICAL — runner container hardening flags (--read-only,
--cap-drop=ALL, --security-opt=no-new-privileges:true, --pids-limit=100,
--memory=512m, --cpus=0.5, tmpfs /tmp) were sitting commented-out as a
TODO despite /security promising them. Now applied unconditionally on
production/staging; opt-out flag RUNNER_DISABLE_HARDENING=1 for Win-dev.
Z2-001 + Z2-002 CRITICAL / MEDIUM — banned-pattern blacklist tightened
(Function(...) without `new`, process.binding, process.dlopen,
.constructor.constructor, _load, vm.runIn*Context, globalThis['..'],
"system prompt override"). scanForInjection now also walks tool.name and
every inputSchema property description, not only implementation +
description — closes the prompt-injection-into-AI-client surface that
downstream clients (Claude Desktop, Cursor) read verbatim. The duplicate
BANNED_PATTERNS in apps/api/src/routes/servers.ts deleted in favour of
the single shared scanForInjection export from @bmm/llm.
Z4-001 HIGH — /v1/auth/magic-link gained the two-axis daily rate-limit
the SMS endpoint already had: 10/IP/day + 5/email/day. Combined with the
trustProxy fix above these are now real per-visitor limits.
Z4-002 MEDIUM — magic-link callback URL no longer printed to stdout in
production. In dev it still prints (so devs can click the link); in
production we log only "issued, URL withheld" and a loud error if no
email sender is wired (Resend integration is the actual launch
blocker — left as a TODO).
Z6-001 MEDIUM — /v1/builds/:id/stream WebSocket now refuses cross-origin
upgrades. SameSite=Lax already mitigates in modern browsers; this is the
defense-in-depth against browser bugs and non-browser clients.
FALSE POSITIVES dismissed: slug path-traversal (schema regex
^[a-z][a-z0-9-]*$ in @bmm/types catches it); session-after-promote
(getSession re-fetches isAdmin from DB on every request).
DEFERRED (not blockers, tracked):
- Z1-002 generated-server HTTPS — needs nginx wildcard subdomain TLS
- Z1-003 docker image cleanup cron
- Z2-001 v2 — real sandbox runtime (multi-week refactor)
- Z3-002 rawBody-per-request memory — branch on webhook path only
- Z5-001 multi-user org RBAC for billing — gated on Team feature
- Email sender integration (Resend) — launch blocker
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 18:02:59 +02:00
|
|
|
|
/**
|
|
|
|
|
|
* Public so other layers (the spec-edit merge in apps/api) can re-scan a
|
|
|
|
|
|
* user-edited spec without duplicating the pattern list — single source of
|
|
|
|
|
|
* truth for what counts as obviously-dangerous LLM output.
|
|
|
|
|
|
*/
|
|
|
|
|
|
export function scanForInjection(spec: GeneratorSpecT): void {
|
2026-05-19 18:05:31 +02:00
|
|
|
|
for (const tool of spec.tools) {
|
2026-05-25 18:02:59 +02:00
|
|
|
|
// Collect every string the LLM could have planted a payload in. Downstream
|
|
|
|
|
|
// AI clients (Claude Desktop, Cursor) read tool.name + every inputSchema
|
|
|
|
|
|
// description verbatim, so an injection there can pivot the user's AI
|
|
|
|
|
|
// session — not only the runtime code.
|
|
|
|
|
|
const surfaces: string[] = [tool.name, tool.description, tool.implementation];
|
|
|
|
|
|
for (const param of Object.values(tool.inputSchema)) {
|
|
|
|
|
|
if (param && typeof param === 'object' && 'description' in param) {
|
|
|
|
|
|
const d = (param as { description?: unknown }).description;
|
|
|
|
|
|
if (typeof d === 'string') surfaces.push(d);
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
for (const text of surfaces) {
|
security: sovereign-audit Pass-2 fixes — auth-lib, oauth, templates
Six confirmed findings closed (3 MEDIUM, 3 LOW). Tier-1 surfaces from
Pass-1 re-verified non-regressed; this pass deepened the audit on the
auth library, OAuth issuer, and template marketplace.
Za-002 MEDIUM (scrypt cost) — bump SCRYPT_N from 2^14 → 2^17 (131072)
matching current OWASP guidance for password hashing in 2026. Hash
format embeds N (`scrypt$N$salt$hash`), so the existing admin
password at the old cost still verifies — backward-compatible. Also
added explicit maxmem ceilings since Node's default (~32MiB) is
insufficient for the new N.
Za-003 MEDIUM (single-use race) — consumeMagicLink was SELECT-then-
UPDATE; two parallel redemptions could both win and mint two
sessions from the same token. Now uses the same atomic
`UPDATE … WHERE id = ? AND consumedAt IS NULL RETURNING id` pattern
/oauth/token already had — loser of the race gets
invalid_or_expired_token.
Za-004 LOW (membership ordering) — `.orderBy(memberships.createdAt)`
added so when org-invites eventually let a user belong to multiple
orgs, the same one wins every login instead of insertion-order
roulette. Latent-bug pre-empt.
Zb-002 LOW (OAuth register spam) — /oauth/register now per-IP daily
rate-limited at 20/day (well above any legitimate MCP-client
bootstrap pattern). Prevents DB-row spam.
Zc-001 MEDIUM (banned-pattern drift) — three separate copies of
BANNED_PATTERNS had drifted apart. The publish-time scanner in
templates.ts was MISSING the 7 new patterns added in Pass-1
(process.binding, dlopen, .constructor.constructor, vm.runIn*,
globalThis['..']). Single source of truth in @bmm/llm now exports
SHARED_BANNED_PATTERNS; templates.ts composes PUBLISH_BANNED_PATTERNS
= SHARED ∪ code-only-extras (dynamic import, fs.rm, setTimeout-with-
string, process.kill, jailbreak markers).
Zc-002 LOW (N+1) — /v1/templates list was issuing one COUNT(*) per
template (101 queries for a 100-row page). Now one grouped query
with templateId GROUP BY, merged in JS. p95 doesn't degrade with
marketplace growth.
DEFERRED (documented, scoped for next sprint):
Za-001 HIGH — Account takeover via cross-provider email lookup.
Requires schema change (users.primaryProvider). Mitigation in
/settings/account banner planned.
Zb-001 MEDIUM — /oauth/token refresh_token grant: advertised in
AS metadata but unsupported_grant_type. Either implement (~40
LOC) or strip from metadata.
Zc-003 LOW — Admin takedown partial-failure consistency.
Zd-001 IMPROVE — DEK cache invalidation across replicas (single-
instance today).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 18:15:54 +02:00
|
|
|
|
for (const pattern of SHARED_BANNED_PATTERNS) {
|
security: sovereign-audit Phase 2 fixes — trustProxy, Docker hardening, banned-pattern overhaul
Five confirmed findings from the sovereign-audit pass, ordered by severity:
Z3-001 CRITICAL — Fastify now trustProxy:true so req.ip resolves to the
real visitor IP via X-Forwarded-For instead of always being the nginx /
docker-bridge peer. Every per-IP rate-limit in the codebase was silently
collapsed into one global counter; this restores them.
Z1-001 CRITICAL — runner container hardening flags (--read-only,
--cap-drop=ALL, --security-opt=no-new-privileges:true, --pids-limit=100,
--memory=512m, --cpus=0.5, tmpfs /tmp) were sitting commented-out as a
TODO despite /security promising them. Now applied unconditionally on
production/staging; opt-out flag RUNNER_DISABLE_HARDENING=1 for Win-dev.
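One way the flag selection could look, as a sketch. The function name `runnerDockerArgs` and the use of `NODE_ENV` to detect production/staging are assumptions for illustration; the flags themselves are the ones listed above. The sketch reads "unconditionally on production/staging" as meaning the opt-out is only honoured elsewhere, which is an interpretation, not something the commit states outright.

```typescript
// Hardening flags for `docker run`, as listed in the commit message.
const HARDENING_FLAGS: readonly string[] = [
  "--read-only",
  "--cap-drop=ALL",
  "--security-opt=no-new-privileges:true",
  "--pids-limit=100",
  "--memory=512m",
  "--cpus=0.5",
  "--tmpfs",
  "/tmp",
];

// Assumption: NODE_ENV distinguishes production/staging from local dev.
function runnerDockerArgs(env: Record<string, string | undefined>): string[] {
  const locked = env.NODE_ENV === "production" || env.NODE_ENV === "staging";
  // The opt-out is only honoured outside production/staging.
  if (!locked && env.RUNNER_DISABLE_HARDENING === "1") {
    return [];
  }
  return [...HARDENING_FLAGS];
}
```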
Z2-001 + Z2-002 CRITICAL / MEDIUM — banned-pattern blacklist tightened
(Function(...) without `new`, process.binding, process.dlopen,
.constructor.constructor, _load, vm.runIn*Context, globalThis['..'],
"system prompt override"). scanForInjection now also walks tool.name and
every inputSchema property description, not only implementation +
description — closes the prompt-injection-into-AI-client surface that
downstream clients (Claude Desktop, Cursor) read verbatim. The duplicate
BANNED_PATTERNS in apps/api/src/routes/servers.ts deleted in favour of
the single shared scanForInjection export from @bmm/llm.
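The widened scan surface can be illustrated with a short sketch. The `ToolSpec` shape follows the JSON shape in SYSTEM_PROMPT; `collectScannableText` and `findInjection` are hypothetical names, and the real scanForInjection may be structured differently.

```typescript
// Tool shape mirroring the spec format in SYSTEM_PROMPT.
interface ToolSpec {
  name: string;
  description: string;
  inputSchema: Record<string, { type: string; description: string; required?: boolean }>;
  implementation: string;
}

// Every string a downstream AI client may read verbatim, plus the code body.
function collectScannableText(tool: ToolSpec): string[] {
  const paramDescriptions = Object.values(tool.inputSchema).map((p) => p.description);
  return [tool.name, tool.description, tool.implementation, ...paramDescriptions];
}

// Returns the source of the first matching pattern, or undefined if clean.
function findInjection(tool: ToolSpec, patterns: readonly RegExp[]): string | undefined {
  for (const text of collectScannableText(tool)) {
    for (const pattern of patterns) {
      if (pattern.test(text)) {
        return pattern.source;
      }
    }
  }
  return undefined;
}
```

A scanner that only looked at `implementation` and `description` would miss a payload hidden in a parameter description, which is the gap this finding closes.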
Z4-001 HIGH — /v1/auth/magic-link gained the two-axis daily rate-limit
the SMS endpoint already had: 10/IP/day + 5/email/day. Combined with the
trustProxy fix above these are now real per-visitor limits.
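A two-axis daily limit of this kind might be sketched as below. This is an in-memory illustration only: the commit does not say what store the real limiter uses, and `allowMagicLink`, the key format, and UTC-day bucketing are assumptions.

```typescript
// In-memory counters keyed by axis, value, and UTC day (assumption: the real
// implementation likely uses a shared store rather than process memory).
const hits = new Map<string, number>();
const LIMITS = { ip: 10, email: 5 } as const;

// Allows the request only if BOTH the per-IP and per-email budgets have room.
// Counters are incremented only when the request is allowed.
function allowMagicLink(ip: string, email: string, now: Date = new Date()): boolean {
  const day = now.toISOString().slice(0, 10); // e.g. "2026-05-25"
  const ipKey = `ip:${ip}:${day}`;
  const emailKey = `email:${email.toLowerCase()}:${day}`;
  const ipHits = hits.get(ipKey) ?? 0;
  const emailHits = hits.get(emailKey) ?? 0;
  if (ipHits >= LIMITS.ip || emailHits >= LIMITS.email) {
    return false;
  }
  hits.set(ipKey, ipHits + 1);
  hits.set(emailKey, emailHits + 1);
  return true;
}
```

The per-IP axis only means something when `ip` is the real visitor address, which is why this limit depends on the trustProxy fix.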
Z4-002 MEDIUM — magic-link callback URL no longer printed to stdout in
production. In dev it still prints (so devs can click the link); in
production we log only "issued, URL withheld" and a loud error if no
email sender is wired (Resend integration is the actual launch
blocker — left as a TODO).
Z6-001 MEDIUM — /v1/builds/:id/stream WebSocket now refuses cross-origin
upgrades. SameSite=Lax already mitigates in modern browsers; this is the
defense-in-depth against browser bugs and non-browser clients.
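An origin check for the upgrade request could look like the following sketch. The helper name `isAllowedUpgradeOrigin` and the allow-list are hypothetical, and rejecting a missing Origin header is an assumed policy; the commit does not say how the real handler treats clients that send no Origin.

```typescript
// Returns true only when the Origin header parses to an allow-listed origin.
// Assumption: a missing or unparsable Origin is refused.
function isAllowedUpgradeOrigin(
  originHeader: string | undefined,
  allowedOrigins: readonly string[],
): boolean {
  if (!originHeader) {
    return false;
  }
  try {
    // URL#origin normalises scheme, host, and default ports.
    const origin = new URL(originHeader).origin;
    return allowedOrigins.includes(origin);
  } catch {
    // Not a valid URL (for example the literal string "null").
    return false;
  }
}
```

A caller would run this before accepting the WebSocket upgrade and close the connection when it returns false.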
FALSE POSITIVES dismissed: slug path-traversal (schema regex
^[a-z][a-z0-9-]*$ in @bmm/types catches it); session-after-promote
(getSession re-fetches isAdmin from DB on every request).
DEFERRED (not blockers, tracked):
- Z1-002 generated-server HTTPS — needs nginx wildcard subdomain TLS
- Z1-003 docker image cleanup cron
- Z2-001 v2 — real sandbox runtime (multi-week refactor)
- Z3-002 rawBody-per-request memory — branch on webhook path only
- Z5-001 multi-user org RBAC for billing — gated on Team feature
- Email sender integration (Resend) — launch blocker
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 18:02:59 +02:00
|
|
|
|
if (pattern.test(text)) {
|
|
|
|
|
|
throw new BannedPatternError(`banned_pattern_detected: ${pattern.source}`);
|
|
|
|
|
|
}
|
2026-05-19 18:05:31 +02:00
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
export function mockSpec(prompt: string): GeneratorSpecT {
|
|
|
|
|
|
return {
|
|
|
|
|
|
name: 'Echo MCP',
|
feat: tiered LLM (GLM free / Claude paid) + rate limits + quota enforcement
The free tier was hemorrhaging Anthropic cost with no abuse cap (no rate
limit on /preview, Opus default in the build worker, 5-min cache TTL that
made cache-miss the common case). This switches free users to GLM, paid
users to Claude tiers, and tightens every leak found in the audit.
Backend:
- @bmm/llm: GLM provider via Zhipu's OpenAI-compatible endpoint, pickPreviewModel
+ pickBuildModel helpers, plan-aware ModelChoice
- preview-cache TTL 5min -> 24h (kills the cache-miss path)
- /v1/servers/preview: picks model from caller's plan, returns model name to UI
- /v1/servers POST: enforces SERVER_LIMITS per plan (402), rate-limits builds
- daily rate-limit on preview (5/40/150/1000) and build (3/20/100/500)
- /v1/auth/me returns plan so the wizard can show the right model name
- generator worker: GLM default, Anthropic Sonnet fallback if GLM errors
Frontend:
- Wizard fetches plan, shows "<model> is drafting the tool spec" pre-emptively,
upgrade hint for hobby users, friendly errors for 402 / 429
- Pricing page: AI-model line per tier (Open-tier / Haiku / Sonnet / Opus),
Team €149 -> €199, Enterprise €499 -> €999, daily-preview limit per tier
- Privacy + Security: explicit subprocessor disclosure for Anthropic (US) /
Zhipu (CN) and which tier uses which
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 23:50:00 +02:00
|
|
|
|
description: `Mock server (no LLM key). Prompt was: ${prompt.slice(0, 200)}`,
|
2026-05-19 18:05:31 +02:00
|
|
|
|
tools: [
|
|
|
|
|
|
{
|
|
|
|
|
|
name: 'echo',
|
|
|
|
|
|
description: 'Echoes the input string back to the caller.',
|
|
|
|
|
|
inputSchema: {
|
|
|
|
|
|
message: { type: 'string', description: 'Message to echo back', required: true },
|
|
|
|
|
|
},
|
|
|
|
|
|
implementation: `const msg = String(args.message ?? '');\nreturn { content: [{ type: 'text', text: \`echo: \${msg}\` }] };`,
|
|
|
|
|
|
},
|
|
|
|
|
|
{
|
|
|
|
|
|
name: 'now',
|
|
|
|
|
|
description: 'Returns the current server UTC timestamp.',
|
|
|
|
|
|
inputSchema: {},
|
|
|
|
|
|
implementation: `return { content: [{ type: 'text', text: new Date().toISOString() }] };`,
|
|
|
|
|
|
},
|
|
|
|
|
|
],
|
|
|
|
|
|
resources: [],
|
|
|
|
|
|
prompts: [],
|
|
|
|
|
|
requiredSecrets: [],
|
|
|
|
|
|
scopes: ['mcp:read'],
|
|
|
|
|
|
dependencies: {},
|
|
|
|
|
|
};
|
|
|
|
|
|
}
|