buildmymcpserver/packages/llm/src/index.ts


import Anthropic from '@anthropic-ai/sdk';
import { GeneratorSpec, type GeneratorSpec as GeneratorSpecT } from '@bmm/types';
export const SYSTEM_PROMPT = `You generate production-grade MCP server specifications as STRICT JSON.
Output ONE JSON object (no markdown, no prose, no code fences) with this exact shape:
{
"name": "human-readable server name (max 80 chars)",
"description": "one sentence",
"tools": [
{
"name": "snake_case_tool_name",
"description": "single sentence, max 100 chars",
"inputSchema": {
"param_name": { "type": "string|number|boolean|array|object", "description": "short", "required": true }
},
"implementation": "async TS body, return { content: [{ type:'text', text:'...' }] }; secrets via process.env; HTTP via globalThis.fetch with AbortSignal.timeout(10000); try/catch -> { content:[{type:'text',text:'Error: ...'}], isError:true }; no eval/Function/child_process; no imports."
}
],
"resources": [],
"prompts": [],
"requiredSecrets": ["UPPER_SNAKE_CASE"],
"scopes": ["mcp:read"],
"dependencies": {}
}
Hard limits (the output gets truncated past these, so write tight):
- At most 6 tools. Combine related capabilities into one tool with a "mode" param rather than splitting.
- Each implementation body: at most 40 lines of code, no defensive overengineering, no comments.
- Each description / inputSchema description: one short clause, no examples.
- Parameterised SQL only (pg with $1 placeholders). No prose, no JSON examples in code.
Return JSON only. No preamble, no closing remark.`;
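For orientation, an object matching the shape requested by the prompt might look like the sketch below. The weather tool, the `api.example.com` URL, the `WEATHER_API_KEY` secret, and the `input` variable used inside the implementation string are all hypothetical; the source does not show how tool arguments are exposed at runtime. `withinHardLimits` is an illustrative helper that checks only the limits the prompt states.

```typescript
// Hypothetical example of the JSON shape SYSTEM_PROMPT asks for.
// Tool, endpoint, and secret names are made up for illustration.
interface ExampleParam {
  type: string;
  description: string;
  required: boolean;
}

interface ExampleTool {
  name: string;
  description: string;
  inputSchema: Record<string, ExampleParam>;
  implementation: string;
}

interface ExampleSpec {
  name: string;
  description: string;
  tools: ExampleTool[];
  resources: unknown[];
  prompts: unknown[];
  requiredSecrets: string[];
  scopes: string[];
  dependencies: Record<string, string>;
}

const exampleSpec: ExampleSpec = {
  name: "Weather Lookup",
  description: "Looks up current weather for a city.",
  tools: [
    {
      name: "get_weather",
      description: "Fetch current weather for a city.",
      inputSchema: {
        city: { type: "string", description: "city name", required: true },
      },
      // Assumption: the runtime exposes tool arguments as `input`.
      implementation: [
        "try {",
        "  const url = 'https://api.example.com/weather?q=' + encodeURIComponent(input.city);",
        "  const res = await globalThis.fetch(url, {",
        "    headers: { Authorization: 'Bearer ' + process.env.WEATHER_API_KEY },",
        "    signal: AbortSignal.timeout(10000),",
        "  });",
        "  return { content: [{ type: 'text', text: await res.text() }] };",
        "} catch (e) {",
        "  return { content: [{ type: 'text', text: 'Error: ' + String(e) }], isError: true };",
        "}",
      ].join("\n"),
    },
  ],
  resources: [],
  prompts: [],
  requiredSecrets: ["WEATHER_API_KEY"],
  scopes: ["mcp:read"],
  dependencies: {},
};

// Checks the limits stated in the prompt: name length, tool count,
// description length, and implementation line count.
function withinHardLimits(spec: ExampleSpec): boolean {
  return (
    spec.name.length <= 80 &&
    spec.tools.length <= 6 &&
    spec.tools.every(
      (t) =>
        t.description.length <= 100 &&
        t.implementation.split("\n").length <= 40,
    )
  );
}
```

The real validation happens in the `GeneratorSpec` zod schema from `@bmm/types`, which is not shown here and may enforce more than this helper does.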
// Commit, 2026-05-25 18:02:59 +02:00
// security: sovereign-audit Phase 2 fixes: trustProxy, Docker hardening,
// banned-pattern overhaul
//
// Five confirmed findings from the sovereign-audit pass, ordered by severity:
//
// Z3-001 CRITICAL: Fastify now trustProxy:true so req.ip resolves to the real
//   visitor IP via X-Forwarded-For instead of always being the nginx /
//   docker-bridge peer. Every per-IP rate-limit in the codebase was silently
//   collapsed into one global counter; this restores them.
// Z1-001 CRITICAL: runner container hardening flags (--read-only,
//   --cap-drop=ALL, --security-opt=no-new-privileges:true, --pids-limit=100,
//   --memory=512m, --cpus=0.5, tmpfs /tmp) were sitting commented-out as a
//   TODO despite /security promising them. Now applied unconditionally on
//   production/staging; opt-out flag RUNNER_DISABLE_HARDENING=1 for Win-dev.
// Z2-001 + Z2-002 CRITICAL / MEDIUM: banned-pattern blacklist tightened
//   (Function(...) without `new`, process.binding, process.dlopen,
//   .constructor.constructor, _load, vm.runIn*Context, globalThis['..'],
//   "system prompt override"). scanForInjection now also walks tool.name and
//   every inputSchema property description, not only implementation +
//   description. This closes the prompt-injection-into-AI-client surface that
//   downstream clients (Claude Desktop, Cursor) read verbatim. The duplicate
//   BANNED_PATTERNS in apps/api/src/routes/servers.ts deleted in favour of
//   the single shared scanForInjection export from @bmm/llm.
// Z4-001 HIGH: /v1/auth/magic-link gained the two-axis daily rate-limit the
//   SMS endpoint already had: 10/IP/day + 5/email/day. Combined with the
//   trustProxy fix above these are now real per-visitor limits.
// Z4-002 MEDIUM: magic-link callback URL no longer printed to stdout in
//   production. In dev it still prints (so devs can click the link); in
//   production we log only "issued, URL withheld" and a loud error if no
//   email sender is wired (Resend integration is the actual launch blocker,
//   left as a TODO).
// Z6-001 MEDIUM: /v1/builds/:id/stream WebSocket now refuses cross-origin
//   upgrades. SameSite=Lax already mitigates in modern browsers; this is the
//   defense-in-depth against browser bugs and non-browser clients.
//
// FALSE POSITIVES dismissed: slug path-traversal (schema regex
// ^[a-z][a-z0-9-]*$ in @bmm/types catches it); session-after-promote
// (getSession re-fetches isAdmin from DB on every request).
//
// DEFERRED (not blockers, tracked):
// - Z1-002 generated-server HTTPS: needs nginx wildcard subdomain TLS
// - Z1-003 docker image cleanup cron
// - Z2-001 v2: real sandbox runtime (multi-week refactor)
// - Z3-002 rawBody-per-request memory: branch on webhook path only
// - Z5-001 multi-user org RBAC for billing: gated on Team feature
// - Email sender integration (Resend): launch blocker
//
// Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
// Regex blacklist — explicitly NOT a security boundary, just an early-warning
// for obviously-dangerous LLM output. The real defence is the Docker
// hardening in apps/generator/src/lib/deploy.ts (--cap-drop=ALL etc.). A
// determined attacker can bypass any of these with string concatenation
// (`'chi'+'ld_process'`) or alternate APIs — that's why container isolation
// has to hold even when this fails.
//
// Commit, 2026-05-25 18:15:54 +02:00
// security: sovereign-audit Pass-2 fixes: auth-lib, oauth, templates
//
// Six confirmed findings closed (3 MEDIUM, 3 LOW). Tier-1 surfaces from
// Pass-1 re-verified non-regressed; this pass deepened the audit on the auth
// library, OAuth issuer, and template marketplace.
//
// Za-002 MEDIUM (scrypt cost): bump SCRYPT_N from 2^14 → 2^17 (131072)
//   matching current OWASP guidance for password hashing in 2026. Hash format
//   embeds N (`scrypt$N$salt$hash`), so the existing admin password at the
//   old cost still verifies (backward-compatible). Also added explicit maxmem
//   ceilings since Node's default (~32MiB) is insufficient for the new N.
// Za-003 MEDIUM (single-use race): consumeMagicLink was SELECT-then-UPDATE;
//   two parallel redemptions could both win and mint two sessions from the
//   same token. Now uses the same atomic `UPDATE … WHERE id = ? AND
//   consumedAt IS NULL RETURNING id` pattern /oauth/token already had; the
//   loser of the race gets invalid_or_expired_token.
// Za-004 LOW (membership ordering): `.orderBy(memberships.createdAt)` added
//   so when org-invites eventually let a user belong to multiple orgs, the
//   same one wins every login instead of insertion-order roulette.
//   Latent-bug pre-empt.
// Zb-002 LOW (OAuth register spam): /oauth/register now per-IP daily
//   rate-limited at 20/day (well above any legitimate MCP-client bootstrap
//   pattern). Prevents DB-row spam.
// Zc-001 MEDIUM (banned-pattern drift): three separate copies of
//   BANNED_PATTERNS had drifted apart. The publish-time scanner in
//   templates.ts was MISSING the 7 new patterns added in Pass-1
//   (process.binding, dlopen, .constructor.constructor, vm.runIn*,
//   globalThis['..']). Single source of truth in @bmm/llm now exports
//   SHARED_BANNED_PATTERNS; templates.ts composes PUBLISH_BANNED_PATTERNS =
//   SHARED ∪ code-only-extras (dynamic import, fs.rm, setTimeout-with-string,
//   process.kill, jailbreak markers).
// Zc-002 LOW (N+1): /v1/templates list was issuing one COUNT(*) per template
//   (101 queries for a 100-row page). Now one grouped query with templateId
//   GROUP BY, merged in JS. p95 doesn't degrade with marketplace growth.
//
// DEFERRED (documented, scoped for next sprint):
// - Za-001 HIGH: Account takeover via cross-provider email lookup. Requires
//   schema change (users.primaryProvider). Mitigation in /settings/account
//   banner planned.
// - Zb-001 MEDIUM: /oauth/token refresh_token grant: advertised in AS
//   metadata but unsupported_grant_type. Either implement (~40 LOC) or strip
//   from metadata.
// - Zc-003 LOW: Admin takedown partial-failure consistency.
// - Zd-001 IMPROVE: DEK cache invalidation across replicas (single-instance
//   today).
//
// Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
//
// Exported so the publish-time template scan in apps/api/src/routes/templates
// can reuse it instead of maintaining a parallel list that drifts. (Zc-001.)
export const SHARED_BANNED_PATTERNS: readonly RegExp[] = [
/\beval\s*\(/,
/\bnew\s+Function\s*\(/,
/\bFunction\s*\(\s*['"`]/, // Function('...') without `new`
/\brequire\s*\(\s*['"]child_process['"]/,
/\bchild_process\b/,
/\bprocess\.binding\b/,
/\bprocess\.dlopen\b/,
/\.constructor\s*\.\s*constructor\b/, // [].constructor.constructor('...')
/\b_load\s*\(/,
/\bvm\.runIn(This|New)Context\b/,
/globalThis\s*\[\s*['"`]/, // globalThis['Fun'+'ction']
/ignore\s+previous\s+instructions/i,
/disregard\s+(the\s+)?(above|previous)/i,
/system\s+prompt\s+override/i,
];
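The comment above says the blacklist is an early warning rather than a security boundary. A small sketch makes that concrete. It copies three of the patterns into a local array, and `matchesAny` is an illustrative helper, not an export of this module.

```typescript
// Local copies of three patterns from the shared list, for illustration only.
const samplePatterns: readonly RegExp[] = [
  /\beval\s*\(/,
  /\bchild_process\b/,
  /globalThis\s*\[\s*['"`]/,
];

// True when any pattern matches. The patterns carry no `g` flag, so
// RegExp.prototype.test keeps no state between calls.
function matchesAny(text: string, patterns: readonly RegExp[]): boolean {
  return patterns.some((p) => p.test(text));
}

// Obviously dangerous output is flagged.
const obvious = "const cp = require('child_process');";

// The same module name assembled by string concatenation is not.
const evasive = "const cp = require('chi' + 'ld_process');";

const obviousFlagged = matchesAny(obvious, samplePatterns);
const evasiveFlagged = matchesAny(evasive, samplePatterns);
```

Here `obviousFlagged` is `true` and `evasiveFlagged` is `false`, which is why the container hardening has to hold on its own.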
// ──────────────────────────────────────────────────────────────────────────
// Plan-aware model selection
// ──────────────────────────────────────────────────────────────────────────
export type Plan = 'hobby' | 'pro' | 'team' | 'enterprise';
export type Purpose = 'preview' | 'build';
export type Provider = 'anthropic' | 'glm';
export type DisplayBadge = 'open-tier' | 'claude-haiku' | 'claude-sonnet' | 'claude-opus';
export interface ModelChoice {
provider: Provider;
model: string;
maxTokens: number;
timeoutMs: number;
/** User-facing model name shown in the wizard + previews. */
displayName: string;
displayBadge: DisplayBadge;
}
/**
* Preview runs synchronously inside an HTTP request behind Cloudflare's
* ~100s edge cap. Each tier's (model + max_tokens + timeout) is bounded to
* fit. Hobby uses GLM as the cost lever; paid tiers escalate to Claude, and
* the visible quality/speed jump *is* the upgrade pitch.
*
* Measured token rates: glm-4-plus ~58 tok/s · Claude Haiku 4.5 ~200 tok/s ·
* Claude Sonnet 4.6 ~130 tok/s (current measurement; the older ~80 tok/s
* number was from the pre-4.6 generation).
*
* Token budget: a *small* spec is ~1.5-2.5k output tokens, but ambitious
* prompts ("research assistant with web search, papers, wikipedia, …")
* routinely produce 6-8k tokens of deeply-nested tool schemas. We cap at
* 8192 (the model's effective ceiling for these prompts) and detect the
* `stop_reason === 'max_tokens'` case to surface a "spec too large" message
* instead of letting the truncated JSON blow up at the zod boundary.
*
* Timeouts sit at 95s, just under Cloudflare's 100s edge cap. Sonnet at
* 130 tok/s finishes 8192 tokens in ~63s, giving ~30s headroom for cold
* starts and TCP/TLS setup.
*/
const PREVIEW_MODELS: Record<Plan, ModelChoice> = {
hobby: {
provider: 'glm',
model: 'glm-4-plus',
maxTokens: 4096,
timeoutMs: 95_000,
displayName: 'Open-tier AI',
displayBadge: 'open-tier',
},
pro: {
provider: 'anthropic',
model: 'claude-haiku-4-5-20251001',
maxTokens: 8192,
timeoutMs: 95_000,
displayName: 'Claude Haiku 4.5',
displayBadge: 'claude-haiku',
},
team: {
provider: 'anthropic',
model: 'claude-sonnet-4-6',
maxTokens: 12288,
timeoutMs: 95_000,
displayName: 'Claude Sonnet 4.6',
displayBadge: 'claude-sonnet',
},
enterprise: {
provider: 'anthropic',
model: 'claude-sonnet-4-6',
maxTokens: 12288,
timeoutMs: 95_000,
displayName: 'Claude Sonnet 4.6',
displayBadge: 'claude-sonnet',
},
};
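The timing argument in the comment above can be checked with simple arithmetic. The sketch below assumes a constant decode rate at the measured figures and ignores network and queueing time. `worstCaseSeconds` and `headroomSeconds` are illustrative helpers, not part of this module.

```typescript
// Time to emit `maxTokens` at a constant decode rate (an idealisation).
function worstCaseSeconds(maxTokens: number, tokensPerSecond: number): number {
  return maxTokens / tokensPerSecond;
}

// Seconds left under the request timeout after a full-length generation.
function headroomSeconds(
  timeoutMs: number,
  maxTokens: number,
  tokensPerSecond: number,
): number {
  return timeoutMs / 1000 - worstCaseSeconds(maxTokens, tokensPerSecond);
}

// Figures quoted in the comment: Sonnet ~130 tok/s, Haiku ~200 tok/s,
// glm-4-plus ~58 tok/s, 95 s timeout.
const sonnetAt8192 = worstCaseSeconds(8192, 130); // about 63 s
const haikuAt8192 = worstCaseSeconds(8192, 200); // about 41 s
const glmAt4096 = worstCaseSeconds(4096, 58); // about 71 s

const sonnetHeadroomAt8192 = headroomSeconds(95_000, 8192, 130); // about 32 s
const sonnetHeadroomAt12288 = headroomSeconds(95_000, 12288, 130); // under 1 s
```

At the 12288-token budget configured for team and enterprise, the same arithmetic gives about 94.5 s, which leaves less than a second against the 95 s timeout. The source does not say whether that is intended.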
/**
* Build worker runs async via BullMQ, so there is no proxy timeout. With the
* 24h preview cache TTL, cache misses are rare, so GLM as the default keeps
* that rare path cheap; Enterprise gets Opus as a premium-quality promise.
*/
const BUILD_MODELS: Record<Plan, ModelChoice> = {
hobby: {
provider: 'glm',
model: 'glm-4.5',
maxTokens: 8192,
timeoutMs: 180_000,
displayName: 'Open-tier AI',
displayBadge: 'open-tier',
},
pro: {
provider: 'glm',
model: 'glm-4.5',
maxTokens: 8192,
timeoutMs: 180_000,
displayName: 'Open-tier AI',
displayBadge: 'open-tier',
},
team: {
provider: 'glm',
model: 'glm-4.5',
maxTokens: 8192,
timeoutMs: 180_000,
displayName: 'Open-tier AI',
displayBadge: 'open-tier',
},
enterprise: {
provider: 'anthropic',
model: 'claude-opus-4-7',
maxTokens: 8192,
timeoutMs: 600_000,
displayName: 'Claude Opus 4.7',
displayBadge: 'claude-opus',
},
};
export function pickPreviewModel(plan: Plan): ModelChoice {
return PREVIEW_MODELS[plan];
}
export function pickBuildModel(plan: Plan): ModelChoice {
return BUILD_MODELS[plan];
}
// ──────────────────────────────────────────────────────────────────────────
// Generation API
// ──────────────────────────────────────────────────────────────────────────
export interface GenerationResult {
spec: GeneratorSpecT;
source: 'claude' | 'glm' | 'mock';
}
export interface GenerateOptions {
/** 'anthropic' (default) or 'glm'. */
provider?: Provider;
/** Anthropic API key — required if provider === 'anthropic'. */
apiKey?: string;
/** Zhipu (GLM) API key — required if provider === 'glm'. */
glmApiKey?: string;
model?: string;
maxTokens?: number;
/** Per-attempt request timeout in ms. */
timeoutMs?: number;
/** SDK retry count. Anthropic only. */
maxRetries?: number;
}
export async function generateSpec(
prompt: string,
opts: GenerateOptions = {},
): Promise<GenerationResult> {
const provider = opts.provider ?? 'anthropic';
if (provider === 'glm') {
if (!opts.glmApiKey) return { spec: mockSpec(prompt), source: 'mock' };
return generateWithGlm(prompt, {
apiKey: opts.glmApiKey,
model: opts.model ?? 'glm-4-plus',
maxTokens: opts.maxTokens ?? 4096,
timeoutMs: opts.timeoutMs,
});
}
if (!opts.apiKey) {
return { spec: mockSpec(prompt), source: 'mock' };
}
return generateWithAnthropic(prompt, {
apiKey: opts.apiKey,
model: opts.model ?? 'claude-opus-4-7',
maxTokens: opts.maxTokens ?? 8192,
timeoutMs: opts.timeoutMs,
maxRetries: opts.maxRetries,
});
}
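A caller would typically turn a `ModelChoice` into the options `generateSpec` accepts. The sketch below shows one way that wiring could look. The `*Like` interfaces are local stand-ins for the types above, and `toGenerateOptions`, `ProviderKeys`, and the placeholder key are hypothetical; the actual call sites live outside this file and are not shown.

```typescript
type Provider = "anthropic" | "glm";

// Local stand-in for the fields of ModelChoice that matter here.
interface ModelChoiceLike {
  provider: Provider;
  model: string;
  maxTokens: number;
  timeoutMs: number;
}

// Local stand-in for GenerateOptions.
interface GenerateOptionsLike {
  provider?: Provider;
  apiKey?: string | undefined;
  glmApiKey?: string | undefined;
  model?: string;
  maxTokens?: number;
  timeoutMs?: number;
}

// Hypothetical container for whatever keys the caller has configured.
interface ProviderKeys {
  anthropic?: string | undefined;
  glm?: string | undefined;
}

// Copies the model settings and attaches only the key for the chosen provider.
function toGenerateOptions(
  choice: ModelChoiceLike,
  keys: ProviderKeys,
): GenerateOptionsLike {
  const base: GenerateOptionsLike = {
    provider: choice.provider,
    model: choice.model,
    maxTokens: choice.maxTokens,
    timeoutMs: choice.timeoutMs,
  };
  return choice.provider === "glm"
    ? { ...base, glmApiKey: keys.glm }
    : { ...base, apiKey: keys.anthropic };
}

// Values taken from the hobby preview entry above.
const hobbyPreview: ModelChoiceLike = {
  provider: "glm",
  model: "glm-4-plus",
  maxTokens: 4096,
  timeoutMs: 95_000,
};

const options = toGenerateOptions(hobbyPreview, { glm: "glm-key-placeholder" });
```

If the key for the selected provider is missing, `generateSpec` as written returns the mock spec with `source: 'mock'` rather than throwing, so a caller that needs a real generation should check `source`.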
async function generateWithAnthropic(
prompt: string,
opts: {
apiKey: string;
model: string;
maxTokens: number;
timeoutMs?: number;
maxRetries?: number;
},
): Promise<GenerationResult> {
const client = new Anthropic({ apiKey: opts.apiKey });
const requestOptions: { timeout?: number; maxRetries?: number } = {};
if (opts.timeoutMs !== undefined) requestOptions.timeout = opts.timeoutMs;
if (opts.maxRetries !== undefined) requestOptions.maxRetries = opts.maxRetries;
const response = await client.messages
.create(
{
model: opts.model,
max_tokens: opts.maxTokens,
system: SYSTEM_PROMPT,
messages: [{ role: 'user', content: prompt }],
},
requestOptions,
)
.catch((err: unknown) => {
if (err instanceof Anthropic.APIConnectionTimeoutError) {
throw new SpecTimeoutError('spec generation exceeded the time budget');
}
throw err;
});
const text = response.content
.filter((b): b is { type: 'text'; text: string } => b.type === 'text')
.map((b) => b.text)
.join('');
// Detect token-limit truncation BEFORE attempting to parse. The model
// chops mid-token when it hits max_tokens, so the closing `}` of a deeply
// nested tool schema never gets emitted and JSON.parse blows up with an
// unterminated-string error that's indistinguishable from a refusal at
// the catch site. With stop_reason in hand we can surface a precise
// "spec too large" message and tell the user to split / simplify the
// prompt instead of letting them keep retrying the same one.
if (response.stop_reason === 'max_tokens') {
throw new SpecTruncatedError(
`model hit max_tokens (${opts.maxTokens}) before finishing the spec`,
);
}
const json = extractJson(text);
const parsed = GeneratorSpec.safeParse(json);
if (!parsed.success) {
// Include a truncated raw preview so the caller (api log) can see whether
// the model returned non-JSON / a refusal / a near-miss schema, instead
// of just the opaque zod error.
const preview = text.slice(0, 400).replace(/\s+/g, ' ');
throw new SpecValidationError(`${parsed.error.message} :: raw="${preview}"`);
}
scanForInjection(parsed.data);
return { spec: parsed.data, source: 'claude' };
}
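The reason for checking `stop_reason` before parsing can be seen with a plain string: cutting a JSON document anywhere before its final brace produces a generic syntax error that says nothing about token limits. The sketch below is standalone, and `tryParse` is an illustrative helper.

```typescript
type ParseOutcome =
  | { ok: true; value: unknown }
  | { ok: false; errorName: string };

// Wraps JSON.parse so the failure mode can be inspected without throwing.
function tryParse(text: string): ParseOutcome {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (e) {
    return { ok: false, errorName: e instanceof Error ? e.name : "unknown" };
  }
}

const complete =
  '{"name":"demo","tools":[{"name":"a_tool","description":"does a thing"}]}';

// Simulate a generation cut off partway through the first tool.
const truncated = complete.slice(0, 40);

const completeOutcome = tryParse(complete);
const truncatedOutcome = tryParse(truncated);
```

`truncatedOutcome` carries only a `SyntaxError`, the same error class a refusal or stray prose would produce, so the stop reason is the only reliable signal that the budget ran out.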
const GLM_ENDPOINT = 'https://open.bigmodel.cn/api/paas/v4/chat/completions';
async function generateWithGlm(
prompt: string,
opts: { apiKey: string; model: string; maxTokens: number; timeoutMs?: number },
): Promise<GenerationResult> {
const controller = new AbortController();
const timer = opts.timeoutMs ? setTimeout(() => controller.abort(), opts.timeoutMs) : null;
let res: Response;
try {
res = await fetch(GLM_ENDPOINT, {
method: 'POST',
headers: {
Authorization: `Bearer ${opts.apiKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: opts.model,
max_tokens: opts.maxTokens,
messages: [
{ role: 'system', content: SYSTEM_PROMPT },
{ role: 'user', content: prompt },
],
}),
signal: controller.signal,
});
} catch (err) {
if ((err as { name?: string }).name === 'AbortError') {
throw new SpecTimeoutError('glm spec generation exceeded the time budget');
}
throw err;
} finally {
if (timer) clearTimeout(timer);
}
if (!res.ok) {
const body = await res.text().catch(() => '');
throw new Error(`glm_api_${res.status}: ${body.slice(0, 200)}`);
}
const data = (await res.json()) as {
choices?: Array<{ message?: { content?: string }; finish_reason?: string }>;
};
const content = data.choices?.[0]?.message?.content;
if (!content) throw new SpecValidationError('glm_empty_response');
const json = extractJson(content);
const parsed = GeneratorSpec.safeParse(json);
if (!parsed.success) throw new SpecValidationError(parsed.error.message);
scanForInjection(parsed.data);
return { spec: parsed.data, source: 'glm' };
}
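The GLM response type above declares `finish_reason` but the function never reads it, so a truncated GLM reply surfaces as a JSON parse failure rather than a `SpecTruncatedError`. A check mirroring the Anthropic path could look like the sketch below. It assumes the GLM endpoint reports token-limit truncation as `finish_reason: 'length'`, as OpenAI-style chat completion APIs do; that value should be confirmed against the provider documentation. `hitTokenLimit` and `ChatCompletionLike` are hypothetical names.

```typescript
// Local stand-in for the response shape used in generateWithGlm.
interface ChatCompletionLike {
  choices?: Array<{
    message?: { content?: string };
    finish_reason?: string;
  }>;
}

// Assumption: 'length' marks a reply cut off at max_tokens.
function hitTokenLimit(data: ChatCompletionLike): boolean {
  return data.choices?.[0]?.finish_reason === "length";
}

const finished: ChatCompletionLike = {
  choices: [{ message: { content: "{}" }, finish_reason: "stop" }],
};
const cutOff: ChatCompletionLike = {
  choices: [{ message: { content: '{"name":"de' }, finish_reason: "length" }],
};

const finishedTruncated = hitTokenLimit(finished);
const cutOffTruncated = hitTokenLimit(cutOff);
```

Placed before `extractJson(content)`, a check like this would let the GLM path throw `SpecTruncatedError` with the same "spec too large" meaning as the Anthropic path.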
export class SpecValidationError extends Error {
override readonly name = 'SpecValidationError';
}
export class BannedPatternError extends Error {
override readonly name = 'BannedPatternError';
}
export class SpecTimeoutError extends Error {
override readonly name = 'SpecTimeoutError';
}
export class SpecTruncatedError extends Error {
override readonly name = 'SpecTruncatedError';
}
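Because each error class sets a distinct `name`, a caller can branch on it to choose a user-facing message. The sketch below redeclares two of the classes locally so it runs standalone. `userMessage` and the message strings are hypothetical and not taken from the API layer.

```typescript
// Local mirrors of two of the classes above, so the sketch is self-contained.
class SpecTruncatedError extends Error {
  override readonly name = "SpecTruncatedError";
}
class SpecTimeoutError extends Error {
  override readonly name = "SpecTimeoutError";
}

// Branches on `name` rather than `instanceof`, which also works when the
// error crosses a package boundary with a duplicated class definition.
function userMessage(err: unknown): string {
  const name = err instanceof Error ? err.name : "";
  switch (name) {
    case "SpecTruncatedError":
      return "Spec too large: split or simplify the prompt.";
    case "SpecTimeoutError":
      return "Generation timed out: please try again.";
    case "BannedPatternError":
      return "The generated spec contained a disallowed pattern.";
    case "SpecValidationError":
      return "The model returned an invalid spec.";
    default:
      return "Unexpected error.";
  }
}

const truncatedMessage = userMessage(new SpecTruncatedError("hit max_tokens"));
const timeoutMessage = userMessage(new SpecTimeoutError("over budget"));
const otherMessage = userMessage(new Error("boom"));
```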
function extractJson(text: string): unknown {
const trimmed = text.trim();
const fenced = trimmed.match(/```(?:json)?\s*([\s\S]*?)```/);
const body = fenced ? fenced[1] : trimmed;
if (!body) throw new SpecValidationError('empty_generation_output');
try {
return JSON.parse(body);
} catch (e) {
throw new SpecValidationError(`generation_not_json: ${(e as Error).message}`);
}
}
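The behaviour of `extractJson` on a few inputs is sketched below with a local copy that throws plain `Error` instead of `SpecValidationError`. The fence delimiter is built with `repeat` only so the example does not contain a literal code fence; the resulting regular expression is the same as the one above.

```typescript
// Three backticks, built at runtime to keep this example fence-safe.
const FENCE = "`".repeat(3);
const FENCED = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

// Local copy of the extraction logic, simplified to throw plain errors.
function extractJsonSketch(text: string): unknown {
  const trimmed = text.trim();
  const fenced = trimmed.match(FENCED);
  const body = fenced ? fenced[1] : trimmed;
  if (!body) throw new Error("empty_generation_output");
  return JSON.parse(body);
}

// Bare JSON with surrounding whitespace parses directly.
const bare = extractJsonSketch('  {"a":1}  ') as { a: number };

// A fenced block is unwrapped even when prose surrounds it.
const wrappedInput = `Here you go:\n${FENCE}json\n{"a":2}\n${FENCE}\nDone.`;
const wrapped = extractJsonSketch(wrappedInput) as { a: number };

// Prose without a fence is passed to JSON.parse as-is and fails.
let proseFailed = false;
try {
  extractJsonSketch('Sure! {"a":3}');
} catch {
  proseFailed = true;
}
```

So the fallback tolerates a model that ignores the "no code fences" instruction, but not one that adds an unfenced preamble.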
/**
* Public so other layers (the spec-edit merge in apps/api) can re-scan a
* user-edited spec without duplicating the pattern list: a single source of
* truth for what counts as obviously-dangerous LLM output.
*/
export function scanForInjection(spec: GeneratorSpecT): void {
for (const tool of spec.tools) {
// Collect every string the LLM could have planted a payload in. Downstream
// AI clients (Claude Desktop, Cursor) read tool.name + every inputSchema
// description verbatim, so an injection there can pivot the user's AI
// session — not only the runtime code.
const surfaces: string[] = [tool.name, tool.description, tool.implementation];
for (const param of Object.values(tool.inputSchema)) {
if (param && typeof param === 'object' && 'description' in param) {
const d = (param as { description?: unknown }).description;
if (typeof d === 'string') surfaces.push(d);
}
}
for (const text of surfaces) {
security: sovereign-audit Pass-2 fixes — auth-lib, oauth, templates Six confirmed findings closed (3 MEDIUM, 3 LOW). Tier-1 surfaces from Pass-1 re-verified non-regressed; this pass deepened the audit on the auth library, OAuth issuer, and template marketplace. Za-002 MEDIUM (scrypt cost) — bump SCRYPT_N from 2^14 → 2^17 (131072) matching current OWASP guidance for password hashing in 2026. Hash format embeds N (`scrypt$N$salt$hash`), so the existing admin password at the old cost still verifies — backward-compatible. Also added explicit maxmem ceilings since Node's default (~32MiB) is insufficient for the new N. Za-003 MEDIUM (single-use race) — consumeMagicLink was SELECT-then- UPDATE; two parallel redemptions could both win and mint two sessions from the same token. Now uses the same atomic `UPDATE … WHERE id = ? AND consumedAt IS NULL RETURNING id` pattern /oauth/token already had — loser of the race gets invalid_or_expired_token. Za-004 LOW (membership ordering) — `.orderBy(memberships.createdAt)` added so when org-invites eventually let a user belong to multiple orgs, the same one wins every login instead of insertion-order roulette. Latent-bug pre-empt. Zb-002 LOW (OAuth register spam) — /oauth/register now per-IP daily rate-limited at 20/day (well above any legitimate MCP-client bootstrap pattern). Prevents DB-row spam. Zc-001 MEDIUM (banned-pattern drift) — three separate copies of BANNED_PATTERNS had drifted apart. The publish-time scanner in templates.ts was MISSING the 7 new patterns added in Pass-1 (process.binding, dlopen, .constructor.constructor, vm.runIn*, globalThis['..']). Single source of truth in @bmm/llm now exports SHARED_BANNED_PATTERNS; templates.ts composes PUBLISH_BANNED_PATTERNS = SHARED ∪ code-only-extras (dynamic import, fs.rm, setTimeout-with- string, process.kill, jailbreak markers). Zc-002 LOW (N+1) — /v1/templates list was issuing one COUNT(*) per template (101 queries for a 100-row page). 
Now one grouped query with templateId GROUP BY, merged in JS. p95 doesn't degrade with marketplace growth. DEFERRED (documented, scoped for next sprint): Za-001 HIGH — Account takeover via cross-provider email lookup. Requires schema change (users.primaryProvider). Mitigation in /settings/account banner planned. Zb-001 MEDIUM — /oauth/token refresh_token grant: advertised in AS metadata but unsupported_grant_type. Either implement (~40 LOC) or strip from metadata. Zc-003 LOW — Admin takedown partial-failure consistency. Zd-001 IMPROVE — DEK cache invalidation across replicas (single- instance today). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 18:15:54 +02:00
for (const pattern of SHARED_BANNED_PATTERNS) {
if (pattern.test(text)) {
throw new BannedPatternError(`banned_pattern_detected: ${pattern.source}`);
}
}
}
}
}
export function mockSpec(prompt: string): GeneratorSpecT {
return {
name: 'Echo MCP',
description: `Mock server (no LLM key). Prompt was: ${prompt.slice(0, 200)}`,
tools: [
{
name: 'echo',
description: 'Echoes the input string back to the caller.',
inputSchema: {
message: { type: 'string', description: 'Message to echo back', required: true },
},
implementation: `const msg = String(args.message ?? '');\nreturn { content: [{ type: 'text', text: \`echo: \${msg}\` }] };`,
},
{
name: 'now',
description: 'Returns the current server UTC timestamp.',
inputSchema: {},
implementation: `return { content: [{ type: 'text', text: new Date().toISOString() }] };`,
},
],
resources: [],
prompts: [],
requiredSecrets: [],
scopes: ['mcp:read'],
dependencies: {},
};
}
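
The scan loop above can be exercised in isolation. The following is a minimal sketch, not the module's actual code: `SketchTool`, `SKETCH_PATTERNS`, `collectSurfaces`, `scanTool`, and `isRejected` are hypothetical names introduced for illustration, and the two regexes are assumed stand-ins for whatever `SHARED_BANNED_PATTERNS` really contains. It shows why collecting `inputSchema` descriptions matters: a payload hidden in a parameter description is caught even when the tool name, description, and implementation are clean.

```typescript
// Hypothetical stand-in for SHARED_BANNED_PATTERNS; the real list is defined
// elsewhere in this file and is assumed to be larger.
const SKETCH_PATTERNS: RegExp[] = [/process\.binding/, /\.constructor\.constructor/];

// Hypothetical minimal tool shape, mirroring the fields the scanner reads.
interface SketchTool {
  name: string;
  description: string;
  implementation: string;
  inputSchema: Record<string, unknown>;
}

// Gather every string a payload could hide in: name, description,
// implementation, plus each inputSchema property description.
function collectSurfaces(tool: SketchTool): string[] {
  const surfaces: string[] = [tool.name, tool.description, tool.implementation];
  for (const param of Object.values(tool.inputSchema)) {
    if (param && typeof param === 'object' && 'description' in param) {
      const d = (param as { description?: unknown }).description;
      if (typeof d === 'string') surfaces.push(d);
    }
  }
  return surfaces;
}

// Throw on the first surface that matches any pattern.
function scanTool(tool: SketchTool): void {
  for (const text of collectSurfaces(tool)) {
    for (const pattern of SKETCH_PATTERNS) {
      if (pattern.test(text)) {
        throw new Error(`banned_pattern_detected: ${pattern.source}`);
      }
    }
  }
}

// Helper for the example: true when the scan rejects the tool.
function isRejected(tool: SketchTool): boolean {
  try {
    scanTool(tool);
    return false;
  } catch (e) {
    return e instanceof Error && e.message.startsWith('banned_pattern_detected');
  }
}

const cleanTool: SketchTool = {
  name: 'echo',
  description: 'Echoes the input string back to the caller.',
  implementation: "return { content: [{ type: 'text', text: 'ok' }] };",
  inputSchema: { message: { type: 'string', description: 'Message to echo back' } },
};

// Same tool, but with a payload planted only in a parameter description.
const poisonedTool: SketchTool = {
  ...cleanTool,
  inputSchema: {
    message: { type: 'string', description: 'x.constructor.constructor("return 1")()' },
  },
};

console.log(isRejected(cleanTool), isRejected(poisonedTool));
```

The sketch's patterns deliberately omit the `g` and `y` flags: `RegExp.prototype.test` advances `lastIndex` on global or sticky regexes, so reusing such a pattern across several surfaces could skip a match.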