fix(llm): tighter system prompt + 12288 max_tokens for paid tiers

Sonnet 4.6 was still hitting max_tokens on ambitious prompts like
"WorldWeather MCP for any location" because the implementation bodies
ballooned with defensive scaffolding. Two changes:

1. SYSTEM_PROMPT now imposes hard limits the model can self-enforce:
   - at most 6 tools (combine related capabilities with a mode param)
   - implementation body <= 40 lines, no comments, no overengineering
   - descriptions <= 100 chars
   These keep a typical preview under ~7k output tokens.
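The limits in item 1 are enforced by the model through the prompt; this commit does not add a server-side check. For illustration only, here is a minimal sketch of how a generated spec could be checked against the same three limits. The interfaces and the function findLimitViolations are hypothetical; only the field names (name, description, implementation, tools) and the numeric limits come from the commit.

```typescript
// Hypothetical shapes: only the field names are taken from the prompt's JSON shape.
interface ToolSpec {
  name: string;
  description: string;
  implementation: string;
}

interface ServerSpec {
  name: string;
  tools: ToolSpec[];
}

// Limits quoted from the commit message.
const MAX_TOOLS = 6;
const MAX_IMPL_LINES = 40;
const MAX_DESCRIPTION_CHARS = 100;

// Returns one message per violated limit; an empty array means the spec is within limits.
function findLimitViolations(spec: ServerSpec): string[] {
  const violations: string[] = [];

  if (spec.tools.length > MAX_TOOLS) {
    violations.push(`${spec.tools.length} tools (limit ${MAX_TOOLS})`);
  }

  for (const tool of spec.tools) {
    // Assumption: blank lines do not count towards the 40-line limit.
    const lineCount = tool.implementation
      .split('\n')
      .filter((line) => line.trim() !== '').length;
    if (lineCount > MAX_IMPL_LINES) {
      violations.push(`${tool.name}: ${lineCount} implementation lines (limit ${MAX_IMPL_LINES})`);
    }
    if (tool.description.length > MAX_DESCRIPTION_CHARS) {
      violations.push(`${tool.name}: description is ${tool.description.length} chars (limit ${MAX_DESCRIPTION_CHARS})`);
    }
  }

  return violations;
}
```

A check like this would only report that the model ignored a limit. It does not replace the truncation handling described at the end of this message.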

2. team/enterprise maxTokens 8192 -> 12288. At ~130 tok/s that fits in
   ~94s, still under Cloudflare's 100s edge cap. Hobby (GLM) and pro
   (Haiku) keep their existing limits — they were not hitting the
   ceiling.
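The timing claim in item 2 is plain division, sketched below. The 130 tok/s throughput and the 100s cap are the figures quoted above; the helper names are hypothetical. The result covers generation time only, not time to first token or network overhead.

```typescript
// Figures quoted from the commit message; the throughput is the author's estimate.
const TOKENS_PER_SECOND = 130;

// Seconds spent generating if the model uses the whole token budget.
function worstCaseSeconds(maxTokens: number, tokensPerSecond: number = TOKENS_PER_SECOND): number {
  return maxTokens / tokensPerSecond;
}

// Largest token budget that still finishes within a given number of seconds.
function largestBudgetWithin(seconds: number, tokensPerSecond: number = TOKENS_PER_SECOND): number {
  return Math.floor(seconds * tokensPerSecond);
}

// worstCaseSeconds(8192)   is about 63.0 (old limit)
// worstCaseSeconds(12288)  is about 94.5 (new limit)
// largestBudgetWithin(100) === 13000 (the 100s edge cap)
// largestBudgetWithin(95)  === 12350 (the 95_000 ms timeoutMs in the diff)
```

At the quoted throughput, 12288 tokens leaves roughly 60 tokens of headroom under the 95_000 ms timeoutMs, so a slower response would hit the client timeout before it reached the token limit.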

SpecTruncatedError still fires + surfaces 422 spec_too_large when even
12288 isn't enough, so the user gets actionable feedback instead of an
opaque zod error.
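A rough sketch of the truncation path described above. This is a hypothetical reconstruction, not the code in this repository: only the names SpecTruncatedError and spec_too_large and the 422 status come from this message. ModelResult, parseSpec and toHttpError are invented for illustration. The check assumes the provider reports truncation the way the Anthropic Messages API does, with a stop_reason of "max_tokens".

```typescript
// Hypothetical reconstruction; the real class may carry different fields.
class SpecTruncatedError extends Error {
  readonly status = 422;
  readonly code = 'spec_too_large';
  readonly maxTokens: number;

  constructor(maxTokens: number) {
    super(`Model output hit the ${maxTokens}-token limit before the spec was complete`);
    this.name = 'SpecTruncatedError';
    this.maxTokens = maxTokens;
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, SpecTruncatedError.prototype);
  }
}

// Minimal slice of a model response; stopReason mirrors the API's stop_reason field.
interface ModelResult {
  text: string;
  stopReason: string | null;
}

// Reject truncated output before parsing, so the caller never sees a
// JSON or schema error caused by a spec that was cut off mid-object.
function parseSpec(result: ModelResult, maxTokens: number): unknown {
  if (result.stopReason === 'max_tokens') {
    throw new SpecTruncatedError(maxTokens);
  }
  return JSON.parse(result.text);
}

// Map errors to an HTTP-style response; the 500 fallback is an assumption.
function toHttpError(err: unknown): { status: number; body: { error: string; message: string } } {
  if (err instanceof SpecTruncatedError) {
    return { status: err.status, body: { error: err.code, message: err.message } };
  }
  return { status: 500, body: { error: 'internal_error', message: 'Unexpected failure' } };
}
```

Checking the stop reason before parsing is what makes the feedback actionable: the user is told the spec was too large rather than that a field failed validation.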
Marco Sadjadi 2026-05-28 21:01:50 +02:00
parent 4d136c4fb2
commit b930a454e8

@@ -6,16 +6,16 @@ export const SYSTEM_PROMPT = `You generate production-grade MCP server specifica
 Output ONE JSON object (no markdown, no prose, no code fences) with this exact shape:
 {
-"name": "human-readable server name (max 128 chars)",
-"description": "1-2 sentence purpose",
+"name": "human-readable server name (max 80 chars)",
+"description": "one sentence",
 "tools": [
 {
 "name": "snake_case_tool_name",
-"description": "what the AI client sees — single sentence, clear",
+"description": "single sentence, max 100 chars",
 "inputSchema": {
-"param_name": { "type": "string|number|boolean|array|object", "description": "...", "required": true }
+"param_name": { "type": "string|number|boolean|array|object", "description": "short", "required": true }
 },
-"implementation": "ASYNC TypeScript body. Receives {args} pre-validated. Must return MCP content blocks: { content: [{ type: 'text', text: '...' }] }. Use process.env.SECRET_NAME for secrets. NEVER use eval/Function/child_process. Use globalThis.fetch for HTTP. Wrap external calls in try/catch and return { content: [{ type: 'text', text: 'Error: ...' }], isError: true } on failure."
+"implementation": "async TS body, return { content: [{ type:'text', text:'...' }] }; secrets via process.env; HTTP via globalThis.fetch with AbortSignal.timeout(10000); try/catch -> { content:[{type:'text',text:'Error: ...'}], isError:true }; no eval/Function/child_process; no imports."
 }
 ],
 "resources": [],
@@ -25,16 +25,13 @@ Output ONE JSON object (no markdown, no prose, no code fences) with this exact s
 "dependencies": {}
 }
-Rules:
-- Tools are idempotent unless the description explicitly says destructive.
-- Validate all string inputs before use.
-- For databases: parameterized queries only (use the 'pg' library with $1 placeholders).
-- For HTTP APIs: globalThis.fetch with explicit timeout via AbortSignal.timeout(10000).
-- Never hardcode credentials; declare them under requiredSecrets and read via process.env.
-- Keep tool implementations under 5000 characters.
-- Do not include "import" statements in implementations the runtime injects fetch, pg, etc.
-Return JSON only. No explanation.`;
+Hard limits (the output gets truncated past these write tight):
+- At most 6 tools. Combine related capabilities into one tool with a "mode" param rather than splitting.
+- Each implementation body: at most 40 lines of code, no defensive overengineering, no comments.
+- Each description / inputSchema description: one short clause, no examples.
+- Parameterised SQL only (pg with $1 placeholders). No prose, no JSON examples in code.
+Return JSON only. No preamble, no closing remark.`;
 // Regex blacklist — explicitly NOT a security boundary, just an early-warning
 // for obviously-dangerous LLM output. The real defence is the Docker
@@ -122,7 +119,7 @@ const PREVIEW_MODELS: Record<Plan, ModelChoice> = {
 team: {
 provider: 'anthropic',
 model: 'claude-sonnet-4-6',
-maxTokens: 8192,
+maxTokens: 12288,
 timeoutMs: 95_000,
 displayName: 'Claude Sonnet 4.6',
 displayBadge: 'claude-sonnet',
@@ -130,7 +127,7 @@ const PREVIEW_MODELS: Record<Plan, ModelChoice> = {
 enterprise: {
 provider: 'anthropic',
 model: 'claude-sonnet-4-6',
-maxTokens: 8192,
+maxTokens: 12288,
 timeoutMs: 95_000,
 displayName: 'Claude Sonnet 4.6',
 displayBadge: 'claude-sonnet',