Adds /opt/bmm-ops/ scripts (deployed separately from the app, so tar
overlays don't clobber them) for three previously-missing production
readiness items:
1. Backup hardening (backup.sh):
   - The previous cron one-liner ran pg_dump | gzip with no validation.
   - Now: pipefail-safe pg_dump, gunzip -t integrity check, pg_dump
     header sanity check (scans the first 5 lines, because line 1 is
     just "--" and the actual "PostgreSQL database dump" comment lands
     on line 2), a size warning under 1 KB, and an atomic
     move-into-place so partial backups never replace the previous
     good file. 14-day retention is preserved.
   - Optional offsite copy via BMM_BACKUP_REMOTE (rclone). Reads env
     via grep+cut, NOT `source`: .env.production has unquoted text
     values (e.g. ADMIN_NAME) that crash a sourced shell.
2. Restore drill (restore-test.sh, weekly, Sun 04:30 UTC):
   - Restores the newest backup into a throwaway DB inside the same
     Postgres container, verifies that the core tables exist (users,
     sessions, oauth_tokens, mcp_servers), then drops the temp DB.
     This proves backups are actually restorable, not just byte
     streams that look like backups, and acts as a silent-corruption
     detector.
3. Self-hosted uptime monitor (uptime-check.sh, every 5 min):
   - Probes the homepage, /api/health, and /robots.txt.
   - Edge-triggered alerting: SMS via Twilio only on up→down and
     down→up transitions (avoids an SMS storm during sustained outages).
   - Pings HEALTHCHECKS_HEARTBEAT_URL on every success. When the box
     itself dies, the heartbeat stops and the external watchdog alerts,
     which covers the gap that a self-hosted monitor can't see its own
     box failing.
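The edge-triggered alerting in item 3 can be illustrated with a small state-file sketch. This is not the code in uptime-check.sh; STATE_FILE, its default path, and the record_state function are hypothetical names chosen for the example, and it assumes a missing state file means "up".

```shell
#!/usr/bin/env bash
# Sketch of edge-triggered alerting: notify only when the state changes.
# STATE_FILE and record_state are illustrative, not from uptime-check.sh.
set -uo pipefail

STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/bmm-uptime.state}"

# record_state <up|down>
# Prints "alert:<old>-><new>" on a transition and "quiet" otherwise,
# then stores the new state for the next run.
record_state() {
  local current="$1" previous="up"
  # Assumption: a missing state file counts as "up", so the first
  # failed probe after install still produces an alert.
  [ -f "$STATE_FILE" ] && previous="$(cat "$STATE_FILE")"
  printf '%s\n' "$current" > "$STATE_FILE"
  if [ "$current" != "$previous" ]; then
    echo "alert:${previous}->${current}"
  else
    echo "quiet"
  fi
}
```

A probe run every 5 minutes would call record_state with the probe result and invoke the notification helper (presumably /opt/bmm-ops/notify.sh) only when the output starts with "alert:", so a sustained outage produces one SMS on the way down and one on recovery.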
notify.sh is the shared helper: Twilio SMS if all four creds are set,
an optional webhook to HEALTHCHECKS_FAIL_URL, and always a syslog entry.
It never fails loudly: even with a broken notification path, the alert
still lands in journalctl -t bmm-ops.
README.md documents the 3-2-1 strategy, manual full-recovery
procedure, and how to enable offsite (R2 / B2 / Hetzner Storage Box).
Smoke-tested all three on prod: backup wrote 8004 bytes with checks
passing, restore-test confirmed schema, uptime probe returned up.
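For reference, the backup checks described in item 1 (gzip integrity, header scan, size warning, atomic move) might look roughly like the sketch below. It is not the contents of backup.sh; validate_backup and publish_backup are hypothetical names, and the return codes are an assumption made for the example.

```shell
#!/usr/bin/env bash
# Sketch of post-dump validation; function names are illustrative.
set -uo pipefail

# validate_backup <file.sql.gz>
# Returns 0 if all checks pass, 1 on a corrupt archive, 2 on a bad header.
validate_backup() {
  local file="$1" header size
  # 1. gzip integrity: catches truncated or corrupt archives.
  gunzip -t "$file" 2>/dev/null || return 1
  # 2. Header sanity: the pg_dump banner should be in the first 5 lines.
  #    The header is captured first so that head closing the pipe early
  #    (SIGPIPE in gunzip) cannot fail the check under pipefail.
  header="$(gunzip -c "$file" 2>/dev/null | head -n 5)" || true
  grep -q "PostgreSQL database dump" <<<"$header" || return 2
  # 3. Size check: warn, but do not fail, on suspiciously small dumps.
  size=$(( $(wc -c < "$file") ))
  [ "$size" -ge 1024 ] || echo "warning: backup is only ${size} bytes" >&2
  return 0
}

# publish_backup <tmpfile> <final>
# Validates, then renames into place. A rename within one filesystem is
# atomic, so a failed run never replaces the previous good file.
publish_backup() {
  local tmp="$1" final="$2"
  if validate_backup "$tmp"; then
    mv -f "$tmp" "$final"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

The temporary file should live in the same directory as the final backup so that the mv is a rename rather than a copy across filesystems.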
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
62 lines
2.4 KiB
Bash
#!/usr/bin/env bash
# Shared notification helper for BMM ops scripts.
#
# Sends an alert via:
#   - Twilio SMS (if TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_SMS_FROM,
#     ADMIN_PHONE are all set in /opt/buildmymcpserver/.env.production)
#   - HEALTHCHECKS_FAIL_URL (if set; generic webhook fallback)
#   - syslog (always)
#
# Usage: notify.sh "subject" "body"
#
# Designed to never fail loudly: if Twilio is misconfigured we still log
# to syslog so failures aren't silent. Backup/uptime scripts trust this
# helper to handle their own delivery failures gracefully.

set -uo pipefail

SUBJECT="${1:-bmm-alert}"
BODY="${2:-}"

# Always syslog: covers the case where notification channels are broken
logger -t bmm-ops "$SUBJECT: $BODY"

# Grep-parse the env file rather than `source`-ing it: the file is managed
# for Docker compose (KEY=value, often unquoted text values like names),
# and `source` evaluates unquoted RHS as shell, breaking on any value
# with whitespace or shell metachars. This pulls only the keys we need.
ENV_FILE="/opt/buildmymcpserver/.env.production"
read_env() {
  grep -E "^$1=" "$ENV_FILE" 2>/dev/null | head -1 | cut -d= -f2- | sed 's/^"\(.*\)"$/\1/; s/^'"'"'\(.*\)'"'"'$/\1/'
}
if [ -f "$ENV_FILE" ]; then
  TWILIO_ACCOUNT_SID="${TWILIO_ACCOUNT_SID:-$(read_env TWILIO_ACCOUNT_SID)}"
  TWILIO_AUTH_TOKEN="${TWILIO_AUTH_TOKEN:-$(read_env TWILIO_AUTH_TOKEN)}"
  TWILIO_SMS_FROM="${TWILIO_SMS_FROM:-$(read_env TWILIO_SMS_FROM)}"
  ADMIN_PHONE="${ADMIN_PHONE:-$(read_env ADMIN_PHONE)}"
  HEALTHCHECKS_FAIL_URL="${HEALTHCHECKS_FAIL_URL:-$(read_env HEALTHCHECKS_FAIL_URL)}"
fi

# Twilio SMS: only if all four vars are set.
# -f makes curl exit non-zero on HTTP 4xx/5xx (e.g. bad credentials), so
# the fallback log below also fires when Twilio rejects the request.
if [ -n "${TWILIO_ACCOUNT_SID:-}" ] && \
   [ -n "${TWILIO_AUTH_TOKEN:-}" ] && \
   [ -n "${TWILIO_SMS_FROM:-}" ] && \
   [ -n "${ADMIN_PHONE:-}" ]; then
  curl -fsS -o /dev/null --max-time 10 \
    -X POST "https://api.twilio.com/2010-04-01/Accounts/${TWILIO_ACCOUNT_SID}/Messages.json" \
    --data-urlencode "From=${TWILIO_SMS_FROM}" \
    --data-urlencode "To=${ADMIN_PHONE}" \
    --data-urlencode "Body=[BMM] ${SUBJECT}: ${BODY}" \
    -u "${TWILIO_ACCOUNT_SID}:${TWILIO_AUTH_TOKEN}" \
    || logger -t bmm-ops "twilio-sms-failed: $SUBJECT"
fi

# Generic webhook (for healthchecks.io, BetterStack, etc.): POST body
if [ -n "${HEALTHCHECKS_FAIL_URL:-}" ]; then
  curl -fsS -o /dev/null --max-time 10 --retry 2 \
    --data "${SUBJECT}: ${BODY}" "${HEALTHCHECKS_FAIL_URL}" \
    || logger -t bmm-ops "healthcheck-webhook-failed"
fi

exit 0
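To see why the script grep-parses the env file instead of sourcing it, the parser can be tried against a throwaway file. The sketch below uses read_env_from, a hypothetical two-argument variant of read_env that takes the file path explicitly; the sample keys and values are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Demonstrates the grep+cut parser on a throwaway env file.
# read_env_from is a hypothetical variant of read_env with an explicit path.
set -uo pipefail

read_env_from() {
  local file="$1" key="$2"
  # First matching KEY= line, everything after the first "=", with one
  # layer of surrounding double or single quotes stripped.
  grep -E "^${key}=" "$file" 2>/dev/null | head -1 | cut -d= -f2- \
    | sed 's/^"\(.*\)"$/\1/; s/^'"'"'\(.*\)'"'"'$/\1/'
}

demo_env="$(mktemp)"
cat > "$demo_env" <<'EOF'
ADMIN_NAME=Jane Q Admin
TWILIO_SMS_FROM="+15005550006"
HEALTHCHECKS_FAIL_URL=https://example.invalid/fail?a=1&b=2
EOF

read_env_from "$demo_env" ADMIN_NAME             # Jane Q Admin
read_env_from "$demo_env" TWILIO_SMS_FROM        # +15005550006
read_env_from "$demo_env" HEALTHCHECKS_FAIL_URL  # https://example.invalid/fail?a=1&b=2
```

Sourcing the same file would treat the ADMIN_NAME line as "run the command Q with ADMIN_NAME=Jane in its environment" and would background part of the URL line at the "&", which is the failure mode the comment in notify.sh describes.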