Security Hardening

Production Security Checklist

Before going live, verify every item:

Secrets & Encryption

[ ] All secrets generated with openssl rand -hex 32 (not default values)
[ ] .env.prod file permissions set to 600 (owner-only)
[ ] .env.prod is in .gitignore and never committed
[ ] JWT_SECRET is at least 32 characters
[ ] APP_ENCRYPTION_KEY and MFA_ENCRYPTION_KEY each use their own separately generated key
[ ] AGENT_ENROLLMENT_SECRET rotated after initial enrollment batch
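
Several of the items above can be verified mechanically before go-live. The following is a minimal sketch, not part of Breeze: `generate_secret`, `parse_env`, and `check_env_file` are hypothetical helper names, and the parser assumes plain `KEY=VALUE` lines. `secrets.token_hex(32)` yields 32 random bytes as 64 hex characters, the same shape of output as `openssl rand -hex 32`.

```python
import os
import secrets
import stat


def generate_secret() -> str:
    """Return 32 random bytes as 64 hex characters."""
    return secrets.token_hex(32)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env_file(path: str) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Hypothetical check: file must be readable and writable by the owner only.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o600:
        problems.append(f"permissions are {mode:o}, expected 600")
    with open(path, encoding="utf-8") as fh:
        env = parse_env(fh.read())
    if len(env.get("JWT_SECRET", "")) < 32:
        problems.append("JWT_SECRET is shorter than 32 characters")
    app_key = env.get("APP_ENCRYPTION_KEY")
    mfa_key = env.get("MFA_ENCRYPTION_KEY")
    if not app_key or not mfa_key:
        problems.append("APP_ENCRYPTION_KEY or MFA_ENCRYPTION_KEY is missing")
    elif app_key == mfa_key:
        problems.append("APP_ENCRYPTION_KEY and MFA_ENCRYPTION_KEY share a value")
    return problems
```

Calling `check_env_file(".env.prod")` on a POSIX host returns the list of failed checks, which makes it easy to wire into a deploy script that aborts when the list is non-empty.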

Network

[ ] Only ports 80/443 (and optionally 3478 for TURN) exposed publicly
[ ] PostgreSQL bound to 127.0.0.1 (not 0.0.0.0)
[ ] Redis bound to 127.0.0.1 (not 0.0.0.0)
[ ] Redis password authentication enabled via REDIS_PASSWORD (set in docker-compose and included in REDIS_URL)
[ ] Grafana/Prometheus accessible only via localhost or VPN
[ ] SSH key-only authentication (no password auth)
[ ] UFW or iptables configured to deny incoming traffic by default

TLS

[ ] Caddy auto-TLS configured with valid domain and ACME email
[ ] HSTS header enabled with includeSubDomains; preload
[ ] No self-signed certificates in production

Container Security

[ ] no-new-privileges: true on all containers (default in prod compose)
[ ] cap_drop: ALL on all containers
[ ] API and Web containers run with read_only: true rootfs
[ ] Resource limits (cpus, mem_limit, pids_limit) set
[ ] Non-root container users (UID 1001)

Authentication

[ ] MFA (TOTP) enabled for all admin accounts
[ ] Roles that should require MFA have Force MFA turned on. Users in a force-MFA role get a 428 Precondition Required response until they enroll a TOTP device; the dashboard then redirects them through a forced-enrollment page before any other workflow becomes available.
[ ] Registration disabled in production (ENABLE_REGISTRATION=false) after initial setup
[ ] Rate limiting active on login endpoints
[ ] Session timeout configured (SESSION_MAX_AGE)
[ ] Session revocation is fail-closed — revoked sessions stay revoked even if Redis is unavailable
[ ] Refresh tokens use family-based reuse detection: replaying a previously rotated refresh token immediately revokes every other token in that family, logout included.
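
The family-based reuse detection in the last item can be pictured with a small in-memory model. This is a sketch under assumptions, not Breeze's implementation: the class name `RefreshTokenFamilies` and its methods are hypothetical, and a real deployment would persist this state rather than hold it in dictionaries.

```python
import secrets
from typing import Dict, Optional, Set


class RefreshTokenFamilies:
    """In-memory model of family-based refresh token rotation (illustrative only)."""

    def __init__(self) -> None:
        self._family_of: Dict[str, str] = {}  # every token ever issued -> family id
        self._current: Dict[str, str] = {}    # family id -> the single valid token
        self._revoked: Set[str] = set()       # family ids that have been revoked

    def issue(self) -> str:
        """Start a new family (for example at login) and return its first token."""
        family = secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        self._family_of[token] = family
        self._current[family] = token
        return token

    def rotate(self, token: str) -> Optional[str]:
        """Exchange a refresh token for a new one; return None if it is rejected."""
        family = self._family_of.get(token)
        if family is None or family in self._revoked:
            return None
        if self._current[family] != token:
            # A token that was already rotated has been replayed:
            # assume theft and revoke the whole family.
            self._revoked.add(family)
            return None
        new_token = secrets.token_urlsafe(32)
        self._family_of[new_token] = family
        self._current[family] = new_token
        return new_token
```

In this model, replaying the first token after it has been rotated makes `rotate` return `None` for every token in that family, including the newest one, while unrelated families keep working.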

Agent Security

[ ] Agent tokens stored as SHA-256 hashes (automatic for new enrollments)
[ ] Agent token rotation tested (POST /agents/:id/rotate-token) — both old and new tokens are valid for a 5-minute grace period, and the agent picks up the new token on its next heartbeat with no downtime
[ ] Config file permissions: 0750 for /etc/breeze/, 0640 for agent.yaml, 0600 for secrets.yaml
[ ] Agent rate limiting enabled (120 req/60s per agent via Redis)
[ ] Enrollment keys set with expiry and usage limits
[ ] Cross-tenant probe detection enabled — if an agent token is used to access a device in another tenant, the token is automatically suspended and re-enrollment is blocked until an admin reviews the device.
[ ] Source-IP tracking active: every heartbeat records the agent’s source IP, and an agent.source.ip.changed audit event fires whenever that IP changes, which helps surface token theft or NAT changes.
[ ] Consider enabling Cloudflare mTLS for zero-trust agent auth
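
To illustrate how hashed token storage and the 5-minute rotation grace period fit together, here is a minimal sketch. It assumes the server keeps only SHA-256 hex digests, as the first item describes. `AgentTokenRecord`, `hash_token`, and `GRACE_SECONDS` are hypothetical names, not the platform's actual schema.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

GRACE_SECONDS = 300  # the 5-minute grace period named in the checklist


def hash_token(token: str) -> str:
    """Return the SHA-256 hex digest that is stored instead of the raw token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class AgentTokenRecord:
    """Holds only hashes: the current one and, briefly, the previous one."""

    def __init__(self, token: str) -> None:
        self.current_hash = hash_token(token)
        self.previous_hash: Optional[str] = None
        self.previous_expires_at = 0.0

    def rotate(self, now: Optional[float] = None) -> str:
        """Issue a new token; the old one stays valid for GRACE_SECONDS."""
        now = time.time() if now is None else now
        new_token = secrets.token_urlsafe(32)
        self.previous_hash = self.current_hash
        self.previous_expires_at = now + GRACE_SECONDS
        self.current_hash = hash_token(new_token)
        return new_token

    def verify(self, token: str, now: Optional[float] = None) -> bool:
        """Accept the current token, or the previous one inside the grace window."""
        now = time.time() if now is None else now
        candidate = hash_token(token)
        if hmac.compare_digest(candidate, self.current_hash):
            return True
        return (
            self.previous_hash is not None
            and now < self.previous_expires_at
            and hmac.compare_digest(candidate, self.previous_hash)
        )
```

`hmac.compare_digest` is used so that the comparison time does not depend on how many leading characters match. Passing `now` explicitly makes the grace window easy to exercise in tests.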

Outbound Request Safety (SSRF)

[ ] Outbound integrations (webhooks, DNS providers, SSO discovery) flow through the platform’s SSRF guard, which blocks private/loopback ranges and cloud metadata hostnames unless an explicit allowlist entry permits them.
[ ] partners.settings and sites.settings columns are AES-256-GCM encrypted at rest — secrets stored here (provider credentials, integration tokens) never leave the database in plaintext.
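
The kind of check an SSRF guard performs can be sketched with Python's standard `ipaddress` module. This is an illustration only: `is_url_allowed`, `METADATA_HOSTS`, and the allowlist shape are assumptions, and the platform's actual blocklist and allowlist format are not documented here. The sketch inspects only the literal host in the URL, whereas a production guard would also need to resolve hostnames and check every address they return.

```python
import ipaddress
from urllib.parse import urlsplit

# Illustrative entries only, not the platform's real blocklist.
METADATA_HOSTS = frozenset({"metadata.google.internal"})


def is_url_allowed(url: str, allowlist: frozenset = frozenset()) -> bool:
    """Return True if an outbound request to `url` passes this simplified guard."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host in allowlist:
        # An explicit allowlist entry overrides the blocks below.
        return True
    if host == "localhost" or host in METADATA_HOSTS:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A real guard must resolve the name and
        # re-run the address checks on each result; this sketch does not.
        return True
    blocked = (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
    return not blocked
```

For example, `is_url_allowed("http://169.254.169.254/latest/meta-data/")` is rejected because the address is link-local, while the same call with `allowlist=frozenset({"169.254.169.254"})` would be permitted.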

Monitoring

[ ] Prometheus metrics endpoint protected with bearer token
[ ] Alert rules configured for error rates and infrastructure
[ ] Audit logging enabled (automatic for all mutating operations)
[ ] Log aggregation configured (Loki)

Firewall Configuration

# UFW example
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Only if using TURN for WebRTC:
# sudo ufw allow 3478/tcp
# sudo ufw allow 3478/udp
sudo ufw enable

Audit Logging

All mutating operations are automatically logged with:

| Field | Description |
|---|---|
| actorType | user, api_key, agent, or system |
| actorId | User ID or device ID |
| action | Operation performed |
| resource | Target resource type |
| resourceId | Target resource ID |
| details | JSON payload of changes |
| ipAddress | Client IP address |
| timestamp | ISO 8601 timestamp |
| checksum | SHA-256 of the canonical row payload |
| prev_checksum | Checksum of the previous row in this organization’s chain |

Tamper evidence

The audit_log table is append-only at the database level. Database triggers reject UPDATE, DELETE, and TRUNCATE operations against audit rows, so while those triggers are in effect not even a superuser can quietly edit history. Each row also carries a prev_checksum that links to the previous audit row in the same organization, producing a per-org SHA-256 hash chain. Verifying the chain end-to-end detects any insertion, deletion, or alteration between two timestamps.
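
The chain verification described above can be sketched as follows. The exact canonical form Breeze hashes is not specified here, so this example assumes each checksum covers the previous checksum concatenated with the row payload serialized as sorted-key JSON. `row_checksum`, `append_row`, and `first_broken_index` are hypothetical names.

```python
import hashlib
import json
from typing import List, Optional


def row_checksum(payload: dict, prev_checksum: Optional[str]) -> str:
    """SHA-256 over the previous checksum plus a canonical JSON payload (assumed form)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    material = (prev_checksum or "") + canonical
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def append_row(chain: List[dict], payload: dict) -> None:
    """Append a row that links to the checksum of the row before it."""
    prev = chain[-1]["checksum"] if chain else None
    chain.append(
        {
            "payload": payload,
            "prev_checksum": prev,
            "checksum": row_checksum(payload, prev),
        }
    )


def first_broken_index(chain: List[dict]) -> Optional[int]:
    """Return the index of the first row that fails verification, or None if intact."""
    expected_prev = None
    for index, row in enumerate(chain):
        if row["prev_checksum"] != expected_prev:
            return index  # a row was inserted or deleted before this one
        if row["checksum"] != row_checksum(row["payload"], row["prev_checksum"]):
            return index  # this row's contents were altered
        expected_prev = row["checksum"]
    return None
```

Altering a payload breaks that row's own checksum, and deleting a row breaks the link recorded in its successor, so either change surfaces at a specific index rather than as a generic failure.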

Retention pruning is the one legitimate path that removes audit rows. It requires both the breeze_audit_admin Postgres role and the breeze.allow_audit_retention='1' session GUC; pruning re-anchors the chain on the surviving rows so the integrity check still passes after old data ages out. Both controls are managed by the platform’s audit retention worker — operators do not run pruning by hand.

Query audit logs via the API:

curl -H "Authorization: Bearer $TOKEN" \
  "https://breeze.yourdomain.com/api/v1/audit?resource=devices&action=delete"

Rate Limiting

Breeze implements Redis-backed sliding window rate limiting:

| Endpoint | Limit | Window |
|---|---|---|
| Login | 5 attempts | 5 minutes |
| API (per user) | 100 requests | 60 seconds |
| Agent (per device) | 120 requests | 60 seconds |
| Agent (per organization) | 600 requests | 60 seconds |
| Enrollment | 10 attempts | 60 seconds |

The per-organization agent limit is configurable via AGENT_ORG_RATE_LIMIT_PER_MIN and caps total fleet traffic for any single tenant. When exceeded, the API returns 429 with Retry-After: 60; agents respect this header and back off automatically.
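
For readers unfamiliar with sliding-window limiting, the idea can be shown with an in-memory sketch. This is not Breeze's Redis-backed implementation: `SlidingWindowLimiter` is a hypothetical class that keeps a log of request timestamps per key. It also computes the retry delay from the oldest request in the window, whereas the API described above returns a fixed Retry-After: 60.

```python
import math
import time
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class SlidingWindowLimiter:
    """Sliding-window log limiter held in process memory (illustrative only)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = {}

    def check(self, key: str, now: Optional[float] = None) -> Tuple[bool, int]:
        """Record a request for `key` and return (allowed, retry_after_seconds)."""
        now = time.monotonic() if now is None else now
        hits = self._hits.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0
        # Rejected: the caller may retry once the oldest request expires.
        retry_after = math.ceil(hits[0] + self.window - now)
        return False, max(retry_after, 1)
```

With `SlidingWindowLimiter(limit=5, window_seconds=300)` standing in for the login row of the table, a sixth attempt inside five minutes is rejected, and attempts are accepted again as the earliest ones age out of the window.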