Security Hardening
Production Security Checklist
Before going live, verify every item:
Secrets & Encryption
- [ ] All secrets generated with `openssl rand -hex 32` (not default values)
- [ ] `.env.prod` file permissions set to `600` (owner-only)
- [ ] `.env.prod` is in `.gitignore` and never committed
- [ ] `JWT_SECRET` is at least 32 characters
- [ ] Separate encryption keys for `APP_ENCRYPTION_KEY` and `MFA_ENCRYPTION_KEY`
- [ ] `AGENT_ENROLLMENT_SECRET` rotated after initial enrollment batch
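
The file-level items in this list can be checked mechanically. The following is a minimal Python sketch, not something shipped with Breeze: `parse_env` and `check_env_file` are hypothetical helpers, and the parser assumes plain `KEY=VALUE` lines.

```python
import stat
from pathlib import Path

MIN_SECRET_LENGTH = 32  # mirrors the JWT_SECRET guidance in the checklist


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env_file(path: Path) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # Owner-only permissions: exactly 600.
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode != 0o600:
        problems.append(f"{path} has mode {mode:o}, expected 600")

    env = parse_env(path.read_text())

    if len(env.get("JWT_SECRET", "")) < MIN_SECRET_LENGTH:
        problems.append("JWT_SECRET is shorter than 32 characters")

    # The two encryption keys must not share a value.
    app_key = env.get("APP_ENCRYPTION_KEY")
    if app_key and app_key == env.get("MFA_ENCRYPTION_KEY"):
        problems.append("APP_ENCRYPTION_KEY and MFA_ENCRYPTION_KEY must differ")

    return problems
```

Calling `check_env_file(Path(".env.prod"))` returns an empty list when the file passes all three checks, which makes it easy to wire into a pre-deploy script.
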
Network
- [ ] Only ports 80/443 (and optionally 3478 for TURN) exposed publicly
- [ ] PostgreSQL bound to `127.0.0.1` (not `0.0.0.0`)
- [ ] Redis bound to `127.0.0.1` (not `0.0.0.0`)
- [ ] Redis password authentication enabled via `REDIS_PASSWORD` (set in docker-compose and included in `REDIS_URL`)
- [ ] Grafana/Prometheus accessible only via localhost or VPN
- [ ] SSH key-only authentication (no password auth)
- [ ] UFW or iptables configured
- [ ] Caddy auto-TLS configured with valid domain and ACME email
- [ ] HSTS header enabled with `includeSubDomains; preload`
- [ ] No self-signed certificates in production
Container Security
- [ ] `no-new-privileges: true` on all containers (default in prod compose)
- [ ] `cap_drop: ALL` on all containers
- [ ] API and Web containers run with `read_only: true` rootfs
- [ ] Resource limits (`cpus`, `mem_limit`, `pids_limit`) set
- [ ] Non-root container users (UID 1001)
Authentication
- [ ] MFA (TOTP) enabled for all admin accounts
- [ ] Roles that should require MFA have Force MFA turned on. Users in a force-MFA role get a `428 Precondition Required` response until they enroll a TOTP device; the dashboard then redirects them through a forced-enrollment page before any other workflow becomes available.
- [ ] Registration disabled in production (`ENABLE_REGISTRATION=false`) after initial setup
- [ ] Rate limiting active on login endpoints
- [ ] Session timeout configured (`SESSION_MAX_AGE`)
- [ ] Session revocation is fail-closed: revoked sessions stay revoked even if Redis is unavailable
- [ ] Refresh tokens use family-based reuse detection: replaying a previously rotated refresh token immediately revokes every other token in that family, logout included
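
Family-based reuse detection is easier to reason about with a small model. The sketch below is an in-memory illustration in Python, not Breeze's implementation: `RefreshTokenStore`, `TokenFamily`, and their methods are hypothetical names, and a real store would persist hashed tokens rather than raw ones.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenFamily:
    """All refresh tokens descended from one login share a family."""
    current: str                               # the only redeemable token
    retired: set = field(default_factory=set)  # tokens already rotated away
    revoked: bool = False


class RefreshTokenStore:
    def __init__(self) -> None:
        self._families = {}      # family_id -> TokenFamily
        self._token_family = {}  # token -> family_id

    def issue(self) -> str:
        """Start a new family at login and return its first refresh token."""
        family_id = secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        self._families[family_id] = TokenFamily(current=token)
        self._token_family[token] = family_id
        return token

    def rotate(self, token: str) -> Optional[str]:
        """Exchange a refresh token for a new one; None means rejected."""
        family_id = self._token_family.get(token)
        if family_id is None:
            return None  # unknown token
        family = self._families[family_id]
        if family.revoked:
            return None  # family was already burned
        if token in family.retired:
            # Replay of a rotated token: revoke the whole family.
            family.revoked = True
            return None
        new_token = secrets.token_urlsafe(32)
        family.retired.add(token)
        family.current = new_token
        self._token_family[new_token] = family_id
        return new_token
```

In this model, rotating `t1` into `t2` and then presenting `t1` again revokes the family, so a later attempt to rotate `t2` is rejected as well. That is the property the checklist item describes: a stolen, already-rotated token cannot be used without also invalidating the legitimate holder's session.
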
Agent Security
- [ ] Agent tokens stored as SHA-256 hashes (automatic for new enrollments)
- [ ] Agent token rotation tested (`POST /agents/:id/rotate-token`): both old and new tokens are valid for a 5-minute grace period, and the agent picks up the new token on its next heartbeat with no downtime
- [ ] Config file permissions: `0750` for `/etc/breeze/`, `0640` for `agent.yaml`, `0600` for `secrets.yaml`
- [ ] Agent rate limiting enabled (120 req/60s per agent via Redis)
- [ ] Enrollment keys set with expiry and usage limits
- [ ] Cross-tenant probe detection enabled: if an agent token is used to access a device in another tenant, the token is automatically suspended and re-enrollment is blocked until an admin reviews the device
- [ ] Source-IP tracking active: every heartbeat records the agent’s source IP, and an `agent.source.ip.changed` audit event fires when it shifts, surfacing token theft or NAT changes
- [ ] Consider enabling Cloudflare mTLS for zero-trust agent auth
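
The combination of hashed token storage and a rotation grace period can be sketched as follows. This is an illustrative Python model under stated assumptions, not the server's code: `AgentCredential`, `hash_token`, and the field names are hypothetical, and only the SHA-256 hashing and the 5-minute overlap come from the checklist.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 5 * 60  # the 5-minute overlap described in the checklist


def hash_token(token: str) -> str:
    """Only this digest is stored; the raw token is never persisted."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


@dataclass
class AgentCredential:
    current_hash: str
    previous_hash: Optional[str] = None
    previous_expires_at: float = 0.0

    def rotate(self, now: Optional[float] = None) -> str:
        """Issue a new token and keep the old hash valid for the grace period."""
        now = time.time() if now is None else now
        new_token = secrets.token_urlsafe(32)
        self.previous_hash = self.current_hash
        self.previous_expires_at = now + GRACE_SECONDS
        self.current_hash = hash_token(new_token)
        return new_token

    def verify(self, token: str, now: Optional[float] = None) -> bool:
        """Accept the current token, or the previous one inside the grace window."""
        now = time.time() if now is None else now
        presented = hash_token(token)
        if hmac.compare_digest(presented, self.current_hash):
            return True
        return (
            self.previous_hash is not None
            and now < self.previous_expires_at
            and hmac.compare_digest(presented, self.previous_hash)
        )
```

Passing `now` explicitly keeps the grace-period logic testable without sleeping. `hmac.compare_digest` is used so the comparison time does not depend on how many leading characters match.
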
Outbound Request Safety (SSRF)
- [ ] Outbound integrations (webhooks, DNS providers, SSO discovery) flow through the platform’s SSRF guard, which blocks private/loopback ranges and cloud metadata hostnames unless an explicit allowlist entry permits them.
- [ ] `partners.settings` and `sites.settings` columns are AES-256-GCM encrypted at rest, so secrets stored here (provider credentials, integration tokens) never leave the database in plaintext.
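
The kind of check an SSRF guard performs can be sketched in a few lines of Python using the standard `ipaddress` module. This is an illustration only: `is_url_allowed`, `is_blocked_ip`, and the contents of `BLOCKED_HOSTNAMES` are assumptions, not the platform's actual guard or denylist.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical denylist of cloud metadata hostnames.
BLOCKED_HOSTNAMES = {"metadata.google.internal", "metadata"}


def is_blocked_ip(ip: str) -> bool:
    """True for addresses an outbound integration should never reach."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local  # includes 169.254.169.254
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def is_url_allowed(url: str, allowlist=(), resolve=socket.getaddrinfo) -> bool:
    """Check scheme, hostname, and every resolved address before a request."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host in allowlist:
        return True  # explicit allowlist entry overrides the blocks
    if host in BLOCKED_HOSTNAMES:
        return False
    try:
        infos = resolve(host, parts.port or 443)
    except OSError:
        return False  # unresolvable hosts are rejected
    # Every address the name resolves to must be public.
    return bool(infos) and all(not is_blocked_ip(info[4][0]) for info in infos)
```

A check like this only helps if the subsequent request connects to the same address that was validated; otherwise a second DNS lookup can return a different (internal) address. The `resolve` parameter exists so the function can be exercised with a fake resolver in tests.
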
Monitoring
- [ ] Prometheus metrics endpoint protected with bearer token
- [ ] Alert rules configured for error rates and infrastructure
- [ ] Audit logging enabled (automatic for all mutating operations)
- [ ] Log aggregation configured (Loki)
Firewall Configuration
```bash
# UFW example
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Only if using TURN for WebRTC:
# sudo ufw allow 3478/tcp
# sudo ufw allow 3478/udp

sudo ufw enable
```

Audit Logging
All mutating operations are automatically logged with:
| Field | Description |
|---|---|
| actorType | user, api_key, agent, or system |
| actorId | User ID or device ID |
| action | Operation performed |
| resource | Target resource type |
| resourceId | Target resource ID |
| details | JSON payload of changes |
| ipAddress | Client IP address |
| timestamp | ISO 8601 timestamp |
| checksum | SHA-256 of the canonical row payload |
| prev_checksum | Checksum of the previous row in this organization’s chain |
Tamper evidence
The `audit_log` table is append-only at the database level. Database triggers refuse `UPDATE`, `DELETE`, and `TRUNCATE` operations against audit rows, so not even a superuser can quietly edit history. Each row also carries a `prev_checksum` that links to the previous audit row in the same organization, producing a per-org SHA-256 hash chain. Verifying the chain end-to-end detects any insertion, deletion, or alteration between two timestamps.
Retention pruning is the one legitimate path that removes audit rows. It requires both the `breeze_audit_admin` Postgres role and the `breeze.allow_audit_retention='1'` session GUC. Pruning re-anchors the chain on the surviving rows so the integrity check still passes after old data ages out. Both controls are managed by the platform’s audit retention worker; operators do not run pruning by hand.
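
To make the hash-chain idea concrete, here is a minimal Python sketch of building and verifying such a chain. It rests on assumptions the text does not confirm: that the canonical payload is sorted-key JSON, and that each row's checksum covers its `prev_checksum`. The helper names (`row_checksum`, `append_row`, `verify_chain`) are hypothetical.

```python
import hashlib
import json
from typing import Optional


def row_checksum(payload: dict, prev_checksum: Optional[str]) -> str:
    """SHA-256 over the previous checksum plus a canonical JSON payload.

    Assumption: canonical form is JSON with sorted keys and no whitespace.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    material = (prev_checksum or "") + canonical
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def append_row(chain: list, payload: dict) -> dict:
    """Append a row linked to the current tail of the chain."""
    prev = chain[-1]["checksum"] if chain else None
    row = {
        "payload": payload,
        "prev_checksum": prev,
        "checksum": row_checksum(payload, prev),
    }
    chain.append(row)
    return row


def verify_chain(chain: list) -> Optional[int]:
    """Return the index of the first inconsistent row, or None if intact."""
    prev = None
    for index, row in enumerate(chain):
        # The link must point at the checksum of the row before it.
        if row["prev_checksum"] != prev:
            return index
        # The stored checksum must match a fresh recomputation.
        if row["checksum"] != row_checksum(row["payload"], row["prev_checksum"]):
            return index
        prev = row["checksum"]
    return None
```

Editing a row's payload makes its recomputed checksum differ from the stored one, and deleting a row leaves its successor pointing at a checksum that no longer precedes it. In both cases `verify_chain` reports the first affected index.
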
Query audit logs via the API:
```bash
curl -H "Authorization: Bearer $TOKEN" \
  "https://breeze.yourdomain.com/api/v1/audit?resource=devices&action=delete"
```

Rate Limiting
Breeze implements Redis-backed sliding window rate limiting:
| Endpoint | Limit | Window |
|---|---|---|
| Login | 5 attempts | 5 minutes |
| API (per user) | 100 requests | 60 seconds |
| Agent (per device) | 120 requests | 60 seconds |
| Agent (per organization) | 600 requests | 60 seconds |
| Enrollment | 10 attempts | 60 seconds |
The per-organization agent limit is configurable via `AGENT_ORG_RATE_LIMIT_PER_MIN` and caps total fleet traffic for any single tenant. When the limit is exceeded, the API returns `429` with `Retry-After: 60`; agents respect this header and back off automatically.
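
For readers unfamiliar with the technique, a sliding window limiter can be modeled in memory as shown below. This Python sketch is a single-process stand-in for the Redis-backed version and is not Breeze's code; `SlidingWindowLimiter` and the key format are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within the trailing window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would respond with 429
        hits.append(now)
        return True


# Example: the login row of the table (5 attempts per 5 minutes).
login_limiter = SlidingWindowLimiter(limit=5, window_seconds=300)
```

Unlike a fixed window that resets on a clock boundary, the window here trails each request, so a burst straddling a boundary cannot double the effective limit. Rejected requests are not recorded in this sketch, which means a blocked client regains capacity as soon as its oldest accepted hit expires.
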