
Infrastructure Implementation Tasks/Plan

Detailed Infra Tasks (Redis / DB / Services / Domains / Networking / Security / Deployment / Operations)


Audience: DevOps engineers, platform engineers, backend leads, security engineers, SRE/operations, architecture leads
Status: execution-grade infrastructure specification
Goal: define, in practical detail, how to deploy, secure, connect, observe, scale, and operate the full Kisum platform infrastructure for:

  • Auth Backend
  • Platform Core Backend
  • Platform Admin Backend
  • Basic/Core Backend
  • Finance Backend
  • Market Backend
  • Touring Backend
  • Venue Backend
  • AI Backend
  • PostgreSQL databases
  • Redis
  • domains / DNS / TLS
  • networking / ingress / egress
  • secrets / credentials
  • backups / restore
  • logging / metrics / tracing
  • CI/CD / release strategy
  • incidents / runbooks / disaster recovery

This document is intentionally long and explicit.

It is not a product brief.
It is not a high-level architecture note.
It is not a slide deck.

It is meant to answer questions like:

  • where exactly should each service run?
  • what domains should point where?
  • what databases exist?
  • which services can talk to which others?
  • where does Redis live?
  • what must be private vs public?
  • how should environment variables be managed?
  • how should certificates be handled?
  • what gets backed up?
  • how do we deploy without breaking production?
  • what happens when Redis is down?
  • how do we rotate JWT signing keys safely?
  • how do we restore auth if the DB is corrupted?
  • what monitoring and alerts are mandatory before go-live?

Where there is still an application-level TBD, this document states that clearly rather than inventing fake certainty.


The infrastructure must support the agreed architecture:

  • Auth = identity, sessions, memberships, roles, permissions, delegation, access aggregation
  • Platform Core = packages, add-ons, modules, company entitlements
  • Business Backends = business data and business logic
  • Frontend = consumes backend APIs; does not own security truth
  • Redis = performance layer, never source of truth
  • PostgreSQL = source of truth for relational system state

The infrastructure must also support these business/technical rules:

  1. JWT remains small and identity-only.
  2. Access is recomputed from Auth + Core data, not embedded in JWT.
  3. Core App is only visible when Basic subscription is active.
  4. Add-on modules may be active even if Basic is inactive.
  5. Business backends must validate and enforce access consistently.
  6. Platform Core is internal for entitlements and should not become a public authorization engine.
  7. Frontend must call Auth for access bootstrap, not Core directly.

The platform should be thought of as these deployable groups:

These accept browser/client traffic from the public internet.

  • Frontend App
  • Platform Admin Frontend
  • Auth Backend
  • Platform Admin Backend
  • Basic Backend
  • Finance Backend
  • Market Backend
  • Touring Backend
  • Venue Backend
  • AI Backend

These should not be directly callable by browsers/public clients unless there is a very specific decision to expose them.

  • Platform Core Backend
  • internal supporting jobs/workers, if introduced
  • optional internal cache warming jobs
  • optional scheduled reconciliation jobs

These hold or accelerate platform state.

  • PostgreSQL for Auth
  • PostgreSQL for Platform Core
  • MongoDB for Main/Basic (current)
  • PostgreSQL for Finance
  • MongoDB for Market/Touring (current)
  • Venue DB (TBD)
  • AI DB / storage (TBD)
  • Redis

These support routing, delivery, and operations.

  • DNS provider
  • TLS certificate management
  • reverse proxy / ingress
  • CI/CD
  • secret manager or secret injection mechanism
  • logging sink
  • metrics backend
  • alerting system
  • backup jobs
  • restore tooling

At minimum, infrastructure should support:

  • local
  • dev
  • staging
  • production

Optional:

  • preview
  • qa
  • disaster-recovery / warm standby

Local

Purpose:

  • individual development
  • integration tests with minimal dependencies

Should support:

  • local Auth
  • local Core
  • optional local Redis
  • local PostgreSQL
  • mocked or dev versions of other backends

Dev

Purpose:

  • shared development environment
  • integration testing by multiple engineers

Should support:

  • all services deployable
  • test domains
  • real Redis
  • real Postgres instances
  • non-production secrets
  • isolated data

Staging

Purpose:

  • production-like validation
  • release candidate testing
  • smoke tests
  • load testing before prod

Should support:

  • same routing pattern as prod
  • same TLS pattern as prod
  • same secret layout pattern as prod
  • realistic DB/Redis services
  • monitoring and alerts enabled

Production

Purpose:

  • customer traffic
  • platform admin traffic
  • real business operations

Must have:

  • secure TLS
  • monitored DB/Redis
  • backups
  • restore runbooks
  • incident alerting
  • environment separation from staging
  • rollout and rollback plan

Planned domain → service mapping:

  • auth.kisum.io → Auth Backend
  • app.kisum.io → main frontend / Core App shell / module routes
  • admin.kisum.io → platform admin frontend
  • api-v2.kisum.dev (or production equivalent under kisum.io if desired) → Basic Backend
  • api-v2-finance.kisum.dev → Finance Backend
  • api-v2-market.kisum.dev → Market Backend
  • api-v2-touring.kisum.dev → Touring Backend (if separate)
  • api.kisum.dev/venue or dedicated subdomain → Venue Backend
  • api-v2-ai.kisum.dev → AI Backend
  • core.kisum.io or platform-core.kisum.io → Platform Core Backend
  • optional internal admin-to-core/internal-to-auth hostnames depending on network design

Option A — keep current mixed domain pattern temporarily


This reflects your current system reality and reduces migration friction.

Option B — normalize all production domains later


Examples:

  • api-basic.kisum.io
  • api-finance.kisum.io
  • api-market.kisum.io

This can be cleaner long term, but not required immediately.

For each public hostname, define:

  • A / AAAA or CNAME depending on hosting model
  • TTL appropriate for your infra
  • separate staging/dev DNS namespace
  • no shared records between prod and non-prod

Examples:

  • auth.kisum.io
  • auth-staging.kisum.io
  • core-staging.kisum.io
  • app-staging.kisum.io

Tasks:

  • document all DNS records in infra repo
  • avoid manual undocumented DNS edits
  • define ownership for DNS changes
  • use lower TTL before cutovers or migrations
  • record certificate dependencies for each hostname

TLS is mandatory for:

  • frontend
  • auth
  • admin
  • all public APIs

Recommended for internal service-to-service traffic:

  • still use TLS if routed through internal ingress/service mesh
  • otherwise restrict to private networking and authenticated service calls

Certificate tasks:

  • choose cert strategy:
    • managed certs via cloud LB
    • or Let’s Encrypt via ingress / reverse proxy
  • ensure auto-renewal
  • alert before expiration
  • test renewals in staging

For frontend and auth domains, configure:

  • HSTS
  • X-Content-Type-Options
  • X-Frame-Options or CSP equivalent as appropriate
  • Referrer-Policy
  • secure cookie policy if cookies are used anywhere
  • CSP for frontends if feasible

Network exposure must be decided for each component:

  • Auth Backend
  • Frontend(s)
  • Platform Admin Backend
  • Basic Backend
  • Module backends
  • Platform Core Backend
  • databases
  • Redis
  • internal job runners

Default rule:

  • if a service does not need public internet traffic, do not expose it publicly

Platform Core is the most important example:

  • it should be reachable by Auth and Admin backend
  • it should not be a public browser API

Recommended segmentation:

  • ingress/public subnet or layer
  • app/services private subnet or network
  • data subnet or private data layer

At minimum:

  • public reverse proxy or LB
  • private app-to-app connectivity
  • private DB and Redis connectivity

Auth Backend can talk to:

  • Auth DB
  • Redis
  • Platform Core Backend
  • optional email provider (SES etc.)
  • optional internal admin/backend tools

Should not need direct access to:

  • module DBs
  • module business backends for normal runtime

Platform Core Backend can talk to:

  • Platform Core DB
  • optional billing provider APIs
  • optional internal messaging bus
  • optional Redis if used for local acceleration

Should not need direct access to:

  • Auth DB
  • module DBs

Platform Admin Backend can talk to:

  • Auth Backend
  • Platform Core Backend
  • optional admin DB/config DB

Basic Backend can talk to:

  • Main DB
  • Auth/JWKS or Auth internal access route
  • optional Redis if using shared cache path

Finance Backend can talk to:

  • Finance DB
  • Auth/JWKS or Auth internal access route

Market/Touring Backend can talk to:

  • Market/Touring DB
  • Auth/JWKS or Auth internal access route

Venue Backend can talk to:

  • Venue DB
  • Auth/JWKS or Auth internal access route

AI Backend can talk to:

  • AI DB/storage
  • Auth/JWKS or Auth internal access route
  • optional model providers

Databases and Redis should never accept public internet traffic directly.


This section does not force one provider, but it defines what the hosting model must support.

Pattern A — VMs / processes

Services run in containers or processes on VMs. Good when:

  • you want direct control
  • you already use VPS infrastructure
  • you want simpler networking and lower moving parts early

Pattern B — container platform / orchestrator


Services run in managed containers / Kubernetes / Nomad / etc. Good when:

  • you want stronger scaling and orchestration
  • you have DevOps maturity for it

Pattern C — hybrid

  • frontends on managed frontend hosting/CDN
  • APIs on containers/VMs
  • DB/Redis managed

This is often the most practical for a growing product.

Given your current environment history and the service split, a pragmatic production layout is:

  • Frontends on managed static/app hosting or dedicated app servers behind CDN
  • Auth, Core, Admin, and module backends in containers on private app hosts or managed app platform
  • PostgreSQL managed where possible
  • Redis managed where possible
  • reverse proxy or load balancer in front

This gives:

  • cleaner deployments
  • safer DB operations
  • less ops burden on the most critical stateful services

Each of these should be independently deployable:

  • auth-backend
  • platform-core-backend
  • platform-admin-backend
  • basic-backend
  • finance-backend
  • market-backend
  • touring-backend (if separate)
  • venue-backend
  • ai-backend

Each needs:

  • its own config
  • its own runtime health checks
  • its own logs
  • its own release history

Type: PostgreSQL
Name: auth_db

Stores:

  • users
  • sessions
  • company memberships
  • module grants
  • permissions
  • delegation
  • token/versioning-related state
  • audit/security records if stored in DB

Tasks:

  • create managed or dedicated PostgreSQL instance/database
  • enforce TLS for DB connections where supported
  • create app role/user with least privilege
  • apply migrations through CI/CD
  • enable PITR if provider supports it
  • define backup schedule
  • test restore

Hot-path DB. Expect frequent reads for:

  • user/session checks
  • membership checks
  • access aggregation

Therefore:

  • proper indexes are mandatory
  • pooling must be tuned
  • read patterns must be measured

Type: PostgreSQL
Name: platform_core_db

Stores:

  • modules
  • packages
  • add-ons
  • package/add-on mappings
  • company subscriptions
  • company add-ons
  • entitlement versioning
  • entitlement history optionally

Tasks:

  • create separate DB or at minimum separate schema
  • migrations managed independently from Auth
  • backup/restore independently
  • secure private connectivity only

Type: current main business DB (MongoDB Main per current documentation)

Stores:

  • core/basic business data

Tasks:

  • document exact DB cluster/instance
  • confirm connection policy
  • confirm backup policy
  • define read/write credentials for Basic Backend only
  • review indexes for core business routes

Type: PostgreSQL (Finance)

Tasks:

  • verify production DB sizing
  • verify migrations
  • verify backup and restore policy
  • ensure Finance backend has only required role access

Type: MongoDB (Market/Touring, current direction)

Tasks:

  • confirm whether shared cluster or DB namespace
  • document collections and ownership
  • backup schedule
  • restore test
  • security policy for shared Market/Touring service

Type: TBD

Tasks:

  • finalize DB engine
  • define owner service
  • define backup plan
  • define access credentials

Type: TBD

May include:

  • relational DB
  • document store
  • object storage
  • vector storage depending on AI design

Tasks:

  • finalize data types
  • define retention rules
  • define cost controls
  • define model artifact storage if any

These apply to Auth, Platform Core, and Finance where relevant.

For each application DB:

  • create separate DB user per service
  • no superuser for application runtime
  • migrations may use a stronger controlled role
  • app user should only access its own schema/tables

Tasks:

  • configure max connections per service
  • decide whether pooling is done:
    • in-app
    • via PgBouncer
    • or managed DB proxy
  • prevent connection exhaustion during spikes

Tasks:

  • choose migration tool
  • maintain migration repo/folder
  • run in CI/CD before app rollout when appropriate
  • protect prod with migration review
  • support rollback strategy for reversible migrations
  • define non-reversible migration warnings

Tasks:

  • daily full backups at minimum
  • WAL/PITR if supported
  • encrypted storage for backups
  • retention policy
  • restore test at least periodically

Track:

  • CPU
  • memory
  • storage
  • connection count
  • slow queries
  • replication lag if any
  • lock contention
  • migration failures

These apply to Main and Market/Touring if still on Mongo.

Tasks:

  • document version
  • document replica set/sharding status
  • confirm backup method
  • confirm auth enabled
  • confirm TLS enabled if supported/used
  • confirm private network only

Tasks:

  • separate DB users per service
  • least privilege roles
  • no shared root creds in app runtime

Tasks:

  • automated backups
  • retention policy
  • restore drill
  • document RPO/RTO expectations

Tasks:

  • review indexes on hot collections
  • review unbounded collection growth
  • review heavy aggregation/query paths
  • monitor oplog/replica health if applicable

Redis is critical but is not source of truth.

Redis may be used for:

  • access-context cache
  • session hot cache
  • revocation markers
  • login rate limiting
  • refresh throttling
  • password-reset throttling
  • company resolution cache
  • optional internal access snapshot cache
  • any other hot-path cache that preserves correctness when lost

Preferred:

  • managed Redis or highly available Redis service

Alternative:

  • self-managed Redis with persistence and restart strategy

Mandatory rule:

  • if Redis is empty, wrong, or unavailable, the system must still be able to rebuild truth from DBs
  • correctness must not depend on Redis being the sole store of critical state

Suggested key patterns:

```
access:{companyId}:{membershipId}:{accessVersion}:{entitlementVersion}
session:{sessionId}
revoked:session:{sessionId}
ratelimit:login:ip:{ip}
ratelimit:login:email:{email}
company-map:{raw-x-org}
```
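The version components in the access key make invalidation implicit: bumping either version makes old entries unreachable, and they simply age out via TTL. A minimal sketch, with illustrative names:

```typescript
// Sketch: building versioned access-cache keys. Field names are illustrative.
// Because accessVersion and entitlementVersion are embedded in the key, a
// version bump orphans stale entries instead of requiring explicit deletes.

interface AccessKeyParts {
  companyId: string;
  membershipId: string;
  accessVersion: number;
  entitlementVersion: number;
}

function accessCacheKey(p: AccessKeyParts): string {
  return `access:${p.companyId}:${p.membershipId}:${p.accessVersion}:${p.entitlementVersion}`;
}

function sessionCacheKey(sessionId: string): string {
  return `session:${sessionId}`;
}

const key = accessCacheKey({
  companyId: "c1",
  membershipId: "m1",
  accessVersion: 3,
  entitlementVersion: 7,
});
// key is "access:c1:m1:3:7"
```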

Recommended starting points:

  • access cache: 5–15 minutes
  • session hot cache: aligned to short token/session needs
  • revocation marker: at least until all access tokens tied to that session are expired
  • rate limit keys: according to endpoint policy

Define expected behavior if Redis is unavailable:

  • login may temporarily lose rate limiting if no fallback exists
  • session/access lookups should fall back to DB
  • service should degrade, not catastrophically fail, unless a specific route is designed otherwise
  • access context should be refetched from Auth/DB path
  • performance degrades but correctness remains
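The degradation rule above can be sketched as a cache-aside helper where any Redis failure falls through to the DB path. The `Cache` interface, key, and TTL here are illustrative stand-ins, not a specific Redis client:

```typescript
// Sketch: Redis as a read-through accelerator only. If the cache errors,
// the DB loader is the source of truth; performance degrades, correctness
// does not.

interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

async function getAccessContext(
  cache: Cache,
  key: string,
  loadFromDb: () => Promise<object>,
): Promise<object> {
  try {
    const hit = await cache.get(key);
    if (hit !== null) return JSON.parse(hit);
  } catch {
    // Redis unreachable: ignore and fall through to the DB path.
  }
  const fresh = await loadFromDb();
  try {
    await cache.set(key, JSON.stringify(fresh), 600); // ~10 min, per TTL guidance
  } catch {
    // Failing to repopulate the cache is not an error.
  }
  return fresh;
}
```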

Track:

  • memory usage
  • CPU
  • evictions
  • latency
  • keyspace misses/hits
  • connection count
  • persistence failures if persistence enabled

Infra must manage these separately:

  • JWT private signing keys
  • JWT public key metadata/JWKS config
  • DB credentials per service
  • Redis credentials
  • internal API keys / service tokens
  • billing provider keys
  • email provider keys
  • AI provider keys
  • admin bootstrap credentials if any
  • TLS/cert-related secrets where applicable

Rules:

  • never hardcode secrets in repo
  • never share one DB password across all services
  • rotate secrets on schedule or after incident
  • use environment injection or secret manager
  • restrict who can read prod secrets

Tasks:

  • generate production RSA keypair
  • keep private key in secure secret store
  • expose public key via JWKS
  • support key rotation using kid
  • document rotation runbook
  • ensure old public keys remain available until old tokens expire
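A minimal sketch of the keypair/JWKS side using Node's built-in crypto; the `kid` value and endpoint path are illustrative assumptions:

```typescript
// Sketch: generate an RSA keypair and publish the public half as a JWKS
// document carrying a kid. The private key stays in the secret store and is
// only loaded by the signing service — it never appears in the JWKS.
import { generateKeyPairSync } from "node:crypto";

const { publicKey, privateKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
});
// privateKey -> secret manager; publicKey -> JWKS below.

const kid = "2024-06-rotation-1"; // illustrative; often a date stamp or key hash

const jwk = publicKey.export({ format: "jwk" }) as Record<string, unknown>;
const jwks = {
  keys: [{ ...jwk, kid, use: "sig", alg: "RS256" }],
};

// Serve `jwks` at the JWKS endpoint. During rotation, keep the previous
// public key in `keys` until all tokens signed with it have expired.
```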

Tasks:

  • define how Auth ↔ Core and Admin ↔ Core authenticate internally
  • options:
    • internal API key
    • mTLS
    • signed service tokens
    • private allowlist + shared auth
  • choose one and document rollout

13. Environment variables and config management


Each service should have:

  • typed config
  • required vs optional env validation
  • startup failure if critical env missing
  • no hidden config defaults for security-sensitive values

Examples (Auth Backend):

  • AUTH_DB_URL
  • REDIS_URL
  • JWT_ISSUER
  • JWT_AUDIENCE
  • JWT_PRIVATE_KEY
  • JWT_PUBLIC_KID
  • ACCESS_TOKEN_TTL
  • REFRESH_TOKEN_TTL
  • INTERNAL_API_KEY (if used)
  • email provider config
  • rate limit settings
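A minimal sketch of fail-fast env validation, assuming a Node runtime; `loadAuthConfig` and the default TTL are illustrative, though the variable names match the examples above:

```typescript
// Sketch: crash at startup if a critical env var is missing. No hidden
// defaults for security-sensitive values.

function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

interface AuthConfig {
  authDbUrl: string;
  jwtIssuer: string;
  accessTokenTtlSeconds: number;
}

function loadAuthConfig(): AuthConfig {
  return {
    authDbUrl: requireEnv("AUTH_DB_URL"),
    jwtIssuer: requireEnv("JWT_ISSUER"),
    // Optional, non-security-sensitive values may have safe defaults:
    accessTokenTtlSeconds: Number(process.env.ACCESS_TOKEN_TTL ?? "900"),
  };
}
```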

Examples (Platform Core Backend):

  • CORE_DB_URL
  • internal auth mechanism config
  • billing provider config if introduced
  • feature toggles for self-service billing

Each backend needs:

  • its DB URL
  • Auth/JWKS config
  • service name
  • x-org policy config if applicable
  • optional Redis config if used
  • logging/metrics config

Even though this is infra, document that frontend deployments need:

  • auth base URL
  • backend base URLs per service
  • admin base URL
  • environment flags
  • public domain config

For each service:

  • lint
  • type/build step
  • test step
  • security/dependency scan if possible
  • artifact/container build
  • deployment step
  • post-deploy health check

For services with DB migrations:

  • migrations should run in controlled step
  • migration success must be checked before full rollout
  • rollback policy documented

Recommended:

  • rolling deploy or blue/green/canary where supported
  • no big-bang deploy for auth-critical changes
  • stage auth and core carefully before frontend consuming new contracts

Preferred order:

  1. Core DB changes
  2. Core backend changes
  3. Auth DB changes
  4. Auth backend changes
  5. Module backend changes
  6. frontend changes

Exact order may vary by compatibility, but do not deploy frontend assuming contracts that backend has not shipped yet.

Each deployment must define:

  • what can be rolled back immediately
  • what DB migrations make rollback harder
  • what feature flags can disable broken behavior quickly

Each service should expose:

  • /health
  • /ready

Liveness only answers:

  • process is alive
  • event loop/runtime not deadlocked

Readiness should answer:

  • service can handle traffic now
  • critical dependencies reachable

Examples:

Auth Backend readiness:

  • Auth DB reachable
  • Redis reachable if configured as required dependency
  • signing keys loaded
  • optional Core connectivity if a route requires it — usually not a hard dependency for process readiness, but it should be separately monitored

Platform Core readiness:

  • Core DB reachable
  • internal auth config loaded if needed

Module backend readiness:

  • own DB reachable
  • Auth/JWKS config loaded
  • optional Redis reachable if required
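The liveness/readiness split can be sketched as injected dependency checks, so each service declares its own hard dependencies; the check names are illustrative:

```typescript
// Sketch: liveness is trivial (the process answering is the signal);
// readiness runs every registered dependency check and reports failures.

type Check = () => Promise<void>; // throws if the dependency is unhealthy

async function readiness(
  checks: Record<string, Check>,
): Promise<{ ready: boolean; failures: string[] }> {
  const failures: string[] = [];
  for (const [name, check] of Object.entries(checks)) {
    try {
      await check();
    } catch {
      failures.push(name);
    }
  }
  return { ready: failures.length === 0, failures };
}

function liveness(): { alive: boolean } {
  return { alive: true };
}
```

Wire `liveness` to GET /health and `readiness` to GET /ready, returning 503 with the failure list when not ready.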

16. Internal service-to-service authentication


Auth must call Core.
Admin backend must call Auth and Core.
Module backends may call Auth internal context APIs.

These are not browser calls and should not rely on public user JWT flows.

Internal API key — simplest early-stage pattern.

Pros:

  • easy
  • fast to implement

Cons:

  • weaker than mTLS/service identity
  • rotation discipline needed

Signed service tokens.

Pros:

  • stronger separation
  • auditable identity

Cons:

  • more implementation complexity

mTLS.

Pros:

  • strongest model

Cons:

  • most operational complexity

Start with:

  • private networking
  • internal API key or signed internal service token
  • explicit allowlist on internal services

Then evolve if needed.

Tasks:

  • choose internal auth mechanism
  • document headers/validation
  • ensure Core rejects public unauthorized traffic
  • ensure internal routes are not accidentally exposed
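A minimal sketch of the internal API key check, assuming a Node runtime; the header wiring in the comment is an illustrative assumption:

```typescript
// Sketch: validate an internal service key with a constant-time comparison,
// so timing differences don't leak key prefixes.
import { timingSafeEqual } from "node:crypto";

function isInternalRequestAuthorized(
  presentedKey: string | undefined,
  expectedKey: string,
): boolean {
  if (!presentedKey) return false;
  const a = Buffer.from(presentedKey);
  const b = Buffer.from(expectedKey);
  if (a.length !== b.length) return false; // timingSafeEqual requires equal lengths
  return timingSafeEqual(a, b);
}

// e.g. in Platform Core (header name illustrative):
//   if (!isInternalRequestAuthorized(req.headers["x-internal-key"], INTERNAL_API_KEY)) {
//     reject with 401;
//   }
```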

17. Ingress / reverse proxy / gateway tasks


The ingress layer must handle:

  • TLS termination
  • host-based routing
  • request size limits where needed
  • timeout configuration
  • rate limiting where appropriate
  • forwarding headers correctly
  • access logs

Examples:

  • auth.kisum.io/* → Auth
  • admin.kisum.io/* → Admin frontend
  • app.kisum.io/* → main frontend
  • api-v2-finance.kisum.dev/* → Finance backend
  • etc.

Ensure backends can trust and parse:

  • X-Forwarded-For
  • X-Forwarded-Proto
  • host headers if sitting behind reverse proxy/LB
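A minimal sketch of recovering the client IP from X-Forwarded-For; it assumes the header was appended only by your own trusted proxy — if clients can reach the backend directly, the header is spoofable and only the hops your proxies added are trustworthy:

```typescript
// Sketch: X-Forwarded-For is "client, proxy1, proxy2"; the left-most entry
// is the original client when every hop in the chain is trusted.

function clientIpFromForwardedFor(
  header: string | undefined,
  fallback: string,
): string {
  if (!header) return fallback;
  const first = header.split(",")[0]?.trim();
  return first || fallback;
}
```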

Define:

  • connect timeout
  • read timeout
  • idle timeout
  • upstream timeout per service type

AI endpoints may need longer timeout policies than auth.

For every service ask:

  • must this be public?
  • must this route be public?
  • should this be private/internal only?

Auth Backend tasks:

  • enforce strict issuer/audience verification
  • rate limit login
  • rotate refresh tokens
  • store refresh tokens hashed
  • revoke sessions cleanly
  • audit auth-sensitive actions
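A minimal sketch of hashed refresh-token storage; it assumes refresh tokens are high-entropy random values (so plain SHA-256 suffices, unlike passwords), and the lengths and encodings are illustrative:

```typescript
// Sketch: generate a random refresh token, store only its hash, and look
// tokens up by hash on use. A DB leak then exposes no usable refresh tokens.
import { createHash, randomBytes } from "node:crypto";

function hashRefreshToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

function newRefreshToken(): { token: string; storedHash: string } {
  const token = randomBytes(32).toString("base64url"); // sent to the client
  return { token, storedHash: hashRefreshToken(token) }; // hash goes to auth_db
}

// On refresh: hash the presented token and look that hash up in auth_db;
// never store or compare the raw token server-side.
```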

Platform Core tasks:

  • keep internal-only if possible
  • validate internal auth on all internal routes
  • audit entitlement changes
  • protect product catalog changes

Module backend tasks:

  • never trust frontend visibility
  • always validate JWT
  • always resolve access context
  • never assume one module implies another unless explicitly designed

Platform Admin tasks:

  • strong RBAC for platform admins
  • audit every subscription/catalog change
  • protect high-risk actions with confirmation UX/server checks
  • consider step-up auth later for highest-risk actions if needed

Every service should log in structured form with:

  • timestamp
  • service name
  • environment
  • request ID
  • route
  • user/session/company identifiers when safe
  • outcome status
  • error code/message

Auth events to log:

  • login success/failure
  • refresh success/failure
  • logout
  • logout-all
  • session revoked
  • access aggregation failure
  • access denied due to inactive/revoked state

Platform Core events to log:

  • subscription changes
  • add-on changes
  • catalog changes
  • entitlement version bumps

Module backend events to log:

  • auth verification failure
  • missing/invalid x-org
  • access denied by module
  • access denied by permission

Admin events to log:

  • product/catalog create/update/delete
  • organization approval/rejection
  • company subscription changes
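A minimal sketch of one JSON log line per event; the field set mirrors the structured-field list above, and the exact shape is an illustrative assumption:

```typescript
// Sketch: emit one JSON object per line so any log sink can parse events
// without custom grammar.

interface LogEvent {
  service: string;
  env: string;
  requestId: string;
  route: string;
  outcome: "ok" | "denied" | "error";
  event: string;
  userId?: string;    // include identifiers only when safe
  companyId?: string;
  errorCode?: string;
}

function logLine(e: LogEvent): string {
  return JSON.stringify({ timestamp: new Date().toISOString(), ...e });
}

// e.g. console.log(logLine({ service: "auth-backend", env: "prod",
//   requestId: "r-123", route: "/auth/login", outcome: "denied",
//   event: "login-failure" }));
```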

Per service:

  • request count
  • latency
  • error rate
  • 4xx/5xx rate
  • CPU
  • memory
  • restart count

Auth metrics:

  • login success rate
  • login failure rate
  • refresh success rate
  • revoked-session usage
  • access cache hit/miss
  • access merge latency
  • Core entitlement lookup latency

Platform Core metrics:

  • entitlement lookup latency
  • catalog mutation count
  • entitlement version bump count
  • subscription/add-on change rate

Module backend metrics:

  • access denied by module
  • access denied by permission
  • x-org resolution failures
  • internal access-context fetch latency

Create alerts for:

  • service down
  • readiness failing
  • DB unavailable
  • Redis unavailable
  • elevated 5xx
  • elevated login failures
  • unusual auth denial spikes
  • slow entitlement/access aggregation
  • certificate expiry nearing
  • backup failures

Recommended for:

  • frontend request → auth → core
  • frontend request → backend → auth access context
  • admin change → core/auth → cache invalidation

Use request IDs and tracing headers consistently across:

  • auth
  • core
  • admin backend
  • module backends

This is especially useful for debugging:

  • “why did user lose access?”
  • “why did module show but API returned 403?”
  • “why did new subscription not appear immediately?”

Every stateful DB must have:

  • automated backups
  • defined retention
  • restore testing
  • documented RPO/RTO expectations

Redis backup is optional depending on role, but:

  • if persistence enabled, define retention and recovery
  • if pure cache, losing it should only impact performance

At least periodically test:

  • Auth DB restore
  • Core DB restore
  • Finance DB restore
  • Mongo restore for Main / Market/Touring

Likely restore priority:

  1. Auth DB
  2. Core DB
  3. Redis (or allow rebuild)
  4. Basic/Main DB
  5. Finance DB
  6. other module DBs

Why:

  • without Auth and Core, platform access model is unusable

23. Disaster recovery and business continuity


Per critical service define:

  • RPO (acceptable data loss)
  • RTO (acceptable downtime)

Criticality:

  • Auth: highest
  • Core: highest
  • Finance: high
  • Basic/Main: high
  • Market/Touring: medium-high
  • Venue/AI: medium depending on business usage

Tasks:

  • decide if warm standby required for Auth/Core DBs
  • document failover strategy
  • document how DNS/LB routing would switch if region/service fails
  • ensure secrets available in DR environment
  • ensure restore scripts are not tribal knowledge

Before prod launch estimate:

  • daily active users
  • concurrent auth requests
  • access bootstrap frequency
  • module backend QPS
  • admin change frequency
  • report-heavy workloads
  • AI load patterns

Auth will be on the hot path for:

  • login
  • refresh
  • session restore
  • access aggregation
  • backend access checks (depending on chosen pattern)

So:

  • scale Auth horizontally if needed
  • ensure DB and Redis can support it
  • cache wisely without hiding truth in cache

Core traffic volume should be lower than Auth, but:

  • entitlement reads will happen often through Auth
  • writes are lower but highly important

Module backends scale based on business traffic. Finance and AI may have very different resource profiles:

  • Finance: transactional/reporting
  • AI: potentially heavy CPU/network calls

Recommended order:

  1. provision Auth DB + Core DB + Redis
  2. deploy Auth
  3. deploy Core
  4. validate internal Auth ↔ Core
  5. deploy Business backends with new auth validation path
  6. deploy Admin backend
  7. deploy frontends
  8. run staged smoke tests

For existing systems:

  • enable compatibility mode where needed
  • cut over one backend at a time if necessary
  • verify module access after each backend migration

At minimum test:

  • Auth health/readiness
  • login
  • /auth/me
  • /auth/me/access
  • Core entitlement read
  • one backend permission-allowed request
  • one backend permission-denied request
  • admin subscription change invalidates access
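The smoke checks above can be encoded as data plus a small runner; the paths mirror the list, while the expected statuses and the injected fetcher are illustrative assumptions:

```typescript
// Sketch: smoke tests as data. The fetcher is injected so the same list can
// run against staging, prod, or a stub.

interface SmokeCheck {
  name: string;
  path: string;
  expectStatus: number;
}

const smokeChecks: SmokeCheck[] = [
  { name: "auth health", path: "/health", expectStatus: 200 },
  // Assumption: /auth/me without a token should be rejected, not 200.
  { name: "auth me (no token)", path: "/auth/me", expectStatus: 401 },
];

type Fetcher = (path: string) => Promise<{ status: number }>;

async function runSmoke(fetcher: Fetcher): Promise<string[]> {
  const failures: string[] = [];
  for (const c of smokeChecks) {
    const res = await fetcher(c.path).catch(() => ({ status: 0 }));
    if (res.status !== c.expectStatus) failures.push(c.name);
  }
  return failures; // empty array means the smoke run passed
}
```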

26.1 “User cannot access module” runbook


Check:

  1. is Auth healthy?
  2. is Core healthy?
  3. is JWT valid?
  4. is session revoked?
  5. is x-org correct?
  6. does company own module in Core?
  7. does membership have module grant?
  8. does access cache need invalidation?
  9. does backend enforce wrong permission key?

26.2 “Login failures suddenly spike” runbook


Check:

  1. Auth DB health
  2. Redis/rate limit behavior
  3. signing key/config changes
  4. DNS/LB issues
  5. client deployment issues

26.3 “Subscription changed but UI still old” runbook


Check:

  1. Core wrote entitlement change successfully?
  2. entitlement version bumped?
  3. invalidation signal sent?
  4. Auth cache cleared?
  5. frontend refetched /auth/me/access?
  6. backend still using stale access snapshot?

26.4 “Redis is down” runbook

Expected behavior:

  • performance degrades
  • truth still rebuilds from DB

Tasks:

  • verify services fail open/closed correctly per route
  • restore Redis
  • confirm access cache repopulates

26.5 “JWT verification failing in backends” runbook


Check:

  1. JWKS reachable
  2. kid known
  3. issuer/audience config correct
  4. clock skew/time sync issues
  5. key rotation event incomplete

27. Security review checklist before production

  • all public domains use TLS
  • Core is internal-only or properly protected
  • DBs and Redis are private
  • JWT private keys stored securely
  • JWKS exposed correctly
  • refresh tokens hashed
  • login rate limits enabled
  • admin routes protected
  • service-to-service auth documented and enabled
  • backups enabled
  • restore tested
  • alerts configured
  • access logs enabled
  • secrets not stored in repo
  • prod/staging fully separated

28. Go-live checklist per component

Auth Backend:

  • service deployed
  • health/readiness working
  • auth_db provisioned
  • Redis connected
  • JWT keys configured
  • JWKS public
  • logs/metrics enabled
  • alerts enabled
  • backups enabled

Platform Core Backend:

  • service deployed
  • platform_core_db provisioned
  • internal-only exposure configured
  • service auth from Auth/Admin configured
  • logs/metrics enabled
  • backups enabled

Basic Backend:

  • public API routed
  • main DB connected
  • auth validation configured
  • x-org policy enabled
  • logs/metrics enabled

Finance Backend:

  • public API routed
  • finance DB connected
  • auth validation configured
  • logs/metrics enabled

Market/Touring Backend:

  • routing configured
  • DB connected
  • shared/separate service decision documented
  • auth validation configured
  • logs/metrics enabled

Venue Backend:

  • domain/path configured
  • DB finalized
  • auth validation configured
  • logs/metrics enabled

AI Backend:

  • domain configured
  • DB/storage finalized
  • upstream AI provider secrets configured
  • auth validation configured
  • logs/metrics enabled

Redis:

  • private access only
  • persistence decision documented
  • memory/TTL policy set
  • monitoring enabled

DNS/TLS:

  • all records created
  • staging/prod separated
  • certs valid
  • renewal verified
  • expiry alerts enabled

29. Recommended infra deliverables after this document


After this phase, the next concrete infra artifacts should be:

  1. Environment matrix

    • local/dev/staging/prod values and owners
  2. Service inventory sheet

    • each service, domain, runtime, repo, owner, DB, secrets, alerts
  3. Network matrix

    • who can call whom
  4. Secrets inventory

    • all required secrets by environment
  5. DB backup/restore runbook

    • step-by-step recovery
  6. Deployment runbook

    • release, smoke tests, rollback
  7. Incident playbook

    • auth outage
    • core outage
    • redis outage
    • db outage
    • cert expiry
    • stale access issue

Infrastructure must guarantee these truths:

Auth is the identity and access truth.
Core is the commercial entitlement truth.
Redis is only acceleration.
PostgreSQL and service-owned DBs are the real source of truth.
Backends enforce access consistently.
Private services stay private.
Backups and restore are mandatory.
Monitoring and alerting are not optional.

And the most important infra rule of all:

The system must remain correct even when cache is cold, stale, or unavailable.