docs: plan admin auth and api user management

2026-06-06 00:53:08 -06:00
parent fbd4e231ca
commit ea4746f023
@@ -0,0 +1,937 @@
# Hermes Admin Auth, API Users, Usage, and Audit Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Protect the Hermes control plane with admin authentication and provide PostgreSQL-backed per-user API keys, limits, usage attribution, 90-day full-message audit retention, and JSONL downloads across pre-Hermes and post-Hermes APIs.
**Architecture:** Add focused CommonJS modules for configuration, PostgreSQL access, security, API-user persistence, and audit persistence. Keep admin routes in the existing control-plane server, and add a reusable public API gateway process that runs once for pre-Hermes and once for post-Hermes while forwarding to internal-only native Hermes services.
**Tech Stack:** Node.js 20 CommonJS, built-in `http`, `crypto`, and `node:test`; PostgreSQL through `pg`; existing HTML/CSS/vanilla JavaScript UI; Docker Compose; native Hermes `proxy start` and `gateway run`.
---
## File Structure
Create these focused modules:
- `lib/config.cjs`: validates required environment variables and returns typed configuration.
- `lib/db.cjs`: owns the PostgreSQL pool, transactions, and migration runner.
- `lib/security.cjs`: timing-safe admin credential checks, session tokens, API-key generation/hashing, and cookie parsing.
- `lib/admin-store.cjs`: admin-session persistence and validation.
- `lib/api-users-store.cjs`: API-user/key lifecycle, permission checks, limits, and last-used updates.
- `lib/audit-store.cjs`: usage events, full message logs, JSONL streaming queries, and retention cleanup.
- `lib/http.cjs`: bounded request-body reading and OpenAI-style JSON errors shared by both servers.
- `api-gateway.cjs`: public authenticated gateway process configured as either pre or post.
- `migrations/001_admin_api_users.sql`: initial PostgreSQL schema and indexes.
- `login.html`, `login.js`, `login.css`: minimal unauthenticated login surface.
Modify these existing files:
- `server.cjs`: protect the control plane and add admin/API-user endpoints.
- `index.html`, `app.js`, `style.css`: add the approved API-user table and management dialogs.
- `docker-compose.yml`: make native Hermes services internal and run public pre/post gateway services.
- `Dockerfile`: install production dependencies and copy new server modules/assets.
- `.env.example`, `README.md`, `.gitignore`, `.dockerignore`, `package.json`: deployment configuration, scripts, and generated-artifact hygiene.
Add these tests:
- `test/helpers/db-test.cjs`: isolated PostgreSQL schema setup using `TEST_DATABASE_URL`.
- `test/security.test.cjs`: pure security helper tests.
- `test/admin-auth.integration.test.cjs`: login/session/full-site protection.
- `test/api-users.integration.test.cjs`: API-user and key lifecycle.
- `test/api-gateway.integration.test.cjs`: forwarding, permissions, limits, streaming capture, and failures.
- `test/audit.integration.test.cjs`: JSONL downloads and 90-day cleanup.
### Task 1: Add PostgreSQL Dependency, Configuration, and Migration Runner
**Files:**
- Modify: `package.json`
- Create: `lib/config.cjs`
- Create: `lib/db.cjs`
- Create: `migrations/001_admin_api_users.sql`
- Create: `test/helpers/db-test.cjs`
- Create: `test/db.integration.test.cjs`
- [ ] **Step 1: Add the failing migration integration test**
Create `test/db.integration.test.cjs` with this `node:test` case:
```js
const test = require("node:test")
const assert = require("node:assert/strict")
const { withTestDatabase } = require("./helpers/db-test.cjs")
const { runMigrations } = require("../lib/db.cjs")

test("runMigrations creates the admin and API-user schema idempotently", async (t) => {
  await withTestDatabase(t, async ({ pool }) => {
    await runMigrations(pool)
    await runMigrations(pool)
    const result = await pool.query(`
      select table_name from information_schema.tables
      where table_schema = current_schema()
      order by table_name
    `)
    const names = result.rows.map((row) => row.table_name)
    assert(names.includes("admin_sessions"))
    assert(names.includes("api_users"))
    assert(names.includes("api_keys"))
    assert(names.includes("usage_events"))
    assert(names.includes("message_logs"))
  })
})
```
`withTestDatabase` must read `TEST_DATABASE_URL` (reporting an explicit SKIP when it is unset), create a uniquely named schema, set `search_path` to it, and drop the schema in `t.after()`.
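A minimal sketch of such a helper, assuming the `pg` dependency added in Step 3 and a randomly generated schema name. The `test_` prefix and the `{ pool, schema }` callback argument are illustrative choices, not something the plan fixes.

```javascript
const crypto = require("node:crypto")

// Skips the test when TEST_DATABASE_URL is unset; otherwise runs fn against a throwaway schema.
async function withTestDatabase(t, fn) {
  const url = process.env.TEST_DATABASE_URL
  if (!url) {
    t.skip("TEST_DATABASE_URL is not set")
    return
  }
  // Loaded lazily so the skip path works even before `pg` is installed.
  const { Pool } = require("pg")
  // Assumption: a hex suffix is unique enough for parallel test runs.
  const schema = `test_${crypto.randomBytes(6).toString("hex")}`
  const pool = new Pool({ connectionString: url })
  // Every new pooled connection gets the test schema as its search_path.
  pool.on("connect", (client) => {
    client.query(`set search_path to ${schema}`).catch(() => {})
  })
  // Registered before the schema exists so cleanup also runs if setup fails midway.
  t.after(async () => {
    await pool.query(`drop schema if exists ${schema} cascade`)
    await pool.end()
  })
  await pool.query(`create schema ${schema}`)
  await fn({ pool, schema })
}
```

Setting `search_path` in the pool's `connect` hook, rather than with a single `SET` query, keeps the schema in effect even when the pool opens more than one connection.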
- [ ] **Step 2: Run the test to verify it fails**
Run:
```bash
node --test test/db.integration.test.cjs
```
Expected: FAIL because `lib/db.cjs` and the migration do not exist.
- [ ] **Step 3: Add dependency and minimal database implementation**
Update `package.json`:
```json
"scripts": {
  "start": "node server.cjs",
  "start:gateway": "node api-gateway.cjs",
  "check": "node -c server.cjs && node -c api-gateway.cjs && docker compose --env-file .env.example config",
  "test": "node --test test/*.test.cjs"
},
"dependencies": {
  "pg": "^8.16.3"
}
```
Implement `lib/config.cjs` with:
```js
function required(name, env = process.env) {
  const value = String(env[name] || "").trim()
  if (!value) throw new Error(`${name} is required`)
  return value
}

function loadDatabaseConfig(env = process.env) {
  return { databaseUrl: required("DATABASE_URL", env) }
}

module.exports = { required, loadDatabaseConfig }
```
Implement `lib/db.cjs` using `pg.Pool`, `withTransaction(pool, fn)`, and `runMigrations(pool)` that applies ordered `.sql` files once under a PostgreSQL advisory lock and records them in `schema_migrations`.
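The runner described above could look roughly like the sketch below. It assumes `pool` behaves like a `pg.Pool`, that migration files are applied in file-name order, and that `MIGRATION_LOCK_KEY` is an arbitrary application-chosen constant. The `dir` parameter is a hypothetical addition that makes the function easier to test.

```javascript
const fs = require("node:fs")
const path = require("node:path")

// Assumption: any fixed integer works as the advisory-lock key; this value is illustrative.
const MIGRATION_LOCK_KEY = 727001

// Runs fn inside BEGIN/COMMIT on one client and rolls back on error.
async function withTransaction(pool, fn) {
  const client = await pool.connect()
  try {
    await client.query("begin")
    const result = await fn(client)
    await client.query("commit")
    return result
  } catch (err) {
    await client.query("rollback")
    throw err
  } finally {
    client.release()
  }
}

// Applies each not-yet-recorded .sql file in name order, one transaction per file.
async function runMigrations(pool, dir = path.join(__dirname, "..", "migrations")) {
  const client = await pool.connect()
  try {
    // Session-level lock: concurrent starters wait here instead of racing.
    await client.query("select pg_advisory_lock($1)", [MIGRATION_LOCK_KEY])
    await client.query(`create table if not exists schema_migrations (
      name text primary key,
      applied_at timestamptz not null default now()
    )`)
    const done = await client.query("select name from schema_migrations")
    const applied = new Set(done.rows.map((row) => row.name))
    const files = fs.readdirSync(dir).filter((file) => file.endsWith(".sql")).sort()
    for (const name of files) {
      if (applied.has(name)) continue
      await client.query("begin")
      try {
        await client.query(fs.readFileSync(path.join(dir, name), "utf8"))
        await client.query("insert into schema_migrations (name) values ($1)", [name])
        await client.query("commit")
      } catch (err) {
        await client.query("rollback")
        throw err
      }
    }
  } finally {
    await client.query("select pg_advisory_unlock($1)", [MIGRATION_LOCK_KEY]).catch(() => {})
    client.release()
  }
}
```

The lock is taken and released on the same client because PostgreSQL session-level advisory locks belong to the connection that acquired them.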
Create `migrations/001_admin_api_users.sql` with UUID/text IDs generated by the application and these tables:
```sql
create table admin_sessions (
  token_hash text primary key,
  created_at timestamptz not null default now(),
  expires_at timestamptz not null,
  last_seen_at timestamptz not null default now(),
  revoked_at timestamptz
);

create table api_users (
  id text primary key,
  display_name text not null,
  status text not null check (status in ('active', 'revoked', 'deleted')),
  allow_pre boolean not null default false,
  allow_post boolean not null default false,
  requests_per_minute integer not null check (requests_per_minute > 0),
  monthly_token_limit bigint not null check (monthly_token_limit > 0),
  expires_at timestamptz,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now(),
  last_used_at timestamptz,
  revoked_at timestamptz,
  deleted_at timestamptz
);

create table api_keys (
  id text primary key,
  api_user_id text not null references api_users(id),
  key_hash text not null unique,
  key_suffix text not null,
  created_at timestamptz not null default now(),
  revoked_at timestamptz
);

create unique index api_keys_one_active_per_user
  on api_keys(api_user_id) where revoked_at is null;

create table usage_events (
  id text primary key,
  api_user_id text not null,
  api_user_name text not null,
  api_key_id text not null,
  route text not null check (route in ('pre', 'post')),
  request_started_at timestamptz not null,
  request_completed_at timestamptz,
  http_status integer,
  model text,
  prompt_tokens bigint not null default 0,
  completion_tokens bigint not null default 0,
  total_tokens bigint not null default 0,
  latency_ms bigint,
  error_code text,
  audit_complete boolean not null default false
);

create index usage_events_user_started_idx on usage_events(api_user_id, request_started_at);

create table message_logs (
  usage_event_id text primary key references usage_events(id),
  request_json jsonb,
  request_text text,
  response_json jsonb,
  response_text text,
  response_content_type text,
  streaming boolean not null default false,
  partial boolean not null default false,
  created_at timestamptz not null default now(),
  delete_after timestamptz not null
);

create index message_logs_delete_after_idx on message_logs(delete_after);
```
- [ ] **Step 4: Install dependencies and run the test**
Run:
```bash
npm install
node --test test/db.integration.test.cjs
```
Expected: PASS when `TEST_DATABASE_URL` is set; otherwise the helper reports one explicit SKIP.
- [ ] **Step 5: Commit**
```bash
git add package.json package-lock.json lib/config.cjs lib/db.cjs migrations/001_admin_api_users.sql test/helpers/db-test.cjs test/db.integration.test.cjs
git commit -m "feat: add postgres schema and migration runner"
```
### Task 2: Add Security Primitives and Admin Session Store
**Files:**
- Create: `lib/security.cjs`
- Create: `lib/admin-store.cjs`
- Create: `test/security.test.cjs`
- Create: `test/admin-store.integration.test.cjs`
- [ ] **Step 1: Write failing security tests**
Create pure tests that assert:
```js
const assert = require("node:assert/strict")
const { adminCredentialsMatch, createSessionToken, createApiKey, hashSecret, parseCookies } = require("../lib/security.cjs")
assert.equal(adminCredentialsMatch("admin", "secret-value-123", {
  username: "admin", password: "secret-value-123"
}), true)
assert.equal(adminCredentialsMatch("admin", "wrong-value-123", {
  username: "admin", password: "secret-value-123"
}), false)
assert.match(createApiKey().plaintext, /^hms_[A-Za-z0-9_-]{40,}$/)
assert.equal(createSessionToken().hash.length, 64)
assert.deepEqual(parseCookies("a=1; hermes_admin=abc"), { a: "1", hermes_admin: "abc" })
```
Create integration tests proving a session can be created, validated, touched, expired, and revoked.
- [ ] **Step 2: Run tests to verify they fail**
```bash
node --test test/security.test.cjs test/admin-store.integration.test.cjs
```
Expected: FAIL because the modules do not exist.
- [ ] **Step 3: Implement security and session persistence**
Implement `lib/security.cjs` using `crypto.randomBytes`, `crypto.createHash("sha256")`, and `crypto.timingSafeEqual`.
Export:
```js
module.exports = {
  adminCredentialsMatch,
  createApiKey,
  createSessionToken,
  hashSecret,
  parseCookies,
  serializeAdminCookie,
  clearAdminCookie
}
```
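One possible shape for these helpers is sketched below. It satisfies the Step 1 assertions, but the token sizes, the `suffix` field, and the cookie attributes are assumptions. The plan itself only names the `hermes_admin` cookie (in a test) and the three `crypto` primitives.

```javascript
const crypto = require("node:crypto")

// SHA-256 hex digest; only this value is stored, never the plaintext secret.
function hashSecret(secret) {
  return crypto.createHash("sha256").update(String(secret)).digest("hex")
}

// Compares fixed-length digests so timingSafeEqual never sees buffers of unequal length.
function safeEqual(a, b) {
  const left = crypto.createHash("sha256").update(String(a)).digest()
  const right = crypto.createHash("sha256").update(String(b)).digest()
  return crypto.timingSafeEqual(left, right)
}

function adminCredentialsMatch(username, password, expected) {
  // Both comparisons always run, so a wrong username does not short-circuit.
  const userOk = safeEqual(username, expected.username)
  const passOk = safeEqual(password, expected.password)
  return userOk && passOk
}

function createSessionToken() {
  const plaintext = crypto.randomBytes(32).toString("base64url")
  return { plaintext, hash: hashSecret(plaintext) }
}

function createApiKey() {
  // 32 random bytes encode to 43 base64url characters after the hms_ prefix.
  const plaintext = `hms_${crypto.randomBytes(32).toString("base64url")}`
  // Assumption: the stored key_suffix is the last four characters of the key.
  return { plaintext, hash: hashSecret(plaintext), suffix: plaintext.slice(-4) }
}

function parseCookies(header = "") {
  const cookies = {}
  for (const part of String(header).split(";")) {
    const index = part.indexOf("=")
    if (index < 0) continue
    cookies[part.slice(0, index).trim()] = decodeURIComponent(part.slice(index + 1).trim())
  }
  return cookies
}

// Assumption: attribute choices are illustrative; add Secure when serving over HTTPS.
function serializeAdminCookie(token, maxAgeSeconds) {
  return `hermes_admin=${encodeURIComponent(token)}; Path=/; HttpOnly; SameSite=Lax; Max-Age=${maxAgeSeconds}`
}

function clearAdminCookie() {
  return "hermes_admin=; Path=/; HttpOnly; SameSite=Lax; Max-Age=0"
}
```

Hashing both sides before `crypto.timingSafeEqual` sidesteps its requirement that the two buffers have equal length.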
Implement `lib/admin-store.cjs` with:
```js
async function createAdminSession(pool, tokenHash, expiresAt) {}
async function validateAdminSession(pool, tokenHash, now = new Date()) {}
async function revokeAdminSession(pool, tokenHash) {}
async function deleteExpiredAdminSessions(pool, now = new Date()) {}
```
Validation must reject expired/revoked sessions and update `last_seen_at` no more than once per five minutes.
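A sketch of the validation rule above, assuming `pool.query` follows the `pg` interface and that the function returns `null` for any rejected session. The return shape is an assumption; the plan does not specify it.

```javascript
const FIVE_MINUTES_MS = 5 * 60 * 1000

// Returns the session row when it is valid, otherwise null.
async function validateAdminSession(pool, tokenHash, now = new Date()) {
  const { rows } = await pool.query(
    `select token_hash, expires_at, last_seen_at, revoked_at
       from admin_sessions
      where token_hash = $1`,
    [tokenHash]
  )
  const session = rows[0]
  if (!session) return null
  if (session.revoked_at) return null
  if (new Date(session.expires_at) <= now) return null

  // Throttle writes: touch last_seen_at only when the stored value is at least five minutes old.
  if (now - new Date(session.last_seen_at) >= FIVE_MINUTES_MS) {
    await pool.query(
      "update admin_sessions set last_seen_at = $2 where token_hash = $1",
      [tokenHash, now]
    )
    session.last_seen_at = now
  }
  return session
}
```

Throttling the `last_seen_at` update keeps a busy admin tab from issuing one write per request.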
- [ ] **Step 4: Run tests**
```bash
node --test test/security.test.cjs test/admin-store.integration.test.cjs
```
Expected: PASS.
- [ ] **Step 5: Commit**
```bash
git add lib/security.cjs lib/admin-store.cjs test/security.test.cjs test/admin-store.integration.test.cjs
git commit -m "feat: add admin and api key security primitives"
```
### Task 3: Protect the Entire Control Plane With Admin Login
**Files:**
- Create: `lib/http.cjs`
- Create: `login.html`
- Create: `login.js`
- Create: `login.css`
- Modify: `server.cjs`
- Modify: `Dockerfile`
- Test: `test/admin-auth.integration.test.cjs`
- [ ] **Step 1: Write the failing full-site authentication test**
Start `server.cjs` against the test database and assert:
```js
assert.equal((await request("/")).status, 302)
assert.equal((await request("/app.js")).status, 302)
assert.equal((await request("/api/status")).status, 401)
assert.equal((await request("/login")).status, 200)
assert.equal((await request("/login.js")).status, 200)
assert.equal((await request("/health")).status, 200)
```
Post correct credentials to `/api/admin/login`, capture `Set-Cookie`, then assert `/`, `/app.js`, and `/api/status` succeed. Post `/api/admin/logout` and assert the cookie no longer authorizes access.
- [ ] **Step 2: Run the test to verify it fails**
```bash
node --test test/admin-auth.integration.test.cjs
```
Expected: FAIL because the control plane is currently public.
- [ ] **Step 3: Implement login and request guard**
Add `lib/http.cjs`:
```js
async function readJsonBody(req, maxBytes = 1_000_000) {}
function sendJson(res, status, body, headers = {}) {}
function openAiError(res, status, message, code) {}
module.exports = { readJsonBody, sendJson, openAiError }
```
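The three stubs could be filled in along these lines. This is a sketch: attaching `err.status` to rejections and reusing `code` as the error `type` are assumptions, the latter taken from the gateway error shape used elsewhere in this plan.

```javascript
// Reads a JSON body; rejects with err.status = 413 once maxBytes is exceeded.
function readJsonBody(req, maxBytes = 1_000_000) {
  return new Promise((resolve, reject) => {
    const chunks = []
    let size = 0
    let settled = false
    const fail = (status, message) => {
      if (settled) return
      settled = true
      const err = new Error(message)
      err.status = status
      reject(err)
    }
    req.on("data", (chunk) => {
      if (settled) return
      size += chunk.length
      if (size > maxBytes) return fail(413, "request body too large")
      chunks.push(chunk)
    })
    req.on("end", () => {
      if (settled) return
      const text = Buffer.concat(chunks).toString("utf8")
      try {
        // An empty body is treated as an empty object.
        const value = text ? JSON.parse(text) : {}
        settled = true
        resolve(value)
      } catch {
        fail(400, "invalid JSON body")
      }
    })
    req.on("error", (err) => fail(400, err.message))
  })
}

function sendJson(res, status, body, headers = {}) {
  const payload = JSON.stringify(body)
  res.writeHead(status, {
    "Content-Type": "application/json; charset=utf-8",
    "Content-Length": Buffer.byteLength(payload),
    ...headers
  })
  res.end(payload)
}

// OpenAI-style error envelope; type mirrors code here (assumption).
function openAiError(res, status, message, code) {
  sendJson(res, status, { error: { message, type: code, code } })
}
```

Counting bytes as chunks arrive, rather than after buffering the whole body, is what keeps an oversized upload from consuming memory.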
In `server.cjs`:
- Load and validate `DATABASE_URL`, `HERMES_ADMIN_USERNAME`, and `HERMES_ADMIN_PASSWORD`.
- Reject startup when the admin username is blank or the admin password is shorter than 16 characters.
- Run migrations before listening.
- Add `GET /health`, `GET /login`, `GET /login.js`, `GET /login.css`, `POST /api/admin/login`, and `POST /api/admin/logout`.
- Add `requireAdmin(req, res)` before all existing routes/static serving.
- Redirect unauthenticated browser GETs to `/login`; return JSON `401` for unauthenticated `/api/*`.
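The guard behaviour in the list above can be expressed as a small decision function. `guardDecision` and `applyGuard` are hypothetical names for this sketch, and the `401` payload text is an assumption. Session lookup is left to the caller, which passes the result in as `hasValidSession`.

```javascript
// Paths the plan leaves reachable without a session.
const PUBLIC_PATHS = new Set(["/health", "/login", "/login.js", "/login.css", "/api/admin/login"])

// Pure helper: decides what the guard should do for one request.
function guardDecision(method, pathname, hasValidSession) {
  if (PUBLIC_PATHS.has(pathname) || hasValidSession) return { action: "allow" }
  if (pathname.startsWith("/api/")) return { action: "json401" }
  if (method === "GET" || method === "HEAD") return { action: "redirect", location: "/login" }
  return { action: "json401" }
}

// Applies the decision to a Node http response; returns true when the request may proceed.
function applyGuard(req, res, hasValidSession) {
  // The base URL only exists so relative request URLs can be parsed.
  const { pathname } = new URL(req.url, "http://localhost")
  const decision = guardDecision(req.method, pathname, hasValidSession)
  if (decision.action === "allow") return true
  if (decision.action === "redirect") {
    res.writeHead(302, { Location: decision.location })
    res.end()
    return false
  }
  const payload = JSON.stringify({
    error: { message: "Admin login required", type: "unauthorized", code: "unauthorized" }
  })
  res.writeHead(401, { "Content-Type": "application/json; charset=utf-8" })
  res.end(payload)
  return false
}
```

Keeping the decision separate from the response writing lets the routing rules be unit-tested without starting a server.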
The login endpoint must:
```js
const { plaintext, hash } = createSessionToken()
await createAdminSession(pool, hash, expiresAt)
sendJson(res, 200, { ok: true }, {
  "Set-Cookie": serializeAdminCookie(plaintext, sessionTtlSeconds)
})
```
Create a minimal styled login form that posts JSON credentials and redirects to `/` after success.
Update `Dockerfile` to copy login assets, `lib/`, and `migrations/`.
- [ ] **Step 4: Run auth and existing tests**
```bash
node --test test/admin-auth.integration.test.cjs test/status-identities.test.cjs
```
Expected: PASS, once the existing identity test (`test/status-identities.test.cjs`) has been updated to log in before requesting `/api/status`.
- [ ] **Step 5: Commit**
```bash
git add lib/http.cjs login.html login.js login.css server.cjs Dockerfile test/admin-auth.integration.test.cjs test/status-identities.test.cjs
git commit -m "feat: protect control plane with admin login"
```
### Task 4: Implement API-User and API-Key Lifecycle
**Files:**
- Create: `lib/api-users-store.cjs`
- Modify: `server.cjs`
- Test: `test/api-users.integration.test.cjs`
- [ ] **Step 1: Write failing lifecycle tests**
Test these store and HTTP behaviors:
```js
const created = await createApiUser(pool, {
  displayName: "Marketing Automation",
  allowPre: true,
  allowPost: false,
  requestsPerMinute: 30,
  monthlyTokenLimit: 250000,
  expiresAt: "2026-08-31T23:59:59Z"
})
assert.match(created.plaintextKey, /^hms_/)
assert.equal(created.user.keySuffix.length, 4)
assert.equal((await listApiUsers(pool))[0].plaintextKey, undefined)
```
Also assert edit, rotate, revoke, reactivate, soft-delete, expiry, and one-active-key-per-user behavior.
- [ ] **Step 2: Run the test to verify it fails**
```bash
node --test test/api-users.integration.test.cjs
```
Expected: FAIL because API-user persistence/routes do not exist.
- [ ] **Step 3: Implement store and admin endpoints**
Implement `lib/api-users-store.cjs` with:
```js
createApiUser(pool, input)
listApiUsers(pool)
updateApiUser(pool, id, patch)
rotateApiUserKey(pool, id)
revokeApiUser(pool, id)
reactivateApiUser(pool, id)
deleteApiUser(pool, id)
authenticateApiKey(pool, plaintextKey, route, now)
```
Use transactions for create/rotate/revoke/delete. `authenticateApiKey` must return typed rejection reasons: `invalid`, `revoked`, `expired`, or `forbidden`.
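The typed rejection reasons can be derived by a pure classifier once the key row has been looked up by hash. The sketch below assumes a joined row shape (`user_status`, `key_revoked_at`, and so on) that the plan does not define. It also treats a rotated-out key as `invalid`, an assumption chosen to match the smoke test's `401` expectation for old keys. The status mapping follows the gateway tests in Task 6.

```javascript
// HTTP statuses the gateway tests expect for each rejection reason.
const REJECTION_STATUS = { invalid: 401, revoked: 410, expired: 410, forbidden: 403 }

// Hypothetical classifier over a row joined from api_keys and api_users (null when no key matched).
function classifyApiKey(row, route, now = new Date()) {
  if (!row || row.user_status === "deleted") return { ok: false, reason: "invalid" }
  if (row.user_status === "revoked") return { ok: false, reason: "revoked" }
  // Assumption: a key revoked by rotation (user still active) is indistinguishable from an unknown key.
  if (row.key_revoked_at) return { ok: false, reason: "invalid" }
  if (row.expires_at && new Date(row.expires_at) <= now) return { ok: false, reason: "expired" }
  const allowed = route === "pre" ? row.allow_pre : route === "post" ? row.allow_post : false
  if (!allowed) return { ok: false, reason: "forbidden" }
  return { ok: true, user: { id: row.api_user_id, displayName: row.display_name } }
}
```

Checks run from most to least severe, so a revoked user with an expired key reports `revoked` rather than `expired`.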
Add protected routes:
```text
GET /api/admin/api-users
POST /api/admin/api-users
PATCH /api/admin/api-users/:id
POST /api/admin/api-users/:id/rotate
POST /api/admin/api-users/:id/revoke
POST /api/admin/api-users/:id/reactivate
DELETE /api/admin/api-users/:id
```
Reject invalid limits, blank names, no permissions, and expiration timestamps in the past.
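These rejection rules might be collected in one validator, sketched here. `validateApiUserInput` is a hypothetical name, the field names follow the lifecycle test input from Step 1, and the error messages are illustrative.

```javascript
// Returns { ok: true, value } with normalized fields, or { ok: false, errors }.
function validateApiUserInput(input, now = new Date()) {
  const errors = []
  const displayName = String(input.displayName ?? "").trim()
  if (!displayName) errors.push("displayName must not be blank")
  if (!input.allowPre && !input.allowPost) {
    errors.push("at least one of allowPre or allowPost is required")
  }
  for (const field of ["requestsPerMinute", "monthlyTokenLimit"]) {
    if (!Number.isSafeInteger(input[field]) || input[field] <= 0) {
      errors.push(`${field} must be a positive integer`)
    }
  }
  // expiresAt is optional; null or "" means the user never expires.
  let expiresAt = null
  if (input.expiresAt != null && input.expiresAt !== "") {
    expiresAt = new Date(input.expiresAt)
    if (Number.isNaN(expiresAt.getTime())) errors.push("expiresAt must be a valid timestamp")
    else if (expiresAt <= now) errors.push("expiresAt must be in the future")
  }
  if (errors.length > 0) return { ok: false, errors }
  return {
    ok: true,
    value: {
      displayName,
      allowPre: Boolean(input.allowPre),
      allowPost: Boolean(input.allowPost),
      requestsPerMinute: input.requestsPerMinute,
      monthlyTokenLimit: input.monthlyTokenLimit,
      expiresAt
    }
  }
}
```

Collecting every error before returning lets the create/edit dialog show all problems at once instead of one per submit.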
- [ ] **Step 4: Run lifecycle and auth tests**
```bash
node --test test/api-users.integration.test.cjs test/admin-auth.integration.test.cjs
```
Expected: PASS.
- [ ] **Step 5: Commit**
```bash
git add lib/api-users-store.cjs server.cjs test/api-users.integration.test.cjs
git commit -m "feat: add managed api users and keys"
```
### Task 5: Implement Usage, Limits, Audit Storage, and Cleanup
**Files:**
- Create: `lib/audit-store.cjs`
- Test: `test/audit.integration.test.cjs`
- [ ] **Step 1: Write failing audit and limit tests**
Test:
- Request 31 inside one minute is denied for a `30 req/min` user.
- A user at its monthly token limit is denied.
- `beginUsageEvent` creates an incomplete event.
- `completeUsageEvent` stores token totals and full request/response.
- `cleanupExpiredMessageLogs` deletes message bodies older than 90 days but keeps `usage_events`.
- `streamJsonlLogs` returns one valid JSON object per line and filters by user/date.
Example JSONL assertion:
```js
const lines = output.trim().split("\n").map(JSON.parse)
assert.equal(lines[0].api_user_name, "Marketing Automation")
assert.deepEqual(lines[0].request, { model: "test", messages: [{ role: "user", content: "hello" }] })
assert.equal(lines[0].response.choices[0].message.content, "world")
```
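Lines that satisfy this assertion could be produced as sketched below. `toJsonlLine` and `writeJsonl` are hypothetical helpers, and the exact set of exported fields is an assumption. Only `api_user_name`, `request`, and `response` are fixed by the example above.

```javascript
const { once } = require("node:events")

// Maps one joined usage_events/message_logs row to a single JSONL line (without the newline).
function toJsonlLine(row) {
  return JSON.stringify({
    id: row.id,
    api_user_id: row.api_user_id,
    api_user_name: row.api_user_name,
    route: row.route,
    request_started_at: row.request_started_at,
    http_status: row.http_status,
    total_tokens: row.total_tokens,
    // Prefer parsed JSON; fall back to raw text for non-JSON or streamed bodies.
    request: row.request_json ?? row.request_text ?? null,
    response: row.response_json ?? row.response_text ?? null,
    partial: Boolean(row.partial)
  })
}

// Writes rows from any sync or async iterable, pausing on backpressure.
async function writeJsonl(rows, writable) {
  let count = 0
  for await (const row of rows) {
    if (!writable.write(toJsonlLine(row) + "\n")) await once(writable, "drain")
    count += 1
  }
  return count
}
```

`JSON.stringify` escapes newlines inside strings, so each record stays on one physical line even when message content spans several.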
- [ ] **Step 2: Run the test to verify it fails**
```bash
node --test test/audit.integration.test.cjs
```
Expected: FAIL because `lib/audit-store.cjs` does not exist.
- [ ] **Step 3: Implement transactional enforcement and audit functions**
Implement:
```js
async function authorizeUsage(pool, apiUser, now = new Date()) {}
async function beginUsageEvent(pool, input) {}
async function completeUsageEvent(pool, id, input) {}
async function failUsageEvent(pool, id, input) {}
async function streamJsonlLogs(pool, filters, writable) {}
async function cleanupExpiredMessageLogs(pool, now = new Date()) {}
```
`authorizeUsage` must lock the API-user row and query:
```sql
select count(*) from usage_events
where api_user_id = $1 and request_started_at >= $2
```
and:
```sql
select coalesce(sum(total_tokens), 0) from usage_events
where api_user_id = $1 and request_started_at >= date_trunc('month', $2::timestamptz)
```
`completeUsageEvent` stores `delete_after = created_at + interval '90 days'`, the retention window that `HERMES_LOG_RETENTION_DAYS` sets to 90.
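A sketch of `authorizeUsage` built around the row lock and the two queries above. The result shape (`{ ok, reason }`) and the reason strings are assumptions, and `checkLimits` is a hypothetical helper. `pool` is expected to behave like a `pg.Pool`.

```javascript
// Runs both limit checks on a client that is inside a transaction.
async function checkLimits(client, apiUserId, now) {
  // FOR UPDATE serializes concurrent checks for the same user.
  const locked = await client.query(
    "select requests_per_minute, monthly_token_limit from api_users where id = $1 for update",
    [apiUserId]
  )
  if (locked.rows.length === 0) return { ok: false, reason: "invalid" }
  const limits = locked.rows[0]

  const minuteAgo = new Date(now.getTime() - 60_000)
  const recent = await client.query(
    "select count(*) from usage_events where api_user_id = $1 and request_started_at >= $2",
    [apiUserId, minuteAgo]
  )
  // pg returns count(*) and bigint values as strings, hence Number().
  if (Number(recent.rows[0].count) >= Number(limits.requests_per_minute)) {
    return { ok: false, reason: "rate_limited" }
  }

  const monthly = await client.query(
    `select coalesce(sum(total_tokens), 0) as total from usage_events
      where api_user_id = $1 and request_started_at >= date_trunc('month', $2::timestamptz)`,
    [apiUserId, now]
  )
  if (Number(monthly.rows[0].total) >= Number(limits.monthly_token_limit)) {
    return { ok: false, reason: "monthly_limit" }
  }
  return { ok: true }
}

async function authorizeUsage(pool, apiUser, now = new Date()) {
  const client = await pool.connect()
  try {
    await client.query("begin")
    const decision = await checkLimits(client, apiUser.id, now)
    await client.query("commit")
    return decision
  } catch (err) {
    await client.query("rollback")
    throw err
  } finally {
    client.release()
  }
}
```

The row lock is released at commit, so it only closes the race between concurrent requests if the `usage_events` insert happens inside the same transaction. That wiring is left out of this sketch.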
- [ ] **Step 4: Run the audit tests**
```bash
node --test test/audit.integration.test.cjs
```
Expected: PASS.
- [ ] **Step 5: Commit**
```bash
git add lib/audit-store.cjs test/audit.integration.test.cjs
git commit -m "feat: add api usage limits and audit storage"
```
### Task 6: Build the Public Pre/Post API Gateway
**Files:**
- Create: `api-gateway.cjs`
- Test: `test/api-gateway.integration.test.cjs`
- [ ] **Step 1: Write failing gateway integration tests**
Start a fake upstream and `api-gateway.cjs` configured as `pre`. Test:
- Missing/invalid key returns OpenAI-style `401`.
- Post-only key on pre gateway returns `403`.
- Revoked/expired key returns `410`.
- Rate/monthly-limit rejection returns `429`.
- Non-streaming response forwards status/body and stores full audit content.
- SSE response forwards chunks unchanged and stores the assembled response text.
- Upstream failure returns `502`.
- Client disconnect marks the audit record partial.
Use this expected error shape:
```js
assert.deepEqual(body, {
  error: {
    message: "API key does not permit pre-Hermes access",
    type: "permission_denied",
    code: "permission_denied"
  }
})
```
- [ ] **Step 2: Run the test to verify it fails**
```bash
node --test test/api-gateway.integration.test.cjs
```
Expected: FAIL because `api-gateway.cjs` does not exist.
- [ ] **Step 3: Implement the reusable gateway process**
`api-gateway.cjs` reads:
```text
DATABASE_URL
HERMES_API_ROUTE_KIND=pre|post
HERMES_API_GATEWAY_HOST
HERMES_API_GATEWAY_PORT
HERMES_UPSTREAM_URL
HERMES_LOG_RETENTION_DAYS=90
HERMES_AUDIT_MAX_BYTES
```
For every `/v1/*` request:
```js
const identity = await authenticateApiKey(pool, bearer, routeKind)
await authorizeUsage(pool, identity.user)
const eventId = await beginUsageEvent(pool, requestMetadata)
await forwardAndCapture(req, res, upstreamUrl, eventId)
```
Forward all request headers except hop-by-hop headers, and replace `Authorization` with the internal upstream key only when one is configured. Enforce `HERMES_AUDIT_MAX_BYTES` on both request and response bodies so that an accepted audit record is never silently truncated:
- Reject an oversized request with `413` before forwarding it.
- If an upstream response crosses the limit, stop reading from upstream. When response headers have not been sent yet, answer the client with an OpenAI-compatible audit-size error; when they have, close the client stream.
- Store every byte that was accepted from the request and delivered to the client, and mark the usage event with `audit_size_exceeded` when the limit was hit.
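The forwarding and audit-size rules above could rest on two small helpers, sketched here under assumptions. `buildUpstreamHeaders` and `createCapture` are hypothetical names, dropping `Host` (so the HTTP client sets it for the upstream) is an added assumption, and the header list is the classic HTTP/1.1 hop-by-hop set.

```javascript
// Hop-by-hop headers a proxy should not forward.
const HOP_BY_HOP = new Set([
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade"
])

function buildUpstreamHeaders(incoming, upstreamKey) {
  // Any header named in Connection is hop-by-hop as well.
  const named = String(incoming.connection || "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean)
  const headers = {}
  for (const [name, value] of Object.entries(incoming)) {
    const lower = name.toLowerCase()
    if (HOP_BY_HOP.has(lower) || named.includes(lower) || lower === "host") continue
    headers[lower] = value
  }
  // Replace the client's key only when an internal upstream key is configured.
  if (upstreamKey) headers.authorization = `Bearer ${upstreamKey}`
  return headers
}

// Accumulates audited bytes and reports when the limit is crossed instead of truncating silently.
function createCapture(maxBytes) {
  const chunks = []
  let size = 0
  let exceeded = false
  return {
    // Returns false once accepting the chunk would cross maxBytes; the caller then stops forwarding.
    push(chunk) {
      if (exceeded || size + chunk.length > maxBytes) {
        exceeded = true
        return false
      }
      chunks.push(chunk)
      size += chunk.length
      return true
    },
    get exceeded() {
      return exceeded
    },
    text() {
      return Buffer.concat(chunks).toString("utf8")
    }
  }
}
```

Because `push` refuses a chunk before it is delivered, the stored text always equals what the client actually received, which is the property the last bullet asks for.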
Expose unauthenticated `/health` that verifies process health and database reachability without leaking configuration.
- [ ] **Step 4: Run gateway tests**
```bash
node --test test/api-gateway.integration.test.cjs
```
Expected: PASS.
- [ ] **Step 5: Commit**
```bash
git add api-gateway.cjs test/api-gateway.integration.test.cjs
git commit -m "feat: add authenticated pre and post api gateway"
```
### Task 7: Add JSONL Download and Retention Scheduling to the Control Plane
**Files:**
- Modify: `server.cjs`
- Modify: `lib/config.cjs`
- Test: `test/admin-logs.integration.test.cjs`
- [ ] **Step 1: Write failing protected-download tests**
Test:
- Unauthenticated download returns `401`.
- Admin can download all logs.
- Admin can filter with `api_user_id`, `start`, and `end`.
- Invalid date ranges return `400`.
- Response headers are:
```text
Content-Type: application/x-ndjson
Content-Disposition: attachment; filename="hermes-audit-<date>.jsonl"
```
- [ ] **Step 2: Run test to verify it fails**
```bash
node --test test/admin-logs.integration.test.cjs
```
Expected: FAIL because the endpoint does not exist.
- [ ] **Step 3: Add download endpoint and cleanup timer**
Add:
```text
GET /api/admin/logs/download?api_user_id=&start=&end=
```
After authentication, write download headers and call `streamJsonlLogs`.
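Query parsing and the download headers might be handled as in this sketch. `parseLogFilters` and `downloadHeaders` are hypothetical helpers, and rendering `<date>` as `YYYY-MM-DD` in UTC is an assumption, since the plan leaves the format open.

```javascript
// Parses the download query string; returns { ok: true, filters } or { ok: false, message } for a 400.
function parseLogFilters(requestUrl) {
  // The base URL only exists so relative request URLs can be parsed.
  const params = new URL(requestUrl, "http://localhost").searchParams
  const filters = { apiUserId: params.get("api_user_id") || null, start: null, end: null }
  for (const key of ["start", "end"]) {
    const raw = params.get(key)
    if (!raw) continue // empty or missing means "no bound"
    const date = new Date(raw)
    if (Number.isNaN(date.getTime())) return { ok: false, message: `${key} is not a valid timestamp` }
    filters[key] = date
  }
  if (filters.start && filters.end && filters.start > filters.end) {
    return { ok: false, message: "start must not be after end" }
  }
  return { ok: true, filters }
}

// Headers from the Step 1 test, with the date taken from the current UTC day.
function downloadHeaders(now = new Date()) {
  const date = now.toISOString().slice(0, 10)
  return {
    "Content-Type": "application/x-ndjson",
    "Content-Disposition": `attachment; filename="hermes-audit-${date}.jsonl"`
  }
}
```

Validating the filters before writing any headers keeps the `400` path possible, since the status can no longer change once the stream has started.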
Start a cleanup interval after migrations:
```js
const cleanupTimer = setInterval(() => {
  cleanupExpiredMessageLogs(pool).catch((err) => console.error("audit cleanup failed", err))
}, 6 * 60 * 60 * 1000)
cleanupTimer.unref()
```
Run one cleanup immediately at startup.
- [ ] **Step 4: Run download, audit, and auth tests**
```bash
node --test test/admin-logs.integration.test.cjs test/audit.integration.test.cjs test/admin-auth.integration.test.cjs
```
Expected: PASS.
- [ ] **Step 5: Commit**
```bash
git add server.cjs lib/config.cjs test/admin-logs.integration.test.cjs
git commit -m "feat: add protected jsonl audit downloads"
```
### Task 8: Build the Approved API-User Management Table
**Files:**
- Modify: `index.html`
- Modify: `app.js`
- Modify: `style.css`
- Test: `test/ui-contract.test.cjs`
- [ ] **Step 1: Write the failing UI contract test**
Create a static contract test that reads the three files and asserts the presence of:
```js
assert.match(index, /data-route="api-users"/)
assert.match(index, /id="api-users-table"/)
assert.match(app, /loadApiUsers/)
assert.match(app, /createApiUser/)
assert.match(app, /rotateApiUserKey/)
assert.match(app, /downloadApiUserLogs/)
assert.match(css, /\.api-user-table/)
assert.match(css, /@media/)
```
- [ ] **Step 2: Run test to verify it fails**
```bash
node --test test/ui-contract.test.cjs
```
Expected: FAIL because the API-user pane does not exist.
- [ ] **Step 3: Implement the API-user pane**
Add a navigation item and pane with:
- Page title and `Create API User` button.
- Structured desktop table columns: user/status, masked key, access, limits, last used, expires, actions.
- Compact stacked records on narrow screens.
- Create/edit dialog with name, pre/post checkboxes, requests-per-minute, monthly token limit, and expiry.
- One-time key display dialog after create/rotate.
- Row actions: edit, rotate, JSONL download, revoke/reactivate, delete.
API client functions must call the protected endpoints and handle `401` by navigating to `/login`.
Use native `<dialog>` elements and existing button/input styles. Do not display prompt/response content.
- [ ] **Step 4: Run UI contract and full Node tests**
```bash
npm test
```
Expected: PASS.
- [ ] **Step 5: Commit**
```bash
git add index.html app.js style.css test/ui-contract.test.cjs
git commit -m "feat: add api user management interface"
```
### Task 9: Correct Compose Routing and Container Startup
**Files:**
- Modify: `docker-compose.yml`
- Modify: `Dockerfile`
- Modify: `.env.example`
- Modify: `.dockerignore`
- Test: `test/compose-contract.test.cjs`
- [ ] **Step 1: Write failing Compose contract test**
Execute `docker compose --env-file .env.example config --format json` and assert:
- Public gateway services publish `8645` and `8646`.
- Native pre/post upstream services publish no host ports.
- Post upstream sets `API_SERVER_ENABLED=true`.
- Control plane and public gateways receive `DATABASE_URL`.
- Required admin variables are passed only to the control plane.
- Health checks target the correct internal endpoints.
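The port assertions in this list could be built on helpers like the ones sketched below. They assume the JSON output lists each published port as an object with a `published` field, which is worth confirming against the installed Compose version. `loadComposeConfig`, `publishedPorts`, and `servicesPublishing` are hypothetical names.

```javascript
const { execFileSync } = require("node:child_process")

// Renders the resolved Compose model as JSON (requires Docker; not exercised by the pure helpers).
function loadComposeConfig(envFile = ".env.example") {
  const output = execFileSync(
    "docker",
    ["compose", "--env-file", envFile, "config", "--format", "json"],
    { encoding: "utf8" }
  )
  return JSON.parse(output)
}

// Maps each service name to the host ports it publishes, as strings.
function publishedPorts(config) {
  const result = {}
  for (const [name, service] of Object.entries(config.services || {})) {
    result[name] = (service.ports || [])
      .map((port) => String(port.published ?? ""))
      .filter(Boolean)
  }
  return result
}

// Sorted names of services that publish at least one host port.
function servicesPublishing(config) {
  return Object.entries(publishedPorts(config))
    .filter(([, ports]) => ports.length > 0)
    .map(([name]) => name)
    .sort()
}
```

A contract test could then assert that `servicesPublishing(loadComposeConfig())` includes the two public gateway services and neither native upstream service.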
- [ ] **Step 2: Run test to verify it fails**
```bash
node --test test/compose-contract.test.cjs
```
Expected: FAIL because current native services own public ports and no gateway services exist.
- [ ] **Step 3: Update Docker and Compose**
The final services are:
```text
hermes-control-plane
hermes-pre-upstream
hermes-post-upstream
hermes-pre-api
hermes-post-api
```
Key Compose behavior:
```yaml
hermes-pre-upstream:
  expose: ["8645"]
  command: ["/bin/sh", "-lc", "exec \"$$HERMES_EXE\" proxy start --provider \"$$HERMES_PRE_AI_PROVIDER\" --host 0.0.0.0 --port 8645"]
hermes-post-upstream:
  expose: ["8642"]
  environment:
    API_SERVER_ENABLED: "true"
    API_SERVER_HOST: 0.0.0.0
    API_SERVER_PORT: 8642
  command: ["/bin/sh", "-lc", "exec \"$$HERMES_EXE\" gateway run --replace --accept-hooks"]
hermes-pre-api:
  command: ["node", "/app/api-gateway.cjs"]
  environment:
    HERMES_API_ROUTE_KIND: pre
    HERMES_UPSTREAM_URL: http://hermes-pre-upstream:8645
  ports: ["${HERMES_PRE_AI_API_PORT:-8645}:8645"]
hermes-post-api:
  command: ["node", "/app/api-gateway.cjs"]
  environment:
    HERMES_API_ROUTE_KIND: post
    HERMES_UPSTREAM_URL: http://hermes-post-upstream:8642
  ports: ["${HERMES_POST_AI_API_PORT:-8646}:8646"]
```
Update the Dockerfile to run `npm ci --omit=dev` and copy all new modules/assets/migrations.
Add to `.env.example`:
```text
DATABASE_URL=postgresql://hermes_user:change-me@postgres.example.internal:5432/hermes_control_plane
HERMES_ADMIN_USERNAME=admin
HERMES_ADMIN_PASSWORD=change-this-to-a-long-random-password
HERMES_ADMIN_SESSION_TTL_HOURS=12
HERMES_LOG_RETENTION_DAYS=90
HERMES_AUDIT_MAX_BYTES=10485760
```
- [ ] **Step 4: Run Compose and full tests**
```bash
npm run check
npm test
```
Expected: PASS. Compose output shows only the public gateway services publishing `8645` and `8646`.
- [ ] **Step 5: Commit**
```bash
git add docker-compose.yml Dockerfile .env.example .dockerignore test/compose-contract.test.cjs
git commit -m "feat: route public api traffic through managed gateway"
```
### Task 10: Documentation, Browser QA, and End-to-End Verification
**Files:**
- Modify: `README.md`
- Modify: `.gitignore`
- Create: `test/e2e-smoke.cjs`
- [ ] **Step 1: Add the end-to-end smoke test**
Create a smoke test that, against a running stack configured with `TEST_DATABASE_URL`:
1. Logs into the control plane.
2. Creates a pre-only API user.
3. Calls pre API successfully.
4. Calls post API and receives `403`.
5. Rotates the key and verifies the old key receives `401`.
6. Downloads JSONL and verifies the successful pre request is present.
- [ ] **Step 2: Run the smoke test before documentation**
```bash
node --test test/e2e-smoke.cjs
```
Expected: PASS against the running stack; explicit SKIP when `RUN_E2E` is not set.
- [ ] **Step 3: Document deployment and security operations**
Update `README.md` with:
- Dedicated PostgreSQL database creation and least-privilege user guidance.
- Required Portainer environment variables.
- Admin login URL.
- Pre/post public API URLs.
- API-user creation/rotation/revocation behavior.
- JSONL download and 90-day retention behavior.
- Backup guidance for PostgreSQL and the `.hermes`, `.codex`, `.claude`, and `.gemini` directories.
- Warning that full prompts and responses are retained for 90 days.
Add `.superpowers/` to `.gitignore`.
- [ ] **Step 4: Run final automated verification**
```bash
npm run check
npm test
git diff --check
```
Expected: all checks pass.
- [ ] **Step 5: Run browser QA**
Start the stack, then verify in a real browser at desktop and mobile widths:
- Unauthenticated `/` redirects to login.
- Login form works and errors are clear.
- API-user table aligns correctly and no cells overlap.
- Create/edit/rotate dialogs work.
- Plaintext key appears once.
- JSONL download triggers a file download.
- Revoke/delete confirmations are explicit.
- Mobile layout becomes stacked records without horizontal clipping.
- [ ] **Step 6: Commit**
```bash
git add README.md .gitignore test/e2e-smoke.cjs
git commit -m "docs: document managed hermes api access"
```
## Final Verification Checklist
- [ ] `npm run check` passes.
- [ ] `npm test` passes with PostgreSQL integration tests enabled.
- [ ] `git diff --check` passes.
- [ ] Only the managed pre/post gateway services publish AI API ports.
- [ ] Control-plane HTML, assets, and management endpoints require admin login.
- [ ] API keys are shown once and stored only as hashes.
- [ ] Pre/post permissions, rate limits, monthly limits, expiry, revoke, rotate, and delete work.
- [ ] Full prompts/responses are downloadable as JSONL and never displayed in the UI.
- [ ] Message content older than 90 days is deleted while usage metadata remains.
- [ ] Browser QA passes at desktop and mobile widths.