docs: plan admin auth and api user management
# Hermes Admin Auth, API Users, Usage, and Audit Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Protect the Hermes control plane with admin authentication and provide PostgreSQL-backed per-user API keys, limits, usage attribution, 90-day full-message audit retention, and JSONL downloads across pre-Hermes and post-Hermes APIs.

**Architecture:** Add focused CommonJS modules for configuration, PostgreSQL access, security, API-user persistence, and audit persistence. Keep admin routes in the existing control-plane server, and add a reusable public API gateway process that runs once for pre-Hermes and once for post-Hermes while forwarding to internal-only native Hermes services.

**Tech Stack:** Node.js 20 CommonJS, built-in `http`, `crypto`, and `node:test`; PostgreSQL through `pg`; existing HTML/CSS/vanilla JavaScript UI; Docker Compose; native Hermes `proxy start` and `gateway run`.

---

## File Structure

Create these focused modules:

- `lib/config.cjs`: validates required environment variables and returns typed configuration.
- `lib/db.cjs`: owns the PostgreSQL pool, transactions, and migration runner.
- `lib/security.cjs`: timing-safe admin credential checks, session tokens, API-key generation/hashing, and cookie parsing.
- `lib/admin-store.cjs`: admin-session persistence and validation.
- `lib/api-users-store.cjs`: API-user/key lifecycle, permission checks, limits, and last-used updates.
- `lib/audit-store.cjs`: usage events, full message logs, JSONL streaming queries, and retention cleanup.
- `lib/http.cjs`: bounded request-body reading and OpenAI-style JSON errors shared by both servers.
- `api-gateway.cjs`: public authenticated gateway process configured as either pre or post.
- `migrations/001_admin_api_users.sql`: initial PostgreSQL schema and indexes.
- `login.html`, `login.js`, `login.css`: minimal unauthenticated login surface.

Modify these existing files:

- `server.cjs`: protect the control plane and add admin/API-user endpoints.
- `index.html`, `app.js`, `style.css`: add the approved API-user table and management dialogs.
- `docker-compose.yml`: make native Hermes services internal and run public pre/post gateway services.
- `Dockerfile`: install production dependencies and copy new server modules/assets.
- `.env.example`, `README.md`, `.gitignore`, `.dockerignore`, `package.json`: deployment configuration, scripts, and generated-artifact hygiene.
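
Tests:

- `test/helpers/db-test.cjs`: isolated PostgreSQL schema setup using `TEST_DATABASE_URL`.
- `test/db.integration.test.cjs`: idempotent migration run and schema creation.
- `test/security.test.cjs`: pure security helper tests.
- `test/admin-store.integration.test.cjs`: admin-session create, validate, touch, expire, and revoke.
- `test/admin-auth.integration.test.cjs`: login/session/full-site protection.
- `test/api-users.integration.test.cjs`: API-user and key lifecycle.
- `test/api-gateway.integration.test.cjs`: forwarding, permissions, limits, streaming capture, and failures.
- `test/audit.integration.test.cjs`: JSONL downloads and 90-day cleanup.
- `test/admin-logs.integration.test.cjs`: protected JSONL download endpoint.
- `test/ui-contract.test.cjs`: static contract for the API-user pane.
- `test/compose-contract.test.cjs`: Compose routing and environment contract.
- `test/e2e-smoke.cjs`: end-to-end smoke test against a running stack.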

### Task 1: Add PostgreSQL Dependency, Configuration, and Migration Runner

**Files:**
- Modify: `package.json`
- Create: `lib/config.cjs`
- Create: `lib/db.cjs`
- Create: `migrations/001_admin_api_users.sql`
- Create: `test/helpers/db-test.cjs`
- Create: `test/db.integration.test.cjs`

- [ ] **Step 1: Add the failing migration integration test**

Create `test/db.integration.test.cjs` with this `node:test` case:

```js
const test = require("node:test")
const assert = require("node:assert/strict")
const { withTestDatabase } = require("./helpers/db-test.cjs")
const { runMigrations } = require("../lib/db.cjs")

test("runMigrations creates the admin and API-user schema idempotently", async (t) => {
  await withTestDatabase(t, async ({ pool }) => {
    // Running twice proves the runner is idempotent.
    await runMigrations(pool)
    await runMigrations(pool)
    const result = await pool.query(`
      select table_name from information_schema.tables
      where table_schema = current_schema()
      order by table_name
    `)
    const names = result.rows.map((row) => row.table_name)
    assert(names.includes("admin_sessions"))
    assert(names.includes("api_users"))
    assert(names.includes("api_keys"))
    assert(names.includes("usage_events"))
    assert(names.includes("message_logs"))
  })
})
```

`withTestDatabase` must read `TEST_DATABASE_URL` (reporting an explicit skip when it is unset), create a unique schema, set `search_path`, and drop the schema in `t.after()`.

- [ ] **Step 2: Run the test to verify it fails**

Run:

```bash
node --test test/db.integration.test.cjs
```

Expected: FAIL because `lib/db.cjs` and the migration do not exist.

- [ ] **Step 3: Add dependency and minimal database implementation**

Update `package.json`:

```json
"scripts": {
  "start": "node server.cjs",
  "start:gateway": "node api-gateway.cjs",
  "check": "node -c server.cjs && node -c api-gateway.cjs && docker compose --env-file .env.example config",
  "test": "node --test test/*.test.cjs"
},
"dependencies": {
  "pg": "^8.16.3"
}
```

Implement `lib/config.cjs` with:

```js
// Return the trimmed value of a required environment variable, or throw.
function required(name, env = process.env) {
  const value = String(env[name] || "").trim()
  if (!value) throw new Error(`${name} is required`)
  return value
}

function loadDatabaseConfig(env = process.env) {
  return { databaseUrl: required("DATABASE_URL", env) }
}

module.exports = { required, loadDatabaseConfig }
```

Implement `lib/db.cjs` using `pg.Pool`, `withTransaction(pool, fn)`, and `runMigrations(pool)` that applies ordered `.sql` files once under a PostgreSQL advisory lock and records them in `schema_migrations`.
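
The transaction helper and migration runner described above could look roughly like the following. This is a minimal sketch, not the final `lib/db.cjs`: the lock key `MIGRATION_LOCK_KEY`, the `dir` parameter, and the `schema_migrations` column names are assumptions made for illustration, and the pool only needs a `pg`-style `connect()` that yields a client with `query()` and `release()`.

```javascript
const fs = require("node:fs")
const path = require("node:path")

// Arbitrary advisory-lock key chosen for this sketch (assumption).
const MIGRATION_LOCK_KEY = 724001

// Run fn on a dedicated client inside a transaction; roll back on any error.
async function withTransaction(pool, fn) {
  const client = await pool.connect()
  try {
    await client.query("begin")
    const result = await fn(client)
    await client.query("commit")
    return result
  } catch (err) {
    await client.query("rollback")
    throw err
  } finally {
    client.release()
  }
}

// Apply every not-yet-recorded .sql file in filename order, exactly once.
async function runMigrations(pool, dir = path.join(__dirname, "..", "migrations")) {
  const files = fs.readdirSync(dir).filter((file) => file.endsWith(".sql")).sort()
  return withTransaction(pool, async (client) => {
    // Transaction-scoped lock: released automatically at commit or rollback.
    await client.query("select pg_advisory_xact_lock($1)", [MIGRATION_LOCK_KEY])
    await client.query(
      "create table if not exists schema_migrations (name text primary key, applied_at timestamptz not null default now())"
    )
    const { rows } = await client.query("select name from schema_migrations")
    const applied = new Set(rows.map((row) => row.name))
    const ran = []
    for (const file of files) {
      if (applied.has(file)) continue
      await client.query(fs.readFileSync(path.join(dir, file), "utf8"))
      await client.query("insert into schema_migrations (name) values ($1)", [file])
      ran.push(file)
    }
    return ran
  })
}
```

Holding the advisory lock for the whole transaction means two processes that start at the same time (for example the control plane and a gateway) apply each file only once; the second caller waits, then sees the recorded names and skips them.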

Create `migrations/001_admin_api_users.sql` with UUID/text IDs generated by the application and these tables:

```sql
create table admin_sessions (
  token_hash text primary key,
  created_at timestamptz not null default now(),
  expires_at timestamptz not null,
  last_seen_at timestamptz not null default now(),
  revoked_at timestamptz
);

create table api_users (
  id text primary key,
  display_name text not null,
  status text not null check (status in ('active', 'revoked', 'deleted')),
  allow_pre boolean not null default false,
  allow_post boolean not null default false,
  requests_per_minute integer not null check (requests_per_minute > 0),
  monthly_token_limit bigint not null check (monthly_token_limit > 0),
  expires_at timestamptz,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now(),
  last_used_at timestamptz,
  revoked_at timestamptz,
  deleted_at timestamptz
);

create table api_keys (
  id text primary key,
  api_user_id text not null references api_users(id),
  key_hash text not null unique,
  key_suffix text not null,
  created_at timestamptz not null default now(),
  revoked_at timestamptz
);

create unique index api_keys_one_active_per_user
  on api_keys(api_user_id) where revoked_at is null;

create table usage_events (
  id text primary key,
  api_user_id text not null,
  api_user_name text not null,
  api_key_id text not null,
  route text not null check (route in ('pre', 'post')),
  request_started_at timestamptz not null,
  request_completed_at timestamptz,
  http_status integer,
  model text,
  prompt_tokens bigint not null default 0,
  completion_tokens bigint not null default 0,
  total_tokens bigint not null default 0,
  latency_ms bigint,
  error_code text,
  audit_complete boolean not null default false
);

create index usage_events_user_started_idx on usage_events(api_user_id, request_started_at);

create table message_logs (
  usage_event_id text primary key references usage_events(id),
  request_json jsonb,
  request_text text,
  response_json jsonb,
  response_text text,
  response_content_type text,
  streaming boolean not null default false,
  partial boolean not null default false,
  created_at timestamptz not null default now(),
  delete_after timestamptz not null
);

create index message_logs_delete_after_idx on message_logs(delete_after);
```

- [ ] **Step 4: Install dependencies and run the test**

Run:

```bash
npm install
node --test test/db.integration.test.cjs
```

Expected: PASS when `TEST_DATABASE_URL` is set; otherwise the helper reports one explicit SKIP.

- [ ] **Step 5: Commit**

```bash
git add package.json package-lock.json lib/config.cjs lib/db.cjs migrations/001_admin_api_users.sql test/helpers/db-test.cjs test/db.integration.test.cjs
git commit -m "feat: add postgres schema and migration runner"
```

### Task 2: Add Security Primitives and Admin Session Store

**Files:**
- Create: `lib/security.cjs`
- Create: `lib/admin-store.cjs`
- Create: `test/security.test.cjs`
- Create: `test/admin-store.integration.test.cjs`

- [ ] **Step 1: Write failing security tests**

Create pure tests that assert:

```js
const assert = require("node:assert/strict")
const { adminCredentialsMatch, createSessionToken, createApiKey, hashSecret, parseCookies } = require("../lib/security.cjs")

assert.equal(adminCredentialsMatch("admin", "secret-value-123", {
  username: "admin", password: "secret-value-123"
}), true)
assert.equal(adminCredentialsMatch("admin", "wrong-value-123", {
  username: "admin", password: "secret-value-123"
}), false)
assert.match(createApiKey().plaintext, /^hms_[A-Za-z0-9_-]{40,}$/)
// A SHA-256 hex digest is 64 characters long.
assert.equal(createSessionToken().hash.length, 64)
assert.deepEqual(parseCookies("a=1; hermes_admin=abc"), { a: "1", hermes_admin: "abc" })
```

Create integration tests proving a session can be created, validated, touched, expired, and revoked.

- [ ] **Step 2: Run tests to verify they fail**

```bash
node --test test/security.test.cjs test/admin-store.integration.test.cjs
```

Expected: FAIL because the modules do not exist.

- [ ] **Step 3: Implement security and session persistence**

Implement `lib/security.cjs` using `crypto.randomBytes`, `crypto.createHash("sha256")`, and `crypto.timingSafeEqual`.

Export:

```js
module.exports = {
  adminCredentialsMatch,
  createApiKey,
  createSessionToken,
  hashSecret,
  parseCookies,
  serializeAdminCookie,
  clearAdminCookie
}
```
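
As a rough illustration of how these helpers might fit together, here is a minimal sketch built only on the `crypto` calls named above. The `safeEqual` helper, the `suffix` field, and the cookie attributes (`Secure`, `SameSite=Strict`, `Path=/`) are assumptions for this example; the cookie name `hermes_admin` is taken from the `parseCookies` test.

```javascript
const crypto = require("node:crypto")

// SHA-256 hex digest (64 characters); used for session tokens and API keys.
function hashSecret(secret) {
  return crypto.createHash("sha256").update(String(secret), "utf8").digest("hex")
}

// Hash both sides first so timingSafeEqual always compares equal-length buffers.
function safeEqual(a, b) {
  const left = crypto.createHash("sha256").update(String(a)).digest()
  const right = crypto.createHash("sha256").update(String(b)).digest()
  return crypto.timingSafeEqual(left, right)
}

function adminCredentialsMatch(username, password, expected) {
  // Evaluate both comparisons before combining them, so neither is skipped.
  const userOk = safeEqual(username, expected.username)
  const passOk = safeEqual(password, expected.password)
  return userOk && passOk
}

function createSessionToken() {
  const plaintext = crypto.randomBytes(32).toString("base64url")
  return { plaintext, hash: hashSecret(plaintext) }
}

function createApiKey() {
  // 32 random bytes encode to 43 URL-safe characters after the hms_ prefix.
  const plaintext = `hms_${crypto.randomBytes(32).toString("base64url")}`
  return { plaintext, hash: hashSecret(plaintext), suffix: plaintext.slice(-4) }
}

function parseCookies(header = "") {
  const cookies = {}
  for (const part of String(header).split(";")) {
    const index = part.indexOf("=")
    if (index === -1) continue
    const name = part.slice(0, index).trim()
    if (name) cookies[name] = part.slice(index + 1).trim()
  }
  return cookies
}

// Cookie attributes here are illustrative assumptions, not a confirmed policy.
function serializeAdminCookie(token, maxAgeSeconds) {
  return `hermes_admin=${token}; Path=/; HttpOnly; Secure; SameSite=Strict; Max-Age=${maxAgeSeconds}`
}
```

Only the hash of a token or key would be persisted; the plaintext exists just long enough to be returned to the caller.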

Implement `lib/admin-store.cjs` with:

```js
async function createAdminSession(pool, tokenHash, expiresAt) {}
async function validateAdminSession(pool, tokenHash, now = new Date()) {}
async function revokeAdminSession(pool, tokenHash) {}
async function deleteExpiredAdminSessions(pool, now = new Date()) {}
```

Validation must reject expired/revoked sessions and update `last_seen_at` no more than once per five minutes.
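
One way to satisfy the validation rule is to read the session row first and only write when the stored `last_seen_at` is at least five minutes old. The sketch below assumes `pg` returns `timestamptz` columns as `Date` objects; the return shape (`valid`, `reason`) and the `TOUCH_INTERVAL_MS` constant are hypothetical.

```javascript
// Minimum gap between last_seen_at writes (five minutes).
const TOUCH_INTERVAL_MS = 5 * 60 * 1000

async function validateAdminSession(pool, tokenHash, now = new Date()) {
  const { rows } = await pool.query(
    "select expires_at, last_seen_at, revoked_at from admin_sessions where token_hash = $1",
    [tokenHash]
  )
  const session = rows[0]
  if (!session) return { valid: false, reason: "unknown" }
  if (session.revoked_at) return { valid: false, reason: "revoked" }
  if (session.expires_at <= now) return { valid: false, reason: "expired" }
  // Touch the row only when the previous touch is old enough, to limit writes.
  if (now - session.last_seen_at >= TOUCH_INTERVAL_MS) {
    await pool.query(
      "update admin_sessions set last_seen_at = $2 where token_hash = $1",
      [tokenHash, now]
    )
  }
  return { valid: true }
}
```

Throttling the touch keeps a busy admin tab from issuing one `update` per request while still giving a usable "last seen" value.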

- [ ] **Step 4: Run tests**

```bash
node --test test/security.test.cjs test/admin-store.integration.test.cjs
```

Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add lib/security.cjs lib/admin-store.cjs test/security.test.cjs test/admin-store.integration.test.cjs
git commit -m "feat: add admin and api key security primitives"
```

### Task 3: Protect the Entire Control Plane With Admin Login

**Files:**
- Create: `lib/http.cjs`
- Create: `login.html`
- Create: `login.js`
- Create: `login.css`
- Modify: `server.cjs`
- Modify: `Dockerfile`
- Test: `test/admin-auth.integration.test.cjs`

- [ ] **Step 1: Write the failing full-site authentication test**

Start `server.cjs` against the test database and assert:

```js
assert.equal((await request("/")).status, 302)
assert.equal((await request("/app.js")).status, 302)
assert.equal((await request("/api/status")).status, 401)
assert.equal((await request("/login")).status, 200)
assert.equal((await request("/login.js")).status, 200)
assert.equal((await request("/health")).status, 200)
```

Post correct credentials to `/api/admin/login`, capture `Set-Cookie`, then assert `/`, `/app.js`, and `/api/status` succeed. Post `/api/admin/logout` and assert the cookie no longer authorizes access.

- [ ] **Step 2: Run the test to verify it fails**

```bash
node --test test/admin-auth.integration.test.cjs
```

Expected: FAIL because the control plane is currently public.

- [ ] **Step 3: Implement login and request guard**

Add `lib/http.cjs`:

```js
async function readJsonBody(req, maxBytes = 1_000_000) {}
function sendJson(res, status, body, headers = {}) {}
function openAiError(res, status, message, code) {}
module.exports = { readJsonBody, sendJson, openAiError }
```
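
A possible shape for these three helpers is sketched below. The `statusCode` property attached to thrown errors and the choice to mirror `code` into the error `type` are assumptions for illustration; the latter matches the error shape the gateway tests in this plan expect.

```javascript
// Read and parse a JSON body, rejecting with statusCode 413 once maxBytes is exceeded.
function readJsonBody(req, maxBytes = 1_000_000) {
  return new Promise((resolve, reject) => {
    const chunks = []
    let size = 0
    let settled = false
    const fail = (statusCode, message) => {
      if (settled) return
      settled = true
      reject(Object.assign(new Error(message), { statusCode }))
    }
    req.on("data", (chunk) => {
      size += chunk.length
      if (size > maxBytes) return fail(413, "request body too large")
      if (!settled) chunks.push(chunk)
    })
    req.on("end", () => {
      if (settled) return
      settled = true
      try {
        // An empty body is treated as an empty object.
        resolve(chunks.length ? JSON.parse(Buffer.concat(chunks).toString("utf8")) : {})
      } catch {
        reject(Object.assign(new Error("invalid JSON body"), { statusCode: 400 }))
      }
    })
    req.on("error", (err) => fail(400, err.message))
  })
}

function sendJson(res, status, body, headers = {}) {
  const payload = Buffer.from(JSON.stringify(body))
  res.writeHead(status, {
    "Content-Type": "application/json; charset=utf-8",
    "Content-Length": payload.length,
    ...headers
  })
  res.end(payload)
}

// OpenAI-style envelope: { error: { message, type, code } }.
function openAiError(res, status, message, code) {
  sendJson(res, status, { error: { message, type: code, code } })
}
```

Counting bytes as they arrive, rather than after buffering, is what keeps the read bounded: an oversized upload is rejected as soon as it crosses the limit.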

In `server.cjs`:

- Load and validate `DATABASE_URL`, `HERMES_ADMIN_USERNAME`, and `HERMES_ADMIN_PASSWORD`.
- Reject startup when the admin username is blank or the admin password is shorter than 16 characters.
- Run migrations before listening.
- Add `GET /health`, `GET /login`, `GET /login.js`, `GET /login.css`, `POST /api/admin/login`, and `POST /api/admin/logout`.
- Add `requireAdmin(req, res)` before all existing routes/static serving.
- Redirect unauthenticated browser GETs to `/login`; return JSON `401` for unauthenticated `/api/*`.
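
The guard rules in the list above reduce to a small decision function, sketched here under stated assumptions: `guardDecision` and the `PUBLIC_PATHS` set are hypothetical names, and whether `requireAdmin` is structured this way is up to the implementation.

```javascript
// Paths that must stay reachable without a session.
const PUBLIC_PATHS = new Set(["/health", "/login", "/login.js", "/login.css", "/api/admin/login"])

// Decide what to do with a request before any existing route or static handler runs.
function guardDecision(method, pathname, authenticated) {
  if (authenticated || PUBLIC_PATHS.has(pathname)) return { action: "allow" }
  // API callers get a machine-readable 401 instead of an HTML redirect.
  if (pathname.startsWith("/api/")) return { action: "reject", status: 401 }
  if (method === "GET" || method === "HEAD") {
    return { action: "redirect", status: 302, location: "/login" }
  }
  return { action: "reject", status: 401 }
}
```

For example, `guardDecision("GET", "/app.js", false)` yields a redirect to `/login`, while `guardDecision("GET", "/api/status", false)` yields a `401`, matching the assertions in Step 1.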

The login endpoint must:

```js
const { plaintext, hash } = createSessionToken()
await createAdminSession(pool, hash, expiresAt)
sendJson(res, 200, { ok: true }, {
  "Set-Cookie": serializeAdminCookie(plaintext, sessionTtlSeconds)
})
```

Create a minimal styled login form that posts JSON credentials and redirects to `/` after success.

Update `Dockerfile` to copy login assets, `lib/`, and `migrations/`.

- [ ] **Step 4: Run auth and existing tests**

```bash
node --test test/admin-auth.integration.test.cjs test/status-identities.test.cjs
```

Expected: PASS. Update the existing identity test to log in before requesting `/api/status`.

- [ ] **Step 5: Commit**

```bash
git add lib/http.cjs login.html login.js login.css server.cjs Dockerfile test/admin-auth.integration.test.cjs test/status-identities.test.cjs
git commit -m "feat: protect control plane with admin login"
```

### Task 4: Implement API-User and API-Key Lifecycle

**Files:**
- Create: `lib/api-users-store.cjs`
- Modify: `server.cjs`
- Test: `test/api-users.integration.test.cjs`

- [ ] **Step 1: Write failing lifecycle tests**

Test these store and HTTP behaviors:

```js
const created = await createApiUser(pool, {
  displayName: "Marketing Automation",
  allowPre: true,
  allowPost: false,
  requestsPerMinute: 30,
  monthlyTokenLimit: 250000,
  expiresAt: "2026-08-31T23:59:59Z"
})
assert.match(created.plaintextKey, /^hms_/)
assert.equal(created.user.keySuffix.length, 4)
assert.equal((await listApiUsers(pool))[0].plaintextKey, undefined)
```

Also assert edit, rotate, revoke, reactivate, soft-delete, expiry, and one-active-key-per-user behavior.

- [ ] **Step 2: Run the test to verify it fails**

```bash
node --test test/api-users.integration.test.cjs
```

Expected: FAIL because API-user persistence/routes do not exist.

- [ ] **Step 3: Implement store and admin endpoints**

Implement `lib/api-users-store.cjs` with:

```js
createApiUser(pool, input)
listApiUsers(pool)
updateApiUser(pool, id, patch)
rotateApiUserKey(pool, id)
revokeApiUser(pool, id)
reactivateApiUser(pool, id)
deleteApiUser(pool, id)
authenticateApiKey(pool, plaintextKey, route, now)
```

Use transactions for create/rotate/revoke/delete. `authenticateApiKey` must return typed rejection reasons: `invalid`, `revoked`, `expired`, or `forbidden`.
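
To make the typed rejection reasons concrete, the sketch below classifies a row that `authenticateApiKey` might load by joining `api_keys` and `api_users` on the key hash. The `record` field names, the decision to treat a rotated (key-level revoked) key and a deleted user as `invalid`, and the `REJECTION_STATUS` map are all assumptions for illustration; the status codes follow the gateway tests in Task 6.

```javascript
// HTTP status the gateway could return for each rejection reason.
const REJECTION_STATUS = { invalid: 401, forbidden: 403, revoked: 410, expired: 410 }

// record: hypothetical joined row, or undefined when no key hash matched.
function evaluateApiKey(record, route, now = new Date()) {
  // Unknown keys, rotated-away keys, and deleted users look the same to the caller.
  if (!record || record.key_revoked_at || record.status === "deleted") {
    return { ok: false, reason: "invalid" }
  }
  if (record.status !== "active") return { ok: false, reason: "revoked" }
  if (record.expires_at && record.expires_at <= now) return { ok: false, reason: "expired" }
  const allowed = route === "pre" ? record.allow_pre : route === "post" ? record.allow_post : false
  if (!allowed) return { ok: false, reason: "forbidden" }
  return { ok: true, userId: record.api_user_id }
}
```

Keeping the classification pure makes each branch testable without a database; the store function would only be responsible for hashing the presented key and loading the row.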

Add protected routes:

```text
GET /api/admin/api-users
POST /api/admin/api-users
PATCH /api/admin/api-users/:id
POST /api/admin/api-users/:id/rotate
POST /api/admin/api-users/:id/revoke
POST /api/admin/api-users/:id/reactivate
DELETE /api/admin/api-users/:id
```

Reject invalid limits, blank names, no permissions, and expiration timestamps in the past.
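
These rejection rules could be collected in one validation function that the create and edit routes share. This is a sketch only: `validateApiUserInput` and its error strings are hypothetical, and the field names follow the `createApiUser` test input in Step 1.

```javascript
// Return a list of human-readable problems; an empty list means the input is acceptable.
function validateApiUserInput(input, now = new Date()) {
  const errors = []
  if (typeof input.displayName !== "string" || !input.displayName.trim()) {
    errors.push("displayName is required")
  }
  if (!input.allowPre && !input.allowPost) {
    errors.push("at least one of allowPre or allowPost must be true")
  }
  if (!Number.isInteger(input.requestsPerMinute) || input.requestsPerMinute <= 0) {
    errors.push("requestsPerMinute must be a positive integer")
  }
  if (!Number.isSafeInteger(input.monthlyTokenLimit) || input.monthlyTokenLimit <= 0) {
    errors.push("monthlyTokenLimit must be a positive integer")
  }
  // expiresAt is optional; when present it must parse and lie in the future.
  if (input.expiresAt != null) {
    const expires = new Date(input.expiresAt)
    if (Number.isNaN(expires.getTime())) errors.push("expiresAt must be a valid timestamp")
    else if (expires <= now) errors.push("expiresAt must be in the future")
  }
  return errors
}
```

A route handler could respond with `400` and the collected messages whenever the returned list is non-empty.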

- [ ] **Step 4: Run lifecycle and auth tests**

```bash
node --test test/api-users.integration.test.cjs test/admin-auth.integration.test.cjs
```

Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add lib/api-users-store.cjs server.cjs test/api-users.integration.test.cjs
git commit -m "feat: add managed api users and keys"
```

### Task 5: Implement Usage, Limits, Audit Storage, and Cleanup

**Files:**
- Create: `lib/audit-store.cjs`
- Test: `test/audit.integration.test.cjs`

- [ ] **Step 1: Write failing audit and limit tests**

Test:

- The 31st request inside one minute is denied for a user limited to 30 requests per minute.
- A user at its monthly token limit is denied.
- `beginUsageEvent` creates an incomplete event.
- `completeUsageEvent` stores token totals and full request/response.
- `cleanupExpiredMessageLogs` deletes message bodies older than 90 days but keeps `usage_events`.
- `streamJsonlLogs` returns one valid JSON object per line and filters by user/date.

Example JSONL assertion:

```js
const lines = output.trim().split("\n").map((line) => JSON.parse(line))
assert.equal(lines[0].api_user_name, "Marketing Automation")
assert.deepEqual(lines[0].request, { model: "test", messages: [{ role: "user", content: "hello" }] })
assert.equal(lines[0].response.choices[0].message.content, "world")
```

- [ ] **Step 2: Run the test to verify it fails**

```bash
node --test test/audit.integration.test.cjs
```

Expected: FAIL because `lib/audit-store.cjs` does not exist.

- [ ] **Step 3: Implement transactional enforcement and audit functions**

Implement:

```js
async function authorizeUsage(pool, apiUser, now = new Date()) {}
async function beginUsageEvent(pool, input) {}
async function completeUsageEvent(pool, id, input) {}
async function failUsageEvent(pool, id, input) {}
async function streamJsonlLogs(pool, filters, writable) {}
async function cleanupExpiredMessageLogs(pool, now = new Date()) {}
```

`authorizeUsage` must lock the API-user row and query:

```sql
select count(*) from usage_events
where api_user_id = $1 and request_started_at >= $2
```

and:

```sql
select coalesce(sum(total_tokens), 0) from usage_events
where api_user_id = $1 and request_started_at >= date_trunc('month', $2::timestamptz)
```
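
Put together, the row lock and the two queries above might be combined as in this sketch. It deliberately takes a transaction `client` rather than a pool, on the assumption that the caller wraps it (and the following usage-event insert) in one transaction so the lock actually serializes concurrent requests. The returned `reason` strings are hypothetical, and `pg` is assumed to return `bigint` and `count(*)` values as strings.

```javascript
// client: a pg-style client already inside a transaction (assumption).
async function authorizeUsage(client, apiUser, now = new Date()) {
  // Lock the user row so concurrent requests for the same user are checked one at a time.
  const locked = await client.query(
    "select requests_per_minute, monthly_token_limit from api_users where id = $1 for update",
    [apiUser.id]
  )
  const limits = locked.rows[0]
  if (!limits) return { allowed: false, reason: "invalid" }

  // Sliding one-minute window ending at `now`.
  const windowStart = new Date(now.getTime() - 60_000)
  const recent = await client.query(
    "select count(*) from usage_events where api_user_id = $1 and request_started_at >= $2",
    [apiUser.id, windowStart]
  )
  if (Number(recent.rows[0].count) >= limits.requests_per_minute) {
    return { allowed: false, reason: "rate_limited" }
  }

  const month = await client.query(
    "select coalesce(sum(total_tokens), 0) as used from usage_events where api_user_id = $1 and request_started_at >= date_trunc('month', $2::timestamptz)",
    [apiUser.id, now]
  )
  // BigInt avoids precision loss if token totals ever exceed 2^53.
  if (BigInt(month.rows[0].used) >= BigInt(limits.monthly_token_limit)) {
    return { allowed: false, reason: "monthly_limit" }
  }
  return { allowed: true }
}
```

With 30 events already recorded in the last minute, the next call is denied, which is the "31st request" case from Step 1.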

`completeUsageEvent` stores `delete_after = created_at + interval '90 days'`.

- [ ] **Step 4: Run the audit tests**

```bash
node --test test/audit.integration.test.cjs
```

Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add lib/audit-store.cjs test/audit.integration.test.cjs
git commit -m "feat: add api usage limits and audit storage"
```

### Task 6: Build the Public Pre/Post API Gateway

**Files:**
- Create: `api-gateway.cjs`
- Test: `test/api-gateway.integration.test.cjs`

- [ ] **Step 1: Write failing gateway integration tests**

Start a fake upstream and `api-gateway.cjs` configured as `pre`. Test:

- Missing/invalid key returns OpenAI-style `401`.
- Post-only key on pre gateway returns `403`.
- Revoked/expired key returns `410`.
- Rate/monthly-limit rejection returns `429`.
- Non-streaming response forwards status/body and stores full audit content.
- SSE response forwards chunks unchanged and stores the assembled response text.
- Upstream failure returns `502`.
- Client disconnect marks the audit record partial.

Use this expected error shape:

```js
assert.deepEqual(body, {
  error: {
    message: "API key does not permit pre-Hermes access",
    type: "permission_denied",
    code: "permission_denied"
  }
})
```

- [ ] **Step 2: Run the test to verify it fails**

```bash
node --test test/api-gateway.integration.test.cjs
```

Expected: FAIL because `api-gateway.cjs` does not exist.

- [ ] **Step 3: Implement the reusable gateway process**

`api-gateway.cjs` reads:

```text
DATABASE_URL
HERMES_API_ROUTE_KIND=pre|post
HERMES_API_GATEWAY_HOST
HERMES_API_GATEWAY_PORT
HERMES_UPSTREAM_URL
HERMES_LOG_RETENTION_DAYS=90
HERMES_AUDIT_MAX_BYTES=10485760
```

For every `/v1/*` request:

```js
const identity = await authenticateApiKey(pool, bearer, routeKind)
await authorizeUsage(pool, identity.user)
const eventId = await beginUsageEvent(pool, requestMetadata)
await forwardAndCapture(req, res, upstreamUrl, eventId)
```

Forward all request headers except hop-by-hop headers and replace `Authorization` with the internal upstream key only when configured. Enforce `HERMES_AUDIT_MAX_BYTES` for both request and response bodies so the system never silently truncates an accepted audit record:

- Reject an oversized request with `413` before forwarding it.
- If an upstream response crosses the limit, stop the upstream stream, finish the client response with an OpenAI-compatible audit-size error when headers have not been sent, or close the stream when they have.
- Store every byte accepted from the request and delivered to the client, marking the usage event with `audit_size_exceeded`.
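
Two small building blocks for `forwardAndCapture` follow from the forwarding and size rules above: a header filter and a byte-capped capture buffer. Both are sketches under stated assumptions. `buildUpstreamHeaders` and `createCappedCapture` are hypothetical names, dropping `Host` is an added assumption (the upstream has its own host), and the `Bearer` scheme for the internal key is assumed.

```javascript
// Hop-by-hop headers that a proxy should not forward.
const HOP_BY_HOP = new Set([
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade"
])

// Copy end-to-end headers; swap in the internal key only when one is configured.
function buildUpstreamHeaders(incoming, upstreamKey) {
  const headers = {}
  for (const [name, value] of Object.entries(incoming)) {
    const lower = name.toLowerCase()
    if (HOP_BY_HOP.has(lower) || lower === "host") continue
    headers[lower] = value
  }
  if (upstreamKey) headers.authorization = `Bearer ${upstreamKey}`
  return headers
}

// Collect chunks for the audit record, refusing any chunk that would cross the cap.
function createCappedCapture(maxBytes) {
  const chunks = []
  let size = 0
  return {
    // Returns false without storing when the chunk would exceed maxBytes.
    accept(chunk) {
      if (size + chunk.length > maxBytes) return false
      chunks.push(chunk)
      size += chunk.length
      return true
    },
    size: () => size,
    text: () => Buffer.concat(chunks).toString("utf8")
  }
}
```

A forwarding loop could call `accept(chunk)` before writing each chunk to the client and treat a `false` result as the signal to stop the upstream stream and record `audit_size_exceeded`, so the stored text always equals what was delivered.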

Expose unauthenticated `/health` that verifies process health and database reachability without leaking configuration.

- [ ] **Step 4: Run gateway tests**

```bash
node --test test/api-gateway.integration.test.cjs
```

Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add api-gateway.cjs test/api-gateway.integration.test.cjs
git commit -m "feat: add authenticated pre and post api gateway"
```

### Task 7: Add JSONL Download and Retention Scheduling to the Control Plane

**Files:**
- Modify: `server.cjs`
- Modify: `lib/config.cjs`
- Test: `test/admin-logs.integration.test.cjs`

- [ ] **Step 1: Write failing protected-download tests**

Test:

- Unauthenticated download returns `401`.
- Admin can download all logs.
- Admin can filter with `api_user_id`, `start`, and `end`.
- Invalid date ranges return `400`.
- Response headers are:

```text
Content-Type: application/x-ndjson
Content-Disposition: attachment; filename="hermes-audit-<date>.jsonl"
```

- [ ] **Step 2: Run test to verify it fails**

```bash
node --test test/admin-logs.integration.test.cjs
```

Expected: FAIL because the endpoint does not exist.

- [ ] **Step 3: Add download endpoint and cleanup timer**

Add:

```text
GET /api/admin/logs/download?api_user_id=&start=&end=
```

After authentication, write download headers and call `streamJsonlLogs`.
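
Before streaming, the handler has to turn the query string into filters and build the two response headers. A minimal sketch follows; `parseLogFilters` and `downloadHeaders` are hypothetical helpers, and rendering `<date>` as `YYYY-MM-DD` in UTC is an assumption.

```javascript
// Parse api_user_id/start/end; return { filters } or { error } for a 400 response.
function parseLogFilters(searchParams) {
  const filters = {}
  const userId = searchParams.get("api_user_id")
  if (userId) filters.apiUserId = userId
  for (const key of ["start", "end"]) {
    const raw = searchParams.get(key)
    if (!raw) continue
    const date = new Date(raw)
    if (Number.isNaN(date.getTime())) return { error: `${key} must be a valid timestamp` }
    filters[key] = date
  }
  if (filters.start && filters.end && filters.start > filters.end) {
    return { error: "start must not be after end" }
  }
  return { filters }
}

// Headers for the JSONL attachment; the date format is an illustrative choice.
function downloadHeaders(now = new Date()) {
  const date = now.toISOString().slice(0, 10)
  return {
    "Content-Type": "application/x-ndjson",
    "Content-Disposition": `attachment; filename="hermes-audit-${date}.jsonl"`
  }
}
```

In a handler this might be used as `parseLogFilters(new URL(req.url, "http://localhost").searchParams)`, returning `400` on `error` and otherwise writing `downloadHeaders()` before calling `streamJsonlLogs`.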

Start a cleanup interval after migrations:

```js
const cleanupTimer = setInterval(() => {
  cleanupExpiredMessageLogs(pool).catch((err) => console.error("audit cleanup failed", err))
}, 6 * 60 * 60 * 1000)
// Do not keep the process alive just for the cleanup timer.
cleanupTimer.unref()
```

Run one cleanup immediately at startup.

- [ ] **Step 4: Run download, audit, and auth tests**

```bash
node --test test/admin-logs.integration.test.cjs test/audit.integration.test.cjs test/admin-auth.integration.test.cjs
```

Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add server.cjs lib/config.cjs test/admin-logs.integration.test.cjs
git commit -m "feat: add protected jsonl audit downloads"
```

### Task 8: Build the Approved API-User Management Table

**Files:**
- Modify: `index.html`
- Modify: `app.js`
- Modify: `style.css`
- Test: `test/ui-contract.test.cjs`

- [ ] **Step 1: Write the failing UI contract test**

Create a static contract test that reads the three files and asserts the presence of:

```js
assert.match(index, /data-route="api-users"/)
assert.match(index, /id="api-users-table"/)
assert.match(app, /loadApiUsers/)
assert.match(app, /createApiUser/)
assert.match(app, /rotateApiUserKey/)
assert.match(app, /downloadApiUserLogs/)
assert.match(css, /\.api-user-table/)
assert.match(css, /@media/)
```

- [ ] **Step 2: Run test to verify it fails**

```bash
node --test test/ui-contract.test.cjs
```

Expected: FAIL because the API-user pane does not exist.

- [ ] **Step 3: Implement the API-user pane**

Add a navigation item and pane with:

- Page title and `Create API User` button.
- Structured desktop table columns: user/status, masked key, access, limits, last used, expires, actions.
- Compact stacked records on narrow screens.
- Create/edit dialog with name, pre/post checkboxes, requests-per-minute, monthly token limit, and expiry.
- One-time key display dialog after create/rotate.
- Row actions: edit, rotate, JSONL download, revoke/reactivate, delete.

API client functions must call the protected endpoints and handle `401` by navigating to `/login`.

Use native `<dialog>` elements and existing button/input styles. Do not display prompt/response content.

- [ ] **Step 4: Run UI contract and full Node tests**

```bash
npm test
```

Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add index.html app.js style.css test/ui-contract.test.cjs
git commit -m "feat: add api user management interface"
```

### Task 9: Correct Compose Routing and Container Startup

**Files:**
- Modify: `docker-compose.yml`
- Modify: `Dockerfile`
- Modify: `.env.example`
- Modify: `.dockerignore`
- Test: `test/compose-contract.test.cjs`

- [ ] **Step 1: Write failing Compose contract test**

Execute `docker compose --env-file .env.example config --format json` and assert:

- Public gateway services publish `8645` and `8646`.
- Native pre/post upstream services publish no host ports.
- Post upstream sets `API_SERVER_ENABLED=true`.
- Control plane and public gateways receive `DATABASE_URL`.
- Required admin variables are passed only to the control plane.
- Health checks target the correct internal endpoints.
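
Most of these assertions can be expressed as a pure function over the parsed Compose output, which keeps the contract logic testable without Docker. The sketch below assumes the JSON renders `ports` as objects with a `published` field and `environment` as a key/value map; verify both against the installed Compose version. The service names come from the final service list in Step 3, and `composeContractProblems` is a hypothetical helper (health-check targets are left out).

```javascript
// Host ports a service publishes, as numbers.
function publishedPorts(service = {}) {
  return (service.ports || []).map((port) => Number(port.published))
}

// Return a list of contract violations; an empty list means the config conforms.
function composeContractProblems(config) {
  const services = config.services || {}
  const env = (name) => (services[name] && services[name].environment) || {}
  const problems = []

  const gatewayPorts = { "hermes-pre-api": 8645, "hermes-post-api": 8646 }
  for (const [name, port] of Object.entries(gatewayPorts)) {
    if (!publishedPorts(services[name]).includes(port)) problems.push(`${name} must publish ${port}`)
  }
  for (const name of ["hermes-pre-upstream", "hermes-post-upstream"]) {
    if (publishedPorts(services[name]).length > 0) problems.push(`${name} must not publish host ports`)
  }
  if (env("hermes-post-upstream").API_SERVER_ENABLED !== "true") {
    problems.push("hermes-post-upstream must set API_SERVER_ENABLED=true")
  }
  for (const name of ["hermes-control-plane", "hermes-pre-api", "hermes-post-api"]) {
    if (!env(name).DATABASE_URL) problems.push(`${name} must receive DATABASE_URL`)
  }
  // Admin credentials belong to the control plane only.
  for (const name of Object.keys(services)) {
    if (name !== "hermes-control-plane" && "HERMES_ADMIN_PASSWORD" in env(name)) {
      problems.push(`${name} must not receive admin credentials`)
    }
  }
  return problems
}
```

The contract test could parse the `docker compose ... config --format json` output with `JSON.parse` and assert that the returned list is empty.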

- [ ] **Step 2: Run test to verify it fails**

```bash
node --test test/compose-contract.test.cjs
```

Expected: FAIL because current native services own public ports and no gateway services exist.

- [ ] **Step 3: Update Docker and Compose**

The final services are:

```text
hermes-control-plane
hermes-pre-upstream
hermes-post-upstream
hermes-pre-api
hermes-post-api
```

Key Compose behavior:

```yaml
hermes-pre-upstream:
  expose: ["8645"]
  command: ["/bin/sh", "-lc", "exec \"$$HERMES_EXE\" proxy start --provider \"$$HERMES_PRE_AI_PROVIDER\" --host 0.0.0.0 --port 8645"]

hermes-post-upstream:
  expose: ["8642"]
  environment:
    API_SERVER_ENABLED: "true"
    API_SERVER_HOST: 0.0.0.0
    API_SERVER_PORT: 8642
  command: ["/bin/sh", "-lc", "exec \"$$HERMES_EXE\" gateway run --replace --accept-hooks"]

hermes-pre-api:
  command: ["node", "/app/api-gateway.cjs"]
  environment:
    HERMES_API_ROUTE_KIND: pre
    HERMES_UPSTREAM_URL: http://hermes-pre-upstream:8645
  ports: ["${HERMES_PRE_AI_API_PORT:-8645}:8645"]

hermes-post-api:
  command: ["node", "/app/api-gateway.cjs"]
  environment:
    HERMES_API_ROUTE_KIND: post
    HERMES_UPSTREAM_URL: http://hermes-post-upstream:8642
  ports: ["${HERMES_POST_AI_API_PORT:-8646}:8646"]
```

Update the Dockerfile to run `npm ci --omit=dev` and copy all new modules/assets/migrations.

Add to `.env.example`:

```text
DATABASE_URL=postgresql://hermes_user:change-me@postgres.example.internal:5432/hermes_control_plane
HERMES_ADMIN_USERNAME=admin
HERMES_ADMIN_PASSWORD=change-this-to-a-long-random-password
HERMES_ADMIN_SESSION_TTL_HOURS=12
HERMES_LOG_RETENTION_DAYS=90
HERMES_AUDIT_MAX_BYTES=10485760
```

- [ ] **Step 4: Run Compose and full tests**

```bash
npm run check
npm test
```

Expected: PASS. Compose output shows only the public gateway services publishing `8645` and `8646`.

- [ ] **Step 5: Commit**

```bash
git add docker-compose.yml Dockerfile .env.example .dockerignore test/compose-contract.test.cjs
git commit -m "feat: route public api traffic through managed gateway"
```

### Task 10: Documentation, Browser QA, and End-to-End Verification

**Files:**
- Modify: `README.md`
- Modify: `.gitignore`
- Create: `test/e2e-smoke.cjs`

- [ ] **Step 1: Add the smoke test**

Create a smoke test that, against a running stack configured with `TEST_DATABASE_URL`:

1. Logs into the control plane.
2. Creates a pre-only API user.
3. Calls the pre API successfully.
4. Calls the post API and receives `403`.
5. Rotates the key and verifies the old key receives `401`.
6. Downloads JSONL and verifies the successful pre request is present.

- [ ] **Step 2: Run the smoke test before documentation**

```bash
node --test test/e2e-smoke.cjs
```

Expected: PASS against the running stack; explicit SKIP when `RUN_E2E` is not set.

- [ ] **Step 3: Document deployment and security operations**

Update `README.md` with:

- Dedicated PostgreSQL database creation and least-privilege user guidance.
- Required Portainer environment variables.
- Admin login URL.
- Pre/post public API URLs.
- API-user creation/rotation/revocation behavior.
- JSONL download and 90-day retention behavior.
- Backup guidance for PostgreSQL and `.hermes/.codex/.claude/.gemini`.
- Warning that full prompts and responses are retained for 90 days.

Add `.superpowers/` to `.gitignore`.

- [ ] **Step 4: Run final automated verification**

```bash
npm run check
npm test
git diff --check
```

Expected: all checks pass.

- [ ] **Step 5: Run browser QA**

Start the stack, then verify in a real browser at desktop and mobile widths:

- Unauthenticated `/` redirects to login.
- Login form works and errors are clear.
- API-user table aligns correctly and no cells overlap.
- Create/edit/rotate dialogs work.
- Plaintext key appears once.
- JSONL download triggers a file download.
- Revoke/delete confirmations are explicit.
- Mobile layout becomes stacked records without horizontal clipping.

- [ ] **Step 6: Commit**

```bash
git add README.md .gitignore test/e2e-smoke.cjs
git commit -m "docs: document managed hermes api access"
```

## Final Verification Checklist

- [ ] `npm run check` passes.
- [ ] `npm test` passes with PostgreSQL integration tests enabled.
- [ ] `git diff --check` passes.
- [ ] Only the managed pre/post gateway services publish AI API ports.
- [ ] Control-plane HTML, assets, and management endpoints require admin login.
- [ ] API keys are shown once and stored only as hashes.
- [ ] Pre/post permissions, rate limits, monthly limits, expiry, revoke, rotate, and delete work.
- [ ] Full prompts/responses are downloadable as JSONL and never displayed in the UI.
- [ ] Message content older than 90 days is deleted while usage metadata remains.
- [ ] Browser QA passes at desktop and mobile widths.