docs: design admin auth and api user management

2026-06-06 00:49:00 -06:00
parent 6acec6bd2f
commit fbd4e231ca
@@ -0,0 +1,345 @@
# Hermes Admin Auth, API Users, Usage, and Audit Design
## Objective
Add a secure administrative access layer and a managed public API gateway to the Hermes control plane.
The finished system will:
- Protect the entire control-plane website and all management endpoints with an environment-configured admin login.
- Let the admin create and manage API users with individually issued API keys.
- Give each API key independent access to the pre-Hermes API, post-Hermes API, or both.
- Enforce per-key requests-per-minute and monthly token limits.
- Store complete request prompts and response content for 90 days.
- Provide downloadable JSONL audit logs without displaying message content in the website.
- Use a dedicated database on an existing PostgreSQL server.
## System Architecture
The Compose stack will contain these logical services:
### Control Plane
The existing Node control-plane server continues serving the administrative UI on port `7843`.
It will:
- Require an authenticated admin session for every page and management endpoint.
- Provide API-user creation, editing, rotation, revocation, deletion, and JSONL-log download endpoints.
- Display the approved API-user management table.
- Connect to the dedicated PostgreSQL database through `DATABASE_URL`.
### Public API Gateway
A new Node gateway service will be the only externally exposed AI API.
It owns the public ports:
- `8645`: pre-Hermes OpenAI-compatible API
- `8646`: post-Hermes OpenAI-compatible API
For every request, it will:
1. Read the bearer API key.
2. Hash the key and find its API user.
3. Validate active status, expiry, and pre/post permission.
4. Enforce requests-per-minute and monthly token limits.
5. Forward the request to the correct internal Hermes upstream.
6. Stream the upstream response to the client while capturing it for audit storage.
7. Store request metadata, full prompt content, full response content, status, latency, and token usage.
8. Update the API user's last-used timestamp and usage totals.
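As a rough illustration of step 1 and of how the listening port selects the upstream, the helpers below parse the bearer header and map a public port to a route. Only the port numbers and the pre/post split come from this design; `parseBearer`, `routeForPort`, and the `Route` type are hypothetical names used for the sketch.

```typescript
// Hypothetical helpers: the names are illustrative, not part of the design.
type Route = "pre" | "post";

// Public ports as listed above: 8645 is pre-Hermes, 8646 is post-Hermes.
const ROUTE_BY_PORT: Record<number, Route> = { 8645: "pre", 8646: "post" };

// Returns the route for a listening port, or null for an unknown port.
function routeForPort(port: number): Route | null {
  return ROUTE_BY_PORT[port] ?? null;
}

// Extracts the key from an "Authorization: Bearer <key>" header value.
// Returns null when the header is missing or malformed.
function parseBearer(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}
```

A null result from `parseBearer` would correspond to the missing-key case, so the gateway can reject the request before touching the database.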
### Pre-Hermes Upstream
The pre-Hermes service runs Hermes' native direct provider proxy:
```text
hermes proxy start --provider <provider> --host 0.0.0.0 --port <internal-port>
```
This path forwards OpenAI-compatible requests directly to an authenticated provider. It does not run the Hermes agent, tools, memory, or instructions.
The service will be reachable only on the internal Compose network.
### Post-Hermes Upstream
The post-Hermes service runs:
```text
hermes gateway run
```
with Hermes' full agent API enabled through:
```text
API_SERVER_ENABLED=true
API_SERVER_HOST=0.0.0.0
API_SERVER_PORT=<internal-port>
```
This path runs requests through the full Hermes agent, including configured tools, memory, instructions, and session behavior.
The service will be reachable only on the internal Compose network.
## Authentication
### Admin Authentication
The admin username and password come from:
```text
HERMES_ADMIN_USERNAME
HERMES_ADMIN_PASSWORD
```
The control plane will not start in an externally accessible configuration if either value is missing or weak.
Login behavior:
- Credentials are compared using timing-safe comparisons.
- A successful login creates a cryptographically random admin session token.
- Only a hash of the session token is stored in PostgreSQL.
- The browser receives the token in an HTTP-only, SameSite cookie.
- Sessions have a configurable lifetime and can be invalidated through logout.
- Every control-plane page, management endpoint, and application asset requires a valid session. Only the login endpoint, its dedicated minimal assets, and health endpoints are unauthenticated.
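The login behavior above might look roughly like the following sketch, built on Node's `node:crypto` module. The function names, the cookie name `hermes_admin_session`, the 32-byte token size, and the choice of `SameSite=Strict` are assumptions made for illustration; the design only specifies a timing-safe comparison, a random token, a stored hash, and an HTTP-only SameSite cookie.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// timingSafeEqual throws when buffer lengths differ, so both inputs are
// hashed to a fixed 32-byte digest before comparison.
function safeEqual(a: string, b: string): boolean {
  const ha = createHash("sha256").update(a).digest();
  const hb = createHash("sha256").update(b).digest();
  return timingSafeEqual(ha, hb);
}

// Both comparisons always run, so a wrong username does not skip the
// password comparison.
function checkAdminLogin(
  username: string,
  password: string,
  expected: { username: string; password: string },
): boolean {
  const userOk = safeEqual(username, expected.username);
  const passOk = safeEqual(password, expected.password);
  return userOk && passOk;
}

// Creates a random session token. Only tokenHash would be written to
// admin_sessions; the plaintext token goes to the browser.
function newSession(): { token: string; tokenHash: string } {
  const token = randomBytes(32).toString("hex"); // 32 bytes is an assumption
  const tokenHash = createHash("sha256").update(token).digest("hex");
  return { token, tokenHash };
}

// Builds the Set-Cookie value. Cookie name and SameSite=Strict are assumptions.
function sessionCookie(token: string, maxAgeSeconds: number): string {
  return `hermes_admin_session=${token}; HttpOnly; SameSite=Strict; Path=/; Max-Age=${maxAgeSeconds}`;
}
```

On each later request, the server would hash the cookie value the same way and look up the hash, so a database leak does not expose usable session tokens.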
### API-Key Authentication
API keys will use a recognizable prefix and random secret, such as:
```text
hms_<random-secret>
```
Rules:
- The plaintext key is shown only once, immediately after creation or rotation.
- PostgreSQL stores only a SHA-256 hash and a short display suffix.
- Rotating a key invalidates the previous key immediately.
- Revoking or deleting an API user invalidates its key immediately.
- Each key independently permits pre-Hermes, post-Hermes, or both.
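A minimal sketch of key issuance under these rules follows. The `hms_` prefix, SHA-256 hash, and display suffix come from the design; the 32-byte secret length, the 4-character suffix, and the names `issueApiKey` and `hashApiKey` are assumptions for illustration.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface IssuedKey {
  plaintext: string;     // shown to the admin once, never stored
  keyHash: string;       // SHA-256 hex digest, stored in api_keys
  displaySuffix: string; // short suffix for the masked UI column
}

// Hashes a presented or newly issued key for lookup and storage.
function hashApiKey(plaintext: string): string {
  return createHash("sha256").update(plaintext).digest("hex");
}

// Generates a new key. Secret length and suffix length are assumptions.
function issueApiKey(): IssuedKey {
  const plaintext = `hms_${randomBytes(32).toString("hex")}`;
  return {
    plaintext,
    keyHash: hashApiKey(plaintext),
    displaySuffix: plaintext.slice(-4),
  };
}
```

The gateway would call `hashApiKey` on each incoming bearer key and look up the digest. Because the secret is long and random, an unsalted fast hash is generally considered acceptable here, unlike for human-chosen passwords.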
## Limits and Enforcement
Each API user has:
- Requests-per-minute limit
- Monthly token limit
- Optional expiration timestamp
- Active or revoked status
- Pre-Hermes permission
- Post-Hermes permission
The gateway will enforce limits using PostgreSQL transactions so behavior remains consistent across restarts and multiple gateway instances.
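To make the requests-per-minute check concrete, here is a sketch of the counting logic only. It assumes a fixed one-minute window, which the design does not specify, and uses an in-memory `Map` as a stand-in for a counter row that the real gateway would read and increment inside a PostgreSQL transaction. `MinuteWindowCounter` is a hypothetical name.

```typescript
// In-memory stand-in for a per-user, per-minute counter row.
// Assumption: fixed windows aligned to whole UTC minutes.
class MinuteWindowCounter {
  private readonly counts = new Map<string, number>();

  // Returns true and records the request if the user is under the limit
  // for the minute containing `now`; returns false otherwise.
  tryConsume(apiUserId: string, limitPerMinute: number, now: Date): boolean {
    const bucket = Math.floor(now.getTime() / 60000);
    const key = `${apiUserId}:${bucket}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= limitPerMinute) return false;
    this.counts.set(key, used + 1);
    return true;
  }
}
```

In the database-backed version, the read-and-increment would need to be atomic (for example, a single conditional upsert) so that multiple gateway instances cannot both admit the last permitted request.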
OpenAI-compatible errors:
- `401`: missing or invalid API key
- `403`: valid key without permission for the selected API
- `410`: expired or revoked API key
- `429`: requests-per-minute or monthly token limit exceeded
- `502`: selected Hermes upstream unavailable
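The mapping from validation outcome to status code could be expressed as a single decision function, sketched below. The check order follows the gateway steps (status and expiry, then permission, then limits); the `KeyRecord` shape and the name `authorize` are assumptions, and `502` is omitted because it arises later, when the upstream call fails.

```typescript
type Route = "pre" | "post";

// Hypothetical view of the fields the gateway needs after the key lookup.
interface KeyRecord {
  status: "active" | "revoked";
  expiresAt: Date | null;
  allowPre: boolean;
  allowPost: boolean;
}

type Decision = { ok: true } | { ok: false; status: 401 | 403 | 410 | 429 };

// `record` is null when no key matches the presented hash (or none was sent).
// `overLimit` is the combined result of the per-minute and monthly checks.
function authorize(
  record: KeyRecord | null,
  route: Route,
  now: Date,
  overLimit: boolean,
): Decision {
  if (record === null) return { ok: false, status: 401 };
  const expired =
    record.expiresAt !== null && record.expiresAt.getTime() <= now.getTime();
  if (record.status === "revoked" || expired) return { ok: false, status: 410 };
  const permitted = route === "pre" ? record.allowPre : record.allowPost;
  if (!permitted) return { ok: false, status: 403 };
  if (overLimit) return { ok: false, status: 429 };
  return { ok: true };
}
```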
Monthly usage is calculated by UTC calendar month.
Before forwarding a request, the gateway rejects keys whose recorded monthly usage has already reached the configured limit. After completion, it reconciles the request using the upstream's reported token usage. A single in-flight request can therefore push usage above the monthly limit; subsequent requests are blocked until usage resets at the start of the next UTC calendar month.
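A small worked sketch of the UTC month window and the pre-check follows. The function names are hypothetical; the logic shows why one request can overshoot: the check runs before the token count is known.

```typescript
// First instant of the UTC calendar month containing `now`.
function utcMonthStart(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
}

// First instant of the following UTC month. Date.UTC rolls month 12 over
// into January of the next year.
function nextUtcMonthStart(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1));
}

// Pre-check: forward only while recorded usage is below the limit.
function mayForward(usedThisMonth: number, monthlyLimit: number): boolean {
  return usedThisMonth < monthlyLimit;
}

// Example: limit 1000, 990 used. The request is admitted, the upstream
// reports 50 tokens, and reconciled usage becomes 1040. The next request
// is rejected.
const limit = 1000;
let used = 990;
const admitted = mayForward(used, limit); // true
used += 50;                               // reconcile with reported usage
const admittedNext = mayForward(used, limit); // false
```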
## PostgreSQL Data Model
The system uses a dedicated PostgreSQL database supplied through:
```text
DATABASE_URL
```
Required logical tables:
### `admin_sessions`
- Session token hash
- Created timestamp
- Expiration timestamp
- Last-seen timestamp
- Revoked timestamp
### `api_users`
- Stable ID
- Display name
- Status
- Pre-Hermes permission
- Post-Hermes permission
- Requests-per-minute limit
- Monthly token limit
- Expiration timestamp
- Created timestamp
- Updated timestamp
- Last-used timestamp
- Revoked timestamp
### `api_keys`
- Stable ID
- API-user ID
- Key hash
- Key display suffix
- Created timestamp
- Revoked timestamp
The schema supports key history while allowing only one active key per API user.
### `usage_events`
- API-user ID
- API-key ID
- Pre/post route
- Request timestamp
- Completion timestamp
- HTTP status
- Model
- Prompt tokens
- Completion tokens
- Total tokens
- Latency
- Error code
### `message_logs`
- Usage-event ID
- Full request body
- Full captured response
- Response content type
- Streaming flag
- Created timestamp
- Automatic deletion timestamp
Request and response bodies are stored as JSONB when valid JSON and as text when an upstream returns non-JSON content.
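The JSONB-or-text choice can be made by simply attempting to parse the captured body, as in this sketch. `StoredBody` and `classifyBody` are hypothetical names, and how the two variants map onto columns is left open.

```typescript
// Tagged result telling the writer which column type to use.
type StoredBody =
  | { kind: "jsonb"; value: unknown }
  | { kind: "text"; value: string };

// Valid JSON is stored structurally; anything else is kept verbatim as text.
function classifyBody(raw: string): StoredBody {
  try {
    return { kind: "jsonb", value: JSON.parse(raw) as unknown };
  } catch {
    return { kind: "text", value: raw };
  }
}
```

A captured server-sent-events stream (lines beginning with `data:`) is not a single JSON document, so under this sketch it would fall into the text branch.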
## Log Retention and Downloads
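Full prompt and response logs are retained for 90 days, the default value of `HERMES_LOG_RETENTION_DAYS`. Because deletion runs on a schedule, a log may persist until the first cleanup run after its deletion timestamp.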
A scheduled cleanup task deletes expired `message_logs` only. `usage_events` remain available so aggregate usage totals and operational audit metadata survive after message content expires.
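The automatic deletion timestamp could be computed once at write time, which lets the cleanup task compare a single column against the current time. The sketch below assumes that approach; `deleteAfter` and `isExpired` are hypothetical names.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Deletion timestamp for a message log created at `createdAt`.
function deleteAfter(createdAt: Date, retentionDays: number = 90): Date {
  return new Date(createdAt.getTime() + retentionDays * MS_PER_DAY);
}

// True when the cleanup task may remove the message log.
function isExpired(log: { deleteAfter: Date }, now: Date): boolean {
  return log.deleteAfter.getTime() <= now.getTime();
}
```

Only `message_logs` rows would be matched by such a check; the corresponding `usage_events` rows stay in place.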
Message content is never displayed in the control-plane UI.
An authenticated admin can download JSONL logs:
- For one API user
- For all API users
- Filtered by start and end timestamp
- Limited to the retained 90-day window
Each JSONL record includes:
- API user ID and display name
- Route: pre or post
- Timestamp
- Model
- Request ID
- HTTP status
- Token counts
- Latency
- Full request body
- Full response content
- Error details when applicable
Downloads stream records from PostgreSQL instead of loading the entire export into memory.
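One way to keep memory flat is to serialize records lazily, one line at a time, as in this sketch. The `AuditRecord` field names are assumptions that mirror the list above, and the input here is a plain iterable; in the real service it would presumably be a database cursor consumed asynchronously.

```typescript
// Hypothetical record shape mirroring the JSONL field list.
interface AuditRecord {
  apiUserId: string;
  displayName: string;
  route: "pre" | "post";
  timestamp: string;
  model: string | null;
  requestId: string;
  httpStatus: number;
  tokens: { prompt: number; completion: number; total: number };
  latencyMs: number;
  request: unknown;
  response: unknown;
  error?: string;
}

// Yields one newline-terminated JSON document per record. JSON.stringify
// escapes newlines inside strings, so each record stays on a single line.
function* toJsonl(records: Iterable<AuditRecord>): Generator<string> {
  for (const record of records) {
    yield JSON.stringify(record) + "\n";
  }
}
```

Each yielded line can be written to the HTTP response as soon as it is produced, so the export size does not bound memory use.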
## Admin User Interface
The entire control-plane website requires admin login.
After login, the existing visual language remains unchanged. A new `API Users` pane will use the approved structured table layout.
Desktop table columns:
- User and status
- Masked API-key suffix
- Access: pre, post, or both
- Monthly token and requests-per-minute limits
- Last used
- Expiration
- Actions menu
Actions:
- Create API user
- Edit display name
- Edit pre/post permissions
- Edit limits
- Set or remove expiration
- Rotate key
- Download JSONL logs
- Revoke
- Delete
The create and rotate flows show the plaintext key once and require the admin to acknowledge that it cannot be retrieved later.
On narrow screens, each table row becomes a compact stacked record with the same actions.
## Delete and Revoke Semantics
- **Revoke** disables the API user immediately but preserves the user record and allows later reactivation.
- **Delete** permanently removes the API user from the active management list and invalidates its keys.
- Message logs for deleted users remain until the normal 90-day message-log expiration; their usage events are retained like all other usage events.
- Audit records preserve the deleted user's ID and last known display name.
## Failure Handling
- Database unavailable at startup: management and public gateway services fail closed.
- Database unavailable during a request: reject the request instead of bypassing authentication or limits.
- Upstream unavailable: return an OpenAI-compatible `502` and record the failure event.
- Client disconnect during streaming: stop forwarding when possible and store the partial response with a disconnect marker.
- Log-write failure after a completed request: emit a critical service log and mark the usage event as audit-incomplete.
- Cleanup failure: retry on the next scheduled run without blocking API traffic.
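The client-disconnect case could be handled by a small capture buffer that records what was forwarded and whether the stream ended early. This is a sketch under assumptions: `ResponseCapture` is a hypothetical name, chunks are treated as already-decoded strings, and the `partial` flag stands in for the disconnect marker.

```typescript
// Accumulates streamed chunks for audit storage alongside forwarding.
class ResponseCapture {
  private readonly chunks: string[] = [];
  private disconnected = false;

  // Called for every chunk forwarded to the client. Chunks arriving after
  // a disconnect are ignored because the client never received them.
  append(chunk: string): void {
    if (!this.disconnected) this.chunks.push(chunk);
  }

  // Called when the client connection closes before the upstream finishes.
  markClientDisconnected(): void {
    this.disconnected = true;
  }

  // Body to store in message_logs, plus the disconnect marker.
  result(): { body: string; partial: boolean } {
    return { body: this.chunks.join(""), partial: this.disconnected };
  }
}
```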
## Deployment Configuration
New required environment variables:
```text
DATABASE_URL
HERMES_ADMIN_USERNAME
HERMES_ADMIN_PASSWORD
```
Additional configurable values will include:
```text
HERMES_ADMIN_SESSION_TTL_HOURS
HERMES_LOG_RETENTION_DAYS=90
HERMES_PRE_UPSTREAM_URL
HERMES_POST_UPSTREAM_URL
```
The pre/post native Hermes services will no longer publish their internal ports. Only the public gateway publishes `8645` and `8646`.
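Startup validation of these variables might look like the sketch below, which throws so the service fails closed. The design does not define "weak", so the 16-character minimum is purely an assumption, as are the 12-hour session default and the names `loadConfig` and `AdminConfig`. The 90-day retention default comes from the list above.

```typescript
interface AdminConfig {
  databaseUrl: string;
  adminUsername: string;
  adminPassword: string;
  sessionTtlHours: number;
  logRetentionDays: number;
}

const MIN_PASSWORD_LENGTH = 16;      // assumption: "weak" is not defined in the design
const DEFAULT_SESSION_TTL_HOURS = 12; // assumption: no default is specified

// Parses a positive integer setting, falling back to a default when unset.
function positiveInt(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer`);
  }
  return value;
}

// Throws when a required variable is missing or the password is too short.
function loadConfig(env: Record<string, string | undefined>): AdminConfig {
  const required = ["DATABASE_URL", "HERMES_ADMIN_USERNAME", "HERMES_ADMIN_PASSWORD"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const adminPassword = env["HERMES_ADMIN_PASSWORD"] ?? "";
  if (adminPassword.length < MIN_PASSWORD_LENGTH) {
    throw new Error("HERMES_ADMIN_PASSWORD is too short");
  }
  return {
    databaseUrl: env["DATABASE_URL"] ?? "",
    adminUsername: env["HERMES_ADMIN_USERNAME"] ?? "",
    adminPassword,
    sessionTtlHours: positiveInt("HERMES_ADMIN_SESSION_TTL_HOURS", env["HERMES_ADMIN_SESSION_TTL_HOURS"], DEFAULT_SESSION_TTL_HOURS),
    logRetentionDays: positiveInt("HERMES_LOG_RETENTION_DAYS", env["HERMES_LOG_RETENTION_DAYS"], 90),
  };
}
```

A service entry point would typically call `loadConfig(process.env)` before opening any listening socket and exit with a non-zero status if it throws.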
## Testing and Verification
Automated tests will cover:
- Admin login success and failure
- Timing-safe credential checks
- Session creation, expiry, logout, and revocation
- Protection of control-plane HTML, application assets, and every management endpoint, while keeping only login assets and health endpoints public
- API-user creation, editing, rotation, revocation, deletion, and expiry
- Plaintext keys shown once and hashes stored at rest
- Pre/post permission enforcement
- Requests-per-minute enforcement
- Monthly UTC token-limit enforcement
- Non-streaming forwarding and logging
- Streaming forwarding, capture, partial responses, and disconnect handling
- Prompt and response JSONL downloads
- 90-day retention cleanup
- Database migration idempotency
- Failure-closed behavior when PostgreSQL is unavailable
- Compose service routing and health checks
Manual verification will confirm:
- The login screen and API-user table match the existing control-plane style.
- API keys work against the permitted public endpoint and fail against forbidden endpoints.
- Last-used, limits, status, and expiration display correctly.
- Rotated, revoked, deleted, and expired keys stop working immediately.
- Downloaded JSONL contains complete retained prompts and responses.