Route APIs through shared Hermes gateway

2026-06-08 20:47:15 -06:00
parent ef7da651ac
commit c645027805
10 changed files with 378 additions and 81 deletions
@@ -8,9 +8,12 @@ HERMES_PRE_AI_API_PORT=8645
 HERMES_POST_AI_API_PORT=8646
 HERMES_PUBLISHED_BIND_IP=127.0.0.1

-HERMES_PRE_AI_PROVIDER=nous
-HERMES_POST_AI_PROVIDER=nous
 HERMES_INTERNAL_API_SERVER_KEY=change-this-to-a-separate-long-random-key
+HERMES_DEFAULT_PROVIDER=openai-codex
+HERMES_DEFAULT_THINKING_EFFORT=medium
+HERMES_DEFAULT_CLAUDE_MODEL=claude-sonnet-4.6
+HERMES_DEFAULT_CODEX_MODEL=gpt-5.4-codex
+HERMES_DEFAULT_GEMINI_MODEL=gemini-3.5-flash

 HERMES_HOME_HOST=/opt/hermes-control-plane/hermes
 CODEX_HOME_HOST=/opt/hermes-control-plane/codex
@@ -45,7 +45,9 @@ HERMES_LOG_RETENTION_DAYS      Audit log retention in days (default: 90)
 HERMES_AUDIT_MAX_BYTES         Max bytes per logged request body (default: 65536)
 HERMES_IMAGE                   Registry image Portainer pulls for app services
 HERMES_AGENT_REF               Pinned Hermes source revision baked into the image
-HERMES_INTERNAL_API_SERVER_KEY Separate internal key required when enabling the post gateway
+HERMES_INTERNAL_API_SERVER_KEY Separate internal key required when enabling API gateways
+HERMES_DEFAULT_PROVIDER        Provider used when an API request omits model/provider (default: openai-codex)
+HERMES_DEFAULT_THINKING_EFFORT Effort used when an API request omits reasoning/thinking effort (default: medium)
 ```

 ## Admin Login
@@ -62,19 +64,19 @@ If the admin UI is exposed through an HTTPS reverse proxy, set `HERMES_ADMIN_COO

 API keys start with `hms_`. Each key is scoped to allow the pre gateway, the post gateway, or both.

-The native gateways are intentionally opt-in so missing provider authentication
-cannot break the control plane:
+The native API gateways are opt-in so missing provider authentication cannot
+break the control plane. Both public API wrappers forward to one internal
+Hermes gateway. That internal gateway uses the configured primary model and
+`fallback_providers` chain, so the pre API is not pinned to one upstream
+provider.

 ```bash
-# Requires Hermes Nous or xAI authentication in HERMES_HOME_HOST
-docker compose --profile pre-gateway up --build -d
-
-# Starts the native Hermes tool/API gateway
-docker compose --profile post-gateway up --build -d
+# Starts the shared internal Hermes gateway plus both public API wrappers.
+docker compose --profile pre-gateway --profile post-gateway up --build -d
 ```

- **Pre API** (`http://<host>:8645/v1/chat/completions`) — for requests sent before Hermes processing.
- **Post API** (`http://<host>:8646/v1/chat/completions`) — for requests sent after Hermes processing.
+- **Pre API** (`http://<host>:8645/v1/chat/completions`) — public OpenAI-compatible API with `pre` API-key permissions and audit labels.
+- **Post API** (`http://<host>:8646/v1/chat/completions`) — public OpenAI-compatible API with `post` API-key permissions and audit labels.

 Send requests with a Bearer token:

@@ -84,6 +86,20 @@ Authorization: Bearer hms_...
 Content-Type: application/json
 ```

+If a JSON `POST` to `/v1/chat/completions` or `/v1/responses` omits `model`, the wrapper fills one in
+for the provider named in `provider`, `hermes_provider`, or `metadata.provider`, else the default provider:
+
+```text
+Claude:  claude-sonnet-4.6
+Codex:   gpt-5.4-codex
+Gemini:  gemini-3.5-flash
+```
+
+Clients can still set the normal OpenAI-compatible `model` field. For thinking
+effort, clients can send `reasoning_effort`, `thinking_effort`, `reasoning.effort`,
+or `thinking.effort`; the wrapper forwards the first one set, in that order, as
+`reasoning.effort`. If none is set, it defaults to `medium`.
+
 A key not authorized for a gateway returns `403 Forbidden`. A revoked or rotated key returns `410 Gone`.
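Review note: a minimal client-side sketch for exercising the documented defaults. `buildPreApiRequest` and `explainGatewayStatus` are hypothetical helper names, not part of this commit. The port, path, Bearer header, and the 403/410 meanings are taken from the README text in this hunk. The body deliberately omits `model` and any effort field, so the wrapper would be expected to fill in both.

```javascript
// Hypothetical helper: builds the URL and fetch() options for the pre API.
// Port 8645 and the hms_ Bearer token format come from the README.
function buildPreApiRequest(host, apiKey, prompt) {
  return {
    url: `http://${host}:8645/v1/chat/completions`,
    options: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // No model and no effort field: the wrapper is expected to supply defaults.
      body: JSON.stringify({ messages: [{ role: "user", content: prompt }] }),
    },
  }
}

// Hypothetical helper: maps the documented status codes to a short hint.
function explainGatewayStatus(status) {
  if (status === 403) return "key is not authorized for this gateway"
  if (status === 410) return "key was revoked or rotated"
  return status >= 200 && status < 300 ? "ok" : `unexpected status ${status}`
}
```

On a Node version with a global `fetch`, the result could be sent with `const { url, options } = buildPreApiRequest("127.0.0.1", key, "hi")` followed by `await fetch(url, options)`.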

 ## API Key Management
@@ -252,34 +268,38 @@ default Hermes `config.yaml` only when one does not already exist. Existing
 configuration and authentication files are preserved.

 The default Portainer deployment starts only PostgreSQL and the control plane.
-This is deliberate: Codex, Claude, and Gemini auth mounts are optional, and the
-native pre-Hermes proxy supports only providers reported by `hermes proxy providers`.
-This Hermes build currently reports `nous` and `xai`.
+This is deliberate: Codex, Claude, and Gemini auth mounts are optional. Enable
+both API profiles when you want the OpenAI-compatible endpoints.

 ```text
 Default services: hermes-postgres, hermes-control-plane
-Pre profile:      hermes-pre-upstream, hermes-pre-api
-Post profile:     hermes-post-upstream, hermes-post-api
+Pre profile:      hermes-ai-upstream, hermes-pre-api
+Post profile:     hermes-ai-upstream, hermes-post-api
 ```

 Enable optional gateway profiles only after their prerequisites are configured.
-With Docker Compose, use `--profile pre-gateway` or `--profile post-gateway`.
-In a Portainer version that exposes Compose profiles, enable the matching
-profile during stack deployment.
+With Docker Compose, use `--profile pre-gateway --profile post-gateway`. In
+Portainer, set `COMPOSE_PROFILES=pre-gateway,post-gateway` in the stack
+environment if the UI does not expose profile toggles.

 Set these in Portainer when enabling gateways:

 ```text
+COMPOSE_PROFILES=pre-gateway,post-gateway
 HERMES_PRE_AI_API_PORT=8645
-HERMES_PRE_AI_PROVIDER=nous
 HERMES_POST_AI_API_PORT=8646
-HERMES_POST_AI_PROVIDER=nous
 HERMES_INTERNAL_API_SERVER_KEY=<separate-long-random-key>
+HERMES_DEFAULT_PROVIDER=openai-codex
+HERMES_DEFAULT_THINKING_EFFORT=medium
+HERMES_DEFAULT_CLAUDE_MODEL=claude-sonnet-4.6
+HERMES_DEFAULT_CODEX_MODEL=gpt-5.4-codex
+HERMES_DEFAULT_GEMINI_MODEL=gemini-3.5-flash
 ```

-The public post API accepts the control plane's `hms_` user keys. Internally it
-replaces that header with `HERMES_INTERNAL_API_SERVER_KEY` before forwarding to
-Hermes. Do not reuse the admin password or expose the native upstream ports.
+The public pre and post APIs accept the control plane's `hms_` user keys.
+Internally they replace that header with `HERMES_INTERNAL_API_SERVER_KEY`
+before forwarding to Hermes. Do not reuse the admin password or expose the
+native upstream port.
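Review note: the key swap described above might look roughly like the sketch below. This is an illustration only. The body of `upstreamRequestHeaders` is not shown in this diff, so `swapAuthorization` is a hypothetical name, and sending the internal key as a `Bearer` token is an assumption.

```javascript
// Hypothetical sketch of the key swap; the real upstreamRequestHeaders body
// is not visible in this commit view.
function swapAuthorization(clientHeaders, internalKey) {
  const headers = {}
  for (const [name, value] of Object.entries(clientHeaders)) {
    // Drop whatever credential the client sent, regardless of header casing.
    if (name.toLowerCase() === "authorization") continue
    headers[name] = value
  }
  // Assumption: the native Hermes API accepts the internal key as a Bearer token.
  if (internalKey) headers["authorization"] = `Bearer ${internalKey}`
  return headers
}
```

The point of the design is that the `hms_` user key never leaves the wrapper, so the upstream only ever sees the internal key.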

 ## Backup

@@ -17,6 +17,13 @@ const GATEWAY_PORT = parseInt(process.env.HERMES_API_GATEWAY_PORT || "8080", 10)
 const UPSTREAM_URL = process.env.HERMES_UPSTREAM_URL
 const UPSTREAM_API_KEY = process.env.HERMES_UPSTREAM_API_KEY || ""
 const AUDIT_MAX_BYTES = parseInt(process.env.HERMES_AUDIT_MAX_BYTES || "10485760", 10)
+const DEFAULT_PROVIDER = normalizeProviderName(process.env.HERMES_DEFAULT_PROVIDER || "openai-codex")
+const DEFAULT_THINKING_EFFORT = process.env.HERMES_DEFAULT_THINKING_EFFORT || "medium"
+const DEFAULT_MODELS = {
+  anthropic: process.env.HERMES_DEFAULT_CLAUDE_MODEL || "claude-sonnet-4.6",
+  "openai-codex": process.env.HERMES_DEFAULT_CODEX_MODEL || "gpt-5.4-codex",
+  "google-gemini-cli": process.env.HERMES_DEFAULT_GEMINI_MODEL || "gemini-3.5-flash",
+}

 // Validate required env vars
 if (!DATABASE_URL) {
@@ -64,6 +71,73 @@ function upstreamRequestHeaders(headers) {
  return result
 }

+// ─── OpenAI-compatible request defaults ──────────────────────────────────────
+
+function normalizeProviderName(provider) {
+  const raw = String(provider || "").trim().toLowerCase()
+  if (["claude", "anthropic"].includes(raw)) return "anthropic"
+  if (["codex", "chatgpt", "openai", "openai-codex"].includes(raw)) return "openai-codex"
+  if (["gemini", "google", "google-gemini-cli"].includes(raw)) return "google-gemini-cli"
+  return raw
+}
+
+function defaultModelForProvider(provider) {
+  return DEFAULT_MODELS[normalizeProviderName(provider)] || DEFAULT_MODELS[DEFAULT_PROVIDER] || DEFAULT_MODELS["openai-codex"]
+}
+
+function providerFromRequestBody(body) {
+  return normalizeProviderName(
+    body.provider ||
+    body.hermes_provider ||
+    body.metadata?.provider ||
+    DEFAULT_PROVIDER
+  )
+}
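Review note: a worked example of the provider-to-model lookup above, as a standalone sketch. The alias lists and default model names are copied from this diff; environment overrides are omitted, and `modelForBody` is a hypothetical name that combines `providerFromRequestBody` and `defaultModelForProvider`.

```javascript
// Defaults copied from the diff; environment overrides are left out for brevity.
const DEFAULT_PROVIDER = "openai-codex"
const DEFAULT_MODELS = {
  anthropic: "claude-sonnet-4.6",
  "openai-codex": "gpt-5.4-codex",
  "google-gemini-cli": "gemini-3.5-flash",
}

// Same alias handling as normalizeProviderName in the diff.
function normalizeProviderName(provider) {
  const raw = String(provider || "").trim().toLowerCase()
  if (["claude", "anthropic"].includes(raw)) return "anthropic"
  if (["codex", "chatgpt", "openai", "openai-codex"].includes(raw)) return "openai-codex"
  if (["gemini", "google", "google-gemini-cli"].includes(raw)) return "google-gemini-cli"
  return raw
}

// Hypothetical helper: picks the provider hint, then the default model for it.
function modelForBody(body) {
  const provider = normalizeProviderName(
    body.provider || body.hermes_provider || body.metadata?.provider || DEFAULT_PROVIDER
  )
  return DEFAULT_MODELS[provider] || DEFAULT_MODELS[DEFAULT_PROVIDER]
}

console.log(modelForBody({}))                                   // gpt-5.4-codex
console.log(modelForBody({ provider: "Claude" }))               // claude-sonnet-4.6
console.log(modelForBody({ metadata: { provider: "gemini" } })) // gemini-3.5-flash
console.log(modelForBody({ provider: "deepseek" }))             // gpt-5.4-codex
```

The last call shows that a provider without an entry in `DEFAULT_MODELS` falls back to the default provider's model. Whether that is intended for providers such as `deepseek` is not stated in this commit.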
+
+function shouldNormalizeOpenAiBody(req) {
+  if (req.method !== "POST") return false
+  const path = String(req.url || "").split("?")[0]
+  return path === "/v1/chat/completions" || path === "/v1/responses"
+}
+
+function normalizeOpenAiRequestBody(req, requestBodyBuffer) {
+  if (!shouldNormalizeOpenAiBody(req) || !requestBodyBuffer.length) {
+    return { buffer: requestBodyBuffer, json: null, model: null }
+  }
+
+  let parsed
+  try {
+    parsed = JSON.parse(requestBodyBuffer.toString("utf8"))
+  } catch {
+    return { buffer: requestBodyBuffer, json: null, model: null }
+  }
+  if (!parsed || typeof parsed !== "object" || Array.isArray(parsed)) {
+    return { buffer: requestBodyBuffer, json: parsed, model: null }
+  }
+
+  const normalized = { ...parsed }
+  if (!normalized.model) {
+    normalized.model = defaultModelForProvider(providerFromRequestBody(normalized))
+  }
+
+  const explicitEffort =
+    normalized.reasoning_effort ||
+    normalized.thinking_effort ||
+    normalized.reasoning?.effort ||
+    normalized.thinking?.effort ||
+    DEFAULT_THINKING_EFFORT
+
+  normalized.reasoning = {
+    ...(normalized.reasoning && typeof normalized.reasoning === "object" ? normalized.reasoning : {}),
+    effort: explicitEffort,
+  }
+  delete normalized.reasoning_effort
+  delete normalized.thinking_effort
+
+  const buffer = Buffer.from(JSON.stringify(normalized), "utf8")
+  return { buffer, json: normalized, model: normalized.model || null }
+}
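Review note: the effort handling in `normalizeOpenAiRequestBody` can be traced with a standalone sketch. `applyEffortDefault` is a hypothetical name, and the `summary` field is only there to show that other keys under `reasoning` survive; the precedence chain and the two `delete` calls mirror the diff.

```javascript
// Hypothetical standalone version of the effort handling in the diff.
function applyEffortDefault(body, defaultEffort = "medium") {
  // First truthy value wins, in the same order as the diff.
  const effort =
    body.reasoning_effort ||
    body.thinking_effort ||
    body.reasoning?.effort ||
    body.thinking?.effort ||
    defaultEffort

  const out = { ...body }
  out.reasoning = {
    ...(out.reasoning && typeof out.reasoning === "object" ? out.reasoning : {}),
    effort,
  }
  // Only the two flat aliases are removed; a nested `thinking` object is kept.
  delete out.reasoning_effort
  delete out.thinking_effort
  return out
}

console.log(applyEffortDefault({}))
// { reasoning: { effort: 'medium' } }
console.log(applyEffortDefault({ thinking_effort: "high" }))
// { reasoning: { effort: 'high' } }
console.log(applyEffortDefault({ reasoning_effort: "low", reasoning: { effort: "high", summary: "auto" } }))
// { reasoning: { effort: 'low', summary: 'auto' } }
console.log(applyEffortDefault({ thinking: { effort: "high" } }))
// { thinking: { effort: 'high' }, reasoning: { effort: 'high' } }
```

Two things stand out: a flat `reasoning_effort` overrides a nested `reasoning.effort` when both are present, and a nested `thinking` object is forwarded upstream unchanged alongside the new `reasoning` object.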
+
 // ─── Upstream request ─────────────────────────────────────────────────────────

 /**
@@ -279,15 +353,23 @@ async function handleRequest(req, res, pool) {
    return
  }

+  const normalizedRequest = normalizeOpenAiRequestBody(req, requestBodyBuffer)
+  requestBodyBuffer = normalizedRequest.buffer
+
  // Extract model from request JSON if possible
  let requestModel = null
  let requestJson = null
-  try {
-    const parsed = JSON.parse(requestBodyBuffer.toString("utf8"))
-    requestModel = parsed.model || null
-    requestJson = requestBodyExceeded ? null : parsed
-  } catch {
-    // non-JSON body
+  if (normalizedRequest.json && !requestBodyExceeded) {
+    requestModel = normalizedRequest.model
+    requestJson = normalizedRequest.json
+  } else {
+    try {
+      const parsed = JSON.parse(requestBodyBuffer.toString("utf8"))
+      requestModel = parsed.model || null
+      requestJson = requestBodyExceeded ? null : parsed
+    } catch {
+      // non-JSON body
+    }
  }

  await beginUsageEvent(pool, {
@@ -2,9 +2,9 @@

 // ── Constants ──────────────────────────────────────────────────────────────
 const PROVIDERS = [
-  { id: "anthropic",         label: "Claude",   kind: "OAuth Pool", mark: "A1", default_model: "claude-opus-4.6",      oauth: true  },
-  { id: "openai-codex",      label: "Codex",    kind: "OAuth Pool", mark: "B2", default_model: "gpt-5.3-codex",        oauth: true  },
-  { id: "google-gemini-cli", label: "Gemini",   kind: "OAuth Pool", mark: "C3", default_model: "gemini-3-pro-preview", oauth: true  },
+  { id: "anthropic",         label: "Claude",   kind: "OAuth Pool", mark: "A1", default_model: "claude-sonnet-4.6",    oauth: true  },
+  { id: "openai-codex",      label: "Codex",    kind: "OAuth Pool", mark: "B2", default_model: "gpt-5.4-codex",        oauth: true  },
+  { id: "google-gemini-cli", label: "Gemini",   kind: "OAuth Pool", mark: "C3", default_model: "gemini-3.5-flash",     oauth: true  },
  { id: "deepseek",          label: "DeepSeek", kind: "API Key",    mark: "D4", default_model: "deepseek-chat",        oauth: false }
 ]

@@ -207,6 +207,9 @@ function renderProviderCards(data) {
  for (const p of PROVIDERS) {
    const pool = poolByProvider[p.id]
    const count = pool?.count ?? 0
+    const authState = pool?.authState || (p.id === "deepseek"
+      ? (data.deepseekConfigured ? { state: "authenticated", label: "Key set" } : { state: "unauthenticated", label: "No key" })
+      : { state: "unauthenticated", label: "Unauthenticated" })

    const card = document.createElement("div")
    card.className = "pcard"
@@ -251,6 +254,7 @@ function renderProviderCards(data) {
        <div class="pcard-kind">${p.kind}</div>
        <h3 class="pcard-title">${p.label}</h3>
        <div class="pcard-sub">${p.id}${count ? ` · ${count}` : ""}</div>
+        <div class="auth-pill" data-state="${escapeHtml(authState.state)}">${escapeHtml(authState.label)}</div>
      </div>
      ${credsHtml}
      <div class="pcard-actions">${actionsHtml}</div>
@@ -86,31 +86,8 @@ services:
      retries: 3
      start_period: 10s

-  hermes-pre-upstream:
-    profiles: ["pre-gateway"]
-    build: *hermes-build
-    image: *hermes-image
-    user: ${HERMES_CONTAINER_USER:-0:0}
-    restart: unless-stopped
-    expose:
-      - "8645"
-    command:
-      - /bin/sh
-      - -lc
-      - exec "$$HERMES_EXE" proxy start --provider "$$HERMES_PRE_AI_PROVIDER" --host 0.0.0.0 --port 8645
-    environment:
-      <<: *hermes-environment
-      HERMES_PRE_AI_PROVIDER: ${HERMES_PRE_AI_PROVIDER:-nous}
-    volumes: *hermes-volumes
-    healthcheck:
-      test: ["CMD-SHELL", "node -e \"fetch('http://127.0.0.1:8645/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))\""]
-      interval: 30s
-      timeout: 10s
-      retries: 3
-      start_period: 20s
-
-  hermes-post-upstream:
-    profiles: ["post-gateway"]
+  hermes-ai-upstream:
+    profiles: ["pre-gateway", "post-gateway"]
    build: *hermes-build
    image: *hermes-image
    user: ${HERMES_CONTAINER_USER:-0:0}
@@ -127,7 +104,6 @@ services:
      API_SERVER_HOST: 0.0.0.0
      API_SERVER_PORT: 8642
      API_SERVER_KEY: ${HERMES_INTERNAL_API_SERVER_KEY:-change-this-internal-hermes-key}
-      HERMES_POST_AI_PROVIDER: ${HERMES_POST_AI_PROVIDER:-nous}
    volumes: *hermes-volumes
    healthcheck:
      test: ["CMD-SHELL", "node -e \"fetch('http://127.0.0.1:8642/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))\""]
@@ -150,13 +126,19 @@ services:
      HERMES_API_ROUTE_KIND: pre
      HERMES_API_GATEWAY_HOST: 0.0.0.0
      HERMES_API_GATEWAY_PORT: 8645
-      HERMES_UPSTREAM_URL: http://hermes-pre-upstream:8645
+      HERMES_UPSTREAM_URL: http://hermes-ai-upstream:8642
+      HERMES_UPSTREAM_API_KEY: ${HERMES_INTERNAL_API_SERVER_KEY:-change-this-internal-hermes-key}
+      HERMES_DEFAULT_PROVIDER: ${HERMES_DEFAULT_PROVIDER:-openai-codex}
+      HERMES_DEFAULT_THINKING_EFFORT: ${HERMES_DEFAULT_THINKING_EFFORT:-medium}
+      HERMES_DEFAULT_CLAUDE_MODEL: ${HERMES_DEFAULT_CLAUDE_MODEL:-claude-sonnet-4.6}
+      HERMES_DEFAULT_CODEX_MODEL: ${HERMES_DEFAULT_CODEX_MODEL:-gpt-5.4-codex}
+      HERMES_DEFAULT_GEMINI_MODEL: ${HERMES_DEFAULT_GEMINI_MODEL:-gemini-3.5-flash}
      HERMES_LOG_RETENTION_DAYS: ${HERMES_LOG_RETENTION_DAYS:-90}
      HERMES_AUDIT_MAX_BYTES: ${HERMES_AUDIT_MAX_BYTES:-10485760}
    depends_on:
      hermes-postgres:
        condition: service_healthy
-      hermes-pre-upstream:
+      hermes-ai-upstream:
        condition: service_started
    healthcheck:
      test: ["CMD-SHELL", "node -e \"fetch('http://127.0.0.1:8645/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))\""]
@@ -179,14 +161,19 @@ services:
      HERMES_API_ROUTE_KIND: post
      HERMES_API_GATEWAY_HOST: 0.0.0.0
      HERMES_API_GATEWAY_PORT: 8646
-      HERMES_UPSTREAM_URL: http://hermes-post-upstream:8642
+      HERMES_UPSTREAM_URL: http://hermes-ai-upstream:8642
      HERMES_UPSTREAM_API_KEY: ${HERMES_INTERNAL_API_SERVER_KEY:-change-this-internal-hermes-key}
+      HERMES_DEFAULT_PROVIDER: ${HERMES_DEFAULT_PROVIDER:-openai-codex}
+      HERMES_DEFAULT_THINKING_EFFORT: ${HERMES_DEFAULT_THINKING_EFFORT:-medium}
+      HERMES_DEFAULT_CLAUDE_MODEL: ${HERMES_DEFAULT_CLAUDE_MODEL:-claude-sonnet-4.6}
+      HERMES_DEFAULT_CODEX_MODEL: ${HERMES_DEFAULT_CODEX_MODEL:-gpt-5.4-codex}
+      HERMES_DEFAULT_GEMINI_MODEL: ${HERMES_DEFAULT_GEMINI_MODEL:-gemini-3.5-flash}
      HERMES_LOG_RETENTION_DAYS: ${HERMES_LOG_RETENTION_DAYS:-90}
      HERMES_AUDIT_MAX_BYTES: ${HERMES_AUDIT_MAX_BYTES:-10485760}
    depends_on:
      hermes-postgres:
        condition: service_healthy
-      hermes-post-upstream:
+      hermes-ai-upstream:
        condition: service_started
    healthcheck:
      test: ["CMD-SHELL", "node -e \"fetch('http://127.0.0.1:8646/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))\""]
@@ -600,6 +600,29 @@ function matchIdentity(provider, entry, identities) {
  return candidates[entry.index - 1] || null
 }

+function authStateForPool(pool) {
+  const entries = Array.isArray(pool.entries) ? pool.entries : []
+  if (!entries.length) {
+    return { state: "unauthenticated", label: "Unauthenticated" }
+  }
+  const raw = entries.map((entry) => entry.raw || "").join(" ").toLowerCase()
+  if (/\b(exhausted|quota|usage limit|monthly limit|rate[-\s]?limit|429)\b/.test(raw)) {
+    return { state: "usage_limited", label: "Usage limit reached" }
+  }
+  if (/\b(expired|revoked|invalid|unauthorized|auth failed|401|403)\b/.test(raw)) {
+    return { state: "error", label: "Auth needs attention" }
+  }
+  return { state: "authenticated", label: "Authenticated" }
+}
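Review note: the two patterns in `authStateForPool` can be checked against sample text with a standalone sketch. `classifyRawLines` is a hypothetical name and the sample lines are invented for illustration, apart from the "monthly usage limit reached" wording used in this commit's test. What `entry.raw` actually contains in production is not shown in the diff.

```javascript
// Patterns copied from authStateForPool in the diff.
const USAGE_RE = /\b(exhausted|quota|usage limit|monthly limit|rate[-\s]?limit|429)\b/
const AUTH_RE = /\b(expired|revoked|invalid|unauthorized|auth failed|401|403)\b/

// Hypothetical helper: classifies the raw text lines of one credential pool.
function classifyRawLines(lines) {
  if (!lines.length) return "unauthenticated"
  const raw = lines.join(" ").toLowerCase()
  // The usage check runs first, so it wins when both patterns match.
  if (USAGE_RE.test(raw)) return "usage_limited"
  if (AUTH_RE.test(raw)) return "error"
  return "authenticated"
}

console.log(classifyRawLines([]))                                                    // unauthenticated
console.log(classifyRawLines(["#1 default exhausted: monthly usage limit reached"])) // usage_limited
console.log(classifyRawLines(["#1 default token expired"]))                          // error
console.log(classifyRawLines(["#1 default", "#2 work quota exceeded"]))              // usage_limited
console.log(classifyRawLines(["#1 default rate limit hit"]))                         // usage_limited
console.log(classifyRawLines(["#1 default rate-limited"]))                           // authenticated
```

Because the lines are joined before matching, one limited credential marks the whole pool as `usage_limited`. The trailing `\b` also means inflected forms such as "rate-limited" do not match, which may or may not matter depending on the wording Hermes prints.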
+
+function attachAuthStates(pools) {
+  return pools.map((pool) => ({
+    ...pool,
+    count: pool.count ?? (Array.isArray(pool.entries) ? pool.entries.length : 0),
+    authState: authStateForPool(pool),
+  }))
+}
+
 function enrichAuthPools(pools) {
  const identities = [
    ...readCodexCliIdentities(),
@@ -640,7 +663,7 @@ function enrichAuthPools(pools) {
      existingProviders.add(provider)
    }
  }
-  return enriched
+  return attachAuthStates(enriched)
 }

 function authDirForProvider(provider) {
@@ -492,6 +492,43 @@ body::after {
  margin-top: 2px;
 }

+.auth-pill {
+  display: inline-flex;
+  align-items: center;
+  width: fit-content;
+  margin-top: 7px;
+  padding: 3px 7px;
+  border: 1px solid var(--rule);
+  border-radius: 999px;
+  font-family: var(--ff-mono);
+  font-size: 9px;
+  font-weight: 700;
+  letter-spacing: 0.12em;
+  text-transform: uppercase;
+  color: var(--ink-2);
+  background: rgba(255,255,255,0.03);
+}
+
+.auth-pill[data-state="authenticated"] {
+  border-color: rgba(168, 210, 122, 0.45);
+  color: #a8d27a;
+}
+
+.auth-pill[data-state="usage_limited"] {
+  border-color: rgba(232, 138, 138, 0.55);
+  color: #e88a8a;
+}
+
+.auth-pill[data-state="error"] {
+  border-color: rgba(232, 138, 138, 0.55);
+  color: #e88a8a;
+}
+
+.auth-pill[data-state="unauthenticated"] {
+  border-color: rgba(246, 241, 227, 0.22);
+  color: var(--ink-3);
+}
+
 .pcard-creds {
  list-style: none;
  margin: 0; padding: 0;
@@ -393,6 +393,93 @@ test("api-gateway integration", { timeout: 60000 }, async (t) => {
      assert.ok(body.usage, "should have usage")
    })

+    await t.test("OpenAI-compatible requests get default model and medium reasoning effort", async () => {
+      let receivedBody = null
+      const captureUpstream = await startFakeUpstream((req, res) => {
+        let raw = ""
+        req.setEncoding("utf8")
+        req.on("data", (c) => { raw += c })
+        req.on("end", () => {
+          receivedBody = JSON.parse(raw)
+          res.writeHead(200, { "Content-Type": "application/json" })
+          res.end(JSON.stringify({ model: receivedBody.model, usage: {} }))
+        })
+      })
+      const defaultsGw = await startGateway({
+        databaseUrl,
+        routeKind: "pre",
+        upstreamUrl: captureUpstream.url,
+        extraEnv: {
+          HERMES_DEFAULT_PROVIDER: "openai-codex",
+          HERMES_DEFAULT_CODEX_MODEL: "gpt-5.4-codex",
+        },
+      })
+
+      try {
+        const { status } = await gatewayRequest({
+          host: defaultsGw.host,
+          port: defaultsGw.port,
+          path: "/v1/chat/completions",
+          method: "POST",
+          headers: {
+            "Content-Type": "application/json",
+            "Authorization": `Bearer ${preKey}`,
+          },
+          body: JSON.stringify({ messages: [{ role: "user", content: "hi" }] }),
+        })
+        assert.equal(status, 200)
+        assert.equal(receivedBody.model, "gpt-5.4-codex")
+        assert.deepEqual(receivedBody.reasoning, { effort: "medium" })
+      } finally {
+        await defaultsGw.close()
+        await captureUpstream.close()
+      }
+    })
+
+    await t.test("request model and thinking_effort override gateway defaults", async () => {
+      let receivedBody = null
+      const captureUpstream = await startFakeUpstream((req, res) => {
+        let raw = ""
+        req.setEncoding("utf8")
+        req.on("data", (c) => { raw += c })
+        req.on("end", () => {
+          receivedBody = JSON.parse(raw)
+          res.writeHead(200, { "Content-Type": "application/json" })
+          res.end(JSON.stringify({ model: receivedBody.model, usage: {} }))
+        })
+      })
+      const defaultsGw = await startGateway({
+        databaseUrl,
+        routeKind: "pre",
+        upstreamUrl: captureUpstream.url,
+      })
+
+      try {
+        const { status } = await gatewayRequest({
+          host: defaultsGw.host,
+          port: defaultsGw.port,
+          path: "/v1/responses",
+          method: "POST",
+          headers: {
+            "Content-Type": "application/json",
+            "Authorization": `Bearer ${preKey}`,
+          },
+          body: JSON.stringify({
+            model: "claude-sonnet-4.6",
+            thinking_effort: "high",
+            input: "hi",
+          }),
+        })
+        assert.equal(status, 200)
+        assert.equal(receivedBody.model, "claude-sonnet-4.6")
+        assert.deepEqual(receivedBody.reasoning, { effort: "high" })
+        assert.equal(receivedBody.thinking_effort, undefined)
+      } finally {
+        await defaultsGw.close()
+        await captureUpstream.close()
+      }
+    })
+
    await t.test("configured upstream API key replaces the client API key", async () => {
      let receivedAuthorization = null
      const authenticatedUpstream = await startFakeUpstream((req, res) => {
@@ -29,27 +29,23 @@ test("compose config is valid and has correct service structure", (t) => {
  const services = config.services || {}
  const serviceNames = Object.keys(services)

-  // Must have all 6 services
+  // Must have all public services plus one shared native Hermes gateway upstream.
  assert(serviceNames.includes("hermes-postgres"), "missing hermes-postgres")
  assert(serviceNames.includes("hermes-control-plane"), "missing hermes-control-plane")
-  assert(serviceNames.includes("hermes-pre-upstream"), "missing hermes-pre-upstream")
-  assert(serviceNames.includes("hermes-post-upstream"), "missing hermes-post-upstream")
+  assert(serviceNames.includes("hermes-ai-upstream"), "missing hermes-ai-upstream")
  assert(serviceNames.includes("hermes-pre-api"), "missing hermes-pre-api")
  assert(serviceNames.includes("hermes-post-api"), "missing hermes-post-api")

  // Upstream services must NOT have host ports
-  const preUpstream = services["hermes-pre-upstream"]
-  const postUpstream = services["hermes-post-upstream"]
+  const aiUpstream = services["hermes-ai-upstream"]
  const preApi = services["hermes-pre-api"]
  const postApi = services["hermes-post-api"]

-  assert.deepEqual(preUpstream.profiles, ["pre-gateway"], "hermes-pre-upstream should be opt-in")
+  assert.deepEqual(aiUpstream.profiles, ["pre-gateway", "post-gateway"], "hermes-ai-upstream should start with either API profile")
  assert.deepEqual(preApi.profiles, ["pre-gateway"], "hermes-pre-api should be opt-in")
-  assert.deepEqual(postUpstream.profiles, ["post-gateway"], "hermes-post-upstream should be opt-in")
  assert.deepEqual(postApi.profiles, ["post-gateway"], "hermes-post-api should be opt-in")

-  assert(!preUpstream.ports || preUpstream.ports.length === 0, "hermes-pre-upstream should not publish host ports")
-  assert(!postUpstream.ports || postUpstream.ports.length === 0, "hermes-post-upstream should not publish host ports")
+  assert(!aiUpstream.ports || aiUpstream.ports.length === 0, "hermes-ai-upstream should not publish host ports")

  // Public gateway services must publish correct ports
  assert(preApi.ports && preApi.ports.length > 0, "hermes-pre-api should publish ports")
@@ -69,10 +65,10 @@ test("compose config is valid and has correct service structure", (t) => {
  assert.equal(preApiPort?.host_ip, "127.0.0.1", "pre API should bind to loopback by default")
  assert.equal(postApiPort?.host_ip, "127.0.0.1", "post API should bind to loopback by default")

-  // Post upstream must have API_SERVER_ENABLED=true
-  const postUpstreamEnv = postUpstream.environment || {}
-  assert.equal(String(postUpstreamEnv.API_SERVER_ENABLED), "true", "hermes-post-upstream API_SERVER_ENABLED must be true")
-  assert(postUpstreamEnv.API_SERVER_KEY, "hermes-post-upstream must have an internal API server key")
+  // Shared native Hermes upstream must have API_SERVER_ENABLED=true.
+  const aiUpstreamEnv = aiUpstream.environment || {}
+  assert.equal(String(aiUpstreamEnv.API_SERVER_ENABLED), "true", "hermes-ai-upstream API_SERVER_ENABLED must be true")
+  assert(aiUpstreamEnv.API_SERVER_KEY, "hermes-ai-upstream must have an internal API server key")

  // Control plane and gateway services must receive DATABASE_URL
  const controlPlaneEnv = services["hermes-control-plane"].environment || {}
@@ -89,7 +85,7 @@ test("compose config is valid and has correct service structure", (t) => {
    "/home/hermes/.claude",
    "/home/hermes/.gemini",
  ]
-  for (const serviceName of ["hermes-control-plane", "hermes-pre-upstream", "hermes-post-upstream"]) {
+  for (const serviceName of ["hermes-control-plane", "hermes-ai-upstream"]) {
    const volumes = services[serviceName].volumes || []
    for (const target of expectedStateTargets) {
      const mount = volumes.find((volume) => volume.target === target)
@@ -99,12 +95,17 @@ test("compose config is valid and has correct service structure", (t) => {

  const preApiEnv = preApi.environment || {}
  assert("DATABASE_URL" in preApiEnv, "hermes-pre-api must have DATABASE_URL")
+  assert.equal(
+    preApiEnv.HERMES_UPSTREAM_API_KEY,
+    aiUpstreamEnv.API_SERVER_KEY,
+    "pre API must authenticate to the native Hermes API with the same internal key"
+  )

  const postApiEnv = postApi.environment || {}
  assert("DATABASE_URL" in postApiEnv, "hermes-post-api must have DATABASE_URL")
  assert.equal(
    postApiEnv.HERMES_UPSTREAM_API_KEY,
-    postUpstreamEnv.API_SERVER_KEY,
+    aiUpstreamEnv.API_SERVER_KEY,
    "post API must authenticate to the native Hermes API with the same internal key"
  )

@@ -164,6 +164,8 @@ fi
  try {
    const status = await waitForServer("http://127.0.0.1:19743/api/status", proc)
    const byProvider = Object.fromEntries(status.pools.map((pool) => [pool.provider, pool]))
+    assert.strictEqual(byProvider["openai-codex"].authState.state, "authenticated")
+    assert.strictEqual(byProvider["openai-codex"].authState.label, "Authenticated")
    assert.deepStrictEqual(byProvider["openai-codex"].entries.map((entry) => entry.email), [
      "zach@example.com",
      "emma@example.com",
@@ -237,6 +239,10 @@ fi
    assert.strictEqual(byProvider.anthropic.entries[0].identity, "claude@example.com")
    assert.strictEqual(byProvider["google-gemini-cli"].entries[0].identity, "gemini@example.com")

+    assert.strictEqual(byProvider["openai-codex"].authState.state, "authenticated")
+    assert.strictEqual(byProvider.anthropic.authState.state, "authenticated")
+    assert.strictEqual(byProvider["google-gemini-cli"].authState.state, "authenticated")
+
    const remove = await postJson("http://127.0.0.1:19744/api/auth/remove", {
      provider: "openai-codex",
      index: 2,
@@ -253,7 +259,54 @@ fi
  }
 }

-main().then(directMountedAuthMain).catch((err) => {
+async function authStateMain() {
+  const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "hermes-auth-state-test-"))
+  const hermesHome = path.join(tmp, ".hermes")
+  const codexHome = path.join(tmp, ".codex")
+  fs.mkdirSync(hermesHome, { recursive: true })
+  fs.mkdirSync(codexHome, { recursive: true })
+  fs.writeFileSync(path.join(hermesHome, "config.yaml"), "fallback_providers: []\n")
+
+  const fakeHermes = path.join(tmp, "hermes")
+  fs.writeFileSync(fakeHermes, `#!/bin/sh
+if [ "$1" = "auth" ] && [ "$2" = "list" ]; then
+  cat <<'EOF'
+openai-codex (1 credential):
+  #1 default exhausted: monthly usage limit reached ←
+anthropic (0 credentials):
+EOF
+elif [ "$1" = "fallback" ] && [ "$2" = "list" ]; then
+  echo "No fallback configured"
+elif [ "$1" = "version" ]; then
+  echo "test-hermes"
+else
+  exit 0
+fi
+`)
+  fs.chmodSync(fakeHermes, 0o755)
+
+  const proc = startStatusServer({ after: (fn) => process.once("exit", fn) }, {
+    hermesHome,
+    codexHome,
+    claudeHome: path.join(tmp, ".claude"),
+    geminiHome: path.join(tmp, ".gemini"),
+    fakeHermes,
+    port: 19745,
+  })
+  try {
+    const status = await waitForServer("http://127.0.0.1:19745/api/status", proc)
+    const byProvider = Object.fromEntries(status.pools.map((pool) => [pool.provider, pool]))
+    assert.strictEqual(byProvider["openai-codex"].authState.state, "usage_limited")
+    assert.match(byProvider["openai-codex"].authState.label, /Usage limit/i)
+    assert.strictEqual(byProvider.anthropic.authState.state, "unauthenticated")
+    assert.strictEqual(byProvider.anthropic.authState.label, "Unauthenticated")
+  } finally {
+    proc.kill()
+    fs.rmSync(tmp, { recursive: true, force: true })
+  }
+}
+
+main().then(directMountedAuthMain).then(authStateMain).catch((err) => {
  console.error(err)
  process.exit(1)
 })