
API contract

Three structural contracts every Hyrax API surface honors. SDK consumers can rely on these without per-endpoint checks.

1. Stripe-style error envelope

Every 4xx/5xx response is shaped:

{
  "error": {
    "type": "validation_error",
    "code": "invalid_branch",
    "message": "github_base_branch must match an existing branch on the repo",
    "param": "github_base_branch",
    "doc_url": "https://...",
    "errors": [...],
    "request_id": "req_..."
  }
}

The nine Stripe-aligned type values:

| type | When | Example code |
| --- | --- | --- |
| validation_error | Pydantic / shape validation failure | invalid_branch, invalid_workflow |
| authentication_error | Missing or malformed hk_live_* key | missing_api_key |
| permission_error | Auth ok, but the principal can't do this | tier_mismatch, permission_denied |
| rate_limit_error | Inbound rate limit (see § Inbound API rate limits) | rate_limit_qps, rate_limit_budget |
| billing_error | 402 from the submission gate: credit wallet exhausted, or the plan doesn't include the requested capability | credit_exhausted, plan_restricted |
| not_found | Resource doesn't exist or is hidden | repo_not_found, job_not_found |
| conflict | State precondition failed | job_already_running |
| invalid_request_error | Caller-side problem outside validation | unsupported_workflow |
| server_error | 5xx | internal_error |

Validation failures list the per-field problems in errors[]. SDKs branch on the specific slug in code; type is only the broad bucket.
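To make the type/code split concrete, here is a minimal client-side sketch in Python. It assumes only the envelope fields shown above; the ApiError class, parse_error, should_retry, and the retry policy itself are hypothetical illustrations, not part of any shipped Hyrax SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ApiError:
    """Hypothetical client-side view of the error envelope."""
    status: int
    type: str
    code: str
    message: str
    param: Optional[str] = None
    request_id: Optional[str] = None
    errors: List[Any] = field(default_factory=list)


def parse_error(status: int, body: Dict[str, Any]) -> ApiError:
    """Unwrap the top-level "error" object of a 4xx/5xx response body."""
    err = body["error"]
    return ApiError(
        status=status,
        type=err["type"],
        code=err["code"],
        message=err["message"],
        param=err.get("param"),
        request_id=err.get("request_id"),
        errors=err.get("errors") or [],
    )


def should_retry(err: ApiError) -> bool:
    """Branch on the specific code first, then fall back to the broad type.

    The policy below is an assumption made for illustration only.
    """
    if err.code == "rate_limit_qps":
        return True   # assumed: a request-rate limit clears after a short wait
    if err.code == "rate_limit_budget":
        return False  # assumed: an hourly spend limit will not clear quickly
    return err.type == "server_error"
```

Keeping request_id on the parsed object means a caller can quote it when reporting a failure.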

Implementation: apps/api/app/error_envelope.py translates every typed exception (hyrax.errors.NotFound, BadRequest, RateLimitExceeded, …) and HTTPException into the envelope. Raise the typed exceptions directly; never construct the envelope by hand.
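The idea of a single translation point can be sketched as follows. This is not the contents of error_envelope.py: the HyraxError base class, its attributes, the to_envelope function, and the HTTP status codes other than the documented 5xx bucket are assumptions chosen for illustration, and the real hyrax.errors hierarchy may look different.

```python
from typing import Any, Dict, Optional, Tuple


class HyraxError(Exception):
    """Stand-in base class (hypothetical); defaults describe a server error."""
    status = 500
    error_type = "server_error"
    code = "internal_error"

    def __init__(self, message: str, *, code: Optional[str] = None,
                 param: Optional[str] = None) -> None:
        super().__init__(message)
        self.message = message
        self.code = code or type(self).code  # per-raise slug overrides the default
        self.param = param


class NotFound(HyraxError):
    status, error_type, code = 404, "not_found", "not_found"


class RateLimitExceeded(HyraxError):
    status, error_type, code = 429, "rate_limit_error", "rate_limit_qps"


def to_envelope(exc: Exception, request_id: str) -> Tuple[int, Dict[str, Any]]:
    """Translate any exception into (HTTP status, envelope body)."""
    if not isinstance(exc, HyraxError):
        # Unknown exceptions become a generic server_error so raw
        # exception text never reaches the caller.
        exc = HyraxError("Internal server error")
    error: Dict[str, Any] = {
        "type": exc.error_type,
        "code": exc.code,
        "message": exc.message,
        "request_id": request_id,
    }
    if exc.param is not None:
        error["param"] = exc.param
    return exc.status, {"error": error}
```

With a translator like this in place, a handler only raises, for example NotFound("no such job", code="job_not_found"), and never assembles the envelope dictionary itself.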

2. /api/... canonical mount

Every router mounts at the canonical /api/... path. The SDK regen reads that surface and produces clean function names like getApiAdminTenants — no V1 infix to churn on rename.

Mounting goes through apps/api/app/versioning.py::include_versioned_router. New routers go through this helper; calling app.include_router(...) directly fails the catalog parity check.

When v2 ships. Add a parallel /api/v2/... alias the same way; route within handlers if the wire shape needs to diverge per version. The canonical mount stays at /api/.... (A /api/v1/... alias was previously kept "for callers that pin the version explicitly" and was dropped in #119. The SDK already consumes only the canonical paths, so the alias did nothing except pre-empt a hypothetical v2 fork. Re-introducing it later is a one-line change.)

Non-/api/ routers (mount unchanged; not part of the SDK contract):

  • /health/... — k8s liveness/readiness probes; URLs are baked into deployment manifests, not callers' SDKs.
  • /auth/... — GitHub OAuth callback; URL is registered with GitHub at App-install time, not negotiated in band.
  • /internal/... — worker M2M.

3. Cursor pagination contract

Every list endpoint returns PaginatedResponse[T]:

{
  "data": [...],
  "next_cursor": "eyJpZCI6ICIuLi4ifQ==",
  "has_more": true
}
  • next_cursor is opaque base64. Pass it back verbatim on the next request; never inspect it.
  • data carries the page contents.
  • has_more lets clients short-circuit when they know they're done.

Default limit=50, max limit=200. Bare arrays / RootModel[list[T]] fail the catalog parity test (tests/test_api_catalog_pagination.py::TestCatalogParity).
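A client can walk every page with a small loop like the sketch below. The fetch_page callable is a hypothetical stand-in for whatever HTTP call the caller makes; only the data, next_cursor, and has_more fields and the 50/200 limits come from the contract above, and the lower bound of 1 is an assumption.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_all(fetch_page: Callable[[Optional[str], int], Page],
             limit: int = 50) -> Iterator[Any]:
    """Yield every item from a cursor-paginated list endpoint.

    fetch_page(cursor, limit) must return one PaginatedResponse-shaped dict.
    Because this is a generator, the limit check runs on first iteration.
    """
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    cursor: Optional[str] = None  # the first request carries no cursor
    while True:
        page = fetch_page(cursor, limit)
        yield from page["data"]
        if not page["has_more"]:
            return
        # Treat the cursor as opaque: pass it back verbatim, never decode it.
        cursor = page["next_cursor"]
```

Stopping on has_more rather than on an empty data array avoids one extra round trip at the end of the list.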

Inbound API rate limits

The limiter is hierarchical: each request passes through two stacked token buckets, a per-tenant ceiling AND a per-key ceiling, and whichever bucket is depleted first rejects the request. Each bucket carries two dimensions: request rate (labelled RPS here, though the ceiling is stored as requests_per_minute) and $/hour (LLM spend). 429 envelopes carry discrete code values so SDKs can branch on the reason: rate_limit_qps (RPS bucket) vs rate_limit_budget ($/hour bucket).

| Layer | Storage | Default | Override surface |
| --- | --- | --- | --- |
| Per-tenant ceiling | public.tenant_rate_limits.requests_per_minute / dollars_per_hour | 600 RPM / $10 per hour (DEFAULT_TENANT_RPS / DEFAULT_TENANT_DOLLARS_PER_HOUR) | Operator UPDATE on public.tenant_rate_limits |
| Per-key ceiling | <tenant>.api_keys.requests_per_minute / dollars_per_hour (NULL = inherit tenant) | NULL (inherit) | Tenant UPDATE on the per-key row |

The middleware (apps/api/app/rate_limits.py::install_rate_limit_middleware) is local-process; bucket state lives per-pod. Cross-pod coordination is deferred until horizontal traffic shaping becomes a real concern. Lookups fail open so a transient DB blip doesn't 429 every tenant out of their own API.
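The stacked-bucket check on the request-rate dimension can be sketched as below. This is an in-process illustration and not the code in rate_limits.py: the TokenBucket class, check_request, and the burst capacity are assumptions. A 600 RPM ceiling corresponds to a refill rate of 10 tokens per second.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Simple token bucket held in process memory (hypothetical sketch)."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.updated = clock()

    def available(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then report whether cost fits."""
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        return self.tokens >= cost

    def take(self, cost: float = 1.0) -> None:
        self.tokens -= cost


def check_request(tenant: TokenBucket,
                  key: Optional[TokenBucket]) -> Optional[str]:
    """Return None when the request is allowed, else the 429 code.

    key=None models a per-key ceiling of NULL (inherit the tenant ceiling).
    """
    buckets = [b for b in (tenant, key) if b is not None]
    # Check every bucket before charging any of them, so a request that
    # one layer rejects does not drain tokens from the other layer.
    if not all(b.available() for b in buckets):
        return "rate_limit_qps"
    for b in buckets:
        b.take()
    return None
```

Because the state lives in ordinary Python objects, each process (and so each pod) enforces its own copy of the ceiling, which matches the per-pod behavior described above.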

What is live today: the per-tenant RPS bucket (the requests_per_minute ceiling) plus the per-tenant dollar bucket (dollars_per_hour). The RPS bucket fires on every authenticated request. The dollar bucket fires only on spend-significant profiles (job submit/retry, chat turn, issue fix); read endpoints and cheap-write profiles (issue_triage / reopen) skip it. Internal/admin tier tenants bypass the dollar bucket so Iru's benchmark + A/B traffic isn't constrained by the customer-shape default. 429 codes: rate_limit_qps (RPS) vs rate_limit_budget (dollar). Spend over the past hour is read from the per-tenant <schema>.job_costs view and cached per-pod for 30s.
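The dollar-bucket lookup combines a short per-process cache with fail-open behavior. A minimal sketch, assuming a hypothetical load_spend callable in place of the query against the job_costs view; SpendCache and over_budget are illustrative names, and treating spend equal to the ceiling as over budget is also an assumption.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SpendCache:
    """Per-process cache of each tenant's spend over the past hour."""

    def __init__(self, load_spend: Callable[[str], float], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_spend
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, float]] = {}  # tenant -> (fetched_at, dollars)

    def past_hour_spend(self, tenant: str) -> Optional[float]:
        now = self._clock()
        hit = self._entries.get(tenant)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the database
        try:
            spend = self._load(tenant)
        except Exception:
            return None  # lookup failed: report "unknown" instead of raising
        self._entries[tenant] = (now, spend)
        return spend


def over_budget(cache: SpendCache, tenant: str, dollars_per_hour: float) -> bool:
    """True only when spend is known and has reached the ceiling.

    Unknown spend (a failed lookup) counts as within budget, i.e. fail open.
    """
    spend = cache.past_hour_spend(tenant)
    return spend is not None and spend >= dollars_per_hour
```

The trade-off of a 30s cache is that a tenant can overshoot the ceiling by up to 30 seconds of spend on each pod before the limiter notices.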

What is groundwork-only today: the per-key columns (api_keys.requests_per_minute, api_keys.dollars_per_hour). Schema is in place and check_dollar_budget accepts the per-key axis as a no-op argument, but the auth-dep refactor that surfaces api_keys.* into the request scope hasn't landed — every key inherits the tenant ceiling for now.

The table above governs what callers can do to Hyrax. The Anthropic-direct outbound rate-limit observation cache that previously sat alongside this surface was retired 2026-05-14 with the Bedrock-paid launch.

See also

  • MCP server — alternate transport for the same catalog.