API contract

Every Hyrax API surface honors three structural contracts. SDK consumers can rely on them without per-endpoint checks.

1. Stripe-style error envelope

Every 4xx/5xx response has this shape:

{
  "error": {
    "type": "validation_error",
    "code": "invalid_branch",
    "message": "github_base_branch must match an existing branch on the repo",
    "param": "github_base_branch",
    "doc_url": "https://...",
    "errors": [...],
    "request_id": "req_..."
  }
}

The nine Stripe-aligned type values:

| `type` | When | Example `code` |
| --- | --- | --- |
| `validation_error` | Pydantic / shape validation failure | `invalid_branch`, `invalid_workflow` |
| `authentication_error` | Missing or malformed `hk_live_*` key | `missing_api_key` |
| `permission_error` | Auth ok, but the principal can't do this | `tier_mismatch`, `permission_denied` |
| `rate_limit_error` | Inbound rate limit (see § Inbound API rate limits) | `rate_limit_qps`, `rate_limit_budget` |
| `billing_error` | 402 from the submission gate: credit wallet exhausted, or the plan doesn't include the requested capability | `credit_exhausted`, `plan_restricted` |
| `not_found` | Resource doesn't exist or is hidden | `repo_not_found`, `job_not_found` |
| `conflict` | State precondition failed | `job_already_running` |
| `invalid_request_error` | Caller-side problem outside validation | `unsupported_workflow` |
| `server_error` | 5xx | `internal_error` |

Validation failures list their per-field details in the errors[] array. SDKs branch on the per-error slug in code; type is only the broad bucket.
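As an illustration of branching on code before type, here is a minimal client-side sketch. The `HyraxApiError` class, the `next_action` helper, and the action strings it returns are hypothetical names invented for this example; only the envelope fields and the `type` / `code` values come from the contract above.

```python
from typing import Any, Dict


class HyraxApiError(Exception):
    """Hypothetical SDK exception built from the error envelope."""

    def __init__(self, status: int, envelope: Dict[str, Any]) -> None:
        err = envelope["error"]
        super().__init__(err.get("message", ""))
        self.status = status
        self.type = err["type"]
        self.code = err["code"]
        self.param = err.get("param")
        self.request_id = err.get("request_id")
        self.errors = err.get("errors") or []


def next_action(exc: HyraxApiError) -> str:
    """Branch on the specific code first, then fall back to the broad type."""
    if exc.code == "rate_limit_qps":
        return "retry_with_backoff"
    if exc.code == "rate_limit_budget":
        return "wait_for_budget"
    if exc.code == "credit_exhausted":
        return "top_up_credits"
    if exc.type == "validation_error":
        return "fix_request"
    if exc.type == "server_error":
        return "retry_later"
    return "surface_to_caller"
```

Because the envelope shape is the same on every endpoint, a single parser like this can sit in the SDK's transport layer rather than in each call site.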

Implementation: apps/api/app/error_envelope.py translates every typed exception (hyrax.errors.NotFound, BadRequest, RateLimitExceeded, …) and HTTPException into the envelope. Raise the typed exceptions directly; never construct the envelope by hand.
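The translation step can be pictured with a standalone sketch. This is not the contents of error_envelope.py: the base class, the status codes, the default `code` slugs, the mapping of `BadRequest` to `invalid_request_error`, and the `req_` id format are all assumptions made for illustration, and `doc_url` is left out.

```python
import uuid
from typing import Any, Dict, Optional, Tuple


class HyraxError(Exception):
    """Hypothetical base class; each subclass pins status, type and default code."""

    status = 500
    type = "server_error"
    code = "internal_error"

    def __init__(self, message: str, *, code: Optional[str] = None,
                 param: Optional[str] = None) -> None:
        super().__init__(message)
        self.message = message
        if code is not None:
            self.code = code  # per-error slug overrides the class default
        self.param = param


class NotFound(HyraxError):
    status = 404
    type = "not_found"
    code = "not_found"


class BadRequest(HyraxError):
    status = 400
    type = "invalid_request_error"
    code = "bad_request"


class RateLimitExceeded(HyraxError):
    status = 429
    type = "rate_limit_error"
    code = "rate_limit_qps"


def to_envelope(exc: Exception) -> Tuple[int, Dict[str, Any]]:
    """Translate any exception into (status, envelope)."""
    if not isinstance(exc, HyraxError):
        # Unknown failures become a generic 5xx and never leak internals.
        exc = HyraxError("Internal server error")
    body = {
        "error": {
            "type": exc.type,
            "code": exc.code,
            "message": exc.message,
            "param": exc.param,
            "request_id": "req_" + uuid.uuid4().hex,
        }
    }
    return exc.status, body
```

A handler would then only `raise NotFound("no such job", code="job_not_found")` and leave the envelope to the translator, which is the pattern the paragraph above asks for.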

2. `/api/...` canonical mount

Every router mounts at the canonical /api/... path. The SDK regen reads that surface and produces clean function names like getApiAdminTenants — no V1 infix to churn on rename.
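To see why the mount path matters for generated names, the sketch below derives an operation name from the HTTP method and path. The naming rule is an assumption modelled on common OpenAPI generators, not the actual SDK regen logic, and `operation_name` is a hypothetical helper.

```python
import re


def operation_name(method: str, path: str) -> str:
    """Join the lowercased method with the capitalized path segments."""
    segments = [s for s in re.split(r"[^A-Za-z0-9]+", path) if s]
    return method.lower() + "".join(s[:1].upper() + s[1:] for s in segments)


canonical = operation_name("GET", "/api/admin/tenants")
versioned = operation_name("GET", "/api/v1/admin/tenants")
```

Under this rule `canonical` is `getApiAdminTenants`, while `versioned` picks up a `V1` infix, so every caller would have to rename its imports whenever the version segment changed.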

Mounting goes through apps/api/app/versioning.py::include_versioned_router. New routers go through this helper; calling app.include_router(...) directly fails the catalog parity check.

When v2 ships, add a parallel /api/v2/... alias the same way, and route within handlers if the wire shape needs to diverge per version. The canonical mount stays at /api/.... (A /api/v1/... alias was previously kept "for callers that pin the version explicitly" and was dropped in #119. The SDK already consumes only the canonical paths, so the alias merely pre-empted a hypothetical v2 fork. Re-introducing it later is a one-line change.)

Non-/api/ routers (mount unchanged; not part of the SDK contract):

/health/... — k8s liveness/readiness probes; URLs are baked into deployment manifests, not callers' SDKs.
/auth/... — GitHub OAuth callback; URL is registered with GitHub at App-install time, not negotiated in band.
/internal/... — worker M2M.
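The mount rules above reduce to a prefix check, sketched here. This is a simplified stand-in and not the repository's catalog parity check, which also detects direct app.include_router(...) calls; `misplaced_routes` and the sample paths are hypothetical.

```python
from typing import Iterable, List

CANONICAL_PREFIX = "/api/"
# Routers that intentionally live outside the SDK contract.
EXEMPT_PREFIXES = ("/health/", "/auth/", "/internal/")


def misplaced_routes(paths: Iterable[str]) -> List[str]:
    """Return every path that is neither canonical nor explicitly exempt."""
    return [
        p for p in paths
        if not p.startswith(CANONICAL_PREFIX) and not p.startswith(EXEMPT_PREFIXES)
    ]


offenders = misplaced_routes([
    "/api/admin/tenants",
    "/health/ready",
    "/auth/callback",
    "/internal/jobs",
    "/admin/tenants",  # mounted without the /api/ prefix
])
```

Here `offenders` contains only `/admin/tenants`, the one route mounted outside both the canonical prefix and the exempt list.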

3. Cursor pagination contract

Every list endpoint returns PaginatedResponse[T]:

{
  "data": [...],
  "next_cursor": "eyJpZCI6ICIuLi4ifQ==",
  "has_more": true
}

next_cursor is opaque base64. Pass it back verbatim on the next request; never inspect it.
data carries the page contents.
has_more tells clients when to stop: once it is false, there are no further pages to request.

Default limit=50, max limit=200. Bare arrays / RootModel[list[T]] fail the catalog parity test (tests/test_api_catalog_pagination.py::TestCatalogParity).
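The contract can be exercised end to end with an in-memory sketch. Several details are assumptions: rows are ordered by an integer `id`, the cursor is base64-encoded JSON holding the last seen id, an over-large limit is clamped rather than rejected, and `next_cursor` is null on the last page. `list_page` and `iter_all` are hypothetical names.

```python
import base64
import json
from typing import Any, Callable, Dict, Iterator, List, Optional

DEFAULT_LIMIT = 50
MAX_LIMIT = 200


def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"id": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"]


def list_page(rows: List[Dict[str, Any]], limit: int = DEFAULT_LIMIT,
              cursor: Optional[str] = None) -> Dict[str, Any]:
    """Server side: return one page of rows (assumed sorted by ascending id)."""
    limit = max(1, min(limit, MAX_LIMIT))  # assumption: clamp instead of erroring
    after = decode_cursor(cursor) if cursor else None
    remaining = [r for r in rows if after is None or r["id"] > after]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    return {
        "data": page,
        "next_cursor": encode_cursor(page[-1]["id"]) if has_more else None,
        "has_more": has_more,
    }


def iter_all(fetch_page: Callable[[Optional[str]], Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Client side: pass next_cursor back verbatim until has_more is false."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        if not page["has_more"]:
            return
        cursor = page["next_cursor"]
```

Note that `iter_all` never decodes the cursor; only the server-side helpers know its layout, which is what keeps the encoding free to change.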

Inbound API rate limits

The limiter is hierarchical: each request passes through two stacked token buckets, a per-tenant ceiling and a per-key ceiling, and whichever bucket depletes first rejects the request. Each bucket carries two dimensions: RPS (request volume) and $/hour (LLM spend). 429 envelopes carry discrete code values so SDKs can branch on the reason: rate_limit_qps (RPS bucket) vs rate_limit_budget ($/hour bucket).

| Layer | Storage | Default | Override surface |
| --- | --- | --- | --- |
| Per-tenant ceiling | `public.tenant_rate_limits.requests_per_minute` / `dollars_per_hour` | 600 RPM / $10 per hour (`DEFAULT_TENANT_RPS` / `DEFAULT_TENANT_DOLLARS_PER_HOUR`) | Operator UPDATE on `public.tenant_rate_limits` |
| Per-key ceiling | `<tenant>.api_keys.requests_per_minute` / `dollars_per_hour` (NULL = inherit tenant) | NULL (inherit) | Tenant UPDATE on the per-key row |

The middleware (apps/api/app/rate_limits.py::install_rate_limit_middleware) is local-process; bucket state lives per-pod. Cross-pod coordination is deferred until horizontal traffic shaping becomes a real concern. Lookups fail open so a transient DB blip doesn't 429 every tenant out of their own API.
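A minimal sketch of the stacked request-rate buckets, held in process memory as described above. It is not the code in rate_limits.py: the `TokenBucket` class, the `allow` helper, and the choice to consume a token only when every bucket admits the request are assumptions. Passing `key=None` models a per-key row whose limit is NULL and therefore inherits the tenant ceiling.

```python
from typing import Optional


class TokenBucket:
    """Classic token bucket with an injectable clock (seconds)."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.updated_at = now

    def can_take(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the elapsed time, capped at capacity, then check.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        return self.tokens >= cost

    def take(self, cost: float = 1.0) -> None:
        self.tokens -= cost


def allow(tenant: TokenBucket, key: Optional[TokenBucket], now: float) -> bool:
    """Admit the request only if every applicable bucket has a token."""
    buckets = [b for b in (tenant, key) if b is not None]
    if not all(b.can_take(now) for b in buckets):
        return False  # a rejected request consumes nothing
    for b in buckets:
        b.take()
    return True


# Example: a 600-per-minute tenant ceiling refills at 10 tokens per second.
tenant_bucket = TokenBucket(capacity=600, refill_per_second=10.0)
```

Because the state lives in ordinary objects, each pod would enforce its own copy of the ceiling, which matches the per-pod caveat in the paragraph above.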

What is live today: the per-tenant RPS bucket (the requests_per_minute ceiling) plus the per-tenant dollar bucket (dollars_per_hour). RPS fires on every authenticated request; the dollar bucket fires only on spend-significant profiles (job submit/retry, chat turn, issue fix) — read endpoints and cheap-write profiles (issue_triage / reopen) skip it. Internal/admin tier tenants bypass the dollar bucket so Iru's benchmark + A/B traffic isn't constrained by the customer-shape default. 429 codes: rate_limit_qps (RPS) vs rate_limit_budget (dollar). Past-1h spend is read from the per-tenant <schema>.job_costs view, cached per-pod for 30s.
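The dollar-bucket behaviour described above (profile gating, tier bypass, a 30-second per-pod cache) can be sketched as follows. The `SpendCache` class, the `over_budget` function, and the profile and tier slugs are hypothetical, and applying the fail-open rule to the spend lookup is an assumption carried over from the middleware description.

```python
from typing import Callable, Dict, Tuple

# Hypothetical slugs for the spend-significant profiles and bypass tiers.
SPEND_PROFILES = frozenset({"job_submit", "job_retry", "chat_turn", "issue_fix"})
BYPASS_TIERS = frozenset({"internal", "admin"})


class SpendCache:
    """Per-process cache of each tenant's spend over the past hour."""

    def __init__(self, load_spend: Callable[[str], float], ttl_seconds: float = 30.0) -> None:
        self._load = load_spend
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, float]] = {}  # tenant -> (fetched_at, dollars)

    def past_hour_spend(self, tenant: str, now: float) -> float:
        hit = self._entries.get(tenant)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        dollars = self._load(tenant)  # e.g. a query against the job_costs view
        self._entries[tenant] = (now, dollars)
        return dollars


def over_budget(cache: SpendCache, tenant: str, tier: str, profile: str,
                dollars_per_hour: float, now: float) -> bool:
    """True when the request should be rejected with rate_limit_budget."""
    if profile not in SPEND_PROFILES or tier in BYPASS_TIERS:
        return False  # reads, cheap writes and bypass tiers skip the bucket
    try:
        spent = cache.past_hour_spend(tenant, now)
    except Exception:
        return False  # assumption: fail open if the lookup breaks
    return spent >= dollars_per_hour
```

The cache trades up to 30 seconds of staleness for avoiding a cost query on every spend-significant request.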

What is groundwork-only today: the per-key columns (api_keys.requests_per_minute, api_keys.dollars_per_hour). Schema is in place and check_dollar_budget accepts the per-key axis as a no-op argument, but the auth-dep refactor that surfaces api_keys.* into the request scope hasn't landed — every key inherits the tenant ceiling for now.

The table above governs what callers can do to Hyrax. The Anthropic-direct outbound rate-limit observation cache that previously sat alongside this surface was retired 2026-05-14 with the Bedrock-paid launch.
