API contract
Three structural contracts every Hyrax API surface honors. SDK consumers can rely on these without per-endpoint checks.
1. Stripe-style error envelope
Every 4xx/5xx response is shaped:
{
  "error": {
    "type": "validation_error",
    "code": "invalid_branch",
    "message": "github_base_branch must match an existing branch on the repo",
    "param": "github_base_branch",
    "doc_url": "https://...",
    "errors": [...],
    "request_id": "req_..."
  }
}
The nine Stripe-aligned type values:
| type | When | Example code |
|---|---|---|
| validation_error | Pydantic / shape validation failure | invalid_branch, invalid_workflow |
| authentication_error | Missing or malformed hk_live_* key | missing_api_key |
| permission_error | Auth ok, but the principal can't do this | tier_mismatch, permission_denied |
| rate_limit_error | Inbound rate limit (see § Inbound API rate limits) | rate_limit_qps, rate_limit_budget |
| billing_error | 402 from the submission gate: credit wallet exhausted or plan doesn't include the requested capability | credit_exhausted, plan_restricted |
| not_found | Resource doesn't exist or is hidden | repo_not_found, job_not_found |
| conflict | State precondition failed | job_already_running |
| invalid_request_error | Caller-side problem outside validation | unsupported_workflow |
| server_error | 5xx | internal_error |
Validation failures expose the per-field array in errors[]. SDKs branch on the per-error slug in code; type is only the broad bucket.
Implementation: apps/api/app/error_envelope.py translates every typed exception (hyrax.errors.NotFound, BadRequest, RateLimitExceeded, …) and HTTPException into the envelope. Raise the typed exceptions directly; never construct the envelope by hand.
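On the consuming side, the envelope can be turned into a typed exception that branches on code first and type second. The following is a minimal sketch, not the SDK's actual surface: HyraxApiError, parse_error, and is_retryable are hypothetical names, and the retry policy shown is only an illustrative assumption about how a client might treat the two rate-limit codes.

```python
from typing import Any


class HyraxApiError(Exception):
    """Hypothetical client-side exception; not part of any published SDK."""

    def __init__(self, status: int, error: dict[str, Any]) -> None:
        self.status = status
        self.type = error.get("type", "")
        self.code = error.get("code", "")
        self.message = error.get("message", "")
        self.param = error.get("param")
        self.request_id = error.get("request_id")
        self.errors = error.get("errors") or []  # per-field validation entries
        super().__init__(f"{self.type}/{self.code}: {self.message}")


def parse_error(status: int, body: dict[str, Any]) -> HyraxApiError:
    """Build an exception from a 4xx/5xx body shaped like the envelope."""
    return HyraxApiError(status, body.get("error", {}))


def is_retryable(err: HyraxApiError) -> bool:
    """Branch on the per-error code first, then fall back to the broad type.

    The policy below is an assumption made for illustration.
    """
    if err.code == "rate_limit_qps":
        return True   # request-rate bucket: back off, then retry
    if err.code == "rate_limit_budget":
        return False  # spend bucket: an immediate retry hits the same ceiling
    return err.type == "server_error"
```

Because every 4xx/5xx response shares this shape, one parser of this kind can cover all endpoints without per-endpoint checks.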
2. /api/... canonical mount
Every router mounts at the canonical /api/... path. The SDK regen reads that surface and produces clean function names like getApiAdminTenants — no V1 infix to churn on rename.
Mounting goes through apps/api/app/versioning.py::include_versioned_router. New routers go through this helper; calling app.include_router(...) directly fails the catalog parity check.
When v2 ships. Add a parallel /api/v2/... alias the same way; route within handlers if the wire shape needs to diverge per version. The canonical mount stays at /api/.... (A /api/v1/... alias was previously kept "for callers that pin the version explicitly" and was dropped in #119: the SDK already consumes only the canonical paths, so the alias was premature preparation for a hypothetical v2 fork. Re-introducing it later is a one-line change.)
Non-/api/ routers (mount unchanged; not part of the SDK contract):
- /health/...: k8s liveness/readiness probes; URLs are baked into deployment manifests, not callers' SDKs.
- /auth/...: GitHub OAuth callback; URL is registered with GitHub at App-install time, not negotiated in band.
- /internal/...: worker M2M.
3. Cursor pagination contract
Every list endpoint returns PaginatedResponse[T]:
{
  "data": [...],
  "next_cursor": "eyJpZCI6ICIuLi4ifQ==",
  "has_more": true
}
- next_cursor is opaque base64. Pass it back verbatim on the next request; never inspect it.
- data carries the page contents.
- has_more lets clients short-circuit when they know they're done.
Default limit=50, max limit=200. Bare arrays / RootModel[list[T]] fail the catalog parity test (tests/test_api_catalog_pagination.py::TestCatalogParity).
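A client can drain any list endpoint with one generic loop, since the response shape is the same everywhere. The sketch below assumes a caller-supplied fetch_page function that performs the HTTP request; iter_items and fake_fetch are hypothetical names, the lower bound of 1 on limit is an assumption, and the integer cursor inside fake_fetch exists only so the loop can be exercised in memory. Real cursors stay opaque.

```python
from typing import Any, Callable, Iterator, Optional

# A page fetcher takes (cursor, limit) and returns the decoded JSON body.
FetchPage = Callable[[Optional[str], int], dict[str, Any]]


def iter_items(fetch_page: FetchPage, limit: int = 50) -> Iterator[Any]:
    """Yield every item from a list endpoint, following next_cursor verbatim."""
    if not 1 <= limit <= 200:  # documented max is 200; the minimum is assumed
        raise ValueError("limit must be between 1 and 200")
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page["data"]
        if not page.get("has_more"):
            return
        cursor = page["next_cursor"]  # opaque: passed back, never decoded


def fake_fetch(cursor: Optional[str], limit: int) -> dict[str, Any]:
    """In-memory stand-in for an HTTP call, used only to exercise the loop."""
    items = list(range(5))
    start = int(cursor) if cursor else 0
    end = min(start + limit, len(items))
    more = end < len(items)
    return {
        "data": items[start:end],
        "next_cursor": str(end) if more else None,
        "has_more": more,
    }
```

Calling list(iter_items(fake_fetch, limit=2)) walks three pages and stops as soon as has_more is false, without ever looking inside the cursor.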
Inbound API rate limits
Hierarchical limiter: two token buckets are stacked per request, a per-tenant ceiling and a per-key ceiling, and whichever bucket depletes first rejects the request. Each bucket carries two dimensions: RPS (request volume) and $/hour (LLM spend). 429 envelopes carry discrete code values so SDKs can branch on the reason: rate_limit_qps (RPS bucket) vs rate_limit_budget ($/hour bucket).
| Layer | Storage | Default | Override surface |
|---|---|---|---|
| Per-tenant ceiling | public.tenant_rate_limits.requests_per_minute / dollars_per_hour | 600 RPM / $10 per hour (DEFAULT_TENANT_RPS / DEFAULT_TENANT_DOLLARS_PER_HOUR) | Operator UPDATE on public.tenant_rate_limits |
| Per-key ceiling | <tenant>.api_keys.requests_per_minute / dollars_per_hour (NULL = inherit tenant) | NULL (inherit) | Tenant UPDATE on the per-key row |
The middleware (apps/api/app/rate_limits.py::install_rate_limit_middleware) is local-process; bucket state lives per-pod. Cross-pod coordination is deferred until horizontal traffic shaping becomes a real concern. Lookups fail open so a transient DB blip doesn't 429 every tenant out of their own API.
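The stacking and fail-open behavior can be illustrated with a small in-process model. This is a sketch under stated assumptions, not the code in rate_limits.py: TokenBucket and allow_request are hypothetical names, only the request-rate dimension is modeled, and a layer passed as None stands in for a failed or absent lookup. The per-key layer is included to show the intended hierarchy.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Token bucket refilled continuously at `per_minute` tokens per minute."""

    def __init__(self, per_minute: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = per_minute
        self.rate = per_minute / 60.0  # tokens per second
        self.tokens = per_minute       # start full
        self.clock = clock
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def has_token(self) -> bool:
        self._refill()
        return self.tokens >= 1.0

    def take(self) -> None:
        self.tokens -= 1.0


def allow_request(tenant: Optional[TokenBucket],
                  key: Optional[TokenBucket]) -> bool:
    """Admit only if every configured layer has a token.

    A layer given as None is skipped, so a failed lookup fails open
    instead of rejecting the request.
    """
    layers = [b for b in (tenant, key) if b is not None]
    if not all(b.has_token() for b in layers):
        return False  # the caller would answer 429 with code rate_limit_qps
    for b in layers:
        b.take()
    return True
```

Checking every layer before taking from any of them keeps a rejection by one bucket from draining the other. Since state lives in the process, each pod running such a limiter would enforce its ceilings independently.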
What is live today: the per-tenant RPS bucket (the requests_per_minute ceiling) plus the per-tenant dollar bucket (dollars_per_hour). RPS fires on every authenticated request; the dollar bucket fires only on spend-significant profiles (job submit/retry, chat turn, issue fix) — read endpoints and cheap-write profiles (issue_triage / reopen) skip it. Internal/admin tier tenants bypass the dollar bucket so Iru's benchmark + A/B traffic isn't constrained by the customer-shape default. 429 codes: rate_limit_qps (RPS) vs rate_limit_budget (dollar). Past-1h spend is read from the per-tenant <schema>.job_costs view, cached per-pod for 30s.
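The dollar-bucket rules above combine a gate (which requests are checked at all) with a short-lived cache of past-hour spend. A minimal sketch follows, assuming hypothetical profile and tier strings modeled on the examples in the text; SpendCache, dollar_bucket_applies, and over_budget are illustrative names, the load callable stands in for the query against the job_costs view, and the use of >= at the ceiling is an assumption.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical profile and tier names, modeled on the examples in the text.
SPEND_SIGNIFICANT_PROFILES = {"job_submit", "job_retry", "chat_turn", "issue_fix"}
BYPASS_TIERS = {"internal", "admin"}


def dollar_bucket_applies(profile: str, tier: str) -> bool:
    """Spend is checked only for spend-significant profiles on non-bypass tiers."""
    return profile in SPEND_SIGNIFICANT_PROFILES and tier not in BYPASS_TIERS


class SpendCache:
    """Per-process cache of past-hour spend with a 30 second time-to-live."""

    def __init__(self, load: Callable[[str], float], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load    # stand-in for the per-tenant spend query
        self._ttl = ttl
        self._clock = clock
        # tenant -> (expires_at, dollars spent in the past hour)
        self._entries: Dict[str, Tuple[float, float]] = {}

    def get(self, tenant: str) -> float:
        now = self._clock()
        hit = self._entries.get(tenant)
        if hit is not None and now < hit[0]:
            return hit[1]  # still fresh: skip the query
        dollars = self._load(tenant)
        self._entries[tenant] = (now + self._ttl, dollars)
        return dollars


def over_budget(cache: SpendCache, tenant: str, dollars_per_hour: float) -> bool:
    """True when cached spend has reached the ceiling (429 rate_limit_budget)."""
    return cache.get(tenant) >= dollars_per_hour
```

With a 30 second time-to-live, a tenant can briefly overshoot its ceiling by whatever it spends between refreshes; that is the trade made to avoid a spend query on every request.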
What is groundwork-only today: the per-key columns (api_keys.requests_per_minute, api_keys.dollars_per_hour). Schema is in place and check_dollar_budget accepts the per-key axis as a no-op argument, but the auth-dep refactor that surfaces api_keys.* into the request scope hasn't landed — every key inherits the tenant ceiling for now.
The table above governs what callers can do to Hyrax. The Anthropic-direct outbound rate-limit observation cache that previously sat alongside this surface was retired 2026-05-14 with the Bedrock-paid launch.
See also
- MCP server — alternate transport for the same catalog.