A gateway that lives inside your application, not beside it.
Most AI gateways are designed as standalone services — you point your app at them and they proxy your requests. That works until you need something the gateway does not support out of the box. At that point, you are either forking it, working around it, or giving up on the customization entirely.
Hebo Gateway takes a different approach, closer to how better-auth thinks about authentication: a library you embed in your application, not a platform you integrate with. It mounts directly into your existing server, whether that is Elysia, Hono, Next.js, or any other framework with a standard fetch handler. No new process to run, no separate infrastructure to manage.
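To make "mounts into your existing server" concrete, here is a minimal sketch of what embedding a fetch-style handler can look like. It uses only the standard Web `Request` and `Response` types. The names `gatewayHandler`, `appHandler`, `isUnderPrefix`, `mount`, and the `/gateway` prefix are all hypothetical stand-ins for illustration, not Hebo Gateway's actual API.

```typescript
// A standard fetch-style handler: Web Request in, Web Response out.
type FetchHandler = (req: Request) => Promise<Response>;

// Hypothetical stand-in for an embedded gateway handler (NOT Hebo's real API).
const gatewayHandler: FetchHandler = async (req) => {
  const { pathname } = new URL(req.url);
  return new Response(JSON.stringify({ handledBy: "gateway", path: pathname }), {
    headers: { "content-type": "application/json" },
  });
};

// Hypothetical stand-in for the rest of your application.
const appHandler: FetchHandler = async () =>
  new Response("hello from the app", { status: 200 });

// True when `pathname` is the prefix itself or sits below it.
// "/gateways" must not match the prefix "/gateway".
function isUnderPrefix(pathname: string, prefix: string): boolean {
  return pathname === prefix || pathname.startsWith(prefix + "/");
}

// Route requests under `prefix` to `inner`; everything else goes to `fallback`.
function mount(prefix: string, inner: FetchHandler, fallback: FetchHandler): FetchHandler {
  return (req) => {
    const { pathname } = new URL(req.url);
    return isUnderPrefix(pathname, prefix) ? inner(req) : fallback(req);
  };
}

// One handler, one process: the gateway lives inside the app.
const server: FetchHandler = mount("/gateway", gatewayHandler, appHandler);
```

In practice you would likely rely on your framework's own mounting mechanism rather than a hand-written dispatcher. The point of the sketch is that both sides share the same request and response types, so no proxy hop is needed.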
Extensible by design, not by exception.
Because the gateway runs inside your application, you have full access to the request lifecycle through a hooks system. You can add custom authentication, enforce rate limits against your own database, implement dynamic model routing based on user tier, inject observability into every call, or transform requests and responses, all without touching the gateway's core code.
This is intentionally different from plugin systems bolted on after the fact. The hooks are first-class — they are how you are expected to customize behavior, so they are stable, well-defined, and composable.
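As a rough illustration of composable hooks, the sketch below chains a rate-limit check and tier-based model routing ahead of a model call. Every type and name here (`CallContext`, `HookResult`, `BeforeHook`, `runHooks`, the model names, and the limit of 60 requests per minute) is an assumption made for the example, not Hebo Gateway's actual hook API. The hooks are synchronous for brevity; real ones would likely be async so they can query a database.

```typescript
// Hypothetical shapes for illustration only; not Hebo's actual hook API.
interface CallContext {
  userTier: "free" | "pro";
  model: string;
  requestsThisMinute: number;
}

// A hook either lets the call continue (possibly with a changed context)
// or rejects it with an HTTP-style status.
type HookResult =
  | { kind: "continue"; ctx: CallContext }
  | { kind: "reject"; status: number; reason: string };

type BeforeHook = (ctx: CallContext) => HookResult;

// Reject once the caller reaches an assumed limit of 60 requests per minute.
const rateLimit: BeforeHook = (ctx) =>
  ctx.requestsThisMinute >= 60
    ? { kind: "reject", status: 429, reason: "rate limit exceeded" }
    : { kind: "continue", ctx };

// Pick a model based on user tier (model names are placeholders).
const routeByTier: BeforeHook = (ctx) => ({
  kind: "continue",
  ctx: { ...ctx, model: ctx.userTier === "pro" ? "large-model" : "small-model" },
});

// Run hooks in order; the first rejection short-circuits the chain.
function runHooks(hooks: BeforeHook[], initial: CallContext): HookResult {
  let ctx = initial;
  for (const hook of hooks) {
    const result = hook(ctx);
    if (result.kind === "reject") return result;
    ctx = result.ctx;
  }
  return { kind: "continue", ctx };
}

const outcome = runHooks([rateLimit, routeByTier], {
  userTier: "pro",
  model: "auto",
  requestsThisMinute: 3,
});
```

Because each hook has the same signature, adding authentication or logging means appending another function to the array rather than changing the runner.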
Provider differences are still real — we absorb them.
Underneath the embeddability story, the normalization work is still there. Temperature ranges do not line up across providers. Reasoning effort is expressed as a token budget for some models and as low / medium / high for others. Tool calling may require thought signatures, hidden flags, or model-specific syntax. Hebo normalizes these rough edges so a single OpenAI-compatible endpoint behaves consistently across models, and stays compatible with the Vercel AI SDK, LangChain, and any other SDK that speaks the OpenAI interface.
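The kind of normalization described above can be sketched as a small mapping function. This is an illustration under stated assumptions, not Hebo's implementation: the unified 0 to 2 temperature scale, the choice to rescale linearly, the `ProviderProfile` shape, and the token budgets in `EFFORT_TO_BUDGET` are all invented for the example.

```typescript
type Effort = "low" | "medium" | "high";

// Hypothetical description of what one provider accepts.
interface ProviderProfile {
  maxTemperature: number; // upper bound of the provider's temperature range
  reasoningStyle: "effort" | "budget_tokens";
}

// Caller-facing parameters on one assumed unified scale (temperature 0..2).
interface UnifiedParams {
  temperature: number;
  reasoningEffort: Effort;
}

type ProviderReasoning = { effort: Effort } | { budgetTokens: number };

// Assumed token budgets per effort level; real values would be model-specific.
const EFFORT_TO_BUDGET: Record<Effort, number> = {
  low: 1024,
  medium: 4096,
  high: 16384,
};

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Translate unified parameters into the form a given provider expects.
function normalize(
  params: UnifiedParams,
  profile: ProviderProfile,
): { temperature: number; reasoning: ProviderReasoning } {
  // Rescale from the unified 0..2 range to the provider's 0..max range.
  const temperature = (clamp(params.temperature, 0, 2) / 2) * profile.maxTemperature;
  const reasoning: ProviderReasoning =
    profile.reasoningStyle === "effort"
      ? { effort: params.reasoningEffort }
      : { budgetTokens: EFFORT_TO_BUDGET[params.reasoningEffort] };
  return { temperature, reasoning };
}

// Same caller input, two provider-specific outputs.
const forBudgetProvider = normalize(
  { temperature: 2, reasoningEffort: "medium" },
  { maxTemperature: 1, reasoningStyle: "budget_tokens" },
);
const forEffortProvider = normalize(
  { temperature: 0.5, reasoningEffort: "low" },
  { maxTemperature: 2, reasoningStyle: "effort" },
);
```

Whether to rescale or simply clamp out-of-range temperatures is a design choice with trade-offs. Rescaling keeps relative intent, while clamping keeps the literal value whenever it is valid.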
You can start with our managed providers and free tiers, then progressively bring your own credentials from Groq, AWS, GCP, or others, all without changing how your application talks to models.
