Hebo AI Gateway
[Hero graphic: the gateway exposes normalized endpoints (POST /chat/completions, POST /embeddings, GET /models) across models with different native parameters. Model C: temperature 0–1, reasoning Medium, 1 retry. Model G: temperature 0–2, reasoning 8192 tokens, 3 retries. Pillars: Normalized Endpoints, Configuration-over-Code, Access Management.]

“Chat Completions” looks standard. The behavior underneath isn’t.

On paper, “Chat Completions” looks like a solved problem. Most models expose a familiar endpoint, most SDKs assume compatibility, and the industry treats it as a standard. In practice, it isn’t.

Every provider implements the interface slightly differently. Temperature ranges don’t line up. Reasoning is expressed in tokens for some models and low / medium / high for others. Tool calling may require thought signatures, hidden flags, or model-specific syntax. Even retries behave differently: some failures disappear on a second call, others don’t. These differences may look small at first, but they compound quickly.
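To make the mismatch concrete, here is a minimal sketch of what a normalization layer has to do. The profile names, ranges, and token budgets are illustrative (they mirror the model cards above), not Hebo’s actual API:

```python
# Hypothetical normalization sketch: one unified scale (temperature 0..1,
# named reasoning effort) translated onto each provider's native convention.
# "model_c" / "model_g" and the numbers below are illustrative assumptions.

PROVIDER_PROFILES = {
    "model_c": {"temp_max": 1.0, "reasoning": "levels"},  # low / medium / high
    "model_g": {"temp_max": 2.0, "reasoning": "tokens"},  # token budget
}

# Illustrative mapping from named effort levels to token budgets.
REASONING_TOKENS = {"low": 1024, "medium": 8192, "high": 32768}

def normalize(model: str, temperature: float, reasoning: str) -> dict:
    """Translate unified parameters into provider-specific ones."""
    profile = PROVIDER_PROFILES[model]
    params = {"temperature": temperature * profile["temp_max"]}
    if profile["reasoning"] == "tokens":
        params["reasoning_tokens"] = REASONING_TOKENS[reasoning]
    else:
        params["reasoning_effort"] = reasoning
    return params
```

The caller always passes the same unified values; the same request normalized for the two profiles above produces different provider-facing parameters, which is exactly the divergence the gateway absorbs.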

Switching models, or even upgrading the same model version, usually means redeploying. Configuration that should be operational ends up hard-coded. Seemingly trivial changes like adjusting reasoning effort or temperature ripple through application code.
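One way to picture the alternative is to keep model choice and tuning in external configuration that is read at request time, so changing them is an operational edit rather than a code change and redeploy. This is a hedged sketch; the config keys and model names are invented for illustration:

```python
import json

# Hypothetical config: in practice this would live outside the codebase
# (a file, a dashboard, an environment), not in an inline string.
CONFIG = json.loads("""
{
  "default_model": "model-g",
  "temperature": 0.7,
  "reasoning": "medium",
  "retries": 3
}
""")

def build_request(prompt: str, config: dict = CONFIG) -> dict:
    # Application code never names a model or a tuning value directly,
    # so swapping models or adjusting reasoning touches only the config.
    return {
        "model": config["default_model"],
        "temperature": config["temperature"],
        "reasoning": config["reasoning"],
        "messages": [{"role": "user", "content": prompt}],
    }
```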

Visibility is fragmented. Latency lives in one console, token usage in another, errors somewhere else entirely. Cloud provider dashboards are powerful, but they’re not designed around how teams actually iterate on AI systems.

Access control adds another layer of complexity. As soon as you have multiple developers, projects, and agents, permissions and quotas turn into a brittle configuration problem rather than a simple policy decision.
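The policy-versus-configuration distinction can be sketched in a few lines: a per-key policy object that bundles allowed models and a request quota, checked in one place. The names and structure here are assumptions for illustration, not Hebo’s access model:

```python
from dataclasses import dataclass

# Hypothetical per-key policy: which models a key may call, and how many
# requests it has left. Centralizing this check is the "policy decision";
# scattering it across services is the "brittle configuration problem".
@dataclass
class KeyPolicy:
    allowed_models: set
    quota: int       # remaining requests
    used: int = 0

def authorize(policy: KeyPolicy, model: str) -> bool:
    """Allow the call only if the model is permitted and quota remains."""
    if model not in policy.allowed_models or policy.used >= policy.quota:
        return False
    policy.used += 1
    return True
```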

Hebo exists to sit in the middle of this. We built our own gateway to separate model behavior from application logic. The goal isn’t just routing requests, but normalizing the rough edges so a single Chat Completions endpoint behaves consistently across models while remaining compatible with common SDKs like the OpenAI SDK, Vercel AI SDK, and LangChain.
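What "a single endpoint" means in practice is that the request body keeps the familiar OpenAI-style Chat Completions shape, and switching models changes only the model identifier. A minimal sketch, with placeholder model names and no real endpoint:

```python
import json

def chat_payload(model: str, prompt: str) -> str:
    # OpenAI-style Chat Completions body; the structure is the same
    # regardless of which model the gateway routes to.
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

# Hypothetical model names: only the identifier differs between requests.
payload_a = chat_payload("model-c", "Hello")
payload_b = chat_payload("model-g", "Hello")
```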

This is intentionally different from gateways that simply pass requests through. Passing through preserves the differences. Hebo absorbs them.

You can start with our managed providers and free tiers, then progressively bring your own credentials and credits from Groq, AWS, GCP, or others, all without changing how your application talks to models.

This is the foundation. The rest builds on top of it.