Different providers express reasoning in different ways (token budgets, effort levels, or model-specific flags). Hebo Gateway standardizes this into a single reasoning configuration so you can keep one client and switch models safely.

Reasoning Parameter

You can pass a normalized reasoning object with the same shape across providers:
"reasoning": {
  "enabled": true,      // turn reasoning on/off
  "effort": "medium",   // none | minimal | low | medium | high | xhigh
  "max_tokens": 2048,   // max tokens for reasoning (advanced)
  "exclude": false      // exclude reasoning from response payload
}
Hebo maps this to the provider-specific knobs (effort levels, token budgets, or hidden flags), so your app doesn’t need to change when you switch models.
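For example, outside an SDK you can send the normalized object directly in the request body. The snippet below is a minimal sketch, assuming the standard OpenAI-compatible /chat/completions path and a HEBO_API_KEY environment variable; the model name and prompt are illustrative:

// Minimal sketch: raw request carrying the normalized reasoning object.
// Assumes HEBO_API_KEY is set and the gateway exposes the standard
// OpenAI-compatible chat completions path.
const response = await fetch("https://gateway.hebo.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.HEBO_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openai/gpt-oss-20b",
    messages: [{ role: "user", content: "Explain binary search briefly." }],
    // Normalized config; the gateway translates it per provider.
    reasoning: { enabled: true, effort: "medium" },
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);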
For compatibility with the standard OpenAI Chat Completions interface, Hebo also supports a top-level reasoning_effort parameter.
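As a sketch of that compatibility path, the same request can go through the official OpenAI Node SDK pointed at the gateway, passing reasoning_effort as a standard top-level parameter; the model name and prompt here are placeholders:

import OpenAI from "openai";

// Sketch only: the official OpenAI Node SDK against the gateway's
// OpenAI-compatible endpoint, assuming reasoning_effort is forwarded
// as described above.
const client = new OpenAI({
  apiKey: process.env.HEBO_API_KEY,
  baseURL: "https://gateway.hebo.ai/v1",
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-oss-20b",
  messages: [{ role: "user", content: "Summarize the plan." }],
  reasoning_effort: "medium", // top-level OpenAI-style parameter
});

console.log(completion.choices[0].message.content);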

Basic Example

Vercel AI SDK
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
import { generateText } from "ai";

const hebo = createOpenAICompatible({
  apiKey: process.env.HEBO_API_KEY,
  baseURL: "https://gateway.hebo.ai/v1",
});

const result = await generateText({
  model: hebo("openai/gpt-oss-20b"),
  prompt: "Why did the monkey bring a ladder to the bar?",
  reasoning: { effort: "medium" },
});

console.log("Reasoning: ", result.reasoning);
console.log("Text: ", result.text);
Some models don’t support returning reasoning content; in that case, result.reasoning will be empty. The model still used reasoning internally.

Chain-of-Thought

If you use streaming, you can watch the model’s thought process in real time.
Vercel AI SDK
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
import { streamText } from "ai";

const hebo = createOpenAICompatible({
  apiKey: process.env.HEBO_API_KEY,
  baseURL: "https://gateway.hebo.ai/v1",
});

const result = await streamText({
  model: hebo("openai/gpt-oss-20b"),
  prompt: "Tell me a very long story about monkeys",
});

for await (const part of result.fullStream) {
  if (part.type === "reasoning") {
    process.stdout.write("\n[REASONING]\n");
    process.stdout.write(part.text);
    process.stdout.write("\n\n");
  }

  if (part.type === "text") {
    process.stdout.write("[TEXT]\n");
    process.stdout.write(part.text);
  }
}
Not all models provide detailed chain-of-thought output; some will only provide a reasoning summary at the end.

Follow-up Calls

Follow-up calls should reuse the previous response messages so the next request preserves the model’s context, including any tool calls and internal thought process. Some providers also return reasoning metadata that must be passed along to keep model behavior consistent across a conversation. Hebo Gateway handles this automatically, as long as you include the full set of returned messages in your next request.
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
import { generateText } from "ai";

const hebo = createOpenAICompatible({
  apiKey: process.env.HEBO_API_KEY,
  baseURL: "https://gateway.hebo.ai/v1",
});

const messages = [{ role: "user", content: "Draft a concise project plan." }];

const first = await generateText({
  model: hebo("openai/gpt-oss-20b"),
  messages,
  reasoning: { effort: "high" },
});

// Append the assistant (and any tool) messages generated by the first call.
messages.push(...first.response.messages);

const second = await generateText({
  model: hebo("openai/gpt-oss-20b"),
  messages,
  reasoning: { effort: "high" },
});

console.log(second.text);

Internal Details

If you’re interested in the internal details that are handled automatically, see the upstream documentation: