Different providers express reasoning in different ways (token budgets, effort levels, or model-specific flags).
Hebo Gateway standardizes this into a single reasoning configuration so you can keep one client and switch models safely.
Reasoning Parameter
You can pass a normalized reasoning object with the same shape across providers:
"reasoning": {
"enabled": true, // turn reasoning on/off
"effort": "medium", // none | minimal | low | medium | high | xhigh
"max_tokens": 2048, // max tokens for reasoning (advanced)
"exclude": false // exclude reasoning from response payload
}
Hebo maps this to the provider-specific knobs (effort levels, token budgets, or hidden flags), so your app doesn’t need to change when you switch models.
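For example, here is a minimal sketch of a raw request, assuming the gateway exposes the standard /v1/chat/completions route implied by the base URL used in the examples below:
const response = await fetch("https://gateway.hebo.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.HEBO_API_KEY}`,
  },
  body: JSON.stringify({
    model: "openai/gpt-oss-20b",
    messages: [{ role: "user", content: "Why is the sky blue?" }],
    // Same normalized shape regardless of the underlying provider.
    reasoning: { enabled: true, effort: "medium" },
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);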
For compatibility with the standard OpenAI Chat Completions interface, Hebo also supports a
top-level reasoning_effort parameter.
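As a sketch, the official openai client can be pointed at the gateway and pass reasoning_effort directly (using that client here is an assumption; the model name mirrors the examples below):
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.HEBO_API_KEY,
  baseURL: "https://gateway.hebo.ai/v1",
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-oss-20b",
  messages: [{ role: "user", content: "Why did the monkey bring a ladder to the bar?" }],
  // Standard OpenAI-style parameter; Hebo maps it to the normalized reasoning config.
  reasoning_effort: "medium",
});

console.log(completion.choices[0].message.content);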
Basic Example
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
import { generateText } from "ai";

const hebo = createOpenAICompatible({
  name: "hebo",
  apiKey: process.env.HEBO_API_KEY,
  baseURL: "https://gateway.hebo.ai/v1",
});

const result = await generateText({
  model: hebo("openai/gpt-oss-20b"),
  prompt: "Why did the monkey bring a ladder to the bar?",
  reasoning: { effort: "medium" },
});

console.log("Reasoning: ", result.reasoning);
console.log("Text: ", result.text);
Some models don’t support returning reasoning content, in which case result.reasoning will be empty. The model still uses reasoning internally.
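A small defensive sketch, assuming result.reasoning is the string shown in the example above:
// Only print reasoning when the model actually returned it.
if (result.reasoning) {
  console.log("Reasoning: ", result.reasoning);
} else {
  console.log("(no reasoning content returned; the model reasoned internally)");
}
console.log("Text: ", result.text);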
Chain of Thought
If you use streaming, you can watch the model’s thought process in real time.
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
import { streamText } from "ai";

const hebo = createOpenAICompatible({
  name: "hebo",
  apiKey: process.env.HEBO_API_KEY,
  baseURL: "https://gateway.hebo.ai/v1",
});

const result = await streamText({
  model: hebo("openai/gpt-oss-20b"),
  prompt: "Tell me a very long story about monkeys",
});

for await (const part of result.fullStream) {
  if (part.type === "reasoning") {
    process.stdout.write("\n[REASONING]\n");
    process.stdout.write(part.text);
    process.stdout.write("\n\n");
  }
  if (part.type === "text") {
    process.stdout.write("[TEXT]\n");
    process.stdout.write(part.text);
  }
}
Not all models provide detailed chain-of-thought output; some will only provide a reasoning summary at the end of the stream.
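The same loop handles both cases. As a sketch using only the stream parts shown above, you can buffer the reasoning and print it once the stream finishes:
let reasoning = "";
for await (const part of result.fullStream) {
  if (part.type === "reasoning") {
    // Arrives incrementally for chain-of-thought models,
    // or as a single summary chunk near the end for others.
    reasoning += part.text;
  }
  if (part.type === "text") {
    process.stdout.write(part.text);
  }
}
if (reasoning) {
  console.log("\n[REASONING]\n" + reasoning);
}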
Follow-up Calls
Follow-up calls should reuse the previous response messages so the next request preserves the model’s context, including any tool calls and internal thought process.
Some providers also include reasoning metadata that must be passed along to keep model behavior consistent across a conversation.
Hebo Gateway handles this automatically, as long as you include the full set of returned response messages in your next request.
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
import { generateText } from "ai";

const hebo = createOpenAICompatible({
  name: "hebo",
  apiKey: process.env.HEBO_API_KEY,
  baseURL: "https://gateway.hebo.ai/v1",
});

const messages = [{ role: "user", content: "Draft a concise project plan." }];

const first = await generateText({
  model: hebo("openai/gpt-oss-20b"),
  messages,
  reasoning: { effort: "high" },
});

// Reuse the full response messages (including any reasoning metadata)
// so the follow-up request preserves the model's context.
messages.push(...first.response.messages);
messages.push({ role: "user", content: "Now add a rough timeline." });

const second = await generateText({
  model: hebo("openai/gpt-oss-20b"),
  messages,
  reasoning: { effort: "high" },
});

console.log(second.text);
Internal Details
If you’re interested in the internal details that are handled automatically, see the upstream documentation: