conversation.md

---
Evaluators: Guideline Adherence
Tools: Order Mgmt, Hand-off
---

System: You are a helpful and concise assistant for an e-commerce platform. Keep answers short and friendly.

User: Hi! Can you track my order?

Assistant: Of course! Could you share your order ID, please?

User: Sure, it's #8927341.

Tool: Order Mgmt(#8927341)

Assistant: Thanks! Your order is on the way and should arrive tomorrow.

Hebo Evals
Evaluate Prompts / LLMs / Agents
  • ✔ Write evals in simple markdown
  • ✔ Connect to an existing LLM or agent
  • ✔ Integrate with your CI/CD pipeline (see the workflow sketch below)
$ bun install hebo-evals@latest
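
To make the CI/CD point concrete, here is a minimal GitHub Actions sketch. The `bunx hebo-evals` invocation and the evals/ directory are assumptions, not the documented CLI; check the project docs for the real entry point.

```yaml
# A minimal sketch, not an official workflow. The run command and
# file layout below are assumptions.
name: evals
on: [pull_request]

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v2
      - run: bun install hebo-evals@latest
      # Hypothetical command: run every eval file under evals/.
      - run: bunx hebo-evals evals/
        env:
          # Only needed if the agent under test calls a hosted model.
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```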

Hebo Evals: Markdown for Evals, a human-first format

We explored existing evaluation solutions, and while powerful, most felt like they were built for developers. But who ultimately owns the outcomes? The business does.

Effective evals shouldn’t live in code. They need to be written, reviewed, and iterated on by business teams — the people who actually define what “good” looks like.

And honestly: who enjoys writing evals full of curly braces, JSON, and DSLs?

Agents already speak Markdown, and so do humans: it’s a natural format for models and an easy one for people to read, write, and reason about.

On top of that, Markdown has a rich ecosystem of tooling, enabling Notion-like editing experiences that lower the barrier even further.

That’s why we introduced .MDE (Markdown for Evals): a simple, human-first format for defining evaluation logic that both business and technical teams can understand and evolve together.
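
The conversation.md file at the top of this page is a complete example. Abstracting from it, an .MDE file is plain Markdown: frontmatter names the evaluators and tools, and the conversation is written as role-prefixed turns. The angle-bracket placeholders below are illustrative, not part of the syntax:

```markdown
---
Evaluators: <evaluators that score the conversation, e.g. Guideline Adherence>
Tools: <tools the agent may call, e.g. Order Mgmt, Hand-off>
---

System: <the system prompt under test>

User: <a user turn>

Tool: <a tool invocation and its input>

Assistant: <the expected assistant reply>
```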

This is v0.1, and we’re actively looking for early feedback. Give it a try — and let us know what you think via X (@heboai) or on Discord.