Hebo Eval CLI
Hebo Eval is a command-line tool for evaluating and testing language models. It provides a framework for running evaluations against your AI agents and generating detailed reports.

Installation
Basic Usage
Command Options
| Option | Description | Default |
|---|---|---|
| `-d, --directory <path>` | Directory containing test cases | `./examples` |
| `-c, --config <path>` | Path to configuration file | - |
| `-t, --threshold <number>` | Score threshold for passing (0-1) | `0.8` |
| `-f, --format <format>` | Output format (`json`, `markdown`, or `text`) | `text` |
| `-s, --stop-on-error` | Stop processing on first error | `false` |
| `-m, --max-concurrency <number>` | Maximum number of concurrent test executions | `5` |
| `-v, --verbose` | Show detailed output for all test cases | `false` |
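To make the defaults in the table concrete, here is a minimal TypeScript sketch of how the options might be merged with user input and how the threshold could decide pass/fail. `EvalOptions`, `resolveOptions`, and `isPassing` are hypothetical names, not Hebo Eval's internals, and treating a score equal to the threshold as a pass is an assumption.

```typescript
// Hypothetical shape of the parsed CLI options (names are illustrative).
type OutputFormat = "json" | "markdown" | "text";

interface EvalOptions {
  directory: string;
  config?: string; // no default: only used when -c/--config is given
  threshold: number;
  format: OutputFormat;
  stopOnError: boolean;
  maxConcurrency: number;
  verbose: boolean;
}

// Defaults taken from the Command Options table.
const DEFAULT_OPTIONS: EvalOptions = {
  directory: "./examples",
  threshold: 0.8,
  format: "text",
  stopOnError: false,
  maxConcurrency: 5,
  verbose: false,
};

// Overlay user-supplied values on the defaults and validate numeric ranges.
function resolveOptions(overrides: Partial<EvalOptions>): EvalOptions {
  const options: EvalOptions = { ...DEFAULT_OPTIONS, ...overrides };
  if (options.threshold < 0 || options.threshold > 1) {
    throw new RangeError(`threshold must be between 0 and 1, got ${options.threshold}`);
  }
  if (!Number.isInteger(options.maxConcurrency) || options.maxConcurrency < 1) {
    throw new RangeError(`max-concurrency must be a positive integer, got ${options.maxConcurrency}`);
  }
  return options;
}

// Assumption: a score equal to the threshold counts as passing.
function isPassing(score: number, options: EvalOptions): boolean {
  return score >= options.threshold;
}
```

With the default threshold of `0.8`, a test case scoring `0.8` passes under this sketch and one scoring `0.79` fails.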
Configuration
You can configure Hebo Eval in two ways:

- Environment Variables: Set the required API keys (for example, `HEBO_API_KEY`) in your environment.
- Configuration File: Create a YAML configuration file (e.g., `hebo-evals.config.yaml`).

Note: The configuration file supports environment variable substitution using `${VARIABLE_NAME}` syntax. This allows you to keep sensitive information like API keys in environment variables while referencing them in your configuration file.
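The `${VARIABLE_NAME}` substitution can be pictured as a single regex replacement over the raw file text before it is parsed as YAML. The sketch below assumes that variable names use letters, digits, and underscores and that an unset variable is an error; `substituteEnvVars` and the `api_key` field are hypothetical, and Hebo Eval's own handling of missing variables may differ.

```typescript
// Replace each ${VARIABLE_NAME} in raw config text with its value from `env`.
// Assumption: names match [A-Za-z_][A-Za-z0-9_]* and an unset variable is an error.
function substituteEnvVars(
  raw: string,
  env: Record<string, string | undefined>,
): string {
  return raw.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const value = env[name];
    if (value === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return value;
  });
}

// Hypothetical config line; "api_key" is an illustrative field name.
const rendered = substituteEnvVars("api_key: ${HEBO_API_KEY}", { HEBO_API_KEY: "test-key" });
console.log(rendered); // prints "api_key: test-key"
```

In Node.js you would pass `process.env` as the `env` argument, so the secret itself never has to appear in the YAML file.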
Configuration Template
Hebo Eval provides a boilerplate configuration template called `hebo-evals.config.yaml`. This template includes:

- Multiple Provider Support: Configure both OpenAI and Hebo providers
- Environment Variable Integration: Use `${VARIABLE_NAME}` syntax for secure API key management
- Flexible Authentication: Support for different authentication header formats
- Embedding Configuration: Separate configuration for embedding models

To use the template:

- Copy the template to your project directory.
- Set up your environment variables.
- Customize the configuration as needed for your specific use case.
Provider Mapping Logic
When running Hebo Eval, the provider is automatically determined from the model or agent name you specify in the command. The tool uses pattern matching to map the model/agent name to the appropriate provider:

- OpenAI: Any model name that starts with `gpt-` (e.g., `gpt-3.5-turbo`, `gpt-4`) is mapped to the OpenAI provider.
- Hebo: Any model name that contains a colon (`:`) (e.g., `gato-qa:v1`) is mapped to the Hebo provider.
- Anthropic: Any model name that starts with `claude-` (e.g., `claude-2`, `claude-instant`) is mapped to the Anthropic provider.
- Custom: Any model name that starts with `custom-` is mapped to the Custom provider.
Note: If you need to override the default mapping, you can specify the provider explicitly in your configuration file or command options.
Currently, Hebo Eval only supports OpenAI and Hebo as providers. Support for additional providers such as Anthropic and Custom is planned for future releases.
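The mapping rules can be sketched as a short chain of string checks. This is an illustration, not Hebo Eval's source: `resolveProvider` and `isSupported` are hypothetical, the order in which the rules are checked is an assumption (it matters for a name such as `gpt-4:v1`, which matches two rules), and the `override` parameter stands in for the configuration-file or command-option override.

```typescript
type Provider = "openai" | "hebo" | "anthropic" | "custom";

// Map a model/agent name to a provider using the documented name patterns.
// Assumption: an explicit override wins, then rules are checked in this order.
function resolveProvider(name: string, override?: Provider): Provider | undefined {
  if (override !== undefined) return override;
  if (name.startsWith("gpt-")) return "openai";
  if (name.includes(":")) return "hebo";
  if (name.startsWith("claude-")) return "anthropic";
  if (name.startsWith("custom-")) return "custom";
  return undefined; // no rule matched
}

// Only OpenAI and Hebo are currently supported.
function isSupported(provider: Provider | undefined): boolean {
  return provider === "openai" || provider === "hebo";
}
```

Under this sketch, `claude-2` still resolves to `"anthropic"`, but `isSupported` returns `false` for it until that provider ships.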
Test Cases
Hebo Eval supports a flexible test case structure that allows you to organize and manage your test cases effectively.

Test Case Structure

- Multiple Test Cases in One File:
  - Test cases can be defined in a single file, separated by `---`
  - Each test case starts with a title using the `# Test Case Name` format
- Directory Organization:
  - Test cases can be organized in subdirectories
  - All test cases in subdirectories are automatically discovered and executed
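A minimal sketch of how one file could be split into test cases under these rules. `splitTestCases` and `RawTestCase` are hypothetical, and the sketch assumes that a line that is exactly `---` ends a case, that the first `#` line after a separator is that case's title, that later `#` lines are ordinary body text, and that text before the first title is ignored. The sample file is illustrative only.

```typescript
interface RawTestCase {
  title: string;
  body: string[]; // lines after the title, not yet parsed into messages
}

// Split one file into test cases. A line that is exactly "---" (ignoring
// surrounding whitespace) ends the current case; the first line starting with
// "#" after a separator becomes the next case's title.
function splitTestCases(source: string): RawTestCase[] {
  const cases: RawTestCase[] = [];
  let current: RawTestCase | undefined;

  for (const line of source.split(/\r?\n/)) {
    if (line.trim() === "---") {
      if (current) cases.push(current);
      current = undefined;
    } else if (current === undefined) {
      // Assumption: text before a title is ignored.
      if (line.startsWith("#")) {
        current = { title: line.replace(/^#+\s*/, ""), body: [] };
      }
    } else {
      // Later "#" lines are kept as ordinary body text.
      current.body.push(line);
    }
  }
  if (current) cases.push(current);
  return cases;
}

// Illustrative sample file; the real format may carry more detail.
const sample = [
  "# Greeting",
  "user: Hello",
  "assistant: Hi there",
  "---",
  "# Farewell",
  "user: Bye",
].join("\n");
console.log(splitTestCases(sample).map((c) => c.title)); // [ 'Greeting', 'Farewell' ]
```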
Test Case Format
Special Characters and Multiline Messages
When writing test cases, it’s important to understand how to handle special characters and multiline messages correctly.

Special Characters

The following characters have special meaning in test cases:

| Character | Usage | Example |
|---|---|---|
| `:` | Role marker delimiter | `user:`, `assistant:`, `system:` |
| `#` | Test case title | `# My Test Case` |
| `---` | Test case separator | Used between test cases |
Escaping Special Characters
To include literal special characters in your messages, use them directly in the message content. The parser interprets these characters as special only in specific contexts:

- `:` is only special when it directly follows a role name, as in `user:`
- `#` is only special when it appears at the start of a line
- `---` is only special when it appears on its own line
Multiline Message Format
Hebo Eval supports two styles of multiline messages:

- Indented Style
- Non-indented Style
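The role-marker and multiline rules can be sketched as a line-by-line parser. This is a hedged illustration, not the real parser: `parseMessages` is hypothetical, and it assumes that a role marker must start the line, that any other line (indented or not) continues the previous message, and that leading indentation on continuation lines is stripped. Because only `system:`, `user:`, and `assistant:` at the start of a line count as markers, colons elsewhere in the text, such as `10:30`, stay part of the message.

```typescript
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// A role marker must start the line, e.g. "user: Hello".
const ROLE_MARKER = /^(system|user|assistant):\s?(.*)$/;

// Turn one test case's body lines into messages. Any line that does not start
// with a role marker continues the previous message; its leading indentation is
// stripped, so indented and non-indented continuations produce the same text.
function parseMessages(lines: string[]): Message[] {
  const messages: Message[] = [];
  for (const line of lines) {
    const match = ROLE_MARKER.exec(line);
    if (match) {
      messages.push({ role: match[1] as Role, content: match[2] });
    } else if (messages.length > 0) {
      messages[messages.length - 1].content += "\n" + line.replace(/^\s+/, "");
    }
    // Lines before the first role marker are ignored.
  }
  // Drop trailing whitespace, e.g. blank lines before the next marker.
  return messages.map((m) => ({ ...m, content: m.content.replace(/\s+$/, "") }));
}

const messages = parseMessages([
  "system: You are a helpful assistant.",
  "user: What time is the meeting?",
  "  Please answer briefly.", // indented continuation
  "assistant: It starts at 10:30.",
  "A second line without indentation.", // non-indented continuation
]);
```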
Directory Structure Example
Output Format
The tool supports three output formats with different verbosity levels:

Default Output (Concise)
Example Usage
- Basic Evaluation
- Custom Directory and Format
- With Configuration File
- Custom Threshold and Concurrency
- Verbose Output
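The `-m, --max-concurrency` and `-s, --stop-on-error` options suggest a bounded worker pool. The sketch below shows one way to implement that in TypeScript; `runWithConcurrency` and `TaskResult` are hypothetical names, and it assumes that stopping on error means no new test case starts after the first failure while already-running ones are allowed to finish.

```typescript
type TaskResult<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; error: unknown }
  | { status: "skipped" }; // never started because of stop-on-error

// Run tasks with at most `maxConcurrency` in flight. With `stopOnError`,
// no new task starts after the first failure; running tasks still finish.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  maxConcurrency: number,
  stopOnError: boolean,
): Promise<TaskResult<T>[]> {
  const results = tasks.map((): TaskResult<T> => ({ status: "skipped" }));
  let next = 0; // index of the next task to start
  let stopped = false;

  // Each worker keeps claiming the next unstarted task until none are left.
  async function worker(): Promise<void> {
    while (!stopped && next < tasks.length) {
      const index = next++;
      try {
        results[index] = { status: "fulfilled", value: await tasks[index]() };
      } catch (error) {
        results[index] = { status: "rejected", error };
        if (stopOnError) stopped = true;
      }
    }
  }

  const workerCount = Math.max(1, Math.min(maxConcurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Starting a fixed number of workers that pull from a shared index keeps at most `maxConcurrency` evaluations pending at once, without creating every request up front.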
Best Practices
- Always set up your API keys using environment variables for security
- Use the provided `hebo-evals.config.yaml` template as a starting point
- Start with a small test set before running large evaluations
- Use descriptive test case titles with the `#` format
- Organize test cases in subdirectories for better management
- Keep your configuration file secure and never commit API keys to version control
- Use the `-v` flag when debugging test failures
- Leverage environment variable substitution in your configuration for better security
Troubleshooting
If you encounter the “HEBO_API_KEY is required” error:

- Verify your environment variables.
- Alternatively, use a configuration file.

