# eval

Evaluation tool for testing Ollama models.

## Usage

Run all tests:

```bash
go run . -model llama3.2:latest
```

Run a specific suite:

```bash
go run . -model llama3.2:latest -suite tool-calling-basic -v
```

List available suites:

```bash
go run . -list
```

## Adding Tests

Edit `suites.go` to add new test suites. Each test needs:

- `Name`: test identifier
- `Prompt`: what to send to the model
- `Check`: function to validate the response

Example:

```go
{
	Name:   "my-test",
	Prompt: "What is 2+2?",
	Check:  Contains("4"),
}
```

Available check functions:

- `HasResponse()` - response is non-empty
- `Contains(s)` - response contains substring
- `CallsTool(name)` - model called the tool with the given name
- `NoTools()` - model called no tools
- `MinTools(n)` - model called at least n tools
- `All(checks...)` - all checks pass
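
The check functions above compose, so one test can assert several things at once. The following is a minimal, self-contained sketch of how such combinators might work. The `Response` and `Check` types, their fields, and the `get_weather` tool name are assumptions made for illustration and are not taken from `suites.go`, so the real signatures may differ. In particular, treating "non-empty" as "non-blank text" is a guess.

```go
package main

import (
	"fmt"
	"strings"
)

// Response is a hypothetical stand-in for what the tool records per model reply.
type Response struct {
	Text      string   // text content returned by the model
	ToolCalls []string // names of the tools the model called, in order
}

// Check is an assumed signature: it reports whether a response passes.
type Check func(r Response) bool

// HasResponse passes when the response text is not blank (assumed meaning).
func HasResponse() Check {
	return func(r Response) bool { return strings.TrimSpace(r.Text) != "" }
}

// Contains passes when the response text contains the substring s.
func Contains(s string) Check {
	return func(r Response) bool { return strings.Contains(r.Text, s) }
}

// CallsTool passes when the model called the tool with the given name.
func CallsTool(name string) Check {
	return func(r Response) bool {
		for _, t := range r.ToolCalls {
			if t == name {
				return true
			}
		}
		return false
	}
}

// NoTools passes when the model called no tools.
func NoTools() Check {
	return func(r Response) bool { return len(r.ToolCalls) == 0 }
}

// MinTools passes when the model called at least n tools.
func MinTools(n int) Check {
	return func(r Response) bool { return len(r.ToolCalls) >= n }
}

// All passes only when every given check passes.
func All(checks ...Check) Check {
	return func(r Response) bool {
		for _, c := range checks {
			if !c(r) {
				return false
			}
		}
		return true
	}
}

func main() {
	// A plain-text answer: should have content, mention "4", and use no tools.
	textOnly := Response{Text: "The answer is 4."}
	fmt.Println(All(HasResponse(), Contains("4"), NoTools())(textOnly))

	// A reply that only calls one (hypothetical) tool.
	toolOnly := Response{ToolCalls: []string{"get_weather"}}
	fmt.Println(All(CallsTool("get_weather"), MinTools(1))(toolOnly))
}
```

Under these assumptions, a test entry could set `Check: All(HasResponse(), Contains("4"))` to require both conditions. `All` returns on the first failing check.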