847 B

Raw Permalink Blame History

eval

Evaluation tool for testing Ollama models.

Usage

Run all tests:

go run . -model llama3.2:latest

Run specific suite:

go run . -model llama3.2:latest -suite tool-calling-basic -v

List available suites:

go run . -list

Adding Tests

Edit suites.go to add new test suites. Each test needs:

Name: test identifier
Prompt: what to send to the model
Check: function to validate the response

Example:

{
    Name:   "my-test",
    Prompt: "What is 2+2?",
    Check:  Contains("4"),
}

Available check functions:

HasResponse() - response is non-empty
Contains(s) - response contains substring
CallsTool(name) - model called specific tool
NoTools() - model called no tools
MinTools(n) - model called at least n tools
All(checks...) - all checks pass