eval
Evaluation tool for testing Ollama models.
Usage
Run all tests:
go run . -model llama3.2:latest
Run specific suite:
go run . -model llama3.2:latest -suite tool-calling-basic -v
List available suites:
go run . -list
Adding Tests
Edit suites.go to add new test suites. Each test needs:
Name: test identifier
Prompt: what to send to the model
Check: function to validate the response
Example:
{
	Name:   "my-test",
	Prompt: "What is 2+2?",
	Check:  Contains("4"),
}
Available check functions:
HasResponse() - response is non-empty
Contains(s) - response contains substring
CallsTool(name) - model called specific tool
NoTools() - model called no tools
MinTools(n) - model called at least n tools
All(checks...) - all checks pass
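
To make the composition of checks concrete, here is a minimal, self-contained sketch of how check functions like these could be built and combined. It is not the code in suites.go: the Response type, its Content and ToolCalls fields, the bool-returning Check signature, and the get_weather tool name are all assumptions made for illustration, and the real definitions may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Response is a hypothetical stand-in for what the tool records from the
// model: the text reply plus the names of any tools it called.
type Response struct {
	Content   string
	ToolCalls []string
}

// Check reports whether a response passes. Assumed signature; the real one
// in suites.go may return an error or a reason string instead.
type Check func(Response) bool

// Test mirrors the three fields described in "Adding Tests".
type Test struct {
	Name   string
	Prompt string
	Check  Check
}

// HasResponse passes when the reply text is non-empty.
func HasResponse() Check {
	return func(r Response) bool { return r.Content != "" }
}

// Contains passes when the reply text contains the substring s.
func Contains(s string) Check {
	return func(r Response) bool { return strings.Contains(r.Content, s) }
}

// CallsTool passes when the model called the tool with the given name.
func CallsTool(name string) Check {
	return func(r Response) bool {
		for _, t := range r.ToolCalls {
			if t == name {
				return true
			}
		}
		return false
	}
}

// NoTools passes when the model called no tools.
func NoTools() Check {
	return func(r Response) bool { return len(r.ToolCalls) == 0 }
}

// MinTools passes when the model called at least n tools.
func MinTools(n int) Check {
	return func(r Response) bool { return len(r.ToolCalls) >= n }
}

// All passes only when every supplied check passes.
func All(checks ...Check) Check {
	return func(r Response) bool {
		for _, c := range checks {
			if !c(r) {
				return false
			}
		}
		return true
	}
}

func main() {
	t := Test{
		Name:   "weather-tool",
		Prompt: "What is the weather in Paris?",
		Check:  All(CallsTool("get_weather"), MinTools(1)),
	}
	// A canned response stands in for a real model call.
	resp := Response{ToolCalls: []string{"get_weather"}}
	fmt.Printf("%s passed: %v\n", t.Name, t.Check(resp))
}
```

Modeling each check as a function that returns a closure keeps All a simple loop, so a test can require several conditions at once without a dedicated check for every combination.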