# llama-eval

Simple evaluation tool for llama.cpp with support for multiple datasets.

For a full description, usage examples, and sample results, see:

- [PR 21152](https://github.com/ggml-org/llama.cpp/pull/21152)

## Quick start

```bash
# Single server
python3 llama-eval.py \
  --server http://localhost:8033 \
  --model my-model \
  --dataset gsm8k --n_cases 100 \
  --grader-type regex --threads 32

# Multiple servers (comma-separated URLs and thread counts)
python3 llama-eval.py \
  --server http://server1:8033,http://server2:8033 \
  --server-name server1,server2 \
  --threads 16,16 \
  --dataset aime2025 --n_cases 240 \
  --grader-type regex
```
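
When the list of servers grows, it can be easier to keep the values in shell arrays and build the comma-separated arguments from them. The following is a minimal sketch, not part of llama-eval itself: `join_by_comma` is a hypothetical helper, and the length check assumes that each `--server` entry is meant to have one matching `--server-name` and `--threads` entry. The script only prints the resulting command so it can be inspected before running.

```bash
#!/usr/bin/env bash
# Sketch: build comma-separated arguments for a multi-server run.
# join_by_comma is a hypothetical helper, not part of llama-eval.

# Join all arguments with commas ("$*" joins on the first character of IFS).
join_by_comma() {
  local IFS=','
  echo "$*"
}

servers=(http://server1:8033 http://server2:8033)
names=(server1 server2)
threads=(16 16)

# Assumption: one name and one thread count per server URL.
if [ "${#servers[@]}" -ne "${#names[@]}" ] || [ "${#servers[@]}" -ne "${#threads[@]}" ]; then
  echo "error: servers, names, and threads must have the same length" >&2
  exit 1
fi

server_arg=$(join_by_comma "${servers[@]}")
names_arg=$(join_by_comma "${names[@]}")
threads_arg=$(join_by_comma "${threads[@]}")

# Print the command instead of running it, so it can be reviewed first.
printf '%s\n' "python3 llama-eval.py --server $server_arg --server-name $names_arg --threads $threads_arg --dataset aime2025 --n_cases 240 --grader-type regex"
```

Adding a third server then only means appending one element to each array.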