admin/llama.cpp

llama.cpp verification source 2026-05-22

2026-05-22 16:44:08 +08:00

llama-eval

Simple evaluation tool for llama.cpp with support for multiple datasets.

For a full description, usage examples, and sample results, see:

PR 21152

Quick start

# Single server
python3 llama-eval.py \
  --server http://localhost:8033 \
  --model my-model \
  --dataset gsm8k --n_cases 100 \
  --grader-type regex --threads 32

# Multiple servers (comma-separated URLs and thread counts)
python3 llama-eval.py \
  --server http://server1:8033,http://server2:8033 \
  --server-name server1,server2 \
  --threads 16,16 \
  --dataset aime2025 --n_cases 240 \
  --grader-type regex
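
The multi-server invocation passes three parallel comma-separated lists (`--server`, `--server-name`, `--threads`). As a rough illustration of how such lists line up position by position, here is a minimal sketch. The `ServerConfig` class and the `pair_server_args` helper are hypothetical names introduced for this example and are not taken from `llama-eval.py`; the strict equal-length check is also an assumption, since the script itself may handle mismatched lists differently.

```python
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """Hypothetical per-server settings (not a type from llama-eval.py)."""
    url: str
    name: str
    threads: int


def pair_server_args(servers: str, names: str, threads: str) -> list[ServerConfig]:
    """Split three comma-separated strings and pair their entries by position."""
    urls = [s.strip() for s in servers.split(",") if s.strip()]
    name_list = [n.strip() for n in names.split(",") if n.strip()]
    thread_list = [int(t) for t in threads.split(",") if t.strip()]

    # Assumption: every list must have one entry per server.
    if not (len(urls) == len(name_list) == len(thread_list)):
        raise ValueError(
            f"expected equal counts, got {len(urls)} URLs, "
            f"{len(name_list)} names, {len(thread_list)} thread counts"
        )

    return [ServerConfig(u, n, t) for u, n, t in zip(urls, name_list, thread_list)]


if __name__ == "__main__":
    configs = pair_server_args(
        "http://server1:8033,http://server2:8033",
        "server1,server2",
        "16,16",
    )
    for cfg in configs:
        print(cfg.name, cfg.url, cfg.threads)
```

Read this way, the example command would send 16 threads to `server1` and 16 to `server2`, for 32 concurrent workers in total, which matches the thread count used in the single-server example.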