Models are half the story. Choose what you want to do and get the full stack — model + runtime + the right agent harness — tuned to your hardware.
This field is empirical. These are community-submitted results — model + machine + speed + how well it actually did the task.
Community-submitted & open. Add yours ↓
The leaderboard is community-owned and fully open. Ran a model + agent on your machine? Add a data point by opening a pull request that appends one entry to data/reports.json.
No form, no account-scraping — every result is reviewable in the open. CI checks the shape, a maintainer merges, and it appears here on the next deploy. Copy the template, fill in your numbers, paste it into the file on GitHub.
{
"machine": { "label": "Apple M4 / 16GB", "chip": "Apple M4", "ram_gb": 16, "gpu": "Apple M4", "os": "macos" },
"runtime": { "name": "Ollama" },
"model": { "id": "qwen3:8b", "quant": "Q4_K_M" },
"task": "coding",
"harness": { "name": "Aider" },
"metrics": { "decode_tok_s": 12, "success": 4, "notes": "what worked / what broke" },
"source": { "by": "yourhandle", "x_url": null, "date": "2026-06-07" }
}Other options: llama.cpp (power users), Jan, GPT4All, Open WebUI (self-hosted)