Local LLM Guide — Private AI is easier than you think

🔍 System Scan

GPU

System RAM (GB) ★ most important

CPU Cores

Platform

🤖 Run an Agent — pick your task

Models are half the story. Choose what you want to do and get the full stack — model + runtime + the right agent harness — tuned to your hardware.

📊 Community Results — real runs, real hardware

Local-LLM performance is empirical. Each community-submitted result records the model, the machine, the decode speed, and how well the run actually did the task.


✍️ Add Your Result — by pull request

The leaderboard is community-owned and fully open. Ran a model + agent on your machine? Add a data point by opening a pull request that appends one entry to data/reports.json.

No form, no account-scraping: every result is reviewable in the open. CI checks the entry's shape, a maintainer merges, and it appears here on the next deploy. Copy the template below, fill in your numbers, and paste it into the file on GitHub.

{
  "machine": { "label": "Apple M4 / 16GB", "chip": "Apple M4", "ram_gb": 16, "gpu": "Apple M4", "os": "macos" },
  "runtime": { "name": "Ollama" },
  "model":   { "id": "qwen3:8b", "quant": "Q4_K_M" },
  "task":    "coding",
  "harness": { "name": "Aider" },
  "metrics": { "decode_tok_s": 12, "success": 4, "notes": "what worked / what broke" },
  "source":  { "by": "yourhandle", "x_url": null, "date": "2026-06-07" }
}
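
Before opening the pull request, it can help to check an entry locally. The sketch below is not the project's actual CI check: the required fields and types in REQUIRED are inferred only from the template above, and append_entry assumes data/reports.json holds a single JSON array. Both are assumptions, and the real schema may be stricter or looser.

```python
import json

# Hypothetical shape, inferred only from the template above.
# The project's real CI schema may differ.
REQUIRED = {
    "machine": {"label": str, "chip": str, "ram_gb": (int, float), "gpu": str, "os": str},
    "runtime": {"name": str},
    "model": {"id": str, "quant": str},
    "harness": {"name": str},
    "metrics": {"decode_tok_s": (int, float), "success": int, "notes": str},
    "source": {"by": str, "date": str},
}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks OK."""
    problems = []
    if not isinstance(entry.get("task"), str):
        problems.append("task: expected a string")
    for section, fields in REQUIRED.items():
        block = entry.get(section)
        if not isinstance(block, dict):
            problems.append(f"{section}: expected an object")
            continue
        for name, expected in fields.items():
            value = block.get(name)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                problems.append(f"{section}.{name}: missing or wrong type")
    return problems


def append_entry(path: str, entry: dict) -> None:
    """Append one validated entry, assuming the file holds a JSON array."""
    problems = validate_entry(entry)
    if problems:
        raise ValueError("; ".join(problems))
    with open(path, encoding="utf-8") as f:
        reports = json.load(f)
    reports.append(entry)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(reports, f, indent=2, ensure_ascii=False)
        f.write("\n")
```

Calling validate_entry on the filled-in template returns an empty list when every field is present with the expected type. Running append_entry("data/reports.json", entry) in a local clone then writes the entry to the end of the array, ready to commit.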


🚀 Get Started — How to Run Models

Other options: llama.cpp (power users), Jan, GPT4All, Open WebUI (self-hosted)
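
For a quick speed reading while getting started, here is a minimal sketch that assumes Ollama is running locally on its default port (11434) and that the model has already been pulled. It sends one non-streaming request to Ollama's /api/generate endpoint and derives tokens per second from the eval_count and eval_duration (nanoseconds) fields in the reply. Whether this matches how the leaderboard's decode_tok_s is meant to be measured is an assumption, and the helper names run_once and decode_tok_s are illustrative.

```python
import json
import urllib.request


def decode_tok_s(response: dict) -> float:
    """Tokens per second from Ollama's eval_count and eval_duration (nanoseconds)."""
    seconds = response["eval_duration"] / 1e9
    return response["eval_count"] / seconds


def run_once(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming generate request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Example (needs a running Ollama server with the model pulled):
#   result = run_once("qwen3:8b", "Write a function that reverses a string.")
#   print(round(decode_tok_s(result), 1))
```

A single run can be noisy, so averaging a few runs with prompts of similar length gives a steadier number to report.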

📖 Learn More — Quantization & Local AI
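
As a rough illustration of why RAM matters most, the weight footprint of a quantized model is approximately parameters × bits per weight ÷ 8. The sketch below applies that rule of thumb. The bits-per-weight figures are approximate (K-quants mix block types, so the real average varies by model), and the 4 GB headroom for the OS, KV cache, and runtime is a hypothetical default, not a measured value.

```python
# Approximate average bits per weight; rough figures, not exact.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}


def weight_gb(params_billion: float, quant: str) -> float:
    """Estimated size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits_in_ram(params_billion: float, quant: str, ram_gb: float,
                headroom_gb: float = 4.0) -> bool:
    """True if the weights plus an assumed headroom fit in system RAM."""
    return weight_gb(params_billion, quant) + headroom_gb <= ram_gb
```

Under these assumptions an 8B model at Q4_K_M needs roughly 4.8 GB for weights and fits on a 16 GB machine, while the same model at F16 (about 16 GB of weights) does not.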