
LLM Hardware Requirements Calculator


Step 1 — Your model & text sizes

Pick a model or choose Custom if it’s not listed.
Context length defaults to 4096 tokens if blank.
Output length defaults to 1024 tokens if blank.
Batch size defaults to 1 if blank.
Auto quantization picks the highest quality that fits your hardware headroom.

Step 2 — Your computer

If CPU-only can’t handle it, we’ll clearly suggest a GPU.
Sets a safe performance baseline; you can override RAM/VRAM next.
We keep headroom (≤85% of VRAM, ≤90% of RAM) plus +3 GB for the OS.
Turn on to see an estimated processing time.

Results — Minimal system & Recommended system

Headroom: GPU ≤85% VRAM, CPU ≤90% RAM; includes +20% runtime overhead and +3 GB OS.

Section 1 — Minimal hardware by quantization

“CPU-only works?” uses your RAM. “Fits your GPU?” uses your GPU’s VRAM. “Smallest GPU” is a suggestion, not a requirement if CPU-only works.

Section 2 — Recommended system

Why these numbers?
We count weights + KV cache, add 20% runtime overhead and +3 GB for the OS. For recommended: GPU fits ≤85% of VRAM; CPU fits ≤90% of RAM. Timings use a safe baseline from your System type; adjust in Advanced for more accuracy.
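A minimal sketch of this sizing logic in Python (the constants mirror the rules above; function and parameter names are illustrative, not the tool's actual code):

  OS_RESERVE_GB = 3.0      # fixed +3 GB for the OS
  RUNTIME_OVERHEAD = 1.20  # +20% runtime overhead
  GPU_HEADROOM = 0.85      # recommended: fit within <=85% of VRAM
  CPU_HEADROOM = 0.90      # recommended: fit within <=90% of RAM

  def weights_gb(params_billion, bits_per_weight):
      # Weights: parameter count x (bits / 8) bytes each
      return params_billion * 1e9 * bits_per_weight / 8 / 1e9

  def kv_cache_gb(layers, hidden, bytes_per_elem, tokens, batch):
      # KV cache ~ 2 x layers x hidden x bytes/elem x tokens x batch
      return 2 * layers * hidden * bytes_per_elem * tokens * batch / 1e9

  def required_gb(params_billion, bits, layers, hidden, kv_bytes, tokens, batch):
      # (weights + KV cache) x 1.2 runtime overhead, plus 3 GB for the OS
      model = weights_gb(params_billion, bits) + kv_cache_gb(layers, hidden, kv_bytes, tokens, batch)
      return model * RUNTIME_OVERHEAD + OS_RESERVE_GB

  def fits_gpu(required, vram_gb):
      return required <= vram_gb * GPU_HEADROOM

  def fits_cpu(required, ram_gb):
      return required <= ram_gb * CPU_HEADROOM

  # Example: 7B-class model (32 layers, hidden 4096) at Q4 (4-bit weights),
  # FP16 KV cache, 4096-token context, batch 1:
  need = required_gb(7, 4, 32, 4096, 2, 4096, 1)   # ~9.8 GB
  print(fits_gpu(need, 12))                        # True: 9.8 <= 12 x 0.85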
References & assumptions
  • GPU capacities: 4070 SUPER 12 GB; 4080 SUPER 16 GB; 4090 24 GB; RTX 5000/6000 Ada 32/48 GB; L40S 48 GB; A100 80 GB; H100 80 GB; H200 141 GB; MI300X 192 GB; RX 7900 XTX 24 GB.
  • Quantization impact: INT8 ~0–1% from FP16; Q6/Q5 ~1–3%; Q4 ~3–6%; Q3 ~6–12% (typical).
  • KV cache ≈ 2 × layers × hidden × bytes/elem × tokens × batch (FP16 = 2 bytes/elem, FP8 = 1 byte).
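  • Worked example (illustrative figures): a 13B-class model (40 layers, hidden 5120) with an FP16 KV cache, a 4096-token context, and batch 1 needs about 2 × 40 × 5120 × 2 × 4096 × 1 bytes ≈ 3.4 GB of KV cache.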


FAQ — quick answers

Can I run a 7B model on CPU?

Yes, but it can be slow. With small prompts and Q5/Q4 quantization, 32–64 GB RAM is often enough. For snappy chat, a mid-range GPU is recommended.
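As a rough check using the formulas above: 7 billion parameters at Q4 take about 7 × 10⁹ × 0.5 bytes ≈ 3.5 GB of weights; with an FP16 KV cache at a 4096-token context (~2 GB), +20% runtime overhead, and +3 GB for the OS, the total lands near 10 GB, so 32 GB of RAM leaves comfortable headroom.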

What is quantization?

It stores weights in fewer bits (e.g., Q5, Q4) to reduce memory with a small accuracy trade-off. Our “Auto” pick chooses the highest quality that fits your hardware headroom.
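As an illustration, an auto pick can walk the quality ladder from highest to lowest and keep the first level that fits. This Python sketch reuses the sizing rules above; the names and per-level bit widths are illustrative, not the tool's actual code:

  QUANT_BITS = [("INT8", 8), ("Q6", 6), ("Q5", 5), ("Q4", 4), ("Q3", 3)]

  def auto_quant(params_billion, kv_gb, budget_gb):
      # Return the highest-quality level whose total estimate fits the budget.
      for name, bits in QUANT_BITS:                # best quality first
          weights = params_billion * 1e9 * bits / 8 / 1e9
          total = (weights + kv_gb) * 1.20 + 3.0   # +20% overhead, +3 GB OS
          if total <= budget_gb:
              return name
      return None                                  # nothing fits

  # Example: 13B model, ~3.4 GB KV cache, 24 GB GPU at 85% headroom (20.4 GB).
  # INT8 needs ~22.7 GB (too big); Q6 needs ~18.8 GB, so "Q6" is picked.
  print(auto_quant(13, 3.4, 24 * 0.85))            # "Q6"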

Do I need a GPU?

Not always. For larger models, long prompts, or faster replies, a GPU helps a lot. The calculator tells you when CPU-only isn’t practical and suggests the smallest suitable GPU.

Why doesn’t it fit my GPU?

We include model weights, KV cache, +20% runtime overhead, and +3 GB for the OS, then keep ≤85% of VRAM for stability. Try a smaller batch, fewer tokens, or a lower-bit quantization.
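For example, halving the context from 4096 to 2048 tokens halves the KV cache, and moving from Q6 to Q4 cuts weight memory by roughly a third (6 bits → 4 bits per weight).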
