Ollama benchmarks: Local vs RunPod

Two Ollama servers ran the same prompts and the same models. Higher numbers = faster. Every chart below shows the same answer in a different way.


RunPod is — faster overall

Measured as the median tokens per second across all tests and models.

Per-model speedup

How much faster is RunPod for each model? Numbers below show the multiplier (e.g. "10×" = ten times faster).
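
The multiplier can be read as the ratio of the two medians. The sketch below assumes a median tokens/sec value per model is already available for each server. The function name, model names, and numbers are hypothetical placeholders, not measured results.

```python
def speedup(local_tps: float, runpod_tps: float) -> float:
    """Return how many times faster RunPod is than the local machine."""
    if local_tps <= 0:
        raise ValueError("local tokens/sec must be positive")
    return runpod_tps / local_tps


# Hypothetical medians, for illustration only (not benchmark results).
example_medians = {
    "model-a": {"local": 5.0, "runpod": 50.0},
    "model-b": {"local": 12.0, "runpod": 30.0},
}

for model, tps in example_medians.items():
    # A value of 10.0 corresponds to the "10x" label on the chart.
    print(f"{model}: {speedup(tps['local'], tps['runpod']):.1f}x")
```

A multiplier below 1 would mean the local machine was faster for that model.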

Headline: tokens per second by model

Tokens per second is how fast the model produces words. Higher is better. Bars show the median across all tests.

Chart legend: Local machine, RunPod (96 GB GPU)
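
The page does not say how each tokens/sec value was derived. One plausible way, sketched here as an assumption, is to use the `eval_count` (tokens generated) and `eval_duration` (nanoseconds) fields that Ollama returns with a completed generate response. The sample response values are made up.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def tokens_per_second(response: dict) -> float:
    """Generation speed for one request, from Ollama's timing fields."""
    tokens = response.get("eval_count", 0)
    duration_ns = response.get("eval_duration", 0)
    if tokens <= 0 or duration_ns <= 0:
        return 0.0  # treat missing or empty results as "no measurement"
    return tokens / (duration_ns / NANOSECONDS_PER_SECOND)


# Made-up response fragment: 200 tokens generated in 4 seconds.
sample = {"eval_count": 200, "eval_duration": 4_000_000_000}
print(tokens_per_second(sample))
```

This counts generation time only; prompt processing and network latency are not included, so a figure measured from the client side would come out lower.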

Per test type — speed each user sees

Same models, four different load patterns: sequential = one request at a time, concurrent = several at once, queued = a stream of requests, mixed = all models hit at the same time.
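
The difference between the load patterns is mostly a matter of how requests are scheduled. The sketch below illustrates sequential, concurrent, and mixed scheduling with a stand-in request function; `fake_generate` and the runner names are hypothetical, and no real server is contacted. The queued pattern is left out because the page does not describe it precisely enough to sketch.

```python
import asyncio


async def fake_generate(model: str, prompt: str) -> int:
    """Stand-in for a real request; returns a made-up token count."""
    await asyncio.sleep(0.01)  # pretend the server is working
    return len(prompt.split())


async def run_sequential(model: str, prompts: list[str]) -> list[int]:
    # One request at a time: each call finishes before the next starts.
    return [await fake_generate(model, p) for p in prompts]


async def run_concurrent(model: str, prompts: list[str]) -> list[int]:
    # Several requests in flight at once against the same model.
    return list(await asyncio.gather(*(fake_generate(model, p) for p in prompts)))


async def run_mixed(models: list[str], prompts: list[str]) -> list[int]:
    # Every model receives every prompt at the same time.
    tasks = [fake_generate(m, p) for m in models for p in prompts]
    return list(await asyncio.gather(*tasks))


if __name__ == "__main__":
    prompts = ["hello there", "one two three"]
    print(asyncio.run(run_sequential("model-a", prompts)))
    print(asyncio.run(run_concurrent("model-a", prompts)))
    print(asyncio.run(run_mixed(["model-a", "model-b"], prompts)))
```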

Throughput under load (batch tokens/sec)

When many requests run at once, what's the total tokens/sec the server produces? Think of this as the kitchen output rate when the restaurant is busy. Only the "concurrent", "queued", and "mixed" tests stress this.

Chart legend: Local machine, RunPod (96 GB GPU)
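
Batch throughput differs from per-request speed because overlapping requests share the same wall-clock time. The sketch below assumes one possible definition: total tokens produced divided by the time between the first request starting and the last one finishing. The page does not state the exact formula, and `RequestResult` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequestResult:
    start_s: float  # wall-clock time the request was sent
    end_s: float    # wall-clock time the last token arrived
    tokens: int     # tokens generated for this request


def batch_tokens_per_second(results: list[RequestResult]) -> float:
    """Total tokens divided by the wall-clock span of the whole batch."""
    if not results:
        return 0.0
    window = max(r.end_s for r in results) - min(r.start_s for r in results)
    if window <= 0:
        return 0.0
    return sum(r.tokens for r in results) / window


# Three made-up requests that fully overlap: each user sees 10 tokens/sec,
# but the server as a whole produces 30 tokens/sec.
batch = [RequestResult(0.0, 10.0, 100) for _ in range(3)]
print(batch_tokens_per_second(batch))
```

In the restaurant analogy, each diner still waits the same time for a plate, while the kitchen output is the sum of all plates leaving in that window.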

The raw numbers (median tokens/sec)

How to read this:

Tokens per second is roughly "words per second" — higher = the answer arrives faster.
We use the median (middle value) across all runs so that a single unusual run does not skew the picture.
Failed or zero-token requests are filtered out before the median is computed.
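
The filtering and median steps described above can be sketched as follows. The record layout (`ok`, `tokens`, `seconds`) is an assumption made for illustration, and the sample runs are invented.

```python
from statistics import mean, median


def valid_rates(runs: list[dict]) -> list[float]:
    """Tokens/sec for each run that succeeded and produced tokens."""
    return [
        r["tokens"] / r["seconds"]
        for r in runs
        if r["ok"] and r["tokens"] > 0 and r["seconds"] > 0
    ]


def median_tokens_per_second(runs: list[dict]):
    """Median tokens/sec over valid runs, or None if none are valid."""
    rates = valid_rates(runs)
    return median(rates) if rates else None


# Invented runs: two typical, one unusually fast, one failed, one empty.
runs = [
    {"ok": True, "tokens": 100, "seconds": 10.0},   # 10 tokens/sec
    {"ok": True, "tokens": 120, "seconds": 10.0},   # 12 tokens/sec
    {"ok": True, "tokens": 1000, "seconds": 10.0},  # 100 tokens/sec (outlier)
    {"ok": False, "tokens": 0, "seconds": 3.0},     # failed, filtered out
    {"ok": True, "tokens": 0, "seconds": 2.0},      # zero tokens, filtered out
]

print(median_tokens_per_second(runs))  # median of 10, 12, 100
print(mean(valid_rates(runs)))         # the mean is pulled up by the outlier
```

With these invented numbers the median stays at the typical value while the mean is dominated by the single fast run, which is the reason for preferring the median here.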