How Many Uvicorn Workers Do You Actually Need? (FastAPI Performance Guide)

You deploy your FastAPI app.

Docker builds successfully.
Health checks pass.
Traffic starts flowing.

Then comes the question every backend engineer eventually asks:

“How many workers should I run?”

Pick a number too low, and your server sits idle while requests queue up.

Pick a number too high, and your machine spends more time juggling processes than executing code.

This is not a tuning detail.

Worker count is one of the highest-impact performance decisions you make in production.

Let’s break down the math, kill a dangerous myth, and expose the container trap that silently destroys FastAPI performance.

The Myth: “More Workers = More Speed”

It is natural to think: “If 1 worker handles 1,000 requests, 10 workers will handle 10,000.”

This assumption is dangerously wrong.

Uvicorn workers are processes, not threads. Each worker is an independent instance of your Python application with its own memory space. If your server has 2 CPU Cores, it can physically only run 2 things at the exact same instant.

If you spawn 10 workers on a 2-core machine:

The OS has to rapidly pause Worker #1 to let Worker #2 run, then pause #2 for #3, and so on.
This “Context Switching” is expensive.
You usually end up slower than if you had just stuck to 2 or 3 workers.

More processes do not create more CPU; they only compete for it.

If your API slows down over time even with correct worker sizing, you may actually be leaking database sessions. I covered how to detect and fix that in my guide on FastAPI session leaks.

The Worker Formula Most Engineers Get Wrong

For years, the documentation for Gunicorn (the process manager commonly used to supervise Uvicorn workers) has suggested a simple formula as a starting point for Python web servers:

Workers = (2 x CPU Cores) + 1

This formula is a starting point and not a law. Modern async workloads often need fewer workers than traditional WSGI apps.

Why this number?

The “2x”: Even in an async world, processes sometimes wait (system I/O, OS overhead). Having more workers than cores means that when one worker is momentarily stuck on non-async work, another is ready to keep the CPU busy.
The “+1”: This is the “spare tire.” It absorbs jitter so that a core is rarely left idle.

Examples:

1 Core (e.g., a container limited to 1 vCPU): (2*1)+1 = 3 Workers
2 Cores: (2*2)+1 = 5 Workers
4 Cores: (2*4)+1 = 9 Workers
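
As a rough sketch, the formula and the examples above can be written as a tiny helper. The name `formula_workers` is purely illustrative; it is not part of Gunicorn or Uvicorn.

```python
def formula_workers(cores: int) -> int:
    """Starting-point worker count: (2 x cores) + 1."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 2 * cores + 1


if __name__ == "__main__":
    # Reproduce the three examples from the list above.
    for cores in (1, 2, 4):
        print(f"{cores} core(s): {formula_workers(cores)} workers")
```

Treat the result as an upper bound to test against, not a value to deploy blindly.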

Related: Why Uvicorn Health Checks Fail Under Load, and how to fix them.

The “Memory” Constraint

The formula above assumes you have infinite RAM. You do not.

Running out of memory is one of the most common causes of crashing containers. Each Uvicorn worker loads your entire application into RAM.

If your app takes 150MB to boot (imports, ML models, caches).
And you set workers=10.
You just consumed 1.5 GB of RAM immediately.

If your server only has 1GB of RAM, your container will crash with an OOM (Out of Memory) Kill before it serves a single request.

The Revised Rule:

Calculate CPU limit first. Then check Memory limit. If (Workers * App Memory) > Total RAM, reduce workers.
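
A minimal sketch of that rule, assuming you have already measured how much memory one worker uses. The function name and its parameters are hypothetical, chosen only for illustration.

```python
def cap_workers_by_memory(cpu_workers: int, app_mb: int, total_ram_mb: int) -> int:
    """Shrink a CPU-based worker count until workers * app memory fits in RAM.

    cpu_workers:  worker count derived from the CPU formula
    app_mb:       measured memory of ONE worker, in MB (an assumption you supply)
    total_ram_mb: memory available to the container, in MB
    """
    if app_mb <= 0:
        raise ValueError("app_mb must be positive")
    fits_in_ram = total_ram_mb // app_mb  # how many workers RAM can hold
    return max(1, min(cpu_workers, fits_in_ram))


if __name__ == "__main__":
    # The example from this section: 10 workers, 150MB each, 1GB of RAM.
    print(cap_workers_by_memory(10, 150, 1024))
```

With the numbers from this section, only 6 workers fit in 1GB. In practice you would likely go lower still, because memory use tends to grow under load and the OS needs headroom.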

When Should You BREAK the Formula?

Use fewer workers when:

Your app is heavily async
Most latency comes from network calls
You use large DB pools
Memory is tight

Use more workers, up to what your cores and RAM allow, when:

You run CPU-heavy workloads
You do image processing
You hash aggressively
You run ML inference

The Docker Trap (Critical)

This is where many containerized deployments go wrong.

If you deploy to Kubernetes (K8s) or AWS ECS, you usually define a CPU Limit (e.g., 0.5 vCPU or 2 vCPU).

However, if you use Python’s automatic detection inside the container:

Python

import multiprocessing
print(multiprocessing.cpu_count())

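It often reports the physical host’s CPU count (e.g., 64 cores), not your container’s limit (e.g., 2 cores), because a CPU limit throttles how much CPU time the container gets; it does not hide the host’s cores.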

The Disaster Scenario:

You are on a massive K8s Node (64 Cores).
Your pod has a limit of 2 CPUs.
Your script sees 64 cores and spawns 129 workers (64*2 + 1).
Your 129 workers fight over 2 tiny CPU cores.
All of them contend for the same small CPU quota, the container gets throttled, latency skyrockets, and health checks fail.

Containers lie about CPU. Always verify what your runtime actually sees.
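
One way to verify what the runtime actually sees is to read the cgroup CPU quota directly. The sketch below assumes a Linux container with the standard cgroup files mounted at `/sys/fs/cgroup` (v2: `cpu.max`; v1: `cpu/cpu.cfs_quota_us` and `cpu/cpu.cfs_period_us`). The function names are my own, and your platform may expose limits differently.

```python
import math
import os
from pathlib import Path
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max content: '<quota> <period>' or 'max <period>'."""
    quota, period = text.split()
    if quota == "max":
        return None  # no CPU limit configured
    return int(quota) / int(period)


def container_cpu_limit() -> Optional[float]:
    """Return the cgroup CPU limit in cores, or None if no limit is found."""
    try:
        v2 = Path("/sys/fs/cgroup/cpu.max")
        if v2.exists():
            return parse_cpu_max(v2.read_text())
        quota = Path("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
        period = Path("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
        if quota.exists() and period.exists():
            q, p = int(quota.read_text()), int(period.read_text())
            if q > 0 and p > 0:  # a quota of -1 means "unlimited" in cgroup v1
                return q / p
    except (OSError, ValueError):
        pass  # unreadable or unexpected format: treat as "no limit found"
    return None


def effective_cores() -> int:
    """Cores to plan for: the cgroup limit (rounded up), capped by visible CPUs."""
    visible = os.cpu_count() or 1
    limit = container_cpu_limit()
    if limit is None:
        return visible
    return max(1, min(visible, math.ceil(limit)))


if __name__ == "__main__":
    print("os.cpu_count():", os.cpu_count())
    print("cgroup limit:  ", container_cpu_limit())
    print("plan for:      ", effective_cores())
```

If the first two numbers disagree inside your pod, any auto-detected worker count is based on the wrong one.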

The Fix

Never rely on auto-detection in containerized environments unless you are sure your library respects cgroups (limits). Always pass the worker count explicitly via an environment variable.

In Dockerfile / Kubernetes:

Dockerfile

# Shell-form CMD, so that $WORKERS is expanded when the container starts
CMD gunicorn -k uvicorn.workers.UvicornWorker -w "$WORKERS" main:app

In Deployment YAML:

YAML

env:
  - name: WORKERS
    value: "5"  # Manually calculated for 2 CPU limit
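
If you prefer to keep this logic out of the command line, Gunicorn can also load its settings from a Python config file. Here is a hedged sketch of a `gunicorn.conf.py` that reads the same `WORKERS` variable. The helper `read_worker_count` and its fallback of 1 are assumptions of mine, not Gunicorn behavior.

```python
# gunicorn.conf.py (sketch)
import os


def read_worker_count(environ=os.environ, default: int = 1) -> int:
    """Read WORKERS from the environment, falling back to a safe default."""
    raw = environ.get("WORKERS", "")
    try:
        count = int(raw)
    except ValueError:
        return default  # unset or not a number
    return max(1, count)  # never start with zero or negative workers


# Gunicorn picks up these module-level settings.
workers = read_worker_count()
worker_class = "uvicorn.workers.UvicornWorker"
```

You would then start the server with `gunicorn -c gunicorn.conf.py main:app`. Falling back to 1 worker is a deliberately conservative choice: a missing variable gives you a slow service rather than an OOM kill.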

💡 If your API is still slow after tuning workers, you are probably blocking the event loop.
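
To see what "blocking the event loop" means, here is a small standalone sketch using only `asyncio` (no FastAPI). It counts how often a background task gets to run while a handler is busy. The handler names are made up for the demo, and `asyncio.to_thread` requires Python 3.9 or newer.

```python
import asyncio
import time


async def count_ticks(handler) -> int:
    """Run `handler` while a background task ticks every 10 ms; return the ticks."""
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0)  # give the ticker a chance to start
    await handler()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ticks


async def blocking_handler():
    time.sleep(0.2)  # synchronous call: nothing else on this loop can run


async def offloaded_handler():
    await asyncio.to_thread(time.sleep, 0.2)  # same work, moved to a thread


def compare():
    """Return (ticks during blocking handler, ticks during offloaded handler)."""
    blocked = asyncio.run(count_ticks(blocking_handler))
    offloaded = asyncio.run(count_ticks(offloaded_handler))
    return blocked, offloaded


if __name__ == "__main__":
    blocked, offloaded = compare()
    print(f"ticks while blocking: {blocked}, ticks while offloaded: {offloaded}")
```

The ticker stands in for every other request on the same worker. While the blocking handler runs, those requests make no progress, and adding workers only hides the problem.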

Running in Production: Gunicorn vs. Uvicorn

Should you run uvicorn directly?

Bash

# Development
uvicorn main:app --workers 4

For production, a common standard is to use Gunicorn as a process manager that spawns Uvicorn workers. Gunicorn is more robust at handling dead processes and signals.

Bash

# Production Standard
pip install gunicorn uvicorn
gunicorn -w 5 -k uvicorn.workers.UvicornWorker main:app

Workers Do NOT Increase Concurrency

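Async handles concurrency: a single worker’s event loop can keep many I/O-bound requests in flight at once.

Workers provide CPU parallelism and isolation across cores. They do not make an I/O-bound app scale.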

If your app slows down under load, blindly increasing workers often makes things worse, especially when database pools get exhausted.
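
The pool exhaustion point is simple arithmetic: every worker is a separate process, so each one opens its own connection pool. A rough sketch, assuming SQLAlchemy's default `QueuePool` settings (`pool_size=5`, `max_overflow=10`) and PostgreSQL's default `max_connections` of 100; your own values may differ.

```python
def max_db_connections(workers: int, pool_size: int = 5, max_overflow: int = 10) -> int:
    """Worst-case connections opened by all workers combined."""
    return workers * (pool_size + max_overflow)


if __name__ == "__main__":
    db_limit = 100  # assumed database connection limit
    for workers in (3, 5, 9):
        total = max_db_connections(workers)
        status = "ok" if total <= db_limit else "exceeds limit"
        print(f"{workers} workers -> up to {total} connections ({status})")
```

Under these assumptions, 9 workers can ask for up to 135 connections, which is more than the database will grant. Adding workers at that point increases errors, not throughput.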

Fix architecture first.
Then tune workers.

These are the FastAPI and Uvicorn worker best practices that keep your production environment alive and kicking.

Conclusion: The Checklist

Don’t guess. Calculate.

Count your Cores: Are you allocating 1 vCPU? 2 vCPUs?
Apply the Formula: (2 * Cores) + 1.
Check Memory: Multiply Workers * your app’s measured per-worker memory (e.g., 150MB). Do you have enough RAM?
Hardcode it: Set the WORKERS env variable in your deployment config. Do not trust auto-detection.

Don’t guess your worker count.
Calculate it, or production will calculate it for you.

Once your async setup is correct, the next production bottleneck usually appears at the database layer, specifically connection exhaustion. If you haven’t hit it yet, you probably will.
Fixing QueuePool Limit Errors in FastAPI