WSGI Server Concurrency Explained

1. Summary

WSGI (Web Server Gateway Interface) is a synchronous specification: a standard WSGI application callable handles exactly one request, from start to finish, per invocation. To handle multiple requests concurrently, WSGI servers (such as Gunicorn and uWSGI) employ concurrency models outside the application itself. The two primary models are:

  1. Multi-Processing: The server runs multiple independent worker processes. Each process handles one request at a time, blocking on I/O. Concurrency is achieved because multiple processes run in parallel, managed by the server. This is the most common approach for simple synchronous workers.
  2. Multi-Threading: Within one or more worker processes, the server uses multiple threads. Each thread handles one request at a time. When a thread blocks on I/O, the OS can switch to another thread within the same process, allowing progress on other requests. This provides concurrency within a single process, especially for I/O-bound tasks, but is subject to Python's GIL for CPU-bound work.

In essence, the WSGI server provides the concurrency layer by managing multiple processes or threads, each executing the synchronous WSGI application callable for different requests.

2. WSGI Basics: The Synchronous Interface

  • WSGI defines a standard interface between web servers and Python web applications/frameworks.
  • The core of WSGI is the application(environ, start_response) callable provided by the Python application.
  • Crucially, this callable is defined to be synchronous. When the WSGI server calls it for a request:
    • It receives the request details (environ).
    • It must fully process that single request.
    • It calls start_response to send headers.
    • It returns the response body iterable.
  • Unlike an ASGI application, the WSGI application object itself doesn't manage multiple requests simultaneously or run on an event loop.
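The interface above can be sketched as a complete, minimal application callable. The greeting body is purely illustrative; only the `application(environ, start_response)` shape is mandated by the spec:

```python
# A minimal WSGI application callable: synchronous, one request per call.
def application(environ, start_response):
    # environ is a dict of CGI-style request variables supplied by the server.
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}".encode("utf-8")
    status = "200 OK"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # start_response sends the status line and headers before the body.
    start_response(status, headers)
    # The return value is an iterable of bytes; here, a single chunk.
    return [body]
```

Any WSGI server can run this callable; the server, not the callable, decides how many copies of it execute at once.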

3. How WSGI Servers Achieve Concurrency

Since the application callable is synchronous, the server must implement a strategy to handle more than one connection/request at a time.

3.1. Multi-Processing

  • Mechanism: The WSGI server starts multiple independent operating system processes. Each process loads a copy of the Python application.
  • Request Handling: A master server process listens for connections and distributes incoming requests to available worker processes.
  • Execution: Each worker process handles one request at a time, synchronously. If the request involves waiting (e.g., database query, external API call), that specific process blocks.
  • Concurrency Source: Parallelism. While one worker process is blocked or busy, other worker processes are available to handle new or different requests immediately.
  • Common Implementation: Gunicorn's default sync worker type, used with the --workers N flag (where N > 1).
  • Analogy: Multiple checkout counters at a supermarket. Each serves one customer start-to-finish. The store handles many customers concurrently because it has multiple counters.
  • Pros:
    • Good utilization of multiple CPU cores.
    • Process isolation (one crashing worker doesn't necessarily kill others).
    • Bypasses Python's Global Interpreter Lock (GIL) limitations for CPU-bound tasks (since each process has its own GIL).
  • Cons:
    • Higher memory usage (each process loads the application).
    • Inter-process communication overhead (if needed, though usually minimal for simple request dispatch).
    • Potentially slower context switching than threads.
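The process model can be sketched with the standard library. This is a toy illustration, not Gunicorn's internals: `handle_request` and the simulated 0.1 s of I/O are stand-ins for a real synchronous handler.

```python
# Toy sketch of process-based workers handling "requests" in parallel,
# each blocking synchronously on simulated I/O (like a sync WSGI worker).
import time
from concurrent.futures import ProcessPoolExecutor

def handle_request(request_id):
    """One synchronous request handler; blocks for the whole request."""
    time.sleep(0.1)  # stand-in for a database query or external API call
    return f"response-{request_id}"

if __name__ == "__main__":
    # Four worker processes, analogous to `gunicorn --workers 4`: each
    # handles one request at a time, but the four run in parallel, so
    # eight requests take roughly two rounds of 0.1 s, not eight rounds.
    with ProcessPoolExecutor(max_workers=4) as pool:
        responses = list(pool.map(handle_request, range(8)))
    print(responses[0], responses[-1])
```

Because each process blocks independently, one slow request only ties up one of the four workers.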

3.2. Multi-Threading

  • Mechanism: The WSGI server starts one or more worker processes, and within each process, it creates multiple threads. Threads within a process share memory space (the loaded application).
  • Request Handling: Requests assigned to a worker process are handled by available threads within that process.
  • Execution: Each thread handles one request at a time, synchronously, per the WSGI spec. However, when a thread performs a blocking I/O operation (network, disk), it releases the GIL and the OS scheduler can switch to another thread.
  • Concurrency Source: Interleaving. The Operating System can switch execution between different threads within the same process, especially when one thread is waiting for I/O. This allows other threads to make progress on their requests.
  • Common Implementation: Gunicorn's gthread worker type, usually used with --workers N --threads M flags.
  • Analogy: A single, skilled office worker (process) juggling multiple tasks (requests/threads). When they send an email and wait for a reply (I/O block), they switch to working on a different report.
  • Pros:
    • Lower memory footprint compared to pure multi-processing (threads share memory).
    • Generally faster context switching between threads than between processes.
  • Cons:
    • Subject to Python's Global Interpreter Lock (GIL): Only one thread can execute Python bytecode at any given moment within a single process. This limits true parallelism for CPU-bound tasks but is still effective for I/O-bound concurrency.
    • Requires careful programming if threads modify shared data (potential race conditions, need for locks).
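The interleaving and the need for locking can be shown with a minimal sketch. `handle_request` and the 0.2 s sleep are illustrative stand-ins, not part of any real server:

```python
# Thread-based interleaving of I/O-bound "requests" within one process.
import threading
import time

def handle_request(request_id, results, lock):
    """Simulate one I/O-bound request handler running in its own thread."""
    time.sleep(0.2)  # blocking "I/O": the GIL is released while waiting
    with lock:       # shared data must be locked to avoid race conditions
        results.append(f"response-{request_id}")

results, lock = [], threading.Lock()
start = time.monotonic()
threads = [threading.Thread(target=handle_request, args=(i, results, lock))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# The four waits overlap, so the total is close to one sleep (~0.2 s),
# not four sleeps in series (~0.8 s).
print(sorted(results), round(elapsed, 2))
```

If the handlers were CPU-bound instead of sleeping, the GIL would serialize them and the speedup would largely disappear.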

3.3. Hybrid Approach (Processes + Threads)

  • It's common practice to combine both models.
  • Run multiple worker processes (e.g., one per CPU core or slightly more) to achieve parallelism across cores.
  • Configure each worker process to use multiple threads to handle I/O-bound concurrency efficiently within that process.
  • Example: gunicorn --workers 4 --threads 4 myapp.wsgi:application (4 processes, each with 4 threads).
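The same hybrid setup can be expressed as a Gunicorn configuration file. The `workers`, `threads`, and `worker_class` settings are Gunicorn's documented equivalents of the CLI flags; the one-worker-per-core heuristic is a common starting point, not a rule:

```python
# gunicorn.conf.py -- a hybrid processes-plus-threads configuration sketch.
# Run with: gunicorn -c gunicorn.conf.py myapp.wsgi:application
import multiprocessing

workers = multiprocessing.cpu_count()  # one process per core for parallelism
threads = 4                            # threads per process for I/O-bound work
worker_class = "gthread"               # Gunicorn's threaded worker type
```

A config file keeps the tuning in version control alongside the application, rather than in deployment scripts.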

4. Key Takeaway: The Server Provides Concurrency

The essential point is that the concurrency mechanisms (managing processes, threads, distributing requests) are handled by the WSGI server, not by the WSGI application code itself, which operates synchronously for each request it receives.

5. Example: Gunicorn Multi-Processing Command

This is a typical command to run a WSGI application using Gunicorn with multiple processes (using the default `sync` worker):

gunicorn --workers 4 myproject.wsgi:application
  • This starts a master process and 4 worker processes.
  • Each worker process handles one request synchronously at a time.
  • Concurrency is achieved because the 4 workers can handle 4 requests in parallel.

Date: 2025-04-21 Mon 00:00

Author: AI Assistant

Created: 2025-05-01 Thu 14:28