I/O depth

The number of I/O requests in flight at once against a device or interface; the primary knob that trades latency for throughput on block storage and async networking.

also known as io-depth · queue-depth · iodepth

stack storage · kernel

I/O depth (often called queue depth; iodepth in fio) is the number of outstanding I/O requests a workload keeps in flight against a device at any given moment. At depth 1, the workload issues a request, waits for completion, then issues the next: fully serial. At depth 32, 32 requests are in flight simultaneously, so their individual latencies overlap instead of summing.
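The overlap can be made concrete with a toy model. All numbers here are illustrative assumptions, and the model assumes the device really can service `depth` requests in parallel:

```python
import math

def completion_time_us(n_requests, depth, latency_us):
    # Idealized model: the device services up to `depth` requests at
    # once, so requests complete in waves of `depth`. Total time is the
    # number of waves times the per-request latency.
    return math.ceil(n_requests / depth) * latency_us

# 1024 requests at an assumed 100 us per request:
print(completion_time_us(1024, 1, 100))   # 102400 us, fully serial
print(completion_time_us(1024, 32, 100))  # 3200 us, latencies overlapped
```

Real devices fall short of this ideal once depth exceeds their internal parallelism, but the model captures why depth 32 can be ~32x faster than depth 1 without any single request completing sooner.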

For storage:

  • Low depth (1–4) shows best-case single-request latency and is where read() blocking code lives.
  • Medium depth (8–64) is where modern NVMe SSDs hit their sweet spot and where io_uring starts to pay.
  • High depth (128–1024+) saturates bandwidth but at the cost of tail latency — more requests means more chances for one to queue behind a slow sibling.
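The three regimes can be sketched with a toy saturation model. The device parallelism and service latency below are hypothetical placeholders, not measurements of any real SSD:

```python
DEVICE_PARALLELISM = 64      # assumed internal parallelism of the device
SERVICE_LATENCY_US = 100.0   # assumed per-request service time

def model(depth):
    # Throughput scales with depth until it hits device parallelism.
    iops = min(depth, DEVICE_PARALLELISM) * 1e6 / SERVICE_LATENCY_US
    # Little's Law rearranged: average latency = outstanding / throughput.
    # Past saturation, extra depth only inflates queueing latency.
    latency_us = depth * 1e6 / iops
    return iops, latency_us

for depth in (1, 8, 64, 512):
    iops, lat = model(depth)
    print(f"QD={depth:4d}  {iops:9.0f} IOPS  {lat:8.1f} us")
```

In this model, latency stays flat at 100 us up to QD=64, then grows linearly with depth while throughput stays pinned at the saturation point: exactly the bandwidth-for-tail-latency trade described above.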

For networking:

  • Batch size on the submission side (how many recv/send operations per syscall) is the network analog.
  • At small batch sizes, io_uring and epoll are roughly equivalent; io_uring’s advantage grows with batch size as the per-crossing cost gets amortized.
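The amortization argument reduces to simple arithmetic. The overhead numbers below are invented for illustration, not measured syscall costs:

```python
SYSCALL_OVERHEAD_NS = 500  # assumed cost of one kernel crossing
PER_OP_COST_NS = 200       # assumed per-operation processing cost

def cost_per_op_ns(batch):
    # One submission syscall carries `batch` operations, so the fixed
    # crossing cost is split across the batch; per-op work is not.
    return SYSCALL_OVERHEAD_NS / batch + PER_OP_COST_NS

print(cost_per_op_ns(1))   # 700.0 ns: one crossing per op, epoll-like
print(cost_per_op_ns(32))  # 215.625 ns: crossing cost nearly amortized away
```

At batch size 1 the crossing dominates, which is why io_uring and epoll look similar there; the gap opens only as the batch grows.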

The rule is Little’s Law: average outstanding = throughput * average latency. To achieve higher throughput on a latency-bound device, you need higher depth. But beyond the device’s natural parallelism (NVMe internal queue count, NIC RSS queues), more depth buys you nothing and costs p99 latency.
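As a worked example of sizing depth with Little's Law (the target IOPS and latency figures are hypothetical):

```python
def required_depth(target_iops, avg_latency_us):
    # Little's Law: average outstanding = throughput * average latency.
    # Latency is taken in microseconds, so divide by 1e6 to convert.
    return target_iops * avg_latency_us / 1e6

# To sustain a hypothetical 400k IOPS at 100 us average latency,
# roughly 40 requests must be in flight at all times.
print(required_depth(400_000, 100))  # 40.0
```

Run the calculation in reverse to sanity-check a benchmark: if a tool reports 400k IOPS at QD=4 on a 100 us device, one of those three numbers is wrong.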

Always report results as a curve over depth, not a single number. A benchmark run only at QD=1 tells you latency; only at QD=256 tells you bandwidth; neither tells you the shape of the curve in between, which is what determines how the device behaves under a real workload.
