context switch

The act of saving the CPU state of one thread (or process) and restoring another's; involves kernel work, TLB effects, and cache pollution that show up as latency tail outliers.

also known as ctxsw · task-switch

stack kernel · cpu

A context switch is when the kernel stops running one thread on a CPU and starts running another. The kernel saves the outgoing thread’s register file, program counter, stack pointer, and floating-point/SIMD state (on Linux, into its task_struct and kernel stack), then restores the incoming thread’s state. If the switch crosses address spaces (i.e., different processes), the kernel also reloads CR3 with the new page-table root, which flushes TLB entries unless PCID tagging lets them survive.

The direct cost (on the order of a microsecond on modern x86) is only part of the story. The real expense is indirect: the incoming thread’s working set is usually cold in cache, so its hot path pays L1/L2/L3 misses until the data is pulled back in. TLB misses recur until translations are re-walked, and branch-predictor state is cold. On NUMA systems, if the scheduler also migrates the thread to a different socket, the cost compounds: every miss now crosses the interconnect to remote memory.

Context switches come from many sources: explicit schedule() calls (yield, sleep, I/O wait), timer-driven preemption, and forced migration. Voluntary switches (the thread blocked on I/O and had nothing to do anyway) are usually fine; involuntary ones (preempted by a higher-priority task mid-work) are what drive latency tails.

For low-latency work, you watch perf sched or the voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters in /proc/<pid>/status. You pin threads to cores, set real-time priorities (SCHED_FIFO/SCHED_RR), isolate cores with isolcpus, and in extreme cases use sched_ext to sidestep the default CFS/EEVDF scheduler’s tail behavior.

sources