TLB shootdown

The kernel's mechanism for invalidating stale TLB entries on other CPUs after a page table change, implemented via an inter-processor interrupt that stalls every target core for the duration.

also known as tlb-flush · ipi-tlb

stack kernel · cpu · memory

A TLB shootdown is the kernel’s mechanism for invalidating stale TLB entries on other CPUs after a page-table change. When one CPU unmaps, remaps, or changes the permissions of a page, any other CPU that has cached the old translation in its TLB must drop it. Otherwise that CPU could keep using a stale translation, possibly one that points at a physical page that has since been freed and reused. Because each core’s TLB is private and hardware does not keep it coherent with the page tables, the issuing CPU sends an inter-processor interrupt (IPI) to every target core (every CPU that may have cached translations for the affected address space) to force a flush.

On Linux, this appears as the flush_tlb_* family of functions in the kernel and, on x86, as the TLB (TLB shootdowns) line in /proc/interrupts. The IPIs themselves are delivered through the function-call interrupt mechanism, which /proc/interrupts reports on the CAL line. Each shootdown is a synchronous stall: the target core stops what it’s doing, handles the IPI, and flushes the requested range (or its entire TLB, depending on the kernel path), while the sending core waits for every target to acknowledge. On a 64-core system with a memory-intensive process, a full-range shootdown can cost the sending core hundreds of microseconds while it waits for those acknowledgments.
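To see how often shootdowns hit each core, the per-CPU counters can be sampled from /proc/interrupts. The following is a minimal sketch, assuming an x86 kernel that exposes a row labelled TLB; the function names and the SAMPLE text are illustrative, not part of any kernel interface.

```python
def tlb_shootdown_counts(interrupts_text):
    """Return per-CPU counts from the 'TLB' row of /proc/interrupts text.

    Returns an empty list when no TLB row is present (assumed to be the
    case on kernels or architectures that do not report one).
    """
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == "TLB":
            counts = []
            for field in rest.split():
                if not field.isdigit():
                    break  # reached the trailing description text
                counts.append(int(field))
            return counts
    return []


def shootdown_delta(before, after):
    """Per-CPU increase between two samples of tlb_shootdown_counts()."""
    return [b - a for a, b in zip(before, after)]


def read_tlb_counts(path="/proc/interrupts"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return tlb_shootdown_counts(f.read())


# Made-up sample in the /proc/interrupts layout, for illustration only.
SAMPLE = """\
           CPU0       CPU1
  0:         36          0   IO-APIC    2-edge      timer
CAL:      12045      11872   Function call interrupts
TLB:       5321       4980   TLB shootdowns
"""
```

Sampling read_tlb_counts() twice, one second apart, and passing both results to shootdown_delta() gives an approximate per-core shootdown rate.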

Shootdowns come from many sources: munmap, mprotect, madvise(MADV_DONTNEED), huge-page splits, NUMA rebalancing, CoW faults after fork, and more. Garbage collectors that unmap or remap memory as they run (some JVM configurations do) can generate thousands per second.
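A rough way to judge whether a given shootdown rate matters is to multiply it by the time each target core spends in the IPI handler. The sketch below does only that arithmetic; the function name and both input figures are hypothetical placeholders to be replaced with measured values.

```python
def shootdown_stall_fraction(shootdowns_per_sec, stall_us_per_shootdown):
    """Fraction of a target core's time spent handling shootdown IPIs.

    Both arguments are assumptions supplied by the caller; this is a
    back-of-the-envelope model, not a measurement.
    """
    return shootdowns_per_sec * stall_us_per_shootdown / 1_000_000


# Hypothetical inputs: 5,000 shootdowns/s, 4 microseconds of stall each.
# 5,000 * 4 = 20,000 microseconds of stall per second, i.e. 2% of the core.
fraction = shootdown_stall_fraction(5_000, 4)
```

The model ignores secondary costs such as refilling the TLB after the flush, so it is better read as a lower bound.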

Mitigations are subtle: lazy TLB flushes (Linux uses them extensively), deferred invalidation where the architecture allows, and avoiding unmap-heavy patterns in latency-critical code. ARM’s broadcast TLBI instructions and AMD’s INVLPGB move remote invalidation into hardware, and Intel’s INVPCID makes flushes of the local TLB more selective, but none of these eliminates the cost.
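One way to avoid an unmap-heavy pattern is to keep mappings alive and reuse them rather than mapping and unmapping a buffer per request. The sketch below illustrates the idea in user space with a hypothetical MappingPool class; it is not a kernel facility, and real allocators apply the same idea with far more care.

```python
import mmap


class MappingPool:
    """Reuse anonymous mappings instead of unmapping them after each use."""

    def __init__(self, size, count):
        self._size = size
        # Map every buffer up front, outside the latency-critical path.
        self._free = [mmap.mmap(-1, size) for _ in range(count)]

    def acquire(self):
        if self._free:
            return self._free.pop()
        # Pool exhausted: fall back to creating a new mapping.
        return mmap.mmap(-1, self._size)

    def release(self, buf):
        # Keep the mapping alive. Nothing is unmapped here, so this path
        # gives the kernel no reason to invalidate remote TLB entries.
        self._free.append(buf)

    def close(self):
        # Unmap everything at shutdown, when latency no longer matters.
        while self._free:
            self._free.pop().close()
```

A released buffer still holds its previous contents, so callers are expected to overwrite it before use. The trade-off is higher resident memory in exchange for fewer unmap calls.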

related

sources