TLB

Translation Lookaside Buffer — a small on-CPU cache of recent virtual-to-physical page-table translations; a miss triggers a multi-level page walk that is one of the quiet latency costs on large working sets.

also known as translation-lookaside-buffer

stack cpu · memory

The TLB (Translation Lookaside Buffer) is a small hardware cache on each CPU core that stores recent virtual-to-physical page-table translations. Every access by virtual address, whether an instruction fetch or a pointer dereference, first consults the TLB. A hit returns the physical address in one or two cycles. A miss triggers a page walk: the hardware reads the multi-level page table from memory to find the translation. A walk costs dozens to hundreds of cycles, and the page-table entries it reads can themselves miss in the data caches.

Modern x86 CPUs have separate L1 instruction and data TLBs (typically 64–128 entries each) plus a shared L2 TLB (hundreds to thousands of entries). With 4 KB base pages, that gives TLB coverage of a few megabytes, easily exhausted by any workload with a large working set. The TLBs also hold entries for 2 MB and 1 GB huge pages; a single 2 MB entry covers 512 times as much memory as a 4 KB one, which is the single biggest reason huge pages speed up large-working-set workloads.

TLB misses are a silent tax. They don’t show up directly in cache-miss counters, but they do show up in perf stat as dTLB-load-misses and iTLB-load-misses, and on many Intel CPUs as page-walk events such as dtlb_load_misses.walk_completed. On workloads that make random accesses across a large working set, TLB misses can eat more cycles than cache misses.

Mitigations: huge pages, compact data layouts that keep frequently accessed data on as few pages as possible, and avoiding allocations scattered across a huge virtual address space.
