branch predictor

CPU hardware that predicts the outcome of upcoming branches so the pipeline can keep fetching instructions without waiting for the condition to resolve; a misprediction forces the CPU to flush the wrong-path work and refetch.

also known as branch-prediction · BPU

stack cpu

A branch predictor is a hardware unit that guesses the outcome of conditional branches (and the target of indirect branches) before they resolve, so the CPU's front end can keep fetching and decoding instructions down the predicted path. A correct prediction costs nothing visible. On a misprediction, the CPU flushes all the speculatively issued work on the wrong path and restarts from the correct target, a penalty of roughly 15–20 cycles on modern x86, often more.
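To put the penalty in perspective, the average cost per branch is roughly the misprediction rate times the flush penalty. The sketch below works through that arithmetic; the function names, and the idea of expressing branch density as a single "instructions per branch" number, are illustrative assumptions rather than a model of any particular CPU.

```cpp
#include <cassert>

// Expected cycles lost to mispredictions per executed branch.
//   accuracy       : fraction of branches predicted correctly, in [0, 1]
//   penalty_cycles : cost of one misprediction (flush plus refetch)
double expected_penalty_per_branch(double accuracy, double penalty_cycles) {
    assert(accuracy >= 0.0 && accuracy <= 1.0);
    assert(penalty_cycles >= 0.0);
    return (1.0 - accuracy) * penalty_cycles;
}

// Average stall cycles added per instruction, assuming (hypothetically)
// that one in every `instructions_per_branch` instructions is a branch.
double stall_cycles_per_instruction(double accuracy, double penalty_cycles,
                                    double instructions_per_branch) {
    assert(instructions_per_branch >= 1.0);
    return expected_penalty_per_branch(accuracy, penalty_cycles) /
           instructions_per_branch;
}
```

With a 20-cycle penalty, 95% accuracy averages out to about 1 lost cycle per branch, while a coin-flip branch (50% accuracy) averages 10. Real pipelines overlap some of this cost with other work, so treat the result as a rough estimate.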

Modern predictors are sophisticated. TAGE-style predictors keep several tagged tables indexed with different branch-history lengths and, for each branch, use the prediction from the longest history that matches. Indirect-branch predictors handle virtual calls and jump tables, and return stack buffers predict function returns. Under favorable conditions, modern CPUs predict well above 95% of branches correctly.
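A real TAGE predictor is too large for a short example. For intuition about what a single predictor table does, here is a toy software model of a much simpler classic scheme: a table of 2-bit saturating counters indexed by branch address (often called a bimodal predictor). The class name, the table size, and the modulo indexing are assumptions for illustration; this is not how any specific CPU is implemented.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy bimodal predictor: one 2-bit saturating counter per table entry.
// Counter values 0-1 predict not-taken, values 2-3 predict taken.
class BimodalPredictor {
public:
    bool predict(std::uint64_t pc) const { return table_[index(pc)] >= 2; }

    // Move the counter one step toward the actual outcome, saturating at 0 and 3.
    void update(std::uint64_t pc, bool taken) {
        std::uint8_t& counter = table_[index(pc)];
        if (taken) {
            if (counter < 3) ++counter;
        } else {
            if (counter > 0) --counter;
        }
    }

private:
    static constexpr std::size_t kEntries = 1024;  // hypothetical table size
    // Branches whose addresses collide modulo kEntries share (alias) a counter.
    static std::size_t index(std::uint64_t pc) { return pc % kEntries; }

    std::array<std::uint8_t, kEntries> table_{};  // all counters start at 0
};

// Replays the outcome trace of one branch and counts wrong predictions.
int count_mispredictions(BimodalPredictor& predictor, std::uint64_t pc,
                         const std::vector<bool>& outcomes) {
    int misses = 0;
    for (bool taken : outcomes) {
        if (predictor.predict(pc) != taken) ++misses;
        predictor.update(pc, taken);
    }
    return misses;
}
```

Feeding this model a loop branch that is taken nine times and then falls through costs one misprediction per pass once the counter has warmed up, whereas a strictly alternating taken/not-taken pattern is mispredicted half the time. History-based designs such as TAGE exist largely to capture patterns like the second one.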

For low-latency code:

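Branchy hot paths (many data-dependent conditionals) compete for predictor entries, and their learned state can be lost to aliasing with other branches and to context switches.
Branchless techniques (conditional moves, bitmask selects, SIMD blends) replace hard-to-predict branches with straight-line code. They are worth it when the branch is genuinely unpredictable (close to 50/50); a well-predicted branch is usually cheaper, because the branchless form adds a data dependency on the condition.
__builtin_expect (GCC/Clang) and the C++20 [[likely]] / [[unlikely]] attributes tell the compiler which path is hot. They influence basic-block ordering so that the hot path is the fall-through; they affect code layout rather than the hardware predictor's state.
perf stat branch counters (branch-misses, branches) quantify the misprediction rate as branch-misses divided by branches, for example with perf stat -e branches,branch-misses followed by the command to measure.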
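As a minimal sketch of the branchless and hinting techniques above, the following compares a data-dependent branch with a bitmask select, and marks a rare error path with [[unlikely]]. All function names are hypothetical. Compilers may already turn the branchy loop into conditional moves or SIMD code on their own, so it is worth checking the generated assembly and measuring before rewriting by hand.

```cpp
#include <cstddef>
#include <cstdint>

// Branchy version: whether the add happens depends on the data, so the
// branch is hard to predict when values fall randomly around the threshold.
std::int64_t sum_above_branchy(const std::int32_t* data, std::size_t n,
                               std::int32_t threshold) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] >= threshold) sum += data[i];
    }
    return sum;
}

// Branchless version: build an all-ones or all-zeros mask from the
// comparison and AND it with the value, so the loop body has no
// data-dependent branch.
std::int64_t sum_above_branchless(const std::int32_t* data, std::size_t n,
                                  std::int32_t threshold) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::int64_t value = data[i];
        // true -> 1 -> -1 (all bits set); false -> 0.
        const std::int64_t mask = -static_cast<std::int64_t>(data[i] >= threshold);
        sum += value & mask;
    }
    return sum;
}

// Hinting: mark the error path as cold so the compiler can lay out the
// common case as the fall-through path (C++20 attribute).
int parse_digit(char c) {
    if (c < '0' || c > '9') [[unlikely]] {
        return -1;  // rare: not a decimal digit
    }
    return c - '0';
}
```

Both sum functions return the same result for any input; only the shape of the generated control flow differs. With GCC or Clang before C++20, `__builtin_expect(condition, 0)` plays the same role as [[unlikely]].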

Misprediction storms are a known source of tail latency. Rare decisions on the hot path (error paths, degenerate inputs) cost disproportionately when they happen, because the predictor has been trained on the common case and is likely to guess wrong.
