false sharing
Two logically independent variables sharing a single cache line, causing coherence traffic between cores on every write and silently capping scaling of otherwise-parallel code.
False sharing happens when two logically independent variables are packed into the same cache line and written by two different cores. The cache coherence protocol (MESI, MOESI, etc.) treats the line, not the variable, as the atomic unit. Every write by one core invalidates the line in the other core’s cache, which then has to fetch it back. Each write pays the latency of a cross-core coherence round-trip — often tens of nanoseconds — even though the writers never touch each other’s data.
The classic example: a per-thread counter array int counters[N]. On x86_64 with 4-byte ints, 16 counters fit on a single 64-byte line. If eight threads each bump their own counter, all of them end up trading the same line back and forth, and throughput collapses to the coherence ping-pong rate.
The fix is usually padding: put each hot counter on its own cache line with alignas(64) or explicit padding bytes. std::hardware_destructive_interference_size (C++17, declared in <new>) gives a portable constant for the line size. Rust's crossbeam::utils::CachePadded wrapper does the same.
How to find it: perf c2c record samples loads and stores, and perf c2c report then pinpoints the exact cache lines and source lines causing cross-core transfers. The report highlights HITM events (loads that hit a line modified in another core's cache) — a dead giveaway for false sharing. Brendan Gregg's CPU cache/TLB contention flamegraphs are another angle.
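A typical invocation looks like the following (./app is a placeholder for your binary; perf c2c needs hardware support and usually root or a lowered perf_event_paranoid setting):

```shell
# Sample loads and stores while the workload runs.
perf c2c record -- ./app

# Summarize: the cache-line table flags lines with high HITM counts
# and maps them back to the source lines touching them.
perf c2c report --stdio
```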
Common misconception: that any sharing of a line between cores is costly. In fact false sharing only matters on writes. Reads alone don’t invalidate lines — pure readers can all hold the line simultaneously, and the cost appears only once at least one core writes to its variable.