cache line
The unit of data transfer between a CPU's caches and main memory; 64 bytes on x86_64 (and most modern Arm), 128 bytes on Apple M-series L2. The atomic unit of coherence traffic.
A cache line is the unit of data that CPUs move between caches and main memory, and between caches on different cores. On x86_64 and most modern Arm it is 64 bytes; Apple M-series L2 uses 128-byte lines; some IBM POWER variants use 128 bytes as well. Even a one-byte read pulls the entire enclosing line into the cache, and a one-byte write needs exclusive ownership of the whole line, which invalidates any copies held in other cores' caches.
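To make "a one-byte access touches the enclosing line" concrete, the arithmetic below maps an address to its line. It is a sketch that assumes 64-byte lines; kLineSize, line_base, line_index and lines_touched are hypothetical helper names, not a library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed line size: 64 bytes (x86_64, most modern Arm).
// On a part with 128-byte lines, change this constant to 128.
constexpr std::uintptr_t kLineSize = 64;

// Start address of the cache line containing `addr`.
// Masking works because kLineSize is a power of two.
constexpr std::uintptr_t line_base(std::uintptr_t addr) {
    return addr & ~(kLineSize - 1);
}

// Index of the cache line containing `addr`.
constexpr std::uintptr_t line_index(std::uintptr_t addr) {
    return addr / kLineSize;
}

// Number of distinct lines an access of `size` bytes (size >= 1) at `addr` touches.
// An access that straddles a line boundary touches two lines.
constexpr std::size_t lines_touched(std::uintptr_t addr, std::size_t size) {
    return static_cast<std::size_t>(line_index(addr + size - 1) - line_index(addr) + 1);
}
```

For example, a one-byte read at 0x1007 fetches the whole line starting at 0x1000, and a two-byte read at 0x103F straddles two lines.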
Because the cache line, not the byte, is the atomic unit of coherence traffic, the line is also the unit at which performance bugs like false sharing manifest. If two variables land on the same line and two different cores write them, each write invalidates the other core's copy, so the line bounces between those cores' caches even though the variables are logically independent. The usual fix is padding or alignment so that each independently written hot variable gets a line to itself: apply alignas(64) to the hot struct, or insert padding bytes to push neighbors onto their own lines.
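A minimal C++17 sketch of the alignas fix, assuming 64-byte lines. PlainCounter, PaddedCounter, run_counters and same_line are illustrative names. Both variants compute identical totals; the difference is only in speed (and in what a tool like perf c2c reports), which this sketch does not measure.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::size_t kLine = 64;

// Unpadded: 8 bytes each, so adjacent counters in an array share a line
// and writers on different cores keep invalidating each other (false sharing).
struct PlainCounter {
    std::atomic<std::uint64_t> value{0};
};

// Padded: alignas(kLine) makes each counter start on a line boundary and
// rounds sizeof up to a multiple of kLine, so array neighbors never share a line.
struct alignas(kLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PlainCounter) < kLine, "several plain counters fit in one line");
static_assert(sizeof(PaddedCounter) == kLine, "exactly one padded counter per line");
static_assert(alignof(PaddedCounter) == kLine, "padded counters start on a line boundary");

// True if two addresses fall in the same (assumed 64-byte) line.
bool same_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kLine ==
           reinterpret_cast<std::uintptr_t>(b) / kLine;
}

// Each thread increments only its own slot; the slots are logically independent.
template <typename Counter>
std::uint64_t run_counters(std::size_t threads, std::uint64_t iters) {
    std::vector<Counter> slots(threads);  // C++17 aligned new honors alignas(kLine)
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&slots, t, iters] {
            for (std::uint64_t i = 0; i < iters; ++i)
                slots[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    std::uint64_t total = 0;
    for (auto& s : slots) total += s.value.load();
    return total;
}
```

On multi-core hardware, timing run_counters<PlainCounter> against run_counters<PaddedCounter> typically shows the padded version running noticeably faster. C++17 also defines std::hardware_destructive_interference_size in <new> as a portable alternative to a hard-coded 64, though not every standard library provides it.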
Cache lines also drive data-layout decisions. Array-of-structs (AoS) interleaves the fields of each element, so a pass that reads only one field still pulls the other fields into cache; struct-of-arrays (SoA) stores each field contiguously, so the same pass fetches only the bytes it uses and touches fewer lines. For SIMD kernels this is often the difference between memory-bound and compute-bound.
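The sketch below contrasts the two layouts for a pass that reads only one field. ParticleAoS, ParticlesSoA and the lines_for_mass_pass_* helpers are hypothetical names; the line counts come from a simple model assuming 64-byte lines, line-aligned contiguous arrays, 4-byte floats, and no prefetching.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::size_t kLineBytes = 64;

// Array-of-structs: all fields of one particle sit next to each other.
struct ParticleAoS {
    float x, y, z, mass;  // 16 bytes per particle on common ABIs
};

// Struct-of-arrays: each field is its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Mass-only pass over AoS: every line fetched also carries x, y, z.
float total_mass(const std::vector<ParticleAoS>& ps) {
    float sum = 0.0f;
    for (const auto& p : ps) sum += p.mass;
    return sum;
}

// Mass-only pass over SoA: every byte fetched is a mass value.
float total_mass(const ParticlesSoA& ps) {
    float sum = 0.0f;
    for (float m : ps.mass) sum += m;
    return sum;
}

// Lines fetched by a mass-only pass over n particles (rounded up to whole lines).
constexpr std::size_t lines_for_mass_pass_aos(std::size_t n) {
    return (n * sizeof(ParticleAoS) + kLineBytes - 1) / kLineBytes;
}
constexpr std::size_t lines_for_mass_pass_soa(std::size_t n) {
    return (n * sizeof(float) + kLineBytes - 1) / kLineBytes;
}
```

Under this model, summing the masses of 1024 particles touches 256 lines in the AoS layout but only 64 in the SoA layout: a 4x reduction, because mass is one of four equally sized fields.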
Things to know:
- A cache-line load always pulls the full line, even if you read one byte, so the neighboring bytes are then already in cache; this is why spatial locality pays off.
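- Coherence latency depends on where the current copy of the line lives: if it is in the shared L3, the load is an ordinary L3 hit; if a sibling core holds it modified in its private L1, the load needs a cross-core snoop, which is typically slower.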
- perf c2c (Linux) exposes cross-core cache-line transfers, including loads that hit a line modified in another core's cache (HITM); it is the usual way to find false-sharing hotspots.
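The locality point can be illustrated by counting how often a traversal of a row-major matrix moves to a different cache line. This is a deliberately crude model, not a measurement: it assumes 64-byte lines, 4-byte ints, a line-aligned matrix, and a "cache" that remembers only the most recent line. line_switches is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumptions: 64-byte lines, 4-byte elements, matrix starts on a line boundary.
constexpr std::size_t kLine = 64;
constexpr std::size_t kElem = sizeof(std::int32_t);

// Walks a rows x cols row-major matrix either row by row or column by column
// and counts how many accesses land on a different line than the previous one.
std::size_t line_switches(std::size_t rows, std::size_t cols, bool row_major) {
    std::size_t switches = 0;
    std::size_t prev_line = SIZE_MAX;  // sentinel: no line touched yet
    const std::size_t outer = row_major ? rows : cols;
    const std::size_t inner = row_major ? cols : rows;
    for (std::size_t a = 0; a < outer; ++a) {
        for (std::size_t b = 0; b < inner; ++b) {
            const std::size_t r = row_major ? a : b;
            const std::size_t c = row_major ? b : a;
            const std::size_t line = (r * cols + c) * kElem / kLine;  // byte offset to line index
            if (line != prev_line) {
                ++switches;
                prev_line = line;
            }
        }
    }
    return switches;
}
```

For a 64 x 64 matrix, the row-major walk switches lines 256 times (once per 16 elements), while the column-major walk switches on every one of the 4096 accesses. Real caches hold many lines, so a matrix this small fits in L1 and the real-world gap is small; the penalty appears once the matrix outgrows the cache and lines fetched for one column are evicted before the next column reuses them.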