TL;DR

Every systems engineer “knows” what a cache line is. Most have it as a memorized fact — 64 bytes on x86, false sharing is bad, alignas(64) helps. This post replaces the memorized fact with a visceral one. The Phase 1 version ships a static SVG walkthrough; the Phase 3 full interactive adds drag-and-drop layouts and live MESI animation.

Methodology

Field | Value
CPU reference | TBD: x86_64 baseline (64B lines); Apple M-series addendum (128B L2)
Kernel | Linux 6.18 LTS for any sidecar benchmark
Compiler | clang 19, -O3 -march=native for the optional inline microbenchmark
Dataset | synthetic stride-access arrays; not a corpus benchmark
Scope | Phase 1 = static SVG; Phase 3 = full interactive (SPEC §4.4 Tier 4)
Repro | lowlat-ms/bench-widgets/cache-line-visualizer

The question

  • Can an interactive visualization convert memorized cache-line facts into visceral intuition?
  • If yes, this becomes a long-tail traffic source forever — the canonical page on the topic.
  • This post is unusual for lowlat.ms: it’s a visualization-first artifact, not a benchmark post.

Introduction

  • Framing: the reader already knows what a cache is; we’re skipping the explainer.
  • What this post is not: not a MESI deep-dive, not a “cache for beginners” piece.
  • The interactive sits at the top of the page — reader plays before reading.
  • Credit where due: Ciechanowski-style explorable explanations, Lemire’s false-sharing posts, Agner Fog’s manuals.

Setup

  • Phase 1 ships a static SVG walkthrough with labeled diagrams — no JS required.
  • Phase 3 full version: vanilla JS + Canvas, optional Rust→WASM for live measurement.
  • Hosted on bench.lowlat.ms/cache-line-visualizer to establish the bench subdomain pattern.
  • prefers-reduced-motion respected; static fallback is always visible.

Baseline

  • Diagram 1: the 64-byte cache line as the unit of transfer between memory and cache, not the size of a read; even a one-byte load pulls in the whole line.
  • Diagram 2: array-of-structs layout and which lines each field hits.
  • Diagram 3: struct-of-arrays layout and how the same workload touches fewer lines.
  • Diagram 4: two threads, two variables, one line — the minimum false-sharing example.
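
The arithmetic behind Diagrams 2 to 4 can be sketched in a few lines of C++. This is a minimal illustration, not the post's own code: the `Particle` record, the `SharedCounters` struct, and the helper names are hypothetical, and it assumes 64-byte lines and arrays that start on a line boundary.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Assumption: 64-byte lines (the x86_64 baseline) and line-aligned arrays.
constexpr std::size_t kLine = 64;

// Hypothetical record, used only for illustration.
struct Particle {
    float x, y, z;     // position: 12 bytes
    float vx, vy, vz;  // velocity: 12 bytes
    float mass;        // 4 bytes
    std::uint32_t id;  // 4 bytes, 32 bytes in total
};
static_assert(sizeof(Particle) == 32, "two Particles fit in one 64-byte line");

// Counts the distinct cache lines touched when reading `field_bytes` bytes at
// `field_offset` inside each of `n` records spaced `stride` bytes apart.
std::size_t lines_touched(std::size_t n, std::size_t stride,
                          std::size_t field_offset, std::size_t field_bytes) {
    std::set<std::size_t> lines;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t begin = i * stride + field_offset;
        const std::size_t first = begin / kLine;
        const std::size_t last = (begin + field_bytes - 1) / kLine;
        for (std::size_t line = first; line <= last; ++line) lines.insert(line);
    }
    return lines.size();
}

// Diagram 2: reading only `x` out of an array of Particle structs.
std::size_t aos_lines_for_x(std::size_t n) {
    return lines_touched(n, sizeof(Particle), offsetof(Particle, x), sizeof(float));
}

// Diagram 3: reading `x` out of its own contiguous float array.
std::size_t soa_lines_for_x(std::size_t n) {
    return lines_touched(n, sizeof(float), 0, sizeof(float));
}

// Diagram 4: two variables that two different threads would write.
struct SharedCounters {
    std::uint64_t a;  // written by thread A
    std::uint64_t b;  // written by thread B
};

// True when two byte offsets from a line-aligned base land in the same line.
bool same_line(std::size_t offset_a, std::size_t offset_b) {
    return offset_a / kLine == offset_b / kLine;
}
```

Under these assumptions, scanning `x` across 1024 records touches 512 lines in the array-of-structs layout and 64 in the struct-of-arrays layout, and the two fields of `SharedCounters` sit in the same line, which is the precondition for false sharing.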

Optimizations

  • alignas(64) on hot structs so each instance starts on a cache-line boundary and shares its line with nothing else.
  • The padding trick: pad a hot counter to a full cache line to protect it from neighbors.
  • SoA + vectorization: how SIMD loads want contiguous same-type data.
  • Optional inline microbenchmark: measure stride-access throughput live in the reader’s browser (Phase 1 stretch goal).
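
A hedged sketch of the padding trick and a native stride-access measurement follows. It is written as a sidecar in C++ rather than the in-browser version, and the names `PaddedCounter`, `stride_sum`, and `ns_per_access` are hypothetical. It assumes 64-byte lines; the timing is a single unwarmed pass, so treat the number as indicative only.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 64-byte lines (the x86_64 baseline).
constexpr std::size_t kLine = 64;

// alignas(64) starts every instance on a line boundary and rounds sizeof up
// to 64, so neighbouring counters in an array never share a line.
struct alignas(kLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(alignof(PaddedCounter) == kLine, "counter starts on a line boundary");
static_assert(sizeof(PaddedCounter) == kLine, "counter occupies one full line");

// Sums every `stride_elems`-th element. Precondition: stride_elems >= 1.
// With 4-byte elements, a stride of 16 touches exactly one element per line.
std::uint64_t stride_sum(const std::vector<std::uint32_t>& data,
                         std::size_t stride_elems) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < data.size(); i += stride_elems) sum += data[i];
    return sum;
}

// Times one pass of stride_sum and returns nanoseconds per element accessed.
// The sum is added to `sink` so the compiler cannot discard the loop.
double ns_per_access(const std::vector<std::uint32_t>& data,
                     std::size_t stride_elems, std::uint64_t& sink) {
    if (data.empty()) return 0.0;
    const auto start = std::chrono::steady_clock::now();
    sink += stride_sum(data, stride_elems);
    const auto stop = std::chrono::steady_clock::now();
    const std::size_t accesses = (data.size() + stride_elems - 1) / stride_elems;
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return ns / static_cast<double>(accesses);
}
```

For an array much larger than the last-level cache, comparing `ns_per_access` at strides of 1 and 16 elements is one way to see the cost per access rise as each load starts landing on a new line.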

Results

  • Not a measurement post. The “results” are the diagrams themselves and the reader’s shift in intuition.
  • Phase 3 will add a live hotness counter that turns reader interaction into a visible cache-miss heatmap.
  • Success metric: does this become the page people link when teaching cache effects?

Limitations

  • Static SVG (Phase 1) cannot animate MESI state transitions; that is deferred to Phase 3.
  • The visualization is x86_64-first; Apple M-series 128B L2 lines and POWER variations are footnotes.
  • Not a full MESI/MOESI simulator; it’s a teaching tool, not a microarchitectural reference.
  • Does not cover store forwarding, memory ordering, or cache-coherency protocols in depth.
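
The x86_64-first caveat matters in code as well as in diagrams. The sketch below is an illustration under stated assumptions: `kLineGuess`, `share_line`, and `PortableCounter` are hypothetical names, and the 64-byte fallback is a guess used when the standard library does not provide `std::hardware_destructive_interference_size`. It shows that two counters padded to 64 bytes can still share a line if the hardware's line is 128 bytes.

```cpp
#include <cstddef>
#include <new>

// Use the standard library's compile-time hint when it exists; otherwise fall
// back to an assumed 64 bytes. The hint is fixed at build time and may not
// match the CPU the binary eventually runs on.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kLineGuess = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kLineGuess = 64;
#endif

// True when two byte offsets from a line-aligned base fall in the same line
// of `line_bytes` bytes.
constexpr bool share_line(std::size_t offset_a, std::size_t offset_b,
                          std::size_t line_bytes) {
    return offset_a / line_bytes == offset_b / line_bytes;
}

// Counters padded to 64 bytes are isolated on 64-byte-line hardware...
static_assert(!share_line(0, 64, 64), "separate lines with 64-byte lines");
// ...but land in the same line if the line is 128 bytes.
static_assert(share_line(0, 64, 128), "same line with 128-byte lines");

// A counter padded to whatever the toolchain reports instead of a literal 64.
struct alignas(kLineGuess) PortableCounter {
    unsigned long long value = 0;
};
static_assert(sizeof(PortableCounter) % kLineGuess == 0,
              "size is a whole number of assumed lines");
```

Padding to `kLineGuess` rather than a hard-coded 64 is one possible design choice for portable code; it trades some memory for not depending on the x86_64 line size.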

Reproducibility

  • All diagrams are checked-in SVG source — reproducible at build time.
  • Phase 3 interactive will ship its source in lowlat-ms/bench-widgets/cache-line-visualizer.
  • Any inline microbenchmark has a justfile runner so the reader can cross-check on their own hardware.

References