A cache-line visualizer
An interactive that lets you feel cache-line behavior: array-of-structs vs struct-of-arrays, false sharing in action, 64-byte alignment, the padding trick — the visceral version of things every systems engineer has memorized.
- hardware: TBD; x86_64 reference (64-byte cache lines), Apple M-series addendum for 128-byte L2 lines
- kernel: Linux 6.18 LTS for the benchmark sidecar; visualization itself is browser-only
- compiler: clang 19, -O3 -march=native for any inline microbenchmark; visualizer is vanilla JS + SVG
- dataset: synthetic access patterns over {16B, 64B, 4KB} stride arrays; not a corpus benchmark
TL;DR
Every systems engineer “knows” what a cache line is. Most have it as a memorized fact — 64 bytes on x86, false sharing is bad, alignas(64) helps. This post replaces the memorized fact with a visceral one. The Phase 1 version ships a static SVG walkthrough; the Phase 3 full interactive adds drag-and-drop layouts and live MESI animation.
Methodology
| Field | Value |
|---|---|
| CPU reference | TBD — x86_64 baseline (64B lines); Apple M-series addendum (128B L2) |
| Kernel | Linux 6.18 LTS for any sidecar benchmark |
| Compiler | clang 19, -O3 -march=native for the optional inline microbenchmark |
| Dataset | synthetic stride-access arrays; not a corpus benchmark |
| Scope | Phase 1 = static SVG; Phase 3 = full interactive (SPEC §4.4 Tier 4) |
| Repro | lowlat-ms/bench-widgets/cache-line-visualizer |
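To make the "synthetic stride-access arrays" row concrete, here is a minimal sketch of the arithmetic behind it: how many distinct cache lines a run of fixed-stride accesses touches. The helper name `lines_touched` and the constant `kLineBytes` are hypothetical, not taken from the repo, and the model assumes the array starts on a line boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Assumed line size for the x86_64 reference; pass 128 to model the
// Apple M-series addendum.
constexpr std::size_t kLineBytes = 64;

// Hypothetical helper: count the distinct cache lines touched by `accesses`
// reads of `elem_bytes` each, starting at byte offset 0 and advancing by
// `stride_bytes` per access.
std::size_t lines_touched(std::size_t accesses, std::size_t stride_bytes,
                          std::size_t elem_bytes,
                          std::size_t line_bytes = kLineBytes) {
    assert(line_bytes > 0 && elem_bytes > 0);
    std::unordered_set<std::size_t> lines;
    for (std::size_t i = 0; i < accesses; ++i) {
        const std::size_t begin = i * stride_bytes;
        const std::size_t first = begin / line_bytes;
        const std::size_t last = (begin + elem_bytes - 1) / line_bytes;
        // An element that straddles a boundary touches every line it overlaps.
        for (std::size_t line = first; line <= last; ++line) {
            lines.insert(line);
        }
    }
    return lines.size();
}
```

Under these assumptions, sixteen 8-byte reads at a 16B stride stay within 4 lines, while the same sixteen reads at a 64B or 4KB stride touch 16 lines each: once the stride reaches the line size, every access pays for a full line.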
The question
- Can an interactive convert memorized cache-line facts into visceral intuition?
- If yes, this becomes a long-tail traffic source forever — the canonical page on the topic.
- This post is unusual for lowlat.ms: it’s a visualization-first artifact, not a benchmark post.
Introduction
- Framing: the reader already knows what a cache is; we’re skipping the explainer.
- What this post is not: not a MESI deep-dive, not a “cache for beginners” piece.
- The interactive sits at the top of the page — reader plays before reading.
- Credit where due: Ciechanowski-style explorable explanations, Lemire’s false-sharing posts, Agner Fog’s manuals.
Setup
- Phase 1 ships a static SVG walkthrough with labeled diagrams — no JS required.
- Phase 3 full version: vanilla JS + Canvas, optional Rust→WASM for live measurement.
- Hosted on bench.lowlat.ms/cache-line-visualizer to establish the bench subdomain pattern.
- prefers-reduced-motion respected; static fallback is always visible.
Baseline
- Diagram 1: the 64-byte cache line as the unit of transfer between memory and cache, not the unit of a read: loading one byte still moves the whole line.
- Diagram 2: array-of-structs layout and which lines each field hits.
- Diagram 3: struct-of-arrays layout and how the same workload touches fewer lines.
- Diagram 4: two threads, two variables, one line — the minimum false-sharing example.
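The difference between Diagrams 2 and 3 can be sketched as plain arithmetic. The block below is an illustration only: the `ParticleAoS` record, its fields, and the helper names are assumptions invented for the example, and the model assumes 64-byte lines with arrays starting on a line boundary. It counts how many lines a pass over a single field touches in each layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 32-byte record, used only for illustration.
struct ParticleAoS {
    float x, y, z;
    float vx, vy, vz;
    float mass;
    std::uint32_t flags;
};
static_assert(sizeof(ParticleAoS) == 32, "sketch assumes a 32-byte record");

// Count distinct lines touched when reading one field from `n` consecutive
// records. Offsets only increase, so it is enough to remember the first
// line index that has not been counted yet.
std::size_t lines_for_field(std::size_t n, std::size_t record_bytes,
                            std::size_t field_offset, std::size_t field_bytes,
                            std::size_t line_bytes = 64) {
    assert(line_bytes > 0 && field_bytes > 0);
    std::size_t count = 0;
    std::size_t next_uncounted = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t begin = i * record_bytes + field_offset;
        std::size_t first = begin / line_bytes;
        const std::size_t last = (begin + field_bytes - 1) / line_bytes;
        if (first < next_uncounted) first = next_uncounted;
        if (last >= first) {
            count += last - first + 1;
            next_uncounted = last + 1;
        }
    }
    return count;
}

// Array-of-structs: reading x drags the other 28 bytes of each record along.
std::size_t aos_lines_for_x(std::size_t n) {
    return lines_for_field(n, sizeof(ParticleAoS), offsetof(ParticleAoS, x),
                           sizeof(float));
}

// Struct-of-arrays: the x values sit back to back in their own array.
std::size_t soa_lines_for_x(std::size_t n) {
    return lines_for_field(n, sizeof(float), 0, sizeof(float));
}
```

For 1024 records this model gives 512 lines for the array-of-structs pass and 64 for the struct-of-arrays pass, an 8x gap that comes entirely from layout. The count says nothing about prefetching or associativity; it is only the "which lines does this touch" view the diagrams draw.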
Optimizations
- alignas(64) on hot structs to force a fresh line.
- The padding trick: pad a hot counter to a full cache line to protect it from neighbors.
- SoA + vectorization: how SIMD loads want contiguous same-type data.
- Optional inline microbenchmark: measure stride-access throughput live in the reader’s browser (Phase 1 stretch goal).
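The first two items above can be sketched in a few lines of C++. This is a minimal illustration rather than the code behind the visualizer: the type names and the `same_line` helper are hypothetical, and the 64-byte constant assumes the x86_64 reference (the Apple M-series addendum would use 128).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed line size for the x86_64 reference.
constexpr std::size_t kLineBytes = 64;

// Two hot counters packed side by side: 16 bytes total, so when the struct
// starts on a line boundary both counters share one line.
struct PackedCounters {
    std::atomic<std::uint64_t> a;
    std::atomic<std::uint64_t> b;
};

// The padding trick: alignas rounds both the alignment and the size of the
// type up to a full line, so no neighbor can land on the same line.
struct alignas(kLineBytes) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kLineBytes, "one counter per line");
static_assert(alignof(PaddedCounter) == kLineBytes, "starts on a line boundary");

struct PaddedCounters {
    PaddedCounter a;
    PaddedCounter b;
};

// Hypothetical helper: true when two addresses fall within the same line.
bool same_line(const void* p, const void* q, std::size_t line_bytes = kLineBytes) {
    assert(line_bytes > 0);
    const auto x = reinterpret_cast<std::uintptr_t>(p);
    const auto y = reinterpret_cast<std::uintptr_t>(q);
    return x / line_bytes == y / line_bytes;
}
```

With a line-aligned `PackedCounters` instance, `same_line(&packed.a, &packed.b)` holds, which is the Diagram 4 setup: two threads writing `a` and `b` would contend for one line. With `PaddedCounters` the two members sit 64 bytes apart, at a cost of 112 bytes of padding for 16 bytes of data.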
Results
- Not a measurement post. The “results” are the diagrams themselves and the reader’s shift in intuition.
- Phase 3 will add a live hotness counter that turns reader interaction into a visible cache-miss heatmap.
- Success metric: does this become the page people link when teaching cache effects?
Limitations
- Static SVG (Phase 1) cannot show MESI state transitions animated — deferred to Phase 3.
- The visualization is x86_64-first; Apple M-series 128B L2 lines and POWER variations are footnotes.
- Not a full MESI/MOESI simulator; it’s a teaching tool, not a microarchitectural reference.
- Does not cover store forwarding, memory ordering, or cache-coherency protocols in depth.
Reproducibility
- All diagrams are checked-in SVG source — reproducible at build time.
- Phase 3 interactive will ship its source in lowlat-ms/bench-widgets/cache-line-visualizer.
- Any inline microbenchmark has a justfile runner so the reader can cross-check on their own hardware.
References
- Ulrich Drepper, What Every Programmer Should Know About Memory — https://people.freebsd.org/~lstewart/articles/cpumemory.pdf
- Agner Fog, Optimization manuals — https://www.agner.org/optimize/
- Intel 64 and IA-32 Architectures Optimization Reference Manual — https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
- Daniel Lemire on false sharing — https://lemire.me/blog/2023/09/04/locality-and-cache-misses/
- Bartosz Ciechanowski, Internal Combustion Engine (style reference for explorable explanations) — https://ciechanow.ski/internal-combustion-engine/
- Mechanical Sympathy mailing list archives — https://groups.google.com/g/mechanical-sympathy