QPI / UPI
Intel's point-to-point inter-socket interconnect: QuickPath Interconnect (QPI), replaced by UltraPath Interconnect (UPI) on Skylake-SP and later.
QPI and its successor UPI link the CPU sockets of a multi-socket server. They carry three kinds of traffic between sockets: cache coherence messages, remote memory requests and responses, and I/O. AMD's equivalent is Infinity Fabric (and, before that, HyperTransport).
Skylake-SP introduced UPI, replacing QPI with a higher transfer rate (10.4 GT/s vs. QPI's 9.6 GT/s) and better protocol efficiency; Sapphire Rapids and Emerald Rapids use UPI 2.0 at up to 16 GT/s. A typical modern two-socket Xeon has two to four UPI links between the sockets, each delivering on the order of 20–25 GB/s in each direction.
Why it matters for latency:
- Remote memory access on a NUMA system traverses the UPI fabric. The extra hop adds ~40–50 ns to load latency compared to local DRAM.
- Cross-socket cache coherence: a line held in Modified state by a core on the remote socket must be forwarded across UPI, adding substantially to cross-core communication latency.
- UPI bandwidth can become the bottleneck on memory-bound multi-socket workloads. Two sockets, each with eight memory channels, can absorb far more local bandwidth than the UPI links can carry between them, so heavy cross-socket data movement caps aggregate throughput.
Observability: on Intel server parts, perf stat with the uncore_upi_* events reports UPI traffic, and Intel's PCM (Performance Counter Monitor) tool surfaces per-link utilization. On AMD, the Data Fabric PMU (df_* events under perf stat) exposes Infinity Fabric counters.