AVX2
Intel's 256-bit SIMD instruction set extension, the practical sweet spot for SIMD on modern x86 before AVX-512 frequency and availability issues bite.
AVX2 (Advanced Vector Extensions 2) is Intel's 256-bit SIMD instruction set extension, introduced with Haswell in 2013 and available on essentially every x86 CPU shipped this decade that matters for performance work. It widened AVX's integer operations to 256 bits (AVX itself had only widened floating point), and Haswell shipped it alongside gather instructions, fused multiply-add (FMA3, a separate CPUID feature), and per-lane variable shifts — closing most of the functional gap between vector and scalar integer code.
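The per-lane variable shift is a good litmus test for what AVX2 added: before it, one shift count applied to every lane. A minimal sketch (function name `shift_each` is illustrative; the block falls back to a scalar loop when compiled without `-mavx2`):

```c
#include <immintrin.h>
#include <stdint.h>

/* AVX2 per-lane variable shift: each 32-bit lane is shifted left by its
 * own count via VPSLLVD. Pre-AVX2, a single shift count applied to all
 * lanes. Scalar fallback when compiled without -mavx2. */
static void shift_each(const uint32_t *vals, const uint32_t *counts,
                       uint32_t *out) {
#ifdef __AVX2__
    __m256i v = _mm256_loadu_si256((const __m256i *)vals);
    __m256i c = _mm256_loadu_si256((const __m256i *)counts);
    _mm256_storeu_si256((__m256i *)out, _mm256_sllv_epi32(v, c));
#else
    for (int i = 0; i < 8; i++)
        out[i] = vals[i] << counts[i];
#endif
}
```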
In 2026, AVX2 is the practical sweet spot for SIMD on x86:
- Universally available — anything from Haswell (2013) forward on Intel, Excavator/Zen onward on AMD, including every modern server CPU.
- Negligible frequency penalty — heavy 256-bit FMA can drop a frequency bin on some older Xeons, but nothing like the AVX-512 wide-license downclocking on Skylake-era Intel parts.
- Good lane width — 8x float32, 4x float64, 32x int8, 16x int16, 8x int32 per instruction.
- Well-tuned libraries — Highway, simdjson, hnswlib, and every serious numerical library ship AVX2 fast paths.
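The lane-width point is easiest to see in a dot product: one FMA consumes 8 float32 pairs. A hedged sketch (function name `dot_f32` is illustrative, length assumed a multiple of 8; compiles to the 8-lane path with `-mavx2 -mfma`, scalar otherwise):

```c
#include <immintrin.h>
#include <stddef.h>

/* Dot product of two float32 arrays whose length is a multiple of 8.
 * With -mavx2 -mfma, each FMA processes 8 lanes; without those flags
 * the scalar fallback compiles instead. */
static float dot_f32(const float *a, const float *b, size_t n) {
#if defined(__AVX2__) && defined(__FMA__)
    __m256 acc = _mm256_setzero_ps();
    for (size_t i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  /* unaligned 256-bit loads */
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_fmadd_ps(va, vb, acc);  /* acc += va * vb, fused */
    }
    /* Horizontal sum of the 8 accumulator lanes. */
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc),
                          _mm256_extractf128_ps(acc, 1));
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1));
    return _mm_cvtss_f32(s);
#else
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
#endif
}
```

The eight-lane accumulator is why the horizontal sum at the end is unavoidable: lanes stay independent until you explicitly fold them together.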
Typical use cases that show up in lowlat.ms territory:
- HNSW distance math (L2, inner product) over 768-dim vectors — 8 float32 dimensions per instruction.
- Hash table probing via vectorized compare-and-match.
- String search and JSON parsing (simdjson, Agrep, Hyperscan).
- Bitmap operations in columnar databases.
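The first of these — the HNSW distance kernel — is short enough to write out. A sketch of squared L2 distance over 768-dim float32 embeddings (function name `l2sq_768` is illustrative; 768 is a multiple of 8, so each iteration consumes one full `__m256`):

```c
#include <immintrin.h>
#include <stddef.h>

/* Squared L2 distance between two 768-dim float32 embeddings: the
 * inner loop of an HNSW-style distance function. 96 iterations of
 * subtract + FMA with -mavx2 -mfma; scalar fallback otherwise. */
static float l2sq_768(const float *a, const float *b) {
    const size_t dim = 768;
#if defined(__AVX2__) && defined(__FMA__)
    __m256 acc = _mm256_setzero_ps();
    for (size_t i = 0; i < dim; i += 8) {
        __m256 d = _mm256_sub_ps(_mm256_loadu_ps(a + i),
                                 _mm256_loadu_ps(b + i));
        acc = _mm256_fmadd_ps(d, d, acc);  /* acc += d * d */
    }
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc),
                          _mm256_extractf128_ps(acc, 1));
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1));
    return _mm_cvtss_f32(s);
#else
    float acc = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        acc += d * d;
    }
    return acc;
#endif
}
```

Production kernels (hnswlib's, for instance) add multiple accumulators to hide FMA latency, but the structure is the same.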
When to step up to AVX-512: 512-bit lanes for dense linear algebra, dedicated mask registers for branch-free code, and the richer permutes (byte-granularity VPERMB, two-source VPERMT2) beyond what AVX2's VPERM offers. When to step down to SSE4: only when you must run on pre-2013 hardware, or when the kernel is bounded by something other than vector width (rare in 2026).
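In practice the step-up/step-down choice is made at runtime, not compile time: ship all three kernels and pick the widest one the CPU reports. A minimal dispatch sketch, assuming GCC or Clang (`__builtin_cpu_supports` is their builtin; MSVC would use `__cpuidex` instead):

```c
/* Runtime kernel selection: probe CPUID once and pick the widest
 * kernel the machine actually supports, rather than committing at
 * compile time. Names here are illustrative. */
typedef enum { KERNEL_SCALAR, KERNEL_AVX2, KERNEL_AVX512 } kernel_t;

static kernel_t pick_kernel(void) {
#if defined(__GNUC__) || defined(__clang__)
    if (__builtin_cpu_supports("avx512f"))
        return KERNEL_AVX512;
    if (__builtin_cpu_supports("avx2"))
        return KERNEL_AVX2;
#endif
    return KERNEL_SCALAR;
}
```

Libraries like Highway wrap exactly this pattern (plus per-target code generation) so you write the kernel once.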