DMA
Direct Memory Access — a device reads or writes system memory over the I/O fabric without CPU involvement; the mechanism behind NVMe, NICs, and GPU data transfer.
DMA (Direct Memory Access) is the mechanism by which a peripheral device reads from or writes to main memory without involving the CPU on each byte. The device holds a descriptor pointing to a physical (or IOMMU-translated) address in host memory; the PCIe root complex (or historical DMA controller) routes the transfer, and the CPU is notified via an interrupt (or polled flag) when the transfer completes.
Every modern high-throughput device depends on DMA. NVMe SSDs DMA data blocks to/from page cache or user-registered buffers. NICs DMA packets into ring buffers that are then consumed by the driver. GPUs DMA tensors over PCIe or NVLink. Without DMA, the CPU would be reduced to copying bytes one at a time — a non-starter at modern speeds.
Linux’s DMA API (dma_map_single, dma_alloc_coherent, etc.) is the driver-side contract for declaring DMA mappings and synchronizing cache state. The IOMMU (Intel VT-d, AMD-Vi, Arm SMMU) sits between the device and memory, translating device-issued addresses through page tables — essential for virtualization and for device isolation.
Key performance facts:
- DMA interacts with CPU caches: on x86, DMA is cache-coherent — the hardware snoops the caches, so a DMA write into a buffer that a core has cached invalidates that core's copy. This DMA-to-cacheline coordination shows up in both memory ordering and performance.
- The IOMMU is not free: device-issued addresses can miss in the IOTLB and require page walks of the IOMMU page tables. High-throughput workloads benefit from large IOMMU pages and from iommu=pt (passthrough) in trusted environments.
- Registered buffers (io_uring, DPDK, SPDK) pin memory for DMA ahead of time, skipping per-op map/unmap cost.