Techniques to mitigate the performance impact of off-chip data migrations in modern GPU computing
Abstract
In response to growing compute and memory capacity requirements, modern systems are equipped to distribute work across multiple GPUs and to transparently pool memory from the host and other GPUs. Compute capacity scales out with multiple GPUs, and the memory capacity afforded by the host is an order of magnitude larger than a GPU's device memory. However, both approaches require data to be migrated over the system interconnect during program execution. Because migrating data over the system interconnect takes far longer than accessing a GPU's internal memory hierarchy, the efficacy of these approaches in achieving high performance depends strongly on the data migration overhead. This dissertation proposes several techniques that help mitigate this overhead.