Data- and communication-centric approaches to model and design flexible deep neural network accelerators
Kwon, Hyouk Jun
Deep neural network (DNN) accelerators, specialized hardware for DNN inference, enable energy-efficient and low-latency inference. To maximize the efficiency (energy efficiency, latency, and throughput) of a DNN accelerator, designers optimize both the accelerator and the mapping of target DNN models onto it. However, designing DNN accelerators for recent DNN models that contain layers with diverse operations and sizes is challenging, since optimizing the accelerator and mapping for the average case of the layers in target DNN workloads often leads to uniformly inefficient design points. Therefore, this thesis proposes flexible-mapping DNN accelerators that can run multiple mappings to adapt to the diverse DNN layers in DNN workloads. This thesis first quantifies the costs and benefits of mappings using a data-centric approach. Based on the observation that no single mapping is ideal for all layers, this thesis explores two approaches to designing flexible-mapping accelerators: reconfigurability and heterogeneity. Reconfigurable accelerators follow a communication-centric approach that implements a flexible network-on-chip (NoC), enabling the accelerator to be configured at runtime for any mapping style. Heterogeneous accelerators employ multiple sub-accelerators with fixed but diverse mapping styles within a single accelerator chip, providing coarser-grained flexibility at lower area and power cost than reconfigurability. Case studies show that both approaches yield Pareto-optimal design points with different strengths.
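The core observation above — that no single fixed mapping is best for every layer, so per-layer mapping flexibility beats any one-size-fits-all choice — can be illustrated with a minimal sketch. The layer names, mapping-style names, and cost values below are hypothetical placeholders, not measurements from the thesis:

```python
# Hypothetical per-layer cost (e.g., cycles) under three mapping styles.
# All names and numbers are illustrative assumptions, not measured data.
layer_cost = {
    "conv1": {"weight_stationary": 120, "output_stationary": 90,  "row_stationary": 100},
    "conv2": {"weight_stationary": 80,  "output_stationary": 110, "row_stationary": 95},
    "fc1":   {"weight_stationary": 70,  "output_stationary": 60,  "row_stationary": 85},
}

def fixed_mapping_cost(mapping):
    """Total workload cost when one mapping style is used for every layer."""
    return sum(costs[mapping] for costs in layer_cost.values())

def flexible_mapping_cost():
    """Total workload cost when each layer runs under its own best mapping."""
    return sum(min(costs.values()) for costs in layer_cost.values())

mappings = ["weight_stationary", "output_stationary", "row_stationary"]
best_fixed = min(fixed_mapping_cost(m) for m in mappings)
print("best fixed mapping:", best_fixed)        # → 260 (output_stationary)
print("flexible mapping:  ", flexible_mapping_cost())  # → 230
```

Even in this toy example, the flexible accelerator (230) beats the best single fixed mapping (260), because each layer favors a different style — the gap is what reconfigurable NoCs and heterogeneous sub-accelerators aim to capture at different area/power costs.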