DNN-Dataflow-Hardware Co-Design for Enabling Pervasive General-Purpose AI
The development of supervised-learning-based deep learning (DL) solutions today is mostly open loop. A typical DL model is created by a team of experts who hand-tune the neural network (NN) topology over multiple iterations, often by trial and error; the model is then trained on enormous amounts of labeled data for weeks at a time to obtain a set of weights. The trained model is then deployed in the cloud or at the edge on inference accelerators (such as GPUs, FPGAs, or ASICs). This form of DL breaks down when labeled data is absent, when the model for the task at hand is unknown, or when the problem keeps changing. An AI system for continuous learning needs the ability to constantly interact with the environment and to add and remove connections within the NN autonomously, just as our brains do. In this talk, we will briefly present our research efforts towards enabling general-purpose AI. First, we will present GeneSys, a HW-SW prototype of an Evolutionary Algorithm (EA)-based learning system that comprises a closed-loop learning engine called EvE and an inference engine called ADAM. EvE is a genetic algorithm accelerator that can "evolve" the topology and weights of NNs completely in hardware for the task at hand, without requiring hand-optimization or backpropagation training. ADAM continuously interacts with the environment and is optimized for efficiently running the irregular NNs generated by EvE, which today's suite of DL accelerators and GPUs are not optimized to handle. Next, we focus on the challenge of mapping a DNN model (developed via supervised or EA-based methods) efficiently onto an accelerator (ASIC/GPU/FPGA). DNNs are essentially multi-dimensional loops, with millions of parameters and billions of computations. They can be partitioned in myriad ways to map onto the compute array.
Each unique mapping, or "dataflow", provides different trade-offs in terms of throughput and energy efficiency, as it determines overall utilization and data reuse. Moreover, the right dataflow for a DNN depends heavily on the layer type, the ratio of input activations to weights, the accelerator microarchitecture, and its memory hierarchy. We will present an analytical tool called MAESTRO that we have been developing in collaboration with NVIDIA for formally characterizing the performance and energy impact of dataflows in DNNs today. MAESTRO can be used at design time, to provide quick first-order metrics when hardware resources (buffers and interconnects) are being allocated on-chip, and at compile time, when different layers need to be optimally mapped for high utilization and energy efficiency. Finally, we will present the microarchitecture of an open-source DNN accelerator called MAERI that can adaptively change the dataflow depending on the DNN layer currently being mapped, by leveraging a runtime-reconfigurable interconnection fabric.
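To make the notion of a dataflow concrete, the following is a minimal illustrative sketch and not MAESTRO's cost model or MAERI's mapping. It treats a 1-D convolution as a two-level loop nest and assumes a hypothetical processing element that can hold one weight and one partial sum at a time. The function name `conv1d_dataflow` and the reload counters are assumptions made for this example only. Swapping the loop order leaves the result unchanged but shifts which operand gets reused.

```python
def conv1d_dataflow(inputs, weights, order):
    """Toy 1-D convolution with a selectable loop order.

    order is ("s", "x") for weights in the outer loop (weight-stationary)
    or ("x", "s") for outputs in the outer loop (output-stationary).
    Returns (outputs, weight_loads, output_loads), where the load counts
    model a hypothetical PE holding one weight and one partial sum.
    """
    n_out = len(inputs) - len(weights) + 1
    outputs = [0] * n_out
    loops = {"x": range(n_out), "s": range(len(weights))}
    outer, inner = order

    weight_loads = 0      # times a different weight had to be fetched
    output_loads = 0      # times a different partial sum had to be fetched
    held_weight = None    # index of the weight currently held in the PE
    held_output = None    # index of the partial sum currently held in the PE

    for a in loops[outer]:
        for b in loops[inner]:
            idx = {outer: a, inner: b}
            x, s = idx["x"], idx["s"]
            if held_weight != s:
                weight_loads += 1
                held_weight = s
            if held_output != x:
                output_loads += 1
                held_output = x
            outputs[x] += inputs[x + s] * weights[s]
    return outputs, weight_loads, output_loads


if __name__ == "__main__":
    data, kernel = [1, 2, 3, 4, 5, 6], [1, 0, -1]
    for name, order in [("weight-stationary", ("s", "x")),
                        ("output-stationary", ("x", "s"))]:
        out, w_loads, o_loads = conv1d_dataflow(data, kernel, order)
        print(name, out, "weight loads:", w_loads, "output loads:", o_loads)
```

In this toy setting the weight-stationary order fetches each weight once but revisits partial sums repeatedly, while the output-stationary order does the opposite. Real accelerators add tiling, spatial parallelism, and a multi-level memory hierarchy on top of this choice.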
- CRNCH Summit