
    Neurocube: Energy-efficient programmable digital deep learning accelerator based on processor in memory platform

    View/Open
    KIM-DISSERTATION-2017.pdf (3.833Mb)
    Date
    2017-08-17
    Author
    Kim, Duckhwan
    Abstract
    Deep learning, a class of machine learning algorithms based on artificial neural networks, has shown great success in numerous pattern recognition problems such as image recognition and speech recognition. Most deep learning development to date has been done in software on general-purpose graphics processing units (GPUs). In terms of efficiency, however, running deep learning on GPUs is constrained by the power/thermal budgets of mobile devices and high-performance computing clusters. In this thesis, I present a programmable and scalable deep learning accelerator based on 3D high-density memory integrated with a logic tier. The proposed architecture consists of clusters of processing engines (PEs), and the PE clusters access multiple memory channels (vaults) in parallel. The operating principle, referred to as memory-centric computing, embeds specialized state machines within the vault controllers of the Hybrid Memory Cube (HMC) to drive data into the PE clusters. The next version of NeuroCube is designed to improve the throughput of global (fully connected) layers in deep neural networks, which are critical in recurrent neural networks (RNNs). NeuroCube is further extended to accelerate deep learning training, which requires an additional optimized data flow to improve throughput for both inference and training. For computing gradients, it also supports 32-bit fixed point with stochastic rounding to prevent vanishing gradients. A programming model and supporting architecture exploit the flexible data flow to efficiently accelerate training of various types of DNNs. Cycle-level simulation and a synthesized design in 15 nm FinFET show a power efficiency of ~500 GFLOPS/W and nearly uniform throughput across a wide range of DNNs, including convolutional, recurrent, multi-layer-perceptron, and mixed (CNN+RNN) networks.
    URI
    http://hdl.handle.net/1853/60660
    Collections
    • Georgia Tech Theses and Dissertations [23878]
    • School of Electrical and Computer Engineering Theses and Dissertations [3381]
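    The abstract notes that gradients are computed in 32-bit fixed point with stochastic rounding so that small gradient values are not systematically rounded to zero. The following is a minimal illustrative sketch of stochastic rounding only; the word-length split, function name, and NumPy-based formulation are assumptions and do not represent the NeuroCube hardware implementation described in the thesis.

    import numpy as np

    def stochastic_round_fixed(x, frac_bits=24, total_bits=32, rng=None):
        """Quantize float values to signed fixed point with stochastic rounding.

        Illustrative assumption: a Q(total_bits - frac_bits).frac_bits format.
        """
        rng = np.random.default_rng() if rng is None else rng
        scale = 1 << frac_bits                       # e.g. Q8.24 when frac_bits=24
        scaled = np.asarray(x, dtype=np.float64) * scale
        floor = np.floor(scaled)
        # Round up with probability equal to the fractional remainder, so the
        # rounding error is zero in expectation (keeps tiny gradients alive).
        quantized = floor + (rng.random(scaled.shape) < (scaled - floor))
        # Saturate to the representable signed integer range.
        lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
        return np.clip(quantized, lo, hi) / scale

    # Example: gradients below the Q8.24 resolution survive on average instead
    # of always rounding to zero (mean of the result is ~2**-26).
    grads = np.full(100000, 2.0 ** -26)
    print(stochastic_round_fixed(grads).mean())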
