    • Adaptive Transaction Scheduling for Transactional Memory Systems

      Yoo, Richard M.; Lee, Hsien-Hsin Sean (Georgia Institute of Technology, 2007)
      Transactional memory systems are expected to enable parallel programming at lower programming complexity, while delivering improved performance over traditional lock-based systems. Nonetheless, we observed that there are ...
    • A Characterization and Analysis of GPGPU Kernels 

      Kerr, Andrew; Diamos, Gregory; Yalamanchili, Sudhakar (Georgia Institute of Technology, 2009-05-05)
      General purpose application development for GPUs (GPGPU) has recently gained momentum as a cost-effective approach for accelerating data- and compute-intensive applications, pushed to the forefront by the introduction of ...
    • The Design and Implementation of Ocelot’s Dynamic Binary Translator from PTX to Multi-Core x86

      Diamos, Gregory (Georgia Institute of Technology, 2009)
      Ocelot is a dynamic compilation framework designed to map the explicitly parallel PTX execution model used by NVIDIA CUDA applications onto diverse many-core architectures. Ocelot includes a dynamic binary translator ...
    • Power- and area-efficient single SISO architecture of Turbo decoder 

      Lee, Dongwon; Wolf, Wayne (Georgia Institute of Technology, 2009)
      In this paper, we propose a power- and area-efficient architecture of Turbo decoder. In order to improve the nonfunctional performance metrics such as power consumption and area, we use the trade-off method between bit ...
    • Speculative Execution on Multi-GPU Systems 

      Diamos, Gregory; Yalamanchili, Sudhakar (Georgia Institute of Technology, 2009)
      The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to ...
    • Translating GPU Binaries to Tiered SIMD Architectures with Ocelot 

      Diamos, Gregory; Kerr, Andrew; Kesavan, Mukil (Georgia Institute of Technology, 2009)
      Parallel Thread Execution ISA (PTX) is a virtual instruction set used by NVIDIA GPUs that explicitly expresses hierarchical MIMD and SIMD style parallelism in an application. In such a programming model, the programmer ...