Improving the throughput of novel cluster computing systems
MetadataShow full item record
Traditional cluster computing systems such as the supercomputers are equipped with specially designed high-performance hardware, which escalates the manufacturing cost and the energy cost of those systems. Due to such drawbacks and the diversified demand in computation, two new types of clusters are developed: the GPU clusters and the Hadoop clusters. The GPU cluster combines traditional CPU-only computing cluster with general purpose GPUs to accelerate the applications. Thanks to the massively-parallel architecture of the GPU, this type of system can deliver much higher performance-per-watt than the traditional computing clusters. The Hadoop cluster is another popular type of cluster computing system. It uses inexpensive off-the-shelf component and standard Ethernet to minimize manufacturing cost. The Hadoop systems are widely used throughout the industry. Alongside with the lowered cost, these new systems also bring their unique challenges. According to our study, the GPU clusters are prone to severe under-utilization due to the heterogeneous nature of its computation resources, and the Hadoop clusters are vulnerable to network congestion due to its limited network resources. In this research, we are trying to improve the throughput of these novel cluster computing systems by increasing the workload parallelism and network I/O parallelism.