Efficient resource sharing for big data applications in shared clusters
Modern data centers are shifting to shared clusters, where resources are shared among multiple users and frameworks. A key enabler for such shared clusters is a cluster resource management system, which allocates resources among the different frameworks. One key problem in these shared clusters is how to share cluster resources efficiently among multiple applications and users in an elastic and non-disruptive manner. Current cluster schedulers typically use kill-based preemption to coordinate resource sharing, achieve fairness, and satisfy SLOs during resource contention: they simply kill low-priority jobs and restart them later when resources become available. This simple preemption policy ensures fast service times for high-priority jobs and prevents a single user or application from occupying too many resources and starving others. However, because the progress of preempted jobs is not saved, the policy wastes significant resources and delays the response time of long-running or low-priority jobs. Dynamic resource sharing becomes even more problematic when different types of applications run on the same cluster (e.g., batch processing systems running alongside real-time streaming systems). Different application types often have different quality-of-service metrics (e.g., higher throughput versus lower latency), which can make resource sharing among them contentious. In this dissertation, we show the impact of kill-based preemption in modern shared clusters and propose two solutions for sharing resources more efficiently in shared cluster environments: checkpoint-based preemption, and support for elasticity in distributed data stream processing systems.