Carleton University
Technical Report TR-13-02
June 28, 2013

On Optimal Budget-Driven Scheduling Algorithms for MapReduce Jobs in the Heterogeneous Cloud

Yang Wang & Wei Shi

Abstract

In this paper, we consider task-level scheduling algorithms with res-pect to budget and deadline constraints for a bag of MapReduce jobs on a set of provisioned heterogeneous (virtual) machines in cloud platforms. Heterogeneity is manifested in the ”pay-as-you-go” charging model we use, where service machines with different performance have different service rates. We organize the bag of jobs as a κ-stage workflow and achieve, for specific optimization goals, the following results. First, given a total monetary budget Bj for a particular stage j, we propose a greedy algorithm for distributing the budget, with minimal stage execution time as our goal. Based on the structure of this problem, we further prove the optimality of our algorithm in terms of the budget used and the execution time achieved. We then combine this algorithm with dynamic programming techniques to propose an optimal scheduling algorithm that obtains a minimum scheduling length in O(κB2). The algorithm is efficient if the total budget B is polynomially bounded by the number of tasks in the MapReduce jobs, which is usually the case in practice. Second, we consider the dual of this optimization problem to minimize the cost when the (time) deadline of the computation D is fixed. We convert this problem into the standard multiplechoice knapsack problem via a parallel transformation. Our empirical studies verify the proposed optimal algorithms.

TR-13-02.pdf